openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message (Collapse)	Author	Age	Files	Lines
*	xfs: btree scrub should check minrecs	Darrick J. Wong	2018-05-16	1	-0/+40
\| \| \| \| \| \| \| \| \| \|	Strengthen the btree block header checks to detect the number of records being less than the btree type's minimum record count. Certain blocks are allowed to violate this constraint -- specifically any btree block at the top of the tree can have fewer than minrecs records. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: clean up scrub usage of KM_NOFS	Darrick J. Wong	2018-05-16	3	-3/+4
\| \| \| \| \| \| \| \| \| \|	All scrub code runs in transaction context, which means that memory allocations are automatically run in PF_MEMALLOC_NOFS context. It's therefore unnecessary to pass in KM_NOFS to allocation routines, so clean them all out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: avoid ilock games in the quota scrubber	Darrick J. Wong	2018-05-16	3	-28/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Refactor the quota scrubber to take the quotaofflock and grab the quota inode in the setup function so that we can treat quota in the same "scrub in the context of this inode" (i.e. sc->ip) manner as we treat any other inode. We do have to drop the quota inode's ILOCK_EXCL to use dqiterate, but since dquots have their own individual locks the ILOCK wasn't helping us anyway. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: refactor dquot iteration	Darrick J. Wong	2018-05-16	3	-30/+63
\| \| \| \| \| \| \| \| \| \|	Create a helper function to iterate all the dquots of a given type in the system, and refactor the dquot scrub to use it. This will get more use in the quota repair code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: rename on-disk dquot counter zap functions	Darrick J. Wong	2018-05-10	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \|	The function 'xfs_qm_dqiterate' doesn't iterate dquots at all, it iterates all dquot blocks of a quota inode and clears the counters. Therefore, change the name to something more descriptive so that we can introduce a real dquot iterator later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: replace XFS_QMOPT_DQALLOC with a simple boolean	Darrick J. Wong	2018-05-10	6	-28/+19
\| \| \| \| \| \| \| \| \| \|	DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the _dqget family of functions cares about is DQALLOC. Therefore, change it to a boolean 'can alloc?' flag for the dqget interfaces where that makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: remove direct calls to _qm_dqread	Darrick J. Wong	2018-05-10	3	-15/+39
\| \| \| \| \| \| \| \| \| \| \| \|	The quota initialization code needs an "uncached" variant of _dqget to read in default quota limits and timers before the dquot cache is fully set up. We've already split up _dqget into its component pieces so create a fourth variant to address this need, and make dqread internal to xfs_dquot.c again. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: refactor xfs_qm_dqtobp and xfs_qm_dqalloc	Darrick J. Wong	2018-05-10	1	-148/+122
\| \| \| \| \| \| \| \|	Separate the disk dquot read and allocation functionality into two helper functions, then refactor dqread to call them directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: refactor incore dquot initialization functions	Darrick J. Wong	2018-05-10	1	-30/+51
\| \| \| \| \| \| \| \| \|	Create two incore dquot initialization functions that will help us to disentangle dqget and dqread. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: fetch dquots directly during quotacheck	Darrick J. Wong	2018-05-10	1	-16/+12
\| \| \| \| \| \| \| \| \| \| \|	Quotacheck only runs during mount, which means that there are no other processes in the system that could be doing chown or chproj. Therefore there's no potential for racing to attach dquots to the inode so we can drop all the ILOCK and race detection bits from quotacheck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: split out dqget for inodes from regular dqget	Darrick J. Wong	2018-05-10	8	-77/+134
\| \| \| \| \| \| \| \| \| \| \|	There are two uses of dqget here -- one is to return the dquot for a given type and id, and the other is to return the dquot for a given type and inode. Those are two separate things, so split them into two smaller functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: remove unnecessary xfs_qm_dqattach parameter	Darrick J. Wong	2018-05-10	8	-20/+19
\| \| \| \| \| \| \| \|	The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: delegate dqget input checks to helper function	Darrick J. Wong	2018-05-10	1	-9/+31
\| \| \| \| \| \| \| \| \|	Move the dqget input checks to a separate function in preparation for splitting up the dqget functionality. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: refactor dquot cache handling	Darrick J. Wong	2018-05-10	1	-34/+78
\| \| \| \| \| \| \| \| \| \|	Delegate the dquot cache handling (radix tree lookup and insertion) to separate helper functions so that we can continue to simplify the body of dqget. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: refactor XFS_QMOPT_DQNEXT out of existence	Darrick J. Wong	2018-05-10	7	-71/+108
\| \| \| \| \| \| \| \| \|	There's only one caller of DQNEXT and its semantics can be moved into a separate function, so create the function and get rid of the flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	xfs: don't spray logs when dquot flush/purge fail	Darrick J. Wong	2018-05-10	2	-12/+3
\| \| \| \| \| \| \| \| \|	When dquot flush or purge fail there's no need to spam the logs, we've already logged the IO error or fs shutdown that caused the flush failures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: release new dquot buffer on defer_finish error	Darrick J. Wong	2018-05-10	1	-20/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when allocating a new dquot record", we allocate a new dquot block, grab a buffer to initialize it, and return the locked initialized dquot buffer to the caller for further in-core dquot initialization. Unfortunately, if the _bmap_finish errored out, _qm_dqalloc would also error out without bothering to free the (locked) buffer. Leaking a locked buffer caused hangs in generic/388 when quotas are enabled. Furthermore, the _bmap_finish -> _defer_finish conversion in 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*") failed to observe that the buffer was held going into _defer_finish and therefore failed to notice that the buffer lock is /not/ maintained afterwards. Now that we can bjoin a buffer to a defer_ops, use this mechanism to ensure that the buffer stays locked across the _defer_finish. Release the holds and locks on the buffer as appropriate if we have to error out. There is a subtlety here for the caller in that the buffer emerges locked and held to the transaction, so if the _trans_commit fails we have to release the buffer explicitly. This fixes the unmount hang in generic/388 when quotas are enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: don't discard on free of unwritten extents	Brian Foster	2018-05-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: skip online discard during eofblocks trims	Brian Foster	2018-05-10	3	-12/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: add bmapi nodiscard flag	Brian Foster	2018-05-10	7	-18/+87
\| \| \| \| \| \| \| \| \| \| \| \| \|	Freed extents are unconditionally discarded when online discard is enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass discards when unnecessary. For example, this will be useful for eofblocks trimming. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: get rid of the log item descriptor	Dave Chinner	2018-05-10	18	-124/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: add some more debug checks to buffer log item reuse	Dave Chinner	2018-05-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	Just to make sure the item isn't associated with another transaction when we try to reuse it. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: fix double ijoin in xfs_reflink_clear_inode_flag()	Dave Chinner	2018-05-10	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_reflink_clear_inode_flag double-joins an inode to a transaction, which is not allowed. Fix that and document that the caller must have already joined it. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: edit out trace for nonexistent ASSERT] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: fix double ijoin in xfs_reflink_cancel_cow_range	Dave Chinner	2018-05-10	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_reflink_cancel_cow_range joins an inode twice to the same transaction. This is not allowed, so fix it and document that the callers of xfs_reflink_cancel_cow_blocks() must have already joined the inode to the permanent transaction passed in. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: edited the commit log to remove trace for nonexistent ASSERT] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: fix double ijoin in xfs_inactive_symlink_rmt()	Dave Chinner	2018-05-10	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_inactive_symlink_rmt() does something nasty - it joins an inode into a transaction it is already joined to. This means the inode can have multiple log item descriptors attached to the transaction for it. This breaks teh 1:1 mapping that is supposed to exist between the log item and log item descriptor. This results in the log item being processed twice during transaction commit and CIL formatting, and there are lots of other potential issues tha arise from double processing of log items in the transaction commit state machine. In this case, the inode is already held by the rolling transaction returned from xfs_defer_finish(), so there's no need to join it again. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: don't assert fail with AIL lock held	Dave Chinner	2018-05-10	1	-11/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Been hitting AIL ordering assert failures recently, but been unable to trace them down because the system immediately hangs up onteh spinlock that was held when this assert fires: XFS: Assertion failed: XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0, file: fs/xfs/xfs_trans_ail.c, line: 52 Move the assertions outside of the spinlock so the corpse can be dissected. Thanks to Brian Foster for supplying a clean way of doing this. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: adder caller IP to xfs_defer* tracepoints	Dave Chinner	2018-05-10	2	-12/+17
\| \| \| \| \| \| \| \| \| \|	So it's clear in the trace where they are being called from. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: add tracing to high level transaction operations	Dave Chinner	2018-05-10	3	-1/+57
\| \| \| \| \| \| \| \| \| \| \| \|	Because currently we have no idea what the transaction context we are operating in is, and I need to know that information to track down bugs in multiple log item joins to transactions. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: log item flags are racy	Dave Chinner	2018-05-10	19	-48/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log item flags contain a field that is protected by the AIL lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to set and clear these flags, but most of the updates and checks are not done with the AIL lock held and so are susceptible to update races. Fix this by changing the log item flags to use atomic bitops rather than be reliant on the AIL lock for update serialisation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: add missing rmap error return	Darrick J. Wong	2018-05-10	1	-0/+2
\| \| \| \| \| \| \| \| \|	xfs_rmap_lookup_le_range can return errors, so we need to check for them and bail out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: bmap debugging should never panic the system	Darrick J. Wong	2018-05-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Don't panic() the system if the bmap records are garbage, just call ASSERT which gives us the same backtrace but enables developers to control if the system goes down or not. This makes debugging with generic/388 much easier because it won't reboot the machine midway through a run just because btree_read_bufl returns EIO when the fs has already shut down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
*	xfs: defer agfl frees from directory op transactions	Brian Foster	2018-05-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	Directory operations can perform block allocations as entries are added/removed from directories. Defer AGFL block frees from the remaining directory operation transactions. This covers the hard link, remove and rename operations. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: defer frees from common inode allocation paths	Brian Foster	2018-05-09	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inode allocation can require block allocation for physical inode chunk allocation, inode btree record insertion, and/or directory block allocation for entry insertion. Any of these block allocation requests can require AGFL fixups prior to the actual allocation. Update the common file creation transacions to defer AGFL frees from these contexts to avoid too much log reservation consumption per-transaction. Since these transactions are already passed down through the btree cursors and da_args structure, this simply requires to attach dfops to the transaction. Note that this covers tr_create, tr_mkdir and tr_symlink. Other transactions such as tr_create_tmpfile do not already make use of deferred operations and so are left alone for the time being. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: defer agfl frees from inode inactivation	Brian Foster	2018-05-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XFS inode chunks are already freed via deferred operations (which now also defer AGFL block frees), but inode btree blocks are freed directly in the associated context. This has been known to lead to log reservation overruns in particular workloads where an inobt block free may require several AGFL block frees (and thus several allocation btree modifications) before the inobt block itself is actually freed. To avoid this problem, defer the frees of any AGFL blocks before the inobt block free takes place. This requires passing the dfops from xfs_inactive_ifree() down through the inobt ->[alloc\|free]_block() callouts, which essentially only requires to attach the dfops to the transaction since it is already carried all the way through to the inobt update and allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: defer agfl block frees from deferred ops processing context	Brian Foster	2018-05-09	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that AGFL block frees are deferred when dfops is set in the transaction, start deferring AGFL block frees from contexts that are known to push the limits of existing log reservations. The first such context is deferred operation processing itself. This primarily targets deferred extent frees (such as file extents and inode chunks), but in doing so covers all allocation operations that occur in deferred operation processing context. Update xfs_defer_finish() to set and reset ->t_agfl_dfops across the processing sequence. This means that any AGFL block frees due to allocation events result in the addition of new EFIs to the dfops rather than being processed immediately. xfs_defer_finish() rolls the transaction at least once more to process the frees of the AGFL blocks back to the allocation btrees and returns once the AGFL is rectified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: defer agfl block frees when dfops is available	Brian Foster	2018-05-09	6	-7/+129
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AGFL fixup code executes before every block allocation/free and rectifies the AGFL based on the current, dynamic allocation requirements of the fs. The AGFL must hold a minimum number of blocks to satisfy a worst case split of the free space btrees caused by the impending allocation operation. The AGFL is also updated to maintain the implicit requirement for a minimum number of free slots to satisfy a worst case join of the free space btrees. Since the AGFL caches individual blocks, AGFL reduction typically involves multiple, single block frees. We've had reports of transaction overrun problems during certain workloads that boil down to AGFL reduction freeing multiple blocks and consuming more space in the log than was reserved for the transaction. Since the objective of freeing AGFL blocks is to ensure free AGFL free slots are available for the upcoming allocation, one way to address this problem is to release surplus blocks from the AGFL immediately but defer the free of those blocks (similar to how file-mapped blocks are unmapped from the file in one transaction and freed via a deferred operation) until the transaction is rolled. This turns AGFL reduction into an operation with predictable log reservation consumption. Add the capability to defer AGFL block frees when a deferred ops list is available to the AGFL fixup code. Add a dfops pointer to the transaction to carry dfops through various contexts to the allocator context. Deferring AGFL frees is conditional behavior based on whether the transaction pointer is populated. The long term objective is to reuse the transaction pointer to clean up all unrelated callchains that pass dfops on the stack along with a transaction and in doing so, consistently defer AGFL blocks from the allocator. A bit of customization is required to handle deferred completion processing because AGFL blocks are accounted against a per-ag reservation pool and AGFL blocks are not inserted into the extent busy list when freed (they are inserted when used and released back to the AGFL). Reuse the majority of the existing deferred extent free infrastructure and customize it appropriately to handle AGFL blocks. Note that this patch only adds infrastructure. It does not change behavior because no callers have been updated to pass ->t_agfl_dfops into the allocation code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: create agfl block free helper function	Brian Foster	2018-05-09	2	-10/+29
\| \| \| \| \| \| \| \| \| \| \|	Refactor the AGFL block free code into a new helper such that it can be invoked from deferred context. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: print specific dqblk that failed verifiers	Eric Sandeen	2018-05-09	1	-19/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Rather than printing the top of the buffer that held a corrupted dqblk, restructure things to print out the specific one that failed by pushing the calls to the verifier_error function down into the verifier which iterates over the buffer and detects the error. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: add full xfs_dqblk verifier	Eric Sandeen	2018-05-09	4	-11/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Add an xfs_dqblk verifier so that it can check the uuid on V5 filesystems; it calls the existing xfs_dquot_verify verifier to validate the xfs_disk_dquot_t contained inside it. This lets us move the uuid verification out of the crc verifier, which makes little sense. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: pass full xfs_dqblk to repair during quotacheck	Eric Sandeen	2018-05-09	3	-14/+11
\| \| \| \| \| \| \| \| \| \| \|	It's a bit dicey to pass in the smaller xfs_disk_dquot and then cast it to something larger; pass in the full xfs_dqblk so we know the caller has sent us the right thing. Rename the function to xfs_dqblk_repair for clarity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: check type in quota verifier during quotacheck	Eric Sandeen	2018-05-09	1	-1/+3
\| \| \| \| \| \| \| \| \|	During quotacheck we send in the quota type, so verify that as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: remove unused flags arg from xfs_dquot_verify	Eric Sandeen	2018-05-09	5	-9/+7
\| \| \| \| \| \| \| \| \| \| \|	Long ago the flags argument was used to determine whether to issue warnings about corruptions, but that's done elsewhere now and the flag is unused here, so remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: clean up locking in xfs_file_iomap_begin	Dave Chinner	2018-05-09	1	-37/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than checking what kind of locking is needed in a helper function and then jumping through hoops to do the locking in line, move the locking to the helper function that does all the checks and rename it to xfs_ilock_for_iomap(). This also allows us to hoist all the nonblocking checks up into the locking helper, further simplifier the code flow in xfs_file_iomap_begin() and making it easier to understand. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: simplify xfs_file_iomap_begin() logic	Dave Chinner	2018-05-09	1	-36/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current logic that determines whether allocation should be done has grown somewhat spaghetti like with the addition of IOMAP_NOWAIT functionality. Separate out each of the different cases into single, obvious checks to get rid most of the nested IOMAP_NOWAIT checks in the allocation logic. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	iomap: Use FUA for pure data O_DSYNC DIO writes	Dave Chinner	2018-05-09	1	-5/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we are doing direct IO writes with datasync semantics, we often have to flush metadata changes along with the data write. However, if we are overwriting existing data, there are no metadata changes that we need to flush. In this case, optimising the IO by using FUA write makes sense. We know from the IOMAP_F_DIRTY flag as to whether a specific inode requires a metadata flush - this is currently used by DAX to ensure extent modification as stable in page fault operations. For direct IO writes, we can use it to determine if we need to flush metadata or not once the data is on disk. Hence if we have been returned a mapped extent that is not new and the IO mapping is not dirty, then we can use a FUA write to provide datasync semantics. This allows us to short-cut the generic_write_sync() call in IO completion and hence avoid unnecessary operations. This makes pure direct IO data write behaviour identical to the way block devices use REQ_FUA to provide datasync semantics. On a FUA enabled device, a synchronous direct IO write workload (sequential 4k overwrites in 32MB file) had the following results: # xfs_io -fd -c "pwrite -V 1 -D 0 32m" /mnt/scratch/boo kernel time write()s write iops Write b/w ------ ---- -------- ---------- --------- (no dsync) 4s 2173/s 2173 8.5MB/s vanilla 22s 370/s 750 1.4MB/s patched 19s 420/s 420 1.6MB/s The patched code clearly doesn't send cache flushes anymore, but instead uses FUA (confirmed via blktrace), and performance improves a bit as a result. However, the benefits will be higher on workloads that mix O_DSYNC overwrites with other write IO as we won't be flushing the entire device cache on every DSYNC overwrite IO anymore. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	iomap: iomap_dio_rw() handles all sync writes	Dave Chinner	2018-05-09	2	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently iomap_dio_rw() only handles (data)sync write completions for AIO. This means we can't optimised non-AIO IO to minimise device flushes as we can't tell the caller whether a flush is required or not. To solve this problem and enable further optimisations, make iomap_dio_rw responsible for data sync behaviour for all IO, not just AIO. In doing so, the sync operation is now accounted as part of the DIO IO by inode_dio_end(), hence post-IO data stability updates will no long race against operations that serialise via inode_dio_wait() such as truncate or hole punch. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: move generic_write_sync calls inwards	Dave Chinner	2018-05-09	1	-15/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To prepare for iomap iinfrastructure based DSYNC optimisations. While moving the code araound, move the XFS write bytes metric update for direct IO into xfs_dio_write_end_io callback so that we always capture the amount of data written via AIO+DIO. This fixes the problem where queued AIO+DIO writes are not accounted to this metric. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: don't retry xfs_buf_find on XBF_TRYLOCK failure	Dave Chinner	2018-05-09	1	-28/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When looking at an event trace recently, I noticed that non-blocking buffer lookup attempts would fail on cached locked buffers and then run the slow cache-miss path. This means we are doing an xfs_buf allocation, lookup and free unnecessarily every time we avoid blocking on a locked buffer. Fix this by changing _xfs_buf_find() to return an error status to the caller to indicate that we failed the lock attempt rather than just returning a NULL. This allows the higher level code to discriminate between a cache miss and an cache hit that we failed to lock. This also allows us to return a -EFSCORRUPTED state if we are asked to look up a block number outside the range of the filesystem in _xfs_buf_find(), which moves us one step closer to being able to handle such errors in a more graceful manner at the higher levels. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: make xfs_buf_incore out of line	Dave Chinner	2018-05-09	4	-23/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move xfs_buf_incore out of line and make it the only way to look up a buffer in the buffer cache from outside the buffer cache. Convert the external users of _xfs_buf_find() to xfs_buf_incore() and make _xfs_buf_find() static. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: actually rename xfs_incore -> xfs_buf_incore] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
*	xfs: trace ATTR flags in xattr tracepoints	Eric Sandeen	2018-05-09	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	This will trace i.e. the ATTR_SECURE/ATTR_CREATE/ATTR_REPLACE flags as well as the OP_FLAGS. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>