summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | Btrfs: don't BUG_ON() in btrfs_num_copiesJosef Bacik2013-05-061-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user sent me a btrfs-image that was panicing because of some corruption. This is because we pass in a bogus value to btrfs_num_copies, and it panics. Instead just return 1. We only call btrfs_num_copies to see if there are other copies to try and read for things, so if we just return 1 it will make the callers exit out with an appropriate error value. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: don't call readahead hook until we have read the entire ebJosef Bacik2013-05-061-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin Steigerwald reported a BUG_ON() where we were given a bogus bytenr to map. Turns out he is using > PAGESIZE leafsizes. The readahead stuff is called every time we do a completion, but we may not have finished reading in all the pages, so the bytenr we read off the node could be completely bogus. Fix this by only calling the readahead hook once all pages have been read in. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: deal with bad mappings in btrfs_map_blockJosef Bacik2013-05-061-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find a chunk for a particular block we were trying to map. This happened because the block was bogus. We shouldn't be BUG_ON()'ing in this case, just print a message and return an error. This came from reada_add_block and it appears to deal with an error fine so we should be good there. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: use REQ_META for all metadata IOJosef Bacik2013-05-061-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to tag metadata io with REQ_META to avoid priority inversion when using io throttling cqroups. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix possible infinite loop in slow cachingJosef Bacik2013-05-061-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I noticed there is an infinite loop in the slow caching code. If we return 1 when we hit the end of the tree, so we could end up caching the last block group the slow way and suddenly we're looping forever because we just keep re-searching and trying again. Fix this by only doing btrfs_next_leaf() if we don't need_resched(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix lockdep warningJosef Bacik2013-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The locking order for stuff is __sb_start_write ordered_mutex but with sync() we don't do __sb_start_write for some strange reason, which means that our iput in wait_ordered_extents could start a transaction which does the __sb_start_write while we're holding the ordered_mutex. Fix this by using delayed iput in sync. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: add all ioctl checks before user change for quota operationsWang Shilong2013-05-061-5/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all the quota configurations are loaded in memory, and we can have ioctl checks before operating in the disk. It is safe to do such things because qgroup_ioctl_lock is held outside. Without these extra checks firstly, it should be ok to do user change for quota operations. For example: if we want to add an existed qgroup, we will do: ->add_qgroup_item() ->add_qgroup_rb() add_qgroup_item() will return -EEXIST to us, however, qgroups are all in memory, why not check them in memory firstly. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix missing check about ulist_add() in qgroup.cWang Shilong2013-05-061-18/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ulist_add() may return -ENOMEM, fix missing check about return value. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: clear received_uuid field for new writable snapshotsStefan Behrens2013-05-061-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For created snapshots, the full root_item is copied from the source root and afterwards selectively modified. The current code forgets to clear the field received_uuid. The only problem is that it is confusing when you look at it with 'btrfs subv list', since for writable snapshots, the contents of the snapshot can be completely unrelated to the previously received snapshot. The receiver ignores such snapshots anyway because he also checks the field stransid in the root_item and that value used to be reset to zero for all created snapshots. This commit changes two things: - clear the received_uuid field for new writable snapshots. - don't clear the send/receive related information like the stransid for read-only snapshots (which makes them useable as a parent for the automatic selection of parents in the receive code). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: don't force pages under writeback to finish when abortingJosef Bacik2013-05-061-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave reported a BUG_ON() that happened in end_page_writeback() after an abort. This happened because we unconditionally call end_page_writeback() in the endio case, which is right. However when we abort the transaction we will call end_page_writeback() on any writeback pages we find, which is wrong. We need to lock the page and wait on page writeback to complete if it is. There is nothing unsafe about this since we are discarding the transaction anyway. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: remove unused variable in the iterate_extent_inodes()Wang Shilong2013-05-061-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: return error when we specify wrong start to defragLiu Bo2013-05-061-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need such a sanity check for wrong start when we defrag a file, otherwise, even with a wrong start that's larger than file size, we can end up changing not only inode's force compress flag but also FS's incompat flags. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix reada debug code compilationVincent2013-05-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following errors: fs/btrfs/reada.c: In function ‘btrfs_reada_wait’: fs/btrfs/reada.c:958:42: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’) fs/btrfs/reada.c:961:41: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’) Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: cleanup of function where btrfs_extend_item() is calledTsutomu Itoh2013-05-061-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Argument 'trans' became unnecessary from setup_inline_extent_backref() that called btrfs_extend_item(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: remove unused argument of btrfs_extend_item()Tsutomu Itoh2013-05-067-11/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Argument 'trans' is not used in btrfs_extend_item(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: cleanup of function where fixup_low_keys() is calledTsutomu Itoh2013-05-0610-51/+38Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If argument 'trans' is unnecessary in the function where fixup_low_keys() is called, 'trans' is deleted. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: remove unused argument of fixup_low_keys()Tsutomu Itoh2013-05-061-10/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Argument 'trans' is not used in fixup_low_keys(). So, remove it. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix confusing edquot happening caseWang Shilong2013-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Step to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> dd if=/dev/zero of=/<mnt>/data bs=1M count=10 sync btrfs quota enable <mnt> btrfs qgroup create 0/5 <mnt> btrfs qgroup limit 5M 0/5 <mnt> rm -f /<mnt>/data sync btrfs qgroup show <mnt> dd if=/dev/zero of=data bs=1M count=1 >From the perspective of users, qgroup's referenced or exclusive is negative,but user can not continue to write data! a workaround way is to cast u64 to s64 when doing qgroup reservation. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: do not continue if out of memory happensWang Shilong2013-05-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If out of memory happens, we should return -ENOMEM directly to the caller rather than continue the work. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | btrfs: fix minor typo in commentNathaniel Yazdani2013-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the comment describing the sync_writers field of the btrfs_inode struct, "fsyncing" was misspelled "fsycing." Signed-off-by: Nathaniel Yazdani <n1ght.4nd.d4y@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: cleanup to remove reduplicate code in transaction.cWang Shilong2013-05-061-12/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix unlock after free on rewinded tree blocksJan Schmidt2013-05-061-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When tree_mod_log_rewind decides to make a copy of the current tree buffer for its modifications, it subsequently freed the buffer before unlocking it. Obviously, those operations are required in reverse order. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix accessing the root pointer in tree mod log functionsJan Schmidt2013-05-061-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tree mod log functions were accessing root->node->... directly, without use of btrfs_root_node() or explicit rcu locking. This could lead to an extent buffer reference being leaked and another reference being freed too early when preemtion was enabled. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix tree mod log regression on root split operationsJan Schmidt2013-05-061-26/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d9abbf1c changed tree mod log locking around ROOT_REPLACE operations. When a tree root is split, however, we were logging removal of all elements from the root node before logging removal of half of the elements for the split operation. This leads to a BUG_ON when rewinding. This commit removes the erroneous logging of removal of all elements. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: use a lock to protect incompat/compat flag of the super blockMiao Xie2013-05-063-11/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following case will make the incompat/compat flag of the super block be recovered. Task1 |Task2 flags = btrfs_super_incompat_flags(); | |flags = btrfs_super_incompat_flags(); flags |= new_flag1; | |flags |= new_flag2; btrfs_set_super_incompat_flags(flags); | |btrfs_set_super_incompat_flags(flags); the new_flag1 is recovered. In order to avoid this problem, we introduce a lock named super_lock into the btrfs_fs_info structure. If we want to update incompat/compat flags of the super block, we must hold it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix unblocked autodefraggers when remountMiao Xie2013-05-061-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new mount option is set after parsing the remount arguments, so it is wrong that checking the autodefrag is close or not at btrfs_remount_prepare(). Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: add a rb_tree to improve performance of ulist searchWang Shilong2013-05-062-8/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Walking backref tree and btrfs quota rely on ulist very much. This patch tries to use rb_tree to speed up search time. The original code always checks whether an element exists before adding a new element, however it costs O(n). I try to add a rb_tree in the ulist,this is only used to speed up search. I also do some measurements with quota enabled. fsstress -p 4 -n 10000 Without this path: real 0m51.058s 2m4.745s 1m28.222s 1m5.137s user 0m0.035s 0m0.041s 0m0.105s 0m0.100s sys 0m12.009s 0m11.246s 0m10.901s 0m10.999s 0m11.287s With this path: real 0m55.295s 0m50.960s 1m2.214s 0m48.273s user 0m0.053s 0m0.095s 0m0.135s 0m0.107s sys 0m7.766s 0m6.013s 0m6.319s 0m6.030s 0m6.532s After applying the patch,the execute time is down by ~42%.(11.287s->6.532s) Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: allow omitting stream header and end-cmd for btrfs sendStefan Behrens2013-05-061-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two new flags are added to allow omitting the stream header and the end command for btrfs send streams. This is used in cases where you send multiple snapshots back-to-back in one stream. This used to be encoded like this (with 2 snapshots in this example): <stream header> + <sequence of commands> + <end cmd> + <stream header> + <sequence of commands> + <end cmd> + EOF The new format (if the two new flags are used) is this one: <stream header> + <sequence of commands> + <sequence of commands> + <end cmd> Note that the currently existing receivers treat <end cmd> only as an indication that a new <stream header> is following. This means, you can just skip the sequence <end cmd> <stream header> without loosing compatibility. As long as an EOF is following, the currently existing receivers handle the new format (if the two new flags are used) exactly as the old one. So what is the benefit of this change? The goal is to be able to use a single stream (one TCP connection) to multiplex a request/response handshake plus Btrfs send streams, all in the same stream. In this case you cannot evaluate an EOF condition as an end of the Btrfs send stream. You need something else, and the <end cmd> is just perfect for this purpose. The summary is: The format change is driven by the need to send several Btrfs send streams over a single TCP connections, with the ability for a repeated request/response handshake in the middle. And this format change does not break any existing tool, it is completely compatible. You could compare the old behaviour of the Btrfs send stream to the one of ftp where you need a seperate request/response channel and newly opened data transfer channels for each file, while the new behaviour is more like http using a single stream for everything. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: make __merge_refs() return type be voidWang Shilong2013-05-061-8/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __merge_refs() always return 0, it is unnecessary for the caller to check the return value. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: remove some BUG_ONs() when walking backref treeWang Shilong2013-05-061-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only error return value of __add_prelim_ref() is -ENOMEM, just return errors rather than trigger BUG_ON(). Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: use tree_root to avoid edquot when disabling quotaWang Shilong2013-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steps to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs sub create <mnt>/subv btrfs qgroup limit 10K <mnt>/subv btrfs quota disable <mnt>/subv It is wrong for qgroup to reserve when disabling quota, so just use tree_root to avoid edquot when disabling quota. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix a warning when updating qgroup limitWang Shilong2013-05-061-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Step to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs qgroup limit 0/1 <mnt> dmesg If the relative qgroup dosen't exist, flag 'BTRFS_QGROUP_STATUS_ FLAG_INCONSISTENT' will be set, and print the noise message. This is wrong, we can just move find_qgroup_rb() before update_qgroup_limit_item().this dosen't change the logic of the function. But it can avoid unnecessary noise message and wrong set of flag. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix missing check in the btrfs_qgroup_inherit()Wang Shilong2013-05-061-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original code forgot to check 'inherit', we should gurantee that all the qgroups in the struct 'inherit' exist. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix missing check before creating a qgroup relationWang Shilong2013-05-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Step to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs qgroup assign 0/1 1/1 <mnt> umount <mnt> btrfs-debug-tree <disk> | grep QGROUP If we want to add a qgroup relation, we should gurantee that 'src' and 'dst' exist, otherwise, such qgroup relation should not be allowed to create. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: remove some unnecessary spin_lock usagesWang Shilong2013-05-061-21/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use mutex lock to protect all the user change operations. So when we are calling find_qgroup_rb() to check whether qgroup exists, we don't have to hold spin_lock. Besides, when enabling/disabling quota, it must be single thread when operations come here. spin lock must be firstly used to clear quota_root when disabling quota, while enabling quota, spin lock must be used to complete the last assign work. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: introduce a mutex lock for btrfs quota operationsWang Shilong2013-05-063-25/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original code has one spin_lock 'qgroup_lock' to protect quota configurations in memory. If we want to add a BTRFS_QGROUP_INFO_KEY, it will be added to Btree firstly, and then update configurations in memory,however, a race condition may happen between these operations. For example: ->add_qgroup_info_item() ->add_qgroup_rb() For the above case, del_qgroup_info_item() may happen just before add_qgroup_rb(). What's worse, when we want to add a qgroup relation: ->add_qgroup_relation_item() ->add_qgroup_relations() We don't have any checks whether 'src' and 'dst' exist before add_qgroup_relation_item(), a race condition can also happen for the above case. To avoid race condition and have all the necessary checks, we introduce a mutex lock 'qgroup_ioctl_lock', and we make all the user change operations protected by the mutex lock. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: creating the subvolume qgroup automatically when enabling quotaWang Shilong2013-05-062-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Creating the subvolume/snapshots(including root subvolume) qgroup auotomatically when enabling quota. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | btrfs: abort unlink trans in missed error caseZach Brown2013-05-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __btrfs_unlink_inode() aborts its transaction when it sees errors after it removes the directory item. But it missed the case where btrfs_del_dir_entries_in_log() returns an error. If this happens then the unlink appears to fail but the items have been removed without updating the directory size. The directory then has leaked bytes in i_size and can never be removed. Adding the missing transaction abort at least makes this failure consistent with the other failure cases. I noticed this while reading the code after someone on irc reported having a directory with i_size but no entries. I tested it by forcing btrfs_del_dir_entries_in_log() to return -ENOMEM. Signed-off-by: Zach Brown <zab@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | btrfs: ignore device open failures in __btrfs_open_devicesEric Sandeen2013-05-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This: # mkfs.btrfs /dev/sdb{1,2} ; wipefs -a /dev/sdb1; mount /dev/sdb2 /mnt/test would lead to a blkdev open/close mismatch when the mount fails, and a permanently busy (opened O_EXCL) sdb2: # wipefs -a /dev/sdb2 wipefs: error: /dev/sdb2: probing initialization failed: Device or resource busy It's because btrfs_open_devices() may open some devices, fail on the last one, and return that failure stored in "ret." The mount then fails, but the caller then does not clean up the open devices. Chris assures me that: "btrfs_open_devices just means: go off and open every bdev you can from this uuid. It should return success if we opened any of them at all." So change the logic to ignore any open failures; just skip processing of that device. Later on it's decided whether we have enough devices to continue. Reported-by: Jan Safranek <jsafrane@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: improve the performance of the csums lookupMiao Xie2013-05-065-31/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is very likely that there are several blocks in bio, it is very inefficient if we get their csums one by one. This patch improves this problem by getting the csums in batch. According to the result of the following test, the execute time of __btrfs_lookup_bio_sums() is down by ~28%(300us -> 217us). # dd if=<mnt>/file of=/dev/null bs=1M count=1024 Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix bad extent loggingJosef Bacik2013-05-0610-324/+40Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: log ram bytes properlyJosef Bacik2013-05-064-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When logging changed extents I was logging ram_bytes as the current length, which isn't correct, it's supposed to be the ram bytes of the original extent. This is for compression where even if we split the extent we need to know the ram bytes so when we uncompress the extent we know how big it will be. This was still working out right with compression for some reason but I think we were getting lucky. It was definitely off for prealloc which is why I noticed it, btrfsck was complaining about it. With this patch btrfsck no longer complains after a log replay. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: don't wait on ordered extents if we have a trans openJosef Bacik2013-05-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave was hitting a lockdep warning because we're now properly taking the ordered operations mutex in the ordered wait stuff. This is because some cases we will have a trans handle when we are flushing delalloc space, but we can't wait on ordered extents because we could potentially deadlock, so fix this by not doing the wait if we have a trans handle. Thanks Reported-and-tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix error handling in make/read block groupJosef Bacik2013-05-061-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed that we will add a block group to the space info before we add it to the block group cache rb tree, so we could potentially allocate from the block group before it's able to be searched for. I don't think this is too much of a problem, the race window is microscopic, but just in case move the tree insertion to above the space info linking. This makes it easier to adjust the error handling as well, so we can remove a couple of BUG_ON(ret)'s and have real error handling setup for these scenarios. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix double free in the iterate_extent_inodes()Wang Shilong2013-05-061-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If btrfs_find_all_roots() fails, 'roots' has been freed or 'roots' fails to allocate. We don't need to free it outside btrfs_find_all_roots() again.Fix it. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: kill some BUG_ONs() in the find_parent_nodes()Wang Shilong2013-05-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reason that BUG_ON() happens in these places is just because of ENOMEM. We try ro return ENOMEM rather than trigger BUG_ON(), the caller will abort the transaction thus avoiding the kernel panic. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: compare relevant parts of delayed tree refsJosef Bacik2013-05-061-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported a panic while running a balance. What was happening was he was relocating a block, which added the reference to the relocation tree. Then relocation would walk through the relocation tree and drop that reference and free that block, and then it would walk down a snapshot which referenced the same block and add another ref to the block. The problem is this was all happening in the same transaction, so the parent block was free'ed up when we drop our reference which was immediately available for allocation, and then it was used _again_ to add a reference for the same block from a different snapshot. This resulted in something like this in the delayed ref tree add ref to 90234880, parent=2067398656, ref_root 1766, level 1 del ref to 90234880, parent=2067398656, ref_root 18446744073709551608, level 1 add ref to 90234880, parent=2067398656, ref_root 1767, level 1 as you can see the ref_root's don't match, because when we inc the ref we use the header owner, which is the original tree the block belonged to, instead of the data reloc tree. Then when we remove the extent we use the reloc tree objectid. But none of this matters, since it is a shared reference which means only the parent matters. When the delayed ref stuff runs it adds all the increments first, and then does all the drops, to make sure that we don't delete the ref if we net a positive ref count. But tree blocks aren't allowed to have multiple refs from the same block, so this panics when it tries to add the second ref. We need the add and the drop to cancel each other out in memory so we only do the final add. So to fix this we need to adjust how the delayed refs are added to the tree. Only the ref_root matters when it is a normal backref, and only the parent matters when it is a shared backref. So make our decision based on what ref type we have. This allows us to keep the ref_root in memory in case anybody wants to use it for something else, and it allows the delayed refs to be merged properly so we don't end up with this panic. With this patch the users image no longer panics on mount, and it has a clean fsck after a normal mount/umount cycle. Thanks, Cc: stable@vger.kernel.org Reported-by: Roman Mamedov <rm@romanrm.ru> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix infinite loop when we abort on mountJosef Bacik2013-05-062-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing my enospc log code I managed to abort a transaction during mount, which put me into an infinite loop. This is because of two things, first we don't reset trans_no_join if we abort during transaction commit, which will force anybody trying to start a transaction to just loop endlessly waiting for it to be set to 0. But this is still just a symptom, the second issue is we don't set the fs state to error during errors on mount. This is because we don't want to do the flip read only thing during mount, but we still really want to set the fs state to an error to keep us from even getting to the trans_no_join check. So fix both of these things, make sure to reset trans_no_join if we abort during a commit, and make sure we set the fs state to error no matter if we're mounting or not. This should keep us from getting into this infinite loop again. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: fix a warning when disabling quotaWang Shilong2013-05-061-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steps to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs sub create <mnt>/subv i=1 while [ $i -le 10000 ] do dd if=/dev/zero of=<mnt>/subv/data_$i bs=1K count=1 i=$(($i+1)) if [ $i -eq 500 ] then btrfs quota disable $mnt fi done dmesg Obviously, this warn_on() is unnecessary, and it will be easily triggered. Just remove it. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | | | Btrfs: pass NULL instead of 0Liu Bo2013-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_extent_bit()'s (u64 *failed_start) expects NULL not 0. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>