summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/free-space-cache.c
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: only map pages if we know we need them when reading the space cacheJosef Bacik2011-11-111-7/+10
| | | | | | | | | | | | People have been running into a warning when loading space cache because the page is already mapped when trying to read in a bitmap. The way we read in entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry maps the entries if it needs to, and if it hits the end of the page it simply unmaps the page. That way we can unconditionally unmap the io_ctl before reading in the bitmap and we should stop hitting these warnings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use the global reserve when truncating the free space cache inodeJosef Bacik2011-11-061-5/+17
| | | | | | | | | We no longer use the orphan block rsv for holding the reservation for truncating the inode, so instead use the global block rsv and check to make sure it has enough space for us to truncate the space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAINChris Mason2011-11-061-1/+3
| | | | | | | | btrfs_remove_free_space needs to make sure to set ret back to a valid return value after setting it to EAGAIN, otherwise we return it to the callers. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: don't flush the cache inode before writing itJosef Bacik2011-10-191-4/+0Star
| | | | | | | | | | I noticed we had a little bit of latency when writing out the space cache inodes. It's because we flush it before we write anything in case we have dirty pages already there. This doesn't matter though since we're just going to overwrite the space, and there really shouldn't be any dirty pages anyway. This makes some of my tests run a little bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: seperate out btrfs_block_rsv_check out into 2 different functionsJosef Bacik2011-10-191-1/+1
| | | | | | | | | | | | | | Currently btrfs_block_rsv_check does 2 things, it will either refill a block reserve like in the truncate or refill case, or it will check to see if there is enough space in the global reserve and possibly refill it. However because of overcommit we could be well overcommitting ourselves just to try and refill the global reserve, when really we should just be committing the transaction. So breack this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill will try to reserve more metadata if it can and btrfs_block_rsv_check will not, it will only tell you if the factor of the total space is still reserved. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: inline checksums into the disk free space cacheJosef Bacik2011-10-191-57/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | Yeah yeah I know this is how we used to do it and then I changed it, but damnit I'm changing it back. The fact is that writing out checksums will modify metadata, which could cause us to dirty a block group we've already written out, so we have to truncate it and all of it's checksums and re-write it which will write new checksums which could dirty a blockg roup that has already been written and you see where I'm going with this? This can cause unmount or really anything that depends on a transaction to commit to take it's sweet damned time to happen. So go back to the way it was, only this time we're specifically setting NODATACOW because we can't go through the COW pathway anyway and we're doing our own built-in cow'ing by truncating the free space cache. The other new thing is once we truncate the old cache and preallocate the new space, we don't need to do that song and dance at all for the rest of the transaction, we can just overwrite the existing space with the new cache if the block group changes for whatever reason, and the NODATACOW will let us do this fine. So keep track of which transaction we last cleared our cache in and if we cleared it in this transaction just say we're all setup and carry on. This survives xfstests and stress.sh. The inode cache will continue to use the normal csum infrastructure since it only gets written once and there will be no more modifications to the fs tree in a transaction commit. Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: check the return value of filemap_write_and_wait in the space cacheJosef Bacik2011-10-191-2/+5
| | | | | | | | We need to check the return value of filemap_write_and_wait in the space cache writeout code. Also don't set the inode's generation until we're sure nothing else is going to fail. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: add a io_ctl struct and helpers for dealing with the space cacheJosef Bacik2011-10-191-318/+375
| | | | | | | | | | | | | | | In writing and reading the space cache we have one big loop that keeps track of which page we are on and then a bunch of sizeable loops underneath this big loop to try and read/write out properly. Especially in the write case this makes things hugely complicated and hard to follow, and makes our error checking and recovery equally as complex. So add a io_ctl struct with a bunch of helpers to keep track of the pages we have, where we are, if we have enough space etc. This unifies how we deal with the pages we're writing and keeps all the messy tracking internal. This allows us to kill the big loops in both the read and write case and makes reviewing and chaning the write and read paths much simpler. I've run xfstests and stress.sh on this code and it survives. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: don't skip writing out a empty block groups cacheJosef Bacik2011-10-191-4/+6
| | | | | | | | | I noticed a slight bug where we will not bother writing out the block group cache's space cache if it's space tree is empty. Since it could have a cluster or pinned extents that need to be written out this is just not a valid test. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: use the inode's mapping mask for allocating pagesJosef Bacik2011-10-191-2/+4
| | | | | | | | | | Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: stop passing a trans handle all around the reservation codeJosef Bacik2011-10-191-3/+1Star
| | | | | | | | | | The only thing that we need to have a trans handle for is in reserve_metadata_bytes and thats to know how much flushing we can do. So instead of passing it around, just check current->journal_info for a trans_handle so we know if we can commit a transaction to try and free up space or not. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: handle enospc accounting for free space inodesJosef Bacik2011-10-191-15/+29
| | | | | | | | | Since free space inodes now use normal checksumming we need to make sure to account for their metadata use. So reserve metadata space, and then if we fail to write out the metadata we can just release it, otherwise it will be freed up when the io completes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: put the block group cache after we commit the superJosef Bacik2011-10-191-1/+1
| | | | | | | | | | In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes back. So instead don't evict the cache until after we do our last commit, this will make things a little less crappy and makes a future enospc change work properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: fix call to btrfs_search_slot in free space cacheJosef Bacik2011-10-191-1/+1
| | | | | | | | We are setting ins_len to 1 even tho we are just modifying an item that should be there already. This may cause the search stuff to split nodes on the way down needelessly. Set this to 0 since we aren't inserting anything. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_checkJosef Bacik2011-10-191-1/+1
| | | | | | | | | | | | | If you run xfstest 224 it you will get lots of messages about not being able to delete inodes and that they will be cleaned up next mount. This is because btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to flush, so if there was not enough space, it simply failed. But in truncate and evict case we could easily flush space to try and get enough space to do our work, so make btrfs_block_rsv_check take a flush argument to pass down to reserve_metadata_bytes. Now xfstests 224 runs fine without all those complaints. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: ratelimit the generation printk for the free space cacheJosef Bacik2011-10-191-5/+7
| | | | | | | | | | | A user reported getting spammed when moving to 3.0 by this message. Since we switched to the normal checksumming infrastructure all old free space caches will be wrong and need to be regenerated so people are likely to see this message a lot, so ratelimit it so it doesn't fill up their logs and freak them out. Thanks, Reported-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: use bytes_may_use for all ENOSPC reservationsJosef Bacik2011-10-191-6/+23
| | | | | | | | | | | | | We have been using bytes_reserved for metadata reservations, which is wrong since we use that to keep track of outstanding reservations from the allocator. This resulted in us doing a lot of silly things to make sure we don't allocate a bunch of metadata chunks since we never had a real view of how much space was actually in use by metadata. This passes Arne's enospc test and xfstests as well as my own enospc tests. Hopefully this will get us moving in the right direction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: reset to appropriate block rsv after orphan operationsLiu Bo2011-09-111-0/+4
| | | | | | | | | | | | | | While truncating free space cache, we forget to change trans->block_rsv back to the original one, but leave it with the orphan_block_rsv, and then with option inode_cache enable, it leads to countless warnings of btrfs_alloc_free_block and btrfs_orphan_commit_root: WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]() ... WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]() Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix wrong free space informationMiao Xie2011-08-171-5/+11
| | | | | | | | | | | | | Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use find_or_create_page instead of grab_cache_pageJosef Bacik2011-07-271-2/+2
| | | | | | | | grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: use the normal checksumming infrastructure for free space cacheJosef Bacik2011-07-111-110/+59Star
| | | | | | | | | | We used to store the checksums of the space cache directly in the space cache, however that doesn't work out too well if we have more space than we can fit the checksums into the first page. So instead use the normal checksumming infrastructure. There were problems with doing this originally but those problems don't exist now so this works out fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: make sure to update total_bitmaps when freeing cache V3Josef Bacik2011-06-251-3/+6
| | | | | | | | | | A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make sure to recheck for bitmaps in clustersChris Mason2011-06-101-4/+5
| | | | | | | | | | | | | | | | | Josef recently changed the free extent cache to look in the block group cluster for any bitmaps before trying to add a new bitmap for the same offset. This avoids BUG_ON()s due covering duplicate ranges. But it didn't go quite far enough. A given free range might span between one or more bitmaps or free space entries. The code has looping to cover this, but it doesn't check for clustered bitmaps every time. This shuffles our gotos to check for a bitmap in the cluster for every new bitmap entry we try to add. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix duplicate checking logicJosef Bacik2011-06-081-3/+3
| | | | | | | | | When merging my code into the integration test the second check for duplicate entries got screwed up. This patch fixes it by dropping ret2 and just using ret for the return value, and checking if we got an error before adding the bitmap to the local list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: fix bitmap regressionJosef Bacik2011-06-081-19/+69
| | | | | | | | | | | | | | | | | | In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: noinline the cluster searching functionsJosef Bacik2011-06-081-8/+10
| | | | | | | | When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: cache bitmaps when searching for a clusterJosef Bacik2011-06-081-5/+49
| | | | | | | | | | | If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* btrfs: add helper for fs_info->closingDavid Sterba2011-06-041-6/+4Star
| | | | | | | wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs: add mount -o inode_cacheChris Mason2011-06-041-0/+6
| | | | | | | | This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make sure we don't overflow the free space cache crc pageChris Mason2011-06-041-8/+19
| | | | | | | | | The free space cache uses only one page for crcs right now, which means we can't have a cache file bigger than the crcs we can fit in the first page. This adds a check to enforce that restriction. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Merge branch 'for-chris' ofChris Mason2011-05-281-3/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: check for duplicate entries in the free space cacheJosef Bacik2011-05-231-3/+24
| | | | | | | | | | | | | | If there are duplicate entries in the free space cache, discard the entire cache and load it the old fashioned way. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into ↵Chris Mason2011-05-221-13/+10Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | btrfs: remove all unused functionsDavid Sterba2011-05-061-15/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove static and global declarations and/or definitions. Reduces size of btrfs.ko by ~3.4kB. text data bss dec hex filename 402081 7464 200 409745 64091 btrfs.ko.base 398620 7144 200 405964 631cc btrfs.ko.remove-all Signed-off-by: David Sterba <dsterba@suse.cz>
| * | btrfs: drop unused parameter from btrfs_release_pathDavid Sterba2011-05-021-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | parameter tree root it's not used since commit 5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer interface for large blocksizes") Signed-off-by: David Sterba <dsterba@suse.cz>
| * | btrfs: make functions static when possibleDavid Sterba2011-05-021-2/+2
| | | | | | | | | | | | Signed-off-by: David Sterba <dsterba@suse.cz>
| * | btrfs: remove nested duplicate variable declarationsDavid Sterba2011-05-021-3/+0Star
| |/ | | | | | | Signed-off-by: David Sterba <dsterba@suse.cz>
* | Merge branch 'ino-alloc' of git://repo.or.cz/linux-btrfs-devel into ↵Chris Mason2011-05-211-380/+596
|\ \ | |/ |/| | | | | | | | | | | | | inode_numbers Conflicts: fs/btrfs/free-space-cache.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Support reading/writing on disk free ino cacheLi Zefan2011-04-251-1/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is similar to block group caching. We dedicate a special inode in fs tree to save free ino cache. At the very first time we create/delete a file after mount, the free ino cache will be loaded from disk into memory. When the fs tree is commited, the cache will be written back to disk. To keep compatibility, we check the root generation against the generation of the special inode when loading the cache, so the loading will fail if the btrfs filesystem was mounted in an older kernel before. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| * Btrfs: Make the code for reading/writing free space cache genericLi Zefan2011-04-251-154/+204
| | | | | | | | | | | | | | | | | | Extract out block group specific code from lookup_free_space_inode(), create_free_space_inode(), load_free_space_cache() and btrfs_write_out_cache(), so the code can be used to read/write free ino cache. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| * Btrfs: Cache free inode numbers in memoryLi Zefan2011-04-251-20/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently btrfs stores the highest objectid of the fs tree, and it always returns (highest+1) inode number when we create a file, so inode numbers won't be reclaimed when we delete files, so we'll run out of inode numbers as we keep create/delete files in 32bits machines. This fixes it, and it works similarly to how we cache free space in block cgroups. We start a kernel thread to read the file tree. By scanning inode items, we know which chunks of inode numbers are free, and we cache them in an rb-tree. Because we are searching the commit root, we have to carefully handle the cross-transaction case. The rb-tree is a hybrid extent+bitmap tree, so if we have too many small chunks of inode numbers, we'll use bitmaps. Initially we allow 16K ram of extents, and a bitmap will be used if we exceed this threshold. The extents threshold is adjusted in runtime. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| * Btrfs: Make free space cache code genericLi Zefan2011-04-251-199/+231
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we can re-use the code to cache free inode numbers. The change is quite straightforward. Two new structures are introduced. - struct btrfs_free_space_ctl We move those variables that are used for caching free space from struct btrfs_block_group_cache to this new struct. - struct btrfs_free_space_op We do block group specific work (e.g. calculation of extents threshold) through functions registered in this struct. And then we can remove references to struct btrfs_block_group_cache. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| * Btrfs: Use bitmap_set/clear()Li Zefan2011-04-251-12/+8Star
| | | | | | | | | | | | No functional change. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| * Btrfs: Remove unused btrfs_block_group_free_space()Li Zefan2011-04-251-15/+0Star
| | | | | | | | | | | | We've already recorded the value in block_group->frees_space. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
* | Btrfs: free bitmaps properly when evicting the cacheJosef Bacik2011-04-261-4/+7
| | | | | | | | | | | | | | | | | | | | | | If our space cache is wrong, we do the right thing and free up everything that we loaded, however we don't reset the total_bitmaps counter or the thresholds or anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap() if it's a bitmap, this will keep us from panicing when we check to make sure we don't have too many bitmaps. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Btrfs: Free free_space item properly in btrfs_trim_block_group()Li Zefan2011-04-261-1/+1
|/ | | | | | | | Since commit dc89e9824464e91fa0b06267864ceabe3186fd8b, we've changed to use a specific slab for alocation of free_space items. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix free space cache leakChris Mason2011-04-181-1/+1
| | | | | | | | | | The free space caching code was recently reworked to cache all the pages it needed instead of using find_get_page everywhere. One loop was missed though, so it ended up leaking pages. This fixes it to use our page array instead of find_get_page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: deal with the case that we run out of space in the cacheJosef Bacik2011-04-081-62/+55Star
| | | | | | | | | | | Currently we don't handle running out of space in the cache, so to fix this we keep track of how far in the cache we are. Then we only dirty the pages if we successfully modify all of them, otherwise if we have an error or run out of space we can just drop them and not worry about the vm writing them out. Thanks, Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: fix free space cache when there are pinned extents and clusters V2Josef Bacik2011-04-051-4/+78
| | | | | | | | | | | | | | | | | | | | | I noticed a huge problem with the free space cache that was presenting as an early ENOSPC. Turns out when writing the free space cache out I forgot to take into account pinned extents and more importantly clusters. This would result in us leaking free space everytime we unmounted the filesystem and remounted it. I fix this by making sure to check and see if the current block group has a cluster and writing out any entries that are in the cluster to the cache, as well as writing any pinned extents we currently have to the cache since those will be available for us to use the next time the fs mounts. This patch also adds a check to the end of load_free_space_cache to make sure we got the right amount of free space cache, and if not make sure to clear the cache and re-cache the old fashioned way. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: clear __GFP_FS flag in the space cache inodeMiao Xie2011-04-051-0/+2
| | | | | | | | | | | the object id of the space cache inode's key is allocated from the relative root, just like the regular file. So we can't identify space cache inode by checking the object id of the inode's key, and we have to clear __GFP_FS flag at the time we look up the space cache inode. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>