summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * constify d_lookup() argumentsAl Viro2013-02-231-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify __d_lookup() argumentsAl Viro2013-02-231-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * lookup_slow: get rid of name argumentAl Viro2013-02-231-4/+3Star
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * lookup_fast: get rid of name argumentAl Viro2013-02-231-5/+5
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * get rid of name and type arguments of walk_component()Al Viro2013-02-231-10/+8Star
| | | | | | | | | | | | ... always can be found in nameidata now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * link_path_walk(): move assignments to nd->last/nd->last_type upAl Viro2013-02-231-12/+10Star
| | | | | | | | | | | | ... and clean the main loop a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * vfs: remove d_path_with_unreachableJeff Layton2013-02-231-31/+0Star
| | | | | | | | | | | | | | The last caller was removed >2 years ago in commit 7b2a69ba7. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: Preserve error code in get_empty_filp(), part 2Anatol Pomozov2013-02-234-15/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocating a file structure in function get_empty_filp() might fail because of several reasons: - not enough memory for file structures - operation is not allowed - user is over its limit Currently the function returns NULL in all cases and we loose the exact reason of the error. All callers of get_empty_filp() assume that the function can fail with ENFILE only. Return error through pointer. Change all callers to preserve this error code. [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit (things remaining here deal with alloc_file()), removed pipe(2) behaviour change] Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * propagate error from get_empty_filp() to its callersAl Viro2013-02-233-30/+28Star
| | | | | | | | | | | | Based on parts from Anatol's patch (the rest is the next commit). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * new helper: file_inode(file)Al Viro2013-02-23158-375/+373Star
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * mount: consolidate permission checksAl Viro2013-02-231-33/+7Star
| | | | | | | | | | | | ... and ask for global CAP_SYS_ADMIN only for superblock-level remounts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2013-02-2631-1622/+2105
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Theodore Ts'o: "The one new feature added in this patch series is the ability to use the "punch hole" functionality for inodes that are not using extent maps. In the bug fix category, we fixed some races in the AIO and fstrim code, and some potential NULL pointer dereferences and memory leaks in error handling code paths. In the optimization category, we fixed a performance regression in the jbd2 layer introduced by commit d9b01934d56a ("jbd: fix fsync() tid wraparound bug", introduced in v3.0) which shows up in the AIM7 benchmark. We also further optimized jbd2 by minimize the amount of time that transaction handles are held active. This patch series also features some additional enhancement of the extent status tree, which is now used to cache extent information in a more efficient/compact form than what we use on-disk." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (65 commits) ext4: fix free clusters calculation in bigalloc filesystem ext4: no need to remove extent if len is 0 in ext4_es_remove_extent() ext4: fix xattr block allocation/release with bigalloc ext4: reclaim extents from extent status tree ext4: adjust some functions for reclaiming extents from extent status tree ext4: remove single extent cache ext4: lookup block mapping in extent status tree ext4: track all extent status in extent status tree ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag ext4: rename and improbe ext4_es_find_extent() ext4: add physical block and status member into extent status tree ext4: refine extent status tree ext4: use ERR_PTR() abstraction for ext4_append() ext4: refactor code to read directory blocks into ext4_read_dirblock() ext4: add debugging context for warning in ext4_da_update_reserve_space() ext4: use KERN_WARNING for warning messages jbd2: use module parameters instead of debugfs for jbd_debug ext4: use module parameters instead of debugfs for mballoc_debug ext4: start handle at the last possible moment when creating inodes ext4: fix the number of credits needed for acl ops with inline data ...
| * | ext4: fix free clusters calculation in bigalloc filesystemLukas Czerner2013-02-221-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_has_free_clusters() should tell us whether there is enough free clusters to allocate, however number of free clusters in the file system is converted to blocks using EXT4_C2B() which is not only wrong use of the macro (we should have used EXT4_NUM_B2C) but it's also completely wrong concept since everything else is in cluster units. Moreover when calculating number of root clusters we should be using macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be off by one. However r_blocks_count should always be a multiple of the cluster ratio so doing a plain bit shift should be enough here. We avoid using EXT4_B2C() because it's confusing. As a result of the first problem number of free clusters is much bigger than it should have been and ext4_has_free_clusters() would return 1 even if there is really not enough free clusters available. Fix this by removing the EXT4_C2B() conversion of free clusters and using bit shift when calculating number of root clusters. This bug affects number of xfstests tests covering file system ENOSPC situation handling. With this patch most of the ENOSPC problems with bigalloc file system disappear, especially the errors caused by delayed allocation not having enough space when the actual allocation is finally requested. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: no need to remove extent if len is 0 in ext4_es_remove_extent()Eryu Guan2013-02-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | len is 0 means no extent needs to be removed, so return immediately. Otherwise it could trigger the following BUG_ON() in ext4_es_remove_extent() end = lblk + len - 1; BUG_ON(end < lblk); This could be reproduced by a simple truncate(1) command by an unprivileged user truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile The same is true for __es_insert_extent(). Patched kernel passed xfstests regression test. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
| * | ext4: fix xattr block allocation/release with bigallocLukas Czerner2013-02-181-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when new xattr block is created or released we we would call dquot_free_block() or dquot_alloc_block() respectively, among the else decrementing or incrementing the number of blocks assigned to the inode by one block. This however does not work for bigalloc file system because we always allocate/free the whole cluster so we have to count with that in dquot_free_block() and dquot_alloc_block() as well. Use the clusters-to-blocks conversion EXT4_C2B() when passing number of blocks to the dquot_alloc/free functions to fix the problem. The problem has been revealed by xfstests #117 (and possibly others). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
| * | ext4: reclaim extents from extent status treeZheng Liu2013-02-184-0/+175
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although extent status is loaded on-demand, we also need to reclaim extent from the tree when we are under a heavy memory pressure because in some cases fragmented extent tree causes status tree costs too much memory. Here we maintain a lru list in super_block. When the extent status of an inode is accessed and changed, this inode will be move to the tail of the list. The inode will be dropped from this list when it is cleared. In the inode, a counter is added to count the number of cached objects in extent status tree. Here only written/unwritten/hole extent is counted because delayed extent doesn't be reclaimed due to fiemap, bigalloc and seek_data/hole need it. The counter will be increased as a new extent is allocated, and it will be decreased as a extent is freed. In this commit we use normal shrinker framework to reclaim memory from the status tree. ext4_es_reclaim_extents_count() traverses the lru list to count the number of reclaimable extents. ext4_es_shrink() tries to reclaim written/unwritten/hole extents from extent status tree. The inode that has been shrunk is moved to the tail of lru list. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: adjust some functions for reclaiming extents from extent status treeZheng Liu2013-02-181-26/+24Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | This commit changes some interfaces in extent status tree because we need to use inode to count the cached objects in a extent status tree. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: remove single extent cacheZheng Liu2013-02-185-163/+38Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | Single extent cache could be removed because we have extent status tree as a extent cache, and it would be better. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: lookup block mapping in extent status treeZheng Liu2013-02-185-3/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: track all extent status in extent status treeZheng Liu2013-02-183-30/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By recording the phycisal block and status, extent status tree is able to track the status of every extents. When we call _map_blocks functions to lookup an extent or create a new written/unwritten/delayed extent, this extent will be inserted into extent status tree. We don't load all extents from disk in alloc_inode() because it costs too much memory, and if a file is opened and closed frequently it will takes too much time to load all extent information. So currently when we create/lookup an extent, this extent will be inserted into extent status tree. Hence, the extent status tree may not comprehensively contain all of the extents found in the file. Here a condition we need to take care is that an extent might contains unwritten and delayed status simultaneously because an extent is delayed allocated and could be allocated by fallocate. At this time we need to keep delayed status because later we need to update delayed reservation space using it. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flagZheng Liu2013-02-182-10/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag because in later commit ext4_map_blocks needs to use this flag to determine the extent status. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: rename and improbe ext4_es_find_extent()Zheng Liu2013-02-184-31/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent and improve this function. First, we split input and output parameter. Second, this function never return the first block of the next delayed extent after 'es'. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
| * | ext4: add physical block and status member into extent status treeZheng Liu2013-02-183-14/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds two members in extent_status structure to let it record physical block and extent status. Here es_pblk is used to record both of them because physical block only has 48 bits. So extent status could be stashed into it so that we can save some memory. Now written, unwritten, delayed and hole are defined as status. Due to new member is added into extent status tree, all interfaces need to be adjusted. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: refine extent status treeZheng Liu2013-02-184-162/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refines the extent status tree code. 1) A prefix 'es_' is added to to the extent status tree structure members. 2) Refactored es_remove_extent() so that __es_remove_extent() can be used by es_insert_extent() to remove the old extent entry(-ies) before inserting a new one. 3) Rename extent_status_end() to ext4_es_end() 4) ext4_es_can_be_merged() is define to check whether two extents can be merged or not. 5) Update and clarified comments. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: use ERR_PTR() abstraction for ext4_append()Theodore Ts'o2013-02-152-34/+33Star
| | | | | | | | | | | | | | | | | | | | | Use ERR_PTR()/IS_ERR() abstraction instead of passing in a separate pointer to an integer for the error code, as a code cleanup. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: refactor code to read directory blocks into ext4_read_dirblock()Theodore Ts'o2013-02-151-172/+119Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to read in directory blocks and verify their metadata checksums was replicated in ten different places across fs/ext4/namei.c, and the code was buggy in subtle ways in a number of those replicated sites. In some cases, ext4_error() was called with a training newline. In others, in particularly in empty_dir(), it was possible to call ext4_dirent_csum_verify() on an index block, which would trigger false warnings requesting the system adminsitrator to run e2fsck. By refactoring the code, we make the code more readable, as well as shrinking the compiled object file by over 700 bytes and 50 lines of code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: add debugging context for warning in ext4_da_update_reserve_space()Theodore Ts'o2013-02-141-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print some additional debugging context to hopefully help to debug a warning which is getting triggered by xfstests #74. Also remove extraneous newlines from when printk's were converted to ext4_warning() and ext4_msg(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: use KERN_WARNING for warning messagesTheodore Ts'o2013-02-142-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Some messages printed related to a WARN_ON(1) were printed using KERN_NOTICE. Use KERN_WARNING or ext4_warning() instead so that context related to the WARN_ON() is printed at the same printk warning level (and log files, etc.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: use module parameters instead of debugfs for jbd_debugTheodore Ts'o2013-02-091-42/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple reasons to move away from debugfs. First of all, we are only using it for a single parameter, and it is much more complicated to set up (some 30 lines of code compared to 3), and one more thing that might fail while loading the jbd2 module. Secondly, as a module paramter it can be specified as a boot option if jbd2 is built into the kernel, or as a parameter when the module is loaded, and it can also be manipulated dynamically under /sys/module/jbd2/parameters/jbd2_debug. So it is more flexible. Ultimately we want to move away from using jbd_debug() towards tracepoints, but for now this is still a useful simplification of the code base. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: use module parameters instead of debugfs for mballoc_debugTheodore Ts'o2013-02-092-40/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple reasons to move away from debugfs. First of all, we are only using it for a single parameter, and it is much more complicated to set up (some 30 lines of code compared to 3), and one more thing that might fail while loading the ext4 module. Secondly, as a module paramter it can be specified as a boot option if ext4 is built into the kernel, or as a parameter when the module is loaded, and it can also be manipulated dynamically under /sys/module/ext4/parameters/mballoc_debug. So it is more flexible. Ultimately we want to move away from using mb_debug() towards tracepoints, but for now this is still a useful simplification of the code base. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: start handle at the last possible moment when creating inodesTheodore Ts'o2013-02-093-55/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle until the inode has been succesfully allocated. In order to do this, we need to start the handle in the ext4_new_inode(). So create a new variant of this function, ext4_new_inode_start_handle(), so the handle can be created at the last possible minute, before we need to modify the inode allocation bitmap block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: fix the number of credits needed for acl ops with inline dataTheodore Ts'o2013-02-095-78/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Operations which modify extended attributes may need extra journal credits if inline data is used, since there is a chance that some extended attributes may need to get pushed to an external attribute block. Changes to reflect this was made in xattr.c, but they were missed in fs/ext4/acl.c. To fix this, abstract the calculation of the number of credits needed for xattr operations to an inline function defined in ext4_jbd2.h, and use it in acl.c and xattr.c. Also move the function declarations used in inline.c from xattr.h (where they are non-obviously hidden, and caused problems since ext4_jbd2.h needs to use the function ext4_has_inline_data), and move them to ext4.h. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Tao Ma <boyu.mt@taobao.com> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir()Theodore Ts'o2013-02-092-8/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ext4_unlink() and ext4_rmdir() don't actually release the blocks associated with the file/directory. This gets done in a separate jbd2 handle called via ext4_evict_inode(). Thus, we don't need to reserve lots of journal credits for the truncate. Note that using too many journal credits is non-optimal because it can leading to the journal transmit getting closed too early, before it is strictly necessary. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: fix the number of credits needed for ext4_ext_migrate()Theodore Ts'o2013-02-091-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | The migration ioctl creates a temporary inode. Since this inode is never linked to a directory, we don't need to reserve journal credits required for modifying the directory. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: start handle at the last possible moment in ext4_rmdir()Theodore Ts'o2013-02-091-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Don't start the jbd2 transaction handle until after the directory entry has been found, to minimize the amount of time that a handle is held active. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: start handle at the last possible moment in ext4_unlink()Theodore Ts'o2013-02-091-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Don't start the jbd2 transaction handle until after the directory entry has been found, to minimize the amount of time that a handle is held active. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: grab page before starting transaction handle in write_begin()Theodore Ts'o2013-02-091-43/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The grab_cache_page_write_begin() function can potentially sleep for a long time, since it may need to do memory allocation which can block if the system is under significant memory pressure, and because it may be blocked on page writeback. If it does take a long time to grab the page, it's better that we not hold an active jbd2 handle. So grab a handle on the page first, and _then_ start the transaction handle. This commit fixes the following long transaction handle hold time: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | ext4: pass context information to jbd2__journal_start()Theodore Ts'o2013-02-0916-69/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we can better understand what bits of ext4 are responsible for long-running jbd2 handles, use jbd2__journal_start() so we can pass context information for logging purposes. The recommended way for finding the longer-running handles is: T=/sys/kernel/debug/tracing EVENT=$T/events/jbd2/jbd2_handle_stats echo "interval > 5" > $EVENT/filter echo 1 > $EVENT/enable ./run-my-fs-benchmark cat $T/trace > /tmp/problem-handles This will list handles that were active for longer than 20ms. Having longer-running handles is bad, because a commit started at the wrong time could stall for those 20+ milliseconds, which could delay an fsync() or an O_SYNC operation. Here is an example line from the trace file describing a handle which lived on for 311 jiffies, or over 1.2 seconds: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: move the jbd2 wrapper functions out of super.cTheodore Ts'o2013-02-083-105/+105
| | | | | | | | | | | | | | | | | | | | | | | | Move the jbd2 wrapper functions which start and stop handles out of super.c, where they don't really logically belong, and into ext4_jbd2.c. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: add tracepoints which provide per-handle statistics Theodore Ts'o2013-02-081-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handles which stay open a long time are problematic when it comes time to close down a transaction so it can be committed. These tracepoints will help us determine which ones are the problematic ones, and to validate whether changes makes things better or worse. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: track request delay statisticsTheodore Ts'o2013-02-073-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: optimize mballoc for large allocationsTheodore Ts'o2013-02-041-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ext4 block allocator only maintains buddy bitmaps for chunks which are less than or equal to one quarter of a block group. That is, for a file aystem with a 1k blocksize, and where the number of blocks in a block group is 8192 blocks, the largest chunk size tracked by buddy bitmaps is 2048 blocks. For a file system with a 4k blocksize, and where the number of blocks in a block group is 32768 blocks, the largest chunk size tracked by buddy bitmaps is 8192 blocks. To work around this code, mballoc.c before this commit would truncate allocation requests to the number of blocks in a block group minus 10. Why 10? Aside from being a completely arbitrary number, it avoids block allocation to be a power of two larger than 25% of the block group. If you try to explicitly fallocate 50% of the block group size, this will demonstrate the problem; the block allocation code will scan the all of the blocks in the file system with cr==0 (since the request is for a natural power of two), but then completely fail for all blocks groups, since the buddy bitmaps don't track chunk sizes of 50% of the block group. To fix this, in these we use ext4_mb_complex_scan_group() instead of ext4_mb_simple_scan_group(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger@dilger.ca>
| * | ext4: check incompatible mount options while mounting ext2/3Theodore Ts'o2013-02-031-12/+35
| | | | | | | | | | | | | | | | | | | | | Check for incompatible mount options when using the ext4 file system driver to mount ext2 or ext3 file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: print error when argument of inode_readahead_blk is invalidJan Kara2013-02-031-5/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | If argument of inode_readahead_blk is too big, we just bail out without printing any error. Fix this since it could confuse users. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: make mount option parsing loop more logicalJan Kara2013-02-031-117/+113Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The loop looking for correct mount option entry is more logical if it is written rewritten as an empty loop looking for correct option entry and then code handling the option. It also saves one level of indentation for a lot of code so we can join a couple of split lines. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: move several mount options to standard handling loopJan Kara2013-02-031-31/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several mount option (resuid, resgid, journal_dev, journal_ioprio) are currently handled before we enter standard option handling loop. I don't see a reason for this so move them to normal handling loop to make things more regular. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: reduce one "if" comparison in ext4_dirhash()Cong Ding2013-02-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | It is unnecessary to check i<4 after the loop; just do it before the break. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: fix race in ext4_mb_add_n_trim()Niu Yawei2013-02-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when changing the lg_prealloc_list. Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: fix smatch warning in move_extent.c's mext_replace_branches()Akria Fujita2013-02-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2147b1a6a48 resulted in a new smatch warning: > fs/ext4/move_extent.c:693 mext_replace_branches() > warn: variable dereferenced before check 'dext' (see line 683) Fix this by adding a check to make sure dext is non-NULL before we derefrence it. Signed-off-by: Akria Fujita <a-fujita@rs.jp.nec.com> [ modified by tytso to make sure an ext4_error is called ] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: use WARN in ext4_alloc_blocksJulia Lawall2013-02-021-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>