summaryrefslogtreecommitdiffstats
path: root/fs/nilfs2/inode.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-03-241-1/+0Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
| * block: remove per-queue pluggingJens Axboe2011-03-101-1/+0Star
| | | | | | | | | | | | | | | | Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nilfs2: get rid of nilfs_sb_info structureRyusuke Konishi2011-03-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This directly uses sb->s_fs_info to keep a nilfs filesystem object and fully removes the intermediate nilfs_sb_info structure. With this change, the hierarchy of on-memory structures of nilfs will be simplified as follows: Before: super_block -> nilfs_sb_info -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) After: super_block -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) The reason why we didn't design so from the beginning is because the initial shape also differed from the above. The early hierachy was composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs object. On the kernel 2.6.37, it was changed to the current shape in order to unify super block instances into one per device, and this cleanup became applicable as the result. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: move next generation counter into nilfs objectRyusuke Konishi2011-03-091-4/+4
| | | | | | | | | | | | | | Moves s_next_generation counter and a spinlock protecting it to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: move s_inode_lock and s_dirty_files into nilfs objectRyusuke Konishi2011-03-091-15/+15
| | | | | | | | | | | | | | Moves s_inode_lock spinlock and s_dirty_files list to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: record used amount of each checkpoint in checkpoint listRyusuke Konishi2011-03-081-0/+18
| | | | | | | | | | | | | | | | | | | | | | This records the number of used blocks per checkpoint in each checkpoint entry of cpfile. Even though userland tools can get the block count via nilfs_get_cpinfo ioctl, it was not updated by the nilfs2 kernel code. This fixes the issue and makes it available for userland tools to calculate used amount per checkpoint. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp>
* | nilfs2: tighten restrictions on inode flagsRyusuke Konishi2011-03-081-5/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Nilfs has few rectrictions on which flags may be set on which inodes like ext2/3/4 filesystems used to be. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. This introduces a flags masking function like those of extN and uses it during inode creation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is setRyusuke Konishi2011-03-081-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | At present, nilfs marks S_NOATIME flag on all inodes. This restricts nilfs_set_inode_flags function so that it marks S_NOATIME only if a given inode has an FS_NOATIME_FL flag. Although nilfs does not support atime yet, touch_atime() still safely returns on IS_NOATIME check since MS_NOATIME is always set on sb. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: use common file attribute macrosRyusuke Konishi2011-03-081-7/+7
| | | | | | | | | | | | | | | | Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL, NILFS_COMPR_FL, and so forth) with common inode flags, and removes the own flag declarations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: decrement inodes count only if raw inode was successfully deletedRyusuke Konishi2011-03-081-2/+4
|/ | | | | | | | This fixes the issue that inodes count will not add up after removal of raw inodes fails. Hence, this prevents possible under flow of the inodes count. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: unfold nilfs_dat_inode functionRyusuke Konishi2011-01-101-6/+5Star
| | | | | | | | | | | nilfs_dat_inode function was a wrapper to switch between normal dat inode and gcdat, a clone of the dat inode for garbage collection. This function got obsolete when the gcdat inode was removed, and now we can access the dat inode directly from a nilfs object. So, we will unfold the wrapper and remove it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: do not pass sbi to functions which can get it from inodeRyusuke Konishi2011-01-101-10/+8Star
| | | | | | | | | | | This removes argument for passing nilfs_sb_info structure from nilfs_set_file_dirty and nilfs_load_inode_block functions. We can get a pointer to the structure from inodes. [Stephen Rothwell <sfr@canb.auug.org.au>: fix conflict with commit b74c79e99389cd79b31fcc08f82c24e492e63c7e] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fiemap supportRyusuke Konishi2011-01-101-0/+131
| | | | | | | | | | | | | | This adds fiemap to nilfs. Two new functions, nilfs_fiemap and nilfs_find_uncommitted_extent are added. nilfs_fiemap() implements the fiemap inode operation, and nilfs_find_uncommitted_extent() helps to get a range of data blocks whose physical location has not been determined. nilfs_fiemap() collects extent information by looping through nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: mark buffer heads as delayed until the data is written to diskRyusuke Konishi2011-01-101-0/+1
| | | | | | | | | | | Nilfs does not allocate new blocks on disk until they are actually written to. To implement fiemap, we need to deal with such blocks. To allow successive fiemap patch to distinguish mapped but unallocated regions, this marks buffer heads of those new blocks as delayed and clears the flag after the blocks are written to disk. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: call nilfs_error inside bmap routinesRyusuke Konishi2011-01-101-14/+5Star
| | | | | | | | | | | Some functions using nilfs bmap routines can wrongly return invalid argument error (i.e. -EINVAL) that bmap returns as an internal code for btree corruption. This fixes the issue by catching and converting the internal EINVAL to EIO and calling nilfs_error function inside bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* fs: provide rcu-walk aware permission i_opsNick Piggin2011-01-071-3/+7
| | | | Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* nilfs2: see state of root dentry for mount check of snapshotsRyusuke Konishi2010-10-231-0/+10
| | | | | | | | | | | | | After applied the patch that unified sb instances, root dentry of snapshots can be left in dcache even after their trees are unmounted. The orphan root dentry/inode keeps a root object, and this causes false positive of nilfs_checkpoint_is_mounted function. This resolves the issue by having nilfs_checkpoint_is_mounted test whether the root dentry is busy or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: use iget for all metadata filesRyusuke Konishi2010-10-231-3/+10
| | | | | | | This makes use of iget5_locked to allocate or get inode for metadata files to stop using own inode allocator. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: allow nilfs_clear_inode to clear metadata file inodesRyusuke Konishi2010-10-231-0/+4
| | | | | | | | Allows clear inode function (nilfs_clear_inode) to handle metadata files that uses bitmap-based object alloctor. DAT and ifile correspond to this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: deny write access to inodes in snapshotsRyusuke Konishi2010-10-231-0/+11
| | | | | | | | | | | | | | | Snapshots of nilfs are read-only. After super block instances (sb) will be unified, nilfs will need to check write access by a way other than implicit test with IS_RDONLY(inode). This is because IS_RDONLY() refers to MS_RDONLY bit of inode->i_sb->s_flags and it will become inaccurate after the unification of sb. To prepare for the issue, this uses i_op->permission to deny write access to inodes in snapshots. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move inode count and block count into root objectRyusuke Konishi2010-10-231-2/+2
| | | | | | | This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: use root object to get ifileRyusuke Konishi2010-10-231-27/+21Star
| | | | | | | | This rewrites functions using ifile so that they get ifile from nilfs_root object, and will remove sbi->s_ifile. Some functions that don't know the root object are extended to receive it from caller. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: set pointer to root object in inodesRyusuke Konishi2010-10-231-5/+22
| | | | | | | | | | | | This puts a pointer to nilfs_root object in the private part of on-memory inode, and makes nilfs_iget function pick up the inode with the same root object. Non-root inodes inherit its nilfs_root object from parent inode. That of the root inode is allocated through nilfs_attach_checkpoint() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: remove own inode hash used for GCRyusuke Konishi2010-10-231-0/+22
| | | | | | | | This uses inode hash function that vfs provides instead of the own hash table for caching gc inodes. This finally removes the own inode hash from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: use iget5_locked to get inodeRyusuke Konishi2010-10-231-3/+34
| | | | | | | This uses iget5_locked instead of iget_locked so that gc cache can look up inodes with an inode number and an optional checkpoint number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirtyRyusuke Konishi2010-10-231-0/+5
| | | | | | | This allows sop->dirty_inode callback function (nilfs_dirty_inode) to handle metadata file inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* convert nilfs2 to ->evict_inode()Al Viro2010-08-091-4/+24
| | | | | | [folded build fix from sfr] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* remove inode_setattrChristoph Hellwig2010-08-091-6/+19
| | | | | | | | | | | | | | | | | | | Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of block_write_begin_newtruncChristoph Hellwig2010-08-091-4/+8
| | | | | | | | | | | Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* sort out blockdev_direct_IO variantsChristoph Hellwig2010-08-091-0/+13
| | | | | | | | | | | | Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nilfs2: replace inode uid,gid,mode initialization with helper functionDmitry Monakhov2010-05-221-10/+1Star
| | | | | | Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nilfs2: use huge_encode_dev/huge_decode_devRyusuke Konishi2010-05-101-2/+2
| | | | | | | | | This replaces uses of new_encode_dev/new_decode_dev with their 64-bit counterparts, huge_encode_dev/huge_decode_dev respectively. This is just for clarification and has no impact on the disk format. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirtyJiro SEKIBA2009-11-271-3/+3
| | | | | | | | Replace mark_inode_dirty() as nilfs_mark_inode_dirty() to reduce deep function calls. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: delete mark_inode_dirty in nilfs_new_inodeJiro SEKIBA2009-11-271-1/+0Star
| | | | | | | | | It is redundant to call mark_inode_dirty() in nilfs_new_inode() because all caller of nilfs_new_inode() will call mark_inode_dirty() after calling nilfs_new_inode() directly or indirectly in transaction. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: move out mark_inode_dirty calls from bmap routinesRyusuke Konishi2009-11-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called mark_inode_dirty() after they changed the number of data blocks. This moves these calls outside bmap outermost functions like nilfs_bmap_insert() or nilfs_bmap_truncate(). This will mitigate overhead for truncate or delete operation since they repeatedly remove set of blocks. Nearly 10 percent improvement was observed for removal of a large file: # dd if=/dev/zero of=/test/aaa bs=1M count=512 # time rm /test/aaa real 2.968s -> 2.705s Further optimization may be possible by eliminating these mark_inode_dirty() uses though I avoid mixing separate changes here. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: remove buffer locking in nilfs_mark_inode_dirtyRyusuke Konishi2009-11-201-3/+0Star
| | | | | | | | This lock is eliminable because inodes on the buffer can be updated independently. Although a log writer also fills in bmap data on the on-disk inodes, this update is exclusively done by a log writer lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: deleted inconsistent comment in nilfs_load_inode_block()Jiro SEKIBA2009-11-151-1/+0Star
| | | | | | | | The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix missing initialization of i_dir_start_lookup memberRyusuke Konishi2009-09-291-0/+1
| | | | | | | | | | | The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* const: mark remaining address_space_operations constAlexey Dobriyan2009-09-221-1/+1
| | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: fix ignored error code in __nilfs_read_inode()Ryusuke Konishi2009-09-141-1/+2
| | | | | | | | | | The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* switch nilfs2 to inode->i_aclAl Viro2009-06-241-8/+0Star
| | | | | | | Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nilfs2: support contiguous lookup of blocksRyusuke Konishi2009-06-101-7/+8
| | | | | | | | | | | | Although get_block() callback function can return extent of contiguous blocks with bh->b_size, nilfs_get_block() function did not support this feature. This adds contiguous lookup feature to the block mapping codes of nilfs, and allows the nilfs_get_blocks() function to return the extent information by applying the feature. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: enable sync_page methodRyusuke Konishi2009-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This adds a missing sync_page method which unplugs bio requests when waiting for page locks. This will improve read performance of nilfs. Here is a measurement result using dd command. Without this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s With this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* NILFS2: Pagecache usage optimization on NILFS2Hisashi Hifumi2009-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hi, I introduced "is_partially_uptodate" aops for NILFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. 1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil e-rw-ratio=1 run -2.6.30-rc5 Test execution summary: total time: 151.2907s total number of events: 200000 total time taken by event execution: 2409.8387 per-request statistics: min: 0.0000s avg: 0.0120s max: 0.9306s approx. 95 percentile: 0.0439s Threads fairness: events (avg/stddev): 12500.0000/238.52 execution time (avg/stddev): 150.6149/0.01 -2.6.30-rc5-patched Test execution summary: total time: 140.8828s total number of events: 200000 total time taken by event execution: 2240.8577 per-request statistics: min: 0.0000s avg: 0.0112s max: 0.8750s approx. 95 percentile: 0.0418s Threads fairness: events (avg/stddev): 12500.0000/218.43 execution time (avg/stddev): 140.0536/0.01 arch: ia64 pagesize: 16k Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: support nanosecond timestampRyusuke Konishi2009-04-071-7/+6Star
| | | | | | | | | | | | | | | | | | | | After a review of user's feedback for finding out other compatibility issues, I found nilfs improperly initializes timestamps in inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs didn't have nanosecond timestamps on disk. A few users gave us the report that the tar program sometimes failed to expand symbolic links on nilfs, and it turned out to be the cause. Instead of applying the above displacement, I've decided to support nanosecond timestamps on this occation. Fortunetaly, a needless 64-bit field was in the nilfs_inode struct, and I found it's available for this purpose without impact for the users. So, this will do the enhancement and resolve the tar problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: clean up sketch fileRyusuke Konishi2009-04-071-33/+2Star
| | | | | | | | | | | | | | The sketch file is a file to mark checkpoints with user data. It was experimentally introduced in the original implementation, and now obsolete. The file was handled differently with regular files; the file size got truncated when a checkpoint was created. This stops the special treatment and will treat it as a regular file. Most users are not affected because mkfs.nilfs2 no longer makes this file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: replace BUG_ON and BUG calls triggerable from ioctlRyusuke Konishi2009-04-071-14/+5Star
| | | | | | | | | | | | | | | Pekka Enberg advised me: > It would be nice if BUG(), BUG_ON(), and panic() calls would be > converted to proper error handling using WARN_ON() calls. The BUG() > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be > triggerable from user-space via the ioctl() system call. This will follow the comment and keep them to a minimum. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: avoid double error caused by nilfs_transaction_endRyusuke Konishi2009-04-071-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | Pekka Enberg pointed out that double error handlings found after nilfs_transaction_end() can be avoided by separating abort operation: OK, I don't understand this. The only way nilfs_transaction_end() can fail is if we have NILFS_TI_SYNC set and we fail to construct the segment. But why do we want to construct a segment if we don't commit? I guess what I'm asking is why don't we have a separate nilfs_transaction_abort() function that can't fail for the erroneous case to avoid this double error value tracking thing? This does the separation and renames nilfs_transaction_end() to nilfs_transaction_commit() for clarification. Since, some calls of these functions were used just for exclusion control against the segment constructor, they are replaced with semaphore operations. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: fix missed-sync issue for do_sync_mapping_range()Ryusuke Konishi2009-04-071-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | Chris Mason pointed out that there is a missed sync issue in nilfs_writepages(): On Wed, 17 Dec 2008 21:52:55 -0500, Chris Mason wrote: > It looks like nilfs_writepage ignores WB_SYNC_NONE, which is used by > do_sync_mapping_range(). where WB_SYNC_NONE in do_sync_mapping_range() was replaced with WB_SYNC_ALL by Nick's patch (commit: ee53a891f47444c53318b98dac947ede963db400). This fixes the problem by letting nilfs_writepages() write out the log of file data within the range if sync_mode is WB_SYNC_ALL. This involves removal of nilfs_file_aio_write() which was previously needed to ensure O_SYNC sync writes. Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>