summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ext4: teach ext4_mb_init_cache() to skip uptodate buddy cachesAmir Goldstein2011-05-101-8/+25
| | | | | | | | | | | | | After online resize which adds new groups, some of the groups in a buddy page may be initialized and uptodate, while other (new ones) may be uninitialized. The indication for init of new block groups is when ext4_mb_init_cache() is called with an uptodate buddy page. In this case, initialized groups on that buddy page must be skipped when initializing the buddy cache. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: synchronize ext4_mb_init_group() with buddy page lockAmir Goldstein2011-05-101-113/+62Star
| | | | | | | | | | | | | | | The old routines ext4_mb_[get|put]_buddy_cache_lock(), which used to take grp->alloc_sem for all groups on the buddy page have been replaced with the routines ext4_mb_[get|put]_buddy_page_lock(). The new routines take both buddy and bitmap page locks to protect against concurrent init of groups on the same buddy page. The GROUP_NEED_INIT flag is tested again under page lock to check if the group was initialized by another caller. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: implement ext4_add_groupblocks() by freeing blocksAmir Goldstein2011-05-101-17/+18
| | | | | | | | | | | | The old imlementation used to take grp->alloc_sem and set the GROUP_NEED_INIT flag, so that the buddy cache would be reloaded. The new implementation updates the buddy cache by freeing the added blocks and making them available for use, so there is no need to reload the buddy cache and there is no need to take grp->alloc_sem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove unneeded ext4_journal_get_undo_accessTheodore Ts'o2011-05-093-27/+4Star
| | | | | | | | | | | | | | | | The block allocation code used to use jbd2_journal_get_undo_access as a way to make changes that wouldn't show up until the commit took place. The new multi-block allocation code has a its own way of preventing newly freed blocks from getting reused until the commit takes place (it avoids updating the buddy bitmaps until the commit is done), so we don't need to use jbd2_journal_get_undo_access(), which has extra overhead compared to jbd2_journal_get_write_access(). There was one last vestigal use of ext4_journal_get_undo_access() in ext4_add_groupblocks(); change it to use ext4_journal_get_write_access() and then remove the ext4_journal_get_undo_access() support. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: move ext4_add_groupblocks() to mballoc.cAmir Goldstein2011-05-093-126/+126
| | | | | | | | In preparation for the next patch, the function ext4_add_groupblocks() is moved to mballoc.c, where it could use some static functions. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove redundant #ifdef in super.cAmerigo Wang2011-05-091-2/+0Star
| | | | | | | | There is already an #ifdef CONFIG_QUOTA some lines above, so this one is totally useless. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove redundant check for first_not_zeroed in ext4_register_li_requestTao Ma2011-05-091-8/+1Star
| | | | | | | | | | | | We have checked first_not_zeroed == ngroups already above, so remove this redundant check. sbi->s_li_request = NULL above is also removed since it is NULL already. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: use s_inodes_per_block directly in __ext4_get_inode_locTao Ma2011-05-091-1/+1
| | | | | | | | | | In __ext4_get_inode_loc, we calculate inodes_per_block every time by EXT4_BLOCK_SIZE(sb) / EXT4_INODE_SIZE(sb). AFAICS, this function is a hot path for ext4, so we'd better use s_inodes_per_block directly instead of calculating every time. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.cTao Ma2011-05-091-1/+1
| | | | | | | | | | | | | | | | We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG for the new mballoc debug, but there isn't any EXT4_DEBUG. As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use EXT4FS_DEBUG in fsync.c. [ It doesn't really matter; although I'm including this commit for consistency's sake. The whole point of the #ifdef's is to disable the debugging code. In general you're not going to want to enable all of the code protected by EXT4FS_DEBUG at the same time. -- Ted ] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: only print the debugging information for tid wraparound onceTheodore Ts'o2011-05-091-4/+5
| | | | | | | If we somehow wrap, we don't want to keep printing the warning message over and over again. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Fix forever sleeping process in do_get_write_access()Jan Kara2011-05-091-2/+7
| | | | | | | | | | | | | | In do_get_write_access() we wait on BH_Unshadow bit for buffer to get from shadow state. The waking code in journal_commit_transaction() has a bug because it does not issue a memory barrier after the buffer is moved from the shadow state and before wake_up_bit() is called. Thus a waitqueue check can happen before the buffer is actually moved from the shadow state and waiting process may never be woken. Fix the problem by issuing proper barrier. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: reimplement convert and split_unwrittenYongqiang Yang2011-05-031-408/+72Star
| | | | | | | | | Reimplement ext4_ext_convert_to_initialized() and ext4_split_unwritten_extents() using ext4_split_extent() Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
* ext4: add ext4_split_extent_at() and ext4_split_extent()Yongqiang Yang2011-05-031-0/+187
| | | | | | | | | | Add two functions: ext4_split_extent_at(), which splits an extent into two extents at given logical block, and ext4_split_extent() which splits an extent into three extents. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
* ext4: add a function merging extents right and leftYongqiang Yang2011-05-031-29/+36
| | | | | | | | | | | | | 1) Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right(). 2) Add a new function ext4_ext_try_to_merge() which tries to merge an extent both left and right. 3) Use the new function in ext4_ext_convert_unwritten_endio() and ext4_ext_insert_extent(). Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
* ext4: fix deadlock in ext4_symlink() in ENOSPC conditionsJan Kara2011-05-031-11/+55
| | | | | | | | | | | | | | | | | | ext4_symlink() cannot call __page_symlink() with transaction open. __page_symlink() calls ext4_write_begin() which can wait for transaction commit if we are running out of space thus causing a deadlock. Also error recovery in ext4_truncate_failed_write() does not count with the transaction being already started (although I'm not aware of any particular deadlock here). Fix the problem by stopping a transaction before calling __page_symlink() (we have to be careful and put inode to orphan list so that it gets deleted in case of crash) and starting another one after __page_symlink() returns for addition of symlink into a directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Fix fs corruption when make_indexed_dir() failsJan Kara2011-05-031-2/+12
| | | | | | | | | | | | | When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This lead to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: set extents flag when migrating file to use extentsTheodore Ts'o2011-05-031-1/+1
| | | | | | | | | Fix a typo that was introduced in commit 07a038245b (in 2.6.36) which caused the extents flag not to be set at the conclusion of converting an inode to use extents. Reported-by: Peter Uchno <peter.uchno@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix fsync() tid wraparound bugTheodore Ts'o2011-05-021-3/+13
| | | | | | | | | | | | | | | | | | | | | | If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd2/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that we is valid for us to wait until it is commited: namely, the currently running transaction (if it exists). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove dead code in ext4_has_free_blocks()Shaohua Li2011-05-021-5/+0Star
| | | | | | | percpu_counter_sum_positive() never returns a negative value. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: ignore errors when issuing discardsTheodore Ts'o2011-04-301-25/+9Star
| | | | | | | | | | | | | | This is an effective revert of commit a30eec2a8: "ext4: stop issuing discards if not supported by device". The problem is that there are some devices that may return errors in response to a discard request some times but not others. (One example would be a hybrid dm device which concatenates an SSD and an HDD device). By this logic, I also removed the error checking from ext4's FITRIM code; so that an error from a discard will not stop the FITRIM from trying to trim the rest of the file system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: don't set PageUptodate in ext4_end_bio()Curt Wohlgemuth2011-04-301-28/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the bio completion routine, we should not be setting PageUptodate at all -- it's set at sys_write() time, and is unaffected by success/failure of the write to disk. This can cause a page corruption bug when the file system's block size is less than the architecture's VM page size. if we have only written a single block -- we might end up setting the page's PageUptodate flag, indicating that page is completely read into memory, which may not be true. This could cause subsequent reads to get bad data. This commit also takes the opportunity to clean up error handling in ext4_end_bio(), and remove some extraneous code: - fixes ext4_end_bio() to set AS_EIO in the page->mapping->flags on error, which was left out by mistake. This is needed so that fsync() will return an error if there was an I/O error. - remove the clear_buffer_dirty() call on unmapped buffers for each page. - consolidate page/buffer error handling in a single section. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Jim Meyering <jim@meyering.net> Reported-by: Hugh Dickins <hughd@google.com> Cc: Mingming Cao <cmm@us.ibm.com>
* ext4: check for ext[23] file system features when mounting as ext[23]Theodore Ts'o2011-04-182-9/+80
| | | | | | | | | | | | | | | Provide better emulation for ext[23] mode by enforcing that the file system does not have any unsupported file system features as defined by ext[23] when emulating the ext[23] file system driver when CONFIG_EXT4_USE_FOR_EXT23 is defined. This causes the file system type information in /proc/mounts to be correct for the automatically mounted root file system. This also means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda contains an ext3 or ext4 file system, just as one would expect if the original ext2 file system driver were in use. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: release page cache in ext4_mb_load_buddy error pathYang Ruirui2011-04-171-0/+2
| | | | | | | | Add missing page_cache_release in the error path of ext4_mb_load_buddy Signed-off-by: Yang Ruirui <ruirui.r.yang@tieto.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-04-1217-507/+531
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: use proper interfaces for on-stack plugging xfs: fix xfs_debug warnings xfs: fix variable set but not used warnings xfs: convert log tail checking to a warning xfs: catch bad block numbers freeing extents. xfs: push the AIL from memory reclaim and periodic sync xfs: clean up code layout in xfs_trans_ail.c xfs: convert the xfsaild threads to a workqueue xfs: introduce background inode reclaim work xfs: convert ENOSPC inode flushing to use new syncd workqueue xfs: introduce a xfssyncd workqueue xfs: fix extent format buffer allocation size xfs: fix unreferenced var error in xfs_buf.c Also, applied patch from Tony Luck that fixes ia64: xfs_destroy_workqueues() should not be tagged with__exit in the branch before merging.
| * xfs_destroy_workqueues() should not be tagged with__exitLuck, Tony2011-04-121-1/+1
| | | | | | | | | | | | | | | | | | | | ia64 throws away .exit sections for the built-in CONFIG case, so routines that are used in other circumstances should not be tagged as __exit. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * xfs: use proper interfaces for on-stack pluggingChristoph Hellwig2011-04-081-11/+9Star
| | | | | | | | | | | | | | | | | | | | Add proper blk_start_plug/blk_finish_plug pairs for the two places where we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and xfs_buf_iowait, given that context switches already flush the per-process plugging lists. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: fix xfs_debug warningsChristoph Hellwig2011-04-082-29/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no effect in xfs_debug: fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles': fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect The reason for that is that the various new xfs message functions have a return value which is never used, and in case of the non-debug build xfs_debug the macro evaluates to a plain 0 which produces the above warnings. This can be fixed by turning xfs_debug into an inline function instead of a macro, but in addition to that I've also changed all the message helpers to return void as we never use their return values. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: fix variable set but not used warningsChristoph Hellwig2011-04-085-18/+0Star
| | | | | | | | | | | | | | | | GCC 4.6 now warnings about variables set but not used. Fix the trivially fixable warnings of this sort. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * xfs: convert log tail checking to a warningDave Chinner2011-04-082-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the Power platform, the log tail debug checks fire excessively causing the system to panic early in testing. The debug checks are known to be racy, though on x86_64 there is no evidence that they trigger at all. We want to keep the checks active on debug systems to alert us to problems with log space accounting, but we need to reduce the impact of a racy check on testing on the Power platform. As a result, convert the ASSERT conditions to warnings, and allow them to fire only once per filesystem mount. This will prevent false positives from interfering with testing, whilst still providing us with the indication that they may be a problem with log space accounting should that occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: catch bad block numbers freeing extents.Dave Chinner2011-04-081-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | A fuzzed filesystem crashed a kernel when freeing an extent with a block number beyond the end of the filesystem. Convert all the debug asserts in xfs_free_extent() to active checks so that we catch bad extents and return that the filesytsem is corrupted rather than crashing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: push the AIL from memory reclaim and periodic syncDave Chinner2011-04-084-9/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are short on memory, we want to expedite the cleaning of dirty objects. Hence when we run short on memory, we need to kick the AIL flushing into action to clean as many dirty objects as quickly as possible. To implement this, sample the lsn of the log item at the head of the AIL and use that as the push target for the AIL flush. Further, we keep items in the AIL that are dirty that are not tracked any other way, so we can get objects sitting in the AIL that don't get written back until the AIL is pushed. Hence to get the filesystem to the idle state, we might need to push the AIL to flush out any remaining dirty objects sitting in the AIL. This requires the same push mechanism as the reclaim push. This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to match the new xfs_ail_max_lsn() function introduced in this patch. Similarly for xfs_trans_ail_push -> xfs_ail_push. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: clean up code layout in xfs_trans_ail.cDave Chinner2011-04-081-136/+118Star
| | | | | | | | | | | | | | | | | | | | This patch rearranges the location of functions in xfs_trans_ail.c to remove the need for forward declarations of those functions in preparation for adding new functions without the need for forward declarations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: convert the xfsaild threads to a workqueueDave Chinner2011-04-083-151/+124Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the xfssyncd, the per-filesystem xfsaild threads can be converted to a global workqueue and run periodically by delayed works. This makes sense for the AIL pushing because it uses variable timeouts depending on the work that needs to be done. By removing the xfsaild, we simplify the AIL pushing code and remove the need to spread the code to implement the threading and pushing across multiple files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: introduce background inode reclaim workDave Chinner2011-04-082-3/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background inode reclaim needs to run more frequently that the XFS syncd work is run as 30s is too long between optimal reclaim runs. Add a new periodic work item to the xfs syncd workqueue to run a fast, non-blocking inode reclaim scan. Background inode reclaim is kicked by the act of marking inodes for reclaim. When an AG is first marked as having reclaimable inodes, the background reclaim work is kicked. It will continue to run periodically untill it detects that there are no more reclaimable inodes. It will be kicked again when the first inode is queued for reclaim. To ensure shrinker based inode reclaim throttles to the inode cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the background inode reclaim so that when we are low on memory we are trying to reclaim inodes as efficiently as possible. This kick shoul d not be necessary, but it will protect against failures to kick the background reclaim when inodes are first dirtied. To provide the rate throttling, make the shrinker pass do synchronous inode reclaim so that it blocks on inodes under IO. This means that the shrinker will reclaim inodes rather than just skipping over them, but it does not adversely affect the rate of reclaim because most dirty inodes are already under IO due to the background reclaim work the shrinker kicked. These two modifications solve one of the two OOM killer invocations Chris Mason reported recently when running a stress testing script. The particular workload trigger for the OOM killer invocation is where there are more threads than CPUs all unlinking files in an extremely memory constrained environment. Unlike other solutions, this one does not have a performance impact on performance when memory is not constrained or the number of concurrent threads operating is <= to the number of CPUs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: convert ENOSPC inode flushing to use new syncd workqueueDave Chinner2011-04-083-102/+36Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. Thi can result in hundreds of queued flushes, most of which simply burn CPU scanned and do no real work. This simply slows down allocation at ENOSPC. We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing. On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions. This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: introduce a xfssyncd workqueueDave Chinner2011-04-084-61/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the work xfssyncd does is background functionality. There is no need for a thread per filesystem to do this work - it can al be managed by a global workqueue now they manage concurrency effectively. Introduce a new gglobal xfssyncd workqueue, and convert the periodic work to use this new functionality. To do this, use a delayed work construct to schedule the next running of the periodic sync work for the filesystem. When the sync work is complete, queue a new delayed work for the next running of the sync work. For laptop mode, we wait on completion for the sync works, so ensure that the sync work queuing interface can flush and wait for work to complete to enable the work queue infrastructure to replace the current sequence number and wakeup that is used. Because the sync work does non-trivial amounts of work, mark the new work queue as CPU intensive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: fix extent format buffer allocation sizeDave Chinner2011-04-081-27/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When formatting an inode item, we have to allocate a separate buffer to hold extents when there are delayed allocation extents on the inode and it is in extent format. The allocation size is derived from the in-core data fork representation, which accounts for delayed allocation extents, while the on-disk representation does not contain any delalloc extents. As a result of this mismatch, the allocated buffer can be far larger than needed to hold the real extent list which, due to the fact the inode is in extent format, is limited to the size of the literal area of the inode. However, we can have thousands of delalloc extents, resulting in an allocation size orders of magnitude larger than is needed to hold all the real extents. Fix this by limiting the size of the buffer being allocated to the size of the literal area of the inodes in the filesystem (i.e. the maximum size an inode fork can grow to). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
| * xfs: fix unreferenced var error in xfs_buf.cDave Chinner2011-03-311-2/+0Star
| | | | | | | | | | Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | Merge branch 'for_linus' of ↵Linus Torvalds2011-04-126-35/+102
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix data corruption regression by reverting commit 6de9843dab3f ext4: Allow indirect-block file to grow the file size to max file size ext4: allow an active handle to be started when freezing ext4: sync the directory inode in ext4_sync_parent() ext4: init timer earlier to avoid a kernel panic in __save_error_info jbd2: fix potential memory leak on transaction commit ext4: fix a double free in ext4_register_li_request ext4: fix credits computing for indirect mapped files ext4: remove unnecessary [cm]time update of quota file jbd2: move bdget out of critical section
| * | ext4: fix data corruption regression by reverting commit 6de9843dab3fTheodore Ts'o2011-04-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it caused a data corruption regression with BitTorrent downloads. Thanks to Damien for discovering and bisecting to find the problem commit. https://bugzilla.kernel.org/show_bug.cgi?id=32972 Reported-by: Damien Grassart <damien@grassart.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Allow indirect-block file to grow the file size to max file sizeKazuya Mio2011-04-111-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can create 4402345721856 byte file with indirect block mapping. However, if we grow an indirect-block file to the size with ftruncate(), we can see an ext4 warning. The following patch fixes this problem. How to reproduce: # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856 0+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s # tail -n 1 /var/log/messages Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12 Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: allow an active handle to be started when freezingYongqiang Yang2011-04-111-11/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_journal_start_sb() should not prevent an active handle from being started due to s_frozen. Otherwise, deadlock is easy to happen, below is a situation. ================================================ freeze | truncate ================================================ | ext4_ext_truncate() freeze_super() | starts a handle sets s_frozen | | ext4_ext_truncate() | holds i_data_sem ext4_freeze() | waits for updates | | ext4_free_blocks() | calls dquot_free_block() | | dquot_free_blocks() | calls ext4_dirty_inode() | | ext4_dirty_inode() | trys to start an active | handle | | block due to s_frozen ================================================ Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Amir Goldstein <amir73il@users.sf.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
| * | ext4: sync the directory inode in ext4_sync_parent()Curt Wohlgemuth2011-04-111-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4 has taken the stance that, in the absence of a journal, when an fsync/fdatasync of an inode is done, the parent directory should be sync'ed if this inode entry is new. ext4_sync_parent(), which implements this, does indeed sync the dirent pages for parent directories, but it does not sync the directory *inode*. This patch fixes this. Also now return error status from ext4_sync_parent(). I tested this using a power fail test, which panics a machine running a file server getting requests from a client. Without this patch, on about every other test run, the server is missing many, many files that had been synced. With this patch, on > 6 runs, I see zero files being lost. Google-Bug-Id: 4179519 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: init timer earlier to avoid a kernel panic in __save_error_infoTao Ma2011-04-061-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During mount, when we fail to open journal inode or root inode, the __save_error_info will mod_timer. But actually s_err_report isn't initialized yet and the kernel oops. The detailed information can be found https://bugzilla.kernel.org/show_bug.cgi?id=32082. The best way is to check whether the timer s_err_report is initialized or not. But it seems that in include/linux/timer.h, we can't find a good function to check the status of this timer, so this patch just move the initializtion of s_err_report earlier so that we can avoid the kernel panic. The corresponding del_timer is also added in the error path. Reported-by: Sami Liedes <sliedes@cc.hut.fi> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: fix potential memory leak on transaction commitZhang Huan2011-04-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is potential memory leak of journal head in function jbd2_journal_commit_transaction. The problem is that JBD2 will not reclaim the journal head of commit record if error occurs or journal is abotred. I use the following script to reproduce this issue, on a RHEL6 system. I found it very easy to reproduce with async commit enabled. mount /dev/sdb /mnt -o journal_checksum,journal_async_commit touch /mnt/xxx echo offline > /sys/block/sdb/device/state sync umount /mnt rmmod ext4 rmmod jbd2 Removal of the jbd2 module will make slab complaining that "cache `jbd2_journal_head': can't free all objects". Signed-off-by: Zhang Huan <zhhuan@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: fix a double free in ext4_register_li_requestTao Ma2011-04-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ext4_register_li_request, we malloc a ext4_li_request and inserts it into ext4_li_info->li_request_list. In case of any error later, we free it in the end. But if we have some error in ext4_run_lazyinit_thread, the whole li_request_list will be dropped and freed in it. So we will double free this ext4_li_request. This patch just sets elr to NULL after it is inserted to the list so that the latter kfree won't double free it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | ext4: fix credits computing for indirect mapped filesYongqiang Yang2011-04-041-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When writing a contiguous set of blocks, two indirect blocks could be needed depending on how the blocks are aligned, so we need to increase the number of credits needed by one. [ Also fixed a another bug which could further underestimate the number of journal credits needed by 1; the code was using integer division instead of DIV_ROUND_UP() -- tytso] Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | ext4: remove unnecessary [cm]time update of quota fileJan Kara2011-04-042-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not necessary to update [cm]time of quota file on each quota file write and it wastes journal space and IO throughput with inode writes. So just remove the updating from ext4_quota_write() and only update times when quotas are being turned off. Userspace cannot get anything reliable from quota files while they are used by the kernel anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: move bdget out of critical sectionZhu Yanhai2011-04-041-1/+2
| |/ | | | | | | | | | | | | | | | | bdget() should not be called when we hold spinlocks since it might sleep. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2011-04-122-4/+6
|\ \ | | | | | | | | | | | | | | | * 'for-2.6.39' of git://linux-nfs.org/~bfields/linux: nfsd4: fix oops on lock failure nfsd: fix auth_domain reference leak on nlm operations