path: root/fs
Commit message    Author    Age    Files    Lines
* jbd: Delay discarding buffers in journal_unmap_buffer (Jan Kara, 2010-03-05, 2 files changed, -17/+36)
  Delay discarding buffers in journal_unmap_buffer until we know that the "add to orphan" operation has definitely been committed. Otherwise the log space of the committing transaction may be freed and reused before the truncate gets committed, and updates may be lost if a crash happens. This patch is a backport of the JBD2 fix by dingdinghua <dingdinghua@nrchpc.ac.cn>. Signed-off-by: Jan Kara <jack@suse.cz>
* ext3: quota_write cross block boundary behaviour (Dmitry Monakhov, 2010-03-05, 1 file changed, -35/+34)
  We have always assumed that a dquot update results in changes to a single data block, but the ext3_quota_write() function may handle writes that cross a block boundary. If that ever happened it would result in an incorrect journal credits reservation and a later BUG_ON trigger. Since it never happens, the boundary-crossing loop is a no-op. To make things straight, remove the loop and assert the cross-boundary condition. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota (Christoph Hellwig, 2010-03-05, 1 file changed, -4/+0)
  We already do these checks in the generic code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: split out compat_sys_quotactl support from quota.c (Christoph Hellwig, 2010-03-05, 4 files changed, -117/+124)
  Instead of adding ifdefs just split it into a new file. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: split out netlink notification support from quota.c (Christoph Hellwig, 2010-03-05, 3 files changed, -93/+96)
  Instead of adding ifdefs just split it into a new file. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: remove invalid optimization from quota_sync_all (Christoph Hellwig, 2010-03-05, 1 file changed, -15/+0)
  Checking the "VFS" quota enabled and dirty bits from generic code means this code will never get called for other implementations, e.g. XFS and GFS2. Grabbing the reference on the superblock really isn't much overhead for a global Q_SYNC call, so just drop this optimization. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: move code from sync_quota_sb into vfs_quota_sync (Christoph Hellwig, 2010-03-05, 7 files changed, -54/+50)
  Currently sync_quota_sb does a lot of sync and truncate work that only applies to "VFS" style quotas and is actively harmful to sync performance in XFS. Move it into vfs_quota_sync and add a wait parameter to ->quota_sync to tell whether we need it or not. My audit of the GFS2 code says it's also not needed given the way GFS2 implements quotas, but I'd be happy if this can get a detailed review. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: clean up Q_XQUOTASYNC (Christoph Hellwig, 2010-03-05, 2 files changed, -19/+7)
  Currently Q_XQUOTASYNC calls into the quota_sync method, but XFS does something entirely different in it than the rest of the filesystems: xfs_quota, which calls Q_XQUOTASYNC, expects an asynchronous data writeout to flush delayed allocations, while the "VFS" quota support wants to flush changes to the quota file. So make Q_XQUOTASYNC call into the writeback code directly and make the quota_sync method optional, as XFS doesn't need it in the sense expected by the rest of the quota code. GFS2 was using limited XFS-style quota and has a quota_sync method fitting neither the style used by vfs_quota_sync nor xfs_fs_quota_sync. I left it in for now; per discussion with Steve it expects to be called from the sync path this way. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: simplify permission checking (Christoph Hellwig, 2010-03-05, 1 file changed, -61/+31)
  Stop having complicated different routines for checking permissions for XQM vs "VFS" quotas. Instead do the checks for having sb->s_qcop and a valid type directly in do_quotactl, and munge the *quotactl_valid functions into a check_quotactl_permission helper that only checks for permissions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: special case Q_SYNC without device name (Christoph Hellwig, 2010-03-05, 1 file changed, -13/+27)
  The Q_SYNC command can be called without the path to a device, in which case it iterates over all superblocks. Special case this variant directly in sys_quotactl so that the other code always gets a superblock and doesn't need to deal with this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: clean up checks for supported quota methods (Christoph Hellwig, 2010-03-05, 1 file changed, -84/+37)
  Move the checks for sb->s_qcop->foo next to the actual calls for them, same for sb_has_quota_active checks where applicable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: split do_quotactl (Christoph Hellwig, 2010-03-05, 1 file changed, -104/+146)
  Split out a helper for each non-trivial command from do_quotactl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* quota: Fix warning when a delayed write happens before quota is enabled (Jan Kara, 2010-03-05, 1 file changed, -4/+31)
  If a delayed-allocation write happens before quota is enabled, the kernel spits out a warning: WARNING: at fs/quota/dquot.c:988 dquot_claim_space+0x77/0x112() because the fact that the user has some delayed allocation is not recorded in the quota structure. Make dquot_initialize() update the amount of reserved space for the user if it sees that the inode has some space reserved. Also make sure that reserved quota space does not go negative and that we warn about the filesystem bug just once. Signed-off-by: Jan Kara <jack@suse.cz>
* quota: manage reserved space when quota is not active [v2] (Dmitry Monakhov, 2010-03-05, 1 file changed, -4/+6)
  Since we implemented the generic reserved space management interface, it is possible to account reserved space even when quota is not active (similar to i_blocks/i_bytes). Without this patch the following testcase results in massive complaints from the WARN_ON in dquot_claim_space():
    mount /dev/sdb /mnt -oquota
    dd if=/dev/zero of=/mnt/test bs=1M count=1
    quotaon /mnt
    # fs_reserved_space == 1Mb
    # quota_reserved_space == 0, because quota was disabled
    dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1
    # fs_reserved_space == 2Mb
    # quota_reserved_space == 1Mb
    sync   # ->dquot_claim_space() -> WARN_ON
  Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
* ext3: trivial quota cleanup (Dmitry Monakhov, 2010-03-05, 1 file changed, -54/+67)
  The patch aims to reorganize and simplify the quota code a bit. The quota code is complex enough by itself, but we can make it more readable in some places:
    - Move quota option parsing to separate functions.
    - Simplify the old-quota and journaled-quota mix check.
  Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
* ext3: mount flags manipulation cleanup (Dmitry Monakhov, 2010-03-05, 1 file changed, -27/+20)
  Replace intermediate EXT3_MOUNT_XXX flag manipulation with the corresponding macros. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
* ext3: Use bitops to read/modify EXT3_I(inode)->i_state (Jan Kara, 2010-03-05, 3 files changed, -18/+18)
  At several places we modify EXT3_I(inode)->i_state without holding i_mutex (ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops, which are atomic. Signed-off-by: Jan Kara <jack@suse.cz>
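  The race being fixed is a plain lost update on a shared flags word; as a rough illustration of why bitops help (hypothetical flag names, not the actual ext3 code):

      #include <linux/bitops.h>

      #define MY_STATE_NEW    0   /* bit numbers, hypothetical flags */
      #define MY_STATE_DIRTY  1

      /* Racy: a non-atomic read-modify-write on two CPUs can drop a bit. */
      static void set_flag_racy(unsigned long *state)
      {
              *state |= (1UL << MY_STATE_DIRTY);   /* load, OR, store - not atomic */
      }

      /* Atomic: set_bit()/clear_bit()/test_bit() update the word atomically,
       * so concurrent updates to different bits cannot overwrite each other. */
      static void set_flag_atomic(unsigned long *state)
      {
              set_bit(MY_STATE_DIRTY, state);
      }

  The conversion trades a plain store for a locked RMW instruction, which is cheap compared to losing an i_state update.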
* quota: Cleanup S_NOQUOTA handling (Jan Kara, 2010-03-05, 1 file changed, -36/+11)
  Cleanup handling of S_NOQUOTA inode flag and document it a bit. The flag does not have to be set under dqptr_sem. Only functions modifying inode's dquot pointers have to check the flag under dqptr_sem before going forward with the modification. This way we are sure that we cannot add new dquot pointers to the inode which is just becoming a quota file. The good thing about this cleanup is that there are no more places in quota code which enforce i_mutex vs. dqptr_sem lock ordering (in particular that dqptr_sem -> i_mutex of quota file). This should silence some (false) lockdep warnings with ext4 + quota and generally make life of some filesystems easier. Signed-off-by: Jan Kara <jack@suse.cz>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block (Linus Torvalds, 2010-03-02, 1 file changed, -2/+2)
  * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    Revert "blkdev: fix merge_bvec_fn return value checks"
| * Revert "blkdev: fix merge_bvec_fn return value checks"Jens Axboe2010-03-021-2/+2
| | | | | | | | | | | | | | | | | | | | This reverts commit 9f7cdbc33f36d28e57eaba0093f68f0d14c38c5b. It's causing oopses om dm setups, so revert it until we investigate. Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 (Linus Torvalds, 2010-03-02, 1 file changed, -3/+127)
  * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1341 commits)
    virtio_net: remove forgotten assignment
    be2net: fix tx completion polling
    sis190: fix cable detect via link status poll
    net: fix protocol sk_buff field
    bridge: Fix build error when IGMP_SNOOPING is not enabled
    bnx2x: Tx barriers and locks
    scm: Only support SCM_RIGHTS on unix domain sockets.
    vhost-net: restart tx poll on sk_sndbuf full
    vhost: fix get_user_pages_fast error handling
    vhost: initialize log eventfd context pointer
    vhost: logging thinko fix
    wireless: convert to use netdev_for_each_mc_addr
    ethtool: do not set some flags, if others failed
    ipoib: returned back addrlen check for mc addresses
    netlink: Adding inode field to /proc/net/netlink
    axnet_cs: add new id
    bridge: Make IGMP snooping depend upon BRIDGE.
    bridge: Add multicast count/interval sysfs entries
    bridge: Add hash elasticity/max sysfs entries
    bridge: Add multicast_snooping sysfs toggle
    ...
  Trivial conflicts in Documentation/feature-removal-schedule.txt
* Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ (David S. Miller, 2010-03-01, 96 files changed, -1671/+1933)
  Conflicts: drivers/firmware/iscsi_ibft.c
* seq_file: add RCU versions of new hlist/list iterators (v3) (stephen hemminger, 2010-02-23, 1 file changed, -0/+71)
  Many usages of seq_file use RCU-protected lists, so non-RCU iterators will not work safely. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
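  A rough sketch of how the RCU iterators are meant to plug into a seq_operations start/next/stop (the list head name is hypothetical; only the seq_hlist_*_rcu calls come from this patch):

      #include <linux/seq_file.h>
      #include <linux/rcupdate.h>

      static struct hlist_head my_items;      /* hypothetical RCU-protected list */

      static void *my_seq_start(struct seq_file *seq, loff_t *pos)
      {
              rcu_read_lock();
              return seq_hlist_start_rcu(&my_items, *pos);
      }

      static void *my_seq_next(struct seq_file *seq, void *v, loff_t *pos)
      {
              return seq_hlist_next_rcu(v, &my_items, pos);
      }

      static void my_seq_stop(struct seq_file *seq, void *v)
      {
              rcu_read_unlock();      /* held across the whole iteration */
      }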
* Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2010-02-17, 105 files changed, -476/+888)
* seq_file: Add helpers for iteration over a hlist (Li Zefan, 2010-02-10, 1 file changed, -3/+56)
  Some places in the kernel need to iterate over a hlist in seq_file, so provide some common helpers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
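  For reference, the plain (non-RCU) helpers introduced here have prototypes along the following lines in <linux/seq_file.h>; the RCU sketch shown earlier uses their _rcu counterparts:

      /* Walk an hlist from a seq_file ->start/->next implementation. */
      struct hlist_node *seq_hlist_start(struct hlist_head *head, loff_t pos);
      struct hlist_node *seq_hlist_start_head(struct hlist_head *head, loff_t pos);
      struct hlist_node *seq_hlist_next(void *v, struct hlist_head *head, loff_t *ppos);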
* Merge branch 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block (Linus Torvalds, 2010-03-01, 2 files changed, -11/+9)
  * 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block: (38 commits)
    block: don't access jiffies when initialising io_context
    cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds
    block: fix for "Consolidate phys_segment and hw_segment limits"
    cfq-iosched: quantum check tweak
    blktrace: perform cleanup after setup error
    blkdev: fix merge_bvec_fn return value checks
    cfq-iosched: requests "in flight" vs "in driver" clarification
    cciss: Fix problem with scatter gather elements in the scsi half of the driver
    cciss: eliminate unnecessary pointer use in cciss scsi code
    cciss: do not use void pointer for scsi hba data
    cciss: factor out scatter gather chain block mapping code
    cciss: fix scatter gather chain block dma direction kludge
    cciss: simplify scatter gather code
    cciss: factor out scatter gather chain block allocation and freeing
    cciss: detect bad alignment of scsi commands at build time
    cciss: clarify command list padding calculation
    cfq-iosched: rethink seeky detection for SSDs
    cfq-iosched: rework seeky detection
    block: remove padding from io_context on 64bit builds
    block: Consolidate phys_segment and hw_segment limits
    ...
* blkdev: fix merge_bvec_fn return value checks (Dmitry Monakhov, 2010-02-28, 1 file changed, -2/+2)
  merge_bvec_fn() returns bvec->bv_len on success, so we have to check against this value. But in the case of the fs_optimization merge we compare against the wrong value. This check should have been part of commit b428cd6da7e6559aca69aa2e3a526037d3f20403, but I accidentally forgot to add it in the initial patch. To make things straight, let's replace all such checks; in fact this makes the code easier to understand. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
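  The pattern being corrected looks roughly like the sketch below (illustrative, not the actual fs/bio.c hunk): a merge_bvec_fn that accepts the whole segment returns bvec->bv_len, so any smaller return value has to be treated as a rejection.

      #include <linux/blkdev.h>
      #include <linux/bio.h>

      /* Sketch: ask a stacked driver whether a page fits into this bio. */
      static int can_add_page(struct request_queue *q, struct bio *bio,
                              struct bio_vec *bvec)
      {
              struct bvec_merge_data bvm = {
                      .bi_bdev   = bio->bi_bdev,
                      .bi_sector = bio->bi_sector,
                      .bi_size   = bio->bi_size,
                      .bi_rw     = bio->bi_rw,
              };

              if (!q->merge_bvec_fn)
                      return 1;

              /* Wrong: treating any non-zero return as acceptance.
               * Right: the callback returns bvec->bv_len only when the
               * full segment can be merged. */
              return q->merge_bvec_fn(q, &bvm, bvec) >= bvec->bv_len;
      }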
* block: Consolidate phys_segment and hw_segment limits (Martin K. Petersen, 2010-02-26, 1 file changed, -6/+3)
  Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'master' into for-2.6.34 (Jens Axboe, 2010-02-25, 1 file changed, -1/+0)
  Conflicts: include/linux/blkdev.h Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'master' into for-2.6.34 (Jens Axboe, 2010-02-22, 158 files changed, -1781/+2478)
* block: Stop using byte offsets (Martin K. Petersen, 2010-01-11, 1 file changed, -3/+4)
  All callers of the stacking functions use 512-byte sector units rather than byte offsets. Simplify the code so the stacking functions take sectors when specifying data offsets. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2010-02-28, 3 files changed, -2/+8)
  * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
    rcu: Fix accelerated GPs for last non-dynticked CPU
    rcu: Make non-RCU_PROVE_LOCKING rcu_read_lock_sched_held() understand boot
    rcu: Fix accelerated grace periods for last non-dynticked CPU
    rcu: Export rcu_scheduler_active
    rcu: Make rcu_read_lock_sched_held() take boot time into account
    rcu: Make lockdep_rcu_dereference() message less alarmist
    sched, cgroups: Fix module export
    rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information
    rcu: Fix rcutorture mod_timer argument to delay one jiffy
    rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection
    rcu: Convert to raw_spinlocks
    rcu: Stop overflowing signed integers
    rcu: Use canonical URL for Mathieu's dissertation
    rcu: Accelerate grace period if last non-dynticked CPU
    rcu: Fix citation of Mathieu's dissertation
    rcu: Documentation update for CONFIG_PROVE_RCU
    security: Apply lockdep-based checking to rcu_dereference() uses
    idr: Apply lockdep-based diagnostics to rcu_dereference() uses
    radix-tree: Disable RCU lockdep checking in radix tree
    vfs: Abstract rcu_dereference_check for files-fdtable use
    ...
* vfs: Apply lockdep-based checking to rcu_dereference() uses (Paul E. McKenney, 2010-02-25, 3 files changed, -2/+8)
  Add lockdep-ified RCU primitives to alloc_fd(), files_fdtable() and fcheck_files(). Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-8-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
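  The idea behind the lockdep-ified primitives is to state, at the dereference site, the conditions under which access without rcu_read_lock() is still safe; a minimal sketch assuming an illustrative structure and lock (not the actual fs/file.c code):

      #include <linux/rcupdate.h>
      #include <linux/spinlock.h>
      #include <linux/lockdep.h>

      struct my_entry;

      struct my_table {
              spinlock_t lock;
              struct my_entry *current_entry;
      };

      /* Safe if the caller is in an RCU read-side critical section or holds
       * table->lock; with CONFIG_PROVE_RCU, lockdep complains otherwise. */
      static struct my_entry *my_table_entry(struct my_table *table)
      {
              return rcu_dereference_check(table->current_entry,
                                           rcu_read_lock_held() ||
                                           lockdep_is_held(&table->lock));
      }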
* Remove EXPERIMENTAL from NFS_FSCACHE (Christian Kujau, 2010-02-27, 1 file changed, -2/+1)
  There's currently an open Ubuntu bug[0] with the intent to compile NFS_FSCACHE (and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel. However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen. As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that much any more, so I propose the following patch to fs/nfs/Kconfig. I'd do the same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I feel it would not be appropriate for me to remove the flag there. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5 [1] http://lkml.org/lkml/2010/1/23/145 Signed-off-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm (Linus Torvalds, 2010-02-27, 8 files changed, -58/+180)
  * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: use bastmode in debugfs output
    dlm: Send lockspace name with uevents
    dlm: send reply before bast
    dlm: fix ordering of bast and cast
* dlm: use bastmode in debugfs output (David Teigland, 2010-02-26, 2 files changed, -3/+5)
  The bast mode that appears in the debugfs output should be useful on both master and process nodes. lkb_highbast is currently printed, and is only useful on the master node. lkb_bastmode is only useful on the process node. This patch sets lkb_bastmode on the master node as well, and uses that value in the debugfs print. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: Send lockspace name with uevents (Steven Whitehouse, 2010-02-26, 1 file changed, -1/+13)
  Although it is possible to get this information from the path, it's much easier to provide the lockspace as a separate env variable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: send reply before bast (David Teigland, 2010-02-26, 1 file changed, -26/+84)
  When the lock master processes a successful operation (request, convert, cancel, or unlock), it will process the effects of the change before sending the reply for the operation. The "effects" of the operation are:
    - blocking callbacks (basts) for any newly granted locks
    - waiting or converting locks that can now be granted
  The cast is queued on the local node when the reply from the lock master is received. This means that a lock holder can receive a bast for a lock mode that it doesn't yet know has been granted. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: fix ordering of bast and cast (David Teigland, 2010-02-24, 6 files changed, -28/+78)
  When both blocking and completion callbacks are queued for a lock, the dlm would always deliver the completion callback (cast) first. In some cases the blocking callback (bast) is queued before the cast, though, and should be delivered first. This patch keeps track of the order in which they were queued and delivers them in that order. This patch also keeps track of the granted mode in the last cast and eliminates the following bast if the bast mode is compatible with the preceding cast mode. This happens when a remotely mastered lock is demoted, e.g. EX->NL, in which case the local node queues a cast immediately after sending the demote message. In this way a cast can be queued for a mode, e.g. NL, that makes an in-transit bast extraneous. Signed-off-by: David Teigland <teigland@redhat.com>
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs (Linus Torvalds, 2010-02-27, 79 files changed, -1596/+1666)
  * 'for-linus' of git://oss.sgi.com/xfs/xfs: (52 commits)
    fs/xfs: Correct NULL test
    xfs: optimize log flushing in xfs_fsync
    xfs: only clear the suid bit once in xfs_write
    xfs: kill xfs_bawrite
    xfs: log changed inodes instead of writing them synchronously
    xfs: remove invalid barrier optimization from xfs_fsync
    xfs: kill the unused XFS_QMOPT_* flush flags V2
    xfs: Use delay write promotion for dquot flushing
    xfs: Sort delayed write buffers before dispatch
    xfs: Don't issue buffer IO direct from AIL push V2
    xfs: Use delayed write for inodes rather than async V2
    xfs: Make inode reclaim states explicit
    xfs: more reserved blocks fixups
    xfs: turn off sign warnings
    xfs: don't hold onto reserved blocks on remount,ro
    xfs: quota limit statvfs available blocks
    xfs: replace KM_LARGE with explicit vmalloc use
    xfs: cleanup up xfs_log_force calling conventions
    xfs: kill XLOG_VEC_SET_TYPE
    xfs: remove duplicate buffer flags
    ...
* Merge branch 'linux-2.6.33' (Alex Elder, 2010-02-26, 120 files changed, -709/+1281)
* fs/xfs: Correct NULL test (Julia Lawall, 2010-02-13, 1 file changed, -1/+1)
  Test the value that was just allocated rather than the previously tested one. A simplified version of the semantic match that finds this problem is as follows (http://coccinelle.lip6.fr/):
    // <smpl>
    @r@
    expression *x;
    expression e;
    identifier l;
    @@
    if (x == NULL || ...) { ... when forall return ...; }
    ... when != goto l;
        when != x = e
        when != &x
    *x == NULL
    // </smpl>
  Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: optimize log flushing in xfs_fsync (Christoph Hellwig, 2010-02-12, 1 file changed, -2/+8)
  If we have a pinned inode it must have a log item attached to it. Usually that log item will have ili_last_lsn already set, in which case we only need to flush the log up to that LSN instead of doing a full log force. This gives speedups of about 5% in some fsync heavy workloads. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: only clear the suid bit once in xfs_write (Christoph Hellwig, 2010-02-12, 3 files changed, -55/+3)
  file_remove_suid already calls into ->setattr to clear the suid and sgid bits if needed, so there is no need to start a second transaction to do it ourselves. Note that xfs_write_clear_setuid issues a sync transaction while the path through ->setattr doesn't, but that is consistent with the other filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* xfs: kill xfs_bawrite (Dave Chinner, 2010-02-04, 2 files changed, -20/+0)
  There are no more users of this function left in the XFS code now that we've switched everything to delayed write flushing. Remove it. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: log changed inodes instead of writing them synchronously (Christoph Hellwig, 2010-02-09, 1 file changed, -29/+82)
  When an inode has already been flushed as a delayed write, xfs_inode_clean() returns true and hence xfs_fs_write_inode() can return on a synchronous inode write without having written the inode. Currently these synchronous writes only come from sync(1), unmount, a synchronous NFS export and cachefiles, so they should be relatively rare and out of common performance paths. Realistically, a synchronous inode write is not necessary here; we can avoid writing the inode by logging any non-transactional changes that are pending. This needs to be done with synchronous transactions, but it avoids seeking between the log and inode clusters as we do now. We don't force the log if the inode is pinned, though, so this differs from the fsync case. For normal sys_sync and unmount behaviour this is fine because we do a synchronous log force in xfs_sync_data, which is called from the ->sync_fs code. It does however break the NFS synchronous export guarantees for now, but work is under way to fix this at a higher level, or for the higher level to provide an additional flag in the writeback control to tell us that a log force is needed. Portions of this patch are based on work from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Alex Elder <aelder@sgi.com>
* xfs: remove invalid barrier optimization from xfs_fsync (Christoph Hellwig, 2010-02-02, 1 file changed, -10/+2)
  We always need to flush the disk write cache and can't skip it just because no inode attributes have changed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
* xfs: kill the unused XFS_QMOPT_* flush flags V2 (Dave Chinner, 2010-02-03, 4 files changed, -23/+14)
  dquots are never flushed asynchronously. Remove the flag and the async write support from the flush function. Make the default flush a delwri flush to match the inode flush code, which leaves XFS_QMOPT_SYNC as the only flag remaining. Convert that to use SYNC_WAIT instead, just like the inode flush code. V2: just pass flush flags straight through. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: Use delay write promotion for dquot flushing (Dave Chinner, 2010-01-26, 1 file changed, -15/+10)
  xfs_qm_dqflock_pushbuf_wait() does a very similar trick to the one item pushing used to do to flush out delayed write dquot buffers. Change it to use the new promotion method rather than an async flush. Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock held, yet the callers assume that after this call the flush lock is held. Always return with the flush lock held. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* xfs: Sort delayed write buffers before dispatch (Dave Chinner, 2010-01-26, 1 file changed, -27/+60)
  Currently when the xfsbufd writes delayed write buffers, it pushes them to disk in the order they come off the delayed write list. If there are lots of buffers spread widely over the disk, this results in overwhelming the elevator sort queues in the block layer and we end up losing the possibility of merging adjacent buffers to minimise the number of IOs. Use the new generic list_sort function to sort the delwri dispatch queue before issue, to ensure that the buffers are pushed in the most friendly order possible to the lower layers. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
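  A minimal sketch of the list_sort() usage pattern described here, with illustrative structure and field names rather than the XFS buffer types:

      #include <linux/types.h>
      #include <linux/list.h>
      #include <linux/list_sort.h>

      struct pending_buf {
              struct list_head list;
              sector_t         blkno;   /* hypothetical on-disk location */
      };

      /* Comparison callback: order buffers by ascending disk address so the
       * block layer sees a mostly sequential stream it can merge. */
      static int buf_cmp(void *priv, struct list_head *a, struct list_head *b)
      {
              struct pending_buf *pa = list_entry(a, struct pending_buf, list);
              struct pending_buf *pb = list_entry(b, struct pending_buf, list);

              if (pa->blkno < pb->blkno)
                      return -1;
              if (pa->blkno > pb->blkno)
                      return 1;
              return 0;
      }

      static void dispatch_sorted(struct list_head *delwri_queue)
      {
              list_sort(NULL, delwri_queue, buf_cmp);
              /* ... then submit the buffers in list order ... */
      }

  Sorting once before issue is cheap compared to making the elevator untangle a randomly ordered stream of buffers.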