summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/dir.c
Commit message (Collapse)AuthorAgeFilesLines
* GFS2: Make rename not save dirent locationBob Peterson2014-10-011-2/+7
| | | | | | | | | | | | | | | | | | This patch fixes a regression in the patch "GFS2: Remember directory insert point", commit 2b47dad866d04f14c328f888ba5406057b8c7d33. The problem had to do with the rename function: The function found space for the new dirent, and remembered that location. But then the old dirent was removed, which often moved the eligible location for the renamed dirent. Putting the new dirent at the saved location caused file system corruption. This patch adds a new "save_loc" variable to struct gfs2_diradd. If 1, the dirent location is saved. If 0, the dirent location is not saved and the buffer_head is released as per previous behavior. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use pr_<level> more consistentlyJoe Perches2014-03-071-6/+8
| | | | | | | | | | | | | | | Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message. Other miscellanea around these changes: o Add missing newlines o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: global conversion to pr_foo()Fabian Frederick2014-03-061-4/+4
| | | | | | | | | | -All printk(KERN_foo converted to pr_foo(). -Messages updated to fit in 80 columns. -fs_macros converted as well. -fs_printk removed. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add meta readahead field in directory entriesSteven Whitehouse2014-02-071-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent of this new field in the directory entry is to allow a subsequent lookup to know how many blocks, which are contiguous with the inode, contain metadata which relates to the inode. This will then allow the issuing of a single read to read these blocks, rather than reading the inode first, and then issuing a second read for the metadata. This only works under some fairly strict conditions, since we do not have back pointers from inodes to directory entries we must ensure that the blocks referenced in this way will always belong to the inode. This rules out being able to use this system for indirect blocks, as these can change as a result of truncate/rewrite. So the idea here is to restrict this to xattr blocks only for the time being. For most inodes, that means only a single block. Also, when using ACLs and/or SELinux or other LSMs, these will be added at inode creation time so that they will be contiguous with the inode on disk and also will almost always be needed when we read the inode in for permissions checks. Once an xattr block for an inode is allocated, it will never change until the inode is deallocated. This patch adds the new field, a further patch will add the readahead in due course. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add hints to directory leaf blocksSteven Whitehouse2014-01-081-3/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds four new fields to directory leaf blocks. The intent is not to use them in the kernel itself, although perhaps we may be able to use them as hints at some later date, but instead to provide more information for debug/fsck use. One new field adds a pointer to the inode to which the leaf belongs. This can be useful if the pointer to the leaf block has become corrupt, as it will allow us to know which inode this block should be associated with. This field is set when the leaf is created and never changed over its lifetime. The second field is a "distance from the hash table" field. The meaning is as follows: 0 = An old leaf in which this value has not been set 1 = This leaf is pointed to directly from the hash table 2+ = This leaf is part of a chain, pointed to by another leaf block, the value gives the position in the chain. The third and fourth fields combine to give a time stamp of the most recent directory insertion or deletion from this leaf block. The time stamp is not updated when a new leaf block is chained from the current one. The code is currently written such that the timestamp on the dir inode will match that of the leaf block for the most recent insertion/deletion. For backwards compatibility, any of these new fields which is zero should be considered to be "unknown". Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: For exhash conversion, only one block is neededSteven Whitehouse2014-01-081-0/+5
| | | | | | | | | | | | | | | | | For most cases, only a single new block is needed when we reach the point of converting from stuffed to exhash directory. The exception being when the file name is so long that it will not fit within the new leaf block. So this patch adds a simple test for that situation so that we do not need to request the full reservation size in this case. Potentially we could calculate more accurately the value to use in other cases too, but that is much more complicated to do and it is doubtful that the benefit would outweigh the extra cost in code complexity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remember directory insert pointSteven Whitehouse2014-01-061-10/+23
| | | | | | | | | | | | | When we look to see if there is enough space to add a dir entry without allocation, we have then been repeating the same search later when we do the actual insertion. This patch caches the details of the location in the gfs2_diradd structure, so that we do not have to repeat the search. This will provide a performance improvement which will be greater as the size of the directory increases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add directory addition info structureSteven Whitehouse2014-01-061-3/+9
| | | | | | | | | | | | | | | | | | | | | The intent is that this structure will hold the information required when adding entries to a directory (linking). To start with, it will contain only the number of blocks which are required to link the new entry into the directory. The current calculation returns either 0 or the maximim number of blocks that can ever be requested by such a transaction. The intent is that in a later patch, we can update the dir code to calculate this value more accurately. In addition further patches will also add further fields to the new structure to increase its utility. In addition this patch fixes a bug where the link used during inode creation was adding requesting too many blocks in some cases. This is harmless unless the fs is close to being full. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacksJoe Perches2013-08-201-1/+1
| | | | | | | | | | | Don't emit OOM warnings when k.alloc calls fail when there there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds2013-07-021-11/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull GFS2 updates from Steven Whitehouse: "There are a few bug fixes for various, mostly very minor corner cases, plus some interesting new features. The new features include atomic_open whose main benefit will be the reduction in locking overhead in case of combined lookup/create and open operations, sorting the log buffer lists by block number to improve the efficiency of AIL writeback, and aggressively issuing revokes in gfs2_log_flush to reduce overhead when dropping glocks." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Reserve journal space for quota change in do_grow GFS2: Fix fstrim boundary conditions GFS2: fix warning message GFS2: aggressively issue revokes in gfs2_log_flush GFS2: fix regression in dir_double_exhash GFS2: Add atomic_open support GFS2: Only do one directory search on create GFS2: fix error propagation in init_threads() GFS2: Remove no-op wrapper function GFS2: Cocci spatch "ptr_ret.spatch" GFS2: Eliminate gfs2_rg_lops GFS2: Sort buffer lists by inplace block number
| * GFS2: fix regression in dir_double_exhashBob Peterson2013-06-141-1/+2
| | | | | | | | | | | | | | | | | | Recent commit e8830d8 introduced a bug in function dir_double_exhash; it was failing to set h in the fall-back case. This patch corrects it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Only do one directory search on createSteven Whitehouse2013-06-111-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Creation of a new inode requires a directory search in order to ensure that we are not trying to create an inode with the same name as an existing one. This was hidden away inside the create_ok() function. In the case that there was an existing inode, and a lookup can be substituted for a create (which is the case with regular files when the O_EXCL flag is not in use) then we were doing a second lookup in order to return the inode. This patch merges these two lookups into one. This can be done by passing a flag to gfs2_dir_search() to tell it to just return -EEXIST in the cases where we don't actually want to look up the inode. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | [readdir] convert gfs2Al Viro2013-06-291-32/+24Star
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* GFS2: Fall back to vmalloc if kmalloc fails for dir hash tablesBob Peterson2013-06-031-10/+33
| | | | | | | | | | | | | | | | This version has one more correction: the vmalloc calls are replaced by __vmalloc calls to preserve the GFP_NOFS flag. When GFS2's directory management code allocates buffers for a directory hash table, if it can't get the memory it needs, it currently gives a bad return code. Rather than giving an error, this patch allows it to use virtual memory rather than kernel memory for the hash table. This should make it possible for directories to function properly, even when kernel memory becomes very fragmented. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-02-261-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
| * gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGEEric W. Biederman2013-02-131-1/+1
| | | | | | | | | | | | | | | | Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE so the constants may be well typed. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | GFS2: Split gfs2_trans_add_bh() into twoSteven Whitehouse2013-01-291-15/+15
|/ | | | | | | | | | | | | | There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use dirty_inode in gfs2_dir_addBob Peterson2012-11-131-6/+1Star
| | | | | | | | | | This patch changes the gfs2_dir_add function so that it uses the dirty_inode function (via mark_inode_dirty) rather than manually updating the dinode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fold quota data into the reservations structBob Peterson2012-06-061-8/+1Star
| | | | | | | | | | This patch moves the ancillary quota data structures into the block reservations structure. This saves GFS2 some time and effort in allocating and deallocating the qadata structure. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* vfs: make it possible to access the dentry hash/len as one 64-bit entryLinus Torvalds2012-05-111-1/+1
| | | | | | | | | | | | | | | This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about. The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper function for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* GFS2: Make sure rindex is uptodate before starting transactionsBob Peterson2012-04-051-0/+4
| | | | | | | | | This patch removes the call from gfs2_blk2rgrd to function gfs2_rindex_update and replaces it with individual calls. The former way turned out to be too problematic. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: decouple quota allocations from block allocationsBob Peterson2011-11-221-2/+2
| | | | | | | | | | | | This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: move toward a generic multi-block allocatorBob Peterson2011-11-211-1/+1
| | | | | | | | | | | | | | | | | | | | | This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: combine gfs2_alloc_block and gfs2_alloc_diBob Peterson2011-11-151-1/+1
| | | | | | | | | | | | | GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: f_ra is always valid in dir readahead functionSteven Whitehouse2011-11-091-4/+6
| | | | | | | As a result, we don't need to test it each time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
* GFS2: Add readahead to sequential directory traversalBob Peterson2011-11-081-3/+53
| | | | | | | | | | | | | This patch adds read-ahead capability to GFS2's directory hash table management. It greatly improves performance for some directory operations. For example: In one of my file systems that has 1000 directories, each of which has 1000 files, time to execute a recursive ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced from 2m2.814s on a stock kernel to 0m45.938s. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use cached rgrp in gfs2_rlist_add()Steven Whitehouse2011-10-211-1/+1
| | | | | | | | | | | Each block which is deallocated, requires a call to gfs2_rlist_add() and each of those calls was calling gfs2_blk2rgrpd() in order to figure out which rgrp the block belonged in. This can be speeded up by making use of the rgrp cached in the inode. We also reset this cached rgrp in case the block has changed rgrp. This should provide a big reduction in gfs2_blk2rgrpd() calls during deallocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Make resource groups "append only" during life of fsSteven Whitehouse2011-10-211-6/+0Star
| | | | | | | | | | | | | | | | | | | | | Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use ->dirty_inode()Steven Whitehouse2011-10-211-9/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | The aim of this patch is to use the newly enhanced ->dirty_inode() super block operation to deal with atime updates, rather than piggy backing that code into ->write_inode() as is currently done. The net result is a simplification of the code in various places and a reduction of the number of gfs2_dinode_out() calls since this is now implied by ->dirty_inode(). Some of the mark_inode_dirty() calls have been moved under glocks in order to take advantage of then being able to avoid locking in ->dirty_inode() when we already have suitable locks. One consequence is that generic_write_end() now correctly deals with file size updates, so that we do not need a separate check for that afterwards. This also, indirectly, means that fdatasync should work correctly on GFS2 - the current code always syncs the metadata whether it needs to or not. Has survived testing with postmark (with and without atime) and also fsx. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Clean up dir hash table readingSteven Whitehouse2011-10-211-23/+9Star
| | | | | | | | | Since there is now only a single caller to gfs2_dir_read_data() and it has a number of constant arguments, we can factor those out. Also some tests relating to the inode size were being done twice. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Cache dir hash table in a contiguous bufferSteven Whitehouse2011-07-151-112/+109Star
| | | | | | | | | | | | | | | | | | This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: When adding a new dir entry, inc link count if it is a subdirSteven Whitehouse2011-05-091-2/+4
| | | | | | | | | | | | This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation. Now that both adding and removing directory entries automatically update the parent directory's link count if required, that makes the code shorter and simpler than before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Make gfs2_dir_del update link count when requiredSteven Whitehouse2011-05-091-1/+4
| | | | | | | | | | | | | | | | | When we remove an entry from a directory, we can save ourselves some trouble if we know the type of the entry in question, since if it is itself a directory, we can update the link count of the parent at the same time as removing the directory entry. In addition this patch also merges the rmdir and unlink code which was almost identical anyway. This eliminates the calls to remove the . and .. directory entries on each rmdir (not needed since the directory will be deallocated, anyway) which was the only thing preventing passing the dentry to gfs2_dir_del(). The passing of the dentry rather than just the name allows us to figure out the type of the entry which is being removed, and thus adjust the link count when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: move function foreach_leaf to gfs2_dir_exhash_deallocBob Peterson2011-04-201-81/+65Star
| | | | | | | | | The previous patches made function gfs2_dir_exhash_dealloc do nothing but call function foreach_leaf. This patch simplifies the code by moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: pass leaf_bh into leaf_deallocBob Peterson2011-04-201-11/+24
| | | | | | | | | | Function foreach_leaf used to look up the leaf block address and get a buffer_head. Then it would call leaf_dealloc which did the same lookup. This patch combines the two operations by making foreach_leaf pass the leaf bh to leaf_dealloc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Combine transaction from gfs2_dir_exhash_deallocBob Peterson2011-04-201-35/+14Star
| | | | | | | | | | | At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode type to "file" to prevent directory corruption in case of a crash. It was doing so in its own journal transaction. This patch makes the change occur when the last call is make to leaf_dealloc, since it needs to rewrite the directory dinode at that time anyway. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: remove *leaf_call_t and simplify leaf_deallocBob Peterson2011-04-201-8/+6Star
| | | | | | | | | | | Since foreach_leaf is only called with leaf_dealloc as its only possible call function, we can simplify the code by making it call leaf_dealloc directly. This simplifies the code and eliminates the need for leaf_call_t, the generic call method. This is a first small step in simplifying the directory leaf deallocation code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: filesystem hang caused by incorrect lock orderBob Peterson2011-04-181-1/+1
| | | | | | | | | | | | | | This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Make . and .. qstrs constantSteven Whitehouse2010-09-201-0/+3
| | | | | | | | | Rather than calculating the qstrs for . and .. each time we need them, its better to keep a constant version of these and just refer to them when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>
* GFS2: Remove i_disksizeSteven Whitehouse2010-09-201-13/+15
| | | | | | | | | With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: remove dependency on __GFP_NOFAILDavid Rientjes2010-07-291-2/+9
| | | | | | | | | The k[mc]allocs in dr_split_leaf() and dir_double_exhash() are failable, so remove __GFP_NOFAIL from their masks. Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use kmalloc when possible for ->readdir()Steven Whitehouse2010-07-281-6/+25
| | | | | | | | | | | | | | | | | | | | | | | If we don't need a huge amount of memory in ->readdir() then we can use kmalloc rather than vmalloc to allocate it. This should cut down on the greater overheads associated with vmalloc for smaller directories. We may be able to eliminate vmalloc entirely at some stage, but this is easy to do right away. Also using GFP_NOFS to avoid any issues wrt to deleting inodes while under a glock, and suggestion from Linus to factor out the alloc/dealloc. I've given this a test with a variety of different sized directories and it seems to work ok. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* GFS2: rename causes kernel OopsBob Peterson2010-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: glock livelockBob Peterson2010-04-141-1/+1
| | | | | | | | | | | | | | This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* GFS2: Remove dirent_first() functionSteven Whitehouse2009-12-031-33/+1Star
| | | | | | | | | | This function only had one caller left, and that caller only called it for leaf blocks, hence one branch of the "if" was never taken. In addition the call to get_left had already verified the metadata type, so the function can be reduced to a single line of code in its caller. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Improve resource group error handlingSteven Whitehouse2009-05-201-2/+9
| | | | | | | | | | | | | | | This patch improves the error handling in the case where we discover that the summary information in the resource group doesn't match the bitmap information while in the process of allocating blocks. Originally this resulted in a kernel bug, but this patch changes that so that we return -EIO and print some messages explaining what went wrong, and how to fix it. We also remember locally not to try and allocate from the same rgrp again, so that a subsequent allocation in a different rgrp should succeed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge lock_dlm module into GFS2Steven Whitehouse2009-03-241-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Banish struct gfs2_dinode_hostSteven Whitehouse2009-01-051-8/+8
| | | | | | | | | | | | The final field in gfs2_dinode_host was the i_flags field. Thats renamed to i_diskflags in order to avoid confusion with the existing inode flags, and moved into the inode proper at a suitable location to avoid creating a "hole". At that point struct gfs2_dinode_host is no longer needed and as promised (quite some time ago!) it can now be removed completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksizeSteven Whitehouse2009-01-051-13/+13
| | | | | | | This patch moved the i_size field from the gfs2_dinode_host and following the ext3 convention renames it i_disksize. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Move "entries" into "proper" inodeSteven Whitehouse2009-01-051-10/+10
| | | | | | | | | This moves the directory entry count into the proper inode. Potentially we could get this to share the space used by something else in the future, but this is one more step on the way to removing the gfs2_dinode_host structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>