summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* GFS2: Add S_NOSEC supportSteven Whitehouse2011-07-153-2/+6
| | | | | | | | | | | This adds S_NOSEC support to GFS2. We set/reset the flag either when a user calls setattr or when we have just regained the glock from another node. The flag is only set if there are no xattrs on the inode and there is no suid bit set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
* GFS2: Automatically adjust glock min hold timeBob Peterson2011-07-154-13/+36
| | | | | | | | This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Cache dir hash table in a contiguous bufferSteven Whitehouse2011-07-156-113/+115
| | | | | | | | | | | | | | | | | | This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2011-07-147-10/+50
|\ | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Resolve inode eviction and ail list interaction bug GFS2: Fix race during filesystem mount GFS2: force a log flush when invalidating the rindex glock
| * GFS2: Resolve inode eviction and ail list interaction bugSteven Whitehouse2011-07-144-8/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>
| * GFS2: Fix race during filesystem mountSteven Whitehouse2011-07-123-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a potential race during filesystem mounting which has recently been reported. It occurs when the userland gfs_controld is able to process requests fast enough that it tries to use the sysfs interface before the lock module is properly initialised. This is a pretty unusual case as normally the lock module initialisation is very quick compared with gfs_controld. This patch adds an interruptible completion which is used to ensure that userland will wait for the initialisation of the lock module to complete. There are other potential solutions to this problem, but this is the quickest at this stage and has been tested both with and without mount.gfs2 present in the system. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Booher <dbooher@adams.net>
| * GFS2: force a log flush when invalidating the rindex glockBenjamin Marzinski2011-07-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, there is nothing that forces the log to get flushed when a node drops its rindex glock so that another node can grow the filesystem. If the log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the following way. A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is dropped so the other node can grow the filesystem. When the node reacquires the rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being removed from the list by gfs2_log_flush(). This code simply forces a log flush when the rindex glock is invalidated, solving the problem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2011-07-133-2/+3
|\ \ | |/ |/| | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix use of static variable in rpcb_getport_async NFSv4.1: update nfs4_fattr_bitmap_maxsz SUNRPC: Fix a race between work-queue and rpc_killall_tasks pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
| * NFSv4.1: update nfs4_fattr_bitmap_maxszAndy Adamson2011-07-121-1/+1
| | | | | | | | | | | | | | | | | | Attribute IDs assigned in RFC 5661 now require three bitmaps. Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Signed-off-by: Andy Adamson <andros@netapp.com> Cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfs: write: Set mds_offset in the generic layer - it is needed by all LDsBoaz Harrosh2011-06-282-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current pnfs tree, all the layouts set mds_offset in their .write_pagelist member. mds_offset is only used by generic layer and should be handled by it. This patch is for upstream. It is needed in this -rc series to fix a bug in objects layout_commit. I'll send patches for objects and blocks to be squashed into current pnfs tree. TODO: It looks like the read path needs the same patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2011-07-113-57/+73
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: drop spinlock before calling cifs_put_tlink cifs: fix expand_dfs_referral cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALL cifs: factor smb_vol allocation out of cifs_setup_volume_info cifs: have cifs_cleanup_volume_info not take a double pointer cifs: fix build_unc_path_to_root to account for a prefixpath cifs: remove bogus call to cifs_cleanup_volume_info
| * | cifs: drop spinlock before calling cifs_put_tlinkJeff Layton2011-07-111-1/+1
| | | | | | | | | | | | | | | | | | | | | ...as that function can sleep. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: fix expand_dfs_referralJeff Layton2011-07-091-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Regression introduced in commit 724d9f1cfba. Prior to that, expand_dfs_referral would regenerate the mount data string and then call cifs_parse_mount_options to re-parse it (klunky, but it worked). The above commit moved cifs_parse_mount_options out of cifs_mount, so the re-parsing of the new mount options no longer occurred. Fix it by making expand_dfs_referral re-parse the mount options. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: move bdi_setup_and_register outside of CONFIG_CIFS_DFS_UPCALLJeff Layton2011-07-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This needs to be done regardless of whether that KConfig option is set or not. Reported-by: Sven-Haegar Koch <haegar@sdinet.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: factor smb_vol allocation out of cifs_setup_volume_infoJeff Layton2011-07-083-33/+33
| | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: have cifs_cleanup_volume_info not take a double pointerJeff Layton2011-07-063-9/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | ...as that makes for a cumbersome interface. Make it take a regular smb_vol pointer and rely on the caller to zero it out if needed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: fix build_unc_path_to_root to account for a prefixpathJeff Layton2011-07-061-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | Regression introduced by commit f87d39d9513. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: remove bogus call to cifs_cleanup_volume_infoJeff Layton2011-07-061-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's called the following call to cifs_get_tcp_session will oops as the volume_info pointer will then be NULL. The caller of cifs_mount should clean up this data since it passed it in. There's no need for us to call this here. Regression introduced by commit 724d9f1cfba. Reported-by: Adam Williamson <awilliam@redhat.com> Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-07-094-6/+20
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix oops when doing space balance Btrfs: don't panic if we get an error while balancing V2 btrfs: add missing options displayed in mount output
| * | | btrfs: fix oops when doing space balanceMiao Xie2011-07-071-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: don't panic if we get an error while balancing V2Josef Bacik2011-07-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported an error where if we try to balance an fs after a device has been removed it will blow up. This is because we get an EIO back and this is where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks, Reported-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | btrfs: add missing options displayed in mount outputDavid Sterba2011-07-072-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-07-082-8/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: unpin stale inodes directly in IOP_COMMITTED
| * | | | xfs: unpin stale inodes directly in IOP_COMMITTEDDave Chinner2011-07-062-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When inodes are marked stale in a transaction, they are treated specially when the inode log item is being inserted into the AIL. It tries to avoid moving the log item forward in the AIL due to a race condition with the writing the underlying buffer back to disk. The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes in the AIL"). To avoid moving the item forward, we return a LSN smaller than the commit_lsn of the completing transaction, thereby trying to trick the commit code into not moving the inode forward at all. I'm not sure this ever worked as intended - it assumes the inode is already in the AIL, but I don't think the returned LSN would have been small enough to prevent moving the inode. It appears that the reason it worked is that the lower LSN of the inodes meant they were inserted into the AIL and flushed before the inode buffer (which was moved to the commit_lsn of the transaction). The big problem is that with delayed logging, the returning of the different LSN means insertion takes the slow, non-bulk path. Worse yet is that insertion is to a position -before- the commit_lsn so it is doing a AIL traversal on every insertion, and has to walk over all the items that have already been inserted into the AIL. It's expensive. To compound the matter further, with delayed logging inodes are likely to go from clean to stale in a single checkpoint, which means they aren't even in the AIL at all when we come across them at AIL insertion time. Hence these were all getting inserted into the AIL when they simply do not need to be as inodes marked XFS_ISTALE are never written back. Transactional/recovery integrity is maintained in this case by the other items in the unlink transaction that were modified (e.g. the AGI btree blocks) and committed in the same checkpoint. So to fix this, simply unpin the stale inodes directly in xfs_inode_item_committed() and return -1 to indicate that the AIL insertion code does not need to do any further processing of these inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | | FS-Cache: Add a helper to bulk uncache pages on an inodeDavid Howells2011-07-073-5/+48
| |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | FDPIC: Fix memory leakDavidlohr Bueso2011-07-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The shdr4extnum variable isn't being freed in the cleanup process of elf_fdpic_core_dump(). Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | fs: fix lock initializationMiklos Szeredi2011-07-061-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | locks_alloc_lock() assumed that the allocated struct file_lock is already initialized to zero members. This is only true for the first allocation of the structure, after reuse some of the members will have random values. This will for example result in passing random fl_start values to userspace in fuse for FL_FLOCK locks, which is an information leak at best. Fix by reinitializing those members which may be non-zero after freeing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-07-051-8/+6Star
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix sync and dio writes across stripe boundaries libceph: fix page calculation for non-page-aligned io ceph: fix page alignment corrections
| * | | | ceph: fix sync and dio writes across stripe boundariesSage Weil2011-06-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were iterating across stripe boundaries properly, but not moving the write buffer pointer forward. This caused us to rewrite the same data after the break. Fix by adjusting the data pointer forward, and recalculating the io and buffer alignment after the break. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | | ceph: fix page alignment correctionsSage Weil2011-06-141-5/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1 Reported-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-07-052-3/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: Fix double iput of the same inode in hfsplus_fill_super() hfsplus: add missing call to bio_put()
| * | | | | hfsplus: Fix double iput of the same inode in hfsplus_fill_super()Alexey Khoroshilov2011-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a misprint in resource deallocation code on error path in hfsplus_fill_super(): the sbi->alloc_file inode is iput twice, while the root inode in not iput at all. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | hfsplus: add missing call to bio_put()Seth Forshee2011-06-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hfsplus leaks bio objects by failing to call bio_put() on the bios it allocates. Add the missing call to fix the leak. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: <stable@kernel.org> # .38.x, .39.x Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | | | cifs: set socket send and receive timeouts before attempting connectJeff Layton2011-07-011-8/+8
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Benjamin S. reported that he was unable to suspend his machine while it had a cifs share mounted. The freezer caused this to spew when he tried it: -----------------------[snip]------------------ PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.01 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): cifsd S ffff880127f7b1b0 0 1821 2 0x00800000 ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000 ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010 Call Trace: [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19 [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445 [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f [<ffffffff811e81c7>] ? release_sock+0x19/0xef [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffff810444a1>] ? kthread+0x7a/0x82 [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10 [<ffffffff81044427>] ? kthread+0x0/0x82 [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10 Restarting tasks ... done. -----------------------[snip]------------------ We do attempt to perform a try_to_freeze in cifs_reconnect, but the connection attempt itself seems to be taking longer than 20s to time out. The connect timeout is governed by the socket send and receive timeouts, so we can shorten that period by setting those timeouts before attempting the connect instead of after. Adam Williamson tested the patch and said that it seems to have fixed suspending on his laptop when a cifs share is mounted. Reported-by: Benjamin S <da_joind@gmx.net> Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | / proc: restrict access to /proc/PID/ioVasiliy Kulikov2011-06-281-2/+5
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: fix assertion mapping->nrpages == 0 in end_writeback()Jan Kara2011-06-281-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under heavy memory and filesystem load, users observe the assertion mapping->nrpages == 0 in end_writeback() trigger. This can be caused by page reclaim reclaiming the last page from a mapping in the following race: CPU0 CPU1 ... shrink_page_list() __remove_mapping() __delete_from_page_cache() radix_tree_delete() evict_inode() truncate_inode_pages() truncate_inode_pages_range() pagevec_lookup() - finds nothing end_writeback() mapping->nrpages != 0 -> BUG page->mapping = NULL mapping->nrpages-- Fix the problem by doing a reliable check of mapping->nrpages under mapping->tree_lock in end_writeback(). Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out by Miklos Szeredi <mszeredi@suse.de>. Cc: Jay <jinshan.xiong@whamcloud.com> Cc: Miklos Szeredi <mszeredi@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | romfs: fix romfs_get_unmapped_area() argument checkBob Liu2011-06-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | romfs_get_unmapped_area() checks argument `len' without considering PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after commit f67d9b1576c ("nommu: add page_align to mmap"). Fix the check by changing it in same way ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-275-31/+100
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes
| * | | btrfs: fix inconsonant inode informationMiao Xie2011-06-273-26/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: make sure to update total_bitmaps when freeing cache V3Josef Bacik2011-06-251-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: fix type mismatch in find_free_extent()Ilya Dryomov2011-06-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | data parameter should be u64 because a full-sized chunk flags field is passed instead of 0/1 for distinguishing data from metadata. All underlying functions expect u64. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | Btrfs: make sure to record the transid in new inodesChris Mason2011-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2011-06-274-6/+31
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent bogus assert when trying to remove non-existent attribute xfs: clear XFS_IDIRTY_RELEASE on truncate down xfs: reset inode per-lifetime state when recycling it
| * | | xfs: prevent bogus assert when trying to remove non-existent attributeDave Chinner2011-06-241-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the attribute fork on an inode is in btree format and has multiple levels (i.e node format rather than leaf format), then a lookup failure will trigger an assert failure in xfs_da_path_shift if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to indicate to the directory btree code that not finding an entry is not a fatal error. In the case of doing a lookup for a directory name removal, this is valid as a user cannot insert an arbitrary name to remove from the directory btree. However, in the case of the attribute tree, a user has direct control over the attribute name and can ask for any random name to be removed without any validation. In this case, fsstress is asking for a non-existent user.selinux attribute to be removed, and that is causing xfs_da_path_shift() to fall off the bottom of the tree where it asserts that a lookup failure is allowed. Because the flag is not set, we die a horrible death on a debug enable kernel. Prevent this assert from firing on attribute removes by adding the op_flag XFS_DA_OP_OKNOENT to atribute removal operations. Discovered when testing on a SELinux enabled system by fsstress in test 070 by trying to remove a non-existent user.selinux attribute. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | xfs: clear XFS_IDIRTY_RELEASE on truncate downDave Chinner2011-06-241-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an inode is truncated down, speculative preallocation is removed from the inode. This should also reset the state bits for controlling whether preallocation is subsequently removed when the file is next closed. The flag is not being cleared, so repeated operations on a file that first involve a truncate (e.g. multiple repeated dd invocations on a file) give different file layouts for the second and subsequent invocations. Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure that speculative delalloc is removed on files that have been truncated down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | xfs: reset inode per-lifetime state when recycling itDave Chinner2011-06-242-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2011-06-272-5/+3Star
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN cifs: free blkcipher in smbhash
| * | | | cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKENJeff Layton2011-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does not work properly with CIFS as current servers do not enable support for the FILE_OPEN_BY_FILE_ID on SMB NTCreateX and not all NFS clients handle ESTALE. For now, it just plain doesn't work. Mark it BROKEN to discourage distros from enabling it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | cifs: free blkcipher in smbhashJeff Layton2011-06-241-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is currently leaked in the rc == 0 case. Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-274-116/+98Star
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: cifs: propagate errors from cifs_get_root() to mount(2) cifs: tidy cifs_do_mount() up a bit cifs: more breakage on mount failures cifs: close sget() races cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() cifs: move cifs_umount() call into ->kill_sb() cifs: pull cifs_mount() call up sanitize cifs_umount() prototype cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() cifs: allocate mountdata earlier cifs: leak on mount if we share superblock cifs: don't pass superblock to cifs_mount() cifs: don't leak nls on mount failure cifs: double free on mount failure take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Steve French <smfrench@gmail.com>