summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* vfs: missed source of ->f_pos racesAl Viro2012-08-201-2/+8
| | | | | | | | | compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'vfs-fixes' of ↵Linus Torvalds2012-08-183-5/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull vfs fixes from Miklos Szeredi. This mainly fixes some confusion about whether the open 'mode' variable passed around should contain the full file type (S_IFREG etc) information or just the permission mode. In particular, the lack of proper file type information had confused fuse. * 'vfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: fix propagation of atomic_open create error on negative dentry fuse: check create mode in atomic open vfs: pass right create mode to may_o_create() vfs: atomic_open(): fix create mode usage vfs: canonicalize create mode in build_open_flags()
| * vfs: fix propagation of atomic_open create error on negative dentrySage Weil2012-08-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If ->atomic_open() returns -ENOENT, we take care to return the create error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1 and provides a negative dentry. This fixes a regression where an unprivileged open O_CREAT fails with ENOENT instead of EACCES, introduced with the new atomic_open code. It is tested by the open/08.t test in the pjd posix test suite, and was observed on top of fuse (backed by ceph-fuse). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: check create mode in atomic openMiklos Szeredi2012-08-151-0/+3
| | | | | | | | | | | | | | | | | | | | Verify that the VFS is passing us a complete create mode with the S_IFREG to atomic open. Reported-by: Steve <steveamigauk@yahoo.co.uk> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>
| * vfs: pass right create mode to may_o_create()Miklos Szeredi2012-08-151-1/+1
| | | | | | | | | | | | | | Pass the umask-ed create mode to may_o_create() instead of the original one. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>
| * vfs: atomic_open(): fix create mode usageMiklos Szeredi2012-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | Don't mask S_ISREG off the create mode before passing to ->atomic_open(). Other methods (->create, ->mknod) also get the complete file mode and filesystems expect it. Reported-by: Steve <steveamigauk@yahoo.co.uk> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>
| * vfs: canonicalize create mode in build_open_flags()Miklos Szeredi2012-08-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> CC: stable@vger.kernel.org
* | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2012-08-175-27/+46
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "The following are all bug fixes and regressions. The most notable are the ones which cause problems for ext4 on RAID --- a performance problem when mounting very large filesystems, and a kernel OOPS when doing an rm -rf on large directory hierarchies on fast devices." * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix kernel BUG on large-scale rm -rf commands ext4: fix long mount times on very big file systems ext4: don't call ext4_error while block group is locked ext4: avoid kmemcheck complaint from reading uninitialized memory ext4: make sure the journal sb is written in ext4_clear_journal_err()
| * | ext4: fix kernel BUG on large-scale rm -rf commandsTheodore Ts'o2012-08-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit 968dee7722 was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: fix long mount times on very big file systemsTheodore Ts'o2012-08-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: don't call ext4_error while block group is lockedTheodore Ts'o2012-08-172-26/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While in ext4_validate_block_bitmap(), if an block allocation bitmap is found to be invalid, we call ext4_error() while the block group is still locked. This causes ext4_commit_super() to call a function which might sleep while in an atomic context. There's no need to keep the block group locked at this point, so hoist the ext4_error() call up to ext4_validate_block_bitmap() and release the block group spinlock before calling ext4_error(). The reported stack trace can be found at: http://article.gmane.org/gmane.comp.file-systems.ext4/33731 Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o2012-08-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: make sure the journal sb is written in ext4_clear_journal_err()Theodore Ts'o2012-08-062-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
* | | autofs4 - fix expire checkIan Kent2012-08-171-5/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases when an autofs indirect mount is contained in a file system that is marked as shared (such as when systemd does the equivalent of "mount --make-rshared /" early in the boot), mounts stop expiring. When this happens the first expiry check on a mountpoint dentry in autofs_expire_indirect() sees a mountpoint dentry with a higher than minimal reference count. Consequently the dentry is condidered busy and the actual expiry check is never done. This particular check was originally meant as an optimisation to detect a path walk in progress but with the addition of rcu-walk it can be ineffective anyway. Removing the test allows automounts to expire again since the actual expire check doesn't rely on the dentry reference count. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | autofs4 - fix get_next_positive_subdir()Ian Kent2012-08-161-18/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following a report of a crash during an automount expire I found that the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong. Not only is the locking wrong but the function is more complex than it needs to be. The function is meant to calculate (and dget) the next entry in the list of directories contained in the root of an autofs mount point (an autofs indirect mount to be precise). The main problem was that the d_lock of the owner of the list was not being taken when walking the list, which lead to list corruption under load. The only other lock that needs to be taken is against the next dentry candidate so it can be checked for usability. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2012-08-163-10/+40
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: verify all ioctl retry iov elements fuse: add missing INIT flag descriptions fuse: add missing INIT flags fuse: update attributes on aio_read fuse: invalidate inode mapping if mtime changes fuse: add FUSE_AUTO_INVAL_DATA init flag
| * | fuse: verify all ioctl retry iov elementsZach Brown2012-08-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: <stable@vger.kernel.org> [2.6.37+]
| * | fuse: add missing INIT flagsMiklos Szeredi2012-07-181-1/+2
| | | | | | | | | | | | | | | | | | | | | Add missing flags that userspace derived from the protocol version number. This makes the protocol more flexible. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | fuse: update attributes on aio_readBrian Foster2012-07-181-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A fuse-based network filesystem might allow for the inode and/or file data to change unexpectedly. A local client that opens and repeatedly reads a file might never pick up on such changes and indefinitely return stale data. Always invoke fuse_update_attributes() in the read path to cause an attr revalidation when the attributes expire. This leads to a page cache invalidation if necessary and ensures fuse issues new read requests to the fuse client. The original logic (reval only on reads beyond EOF) is preserved unless the client specifies FUSE_AUTO_INVAL_DATA on init. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | fuse: invalidate inode mapping if mtime changesBrian Foster2012-07-181-3/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently invalidate the inode address space mapping if the file size changes unexpectedly. In the case of a fuse network filesystem, a portion of a file could be overwritten remotely without changing the file size. Compare the old mtime as well to detect this condition and invalidate the mapping if the file has been updated. The original logic (to ignore changes in mtime) is preserved unless the client specifies FUSE_AUTO_INVAL_DATA on init. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | fuse: add FUSE_AUTO_INVAL_DATA init flagBrian Foster2012-07-182-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | FUSE_AUTO_INVAL_DATA is provided to enable updated/auto cache invalidation logic. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | | Merge branch 'for-linus-3.6' of ↵Linus Torvalds2012-08-121-5/+0Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs merge fix from Chris Mason: "This fixes a merge error in rc1. The calls to mnt_want_write should have been removed." * 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove mnt_want_write call in btrfs_mksubvol
| * | | Btrfs: remove mnt_want_write call in btrfs_mksubvolAlexander Block2012-08-091-5/+0Star
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We got a recursive lock in mksubvol because the caller already held a lock. I think we got into this due to a merge error. Commit a874a63 removed the mnt_want_write call from btrfs_mksubvol and added a replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid. Commit e7848683 however tried to move all calls to mnt_want_write above i_mutex. So somewhere while merging this, it got mixed up. The solution is to remove the mnt_want_write call completely from mksubvol. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* | | missed mnt_drop_write() in do_dentry_open()Al Viro2012-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | This one ought to be __mnt_drop_write(), to match __mnt_want_write() in the beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | UBIFS: nuke pdflush from commentsArtem Bityutskiy2012-08-042-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | The pdflush thread is long gone, so this patch removes references to pdflush from UBIFS comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | gfs2: nuke pdflush from commentsArtem Bityutskiy2012-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The pdflush thread is long gone, so this patch removes references to pdflush from gfs comments. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | nilfs2: nuke write_super from commentsArtem Bityutskiy2012-08-042-6/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ntfs. Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | hfs: nuke write_super from commentsArtem Bityutskiy2012-08-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from hfs. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: nuke pdflush from commentsArtem Bityutskiy2012-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The pdflush thread is long gone, so this patch removes references to pdflush from vfs comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | jbd/jbd2: nuke write_super from commentsArtem Bityutskiy2012-08-042-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from various jbd and jbd2. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | btrfs: nuke pdflush from commentsArtem Bityutskiy2012-08-042-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pdflush thread is long gone, so this patch removes references to pdflush from btrfs comments. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | btrfs: nuke write_super from commentsArtem Bityutskiy2012-08-042-8/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from btrfs. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | ext4: nuke pdflush from commentsArtem Bityutskiy2012-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pdflush thread is long gone, so this patch removes references to pdflush from ext4 comments. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | ext4: nuke write_super from commentsArtem Bityutskiy2012-08-042-19/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | ext3: nuke write_super from commentsArtem Bityutskiy2012-08-042-19/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | vfs: kill write_super and sync_supersArtem Bityutskiy2012-08-031-40/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Finally we can kill the 'sync_supers' kernel thread along with the '->write_super()' superblock operation because all the users are gone. Now every file-system is supposed to self-manage own superblock and its dirty state. The nice thing about killing this thread is that it improves power management. Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up every 5 seconds no matter what - even if there were no dirty superblocks and even if there were no file-systems using this service (e.g., btrfs and journalled ext4 do not need it). So it was wasting power most of the time. And because the thread was in the core of the kernel, all systems had to have it. So I am quite happy to make it go away. Interestingly, this thread is a left-over from the pdflush kernel thread which was a self-forking kernel thread responsible for all the write-back in old Linux kernels. It was turned into per-block device BDI threads, and 'sync_supers' was a left-over. Thus, R.I.P, pdflush as well. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds2012-08-033-27/+25Star
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull exofs update from Boaz Harrosh: "They are all mostly fixes, except the most important patch by Artem Bityutskiy which removes the use of s_dirt. After this patch s_dirt can be completely removed from the tree." * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Fix out-of-bounds access in _ios_obj() exofs: Use proper max_IO calculations from ore exofs: Fix __r4w_get_page when offset is beyond i_size exofs: stop using s_dirt exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page)
| * | ore: Fix out-of-bounds access in _ios_obj()Boaz Harrosh2012-08-021-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _ios_obj() is accessed by group_index not device_table index. The oc->comps array is only a group_full of devices at a time it is not like ore_comp_dev() which is indexed by a global device_table index. This did not BUG until now because exofs only uses a single COMP for all devices. But with other FSs like PanFS this is not true. This bug was only in the write_path, all other users were using it correctly [This is a bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Use proper max_IO calculations from oreBoaz Harrosh2012-08-021-6/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | exofs_max_io_pages should just use the ORE's calculated layout->max_io_length, And avoid unnecessary BUGs, calculations made here were also a layering violation. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Fix __r4w_get_page when offset is beyond i_sizeBoaz Harrosh2012-08-021-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global ZERO_PAGE, if offset is beyond i_size. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: stop using s_dirtArtem Bityutskiy2012-08-021-11/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exofs has the '->write_super()' handler and makes some use of the '->s_dirt' superblock flag, but it really needs neither of them because it never sets 's_dirt' to one which means the VFS never calls its '->write_super()' handler. Thus, remove both. Note, I am trying to remove both 's_dirt' and 'write_super()' from VFS altogether once all users are gone. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page)Kautuk Consul2012-08-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | readpage_strip can be called from several code paths all of which require that the page be locked before any operations are carried out. Since we export the exofs_readpage callback to the VFS, add a BUG_ON to check for PageLocked(page) to make sure that this understanding is never compromised. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2012-08-023-66/+40Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull two ceph fixes from Sage Weil: "The first patch fixes up the old crufty open intent code to use the atomic_open stuff properly, and the second fixes a possible null deref and memory leak with the crypto keys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix crypto key null deref, memory leak ceph: simplify+fix atomic_open
| * | | ceph: simplify+fix atomic_openSage Weil2012-08-023-66/+40Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial ->atomic_open op was carried over from the old intent code, which was incomplete and didn't really work. Replace it with a fresh method. In particular: * always attempt to do an atomic open+lookup, both for the create case and for lookups of existing files. * fix symlink handling by returning 1 to the VFS so that we can follow the link to its destination. This fixes a longstanding ceph bug (#2392). Signed-off-by: Sage Weil <sage@inktank.com>
* | | | Merge tag 'ecryptfs-3.6-rc1-fixes' of ↵Linus Torvalds2012-08-027-316/+158Star
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: - Fixes a bug when the lower filesystem mount options include 'acl', but the eCryptfs mount options do not - Cleanups in the messaging code - Better handling of empty files in the lower filesystem to improve usability. Failed file creations are now cleaned up and empty lower files are converted into eCryptfs during open(). - The write-through cache changes are being reverted due to bugs that are not easy to fix. Stability outweighs the performance enhancements here. - Improvement to the mount code to catch unsupported ciphers specified in the mount options * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: check for eCryptfs cipher support at mount eCryptfs: Revert to a writethrough cache model eCryptfs: Initialize empty lower files when opening them eCryptfs: Unlink lower inode when ecryptfs_create() fails eCryptfs: Make all miscdev functions use daemon ptr in file private_data eCryptfs: Remove unused messaging declarations and function eCryptfs: Copy up POSIX ACL and read-only flags from lower mount
| * | | | eCryptfs: check for eCryptfs cipher support at mountTim Sally2012-07-141-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue occurs when eCryptfs is mounted with a cipher supported by the crypto subsystem but not by eCryptfs. The mount succeeds and an error does not occur until a write. This change checks for eCryptfs cipher support at mount time. Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009. https://bugs.launchpad.net/ecryptfs/+bug/338914 Signed-off-by: Tim Sally <tsally@atomicpeace.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * | | | eCryptfs: Revert to a writethrough cache modelTyler Hicks2012-07-143-63/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A change was made about a year ago to get eCryptfs to better utilize its page cache during writes. The idea was to do the page encryption operations during page writeback, rather than doing them when initially writing into the page cache, to reduce the number of page encryption operations during sequential writes. This meant that the encrypted page would only be written to the lower filesystem during page writeback, which was a change from how eCryptfs had previously wrote to the lower filesystem in ecryptfs_write_end(). The change caused a few eCryptfs-internal bugs that were shook out. Unfortunately, more grave side effects have been identified that will force changes outside of eCryptfs. Because the lower filesystem isn't consulted until page writeback, eCryptfs has no way to pass lower write errors (ENOSPC, mainly) back to userspace. Additionaly, it was reported that quotas could be bypassed because of the way eCryptfs may sometimes open the lower filesystem using a privileged kthread. It would be nice to resolve the latest issues, but it is best if the eCryptfs commits be reverted to the old behavior in the meantime. This reverts: 32001d6f "eCryptfs: Flush file in vma close" 5be79de2 "eCryptfs: Flush dirty pages in setattr" 57db4e8d "ecryptfs: modify write path to encrypt page in writepage" Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Colin King <colin.king@canonical.com> Cc: Colin King <colin.king@canonical.com> Cc: Thieu Le <thieule@google.com>
| * | | | eCryptfs: Initialize empty lower files when opening themTyler Hicks2012-07-083-28/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, eCryptfs has only initialized lower files in the ecryptfs_create() path. Lower file initialization is the act of writing the cryptographic metadata from the inode's crypt_stat to the header of the file. The ecryptfs_open() path already expects that metadata to be in the header of the file. A number of users have reported empty lower files in beneath their eCryptfs mounts. Most of the causes for those empty files being left around have been addressed, but the presence of empty files causes problems due to the lack of proper cryptographic metadata. To transparently solve this problem, this patch initializes empty lower files in the ecryptfs_open() error path. If the metadata is unreadable due to the lower inode size being 0, plaintext passthrough support is not in use, and the metadata is stored in the header of the file (as opposed to the user.ecryptfs extended attribute), the lower file will be initialized. The number of nested conditionals in ecryptfs_open() was getting out of hand, so a helper function was created. To avoid the same nested conditional problem, the conditional logic was reversed inside of the helper function. https://launchpad.net/bugs/911507 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Colin Ian King <colin.king@canonical.com>
| * | | | eCryptfs: Unlink lower inode when ecryptfs_create() failsTyler Hicks2012-07-081-23/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ecryptfs_create() creates a lower inode, allocates an eCryptfs inode, initializes the eCryptfs inode and cryptographic metadata attached to the inode, and then writes the metadata to the header of the file. If an error was to occur after the lower inode was created, an empty lower file would be left in the lower filesystem. This is a problem because ecryptfs_open() refuses to open any lower files which do not have the appropriate metadata in the file header. This patch properly unlinks the lower inode when an error occurs in the later stages of ecryptfs_create(), reducing the chance that an empty lower file will be left in the lower filesystem. https://launchpad.net/bugs/872905 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Colin Ian King <colin.king@canonical.com>
| * | | | eCryptfs: Make all miscdev functions use daemon ptr in file private_dataTyler Hicks2012-07-083-172/+47Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that a pointer to a valid struct ecryptfs_daemon is stored in the private_data of an opened /dev/ecryptfs file, the remaining miscdev functions can utilize the pointer rather than looking up the ecryptfs_daemon at the beginning of each operation. The security model of /dev/ecryptfs is simplified a little bit with this patch. Upon opening /dev/ecryptfs, a per-user ecryptfs_daemon is registered. Another daemon cannot be registered for that user until the last file reference is released. During the lifetime of the ecryptfs_daemon, access checks are not performed on the /dev/ecryptfs operations because it is assumed that the application securely handles the opened file descriptor and does not unintentionally leak it to processes that are not trusted. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Sasha Levin <levinsasha928@gmail.com>