summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* hfsplus: honor setgid flag on directoriesErnesto A. Fernandez2018-02-074-8/+8
| | | | | | | | | | | | | When creating a file inside a directory that has the setgid flag set, give the new file the group ID of the parent, and also the setgid flag if it is a directory itself. Link: http://lkml.kernel.org/r/20171204192705.GA6101@debian.home Signed-off-by: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: use time64_t internallyArnd Bergmann2018-02-079-24/+23Star
| | | | | | | | | | | | | | | | | | | | | | | | | | The superblock and segment timestamps are used only internally in nilfs2 and can be read out using sysfs. Since we are using the old 'get_seconds()' interface and store the data as timestamps, the behavior differs slightly between 64-bit and 32-bit kernels, the latter will show incorrect timestamps after 2038 in sysfs, and presumably fail completely in 2106 as comparisons go wrong. This changes nilfs2 to use time64_t with ktime_get_real_seconds() to handle timestamps, making the behavior consistent and correct on both 32-bit and 64-bit machines. The on-disk format already uses 64-bit timestamps, so nothing changes there. Link: http://lkml.kernel.org/r/20180122211050.1286441-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* elf: fix NT_FILE integer overflowAlexey Dobriyan2018-02-071-0/+2
| | | | | | | | | | | | | | | If vm.max_map_count bumped above 2^26 (67+ mil) and system has enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte VMAs), then it should be possible to overflow 32-bit "size", pass paranoia check, allocate very little vmalloc space and oops while writing into vmalloc guard page... But I didn't test this, only coredump of regular process. Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/consoles.c: use seq_putc() in show_console_dev()Markus Elfring2018-02-071-2/+1Star
| | | | | | | | | | | | A single character (line break) should be put into a sequence. Thus use the corresponding function "seq_putc". This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/04fb69fe-d820-9141-820f-07e9a48f4635@users.sourceforge.net Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: rearrange argsAlexey Dobriyan2018-02-073-14/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | Rearrange args for smaller code. lookup revolves around memcmp() which gets len 3rd arg, so propagate length as 3rd arg. readdir and lookup add additional arg to VFS ->readdir and ->lookup, so better add it to the end. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-18 (-18) Function old new delta proc_readdir 22 13 -9 proc_lookup 18 9 -9 proc_match() is smaller if not inlined, I promise! Link: http://lkml.kernel.org/r/20180104175958.GB5204@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: spread likely/unlikely a bitAlexey Dobriyan2018-02-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | use_pde() is used at every open/read/write/... of every random /proc file. Negative refcount happens only if PDE is being deleted by module (read: never). So it gets "likely". unuse_pde() gets "unlikely" for the same reason. close_pdeo() gets unlikely as the completion is filled only if there is a race between PDE removal and close() (read: never ever). It even saves code on x86_64 defconfig: add/remove: 0/0 grow/shrink: 1/2 up/down: 2/-20 (-18) Function old new delta close_pdeo 183 185 +2 proc_reg_get_unmapped_area 119 111 -8 proc_reg_poll 85 73 -12 Link: http://lkml.kernel.org/r/20180104175657.GA5204@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc: use __ro_after_initAlexey Dobriyan2018-02-074-5/+9
| | | | | | | | | | /proc/self inode numbers, value of proc_inode_cache and st_nlink of /proc/$TGID are fixed constants. Link: http://lkml.kernel.org/r/20180103184707.GA31849@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/internal.h: fix up commentAlexey Dobriyan2018-02-071-1/+2
| | | | | | | | | Document what ->pde_unload_lock actually does. Link: http://lkml.kernel.org/r/20180103185120.GB31849@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/internal.h: rearrange struct proc_dir_entryAlexey Dobriyan2018-02-071-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct proc_dir_entry became bit messy over years: * move 16-bit ->mode_t before namelen to get rid of padding * make ->in_use first field: it seems to be most used resulting in smaller code on x86_64 (defconfig): add/remove: 0/0 grow/shrink: 7/13 up/down: 24/-67 (-43) Function old new delta proc_readdir_de 451 455 +4 proc_get_inode 282 286 +4 pde_put 65 69 +4 remove_proc_subtree 294 297 +3 remove_proc_entry 297 300 +3 proc_register 295 298 +3 proc_notify_change 94 97 +3 unuse_pde 27 26 -1 proc_reg_write 89 85 -4 proc_reg_unlocked_ioctl 85 81 -4 proc_reg_read 89 85 -4 proc_reg_llseek 87 83 -4 proc_reg_get_unmapped_area 123 119 -4 proc_entry_rundown 139 135 -4 proc_reg_poll 91 85 -6 proc_reg_mmap 79 73 -6 proc_get_link 55 49 -6 proc_reg_release 108 101 -7 proc_reg_open 298 291 -7 close_pdeo 228 218 -10 * move writeable fields together to a first cacheline (on x86_64), those include * ->in_use: reference count, taken every open/read/write/close etc * ->count: reference count, taken at readdir on every entry * ->pde_openers: tracks (nearly) every open, dirtied * ->pde_unload_lock: spinlock protecting ->pde_openers * ->proc_iops, ->proc_fops, ->data: writeonce fields, used right together with previous group. * other rarely written fields go into 1st/2nd and 2nd/3rd cacheline on 32-bit and 64-bit respectively. Additionally on 32-bit, ->subdir, ->subdir_node, ->namelen, ->name go fully into 2nd cacheline, separated from writeable fields. They are all used during lookup. Link: http://lkml.kernel.org/r/20171220215914.GA7877@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/kcore.c: use probe_kernel_read() instead of memcpy()Heiko Carstens2018-02-071-13/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data") added a bounce buffer to avoid hardened usercopy checks. Copying to the bounce buffer was implemented with a simple memcpy() assuming that it is always valid to read from kernel memory iff the kern_addr_valid() check passed. A simple, but pointless, test case like "dd if=/proc/kcore of=/dev/null" now can easily crash the kernel, since the former execption handling on invalid kernel addresses now doesn't work anymore. Also adding a kern_addr_valid() implementation wouldn't help here. Most architectures simply return 1 here, while a couple implemented a page table walk to figure out if something is mapped at the address in question. With DEBUG_PAGEALLOC active mappings are established and removed all the time, so that relying on the result of kern_addr_valid() before executing the memcpy() also doesn't work. Therefore simply use probe_kernel_read() to copy to the bounce buffer. This also allows to simplify read_kcore(). At least on s390 this fixes the observed crashes and doesn't introduce warnings that were removed with df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data"), even though the generic probe_kernel_read() implementation uses uaccess functions. While looking into this I'm also wondering if kern_addr_valid() could be completely removed...(?) Link: http://lkml.kernel.org/r/20171202132739.99971-1-heiko.carstens@de.ibm.com Fixes: df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data") Fixes: f5509cc18daa ("mm: Hardened usercopy") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/array.c: delete children_seq_release()Alexey Dobriyan2018-02-071-7/+1Star
| | | | | | | | | | It is 1:1 wrapper around seq_release(). Link: http://lkml.kernel.org/r/20171122171510.GA12161@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: less memory for /proc/*/map_files readdirAlexey Dobriyan2018-02-071-6/+9
| | | | | | | | | | | dentry name can be evaluated later, right before calling into VFS. Also, spend less time under ->mmap_sem. Link: http://lkml.kernel.org/r/20171110163034.GA2534@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/proc/vmcore.c: simpler /proc/vmcore cleanupAlexey Dobriyan2018-02-071-4/+2Star
| | | | | | | | | | | Iterators aren't necessary as you can just grab the first entry and delete it until no entries left. Link: http://lkml.kernel.org/r/20171121191121.GA20757@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: fix /proc/*/map_files lookupAlexey Dobriyan2018-02-071-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code does: if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2) However sscanf() is broken garbage. It silently accepts whitespace between format specifiers (did you know that?). It silently accepts valid strings which result in integer overflow. Do not use sscanf() for any even remotely reliable parsing code. OK # readlink '/proc/1/map_files/55a23af39000-55a23b05b000' /lib/systemd/systemd broken # readlink '/proc/1/map_files/ 55a23af39000-55a23b05b000' /lib/systemd/systemd broken # readlink '/proc/1/map_files/55a23af39000-55a23b05b000 ' /lib/systemd/systemd very broken # readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000' /lib/systemd/systemd Andrei said: : This patch breaks criu. It was a bug in criu. And this bug is on a minor : path, which works when memfd_create() isn't available. It is a reason why : I ask to not backport this patch to stable kernels. : : In CRIU this bug can be triggered, only if this patch will be backported : to a kernel which version is lower than v3.16. Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: don't use READ_ONCE/WRITE_ONCE for /proc/*/fail-nthAlexey Dobriyan2018-02-071-3/+2Star
| | | | | | | | | | | READ_ONCE and WRITE_ONCE are useless when there is only one read/write is being made. Link: http://lkml.kernel.org/r/20171120204033.GA9446@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: use %u for pid printing and slightly less stackAlexey Dobriyan2018-02-074-15/+14Star
| | | | | | | | | | | | | PROC_NUMBUF is 13 which is enough for "negative int + \n + \0". However PIDs and TGIDs are never negative and newline is not a concern, so use just 10 per integer. Link: http://lkml.kernel.org/r/20171120203005.GA27743@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexander Viro <viro@ftp.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2018-02-053-11/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more xfs updates from Darrick Wong: "As promised, here's a (much smaller) second pull request for the second week of the merge cycle. This time around we have a couple patches shutting off unsupported fs configurations, and a couple of cleanups. Last, we turn off EXPERIMENTAL for the reverse mapping btree, since the primary downstream user of that information (online fsck) is now upstream and I haven't seen any major failures in a few kernel releases. Summary: - Print scrub build status in the xfs build info. - Explicitly call out the remaining two scenarios where we don't support reflink and never have. - Remove EXPERIMENTAL tag from reverse mapping btree!" * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove experimental tag for reverse mapping xfs: don't allow reflink + realtime filesystems xfs: don't allow DAX on reflink filesystems xfs: add scrub to XFS_BUILD_OPTIONS xfs: fix u32 type usage in sb validation function
| * xfs: remove experimental tag for reverse mappingDarrick J. Wong2018-02-021-8/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Reverse mapping has had a while to soak, so remove the experimental tag. Now that we've landed space metadata cross-referencing in scrub, the feature actually has a purpose. Reject rmap filesystems with an rt device until the code to support it is actually implemented. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
| * xfs: don't allow reflink + realtime filesystemsDarrick J. Wong2018-02-021-0/+7
| | | | | | | | | | | | | | | | We don't support realtime filesystems with reflink either, so fail those mounts. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
| * xfs: don't allow DAX on reflink filesystemsDarrick J. Wong2018-02-021-1/+4
| | | | | | | | | | | | | | | | | | Now that reflink is no longer experimental, reject attempts to mount with DAX until that whole mess gets sorted out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * xfs: add scrub to XFS_BUILD_OPTIONSEric Sandeen2018-02-021-0/+7
| | | | | | | | | | | | | | | | Advertise this config option along with the others. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: fix u32 type usage in sb validation functionDarrick J. Wong2018-02-011-2/+2
| | | | | | | | | | | | | | Don't use u32, use uint32_t, because this won't work in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
* | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2018-02-0513-406/+1800
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This work from Amir adds NFS export capability to overlayfs. NFS exporting an overlay filesystem is a challange because we want to keep track of any copy-up of a file or directory between encoding the file handle and decoding it. This is achieved by indexing copied up objects by lower layer file handle. The index is already used for hard links, this patchset extends the use to NFS file handle decoding" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits) ovl: check ERR_PTR() return value from ovl_encode_fh() ovl: fix regression in fsnotify of overlay merge dir ovl: wire up NFS export operations ovl: lookup indexed ancestor of lower dir ovl: lookup connected ancestor of dir in inode cache ovl: hash non-indexed dir by upper inode for NFS export ovl: decode pure lower dir file handles ovl: decode indexed dir file handles ovl: decode lower file handles of unlinked but open files ovl: decode indexed non-dir file handles ovl: decode lower non-dir file handles ovl: encode lower file handles ovl: copy up before encoding non-connectable dir file handle ovl: encode non-indexed upper file handles ovl: decode connected upper dir file handles ovl: decode pure upper file handles ovl: encode pure upper file handles ovl: document NFS export vfs: factor out helpers d_instantiate_anon() and d_alloc_anon() ovl: store 'has_upper' and 'opaque' as bit flags ...
| * | ovl: check ERR_PTR() return value from ovl_encode_fh()Amir Goldstein2018-02-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Another fix for an issue reported by 0-day robot. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: fix regression in fsnotify of overlay merge dirAmir Goldstein2018-02-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A re-factoring patch in NFS export series has passed the wrong argument to ovl_get_inode() causing a regression in the very recent fix to fsnotify of overlay merge dir. The regression has caused merge directory inodes to be hashed by upper instead of lower real inode, when NFS export and directory indexing is disabled. That caused an inotify watch to become obsolete after directory copy up and drop caches. LTP test inotify07 was improved to catch this regression. The regression also caused multiple redirect dirs to same origin not to be detected on lookup with NFS export disabled. An xfstest was added to cover this case. Fixes: 0aceb53e73be ("ovl: do not pass overlay dentry to ovl_get_inode()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: wire up NFS export operationsAmir Goldstein2018-01-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Now that NFS export operations are implemented, enable overlayfs NFS export support if the "nfs_export" feature is enabled. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: lookup indexed ancestor of lower dirAmir Goldstein2018-01-243-7/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ovl_lookup_real() in lower layer walks back lower parents to find the topmost indexed parent. If an indexed ancestor is found before reaching lower layer root, ovl_lookup_real() is called recursively with upper layer to walk back from indexed upper to the topmost connected/hashed upper parent (or up to root). ovl_lookup_real() in upper layer then walks forward to connect the topmost upper overlay dir dentry and ovl_lookup_real() in lower layer continues to walk forward to connect the decoded lower overlay dir dentry. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: lookup connected ancestor of dir in inode cacheAmir Goldstein2018-01-243-13/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding a dir file handle requires walking backward up to layer root and for lower dir also checking the index to see if any of the parents have been copied up. Lookup overlay ancestor dentry in inode/dentry cache by decoded real parents to shortcut looking up all the way back to layer root. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: hash non-indexed dir by upper inode for NFS exportAmir Goldstein2018-01-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-indexed upper dirs are encoded as upper file handles. When NFS export is enabled, hash non-indexed directory inodes by upper inode, so we can find them in inode cache using the decoded upper inode. When NFS export is disabled, directories are not indexed on copy up, so hash non-indexed directory inodes by origin inode, the same hash key that is used before copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode pure lower dir file handlesAmir Goldstein2018-01-241-17/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to decoding a pure upper dir file handle, decoding a pure lower dir file handle is implemented by looking an overlay dentry of the same path as the pure lower path and verifying that the overlay dentry's real lower matches the decoded real lower file handle. Unlike the case of upper dir file handle, the lookup of overlay path by lower real path can fail or find a mismatched overlay dentry if any of the lower parents have been copied up and renamed. To address this case we will need to check if any of the lower parents are indexed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode indexed dir file handlesAmir Goldstein2018-01-243-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding an indexed dir file handle is done by looking up the file handle in index dir by name and then decoding the upper dir from the index origin file handle. The decoded upper path is used to lookup an overlay dentry of the same path. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode lower file handles of unlinked but open filesAmir Goldstein2018-01-243-2/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | Lookup overlay inode in cache by origin inode, so we can decode a file handle of an open file even if the index has a whiteout index entry to mark this overlay inode was unlinked. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode indexed non-dir file handlesAmir Goldstein2018-01-241-25/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding an indexed non-dir file handle is similar to decoding a lower non-dir file handle, but additionally, we lookup the file handle in index dir by name to find the real upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode lower non-dir file handlesAmir Goldstein2018-01-243-15/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding a lower non-dir file handle is done by decoding the lower dentry from underlying lower fs, finding or allocating an overlay inode that is hashed by the real lower inode and instantiating an overlay dentry with that inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: encode lower file handlesAmir Goldstein2018-01-241-8/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | For indexed or lower non-dir, encode a non-connectable lower file handle from origin inode. For indexed or lower dir, when ofs->numlower == 1, encode a lower file handle from lower dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: copy up before encoding non-connectable dir file handleAmir Goldstein2018-01-241-4/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding a merge dir, whose origin's parent is under a redirected lower dir is not always possible. As a simple aproximation, we do not encode lower dir file handles when overlay has multiple lower layers and origin is below the topmost lower layer. We should later relax this condition and copy up only the parent that is under a redirected lower. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: encode non-indexed upper file handlesAmir Goldstein2018-01-241-5/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to encode origin if there is a chance that the same object was encoded pre copy up and then we need to stay consistent with the same encoding also after copy up. In case a non-pure upper is not indexed, then it was copied up before NFS export support was enabled. In that case, we don't need to worry about staying consistent with pre copy up encoding and we encode an upper file handle. This mitigates the problem that with no index, we cannot find an upper inode from origin inode, so we cannot decode a non-indexed upper from origin file handle. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode connected upper dir file handlesAmir Goldstein2018-01-241-1/+230
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until this change, we decoded upper file handles by instantiating an overlay dentry from the real upper dentry. This is sufficient to handle pure upper files, but insufficient to handle merge/impure dirs. To that end, if decoded real upper dir is connected and hashed, we lookup an overlay dentry with the same path as the real upper dir. If decoded real upper is non-dir, we instantiate a disconnected overlay dentry as before this change. Because ovl_fh_to_dentry() returns a connected overlay dir dentry, exportfs never needs to call get_parent() and get_name() to reconnect an upper overlay dir. Because connectable non-dir file handles are not supported, exportfs will not be able to use fh_to_parent() and get_name() methods to reconnect a disconnected non-dir to its parent. Therefore, the methods get_parent() and get_name() are implemented just to print out a sanity warning and the method fh_to_parent() is implemented to warn the user that using the 'subtree_check' exportfs option is not supported. An alternative approach could have been to implement instantiating of an overlay directory inode from origin/index and implement get_parent() and get_name() by calling into underlying fs operations and them instantiating the overlay parent dir. The reasons for not choosing the get_parent() approach were: - Obtaining a disconnected overlay dir dentry would requires a delicate re-factoring of ovl_lookup() to get a dentry with overlay parent info. It was preferred to avoid doing that re-factoring unless it was proven worthy. - Going down the path of disconnected dir would mean that the (non trivial) code path of d_splice_alias() could be traveled and that meant writing more tests and introduces race cases that are very hard to hit on purpose. Taking the path of connecting overlay dentry by forward lookup is therefore the safe and boring way to avoid surprises. The culprits of the chosen "connected overlay dentry" approach: - We need to take special care to rename of ancestors while connecting the overlay dentry by real dentry path. These subtleties are usually handled by generic exportfs and VFS code. - In a hypothetical workload, we could end up in a loop trying to connect, interrupted by rename and restarting connect forever. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: decode pure upper file handlesAmir Goldstein2018-01-243-2/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decoding an upper file handle is done by decoding the upper dentry from underlying upper fs, finding or allocating an overlay inode that is hashed by the real upper inode and instantiating an overlay dentry with that inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: encode pure upper file handlesAmir Goldstein2018-01-243-1/+106
| | | | | | | | | | | | | | | | | | | | | | | | Encode overlay file handles as struct ovl_fh containing the file handle encoding of the real upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()Miklos Szeredi2018-01-241-31/+56
| | | | | | | | | | | | | | | | | | | | | | | | Those helpers are going to be used by overlayfs to implement NFS export decode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: store 'has_upper' and 'opaque' as bit flagsAmir Goldstein2018-01-245-20/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make some room in struct ovl_entry to store information about redirected ancestors for NFS export, so cram two booleans as bit flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: copy up of disconnected dentriesAmir Goldstein2018-01-243-19/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With NFS export, some operations on decoded file handles (e.g. open, link, setattr, xattr_set) may call copy up with a disconnected non-dir. In this case, we will copy up lower inode to index dir without linking it to upper dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: use d_splice_alias() in place of d_add() in lookupAmir Goldstein2018-01-241-3/+1Star
| | | | | | | | | | | | | | | | | | | | | This is required for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: do not pass overlay dentry to ovl_get_inode()Amir Goldstein2018-01-243-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | This is needed for using ovl_get_inode() for decoding file handles for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: factor out ovl_get_index_fh() helperAmir Goldstein2018-01-242-10/+50
| | | | | | | | | | | | | | | | | | | | | The helper is needed to lookup an index by file handle for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: whiteout orphan index entries on mountAmir Goldstein2018-01-242-4/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Orphan index entries are non-dir index entries whose union nlink count dropped to zero. With index=on, orphan index entries are removed on mount. With NFS export feature enabled, orphan index entries are replaced with white out index entries to block future open by handle from opening the lower file. When dir index has a stale 'upper' xattr, we assume that the upper dir was removed and we treat the dir index as orphan entry that needs to be whited out or removed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: whiteout index when union nlink drops to zeroAmir Goldstein2018-01-243-29/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With NFS export feature enabled, when overlay inode nlink drops to zero, instead of removing the index entry, replace it with a whiteout index entry. This is needed for NFS export in order to prevent future open by handle from opening the lower file directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: cleanup dir index when dir nlink drops to zeroAmir Goldstein2018-01-241-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | When non-dir index union nlink drops to zero the non-dir index is cleaned. Do the same for directory type index entries when union directory is removed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | ovl: index directories on copy up for NFS exportAmir Goldstein2018-01-242-7/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the NFS export feature enabled, all dirs are indexed on copy up. Non-dir files are copied up directly to indexdir and then hardlinked to upper dir. Directories are copied up to indexdir, then an index entry is created in indexdir with 'upper' xattr pointing to the copied up dir and then the copied up dir is moved to upper dir. Directory index is also used for consistency verification, like detecting multiple redirected dirs to the same lower dir on lookup. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>