bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	virtiofsd: Convert some functions to return bool	Greg Kurz	2021-03-15	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	Both currently only return 0 or 1. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-3-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Don't allow empty paths in lookup_name()	Greg Kurz	2021-03-15	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When passed an empty filename, lookup_name() returns the inode of the parent directory, unless the parent is the root in which case the st_dev doesn't match and lo_find() returns NULL. This is because lookup_name() passes AT_EMPTY_PATH down to fstatat() or statx(). This behavior doesn't quite make sense because users of lookup_name() then pass the name to unlinkat(), renameat() or renameat2(), all of which will always fail on empty names. Drop AT_EMPTY_PATH from the flags in lookup_name() so that it has the consistent behavior of "returning an existing child inode or NULL" for all directories. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-2-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Don't allow empty filenames	Greg Kurz	2021-03-15	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POSIX.1-2017 clearly stipulates that empty filenames aren't allowed ([1] and [2]). Since virtiofsd is supposed to mirror the host file system hierarchy and the host can be assumed to be linux, we don't really expect clients to pass requests with an empty path in it. If they do so anyway, this would eventually cause an error when trying to create/lookup the actual inode on the underlying POSIX filesystem. But this could still confuse some code that wouldn't be ready to cope with this. Filter out empty names coming from the client at the top level, so that the rest doesn't have to care about it. This is done everywhere we already call is_safe_path_component(), but in a separate helper since the usual error for empty path names is ENOENT instead of EINVAL. [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_170 [2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-4-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Add qemu version and copyright info	Vivek Goyal	2021-03-15	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Option "-V" currently displays the fuse protocol version virtiofsd is using. For example, I see this. $ ./virtiofsd -V "using FUSE kernel interface version 7.33" People also want to know software version of virtiofsd so that they can figure out if a certain fix is part of currently running virtiofsd or not. Eric Ernst ran into this issue. David Gilbert thinks that it probably is best that we simply carry the qemu version and display that information given we are part of qemu tree. So this patch enhances version information and also adds qemu version and copyright info. Not sure if copyright information is supposed to be displayed along with version info. Given qemu-storage-daemon and other utilities are doing it, so I continued with same pattern. This is how now output looks like. $ ./virtiofsd -V virtiofsd version 5.2.50 (v5.2.0-2357-gcbcf09872a-dirty) Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers using FUSE kernel interface version 7.33 Reported-by: Eric Ernst <eric.g.ernst@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210303195339.GB3793@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofs: drop remapped security.capability xattr as needed	Dr. David Alan Gilbert	2021-03-04	1	-1/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Linux, the 'security.capability' xattr holds a set of capabilities that can change when an executable is run, giving a limited form of privilege escalation to those programs that the writer of the file deemed worthy. Any write causes the 'security.capability' xattr to be dropped, stopping anyone from gaining privilege by modifying a blessed file. Fuse relies on the daemon to do this dropping, and in turn the daemon relies on the host kernel to drop the xattr for it. However, with the addition of -o xattrmap, the xattr that the guest stores its capabilities in is now not the same as the one that the host kernel automatically clears. Where the mapping changes 'security.capability', explicitly clear the remapped name to preserve the same behaviour. This bug is assigned CVE-2021-20263. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
*	viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2	Vivek Goyal	2021-02-16	1	-7/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2". By default this is enabled as long as client supports it Enabling this option helps with performance in write path. Without this option, currently every write is first preceeded with a getxattr() operation to find out if security.capability is set. (Write is supposed to clear security.capability). With this option enabled, server is signing up for clearing security.capability on every WRITE and also clearing suid/sgid subject to certain rules. This gets rid of extra getxattr() call for every WRITE and improves performance. This is true when virtiofsd is run with option -o xattr. What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server implementation. It needs to adhere to following rules. Thanks to Miklos for this summary. - clear "security.capability" on write, truncate and chown unconditionally - clear suid/sgid in case of following. Note, sgid is cleared only if group executable bit is set. o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set. o setattr has FATTR_UID or FATTR_GID o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set. o write has FUSE_WRITE_KILL_SUIDGID >From Linux VFS client perspective, here are the requirements. - caps are always cleared on chown/write/truncate - suid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID. - sgid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID as well as file has group execute permission. virtiofsd implementation has not changed much to adhere to above ruls. And reason being that current assumption is that we are running on Linux and on top of filesystems like ext4/xfs which already follow above rules. On write, truncate, chown, seucurity.capability is cleared. And virtiofsd drops CAP_FSETID if need be and that will lead to clearing of suid/sgid. But if virtiofsd is running on top a filesystem which breaks above assumptions, then it will have to take extra actions to emulate above. That's a TODO for later when need arises. Note: create normally is supposed to be called only when file does not exist. So generally there should not be any question of clearing setuid/setgid. But it is possible that after client checks that file is not present, some other client creates file on server and this race can trigger sending FUSE_CREATE. In that case, if O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID is also set. v3: - Resolved conflicts due to lo_inode_open() changes. - Moved capability code in lo_do_open() so that both lo_open() and lo_create() can benefit from common code. - Dropped changes to kernel headers as these are part of qemu already. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210208224024.43555-3-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Save error code early at the failure callsite	Vivek Goyal	2021-02-16	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change error code handling slightly in lo_setattr(). Right now we seem to jump to out_err and assume that "errno" is valid and use that to send reply. But if caller has to do some other operations before jumping to out_err, then it does the dance of first saving errno to saverr and the restore errno before jumping to out_err. This makes it more confusing. I am about to make more changes where caller will have to do some work after error before jumping to out_err. I found it easier to change the convention a bit. That is caller saves error in "saverr" before jumping to out_err. And out_err uses "saverr" to send error back and does not rely on "errno" having actual error. v3: Resolved conflicts in lo_setattr() due to lo_inode_open() changes. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210208224024.43555-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: Replace the word 'whitelist'	Philippe Mathieu-Daudé	2021-02-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Follow the inclusive terminology from the "Conscious Language in your Open Source Projects" guidelines [] and replace the words "whitelist" appropriately. [] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210205171817.2108907-3-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: prevent opening of special files (CVE-2020-35517)	Stefan Hajnoczi	2021-02-04	1	-52/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A well-behaved FUSE client does not attempt to open special files with FUSE_OPEN because they are handled on the client side (e.g. device nodes are handled by client-side device drivers). The check to prevent virtiofsd from opening special files is missing in a few cases, most notably FUSE_OPEN. A malicious client can cause virtiofsd to open a device node, potentially allowing the guest to escape. This can be exploited by a modified guest device driver. It is not exploitable from guest userspace since the guest kernel will handle special files inside the guest instead of sending FUSE requests. This patch fixes this issue by introducing the lo_inode_open() function to check the file type before opening it. This is a short-term solution because it does not prevent a compromised virtiofsd process from opening device nodes on the host. Restructure lo_create() to try O_CREAT \| O_EXCL first. Note that O_CREAT \| O_EXCL does not follow symlinks, so O_NOFOLLOW masking is not necessary here. If the file exists and the user did not specify O_EXCL, open it via lo_do_open(). Reported-by: Alex Xu <alex@alxu.ca> Fixes: CVE-2020-35517 Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210204150208.367837-4-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: optionally return inode pointer from lo_do_lookup()	Stefan Hajnoczi	2021-02-04	1	-8/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lo_do_lookup() finds an existing inode or allocates a new one. It increments nlookup so that the inode stays alive until the client releases it. Existing callers don't need the struct lo_inode so the function doesn't return it. Extend the function to optionally return the inode. The next commit will need it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210204150208.367837-3-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: extract lo_do_open() from lo_open()	Stefan Hajnoczi	2021-02-04	1	-27/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both lo_open() and lo_create() have similar code to open a file. Extract a common lo_do_open() function from lo_open() that will be used by lo_create() in a later commit. Since lo_do_open() does not otherwise need fuse_req_t req, convert lo_add_fd_mapping() to use struct lo_data *lo instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210204150208.367837-2-stefanha@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: update FUSE_FORGET comment on "lo_inode.nlookup"	Laszlo Ersek	2020-12-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Miklos confirms it's only the FUSE_FORGET request that the client can use for decrementing "lo_inode.nlookup". Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 1222f015558fc34cea02aa3a5a92de608c82cec8 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Message-Id: <20201208073936.8629-1-lersek@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Check file type in lo_flush()	Vivek Goyal	2020-12-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently lo_flush() is written in such a way that it expects to receive a FLUSH requests on a regular file (and not directories). For example, we call lo_fi_fd() which searches lo->fd_map. If we open directories using opendir(), we keep don't keep track of these in lo->fd_map instead we keep them in lo->dir_map. So we expect lo_flush() to be called on regular files only. Even linux fuse client calls FLUSH only for regular files and not directories. So put a check for filetype and return EBADF if lo_flush() is called on a non-regular file. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201211142544.GB3285@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Disable posix_lock hash table if remote locks are not enabled	Vivek Goyal	2020-12-18	1	-17/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If remote posix locks are not enabled (lo->posix_lock == false), then disable code paths taken to initialize inode->posix_lock hash table and corresponding destruction and search etc. lo_getlk() and lo_setlk() have been modified to return ENOSYS if daemon does not support posix lock but client still sends a lock/unlock request. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201207183021.22752-3-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Set up posix_lock hash table for root inode	Vivek Goyal	2020-12-18	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We setup per inode hash table ->posix_lock to support remote posix locks. But we forgot to initialize this table for root inode. Laszlo managed to trigger an issue where he sent a FUSE_FLUSH request for root inode and lo_flush() found inode with inode->posix_lock NULL and accessing this table crashed virtiofsd. May be we can get rid of initializing this hash table for directory objects completely. But that optimization is for another day. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201207195539.GB3107@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: make the debug log timestamp on stderr more human-readable	Laszlo Ersek	2020-12-18	1	-4/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current timestamp format doesn't help me visually notice small jumps in time ("small" as defined on human scale, such as a few seconds or a few ten seconds). Replace it with a local time format where such differences stand out. Before: > [13316826770337] [ID: 00000004] unique: 62, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 1 > [13316826778175] [ID: 00000004] unique: 62, success, outsize: 16 > [13316826781156] [ID: 00000004] virtio_send_msg: elem 0: with 1 in desc of length 16 > [15138279317927] [ID: 00000001] virtio_loop: Got VU event > [15138279504884] [ID: 00000001] fv_queue_set_started: qidx=1 started=0 > [15138279519034] [ID: 00000003] fv_queue_thread: kill event on queue 1 - quitting > [15138280876463] [ID: 00000001] fv_remove_watch: TODO! fd=9 > [15138280897381] [ID: 00000001] virtio_loop: Waiting for VU event > [15138280946834] [ID: 00000001] virtio_loop: Got VU event > [15138281175421] [ID: 00000001] virtio_loop: Waiting for VU event > [15138281182387] [ID: 00000001] virtio_loop: Got VU event > [15138281189474] [ID: 00000001] virtio_loop: Waiting for VU event > [15138309321936] [ID: 00000001] virtio_loop: Unexpected poll revents 11 > [15138309434150] [ID: 00000001] virtio_loop: Exit (Notice how you don't (easily) notice the gap in time after "virtio_send_msg", and especially the amount of time passed is hard to estimate.) After: > [2020-12-08 06:43:22.58+0100] [ID: 00000004] unique: 51, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 1 > [2020-12-08 06:43:22.58+0100] [ID: 00000004] unique: 51, success, outsize: 16 > [2020-12-08 06:43:22.58+0100] [ID: 00000004] virtio_send_msg: elem 0: with 1 in desc of length 16 > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event > [2020-12-08 06:43:29.34+0100] [ID: 00000001] fv_queue_set_started: qidx=1 started=0 > [2020-12-08 06:43:29.34+0100] [ID: 00000003] fv_queue_thread: kill event on queue 1 - quitting > [2020-12-08 06:43:29.34+0100] [ID: 00000001] fv_remove_watch: TODO! fd=9 > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event > [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event > [2020-12-08 06:43:29.37+0100] [ID: 00000001] virtio_loop: Unexpected poll revents 11 > [2020-12-08 06:43:29.37+0100] [ID: 00000001] virtio_loop: Exit Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Message-Id: <20201208055043.31548-1-lersek@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	Clean up includes	Markus Armbruster	2020-12-10	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes, with the changes to the following files manually reverted: contrib/libvhost-user/libvhost-user-glib.h contrib/libvhost-user/libvhost-user.c contrib/libvhost-user/libvhost-user.h contrib/plugins/hotblocks.c contrib/plugins/hotpages.c contrib/plugins/howvec.c contrib/plugins/lockstep.c linux-user/mips64/cpu_loop.c linux-user/mips64/signal.c linux-user/sparc64/cpu_loop.c linux-user/sparc64/signal.c linux-user/x86_64/cpu_loop.c linux-user/x86_64/signal.c target/s390x/gen-features.c tests/fp/platform.h tests/migration/s390x/a-b-bios.c tests/plugin/bb.c tests/plugin/empty.c tests/plugin/insn.c tests/plugin/mem.c tests/test-rcu-simpleq.c tests/test-rcu-slist.c tests/test-rcu-tailq.c tests/uefi-test-tools/UefiTestToolsPkg/BiosTablesTest/BiosTablesTest.c contrib/plugins/, tests/plugin/, and tests/test-rcu-slist.c appear not to include osdep.h intentionally. The remaining reverts are the same as in commit bbfff19688d. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113061216.2483385-1-armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Alexander Bulekov <alxndr@bu.edu>
*	virtiofsd: check whether strdup lo.source return NULL in main func	Haotian Li	2020-11-12	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	In main func, strdup lo.source may fail. So check whether strdup lo.source return NULL before using it. Signed-off-by: Haotian Li <lihaotian9@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Message-Id: <f1e48ca8-d6de-d901-63c8-4f4024bda518@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: check whether lo_map_reserve returns NULL in, main func	Haotian Li	2020-11-12	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	In main func, func lo_map_reserve is called without NULL check. If reallocing new_elems fails in func lo_map_grow, the func lo_map_reserve may return NULL. We should check whether lo_map_reserve returns NULL before using it. Signed-off-by: Haotian Li <lihaotian9@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Message-Id: <48887813-1c95-048c-6d10-48e3dd2bac71@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Announce submounts even without statx()	Max Reitz	2020-11-12	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Contrary to what the check (and warning) in lo_init() claims, we can announce submounts just fine even without statx() -- the check is based on comparing both the mount ID and st_dev of parent and child. Without statx(), we will not have the mount ID; but we always have st_dev. The only problems we have (without statx() and its mount ID) are: (1) Mounting the same device twice may lead to both trees being treated as exactly the same tree by virtiofsd. But that is a problem that is completely independent of mirroring host submounts in the guest. Both submount roots will still show the FUSE_SUBMOUNT flag, because their st_dev still differs from their respective parent. (2) There is only one exception to (1), and that is if you mount a device inside a mount of itself: Then, its st_dev will be the same as that of its parent, and so without a mount ID, virtiofsd will not be able to recognize the nested mount's root as a submount. However, thanks to virtiofsd then treating both trees as exactly the same tree, it will be caught up in a loop when the guest tries to examine the nested submount, so the guest will always see nothing but an ELOOP there. Therefore, this case is just fully broken without statx(), whether we check for submounts (based on st_dev) or not. All in all, checking for submounts works well even without comparing the mount ID (i.e., without statx()). The only concern is an edge case that, without statx() mount IDs, is utterly broken anyway. Thus, drop said check in lo_init(). Reported-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201103164135.169325-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Announce sub-mount points	Max Reitz	2020-11-02	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Whenever we encounter a directory with an st_dev or mount ID that differs from that of its parent, we set the FUSE_ATTR_SUBMOUNT flag so the guest can create a submount for it. We only need to do so in lo_do_lookup(). The following functions return a fuse_attr object: - lo_create(), though fuse_reply_create(): Calls lo_do_lookup(). - lo_lookup(), though fuse_reply_entry(): Calls lo_do_lookup(). - lo_mknod_symlink(), through fuse_reply_entry(): Calls lo_do_lookup(). - lo_link(), through fuse_reply_entry(): Creating a link cannot create a submount, so there is no need to check for it. - lo_getattr(), through fuse_reply_attr(): Announcing submounts when the node is first detected (at lookup) is sufficient. We do not need to return the submount attribute later. - lo_do_readdir(), through fuse_add_direntry_plus(): Calls lo_do_lookup(). Make announcing submounts optional, so submounts are only announced to the guest with the announce_submounts option. Some users may prefer the current behavior, so that the guest learns nothing about the host mount structure. (announce_submounts is force-disabled when the guest does not present the FUSE_SUBMOUNTS capability, or when there is no statx().) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-6-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Add mount ID to the lo_inode key	Max Reitz	2020-11-02	1	-10/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using st_dev is not sufficient to uniquely identify a mount: You can mount the same device twice, but those are still separate trees, and e.g. by mounting something else inside one of them, they may differ. Using statx(), we can get a mount ID that uniquely identifies a mount. If that is available, add it to the lo_inode key. Most of this patch is taken from Miklos's mail here: https://marc.info/?l=fuse-devel&m=160062521827983 (virtiofsd-use-mount-id.patch attachment) Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-5-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	Revert series: virtiofsd: Announce submounts to the guest	Alex Williamson	2020-10-28	1	-76/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts the following commits due to their basis on a bogus linux kernel header update: c93a656f7b65 ("tests/acceptance: Add virtiofs_submounts.py") 45ced7ca2f27 ("tests/acceptance/boot_linux: Accept SSH pubkey") 08dce386e77e ("virtiofsd: Announce sub-mount points") eba8b096c17c ("virtiofsd: Store every lo_inode's parent_dev") ede24b6be798 ("virtiofsd: Add fuse_reply_attr_with_flags()") e2577435d343 ("virtiofsd: Add attr_flags to fuse_entry_param") 2f10415abfc5 ("virtiofsd: Announce FUSE_ATTR_FLAGS") 97d741cc96dd ("linux/fuse.h: Pull in from Linux") Cc: Max Reitz <mreitz@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 160385090886.20017.13382256442750027666.stgit@gimli.home Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	virtiofsd: Announce sub-mount points	Max Reitz	2020-10-26	1	-8/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Whenever we encounter a directory with an st_dev that differs from that of its parent, we set the FUSE_ATTR_SUBMOUNT flag so the guest can create a submount for it. Make this behavior optional, so submounts are only announced to the guest with the announce_submounts option. Some users may prefer the current behavior, so that the guest learns nothing about the host mount structure. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200909184028.262297-7-mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge
*	virtiofsd: Store every lo_inode's parent_dev	Max Reitz	2020-10-26	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to detect mount points in the shared tree. We report them to the guest by setting the FUSE_ATTR_SUBMOUNT flag in fuse_attr.flags, but because the FUSE client will create a submount for every directory that has this flag set, we must do this only for the actual mount points. We can detect mount points by comparing a directory's st_dev with its parent's st_dev. To be able to do so, we need to store the parent's st_dev in the lo_inode object. Note that mount points need not necessarily be directories; a single file can be a mount point as well. However, for the sake of simplicity let us ignore any non-directory mount points for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200909184028.262297-6-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: xattr name mappings: Simple 'map'	Dr. David Alan Gilbert	2020-10-26	1	-1/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mapping rule system implemented in the last few patches is extremely flexible, but not easy to use. Add a simple 'map' type as a sprinkling of sugar to make it easy. e.g. -o xattrmap=":map::user.virtiofs.:" would be sufficient to prefix all xattr's or -o xattrmap=":map:trusted.:user.virtiofs.:" would just prefix 'trusted.' xattr's and leave everything else alone. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201023165812.36028-6-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: xattr name mappings: Map server xattr names	Dr. David Alan Gilbert	2020-10-26	1	-0/+90
\| \| \| \| \| \| \| \| \| \|	Map xattr names coming from the server, i.e. the host filesystem; currently this is only from listxattr. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201023165812.36028-4-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: xattr name mappings: Map client xattr names	Dr. David Alan Gilbert	2020-10-26	1	-3/+98
\| \| \| \| \| \| \| \| \|	Map xattr names originating at the client; from get/set/remove xattr. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201023165812.36028-3-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: xattr name mappings: Add option	Dr. David Alan Gilbert	2020-10-26	1	-0/+173
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an option to define mappings of xattr names so that the client and server filesystems see different views. This can be used to have different SELinux mappings as seen by the guest, to run the virtiofsd with less privileges (e.g. in a case where it can't set trusted/system/security xattrs but you want the guest to be able to), or to isolate multiple users of the same name; e.g. trusted attributes used by stacking overlayfs. A mapping engine is used with 3 simple rules; the rules can be combined to allow most useful mapping scenarios. The ruleset is defined by -o xattrmap='rules...'. This patch doesn't use the rule maps yet. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201023165812.36028-2-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: add container-friendly -o sandbox=chroot option	Stefan Hajnoczi	2020-10-26	1	-2/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd cannot run in a container because CAP_SYS_ADMIN is required to create namespaces. Introduce a weaker sandbox mode that is sufficient in container environments because the container runtime already sets up namespaces. Use chroot to restrict path traversal to the shared directory. virtiofsd loses the following: 1. Mount namespace. The process chroots to the shared directory but leaves the mounts in place. Seccomp rejects mount(2)/umount(2) syscalls. 2. Pid namespace. This should be fine because virtiofsd is the only process running in the container. 3. Network namespace. This should be fine because seccomp already rejects the connect(2) syscall, but an additional layer of security is lost. Container runtime-specific network security policies can be used drop network traffic (except for the vhost-user UNIX domain socket). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201008085534.16070-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: passthrough_ll: set FUSE_LOG_INFO as default log_level	Misono Tomohiro	2020-10-26	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \|	Just noticed that although help message says default log level is INFO, it is actually 0 (EMRGE) and no mesage will be shown when error occurs. It's better to follow help message. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20201008110148.2757734-1-misono.tomohiro@jp.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: avoid /proc/self/fd tempdir	Stefan Hajnoczi	2020-10-12	1	-23/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to prevent /proc/self/fd escapes a temporary directory is created where /proc/self/fd is bind-mounted. This doesn't work on read-only file systems. Avoid the temporary directory by bind-mounting /proc/self/fd over /proc. This does not affect other processes since we remounted / with MS_REC \| MS_SLAVE. /proc must exist and virtiofsd does not use it so it's safe to do this. Path traversal can be tested with the following function: static void test_proc_fd_escape(struct lo_data *lo) { int fd; int level = 0; ino_t last_ino = 0; fd = lo->proc_self_fd; for (;;) { struct stat st; if (fstat(fd, &st) != 0) { perror("fstat"); return; } if (last_ino && st.st_ino == last_ino) { fprintf(stderr, "inode number unchanged, stopping\n"); return; } last_ino = st.st_ino; fprintf(stderr, "Level %d dev %lu ino %lu\n", level, (unsigned long)st.st_dev, (unsigned long)last_ino); fd = openat(fd, "..", O_PATH \| O_DIRECTORY \| O_NOFOLLOW); level++; } } Before and after this patch only Level 0 is displayed. Without /proc/self/fd bind-mount protection it is possible to traverse parent directories. Fixes: 397ae982f4df4 ("virtiofsd: jail lo->proc_self_fd") Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201006095826.59813-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Call qemu_init_exec_dir	Dr. David Alan Gilbert	2020-10-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since fcb4f59c879 qemu_get_local_state_pathname relies on the init_exec_dir, and virtiofsd asserts because we never set it. Set it. Reported-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201002124015.44820-1-dgilbert@redhat.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Silence gcc warning	Dr. David Alan Gilbert	2020-10-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Gcc worries fd might be used unset, in reality it's always set if fi is set, and only used if fi is set so it's safe. Initialise it to -1 just to keep gcc happy for now. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200827153657.111098-2-dgilbert@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Add -o allow_direct_io\|no_allow_direct_io options	Jiachen Zhang	2020-09-25	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the commit 65da4539803373ec4eec97ffc49ee90083e56efd, the O_DIRECT open flag of guest applications will be discarded by virtiofsd. While this behavior makes it consistent with the virtio-9p scheme when guest applications use direct I/O, we no longer have any chance to bypass the host page cache. Therefore, we add a flag 'allow_direct_io' to lo_data. If '-o no_allow_direct_io' option is added, or none of '-o allow_direct_io' or '-o no_allow_direct_io' is added, the 'allow_direct_io' will be set to 0, and virtiofsd discards O_DIRECT as before. If '-o allow_direct_io' is added to the starting command-line, 'allow_direct_io' will be set to 1, so that the O_DIRECT flags will be retained and host page cache can be bypassed. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: drop CAP_DAC_READ_SEARCH	Stefan Hajnoczi	2020-08-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd does not need CAP_DAC_READ_SEARCH because it already has the more powerful CAP_DAC_OVERRIDE. Drop it from the list of capabilities. This is important because container runtimes may not include CAP_DAC_READ_SEARCH by default. This patch allows virtiofsd to reduce its capabilities when running inside a Docker container. Note that CAP_DAC_READ_SEARCH may be necessary again in the future if virtiofsd starts using open_by_handle_at(2). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200727190223.422280-2-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Disable remote posix locks by default	Vivek Goyal	2020-08-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now we enable remote posix locks by default. That means when guest does a posix lock it sends request to server (virtiofsd). But currently we only support non-blocking posix lock and return -EOPNOTSUPP for blocking version. This means that existing applications which are doing blocking posix locks get -EOPNOTSUPP and fail. To avoid this, people have been running virtiosd with option "-o no_posix_lock". For new users it is still a surprise and trial and error takes them to this option. Given posix lock implementation is not complete in virtiofsd, disable it by default. This means that posix locks will work with-in applications in a guest but not across guests. Anyway we don't support sharing filesystem among different guests yet in virtiofs so this should not lead to any kind of surprise or regression and will make life little easier for virtiofs users. Reported-by: Aa Aa <jimbothom@yandex.com> Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: convert to Meson	Paolo Bonzini	2020-08-21	1	-1/+1
\| \| \| \|	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	virtiofsd: Allow addition or removal of capabilities	Dr. David Alan Gilbert	2020-07-03	1	-2/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow capabilities to be added or removed from the allowed set for the daemon; e.g. default: CapPrm: 00000000880000df CapEff: 00000000880000df -o modcaps=+sys_admin CapPrm: 00000000882000df CapEff: 00000000882000df -o modcaps=+sys_admin:-chown CapPrm: 00000000882000de CapEff: 00000000882000de Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200629115420.98443-4-dgilbert@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Check capability calls	Dr. David Alan Gilbert	2020-07-03	1	-3/+13
\| \| \| \| \| \| \| \| \| \|	Check the capability calls worked. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Terminate capability list	Dr. David Alan Gilbert	2020-07-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	capng_updatev is a varargs function that needs a -1 to terminate it, but it was missing. In practice what seems to have been happening is that it's added the capabilities we asked for, then runs into junk on the stack, so if we're unlucky it might be adding some more, but in reality it's failing - but after adding the capabilities we asked for. Fixes: a59feb483b8 ("virtiofsd: only retain file system capabilities") Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: remove symlink fallbacks	Miklos Szeredi	2020-06-01	1	-169/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Path lookup in the kernel has special rules for looking up magic symlinks under /proc. If a filesystem operation is instructed to follow symlinks (e.g. via AT_SYMLINK_FOLLOW or lack of AT_SYMLINK_NOFOLLOW), and the final component is such a proc symlink, then the target of the magic symlink is used for the operation, even if the target itself is a symlink. I.e. path lookup is always terminated after following a final magic symlink. I was erronously assuming that in the above case the target symlink would also be followed, and so workarounds were added for a couple of operations to handle the symlink case. Since the symlink can be handled simply by following the proc symlink, these workardouds are not needed. Also remove the "norace" option, which disabled the workarounds. Commit bdfd66788349 ("virtiofsd: Fix xattr operations") already dealt with the same issue for xattr operations. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200514140736.20561-1-mszeredi@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: drop all capabilities in the wait parent process	Stefan Hajnoczi	2020-05-01	1	-0/+13
\| \| \| \| \| \| \| \| \|	All this process does is wait for its child. No capabilities are needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: only retain file system capabilities	Stefan Hajnoczi	2020-05-01	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd runs as root but only needs a subset of root's Linux capabilities(7). As a file server its purpose is to create and access files on behalf of a client. It needs to be able to access files with arbitrary uid/gid owners. It also needs to be create device nodes. Introduce a Linux capabilities(7) whitelist and drop all capabilities that we don't need, making the virtiofsd process less powerful than a regular uid root process. # cat /proc/PID/status ... Before After CapInh: 0000000000000000 0000000000000000 CapPrm: 0000003fffffffff 00000000880000df CapEff: 0000003fffffffff 00000000880000df CapBnd: 0000003fffffffff 0000000000000000 CapAmb: 0000000000000000 0000000000000000 Note that file capabilities cannot be used to achieve the same effect on the virtiofsd executable because mount is used during sandbox setup. Therefore we drop capabilities programmatically at the right point during startup. This patch only affects the sandboxed child process. The parent process that sits in waitpid(2) still has full root capabilities and will be addressed in the next patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200416164907.244868-2-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Show submounts	Max Reitz	2020-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, setup_mounts() bind-mounts the shared directory without MS_REC. This makes all submounts disappear. Pass MS_REC so that the guest can see submounts again. Fixes: 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424133516.73077-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Changed Fixes to point to the commit with the problem rather than the commit that turned it on
*	virtiofsd: jail lo->proc_self_fd	Miklos Szeredi	2020-05-01	1	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While it's not possible to escape the proc filesystem through lo->proc_self_fd, it is possible to escape to the root of the proc filesystem itself through "../..". Use a temporary mount for opening lo->proc_self_fd, that has it's root at /proc/self/fd/, preventing access to the ancestor directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200429124733.22488-1-mszeredi@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: add --rlimit-nofile=NUM option	Stefan Hajnoczi	2020-05-01	1	-14/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make it possible to specify the RLIMIT_NOFILE on the command-line. Users running multiple virtiofsd processes should allocate a certain number to each process so that the system-wide limit can never be exhausted. When this option is set to 0 the rlimit is left at its current value. This is useful when a management tool wants to configure the rlimit itself. The default behavior remains unchanged: try to set the limit to 1,000,000 file descriptors if the current rlimit is lower. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200501140644.220940-2-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd/passthrough_ll: Fix double close()	Philippe Mathieu-Daudé	2020-03-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On success, the fdopendir() call closes fd. Later on the error path we try to close an already-closed fd. This can lead to use-after-free. Fix by only closing the fd if the fdopendir() call failed. Cc: qemu-stable@nongnu.org Fixes: b39bce121b (add dirp_map to hide lo_dirp pointers) Reported-by: Coverity (CID 1421933 USE_AFTER_FREE) Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200321120654.7985-1-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Fix xattr operations	Misono Tomohiro	2020-03-03	1	-47/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current virtiofsd has problems about xattr operations and they does not work properly for directory/symlink/special file. The fundamental cause is that virtiofsd uses openat() + f...xattr() systemcalls for xattr operation but we should not open symlink/special file in the daemon. Therefore the function is restricted. Fix this problem by: 1. during setup of each thread, call unshare(CLONE_FS) 2. in xattr operations (i.e. lo_getxattr), if inode is not a regular file or directory, use fchdir(proc_loot_fd) + ...xattr() + fchdir(root.fd) instead of openat() + f...xattr() (Note: for a regular file/directory openat() + f...xattr() is still used for performance reason) With this patch, xfstests generic/062 passes on virtiofs. This fix is suggested by Miklos Szeredi and Stefan Hajnoczi. The original discussion can be found here: https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: passthrough_ll: cleanup getxattr/listxattr	Misono Tomohiro	2020-03-03	1	-32/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a cleanup patch to simplify the following xattr fix and there is no functional changes. - Move memory allocation to head of the function - Unify fgetxattr/flistxattr call for both size == 0 and size != 0 case - Remove redundant lo_inode_put call in error path (Note: second call is ignored now since @inode is already NULL) Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-2-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>