bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	tools/virtiofsd: xattr name mappings: Add option	Dr. David Alan Gilbert	2020-10-26	1	-0/+173
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an option to define mappings of xattr names so that the client and server filesystems see different views. This can be used to have different SELinux mappings as seen by the guest, to run the virtiofsd with less privileges (e.g. in a case where it can't set trusted/system/security xattrs but you want the guest to be able to), or to isolate multiple users of the same name; e.g. trusted attributes used by stacking overlayfs. A mapping engine is used with 3 simple rules; the rules can be combined to allow most useful mapping scenarios. The ruleset is defined by -o xattrmap='rules...'. This patch doesn't use the rule maps yet. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201023165812.36028-2-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: add container-friendly -o sandbox=chroot option	Stefan Hajnoczi	2020-10-26	2	-2/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd cannot run in a container because CAP_SYS_ADMIN is required to create namespaces. Introduce a weaker sandbox mode that is sufficient in container environments because the container runtime already sets up namespaces. Use chroot to restrict path traversal to the shared directory. virtiofsd loses the following: 1. Mount namespace. The process chroots to the shared directory but leaves the mounts in place. Seccomp rejects mount(2)/umount(2) syscalls. 2. Pid namespace. This should be fine because virtiofsd is the only process running in the container. 3. Network namespace. This should be fine because seccomp already rejects the connect(2) syscall, but an additional layer of security is lost. Container runtime-specific network security policies can be used drop network traffic (except for the vhost-user UNIX domain socket). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201008085534.16070-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: passthrough_ll: set FUSE_LOG_INFO as default log_level	Misono Tomohiro	2020-10-26	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \|	Just noticed that although help message says default log level is INFO, it is actually 0 (EMRGE) and no mesage will be shown when error occurs. It's better to follow help message. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20201008110148.2757734-1-misono.tomohiro@jp.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	configure: add option for virtiofsd	Misono Tomohiro	2020-10-26	1	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently it is unknown whether virtiofsd will be built at configuration time. It will be automatically built when dependency is met. Also, required libraries are not clear. To make this clear, add configure option --{enable,disable}-virtiofsd. The default is the same as current (enabled if available) like many other options. When --enable-virtiofsd is given and dependency is not met, we get: ERROR: Problem encountered: virtiofsd requires libcap-ng-devel and seccomp-devel or ERROR: Problem encountered: virtiofsd needs tools and vhost-user support In addition, configuration summary now includes virtiofsd entry: build virtiofs daemon: YES/NO Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20201008103133.2722903-1-misono.tomohiro@jp.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge
*	Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵	Peter Maydell	2020-10-26	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	staging * fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) # gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: machine: move SMP initialization from vl.c machine: move UP defaults to class_base_init machine: remove deprecated -machine enforce-config-section option win32: boot broken when bind & data dir are the same WHPX: Fix WHPX build break configure: move install_blobs from configure to meson configure: remove unused variable from config-host.mak configure: move directory options from config-host.mak to meson configure: allow configuring localedir Makefile: separate meson rerun from the rest of the ninja invocation Remove deprecated -no-kvm option replay: do not build if TCG is not available qtest: unbreak non-TCG builds in bios-tables-test hw/core/qdev-clock: add a reference on aliased clocks do not use colons in test names meson: rewrite curses/iconv test build: fix macOS --enable-modules build Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
\| *	configure: move directory options from config-host.mak to meson	Paolo Bonzini	2020-10-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since installation is not part of Makefiles anymore, Make need not know the directories anymore. Meson already knows them through built-in options, do everything using them instead of the config_host dictionary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* \|	libvhost-user: Allow vu_message_read to be replaced	Coiby Xu	2020-10-23	1	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \|	Allow vu_message_read to be replaced by one which will make use of the QIOChannel functions. Thus reading vhost-user message won't stall the guest. For slave channel, we still use the default vu_message_read. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Coiby Xu <coiby.xu@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200918080912.321299-2-coiby.xu@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
*	virtiofsd: avoid /proc/self/fd tempdir	Stefan Hajnoczi	2020-10-12	1	-23/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to prevent /proc/self/fd escapes a temporary directory is created where /proc/self/fd is bind-mounted. This doesn't work on read-only file systems. Avoid the temporary directory by bind-mounting /proc/self/fd over /proc. This does not affect other processes since we remounted / with MS_REC \| MS_SLAVE. /proc must exist and virtiofsd does not use it so it's safe to do this. Path traversal can be tested with the following function: static void test_proc_fd_escape(struct lo_data *lo) { int fd; int level = 0; ino_t last_ino = 0; fd = lo->proc_self_fd; for (;;) { struct stat st; if (fstat(fd, &st) != 0) { perror("fstat"); return; } if (last_ino && st.st_ino == last_ino) { fprintf(stderr, "inode number unchanged, stopping\n"); return; } last_ino = st.st_ino; fprintf(stderr, "Level %d dev %lu ino %lu\n", level, (unsigned long)st.st_dev, (unsigned long)last_ino); fd = openat(fd, "..", O_PATH \| O_DIRECTORY \| O_NOFOLLOW); level++; } } Before and after this patch only Level 0 is displayed. Without /proc/self/fd bind-mount protection it is possible to traverse parent directories. Fixes: 397ae982f4df4 ("virtiofsd: jail lo->proc_self_fd") Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201006095826.59813-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Call qemu_init_exec_dir	Dr. David Alan Gilbert	2020-10-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since fcb4f59c879 qemu_get_local_state_pathname relies on the init_exec_dir, and virtiofsd asserts because we never set it. Set it. Reported-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201002124015.44820-1-dgilbert@redhat.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: add support for --socket-group	Alex Bennée	2020-10-12	3	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you like running QEMU as a normal user (very common for TCG runs) but you have to run virtiofsd as a root user you run into connection problems. Adding support for an optional --socket-group allows the users to keep using the command line. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200925125147.26943-2-alex.bennee@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Split long line
*	virtiofsd: Silence gcc warning	Dr. David Alan Gilbert	2020-10-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Gcc worries fd might be used unset, in reality it's always set if fi is set, and only used if fi is set so it's safe. Initialise it to -1 just to keep gcc happy for now. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200827153657.111098-2-dgilbert@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Add -o allow_direct_io\|no_allow_direct_io options	Jiachen Zhang	2020-09-25	2	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the commit 65da4539803373ec4eec97ffc49ee90083e56efd, the O_DIRECT open flag of guest applications will be discarded by virtiofsd. While this behavior makes it consistent with the virtio-9p scheme when guest applications use direct I/O, we no longer have any chance to bypass the host page cache. Therefore, we add a flag 'allow_direct_io' to lo_data. If '-o no_allow_direct_io' option is added, or none of '-o allow_direct_io' or '-o no_allow_direct_io' is added, the 'allow_direct_io' will be set to 0, and virtiofsd discards O_DIRECT as before. If '-o allow_direct_io' is added to the starting command-line, 'allow_direct_io' will be set to 1, so that the O_DIRECT flags will be retained and host page cache can be bypassed. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Used glib "shared" thread pool	Vivek Goyal	2020-09-25	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glib offers thread pools and it seems to support "exclusive" and "shared" thread pools. https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new Currently we use "exlusive" thread pools but its performance seems to be poor. I tried using "shared" thread pools and performance seems much better. I posted performance results here. https://www.redhat.com/archives/virtio-fs/2020-September/msg00080.html So lets switch to shared thread pools. We can think of making it optional once somebody can show in what cases exclusive thread pools offer better results. For now, my simple performance tests across the board see better results with shared thread pools. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200921213216.GE13362@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> With seccomp fix from Miklos
*	meson: use meson datadir instead of qemu_datadir	Marc-André Lureau	2020-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	When cross-compiling, by default qemu_datadir is 'c:\Program Files\QEMU', which is not recognized as being an absolute path, and meson will end up adding the prefix again. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200826110419.528931-6-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	virtiofsd: probe unshare(CLONE_FS) and print an error	Stefan Hajnoczi	2020-08-28	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An assertion failure is raised during request processing if unshare(CLONE_FS) fails. Implement a probe at startup so the problem can be detected right away. Unfortunately Docker/Moby does not include unshare in the seccomp.json list unless CAP_SYS_ADMIN is given. Other seccomp.json lists always include unshare (e.g. podman is unaffected): https://raw.githubusercontent.com/seccomp/containers-golang/master/seccomp.json Use "docker run --security-opt seccomp=path/to/seccomp.json ..." if the default seccomp.json is missing unshare. Cc: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200727190223.422280-4-stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: drop CAP_DAC_READ_SEARCH	Stefan Hajnoczi	2020-08-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd does not need CAP_DAC_READ_SEARCH because it already has the more powerful CAP_DAC_OVERRIDE. Drop it from the list of capabilities. This is important because container runtimes may not include CAP_DAC_READ_SEARCH by default. This patch allows virtiofsd to reduce its capabilities when running inside a Docker container. Note that CAP_DAC_READ_SEARCH may be necessary again in the future if virtiofsd starts using open_by_handle_at(2). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200727190223.422280-2-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Remove "norace" from cmdline help and docs	Sergio Lopez	2020-08-28	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Commit 93bb3d8d4cda ("virtiofsd: remove symlink fallbacks") removed the implementation of the "norace" option, so remove it from the cmdline help and the documentation too. Signed-off-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200717121110.50580-1-slp@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Disable remote posix locks by default	Vivek Goyal	2020-08-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now we enable remote posix locks by default. That means when guest does a posix lock it sends request to server (virtiofsd). But currently we only support non-blocking posix lock and return -EOPNOTSUPP for blocking version. This means that existing applications which are doing blocking posix locks get -EOPNOTSUPP and fail. To avoid this, people have been running virtiosd with option "-o no_posix_lock". For new users it is still a surprise and trial and error takes them to this option. Given posix lock implementation is not complete in virtiofsd, disable it by default. This means that posix locks will work with-in applications in a guest but not across guests. Anyway we don't support sharing filesystem among different guests yet in virtiofs so this should not lead to any kind of surprise or regression and will make life little easier for virtiofs users. Reported-by: Aa Aa <jimbothom@yandex.com> Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd: convert to Meson	Paolo Bonzini	2020-08-21	6	-14/+31
\| \| \| \|	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	virtiofsd: Allow addition or removal of capabilities	Dr. David Alan Gilbert	2020-07-03	2	-2/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow capabilities to be added or removed from the allowed set for the daemon; e.g. default: CapPrm: 00000000880000df CapEff: 00000000880000df -o modcaps=+sys_admin CapPrm: 00000000882000df CapEff: 00000000882000df -o modcaps=+sys_admin:-chown CapPrm: 00000000882000de CapEff: 00000000882000de Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200629115420.98443-4-dgilbert@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Check capability calls	Dr. David Alan Gilbert	2020-07-03	1	-3/+13
\| \| \| \| \| \| \| \| \| \|	Check the capability calls worked. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Terminate capability list	Dr. David Alan Gilbert	2020-07-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	capng_updatev is a varargs function that needs a -1 to terminate it, but it was missing. In practice what seems to have been happening is that it's added the capabilities we asked for, then runs into junk on the stack, so if we're unlucky it might be adding some more, but in reality it's failing - but after adding the capabilities we asked for. Fixes: a59feb483b8 ("virtiofsd: only retain file system capabilities") Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20200629115420.98443-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Whitelist fchmod	Max Reitz	2020-06-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lo_setattr() invokes fchmod() in a rarely used code path, so it should be whitelisted or virtiofsd will crash with EBADSYS. Said code path can be triggered for example as follows: On the host, in the shared directory, create a file with the sticky bit set and a security.capability xattr: (1) # touch foo (2) # chmod u+s foo (3) # setcap '' foo Then in the guest let some process truncate that file after it has dropped all of its capabilities (at least CAP_FSETID): int main(int argc, char *argv[]) { capng_setpid(getpid()); capng_clear(CAPNG_SELECT_BOTH); capng_updatev(CAPNG_ADD, CAPNG_PERMITTED \| CAPNG_EFFECTIVE, 0); capng_apply(CAPNG_SELECT_BOTH); ftruncate(open(argv[1], O_RDWR), 0); } This will cause the guest kernel to drop the sticky bit (i.e. perform a mode change) as part of the truncate (where FATTR_FH is set), and that will cause virtiofsd to invoke fchmod() instead of fchmodat(). (A similar configuration exists further below with futimens() vs. utimensat(), but the former is not a syscall but just a wrapper for the latter, so no further whitelisting is required.) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1842667 Reported-by: Qian Cai <caiqian@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608093111.14942-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: remove symlink fallbacks	Miklos Szeredi	2020-06-01	1	-169/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Path lookup in the kernel has special rules for looking up magic symlinks under /proc. If a filesystem operation is instructed to follow symlinks (e.g. via AT_SYMLINK_FOLLOW or lack of AT_SYMLINK_NOFOLLOW), and the final component is such a proc symlink, then the target of the magic symlink is used for the operation, even if the target itself is a symlink. I.e. path lookup is always terminated after following a final magic symlink. I was erronously assuming that in the above case the target symlink would also be followed, and so workarounds were added for a couple of operations to handle the symlink case. Since the symlink can be handled simply by following the proc symlink, these workardouds are not needed. Also remove the "norace" option, which disabled the workarounds. Commit bdfd66788349 ("virtiofsd: Fix xattr operations") already dealt with the same issue for xattr operations. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200514140736.20561-1-mszeredi@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: drop all capabilities in the wait parent process	Stefan Hajnoczi	2020-05-01	1	-0/+13
\| \| \| \| \| \| \| \| \|	All this process does is wait for its child. No capabilities are needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: only retain file system capabilities	Stefan Hajnoczi	2020-05-01	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	virtiofsd runs as root but only needs a subset of root's Linux capabilities(7). As a file server its purpose is to create and access files on behalf of a client. It needs to be able to access files with arbitrary uid/gid owners. It also needs to be create device nodes. Introduce a Linux capabilities(7) whitelist and drop all capabilities that we don't need, making the virtiofsd process less powerful than a regular uid root process. # cat /proc/PID/status ... Before After CapInh: 0000000000000000 0000000000000000 CapPrm: 0000003fffffffff 00000000880000df CapEff: 0000003fffffffff 00000000880000df CapBnd: 0000003fffffffff 0000000000000000 CapAmb: 0000000000000000 0000000000000000 Note that file capabilities cannot be used to achieve the same effect on the virtiofsd executable because mount is used during sandbox setup. Therefore we drop capabilities programmatically at the right point during startup. This patch only affects the sandboxed child process. The parent process that sits in waitpid(2) still has full root capabilities and will be addressed in the next patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200416164907.244868-2-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Show submounts	Max Reitz	2020-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, setup_mounts() bind-mounts the shared directory without MS_REC. This makes all submounts disappear. Pass MS_REC so that the guest can see submounts again. Fixes: 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424133516.73077-1-mreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Changed Fixes to point to the commit with the problem rather than the commit that turned it on
*	virtiofsd: jail lo->proc_self_fd	Miklos Szeredi	2020-05-01	1	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While it's not possible to escape the proc filesystem through lo->proc_self_fd, it is possible to escape to the root of the proc filesystem itself through "../..". Use a temporary mount for opening lo->proc_self_fd, that has it's root at /proc/self/fd/, preventing access to the ancestor directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Message-Id: <20200429124733.22488-1-mszeredi@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: stay below fs.file-max sysctl value (CVE-2020-10717)	Stefan Hajnoczi	2020-05-01	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The system-wide fs.file-max sysctl value determines how many files can be open. It defaults to a value calculated based on the machine's RAM size. Previously virtiofsd would try to set RLIMIT_NOFILE to 1,000,000 and this allowed the FUSE client to exhaust the number of open files system-wide on Linux hosts with less than 10 GB of RAM! Take fs.file-max into account when choosing the default RLIMIT_NOFILE value. Fixes: CVE-2020-10717 Reported-by: Yuval Avrahami <yavrahami@paloaltonetworks.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200501140644.220940-3-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: add --rlimit-nofile=NUM option	Stefan Hajnoczi	2020-05-01	3	-14/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make it possible to specify the RLIMIT_NOFILE on the command-line. Users running multiple virtiofsd processes should allocate a certain number to each process so that the system-wide limit can never be exhausted. When this option is set to 0 the rlimit is left at its current value. This is useful when a management tool wants to configure the rlimit itself. The default behavior remains unchanged: try to set the limit to 1,000,000 file descriptors if the current rlimit is lower. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200501140644.220940-2-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd/passthrough_ll: Fix double close()	Philippe Mathieu-Daudé	2020-03-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On success, the fdopendir() call closes fd. Later on the error path we try to close an already-closed fd. This can lead to use-after-free. Fix by only closing the fd if the fdopendir() call failed. Cc: qemu-stable@nongnu.org Fixes: b39bce121b (add dirp_map to hide lo_dirp pointers) Reported-by: Coverity (CID 1421933 USE_AFTER_FREE) Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200321120654.7985-1-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Fix xattr operations	Misono Tomohiro	2020-03-03	3	-47/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current virtiofsd has problems about xattr operations and they does not work properly for directory/symlink/special file. The fundamental cause is that virtiofsd uses openat() + f...xattr() systemcalls for xattr operation but we should not open symlink/special file in the daemon. Therefore the function is restricted. Fix this problem by: 1. during setup of each thread, call unshare(CLONE_FS) 2. in xattr operations (i.e. lo_getxattr), if inode is not a regular file or directory, use fchdir(proc_loot_fd) + ...xattr() + fchdir(root.fd) instead of openat() + f...xattr() (Note: for a regular file/directory openat() + f...xattr() is still used for performance reason) With this patch, xfstests generic/062 passes on virtiofs. This fix is suggested by Miklos Szeredi and Stefan Hajnoczi. The original discussion can be found here: https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: passthrough_ll: cleanup getxattr/listxattr	Misono Tomohiro	2020-03-03	1	-32/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a cleanup patch to simplify the following xattr fix and there is no functional changes. - Move memory allocation to head of the function - Unify fgetxattr/flistxattr call for both size == 0 and size != 0 case - Remove redundant lo_inode_put call in error path (Note: second call is ignored now since @inode is already NULL) Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Message-Id: <20200227055927.24566-2-misono.tomohiro@jp.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Remove fuse.h and struct fuse_module	Xiao Yang	2020-02-21	2	-1245/+0
\| \| \| \| \| \| \| \| \| \|	All code in fuse.h and struct fuse_module are not used by virtiofsd so removing them is safe. Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd/fuse_lowlevel: Fix fuse_out_header::error value	Philippe Mathieu-Daudé	2020-02-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/fuse_lowlevel.o tools/virtiofsd/fuse_lowlevel.c:195:9: warning: Value stored to 'error' is never read error = -ERANGE; ^ ~~~~~~~ Fixes: 3db2876 Reported-by: Clang Static Analyzer Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment	Philippe Mathieu-Daudé	2020-02-21	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/passthrough_ll.o tools/virtiofsd/passthrough_ll.c:925:9: warning: Value stored to 'newfd' is never read newfd = -1; ^ ~~ tools/virtiofsd/passthrough_ll.c:942:9: warning: Value stored to 'newfd' is never read newfd = -1; ^ ~~ Fixes: 7c6b66027 Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment	Philippe Mathieu-Daudé	2020-02-21	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix warning reported by Clang static code analyzer: CC tools/virtiofsd/passthrough_ll.o tools/virtiofsd/passthrough_ll.c:1083:5: warning: Value stored to 'saverr' is never read saverr = ENOMEM; ^ ~~~~~~ Fixes: 7c6b66027 Reported-by: Clang Static Analyzer Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Help message fix for 'seconds'	Dr. David Alan Gilbert	2020-02-21	1	-1/+1
\| \| \| \| \| \| \|	second should be seconds. Reported-by: Christophe de Dinechin <dinechin@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: do_read missing NULL check	Dr. David Alan Gilbert	2020-02-10	1	-0/+4
\| \| \| \| \| \| \| \| \|	Missing a NULL check if the argument fetch fails. Fixes: Coverity CID 1413119 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
*	virtiofsd: load_capng missing unlock	Dr. David Alan Gilbert	2020-02-10	1	-0/+1
\| \| \| \| \| \| \| \| \|	Missing unlock in error path. Fixes: Covertiy CID 1413123 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
*	virtiofsd: fv_create_listen_socket error path socket leak	Dr. David Alan Gilbert	2020-02-10	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	If we fail when bringing up the socket we can leak the listen_fd; in practice the daemon will exit so it's not really a problem. Fixes: Coverity CID 1413121 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
*	virtiofsd: Remove fuse_req_getgroups	Dr. David Alan Gilbert	2020-02-10	3	-118/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove fuse_req_getgroups that's unused in virtiofsd; it came in from libfuse but we don't actually use it. It was called from fuse_getgroups which we previously removed (but had left it's header in). Coverity had complained about null termination in it, but removing it is the easiest answer. Fixes: Coverity CID: 1413117 (String not null terminated) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
*	virtiofsd: add some options to the help message	Masayoshi Mizuma	2020-01-23	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add following options to the help message: - cache - flock\|no_flock - norace - posix_lock\|no_posix_lock - readdirplus\|no_readdirplus - timeout - writeback\|no_writeback - xattr\|no_xattr Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> dgilbert: Split cache, norace, posix_lock, readdirplus off into our own earlier patches that added the options Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: stop all queue threads on exit in virtio_loop()	Eryu Guan	2020-01-23	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On guest graceful shutdown, virtiofsd receives VHOST_USER_GET_VRING_BASE request from VMM and shuts down virtqueues by calling fv_set_started(), which joins fv_queue_thread() threads. So when virtio_loop() returns, there should be no thread is still accessing data in fuse session and/or virtio dev. But on abnormal exit, e.g. guest got killed for whatever reason, vhost-user socket is closed and virtio_loop() breaks out the main loop and returns to main(). But it's possible fv_queue_worker()s are still working and accessing fuse session and virtio dev, which results in crash or use-after-free. Fix it by stopping fv_queue_thread()s before virtio_loop() returns, to make sure there's no-one could access fuse session and virtio dev. Reported-by: Qingming Su <qingming.su@linux.alibaba.com> Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()	Xiao Yang	2020-01-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lo_copy_file_range() passes -errno to fuse_reply_err() and then fuse_reply_err() changes it to errno again, so that subsequent fuse_send_reply_iov_nofree() catches the wrong errno.(i.e. reports "fuse: bad error value: ..."). Make fuse_send_reply_iov_nofree() accept the correct -errno by passing errno directly in lo_copy_file_range(). Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Eryu Guan <eguan@linux.alibaba.com> dgilbert: Sent upstream and now Merged as aa1185e153f774f1df65 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: Convert lo_destroy to take the lo->mutex lock itself	Dr. David Alan Gilbert	2020-01-23	1	-14/+17
\| \| \| \| \| \| \| \| \| \| \|	lo_destroy was relying on some implicit knowledge of the locking; we can avoid this if we create an unref_inode that doesn't take the lock and then grab it for the whole of the lo_destroy. Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: add --thread-pool-size=NUM option	Stefan Hajnoczi	2020-01-23	3	-3/+10
\| \| \| \| \| \| \| \| \| \|	Add an option to control the size of the thread pool. Requests are now processed in parallel by default. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: fix lo_destroy() resource leaks	Stefan Hajnoczi	2020-01-23	1	-21/+20
\| \| \| \| \| \| \| \| \| \|	Now that lo_destroy() is serialized we can call unref_inode() so that all inode resources are freed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races	Stefan Hajnoczi	2020-01-23	2	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	When running with multiple threads it can be tricky to handle FUSE_INIT/FUSE_DESTROY in parallel with other request types or in parallel with themselves. Serialize FUSE_INIT and FUSE_DESTROY so that malicious clients cannot trigger race conditions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
*	virtiofsd: process requests in a thread pool	Stefan Hajnoczi	2020-01-23	1	-158/+201
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a thread pool so that fv_queue_thread() just pops VuVirtqElements and hands them to the thread pool. For the time being only one worker thread is allowed since passthrough_ll.c is not thread-safe yet. Future patches will lift this restriction so that multiple FUSE requests can be processed in parallel. The main new concept is struct FVRequest, which contains both VuVirtqElement and struct fuse_chan. We now have fv_VuDev for a device, fv_QueueInfo for a virtqueue, and FVRequest for a request. Some of fv_QueueInfo's fields are moved into FVRequest because they are per-request. The name FVRequest conforms to QEMU coding style and I expect the struct fv_* types will be renamed in a future refactoring. This patch series is not optimal. fbuf reuse is dropped so each request does malloc(se->bufsize), but there is no clean and cheap way to keep this with a thread pool. The vq_lock mutex is held for longer than necessary, especially during the eventfd_write() syscall. Performance can be improved in the future. prctl(2) had to be added to the seccomp whitelist because glib invokes it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>