summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* f2fs: remove unnecessary goto statementTiezhu Yang2016-07-081-2/+2
| | | | | | | | | When base_addr is NULL, there is no need to call kzfree, it should return -ENOMEM directly. Additionally, it is better to initialize variable 'error' with 0. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add nodiscard mount optionChao Yu2016-07-082-1/+7
| | | | | | | This patch adds 'nodiscard' mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to redirty page if fail to gc data pageChao Yu2016-07-081-1/+12
| | | | | | | | If we fail to move data page during foreground GC, we should give another chance to writeback that page which was set dirty previously by writer. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to detect truncation prior rather than EIO during readChao Yu2016-07-083-13/+13
| | | | | | | | | | | | | In procedure of synchonized read, after sending out the read request, reader will try to lock the page for waiting device to finish the read jobs and unlock the page, but meanwhile, truncater will race with reader, so after reader get lock of the page, it should check page's mapping to detect whether someone has truncated the page in advance, then reader has the chance to do the retry if truncation was done, otherwise read can be failed due to previous condition check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to avoid reading out encrypted data in page cacheChao Yu2016-07-081-43/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | For encrypted inode, if user overwrites data of the inode, f2fs will read encrypted data into page cache, and then do the decryption. However reader can race with overwriter, and it will see encrypted data which has not been decrypted by overwriter yet. Fix it by moving decrypting work to background and keep page non-uptodated until data is decrypted. Thread A Thread B - f2fs_file_write_iter - __generic_file_write_iter - generic_perform_write - f2fs_write_begin - f2fs_submit_page_bio - generic_file_read_iter - do_generic_file_read - lock_page_killable - unlock_page - copy_page_to_iter hit the encrypted data in updated page - lock_page - fscrypt_decrypt_page Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid latency-critical readahead of node pagesJaegeuk Kim2016-07-062-2/+2
| | | | | | | The f2fs_map_blocks is very related to the performance, so let's avoid any latency to read ahead node pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid writing node/metapages during writesJaegeuk Kim2016-07-061-3/+3
| | | | | | Let's keep more node/meta pages in run time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: produce more nids and reduce readahead natsJaegeuk Kim2016-07-066-8/+18
| | | | | | | | | The readahead nat pages are more likely to be reclaimed quickly, so it'd better to gather more free nids in advance. And, let's keep some free nids as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: detect host-managed SMR by feature flagJaegeuk Kim2016-07-064-8/+35
| | | | | | | If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: call update_inode_page for orphan inodesJaegeuk Kim2016-07-066-25/+9Star
| | | | | | Let's store orphan inode pages right away. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: report error for f2fs_parent_dirJaegeuk Kim2016-07-061-6/+9
| | | | | | If there is no dentry, we can report its error correctly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: find parent dentry correctlySheng Yong2016-06-163-35/+2Star
| | | | | | | | | | | If dotdot directory is corrupted, its slot may be ocupied by another file. In this case, dentry[1] is not the parent directory. Rename and cross-rename will update the inode in dentry[1] incorrectly. This patch finds dotdot dentry by name. Signed-off-by: Sheng Yong <shengyong1@huawei.com> [Jaegeuk Kim: remove wron bug_on] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix deadlock in add_link failureJaegeuk Kim2016-06-131-3/+0Star
| | | | | | | | | | | | | | | | | | | | | mkdir sync_dirty_inode - init_inode_metadata - lock_page(node) - make_empty_dir - filemap_fdatawrite() - do_writepages - lock_page(data) - write_page(data) - lock_page(node) - f2fs_init_acl - error - truncate_inode_pages - lock_page(data) So, we don't need to truncate data pages in this error case, which will be done by f2fs_evict_inode. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce mode=lfs mount optionJaegeuk Kim2016-06-139-4/+74
| | | | | | | | | This mount option is to enable original log-structured filesystem forcefully. So, there should be no random writes for main area. Especially, this supports host-managed SMR device. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: skip clean segment for gcJaegeuk Kim2016-06-081-0/+4
| | | | | | | If a segment in a section is clean or prefreed, we don't need to get its summary and do gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: drop any block pluggingJaegeuk Kim2016-06-084-22/+11Star
| | | | | | | In f2fs, we don't need to keep block plugging for NODE and DATA writes, since we already merged bios as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid reverse IO order for NODE and DATAJaegeuk Kim2016-06-083-0/+9
| | | | | | | There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(), which incur unnecessary reversed bio submission. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: set mapping error for EIOJaegeuk Kim2016-06-081-1/+1
| | | | | | If EIO occurred, we need to set all the mapping to avoid any further IOs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: control not to exceed # of cached nat entriesJaegeuk Kim2016-06-073-0/+16
| | | | | | This is to avoid cache entry management overhead including radix tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong percentageJaegeuk Kim2016-06-071-1/+1
| | | | | | This should be 1%, 10MB / 1GB. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid data race between FI_DIRTY_INODE flag and update_inodeJaegeuk Kim2016-06-071-1/+2
| | | | | | | | | | | | | | | | | FI_DIRTY_INODE flag is not covered by inode page lock, so it can be unset at any time like below. Thread #1 Thread #2 - lock_page(ipage) - update i_fields - update i_size/i_blocks/and so on - set FI_DIRTY_INODE - reset FI_DIRTY_INODE - set_page_dirty(ipage) In this case, we can lose the latest i_field information. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove obsolete parameter in f2fs_truncateJaegeuk Kim2016-06-074-6/+6
| | | | | | We don't need lock parameter, which is always true. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid wrong count on dirty inodesJaegeuk Kim2016-06-071-2/+2
| | | | | | | The number should be covered by spin_lock. Otherwise we can see wrong count in f2fs_stat. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove deprecated parameterJaegeuk Kim2016-06-073-6/+5Star
| | | | | | Remove deprecated paramter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: handle writepage correctlyJaegeuk Kim2016-06-031-30/+14Star
| | | | | | | | | | | | | Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls f2fs_write_data_page(). If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage() calls mapping_set_error(). But, this should not happen at every time, since sometimes f2fs_write_data_page() tries to skip writing pages without error. For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed out. Reported-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: return error of f2fs_lookupJaegeuk Kim2016-06-032-2/+5
| | | | | | | Now we can report an error to f2fs_lookup given by f2fs_find_entry. Suggested-by: He YunLei <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: return the errno to the caller to avoid using a wrong pageYunlong Song2016-06-031-5/+10
| | | | | | | | | | | | | | | | | | | Commit aaf9607516ed38825268515ef4d773289a44f429 ("f2fs: check node page contents all the time") pointed out that "sometimes it was reported that its contents was missing", so it checks the page's mapping and contents. When "nid != nid_of_node(page)", ERR_PTR(-EIO) will be returned to the caller. However, commit e1c51b9f1df2f9efc2ec11488717e40cd12015f9 ("f2fs: clean up node page updating flow") moves "nid != nid_of_node(page)" test to "f2fs_bug_on(sbi, nid != nid_of_node(page))", this will return a wrong page to the caller when F2FS_CHECK_FS is off when "sometimes it was reported that its contents was missing" happens. This patch restores to check node page contents all the time, and returns the errno to make the caller known something is wrong and avoid to use the page. This patch also moves f2fs_bug_on to its proper location. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove two steps to flush dirty data pagesJaegeuk Kim2016-06-031-10/+1Star
| | | | | | | | | | | | | | | | | If there is no cold page, we don't need to do a loop to flush dirty data pages. On /dev/pmem0, 1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync Before : 1.1 GB/s After : 1.2 GB/s 2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 Before : 2.2 GB/s After : 2.3 GB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not skip writing data pagesJaegeuk Kim2016-06-032-9/+6Star
| | | | | | | | | | | | | | | | For data pages, let's try to flush as much as possible in background. On /dev/pmem0, 1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync Before : 800 MB/s After : 1.1 GB/s 2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 Before : 1.3 GB/s After : 2.2 GB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: inject to produce some orphan inodesJaegeuk Kim2016-06-033-0/+9
| | | | Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: propagate error given by f2fs_find_entryJaegeuk Kim2016-06-033-8/+24
| | | | | | | | If we get ENOMEM or EIO in f2fs_find_entry, we should stop right away. Otherwise, for example, we can get duplicate directory entry by ->chash and ->clevel. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove writepages lockJaegeuk Kim2016-06-033-9/+0Star
| | | | | | | | | | | This patch removes writepages lock. We can improve multi-threading performance. tiobench, 32 threads, 4KB write per fsync on SSD Before: 25.88 MB/s After: 28.03 MB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: set flush_merge by defaultJaegeuk Kim2016-06-031-0/+6
| | | | | | This patch sets flush_merge by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: detect congestion of flush command issuesJaegeuk Kim2016-06-032-1/+7
| | | | | | | If flush commands do not incur any congestion, we don't need to throw that to dispatching queue which causes unnecessary latency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add lazytime mount optionJaegeuk Kim2016-06-031-0/+14
| | | | | | This patch adds lazytime support. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid unnecessary updating inode during fsyncJaegeuk Kim2016-06-037-6/+43
| | | | | | | If roll-forward recovery can recover i_size, we don't need to update inode's metadata during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove syncing inode page in all the casesJaegeuk Kim2016-06-0310-121/+17Star
| | | | | | | | | | | | | | | This patch reduces to call them across the whole tree. - sync_inode_page() - update_inode_page() - update_inode() - f2fs_write_inode() Instead, checkpoint will flush all the dirty inode metadata before syncing node pages. Note that, this is doable, since we call mark_inode_dirty_sync() for all inode's field change which needs to update on-disk inode as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: flush inode metadata when checkpoint is doingJaegeuk Kim2016-06-037-7/+107
| | | | | | | This patch registers all the inodes which have dirty metadata to sync when checkpoint is doing. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: call mark_inode_dirty_sync for i_field changesJaegeuk Kim2016-06-0310-46/+92
| | | | | | | | | | | | | | | | This patch calls mark_inode_dirty_sync() for the following on-disk inode changes. -> largest -> ctime/mtime/atime -> i_current_depth -> i_xattr_nid -> i_pino -> i_advise -> i_flags -> i_mode Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_i_links_write with mark_inode_dirty_syncJaegeuk Kim2016-06-034-26/+24Star
| | | | | | | This patch introduces f2fs_i_links_write() to call mark_inode_dirty_sync() when changing inode->i_links. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_i_blocks_write with mark_inode_dirty_syncJaegeuk Kim2016-06-032-5/+13
| | | | | | | This patch introduces f2fs_i_blocks_write() to call mark_inode_dirty_sync() when changing inode->i_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_i_size_write with mark_inode_dirty_syncJaegeuk Kim2016-06-037-18/+23
| | | | | | | This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with i_size_write(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use inode pointer for {set, clear}_inode_flagJaegeuk Kim2016-06-0316-160/+145Star
| | | | | | | This patch refactors to use inode pointer for set_inode_flag and clear_inode_flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Revert "f2fs: no need inc dirty pages under inode lock"Jaegeuk Kim2016-06-031-5/+4Star
| | | | | | | This reverts commit b951a4ec165af4973b2bd9c80fb5845fbd840435. Conflicts: fs/f2fs/checkpoint.c
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-06-0311-59/+95
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Radim Krčmář: "ARM: - two fixes for 4.6 vgic [Christoffer] (cc stable) - six fixes for 4.7 vgic [Marc] x86: - six fixes from syzkaller reports [Paolo] (two of them cc stable) - allow OS X to boot [Dmitry] - don't trust compilers [Nadav]" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR KVM: Handle MSR_IA32_PERF_CTL KVM: x86: avoid write-tearing of TDP KVM: arm/arm64: vgic-new: Removel harmful BUG_ON arm64: KVM: vgic-v3: Relax synchronization when SRE==1 arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1 arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value KVM: arm/arm64: vgic-v3: Always resample level interrupts KVM: arm/arm64: vgic-v2: Always resample level interrupts KVM: arm/arm64: vgic-v3: Clear all dirty LRs KVM: arm/arm64: vgic-v2: Clear all dirty LRs
| * KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGSPaolo Bonzini2016-06-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40 RSP <ffff88005836bd50> Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUIDPaolo Bonzini2016-06-021-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_irq_routing ir = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_SET_GSI_ROUTING, &ir); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsiPaolo Bonzini2016-06-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] PGD 6f80b067 PUD b6535067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 [...] Call Trace: [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm] [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm] [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a1062>] tracesys_phase2+0x84/0x89 Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85 RIP [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] RSP <ffff8800926cbca8> CR2: 0000000000000120 Testcase: #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[26]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); struct kvm_irqfd ifd; ifd.fd = syscall(SYS_eventfd2, 5, 0); ifd.gsi = 3; ifd.flags = 2; ifd.resamplefd = ifd.fd; r[25] = ioctl(r[3], KVM_IRQFD, &ifd); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * KVM: fail KVM_SET_VCPU_EVENTS with invalid exception numberPaolo Bonzini2016-06-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return EINVAL. It causes a WARN from exception_type: WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]() CPU: 3 PID: 16732 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2 ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm] [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm] [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 ---[ end trace b1a0391266848f50 ]--- Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> long r[31]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0); struct kvm_vcpu_events ve = { .exception.injected = 1, .exception.nr = 0xd4 }; r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve); r[30] = ioctl(r[7], KVM_RUN, 0); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUIDPaolo Bonzini2016-06-021-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_cpuid2 c = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8); r[7] = ioctl(r[4], KVM_SET_CPUID, &c); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>