summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fanotify: cache fsid in fsnotify_mark_connectorAmir Goldstein2019-02-075-58/+128
| | | | | | | | | | | | | | | | | | | | | For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on every event. To avoid having to call vfs_statfs() on every event to get fsid, we store the fsid in fsnotify_mark_connector on the first time we add a mark and on handle event we use the cached fsid. Subsequent calls to add mark on the same object are expected to pass the same fsid, so the call will fail on cached fsid mismatch. If an event is reported on several mark types (inode, mount, filesystem), all connectors should already have the same fsid, so we use the cached fsid from the first connector. [JK: Simplify code flow around fanotify_get_fid() make fsid argument of fsnotify_add_mark_locked() unconditional] Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fanotify: enable FAN_REPORT_FID init flagAmir Goldstein2019-02-072-2/+61
| | | | | | | | | | | | | | | | | | | When setting up an fanotify listener, user may request to get fid information in event instead of an open file descriptor. The fid obtained with event on a watched object contains the file handle returned by name_to_handle_at(2) and fsid returned by statfs(2). Restrict FAN_REPORT_FID to class FAN_CLASS_NOTIF, because we have have no good reason to support reporting fid on permission events. When setting a mark, we need to make sure that the filesystem supports encoding file handles with name_to_handle_at(2) and that statfs(2) encodes a non-zero fsid. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fanotify: copy event fid info to userAmir Goldstein2019-02-073-5/+102
| | | | | | | | | | | | | | | | | | | | If group requested FAN_REPORT_FID and event has file identifier, copy that information to user reading the event after event metadata. fid information is formatted as struct fanotify_event_info_fid that includes a generic header struct fanotify_event_info_header, so that other info types could be defined in the future using the same header. metadata->event_len includes the length of the fid information. The fid information includes the filesystem's fsid (see statfs(2)) followed by an NFS file handle of the file that could be passed as an argument to open_by_handle_at(2). Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fanotify: encode file identifier for FAN_REPORT_FIDAmir Goldstein2019-02-074-13/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | When user requests the flag FAN_REPORT_FID in fanotify_init(), a unique file identifier of the event target object will be reported with the event. The file identifier includes the filesystem's fsid (i.e. from statfs(2)) and an NFS file handle of the file (i.e. from name_to_handle_at(2)). The file identifier makes holding the path reference and passing a file descriptor to user redundant, so those are disabled in a group with FAN_REPORT_FID. Encode fid and store it in event for a group with FAN_REPORT_FID. Up to 12 bytes of file handle on 32bit arch (16 bytes on 64bit arch) are stored inline in fanotify_event struct. Larger file handles are stored in an external allocated buffer. On failure to encode fid, we print a warning and queue the event without the fid information. [JK: Fold part of later patched into this one to use exportfs_encode_inode_fh() right away] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fanotify: open code fill_event_metadata()Amir Goldstein2019-02-071-44/+27Star
| | | | | | | | The helper is quite trivial and open coding it will make it easier to implement copying event fid info to user. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fanotify: rename struct fanotify_{,perm_}event_infoAmir Goldstein2019-02-063-26/+26
| | | | | | | | | | | | | struct fanotify_event_info "inherits" from struct fsnotify_event and therefore a more appropriate (and short) name for it is fanotify_event. Same for struct fanotify_perm_event_info, which now "inherits" from struct fanotify_event. We plan to reuse the name struct fanotify_event_info for user visible event info record format. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: move mask out of struct fsnotify_eventAmir Goldstein2019-02-068-40/+29Star
| | | | | | | | | | | | | | Common fsnotify_event helpers have no need for the mask field. It is only used by backend code, so move the field out of the abstract fsnotify_event struct and into the concrete backend event structs. This change packs struct inotify_event_info better on 64bit machine and will allow us to cram some more fields into struct fanotify_event_info. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: send all event types to super block marksAmir Goldstein2019-02-061-8/+7Star
| | | | | | | | | | | | So far, existence of super block marks was checked only on events with data type FSNOTIFY_EVENT_PATH. Use the super block of the "to_tell" inode to report the events of all event types to super block marks. This change has no effect on current backends. Soon, this will allow fanotify backend to receive all event types on a super block mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: remove dirent events from FS_EVENTS_POSS_ON_CHILD maskAmir Goldstein2019-02-061-15/+21
| | | | | | | | | | | | | | | | | | | | | | | "dirent" events are referring to events that modify directory entries, such as create,delete,rename. Those events are always be reported on a watched directory, regardless if FS_EVENT_ON_CHILD is set on the watch mask. ALL_FSNOTIFY_DIRENT_EVENTS defines all the dirent event types and those event types are removed from FS_EVENTS_POSS_ON_CHILD. That means for a directory with an inotify watch and only dirent events in the mask (i.e. create,delete,move), all children dentries will no longer have the DCACHE_FSNOTIFY_PARENT_WATCHED flag set. This will allow all events that happen on children to be optimized away in __fsnotify_parent() without the need to dereference child->d_parent->d_inode->i_fsnotify_mask. Since the dirent events are never repoted via __fsnotify_parent(), this results in no change of logic, but only an optimization. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: annotate directory entry modification eventsAmir Goldstein2019-02-061-9/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | "dirent" events are referring to events that modify directory entries, such as create,delete,rename. Those events should always be reported on a watched directory, regardless if FS_EVENT_ON_CHILD is set on the watch mask. fsnotify_nameremove() and fsnotify_move() were modified to no longer set the FS_EVENT_ON_CHILD event bit. This is a semantic change to align with the "dirent" event definition. It has no effect on any existing backend, because dnotify, inotify and audit always requets the child events and fanotify does not get the delete,rename events. The fsnotify_dirent() helper is used instead of fsnotify_parent() to report a dirent event to dentry->d_parent without FS_EVENT_ON_CHILD and regardless if parent has the FS_EVENT_ON_CHILD bit set. Unlike fsnotify_parent(), fsnotify_dirent() assumes that dentry->d_name and dentry->d_parent are stable. For fsnotify_create()/fsnotify_mkdir(), this assumption is abviously correct. For fsnotify_nameremove(), it is less trivial, so we use dget_parent() and take_dentry_name_snapshot() to grab stable references. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* Linux 5.0-rc4Linus Torvalds2019-01-281-1/+1
|
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-01-278-17/+61
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Fix the swapped outb() parameters in the KASLR code - Fix the PKEY handling at fork which missed to preserve the pkey state for the child. Comes with a test case to validate that. - Fix the entry stack handling for XEN PV to respect that XEN PV systems enter the function already on the current thread stack and not on the trampoline. - Fix kexec load failure caused by using a stale value when the kexec_buf structure is reused for subsequent allocations. - Fix a bogus sizeof() in the memory encryption code - Enforce PCI dependency for the Intel Low Power Subsystem - Enforce PCI_LOCKLESS_CONFIG when PCI is enabled" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled x86/entry/64/compat: Fix stack switching for XEN PV x86/kexec: Fix a kexec_file_load() failure x86/mm/mem_encrypt: Fix erroneous sizeof() x86/selftests/pkeys: Fork() to check for state being preserved x86/pkeys: Properly copy pkey state at fork() x86/kaslr: Fix incorrect i8254 outb() parameters x86/intel/lpss: Make PCI dependency explicit
| * x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabledSinan Kaya2019-01-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set") dependencies on CONFIG_PCI that previously were satisfied implicitly through dependencies on CONFIG_ACPI have to be specified directly. PCI_LOCKLESS_CONFIG depends on PCI but this dependency has not been mentioned in the Kconfig so add an explicit dependency here and fix WARNING: unmet direct dependencies detected for PCI_LOCKLESS_CONFIG Depends on [n]: PCI [=n] Selected by [y]: - X86 [=y] Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set") Signed-off-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-acpi@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190121231958.28255-2-okaya@kernel.org
| * x86/entry/64/compat: Fix stack switching for XEN PVJan Beulich2019-01-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While in the native case entry into the kernel happens on the trampoline stack, PV Xen kernels get entered with the current thread stack right away. Hence source and destination stacks are identical in that case, and special care is needed. Other than in sync_regs() the copying done on the INT80 path isn't NMI / #MC safe, as either of these events occurring in the middle of the stack copying would clobber data on the (source) stack. There is similar code in interrupt_entry() and nmi(), but there is no fixup required because those code paths are unreachable in XEN PV guests. [ tglx: Sanitized subject, changelog, Fixes tag and stable mail address. Sigh ] Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/5C3E1128020000780020DFAD@prv1-mh.provo.novell.com
| * x86/kexec: Fix a kexec_file_load() failureDave Young2019-01-152-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()") changed the behavior of kexec_locate_mem_hole(): it will try to allocate free memory only when kbuf.mem is initialized to zero. However, x86's kexec_file_load() implementation reuses a struct kexec_buf allocated on the stack and its kbuf.mem member gets set by each kexec_add_buffer() invocation. The second kexec_add_buffer() will reuse the same kbuf but not reinitialize kbuf.mem. Therefore, explictily reset kbuf.mem each time in order for kexec_locate_mem_hole() to locate a free memory region each time. [ bp: massage commit message. ] Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()") Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Philipp Rudo <prudo@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yannik Sembritzki <yannik@sembritzki.me> Cc: Yi Wang <wang.yi59@zte.com.cn> Cc: kexec@lists.infradead.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181228011247.GA9999@dhcp-128-65.nay.redhat.com
| * x86/mm/mem_encrypt: Fix erroneous sizeof()Peng Hao2019-01-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using sizeof(pointer) for determining the size of a memset() only works when the size of the pointer and the size of type to which it points are the same. For pte_t this is only true for 64bit and 32bit-NONPAE. On 32bit PAE systems this is wrong as the pointer size is 4 byte but the PTE entry is 8 bytes. It's actually not a real world issue as this code depends on 64bit, but it's wrong nevertheless. Use sizeof(*p) for correctness sake. Fixes: aad983913d77 ("x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large()") Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1546065252-97996-1-git-send-email-peng.hao2@zte.com.cn
| * x86/selftests/pkeys: Fork() to check for state being preservedDave Hansen2019-01-151-10/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a bug where the per-mm pkey state was not being preserved across fork() in the child. fork() is performed in the pkey selftests, but all of the pkey activity is performed in the parent. The child does not perform any actions sensitive to pkey state. To make the test more sensitive to these kinds of bugs, add a fork() where the parent exits, and execution continues in the child. To achieve this let the key exhaustion test not terminate at the first allocation failure and fork after 2*NR_PKEYS loops and continue in the child. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: peterz@infradead.org Cc: mpe@ellerman.id.au Cc: will.deacon@arm.com Cc: luto@kernel.org Cc: jroedel@suse.de Cc: stable@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20190102215657.585704B7@viggo.jf.intel.com
| * x86/pkeys: Properly copy pkey state at fork()Dave Hansen2019-01-151-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory protection key behavior should be the same in a child as it was in the parent before a fork. But, there is a bug that resets the state in the child at fork instead of preserving it. The creation of new mm's is a bit convoluted. At fork(), the code does: 1. memcpy() the parent mm to initialize child 2. mm_init() to initalize some select stuff stuff 3. dup_mmap() to create true copies that memcpy() did not do right For pkeys two bits of state need to be preserved across a fork: 'execute_only_pkey' and 'pkey_allocation_map'. Those are preserved by the memcpy(), but mm_init() invokes init_new_context() which overwrites 'execute_only_pkey' and 'pkey_allocation_map' with "new" values. The author of the code erroneously believed that init_new_context is *only* called at execve()-time. But, alas, init_new_context() is used at execve() and fork(). The result is that, after a fork(), the child's pkey state ends up looking like it does after an execve(), which is totally wrong. pkeys that are already allocated can be allocated again, for instance. To fix this, add code called by dup_mmap() to copy the pkey state from parent to child explicitly. Also add a comment above init_new_context() to make it more clear to the next poor sod what this code is used for. Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: peterz@infradead.org Cc: mpe@ellerman.id.au Cc: will.deacon@arm.com Cc: luto@kernel.org Cc: jroedel@suse.de Cc: stable@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
| * x86/kaslr: Fix incorrect i8254 outb() parametersDaniel Drake2019-01-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The outb() function takes parameters value and port, in that order. Fix the parameters used in the kalsr i8254 fallback code. Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions") Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: linux@endlessm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com
| * x86/intel/lpss: Make PCI dependency explicitSinan Kaya2019-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set") dependencies on CONFIG_PCI that previously were satisfied implicitly through dependencies on CONFIG_ACPI have to be specified directly. LPSS code relies on PCI infrastructure but this dependency has not been explicitly called out so do that. Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set") Signed-off-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-acpi@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190102181038.4418-11-okaya@kernel.org
* | Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds2019-01-272-18/+16Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer fixes from Thomas Gleixner: "Two commits which were missed to be sent during the merge window. - The TSC calibration fix turns out to be more urgent as recent Skylake-X systems seem to have massive trouble with calibration disturbance. This should go back into stable for that reason and it the risk of breakage is rather low. - Drop an unused define" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Remove unused FSEC_PER_NSEC define x86/tsc: Make calibration refinement more robust
| * | x86/hpet: Remove unused FSEC_PER_NSEC defineRoland Dreier2018-12-041-4/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23 ("x86: hpet: Cleanup the clockevents init and register code"). Remove it. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181130211450.5200-1-roland@purestorage.com
| * | x86/tsc: Make calibration refinement more robustDaniel Vacek2018-11-061-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The threshold in tsc_read_refs() is constant which may favor slower CPUs but may not be optimal for simple reading of reference on faster ones. Hence make it proportional to tsc_khz when available to compensate for this. The threshold guards against any disturbance like IRQs, NMIs, SMIs or CPU stealing by host on guest systems so rename it accordingly and fix comments as well. Also on some systems there is noticeable DMI bus contention at some point during boot keeping the readout failing (observed with about one in ~300 boots when testing). In that case retry also the second readout instead of simply bailing out unrefined. Usually the next second the readout returns fast just fine without any issues. Signed-off-by: Daniel Vacek <neelx@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@redhat.com
* | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2019-01-271-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Glexiner: "A single regression fix to address the unintended breakage of posix cpu timers. This is caused by a new sanity check in the common code, which fails for posix cpu timers under certain conditions because the posix cpu timer code never updates the variable which is checked" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Unbreak timer rearming
| * | | posix-cpu-timers: Unbreak timer rearmingThomas Gleixner2019-01-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent commit which prevented a division by 0 issue in the alarm timer code broke posix CPU timers as an unwanted side effect. The reason is that the common rearm code checks for timer->it_interval being 0 now. What went unnoticed is that the posix cpu timer setup does not initialize timer->it_interval as it stores the interval in CPU timer specific storage. The reason for the separate storage is historical as the posix CPU timers always had a 64bit nanoseconds representation internally while timer->it_interval is type ktime_t which used to be a modified timespec representation on 32bit machines. Instead of reverting the offending commit and fixing the alarmtimer issue in the alarmtimer code, store the interval in timer->it_interval at CPU timer setup time so the common code check works. This also repairs the existing inconistency of the posix CPU timer code which kept a single shot timer armed despite of the interval being 0. The separate storage can be removed in mainline, but that needs to be a separate commit as the current one has to be backported to stable kernels. Fixes: 0e334db6bb4b ("posix-timers: Fix division by zero bug") Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190111133500.840117406@linutronix.de
* | | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2019-01-275-12/+39
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A small series of fixes which all address possible missed wakeups: - Document and fix the wakeup ordering of wake_q - Add the missing barrier in rcuwait_wake_up(), which was documented in the comment but missing in the code - Fix the possible missed wakeups in the rwsem and futex code" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Fix (possible) missed wakeup futex: Fix (possible) missed wakeup sched/wake_q: Fix wakeup ordering for wake_q sched/wake_q: Document wake_q_add() sched/wait: Fix rcuwait_wake_up() ordering
| * | | | locking/rwsem: Fix (possible) missed wakeupXie Yongji2019-01-211-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because wake_q_add() can imply an immediate wakeup (cmpxchg failure case), we must not rely on the wakeup being delayed. However, commit: e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task") relies on exactly that behaviour in that the wakeup must not happen until after we clear waiter->task. [ peterz: Added changelog. ] Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task") Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | futex: Fix (possible) missed wakeupPeter Zijlstra2019-01-211-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must not rely on wake_q_add() to delay the wakeup; in particular commit: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups") moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which could result in futex_wait() waking before observing ->lock_ptr == NULL and going back to sleep again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/wake_q: Fix wakeup ordering for wake_qPeter Zijlstra2019-01-211-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Notable cmpxchg() does not provide ordering when it fails, however wake_q_add() requires ordering in this specific case too. Without this it would be possible for the concurrent wakeup to not observe our prior state. Andrea Parri provided: C wake_up_q-wake_q_add { int next = 0; int y = 0; } P0(int *next, int *y) { int r0; /* in wake_up_q() */ WRITE_ONCE(*next, 1); /* node->next = NULL */ smp_mb(); /* implied by wake_up_process() */ r0 = READ_ONCE(*y); } P1(int *next, int *y) { int r1; /* in wake_q_add() */ WRITE_ONCE(*y, 1); /* wake_cond = true */ smp_mb__before_atomic(); r1 = cmpxchg_relaxed(next, 1, 2); } exists (0:r0=0 /\ 1:r1=0) This "exists" clause cannot be satisfied according to the LKMM: Test wake_up_q-wake_q_add Allowed States 3 0:r0=0; 1:r1=1; 0:r0=1; 1:r1=0; 0:r0=1; 1:r1=1; No Witnesses Positive: 0 Negative: 3 Condition exists (0:r0=0 /\ 1:r1=0) Observation wake_up_q-wake_q_add Never 0 3 Reported-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/wake_q: Document wake_q_add()Peter Zijlstra2019-01-212-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only guarantee provided by wake_q_add() is that a wakeup will happen after it, it does _NOT_ guarantee the wakeup will be delayed until the matching wake_up_q(). If wake_q_add() fails the cmpxchg() a concurrent wakeup is pending and that can happen at any time after the cmpxchg(). This means we should not rely on the wakeup happening at wake_q_up(), but should be ready for wake_q_add() to issue the wakeup. The delay; if provided (most likely); should only result in more efficient behaviour. Reported-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/wait: Fix rcuwait_wake_up() orderingPrateek Sood2019-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some peculiar reason rcuwait_wake_up() has the right barrier in the comment, but not in the code. This mistake has been observed to cause a deadlock in the following situation: P1 P2 percpu_up_read() percpu_down_write() rcu_sync_is_idle() // false rcu_sync_enter() ... __percpu_up_read() [S] ,- __this_cpu_dec(*sem->read_count) | smp_rmb(); [L] | task = rcu_dereference(w->task) // NULL | | [S] w->task = current | smp_mb(); | [L] readers_active_check() // fail `-> <store happens here> Where the smp_rmb() (obviously) fails to constrain the store. [ peterz: Added changelog. ] Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8f95c90ceb54 ("sched/wait, RCU: Introduce rcuwait machinery") Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2019-01-277-16/+20
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A small set of fixes for the interrupt subsystem: - Fix a double increment in the irq descriptor allocator which resulted in a sanity check only being done for every second affinity mask - Add a missing device tree translation in the stm32-exti driver. Without that the interrupt association is completely wrong. - Initialize the mutex in the GIC-V3 MBI driver - Fix the alignment for aliasing devices in the GIC-V3-ITS driver so multi MSI allocations work correctly - Ensure that the initial affinity of a interrupt is not empty at startup time. - Drop bogus include in the madera irq chip driver - Fix KernelDoc regression" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size genirq/irqdesc: Fix double increment in alloc_descs() genirq: Fix the kerneldoc comment for struct irq_affinity_desc irqchip/madera: Drop GPIO includes irqchip/gic-v3-mbi: Fix uninitialized mbi_lock irqchip/stm32-exti: Add domain translate function genirq: Make sure the initial affinity is not empty
| * \ \ \ \ Merge tag 'irqchip-5.0-2' of ↵Thomas Gleixner2019-01-184-15/+15
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip updates from Marc Zyngier - Add missing DT translation call in stm32-exti - Fix uninitialized mutex in the GICv3 MBI support code - Drop useless GPIO includes from the madera driver - Fix PCI Multi-MSI allocation with aliasing devices on GICv3 ITS
| | * | | | | irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their sizeMarc Zyngier2019-01-181-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way we allocate events works fine in most cases, except when multiple PCI devices share an ITS-visible DevID, and that one of them is trying to use MultiMSI allocation. In that case, our allocation is not guaranteed to be zero-based anymore, and we have to make sure we allocate it on a boundary that is compatible with the PCI Multi-MSI constraints. Fix this by allocating the full region upfront instead of iterating over the number of MSIs. MSI-X are always allocated one by one, so this shouldn't change anything on that front. Fixes: b48ac83d6bbc2 ("irqchip: GICv3: ITS: MSI support") Cc: stable@vger.kernel.org Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | | | irqchip/madera: Drop GPIO includesLinus Walleij2019-01-171-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This irqchip does not use anything GPIO-related so drop the GPIO includes. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | | | irqchip/gic-v3-mbi: Fix uninitialized mbi_lockYang Yingliang2019-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mbi_lock mutex is left uninitialized, so let's use DEFINE_MUTEX to initialize it statically. Fixes: 505287525c24d ("irqchip/gic-v3: Add support for Message Based Interrupts as an MSI controller") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * | | | | irqchip/stm32-exti: Add domain translate functionLoic Pallardy2019-01-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Domain translate function is needed to recover irq configuration parameters from DT node Fixes: 927abfc4461e ("irqchip/stm32: Add stm32mp1 support with hierarchy domain") Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | | genirq/irqdesc: Fix double increment in alloc_descs()Huacai Chen2019-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent rework of alloc_descs() introduced a double increment of the loop counter. As a consequence only every second affinity mask is validated. Remove it. [ tglx: Massaged changelog ] Fixes: c410abbbacb9 ("genirq/affinity: Add is_managed to struct irq_affinity_desc") Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhuacai@gmail.com> Cc: Dou Liyang <douliyangs@gmail.com> Link: https://lkml.kernel.org/r/1547694009-16261-1-git-send-email-chenhc@lemote.com
| * | | | | | genirq: Fix the kerneldoc comment for struct irq_affinity_descJonathan Corbet2019-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent commit added a new field but did not update the kerneldoc comment, leading to this build warning: ./include/linux/interrupt.h:268: warning: Function parameter or member 'is_managed' not described in 'irq_affinity_desc' Add the missing information, making the docs build 0.001% quieter. Fixes: c410abbbacb9 ("genirq/affinity: Add is_managed to struct irq_affinity_desc") Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dou Liyang <douliyangs@gmail.com> Link: https://lkml.kernel.org/r/20190108170432.59bae8a6@lwn.net
| * | | | | | genirq: Make sure the initial affinity is not emptySrinivas Ramana2019-01-151-0/+3
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If all CPUs in the irq_default_affinity mask are offline when an interrupt is initialized then irq_setup_affinity() can set an empty affinity mask for a newly allocated interrupt. Fix this by falling back to cpu_online_mask in case the resulting affinity mask is zero. Signed-off-by: Srinivas Ramana <sramana@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org
* | | | | | Merge tag 'edac_fix_for_5.0' of ↵Linus Torvalds2019-01-271-2/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fix from Borislav Petkov: "Fix persistent register offsets of altera_edac, from Thor Thayer" * tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, altera: Fix S10 persistent register offset
| * | | | | | EDAC, altera: Fix S10 persistent register offsetThor Thayer2019-01-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct the persistent register offset where address and status are stored. Fixes: 08f08bfb7b4c ("EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine") Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: devicetree@vger.kernel.org Cc: dinguyen@kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: robh+dt@kernel.org Cc: stable <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/1548179287-21760-2-git-send-email-thor.thayer@linux.intel.com
* | | | | | | Merge tag 'for-linus-20190127' of git://git.kernel.dk/linux-blockLinus Torvalds2019-01-272-11/+10Star
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block revert from Jens Axboe: "Silly error snuck into a patch from the last series, let's do a revert to avoid a potential use-after-free" * tag 'for-linus-20190127' of git://git.kernel.dk/linux-block: Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"
| * | | | | | | Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"Jens Axboe2019-01-272-11/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't touch a bio after ->make_request_fn(), for all we know it could already have been completed by the time this function returns. This reverts commit 698cef173983b086977e633e46476e0f925ca01e. Reported-by: syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-01-2713-115/+130
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "Quite a few fixes for x86: nested virtualization save/restore, AMD nested virtualization and virtual APIC, 32-bit fixes, an important fix to restore operation on older processors, and a bunch of hyper-v bugfixes. Several are marked stable. There are also fixes for GCC warnings and for a GCC/objtool interaction" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Mark expected switch fall-throughs KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths KVM: selftests: check returned evmcs version range x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function kvm: selftests: Fix region overlap check in kvm_util kvm: vmx: fix some -Wmissing-prototypes warnings KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 svm: Fix AVIC incomplete IPI emulation svm: Add warning message for AVIC IPI invalid target KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error KVM: x86: Fix PV IPIs for 32-bit KVM host x86/kvm/hyper-v: recommend using eVMCS only when it is enabled x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR kvm: x86/vmx: Use kzalloc for cached_vmcs12 KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL KVM: x86: Fix single-step debugging x86/kvm/hyper-v: don't announce GUEST IDLE MSR support
| * | | | | | | | KVM: x86: Mark expected switch fall-throughsGustavo A. R. Silva2019-01-256-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search pathsMasahiro Yamada2019-01-252-5/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The header search path -I. in kernel Makefiles is very suspicious; it allows the compiler to search for headers in the top of $(srctree), where obviously no header file exists. The reason of having -I. here is to make the incorrectly set TRACE_INCLUDE_PATH working. As the comment block in include/trace/define_trace.h says, TRACE_INCLUDE_PATH should be a relative path to the define_trace.h Fix the TRACE_INCLUDE_PATH, and remove the iffy include paths. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: selftests: check returned evmcs version rangeVitaly Kuznetsov2019-01-251-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check that KVM_CAP_HYPERV_ENLIGHTENED_VMCS returns correct version range. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectlyVitaly Kuznetsov2019-01-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e2e871ab2f02 ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper") broke EVMCS enablement: to set vmcs_version we now call nested_get_evmcs_version() but this function checks enlightened_vmcs_enabled flag which is not yet set so we end up returning zero. Fix the issue by re-arranging things in nested_enable_evmcs(). Fixes: e2e871ab2f02 ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper functionSean Christopherson2019-01-251-66/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...along with the function's STACK_FRAME_NON_STANDARD tag. Moving the asm blob results in a significantly smaller amount of code that is marked with STACK_FRAME_NON_STANDARD, which makes it far less likely that gcc will split the function and trigger a spurious objtool warning. As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows the bulk of code to be properly checked by objtool. Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually save/restore the host's RBP and load the guest's RBP prior to calling vmx_vmenter(). Modifying %rbp triggers objtool's stack validation code, and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's impossible to avoid modifying %rbp. Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will split into separate functions, e.g. so that pieces of the function can be inlined. Splitting the function means that the compiled Elf file will contain one or more vmx_vcpu_run.part.* functions in addition to a vmx_vcpu_run function. Depending on where the function is split, objtool may warn about a "call without frame pointer save/setup" in vmx_vcpu_run.part.* since objtool's stack validation looks for exact names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD. Up until recently, the undesirable function splitting was effectively blocked because vmx_vcpu_run() was tagged with __noclone. At the time, __noclone had an unintended side effect that put vmx_vcpu_run() into a separate optimization unit, which in turn prevented gcc from inlining the function (or any of its own function calls) and thus eliminated gcc's motivation to split the function. Removing the __noclone attribute allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning. Kudos to Qian Cai for root causing that the fnsplit optimization is what caused objtool to complain. Fixes: 453eafbe65f7 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines") Tested-by: Qian Cai <cai@lca.pw> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>