| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is a shared page used for paravirtualization. It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.
The physical address specified by the hypercall is not used, as
e500 does not have real mode.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them). Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.
Adjust for this by also setting NX on the spte in these circumstances.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The documented behavior did not match the implemented one (which also
never changed).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Neither host_irq nor the guest_msi struct are used anymore today.
Tag the former, drop the latter to avoid confusion.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Document KVM_IOEVENTFD that can be used to receive
notifications of PIO/MMIO events without triggering
an exit.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.
[marcelo: move to Documentation/virtual/kvm]
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
| |
| |
| |
| |
| |
| | |
Also removes a long-unused #define and an extraneous semicolon.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We used to notify the Host every time we updated a device's status. However,
it only really needs to know when we're resetting the device, or failed to
initialize it, or when we've finished our feature negotiation.
In particular, we used to wait for VIRTIO_CONFIG_S_DRIVER_OK in the
status byte before starting the device service threads. But this
corresponds to the successful finish of device initialization, which
might (like virtio_blk's partition scanning) use the device. So we
had a hack, if they used the device before we expected we started the
threads anyway.
Now we hook into the finalize_features hook in the Guest: at that
point we tell the Launcher that it can rely on the features we have
acked. On the Launcher side, we look at the status at that point, and
start servicing the device.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not exit on some non-fatal errors:
- writev() fails in net_output(). The result is a lost packet or packets.
- writev() fails in console_output(). The result is partially lost console
output.
- readv() fails in net_input(). The result is a lost packet or packets.
Rather than bringing the guest down, this patch ignores e.g. an allocation
failure on the host side. Example:
lguest: page allocation failure. order:4, mode:0x4d0
Pid: 4045, comm: lguest Tainted: G W 2.6.36 #1
Call Trace:
[<c138d614>] ? printk+0x18/0x1c
[<c106a4e2>] __alloc_pages_nodemask+0x4d2/0x570
[<c1087954>] cache_alloc_refill+0x2a4/0x4d0
[<c1305149>] ? __netif_receive_skb+0x189/0x270
[<c1087c5a>] __kmalloc+0xda/0xf0
[<c12fffa5>] __alloc_skb+0x55/0x100
[<c1305519>] ? net_rx_action+0x79/0x100
[<c12fafed>] sock_alloc_send_pskb+0x18d/0x280
[<c11fda25>] ? _copy_from_user+0x35/0x130
[<c13010b6>] ? memcpy_fromiovecend+0x56/0x80
[<c12a74dc>] tun_chr_aio_write+0x1cc/0x500
[<c108a125>] do_sync_readv_writev+0x95/0xd0
[<c11fda25>] ? _copy_from_user+0x35/0x130
[<c1089fa8>] ? rw_copy_check_uvector+0x58/0x100
[<c108a7bc>] do_readv_writev+0x9c/0x1d0
[<c12a7310>] ? tun_chr_aio_write+0x0/0x500
[<c108a93a>] vfs_writev+0x4a/0x60
[<c108aa21>] sys_writev+0x41/0x80
[<c138f061>] syscall_call+0x7/0xb
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
HighMem per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
active_anon:134651 inactive_anon:50543 isolated_anon:0
active_file:96881 inactive_file:132007 isolated_file:0
unevictable:0 dirty:3 writeback:0 unstable:0
free:91374 slab_reclaimable:6300 slab_unreclaimable:2802
mapped:2281 shmem:9 pagetables:330 bounce:0
DMA free:3524kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:8kB active_file:8760kB inactive_file:2760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:88kB slab_unreclaimable:148kB kernel_stack:40kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 865 2016 2016
Normal free:150100kB min:3728kB low:4660kB high:5592kB active_anon:6224kB inactive_anon:15772kB active_file:324084kB inactive_file:325944kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:12kB writeback:0kB mapped:1520kB shmem:0kB slab_reclaimable:25112kB slab_unreclaimable:11060kB kernel_stack:1888kB pagetables:1320kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 9207 9207
HighMem free:211872kB min:512kB low:1752kB high:2992kB active_anon:532380kB inactive_anon:186392kB active_file:54680kB inactive_file:199324kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1178504kB mlocked:0kB dirty:0kB writeback:0kB mapped:7588kB shmem:36kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 65*8kB 35*16kB 18*32kB 11*64kB 9*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3524kB
Normal: 35981*4kB 344*8kB 158*16kB 28*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 150100kB
HighMem: 5732*4kB 5462*8kB 2826*16kB 1598*32kB 84*64kB 10*128kB 7*256kB 1*512kB 1*1024kB 1*2048kB 9*4096kB = 211872kB
231237 total pagecache pages
2340 pages in swap cache
Swap cache stats: add 160060, delete 157720, find 189017/194106
Free swap = 4179840kB
Total swap = 4194300kB
524271 pages RAM
296946 pages HighMem
5668 pages reserved
867664 pages shared
82155 pages non-shared
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
|
|
|
|
| |
No virtio device does this any more, so no need to clutter lguest with it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
|
|
|
|
|
| |
ed16648eb5b86917f0b90bdcdbc857202da72f90 "Move kvm, uml, and lguest
subdirectories" broke the lguest example launcher.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ucast transport is similar to the mcast transport (and, in fact,
shares most of its code), only it uses UDP unicast to move packets.
Obviously this is only useful for point-to-point connections between
virtual ethernet devices.
Signed-off-by: Nolan Leake <nolan@cumulusnetworks.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* 'kvm-updates/2.6.40' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (131 commits)
KVM: MMU: Use ptep_user for cmpxchg_gpte()
KVM: Fix kvm mmu_notifier initialization order
KVM: Add documentation for KVM_CAP_NR_VCPUS
KVM: make guest mode entry to be rcu quiescent state
KVM: x86 emulator: Make jmp far emulation into a separate function
KVM: x86 emulator: Rename emulate_grpX() to em_grpX()
KVM: x86 emulator: Remove unused arg from emulate_pop()
KVM: x86 emulator: Remove unused arg from writeback()
KVM: x86 emulator: Remove unused arg from read_descriptor()
KVM: x86 emulator: Remove unused arg from seg_override()
KVM: Validate userspace_addr of memslot when registered
KVM: MMU: Clean up gpte reading with copy_from_user()
KVM: PPC: booke: add sregs support
KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0)
KVM: PPC: use ticks, not usecs, for exit timing
KVM: PPC: fix exit accounting for SPRs, tlbwe, tlbsx
KVM: PPC: e500: emulate SVR
KVM: VMX: Cache vmcs segment fields
KVM: x86 emulator: consolidate segment accessors
KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
...
|
|
|
|
|
|
|
|
|
|
| |
- Documentation/kvm/ to Documentation/virtual/kvm
- Documentation/uml/ to Documentation/virtual/uml
- Documentation/lguest/ to Documentation/virtual/lguest
throughout the kernel source tree.
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
|
|
|
|
|
|
|
| |
Remove uml from the top level 00-INDEX file.
Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
|
|
I.E:
cd Documentation
mkdir virtual
git mv kvm uml lguest virtual
Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
|