1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
|
PCI EXPRESS GUIDELINES
======================
1. Introduction
================
The doc proposes best practices on how to use PCI Express/PCI device
in PCI Express based machines and explains the reasoning behind them.
The following presentations accompany this document:
(1) Q35 overview.
http://wiki.qemu.org/images/4/4e/Q35.pdf
(2) A comparison between PCI and PCI Express technologies.
http://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf
Note: The usage examples are not intended to replace the full
documentation, please use QEMU help to retrieve all options.
2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
is weird anyway since it cannot be done for "bare metal".
Plugging a PCI Express device into a PCI slot will hide the Extended
Configuration Space thus is also not recommended.
The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream ports.
2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
(1) PCI Devices (e.g. network card, graphics card, IDE controller),
not controllers. Place only legacy PCI devices on
the Root Complex. These will be considered Integrated Endpoints.
Note: Integrated Endpoints are not hot-pluggable.
Although the PCI Express spec does not forbid PCI Express devices as
Integrated Endpoints, existing hardware mostly integrates legacy PCI
devices with the Root Complex. Guest OSes are suspected to behave
strangely when PCI Express devices are integrated
with the Root Complex.
(2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
hierarchies.
(3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI
hierarchies.
(4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
are needed.
pcie.0 bus
----------------------------------------------------------------------------
| | | |
----------- ------------------ ------------------ --------------
| PCI Dev | | PCIe Root Port | | DMI-PCI Bridge | | pxb-pcie |
----------- ------------------ ------------------ --------------
2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
-device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
-device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
Only PCI Express Root Ports and DMI-PCI bridges can be connected
to the pcie.1 bus:
-device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
-device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.
A PCI Express Root bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is 256.
Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
Please see section 4. for further justifications.
Plug only PCI Express devices into PCI Express Ports.
pcie.0 bus
----------------------------------------------------------------------------------
| | |
------------- ------------- -------------
| Root Port | | Root Port | | Root Port |
------------ ------------- -------------
| -------------------------|------------------------
------------ | ----------------- |
| PCIe Dev | | PCI Express | Upstream Port | |
------------ | Switch ----------------- |
| | | |
| ------------------- ------------------- |
| | Downstream Port | | Downstream Port | |
| ------------------- ------------------- |
-------------|-----------------------|------------
------------
| PCIe Dev |
------------
2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
-device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
-device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
-device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
-device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
-device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] \
2.2.3 Plugging a PCI Express device into a Switch:
-device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
-device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
-device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]] \
-device <dev>,bus=downstream_port1
Notes:
- (slot, chassis) pair is mandatory and must be unique for each
PCI Express Root Port. slot defaults to 0 when not specified.
- 'addr' parameter can be 0 for all the examples above.
2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as mentioned in section 5, doing so means the legacy PCI
device in question will be incapable of hot-unplugging.
Besides that use DMI-PCI Bridges (i82801b11-bridge) in combination
with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each supporting also 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge
until is full and then plug a new PCI-PCI Bridge...
pcie.0 bus
----------------------------------------------
| |
----------- ------------------
| PCI Dev | | DMI-PCI BRIDGE |
---------- ------------------
| |
------------------ ------------------
| PCI-PCI Bridge | | PCI-PCI Bridge | ...
------------------ ------------------
| |
----------- -----------
| PCI Dev | | PCI Dev |
----------- -----------
2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
-device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
-device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0] \
-device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y] \
-device <dev>,bus=pci_bridge1[,addr=x]
Note that 'addr' cannot be 0 unless shpc=off parameter is passed to
the PCI Bridge.
3. IO space issues
===================
The PCI Express Root Ports and PCI Express Downstream ports are seen by
Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
such Port should be reserved a 4K IO range for, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.
The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream port if:
(1) the port is empty, or
(2) the device behind the port has no IO BARs.
The IO space is very limited, to 65536 byte-wide IO ports, and may even be
fragmented by fixed IO ports owned by platform devices resulting in at most
10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. Using the
proposed device placing strategy solves this issue by using only
PCI Express devices within PCI Express hierarchy.
The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.
4. Bus numbers issues
======================
Each PCI domain can have up to only 256 buses and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.
Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.
Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) enables the hierarchy to not spend bus numbers on
Upstream Ports.
The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.
5. Hot-plug
============
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
(1) PCI Express Integrated Endpoints
(2) PCI Express Root Ports
(3) DMI-PCI Bridges
(4) pxb-pcie
Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.
PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI
based and can work side by side with the PCI Express native hot-plug.
PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).
5.1 Planning for hot-plug:
(1) PCI hierarchy
Leave enough PCI-PCI Bridge slots empty or add one
or more empty PCI-PCI Bridges to the DMI-PCI Bridge.
For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
4K IO space and 2M MMIO range to be used for all devices behind it.
Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
per system don't use more than 9 PCI-PCI Bridges, leaving 4K for the
Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).
(2) PCI Express hierarchy:
Leave enough PCI Express Root Ports empty. Use multifunction
PCI Express Root Ports (up to 8 ports per pcie.0 slot)
on the Root Complex(es), for keeping the
hierarchy as flat as possible, thereby saving PCI bus numbers.
Don't use PCI Express Switches if you don't have
to, each one of those uses an extra PCI bus (for its Upstream Port)
that could be put to better use with another Root Port or Downstream
Port, which may come handy for hot-plugging another device.
5.3 Hot-plug example:
Using HMP: (add -monitor stdio to QEMU command line)
device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id/>
6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.
6.1 How to detect if a device is PCI Express:
> lspci -s 03:00.0 -v (as root)
03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
Subsystem: Intel Corporation Dual Band Wireless-AC 7260
Flags: bus master, fast devsel, latency 0, IRQ 50
Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [40] Express Endpoint, MSI 00
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
Capabilities: [14c] Latency Tolerance Reporting
Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014
If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.
7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behaviour as default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.
Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behavior by default without IO support.
In both cases disable-legacy and disable-modern properties can be used
to override the behaviour.
Note that setting disable-legacy=off will enable legacy mode (enabling
legacy behavior) for PCI Express virtio devices causing them to
require IO space, which, given the limited available IO space, may quickly
lead to resource exhaustion, and is therefore strongly discouraged.
8. Conclusion
==============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.
|