| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
Some past security reviews carried out for UEFI Secure Boot signing
submissions have covered specific drivers or functional areas of iPXE.
Mark all of the files comprising these areas as permitted for UEFI
Secure Boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Mark all files used in a standard build of bin-x86_64-efi/snponly.efi
as permitted for UEFI Secure Boot. These files represent the core
functionality of iPXE that is guaranteed to have been included in
every binary that was previously subject to a security review and
signed by Microsoft. It is therefore legitimate to assume that at
least these files have already been reviewed to the required standard
multiple times.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The third-party 802.11 stack and NFS protocol code are known to
include multiple potential vulnerabilities and are explicitly
forbidden from being included in Secure Boot signed builds. This is
currently handled at the per-directory level by defining a list of
source directories (SRCDIRS_INSEC) that are to be excluded from Secure
Boot builds.
Annotate all files in these directories with FILE_SECBOOT() to convey
this information to the new per-file Secure Boot permissibility check,
and remove the old separation between SRCDIRS and SRCDIRS_INSEC.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
| |
Signed-off-by: Christian I. Nilsson <nikize@gmail.com>
|
| |
|
|
| |
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
| |
Identifiers are taken from the pci.ids database.
Signed-off-by: Bert Ezendam <bert.ezendam@alliander.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The queue base address is meaningless for a low latency queue, since
the queue entries are written directly to the on-device memory. Any
non-zero queue base address will be safely ignored by the hardware,
but leaves open the possibility that future revisions could treat it
as an error.
Leave this field as zero, to match the behaviour of the Linux driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a801244 ("[ena] Increase receive ring size to 128 entries")
increased the receive ring size to 128 entries (while leaving the fill
level at 16), since using a smaller receive ring caused unexplained
failures on some instance types.
The original hardware bug that resulted in that commit seems to have
been fixed: experiments suggest that the original failure (observed on
a c6i.large instance in eu-west-2) will no longer reproduce when using
a receive ring containing only 16 entries (as was the case prior to
that commit).
Newer generations of the ENA hardware (observed on an m8i.large
instance in eu-south-2) seem to have a new and exciting hardware bug:
these instance types appear to use a hash of the received packet
header to determine which portion of the (out-of-order) receive ring
to use. If that portion of the ring happens to be empty (e.g. because
only 32 entries of the 128-entry ring are filled at any one time),
then the packet will be silently dropped.
Work around this new hardware bug by reducing the receive ring size
down to the current fill level of 32 entries. This appears to work on
all current instance types (but has not been exhaustively tested).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Avoid running out of transmit descriptors when sending TCP ACKs by
increasing the transmit queue size to match the increased received
fill level.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Ensure that writes to on-device memory have taken place before writing
to the doorbell register.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Experiments suggest that at least some instance types (observed with
c6i.large in eu-west-2) experience high packet drop rates with only 16
receive buffers allocated. Increase the fill level to 32 buffers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Newer generations of the ENA hardware require the use of low latency
transmit queues, where the submission queues and the initial portion
of the transmitted packet are written to on-device memory via BAR2
instead of being read from host memory.
Detect support for low latency queues and set the placement policy
appropriately. We attempt the use of low latency queues only if the
device reports that it supports inline headers, 128-byte entries, and
two descriptors prior to the inlined header, on the basis that we
don't care about using low latency queues on older versions of the
hardware since those versions will support normal host memory
submission queues anyway.
We reuse the redundant memory allocated for the submission queue as
the bounce buffer for constructing the descriptors and inlined packet
data, since this avoids needing a separate allocation just for the
bounce buffer.
We construct a metadata submission queue entry prior to the actual
submission queue entry, since experimentation suggests that newer
generations of the hardware require this to be present even though it
conveys no information beyond its own existence.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
| |
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Avoid spurious assertion failures by ensuring that references to
uncompleted transmit buffers are not retained after the device has
been closed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Newer generations of the ENA hardware require the use of low latency
transmit queues, where the submission queues and the initial portion
of the transmitted packet are written to on-device memory via BAR2
instead of being read from host memory.
Prepare for this by mapping the on-device memory BAR. As with the
register BAR, we may need to steal a base address from the upstream
PCI bridge since the BIOS on some instance types (observed with an
m8i.metal-48xl instance in eu-south-2) will fail to assign an address
to the device.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
| |
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Use pci_bar_set() when we need to set a device base address (on
instance types such as c6i.metal where the BIOS fails to do so), so
that 64-bit BARs will be handled automatically.
This particular issue has so far been observed only on 6th generation
instances. These use 32-bit BARs, and so the lack of support for
handling 64-bit BARs has not caused any observable issue.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Experimentation suggests that rearming the interrupt once per observed
completion is not sufficient: we still see occasional delays during
which the hardware fails to write out completions.
As described in commit d2e1e59 ("[gve] Use dummy interrupt to trigger
completion writeback in DQO mode"), there is no documentation around
the precise semantics of the interrupt rearming mechanism, and so
experimentation is the only available guide. Switch to rearming both
TX and RX interrupts unconditionally on every poll, since this
produces better experimental results.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
| |
The DQO-QPL operating mode uses registered queue page lists but still
requires the raw DMA address (rather than the linear offset within the
QPL) to be provided in transmit and receive descriptors.
Set the queue page list base device address appropriately.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware reports descriptor and packet completions separately for
the transmit ring. We currently ignore descriptor completions (since
we cannot free up the transmit buffers in the queue page list and
advance the consumer counter until the packet has also completed).
Now that transmit completions are written out immediately (instead of
being delayed until 128 bytes of completions are available), there is
no value in retaining the descriptor completions.
Omit descriptor completions entirely, and reduce the transmit fill
level back down to its original value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When operating in the DQO operating mode, the device will defer
writing transmit and receive completions until an entire internal
cacheline (128 bytes) is full, or until an associated interrupt is
asserted. Since each receive descriptor is 32 bytes, this will cause
received packets to be effectively delayed until up to three further
packets have arrived. When network traffic volumes are very low (such
as during DHCP, DNS lookups, or TCP handshakes), this typically
induces delays of up to 30 seconds and results in a very poor user
experience.
Work around this hardware problem in the same way as for the Intel
40GbE and 100GbE NICs: by enabling dummy MSI-X interrupts to trick the
hardware into believing that it needs to write out completions to host
memory.
There is no documentation around the interrupt rearming mechanism.
The value written to the interrupt doorbell does not include a
consumer counter value, and so must be relying on some undocumented
ordering constraints. Comments in the Linux driver source suggest
that the authors believe that the device will automatically and
atomically mask an MSI-X interrupt at the point of asserting it, that
any further interrupts arriving before the doorbell is written will be
recorded in the pending bit array, and that writing the doorbell will
therefore immediately assert a new interrupt if needed.
In the absence of any documentation, choose to rearm the interrupt
once per observed completion. This is overkill, but is less impactful
than the alternative of rearming the interrupt unconditionally on
every poll.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Ensure that remainder of completion records are read only after
verifying the generation bit (or sequence number).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Use the default dummy MSI-X target address that is now allocated and
configured automatically by pci_msix_enable().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Select a preferred operating mode from those advertised as supported
by the device, falling back to the oldest known mode (GQI-QPL) if
no modes are advertised.
Since there are devices in existence that support only QPL addressing,
and since we want to minimise code size, we choose to always use a
single fixed ring buffer even when using raw DMA addressing. Having
paid this penalty, we therefore choose to prefer QPL over RDA since
this allows the (virtual) hardware to minimise the number of page
table manipulations required. We similarly prefer GQI over DQO since
this minimises the amount of work we have to do: in particular, the RX
descriptor ring contents can remain untouched for the lifetime of the
device and refills require only a doorbell write.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the "DQO" out-of-order transmit and receive queue
formats. These are almost entirely different in format and usage (and
even endianness) from the original "GQI" in-order transmit and receive
queues, and arguably should belong to a completely different device
with a different PCI ID. However, Google chose to essentially crowbar
two unrelated device models into the same virtual hardware, and so we
must handle both of these device models within the same driver.
Most of the new code exists solely to handle the differences in
descriptor sizes and formats. Out-of-order completions are handled
via a buffer ID ring (as with other devices supporting out-of-order
completions, such as the Xen, Hyper-V, and Amazon virtual NICs). A
slight twist is that on the transmit datapath (but not the receive
datapath) the Google NIC provides only one completion per packet
instead of one completion per descriptor, and so we must record the
list of chained buffer IDs in a separate array at the time of
transmission.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We cancel any pending transmissions when (re)starting the device since
any transmissions that were initiated before the admin queue reset
will not complete.
The network device core will also cancel any pending transmissions
after the device is closed. If the device is closed with some
transmissions still pending and is then reopened, this will therefore
result in a stale I/O buffer being passed to netdev_tx_complete_err()
when the device is restarted.
This error has not been observed in practice since transmissions
generally complete almost immediately and it is therefore unlikely
that the device will ever be closed with transmissions still pending.
With out-of-order queues, the device seems to delay transmit
completions (with no upper time limit) until a complete batch is
available to be written out as a block of 128 bytes. It is therefore
very likely that the device will be closed with transmissions still
pending.
Fix by ensuring that we have dropped all references to transmit I/O
buffers before returning from gve_close().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Handle async events related to link speed change, link speed config
change, and port phy config changes.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
|
|
|
|
|
|
| |
The descriptors and completions in the DQO operating mode are not the
same sizes as the equivalent structures in the GQI operating mode.
Allow the queue stride size to vary by operating mode (and therefore
to be known only after reading the device descriptor and selecting the
operating mode).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Rename data structures and constants that are specific to the GQI
operating mode, to allow for a cleaner separation from other operating
modes.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We currently assume that the buffer index is equal to the descriptor
ring index, which is correct only for in-order queues.
Out-of-order queues will include a buffer tag value that is copied
from the descriptor to the completion. Redefine the data buffers as
being indexed by this tag value (rather than by the descriptor ring
index), and add a circular ring buffer to allow for tags to be reused
in whatever order they are released by the hardware.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Raw DMA addressing allows the transmit and receive descriptors to
provide the DMA address of the data buffer directly, without requiring
the use of a pre-registered queue page list. It is modelled in the
device as a magic "raw DMA" queue page list (with QPL ID 0xffffffff)
covering the whole of the DMA address space.
When using raw DMA addressing, the transmit and receive datapaths
could use the normal pattern of mapping I/O buffers directly, and
avoid copying packet data into and out of the fixed queue page list
ring buffer. However, since we must retain support for queue page
list addressing (which requires this additional copying), we choose to
minimise code size by continuing to use the fixed ring buffer even
when using raw DMA addressing.
Add support for using raw DMA addressing by setting the queue page
list base device address appropriately, omitting the commands to
register and unregister the queue page lists, and specifying the raw
DMA QPL ID when creating the TX and RX queues.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Allow for the existence of a queue page list where the base device
address is non-zero, as will be the case for the raw DMA addressing
(RDA) operating mode.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The "create TX queue" and "create RX queue" commands have fields for
the descriptor and completion ring sizes, which are currently left
unpopulated since they are not required for the original GQI-QPL
operating mode.
Populate these fields, and allow for the possibility that a transmit
completion ring exists (which will be the case when using the DQO
operating mode).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GVE family supports two incompatible descriptor queue formats:
* GQI: in-order descriptor queues
* DQO: out-of-order descriptor queues
and two addressing modes:
* QPL: pre-registered queue page list addressing
* RDA: raw DMA addressing
All four combinations (GQI-QPL, GQI-RDA, DQO-QPL, and DQO-RDA) are
theoretically supported by the Linux driver, which is essentially the
only public reference provided by Google. The original versions of
the GVE NIC supported only GQI-QPL mode, and so the iPXE driver is
written to target this mode, on the assumption that it would continue
to be supported by all models of the GVE NIC.
This assumption turns out to be incorrect: Google does not deem it
necessary to retain backwards compatibility. Some newer machine types
(such as a4-highgpu-8g) support only the DQO-RDA operating mode.
Add a definition of operating mode, and pass this as an explicit
parameter to the "configure device resources" admin queue command. We
choose a representation that subtracts one from the value passed in
this command, since this happens to allow us to decompose the mode
into two independent bits (one representing the use of DQO descriptor
format, one representing the use of QPL addressing).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
The Linux driver occasionally uses the terminology "packet descriptor"
to refer to the portion of the descriptor excluding the buffer
address. This is not a helpful separation, and merely adds
complexity.
Simplify the code by removing this artifical separation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
| |
Provide space for the device to return its list of supported options.
Parse the option list and record the existence of each option in a
support bitmask.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
| |
Add support to advertise adapter error recovery support to the
firmware. Implement error recovery operations if adapter fault is
detected. Refactor memory allocation to better align with probe and
open functions.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The chainloaded-device-only "snponly" driver already drags in support
for driving SNP, NII, and MNP devices, on the basis that the user
generally doesn't care which UEFI API is used and just wants to boot
from the same network device that was used to load iPXE.
The multi-device "snp" driver already drags in support for driving SNP
and NII devices, but does not drag in support for MNP devices.
There is essentially zero code size overhead to dragging in support
for MNP devices, since this support is always present in any iPXE
application build anyway (as part of the code to download
"autoexec.ipxe" prior to installing our own drivers).
Minimise surprise by dragging in support for MNP devices whenever
using the "snp" driver, following the same reasoning used for the
"snponly" driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Update completion queue doorbell to a non-arming type, since polling
is used.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
|
|
|
|
| |
Read and display the core version immediately after mapping the MMIO
registers, to provide a basic sanity check that the registers have
been correctly mapped and the core is not held in reset.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
| |
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Remove unnecessary driver specific macros. Use standard
pci_read_config_xxxx, pci_write_config_xxx, writel/q calls.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
| |
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The UEFI model for wireless network boot cannot sensibly be described
without cursing. Commit 758a504 ("[efi] Inhibit calls to Shutdown()
for wireless SNP devices") attempts to work around some of the known
issues.
Experimentation shows that on at least some platforms (observed with a
Lenovo ThinkPad T14s Gen 5) the vendor SNP driver is broken to the
point of being unusable in anything other than the single use case
envisioned by the firwmare authors. Doing almost anything directly
via the SNP protocol interface has a greater than 50% chance of
locking up the system.
Assume, in the absence of any evidence to the contrary so far, that
vendor SNP drivers for wireless network devices are so badly written
as to be unusable. Refuse to even attempt to interact with these
drivers via the SNP or NII protocol interfaces.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Add a basic driver for the DesignWare Ethernet MAC network interface
as found in the Lichee Pi 4A.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The data cache must be invalidated twice for RX DMA buffers: once
before passing ownership to the DMA device (in case the cache happens
to contain dirty data that will be written back at an undefined future
point), and once after receiving ownership from the DMA device (in
case the CPU happens to have speculatively accessed data in the buffer
while it was owned by the hardware).
Only the used portion of the buffer needs to be invalidated after
completion, since we do not care about data within the unused portion.
Update the DMA API to include the used length as an additional
parameter to dma_unmap(), and add the necessary second cache
invalidation pass to the RISC-V DMA API implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
|
|
|
| |
Cache management operations must generally be performed on virtual
addresses rather than physical addresses.
Change the address parameter in dma_map() to be a virtual address, and
make dma() the API-level primitive instead of dma_phys().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
|
| |
|
|
|
|
|
| |
Add support for new device IDs. Remove device IDs which were never
in use.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
|
|
| |
Use human readable strings for dev_description in PCI_ROM array.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|
| |
|
|
|
|
|
|
| |
Remove logic that programs the hardware to strip out VLAN from RX
packets. Do not drop packets due to VLAN mismatch and allow the upper
layer to decide whether to discard the packets.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
|