path: root/src/include/ipxe/tcp.h
Commit message | Author | Age | Files | Lines
* [tcp] Add missing packed attribute on struct tcp_header | Michael Brown | 2018-04-19 | 1 | -1/+1

Debugged-by: Mark Rutland <mark.rutland@arm.com>
Debugged-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
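A minimal illustration of what the packed attribute changes, assuming GCC or Clang (`__attribute__ (( packed ))` is a compiler extension). Without it, the compiler may assume that the structure is naturally aligned and emit multi-byte loads that can fault on some CPUs and firmware configurations when a header starts at an odd offset inside a packet buffer. `example_tcp_header` and `example_dest_port()` are hypothetical names; the layout follows the fixed 20-byte header of RFC 793 and is not copied from iPXE's tcp.h.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the fixed 20-byte TCP header (RFC 793).
 * Multi-byte fields are in network byte order. */
struct example_tcp_header {
	uint16_t src;   /* Source port */
	uint16_t dest;  /* Destination port */
	uint32_t seq;   /* Sequence number */
	uint32_t ack;   /* Acknowledgement number */
	uint8_t hlen;   /* Data offset (upper 4 bits) */
	uint8_t flags;  /* Control flags */
	uint16_t win;   /* Window */
	uint16_t csum;  /* Checksum */
	uint16_t urg;   /* Urgent pointer */
} __attribute__ (( packed ));

/* Packing does not change the size here (the fields are already
 * naturally aligned), but it lowers the required alignment to one
 * byte, so field accesses are safe at any address. */
_Static_assert ( sizeof ( struct example_tcp_header ) == 20,
		 "TCP header is 20 bytes" );
_Static_assert ( alignof ( struct example_tcp_header ) == 1,
		 "packed struct has byte alignment" );

/* Read the destination port from a possibly misaligned header */
static uint16_t example_dest_port ( const void *data ) {
	const struct example_tcp_header *tcphdr = data;

	return ntohs ( tcphdr->dest );
}
```

For example, calling `example_dest_port ( &buf[1] )` on a byte buffer is well defined because the packed structure only requires byte alignment.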
* [tcp] Send TCP keepalives on idle established connections | Michael Brown | 2016-06-13 | 1 | -0/+8

In some circumstances, intermediate devices may lose state in a way that temporarily prevents the successful delivery of packets from a TCP peer. For example, a firewall may drop a NAT forwarding table entry.

Since iPXE spends most of its time downloading files (and hence purely receiving data, sending only TCP ACKs), this can easily happen in a situation in which there is no reason for iPXE's TCP stack to generate any retransmissions. The temporary loss of connectivity can therefore effectively become permanent.

Work around this problem by sending TCP keepalives after a period of inactivity on an established connection.

TCP keepalives usually send a single garbage byte in sequence number space that has already been ACKed by the peer. Since we do not need to elicit a response from the peer, we instead send pure ACKs (with no garbage data) in order to keep the transmit code path simple.

Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
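The keepalive policy described above can be sketched as two small pieces: a check for "established and idle for long enough", and construction of a pure ACK. This is an illustration under assumptions, not iPXE's code: the connection and segment structures, the state names and the 15-second idle interval are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle interval; not necessarily the value iPXE uses */
#define EXAMPLE_KEEPALIVE_INTERVAL_MS 15000
#define EXAMPLE_TCP_ACK 0x10

enum example_state { EXAMPLE_CLOSED, EXAMPLE_SYN_SENT, EXAMPLE_ESTABLISHED };

struct example_conn {
	enum example_state state;
	uint32_t last_activity_ms; /* Time of last send or receive */
	uint32_t snd_nxt;          /* Next sequence number to send */
	uint32_t rcv_nxt;          /* Next sequence number expected */
};

struct example_segment {
	uint32_t seq;
	uint32_t ack;
	uint8_t flags;
	size_t len;                /* Payload length */
};

/* A keepalive is due only on an established connection that has been
 * idle for at least the interval.  Unsigned subtraction tolerates
 * wraparound of the millisecond counter. */
static bool example_keepalive_due ( const struct example_conn *conn,
				    uint32_t now_ms ) {
	return ( ( conn->state == EXAMPLE_ESTABLISHED ) &&
		 ( ( uint32_t ) ( now_ms - conn->last_activity_ms ) >=
		   EXAMPLE_KEEPALIVE_INTERVAL_MS ) );
}

/* Build a pure ACK: no payload, sequence number at snd_nxt.  A
 * classic keepalive would instead carry one garbage byte at
 * snd_nxt - 1 in order to force a response from the peer. */
static struct example_segment
example_keepalive ( const struct example_conn *conn ) {
	struct example_segment seg = {
		.seq = conn->snd_nxt,
		.ack = conn->rcv_nxt,
		.flags = EXAMPLE_TCP_ACK,
		.len = 0,
	};
	return seg;
}
```

Sending a pure ACK keeps any intermediate state (such as a NAT entry) refreshed without needing a special transmit path for out-of-window data.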
* [tcp] Guard against malformed TCP options | Michael Brown | 2016-01-28 | 1 | -2/+0

Signed-off-by: Michael Brown <mcb30@ipxe.org>
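The commit does not describe its checks in detail, so the following is a generic sketch of defensive TCP option parsing rather than iPXE's fix. Apart from End-of-Option-List (kind 0) and No-Operation (kind 1), every option carries a length byte that covers its own kind and length bytes. A parser must therefore reject a length below 2 (a zero length would loop forever) and any option that overruns the options area. `example_parse_options()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TCPOPT_END 0
#define EXAMPLE_TCPOPT_NOP 1
#define EXAMPLE_TCPOPT_MSS 2

/* Walk len bytes of TCP options.  Returns 0 on success (setting *mss
 * if an MSS option is present) or -1 if the options are malformed. */
static int example_parse_options ( const uint8_t *opts, size_t len,
				   uint16_t *mss ) {
	size_t pos = 0;

	while ( pos < len ) {
		uint8_t kind = opts[pos];
		uint8_t optlen;

		if ( kind == EXAMPLE_TCPOPT_END )
			break;
		if ( kind == EXAMPLE_TCPOPT_NOP ) {
			pos++;
			continue;
		}
		/* Every other option needs a length byte... */
		if ( ( len - pos ) < 2 )
			return -1;
		optlen = opts[pos + 1];
		/* ...covering kind and length, within the options area */
		if ( ( optlen < 2 ) || ( optlen > ( len - pos ) ) )
			return -1;
		if ( kind == EXAMPLE_TCPOPT_MSS ) {
			if ( optlen != 4 )
				return -1;
			*mss = ( uint16_t ) ( ( opts[pos + 2] << 8 ) |
					      opts[pos + 3] );
		}
		pos += optlen;
	}
	return 0;
}
```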
* [tcp] Gracefully close connections during shutdown | Michael Brown | 2015-07-04 | 1 | -0/+7

We currently do not wait for a received FIN before exiting to boot a loaded OS. In the common case of booting from an HTTP server, this means that the TCP connection is left consuming resources on the server side: the server will retransmit the FIN several times before giving up.

Fix by initiating a graceful close of all TCP connections and waiting (for up to one second) for all connections to finish closing gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and for the incoming FIN to have been received and ACKed at least once).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
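The "finished closing gracefully" condition above combines four events. A sketch of that predicate, using a hypothetical set of per-connection flags rather than iPXE's actual connection state, could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection close progress */
struct example_close_state {
	bool fin_sent;     /* Our FIN has been transmitted */
	bool fin_acked;    /* The peer has ACKed our FIN */
	bool fin_received; /* The peer's FIN has arrived */
	bool fin_ack_sent; /* We have ACKed the peer's FIN at least once */
};

/* Both directions must be fully closed */
static bool example_closed_gracefully ( const struct example_close_state *s ) {
	return ( s->fin_sent && s->fin_acked &&
		 s->fin_received && s->fin_ack_sent );
}

/* True once every connection has closed gracefully */
static bool example_all_closed ( const struct example_close_state *conns,
				 size_t count ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( ! example_closed_gracefully ( &conns[i] ) )
			return false;
	}
	return true;
}
```

A shutdown routine would then keep polling the network stack until `example_all_closed()` returns true or one second has elapsed, whichever comes first.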
* [tcp] Implement support for TCP Selective Acknowledgements (SACK) | Michael Brown | 2015-03-12 | 1 | -0/+44

The TCP Selective Acknowledgement option (specified in RFC2018) provides a mechanism for the receiver to indicate packets that have been received out of order (e.g. due to earlier dropped packets).

iPXE often operates in environments in which there is a high probability of packet loss. For example, the legacy USB keyboard emulation in some BIOSes involves polling the USB bus from within a system management interrupt: this introduces an invisible delay of around 500us which is long enough for around 40 full-length packets to be dropped. Similarly, almost all 1Gbps USB2 devices will eventually end up dropping packets because the USB2 bus does not provide enough bandwidth to sustain a 1Gbps stream, and most devices will not provide enough internal buffering to hold a full TCP window's worth of received packets.

Add support for sending TCP Selective Acknowledgements. This provides the sender with more detailed information about which packets have been lost, and so allows for a more efficient retransmission strategy.

We include a SACK-permitted option in our SYN packet, since experimentation shows that at least Linux peers will not include a SACK-permitted option in the SYN-ACK packet if one was not present in the initial SYN. (RFC2018 does not seem to mandate this behaviour, but it is consistent with the approach taken in RFC1323.)

We ignore any received SACK options; this is safe to do since SACK is only ever advisory and we never have to send non-trivial amounts of data.

Since our TCP receive queue is a candidate for cache discarding under low memory conditions, we may end up discarding data that has been reported as received via a SACK option. This is permitted by RFC2018. We follow the stricture that SACK blocks must not report data which is no longer held by the receiver: previously-reported blocks are validated against the current receive queue before being included within the current SACK block list.

Experiments in a qemu VM using forced packet drops (by setting NETDEV_DISCARD_RATE to 32) show that implementing SACK improves throughput by around 400%.

Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK improves throughput by around 700%, increasing the download rate from 35Mbps up to 250Mbps (which is approximately the usable bandwidth limit for USB2).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
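On the wire, RFC2018 defines SACK-permitted as kind 4 with length 2, and SACK as kind 5 with length 2 + 8n, where each block is a 32-bit left edge (first sequence number held) and right edge (sequence number just past the block). The sketch below encodes both, prefixing each with two NOPs so that the block edges stay 32-bit aligned; that padding is a common convention assumed here, and all `example_*` names are hypothetical. It does not show the validation of previously-reported blocks against the receive queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TCPOPT_NOP 1
#define EXAMPLE_TCPOPT_SACK_PERMITTED 4
#define EXAMPLE_TCPOPT_SACK 5

struct example_sack_block {
	uint32_t left;  /* First sequence number in the block */
	uint32_t right; /* Sequence number following the block */
};

/* Store a 32-bit value in network byte order */
static void example_put32 ( uint8_t *p, uint32_t v ) {
	p[0] = ( uint8_t ) ( v >> 24 );
	p[1] = ( uint8_t ) ( v >> 16 );
	p[2] = ( uint8_t ) ( v >> 8 );
	p[3] = ( uint8_t ) v;
}

/* Write NOP, NOP, SACK-permitted (for a SYN); returns bytes written */
static size_t example_put_sack_permitted ( uint8_t *buf ) {
	buf[0] = EXAMPLE_TCPOPT_NOP;
	buf[1] = EXAMPLE_TCPOPT_NOP;
	buf[2] = EXAMPLE_TCPOPT_SACK_PERMITTED;
	buf[3] = 2;
	return 4;
}

/* Write NOP, NOP and a SACK option holding as many blocks as fit in
 * space bytes; returns bytes written (0 if not even one block fits) */
static size_t example_put_sack ( uint8_t *buf, size_t space,
				 const struct example_sack_block *blocks,
				 size_t count ) {
	size_t max_blocks;
	size_t i;

	if ( ( count == 0 ) || ( space < ( 4 + 8 ) ) )
		return 0;
	max_blocks = ( ( space - 4 ) / 8 );
	if ( count > max_blocks )
		count = max_blocks;
	buf[0] = EXAMPLE_TCPOPT_NOP;
	buf[1] = EXAMPLE_TCPOPT_NOP;
	buf[2] = EXAMPLE_TCPOPT_SACK;
	buf[3] = ( uint8_t ) ( 2 + ( 8 * count ) );
	for ( i = 0 ; i < count ; i++ ) {
		example_put32 ( &buf[4 + ( 8 * i )], blocks[i].left );
		example_put32 ( &buf[8 + ( 8 * i )], blocks[i].right );
	}
	return ( 4 + ( 8 * count ) );
}
```

With the full 40 bytes of TCP option space available, at most (40 - 4) / 8 = 4 blocks fit, which matches the limit given in RFC2018.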
* [legal] Relicense files under GPL2_OR_LATER_OR_UBDL | Michael Brown | 2015-03-02 | 1 | -1/+1

These files cannot be automatically relicensed by util/relicense.pl since they either contain unusual but trivial contributions (such as the addition of __nonnull function attributes), or contain lines dating back to the initial git revision (and so require manual knowledge of the code's origin).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Calculate correct MSS from peer address | Michael Brown | 2014-03-04 | 1 | -10/+0

iPXE currently advertises a fixed MSS of 1460, which is correct only for IPv4 over Ethernet. For IPv6 over Ethernet, the value should be 1440 (allowing for the larger IPv6 header). For non-Ethernet link layers, the value should reflect the MTU of the underlying network device.

Use tcpip_mtu() to calculate the transport-layer MTU associated with the peer address, and calculate the MSS to allow for an optionless TCP header as per RFC 6691.

As a side benefit, we can now fail a connection immediately with a meaningful error message if we have no route to the destination address.

Reported-by: Anton D. Kachalov <mouse@yandex-team.ru>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
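The arithmetic behind the 1460 and 1440 figures: RFC 6691 computes the MSS by subtracting only the fixed (optionless) IP and TCP headers. The sketch splits this into two steps to mirror the commit's description (transport-layer MTU first, then MSS), but the helper names are hypothetical and do not reproduce the signature of iPXE's tcpip_mtu().

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_IPV4_HDR_LEN 20 /* Fixed IPv4 header */
#define EXAMPLE_IPV6_HDR_LEN 40 /* Fixed IPv6 header */
#define EXAMPLE_TCP_HDR_LEN 20  /* Optionless TCP header */

/* MTU available to the transport layer above a given network header */
static size_t example_transport_mtu ( size_t link_mtu, size_t net_hdr_len ) {
	return ( ( link_mtu > net_hdr_len ) ? ( link_mtu - net_hdr_len ) : 0 );
}

/* MSS per RFC 6691: transport MTU minus an optionless TCP header */
static size_t example_mss ( size_t transport_mtu ) {
	return ( ( transport_mtu > EXAMPLE_TCP_HDR_LEN ) ?
		 ( transport_mtu - EXAMPLE_TCP_HDR_LEN ) : 0 );
}
```

For a 1500-byte Ethernet MTU this gives 1500 - 20 - 20 = 1460 over IPv4 and 1500 - 40 - 20 = 1440 over IPv6. A link with a different MTU simply feeds a different `link_mtu` into the same calculation.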
* [tcp] Reduce path MTU to 1280 bytes | Michael Brown | 2013-09-04 | 1 | -3/+12

The path MTU is currently hardcoded to 1460 bytes, which fails to allow space for TCP options. Sending a maximum-sized datagram (which is viable when using HTTP POST) will therefore fail since the Ethernet MTU will be exceeded.

Reduce the hardcoded path MTU to produce a maximum datagram of 1280 bytes, which is the size required of data link layers by IPv6. It is a reasonable assumption that all intermediary data link layers will be able to convey this packet without fragmentation, even for IPv4.

Note that this reduction has a minimal impact upon download throughput, since it affects only the transmit data path.

Originally-fixed-by: Suresh Sundriyal <ssundriy@vmware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
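The commit does not list the header allowances it subtracts, so the budget below is an assumption made for illustration: a worst-case 40-byte IPv6 header, the 20-byte fixed TCP header, and 12 bytes of TCP options (for instance a 10-byte timestamp option padded to 12). Under those assumptions, the largest TCP payload that keeps the datagram within 1280 bytes is 1208 bytes. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum link MTU that IPv6 requires of every data link layer */
#define EXAMPLE_MAX_DATAGRAM 1280
/* Assumed worst-case network header: the fixed IPv6 header */
#define EXAMPLE_NET_HDR_LEN 40
/* Fixed TCP header */
#define EXAMPLE_TCP_HDR_LEN 20
/* Assumed option allowance, e.g. a padded timestamp option */
#define EXAMPLE_TCP_OPTS_LEN 12

/* Largest TCP payload keeping the whole datagram within 1280 bytes */
#define EXAMPLE_PATH_MTU ( EXAMPLE_MAX_DATAGRAM - EXAMPLE_NET_HDR_LEN - \
			   EXAMPLE_TCP_HDR_LEN - EXAMPLE_TCP_OPTS_LEN )

_Static_assert ( EXAMPLE_PATH_MTU == 1208, "1280 - 40 - 20 - 12" );

/* Clamp pending transmit data to one segment's worth of payload */
static size_t example_segment_len ( size_t pending ) {
	return ( ( pending < EXAMPLE_PATH_MTU ) ? pending : EXAMPLE_PATH_MTU );
}
```

A large HTTP POST body would then be sent in segments of at most 1208 bytes, each producing a datagram of at most 1280 bytes.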
* [tcp] Increase maximum window size to 256kB | Michael Brown | 2012-07-09 | 1 | -22/+24

A window size of 256kB should be sufficient to allow for full-bandwidth transfers over a Gigabit LAN, and for acceptable transfer speeds over other typical links.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Add support for TCP window scaling | Michael Brown | 2012-06-29 | 1 | -1/+29

The maximum unscaled TCP window (64kB) implies a maximum bandwidth of around 300kB/s on a WAN link with an RTT of 200ms. Add support for the TCP window scaling option to remove this upper limit.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
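The 300kB/s figure follows because at most one window of data can be in flight per round trip: 65535 bytes every 200ms is 327,675 bytes/s. The sketch computes that bound, plus the smallest window scale shift that lets a given window fit in the 16-bit header field (RFC 7323 caps the shift at 14). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput bound in bytes/s: one window per round trip.
 * rtt_ms must be non-zero. */
static uint64_t example_max_throughput ( uint64_t window_bytes,
					 uint64_t rtt_ms ) {
	return ( ( window_bytes * 1000 ) / rtt_ms );
}

/* Smallest shift such that ( window >> shift ) fits in 16 bits, or -1
 * if even the maximum shift of 14 is insufficient */
static int example_window_scale ( uint32_t window_bytes ) {
	int shift;

	for ( shift = 0 ; shift <= 14 ; shift++ ) {
		if ( ( window_bytes >> shift ) <= 0xffff )
			return shift;
	}
	return -1;
}
```

For the 256kB window mentioned above, 262144 >> 2 is 65536, which overflows 16 bits, so a shift of at least 3 is required. At 1Gbps (125,000,000 bytes/s), such a window keeps the link full only while the round-trip time stays at or below roughly 2ms.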
* [tcp] Allow sufficient headroom for TCP headers | Michael Brown | 2011-09-19 | 1 | -0/+10

TCP currently neglects to allow sufficient space for its own headers when allocating I/O buffers. This problem is masked by the fact that the maximum link-layer header size (802.11) is substantially larger than the common Ethernet link-layer header.

Fix by allowing sufficient space for any TCP headers, as well as the network-layer and link-layer headers.

Reported-by: Scott K Logan <logans@cottsay.net>
Debugged-by: Scott K Logan <logans@cottsay.net>
Tested-by: Scott K Logan <logans@cottsay.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
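A sketch of the headroom pattern: reserve space at the front of a buffer, then have each layer push its header into that space on the way down the stack. The buffer type and the header budgets (a 64-byte maximum link-layer header, a 40-byte network header, a 60-byte TCP header with maximum options) are hypothetical values chosen for illustration, not iPXE's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical worst-case header sizes, for illustration only */
#define EXAMPLE_LL_HDR_MAX 64   /* Assumed largest link-layer header */
#define EXAMPLE_NET_HDR_MAX 40  /* IPv6 fixed header */
#define EXAMPLE_TCP_HDR_MAX 60  /* TCP header including maximum options */
#define EXAMPLE_HEADROOM \
	( EXAMPLE_LL_HDR_MAX + EXAMPLE_NET_HDR_MAX + EXAMPLE_TCP_HDR_MAX )

/* Minimal I/O buffer: headers are pushed into space before data */
struct example_buf {
	uint8_t *head; /* Start of allocation */
	uint8_t *data; /* Start of valid data */
	uint8_t *tail; /* End of valid data */
	uint8_t *end;  /* End of allocation */
};

static int example_buf_alloc ( struct example_buf *buf, size_t len ) {
	buf->head = malloc ( len );
	if ( ! buf->head )
		return -1;
	buf->data = buf->tail = buf->head;
	buf->end = ( buf->head + len );
	return 0;
}

/* Reserve headroom on an empty buffer; the caller ensures len fits */
static void example_buf_reserve ( struct example_buf *buf, size_t len ) {
	buf->data += len;
	buf->tail += len;
}

/* Append len bytes of payload; NULL if there is no tailroom */
static void * example_buf_put ( struct example_buf *buf, size_t len ) {
	uint8_t *old_tail = buf->tail;

	if ( ( size_t ) ( buf->end - buf->tail ) < len )
		return NULL;
	buf->tail += len;
	return old_tail;
}

/* Prepend a len-byte header; NULL if the headroom is exhausted */
static void * example_buf_push ( struct example_buf *buf, size_t len ) {
	if ( ( size_t ) ( buf->data - buf->head ) < len )
		return NULL;
	buf->data -= len;
	return buf->data;
}
```

If only link-layer plus network-layer headroom (104 bytes here) is reserved, pushing a 20-byte TCP header, a 40-byte network header and a 14-byte Ethernet header still succeeds, but a 64-byte link-layer header no longer fits. This mirrors how the missing TCP allowance could go unnoticed on Ethernet.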
* [tcp] Remove obsolete constants | Michael Brown | 2010-11-19 | 1 | -4/+0

Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Use MAX_LL_NET_HEADER_LEN instead of defining our own MAX_HDR_LEN | Michael Brown | 2010-11-19 | 1 | -1/+0

Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [build] Fix misaligned table entries when using gcc 4.5 | Piotr Jaroszyński | 2010-08-20 | 1 | -1/+1

Declarations without the accompanying __table_entry cause misalignment of the table entries when using gcc 4.5. Fix by adding the appropriate __table_entry macro or (where possible) by removing unnecessary forward declarations.

Signed-off-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
* [tcp] Handle out-of-order received packets | Michael Brown | 2010-07-21 | 1 | -1/+30

Maintain a queue of received packets, so that lost packets need not result in retransmission of the entire TCP window.

Increase the TCP window to 8kB, in order that we can potentially transmit enough duplicate ACKs to trigger Fast Retransmission at the sender.

Using a 10MB HTTP download in qemu-kvm with an artificial drop rate of 1 in 64 packets, this reduces the download time from around 26s to around 4s.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
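A receive queue of this kind can be sketched as a list kept sorted by sequence number, from which everything contiguous with the next expected sequence number is delivered whenever a gap is filled. This is a simplified illustration, not iPXE's implementation: it uses a fixed-size array, tracks only sequence numbers and lengths, and all names are hypothetical. Sequence comparisons are done modulo 2^32 so that wraparound is handled; the signed cast relies on the usual two's-complement conversion performed by GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_RXQ_MAX 16 /* Hypothetical queue capacity */

struct example_seg {
	uint32_t seq; /* Sequence number of first byte */
	uint32_t len; /* Payload length */
};

struct example_rxq {
	struct example_seg segs[EXAMPLE_RXQ_MAX]; /* Sorted by seq */
	size_t count;
	uint32_t rcv_nxt; /* Next in-order sequence number */
};

/* True if a precedes b in 32-bit sequence space */
static bool example_seq_before ( uint32_t a, uint32_t b ) {
	return ( ( int32_t ) ( a - b ) < 0 );
}

/* Insert a received segment in sequence order; false if full */
static bool example_rxq_insert ( struct example_rxq *q, uint32_t seq,
				 uint32_t len ) {
	size_t i;

	if ( q->count == EXAMPLE_RXQ_MAX )
		return false;
	/* Shift later segments up to open a slot (insertion sort) */
	for ( i = q->count ; ( i > 0 ) &&
		      example_seq_before ( seq, q->segs[i - 1].seq ) ; i-- )
		q->segs[i] = q->segs[i - 1];
	q->segs[i].seq = seq;
	q->segs[i].len = len;
	q->count++;
	return true;
}

/* Consume segments contiguous with rcv_nxt; returns bytes delivered */
static uint32_t example_rxq_deliver ( struct example_rxq *q ) {
	uint32_t delivered = 0;
	size_t used = 0;
	size_t i;

	while ( used < q->count ) {
		const struct example_seg *seg = &q->segs[used];
		uint32_t end = ( seg->seq + seg->len );

		/* Gap before this segment: wait for the missing data */
		if ( example_seq_before ( q->rcv_nxt, seg->seq ) )
			break;
		/* Deliver only the part not already received */
		if ( example_seq_before ( q->rcv_nxt, end ) ) {
			delivered += ( end - q->rcv_nxt );
			q->rcv_nxt = end;
		}
		used++;
	}
	/* Drop consumed entries, keeping the rest in order */
	for ( i = used ; i < q->count ; i++ )
		q->segs[i - used] = q->segs[i];
	q->count -= used;
	return delivered;
}
```

Segments that arrive after a loss wait in the queue instead of being discarded. When the missing segment finally arrives, the whole contiguous run is delivered at once, so the sender only has to retransmit what was actually lost.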
* [build] Rename gPXE to iPXE | Michael Brown | 2010-04-20 | 1 | -0/+318

Access to the gpxe.org and etherboot.org domains and associated resources has been revoked by the registrant of the domain. Work around this problem by renaming project from gPXE to iPXE, and updating URLs to match.

Also update README, LOG and COPYRIGHTS to remove obsolete information.

Signed-off-by: Michael Brown <mcb30@ipxe.org>