summaryrefslogtreecommitdiffstats
path: root/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt
diff options
context:
space:
mode:
Diffstat (limited to '3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt')
-rw-r--r--3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt1226
1 files changed, 1226 insertions, 0 deletions
diff --git a/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt b/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt
new file mode 100644
index 0000000..6f1869c
--- /dev/null
+++ b/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt
@@ -0,0 +1,1226 @@
+
+Internet Engineering Task Force RMT WG
+INTERNET-DRAFT Luigi Rizzo/U. Pisa
+draft-ietf-rmt-bb-pgmcc-03.txt Gianluca Iannaccone/Intel
+ Lorenzo Vicisano/Cisco
+ Mark Handley/UCL
+ 12 July 2004
+ Expires: January 2005
+
+
+ PGMCC single rate multicast congestion control:
+ Protocol Specification
+
+
+
+Status of this Document
+
+This document is an Internet-Draft and is in full conformance with all
+provisions of Section 10 of RFC2026.
+
+Internet-Drafts are working documents of the Internet Engineering Task
+Force (IETF), its areas, and its working groups. Note that other groups
+may also distribute working documents as Internet-Drafts.
+
+Internet-Drafts are valid for a maximum of six months and may be
+updated, replaced, or obsoleted by other documents at any time. It is
+inappropriate to use Internet-Drafts as reference material or to cite
+them other than as a "work in progress".
+
+The list of current Internet-Drafts can be accessed at
+http://www.ietf.org/ietf/1id-abstracts.txt
+
+To view the list Internet-Draft Shadow Directories, see
+http://www.ietf.org/shadow.html.
+
+This document is a product of the IETF RMT WG. Comments should be
+addressed to the authors, or the WG's mailing list at rmt@lbl.gov.
+
+
+ Abstract
+
+
+ This document describes PGMCC, a single rate multicast
+ congestion control scheme which is TCP-friendly and achieves
+ scalability, stability and fast response to variations in
+ network conditions. PGMCC is suitable for both non-reliable
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley [Page 1]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ and reliable data transfers. It is mainly designed for NAK-
+ based multicast protocols, and uses a window-based, TCP-like
+ control loop using positive ACKs between one representative of
+ the receiver group (the ACKER) and the sender. The ACKER is
+ selected dynamically and may change over time.
+
+ PGMCC is made of two components: a window-based control loop,
+ which closely mimics TCP behavior, and a fast and low-overhead
+ procedure to select (and track changes of) the ACKER. The
+ scheme is robust to measurement errors, and supports fast
+ response to changes in the receiver set and/or network
+ conditions.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley [Page 2]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ Table of Contents
+
+
+ 1. Introduction. . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1. Terminology. . . . . . . . . . . . . . . . . . . . . 4
+ 2. Protocol Overview . . . . . . . . . . . . . . . . . . . 4
+ 2.1. Packet Contents. . . . . . . . . . . . . . . . . . . 6
+ 2.1.1. Data Packets. . . . . . . . . . . . . . . . . . . 6
+ 2.1.2. Feedback Packets. . . . . . . . . . . . . . . . . 7
+ 2.1.3. Field sizes and formats . . . . . . . . . . . . . 8
+ 2.2. Window-based controller. . . . . . . . . . . . . . . 9
+ 2.3. Acker Selection. . . . . . . . . . . . . . . . . . . 11
+ 2.3.1. Initial Acker election. . . . . . . . . . . . . . 11
+ 2.3.2. Acker dropouts. . . . . . . . . . . . . . . . . . 12
+ 2.4. TCP Throughput Equation. . . . . . . . . . . . . . . 12
+ 2.5. RTT measurement. . . . . . . . . . . . . . . . . . . 13
+ 2.5.1. Explicit Timestamp. . . . . . . . . . . . . . . . 13
+ 2.5.2. Implicit timestamp. . . . . . . . . . . . . . . . 13
+ 2.5.3. Sequence numbers. . . . . . . . . . . . . . . . . 14
+ 2.5.4. Recommendations . . . . . . . . . . . . . . . . . 15
+ 2.6. Loss rate measurement. . . . . . . . . . . . . . . . 15
+ 2.7. Timeouts . . . . . . . . . . . . . . . . . . . . . . 16
+ 2.8. Interaction with feedback suppression
+ schemes . . . . . . . . . . . . . . . . . . . . . . . . . 16
+ 2.9. Interaction with ECN . . . . . . . . . . . . . . . . 17
+ 3. Procedures - Sender . . . . . . . . . . . . . . . . . . 17
+ 4. Procedures -- Receiver. . . . . . . . . . . . . . . . . 18
+ 5. Security Considerations . . . . . . . . . . . . . . . . 19
+ 6. Authors' Addresses. . . . . . . . . . . . . . . . . . . 20
+ 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . 20
+ 8. Full Copyright Statement. . . . . . . . . . . . . . . . 21
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley [Page 3]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+1. Introduction
+
+This document describes PGMCC, a single rate multicast congestion
+control scheme which is TCP-friendly and achieves scalability, stability
+and fast response to variations in network conditions.
+
+PGMCC is designed for multicast sessions with one sender and one or more
+receivers, and is a good match for transport protocols using negative
+acknowledgements (NAKs) to collect feedback from the receivers. The
+congestion control scheme implemented by PGMCC closely mimics the
+congestion-control behavior of TCP, as it uses a window-based control
+loop which is run between the sender and a selected receiver called the
+ACKER. The role of the ACKER is to provide timely feedback in the same
+way as a TCP receiver; additionally, the ACKER is selected dynamically
+among the receivers as the one which would experience the lowest
+throughput if separate TCP sessions were run between the sender and each
+of the receivers.
+
+Scalability in PGMCC comes from the use of negative acknowledgements
+(NAKs) for collecting feedback from receivers other than the ACKER. As
+a consequence, the usual techniques for NAK suppression and aggregation
+can be used to reduce the amount of feedback to the source and improve
+the scalability of the scheme.
+
+PGMCC is designed to completely decouple congestion control from data
+integrity. As a consequence, the scheme can work with both reliable data
+transfer and unreliable communication protocols such as those used for
+video or audio streaming.
+
+While designed with multicast in mind, PGMCC can be equally used as a
+replacement for TCP for unicast sessions which require a lower degree of
+reliability than what TCP offers.
+
+
+1.1. Terminology
+
+In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
+"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" are to be interpreted as described in RFC 2119 and indicate
+requirement levels for compliant PGMCC implementations.
+
+
+2. Protocol Overview
+
+PGMCC is based on two separate but complementary mechanisms:
+
+ o A window-based control loop which closely emulates TCP congestion
+ control.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2. [Page 4]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ The window-based control loop is simply an adaptation of the TCP
+ congestion control scheme to transport protocols where missing
+ (because of network errors or congestion) data packets are not
+ necessarily retransmitted, and so the congestion control scheme
+ cannot rely on cumulative acknowledgements. In PGMCC, the
+ ``congestion window'' is simulated using a token-based scheme which
+ permits congestion control to be decoupled from retransmission
+ state. One of the receivers in the group operates as the ACKER, i.e.
+ the node in charge of sending positive acknowledgements back to the
+ source and thus controlling the rate of the transfer.
+
+
+ o A procedure to select the ACKER.
+ The purpose of this procedure is to make sure that, in presence of
+ multiple receivers, the ACKER is dynamically selected to be the
+ receiver which would have the lowest throughput if separate TCP
+ sessions were run between the sender and each receiver.
+ For the acker selection mechanism, PGMCC uses a throughput equation
+ to determine the expected throughput for a given receiver as a
+ function of the loss rate and round-trip time. Unlike other schemes
+ [2], the TCP throughput equation is not used to determine the actual
+ sending rate, which is completely controlled by the window-based
+ control loop.
+
+
+In principle, PGMCC's congestion control mechanism works as follows:
+
+
+ o Receivers measure the loss rate and feed this information back to
+ the sender, either in ACK or NAK messages.
+
+
+ o The sender also uses these feedback messages to measure the round-
+ trip time (RTT) to each receiver.
+
+
+ o The loss rate and RTT are then fed into PGMCC's throughput equation,
+ to determine the expected throughput to that receiver.
+
+
+ o The sender then selects as the acker the receiver with the lowest
+ expected throughput, as computed by the equation.
+
+The dynamics of the acker selection mechanism are sensitive to how the
+measurements are performed and applied. In the rest of this document we
+suggest specific mechanisms to perform and apply these measurements.
+Other mechanisms are possible, but it is important to understand how the
+interactions between mechanisms affect the dynamics of PGMCC.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2. [Page 5]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+2.1. Packet Contents
+
+Before specifying the sender and receiver functionality, we describe the
+information required by PGMCC to perform its tasks. This information is
+carried in the data packets sent by the sender, and in the feedback
+packets sent by the receiver. As PGMCC will be used along with some
+transport protocol, the actual data and feedback packets will contain
+further information for use by the protocol itself. For this reason, we
+do not specify packet formats, as these depend on the details of the
+transport protocol used.
+
+Note that the requirements of the transport protocol in terms of packet
+generation may differ from those of PGMCC. As an example, most NAK-based
+reliable multicast protocols do not use positive acknowledgements, but
+PGMCC requires ACKs for clocking out data packets; unreliable transport
+protocols might have no interest in generating NAKs for data integrity
+purposes, yet PGMCC depends on NAKs reaching the data sender in order to
+elect the ACKER.
+
+
+Implementors may decide to insert PGMCC-related information in already
+existing protocol packets whenever possible, but in cases such as the
+ones described in the previous paragraph, it might be necessary to
+define and generate new packets exclusively for congestion control
+purposes. As an example, in a prototype implementation of PGMCC on top
+of the PGM protocol [7], some of the information used by PGMCC is
+already present in the original protocol packets, and PGMCC-specific
+information is carried as PGM options in ODATA and NAK packets. However,
+a new packet type has been defined for ACKs, which are generated
+according to the rules defined in this document.
+
+
+2.1.1. Data Packets
+
+Each data packet sent by the data sender contains the following
+information:
+
+
+ o A SEQUENCE NUMBER. This number is incremented by one for each data
+ packet transmitted. The field must be sufficiently large that it
+ does not wrap causing two different packets with the same sequence
+ number to be in the receiver's recent packet history at the same
+ time.
+
+
+ o A TIMESTAMP (or equivalent information, see Section 2.5) indicating
+ when the packet with this sequence number has been sent. There is
+ no requirement for synchronized clocks between the sender and the
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.1.1. [Page 6]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ receivers. The timestamp is used to measure network round-trip
+ times, so needs sufficient resolution for this task. A resolution
+ of 1ms would be adequate.
+
+
+ o The ACKER IDENTITY, i.e. the identity of the receiver in charge of
+ sending an acknowledgement for this data packet. The ACKER is
+ elected as a result of the process described in Section 2.3.
+ A special value is used to indicate that no ACKER is designated for
+ this packet -- this can happen at the beginning of a session or when
+ the current ACKER leaves the group. Receivers interpret this value
+ as a request to elect a new acker.
+
+
+2.1.2. Feedback Packets
+
+There are two types of feedback packets used by PGMCC: ACK packets and
+NAK packets.
+ACK packets are generated by the current ACKER, and are used to detect
+loss or successful delivery of packets, and to regulate the throughput
+accordingly. ACK packets also contain information used to determine the
+TCP-equivalent throughput for the ACKER.
+NAK packets are sent by any receiver who experiences loss. They contain
+information used to determine the TCP-equivalent throughput for that
+receiver. In an actual protocol instantiation (such as PGM [7]), NAK
+packets might also be used by the protocol to request the retransmission
+of specific packets, and indicate the identity of the packet being
+requested.
+
+Both ACK and NAK packets are sent by data receivers, and contain the
+following information:
+
+
+ o The TIMESTAMP (or equivalent information) derived from the most
+ recently received data packet according to one of the techniques
+ described in Section 2.5.
+ This value is used by the sender to measure the RTT to the receiver
+ who generated this feedback packet.
+
+
+ o ``p'', the receiver's current estimate of the LOSS RATE. The loss
+ rate is measured by receivers as described in Section 2.6
+
+In addition to the above, ACK packets (sent by the acker designated in
+the corresponding data packets) must also contain the following
+information:
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.1.2. [Page 7]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ o RX_MAX, the highest sequence number among received data packets
+ (taking care to deal with sequence number wrapping correctly).
+
+ o ACK_BITMAP, a bitmap indicating the receive status of the latest N
+ (typically N=32) data packets with sequence numbers RX_MAX-(N-1) to
+ RX_MAX.
+
+
+This information is used by the sender to record which packets have been
+received or lost, and manipulate the transmit window accordingly. Note
+that each ACK packet contains information about multiple packets, and
+this increases the robustness of the scheme to loss of ACK packets.
+This is necessary because ACKs are not sent reliably (unlike TCP's ACKs,
+which are cumulative).
+
+
+2.1.3. Field sizes and formats
+
+The following sizes and formats are suggested for the various fields
+used by PGMCC and transmitted over the network:
+
+
+ o SEQUENCE NUMBERS
+ 32 bit, unsigned, network order.
+
+
+ o TIMESTAMPS
+ 32 bit, unsigned, network order. A resolution of 1ms or better is
+ desirable.
+
+
+ o ACKER IDENTITY
+ Same size and format of a network layer address (e.g. 32 bit for
+ IPv4). Note though that using an IP address for the Acker Identify
+ will cause problems with NAT traversal. Transport protocol
+ designers might examine the SSRC mechanism used by RTP [6] as an
+ alternative form of node identifier that could be used as Acker
+ Identity.
+
+
+ o LOSS RATE (``p'')
+ 16-bit unsigned integer, in network format, with 0 indicating no
+ loss and 2^16-1 indicating 100% loss.
+
+
+ o ACK BITMAP
+ 32-bit, in network format, with least significant bit indicating
+ receive status of packet RX_MAX.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.1.3. [Page 8]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+2.2. Window-based controller
+
+In a window-based congestion control scheme such as TCP, the
+``congestion window'' represents, among other things, the maximum amount
+of packets in flight at any time, which in turn controls the throughput
+of the session. The sender keeps track of the actual number of packets
+in flight, basing on its transmissions and the reception of
+acknowledgements.
+
+The sender may dynamically change the size of the window, according to
+the congestion control scheme being used. In TCP, and PGMCC, an
+``Additive Increase Multiplicative Decrease'' (AIMD) scheme is used: in
+absence of loss, the window is increased by some fixed amount (typically
+one packet) per round trip time (RTT), whereas upon loss the window is
+reduced to a fraction of its original value (typically halved) in each
+RTT in which a loss event is experienced.
+
+In PGMCC the window is managed using a token-based mechanism, controlled
+by two variables:
+
+ o A ``Window Size'', W, which describes the current window size in
+ packets.
+
+ o A ``Token Count'', T, which indicates the number of packets that can
+ be transmitted. T is bounded above by W. It is decremented every
+ time a packet is transmitted, and incremented every time an ACK is
+ received, according to the rules below.
+
+Note that these two variables need to hold non-integer data. Typically
+a fixed point representation with at least 16 bits for both integer and
+fractional parts would be acceptable for implementation purposes.
+
+The information contained in each ACK is used to determine how many new
+packets are acknowledged by that ACK, and whether there are
+unacknowledged packets which were not reported in previous ACKs. The
+sender also schedules a timeout to react in case no ACKs are received.
+
+The sender behaves as follows:
+
+
+ o INITIALIZATION
+ At startup, or after a timeout, both W and T are set to 1.
+
+
+ o ACK RECEPTION, NO LOSS DETECTED
+ If the incoming ACK reports new acknowledged packets, and no loss
+ (as defined in the next paragraph) is detected, then the window is
+ inflated by one packet per RTT.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.2. [Page 9]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ NOTE: during the slow-start phase, TCP opens the window
+ exponentially up to the SSTHRESH value, which is computed by TCP
+ according to the dynamics of the session and updated upon losses.
+
+ We do recommend that PGMCC uses a similar strategy, but using a
+ fixed, small value for SSTHRESH (e.g. 4 packets). In fact, due to
+ the dynamicity of the ACKER, which might change on every single
+ packet, it is hard to compute a reliable estimate of the SSTHRESH
+ without keeping state for multiple receivers, and the benefits are
+ small in any event.
+
+ In summary, the reaction to ACK reception on no loss modifies T and
+ W as follows (here, N is the number of new packets acknowledged by
+ the incoming ACK):
+
+ if (W < SSTHRESH) then
+ D = min(N, SSTHRESH - W) // use the first D acks for
+ exp.opening
+ N = N - D // and the remaining ones for
+ linear opening
+ T = T + 2*D
+ W = W + D
+ endif
+ // do linear window opening with the remaining acks
+ T = T + N * ( 1 + 1/W )
+ W = W + N/W
+
+
+ o PACKET TRANSMISSION
+ One token is consumed for each packet transmitted:
+
+ T = T - 1
+
+
+ o ACK RECEPTION, LOSS DETECTED
+ If the incoming ACK reports an unacknowledged data packet which is
+ followed by at least 3 acknowledged data packets, then the packet is
+ assumed to be lost and PGMCC reacts by halving the window, in the
+ same way as TCP after 3 duplicate acknowledgements. This is
+ achieved by modifying T and W as follows:
+
+ T = T - W/2 , W = W/2
+
+ to simulate the multiplicative decrease.
+ Additionally, all window manipulation is suspended for the
+ subsequent RTT. This is achieved by recording the current transmit
+ sequence number, and canceling any further manipulation of the
+ window until feedback is received for the next transmitted packet,
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.2. [Page 10]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+ or until a timeout occurs.
+
+
+
+
+2.3. Acker Selection
+
+The ACKER selection process in PGMCC aims at locating the receiver which
+would have the lowest throughput if each receiver were using a separate
+TCP connection to transfer data.
+
+Because the steady-state throughput of a TCP connection can be
+characterized in a reasonably accurate way in terms of its loss rate and
+round trip time [3], the throughput for each receiver can be estimated
+by using these two parameters.
+
+Whenever an ACK or NAK packet from any of the receivers reaches it, the
+sender is able to compute the expected throughput T_i for that receiver
+by using the equation shown in Section 2.4, with the round trip time RTT
+and loss rate p and measured as described in Sections 2.5 and 2.6,
+respectively. At any given time, the sender stores the expected
+throughput for the current ACKER, T_acker. This value is updated every
+time an ACK or NAK from the current ACKER is received (note that, after
+a new ACKER is selected, the sender will typically receive ACKs from the
+old ACKER for one RTT, and the feedback from different ACKERs might be
+interleaved if the paths leading to them have different round trip
+times).
+
+Whenever an ACK or NAK is received from another node i (a previous ACKER
+or some other receiver), the expected throughput T_i for that node is
+computed, and compared with T_acker. Node i is selected as the new
+acker if
+
+ T_i < C * T_acker
+
+where the constant C between 0 and 1 provides some hysteresis and avoids
+too frequent oscillations in the choice of the ACKER. A suggested value
+for C is 0.75.
+
+Note that, from an implementation point of view (see Section 2.4), it is
+more convenient to compute T_i ^(-2), so the above equation must be
+modified accordingly.
+
+
+2.3.1. Initial Acker election
+
+Upon reception of a data packet reporting that no acker is currently
+selected, receivers generate a dummy NAK report which is used to elect
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.3.1. [Page 11]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+the initial acker. The NAK is sent with the usual feedback suppression
+mechanism dictated by the transport protocol (possibly with shorter time
+constants) to avoid feedback implosion, and the sender will select the
+source of the first incoming NAK as the new ACKER.
+
+
+2.3.2. Acker dropouts
+
+
+If the ACKER decides to disconnect from the session, it can cause the
+session to stop. To avoid this, it is recommended that an ACKER deciding
+to leave the session informs the sender by sending an ACK packet (or a
+duplicate) carrying an "ACKER_LEAVING" option. The reception of this
+packet by the sender will in turn trigger an initial acker election
+phase.
+
+
+
+2.4. TCP Throughput Equation
+
+Any realistic equation of TCP throughput as a function of loss event
+rate and RTT should be suitable for use in PGMCC. Unlike other schemes
+[2] where the throughput equation directly controls the transmit rate,
+in PGMCC the equation is used only for acker selection purposes, and the
+throughput values are only compared among themselves. As a consequence,
+we can use the following equation, derived from the one presented in [3]
+by setting RTO = 4 * RTT (as it is common practice):
+
+ M = 1/T = RTT_i * sqrt(p) * (1 + 9*p * (1 + 32*(p)^2))
+
+
+where
+
+ M = 1/T is proportional to the inverse of the throughput for the
+ receiver under consideration;
+
+ RTT is the round trip time for the receiver under consideration;
+
+ p is the loss rate for the receiver under consideration, between 0
+ and 1.0;
+
+and multiplying constants are omitted.
+
+The above equation is accurate on a wide range of loss rates, and also
+covers situations where retransmission timeouts have a significant
+impact on the throughput of the protocol.
+
+Note that when p=0, the equation yields 1/T = M = 0. This does not
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.4. [Page 12]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+constitute a problem as we can still compare the M values computed for
+different receivers to determine the acker. Also note that it is easier
+to compute M^2 instead of M, because the former does not require the use
+of sqrt().
+
+In future, different throughput equations may be substituted for this
+equation. The requirement is that the throughput equation be a
+reasonable approximation of the sending rate of TCP for conformant TCP
+congestion control.
+
+The parameters p and RTT need to be measured or calculated by a PGMCC
+implementation. The measurement of RTT is specified in Section 2.5; the
+measurement of p is specified in Section 2.6.
+
+2.5. RTT measurement
+
+In PGMCC, the RTT is measured by the sender making use of the timestamp
+(or equivalent information) echoed back by each receiver in feedback
+messages. Three procedures are possible to measure the RTT, as follows.
+In no case is it required to have clock synchronization between sender
+and receivers.
+
+
+2.5.1. Explicit Timestamp
+
+This first technique relies on the transmission of a timestamp TS_j with
+each data packet j.
+The receiver will record the most recently received timestamp, and will
+echo it back to the source when generating an ACK or a NAK. If the
+feedback is delayed, the time elapsed between the reception of the
+timestamp and the generation of the feedback should be added to the
+echoed timestamp.
+The sender computes the RTT by subtracting the received timestamp from
+the current value of the clock.
+
+The resolution of the timestamp value should be good enough for
+reasonable precision measurement of typical network round trip times. If
+receivers need to apply correction for delayed feedback, it is necessary
+that receivers know the resolution of the timestamp clock. A suggested
+value is 1ms.
+
+
+2.5.2. Implicit timestamp
+
+With this technique, the sender will record a timestamp TS_j for each
+transmitted data packet j, but the timestamp will not be transmitted
+with the packet itself.
+The receiver will record the most recently received sequence number, and
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.5.2. [Page 13]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+will echo it back to the source when generating an ACK or a NAK.
+The sender computes the RTT by looking up the timestamp associated with
+the sequence number received in the feedback packet, and subtracting it
+from the current clock value.
+
+If the feedback from the receiver is delayed, as it is commonly the case
+for NAKs, the receiver can compute, and send back to the source, a
+correction term corresponding to the time elapsed between the reception
+of the timestamp and the generation of the feedback. The correction term
+will then be subtracted by the sender in order to obtain the correct
+estimate of the RTT.
+
+This RTT measurement technique is equivalent to the previous one, but it
+saves some space in data packets as the timestamp does not need to be
+sent explicitly. Feedback packets might become larger if the correction
+value is transmitted explicitly; but in many cases, the sequence number
+will already be present for other reasons (e.g. ACK packets), and
+wherever space is a concern the sequence number and the correction term
+can be packed in a single 32-bit word without loss of precision.
+
+
+2.5.3. Sequence numbers
+
+This technique is the least precise, but it does not rely on the
+presence of a high resolution clock on the nodes.
+The sender will not compute any timestamp, and just send data packets
+with their sequence number j.
+The receiver will record the most recently received sequence number, and
+will echo it back to the source when generating an ACK or a NAK.
+The sender computes the RTT as the difference between the most recently
+sent sequence number and the sequence number received from the ACK or
+NAK packet.
+
+Note that in this case the RTT is not measured in seconds, but in
+"sequence numbers", which are monotonically, but not uniformly,
+increasing with time. The two measurements are equivalent if the sender
+transmits at a constant rate. When the data rate changes over time (as
+it is normally happens, given that PGMCC controls the actual data rate),
+then the "measured" RTT values grow with the actual transmit rate. This
+can influence the correctness of the results when comparing two
+measurement done over different and only partially overlapping time (and
+sequence number) intervals where the transmit rate incurs a significant
+change.
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.5.3. [Page 14]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+2.5.4. Recommendations
+
+Whenever possible, the measurement of the RTT should be carried out
+using either explicit or implicit timestamps, and by keeping track of
+the "correction term" (the delay between data reception and feedback
+generation).
+
+If the receiver does not have a clock with a suitable resolution, the
+correction term might not be present (or be inaccurate). In this case
+the timestamps received by the sender on NAK packets might be in error,
+in the worst case, by as much as the packet interarrival time. This
+error will normally not be present on ACK packets, which are sent
+immediately. A suitable correction should be applied by the sender in
+order to avoid systematic errors.
+
+The measurement based on sequence numbers is less accurate, but also
+less sensitive to errors due to the lack of the correction term. In
+fact, the measurement error induced by the lack of the correction term
+can be at most one unit. This suggests that, when the correction term
+is not available, measurements based on sequence numbers should be
+favoured. Simulations have shown that the acker selection mechanism
+performs moderately better when the RTT measurement is based on
+timestamps, but performance is reasonably good also with measurements
+based on sequence numbers.
+
+
+
+2.6. Loss rate measurement
+
+
+The loss measurement in PGMCC is entirely performed by receivers. The
+measurement results do not directly influence the transmit rate, but are
+only used for comparison purposes. As a consequence, the scheme is
+reasonably robust to different measurement techniques, as long as they
+are not influenced too strongly by single loss events.
+
+The main method suggested for loss measurement is Exponentially Weighted
+Moving Average (EWMA), which is formally equivalent to a single-pole
+digital low pass filter applied to a binary signal x_i, where x_i = 1 if
+packet i is lost, x_i = 0 if packet i is successfully received.
+
+The loss rate p_i upon reception or detection of loss of packet i is
+computed as
+
+
+ p_i = c_p * p_{i-1} + (1 - c_p ) * p_i
+ where the constant c_p between 0 and 1 is related to the bandpass of
+the filter. Experiments have shown good performance with c = 500/65536,
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.6. [Page 15]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+and computations performed with fixed point arithmetic and 16 fractional
+digits.
+
+As an alternative to EWMA, the technique used in TFRC [2] can be
+adopted. Simulations have shown a moderate improvement in the acker
+selection mechanism by measuring loss using the TFRC loss estimator,
+which is however slightly more expensive to compute than the EWMA loss
+estimator in presence of packet reordering.
+
+
+2.7. Timeouts
+
+
+When a packet is transmitted, the sender schedules a timeout to prevent
+stalls upon loss of ACKs or disconnection of the ACKER. In TCP, which
+has a similar problem, the timeout value is computed by accumulating
+statistics (SRTT and RTTVAR) on RTT samples, starting from a default
+initial value (3s) when no RTT samples are available.
+
+PGMCC can use a similar scheme to compute the timeouts, remembering that
+upon ACKER changes (which may be very frequent), the computation of SRTT
+and RTTVAR must be restarted from the beginning, unless the sender
+decides to keep state for at least a small number of recent ackers.
+
+Because the ACKER can leave the group without notifying the sender,
+after a number of successive timeouts the sender MUST force the election
+of a new ACKER. We recommend this new election to be performed after
+two successive timeouts.
+
+
+2.8. Interaction with feedback suppression schemes
+
+
+Several schemes are used by NAK-based multicast protocols to reduce the
+amount of feedback directed toward the source and make the protocol
+scale with large populations of receivers. Such schemes typically rely
+on randomly delaying NAK generation, and suppressing pending NAKs when
+an equivalent NAK or a retransmission is heard; or, intermediate nodes
+such as routers can implement some form of feedback aggregation and
+filtering.
+
+Such schemes might prevent NAKs from potential ACKER candidates from
+reaching the source. This filtering might impact the speed at which
+PGMCC selects the correct ACKER, though initial experience from
+simulations seem to suggest that PGMCC behavior is not severely affected
+by NAK suppression schemes.
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 2.8. [Page 16]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+2.9. Interaction with ECN
+
+
+PGMCC can use ECN notifications in much the same way as actual losses,
+and use such notifications to control the throughput of the session.
+
+At the receiver, ECN-marked data packets can be considered as lost
+packets for the purpose of loss rate computation and ACK/NAK generation.
+If the ACKER sends an ACK for ECN-marked packets, that ACK MUST report
+that the packet being acknowledged that was ECN marked. Similarly the
+ACKER must indicate in the ACK packet's received packets bitmap that the
+packet was ECN-marked, or that the packet was lost.
+
+We note that to support use of the ECN nonce, the ACK packet's received
+packets bitmap would require two bits per packet being reported.
+
+
+3. Procedures - Sender
+
+The following pseudo-code specifies the complete behavior of the sender
+in PGMCC.
+
+
+initialization:
+ T = 1 ; W = 1 ; /* initialize window and number of tokens */
+ RETRY = 0 ; /* number of consecutive timeouts so far */
+ < initialize p, RTT for acker to default values >
+ ACKER = NO_ACKER; /* no acker is known */
+ < initialize sequence numbers >
+ QUEUED = 0; /* packets waiting to be transmitted */
+
+on transmission request:
+ send_packet() ;
+
+on timeout expiration :
+ T = 1 ; W = 1 ; /* initialize window and number of tokens */
+ if (RETRY < RETRY_MAX)
+ RETRY = RETRY + 1
+ else
+ ACKER = NO_ACKER ; /* old acker is not valid anymore */
+ send_packet() ;
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 3. [Page 17]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+on ACK/NAK reception from receiver I :
+ < compute p and RTT for source of this ACK, see Sec. 2.5 and 2.6 >
+ RETRY = 0 ;
+ if (ACKER == NO_ACKER) { /* select current as acker is no other known */
+ ACKER = I ;
+ T = T + 1 ;
+ }
+ if (ACKER != I)
+ < select acker according to Sec. 2.3 > ;
+ else {
+ <update acker statistics (p_ACKER, RTT_ACKER) >
+ if (packet_type == ACK) {
+ < update_window according to Sec.2.2 >
+ send_packet ;
+ if (ack_pending)
+ update_timeout ;
+ }
+}
+
+send_packet() {
+ if (QUEUED > 0 && T >= 1) {
+ < transmit one packet >
+ T = T - 1 ;
+ QUEUED = QUEUED - 1 ;
+ }
+ if ( <there are unacked packet> )
+ <set a timeout, see Sec.2.7 >
+}
+
+
+
+4. Procedures -- Receiver
+
+The following pseudo-code specifies the complete behavior of the
+receiver in PGMCC.
+
+A receiver only transmits an ACK packet when it receives a data packet
+for which the receiver is designated as the ACKER by the data packet
+itself. A receiver can transmit a NAK packet after it has detected that
+a data packet is missing and a suitable delay has passed, as dictated by
+the feedback suppression rules of the protocol in use.
+
+The data packet contains acknowledgement status about the most recent 32
+sequence numbers known to the receiver.
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 4. [Page 18]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+on initialization/session setup:
+ < initialize state variables and ACK bitmap >
+
+on DATA packet reception:
+ < update p measurement according to Sec.2.6 >
+ < record timestamp and packet reception time >
+ if (ACKER == this_node) {
+ < send an immediate ACK >
+ }
+ if ( <some data packet is missing and unacknowledged > )
+ < schedule a timeout for NAK transmission >
+
+on NAK reception:
+ < suppress any pending NAK transmission for the sequence
+ number indicated in the NAK >
+
+on timeout:
+ if ( < there are missing and unacknowledged packets > ) {
+ < send a NAK for one or more of the missing packets >
+ < mark such packets as acknowledged >
+ if ( <some data packet is missing and unacknowledged > )
+ < schedule a timeout for NAK transmission >
+ }
+
+
+5. Security Considerations
+
+PGMCC is not a transport protocol in its own right, but a congestion
+control mechanism that is intended to be used in conjunction with a
+transport protocol. Therefore security primarily needs to be considered
+in the context of a specific transport protocol and its authentication
+mechanisms.
+
+Congestion control mechanisms can potentially be exploited to create
+denial of service. This may occur through spoofed feedback. Thus any
+transport protocol that uses PGMCC should take care to ensure that
+feedback is only accepted from the receiver of the data. The precise
+mechanism to achieve this will however depend on the transport protocol
+itself.
+
+In addition, congestion control mechanisms may potentially be
+manipulated by a greedy receiver that wishes to receive more than its
+fair share of network bandwidth. A receiver might do this by first
+reporting inflated loss and RTT samples, in order to get selected as the
+ACKER, and then generating ACK at the desired rate (including possibly
+claiming to have received packets that in fact were lost due to
+congestion). Possible defenses against such a receiver could be based
+on the sender verifying the correctness of the loss and RTT samples
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 5. [Page 19]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+supplied by the receiver. A PGMCC sender SHOULD compare the receiver
+reports on loss rate and RTT with the information derived directly from
+the incoming stream of ACKs. In case of discrepancy of the reports, a
+PGMCC sender SHOULD mark the current acker as ineligible and initiate a
+new acker election. The decision on how large that discrepancy should be
+before initiating a new acker election is left to the implementation.
+
+Also, the sender MAY include some form of nonce that the receiver must
+feed back to the sender to prove receipt. However, the details of such a
+nonce would depend on the transport protocol, and in particular on
+whether the transport protocol is reliable or unreliable.
+
+
+6. Authors' Addresses
+
+ Luigi Rizzo
+ luigi@iet.unipi.it
+ Dip. Ing. dell'Informazione,
+ Univ. di Pisa
+ via Diotisalvi 2, 56122 Pisa, Italy
+
+ Gianluca Iannaccone
+ gianluca.iannaccone@intel.com
+ Intel Research
+ 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
+
+ Lorenzo Vicisano
+ lorenzo@cisco.com
+ cisco Systems, Inc.
+ 170 West Tasman Dr.,
+ San Jose, CA, USA, 95134
+
+ Mark Handley
+ m.handley@cs.ucl.ac.uk
+ University College London,
+ Gower Street, London WC1E 6BT, UK
+
+
+7. Acknowledgments
+
+We would like to acknowledge feedback and discussions on equation-based
+congestion control with a wide range of people, including members of the
+Reliable Multicast Research Group, the Reliable Multicast Transport
+Working Group, and the End-to-End Research Group.
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 7. [Page 20]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+[1] Bradner, S., Key words for use in RFCs to Indicate Requirement
+Levels (IETF RFC 2119) http://www.rfc-editor.org/rfc/rfc2119.txt
+
+[2] Floyd, S., Handley, M., Padhye, J., Widmer, J., "Equation-Based
+Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
+Stockholm, Aug. 2000
+
+[3] Padhye, J. and Firoiu, V. and Towsley, D. and Kurose, J., "Modeling
+TCP Throughput: A Simple Model and its Empirical Validation", Proc ACM
+SIGCOMM 1998.
+
+[4] Mankin, A., Romanow, A., Brander, S., Paxson, V., "IETF Criteria for
+Evaluating Reliable Multicast Transport and Application Protocols,"
+RFC2357, June 1998.
+
+[5] Rizzo, L., "pgmcc: a TCP-friendly single-rate multicast congestion
+control scheme", ACM SIGCOMM 2000, Stockholm, Aug.2000
+
+[6] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V., "RTP: A
+Transport Protocol for Real-Time Applications", RFC 1889, Jan 1996.
+
+[7] Speakman, T., Crowcroft, J., Gemmell, J., Farinacci, D. , Lin, S.,
+Leshchiner, D., Luby, M., Montgomery, T. , Rizzo, L., Tweedly, A.,
+Bhaskar, N., Edmonstone, R., Sumanasekera, R., Vicisano, L., PGM
+Reliable Transport Protocol Specification, RFC 3208, December 2001.
+rfc3208.txt also available at ftp://ftp.rfc-editor.org/in-
+notes/rfc3208.txt
+
+
+
+
+8. Full Copyright Statement
+
+Copyright (C) The Internet Society (2000). All Rights Reserved.
+
+This document and translations of it may be copied and furnished to
+others, and derivative works that comment on or otherwise explain it or
+assist in its implementation may be prepared, copied, published and
+distributed, in whole or in part, without restriction of any kind,
+provided that the above copyright notice and this paragraph are included
+on all such copies and derivative works. However, this document itself
+may not be modified in any way, such as by removing the copyright notice
+or references to the Internet Society or other Internet organizations,
+except as needed for the purpose of developing Internet standards in
+which case the procedures for copyrights defined in the Internet
+languages other than English.
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 8. [Page 21]
+
+INTERNET-DRAFT Expires: January 2005 July 2004
+
+
+The limited permissions granted above are perpetual and will not be
+revoked by the Internet Society or its successors or assigns.
+
+This document and the information contained herein is provided on an "AS
+IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
+FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
+LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
+INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
+FITNESS FOR A PARTICULAR PURPOSE."
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley Section 8. [Page 22]