1 files changed, 1226 insertions, 0 deletions
diff --git a/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt b/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt
new file mode 100644
index 0000000..6f1869c
--- /dev/null
+++ b/3rdparty/openpgm-svn-r1085/doc/draft-ietf-rmt-bb-pgmcc-03.txt
@@ -0,0 +1,1226 @@
+
+Internet Engineering Task Force 				  RMT WG
+INTERNET-DRAFT					     Luigi Rizzo/U. Pisa
+draft-ietf-rmt-bb-pgmcc-03.txt		       Gianluca Iannaccone/Intel
+						  Lorenzo Vicisano/Cisco
+							Mark Handley/UCL
+							    12 July 2004
+						   Expires: January 2005
+
+
+	    PGMCC single rate multicast congestion control:
+			 Protocol Specification
+
+
+
+Status of this Document
+
+This document is an Internet-Draft and is in full conformance with all
+provisions of Section 10 of RFC2026.
+
+Internet-Drafts are working documents of the Internet Engineering Task
+Force (IETF), its areas, and its working groups.  Note that other groups
+may also distribute working documents as Internet-Drafts.
+
+Internet-Drafts are valid for a maximum of six months and may be
+updated, replaced, or obsoleted by other documents at any time.  It is
+inappropriate to use Internet-Drafts as reference material or to cite
+them other than as a "work in progress".
+
+The list of current Internet-Drafts can be accessed at
+http://www.ietf.org/ietf/1id-abstracts.txt
+
+To view the list Internet-Draft Shadow Directories, see
+http://www.ietf.org/shadow.html.
+
+This document is a product of the IETF RMT WG.	Comments should be
+addressed to the authors, or the WG's mailing list at rmt@lbl.gov.
+
+
+				Abstract
+
+
+     This document describes PGMCC, a single rate multicast
+     congestion control scheme which is TCP-friendly and achieves
+     scalability, stability and fast response to variations in
+     network conditions.  PGMCC is suitable for both non-reliable
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley				[Page 1]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+     and reliable data transfers.  It is mainly designed for NAK-
+     based multicast protocols, and uses a window-based, TCP-like
+     control loop using positive ACKs between one representative of
+     the receiver group (the ACKER) and the sender.  The ACKER is
+     selected dynamically and may change over time.
+
+     PGMCC is made of two components: a window-based control loop,
+     which closely mimics TCP behavior, and a fast and low-overhead
+     procedure to select (and track changes of) the ACKER.  The
+     scheme is robust to measurement errors, and supports fast
+     response to changes in the receiver set and/or network
+     conditions.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley				[Page 2]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+			   Table of Contents
+
+
+     1. Introduction. . . . . . . . . . . . . . . . . . . . . .   4
+      1.1. Terminology. . . . . . . . . . . . . . . . . . . . .   4
+     2. Protocol Overview . . . . . . . . . . . . . . . . . . .   4
+      2.1. Packet Contents. . . . . . . . . . . . . . . . . . .   6
+       2.1.1. Data Packets. . . . . . . . . . . . . . . . . . .   6
+       2.1.2. Feedback Packets. . . . . . . . . . . . . . . . .   7
+       2.1.3. Field sizes and formats . . . . . . . . . . . . .   8
+      2.2. Window-based controller. . . . . . . . . . . . . . .   9
+      2.3. Acker Selection. . . . . . . . . . . . . . . . . . .  11
+       2.3.1. Initial Acker election. . . . . . . . . . . . . .  11
+       2.3.2. Acker dropouts. . . . . . . . . . . . . . . . . .  12
+      2.4. TCP Throughput Equation. . . . . . . . . . . . . . .  12
+      2.5. RTT measurement. . . . . . . . . . . . . . . . . . .  13
+       2.5.1. Explicit Timestamp. . . . . . . . . . . . . . . .  13
+       2.5.2. Implicit timestamp. . . . . . . . . . . . . . . .  13
+       2.5.3. Sequence numbers. . . . . . . . . . . . . . . . .  14
+       2.5.4. Recommendations . . . . . . . . . . . . . . . . .  15
+      2.6. Loss rate measurement. . . . . . . . . . . . . . . .  15
+      2.7. Timeouts . . . . . . . . . . . . . . . . . . . . . .  16
+      2.8. Interaction with feedback suppression
+      schemes . . . . . . . . . . . . . . . . . . . . . . . . .  16
+      2.9. Interaction with ECN . . . . . . . . . . . . . . . .  17
+     3. Procedures - Sender . . . . . . . . . . . . . . . . . .  17
+     4. Procedures -- Receiver. . . . . . . . . . . . . . . . .  18
+     5. Security Considerations . . . . . . . . . . . . . . . .  19
+     6. Authors' Addresses. . . . . . . . . . . . . . . . . . .  20
+     7. Acknowledgments . . . . . . . . . . . . . . . . . . . .  20
+     8. Full Copyright Statement. . . . . . . . . . . . . . . .  21
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley				[Page 3]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+1.  Introduction
+
+This document describes PGMCC, a single rate multicast congestion
+control scheme which is TCP-friendly and achieves scalability, stability
+and fast response to variations in network conditions.
+
+PGMCC is designed for multicast sessions with one sender and one or more
+receivers, and is a good match for transport protocols using negative
+acknowledgements (NAKs) to collect feedback from the receivers.  The
+congestion control scheme implemented by PGMCC closely mimics the
+congestion-control behavior of TCP, as it uses a window-based control
+loop which is run between the sender and a selected receiver called the
+ACKER. The role of the ACKER is to provide timely feedback in the same
+way as a TCP receiver; additionally, the ACKER is selected dynamically
+among the receivers as the one which would experience the lowest
+throughput if separate TCP sessions were run between the sender and each
+of the receivers.
+
+Scalability in PGMCC comes from the use of negative acknowledgements
+(NAKs) for collecting feedback from receivers other than the ACKER.  As
+a consequence, the usual techniques for NAK suppression and aggregation
+can be used to reduce the amount of feedback to the source and improve
+the scalability of the scheme.
+
+PGMCC is designed to completely decouple congestion control from data
+integrity. As a consequence, the scheme can work with both reliable data
+transfer and unreliable communication protocols such as those used for
+video or audio streaming.
+
+While designed with multicast in mind, PGMCC can be equally used as a
+replacement for TCP for unicast sessions which require a lower degree of
+reliability than what TCP offers.
+
+
+1.1.  Terminology
+
+In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
+"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" are to be interpreted as described in RFC 2119 and indicate
+requirement levels for compliant PGMCC implementations.
+
+
+2.  Protocol Overview
+
+PGMCC is based on two separate but complementary mechanisms:
+
+  o A window-based control loop which closely emulates TCP congestion
+    control.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		    Section 2.	[Page 4]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+    The window-based control loop is simply an adaptation of the TCP
+    congestion control scheme to transport protocols where missing
+    (because of network errors or congestion) data packets are not
+    necessarily retransmitted, and so the congestion control scheme
+    cannot rely on cumulative acknowledgements.  In PGMCC, the
+    ``congestion window'' is simulated using a token-based scheme which
+    permits congestion control to be decoupled from retransmission
+    state. One of the receivers in the group operates as the ACKER, i.e.
+    the node in charge of sending positive acknowledgements back to the
+    source and thus controlling the rate of the transfer.
+
+
+  o A procedure to select the ACKER.
+    The purpose of this procedure is to make sure that, in presence of
+    multiple receivers, the ACKER is dynamically selected to be the
+    receiver which would have the lowest throughput if separate TCP
+    sessions were run between the sender and each receiver.
+    For the acker selection mechanism, PGMCC uses a throughput equation
+    to determine the expected throughput for a given receiver as a
+    function of the loss rate and round-trip time. Unlike other schemes
+    [2], the TCP throughput equation is not used to determine the actual
+    sending rate, which is completely controlled by the window-based
+    control loop.
+
+
+In principle, PGMCC's congestion control mechanism works as follows:
+
+
+  o Receivers measure the loss rate and feed this information back to
+    the sender, either in ACK or NAK messages.
+
+
+  o The sender also uses these feedback messages to measure the round-
+    trip time (RTT) to each receiver.
+
+
+  o The loss rate and RTT are then fed into PGMCC's throughput equation,
+    to determine the expected throughput to that receiver.
+
+
+  o The sender then selects as the acker the receiver with the lowest
+    expected throughput, as computed by the equation.
+
+The dynamics of the acker selection mechanism are sensitive to how the
+measurements are performed and applied. In the rest of this document we
+suggest specific mechanisms to perform and apply these measurements.
+Other mechanisms are possible, but it is important to understand how the
+interactions between mechanisms affect the dynamics of PGMCC.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		    Section 2.	[Page 5]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+2.1.  Packet Contents
+
+Before specifying the sender and receiver functionality, we describe the
+information required by PGMCC to perform its tasks. This information is
+carried in the data packets sent by the sender, and in the feedback
+packets sent by the receiver.  As PGMCC will be used along with some
+transport protocol, the actual data and feedback packets will contain
+further information for use by the protocol itself. For this reason, we
+do not specify packet formats, as these depend on the details of the
+transport protocol used.
+
+Note that the requirements of the transport protocol in terms of packet
+generation may differ from those of PGMCC. As an example, most NAK-based
+reliable multicast protocols do not use positive acknowledgements, but
+PGMCC requires ACKs for clocking out data packets; unreliable transport
+protocols might have no interest in generating NAKs for data integrity
+purposes, yet PGMCC depends on NAKs reaching the data sender in order to
+elect the ACKER.
+
+
+Implementors may decide to insert PGMCC-related information in already
+existing protocol packets whenever possible, but in cases such as the
+ones described in the previous paragraph, it might be necessary to
+define and generate new packets exclusively for congestion control
+purposes. As an example, in a prototype implementation of PGMCC on top
+of the PGM protocol [7], some of the information used by PGMCC is
+already present in the original protocol packets, and PGMCC-specific
+information is carried as PGM options in ODATA and NAK packets. However,
+a new packet type has been defined for ACKs, which are generated
+according to the rules defined in this document.
+
+
+2.1.1.	Data Packets
+
+Each data packet sent by the data sender contains the following
+information:
+
+
+  o A SEQUENCE NUMBER. This number is incremented by one for each data
+    packet transmitted.  The field must be sufficiently large that it
+    does not wrap causing two different packets with the same sequence
+    number to be in the receiver's recent packet history at the same
+    time.
+
+
+  o A TIMESTAMP (or equivalent information, see Section 2.5) indicating
+    when the packet with this sequence number has been sent.  There is
+    no requirement for synchronized clocks between the sender and the
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		Section 2.1.1.	[Page 6]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+    receivers.	The timestamp is used to measure network round-trip
+    times, so needs sufficient resolution for this task.  A resolution
+    of 1ms would be adequate.
+
+
+  o The ACKER IDENTITY, i.e. the identity of the receiver in charge of
+    sending an acknowledgement for this data packet.  The ACKER is
+    elected as a result of the process described in Section 2.3.
+    A special value is used to indicate that no ACKER is designated for
+    this packet -- this can happen at the beginning of a session or when
+    the current ACKER leaves the group.  Receivers interpret this value
+    as a request to elect a new acker.
+
+
+2.1.2.	Feedback Packets
+
+There are two types of feedback packets used by PGMCC: ACK packets and
+NAK packets.
+ACK packets are generated by the current ACKER, and are used to detect
+loss or successful delivery of packets, and to regulate the throughput
+accordingly. ACK packets also contain information used to determine the
+TCP-equivalent throughput for the ACKER.
+NAK packets are sent by any receiver who experiences loss.  They contain
+information used to determine the TCP-equivalent throughput for that
+receiver. In an actual protocol instantiation (such as PGM [7]), NAK
+packets might also be used by the protocol to request the retransmission
+of specific packets, and indicate the identity of the packet being
+requested.
+
+Both ACK and NAK packets are sent by data receivers, and contain the
+following information:
+
+
+  o The TIMESTAMP (or equivalent information) derived from the most
+    recently received data packet according to one of the techniques
+    described in Section 2.5.
+    This value is used by the sender to measure the RTT to the receiver
+    who generated this feedback packet.
+
+
+  o ``p'', the receiver's current estimate of the LOSS RATE.  The loss
+    rate is measured by receivers as described in Section 2.6
+
+In addition to the above, ACK packets (sent by the acker designated in
+the corresponding data packets) must also contain the following
+information:
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		Section 2.1.2.	[Page 7]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+  o RX_MAX, the highest sequence number among received data packets
+    (taking care to deal with sequence number wrapping correctly).
+
+  o ACK_BITMAP, a bitmap indicating the receive status of the latest N
+    (typically N=32) data packets with sequence numbers RX_MAX-(N-1) to
+    RX_MAX.
+
+
+This information is used by the sender to record which packets have been
+received or lost, and manipulate the transmit window accordingly.  Note
+that each ACK packet contains information about multiple packets, and
+this increases the robustness of the scheme to loss of ACK packets.
+This is necessary because ACKs are not sent reliably (unlike TCP's ACKs,
+which are cumulative).
+
+
+2.1.3.	Field sizes and formats
+
+The following sizes and formats are suggested for the various fields
+used by PGMCC and transmitted over the network:
+
+
+  o SEQUENCE NUMBERS
+    32 bit, unsigned, network order.
+
+
+  o TIMESTAMPS
+    32 bit, unsigned, network order.  A resolution of 1ms or better is
+    desirable.
+
+
+  o ACKER IDENTITY
+    Same size and format of a network layer address (e.g. 32 bit for
+    IPv4).  Note though that using an IP address for the Acker Identify
+    will cause problems with NAT traversal.  Transport protocol
+    designers might examine the SSRC mechanism used by RTP [6] as an
+    alternative form of node identifier that could be used as Acker
+    Identity.
+
+
+  o LOSS RATE (``p'')
+    16-bit unsigned integer, in network format, with 0 indicating no
+    loss and 2^16-1 indicating 100% loss.
+
+
+  o ACK BITMAP
+    32-bit, in network format, with least significant bit indicating
+    receive status of packet RX_MAX.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		Section 2.1.3.	[Page 8]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+2.2.  Window-based controller
+
+In a window-based congestion control scheme such as TCP, the
+``congestion window'' represents, among other things, the maximum amount
+of packets in flight at any time, which in turn controls the throughput
+of the session. The sender keeps track of the actual number of packets
+in flight, basing on its transmissions and the reception of
+acknowledgements.
+
+The sender may dynamically change the size of the window, according to
+the congestion control scheme being used. In TCP, and PGMCC, an
+``Additive Increase Multiplicative Decrease'' (AIMD) scheme is used: in
+absence of loss, the window is increased by some fixed amount (typically
+one packet) per round trip time (RTT), whereas upon loss the window is
+reduced to a fraction of its original value (typically halved) in each
+RTT in which a loss event is experienced.
+
+In PGMCC the window is managed using a token-based mechanism, controlled
+by two variables:
+
+  o A ``Window Size'', W, which describes the current window size in
+    packets.
+
+  o A ``Token Count'', T, which indicates the number of packets that can
+    be transmitted.  T is bounded above by W.  It is decremented every
+    time a packet is transmitted, and incremented every time an ACK is
+    received, according to the rules below.
+
+Note that these two variables need to hold non-integer data.  Typically
+a fixed point representation with at least 16 bits for both integer and
+fractional parts would be acceptable for implementation purposes.
+
+The information contained in each ACK is used to determine how many new
+packets are acknowledged by that ACK, and whether there are
+unacknowledged packets which were not reported in previous ACKs.  The
+sender also schedules a timeout to react in case no ACKs are received.
+
+The sender behaves as follows:
+
+
+  o INITIALIZATION
+    At startup, or after a timeout, both W and T are set to 1.
+
+
+  o ACK RECEPTION, NO LOSS DETECTED
+    If the incoming ACK reports new acknowledged packets, and no loss
+    (as defined in the next paragraph) is detected, then the window is
+    inflated by one packet per RTT.
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		  Section 2.2.	[Page 9]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+    NOTE: during the slow-start phase, TCP opens the window
+    exponentially up to the SSTHRESH value, which is computed by TCP
+    according to the dynamics of the session and updated upon losses.
+
+    We do recommend that PGMCC uses a similar strategy, but using a
+    fixed, small value for SSTHRESH (e.g. 4 packets).  In fact, due to
+    the dynamicity of the ACKER, which might change on every single
+    packet, it is hard to compute a reliable estimate of the SSTHRESH
+    without keeping state for multiple receivers, and the benefits are
+    small in any event.
+
+    In summary, the reaction to ACK reception on no loss modifies T and
+    W as follows (here, N is the number of new packets acknowledged by
+    the incoming ACK):
+
+      if (W < SSTHRESH) then
+	  D = min(N, SSTHRESH - W) // use the first D acks for
+    exp.opening
+	  N = N - D		       // and the remaining ones for
+    linear opening
+	  T = T + 2*D
+	  W = W + D
+      endif
+      // do linear window opening with the remaining acks
+      T = T + N * ( 1 + 1/W )
+      W = W + N/W
+
+
+  o PACKET TRANSMISSION
+    One token is consumed for each packet transmitted:
+
+				 T = T - 1
+
+
+  o ACK RECEPTION, LOSS DETECTED
+    If the incoming ACK reports an unacknowledged data packet which is
+    followed by at least 3 acknowledged data packets, then the packet is
+    assumed to be lost and PGMCC reacts by halving the window, in the
+    same way as TCP after 3 duplicate acknowledgements.  This is
+    achieved by modifying T and W as follows:
+
+			   T = T - W/2 , W = W/2
+
+    to simulate the multiplicative decrease.
+    Additionally, all window manipulation is suspended for the
+    subsequent RTT.  This is achieved by recording the current transmit
+    sequence number, and canceling any further manipulation of the
+    window until feedback is received for the next transmitted packet,
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		 Section 2.2.  [Page 10]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+    or until a timeout occurs.
+
+
+
+
+2.3.  Acker Selection
+
+The ACKER selection process in PGMCC aims at locating the receiver which
+would have the lowest throughput if each receiver were using a separate
+TCP connection to transfer data.
+
+Because the steady-state throughput of a TCP connection can be
+characterized in a reasonably accurate way in terms of its loss rate and
+round trip time [3], the throughput for each receiver can be estimated
+by using these two parameters.
+
+Whenever an ACK or NAK packet from any of the receivers reaches it, the
+sender is able to compute the expected throughput T_i for that receiver
+by using the equation shown in Section 2.4, with the round trip time RTT
+and loss rate p and measured as described in Sections 2.5 and 2.6,
+respectively.  At any given time, the sender stores the expected
+throughput for the current ACKER, T_acker. This value is updated every
+time an ACK or NAK from the current ACKER is received (note that, after
+a new ACKER is selected, the sender will typically receive ACKs from the
+old ACKER for one RTT, and the feedback from different ACKERs might be
+interleaved if the paths leading to them have different round trip
+times).
+
+Whenever an ACK or NAK is received from another node i (a previous ACKER
+or some other receiver), the expected throughput T_i for that node is
+computed, and compared with T_acker.  Node i is selected as the new
+acker if
+
+			   T_i < C * T_acker
+
+where the constant C between 0 and 1 provides some hysteresis and avoids
+too frequent oscillations in the choice of the ACKER.  A suggested value
+for C is 0.75.
+
+Note that, from an implementation point of view (see Section 2.4), it is
+more convenient to compute T_i ^(-2), so the above equation must be
+modified accordingly.
+
+
+2.3.1.	Initial Acker election
+
+Upon reception of a data packet reporting that no acker is currently
+selected, receivers generate a dummy NAK report which is used to elect
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley	       Section 2.3.1.  [Page 11]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+the initial acker. The NAK is sent with the usual feedback suppression
+mechanism dictated by the transport protocol (possibly with shorter time
+constants) to avoid feedback implosion, and the sender will select the
+source of the first incoming NAK as the new ACKER.
+
+
+2.3.2.	Acker dropouts
+
+
+If the ACKER decides to disconnect from the session, it can cause the
+session to stop. To avoid this, it is recommended that an ACKER deciding
+to leave the session informs the sender by sending an ACK packet (or a
+duplicate) carrying an "ACKER_LEAVING" option.	The reception of this
+packet by the sender will in turn trigger an initial acker election
+phase.
+
+
+
+2.4.  TCP Throughput Equation
+
+Any realistic equation of TCP throughput as a function of loss event
+rate and RTT should be suitable for use in PGMCC.  Unlike other schemes
+[2] where the throughput equation directly controls the transmit rate,
+in PGMCC the equation is used only for acker selection purposes, and the
+throughput values are only compared among themselves.  As a consequence,
+we can use the following equation, derived from the one presented in [3]
+by setting RTO = 4 * RTT (as it is common practice):
+
+	 M = 1/T = RTT_i * sqrt(p) * (1 + 9*p * (1 + 32*(p)^2))
+
+
+where
+
+    M = 1/T is proportional to the inverse of the throughput for the
+    receiver under consideration;
+
+    RTT is the round trip time for the receiver under consideration;
+
+    p is the loss rate for the receiver under consideration, between 0
+    and 1.0;
+
+and multiplying constants are omitted.
+
+The above equation is accurate on a wide range of loss rates, and also
+covers situations where retransmission timeouts have a significant
+impact on the throughput of the protocol.
+
+Note that when p=0, the equation yields 1/T = M = 0. This does not
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		 Section 2.4.  [Page 12]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+constitute a problem as we can still compare the M values computed for
+different receivers to determine the acker.  Also note that it is easier
+to compute M^2 instead of M, because the former does not require the use
+of sqrt().
+
+In future, different throughput equations may be substituted for this
+equation.  The requirement is that the throughput equation be a
+reasonable approximation of the sending rate of TCP for conformant TCP
+congestion control.
+
+The parameters p and RTT need to be measured or calculated by a PGMCC
+implementation.  The measurement of RTT is specified in Section 2.5; the
+measurement of p is specified in Section 2.6.
+
+2.5.  RTT measurement
+
+In PGMCC, the RTT is measured by the sender making use of the timestamp
+(or equivalent information) echoed back by each receiver in feedback
+messages. Three procedures are possible to measure the RTT, as follows.
+In no case is it required to have clock synchronization between sender
+and receivers.
+
+
+2.5.1.	Explicit Timestamp
+
+This first technique relies on the transmission of a timestamp TS_j with
+each data packet j.
+The receiver will record the most recently received timestamp, and will
+echo it back to the source when generating an ACK or a NAK. If the
+feedback is delayed, the time elapsed between the reception of the
+timestamp and the generation of the feedback should be added to the
+echoed timestamp.
+The sender computes the RTT by subtracting the received timestamp from
+the current value of the clock.
+
+The resolution of the timestamp value should be good enough for
+reasonable precision measurement of typical network round trip times. If
+receivers need to apply correction for delayed feedback, it is necessary
+that receivers know the resolution of the timestamp clock. A suggested
+value is 1ms.
+
+
+2.5.2.	Implicit timestamp
+
+With this technique, the sender will record a timestamp TS_j for each
+transmitted data packet j, but the timestamp will not be transmitted
+with the packet itself.
+The receiver will record the most recently received sequence number, and
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley	       Section 2.5.2.  [Page 13]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+will echo it back to the source when generating an ACK or a NAK.
+The sender computes the RTT by looking up the timestamp associated with
+the sequence number received in the feedback packet, and subtracting it
+from the current clock value.
+
+If the feedback from the receiver is delayed, as it is commonly the case
+for NAKs, the receiver can compute, and send back to the source, a
+correction term corresponding to the time elapsed between the reception
+of the timestamp and the generation of the feedback. The correction term
+will then be subtracted by the sender in order to obtain the correct
+estimate of the RTT.
+
+This RTT measurement technique is equivalent to the previous one, but it
+saves some space in data packets as the timestamp does not need to be
+sent explicitly. Feedback packets might become larger if the correction
+value is transmitted explicitly; but in many cases, the sequence number
+will already be present for other reasons (e.g. ACK packets), and
+wherever space is a concern the sequence number and the correction term
+can be packed in a single 32-bit word without loss of precision.
+
+
+2.5.3.	Sequence numbers
+
+This technique is the least precise, but it does not rely on the
+presence of a high resolution clock on the nodes.
+The sender will not compute any timestamp, and just send data packets
+with their sequence number j.
+The receiver will record the most recently received sequence number, and
+will echo it back to the source when generating an ACK or a NAK.
+The sender computes the RTT as the difference between the most recently
+sent sequence number and the sequence number received from the ACK or
+NAK packet.
+
+Note that in this case the RTT is not measured in seconds, but in
+"sequence numbers", which are monotonically, but not uniformly,
+increasing with time. The two measurements are equivalent if the sender
+transmits at a constant rate. When the data rate changes over time (as
+it is normally happens, given that PGMCC controls the actual data rate),
+then the "measured" RTT values grow with the actual transmit rate. This
+can influence the correctness of the results when comparing two
+measurement done over different and only partially overlapping time (and
+sequence number) intervals where the transmit rate incurs a significant
+change.
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley	       Section 2.5.3.  [Page 14]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+2.5.4.	Recommendations
+
+Whenever possible, the measurement of the RTT should be carried out
+using either explicit or implicit timestamps, and by keeping track of
+the "correction term" (the delay between data reception and feedback
+generation).
+
+If the receiver does not have a clock with a suitable resolution, the
+correction term might not be present (or be inaccurate). In this case
+the timestamps received by the sender on NAK packets might be in error,
+in the worst case, by as much as the packet interarrival time.	This
+error will normally not be present on ACK packets, which are sent
+immediately.  A suitable correction should be applied by the sender in
+order to avoid systematic errors.
+
+The measurement based on sequence numbers is less accurate, but also
+less sensitive to errors due to the lack of the correction term. In
+fact, the measurement error induced by the lack of the correction term
+can be at most one unit.  This suggests that, when the correction term
+is not available, measurements based on sequence numbers should be
+favoured.  Simulations have shown that the acker selection mechanism
+performs moderately better when the RTT measurement is based on
+timestamps, but performance is reasonably good also with measurements
+based on sequence numbers.
+
+
+
+2.6.  Loss rate measurement
+
+
+The loss measurement in PGMCC is entirely performed by receivers.  The
+measurement results do not directly influence the transmit rate, but are
+only used for comparison purposes. As a consequence, the scheme is
+reasonably robust to different measurement techniques, as long as they
+are not influenced too strongly by single loss events.
+
+The main method suggested for loss measurement is Exponentially Weighted
+Moving Average (EWMA), which is formally equivalent to a single-pole
+digital low pass filter applied to a binary signal x_i, where x_i = 1 if
+packet i is lost, x_i = 0 if packet i is successfully received.
+
+The loss rate p_i upon reception or detection of loss of packet i is
+computed as
+
+
+		 p_i = c_p * p_{i-1} + (1 - c_p ) * p_i
+ where the constant c_p between 0 and 1 is related to the bandpass of
+the filter. Experiments have shown good performance with c = 500/65536,
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		 Section 2.6.  [Page 15]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+and computations performed with fixed point arithmetic and 16 fractional
+digits.
+
+As an alternative to EWMA, the technique used in TFRC [2] can be
+adopted. Simulations have shown a moderate improvement in the acker
+selection mechanism by measuring loss using the TFRC loss estimator,
+which is however slightly more expensive to compute than the EWMA loss
+estimator in presence of packet reordering.
+
+
+2.7.  Timeouts
+
+
+When a packet is transmitted, the sender schedules a timeout to prevent
+stalls upon loss of ACKs or disconnection of the ACKER.  In TCP, which
+has a similar problem, the timeout value is computed by accumulating
+statistics (SRTT and RTTVAR) on RTT samples, starting from a default
+initial value (3s) when no RTT samples are available.
+
+PGMCC can use a similar scheme to compute the timeouts, remembering that
+upon ACKER changes (which may be very frequent), the computation of SRTT
+and RTTVAR must be restarted from the beginning, unless the sender
+decides to keep state for at least a small number of recent ackers.
+
+Because the ACKER can leave the group without notifying the sender,
+after a number of successive timeouts the sender MUST force the election
+of a new ACKER.  We recommend this new election to be performed after
+two successive timeouts.
+
+
+2.8.  Interaction with feedback suppression schemes
+
+
+Several schemes are used by NAK-based multicast protocols to reduce the
+amount of feedback directed toward the source and make the protocol
+scale with large populations of receivers.  Such schemes typically rely
+on randomly delaying NAK generation, and suppressing pending NAKs when
+an equivalent NAK or a retransmission is heard; or, intermediate nodes
+such as routers can implement some form of feedback aggregation and
+filtering.
+
+Such schemes might prevent NAKs from potential ACKER candidates from
+reaching the source. This filtering might impact the speed at which
+PGMCC selects the correct ACKER, though initial experience from
+simulations seem to suggest that PGMCC behavior is not severely affected
+by NAK suppression schemes.
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		 Section 2.8.  [Page 16]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+2.9.  Interaction with ECN
+
+
+PGMCC can use ECN notifications in much the same way as actual losses,
+and use such notifications to control the throughput of the session.
+
+At the receiver, ECN-marked data packets can be considered as lost
+packets for the purpose of loss rate computation and ACK/NAK generation.
+If the ACKER sends an ACK for ECN-marked packets, that ACK MUST report
+that the packet being acknowledged that was ECN marked.  Similarly the
+ACKER must indicate in the ACK packet's received packets bitmap that the
+packet was ECN-marked, or that the packet was lost.
+
+We note that to support use of the ECN nonce, the ACK packet's received
+packets bitmap would require two bits per packet being reported.
+
+
+3.  Procedures - Sender
+
+The following pseudo-code specifies the complete behavior of the sender
+in PGMCC.
+
+
+initialization:
+  T = 1 ; W = 1 ;   /* initialize window and number of tokens */
+  RETRY = 0 ;	    /* number of consecutive timeouts so far  */
+  < initialize p, RTT for acker to default values >
+  ACKER = NO_ACKER; /* no acker is known		      */
+  < initialize sequence numbers >
+  QUEUED = 0;	    /* packets waiting to be transmitted      */
+
+on transmission request:
+  send_packet() ;
+
+on timeout expiration :
+  T = 1 ; W = 1 ;   /* initialize window and number of tokens */
+  if (RETRY < RETRY_MAX)
+    RETRY = RETRY + 1
+  else
+    ACKER = NO_ACKER ; /* old acker is not valid anymore      */
+  send_packet() ;
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 3.  [Page 17]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+on ACK/NAK reception from receiver I :
+  < compute p and RTT for source of this ACK, see Sec. 2.5 and 2.6 >
+  RETRY = 0 ;
+  if (ACKER == NO_ACKER) { /* select current as acker is no other known */
+    ACKER = I ;
+    T = T + 1 ;
+  }
+  if (ACKER != I)
+    < select acker according to Sec. 2.3 > ;
+  else {
+    <update acker statistics (p_ACKER, RTT_ACKER) >
+    if (packet_type == ACK) {
+      < update_window according to Sec.2.2 >
+      send_packet ;
+      if (ack_pending)
+	update_timeout ;
+  }
+}
+
+send_packet() {
+  if (QUEUED > 0 && T >= 1) {
+    < transmit one packet >
+    T = T - 1 ;
+    QUEUED = QUEUED - 1 ;
+  }
+  if ( <there are unacked packet> )
+    <set a timeout, see Sec.2.7 >
+}
+
+
+
+4.  Procedures -- Receiver
+
+The following pseudo-code specifies the complete behavior of the
+receiver in PGMCC.
+
+A receiver only transmits an ACK packet when it receives a data packet
+for which the receiver is designated as the ACKER by the data packet
+itself.  A receiver can transmit a NAK packet after it has detected that
+a data packet is missing and a suitable delay has passed, as dictated by
+the feedback suppression rules of the protocol in use.
+
+The data packet contains acknowledgement status about the most recent 32
+sequence numbers known to the receiver.
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 4.  [Page 18]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+on initialization/session setup:
+  < initialize state variables and ACK bitmap >
+
+on DATA packet reception:
+  < update p measurement according to Sec.2.6 >
+  < record timestamp and packet reception time >
+  if (ACKER == this_node) {
+    < send an immediate ACK >
+  }
+  if ( <some data packet is missing and unacknowledged > )
+    < schedule a timeout for NAK transmission >
+
+on NAK reception:
+  < suppress any pending NAK transmission for the sequence
+    number indicated in the NAK >
+
+on timeout:
+  if ( < there are missing and unacknowledged packets > ) {
+    < send a NAK for one or more of the missing packets >
+    < mark such packets as acknowledged >
+    if ( <some data packet is missing and unacknowledged > )
+      < schedule a timeout for NAK transmission >
+  }
+
+
+5.  Security Considerations
+
+PGMCC is not a transport protocol in its own right, but a congestion
+control mechanism that is intended to be used in conjunction with a
+transport protocol.  Therefore security primarily needs to be considered
+in the context of a specific transport protocol and its authentication
+mechanisms.
+
+Congestion control mechanisms can potentially be exploited to create
+denial of service.  This may occur through spoofed feedback.  Thus any
+transport protocol that uses PGMCC should take care to ensure that
+feedback is only accepted from the receiver of the data.  The precise
+mechanism to achieve this will however depend on the transport protocol
+itself.
+
+In addition, congestion control mechanisms may potentially be
+manipulated by a greedy receiver that wishes to receive more than its
+fair share of network bandwidth.  A receiver might do this by first
+reporting inflated loss and RTT samples, in order to get selected as the
+ACKER, and then generating ACK at the desired rate (including possibly
+claiming to have received packets that in fact were lost due to
+congestion).  Possible defenses against such a receiver could be based
+on the sender verifying the correctness of the loss and RTT samples
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 5.  [Page 19]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+supplied by the receiver. A PGMCC sender SHOULD compare the receiver
+reports on loss rate and RTT with the information derived directly from
+the incoming stream of ACKs.  In case of discrepancy of the reports, a
+PGMCC sender SHOULD mark the current acker as ineligible and initiate a
+new acker election. The decision on how large that discrepancy should be
+before initiating a new acker election is left to the implementation.
+
+Also, the sender MAY include some form of nonce that the receiver must
+feed back to the sender to prove receipt. However, the details of such a
+nonce would depend on the transport protocol, and in particular on
+whether the transport protocol is reliable or unreliable.
+
+
+6.  Authors' Addresses
+
+   Luigi Rizzo
+   luigi@iet.unipi.it
+   Dip. Ing. dell'Informazione,
+   Univ. di Pisa
+   via Diotisalvi 2, 56122 Pisa, Italy
+
+   Gianluca Iannaccone
+   gianluca.iannaccone@intel.com
+   Intel Research
+   15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
+
+   Lorenzo Vicisano
+   lorenzo@cisco.com
+   cisco Systems, Inc.
+   170 West Tasman Dr.,
+   San Jose, CA, USA, 95134
+
+   Mark Handley
+   m.handley@cs.ucl.ac.uk
+   University College London,
+   Gower Street, London WC1E 6BT, UK
+
+
+7.  Acknowledgments
+
+We would like to acknowledge feedback and discussions on equation-based
+congestion control with a wide range of people, including members of the
+Reliable Multicast Research Group, the Reliable Multicast Transport
+Working Group, and the End-to-End Research Group.
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 7.  [Page 20]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+[1] Bradner, S., Key words for use in RFCs to Indicate Requirement
+Levels (IETF RFC 2119) http://www.rfc-editor.org/rfc/rfc2119.txt
+
+[2] Floyd, S., Handley, M., Padhye, J., Widmer, J., "Equation-Based
+Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
+Stockholm, Aug. 2000
+
+[3] Padhye, J. and  Firoiu, V. and Towsley, D. and Kurose, J., "Modeling
+TCP Throughput: A Simple Model and its Empirical Validation", Proc ACM
+SIGCOMM 1998.
+
+[4] Mankin, A., Romanow, A., Brander, S., Paxson, V., "IETF Criteria for
+Evaluating Reliable Multicast Transport and Application Protocols,"
+RFC2357, June 1998.
+
+[5] Rizzo, L., "pgmcc: a TCP-friendly single-rate multicast congestion
+control scheme", ACM SIGCOMM 2000, Stockholm, Aug.2000
+
+[6] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V., "RTP: A
+Transport Protocol for Real-Time Applications", RFC 1889, Jan 1996.
+
+[7] Speakman, T., Crowcroft, J., Gemmell, J., Farinacci, D. , Lin, S.,
+Leshchiner, D., Luby, M., Montgomery, T. , Rizzo, L., Tweedly, A.,
+Bhaskar, N., Edmonstone, R., Sumanasekera, R., Vicisano, L., PGM
+Reliable Transport Protocol Specification, RFC 3208, December 2001.
+rfc3208.txt also available at ftp://ftp.rfc-editor.org/in-
+notes/rfc3208.txt
+
+
+
+
+8.  Full Copyright Statement
+
+Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+This document and translations of it may be copied and furnished to
+others, and derivative works that comment on or otherwise explain it or
+assist in its implementation may be prepared, copied, published and
+distributed, in whole or in part, without restriction of any kind,
+provided that the above copyright notice and this paragraph are included
+on all such copies and derivative works. However, this document itself
+may not be modified in any way, such as by removing the copyright notice
+or references to the Internet Society or other Internet organizations,
+except as needed for the purpose of developing Internet standards in
+which case the procedures for copyrights defined in the Internet
+languages other than English.
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 8.  [Page 21]
+
+INTERNET-DRAFT		  Expires: January 2005 	       July 2004
+
+
+The limited permissions granted above are perpetual and will not be
+revoked by the Internet Society or its successors or assigns.
+
+This document and the information contained herein is provided on an "AS
+IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
+FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
+LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
+INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
+FITNESS FOR A PARTICULAR PURPOSE."
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rizzo/Iannaccone/Vicisano/Handley		   Section 8.  [Page 22]