LLG 8-Jun-77 13:01 29364

IEN # 12
L. Garlick / SRI-ARC          Supersedes: None
R. Rom / SRI-ARC              Replaces: None
J. Postel / SRI-ARC
15 March 1977                 Section: 2.4.4.1

Issues in Reliable Host-to-Host Protocols

Lawrence L. Garlick
Raphael Rom
Jonathan B. Postel

March 15, 1977

Augmentation Research Center
Stanford Research Institute
Menlo Park, California 94025
(415) 326-6200

ABSTRACT

Fully reliable network host-to-host protocols have recently gained significant attention, primarily because of the more stringent security requirements of network users. This paper discusses issues related to one such protocol, the one supported by the Transmission Control Program (TCP). The protocol, first introduced in 1974, features end-to-end positive acknowledgement, retransmission, internetwork addressing capabilities, and ordered delivery. The issues of interest in this paper are protocol correctness and completeness, protocol efficiency, and complexity of implementation. The discussion suggests alterations and extensions to TCP. Flow control heuristics using TCP's windowing techniques are explored. Flow control information is augmented to allow fair apportionment of bandwidth, better bandwidth utilization through optimistic credits, flow control credits matched to the type of traffic, and increased performance for high precedence connections. An alternative for selecting the startup sequence number of a connection is presented. It is suggested that the resynchronization method for sequence number space management should be abandoned because it is overly complicated and can actually fail when the data stream is stopped by flow control. The need for the separation of data and control channels is motivated, introducing the notion of a reliable subchannel. The findings are presented both to further the understanding of reliable protocols and to encourage intelligent implementations of TCP.
INTRODUCTION

Numerous advances in computer communications have brought tremendous growth in computer networking. This growth has created the need for parallel advances in distributed computing protocols, typified by the packet switching network protocols developed for the ARPA network. ARPA network designers recognized early the need for a protocol that supports distributed process-to-process communication, and the ARPA host-to-host protocol (AHHP) became the reference point for such process-to-process protocols.

The AHHP has been very successful in providing a basis for abundant research in distributed computing and a prototype for process-to-process protocols. As experience with networking has grown, new applications, new topologies, new network access methods, and new higher level protocols have emerged. The AHHP is not entirely suited to the new requirements that have resulted from this experience.

End-to-end reliability is an example of a new requirement placed on host-to-host protocols. It has been a concern for builders of both secure applications and higher level protocols. There are two important motivations for stringent reliability requirements. First, security measures, such as encryption, are often applied at the host-to-host level or lower. Second, higher level protocols, such as the ARPA TELNET protocol, should not be required to handle transmission error checking. The AHHP does not provide host-to-host acknowledgement; it relies upon subnet and host-to-subnet protocols to deliver messages reliably. While the AHHP has performed almost error free, it has been known to lose messages; thus it cannot be considered a fully reliable protocol.
Other deficiencies of the AHHP include addressing constraints, weak error recovery, simplex connections, and large overhead for passing flow control information.

Throughout this paper, TCP is used as an abbreviation for both the Transmission Control Program and the protocol it supports. TCP corrects the deficiencies of the AHHP. It was initially designed to be a reliable internetwork host-to-host protocol [Reference 1], as well as a solution to many of the problems of the AHHP. When its special internetwork addressing considerations are set aside (as they are in this paper), TCP still represents a significant advance in host-to-host protocols. Among its reliability features are positive acknowledgement, retransmission, and sequencing of data and controls. It guarantees the error-free delivery of each message for which it claims successful delivery. Other improvements include duplex connections and the ability to use a network address (socket) in several connections.

The paper is organized around three issues: flow control techniques for TCP, alternative strategies for managing the connection sequence number space, and the need for a control subchannel for each TCP connection. To provide further context for the discussion, a brief summary of relevant TCP features is presented first. The reader is assumed to be somewhat familiar with the AHHP and to have been exposed to the early literature on TCP-like protocols [References 1, 2, 6]. A glossary of abbreviations and terms, and appendices that examine a few of the more involved issues in greater detail, can be found at the end of the paper.

TCP: A RELIABLE TRANSMISSION PROTOCOL

Network Characteristics

TCP does not depend on the transmission medium for its reliability: the subnetwork is assumed to be potentially unreliable.
The subnet need not ensure the orderly or error-free delivery of subnet packets, nor account for lost packets. TCP functions correctly in the face of long packet lifetimes and of connections that are opened and closed in quick succession.

Connections

Logical connections are established for process-to-process (user-to-user) communication. TCP connections are full-duplex channels established between source and destination sockets (network-wide process names). A socket may be a party to more than one connection, but only one connection can exist between any pair of sockets.

TCP provides the means by which a connection between processes is established, controlled during the transfer of data, and terminated at the completion of the session. Connection management requires the exchange of controls between TCPs. There are controls for connection synchronization, out-of-band signalling (interrupt), data flushing, resynchronization, and connection closing. As described below, controls accompany data whenever possible to avoid the overhead of separate control packets.

Packaging and Headers

TCP packages user letters (messages) into packets suitable for transmission over a subnetwork. Each letter or partial letter is prefixed by a TCP header, which includes fields for addressing, sequencing, acknowledgements, flow control, controls, and error checking. The header is optionally followed by a block of data. The smallest unit of data transfer, and the unit of sequencing, is the 8-bit byte (octet).

Sequencing

Sequence numbers serve both as acknowledgement identifiers and as an ordering mechanism. They are assigned to each octet of data and to those controls that need synchronization with the data stream. Only one sequence number is sent with each TCP header; it is the sequence number assigned to the first control or data octet in the packet.
Data and control sequence numbers therefore come from the same name space. The packet length is used to determine the highest sequence number consumed by the packet. Sequence numbers may be reused only for duplicate retransmissions. The sequence number space is managed cooperatively by the sender and the receiver, as discussed later.

Acknowledgement and Retransmission

A TCP acknowledgement indicates the successful delivery of some number of data octets to the receiving process's buffer, or of controls to the remote TCP. It is returned to the transmitting TCP in the acknowledgement field of a subsequent TCP header. The sequence number placed in this field is the highest sequence number acknowledged by the receiver, and it implies acknowledgement of all previous octets. If packets arrive out of order, no acknowledgement can be sent for octets with sequence numbers higher than those of the missing octets, since that would implicitly acknowledge the missing data. Packets can be retransmitted at will until they are acknowledged; however, bandwidth may be used inefficiently if improper retransmission policies are followed. Duplicates naturally arise from retransmissions that occur before an acknowledgement is received; they are detected and handled as described below.

Synchronization and Resynchronization

TCP is expected to run in a network with relatively long packet lifetimes and relatively short intervals between the closing and reopening of a connection. Several problems must therefore be solved concerning the detection of old duplicate packets, that is, packets whose sequence numbers were assigned by an earlier instance of a connection between the same sockets. These problems are how to select startup sequence numbers, how to exchange new sequence numbers reliably, and how to determine when resynchronization of sequence numbers is necessary.
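The sequencing and cumulative acknowledgement rules described earlier in this section can be illustrated with a minimal Python sketch. It assumes integer sequence numbers with no wraparound, represents each arriving packet as a hypothetical (seq, length) pair in which length counts every sequenced octet and control the packet carries, and follows the text in reporting the highest sequence number acknowledged rather than the next one expected. The function names are illustrative only.

```python
from typing import Iterable, Tuple


def last_seq(seq: int, length: int) -> int:
    """Highest sequence number consumed by a packet whose header carries
    `seq` (the number of its first octet or control) and which carries
    `length` sequenced octets and controls (assumed length >= 1)."""
    return seq + length - 1


def cumulative_ack(last_acked: int, segments: Iterable[Tuple[int, int]]) -> int:
    """Highest sequence number the receiver may acknowledge.

    `last_acked` is the highest number already acknowledged; `segments`
    are (seq, length) pairs received so far, possibly out of order or
    duplicated. Because an acknowledgement implicitly covers every lower
    sequence number, the value stops advancing at the first gap.
    Sequence number wraparound is ignored in this sketch.
    """
    ack = last_acked
    for seq, length in sorted(segments):
        if seq > ack + 1:
            break  # octets below `seq` are missing: nothing beyond them can be acked
        ack = max(ack, last_seq(seq, length))  # duplicates leave `ack` unchanged
    return ack


# Octets 110-119 are missing, so the packet starting at 120 is held, not acked.
print(cumulative_ack(99, [(100, 10), (120, 5)]))             # 109
# Once 110-119 arrive (in any order), everything through 124 is acked.
print(cumulative_ack(99, [(120, 5), (110, 10), (100, 10)]))  # 124
```

Sorting and rescanning the received list on every call is chosen for clarity; an implementation would more likely keep a reassembly queue ordered by sequence number.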
The exchange of sequence numbers at synchronization or resynchronization time is accomplished using a "three-way handshake" method [References 2, 4, 5]. This method provides positive acknowledgement of the exchanged sequence numbers and is sufficient to handle simultaneous connection establishment attempts. The other two problems have been solved with an Initial Sequence Number curve [References 4, 5, 6], which the sender uses to 1) select the first sequence number for a connection and 2) detect when sequence numbers are not being consumed in a manner that guarantees the receiving TCP can reliably identify old duplicates. The management of the sequence number space is discussed in section 4.

Flow Control

The receiver exerts flow control by issuing credits, which represent the receiving process's willingness to buffer data. Credits are passed in the window size field of the TCP header. The window size is added to the last acknowledged sequence number (the window's left edge) to give the highest sequence number that may be sent (the window's right edge). Flow control is discussed in further detail in section 3.

Packet Acceptance Checking

The receiving TCP is responsible for detecting packets with improper sequence numbers. Such packets are either old duplicates (from previous connections) or illegal because they fall outside the acceptable flow control range. To determine the action to be taken for a newly received packet, acceptability ranges are defined. The following three ranges are mutually exclusive and collectively exhaustive of the sequence number space (see Figure 1):

Acknowledge-deliver range (ADR)
The packet has arrived in order and does not exceed the receiving process's buffer space.
Data will be placed in the buffer, and an acknowledgement will be generated to indicate successful delivery.

Acknowledge-only range (AOR)
A duplicate packet has arrived as a result of retransmission. It will be acknowledged, but not delivered, since delivery has already occurred.

Discard range (DR)
An illegal packet has arrived. It may be an old duplicate or a packet that cannot be delivered because of flow control.

Appendix A provides more details of the packet acceptance policy.

FLOW CONTROL TECHNIQUES

Flow control is basically a mechanism to prevent the receiving process's buffers from overflowing. A good flow control scheme must also handle a whole spectrum of problems that result from performing this basic duty. This section first discusses general flow control goals and methods, and then specific techniques for use with TCP that could significantly improve protocol performance. Where suggestions are made, they represent enhancements to the flow control scheme used in the initial versions of TCP.

The goals of an ambitious flow control scheme include the following:

Receiver's Allocation
Any flow control strategy should consider the buffer space offered by a receiving user, since this space is a depository for incoming messages and relieves the TCP of resource allocation problems.

Congestion Prevention
The flow control strategy should prevent queueing of messages in the protocol module (TCP), so that TCP resources can be used to handle those messages that have a high probability of being delivered immediately. A retransmission protocol like TCP can itself cause congestion in the subnet, since each unacknowledged packet is retransmitted. The flow control scheme should make it easy to slow or stop retransmission from the sender.
Deadlock Prevention
When congestion does occur, resources must be available to handle traffic-clearing messages. Controls and flow control information must be delivered and interpreted even when data is queued.

Fair Apportionment of Bandwidth
In a virtual connection environment, it is important to be able to allocate the available bandwidth fairly among users, based on a variety of criteria. One criterion may be the precedence of the user or the connection. Another may be the mode of traffic; for example, interactive traffic may get preference over bulk traffic.

Bandwidth Utilization for Various Modes of Transmission
A network will usually serve several types of user communities and thus should be able to adapt its flow control strategy to the needs of the user. For example, transmission patterns for interactive users and bulk transfer users are quite different, and those differences should be reflected in the flow control strategies.

Interplay with Subnet Flow Control
The interfaces between modules implementing different levels of protocol can often cause flow control problems [Reference 8]. For instance, the subnet flow control of the ARPANET is adversely affected whenever a host does not readily accept incoming data from the packet switch (IMP). TCP is especially flexible in this regard, because it can absorb congested traffic from the subnet and discard it if necessary.

Exchanging Flow Control Information

A windowing scheme to convey flow control information has been used for many different types of protocols. It is an efficient technique that is useful whenever positive acknowledgement and retransmission are used for reliable transmission. Flow control information is passed in the header of a packet as a window size.
The window size is used in conjunction with the acknowledgement sequence number (the window's left edge) to determine the highest sequence number that can be transmitted with some assurance that it will be acknowledged without retransmission. The acknowledgement sequence number plus the window size gives the right edge of the flow control window. A nonzero window size gives permission to send a message of a certain length. Sending messages with sequence numbers that exceed the window's right edge is called an "oversend". In TCP, oversends will occur occasionally, since the flow control information is always slightly out of date and flow control credits can be withdrawn. Occasional oversends are not a problem, because the receiver can always discard incoming data without sending acknowledgements.

Determining the Window Size

The TCP acknowledgement and retransmission scheme allows flexibility in determining the correct flow control window size. The window size should indicate the willingness of the receiving process to provide buffer space. The window size could represent exactly the buffer space that the user has offered for receiving letters (the conservative strategy), or it could reflect some expected buffer space based on previous allocations (the optimistic strategy).

Conservative Guaranteed Allocation
The conservative approach to window size setting gives the receiving process almost full control over the flow control mechanism. By assuring the sender that there will be space for a particular number of octets, the policy reduces discards and thus the number of retransmissions. (Some messages may still be discarded if they arrive out of order and sufficient reassembly space is not available.)

There are some disadvantages to the conservative strategy of window size setting.
Flow control information is always slightly out of date when it is finally received. The receiving process could have drastically increased or decreased its allocation in the meantime, making the information useless. Unless a process provides for double buffering, the window very likely will drop from a fixed size (whatever the user's buffer is) to zero each time a message is passed on to the receiving process. Depending on the scheduling algorithm in the host, this could result in zero windows that totally inhibit message flow. Before messages can flow again, a packet with flow control information must arrive at the source. Thus, a round-trip delay is incurred between messages, and the number of dataless packets in the network increases.

A related problem is that large single buffers may be used to receive small letters. If a window of, say, size k is advertised and a packet much smaller than k arrives that includes the end of a letter, then the destination buffer is returned to the receiving process. The previous flow control credit, which was large, is withdrawn and the window becomes zero. In the interim, the sender may have sent several small letters, believing the receiver has the buffers to accept them. The receiving TCP, knowing that the receiving process has no available buffer space, will advertise a zero window. By the time the window information arrives at the sending TCP, it will likely be an inaccurate report and cause further delays.

Optimistic Credits
The alternative to the conservative approach is to send flow control information that is a good estimate of the receiver's expected available space [References 3, 7]. The window size should thus be a function of previous window sizes as well as the current available space.
The window size should be an average, weighted very heavily toward the current time, so that a process that is truly rejecting data will soon be reflected in a very small window. This method could even be combined with heuristics that force the window to zero after a fixed period without receiving.

Optimistic allocation can do much to solve the problem of the drastic window size changes experienced with the conservative scheme. By granting permission to transmit messages before the user has allocated buffer space, it fills the pipe and allows a smoother flow. It is still reliable, because any message can be discarded by the receiver, since it will be retransmitted later. The main disadvantage of the method is its instability when faced with very irregular receiving patterns. A poorly behaving receiver can still sabotage this policy, but not as easily as with the conservative technique.

As will be shown below, an optimistic strategy may be quite dynamic with respect to recent receiving patterns, connection precedence, and the fair sharing of the available bandwidth. It may be possible to determine the semantics associated with the window size by exchanging transmission mode or topological information. When a connection is opened, the transmission mode (e.g., interactive, bulk) and the topology (e.g., satellite link) could be exchanged. This information would then determine the weighting of previous window sizes in calculating the current window. To demonstrate the idea of an optimistic flow control policy, a method for setting the receive window size is given in Appendix B.

Zero Flow Control Windows

It may be necessary to stop the flow on a TCP connection, that is, to stop all new transmissions and unnecessary retransmissions. This is required when there are no user receive buffers into which data can be placed. A zero receive window indicates an unwillingness to receive data.
This reluctance is conveyed to the remote TCP by sending a packet with zero in the window size field. When interpreting packets, each TCP must read the window size on every packet, even on those that acknowledge old duplicates. This is necessary for setting the window to zero when there is no data to carry the flow control information.

TCP must perform special functions with regard to sending packets into a zero window. If no data is being sent on the connection, a zero window is of no concern to the sending TCP. If there is data to be sent, it must be queued; if necessary, new data from the sending process must be rejected. The creation of new packets must be suspended entirely, and retransmission must be suspended, except for flushing controls, synchronizing controls, and the window opening control mentioned below.

Opening a window of size zero also presents some special problems [Reference 6]. Since a window size can accompany each packet, it might seem that normal data packets and acknowledgements are sufficient to vary the size of the window. However, when the remote TCP is showing a zero receive window, it is difficult to send a window change reliably. A data packet cannot be sent, because the closed window indicates that only controls should be retransmitted; moreover, there may be no data to send. If ACKs are used and they arrive out of order, it may be impossible to tell whether the window is opening or closing.

The problem of opening a window of size zero is solved by using a pair of controls: one sent by the local TCP that is making its window size nonzero (WOPEN), and one sent by the foreign TCP to acknowledge the opening (WACK). These are special controls that must be handled immediately, without regard for flow control restrictions.
If controls can be blocked by data, as in the present TCP, then the WOPEN must be tagged with a sequence number but must not consume one.

SEQUENCE NUMBER SPACE MANAGEMENT

The second area of the current TCP protocol that needs attention is the reliable handling of the sequence number space. In a packet-switching network with alternative routing schemes, a packet can have a relatively long lifetime, especially if the topology of the network includes satellite links. Owing to misrouting, a packet can arrive at its destination minutes or even hours late, depending on the topology. A reliable protocol must be able to determine whether such a packet is deliverable, acknowledgeable, or must be discarded without acknowledgement. If the connection is closed during the packet's transit time, or broken by a crash with loss of memory, then the packet is no longer valid. If the connection is then reestablished using the same source and destination addresses, the arrival of the old packet can confuse the receiving TCP. A reliable mechanism must exist to guarantee that the receiving TCP can distinguish packets of the current connection from packets of an old connection.

Resynchronization, suggested by Tomlinson [References 4, 5], is one such mechanism. In this paper, "resynchronization" denotes the mechanism as a whole, rather than only the stage at which the sequence numbers are actually reset. The scheme is based on selecting initial sequence numbers (ISNs) from a curve in the sequence-number/time plane. When a new connection is opened, its first sequence number is taken from the ISN curve. If sequence numbers are consumed at a satisfactory rate, that is, with a slope similar to that of the ISN curve, resynchronization need not occur. However, if the rate of consumption is too slow, resynchronization may be required to avoid colliding with the ISN curve.
The ISN curve has a parallel boundary (defining a "forbidden zone") that indicates that no new sequence numbers may be assigned and that resynchronization must take place immediately. If this is not done and a crash occurs, sequence numbers assigned in the forbidden zone could conflict with the ISN chosen for the new connection. See Appendix C and References 4, 5, and 6 for further details of the resynchronization mechanism.

A few of the problems related to implementing resynchronization are discussed below.

Understanding and Documenting the Problem
Even though the resynchronization method is a workable one, it is not at all straightforward. It takes numerous pages and illustrations just to document the concept [References 4, 5, 6]. As seasoned ARPANET protocol implementers have pointed out in the past, a protocol must be reasonably easy to understand and to document. After all, if the network is heterogeneous, the protocol will be implemented on numerous kinds of hardware by system programmers with various degrees of skill.

Testing for the Need to Resynchronize
The protocol requires that if a connection is broken by a system crash, the sequence number chosen at startup must be one that cannot be confused with any sequence number still in the network for the old instance of that connection. To satisfy this requirement, periodic runtime checks must determine whether the sequence number consumption rate is satisfactory, that is, whether the connection is approaching the forbidden zone. This check must be done at fixed time intervals, not just when sequence numbers are being assigned. The check may result in the need to resynchronize even (and especially) if the connection is idle.

Resynchronization and Flow Control
The need to resynchronize may occur at any time, and the resynchronization must proceed in a timely manner if normal activity is to continue.
However, resynchronization means changing from the old sequence numbers to new ones, and the resynchronization control, which is itself marked with an "old" sequence number, must be acknowledged. Consequently, all data marked with "old" numbers must be acknowledged before the resynchronization control is acknowledged. If data is not being accepted because the user is not receiving, then resynchronization cannot proceed, and if resynchronization cannot proceed, neither new controls nor new data may be sent.

The Loss of a Truly Out-of-Band Signal
Because of the flow control problem mentioned above, all controls can be blocked during a resynchronization process. This includes the interrupt, which is supposed to be an out-of-band signal. Losing the out-of-band capability, even in rare instances, is an unfortunate deficiency. Higher-level protocols that rely on an out-of-band signal could be severely crippled by the inability to interrupt a "runaway" process. In fact, it is precisely the runaway process that, by not accepting data, will soon force resynchronization and then cannot be interrupted.

Extra Connection States and Controls
When a state diagram is used to represent a TCP connection, 40% of the connection states result from the resynchronization mechanism [Reference 6]. These seven extra states allow for simultaneous resynchronization attempts and for resynchronization attempts during connection closing (with no data loss). One extra control is required to support resynchronization. It is believed that more controls would be required for satisfactory solutions to the problems of resynchronizing a connection that is blocked by data flow control and of supporting a true out-of-band signal.

Decentralized Code
Code to support resynchronization would be scattered throughout many modules of the protocol implementation. There must be a watchdog for detecting the forbidden zone.
Heuristics would be strewn throughout the control sending and parsing modules. Also, to solve the flow control and interrupt problems mentioned above, special provisions must be made either for flushing data or for saving old sequence numbers.

An Alternative to Resynchronization

An alternative to resynchronization is a strategy that uniquely names each instance of a connection. The name (or incarnation number) is passed in each packet and is used by the receiver to filter out packets from old connections. The incarnation number is generated from clock time; thus, as with the resynchronization method, no crash-proof memory is required. Each time a TCP comes up, it determines its incarnation number from a clock. The appropriate clock resolution and wraparound period depend on the maximum packet lifetime of the network or interconnected networks. Let us assume that the clock has a resolution of one minute and a wraparound period of 256 minutes. The resulting incarnation number is 8 bits long and assures the receiver that any message received with this incarnation number is from the active connection and not from an old one. The uniqueness of the incarnation number allows the sequence number space to be reset to zero at the initialization of each new path (the first connection between two users).

When a connection is closed, a TCP must save the last sequence number used and retain it for time MPL (maximum packet lifetime). Saving the sequence number and the time of a closed connection solves the problem of repeatedly opening and closing the same connection (source and destination). By itself, this does not solve the problems created by TCP or host computer crashes. When connection establishment is requested, the list of old connections must be searched by (source, destination).
If a match is found, the saved sequence number plus one is the first sequence number used when the connection is opened. If there is no match, numbering can start at zero. Management of the old connection list entails the removal of outdated items. This can be handled, for the most part, during normal searching. When list storage becomes scarce, a simple garbage collection routine can be invoked.

There are two problems with the method using incarnation numbers. First, there is some concern about the size of the old connection list. It would not be surprising to see 1000 connections per hour at an average host. However, because TCP allows a socket to be party to many connections, there will be fewer distinct source and destination pairs, and the same pairs will be reused by many connections. (This is in contrast to the ARPA network, where restrictions in socket usage result in contact connections being used to spawn direct, dynamically named service connections.) Recent progress in inexpensive memories further alleviates concern about the space required for the old connection list.

The second problem is how to keep the incarnation number small enough to be sent in each header while keeping the clock cycle (name space) large enough to ensure uniqueness. It is felt that an incarnation number field larger than 8 bits is excessive header overhead. To accommodate this limit, the resolution of the clock is constrained, which leads to the following restriction at host startup time: when a host comes up after a crash, it must delay at least MPL / 2**8 before any connections are opened, so that a unique TCP incarnation number is always chosen. A startup delay of one minute is probably sufficient for the internetting case, since it implies a maximum packet lifetime (MPL) of 256 minutes.
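The bookkeeping for the incarnation-number alternative can be sketched in Python using the example figures from the text: a one-minute clock, an 8-bit incarnation number, and an MPL of 256 minutes. This is an illustrative sketch, not a specification: the names are hypothetical, time is represented as integer minutes, and only the choice of incarnation number and first sequence number is shown, not packet formats or the receiver's filtering.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Example figures from the text (assumed here, not normative):
# one-minute clock ticks, an 8-bit incarnation field, MPL = 256 minutes.
INCARNATION_BITS = 8
MPL_MINUTES = 256


def incarnation_number(clock_minutes: int) -> int:
    """Incarnation number read from a one-minute clock that wraps every 2**8 minutes."""
    return clock_minutes % (1 << INCARNATION_BITS)


def startup_delay_minutes() -> float:
    """Minimum delay after a crash before opening connections: MPL / 2**8."""
    return MPL_MINUTES / (1 << INCARNATION_BITS)


@dataclass
class OldConnectionList:
    """Last sequence number and close time (minutes) per (source, destination)."""
    entries: Dict[Tuple[str, str], Tuple[int, int]] = field(default_factory=dict)

    def record_close(self, src: str, dst: str, last_seq: int, now: int) -> None:
        # Saved on close and kept for at most MPL.
        self.entries[(src, dst)] = (last_seq, now)

    def first_seq(self, src: str, dst: str, now: int) -> int:
        """First sequence number for a new connection between src and dst."""
        entry = self.entries.get((src, dst))
        if entry is None:
            return 0                        # no recent connection: start at zero
        last_seq, closed_at = entry
        if now - closed_at >= MPL_MINUTES:  # outdated: remove it during the search
            del self.entries[(src, dst)]
            return 0
        return last_seq + 1                 # continue after the old connection

    def collect_garbage(self, now: int) -> None:
        """Drop every entry older than MPL (for when list storage runs short)."""
        self.entries = {k: v for k, v in self.entries.items()
                        if now - v[1] < MPL_MINUTES}


old = OldConnectionList()
old.record_close("A", "B", last_seq=41, now=10)
print(old.first_seq("A", "B", now=20))                  # 42
print(old.first_seq("A", "B", now=300))                 # 0
print(incarnation_number(300), startup_delay_minutes())  # 44 1.0
```

With these figures MPL / 2**8 works out to one minute, i.e. one tick of the assumed clock, so a TCP that crashes and restarts cannot read the same clock tick it read at its previous startup (short of a full 256-minute wrap).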
THE NEED FOR A CONTROL SUBCHANNEL 7

In earlier versions of TCP, data, controls, and out-of-band signals (also a control) are all multiplexed onto one logical channel. This means that one set of sequence numbers is used for their orderly and reliable delivery. 7a

One advantage of a single logical channel is the savings in the TCP header. Protocol overhead is a serious matter, since it is incurred with each message. Let us assume that it is desirable to allow piggybacking of activity from each channel. Since each logical channel requires header fields for both a sequence number and an acknowledgement number, the header grows by twice the sequence number field size with each channel added. 7b

A second advantage of one logical channel is the ability to synchronize the control stream with the data stream. Synchronization of the control and data streams is useful for handling interrupts and connection closing (without data loss). However, synchronization of streams can result in unwanted interdependencies, since the acknowledgement of a control may require the acknowledgement of preceding data. 7c

Two disadvantages of the single sequence number space scheme have been discovered recently: reassembly of data mixed with controls is costly when packets arrive out of order, and a true out-of-band signal is not provided. The first problem is an efficiency matter that has plagued early implementers [Reference 9]. User buffer space cannot be used for the reassembly of out-of-order packets because there is no way to know whether the packets that have not yet arrived contain only data or contain controls intermixed with the data. 7d

The essence of the second problem is that the acknowledgement scheme requires that acknowledgement of a sequence number is implicit acknowledgement of all preceding sequence numbers.
Since interrupts must be acknowledged for reliability, the transmission of an interrupt can be blocked by data flow control in the receiver. This was first noticed by Cerf [Reference 2], and an attempt was made to rectify the matter by giving the interrupt extra semantics: it always flushes unacknowledged data. This solution is probably sufficient unless resynchronization methods are used for sequence number selection. 7e

As mentioned earlier, when the resynchronization method is used, there is no clean solution to the problem of achieving both synchronization with the data stream and independence from data flow control. This is because the resynchronizing control can be blocked by data flow control but cannot be flushed. 7f

A compromise solution when using resynchronization is to separate controls and interrupts from the data channel, making a control subchannel. The control sequence number is the composite of the data channel sequence number (DCSN) and the subchannel sequence number (SCSN). This serves the dual purpose of synchronizing the two streams and using the resynchronization mechanism of the data channel for all subchannels. A subchannel allows reliable transmission even when the data channel is inactive, without flushing data. 7g

From the SCSN, the number of control fields, and the last SCSN received, the receiver can determine whether subchannel traffic is arriving in order and thus whether it can be acknowledged. 7h

The size of the field holding the SCSN determines the wraparound point in the SCSN space. The SCSN space is initialized to zero when the DCSN is synchronized. It IS NOT reset with each DCSN change. 7i

There is no flow control information passed for the subchannel. Discarding controls (without acknowledgement) is the flow control mechanism.
Since the SCSN space is small compared to that needed to prevent wraparound in the worst case, the TCP must keep track of the DCSN to which the first SCSN was assigned. If wraparound of the SCSN space occurs, in the rare event that many controls are sent while the data channel is blocked, then the control channel becomes blocked. This is very unlikely, because a long series of controls will probably contain a string of interrupts, and successfully delivered interrupts will usually cause the receiving process to unblock the data channel. 7j

Acceptability Test for Subchannel Traffic 7k

The acceptability test for items on the subchannel is a composite test of both sequence numbers. First, the DCSN is checked to see whether it would be acknowledged if it were an octet received on the data channel. Only if it would have been discarded will the item on the subchannel be discarded. Once the DCSN test is passed, the SCSN is checked to see whether the item is deliverable and acknowledgeable with respect to the SCSN space. The SCSN test is less involved than the DCSN test because there is no flow control range. To be believable, the SCSN must fall within the range bounded by the SCSNs sent and the SCSNs for which acknowledgements have been received. This checks for everything except old duplicates from previous instances of the connection; that check is made on the DCSN.

A Scenario Using a Control Subchannel 7l

Let us examine a short scenario between TCP A and TCP B. The scenario assumes that connections have been established and transmission has proceeded normally. Only those header fields that relate to the data and control channels are shown. Note that the control length can be determined by the receiver from other fields in the header.
The following shorthand will be used in the scenario:

   DSN  - data sequence number
   DL   - length of data in octets
   DACK - acknowledgement of all data octets through this number
   CSN  - control sequence number
   CACK - acknowledgement of all controls through this number

#1 from TCP A
         --------------------------------
         ! DSN ! DL ! DACK ! CSN ! CACK !
  ====>  ! 100 !  2 !  200 !  5  !  25  !  ====>
         --------------------------------
  Sends 2 data octets (100 & 101), acks data through 200;
  sends 1 control (5), acks controls through 25.

#2 from TCP A
         --------------------------------
         ! DSN ! DL ! DACK ! CSN ! CACK !
  ====>  ! 102 !  3 !  200 !  5  !  25  !  ====>
         --------------------------------
  Sends 3 data octets (102-104), acks data through 200;
  sends no controls, acks controls through 25.

#3 from TCP A
         --------------------------------
         ! DSN ! DL ! DACK ! CSN ! CACK !
  ====>  ! 105 !  3 !  201 !  6  !  25  !  ====>
         --------------------------------
  Sends 3 data octets (105-107), acks data through 201;
  sends 1 control (6), acks controls through 25.

#4 from TCP B
         --------------------------------
         ! DSN ! DL ! DACK ! CSN ! CACK !
  <====  ! 202 !  1 !  101 !  26 !   6  !  <====
         --------------------------------
  Having received #1 and #3 but not #2, sends 1 data octet (202),
  acks data through 101; sends 1 control (26), acks controls through 6.

The main things to notice from this scenario are that data and controls are still piggybacked, as in the current version of TCP, and that there is a degree of independence between the two channels. As the scenario shows, TCP B can acknowledge controls that have arrived in order even though it has not received data in order. Moreover, TCP B is able to use the latest data sequence number to test the acceptability of the latest control sequence numbers.
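The acceptability test and the scenario above can be replayed with a small model of TCP B's receiving side. This is a hedged Python sketch, not the paper's mechanism in full: the data window, the acknowledge-only range, the 16-bit data sequence space, and the explicit control count in each packet are all assumptions (the paper derives the control count from other header fields), and SCSN wraparound is ignored.

```python
from dataclasses import dataclass, field

S = 2 ** 16  # assumed size of the data sequence space for this sketch


def in_range(lo: int, x: int, hi: int, s: int = S) -> bool:
    """(lo <= x <= hi) mod s: x lies on the circular arc from lo up to hi."""
    return (x - lo) % s <= (hi - lo) % s


@dataclass
class Packet:
    dsn: int         # DSN: number of the first data octet (used as the DCSN)
    dl: int          # DL: data length in octets
    csn: int         # CSN: number of the first control carried
    n_controls: int  # hypothetical explicit count of controls in the packet


@dataclass
class SubchannelReceiver:
    dack: int    # last data octet received in order ("acks data through")
    cack: int    # last control received in order ("acks controls through")
    window: int  # hypothetical data window beyond dack, in octets
    aor: int     # hypothetical acknowledge-only range behind the window
    buffered: set = field(default_factory=set)  # out-of-order data octets

    def dcsn_acceptable(self, dsn: int) -> bool:
        # DCSN test: would this number be acknowledged if it were a data octet?
        return in_range(self.dack + 1 - self.aor, dsn, self.dack + self.window)

    def receive(self, pkt: Packet) -> None:
        if not self.dcsn_acceptable(pkt.dsn):
            return  # probably an old duplicate: discard data and controls alike
        for i in range(pkt.dl):
            seq = (pkt.dsn + i) % S
            if 0 < (seq - self.dack) % S <= self.window:
                self.buffered.add(seq)  # new octet inside the window
        while (self.dack + 1) % S in self.buffered:  # advance over contiguous data
            self.dack = (self.dack + 1) % S
            self.buffered.discard(self.dack)
        # SCSN test: accept controls only if they continue the in-order sequence;
        # anything else is dropped unacknowledged (the subchannel's flow control).
        if pkt.n_controls and pkt.csn == self.cack + 1:
            self.cack += pkt.n_controls


# Replay: before packet #1, TCP B is assumed to have acked data through 99
# and controls through 4.
b = SubchannelReceiver(dack=99, cack=4, window=50, aor=100)
b.receive(Packet(dsn=100, dl=2, csn=5, n_controls=1))  # #1
# #2 (DSN 102, DL 3, no controls) is lost
b.receive(Packet(dsn=105, dl=3, csn=6, n_controls=1))  # #3
print(b.dack, b.cack)  # 101 6, the DACK and CACK carried by packet #4
```

Packet #3 passes the DCSN test even though octets 102-104 are missing, so its control is acknowledged while the data acknowledgement stays at 101; that is the channel independence the scenario illustrates.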
SUMMARY 8

Several suggestions have been presented here for the improvement of TCP. The suggestions relate to improved efficiency, simplification of implementation, and protocol functionality. The motivation for the suggestions is not only to improve a specific protocol but also to focus attention on a set of issues that are common to all reliable host-to-host protocols. 8a

Flow control ideas have been discussed, with attention to implementation ideas that satisfy fairly ambitious goals. Window management techniques have been suggested that could improve efficiency. A window setting method was presented that features optimistic credits that are a function of past credits, congestion, and available buffer space. 8b

An alternative to the resynchronization method of sequence number space management has been given. The suggested method is based on passing TCP incarnation numbers and keeping an old connection list. The method is simple to implement, requires no nonvolatile memory, and still guarantees reliable detection of illegal packets. 8c

Finally, the need for the separation of data and control channels was motivated. The solution, a reliable subchannel, is achievable with no separate sequence number space maintenance. 8d

It is hoped that each of these suggestions will be implemented in future versions of TCP. There are interdependencies involved; that is, some of the stated problems become less severe when others are solved. For example, if resynchronization is abandoned, then the argument for separate channels is motivated only by the need for the efficient reassembly of out-of-order packets. 8e

Of all the suggestions, the most important is the one concerning a new approach to sequence number space management. However, if resynchronization methods are retained, then a subchannel for controls is a must. Otherwise, a truly out-of-band signal is lost.
8f

The discussion of flow control indicated areas that should receive attention as more experience with TCP is gained. This should be an area for extensive measurement under many different transmission modes.

REFERENCES

[1] Cerf, V. and R. Kahn, "A Protocol for Packet Network Intercommunication," IEEE Transactions on Communications, Vol. COM-22, No. 5, May 1974.
[2] Cerf, V., Y. Dalal, and C. Sunshine, "Specification of Internet Transmission Control Program," INWG General Note #72, December 1974 (Revised).
[3] Sunshine, C., "Interprocess Communication Protocols for Computer Networks," Digital Systems Laboratory Technical Note #105, December 1975.
[4] Tomlinson, R., "Selecting Sequence Numbers," INWG Protocol Note #2, September 1974.
[5] Dalal, Y., "More on Selecting Sequence Numbers," INWG Protocol Note #4, October 1974.
[6] Postel, J., L. Garlick, and R. Rom, "Transmission Control Protocol Specification (AUTODIN II)," SRI-ARC Catalog #35938 & #35939, July 1976.
[7] Sunshine, C., "Factors in Interprocess Communication Protocol Efficiency for Computer Networks," Proc. National Computer Conf., 1976, AFIPS Press, pp. 571-576.
[8] Herrmann, J., "Flow Control in the ARPA Network," Networks, Vol. 1, No. 1, June 1976.
[9] Burchfiel, J., W. Plummer, and R. Tomlinson, "Proposed Revisions to the TCP," INWG Protocol Note #44, September 1976.

GLOSSARY

AHHP: ARPANET host-to-host protocol.
control: commands passed between TCP's that are used to coordinate connection management.
DCSN: data channel sequence number.
host: a computer that is connected to the network and that executes programs on behalf of its users. A host may provide services to other computers on the network.
ISN: initial sequence number; the first sequence number used when a connection is synchronized or resynchronized.
MPL: maximum packet lifetime.
octet: eight bits.
SCSN: subchannel sequence number; control channel sequence number.
socket: an entity defining one end of a TCP connection; the internetwork-wide name of a process port.
subnetwork: the network of computers that provides a communication medium for network hosts. The nodes of a subnetwork may function as host interface points as well as store-and-forward computers.
TCP: Transmission Control Program and the protocol it implements.
window: a dynamic range in the sequence number space used in flow control management.

APPENDIX A: PACKET ACCEPTANCE

This appendix provides details of the TCP packet acceptance testing scheme. It should clarify the possible actions the receiving TCP may take when it receives an arbitrary packet. Remember, the receiver is responsible for the detection of packets with improper sequence numbers, whether from old connections or from ill-behaved TCP's.

For notation, let

   ADR  = acknowledge and deliver range
   AOR  = acknowledge only range
   DR   = discard range
   S    = size of the sequence number space (one number per octet)
   x    = sequence number to be tested
   FCLE = flow control left window edge
   ADRE = (FCLE + ADR) mod S = ack-deliver right edge (discard left edge - 1)
   AOLE = (FCLE - AOR) mod S = ack-only left edge (discard right edge + 1)
   TSE  = time since connection establishment (in sec)
   MPL  = maximum packet lifetime (in sec)
   TB   = TCP bandwidth (in octets/sec)

For any sequence number x and packet text length l, if

   (AOLE <= x <= ADRE) mod S   and   (AOLE <= x+l-1 <= ADRE) mod S

then the packet should be acknowledged. If x and l satisfy

   (FCLE <= x <= ADRE) mod S   and   (FCLE <= x+l-1 <= ADRE) mod S

then the packet can also be delivered to the user; however, ordered delivery requires that x = FCLE.
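The two acceptance conditions can be written directly with a modular range test. This Python sketch is illustrative only: `in_range` and `classify` are hypothetical names, the text length l is assumed to be at least one, and the example values (S = 256, FCLE = 250, ADR = 10, AOR = 20) are chosen to show a window that wraps past zero. It checks the acknowledge-and-deliver condition before the acknowledge-only one, giving ADR precedence over AOR.

```python
def in_range(lo: int, x: int, hi: int, s: int) -> bool:
    """(lo <= x <= hi) mod s: True if x lies on the circular arc from lo to hi."""
    return (x - lo) % s <= (hi - lo) % s


def classify(x: int, l: int, fcle: int, adr: int, aor: int, s: int) -> str:
    """Place a packet (first number x, text length l >= 1) in ADR, AOR, or DR."""
    adre = (fcle + adr) % s   # ack-deliver right edge
    aole = (fcle - aor) % s   # ack-only left edge
    last = (x + l - 1) % s    # number of the packet's last octet
    if in_range(fcle, x, adre, s) and in_range(fcle, last, adre, s):
        return "ADR"   # acknowledge; deliverable (in order only when x == fcle)
    if in_range(aole, x, adre, s) and in_range(aole, last, adre, s):
        return "AOR"   # acknowledge only, e.g. a duplicate already delivered
    return "DR"        # discard


# S = 256 with the window wrapping past zero: FCLE = 250, ADR = 10, AOR = 20.
print(classify(250, 3, 250, 10, 20, 256))  # ADR, and x == FCLE so deliver now
print(classify(252, 5, 250, 10, 20, 256))  # ADR, straddling the wrap to 0
print(classify(240, 5, 250, 10, 20, 256))  # AOR
print(classify(100, 5, 250, 10, 20, 256))  # DR
```

Because both ends of the packet must satisfy each inequality, a packet that straddles FCLE comes out as AOR under this literal reading of the two conditions.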
A packet is considered outside a range only if all of it lies outside that range. When a packet falls in more than one range, precedence is ADR, then AOR, then DR. When a packet falls in the AOR, an ACK should be sent, even if a packet has to be created to carry it. The ACK will specify the current left window edge. This assures acknowledgement of all duplicates.

ADRE is exactly the maximum sequence number ever "advertised" through the flow control window, plus one. This allows controls to be accepted even though permission for them may never have been explicitly given. Of course, each time a control with a sequence number equal to the ADRE is sent, the ADRE must be incremented by one.

AOR is set so that old duplicates (from previous incarnations of the connection) can be detected and discarded. Thus

   AOR = Min(TSE, MPL) * TB.

APPENDIX B: WINDOW SIZE SETTING

To demonstrate the idea of an optimistic policy for window size setting, a method for setting the receive window size is given [Reference 6]. The scheme satisfies the flow control goals discussed earlier. Several parameters have been left unspecified, since they can be determined only after considerable testing and measurement of a specific TCP implementation. First, some notation:

   B       - total bandwidth of the TCP, given unlimited user resources
   N       - number of connections in the TCP
   CONGEST - congestion factor which reflects available TCP resources (CONGEST <= 1)
   WLT     - long term window
   W       - current window
   AVWT    - weighting coefficient for available buffer space
   OLDWT   - weighting coefficient for old window (OLDWT = 1 - AVWT)
   Tot     - total user buffer space
   Avail   - unfilled part of Tot

The long term window might look like:

   WLT = B/N * CONGEST.

The algorithm used to update the current window is the following.
Upon the processing of a user's receive request (buffer offering), the local receive window is set so that

   W = MINIMUM(WLT, Tot).

Each time a packet is sent for this connection, the local TCP sets the receive window and the packet header window size field so that

   W = (AVWT * Avail/Tot) * WLT + (OLDWT * W)   (for nonzero Tot)

and

   W = OLDWT * W                                (for Tot = 0).

It is important to note that a user's receive buffer is returned when an End-of-Letter is received. Thus, a small letter sent to a large buffer can cause Avail and Tot to vary abruptly, even though there may be a smooth flow of letters.

This window size setting scheme meets the goals mentioned in section 3 in the following ways:

WLT is dependent upon the number of connections, thereby administering fairness among connections. It also considers the level of congestion in the receiving TCP, assuming some measure of resource availability can be provided.

The window size will never exceed the bandwidth allocated to the connection. The algorithm may sometimes give credit to a "well behaving" process by setting its window to greater than the actual buffer available. This window will be reduced if the process does not supply new receive buffers promptly.

The current window size is dependent upon previous window sizes and upon the rate at which the process makes letter space available. If a process fails to make such space available, its receive window will be scaled down by the factor OLDWT every time a packet is sent. (The TCP may also apply a threshold mechanism by which a window is set to zero when it is reduced below the threshold.)

The algorithm can be modified slightly to support high throughput for high precedence connections. Parameter WLT can be made dependent on some criterion for the high priority traffic.
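The window update rules above can be sketched as a small class. This is an illustration under assumptions, not a tuned policy: the parameter values are hypothetical (the paper leaves them to measurement), and B is assumed to be expressed in the same octet units as the window. The example shows an optimistic credit larger than the offered buffer, followed by decay by OLDWT while no space is freed.

```python
from dataclasses import dataclass


@dataclass
class ReceiveWindow:
    """Optimistic receive window following the Appendix B update rules."""
    b: float          # B: total TCP bandwidth (assumed in the same units as W)
    n: int            # N: number of connections in the TCP
    congest: float    # CONGEST: congestion factor, at most 1
    avwt: float       # AVWT: weight given to available buffer space
    w: float = 0.0    # W: current window

    @property
    def oldwt(self) -> float:
        return 1.0 - self.avwt                  # OLDWT = 1 - AVWT

    @property
    def wlt(self) -> float:
        return self.b / self.n * self.congest   # WLT = B/N * CONGEST

    def on_receive_request(self, tot: float) -> float:
        """User offers buffers: W = MINIMUM(WLT, Tot)."""
        self.w = min(self.wlt, tot)
        return self.w

    def on_packet_sent(self, avail: float, tot: float) -> float:
        """Recompute W each time a packet is sent on this connection."""
        if tot > 0:
            self.w = (self.avwt * avail / tot) * self.wlt + self.oldwt * self.w
        else:
            self.w = self.oldwt * self.w
        return self.w


# Hypothetical values: B = 1000, N = 4, CONGEST = 0.8 (so WLT = 200), AVWT = 0.5.
rw = ReceiveWindow(b=1000, n=4, congest=0.8, avwt=0.5)
print(rw.on_receive_request(tot=100))         # 100
print(rw.on_packet_sent(avail=100, tot=100))  # 150.0: credit beyond the 100-octet buffer
print(rw.on_packet_sent(avail=0, tot=100))    # 75.0: buffer full, window decays
print(rw.on_packet_sent(avail=0, tot=0))      # 37.5: no buffers offered
```

Since W starts at or below WLT and each update is a weighted average of at most WLT and the old W, the window in this sketch never exceeds WLT, consistent with the bandwidth bound stated above.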
Categories of priority can be used, with some guaranteed service (part of the bandwidth) given to the highest priority categories.

APPENDIX C: RESYNCHRONIZATION DETAILS

In Figure 2, we show the history of sequence numbers used by a particular connection. The lines labeled "ISN" represent the maximum permitted rate at which sequence numbers can be used; however, this may differ from the maximum throughput rate of the TCP.

Suppose that the TCP supporting the connection fails at "C" and must be restarted. Assume, also, that the sequence number selected at restart is drawn from the value of ISN at the time event "C" occurred. The shaded area between "C" and "B" represents the maximum expected time that packets emitted at "C" can stay in the net. Clearly, the ISN line intersects this shaded area, indicating that, after the restart, packets emitted at "C" may become indistinguishable from those potentially emitted along the ISN curve. To correct this flaw, the sequence number currently to be used on the connection must be resynchronized before it runs into the forbidden zone to the left of the ISN line.

Testing for the need to resynchronize

As packets are produced and sequence numbers assigned to them, the TCP must check for two possible conditions which indicate that resynchronization is needed. The first is that sequence numbers are being used up so fast that they advance faster than ISN. The other is that they advance so slowly that ISN "catches up with them."

The basic method of selecting an initial sequence number is to delay for an arbitrary period labelled a "clock tick" or STEP and then select the new ISN. In Figure 2, three sequence number histories are traced, ending in points "A", "B", and "C". In the trace labelled "A," sequence numbers are used at such a rate that point "A" lies beyond ISN plus one STEP.
If the connection were to fail and be restarted at "A," the new ISN would be just below point "A" and would introduce potential unwanted duplicates. This situation can be detected before transmission of the packet. Let L be the length of the data in octets. Let SEQ represent the proposed sequence number of the packet, and SEQ+L-1 be the sequence number implicitly associated with the last octet of packet data. Also, let SMPL be the number of sequence numbers consumed at maximum TCP throughput during a maximum packet lifetime.

If ISN+STEP (at the moment that SEQ is to be assigned) lies in the range [SEQ, SEQ+L-1], then the type "A" ISN failure is about to occur. The solution is to send only as much text as is allowed (which does not result in the failure) and WAIT for the clock to tick again.

The situation in curve "B" is quite different. In this case, the connection is using numbers so slowly that the forbidden zone preceding the ISN curve has advanced and run into the connection sequence number curve. There are two solutions. One is to wait for the packet lifetime plus one clock step to expire (in which case the sequence history will pop out of the forbidden zone again). The other is to actively resynchronize the connection. The test for the type "B" situation is whether sequence number SEQ lies in the range [ISN, ISN+SMPL+STEP].

Note that all tests for inclusion must be modulo S, the size of the sequence number space, to account for the wraparound of sequence numbers.

Curve "C" in Figure 2 shows a sequence number trace which tends, on the average, to lie within legal values at all times.
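The type "A" and type "B" tests can be sketched with the modular range check the text calls for. The function names and the example values (S = 1000, STEP = 10, SMPL = 50) are hypothetical; `sendable_octets` is one assumed way to compute "as much text as is allowed" once a type "A" failure has been detected.

```python
def in_range(lo: int, x: int, hi: int, s: int) -> bool:
    """(lo <= x <= hi) mod s, so the tests survive sequence number wraparound."""
    return (x - lo) % s <= (hi - lo) % s


def type_a_failure(seq: int, length: int, isn: int, step: int, s: int) -> bool:
    """ISN+STEP falls inside [SEQ, SEQ+L-1]: numbers are outrunning the ISN clock."""
    return in_range(seq, (isn + step) % s, (seq + length - 1) % s, s)


def sendable_octets(seq: int, isn: int, step: int, s: int) -> int:
    """Octets SEQ .. ISN+STEP-1 that may go out before waiting for the next tick
    (meaningful when type_a_failure is True)."""
    return (isn + step - seq) % s


def type_b_failure(seq: int, isn: int, smpl: int, step: int, s: int) -> bool:
    """SEQ lies in [ISN, ISN+SMPL+STEP]: the forbidden zone has caught up."""
    return in_range(isn, seq, (isn + smpl + step) % s, s)


# Hypothetical values: S = 1000, STEP = 10, SMPL = 50, current ISN = 100.
print(type_a_failure(seq=105, length=10, isn=100, step=10, s=1000))  # True
print(sendable_octets(seq=105, isn=100, step=10, s=1000))            # 5
print(type_b_failure(seq=130, isn=100, smpl=50, step=10, s=1000))    # True
print(type_b_failure(seq=500, isn=100, smpl=50, step=10, s=1000))    # False
```

In the first case the sender would transmit octets 105-109 and then wait for the clock to tick; in the third it would either wait out MPL plus one STEP or resynchronize.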
As presented at the Second Berkeley Workshop on Distributed Data Management and Computer Networks, May 1977, at Berkeley, California.