
Department of Computer Science
Series of Publications A

Report A-2019-7

Congestion Control and Active Queue Management During Flow Startup

Ilpo Järvinen

Doctoral dissertation, to be presented for public examination with the permission of the Faculty of Science of the University of Helsinki, in Auditorium A129, Chemicum building, Kumpula, Helsinki, on the 14th of November, 2019 at 12 o’clock.

University of Helsinki Finland


Supervisors
Markku Kojo, University of Helsinki, Finland
Sasu Tarkoma, University of Helsinki, Finland

Pre-examiners
Pasi Sarolahti, Aalto University, Finland
Michael Welzl, University of Oslo, Norway

Opponent
Anna Brunström, Karlstad University, Sweden

Custos
Sasu Tarkoma, University of Helsinki, Finland

Contact information

Department of Computer Science
P.O. Box 68 (Pietari Kalmin katu 5)
FI-00014 University of Helsinki
Finland

Email address: info@cs.helsinki.fi
URL: http://cs.helsinki.fi/
Telephone: +358 2941 911

Copyright © 2019 Ilpo Järvinen
ISSN 1238-8645
ISBN 978-951-51-5585-6 (paperback)
ISBN 978-951-51-5586-3 (PDF)
Helsinki 2019
Unigrafia


Congestion Control and Active Queue Management During Flow Startup

Ilpo Järvinen

Department of Computer Science

P.O. Box 68, FI-00014 University of Helsinki, Finland
ilpo.jarvinen@cs.helsinki.fi

PhD Thesis, Series of Publications A, Report A-2019-7
Helsinki, November 2019, 87+48 pages

ISSN 1238-8645

ISBN 978-951-51-5585-6 (paperback)
ISBN 978-951-51-5586-3 (PDF)

Abstract

Transmission Control Protocol (TCP) has served as the workhorse for transmitting Internet traffic for several decades already. Its built-in congestion control mechanism has proved reliable in ensuring the stability of the Internet, and congestion control algorithms borrowed from TCP are also widely applied by other transport protocols. TCP congestion control has two main phases for increasing the sending rate. Slow Start is responsible for starting up a flow by seeking the sending rate the flow should use. Congestion Avoidance then takes over to manage the sending rate for flows that last long enough. In addition, the flow is booted up by sending the Initial Window of packets prior to Slow Start.

There is a large difference in the magnitude of sending rate increase during Slow Start and Congestion Avoidance. Slow Start increases the sending rate exponentially, whereas with Congestion Avoidance the increase is linear. If congestion is detected, a standard TCP sender reduces the sending rate heavily. It is well known that most Internet flows are short, which implies that flow startup is a rather frequent phenomenon. Also, many traffic types exhibit an ON-OFF pattern, with senders remaining idle for varying periods of time. As flow startup under Slow Start causes an exponential sending rate increase, the link load is often subject to exponential load transients that escalate in a few round trips into overload if not controlled properly. This is especially true near the network edge where traffic aggregation is limited to a few users.


Traditionally, much of the congestion control research has focused on behavior during Congestion Avoidance and has used large traffic aggregates during testing. To control router load, Active Queue Management (AQM) is recommended. The state-of-the-art AQM algorithms, however, are designed with little attention to Slow Start. This thesis focuses on congestion control and AQM during flow startup. We explore what effect the Initial Window has on competing latency-sensitive traffic during a flow startup consisting of multiple parallel flows typical of Web traffic, and investigate the impact of increasing the Initial Window from three to ten TCP segments. We also highlight the shortcomings in the state-of-the-art AQM algorithms and formulate the challenges AQM algorithms must address to properly handle flow startup and exponential load transients. These challenges include the horizon problem, RTT (round-trip time) uncertainty, and rapidly changing load. None of the existing AQM algorithms is prepared to handle these challenges. Therefore, we explore whether an existing AQM algorithm called Random Early Detection (RED) can be altered to control exponential load transients effectively and propose the necessary changes to RED. We also propose an entirely new AQM algorithm called Predict. It is the first AQM algorithm designed primarily for handling exponential load transients.

Our evaluation shows that because of shortcomings in handling exponen- tial load transients, the state-of-the-art AQM algorithms often respond too slowly or too fast depending on the actual RTT of the traffic. In contrast, the Predict AQM algorithm performs timely congestion indication without compromising throughput or latency unnecessarily, yielding low latency over a large range of RTTs. In addition, the load estimation in Predict is designed to be fully compatible with pacing and the timely congestion indication allows relaxing the large sending rate reduction on congestion detection.

Computing Reviews (2012) Categories and Subject Descriptors:

C.2.1 Network Architecture and Design
C.2.2 Network Protocols
C.2.3 Network Operations
C.2.6 Internetworking
C.4 Performance of Systems

General Terms:

Active Queue Management, Congestion Control, Access Networks, TCP


Additional Key Words and Phrases:

Load Transients, Flow Startup


Acknowledgements

I am very thankful and lucky to have Markku Kojo as my closest PhD supervisor, who has tirelessly spent time reviewing both the papers and the thesis manuscript. His suggestions for improving the contents have been invaluable. The second special mention goes to Laila Daniel, who encouraged and persuaded me to continue solving the remaining accuracy stumbling blocks for Predict that had, at that point, been put aside for years as unworkable. Without it, the thesis might have needed to take another course.

Besides the coauthors of the papers, a number of other people have had a significant impact on making this thesis happen. My second PhD supervisor Sasu Tarkoma has kept good-naturedly “bugging” me about the progress of the research and thesis. Without the excellent support of our PhD coordinator Pirjo Moen, the steps when nearing the finish line would have been much more painful to understand. The language of the thesis owes much to Marina Kurtén, who provided a long list of corrections to improve the grammar and style. I am also thankful for the useful feedback from the pre-examiners Michael Welzl and Pasi Sarolahti.

Part of this work was carried out in the Wireless Broadband Access (WiBrA) project funded by TEKES Finland, Nokia, Nokia Siemens Networks, and TeliaSonera Finland.

Helsinki, October 2019
Ilpo Järvinen


List of Reprinted Publications

Flow Initiation

Research paper I: I. Järvinen, B. Chemmagate, A. Y. Ding, L. Daniel, M. Isomäki, J. Korhonen, and M. Kojo. “Effect of Competing TCP Traffic on Interactive Real-Time Communication”. In: Proc. 14th Passive and Active Measurement Conference (PAM 2013), Lecture Notes in Computer Science. (Hong Kong). Vol. 7799. Springer Berlin Heidelberg, March 2013, pp. 94–103. doi: 10.1007/978-3-642-36516-4_10

Active Queue Management During Flow Startup

Research paper II: I. Järvinen, Y. Ding, A. Nyrhinen, and M. Kojo. “Harsh RED: Improving RED for Limited Aggregate Traffic”. In: Proc. 26th IEEE International Conference on Advanced Information Networking and Applications (AINA-2012). (Fukuoka, Japan). IEEE, March 2012, pp. 832–840. doi: 10.1109/AINA.2012.103

Research paper III: I. Järvinen and M. Kojo. “Evaluating CoDel, PIE, and HRED AQM Techniques with Load Transients”. In: Proc. 39th IEEE Conference on Local Computer Networks (LCN 2014). (Edmonton, Alberta, Canada). IEEE, September 2014, pp. 159–167. doi: 10.1109/LCN.2014.6925768

Research paper IV: I. Järvinen and M. Kojo. “Gazing Beyond Horizon: The Predict Active Queue Management for Controlling Load Transients”. In: Proc. 31st IEEE International Conference on Advanced Information Networking and Applications (AINA-2017). (Taipei, Taiwan (ROC)). IEEE, March 2017, pp. 126–135. doi: 10.1109/AINA.2017.110


List of Abbreviations

3G Third Generation Cellular Technology
ABC Appropriate Byte Counting
ABE Alternative Backoff with ECN
ACK Acknowledgement
AQM Active Queue Management
ARED Adaptive RED
AVQ Adaptive Virtual Queue
BBR Bottleneck Bandwidth and Round-trip Propagation Time
BDP Bandwidth Delay Product
CBI Control Block Interdependence
CBR Constant Bit Rate
CDF Cumulative Distribution Function
CE Congestion Experienced
CoDel Controlled Delay
CWR Congestion Window Reduced
CWV Congestion Window Validation
DAASS Delayed ACK After Slow Start
DASH Dynamic Adaptive Streaming over HTTP
DCH Dedicated Channel
DCTCP Data Center TCP
DNS Domain Name System
DOCSIS Data Over Cable Service Interface Specification
DRR Deficit Round Robin
DSL Digital Subscriber Line
DualQ Coupled Dual Queue AQM
ECE ECN-Echo
ECN Explicit Congestion Notification
ECT ECN Capable Transport
EDGE Enhanced Data Rates for GSM Evolution
EWMA Exponentially Weighted Moving Average
FCT Flow Completion Time
FIFO First-In, First-Out
FPGA Field-Programmable Gate Array
FQ-CoDel FlowQueue-CoDel
FTP File Transfer Protocol
GRED Gentle RED
GSM Global System for Mobile Communications
HAS HTTP Adaptive Streaming
HRED Harsh Random Early Detection
HSPA High-Speed Packet Access
HTML Hypertext Markup Language
HTTP Hypertext Transfer Protocol
HULL High-bandwidth Ultra Low Latency
IETF Internet Engineering Task Force
IP Internet Protocol
IPDV IP Packet Delay Variation
IQR Interquartile Range
IW Initial Window
L4S Low Latency, Low Loss, Scalable Throughput
LEDBAT Low Extra Delay Background Transport
LPF Low-Pass Filter
MADPIE Maximum and Average Queuing Delay with PIE
MD Multiplicative Decrease
MTU Maximum Transmission Unit
NAT Network Address Translation
NTP Network Time Protocol
NVP Non-Validated Period
P2P Peer-to-Peer
PI Proportional Integral Controller
PIE Proportional Integral Controller Enhanced
PQ Phantom Queue
QUIC Quick UDP Internet Connections
RED Random Early Detection
RFC Request for Comments
RTO Retransmission Timeout
RTP Real-time Transport Protocol
RTT Round-Trip Time
SACK Selective Acknowledgement
SFQ Stochastic Fair Queuing
SRED Stabilized RED
SSL Secure Socket Layer
SYN Synchronize
TCP Transmission Control Protocol
TFO TCP Fast Open
TLS Transport Layer Security
UDP User Datagram Protocol
VCP Variable-structure Congestion Control Protocol
WLAN Wireless Local Area Network
WRED Weighted RED
WWW World Wide Web
XCP Explicit Control Protocol


Contents

List of Reprinted Publications ix

List of Abbreviations xi

1 Introduction 1

1.1 Background and Motivation . . . 1

1.2 Problem Statement and Scope . . . 4

1.3 Methodology . . . 6

1.4 Thesis Contributions . . . 7

1.5 Structure of the Thesis . . . 11

2 Flow Startup and Congestion Control 13
2.1 Importance of Flow Startup due to Internet Traffic Developments . . . 13

2.2 Flow Initiation . . . 19

2.3 Bandwidth-Probing Phase . . . 21

2.4 Congestion and Response to Congestion Detection . . . 27

2.5 Mechanisms for Exiting the Bandwidth-Probing Phase Before a Congestion Signal . . . 30

2.6 On Restarting Flows After Low-Activity Periods . . . 35

2.7 Summary . . . 37

3 Active Queue Management During Flow Startup 39
3.1 State of the AQM Art . . . 40

3.2 Analyzing AQM Behavior During Load Transients . . . 51

3.3 Alleviating AQM Problems with Load Transients . . . 55

3.4 The Predict AQM Algorithm . . . 57

3.5 Reflections on Flow Initiation and Restart . . . 60

3.6 Summary . . . 60


4 Conclusions 63

4.1 Summary of Contributions . . . 63
4.2 Impact for Active Queue Management Research . . . 65
4.3 Future Work . . . 66
4.4 Thoughts on Future Internet Congestion Control Architecture . . . 67

References 69

Research Theme A: Flow Initiation 89

Research Paper I: Effect of Competing TCP Traffic on Interactive Real-Time Communication . . . 89

Research Theme B: Active Queue Management During Flow Startup 101

Research Paper II: Harsh RED: Improving RED for Limited Aggregate Traffic . . . 101
Research Paper III: Evaluating CoDel, PIE, and HRED AQM Techniques with Load Transients . . . 113
Research Paper IV: Gazing Beyond Horizon: The Predict Active Queue Management for Controlling Load Transients . . . 125


Chapter 1 Introduction

Congestion control is essential to ensure the proper functioning of the Internet. Without congestion control, the Internet would become unusable due to overload. The congestion control mechanism of the Transmission Control Protocol (TCP) [13, 151] has proved reliable in ensuring the stability of the Internet, and congestion control algorithms borrowed from TCP are also widely applied by other transport protocols.

1.1 Background and Motivation

Internet traffic patterns have changed over the years. One major reason is the World Wide Web (WWW). Soon after its invention in 1989, the web became very popular, and since then a large portion of Internet traffic has been web traffic [185]. Web pages are transferred using the Hypertext Transfer Protocol (HTTP) [23, 26, 73–75], which is layered on top of the Transmission Control Protocol (TCP) [151]. Web traffic consists largely of short TCP flows, and alternation between active and inactive transmission periods is frequent [1, 49, 56, 63]. Web browsers typically use parallel connections to reduce latency in fetching a web page that consists of many HTTP objects.

TCP has a built-in congestion control mechanism to ensure the stability of the Internet by preventing congestion collapse [102, 140] and to prevent the use of an excessive sending rate over a prolonged period of time. The TCP congestion control mechanism is self-clocked, using acknowledgement packets (ACKs) as feedback, and tries to follow a “packet conservation principle” that allows sending a new data packet into the network only when another packet has left it [102]. TCP congestion control has two main phases: Slow Start and Congestion Avoidance [13]. A TCP sender first uses Slow Start to probe for the available capacity on the end-to-end path, which is unknown to the sender. The ACK clock is booted up for Slow Start by sending up to the TCP Initial Window (IW) worth of packets into the network. The sender then increases the sending rate exponentially during Slow Start. The sender has to violate the packet conservation principle every time it increases the sending rate, because obeying the principle would only allow maintaining the same sending rate. Slow Start continues until the TCP flow hits “the ceiling”, which the sender discerns through packets dropped in the network due to congestion. Once the proper sending rate is acquired, the TCP sender continues in Congestion Avoidance, which is much less aggressive than Slow Start. We can compare these two congestion control phases to a storm and calm waters, the storm being Slow Start and calm waters matching Congestion Avoidance. To make things worse, the storms caused by Slow Start do not settle until the sender finds the ceiling through packet drops (assuming there is enough payload in the TCP flow to feed the storm).
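The contrast between the two phases can be sketched in a few lines of Python. This is an illustrative model only, not code from the thesis: it assumes an idealized loss-free path and counts the congestion window in whole segments.

```python
# Sketch: per-RTT congestion-window growth in the two TCP phases
# described above (idealized, loss-free; window counted in segments).
def grow_cwnd(cwnd, in_slow_start):
    """Return the congestion window after one round trip."""
    if in_slow_start:
        return cwnd * 2   # Slow Start: exponential, doubles each RTT
    return cwnd + 1       # Congestion Avoidance: linear, one segment per RTT

cwnd = 10  # hypothetical Initial Window of ten segments
for rtt in range(4):
    cwnd = grow_cwnd(cwnd, in_slow_start=True)
# After four RTTs of Slow Start the window has grown from 10 to 160
# segments, while Congestion Avoidance would have reached only 14.
```

The sixteen-fold growth in four round trips versus an increase of four segments is the storm-versus-calm-waters contrast the text describes.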

The use of short TCP flows with web traffic emphasizes the importance of the Slow-Start phase that is now the norm rather than an infrequent occurrence. It also makes the Initial Window important because web traffic often uses parallel flows to transfer a web page. Slow Start produces exponential load transients. The effect of exponential load transients on the load of a link is most significant close to the network edge where traffic aggregation is limited to only a single or a few users sharing the link. Such an access network or access link is often also the bottleneck link, that is, the narrowest link on the whole end-to-end path. When the bottleneck link is close to an end-user, it is very likely also saturable by one end-user alone, in contrast to the network core where the traffic aggregation level is higher than near the network edge.

When the sending rate is higher than the rate of the bottleneck link, a queue forms at the router before the bottleneck link. The router needs to somehow manage the queue. Traditionally, a simple First-In, First-Out (FIFO) policy together with drop from the tail when the buffer is full (Taildrop) has been used. Due to issues with persistently full or nearly full buffers, Active Queue Management (AQM) was introduced to proactively drop packets before the buffer becomes full, leaving buffer space for transient bursts. On the early round trips (RTTs) of Slow Start, the queue is typically transient as the link is not yet saturated. The link drains the queue during the intermediate idle periods of each RTT. However, Slow Start rapidly escalates from a low load to full load and overload in just a few RTTs because of the exponential nature of the sending rate increase. Once the end-to-end path becomes fully utilized by Slow Start, a standing queue forms in front of the bottleneck link. If the AQM algorithm does not react in time, the standing queue keeps growing at a rapid rate, leading to a large delay spike that may harm competing traffic, for example, a competing interactive media flow or a flow in the same web transaction. Because AQM algorithms have been designed only for coping with Congestion Avoidance, the rapid load growth during exponential load transients frequently takes them by surprise. Slow Start easily overpowers the control authority in AQM algorithms, leading to suboptimal performance.
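The escalation from transient queue to standing queue can be made concrete with a small idealized model. This is an assumption-laden sketch, not the thesis's simulation setup: one sender doubling its per-RTT load against a fixed-capacity bottleneck with an infinite buffer and no losses.

```python
# Sketch: queue left at the bottleneck after each RTT when a Slow-Start
# sender doubles its per-RTT load (infinite buffer, no drops assumed).
def queue_after_rtts(iw, capacity_per_rtt, rtts):
    """Return the backlog (packets) remaining at the end of each RTT."""
    queue, sent, history = 0, iw, []
    for _ in range(rtts):
        # arrivals this RTT minus what the link can drain in one RTT
        queue = max(0, queue + sent - capacity_per_rtt)
        history.append(queue)
        sent *= 2  # Slow Start: offered load doubles every RTT
    return history

# Hypothetical numbers: IW of 10 packets, link drains 100 packets per RTT.
# The early RTTs leave no backlog at all; then the queue explodes:
# queue_after_rtts(10, 100, 6) -> [0, 0, 0, 0, 60, 280]
```

The jump from an empty queue to a rapidly growing standing queue within two round trips is exactly the kind of transient that takes a Congestion-Avoidance-oriented AQM by surprise.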

Designing AQM algorithms with awareness of exponential load transients poses serious challenges. This thesis explores those challenges, namely the horizon problem and RTT uncertainty. Many AQM algorithms base their control decisions on the queue visible at the router, either by measuring queue length or queuing delay. The horizon problem, however, prevents the router from acquiring a complete picture of the traffic load by simply measuring its current queuing, because the distribution of the load over the end-to-end path keeps some of the load-inducing packets out of the view of the router. At the same time, the inability to know the RTTs of the flows going through the router makes it challenging for the AQM algorithm to use a proper measurement interval for load calculation, as it does not know what that interval should be.
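A toy calculation illustrates the horizon problem; the function and numbers below are hypothetical, chosen only to make the point that a queue-measuring router sees none of the packets propagating elsewhere on the path.

```python
# Sketch of the horizon problem: of all packets a sender has in flight,
# the router observes only those queued locally; packets filling the
# rest of the pipe (up to the path's bandwidth-delay product) are invisible.
def visible_fraction(in_flight, path_bdp):
    """Fraction of the sender's load a queue-measuring router can see."""
    queued = max(0, in_flight - path_bdp)  # only the excess shows up as queue
    return queued / in_flight

# Hypothetical path with a BDP of 100 packets: with 80 packets in flight
# the router sees nothing (fraction 0.0), yet one more Slow-Start doubling
# would already overload the link.
```

This is why measuring the current queue alone cannot tell the router how close an exponential transient is to saturating the link.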

Even though the experiments in this thesis are carried out using TCP, we believe this does not significantly limit the generality of the findings. The generality comes from other protocols running on top of another transport protocol, mainly the User Datagram Protocol (UDP) [152], implementing congestion control features that are heavily borrowed from TCP algorithms (e.g., QUIC [100, 101, 124]). In addition, more and more traffic that has traditionally been transferred over UDP is being moved to work on top of HTTP, which is especially true of streaming traffic. It is now largely transmitted using HTTP adaptive streaming (HAS) [182] instead of, for example, the Real-time Transport Protocol (RTP) [174] on top of UDP. As HTTP traffic is transmitted using either TCP itself or QUIC, which implements TCP-like congestion control, our focus on TCP is very valid in the Internet of today.

Given the frequent occurrence of exponential load transients in normal network traffic, considering them during AQM algorithm design seems a natural design choice. To ignore them is like designing a ship (AQM algorithm) only for calm waters. Unfortunately, exponential load transients are rather nasty storms that occur very frequently, shaking the unequipped ship over and over again. As the storms caused by exponential load transients are only quenched after the AQM algorithm takes serious action, AQM designs ignore them at their own peril.

This thesis focuses on the behavior during flow startup and the exponential load transients that occur during the bandwidth-probing phase. We (a) cover the issues that occur due to the unpreparedness of the congestion control and AQM algorithms for the rapid changes in load and (b) propose solutions that are aware of exponential load transients.

1.2 Problem Statement and Scope

This thesis is composed of two research themes related to flow startup.

The first research theme focuses on flow initiation, that is, on the very first round trip (RTT) during which data is being transmitted when a flow starts up. In addition to the existing aggressive behavior in the Internet with web traffic using parallel TCP flows, there is a proposal to make the TCP Initial Window larger [50]. This relatively recent proposal and web traffic using parallel TCP flows need to be evaluated together. The related research questions are as follows:

RQ1 Does parallel flow initiation introduce load transients with delays that are large enough to be harmful to competing delay-sensitive traffic?

RQ2 What is the impact of the proposed increased TCP Initial Window?

RQ1 addresses the effect of existing behavior in web browsers that typically open a large number of parallel TCP flows. RQ2 aims to give insights into how the aggressiveness develops if the proposed TCP modifications are widely deployed.

The second research theme focuses on Active Queue Management (AQM) during flow startup. Our initial research questions related to existing work on this theme are:

RQ3 Do the state-of-the-art AQM algorithms [80, 94, 142, 143, 145, 146] work properly during exponential load transients that occur due to flow startup?

RQ4 What are the issues AQM algorithms need to solve to work properly during exponential load transients?

RQ3 seeks an answer to whether the existing AQM algorithms are adequate for handling flow startup-related exponential load transients. In addition, a negative answer to RQ3 gives insights into what the problems with the state-of-the-art AQM algorithms are, leading us to RQ4. To answer RQ4, we were able to crystallize the AQM problems with exponential load transients into two root problems that we labelled the horizon problem and RTT uncertainty. Together those two problems make load estimation on a router challenging. The issues caused by these problems were somewhat known beforehand in congestion control research and are listed as open challenges in RFC 6077 [147]. This thesis refines the understanding of these challenges.

Since none of the existing AQM algorithms properly addresses the issues that occur during exponential load transients, we realized that we need a new AQM algorithm that is designed primarily with the exponential load transients in mind. The remaining research questions are related to such an AQM algorithm:

RQ5 Can an AQM algorithm measure link load properly if it solves the horizon problem and RTT uncertainty?

RQ6 Can the known characteristics of the exponential load transient be used by an AQM algorithm to predict the load into the near future?

RQ7 Can an AQM algorithm send a congestion signal at the right point in time to terminate the bandwidth-probing phase of a connection when the bottleneck link becomes fully utilized?

RQ5 seeks an answer to whether a more accurate understanding of why AQM algorithms have a hard time correctly managing exponential load transients helps in constructing an AQM algorithm that properly measures the current link load. If the answer to RQ5 is yes, the acquired knowledge about the parameters of the exponential load transient may allow further refinement of the AQM algorithm. This refinement is formalized in RQ6, aiming to give predictive power to the AQM algorithm.

The prediction allows the AQM algorithm to circumvent a third problem that occurs during exponential load transients due to rapidly changing load. The high rate of change in the load makes the current link load estimate already stale by the time the router is able to enforce a sender response, which occurs only after one additional RTT. The prediction enables the AQM algorithm to take into account the RTT-long delay in feedback, allowing the router to send timely congestion signals to the senders, which could enable solving RQ7 about timely feedback to an ongoing bandwidth-probing phase.
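The idea of compensating for the RTT-long feedback delay can be sketched as follows. This is a hypothetical illustration of the principle, not the actual Predict algorithm: extrapolate the measured load forward by the transient's known growth behavior and act on the extrapolated value.

```python
# Sketch: during an exponential transient the per-RTT load roughly doubles,
# so a router can extrapolate its current load estimate to the moment a
# congestion signal would actually take effect (one RTT later).
def predicted_load(current_load, growth_per_rtt=2.0, rtts_ahead=1):
    """Extrapolate the measured load forward by the transient's growth rate."""
    return current_load * growth_per_rtt ** rtts_ahead

def should_signal(current_load, capacity):
    # Signal congestion when the extrapolated load, not the stale current
    # estimate, reaches the link capacity.
    return predicted_load(current_load) >= capacity

# With a link capacity of 100 units, a measured load of 50 already warrants
# a signal (it doubles to 100 within the feedback delay), while 40 does not.
```

The design choice here is that the growth rate of Slow Start is a known characteristic, so the router does not need to know each flow's state to anticipate where the transient is heading.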

This thesis focuses on load transients that occur during flow startup. The behavior and problems that occur after the flow startup, during TCP Congestion Avoidance, are beyond the scope of this thesis. Thus, advanced TCP algorithms that only start working during the Congestion Avoidance phase of a flow, such as CUBIC [88, 159] and Compound TCP [180, 184], fall outside the scope.

Load transients have a big impact on the load of routers at or near the network edge. Such routers typically serve only one to a few users with a limited flow aggregate and low overall utilization. This thesis focuses on the effects of load transients on such routers. In many carrier-grade routers in the Internet core, the level of traffic aggregation is very high and the effect of a single flow or an end host on their load is minuscule.

AQM on such a router is outside the scope of this thesis; however, as long as any single sender may cause an overload situation, the work in this thesis is relevant also for such routers. The overload condition can occur if a single sender has enough capacity to saturate the outgoing link or can cause a significant load swing on top of the other traffic.

The scope of this thesis is further narrowed by excluding flow handshake-related problems and improvements. Therefore, TCP three-way handshake [151] improvements (e.g., TCP Fast Open [47]) and the possible use of Transport Layer Security (TLS) [64–66, 157, 188] (formerly Secure Socket Layer (SSL) [83]) are outside the scope. While the number of RTTs needed during the handshake increases flow completion times, which is a problem from the end-user perspective, it is less of a problem from the congestion control point of view. It is the actual payloads that are more problematic for congestion control, and therefore this thesis focuses on the behavior after the completion of the initial handshakes.

1.3 Methodology

In this thesis, we perform experimental network measurements to evaluate performance during flow startup. The experiments are conducted either with real cellular 3G hardware or as simulations using the ns2 network simulator [99]. We use workloads that mimic mainly a web traffic object response at a rough level of abstraction, excluding the effects of Domain Name System (DNS) [138, 139] queries, web object requests, and delays due to structural dependencies within a web page, all of which are relevant to real web transactions. To validate the measurement toolset, a smaller set of preliminary tests is first run to identify problems. If unwanted behavior due to bugs, misconfigurations, misimplementations, etc. is found during analysis, the issues are fixed before running the full set of tests. During the work, we identified and fixed a few issues in the ns2 RED implementation.

The results from the experiments are carefully analyzed in order to confirm the validity of the results and to locate the root cause of each phenomenon. If a validity problem is found only at this late stage, the problems are fixed and the affected experiments are rerun.

In this thesis we mainly use workloads and metrics that try to minimize the impact of samples from flows that have transitioned to Congestion Avoidance. This selection is based on our observation that the transient problems are often short in duration compared with the overall lifetime of many experiments by other researchers. Therefore, the transient problems often fail to show up clearly, if at all, in statistics that include a vast number of samples from the non-transient phases of the connections. In our measurements, interest is quite often placed on worst-case behavior because it sets a clear performance bound that is independent of the aspects we excluded in our web traffic abstraction. While the aspects abstracted away would also affect user-perceived performance, covering a comprehensive set of real web pages would in practice explode the required test space.

In this thesis, we mainly use TCP to provoke congestion. However, the flow startup-related problems are generalizable to other transport protocols. Such protocols often borrow heavily from the vast library of TCP algorithms developed to handle various scenarios. Even if the algorithms used by another protocol are not exactly the same, they often still include elements similar to those of TCP, which stems from the TCP-friendly concept [79]. As the TCP-friendly concept dictates that other transport protocols cannot be too aggressive compared with a competing TCP flow, it has led to the adoption of similar behavior in practice. Furthermore, there is only a quite limited set of ways to do flow startup anyway, which forces the adoption of similar behavior for flow startup regardless of the protocol. In addition, measurements show that when the link capacity is highly utilized, HTTP running on top of TCP is by volume responsible for most of the traffic [166].

1.4 Thesis Contributions

This thesis contributes to the knowledge of how aggressiveness during flow startup affects the performance of the flow and of co-existing flows that share the same bottleneck link, especially the peak queuing delay. The first research theme, flow initiation, explores how end-host congestion control needs to initiate flows in order not to produce delay spikes harmful to concurrent traffic that may be sensitive to latency.


The second research theme identifies and proposes solutions to the interactions between flow startup and AQM algorithms. To the best of our knowledge, we are the first to develop AQM algorithms based primarily on behavior during exponential load transients. We measure how the existing mainstream AQM algorithms perform during exponential load transients, identify their key shortcomings, and propose improved AQM algorithms that are prepared to handle exponential load transients.

The research in this thesis is presented in four original publications that are referred to as Pub. I, II, III, and IV. As the research conducted in this thesis also covers proposals active in the Internet Engineering Task Force (IETF), an important part of the research is contributing the evaluation results of the active proposals to the IETF. The contribution of each publication is described below.

Pub. I: In this paper we evaluate the impact of aggressive flow startup.

The evaluation covers the impact of parallel connections typical of web traffic combined with the proposal to increase the TCP Initial Window (IW) [50] from three to ten TCP segments.¹ The experiments for this paper are conducted in a real high-speed cellular network, and the impact of the aggressive flow startup is quantified using the delay imposed on a concurrent interactive media flow that is latency sensitive. The work in this paper answers RQ1 and RQ2, giving insight into how end hosts should handle flow initiation in order not to harm latency-sensitive traffic that may be competing. We also confirm in this paper that a long-lived bulk TCP connection in the particular cellular network causes excessive buffering, known as bufferbloat [84], which has a devastating impact on interactive traffic. The performance impacts due to the long-lived bulk TCP connection, however, are beyond the scope of this thesis.

The author is the major contributor to the measurement toolset and to the result analysis. Some parts of the toolset originate from earlier tests by Aki Nyrhinen and Binoy Chemmagate, but they were improved by the author; effectively, most of the toolset was rewritten through the author's improvements and, to a limited extent, new work from Binoy Chemmagate. The author is the main contributor of the paper content. Binoy Chemmagate was responsible for running the experiments and did small parts of the result analysis. Laila Daniel provided useful advice. Aaron Yi Ding assisted with related work and, together with Markku Kojo, gave useful suggestions. Markus Isomäki and Jouni Korhonen worked as technical advisers.

^1 We also evaluated the impact of a reduced initial RTO [148] but found its impact small; due to space limitations the results were not included in Pub. I.


Pub. II: This paper highlights how the Random Early Detection (RED) AQM algorithm [80] responds too slowly when it encounters exponential load transients, which typically occur with limited aggregate traffic, and proposes the HRED (Harsh RED) AQM algorithm that properly responds to exponential load transients. The paper compares HRED performance to a FIFO queue with simple Taildrop and to RED with recommended parameters using the ns2 network simulator [99]. For this paper, we developed a simulation model for exponential load transients that mimics the behavior that, for example, a web transaction using parallel connections to transmit HTTP objects would cause when aggregate traffic over a router is limited. Such a condition commonly occurs at the network edge where the traffic for a router originates from one or a few users. This ns2 model is then used in all our subsequent work. While analyzing the results from the simulations we also discovered issues in the ns2 RED implementation. We fixed the issues before rerunning the affected simulations.

The results in this paper show how RED with the recommended parameters leads to anomalous behaviors during exponential load transients, which cause a drastic performance reduction. HRED, on the other hand, is shown to properly control the queue without anomalies. Based on the HRED experience we discover that, contrary to the conventional wisdom of setting the RED parameters to respond “slowly” to congestion, it is safer to operate RED “too fast” to prevent anomalous behavior in the RED algorithm. This paper also shows that an equal response from an AQM algorithm to TCP Slow Start and Congestion Avoidance [13, 102] causes performance issues because the two have very different levels of aggressiveness.
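Why RED lags an exponential load transient can be illustrated with a small sketch. RED tracks an exponentially weighted moving average (EWMA) of the instantaneous queue; with the classically recommended small weight the average reacts over hundreds of packet arrivals, while a Slow-Start-driven queue can double every round trip. The sketch below is illustrative only (it is not the ns2 implementation, and the doubling queue trace is an assumed example):

```python
# Illustrative sketch: RED's EWMA-averaged queue lags a queue that grows
# exponentially, so the instantaneous queue can overflow the buffer long
# before the average crosses the drop thresholds.

def red_avg(samples, wq=0.002, avg=0.0):
    """Return the EWMA-averaged queue after processing the queue samples
    (wq=0.002 is the classically recommended RED weight)."""
    for q in samples:
        avg = (1.0 - wq) * avg + wq * q
    return avg

# Instantaneous queue doubling each "round", e.g. driven by Slow Start,
# with roughly q packet arrivals (= q EWMA updates) while the queue is q.
samples = []
q = 1
for _ in range(8):          # after 8 rounds the instantaneous queue is 128
    samples.extend([q] * q)
    q *= 2

avg = red_avg(samples)
print(round(avg, 1), samples[-1])  # averaged queue vs. instantaneous queue
```

Running this shows the averaged queue trailing far below the instantaneous queue at the end of the transient, which is the slow-response effect the paper measures.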

The author developed HRED and the measurement tools, and performed the result analysis. Discussions with Aki Nyrhinen about exponential load transients and TCP Slow Start nuances were very helpful while designing HRED. The author is the main contributor of the paper content. Figure 2 and Figure 1 were made by Aaron Yi Ding, the latter according to the instructions of the author. Aaron Yi Ding helped with the related work and, together with Markku Kojo, helped in writing and finalizing the paper.

Pub. III: In this paper we evaluate how AQM algorithms perform during exponential load transients. The paper compares HRED, proposed in Pub. II, to the state-of-the-art AQM algorithms CoDel (Controlled Delay) [142, 143] and PIE (Proportional Integral controller Enhanced) [145, 146] using the ns2 simulation model we developed in Pub. II. This paper, together with Pub. II, answers RQ3. The results show that the state-of-the-art AQM algorithms handle exponential load transients inadequately.

We also discovered an issue in the departure rate estimator of PIE that effectively disabled the whole PIE algorithm when the link bandwidth is low. We have contributed to the PIE specification [145] to ensure it does not leave that particular part of the algorithm unclear.

The author is the sole toolset contributor and performed the result analysis. The author is the main contributor of the paper content but has received valuable feedback and suggestions from the coauthor.

Pub. IV: This paper explores the issues AQM algorithms face during exponential load transients and proposes a new AQM algorithm called Predict. The paper refines the understanding of the limitations in the existing AQM algorithms by defining the horizon problem and RTT uncertainty that make load estimation challenging for routers. The HRED algorithm proposed in Pub. II went about as far as state-of-the-art AQM algorithms seem to be able to go in responding to exponential load transients. However, HRED is still seriously limited by its requirement for a known, constant end-to-end RTT, that is, it would need to solve the RTT uncertainty. Instead of trying to improve HRED, this paper proposes the Predict AQM algorithm that not only solves the horizon problem and RTT uncertainty but is also able to predict the load into the near future, effectively answering RQ5, RQ6, and RQ7 positively. Pub. II found that it is necessary to differentiate responses depending on whether Slow Start is in progress or not; Predict achieves this differentiation by detecting whether an exponential load transient is in fact taking place and only then responding aggressively.

For this paper, we run simulation experiments using an extended version of the exponential load transient model that was used in Pub. II and III. The experiments cover a wide set of network paths with different RTTs to validate the Predict AQM algorithm and compare it against PIE, CoDel, and SFQ-CoDel. The results show that there is room for significant peak-delay improvements if the design of an AQM algorithm properly addresses exponential load transients, and that the current AQM algorithms have a hard time correctly managing exponential load transients. The additional results acquired in this paper further reinforce that the state-of-the-art AQM algorithms do not adequately solve exponential load transients and may even do significant harm to performance (RQ3).

The Predict AQM algorithm we developed is able to control the fastest and therefore the most dangerous form of load-growth transient that a congestion-controlled load can experience. Therefore, we believe we may have solved one of the open questions in congestion control research related to information acquisition [147].


The author crystallized the horizon problem and RTT uncertainty, and the limitations they impose on AQM. The author developed the Predict AQM algorithm and alone extended the toolset from the previous two papers to be suitable for the measurements done in this paper. The author did the entire result analysis and is the main contributor of the paper content. Again, the author has received valuable feedback and suggestions from the coauthor.

1.5 Structure of the Thesis

The rest of this thesis is organized as follows. Chapter 2 provides the background to flow startup, first highlighting how the relevance of flow startup has increased due to developments in Internet traffic. Then it explains various components related to the flow startup process and covers the flow initiation research theme explored by Pub. I. Chapter 3 focuses on Active Queue Management (AQM) during flow startup, the second research theme in this thesis. It first introduces the state of the art in AQM, then explains how these AQM algorithms behave during load transients caused by flow startup, and finally discusses solutions that address the shortcomings in the state-of-the-art AQM algorithms. Table 1.1 summarizes the mapping between the research questions and the content of this thesis. Chapter 4 concludes this thesis, presents the areas that need future work, and highlights how the discoveries and insights gained in this thesis may turn out useful for a future Internet congestion control architecture. The four original publications describing the research conducted in this thesis are included at the end of this thesis.

Table 1.1: Summary of the thesis structure, research themes, and questions.

Theme                     Research Question   Main Sections   Original Publications

Flow                      RQ1                 2.2             Pub. I
Initialization            RQ2                 2.2             Pub. I

Active Queue Management   RQ3                 3.2, 3.3        Pub. II, Pub. III, Pub. IV
During Flow Startup       RQ4                 3.2, 3.3        Pub. II, Pub. III, Pub. IV
                          RQ5                 3.3, 3.4        Pub. II, Pub. IV
                          RQ6                 3.3, 3.4        Pub. IV
                          RQ7                 3.3, 3.4        Pub. IV


Chapter 2

Flow Startup and Congestion Control

The lifetime of any flow in the Internet begins with flow startup. With a large number of flows coming and going in common Internet traffic, it is important to understand how flow startup and congestion control interact. This chapter focuses on topics related to flow startup. First, in Section 2.1, we highlight the developments in Internet traffic that have led to a significant increase in the relevance of flow startup for good performance of flows in the Internet. In the next two sections we discuss the flow startup phases: flow initiation, which relates to RQ1, RQ2, and Pub. I, is discussed in Section 2.2, and the following bandwidth-probing phase is discussed in Section 2.3. Section 2.5 discusses optional mechanisms for exiting the bandwidth-probing phase before a congestion signal. In Section 2.4 we discuss congestion that often ends the bandwidth-probing phase, the associated sender response, and possibly needed loss recovery.

Section 2.6 concentrates on restarting flows after periods of inactivity or low activity. Finally, a summary is provided in Section 2.7.

2.1 Importance of Flow Startup due to Internet Traffic Developments

TCP congestion control [13] is based on a “packet conservation principle” that is enacted when the TCP flows in the network operate “in equilibrium” [102]. The main purpose of TCP congestion control was initially to prevent “congestion collapse”, which caused abysmal performance due to unnecessary retransmissions [102, 140], by operating TCP flows in a manner that ensures a stable network [102]. TCP congestion control also aims


to utilize the network capacity efficiently, but within the constraints imposed by its primary goal of avoiding congestion collapse. Slow Start, used for probing available capacity, was seen only as a mandatory violation of the packet conservation principle until packet conservation can be achieved [102]. This means that the major focus of congestion control work has been on Congestion Avoidance in equilibrium, which is reflected in a rather large number of congestion control-related models, proofs, etc.

that are based on “steady-state” operation. The steady-state operation implies an expectation of large, long-running bulk TCP flows. In general, the progress of a long-running bulk TCP flow is at its best when congestion control does not reduce the throughput of the flow, which was long used as the main or sole metric to gauge congestion control performance.

It was soon realized that congestion control needs to do more than prevent congestion collapse. TCP congestion control aims for a condition where the network cannot hold even a single packet more. A TCP sender keeps increasing its sending rate on each RTT, even in Congestion Avoidance, to probe the network for higher available capacity as long as there is payload to transmit. Only after a dropped packet is detected does the sender back off temporarily, and then it again continues trying to drive the network to the maximal level of utilization and buffer usage. TCP congestion control together with a simple drop-from-tail-when-buffer-is-full policy, known as Taildrop, leads to full or close-to-full buffers. To prevent problems associated with full buffers, Active Queue Management (AQM), such as Random Early Detection (RED) [80], was introduced and later recommended to be deployed by Internet routers [31]. But again, the AQM research and design focused on solutions for Congestion Avoidance and steady state. Nevertheless, the need for lower delay was recognized as a target for AQM algorithms.

The World Wide Web (WWW) was invented in 1989 [27]. A web page is a collection of objects that are glued together and displayed by a web browser according to the instructions given in Hypertext Markup Language (HTML) [25, 190, 191]. In order to display a web page, the web browser performs a web transaction, first fetching the main object of the web page, called the body object, from a remote web server. Based on parsing the body object, the browser decides what additional objects need to be fetched.

Those additional objects are called inline objects and may contain images, scripts, stylesheets, etc. Processing of an inline object may require further inline objects to be fetched. Once all required objects are transferred, the web transaction is complete. Some web browsers may start to display the web page while the web transaction is still in progress; however, the actual


representation of the web page to the end user and issues related to it are out of scope for this thesis.

Web objects are transferred over Hypertext Transfer Protocol (HTTP). HTTP/1.0 [26] fetches each web object over a Transmission Control Protocol (TCP) [151] connection and closes the connection once the object is transferred. If persistent connections are enabled, it is possible to reuse a TCP connection without closing it and opening another TCP connection for the next object to be fetched. As many objects typically originate from the same remote server, HTTP/1.1 [73–75] introduces HTTP pipelining to enable reusing a TCP connection more efficiently for multiple objects, as HTTP pipelining allows the next request to be made before the first object is fully transferred. Also, TCP connections with HTTP/1.1 are persistent by default.

One characteristic of web traffic is the presence of ON and OFF periods, which refer to whether an HTTP transmission is currently active or not. The OFF periods are caused by (a) the user reading and thinking (inactive OFF) and (b) the web browser processing the HTTP objects received so far before it is able to decide which HTTP object to request next (active OFF) [1, 56]. These ON and OFF periods are also reflected in various HTTP models [21, 49, 63, 181]. A recent study of real traffic captures confirms the presence of internal OFF periods for roughly a quarter to one third of all connections [165].

In the 1990s, the web became very popular. By 1995, web traffic made up roughly one fifth of all Internet traffic [141]. Only two years later, in 1997, the share of HTTP traffic had grown to more than 70% of all Internet traffic [185].

While the web traffic, HTTP models, and measurements mentioned above are old, HTTP and web traffic changed very little for many years. The most significant change after HTTP/1.1 was the growth in web-page complexity, increasing the sizes of the HTTP objects and the number of objects required to complete web transactions [153, 155]. As the complexity grew, the latency of a web transaction also became larger.

A latency arms race between browser makers was the result. HTTP/1.1 allows up to two parallel connections to be used for a single server to better utilize the resources during a web transaction [74], but browser makers increased the number of allowed parallel connections beyond two to reduce latency and win against competing browsers [169]. In addition, many complex web sites were not satisfied with the off-the-shelf parallelism and resorted to a technique called domain sharding. Domain sharding artificially inflates the number of hostnames the inline objects of a web


page appear to originate from, which circumvents the per-hostname limit on the number of allowed parallel connections. Therefore the number of parallel connections a browser may open during a web transaction became very large [42, 178].

Many of the HTTP objects are quite small [153, 155] and therefore the TCP flows transmitting them mostly operate in Slow Start. Many such TCP flows may never leave Slow Start before the data of the transmitted HTTP object runs out. As web traffic forms a significant portion of Internet traffic, the short TCP flows transmitting HTTP objects play a significant role in the load at the routers. TCP senders increase their sending rate exponentially per RTT in Slow Start, producing exponential load transients. The result is a network designed for operation in Congestion Avoidance, whereas the transfers for HTTP objects mostly operate in the much more aggressive Slow Start, probing for the sending rate they should use. As the load growth during an exponential load transient is very fast, TCP flows using Slow Start can congest links in just a few RTTs, which is much faster than the AQM algorithms expect by design.
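How few RTTs suffice can be seen from a back-of-the-envelope calculation: with the window doubling per RTT, a single flow needs roughly log2(BDP/IW) round trips before its window covers the bottleneck's bandwidth-delay product. The sketch below uses assumed example figures (a 10 Mbit/s link, 100 ms RTT, 1460-byte segments), not numbers from the thesis:

```python
import math

# Back-of-the-envelope sketch: how many RTTs a single Slow-Start flow needs
# before its window covers the bottleneck's bandwidth-delay product (BDP),
# i.e. before it can congest the link. Example figures are assumptions.

def rtts_to_saturate(link_bps, rtt_s, mss_bytes=1460, iw_segments=3):
    bdp_segments = link_bps * rtt_s / (8 * mss_bytes)  # path capacity in segments
    if bdp_segments <= iw_segments:
        return 0                                       # the IW alone fills the pipe
    return math.ceil(math.log2(bdp_segments / iw_segments))

# A 10 Mbit/s access link with a 100 ms round-trip time:
print(rtts_to_saturate(10_000_000, 0.1))  # → 5
```

Five round trips at 100 ms is half a second, which gives an AQM algorithm tuned for slow steady-state load changes very little time to react.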

Short TCP flows are often regarded as “mice”, in contrast to “elephants” that are much larger. Unfortunately, the “mouse” notion can easily convey the false message that such flows are not important from the congestion control point of view. However, a web transaction is not a single “mouse” but an “army of mice” that together may quickly eat up all the resources, because the flows of the web transaction often start up and increase their sending rate simultaneously. Together, the footprint of the short web-traffic “mice” is significant and comparable to a longer TCP flow in Slow Start. Often HTTP pipelining even naturally combines the HTTP objects into a longer “train of mice” that is transmitted over a single TCP flow. A measurement study based on real traffic captures confirms that it is reasonable to assume that, for flows originating from a single endpoint, a small number of flows at each time tend to proceed past the TCP Initial Window to form bandwidth-probing aggregates [7].

While congestion can occur almost anywhere in the network, some links are more likely to get congested than others. One such case is the link with the least bandwidth along an end-to-end path. The link with the least bandwidth is commonly close to the network edge; that is, the access link or a link in the access network is likely to have smaller bandwidth than the links in the network core [67]. The links close to the network edge have only a limited level of traffic aggregation, serving only one to a few end users. Typically, the access links also have low overall utilization [134, 166, 177]. Therefore, the impact of a single user or a single web transaction on


the load of the link is large. As the OFF periods natural to web traffic are frequent, new connections often start their Slow Start when the link at the network edge is idle [34] or has low load. An exponential load transient then rapidly increases the load, but after a relatively short ON period the link becomes idle again, and the process soon repeats itself once the end user initiates the next web transaction. A study on access link performance [176, 177] confirms that (a) web traffic is likely a significant contributor to access link saturation, that (b) volume-wise the saturated periods are “almost negligible”, which hints at the presence of a significant amount of OFF periods, and that (c) the link utilization over longer periods of time is very low for most of the clients. The first observation is true even though the methodology used in the analysis filters a significant portion of HTTP traffic away by limiting the analysis only to flows with at least 190 kB of payload. In addition, another measurement confirms that HTTP is responsible for 91% of the data volume when the capacity is being highly utilized [166]. The low overall link utilization for most broadband links is likewise confirmed by a recent study [134].

On routers in the Internet core with a high level of traffic aggregation, exponential load transients are less of a problem, as the effect of a single user or end host on the load is minuscule. The network topology towards the Internet core may dictate that no end host has enough access-link capacity to saturate an outgoing link of such a router. However, as long as there is any single source with enough capacity to cause significant load swings, the work in this thesis may have some usefulness also on large routers, although such routers are not the main target for this work.

While web traffic keeps increasing, other traffic types have surpassed it by volume. First peer-to-peer (P2P) traffic with long-running downloads and later video streaming became volume-wise larger than web traffic [52, 154, 198]. Nowadays, streaming is even more important [53]. Some of the streaming traffic is interactive and therefore sensitive to latency. With interactive game traffic, likewise, low latency often improves user experience. These interactive traffic types may not be the most dominant in the overall traffic volumes and are often not the cause of high load. Despite their innocence, they commonly still experience very noticeable effects during high-load situations, negatively affecting user experience.

Today HTTP is no longer limited to its initial use for web traffic; its use has expanded to many things besides the web. Most of the non-interactive streaming today uses HTTP adaptive streaming (HAS) [182] (also known as dynamic adaptive streaming over HTTP (DASH)) to deliver content to end users (services such as YouTube, Netflix, Spotify, etc.).


As HTTP adaptive streaming runs on top of TCP, TCP is still the most used protocol in the Internet. A HAS stream is split into chunks and each chunk is encoded at multiple advertised rates [17, 125]. A client decides which of the advertised rates it uses in fetching each chunk. The fetched chunks are stored in a playback buffer. Typically the HAS client keeps making chunk requests in advance until the playback buffer has a good level of occupancy. Then it needs to wait until some of the playback buffer is consumed before resuming the advance chunk fetching. Thus, despite the playback itself being continuous, the HAS client has an internal tendency to produce ON-OFF periods. Therefore, it is hardly surprising that Slow Start is shown to occur often with HAS [17].
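The buffer-driven request loop described above can be sketched as a minimal simulation. All figures here (target buffer, chunk duration, fetch time) are illustrative assumptions, not values from the HAS literature; the point is only the qualitative pattern of an initial ON burst followed by alternating ON-OFF periods:

```python
# Minimal sketch of a HAS client's chunk-request loop (all parameter values
# are illustrative assumptions). The buffer fills during the startup burst,
# after which requests alternate with idle waits: the ON-OFF pattern.

def has_schedule(target_buffer_s=30.0, chunk_s=4.0, fetch_s=1.0, steps=20):
    """Simulate buffer-driven fetching; return the ON/OFF decision per step."""
    buffer_s, decisions = 0.0, []
    for _ in range(steps):
        if buffer_s < target_buffer_s:
            decisions.append("ON")           # request the next chunk in advance
            buffer_s += chunk_s - fetch_s    # chunk arrives, playback drains a bit
        else:
            decisions.append("OFF")          # wait for playback to drain the buffer
            buffer_s -= chunk_s
    return decisions

decisions = has_schedule()
print("".join("1" if d == "ON" else "0" for d in decisions))
```

Each transition from OFF back to ON restarts transmission, which is why Slow Start (or a restart after idle) recurs throughout a HAS session.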

In order to reduce latency, new proposals for transferring web objects have been made, which also address the use of an excessive number of parallel connections. The first was SPDY [22, 149, 179], which uses a single connection to transmit many HTTP objects by multiplexing them. The SPDY work was then followed up in the context of HTTP/2 [23], which supports many of the SPDY features, including the multiplexing of HTTP objects into a single TCP connection. Further improvements are made in QUIC [29, 101], which removes the TCP protocol-level head-of-line blocking issue. The head-of-line blocking occurs with SPDY and HTTP/2 when a TCP packet is lost and the TCP receiver has to wait until a retransmitted copy of the packet arrives, because TCP supports only in-order delivery of the data. With the head-of-line blocking issue resolved, there is even less need for using parallel connections to quickly deliver HTTP objects across the network. The QUIC congestion control [100] is heavily based on the existing TCP congestion control algorithms. The exponential load transients, however, are not removed by any of these techniques, as probing for available capacity is needed independent of the number of flows. QUIC may eventually become “the next TCP” that is used to transport the majority of the Internet content.

Besides the latency in transmitting the actual HTTP payload, the user-visible latency includes the latency caused by the TCP three-way handshake. TCP Fast Open (TFO) [47] optimizes the TCP handshake by allowing subsequent connections to the same destination to embed data already in the TCP SYN packet. QUIC [101, 124], as a UDP-based protocol, goes even further, removing the entire three-way handshake for connections to the same destination, and also combines the TLS handshake to further reduce connection setup latency for encrypted connections. These handshake latency-related optimizations, however, are out of scope for this thesis as


they are quite unimportant from the congestion control point of view we focus on.

2.2 Flow Initiation

At the beginning of a TCP connection, after the three-way handshake, a TCP sender starts the data transmission by sending up to the number of segments allowed by the Initial Window (IW) [12]. With the typical maximum transmission unit (MTU) of 1500 bytes, up to three segments are sent in the Initial Window. The segments in the Initial Window are used to boot up the ACK clock, a constant stream of ACK packets flowing in the reverse direction, that is used for sending packets after the Initial Window. The Initial Window is “uncontrolled” traffic in the sense that it is sent without knowing that packets are leaving the network.

Relatively recently, a larger Initial Window has been proposed for experimental use to reduce latency [50]. The larger Initial Window reduces the number of RTTs needed to complete short transfers that are typical, for example, of web objects. It allows up to 10 segments to be sent without verifying that there is capacity available for them. Figure 2.1 shows the flow startup with an Initial Window of three and of ten segments. A study with web search traffic indicates that an Initial Window of ten segments instead of three improves average latency by roughly 10% [68]. In addition, the latency improvement increases at higher quantiles. A measurement study indicates that one fourth to one fifth of TCP connections become more aggressive because of an Initial Window larger than three segments [7].
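The byte-based formulas behind these segment counts can be sketched as follows. To my understanding, the standard Initial Window is defined in RFC 3390 and the experimental larger one in RFC 6928 [50]; both cap the window in bytes with a lower bound of two segments:

```python
# Sketch of the Initial Window formulas as commonly stated: RFC 3390 for the
# standard IW and RFC 6928 for the experimental larger IW, both in bytes.

def iw_rfc3390(mss):
    return min(4 * mss, max(2 * mss, 4380))

def iw_rfc6928(mss):
    return min(10 * mss, max(2 * mss, 14600))

mss = 1460  # typical segment size with a 1500-byte MTU
print(iw_rfc3390(mss) // mss, iw_rfc6928(mss) // mss)  # segments: 3 and 10
```

With the typical 1460-byte MSS the byte caps work out to exactly the three and ten segments discussed in the text.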

Combining the larger Initial Window with parallel flows starting up at the same point in time can result in a large number of segments injected into the network. Such a large uncontrolled burst of packets may be harmful to other flows in the network. When the aggregated effect is considered, measurements in a residential access network indicate that for 5%–12% of the cases^1 there is an amplified effect due to combined larger Initial Windows from multiple flows [7]. Pub. I focuses on measuring the effect of the larger Initial Window, and the combined effect of parallel flows and the larger Initial Window, on competing latency-sensitive traffic over a high-speed cellular network.

Pub. I shows that the impact of sending the Initial Window is very significant for the delay experienced by the packets of the competing traffic.

^1 Aggregate grouping in [7] is a worst-case analysis with a one-second time window to form an aggregated group and therefore likely includes some flows that do not group for real.


Figure 2.1: TCP Initial Window (IW). (a) IW3, (b) IW10.

With one or two flows, the quality impact is only small, but issues become more severe as the number of flows starting at the same time increases. With the proposed larger Initial Window of ten segments, even the impact of a single flow exceeds the impact of six parallel flows using the Initial Window of three segments. The measurements made in Pub. I also show that the bad quality caused by a large number of parallel flows or the proposed larger Initial Window is not limited to the Initial Window RTT but persists over the duration of the TCP Slow Start bandwidth-probing phase.

In order to alleviate the problems caused by excessive bursts due to the larger Initial Window, initial spreading has been proposed [163, 164]. Initial spreading separates the Initial Window packets into individual, smaller transients over the expected RTT that was measured during the three-way handshake. A similar proposal, but with an additional removal of the Initial Window altogether, has also been made in [9]. Long before the larger Initial Window proposal, JumpStart [127] was proposed, effectively removing the concept of the Initial Window completely but without any spreading.

The removal of the Initial Window is very harmful to other flows in light of Pub. I, as the number of uncontrolled packets allowed to be sent will be much larger. We believe that even spreading the packets over the initial RTT is unable to solve the problems caused by an excessive initial sending rate. Whenever the sender violates the packet conservation principle, there


is a possibility of doing significant harm to competing traffic. As the Initial Window sender cannot even know about the harm, not to mention react to it, before the harm has already happened, the only way to avoid harming other traffic is to refrain from sending huge uncontrolled initial bursts. If a huge Initial Window is used, the only hope is that statistical multiplexing is lucky enough to cover for it, that is, that there is no other traffic present which could be harmed. It is somewhat unfortunate that the sender is likely to be lucky very often because of the low overall utilization near the network edge. Thus, measuring or extracting the harm caused from a large set of real-world data is challenging, and it is too easy and tempting to falsely conclude that no harm is caused by the larger Initial Window. The use of Slow Start for bandwidth probing with a small Initial Window is a good way to limit the extent of the harm caused by the Initial Window, as it gives every sender time to respond. A claim has been made that it is in the best interest of the sender to limit the Initial Window to avoid issues due to local resource exhaustion [9], but this is likely not enough, as the competing traffic may not originate from the same source. In such a case, the resources exhausted may not be local but near the network edge at the other end of the connection. Therefore, the harm caused to the competing traffic may not be measurable at all by the sender using an excessive sending rate. In the worst case, the sender might not care about the competing traffic but is willing to harm other traffic to get “better service” for its own traffic, and might see the opportunity to monopolize the remote resources as a competitive advantage.

An alternative to uncontrolled sending at the beginning of data transmission is given by an old solution known as Quick-Start [78, 167, 173]. A TCP sender using Quick-Start acquires information about the available capacity from the routers on the end-to-end path using IP and TCP options. In practice, however, Quick-Start requires support from every router on the end-to-end path and is therefore extremely challenging to deploy.

Various alternatives to the Initial Window of three segments are compared in [170–172].

2.3 Bandwidth-Probing Phase

Bandwidth probing is a congestion control phase during the early round trips of a connection. The purpose of probing is to determine the capacity of the end-to-end path, which is usually unknown to the end host. Figure 2.2 shows the taxonomy of possible bandwidth-probing approaches.


Figure 2.2: Taxonomy of bandwidth-probing approaches: self-clocked bandwidth probing, and timer-based approaches (paced bandwidth probing; bandwidth estimation with quick sending-rate ramp-up).

Self-Clocked Bandwidth Probing

A sender using self-clocked bandwidth probing relies on incoming acknowledgements (ACKs) for timing the outgoing packets. In addition to probing for the available capacity, the bandwidth-probing phase is used by the sender to boot up an ACK clock, a constant stream of ACKs flowing in the reverse direction [102]. The ACK clock carries credits to the sender from which the sender can infer how many packets have left the network. The credits tell the sender how many new packets can be injected into the network “safely”, that is, without increasing the load in the network. In addition to the “safe” packets, the sender needs to inject more packets than what has left the network in order to probe for a higher sending rate than is currently being used. The sender terminates the bandwidth-probing phase once a congestion signal is detected or a high enough sending rate is attained.

In TCP the self-clocked bandwidth probing process is performed by Slow Start [13, 102]. Figure 2.3a shows the Slow-Start process. The left-hand side is the TCP sender and the right-hand side is the TCP receiver. First, the sender sends three segments as per the standard Initial Window. After the Initial Window, the sender waits for ACKs to arrive. In Slow Start, the sender sends an additional packet on each ACK arrival besides the packets that, according to the credits, have left the network. The additional packet probes for more available capacity. The number of packets that can be sent is tracked by the sender using a congestion window (cwnd) variable [13]. The Slow-Start process results in doubling the congestion window, and thus the sending rate, each RTT; in other words, the sending rate and the load imposed by the flow increase exponentially.
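The doubling dynamic described above can be sketched with a few lines of code. The following is an illustrative model only (not from the thesis), assuming a loss-free path and a receiver that ACKs every segment:

```python
def slow_start_cwnd(initial_window, rtts):
    """Congestion window (in segments) after each RTT of loss-free
    Slow Start, assuming one ACK per segment (no Delayed ACK)."""
    cwnd = initial_window
    history = [cwnd]
    for _ in range(rtts):
        # Each of the cwnd ACKs arriving during this RTT increases
        # cwnd by one segment, so the window doubles per round-trip.
        cwnd += cwnd
        history.append(cwnd)
    return history

# With the standard Initial Window of three segments:
print(slow_start_cwnd(3, 4))  # [3, 6, 12, 24, 48]
```

The exponential trajectory is visible directly: after only four round-trips the window has grown sixteen-fold.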

The exponential increase in the sending rate results in exponential load transients also at the network routers. With self-clocked bandwidth probing, the sender has to send packets in bursts. Each burst is followed by an idle period as long as the sending rate has not yet used up all the available network capacity.

Figure 2.3: TCP Slow-Start variants: (a) Slow Start; (b) Slow Start with Delayed ACK; (c) ABC-corrected Slow Start with Delayed ACK.

The Slow-Start process allows up to doubling the sending rate per RTT. However, the sending-rate growth may be less because of Delayed ACK [54] that was implemented decades ago to roughly halve the number of ACK packets. A smaller ACK rate was necessary to reduce the rate of interrupts from the network adapter and the associated cost of processing ACKs. Slow Start with Delayed ACK is shown in Figure 2.3b. When a TCP receiver is using Delayed ACK, it sends one ACK after receiving two segments instead of after every segment. If no second segment is received, the receiver instead waits until a Delayed ACK timer expires to send out the pending ACK for the single segment. Because of Delayed ACKs, instead of doubling, a growth factor as low as 1.5 is possible with a typical Slow Start, but the actual factor also depends on how the RTT and the delayed ACK timer are sized with respect to each other [14]. If the receiver does not implement or use Delayed ACKs², the growth factor in Slow Start is exactly two.

The sender TCP implementation can also correct the smaller congestion window increase due to Delayed ACKs during Congestion Avoidance by using Appropriate Byte Counting (ABC) [10, 13]. Use of ABC during Slow Start is allowed for experimental use [10]; standard implementations, however, are recommended not to increase the congestion window by more than one segment per ACK [13]. In practice, some TCP implementations have chosen to use ABC with the larger congestion window increase also during Slow Start (e.g., the Linux TCP implementation includes an ABC-based mechanism that is adapted to the packet-based TCP implementation [48]). Slow Start with Delayed ACKs and ABC is shown in Figure 2.3c. ABC restores the growth factor to two while the TCP sender is in Slow Start. For simplicity, we assume from this point onward a growth factor of two in this thesis unless otherwise stated.
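The three variants of Figure 2.3 differ only in how many ACKs arrive per RTT and by how much each ACK grows the window. A minimal sketch (illustrative only, under the same loss-free assumptions) makes the growth factors concrete:

```python
def grow_one_rtt(cwnd, delayed_ack=False, abc=False):
    """cwnd (in segments) after one loss-free Slow-Start RTT."""
    # Delayed ACK roughly halves the number of ACKs per RTT.
    acks = cwnd // 2 if delayed_ack else cwnd
    # ABC credits all acked segments; otherwise one segment per ACK.
    per_ack = 2 if (delayed_ack and abc) else 1
    return cwnd + acks * per_ack

cwnd = 8
print(grow_one_rtt(cwnd))                              # 16: factor 2   (Fig. 2.3a)
print(grow_one_rtt(cwnd, delayed_ack=True))            # 12: factor 1.5 (Fig. 2.3b)
print(grow_one_rtt(cwnd, delayed_ack=True, abc=True))  # 16: factor 2   (Fig. 2.3c)
```

With Delayed ACK but without ABC, only cwnd/2 ACKs arrive and each grows the window by one segment, yielding the 1.5 factor; ABC restores the full doubling by crediting both acknowledged segments.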

During self-clocked bandwidth probing, the ACKs tend to arrive roughly at the rate of the narrowest link on the end-to-end path, called the bottleneck link, because the data segments are spaced out by the bottleneck link [102] (see Figure 2.3). This implies that the sending rate of the additional packets sent during the bandwidth-probing phase exceeds the rate of the bottleneck link. As the packets come in faster than they can be transmitted to the bottleneck link, a queue forms at the router in front of the bottleneck link. Figure 2.4 shows an example of TCP Slow Start-induced transient queue spikes occurring at the router in front of the bottleneck link. If the link is not yet saturated by the current load, the queue is only transient and will be drained to the outgoing link once the short burst of packets ends. These transient queue spikes grow in amplitude as the number of packets grows exponentially, and on each RTT, half of the packets are sent at a higher rate than the bottleneck link can immediately forward. Eventually the load becomes large enough that the queue can no longer drain before the next burst of packets comes in. The bottleneck link is saturated and a standing queue with rapidly increasing length is formed. Assuming infinite flow lengths, the queue grows until congestion is signalled by the router, which results in the sender discontinuing the bandwidth-probing phase.
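These queue dynamics can be approximated with a crude model (illustrative only, not from the thesis): each RTT the sender bursts cwnd packets clocked at roughly twice the bottleneck rate, so about half of each burst momentarily queues, and once cwnd exceeds the bandwidth-delay product (BDP) the excess can no longer drain between bursts:

```python
def peak_queue_per_rtt(initial_window, bdp, rtts):
    """Approximate peak queue (in packets) at the bottleneck on each
    Slow-Start RTT, for a single flow on an otherwise idle path."""
    cwnd, peaks = initial_window, []
    for _ in range(rtts):
        transient = cwnd // 2          # half the burst outpaces the link
        standing = max(0, cwnd - bdp)  # excess beyond the pipe capacity
        peaks.append(transient + standing)
        cwnd *= 2                      # Slow Start doubles cwnd per RTT
    return peaks

# With a 32-packet BDP, the spikes grow until a standing queue forms:
print(peak_queue_per_rtt(3, 32, 5))  # [1, 3, 6, 12, 40]
```

The jump on the last round-trip shows the transition from transient spikes, which drain during the idle period, to a standing queue once the window exceeds the pipe capacity.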

Alternatively, the bandwidth probing may end without a congestion signal from the network. If the congestion window (cwnd) reaches the Slow-Start threshold (ssthresh) [13], the sender continues in Congestion Avoidance, but typically TCP stacks set ssthresh initially to a very large value. In addition, the receiver advertised window (rwnd) may impose a

² E.g., the Linux TCP receiver implements a heuristic called quick ACKs to only use Delayed ACKs after Slow Start (DAASS) [8, 11].
