A Performance Analysis Model of TCP over Multiple Heterogeneous Paths for 5G Mobile Services

Driven by the primary requirements of emerging 5G mobile services, the demand for concurrent multipath transfer (CMT) remains prominent. Yet multipath transport protocols are not widely adopted, and TCP-based CMT schemes will remain dominant in 5G. However, the performance of a TCP flow transferred over multiple heterogeneous paths is vulnerable to link quality asymmetry, the extent of which our field investigation revealed to be significant. In this paper, we present a performance analysis model for TCP over multiple heterogeneous paths in 5G scenarios, in which both bandwidth and delay asymmetry are taken into consideration. An evaluation using parameters from the field investigation shows that the proposed model can achieve high accuracy in practical environments. Some interesting inferences can be drawn from the proposed model, such as the dominant factor that affects the performance of TCP over heterogeneous networks, and the criteria for determining the appropriate number of links to use under different degrees of path heterogeneity. Thus, the proposed model can provide guidance for the design of TCP-based CMT solutions for 5G mobile services.


Introduction
For emerging and promising 5G mobile services, despite their diverse application scenarios, it is widely agreed that they share a common primary requirement: either high data rate or high reliability. To meet this requirement, evolving wireless techniques and novel network infrastructures for 5G are no doubt necessary. However, we believe that the existing Concurrent Multipath Transfer (CMT) technology can also help meet the needs of 5G mobile services, since it can both improve communication throughput and provide communication reliability. CMT in 5G scenarios will pool multiple heterogeneous wireless resources by employing a variety of Radio Access Technologies (RATs) concurrently. Thus, the bandwidth of every RAT is aggregated, achieving higher throughput. Also, thanks to the diversity gain of heterogeneous RATs, communication reliability can be improved. Meanwhile, adopting CMT for mobile services is potentially more viable in 5G, since 5G is envisioned to consist of various types of RATs (such as millimeter wave communication, LTE-A and Wi-Fi), while more and more mobile devices are equipped with multiple wireless interfaces [1].
Multipath techniques that can achieve CMT are still in development, while TCP-based CMT solutions will remain dominant. There are several reasons why multipath protocols are not widely used. First, they cannot be widely applied to a variety of network environments. For example, the performance of MPTCP [2], the most popular multipath protocol working at the transport layer, is severely degraded in some cases [3,4]. Second, the vast majority of operating systems, such as Windows, Linux, and MacOS, do not support multipath protocols well. Since most mobile services will still use TCP for now and for the foreseeable future, feasible CMT solutions for 5G services will be based on TCP. These solutions can be viewed as middleware between the transport layer and the network layer, which is transparent to existing operating systems. Also, interoperability with the existing TCP-based network infrastructure will not be compromised.
However, the performance of a TCP flow transferred over multiple heterogeneous wireless networks is adversely affected by path heterogeneity, which will be a critical feature of the highly integrative 5G system. Briefly, this performance degradation is due to the packet reordering issue [5] caused by the different link quality of the employed heterogeneous wireless networks. The inherent re-sequencing mechanism of TCP can correct the problem when packets are reordered by no more than two positions [6]. However, the throughput may drop drastically due to the reduction of the TCP transmission window caused by more serious packet reordering [1]. Some contributions have been proposed to solve the problem. Earliest Delivery Path First (EDPF) [7] schedules packets over different links based on their estimated delivery time. DAPS [8] distributes packets over different links depending on the ratio ⁄ and . Yet, without a thorough understanding of TCP performance in the given situation, these contributions provide only limited improvement.
If we can analyze how heterogeneous networks affect the performance of a TCP flow concurrently transferred over them, more efficient and elegant CMT schemes for 5G mobile services can be developed based on TCP and TCP-like congestion control protocols. Such TCP-based CMT schemes would be more deployable in 5G heterogeneous wireless networks since they are compatible with the current Internet infrastructure.
In this paper, a performance analysis model for TCP over multiple heterogeneous wireless networks is presented. To the best of our knowledge, no similar model has been reported in the literature. The proposed model can provide guidance for the design of novel CMT solutions for 5G mobile services. The main contributions of this paper are as follows:
(1) We conducted a field investigation on present heterogeneous wireless networks to reveal the severe extent of link quality asymmetry in terms of delay and bandwidth. This shows that the impact of network heterogeneity in future 5G is a real concern.
(2) A performance analysis model is derived based on a careful analysis of segment transmission and acknowledgement response over multiple heterogeneous paths. Both bandwidth asymmetry and delay asymmetry are taken into consideration in the proposed model.
(3) The model achieves high analytical accuracy when compared with simulations that use parameters from the field investigation, which indicates that it can be applied in practical environments. A simulation of TCP over multiple heterogeneous paths is built in NS3, and the throughput predicted by the proposed model fits the simulation results well.
(4) Some interesting inferences are drawn from the proposed model. First, compared to bandwidth asymmetry, delay asymmetry between multiple links is the dominant factor that affects the performance of TCP over heterogeneous paths. Second, the criteria for determining the appropriate number of links to employ to optimize TCP multipath performance are discussed.
The remainder of this paper is organized as follows. Related work is introduced in Section 2. Section 3 details the issue of link quality asymmetry based on the results of the field investigation. In Section 4, the performance analysis model for TCP over heterogeneous paths is derived. The accuracy of the proposed model is shown in Section 5. In Section 6 we investigate the effect of path heterogeneity based on the proposed model. Section 7 concludes the paper.

Related Work
To meet the requirement for high data rate and reliability, several contributions have tried to achieve stable and high-quality communication based on multipath transmission. SCTP [9,10] and its extensions [11,12] try to aggregate the bandwidth of multiple paths. MPTCP [13], a multipath extension to TCP, has also been standardized to transmit data over multiple paths simultaneously to improve reliability and throughput. The IETF Multiple Interfaces (MIF) working group is developing standards [14] for nodes with multiple interfaces. Besides these papers, there are other works (e.g., [15-18]) that studied security-related networking issues, especially key management topics [19,20].
Recently, cellular-based solutions have been generating more interest with the rapid development of 5G heterogeneous networks. For example, femtocell-based schemes [21,22] were proposed to support seamless mobility and maximize network resource utilization using multiple interfaces.
However, apart from practical deployment challenges, such as the existence of various types of middleboxes [3], the main difficulty is that the performance of multipath solutions may decrease significantly under path heterogeneity, especially when there are bottleneck paths [4,23-25].
Packet reordering is considered the dominant challenge for multipath transmission because it leads to an undesirable reduction in throughput [1]. RFC5236 [26] introduces a metric named reorder density to show how far packets are displaced from their original position. Therefore, an efficient multipath solution must reduce the impact of packet reordering.
Multipath forwarding is the main cause of out-of-order packet delivery [27]. Different technologies and different paths can lead to significant differences in delay and bandwidth. When packets are forwarded over paths with different characteristics, they are likely to arrive at the receiver out of order.
Some state-of-the-art studies [28,29] have measured the characteristics of heterogeneous paths in terms of delay. However, their main purpose is to analyze the performance of different scheduling algorithms in heterogeneous networks, rather than to analyze theoretically the relationship between path diversity and TCP performance.
Research on TCP performance analysis, especially in terms of throughput, is still making progress, as TCP is one of the most widely deployed transport protocols in today's Internet. The research can be categorized into two kinds: one aims at improving the accuracy of prior models with novel methods [30-32], and the other focuses on the performance of TCP applied in emerging scenarios [33,34]. However, the models proposed in these papers only analyze the situation where a single path is used for transmitting TCP segments.
Overall, to the best of our knowledge, no one has given a performance analysis model for TCP over multiple paths with different link quality in heterogeneous networks, although many schemes [35] working at different protocol layers have been proposed to improve performance over multiple paths. We believe that this model can help in designing more practical multipath schemes for future wireless networks.

Problem Description and Network Model
Network heterogeneity will become a concrete issue in 5G with the popularity of multi-access devices and the deployment of emerging heterogeneous RATs. Multi-access devices that can connect to more than one wireless network, such as smartphones supporting dual-SIM dual-standby mode, are gaining market share. These devices can concurrently use up to three interfaces, including Wi-Fi, for data transmission. For such a device, the connected wireless networks may differ in access technology (e.g., WLAN vs. cellular network), in standard (e.g., FDD-LTE vs. TD-LTE) or in service provider. Even if two interfaces are connected to an identical wireless network, the wireless signals are likely to experience different path loss due to small-scale fading. Considering that in 5G more heterogeneous RATs will be deployed and utilized by multi-access devices, the network heterogeneity issue will become more severe than in the previous four generations.
Network heterogeneity of multi-access devices is intuitively revealed by the difference in network link quality. For two heterogeneous wireless networks, the link quality normally differs, which we refer to as network link quality asymmetry. Generally, Data Rate (DR) and Round-Trip Time (RTT) are used to describe network link quality: DR reveals the capacity of a network link, while RTT directly reflects the transmission delay. Accordingly, network link quality asymmetry can be indicated by DR asymmetry and RTT asymmetry.
Intuitively, the performance of TCP transmission is vulnerable to network link quality asymmetry: if multiple heterogeneous wireless networks are concurrently employed for delivering segments, the asymmetry will degrade the performance of TCP-based CMT in 5G. This is because the transmitted segments suffer different transmission delays due to the dissimilar link quality of the employed wireless networks, which results in segments reaching the receiver out of order. This segment reordering issue is widely regarded as the major challenge that undermines the performance of concurrent multipath transmission, as it causes unnecessary retransmissions, prevents the congestion window from growing and disrupts ACK-clocking. The higher the network link quality asymmetry becomes, the more negative impact it has on TCP performance. The analytical discussion of the relationship between the performance of TCP over multiple wireless networks and the link quality asymmetry is detailed in Section 4.
To investigate the extent of network link quality asymmetry in a real-world situation, we carried out a field measurement on a group of heterogeneous wireless networks and found that their link quality deviated significantly from each other. The measurement was carried out in a test train running on a newly constructed high-speed railway before it entered service, when few passengers were on board, to eliminate interference from other wireless devices. Inside the test train, a dedicated box PC with our proprietary measuring program was deployed to automatically measure and store the download DR and RTT of each wireless network. Incorporating different kinds of wireless modems, this device can simultaneously access multiple heterogeneous mobile networks. In the measurement, eight modems were used: three FDD-LTE modems of China Telecom (CT), three FDD-LTE modems of China Unicom (CU) and two TD-LTE modems of China Mobile (CM). After the measurement, an RTT dataset and two download DR values (average and maximum) were collected for each modem.
The statistics from the measurement results are shown in Figure 1. Regarding RTT, a boxplot is drawn from the collected dataset of each modem. The rectangle in a boxplot represents the interquartile range (IQR) of the variation, while the segment inside the rectangle represents the median. By visually comparing two boxplots, a statistical inference can be made about the difference between the two datasets. If the median of one dataset does not overlap the IQR of the other dataset, it can be inferred that a difference exists between the two datasets. Further, if the two IQRs do not overlap, the difference is significant. Applying this criterion to Figure 1, we can infer that the RTTs of CT1, CU3, CM1 and CM2 are significantly higher than those of CT2, CU1 and CU2. Meanwhile, the RTTs of CT1, CU3 and CM1 are different from the others. These conclusions reveal the dispersion of RTT among the eight modems. Figure 1(b) shows the maximum and average download data rates, both of which reveal a significant difference in link quality among heterogeneous wireless networks.
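The boxplot criterion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample RTT values in the usage note are hypothetical, and the quartile method (`inclusive`) is an assumption, since the text does not say how the quartiles in Figure 1 were computed.

```python
from statistics import median, quantiles


def box_stats(samples):
    """Return (q1, median, q3) for a list of RTT samples."""
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    return q1, median(samples), q3


def compare(a, b):
    """Classify the difference between two datasets with the boxplot rule."""
    q1a, med_a, q3a = box_stats(a)
    q1b, med_b, q3b = box_stats(b)
    # Two IQRs overlap when each box starts before the other one ends.
    iqrs_overlap = q1a <= q3b and q1b <= q3a
    if not iqrs_overlap:
        return "significant"
    # A median lying outside the other dataset's IQR indicates a difference.
    if not (q1b <= med_a <= q3b) or not (q1a <= med_b <= q3a):
        return "different"
    return "no clear difference"
```

For example, with made-up samples, `compare([20, 22, 24, 26, 28], [60, 62, 64, 66, 68])` classifies the difference as significant because the two boxes do not overlap at all.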
Regarding download DR, the average and maximum values are shown using bar graphs. For maximum download DR, the ratio between the highest (CT1) and the lowest (CU3) is 8.2. For average download DR, this ratio is even more pronounced, reaching 15.1. This means that a notable deviation exists in download DR among different modems.
To sum up, the field measurement results allow us to conclude that network link quality asymmetry in real-world situations is significant. Besides, link quality asymmetry exists not only between two heterogeneous networks, but also between two modems using the same access technology operated by the same telecommunication company. From these conclusions, we can infer that network heterogeneity in future 5G will be more severe and become a concrete threat, since the wireless networks in 5G will become more diverse than they are today with the deployment of emerging RATs.
As we have demonstrated, network heterogeneity will affect the performance of TCP-based CMT solutions for 5G mobile services. Thus, it is essential to create a quantitative performance analysis model of the relationship between link quality asymmetry and TCP multipath performance. To build such a model, we first present the network model of a TCP flow transferred over multiple heterogeneous links, as shown in Figure 2. In this network model, the segments of a single TCP connection are concurrently distributed over multiple paths between two endpoints. We use $L = \{l_1, l_2, \dots, l_n\}$ to denote the set of $n$ available heterogeneous links, $D = \{d_1, d_2, \dots, d_n\}$ to denote the set of round-trip propagation delays, and $B = \{b_1, b_2, \dots, b_n\}$ to denote the set of bandwidths. The bandwidth and round-trip propagation delay of the $k$-th link are $b_k$ and $d_k$. To simplify the analysis, we assume that the propagation delay from the receiver to the sender is zero. Round Robin (RR) is used to dispatch packets in the given network model, which lets multiple paths take turns in transferring data packets in a periodically repeated order. We choose NewReno [36] as the congestion control algorithm since it is still a widely deployed version of TCP.

Performance Analysis Model
In this section, the performance analysis model of TCP over multiple heterogeneous paths is built by analyzing the average throughput. We divide the TCP flow into consecutive transmission rounds. The duration of each round and the number of segments transmitted in it are analyzed first. Then, the average throughput is derived using an iteration model. Finally, the effect of link quality asymmetry on average throughput is discussed. Table 1 summarizes the important parameters used in this paper.

Analysis of i-th Transmission Round
First, we focus on the transmission of segments at the sender side. Let i denote the index of the transmission round, counted from the beginning of the transmission. At the i-th round, the sender transmits a certain number of unsent segments and waits for the acknowledgements. Since in most TCP implementations (such as NS3) only a non-duplicate ACK triggers the transmission of previously unsent data, we can conclude that the i-th round begins with the arrival of the i-th non-duplicate ACK.

Table 1: Important parameters used in this paper.

Parameter        Description
$B$              The set of bandwidths of the available links
$s$              The size of a segment
$m_{ACK}$        The receiver replies with an ACK after receiving $m_{ACK}$ consecutive segments
$SGM_{i,j}$      The j-th segment that the sender transmits at the i-th round
$w_i$            The congestion window of the i-th round
$\Delta w_i$     The increment of the congestion window at the i-th round
$A_i$            The number of segments acknowledged by the i-th non-duplicate ACK
$C_i$            The number of segments that can be transmitted at the i-th round
$T_i$            The time between the i-th round and the (i+1)-th round
$\eta_{i,j}$     The index of the link used to transmit the j-th segment of $C_i$ at the i-th round
$D_{i,j}$        The propagation delay and queuing delay of the j-th segment of $C_i$ at the i-th round
$I_s$            The number of the round at which the slow start phase ends
$W_s$            The slow start threshold of the congestion window
$W_I$            The initial value of the congestion window

Let $C_i$ denote the total number of segments transmitted at the $i$-th round. $C_i$ equals the free space in the congestion window, which is composed of two parts: the increment in the size of the congestion window and the decrement in the number of outstanding segments. We define $w_i$ as the size of the congestion window at the $i$-th round, and $\Delta w_i = w_i - w_{i-1}$ as the increment of the congestion window.
Let $A_i$ denote the number of segments newly acknowledged by the $i$-th non-duplicate ACK. Then $C_i$ can be expressed as:

$$C_i = A_i + \Delta w_i \quad (1)$$

The $j$-th segment of $C_i$ is defined as $SGM_{i,j}$. Let $\eta_{i,j}$ denote the index of the link used to send $SGM_{i,j}$, where $l_{\eta_{i,j}} \in L$ and $\eta_{i,j} \in \{1, 2, \dots, n\}$. Suppose segments are scheduled over the $n$ links in a round-robin manner, and the first one travels over link $l_1$. The resulting link index $\eta_{i,j}$ is given by (2). The round-trip propagation delay and the bandwidth of link $l_{\eta_{i,j}}$ are $d_{\eta_{i,j}}$ and $b_{\eta_{i,j}}$, respectively. Let $D_{i,j}$ be the time elapsed between the beginning of the $i$-th round and the moment $SGM_{i,j}$ reaches the receiver, which is the sum of the queuing delay and the propagation delay experienced by $SGM_{i,j}$. Thus,

$$D_{i,j} = \frac{(\lfloor j/n \rfloor + 1)\, s}{b_{\eta_{i,j}}} + d_{\eta_{i,j}} \quad (3)$$

In (3), $\lfloor j/n \rfloor$ is the integer quotient of $j$ and $n$, while $s$ is the average size of a segment. The queuing delay is represented by $(\lfloor j/n \rfloor + 1)\, s / b_{\eta_{i,j}}$, while $d_{\eta_{i,j}}$ represents the propagation delay.
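The per-segment delay described above, a queuing term of (⌊j/n⌋ + 1)·s divided by the link bandwidth plus the propagation delay of the link, can be illustrated with a short Python sketch. Several things here are assumptions rather than part of the model: the function names are hypothetical, the round-robin rule simply starts from whichever link carries the first segment of the round (it is not claimed to be the paper's equation (2)), and the units (segment size in bytes, bandwidth in bit/s, delay in seconds) are chosen for the example.

```python
def link_index(first_link, j, n):
    """1-based link index for the j-th segment (j >= 1) of a round.

    Assumes plain round robin over n links, starting from `first_link`,
    the link that carries the first segment of the round.
    """
    return (first_link + j - 2) % n + 1


def arrival_time(j, first_link, seg_bytes, bandwidth_bps, delay_s):
    """Queuing delay plus propagation delay of the j-th segment of a round."""
    n = len(bandwidth_bps)
    k = link_index(first_link, j, n) - 1          # 0-based list position
    queuing = (j // n + 1) * seg_bytes * 8 / bandwidth_bps[k]
    return queuing + delay_s[k]
```

With two hypothetical links of 8 Mbit/s and 4 Mbit/s, delays of 10 ms and 30 ms, and 1000-byte segments, the third segment of a round (back on the fast link) is predicted to arrive well before the second one, which is exactly the reordering the model has to account for.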
Then we discuss the arrival of segments and the response of ACKs at the receiver side. We define $T_i$ as the latency between the beginning of the $i$-th round and the time when the sender receives the first non-duplicate ACK, which starts the $(i+1)$-th round. The number of segments that this first non-duplicate ACK acknowledges is exactly $A_{i+1}$. The receiver sends a non-duplicate ACK only if: 1) an expected number of consecutive segments has been received, 2) the first out-of-order segment arrives after some consecutive segments, or 3) a segment that fills a gap in the receiver's buffer arrives. Whether these criteria are satisfied depends strongly on the arrival order of the first segment transmitted at the $i$-th round, $SGM_{i,1}$. Hence, we calculate $T_i$ and $A_{i+1}$ separately, based on whether $SGM_{i,1}$ is the first to reach the receiver.

4.1.1. Case I: $SGM_{i,1}$ is the first to reach the receiver

The probability that $SGM_{i,1}$ arrives at the receiver first is given by (4). The segments are scheduled over the links in a round-robin manner, thus $\eta_{i,j}$ follows a uniform distribution after a large number of transmission rounds, and this probability can be approximated accordingly.
Most TCP implementations (such as NS3) use a counter to delay the cumulative ACK. Let $m_{ACK}$ denote this counter: after receiving $m_{ACK}$ consecutive segments, the receiver replies with an ACK. In this case, since the receiver receives $SGM_{i,1}$ first, it waits for the following $m_{ACK} - 1$ segments, or for the arrival of the first out-of-order segment, before replying with an ACK, as shown in Figure 3. Let $m$ be the number of consecutive segments received before the arrival of the first out-of-order segment. In other words, $SGM_{i,2}$ to $SGM_{i,m}$ arrive consecutively and $SGM_{i,m+1}$ is out of order. Thus, the receiver replies with the first non-duplicate ACK, acknowledging $m$ segments, approximately after the arrival of $SGM_{i,m}$. If $m$ is smaller than $m_{ACK}$, we have $T_i \cong D_{i,m}$ and $A_{i+1} = m$, where $D_{i,m}$ is the time between the beginning of the $i$-th round and the arrival of $SGM_{i,m}$. The conditional probability that $m < m_{ACK}$, given that $SGM_{i,1}$ arrives first, is given by (5). If $m$ is equal to or larger than $m_{ACK}$, then $T_i \cong D_{i,m_{ACK}}$ and $A_{i+1} = m_{ACK}$. The conditional probability that $m \ge m_{ACK}$, given that $SGM_{i,1}$ arrives first, is given by (6). Let $E'(T_i)$ and $E'(A_{i+1})$ denote the expected values of $T_i$ and $A_{i+1}$ under the condition that $SGM_{i,1}$ is the first to reach the receiver. Based on the probabilities calculated in (5) and (6), and the corresponding $T_i$ and $A_{i+1}$, $E'(T_i)$ and $E'(A_{i+1})$ can be derived as: ...
4.1.2. Case II: $SGM_{i,1}$ is not the first to reach the receiver

The probability that $SGM_{i,1}$ is not the first to reach the receiver is the complement of the probability in Case I. In this case, the receiver will not reply with any non-duplicate ACK before the arrival of $SGM_{i,1}$. Moreover, since segments transmitted later than $SGM_{i,1}$ arrive at the receiver earlier than it does, there must be gaps in the receiver's buffer before the arrival of $SGM_{i,1}$. As shown in Figure 4, the receiver immediately replies with a non-duplicate ACK after receiving $SGM_{i,1}$, since $SGM_{i,1}$ fills part of the existing gap. Hence, $T_i$ equals $D_{i,1}$.
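The two cases can be illustrated for one concrete set of arrival times. The sketch below is deterministic and only mirrors the rules stated in the text (Case I: the ACK covers min(m, m_ACK) segments and the round lasts until the corresponding segment arrives; Case II: the round lasts until the first segment arrives). It does not compute the expectations of the model, the function name is hypothetical, and the number of segments acknowledged in Case II is left as `None` because the text above does not state it.

```python
def first_ack(arrival, m_ack):
    """Return (case, T_i, A_next) for one round.

    arrival[j - 1] is the arrival time of the j-th segment of the round;
    m_ack is the delayed-ACK counter of the receiver.
    """
    # Segment positions (0-based) sorted by arrival time.
    order = sorted(range(len(arrival)), key=lambda j: arrival[j])
    if order[0] != 0:
        # Case II: the first segment is overtaken, so the first
        # non-duplicate ACK is sent when it finally arrives.
        return "II", arrival[0], None
    # Case I: count the in-order prefix 1, 2, ..., m of the arrival sequence.
    m = 0
    while m < len(order) and order[m] == m:
        m += 1
    acked = min(m, m_ack)
    return "I", arrival[acked - 1], acked
```

For instance, `first_ack([0.011, 0.034, 0.012, 0.036], 2)` falls into Case I with m = 1, because the third segment overtakes the second one, so only one segment is acknowledged by the first non-duplicate ACK.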
From (8) and (11) we can find that the expected value of $A_{i+1}$ depends on $C_i$, which means $A_{i+1}$ is a function of $C_i$. For simplicity, we define $A_{i+1} = f(C_i)$. Since $C_{i+1}$ equals the sum of $A_{i+1}$ and $\Delta w_{i+1}$, the increment of the congestion window needs to be discussed.
In the slow start phase, the congestion window is incremented by one segment for each ACK, thus $\Delta w_i$ equals 1. Let $W_s$ denote the slow start threshold of the congestion window, and $W_I$ the initial size of the congestion window. Let $I_s$ be the number of the round at which the slow start phase ends. Since the congestion window is increased by one every round, $I_s$ is determined by $W_s$ and $W_I$. In the congestion avoidance phase, the congestion window is increased by $1/w$ on every incoming ACK that acknowledges new data. Thus, we have $\Delta w_i = 1/w_{i-1}$. The congestion window at the $(i+1)$-th round can therefore be expressed as $w_{i+1} = w_i + \Delta w_{i+1}$, with $\Delta w_{i+1} = 1$ in slow start and $\Delta w_{i+1} = 1/w_i$ in congestion avoidance. Based on the above analysis, the relationship between $C_{i+1}$ and $C_i$, namely $C_{i+1} = f(C_i) + \Delta w_{i+1}$, can be derived as (15), where the function $f(\cdot)$ is defined in (13).
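The window growth described above (an increment of 1 per round in slow start, and of 1/w per round in congestion avoidance) can be sketched as follows. The function name is hypothetical, and the exact switching condition (`w < ssthresh`) is an assumption, since the boundary between the two phases is not spelled out in the surrounding text.

```python
def congestion_windows(w_init, ssthresh, rounds):
    """Congestion window size at rounds 1..rounds, in segments."""
    windows = [float(w_init)]
    for _ in range(rounds - 1):
        w = windows[-1]
        # Slow start adds one segment per round; congestion avoidance adds 1/w.
        delta = 1.0 if w < ssthresh else 1.0 / w
        windows.append(w + delta)
    return windows
```

Starting from an initial window of 2 segments with a threshold of 4, the sequence grows linearly over the first rounds and then flattens once the threshold is reached.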

Iteration for Average Throughput
According to (16), the number of segments transmitted at the next round can be derived from that of the current round. Thus, the total number of segments transmitted from the beginning to the current round can be calculated by iteration from the first round. The total time spent on transmission can also be obtained by summing up the duration of each transmission round. Consequently, the average throughput can be derived.
Therefore, we formulate the iteration process of the model as follows:
Step 1: Suppose a given number of bytes of data is expected by the receiver. At the first round of transmission, $C_1$ segments are sent within $E(T_1)$ seconds, where $C_1$ equals the initial size of the congestion window, $W_I$. $E(T_1)$ can be calculated according to (12).
Step 4: Compute the total number of bytes transmitted from the beginning to the $i$-th round, which is $s \sum_{k=1}^{i} C_k$.
Step 5: Let $\hat{T}$ denote the total transmission time from the beginning to the $i$-th round, which can be computed as $\hat{T} = \sum_{k=1}^{i} E(T_k)$.
Step 6: If the total number of transmitted bytes is smaller than the number of bytes expected by the receiver, repeat Steps 2-5. Otherwise, the iteration stops, and the average throughput is obtained by dividing the amount of transmitted data by $\hat{T}$.
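The iteration can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the two callables `acked_next` and `round_time` are hypothetical stand-ins for the model's expressions for the acknowledged segments of the next round and the expected round duration, which are not reproduced here; the slow-start switching condition is assumed; and throughput is taken as the bytes actually sent divided by the accumulated time.

```python
def average_throughput(total_bytes, seg_bytes, w_init, ssthresh,
                       acked_next, round_time):
    """Iterate over rounds until total_bytes have been sent.

    acked_next(c): segments acknowledged for the next round, given c
                   segments sent in the current round (placeholder).
    round_time(c): expected duration of a round with c segments (placeholder).
    Returns (throughput in bytes per second, number of rounds).
    """
    c = float(w_init)          # segments sent in the current round
    w = float(w_init)          # congestion window
    sent_bytes = 0.0
    elapsed = 0.0
    rounds = 0
    while sent_bytes < total_bytes:
        sent_bytes += c * seg_bytes
        elapsed += round_time(c)
        rounds += 1
        delta = 1.0 if w < ssthresh else 1.0 / w   # window increment
        w += delta
        c = acked_next(c) + delta                  # segments for next round
    return sent_bytes / elapsed, rounds
```

As a sanity check with idealised placeholders (every segment acknowledged, 0.1 s per round), sending 9000 bytes in 1000-byte segments from an initial window of 2 takes three rounds of 2, 3 and 4 segments.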
As mentioned earlier, when segments are scheduled in a round-robin manner, the probability of selecting one of the $n$ available links to transmit a certain segment follows a uniform distribution after a large number of transmission rounds. Thus, $l_{\eta_{i,1}}$ and $l_{\eta_{i,j}}$ can represent any two links of the set $\{l_1, l_2, \dots, l_n\}$. Note that $l_{\eta_{i,1}}$ is not the first of the $n$ available links, but the link used to transmit $SGM_{i,1}$. Equally, $(d_{\eta_{i,j}} - d_{\eta_{i,1}})$ and $(b_{\eta_{i,1}} - b_{\eta_{i,j}})$ can be the delay difference and bandwidth difference between any two links.
From this point of view, $(d_{\eta_{i,j}} - d_{\eta_{i,1}})$ and $(b_{\eta_{i,1}} - b_{\eta_{i,j}})$ reflect the extent of deviation in link quality across all links. We refer to this delay difference and bandwidth difference between any two links as delay asymmetry and bandwidth asymmetry. Therefore, it can be concluded that the average throughput is subject to delay asymmetry and bandwidth asymmetry. The more significant these two parameters become, the lower the average throughput will be.
To quantify delay asymmetry, we introduce Average Delay Asymmetry, which is defined as the average absolute delay difference between any two links of the $n$ available links. Average Delay Asymmetry can be calculated as:

$$\text{Average Delay Asymmetry} = \frac{2}{n(n-1)} \sum_{x < y} \left| d_x - d_y \right|, \quad d_x, d_y \in D.$$

Average Delay Asymmetry and Average Bandwidth Asymmetry can both affect the performance of TCP transferred over multiple heterogeneous links. A comparison of the extent to which these two parameters affect TCP performance is presented in Section 6.
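As a worked example of the definition, the average absolute difference over all pairs of links can be computed directly. The function name is hypothetical, and returning 0 for a single link is an assumption made so that the one-link case is well defined.

```python
from itertools import combinations


def average_asymmetry(values):
    """Average absolute difference over all unordered pairs of values."""
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0          # a single link has no asymmetry
    return sum(abs(x - y) for x, y in pairs) / len(pairs)
```

For three links with delays of 10 ms, 20 ms and 40 ms, the pairwise differences are 10, 30 and 20 ms, so the Average Delay Asymmetry is 20 ms. The same helper can be applied to a list of bandwidths.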

Simulation Study
The model proposed in Section 4 is evaluated by comparing its predictions with the results of simulation. To verify that our model can be used in practical environments, the parameters in both model prediction and simulation are taken from the datasets collected in the field measurement discussed in Section 3. The simulation of TCP over multiple heterogeneous links is implemented in Network Simulator 3 (NS3) [37].

Simulation Implementation
Figure 5 depicts the simulation topology. Two endpoints are connected by multiple Point-to-Point Protocol (PPP) links. At each endpoint, apart from the PPP network adapters, a virtual network device (VND) that works at the network layer is added. An IP address is assigned to the VND. Between the two endpoints, a TCP connection bound to the IP addresses of the two VNDs is established. At both endpoints, TCP NewReno is used. When a TCP segment of the established connection is pushed down to the network layer, the corresponding IP packet is forwarded to the VND. The VND then passes the IP packet to a dedicated packet-processing program attached to it. The IP packet is encapsulated into a UDP datagram and then sent to the peer from one of the PPP network adapters. A Round-Robin scheduling algorithm decides which network adapter is used to transmit each subsequent encapsulated packet. Thus, the segments of the single TCP connection established between the two endpoints are concurrently transmitted from all the available PPP network adapters. To measure the throughput of simulated TCP over multiple heterogeneous links in NS3, a sending application is installed on endpoint A, and a receiving application is installed on endpoint B. A then sends a fixed amount of data to B, and B records the time at which the transfer finishes. The throughput is calculated as the number of bytes sent divided by the transfer time in seconds.

Evaluation Methodology
Using the proposed model and the simulation respectively, two sets of throughputs of TCP over multiple heterogeneous links are obtained for comparison. In each case, the throughput is derived for different numbers of heterogeneous links employed for concurrent transmission. When a certain number of links is used, the bandwidth of each link remains constant but differs from those of the other links. The delay of a link is drawn from an individual dataset associated with that link. For example, if $n$ links are employed for a concurrent transmission, and the delay dataset of each link contains $k$ values, then there are $k^n$ combinations of delay values. The Average Delay Asymmetry of these $k^n$ groups of delays is calculated and sorted, and 36 groups of delays are then evenly selected from the sorted list. For the selected groups of delay values, the throughput is derived with both the simulation and the proposed model.
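The selection procedure can be sketched as below. This is one possible reading, not the authors' script: "evenly selected" is interpreted here as picking groups at evenly spaced ranks of the sorted list, the function names are hypothetical, and enumerating every combination in memory is only practical for small datasets.

```python
from itertools import combinations, product


def average_asymmetry(values):
    """Average absolute difference over all unordered pairs of values."""
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def select_delay_groups(datasets, count=36):
    """Pick `count` delay combinations, evenly spaced by asymmetry rank.

    datasets holds one list of measured delays per link.
    """
    # One delay per link, every possible combination, sorted by asymmetry.
    groups = sorted(product(*datasets), key=average_asymmetry)
    if len(groups) <= count:
        return groups
    step = (len(groups) - 1) / (count - 1)
    return [groups[round(i * step)] for i in range(count)]
```

With two links and ten delay samples each, there are 100 combinations, from which the 36 selected groups span the lowest to the highest Average Delay Asymmetry.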

Parameter Settings
The parameters for model prediction and simulation are taken from the measurement results of the field investigation of wireless network heterogeneity, as described in Section 3. Since eight modems were measured during the investigation, up to eight links can be employed for concurrent transmission in model prediction or simulation, namely link I to link VIII. For example, if four links are needed, links I, II, III and IV are used. Links I, II and III represent FDD-LTE of China Telecom, links IV, V and VI represent FDD-LTE of China Unicom, and links VII and VIII represent TD-LTE of China Mobile. The bandwidths of link I to link VIII are set to the maximum measured download data rates shown in Figure 1(b), which are respectively 35.9 Mbps, 18.4 Mbps, 33.3 Mbps, 14.7 Mbps, 14.8 Mbps, 4.4 Mbps, 22.5 Mbps and 12.5 Mbps.
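As a quick consistency check, the bandwidth settings listed above can be written down and compared with the measurement summary. The variable and function names are hypothetical; the values are the ones given in the text.

```python
# Maximum measured download data rates (Mbps) for links I to VIII.
max_rate_mbps = [35.9, 18.4, 33.3, 14.7, 14.8, 4.4, 22.5, 12.5]


def links_for(count):
    """Bandwidths of the first `count` links (links I, II, ... in order)."""
    return max_rate_mbps[:count]


# Ratio between the fastest and the slowest link.
ratio = max(max_rate_mbps) / min(max_rate_mbps)
print(round(ratio, 1))  # 8.2
```

The ratio of 35.9 Mbps to 4.4 Mbps rounds to 8.2, which agrees with the ratio between the highest and lowest maximum download data rates reported for the field measurement.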
The field measurement results of RTT of a modem are directly adopted as the delay dataset of corresponding link in the simulation.
The other parameters used in proposed model are set according to Table 2.

Evaluation Results
We introduce prediction accuracy to evaluate how consistent the proposed model is with the simulation results. Prediction accuracy compares the throughput predicted by the proposed model with the throughput obtained by simulation under the same circumstances. The evaluation results with the number of links varying from 2 to 8 are depicted in Figure 6, where the cyan circles represent the simulation results and the red crosses indicate the values predicted by the proposed model. It can be observed that the model prediction follows the simulation results in all cases. With the number of links employed for concurrent transmission varying from 2 to 8, the prediction accuracies are 89.68%, 83.14%, 79.26%, 75.99%, 73.24%, 71.06% and 69.50%. The prediction accuracy drops steadily as the number of links increases, from 89.68% with two links to 69.50% with eight. This is because the error introduced by randomness in the proposed model becomes larger as the number of links available for transmission grows. Even so, the average prediction accuracy reaches 77.41%.
Since the link quality parameters (i.e., bandwidth and delay) used in the simulation are taken from the results of the field measurement, we can conclude that the proposed model is also accurate for TCP over multiple heterogeneous links in practical environments.
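The reported average can be checked directly from the seven per-configuration accuracies; the variable names are only illustrative.

```python
# Prediction accuracies (%) for 2, 3, ..., 8 links, as reported in the text.
accuracies = [89.68, 83.14, 79.26, 75.99, 73.24, 71.06, 69.50]

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"{mean_accuracy:.2f}")  # 77.41
```

The seven values sum to 541.87, and dividing by 7 gives 77.41, which matches the stated average.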

Analysis Based on the Proposed Model
In this section, the effect of path heterogeneity on the performance of a TCP flow transferred over multiple heterogeneous paths is analyzed based on the proposed model. First, the influence of Average Delay Asymmetry and Average Bandwidth Asymmetry on the throughput is investigated. Then we discuss the policy for determining the appropriate number of links over which to transmit the segments of a TCP flow.

The Influence of Delay and Bandwidth Asymmetry
It is interesting to study to what extent Average Delay Asymmetry and Average Bandwidth Asymmetry affect the throughput of a TCP flow transferred over multiple heterogeneous paths. It was concluded in section IV that the performance of TCP over multiple heterogeneous links is subject to these two parameters, but which is the main factor that affects TCP performance: Average Delay Asymmetry or Average Bandwidth Asymmetry?
To answer this question, we use the proposed performance analysis model to evaluate the TCP throughput as a function of both Average Delay Asymmetry and Average Bandwidth Asymmetry. The minimum delay and minimum bandwidth are 5ms and 100kbps, respectively. The Average Delay Asymmetry varies from 0ms to 35ms and the Average Bandwidth Asymmetry varies from 0kbps to 700kbps. In this case, the maximum Average Delay Asymmetry and the maximum Average Bandwidth Asymmetry are seven times the minimum delay and the minimum bandwidth, respectively. The number of links utilized for concurrently transmitting data varies from 1 to 4. The evaluation results are shown in Figure 7. Figure 7 shows that the average throughput drops significantly along the axis of Average Delay Asymmetry but decreases at a much slower pace along the axis of Average Bandwidth Asymmetry. This phenomenon is particularly obvious when four links are used to concurrently transfer the TCP flow. In this case, under the highest level of Average Delay Asymmetry, the average throughput decreases by a factor of 1.8 as the Average Bandwidth Asymmetry varies from zero to its maximum. In contrast, when the Average Bandwidth Asymmetry remains at its highest level and the Average Delay Asymmetry varies from zero to its maximum, the average throughput is reduced by a factor of 2.8. Based on the above analysis, we can conclude that the Average Delay Asymmetry is the main factor that affects the throughput of a TCP flow over multiple heterogeneous paths. This inference can guide the design of multipath transmission mechanisms in heterogeneous networks.
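To make the comparison of the two sensitivities concrete, the sketch below converts the reported reductions into fractional throughput losses. It assumes that the 1.8 times and 2.8 times figures are division factors (throughput at maximum asymmetry equals throughput at zero asymmetry divided by the factor); this reading is an interpretation of the text, and the helper name `relative_drop` is hypothetical.

```python
def relative_drop(factor):
    """Fractional throughput loss when throughput is divided by `factor`."""
    if factor < 1.0:
        raise ValueError("a reduction factor is expected to be at least 1")
    return 1.0 - 1.0 / factor


# Sweep ranges from the text: both maxima are seven times the minimum values.
delay_ratio = 35 / 5          # max Average Delay Asymmetry (ms) / min delay (ms)
bandwidth_ratio = 700 / 100   # max Average Bandwidth Asymmetry (kbps) / min bandwidth (kbps)

# Four-link case: reductions reported in the text, read as division factors.
bandwidth_sweep_drop = relative_drop(1.8)
delay_sweep_drop = relative_drop(2.8)

if __name__ == "__main__":
    print(f"sweep ratios: delay {delay_ratio:g}x, bandwidth {bandwidth_ratio:g}x")
    print(f"loss over bandwidth asymmetry sweep: {bandwidth_sweep_drop:.1%}")
    print(f"loss over delay asymmetry sweep: {delay_sweep_drop:.1%}")
```

Under this reading, sweeping delay asymmetry over the same sevenfold relative range costs roughly 64% of the throughput, against roughly 44% for bandwidth asymmetry.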

Relationship Between the Throughput Performance and the Number of Links
Knowing that Average Delay Asymmetry is the dominant factor affecting the throughput of TCP transferred over multiple heterogeneous paths, we can now investigate the relationship between TCP performance and the number of links employed for transmission under different levels of Average Delay Asymmetry. Further, the number of links that should be used to optimize performance is discussed. Under four values of minimum delay, we evaluate the throughput of TCP flows employing different numbers of links as a function of Average Delay Asymmetry. During the evaluation, up to 4 links are utilized and the bandwidth of each link is 100kbps. With the Average Delay Asymmetry varying from 10ms to 90ms, the throughput is derived using the proposed model under minimum delays of 5ms, 20ms, 35ms and 50ms, respectively. The results of the evaluation are depicted in Figure 8. According to Figure 8, the Average Delay Asymmetry of heterogeneous networks compromises the benefit of aggregating bandwidth over multiple links. Meanwhile, a large minimum delay exacerbates the effect of Average Delay Asymmetry on the throughput achieved with multiple links. Under the minimum delay of 5ms, when the Average Delay Asymmetry increases to 35.6ms, the throughput of TCP concurrently transferred over four links decreases to that of TCP using only one link. When the minimum delay increases from 5ms to 50ms, the threshold of Average Delay Asymmetry at which the throughput of four links equals that of one link decreases from 35.6ms to 30.6ms.
Based on the above evaluation results, we can roughly derive a criterion for determining the number of links that optimizes throughput. For example, when the minimum delay is more than 5ms and the Average Delay Asymmetry is more than 20ms, utilizing two links to transfer the TCP flow achieves the maximum throughput.
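The two steps used above, locating the asymmetry threshold at which multi-link throughput falls to single-link throughput and choosing the link count with the highest throughput, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sample curves are synthetic placeholders rather than values from Figure 8, and linear interpolation between sampled points is an assumption.

```python
def crossover(xs, ys_multi, ys_single):
    """Return the first x at which ys_multi falls to ys_single, or None.

    Curves are sampled at the same xs; the crossing point is located by
    linear interpolation between adjacent samples (an assumption).
    """
    diffs = [m - s for m, s in zip(ys_multi, ys_single)]
    if diffs and diffs[0] <= 0:
        return xs[0]
    for i in range(1, len(xs)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 > 0 and d1 <= 0:
            t = d0 / (d0 - d1)  # fraction of the interval before the crossing
            return xs[i - 1] + t * (xs[i] - xs[i - 1])
    return None


def best_link_count(throughput_by_links):
    """Return the number of links with the highest throughput."""
    return max(throughput_by_links, key=throughput_by_links.get)


if __name__ == "__main__":
    # SYNTHETIC placeholder data, not taken from the paper's figures.
    asymmetry_ms = [10, 30, 50]
    four_links_kbps = [300.0, 200.0, 100.0]
    one_link_kbps = [150.0, 150.0, 150.0]
    print(crossover(asymmetry_ms, four_links_kbps, one_link_kbps))
    print(best_link_count({1: 90.0, 2: 120.0, 3: 110.0, 4: 80.0}))
```

In practice the throughput values would come from evaluating the proposed model at each asymmetry level and link count, and the selected link count would be recomputed whenever the measured minimum delay or asymmetry changes.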

Conclusion
In this paper, the severity of link quality asymmetry in real-world situations is revealed based on field measurement, and a performance analysis model for the average throughput of TCP over multiple heterogeneous paths for 5G services is derived. Taking both bandwidth and delay asymmetry into consideration, we carefully investigate the transmission of TCP segments over multiple heterogeneous links and derive the corresponding performance analysis model. The proposed model is validated by comparison with simulation experiments using parameters from the field measurement. The results show that the proposed performance analysis model can achieve high analytical accuracy in practical environments. Further analysis based on the proposed model yields some interesting inferences. First, compared to bandwidth asymmetry, delay asymmetry is the dominant factor that affects the performance of TCP over heterogeneous networks. Second, a criterion for determining the appropriate number of links to optimize TCP multipath performance is discussed. The proposed model can provide guidance for the design of CMT solutions for 5G mobile services.

Figure 1. Results of field measurement regarding link quality asymmetry of heterogeneous wireless networks. CT1, CT2 and CT3 are FDD-LTE of China Telecom; CU1, CU2 and CU3 are FDD-LTE of China Unicom; CM1 and CM2 are TD-LTE of China Mobile. (a) depicts the boxplot of RTT statistics; (b) shows the maximum and average download data rates. Both reveal the significant difference in link quality of heterogeneous wireless networks.

Figure 2. The network model of TCP over multiple heterogeneous paths

Figure 6. Comparison between the proposed model and simulation experiment. The number of links employed for concurrent transmission varies from 2 to 8, and the corresponding comparison results are depicted in (a) to (g). The results show that the proposed model can achieve high accuracy compared to the simulation experiments.

Figure 7. The throughput of TCP over multiple heterogeneous links on axes of both Average Bandwidth Asymmetry and Average Delay Asymmetry. The minimum delay is 5ms and the minimum bandwidth is 100kbps. (a), (b), (c) and (d) are the results using 1, 2, 3 and 4 links. It is shown that the throughput is more sensitive to Average Delay Asymmetry.

Figure 8. The throughput of TCP over multiple heterogeneous links as a function of Average Delay Asymmetry using 1, 2, 3 and 4 links. (a), (b), (c) and (d) are the results with minimum delays of 5ms, 20ms, 35ms and 50ms. It is shown that Average Delay Asymmetry compromises the benefit of aggregating bandwidth over multiple links.