Adaptive Method for Packet Loss Types in IoT : An Naive Bayes Distinguisher

With the rapid development of the Internet of Things (IoT), massive amounts of data are delivered through trillions of interconnected smart devices. The heterogeneous networks involved frequently trigger congestion, which indirectly affects IoT applications. Traditional TCP will very likely have to be reworked to support the IoT. In this paper, we identify the different characteristics of packet loss in hybrid wireless and wired channels and develop a novel congestion control algorithm for IoT called NB-TCP (Naive Bayesian TCP). NB-TCP constructs a Naive Bayesian distinguisher model that captures the packet loss state and effectively classifies each loss as originating on the wireless or the wired part of the path. More importantly, it does not place much additional load on the network, while offering fast classification, high accuracy and stability. Simulation results using NS2 show that NB-TCP achieves a classification accuracy of up to 0.95 and good throughput, fairness and friendliness in the hybrid network.


Introduction
The Internet of Things (IoT) is a new network industry based on the Internet, mobile communication networks and other technologies, with wide applications in industrial production, intelligent transportation, environmental monitoring and smart homes. It uses sensors and actuators with different perception, communication and computing capabilities, attached to objects, to automatically acquire information about the physical world [1][2][3]. As a new kind of media technology, IoT has a very complicated network environment, including home networks, office local area networks, wired networks and wireless networks. All kinds of networks interlace and coexist, which inevitably leads to congestion. Congestion occurs when the remaining network resources cannot meet user demand and the performance indicators of the network deteriorate sharply [4,5]. In IoT, routers need to forward large amounts of data, especially in sensor networks and wireless access networks, where network bandwidth and router capacity are limited [6][7][8][9]. Therefore, congestion control plays an important role in enabling IoT to provide satisfactory service.
IoT is a hybrid network that includes both wired and wireless network environments. Thus, IoT suffers not only from the congestion typical of traditional wired networks, but also from losses generated by the wireless network. Sensors and actuators in wireless sensor networks are usually deployed in edge environments where wireless channels are subject to burst packet loss due to multipath fading or a high bit error rate [10][11][12][13]. In the hybrid network, traditional TCP protocols can no longer properly coordinate network performance: they consider only the wired case, attribute every packet loss to network congestion and rapidly reduce the transmission rate to control the network load. As a result, the congestion window is reduced frequently and network performance deteriorates seriously [14,15]. Therefore, in an IoT hybrid network, research on correctly distinguishing packet loss categories and applying reasonable control strategies to improve network performance has attracted wide attention, but so far there is no recognized satisfactory solution.
In this paper, we present our scheme and measure its ability to distinguish between congestion loss and wireless random loss. First, we use simulation to study this discriminatory ability. Then, we modify NewReno to integrate our scheme, which we call NB-TCP, and study the resulting throughput improvement. Finally, we compare the performance of our scheme with TCP-Casablanca [16] and NewReno [17].
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the system model and analyzes the different features of packet loss. Section 4 presents the principle and implementation of the NB-TCP algorithm. Section 5 presents simulation results and Section 6 concludes the paper.

Related Work
Scholars have carried out a great deal of research on TCP congestion control strategies for hybrid networks. The main schemes are the segmented connection scheme, the link layer scheme, the cross-layer cooperation scheme and the end-to-end scheme.
The segmented connection scheme divides the TCP connection into two parts, a wired segment and a wireless segment, each of which is a complete TCP connection. The wired segment is the part of the path from the sender to the base station and uses a traditional TCP protocol. The wireless segment is the part from the base station to the receiver and uses an improved TCP protocol. However, in this scheme the sender receives feedback only from the base station and cannot know the transmission state of the wireless segment, which violates the end-to-end semantics of TCP [18][19][20][21][22]. The TCP-Split [23] and TCP-Snoop [24] algorithms used the base station as a TCP agent responsible for retransmission and the corresponding control of packet loss on the wireless link. They identified the cause of packet loss and handled it without involving the sender. However, this method required a large amount of buffering and processing capacity at the base station and is not suitable for current networks with ever-increasing data traffic.
The link layer scheme senses packet loss and completes retransmission at the wireless link layer. It is flexible in operation, but it partly duplicates higher-layer error control, and the two mechanisms can easily compete with each other. Moreover, wireless end users are energy-limited, and their performance is often unsatisfactory [25][26][27]. There are two typical methods, Forward Error Correction (FEC) and Automatic Repeat Request (ARQ). With FEC, the receiver recovers damaged data from the check sequence of the data packet. However, the cost in bandwidth is high and network performance is reduced. ARQ retransmits erroneous packets according to the ACK feedback from the receiver, but its channel utilization is low.
The cross-layer cooperation scheme is a protocol in which multiple layers cooperate to improve the congestion control strategy. For example, Explicit Loss Notification (ELN) [28] transferred status information from the link layer to the transport layer by marking the ELN bit, which was carried as a TCP option on the ACK. As a result, the sender learned that a wireless packet loss had occurred.
The end-to-end scheme improves the TCP protocol itself and does not require support from nodes other than the sender and receiver. It is very popular and has produced many research results (e.g., TCP-Reno, TCP-Vegas and TCP-Westwood). Schemes of this type generally base their improvement on a certain parameter, such as round-trip time (RTT), one-way delay, receive packet time interval (RPTI) or retransmission rate. Through statistical analysis, they obtain a threshold value or membership range for distinguishing packet loss types or current network states. NCPLD [29] measured the RTT value at the sender and compared it with a preset delay threshold. If the RTT value was less than the threshold, the packet loss was judged to be wireless random loss; otherwise, it was judged to be congestion loss. The simulation of NCPLD showed that the classification accuracy was high when the wireless loss rate was high. However, when the wireless loss rate was low, the network performance was not superior to that of traditional TCP. TCP-Biaz [30] constructed a probabilistic packet loss detection mechanism based on the RTT value, but the performance improvement was small. TCP-Casablanca [16] marked packets with different priority levels at the sender at equal intervals. During transmission, the router applied different discarding behaviors according to the packet priority level. After receiving the data packets, the receiver identified the packet loss type by statistically analyzing the loss probability of packets at each priority level. The method was not affected by the various factors in the data transmission process, but its accuracy was difficult to guarantee under different wireless loss rates in hybrid networks.
Based on the above considerations, we propose a new, improved end-to-end congestion control algorithm. By selecting characteristic parameters and training a Naive Bayesian model, we construct an efficient and accurate identification model that correctly distinguishes the types of packet loss, so that the TCP congestion control mechanism can be adjusted reasonably and network performance improved.

System Model and Feature Extraction
Figure 1 shows a hybrid network system model that includes a wired network and a wireless network connected by a gateway. Servers, routers and a gateway form a wired LAN. Packets are sent by the servers, pass through the router and are finally forwarded by the gateway to the wireless sensors. Assume that there is a bottleneck link between the router and the gateway. When the data stream increases, it causes congestion and packet loss. The wireless link between the gateway and the wireless sensors has a high wireless error rate. That is, when packet loss occurs during transmission, it may be due to either network congestion or wireless error.
Packet loss identification is a classification technique based on the characteristics of packet loss behavior. Choosing appropriate feature attributes is the key to successful classification. The number of selected feature attributes is the dimension of the mathematical representation, so it determines the difficulty of implementing the algorithm. In theory, the more feature attributes are selected, the higher the classification accuracy of the algorithm. In practice, however, the feature parameters depend on each other to different degrees, and they contribute differently to classification accuracy. Therefore, selecting appropriate feature parameters greatly improves the accuracy of the algorithm. We propose two feature parameters, the priority feature value and the RPTI feature value, which take into account both the accuracy and the efficiency of the algorithm.

Priority Feature
Packets of the same flow are marked at the sender as $p^{high}$ (high priority) or $p^{low}$ (low priority). As shown in Figure 2a, the sender sends 9 packets, $p_1, p_2, \ldots, p_9$, of which $p_4^{low}$ and $p_8^{low}$ are marked as low priority. When the network load is too heavy or the router buffer overflows, lower priority packets are dropped first. As shown in Figure 2b, $p_4^{low}$ and $p_8^{low}$ are dropped at the router because of congestion. A $p^{high}$ packet is dropped only when there is no $p^{low}$ packet in the queue. When packet loss occurs on the wireless link, packets of different priorities have the same loss probability. As shown in Figure 2c, $p_3^{high}$, $p_6^{high}$ and $p_7^{high}$ are randomly dropped. The receiver collects statistics on the loss pattern and matches it to a packet loss category. The statistical index $F_{pri}$ is given by Equation (1), where $k$ is the marking interval, meaning that one out of every $k$ packets is marked as $p^{low}$; $l_{total}$ is the total number of lost packets observed at the receiver; and $l_{p^{low}}$ is the number of those lost packets that were marked as $p^{low}$. If $F_{pri} \le 0$, the proportion of $p^{low}$ among the lost packets is greater than $1/k$, and the packet loss is judged to be caused by congestion.
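The priority test can be illustrated with a minimal Python sketch. The exact form of Equation (1) is not reproduced in this text, so the function below assumes $F_{pri} = 1/k - l_{p^{low}}/l_{total}$, which is only one form consistent with the stated rule (a value at or below zero indicates congestion). The function names and the marking interval $k = 4$, inferred from the Figure 2 example in which $p_4$ and $p_8$ are low priority, are assumptions made for illustration.

```python
def priority_feature(l_total: int, l_low: int, k: int) -> float:
    """Assumed form of F_pri: 1/k minus the share of low-priority losses.

    l_total: total number of lost packets seen at the receiver
    l_low:   number of those lost packets that were marked low priority
    k:       marking interval (one out of every k packets is low priority)
    """
    if l_total <= 0 or k <= 0:
        raise ValueError("l_total and k must be positive")
    if not 0 <= l_low <= l_total:
        raise ValueError("l_low must lie between 0 and l_total")
    return 1.0 / k - l_low / l_total


def looks_like_congestion(l_total: int, l_low: int, k: int) -> bool:
    """Apply the rule from the text: F_pri <= 0 is read as congestion loss."""
    return priority_feature(l_total, l_low, k) <= 0


# Figure 2b: p4 and p8 (both low priority) are lost at the router.
congestion_case = looks_like_congestion(l_total=2, l_low=2, k=4)
# Figure 2c: p3, p6 and p7 (all high priority) are lost on the wireless link.
wireless_case = looks_like_congestion(l_total=3, l_low=0, k=4)
```

Under these assumptions the Figure 2b pattern is flagged as congestion and the Figure 2c pattern is not, matching the description in the text.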

Receive Packet Time Interval Feature
The RPTI is the time between the arrivals of successive packets and is approximately equal to the minimum transmission time $T_{min}$ on the wireless link. Assume that $p_{new}$ is the out-of-order packet received by the receiver, $p_{last}$ is the last packet that arrived in sequence before $p_{new}$, and $T_{loss}$ is the RPTI between $p_{new}$ and $p_{last}$. If the packet loss occurs on the wireless link and the number of consecutive lost packets is $x$, the minimum of $T_{loss}$ is $(x + 1)T_{min}$. When the packet is lost before arriving at the gateway, which connects the wired and wireless networks, the out-of-order packet is queued with the other packets for forwarding, so the minimum of $T_{loss}$ is $T_{min}$. The RPTI feature value $F_{RPTI}$ is therefore given by Equation (2). As shown in Figure 2c, after the packets are forwarded by the gateway, the wireless link randomly drops $p_3^{high}$, $p_6^{high}$ and $p_7^{high}$. The RPTI between $p_2$ and $p_4$ is $2T$, and that between $p_5$ and $p_8$ is $3T$, so the value of $F_{RPTI}$ is greater than zero. As shown in Figure 2b, $p_4^{low}$ and $p_8^{low}$ are dropped due to the overflow of the router queue. The other packets are forwarded by the gateway and reach the receiver with an RPTI of $T$, so the value of $F_{RPTI}$ is less than zero.
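The two lower bounds on $T_{loss}$ described above can be checked with a short Python sketch. It encodes only the bounds stated in the text, $(x + 1)T_{min}$ for loss on the wireless link and $T_{min}$ for loss before the gateway; it does not reproduce Equation (2), whose exact form is not available here. The function name and the choice of $T_{min} = 1$ time unit are hypothetical.

```python
def min_t_loss(x: int, t_min: float, lost_on_wireless: bool) -> float:
    """Minimum gap between p_last and p_new when x consecutive packets are lost.

    On the wireless link, the x lost packets still occupied their transmission
    slots, so the gap is at least (x + 1) * t_min. Before the gateway, the
    surviving packets are queued back to back, so the gap is at least t_min.
    """
    if x < 1:
        raise ValueError("x must be at least 1 lost packet")
    if t_min <= 0:
        raise ValueError("t_min must be positive")
    return (x + 1) * t_min if lost_on_wireless else t_min


T = 1.0  # hypothetical T_min, in arbitrary time units

# Figure 2c: p3 is lost between p2 and p4; p6 and p7 are lost between p5 and p8.
gap_p2_p4 = min_t_loss(1, T, lost_on_wireless=True)   # 2T
gap_p5_p8 = min_t_loss(2, T, lost_on_wireless=True)   # 3T

# Figure 2b: p4 is dropped at the router, so p3 and p5 arrive T apart.
gap_p3_p5 = min_t_loss(1, T, lost_on_wireless=False)  # T
```

The gap therefore carries information about where the loss happened, which is what the RPTI feature exploits.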
In addition, according to the above definitions, the format of the training sample item $A_n$ ($n \in N$, where $N$ is the number of training sample items) is $[a_{pri}, a_{RPTI}, C(A_n)]$. The values of $a_{pri}$ and $a_{RPTI}$ are given by Equations (3) and (4), and $C(A_n)$ is the classification result of $A_n$, where $C$ is the set of training sample classes.

NB-TCP Algorithm
Classification is one of the most widely studied and applied problems in the field of data mining. The Naive Bayesian model has become one of the classic models in the classification field because of its firm and rigorous mathematical foundation [31]. Based on Bayes' theorem, it assumes that the influences of the feature values on a given class are independent of each other, and it identifies the class by combining the known prior probability and conditional probabilities. Compared with other classification methods (e.g., support vector machines (SVM) and decision tree algorithms), this method is simple and easy to understand. It can be applied effectively at the receiver to collect and analyze information. It does not place much load on the network, and it offers fast classification, high accuracy, stable classification efficiency and insensitivity to missing data. Suppose the sample set is $A$, the sample item is $A_n = \{a_1, a_2, \ldots, a_d\}$, and each $a_d$ is a feature value. $A_n$ is assigned to the category whose posterior probability is the largest:

$$C(A_n) = \arg\max_{C_i \in C} \Pr[C_i \mid A_n]. \tag{5}$$
According to Bayes' theorem, we can get

$$\Pr[C_i \mid A_n] = \frac{\Pr[A_n \mid C_i]\,\Pr[C_i]}{\Pr[A_n]}, \tag{6}$$

where $\Pr[A_n]$ is a constant, so only the numerator needs to be maximized, and Equation (6) simplifies to

$$\Pr[C_i \mid A_n] \propto \Pr[A_n \mid C_i]\,\Pr[C_i]. \tag{7}$$

Under the independence assumption, $\Pr[A_n \mid C_i]$ is the product of the per-feature conditional probabilities (Equation (8)). $\Pr[a_d \mid C_i]$ is the conditional probability that $a_d$ belongs to the category $C_i$; its value is the ratio of $N_{C_i}^{d}$ (the number of samples with attribute $a_d$ among the $N_{C_i}$ samples of class $C_i$) to $N_{C_i}$. According to Equations (5), (7) and (8), the decision function of the classifier is

$$C(A_n) = \arg\max_{C_i \in C} \Pr[C_i] \prod_{d} \Pr[a_d \mid C_i], \tag{9}$$

that is, $A_n$ is classified into the class with the greatest posterior probability.
Based on Equations (3), (4) and (9), an improved congestion control algorithm called NB-TCP is developed in Algorithms 1 and 2. The improved algorithm has two components, located at the two ends of the TCP connection (i.e., the TCP sender and the TCP receiver). NB-TCP modifies NewReno to distinguish congestion losses from wireless random losses and performs the main classification work at the receiver. When the TCP receiver detects a packet loss, it calculates the feature values $a_{pri}$ and $a_{RPTI}$ according to Equations (1) and (2), respectively, and treats them as an unclassified sample. The classification model is established from training samples during the training process. It obtains the prior probability of each category and the conditional probabilities of the feature attributes from the training samples, which is a time-consuming job. The training process is completed before the classification process begins. In the classification process, the model is used directly to predict the category of unclassified samples based on Equation (9), so the classification time is negligible. An ACK is returned to the TCP sender whenever packet loss occurs, whatever the cause; if the classification result is wireless random loss, the ELN bit of that ACK is marked.
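The training and classification steps can be sketched in Python as follows. This is a minimal illustration of a Naive Bayesian classifier over two discrete features, not the authors' NS2 implementation. The 0/1 encoding of $a_{pri}$ and $a_{RPTI}$, the class labels and the six training samples are all hypothetical, because Equations (3) and (4) and the real training data are not reproduced in this text. No smoothing is applied, so an unseen feature value yields a zero probability.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count class frequencies and per-class feature-value frequencies.

    Each sample is a tuple (a_pri, a_rpti, label), following the format
    [a_pri, a_RPTI, C(A_n)] of the training sample item.
    """
    class_counts = Counter(sample[-1] for sample in samples)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for *features, label in samples:
        for d, value in enumerate(features):
            value_counts[(label, d)][value] += 1
    return class_counts, value_counts, len(samples)


def classify(model, features):
    """Return the class maximizing Pr[C_i] * prod_d Pr[a_d | C_i]."""
    class_counts, value_counts, n = model
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / n  # prior Pr[C_i] = N_Ci / N
        for d, value in enumerate(features):
            score *= value_counts[(label, d)][value] / n_c  # Pr[a_d | C_i]
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: 1 means the feature points to wireless loss.
TRAINING = [
    (1, 1, "wireless"), (1, 1, "wireless"), (1, 0, "wireless"),
    (0, 0, "congestion"), (0, 0, "congestion"), (0, 1, "congestion"),
]
MODEL = train(TRAINING)
```

With this toy model, `classify(MODEL, (1, 1))` selects the wireless class and `classify(MODEL, (0, 0))` selects the congestion class. Because training reduces to counting, classification needs only a few multiplications per class, which is consistent with the low overhead claimed for the receiver.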
Algorithm 1 NB-TCP Algorithm at TCP Receiver.
Obtain $a_{pri}$ and $a_{RPTI}$ as a sample;
As NewReno;
8: end if
9: end if
If the TCP sender detects the third duplicate ACK with the ELN tag unmarked, the loss is attributed to network congestion and the sender takes congestion control actions. If the value of ELN is 1, which indicates wireless random loss, the sender does not halve the congestion window and simply retransmits the packet. This processing largely avoids unnecessarily halving the slow start threshold and reducing the congestion window during fast retransmission, and it shields the sender from packet loss on the wireless link, thus improving the performance of the TCP protocol.
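The sender-side decision can be sketched in Python as below. This is a simplified model and not the NS2 code used in the paper. The `SenderState` class and the function name are hypothetical, the window is counted in segments, and the congestion branch uses the standard fast retransmit update (slow start threshold set to half the window with a floor of two segments, window set to the threshold plus three segments), with the congestion window standing in for the amount of data in flight.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cwnd: float      # congestion window, in segments
    ssthresh: float  # slow start threshold, in segments


def on_third_duplicate_ack(state: SenderState, eln: int) -> SenderState:
    """Return the new window state after the third duplicate ACK.

    In both branches the missing segment is retransmitted; only the window
    handling differs.
    """
    if eln == 1:
        # Wireless random loss: retransmit only, leave cwnd and ssthresh alone.
        return SenderState(cwnd=state.cwnd, ssthresh=state.ssthresh)
    # Congestion loss: NewReno-style fast retransmit and fast recovery entry.
    ssthresh = max(state.cwnd / 2, 2.0)
    return SenderState(cwnd=ssthresh + 3, ssthresh=ssthresh)


before = SenderState(cwnd=20, ssthresh=64)
after_wireless = on_third_duplicate_ack(before, eln=1)
after_congestion = on_third_duplicate_ack(before, eln=0)
```

In this example the window stays at 20 segments for a loss marked as wireless, while a congestion loss lowers the threshold to 10 segments and the window to 13, which shows where the throughput gain of skipping the reduction comes from.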
The accuracy of the classification model depends mainly on the priority feature and the RPTI feature. Traffic patterns in the IoT network may change, but the classification model is robust to such changes because the feature values are not affected by traffic patterns. The detailed reasons are as follows. The priority feature derives its value from the distribution of packets of different priorities at the receiver. Traffic patterns do not affect the drop rule, so they do not affect this value. Meanwhile, the RPTI feature derives its value from the arrival time interval between out-of-order packets $T_{loss}$, the transmission time on the wireless link $T_{min}$ and the number of lost packets $x$. $T_{loss}$ is affected only by the transmission conditions on the wireless link. Traffic patterns do not affect $T_{loss}$, $T_{min}$ or $x$, so they do not affect this value either.

Numerical Analysis
To comprehensively analyze and verify the effectiveness of the NB-TCP algorithm, we build the system model in NS2 simulations. Figure 1 shows the model used. There are four TCP connections from the servers to the wireless sensors. The four connections share the bottleneck link between the router and the IoT gateway, which has a bandwidth of 2 Mbps. The wireless links between the gateway and the sensors have a wireless error rate (WER) that takes values from 0.01 to 0.05. The simulation parameters are listed in Table 1, and each experiment lasts 160 seconds. TCP senders are fed with FTP traffic. The training samples are obtained by data acquisition and follow the sample format specified in Section 3. We conduct the experiments with NB-TCP, TCP-Casablanca [16] and NewReno [17]. The throughput and accuracy results shown here are averaged over 20 runs of the same experiment.
Figure 3 shows the average throughput versus WER for the three methods. NB-TCP improves throughput and is superior to the comparison methods. As the WER increases, more packets are lost due to bad wireless link conditions, and the advantage of NB-TCP becomes more obvious. When the WER is 0.01, the throughput of NB-TCP is nearly 35% higher than that of TCP-Casablanca and double that of NewReno. When the WER is 0.05, the performance is approximately 180% higher than the comparison algorithm. This is because the packet loss type is classified correctly, so unnecessary congestion control is avoided.
Figure 4 shows the accuracy versus WER for the three methods. NB-TCP has a high accuracy, which is higher than that of the comparison methods at nearly every WER. When the WER is 0.02, the rate of correctly classified packet losses reaches 95%, and the overall rate remains above 85%. The accuracy of TCP-Casablanca and NewReno is less than 50% at a WER of 0.05. This shows that NB-TCP improves the classification accuracy. Comparing Figures 3 and 4, NB-TCP performs comparatively better at higher WER.
When multiple TCP flows compete for a bottleneck link, allocating bandwidth fairly to each TCP flow is very important. Fairness of network resource allocation is an important index for evaluating TCP performance. Here we measure fairness using the fairness index, which is given by [32]

$$F_{index} = \frac{\left(\sum_{i=1}^{s} f_i\right)^2}{s \sum_{i=1}^{s} f_i^2}, \tag{10}$$

where $s$ is the number of flows and $f_i$ is the average throughput of flow $i$. The closer $F_{index}$ is to 1, the better the fairness; a value of 1 represents an absolutely fair distribution of network resources. A value of $1/s$ indicates that the allocation is seriously unfair and that the resources are monopolized by one flow. This experiment uses 10 flows that send data simultaneously, measures the average throughput of each flow and calculates the fairness index of each of the three methods. Figure 5 shows the fairness index of NB-TCP and the comparison methods under different WER. The fairness index of all three algorithms exceeds 0.96, and that of NB-TCP is the highest, close to 1, indicating good fairness.
Friendliness reflects the degree to which different TCP protocols that share the bottleneck link affect each other. To show that NB-TCP is friendly to other methods and does not unduly preempt shared resources, the experiment uses two groups of senders and receivers to establish two flows. One flow is configured with NB-TCP and the other with a comparison method. The average throughput of the two flows is measured under different WER.
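The fairness index described above can be computed with a few lines of Python. The sketch assumes the index is Jain's fairness index, $(\sum f_i)^2 / (s \sum f_i^2)$, which matches the stated extremes of 1 for equal shares and $1/s$ for a single monopolizing flow. The function name and the throughput values are hypothetical and are not taken from the simulations.

```python
def fairness_index(throughputs):
    """Jain's fairness index over the average throughputs of s flows."""
    s = len(throughputs)
    sum_of_squares = sum(f * f for f in throughputs)
    if s == 0 or sum_of_squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    total = sum(throughputs)
    return total * total / (s * sum_of_squares)


# Ten flows with identical throughput: absolutely fair allocation.
equal_share = fairness_index([0.2] * 10)

# One flow takes everything: the index falls to 1/s.
monopolized = fairness_index([2.0] + [0.0] * 9)
```

The first call evaluates to 1 up to floating-point rounding, and the second to 1/10, the lower bound for ten flows.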
Figure 6a,b shows the friendliness of the NB-TCP algorithm under different WER. Although the improved method competes more strongly for network resources, the distribution of resources is stable. Compared with the average throughput in Figure 3, although the NB-TCP throughput is higher, the average throughput of TCP-Casablanca and NewReno is not greatly affected.
To evaluate the impact of the transmission data rate on throughput, we use four TCP connections and keep the WER at 0.05 so that enough packets are lost for the performance differences to be obvious. Figure 7a plots the average throughput versus transmission data rate. Throughput increases with the transmission data rate up to 1.5 Mbps and decreases beyond that. When congestion loss increases, network performance degrades noticeably. For the same transmission data rate, our proposed algorithm still achieves better performance. Figure 7b shows the throughput performance

Figure 2. The different characteristics of packet loss in wired scene and wireless scene.

Algorithm 1, step 3: Classify the sample by the learned model and assign the result to C;
Algorithm 2 NB-TCP Algorithm at TCP Sender.
1: if receive the 3rd duplicate ACK then
$\Pr[C_i]$ is the prior probability of the category $C_i$, which is the ratio of the number of training samples $N_{C_i}$ belonging to $C_i$ to the total number of training samples $N$. $\Pr[A_n \mid C_i]$ is the conditional probability that $A_n$ belongs to the category $C_i$. The Naive Bayesian model assumes that the feature values are independent of each other, so $\Pr[A_n \mid C_i]$ is given by

$$\Pr[A_n \mid C_i] = \prod_{d} \Pr[a_d \mid C_i]. \tag{8}$$