Adaptive Filtering Queueing for Improving Fairness

In this paper, we propose a scalable and efficient Active Queue Management (AQM) scheme, dubbed Adaptive Filtering Queueing (AFQ), that provides fair bandwidth sharing under congestion. First, AFQ identifies the filtering level of an arriving packet by comparing its flow label with a flow label selected at random from each level of the filtering level table, from the first level up to an estimated level. Based on the estimated accepted traffic and the previous fair filtering level, AFQ then updates the fair filtering level. Next, AFQ uses a simple packet-dropping algorithm to determine whether arriving packets are accepted or discarded. To make AFQ feasible in high-speed networks, we propose a two-layer mapping mechanism that simplifies the packet comparison operations. Simulation results demonstrate that AFQ achieves the best fairness when compared with the Rotating Preference Queues (RPQ), Core-Stateless Fair Queueing (CSFQ), CHOose and Keep for responsive flows, CHOose and Kill for unresponsive flows (CHOKe) and First-In First-Out (FIFO) schemes under a variety of traffic conditions.


Introduction
Random Early Detection (RED) detects incipient congestion by computing the average queue size [1]. When the average queue size exceeds a threshold, RED drops or marks each arriving packet with a probability that is a function of the average queue size. RED not only keeps queueing delays low but also maintains high overall throughput because it prevents concurrent connections from global synchronization. However, RED relies on transport-layer protocols capable of congestion control, such as TCP, and not all flows use such protocols. In the presence of flows without congestion control, RED has poor fairness, especially in heterogeneous traffic environments. To improve the fairness of RED, the CHOose and Keep for responsive flows, CHOose and Kill for unresponsive flows (CHOKe) scheme was proposed [2]. When a packet arrives at a router, CHOKe compares it with a packet drawn at random from the buffer. If both packets come from the same flow, both are discarded; otherwise, the arriving packet may be discarded with a probability that depends on the current degree of congestion. By applying this additional discrimination to flows with heavy traffic, CHOKe improves fairness. XCHOKe is a revised version of CHOKe [3]. XCHOKe maintains a lookup table to record CHOKe hits, which it uses to identify possibly malicious flows: the more CHOKe hits a flow has, the more likely it is to be identified as malicious, and XCHOKe therefore applies a higher dropping probability to the arriving packets of that flow. Although XCHOKe achieves better fairness than CHOKe and RED, it lacks scalability, which makes it too complicated to be deployed in high-speed networks.
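The CHOKe arrival step described above can be sketched as follows. This is a simplified illustration, assuming the caller has already computed a RED-style drop probability from the average queue size; the original CHOKe also skips the comparison when the average queue is below the minimum threshold, which is omitted here. The function name `choke_arrival` and the list-of-flow-IDs buffer are hypothetical.

```python
import random
from typing import List


def choke_arrival(buffer: List[str], flow: str, drop_prob: float,
                  rng: random.Random) -> bool:
    """Handle one arrival; return True if it is enqueued.

    `buffer` holds the flow ID of every queued packet (a simplification:
    real routers queue whole packets). `drop_prob` is assumed to be a
    RED-style probability derived from the current average queue size.
    """
    if buffer:
        # Draw one queued packet uniformly at random.
        victim = rng.randrange(len(buffer))
        if buffer[victim] == flow:
            # Same flow: discard both the queued packet and the arrival.
            del buffer[victim]
            return False
    # Different flow (or empty buffer): drop with the congestion-dependent probability.
    if rng.random() < drop_prob:
        return False
    buffer.append(flow)
    return True
```

A flow that occupies a large share of the buffer is more likely to be matched by the random draw, which is how CHOKe penalizes heavy flows without keeping per-flow state.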
Considering the trade-off between scalability and fairness, Core-Stateless Fair Queueing (CSFQ) [4] and Rainbow Fair Queueing (RFQ) [5] were proposed. In CSFQ, routers are classified as edge or core routers. Edge routers maintain per-flow state because they have to estimate the flow rate of each arriving packet; they then insert this estimate into the corresponding packet header. Core routers estimate the fair share rate and use a simple dropping algorithm to determine whether an arriving packet is accepted or discarded. RFQ is similar to CSFQ with one significant difference: the state information inserted into the packet headers is a color layer rather than the explicit flow rate. This further simplifies the operations of the core routers and allows applications to assign differentiated preferences to certain packets. The fairness of both core-stateless schemes can degrade as the number of traversed nodes increases. Rotating Preference Queuing (RPQ) consists of a set of FIFO output queues that are dynamically rotated [6]. RPQ dispatches qualified arriving packets to appropriate output queues based on packet distribution and preferences. RPQ has excellent fairness, but it needs a large buffer, which means that it may have a high implementation cost and result in a large queueing delay. Compared with RPQ, CSFQ, CHOKe and FIFO, the AFQ scheme proposed in this paper is scalable because it is easy to implement. Furthermore, AFQ is efficient because it provides nearly perfect fairness under various traffic conditions.
An interesting research question is whether a scheme can achieve fairness without requiring per-flow state. Providing fairness is important because it also contributes to congestion avoidance. In this paper, the objective is to achieve fair bandwidth sharing with a simple and scalable AFQ approach. The rest of the paper is organized as follows: Section II reviews the related work; Section III describes the details of the AFQ scheme, including a two-layer mapping mechanism that simplifies the packet comparisons; Section IV presents simulation results that demonstrate the fairness of different schemes under various network topologies and traffic conditions; and Section V presents our conclusions.

Related Work
In general, two types of schemes are used to address fairness among competing flows: AQM and packet scheduling. Compared with packet scheduling, AQM has attracted more attention due to its simplicity and efficacy. Moreover, AQM can enhance the performance of congestion control algorithms. Deficit Round Robin (DRR) is a packet scheduler that can achieve nearly perfect fairness [7]. DRR allocates a dedicated virtual queue to each active flow to accommodate its arriving packets. Once enqueued, packets are served in round-robin fashion according to the available quantum sizes. DRR needs to maintain per-flow state. In addition, DRR must work with a pushout (PO) buffer management scheme that avoids buffer shortages for certain flows [8]. When a packet arrives at a router with a full buffer, PO pushes out one or more residing packets from the longest virtual queue to make room for the new arrival; otherwise, the arriving packet is accepted without any constraints. PO keeps buffer utilization high and packet loss low under various traffic conditions. However, PO has two main drawbacks. First, it has to find the longest virtual queue whenever arriving packets encounter a full buffer. Second, it has to execute frequent pushout operations under congested traffic conditions. As a result, given the large number of active flows in routers, it is questionable whether DRR is practical to implement [9].
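A minimal sketch of one DRR round and of the pushout admission rule discussed above, assuming packet sizes in bytes, one deque per flow, and that pushout removes packets from the tail of the virtual queue holding the most bytes. The function names are hypothetical; this is not the implementation from [7] or [8].

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def drr_round(queues: Dict[str, Deque[int]], deficits: Dict[str, int],
              quantum: int) -> List[Tuple[str, int]]:
    """Serve every backlogged flow once; return the (flow, size) packets sent."""
    sent = []
    for flow, q in queues.items():
        if not q:
            deficits[flow] = 0  # idle flows do not accumulate credit
            continue
        deficits[flow] = deficits.get(flow, 0) + quantum
        # Send head-of-line packets while the deficit counter covers them.
        while q and q[0] <= deficits[flow]:
            size = q.popleft()
            deficits[flow] -= size
            sent.append((flow, size))
        if not q:
            deficits[flow] = 0  # reset once the flow's queue empties
    return sent


def pushout_enqueue(queues: Dict[str, Deque[int]], limit: int, flow: str,
                    size: int) -> List[Tuple[str, int]]:
    """Admit a packet, pushing out packets from the longest queue if full."""
    if size > limit:
        raise ValueError("packet larger than the whole buffer")
    pushed = []
    total = sum(sum(q) for q in queues.values())
    while total + size > limit:
        # Longest virtual queue, measured in bytes (a linear scan per pushout).
        longest = max(queues, key=lambda f: sum(queues[f]))
        victim = queues[longest].pop()  # drop from the tail
        total -= victim
        pushed.append((longest, victim))
    queues.setdefault(flow, deque()).append(size)
    return pushed
```

The linear scan for the longest queue in `pushout_enqueue` illustrates the first PO drawback noted above: its cost grows with the number of active flows.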
DRR may suffer from large delay and jitter due to a long round-robin cycle, so several variants have been developed to address these issues [10][11][12]. Customized deficit round robin (CDRR) handles real-time flows by adding a new queue that schedules real-time traffic just prior to its deadline [10]. However, the extra queue increases the delay of non-real-time traffic, particularly when the traffic load is heavy. Moreover, assigning weights to the queues may increase overall unfairness, especially for non-real-time traffic. Another variant, fuzzy-based Adaptive Deficit Round Robin (FADRR), uses expert systems based on fuzzy logic to adjust the weights of the service queues for real-time and non-real-time traffic [11]. However, this scheme also sacrifices fairness because it favors real-time traffic, which may result in bandwidth starvation for non-real-time traffic.
RED is a well-known AQM scheme employing a single FIFO buffer to accommodate arriving packets from all active flows [1]. Arriving packets encounter different drop probabilities according to the average queue size and other parameters. RED discards packets before the buffer is full, so it can prevent TCP connections from global synchronization. Unfortunately, RED is unable to provide fairness, especially for heterogeneous traffic. Several RED variants have been proposed to enhance its fairness or the robustness of its parameters [13][14][15][16][17][18]. Self-Configuring RED changes the dropping probability according to variations in the average queue size [13]. If the average queue size oscillates around the minimum threshold, the current dropping probability is too high; if it oscillates around the maximum threshold, the current dropping probability is too low. Based on these queue-size dynamics, the scheme adjusts the packet dropping probability to reduce packet loss while maintaining high link utilization. Another RED variant, weighted Destination-Based Fair Dropping (wDBFD), needs only a single queue, and packets are dropped probabilistically before they are enqueued instead of by tail dropping. Furthermore, by assigning weights to the drop probabilities of different destination stations, this scheme realizes destination differentiation. wDBFD is also more robust in terms of fairness under different packet arrival rates. The idea of wDBFD is similar to that of RED, but it relies not only on past measurements of the queue size but also on the recently observed rates of flows. By using this additional information, wDBFD improves on the fairness of RED, at the cost of increased complexity.
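For reference, the basic RED computation mentioned above can be sketched as follows: an exponentially weighted moving average of the instantaneous queue length, and a drop probability that rises linearly between the two thresholds. The sketch follows the standard formulation of RED [1] but omits its count-based correction and idle-time handling.

```python
def red_average(avg: float, queue_len: float, w_q: float) -> float:
    """EWMA of the queue length: avg <- (1 - w_q) * avg + w_q * q."""
    return (1.0 - w_q) * avg + w_q * queue_len


def red_drop_probability(avg: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """Linear drop probability between min_th and max_th (no count correction)."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)
```

For example, with thresholds of 40 KB and 120 KB and a maximum probability of 0.02, an average queue of 80 KB yields a drop probability of 0.01. The probability depends only on the aggregate queue, not on which flow sent the packet, which is why RED alone cannot enforce fairness.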
In CSFQ [4], an edge router maintains per-flow state and inserts the state (the flow arrival rate) into packet headers. Whenever a core router receives a packet, CSFQ estimates the fair share rate and uses a simple probabilistic model to accept or discard the new arrival. CSFQ achieves reasonable fairness; moreover, it pushes complexity toward the edge routers, which simplifies the implementation of the core routers. In general, core routers carry many more active flows than edge routers. Therefore, CSFQ suits network environments consisting of high-speed core routers and moderate-speed edge routers. The architecture of RFQ is similar to that of CSFQ and mainly consists of packet coloring and buffer management [5]. RFQ maps the flow arrival rate onto a set of layers, with a globally consistent color per layer, and the edge routers insert the color into packet headers. When a packet arrives at a core router, it is discarded only if its color level exceeds the color threshold, which changes dynamically with traffic variations. Compared with CSFQ, RFQ achieves approximate fairness while carrying only a simple color level rather than an explicit flow arrival rate. Furthermore, it avoids exponential averaging estimation when a packet is generated. In summary, both CSFQ and RFQ classify routers as edge or core routers, and only the edge routers maintain per-flow state.
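The two CSFQ computations described above can be sketched as follows, following the formulation of CSFQ [4]: the edge router's exponential averaging of a flow's rate, and the core router's probabilistic drop, where a packet labeled with rate r is dropped with probability max(0, 1 - α/r) given the estimated fair share rate α. Estimating α itself is omitted, and the function names are hypothetical.

```python
import math
import random


def csfq_edge_rate(r_old: float, length_bits: float, interarrival_s: float,
                   k_s: float) -> float:
    """Exponential averaging of a flow's arrival rate (bits/s) at the edge.

    The weight exp(-T/K) depends on the packet inter-arrival time T, so
    long gaps make the new sample count more than short ones.
    """
    weight = math.exp(-interarrival_s / k_s)
    return (1.0 - weight) * length_bits / interarrival_s + weight * r_old


def csfq_drop_probability(label_rate: float, fair_share: float) -> float:
    """Drop probability at a core router for a packet carrying label_rate."""
    if label_rate <= 0.0:
        return 0.0
    return max(0.0, 1.0 - fair_share / label_rate)


def csfq_accept(label_rate: float, fair_share: float, rng: random.Random) -> bool:
    """Randomly accept the packet with probability 1 - drop probability."""
    return rng.random() >= csfq_drop_probability(label_rate, fair_share)
```

A flow labeled at twice the fair share loses about half of its packets, so its forwarded rate is cut to roughly the fair share, while flows below the fair share are never dropped by this rule.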
The fairness of RPQ approaches that of DRR, and it outperforms several schemes, such as CSFQ, DDE, CHOKe and FIFO [6]. In addition, RPQ has a complexity of O(1) and is simple to implement in high-speed networks. However, because it requires a large buffer, RPQ has a high implementation cost and a large queueing delay. Currently, end users adopt several TCP variants, so heterogeneous congestion control has become a characteristic of newly emerging networks. The fairness of several well-known AQM schemes, such as RED and CHOKe, among heterogeneous TCP connections, as opposed to homogeneous TCP connections, is discussed in [19]. We do not consider the effect of TCP variants here, but it is an interesting topic for extending AFQ.

Adaptive Filtering Queueing
Figure 1 depicts the four main components of AFQ: accepted traffic estimation, fair filtering level estimation, a filtering level table and a packet-dropping algorithm. At the end of this section, we propose a mechanism that makes AFQ scalable. A flow whose mean arrival rate is larger than the max-min fair rate is defined as an aggressive flow; otherwise, it is defined as a non-aggressive flow. The flow label of a flow is the pair of IP source and destination addresses of that flow. When a packet arrives at a router, AFQ identifies the filtering level of the packet by comparing its flow label with flow labels selected at random from the first level up to an estimated level of the filtering level table. Based on the accepted traffic estimate and the previous fair filtering level, AFQ updates its estimate of the fair filtering level. Finally, AFQ uses a simple packet-dropping algorithm to determine whether the packet is accepted or discarded according to the estimated fair filtering level and the filtering level of the arrival. In a router, AFQ estimates the accepted traffic $\hat{A}$ that passes through the packet-dropping component in a time interval using Equation (1). The unit of $\hat{A}$ is bits, and $T$ denotes the length of the time interval.
$\hat{A}_{\mathrm{old}}$ is the value of $\hat{A}$ prior to the update, and $A$ is the amount of traffic accepted in the current time interval. In addition, $w$ is a coefficient used to balance the short-term and long-term estimates of $\hat{A}$. The estimate of the fair filtering level $\hat{\alpha}$ is proportional to $CT/\hat{A}$, as described in Equation (2). Similarly, $\hat{\alpha}_{\mathrm{old}}$ is the value of $\hat{\alpha}$ before the update, and $C$ denotes the router's link capacity (bits per second). If $\hat{A}$ is smaller than the ideal output traffic $CT$, then $\hat{\alpha}$ increases. A larger $\hat{\alpha}$ allows more arriving packets to be accepted. By observing the dynamics of the accepted traffic, AFQ can produce a precise estimate of the fair filtering level.
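The two estimation steps (accepted traffic and fair filtering level) can be sketched as follows. Because the bodies of Equations (1) and (2) are not reproduced here, this sketch assumes that Equation (1) is an exponentially weighted moving average of the accepted traffic A_hat with weight w on the newest interval, and that Equation (2) scales the previous fair filtering level by the ratio of the ideal output traffic C*T to A_hat. The paper's exact forms may differ (for example, in which term w multiplies). The function names, the guard for A_hat = 0 and the lower clamp at 1 are hypothetical.

```python
def update_accepted_traffic(a_hat_old: float, accepted_bits: float,
                            w: float) -> float:
    """Assumed form of Equation (1): EWMA of the accepted traffic (bits)."""
    return (1.0 - w) * a_hat_old + w * accepted_bits


def update_fair_level(alpha_old: float, a_hat: float, capacity_bps: float,
                      interval_s: float) -> float:
    """Assumed form of Equation (2): alpha scales with C*T / A_hat.

    If less traffic than the ideal output C*T was accepted, alpha grows;
    if more was accepted, it shrinks.
    """
    if a_hat <= 0.0:
        return alpha_old  # hypothetical guard: no accepted traffic observed
    ideal_bits = capacity_bps * interval_s
    return max(1.0, alpha_old * ideal_bits / a_hat)  # hypothetical clamp at level 1
```

For example, with a 10 Mbps link and a 200 ms interval (an ideal output of 2,000,000 bits), a previous level of 32 and an estimated accepted traffic of 1,600,000 bits, the sketch raises the fair filtering level to 40.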
AFQ compares the flow label of each arriving packet with a flow label selected at random from each filtering level of the filtering level table, from filtering level 1 to $\hat{\alpha}$, until there is a hit. The filtering level table consists of multiple filtering levels; each filtering level keeps only flow labels, not whole packets, and is used to differentiate the dropping probabilities of arriving packets. A hit occurs when both labels are the same (i.e., the packet and the stored label belong to the same flow). In AFQ, the filtering levels act as a hierarchical filter that filters out unqualified arriving packets. Because of traffic dynamics, AFQ may have insufficient discriminability, which degrades fairness; hence, the comparison range should be enlarged beyond $\hat{\alpha}$. In Equation (3), we multiply $\hat{\alpha}$ by a coefficient $\beta$ and denote the result by $\tilde{\alpha}$, i.e., $\tilde{\alpha} = \beta\hat{\alpha}$; the role of $\beta$ is to enhance the discriminability of AFQ. The range of flow label comparisons thus changes to filtering levels 1 to $\tilde{\alpha}$. In other words, $\tilde{\alpha}$ provides sufficient discriminability, whereas the decision on whether arriving packets are accepted or dropped still depends on $\hat{\alpha}$. As a result, the complexity of AFQ is $O(\tilde{\alpha})$.
Assuming that packet $i$ has its first hit at filtering level $h_i$, we use Equation (4) to calculate its filtering level $L_i$. A larger $L_i$ implies that the flow of packet $i$ has fewer residing packets in the buffer; the flow label of packet $i$ is then enrolled into filtering level $L_i$. There are two supplementary rules. First, if packet $i$ encounters an empty filtering level, $L_i$ is set to that level. Second, if there is no hit up to level $\tilde{\alpha}$, then $L_i = \tilde{\alpha}$.
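The comparison loop over the filtering level table can be sketched as follows. Each filtering level is modeled as a Python list of flow labels, one label is drawn at random per level, and the search stops at the first hit. The mapping from the hit level h_i to the filtering level L_i (Equation (4)) is not reproduced here, so the sketch returns the hit level together with the two supplementary cases. The function name and the return convention are hypothetical.

```python
import random
from typing import List, Tuple


def search_filtering_table(table: List[List[str]], label: str,
                           tilde_alpha: int,
                           rng: random.Random) -> Tuple[str, int]:
    """Compare `label` with one random label per level, levels 1..tilde_alpha.

    Returns one of:
      ("hit", h)    first hit at level h (map to L_i with Equation (4));
      ("empty", e)  an empty level e was reached first, so L_i = e;
      ("none", a)   no hit up to tilde_alpha, so L_i = tilde_alpha.
    At most tilde_alpha comparisons are made, matching O(tilde_alpha).
    """
    for level in range(1, tilde_alpha + 1):
        # Levels beyond the allocated table are treated as empty.
        entries = table[level - 1] if level <= len(table) else []
        if not entries:
            return ("empty", level)
        if rng.choice(entries) == label:
            return ("hit", level)
    return ("none", tilde_alpha)
```

A flow whose labels fill many slots is likely to be hit at a low level, while a light flow usually passes many levels without a hit, which is what lets the filtering level separate aggressive from non-aggressive flows.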
After determining the filtering level of packet $i$, AFQ decides the position $P_{L_i}$ of its flow label within filtering level $L_i$ according to Equation (5). A circular replacement method is used to update the flow labels in each filtering level, and we assume that every filtering level has the same capacity, denoted by $Q$. AFQ updates the filtering level table according to traffic conditions: the heavier the traffic, the more frequently the table is updated.
$$P_{L_i} = \max\left((P_{L_i} + 1) \bmod (Q + 1),\ 1\right) \qquad (5)$$
AFQ utilizes a simple packet-dropping algorithm to decide the treatment of packet $i$, where $p_i$ denotes the dropping probability of packet $i$. Once packet $i$ is accepted, it is enqueued into the FIFO buffer; otherwise, it is discarded immediately. Thus, only arriving packets whose filtering levels are equal to or larger than the fair filtering level can be admitted to the buffer.
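A minimal sketch of enrolling a flow label with circular replacement, assuming Equation (5) has the form P <- max((P + 1) mod (Q + 1), 1), so that the write pointer of each filtering level cycles through slots 1 to Q and wraps from Q back to 1 (the max with 1 skips the 0 produced by the modulo). It assumes one pointer per filtering level and 1-based slot numbers; the names are hypothetical.

```python
from typing import List, Optional


def next_position(p: int, q: int) -> int:
    """Equation (5) as assumed here: P <- max((P + 1) mod (Q + 1), 1)."""
    return max((p + 1) % (q + 1), 1)


def enroll(table: List[List[Optional[str]]], pointers: List[int],
           level: int, label: str) -> None:
    """Write `label` into the next slot of filtering `level` (1-based).

    table[level - 1] holds the Q slots of that level; pointers[level - 1]
    is its current slot number (0 before the first write).
    """
    slots = table[level - 1]
    pos = next_position(pointers[level - 1], len(slots))
    slots[pos - 1] = label  # slot numbers are 1-based, list indices 0-based
    pointers[level - 1] = pos
```

Circular replacement always overwrites the oldest label in a level, so each level reflects the most recent Q enrollments and the table tracks traffic changes without any per-flow bookkeeping.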
We propose a mechanism for AFQ that simplifies the packet comparisons (i.e., of flow labels) while reducing memory consumption. In this design, routers are classified as edge or core routers, as in CSFQ. Edge routers use a hash algorithm, such as SHA-1, to transform the flow label of each packet into a key $K$, which removes the dependency on IP addresses, and insert the key into the packet header. As shown in Figure 2, the core routers maintain two tables: a first-layer table of keys and a second-layer table of flow labels. When a packet arrives at a core router, AFQ extracts the key $K$, of size $s$, from the packet header and compares it with keys in the first-layer table instead of with flow labels. The keys in the first-layer table are maintained with the same circular replacement method. If there is a hit at some filtering level $z$, AFQ compares the packet's flow label with the flow label stored at the corresponding position of the second-layer table, which is calculated using Equation (7).
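The edge-router step can be sketched with Python's standard `hashlib` and `ipaddress` modules: the flow label (source and destination IPv4 addresses, 8 bytes when packed) is hashed with SHA-1, and the first s bits of the 160-bit digest form the key. Serializing the label as the concatenated packed addresses is an assumption, and the function name is hypothetical.

```python
import hashlib
import ipaddress


def flow_key(src: str, dst: str, s_bits: int = 8) -> int:
    """Return the first `s_bits` bits of SHA-1(src || dst) as an integer key."""
    if not 1 <= s_bits <= 160:
        raise ValueError("s_bits must be between 1 and 160")
    # Assumed flow-label encoding: packed source address, then packed destination.
    label = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha1(label).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - s_bits)
```

With s = 8, a core router compares one-byte keys in the first-layer table instead of 8-byte flow labels. Different flows may collide on the same key, which is why a first-layer hit must still be confirmed against the second-layer table.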
If a packet encounters hits in both tables, a real hit has occurred; otherwise, no real hit has occurred. A real hit corresponds to a hit in the original AFQ algorithm described above. If there is a hit only in the first-layer table, the flow label of the arriving packet replaces the flow label at the corresponding position of the second-layer table. If different flows share the same key, the flow with more arriving packets has a higher chance of being captured in the second-layer table, so its arriving packets are more likely to be constrained. Consequently, AFQ retains sufficient discriminability. Unlike the first-layer table, the second-layer table is updated with a frequency-based replacement method. The two-layer mapping mechanism imposes an additional burden on edge routers, but it speeds up packet comparisons and reduces memory consumption on core routers. For instance, with $s$ = 8 bits (the first 8 bits of the 160-bit SHA-1 digest) and the remaining table parameters set to 32, 2, 7 and 128, the original AFQ requires 32 × 8 × 128 = 32,768 bytes of memory, since each flow label (an IPv4 source-destination address pair) occupies 8 bytes. With the mechanism, AFQ requires approximately 32 × 1 × 128 + 8 × 128 × 7 = 11,264 bytes. In other words, memory consumption is reduced by approximately two-thirds. To compare fairness, we define the Normalized Bandwidth Ratio (NBR) of a flow $j$ in Equation (8):
$$NBR_j = \frac{D_j}{\min(r_j, f)} \qquad (8)$$
where $NBR_j$ denotes the NBR, $D_j$ the mean departure rate and $r_j$ the Mean Arrival Rate (MAR) of flow $j$, and $f$ denotes the Max-Min Fair Share Rate (MMFSR).
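If $\sum_{i=1}^{n} r_i > C$, then $f$ is the value satisfying $\sum_{i=1}^{n} \min(r_i, f) = C$, where $n$ denotes the number of active flows and $C$ the link capacity. Otherwise, $f = \max_{1 \le i \le n} r_i$. If a scheme achieves optimal fairness, then the NBR of every flow equals 1.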
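The MMFSR can be computed with the usual water-filling procedure: visit flows in increasing order of MAR and give each its full rate while that rate is below an equal split of the remaining capacity. The sketch below, with hypothetical names, computes f and the NBR; it reproduces the MMFSR values quoted for the first and third single-link cases (1.4 Mbps and 1.375 Mbps on a 10 Mbps link).

```python
from typing import Sequence


def mmfsr(rates: Sequence[float], capacity: float) -> float:
    """Max-min fair share rate f of flows with mean arrival rates `rates`."""
    if not rates:
        raise ValueError("need at least one flow")
    if sum(rates) <= capacity:
        return max(rates)  # no congestion: every flow gets its full rate
    remaining = capacity
    ordered = sorted(rates)
    n = len(ordered)
    for k, r in enumerate(ordered):
        share = remaining / (n - k)  # equal split among unsatisfied flows
        if r >= share:
            return share  # this and all larger flows are capped at f
        remaining -= r  # flow fully satisfied; release its rate
    return max(rates)  # not reached when sum(rates) > capacity


def nbr(departure_rate: float, arrival_rate: float, f: float) -> float:
    """Normalized Bandwidth Ratio D_j / min(r_j, f); 1.0 means perfectly fair."""
    return departure_rate / min(arrival_rate, f)
```

For the first single-link case, flows sending at 0.2, 0.4, 0.6, 0.8 and 1 Mbps plus five 10 Mbps flows give f = 1.4 Mbps; a 10 Mbps flow that departs at exactly 1.4 Mbps therefore has an NBR of 1.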

Simulation Results
We simulated AFQ and four well-known schemes (RPQ, CSFQ, CHOKe and FIFO) and compared their fairness by analyzing their NBR behavior. We performed all simulations with a software simulator developed in our previous study [6]. The traffic types used to generate packets in each case are described in the respective figures. As shown in Figure 3a,b, we consider two categories of network topologies: those with a single congested link and those with multiple congested links. Unless otherwise specified, the following parameters are used. Each link has a capacity of 10 Mbps, and the packet size is fixed at 1 KB. The buffer size for all schemes is set to 256 KB. In addition, we neglect the propagation delay of each link. In AFQ, the initial value of $\hat{\alpha}$ is set to 32, and the other parameters are set to $T$ = 200 ms, $w$ = 0.8 and $\beta$ = 1.5. In RPQ, each output queue is set to 32 KB, and the other parameters are set to Δ = 0.8 ms and α = 0.8, with the two remaining parameters set to 200 ms and 129, respectively. For CSFQ, the averaging constants $K$ and $K_\alpha$ are both set to 200 ms. CHOKe's parameters are set to $max_{th}$ = 120 KB, $min_{th}$ = 40 KB, $w_q$ = 0.002 and $max_p$ = 0.02. Finally, each simulation lasts 200 s.

A Single Congested Link
We consider four cases in which a single congested link is shared by 10 flows, as depicted in Figure 3a. In the first case, flows 1 to 5 are non-aggressive and flows 6 to 10 are aggressive, and the MMFSR is 1.4 Mbps. The simulation results are depicted in Figure 4. In CHOKe and FIFO, the bandwidth obtained by flows 1 to 5 is roughly proportional to their MARs, so their NBRs are nearly constant. Flows 6 to 10 have the same MAR of 10 Mbps, so they fairly share the bandwidth grabbed from flows 1 to 5, and their NBRs are again nearly constant. FIFO is the simplest scheme, but it shows the worst fairness. Although CHOKe performs better than FIFO, it provides only limited fairness. With AFQ, the NBRs of flows 1 to 5 are the highest and those of flows 6 to 10 are the lowest among all schemes; in other words, AFQ achieves the best fairness. In AFQ, RPQ and CSFQ, the NBRs of flows 1 to 5 all decrease as the MAR approaches the MMFSR, because the packets of these flows are more likely to be discarded during traffic bursts. In these simulations, AFQ outperforms RPQ, and RPQ outperforms CSFQ in fairness. Therefore, AFQ can effectively protect non-aggressive flows from unfair treatment caused by aggressive flows. In the third case, we consider flows with three magnitudes of MARs, for which the MMFSR is 1.375 Mbps. The MARs of flows 1 to 3 and flows 4 to 6 are 0.5 Mbps and 1 Mbps, respectively, and the MAR of flows 7 to 10 is 6 Mbps. The simulation results are depicted in Figure 6. The MARs of flows 1 to 6 are smaller than the MMFSR, so their NBRs are constant in CHOKe and FIFO. In RPQ and CSFQ, the NBRs of flows 1 to 3 are larger than those of flows 4 to 6.
Flows 1 to 3 have lower MARs, so their packets are less likely to be mistakenly discarded. The MARs of flows 4 to 6 are close to the MMFSR; hence, their arriving packets are more likely to be discarded during traffic bursts. The NBRs of AFQ are better than those of RPQ and CSFQ. As a result, AFQ still achieves the best fairness under various traffic conditions. In the fourth case, the flows have diverse MARs, and all except flow 1 are aggressive. The simulation results are depicted in Figure 7, and the MMFSR is 1 Mbps. Flow 1, the only non-aggressive flow, has a lower NBR due to larger traffic burstiness. FIFO again has the worst NBRs, whose trends are proportional to the MAR of each flow; the traffic load dominates the fairness of FIFO. CHOKe is affected for the same reason, but its fairness is better than that of FIFO. CSFQ has to estimate the MAR of each flow, so traffic variations can influence its fairness; however, exponential flow rate estimation can relieve this effect. The fairness of RPQ is close to that of CSFQ; RPQ can further enhance its fairness by increasing the number of output queues, but, as mentioned previously, this also leads to larger queueing delays. AFQ consistently outperforms the other schemes. From Figures 4 to 7, we conclude that AFQ achieves robust fairness, the best among the compared schemes, on a single congested link under various traffic conditions.

Multiple Congested Links
We now analyze how the NBRs of five flows with smaller MARs are affected by other flows with larger MARs across three congested links. We consider two cases based on the topology shown in Figure 3b. Flows 1 to 5 send at 0.2, 0.4, 0.6, 0.8 and 1 Mbps, and the other flows send at 10 Mbps. The capacity of each link is 10 Mbps, so every link between routers is congested. We focus on the NBRs of flows 1 to 5 because their fairness can be damaged by the aggressive flows. The simulation results are illustrated in Figure 8, and the MMFSR at the last router is 0.363 Mbps. The NBRs of flows 1 to 5 are all close to 0 in CHOKe and FIFO; in other words, neither scheme provides fairness. As the number of traversed routers increases, CSFQ gradually loses precision in estimating the fair share rate and the MAR of each flow. Overall, AFQ performs better than RPQ, and RPQ performs much better than CSFQ. AFQ achieves the best fairness because it dynamically filters arriving packets according to traffic variations. In the second case, we double the traffic burstiness of flows 1 to 5 and change the MAR of flows 6 to 28 to 8 Mbps. Figure 9 illustrates the NBRs of flows 1 to 5; the MMFSRs are the same as in the first case. Again, AFQ achieves the best fairness. Although CSFQ is worse than RPQ, it is still much better than CHOKe and FIFO. The fairness of AFQ, RPQ and CSFQ degrades compared with the first case because the traffic burstiness of flows 1 to 5 is doubled. Based on these simulation results, we conclude that AFQ achieves the best fairness among the compared schemes under different network topologies and various traffic conditions.

Conclusions
In this paper, we present a scalable and efficient AFQ scheme that achieves fair bandwidth sharing under various traffic conditions. Routers employ a simple packet-dropping algorithm to determine whether arriving packets are accepted or dropped according to their filtering levels and the estimated fair filtering level. In addition, we propose a two-layer mapping mechanism that simplifies packet comparisons while reducing memory consumption, which makes AFQ suitable for deployment in high-speed networks. This mechanism assumes the same edge and core router environment as CSFQ. We analyzed the fairness of AFQ and four other schemes under different network topologies and traffic conditions. The simulation results demonstrate that AFQ is superior to RPQ and CSFQ and performs much better than CHOKe and FIFO. In future work, we aim to study the effect of various TCP variants on AFQ. We also plan to study an enhanced version of AFQ that supports real-time applications while maintaining fairness through dynamic bandwidth readjustment.

Figure 4. The Normalized Bandwidth Ratio (NBR) achieved by each of the ten flows sharing a bottleneck link; flows 1 to 5 are non-aggressive flows, and flows 6 to 10 are aggressive flows.

Figure 5. The NBR achieved by each of the ten flows sharing a bottleneck link, with all flows being aggressive.

Figure 6. The NBR achieved by each of the ten flows sharing a bottleneck link, with flows of three magnitudes of Mean Arrival Rate (MAR).

Figure 7. The NBR achieved by each of the ten flows sharing a bottleneck link, with diverse MARs.

Figure 8. The NBR achieved by flows 1 to 5 with lower ON-OFF burstiness, with all other flows sending at 10 Mbps.

Figure 9. The NBR achieved by flows 1 to 5 with doubled ON-OFF burstiness, with all other flows sending at 8 Mbps.