Throughout this section, we analyze the performance of the proposed algorithms and then assess the correct behavior of the ONOS application itself.
The experiments have been carried out with an in-house developed simulator available for download [28]. The simulator shares most of its code with the ONOS implementation, which reduces development time and helps with validation. As for the traffic itself, we have employed real traffic traces from the public CAIDA (Center for Applied Internet Data Analysis) dataset [29]. As the original traces were captured on 10 Gb/s Ethernet links, they have a relatively low average throughput, so we have constructed new traffic traces by reducing the inter-arrival times by a constant factor, producing new 6.5, 13, 19.5, 26 and 32.5 Gb/s traces.
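For illustration, the following sketch shows one way to perform this time compression on a list of trace records. The record layout, the field names and the scaling factor are assumptions of the example, not the exact tooling used to process the CAIDA traces.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: speed up a packet trace by dividing its inter-arrival times by a factor. */
final class TraceScaler {

    /** One trace record: arrival time in seconds and packet size in bytes (hypothetical format). */
    record Packet(double arrivalTime, int sizeBytes) {}

    /** Returns a new trace whose inter-arrival times are the original ones divided by factor. */
    static List<Packet> scale(List<Packet> original, double factor) {
        List<Packet> scaled = new ArrayList<>(original.size());
        if (original.isEmpty()) {
            return scaled;
        }
        double prevOriginal = original.get(0).arrivalTime();
        double prevScaled = prevOriginal;           // keep the first timestamp as reference
        scaled.add(original.get(0));
        for (int i = 1; i < original.size(); i++) {
            Packet p = original.get(i);
            double interArrival = p.arrivalTime() - prevOriginal;
            prevScaled += interArrival / factor;    // compress gaps -> higher average rate
            prevOriginal = p.arrivalTime();
            scaled.add(new Packet(prevScaled, p.sizeBytes()));
        }
        return scaled;
    }
}
```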
5.1. Flow Allocation Algorithms
The first experiment evaluated the variation of the energy consumption with the duration of the sampling period of the algorithms for a rate of 32.5 Gb/s. Figure 10a shows the results of the three flow allocation algorithms for a buffer size of 10,000 packets. A fourth algorithm, named equitable, which distributes the flows uniformly among all the ports without regard to energy efficiency, served as a baseline for comparison. The energy consumption attained by the three energy-saving algorithms was practically the same. Besides, we can also observe from Figure 10a that short sampling periods (e.g., shorter than 0.1 s) presented higher consumption than longer ones. This probably corresponds to mispredictions of the flow characteristics, as already shown in Figure 3. Finally, note that the obtained energy consumption was very close to the optimum. According to (3), the best allocation was obtained when the rate allocation vector was (10, 10, 10, 2.5, 0) Gb/s; in other words, this minimum consumption was achieved when the 32.5 Gb/s load was distributed in the bundle in the following way: three ports fully utilized carrying 10 Gb/s each, one port transmitting the remaining 2.5 Gb/s and the last one carrying no traffic. In that case, and for the usual EEE parameters of 10 Gb/s interfaces in the frame transmission mode [25], the optimal bundle consumption was just a little lower than the 79% obtained by CA.
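As a rough guide to how such consumption figures follow from an allocation vector, the sketch below evaluates a simplified frame-mode EEE model in which an active port draws full power, inflated by a transition overhead, and an idle port draws a small residual fraction. The overhead and idle constants are illustrative placeholders, not the parameters of [25] or the exact expression (3).

```java
/**
 * Minimal sketch of a frame-mode EEE consumption model for a bundle of identical ports.
 * The constants below are illustrative placeholders, not the values used in the paper.
 */
final class BundleEnergyModel {

    static final double PORT_CAPACITY_GBPS = 10.0; // nominal port rate
    static final double IDLE_FRACTION = 0.1;       // relative power drawn in low-power idle (assumed)
    static final double OVERHEAD_FRACTION = 0.1;   // extra active time due to wake/sleep transitions (assumed)

    /** Relative consumption of a single port carrying 'rateGbps' (1.0 = always active). */
    static double portConsumption(double rateGbps) {
        double utilisation = Math.min(rateGbps / PORT_CAPACITY_GBPS, 1.0);
        // Fraction of time the port is active, inflated by transition overheads, capped at 1.
        double active = Math.min(utilisation * (1.0 + OVERHEAD_FRACTION), 1.0);
        return active + (1.0 - active) * IDLE_FRACTION;
    }

    /** Average relative consumption of the whole bundle for a given rate allocation vector. */
    static double bundleConsumption(double[] allocationGbps) {
        double total = 0.0;
        for (double r : allocationGbps) {
            total += portConsumption(r);
        }
        return total / allocationGbps.length;
    }

    public static void main(String[] args) {
        // Example: three fully used ports, one partially loaded port and one idle port.
        double[] allocation = {10.0, 10.0, 10.0, 2.5, 0.0};
        System.out.printf("Relative bundle consumption: %.1f%%%n",
                100.0 * bundleConsumption(allocation));
    }
}
```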
The energy consumption for the different traffic traces is shown in Figure 10b for a sampling period of 0.5 s and a buffer size of 10,000 packets. We can observe that the energy consumption was almost identical for the three proposed algorithms and that it was considerably lower than that of the non-energy-efficient equitable algorithm. There was just a slight difference in the case of the 19.5 Gb/s trace, where CA consumed a bit more than the greedy algorithms. This was because its safety margin (M) made it use three ports, while the greedy algorithms allocated the flows using just two ports. Nevertheless, the consumption attained by CA was still much lower than that of the equitable one.
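To make this difference concrete, the following sketch contrasts a plain water-filling assignment with one that reserves a safety margin on every port. The bundle size, margin value and flow rates are illustrative, and the code is not the actual GA/BGA/CA pseudocode.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Minimal sketch contrasting greedy water-filling with a margin-based conservative allocation. */
final class FlowAllocatorSketch {

    static final double PORT_CAPACITY = 10.0;   // Gb/s
    static final double MARGIN = 1.0;           // illustrative safety margin M (Gb/s)

    /** Water-filling: pack each flow into the first port that still has enough headroom. */
    static int[] allocate(double[] flowRates, double headroom) {
        double[] load = new double[5];           // five-port bundle
        int[] portOfFlow = new int[flowRates.length];
        // Allocate in decreasing rate order, as in the paper's greedy variants.
        Integer[] order = new Integer[flowRates.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -flowRates[i]));
        for (int idx : order) {
            int chosen = load.length - 1;        // fall back to the last port if none fits
            for (int p = 0; p < load.length; p++) {
                if (load[p] + flowRates[idx] <= PORT_CAPACITY - headroom) {
                    chosen = p;
                    break;
                }
            }
            load[chosen] += flowRates[idx];
            portOfFlow[idx] = chosen;
        }
        return portOfFlow;
    }

    public static void main(String[] args) {
        double[] rates = {6.0, 5.0, 4.5, 4.0};   // estimated flow rates in Gb/s (illustrative)
        System.out.println("Greedy:       " + Arrays.toString(allocate(rates, 0.0)));
        System.out.println("Conservative: " + Arrays.toString(allocate(rates, MARGIN)));
    }
}
```

With the example rates, the plain water-filling variant packs everything into two ports, whereas the margin-based one spreads the load over three, mirroring the behavior discussed above.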
Figure 11a presents the variation of the packet loss rate with the sampling period for a 10,000-packet buffer size, while Figure 11b explores the packet losses introduced for different buffer sizes, using a sampling period of 0.5 s. GA exhibited the highest losses, followed by BGA, then CA and finally the equitable algorithm. These results confirm that the greedy algorithms can lead to high loss rates when the flow rates are underestimated. The conservative algorithm, however, was able to trade a small increase in energy consumption for an acceptable loss rate for buffer sizes of 1000 packets and above. Furthermore, observe how, for the shortest sampling periods, packet losses diminished, as the algorithms adapted faster to rate variations, although, as seen before, energy usage also increased.
The packet loss rates for the different traffic traces are shown in Figure 11c, where the sampling period is set to 0.5 s and the buffer size to 10,000 packets. As expected, GA and BGA exhibited the highest losses in every case, while CA and the equitable algorithm showed negligible losses. In the case of the 6.5 Gb/s trace, no losses were recorded, as expected.
The results for packet transmission delay are depicted in Figure 12. In particular, Figure 12a shows the average packet delay versus the sampling period for a 10,000-packet buffer size. The average delay for GA was about 4 ms, which is considerably higher than that of the other algorithms, whereas the delay for BGA, although lower, was still in the millisecond range. The delay for CA was, however, an order of magnitude lower, about 250 μs. For reference, the delay of the equitable algorithm sat around 50 μs, being, as expected, the lowest one. Figure 12b shows the average packet delay experienced by the different traffic traces with the different algorithms. For the 6.5 Gb/s trace, the three energy-saving algorithms behaved identically, using just one port for all the traffic. Furthermore, for CA, the delay of the packets using the 26 Gb/s trace was higher than that using the 32.5 Gb/s one, since, in the latter case, there was one more link in use, but with a lower load. For the rest of the traces, the results were in accordance with those shown in Figure 12a.
5.2. QoS-Aware Algorithms
To test the performance of the two proposed QoS scheduling algorithms, we have created an additional best-effort traffic trace, again by reducing the inter-arrival times of the original CAIDA trace. Additionally, we have added a source of low-latency traffic, consisting of a synthetic traffic trace made of relatively small packets (100 and 200 bytes) with deterministic inter-arrival times corresponding to the desired final average rate. We have used CA as the flow allocation algorithm, with a sampling period of 0.5 s and a buffer size limited to 10,000 packets, to provide negligible (below 0.05%) packet losses, as per the results of the previous section.
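The low-latency source can be reproduced with a generator along these lines: packets of 100 and 200 bytes emitted with deterministic spacing chosen to match a target average rate. The strict alternation between the two packet sizes is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: deterministic low-latency traffic source with 100- and 200-byte packets. */
final class LowLatencySource {

    record Packet(double departureTime, int sizeBytes) {}

    /**
     * Generates 'count' packets whose average rate matches 'targetRateMbps'.
     * Packet sizes alternate between 100 and 200 bytes (an assumption for this sketch).
     */
    static List<Packet> generate(double targetRateMbps, int count) {
        double avgSizeBits = (100 + 200) / 2.0 * 8;                  // average packet size in bits
        double interArrival = avgSizeBits / (targetRateMbps * 1e6);  // seconds between packets
        List<Packet> trace = new ArrayList<>(count);
        double t = 0.0;
        for (int i = 0; i < count; i++) {
            trace.add(new Packet(t, (i % 2 == 0) ? 100 : 200));
            t += interArrival;                                       // deterministic spacing
        }
        return trace;
    }
}
```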
Figure 13a shows the average delay of the packets with low-latency requirements using the QoS-aware algorithms and that obtained using the baseline conservative one. The results in the figure correspond to this best-effort traffic trace, while we varied the rate of the low-latency traffic (We have omitted the results using lower rates for the best-effort traffic for the sake of brevity, since the results are analogous). The unmodified CA yielded considerably worse results than the QoS-aware algorithms, producing delays higher than 100 μs. The fluctuations across the different low-latency rates come from the fact that the low-latency flows were allocated to a different port in each case, being forced to compete with a different amount of normal traffic.
Both QoS-aware algorithms significantly reduced the average delay. The SPA delay stayed around 5 μs, while TQA added less than 2 μs for all the tested transmission rates. The main delay contribution for SPA was the time needed to wake up the interface, which would usually be idle upon the arrival of a low-latency packet. This was not the case for TQA, as low-latency traffic shared the port with best-effort traffic.
Figure 13b shows the results when the system was already experiencing a very high load due to best-effort traffic. Both SPA and the non-QoS-aware CA experienced an average delay higher than 200 μs, fluctuating up to 1000 μs depending on the actual low-latency rate. On the other hand, TQA kept the latency lower than 2 μs. These results confirm that SPA was not capable of providing a low-latency service in high-load scenarios, since all the ports were already busy forwarding best-effort traffic.
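The behavioral difference between the two algorithms boils down to a port/queue decision, sketched below: SPA steers low-latency flows to a port left out of the best-effort allocation, while TQA keeps them on an already active port but in a dedicated high-priority queue. The class and method names, and the fallback rules, are illustrative rather than the application's actual logic.

```java
/** Minimal sketch of the port/queue decision taken by SPA and TQA for a low-latency flow. */
final class QosSchedulerSketch {

    static final int HIGH_PRIORITY_QUEUE = 1;
    static final int BEST_EFFORT_QUEUE = 0;

    record Assignment(int port, int queue) {}

    /** SPA: use the first port not carrying best-effort traffic, in its default queue. */
    static Assignment spa(double[] bestEffortLoad) {
        for (int p = 0; p < bestEffortLoad.length; p++) {
            if (bestEffortLoad[p] == 0.0) {
                return new Assignment(p, BEST_EFFORT_QUEUE);
            }
        }
        // No spare port left: fall back to the least loaded one
        // (this is where SPA degrades under high best-effort load).
        int least = 0;
        for (int p = 1; p < bestEffortLoad.length; p++) {
            if (bestEffortLoad[p] < bestEffortLoad[least]) least = p;
        }
        return new Assignment(least, BEST_EFFORT_QUEUE);
    }

    /** TQA: share the least loaded active port but isolate the flow in a high-priority queue. */
    static Assignment tqa(double[] bestEffortLoad) {
        int least = 0;
        for (int p = 1; p < bestEffortLoad.length; p++) {
            boolean active = bestEffortLoad[p] > 0.0;
            if (active && (bestEffortLoad[least] == 0.0 || bestEffortLoad[p] < bestEffortLoad[least])) {
                least = p;
            }
        }
        return new Assignment(least, HIGH_PRIORITY_QUEUE);
    }
}
```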
Figure 14 compares the average delay of the best-effort packets using the QoS-aware algorithms with the average delay of these packets when using CA, for the same best-effort traffic trace and varying low-latency traffic. When the rate of low-latency traffic was very low (e.g., lower than 100 Mb/s), the average delay of best-effort packets was identical for the two QoS algorithms and CA, around 264 μs. Nevertheless, as this rate increased, the delay exhibited by CA and TQA rose. On the other hand, the delay of SPA remained unaffected by the rate of the low-latency traffic, since this traffic was forwarded through a different port than the best-effort traffic.
Finally, Figure 15
shows the average energy consumption of the bundle using the different QoS-aware algorithms and also CA under the same traffic conditions. Again, while the amount of high-priority traffic was negligible (i.e., lower than 10 Mb/s), the three algorithms drew the same amount of energy. As expected, TQA achieved exactly the same consumption as CA irrespective of the low-latency traffic rate. However, for values higher than 10 Mb/s, the energy usage increased rapidly for SPA, reaching nearly 100% for rates above 100 Mb/s. This confirms that energy consumption rises quickly in SPA as soon as the amount of high-priority traffic forwarded through the spare port becomes significant.
5.3. ONOS Application Results
The previous sections have measured the efficiency of the proposed algorithms via a simulation study. We also tested the correctness and feasibility of the proposal with an actual implementation of the application. To this end, we implemented the proposed SDN application on top of ONOS, emulating the experimental topology with Mininet in order to evaluate its proper operation. The Open vSwitch switches employed by Mininet expose an OpenFlow API accessible to ONOS, but they cannot reproduce the EEE behavior exactly, so we measured the average occupation of each outgoing link as a proxy for the corresponding energy consumption.
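Such an occupation proxy only needs the transmitted-byte counters that Open vSwitch exposes through OpenFlow, sampled once per interval. The sketch below shows the computation on raw counter samples; it deliberately avoids naming specific ONOS service methods, whose signatures vary across releases.

```java
/** Minimal sketch: average link occupation from two tx-byte counter samples taken Δt seconds apart. */
final class OccupationProxy {

    /** Nominal capacity of the (scaled) interfaces, in bits per second. */
    static final double CAPACITY_BPS = 100e6;

    /**
     * Occupation in [0, 1] of a port whose transmitted-bytes counter moved from
     * 'bytesBefore' to 'bytesAfter' over an interval of 'intervalSeconds'.
     */
    static double occupation(long bytesBefore, long bytesAfter, double intervalSeconds) {
        double bitsSent = (bytesAfter - bytesBefore) * 8.0;
        return Math.min(bitsSent / (intervalSeconds * CAPACITY_BPS), 1.0);
    }
}
```

Averaging these per-interval values over the twelve intervals and ten runs yields occupation figures directly comparable with Table 1.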
We evaluated our application with the 32.5 Gb/s traffic trace used in the previous experiments (Results for the other traffic traces, namely the 6.5, 13, 19.5 and 26 Gb/s ones, have been omitted for the sake of brevity, but they are consistent).
We used tcpreplay to transmit it, but at a rate of just about 330 Mb/s, since the computer used for the experiments was not capable of replaying this traffic trace at higher rates (We have used an Intel® Core™ i7-4710HQ (4th Generation) at 2.5 GHz).
Accordingly, the nominal capacity of the interfaces of the bundle was scaled down to 100 Mb/s, and the sampling period was scaled up to 10 s. The occupation of each port of the bundle, averaged over twelve intervals in ten independent executions, is shown in Table 1. Even though the actual consumption of 100 Mb/s interfaces would be different, this experiment allowed us to validate the behavior of the algorithms.
The results of the real implementation matched the simulation results. We can see that GA used three ports at more than 90% of their nominal capacity, a fourth one at about 30%, and left the last one unused. These values reflect the behavior of a water-filling algorithm, as intended by design. BGA avoided having three ports so close to their nominal capacity, although one port still presented an occupation higher than 95%. As the flows were assigned in decreasing rate order, fewer flows were allocated to the first ports: in this case, 1.56 flows were allocated on average to the first port, 6.56 to the second and 96.91 to the third. The high number of flows allocated to the third port explains why its occupation was so high. CA behaved exactly as desired, using four ports at around 80% occupation and leaving the last one empty. The equitable algorithm spread the traffic evenly among all the ports of the bundle. Note that the 0.02% usage of the last port with the three energy-efficient algorithms was due to the flows being assigned randomly during the first interval. The small differences in average occupation were mostly due to packet losses, which occurred whenever more than 100 Mb/s were assigned to a port during an interval.
Table 2 collects the average energy consumption, averaged over the intervals, for the same ten independent executions. As we can observe, the differences in energy consumption among the three energy-efficient algorithms were minimal, and all of them consumed about 18% less than the baseline equitable algorithm. They only differed in the consumption of port 4, which consumed about 7% less with GA than with BGA and CA. This is in accordance with the simulations.
We have also validated the QoS algorithms with the ONOS application using the setup depicted in Figure 16. This time, the setup consisted of three switches (numbered from 1 to 3) and eight hosts (numbered from 1 to 8). Hosts 1 to 4 were connected to Switch 1, and Hosts 5 to 8 were connected to Switch 3. These edge switches were connected to an inner switch by their respective four-link bundles. All the interfaces in this scenario had a nominal capacity of 1 Gb/s.
In this network, three UDP flows without latency requirements were originated at Hosts 1, 2 and 3, with respective destinations at Hosts 5, 6 and 7. These flows were created with the iperf3 tool. The first two clients sent traffic at 700 Mb/s, while the third one sent at 600 Mb/s. This way, we forced the flows to be allocated on the first three ports of each bundle. Then, we added three lightweight flows from Host 4 to Host 8, tagged with a predefined differentiated services code point (DSCP) value so that they could be identified as low-latency by our ONOS application. The purpose of these lightweight flows is to measure the latency suffered by low-latency packets under the different scheduling algorithms.
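Within the ONOS application, packets carrying that DSCP value can be steered to the selected port and queue with a flow rule along the following lines. This is only a sketch: the DSCP value, queue identifier and rule priority are placeholders, and the actual application may build its rules differently.

```java
import org.onosproject.core.ApplicationId;
import org.onosproject.net.DeviceId;
import org.onosproject.net.PortNumber;
import org.onosproject.net.flow.DefaultFlowRule;
import org.onosproject.net.flow.DefaultTrafficSelector;
import org.onosproject.net.flow.DefaultTrafficTreatment;
import org.onosproject.net.flow.FlowRule;
import org.onosproject.net.flow.FlowRuleService;
import org.onosproject.net.flow.TrafficSelector;
import org.onosproject.net.flow.TrafficTreatment;
import org.onlab.packet.Ethernet;

/** Minimal sketch: steer DSCP-tagged traffic to a high-priority queue on a given output port. */
final class LowLatencyRuleInstaller {

    private static final byte LOW_LATENCY_DSCP = 46;   // placeholder DSCP value (e.g., EF)
    private static final long HIGH_PRIORITY_QUEUE = 1; // placeholder queue id configured in OVS

    void install(FlowRuleService flowRules, ApplicationId appId,
                 DeviceId device, PortNumber outPort) {
        TrafficSelector selector = DefaultTrafficSelector.builder()
                .matchEthType(Ethernet.TYPE_IPV4)
                .matchIPDscp(LOW_LATENCY_DSCP)          // identify low-latency packets
                .build();
        TrafficTreatment treatment = DefaultTrafficTreatment.builder()
                .setQueue(HIGH_PRIORITY_QUEUE)          // TQA-style: dedicated queue on a shared port
                .setOutput(outPort)
                .build();
        FlowRule rule = DefaultFlowRule.builder()
                .forDevice(device)
                .withSelector(selector)
                .withTreatment(treatment)
                .withPriority(40000)                    // placeholder rule priority
                .fromApp(appId)
                .makePermanent()
                .build();
        flowRules.applyFlowRules(rule);
    }
}
```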
Figure 17 shows box and whisker plots of the round-trip time (RTT) of 10,000 packets of the lightweight flows using the different algorithms. The whiskers cover 95% of the samples, and outliers have been removed for the sake of clarity. We can see that traffic without real-time requirements suffered a substantial latency in this scenario, around 50 ms. This is expected, since the flow was allocated to the same port and queue as the 600 Mb/s big flow. As a result, the packets of the small flows had to contend with the packets of the big flow, yielding considerable waiting times in the queue of the port, which are indeed the main contribution to this large RTT.
Regarding the QoS-aware algorithms, both of them managed to decrease the round-trip time of the low-latency traffic by three orders of magnitude in this scenario. The SPA algorithm used the low-priority queue of the port that did not carry any big flow, thus providing low latency. On the other hand, the TQA algorithm used the same port as the 600 Mb/s flow, but placed the lightweight flow in the high-priority queue rather than in the low-priority one used by the big flow. Additionally, despite the algorithms using different ports and queues, their latency performance was remarkably solid and stable, without major fluctuations, as desired.