Single-Instruction-Multiple-Data Instruction-Set-Based Heat Ranking Optimization for Massive Network Flow
Abstract
1. Introduction
2. Top-k Flow Heat Ranking Algorithm Based on AVX
2.1. Data Structure of the Sketch Algorithm
- (1) Hash mapping. Take d pairwise independent hash functions $\mathrm{hash}_1, \mathrm{hash}_2, \ldots, \mathrm{hash}_d$ and construct an array of d rows of counters. Each row uses its own hash function to map a newly arrived element to one counter in that row; together, these d counters form the Sketch's summary of the element.
- (2) Update the two-dimensional array. The input is treated as a stream of data items that arrive one after another. When a new item $(j_t, c_t)$ arrives at time $t$, the table entries are updated, and processing every item of the stream in this way yields the final two-dimensional array. For each row $i$, $\mathrm{hash}_i(j_t)$ gives the index of element $j_t$ in that row, and $c_t$ is added to the counter at that index:

$$\mathrm{count}[i, \mathrm{hash}_i(j_t)] \leftarrow \mathrm{count}[i, \mathrm{hash}_i(j_t)] + c_t, \quad i = 1, \ldots, d$$

- (3) Query the result. The estimated value of a queried element $j$ is the minimum of its d counters:

$$\hat{a}_j = \min_{1 \le i \le d} \mathrm{count}[i, \mathrm{hash}_i(j)]$$
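The three operations above can be sketched in C as follows. This is a minimal illustration, not the paper's implementation: the dimensions CM_D and CM_W, the seeds, and the helper names are hypothetical, and a simple seeded multiplicative hash stands in for a truly pairwise independent hash family.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CM_D 4     /* rows, i.e. number of hash functions (illustrative) */
#define CM_W 1024  /* counters per row (illustrative) */

typedef struct {
    uint32_t count[CM_D][CM_W];  /* the d x w counter array */
    uint32_t seed[CM_D];         /* one seed per row gives d different hashes */
} CountMin;

/* Stand-in for hash_i: seeded multiplicative hashing of a 32-bit key. */
static uint32_t cm_hash(const CountMin *cm, int i, uint32_t j) {
    uint32_t h = (j ^ cm->seed[i]) * 2654435761u;
    h ^= h >> 16;
    return h % CM_W;
}

static void cm_init(CountMin *cm, const uint32_t seeds[CM_D]) {
    assert(cm != NULL && seeds != NULL);
    memset(cm->count, 0, sizeof cm->count);
    memcpy(cm->seed, seeds, sizeof cm->seed);
}

/* Update: add c to count[i][hash_i(j)] in every row i. */
static void cm_update(CountMin *cm, uint32_t j, uint32_t c) {
    for (int i = 0; i < CM_D; i++)
        cm->count[i][cm_hash(cm, i, j)] += c;
}

/* Query: the estimate is the smallest of the d counters of j. */
static uint32_t cm_query(const CountMin *cm, uint32_t j) {
    uint32_t est = UINT32_MAX;
    for (int i = 0; i < CM_D; i++) {
        uint32_t v = cm->count[i][cm_hash(cm, i, j)];
        if (v < est) est = v;
    }
    return est;
}
```

Because updates only add non-negative counts, a collision can only raise a counter, so each of the d counters overestimates the true count and the minimum is the tightest of them.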
2.2. Top-k Algorithm Based on Bitonic Sort
2.3. Overview of AVX Technology
3. Optimization of Flow Heat Ranking Algorithm Based on AVX
3.1. Algorithm of Flow Heat Ranking
Algorithm 1: Algorithm for flow heat ranking
Input: packet
Output: flow heat ranking
- Obtain multiple tuples from the packets to form the vector KEY.
- Calculate the index vector of the Sketch algorithm from the vector KEY using the AVX instruction set.
- Use the index vector to obtain the indexes of multiple counters, and increase the counters at these indexes.
- Estimate the flow heat as the minimum value of these counters, following the Sketch algorithm.
- If the heat of the flow is greater than the threshold:
  - If the heat of the flow is greater than that of the flow with the smallest counter in the flow table:
    - If the flow is not yet in the flow table:
      - Delete the flow with the smallest counter from the flow table.
      - Insert the current flow into the flow table at the position given by its heat, maintaining the bitonic property.
    - Otherwise (the flow is already in the flow table):
      - Update the heat of the current flow in the flow table.
      - If the heat of the current flow is higher than that of its neighbor, swap the current flow with the neighbor in the flow table, maintaining the bitonic property.
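The table-maintenance part of Algorithm 1 (everything after the sketch estimate) can be sketched in C as below. This is a simplified illustration, not the paper's implementation: the table is kept as a single ascending run, which is a degenerate bitonic sequence whose decreasing part is empty, so the coldest flow is always f[0]. Membership is checked with a linear scan, and a flow is moved to its new position with the binary search mentioned in Section 3.3. TABLE_CAP, Flow, FlowTable, ft_find, ft_move_up and ft_offer are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_CAP 8  /* hypothetical flow-table capacity (a power of 2) */

typedef struct { uint32_t key; uint32_t heat; } Flow;
/* f[0..n) is kept in ascending heat order, so f[0] is always the coldest. */
typedef struct { Flow f[TABLE_CAP]; size_t n; } FlowTable;

static int ft_find(const FlowTable *t, uint32_t key) {
    for (size_t i = 0; i < t->n; i++)
        if (t->f[i].key == key) return (int)i;
    return -1;
}

/* f[p] has just become hotter: binary-search the first hotter entry to its
   right, shift the entries in between one slot left, and drop f[p] in. */
static void ft_move_up(FlowTable *t, size_t p) {
    assert(p < t->n);
    Flow moved = t->f[p];
    size_t lo = p + 1, hi = t->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->f[mid].heat <= moved.heat) lo = mid + 1;
        else hi = mid;
    }
    memmove(&t->f[p], &t->f[p + 1], (lo - 1 - p) * sizeof t->f[0]);
    t->f[lo - 1] = moved;
}

/* Table part of Algorithm 1 for a flow whose sketch estimate is `heat`. */
static void ft_offer(FlowTable *t, uint32_t key, uint32_t heat, uint32_t threshold) {
    if (heat <= threshold) return;              /* cannot be a hot flow */
    int pos = ft_find(t, key);
    if (pos >= 0) {                             /* already tracked: update */
        if (heat > t->f[pos].heat) {
            t->f[pos].heat = heat;
            ft_move_up(t, (size_t)pos);
        }
        return;
    }
    if (t->n < TABLE_CAP) {                     /* free slot: open one at f[0] */
        memmove(&t->f[1], &t->f[0], t->n * sizeof t->f[0]);
        t->n++;
    } else if (heat <= t->f[0].heat) {
        return;                                 /* not hotter than the coldest */
    }
    t->f[0] = (Flow){key, heat};                /* new entry (evicts f[0] if full) */
    ft_move_up(t, 0);
}
```

For example, once the table is full, offering a flow whose estimate exceeds both the threshold and f[0].heat evicts f[0] and slides the new flow right until the run is sorted again.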
3.2. Optimize Sketch Algorithm with AVX
- (1) Initialize the hash functions. Select a set of four hash functions constructed with a random-number modulo method, in which the hash function $\mathrm{hash}_i(j)$ of row $i$ is determined by its own initial value and random seed.
- (2) Construct the vectors. Using the initial value of each hash function, build the hash result vector with the instruction _mm256_set_epi32; likewise, build a hash seed vector from the seeds of the hash functions. For the data stream to be processed, broadcast each data element with the instruction _mm256_set1_epi32 so that each of the four 64-bit lanes holds a copy, forming the data element vector. Although each seed, hash result, and data element has an effective width of 32 bits, each occupies a 64-bit lane of the vector so that the products do not overflow. The layout of each vector is shown in Figure 5.
- (3) Hash operation. The instructions _mm256_mul_epi32 and _mm256_add_epi32 perform the vector multiplication and addition, hash_v = _mm256_mul_epi32(hash_v, seed_v) followed by hash_v = _mm256_add_epi32(hash_v, data_v), so that one sequence of AVX instructions evaluates all four hash functions at once. The four hash results are complete once all packets have been traversed, which improves the performance of the Sketch algorithm.
- (4) Sketch counting. Extract the four final hash results from the hash result vector, use them to locate one counter in each of the four rows, and increase those counters; as in the Count-Min Sketch algorithm, the traffic count of the flow is estimated as the minimum of the four counters.
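The four steps above can be sketched in C as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the flow key is treated as an array of 32-bit words that are fed one at a time into h_i ← h_i · seed_i + x (mod 2^32), and HASH_INIT, HASH_SEED, WIDTH, hash4 and count_flow are hypothetical names and values. The AVX2 path uses the same _mm256_mul_epi32 / _mm256_add_epi32 pair as step (3) and is checked against a scalar reference; it is enabled through a GCC/Clang target attribute and selected at run time, with a scalar fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

#define ROWS 4
#define WIDTH 4096  /* hypothetical number of counters per row */

/* Hypothetical initial values and seeds of the four hash functions. */
static const uint32_t HASH_INIT[ROWS] = {17u, 31u, 131u, 1313u};
static const uint32_t HASH_SEED[ROWS] = {2654435761u, 2246822519u, 3266489917u, 668265263u};
static uint32_t counters[ROWS][WIDTH];

/* Scalar reference: h_i = h_i * seed_i + x (mod 2^32) for each key word x. */
static void hash4_scalar(const uint32_t *key, size_t n, uint32_t out[ROWS]) {
    for (int i = 0; i < ROWS; i++) {
        uint32_t h = HASH_INIT[i];
        for (size_t k = 0; k < n; k++) h = h * HASH_SEED[i] + key[k];
        out[i] = h;
    }
}

#ifdef HAVE_X86_SIMD
/* All four rows at once. Every value lives in the low 32 bits of a 64-bit
   lane, because _mm256_mul_epi32 multiplies only those halves. */
__attribute__((target("avx2")))
static void hash4_avx2(const uint32_t *key, size_t n, uint32_t out[ROWS]) {
    __m256i hash_v = _mm256_set_epi32(0, (int)HASH_INIT[3], 0, (int)HASH_INIT[2],
                                      0, (int)HASH_INIT[1], 0, (int)HASH_INIT[0]);
    __m256i seed_v = _mm256_set_epi32(0, (int)HASH_SEED[3], 0, (int)HASH_SEED[2],
                                      0, (int)HASH_SEED[1], 0, (int)HASH_SEED[0]);
    for (size_t k = 0; k < n; k++) {
        __m256i data_v = _mm256_set1_epi32((int)key[k]);  /* broadcast the word */
        hash_v = _mm256_mul_epi32(hash_v, seed_v);         /* four 64-bit products */
        hash_v = _mm256_add_epi32(hash_v, data_v);         /* low halves += word */
    }
    uint32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, hash_v);         /* extract the results */
    for (int i = 0; i < ROWS; i++) out[i] = lanes[2 * i];  /* low half of lane i */
}
#endif

static void hash4(const uint32_t *key, size_t n, uint32_t out[ROWS]) {
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx2")) { hash4_avx2(key, n, out); return; }
#endif
    hash4_scalar(key, n, out);
}

/* Count one packet of the flow `key` and return its Count-Min estimate. */
static uint32_t count_flow(const uint32_t *key, size_t n) {
    assert(key != NULL && n > 0);
    uint32_t h[ROWS], est = UINT32_MAX;
    hash4(key, n, h);
    for (int i = 0; i < ROWS; i++) {
        uint32_t *c = &counters[i][h[i] % WIDTH];  /* one counter per row */
        *c += 1;
        if (*c < est) est = *c;                    /* minimum of the four */
    }
    return est;
}
```

Because _mm256_mul_epi32 reads only the low 32 bits of each 64-bit lane, whatever _mm256_add_epi32 leaves in the high halves never feeds into the next multiplication, so the low halves hold exactly the 32-bit result of each hash function. Note also that the 256-bit integer forms of these two instructions belong to AVX2 rather than the original AVX.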
3.3. Bitonic Sort Optimizing with AVX
- (1) Initialize the bitonic sequence. The flow table can hold only a limited number of flows, usually a power of 2, whereas the number of flows in a real network is massive. The flow table starts empty. When a packet arrives, the Sketch algorithm estimates the heat of its flow, and the flow is inserted into the flow table only when this heat exceeds the threshold. The threshold test is safe because the Sketch algorithm can overestimate the heat of a flow but never underestimates it: a flow whose estimate exceeds the threshold may not be a hot flow, but a flow whose estimate is below the threshold is definitely not one. When the flow table is full, or when a top-k ranking is required, the bitonic merge process rearranges the table into a bitonic sequence, so that its entries first increase monotonically and then decrease monotonically.
- (2) Update the bitonic sequence. After the flow table is full, each newly arriving flow is compared with the flow with the lowest heat in the table. If the heat of the new flow is lower, it is not inserted. If it is higher, the table is checked for whether the flow is already present. If it is, only its heat value in the flow table is updated; if not, the flow with the lowest heat is replaced by the new flow. After the heat update or replacement, the heat of the flow is compared with that of its neighbors, the appropriate position is found through binary search [35], and the flow is moved to that position, preserving the monotonicity of the bitonic sequence.
- (3) Order the bitonic sequence. A bitonic sequence of length $2^n$ is divided into two halves X and Y of equal length, and elements taken from X and Y are loaded into vectors. When X and Y are longer than a vector, the _mm256_loadu_ps instruction loads contiguous elements into vectors; when they are no longer than a vector, the _mm256_shuffle_ps instruction builds the vectors by shuffling. The instructions _mm256_max_ps and _mm256_min_ps then compare the two vectors element-wise: the larger values go into sequence M and the smaller values into sequence N, and both M and N are still bitonic sequences. Applying the same shuffle-and-compare operations recursively to M and N, and then to their subsequences, finally yields the ordered sequence.
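Steps (1) and (3) can be illustrated with the compact bitonic sort below, assuming the heat values of the flow table are stored as a float array whose length is a power of 2. Halves of at least eight elements are compared with _mm256_loadu_ps, _mm256_min_ps and _mm256_max_ps as described in step (3); shorter halves use a scalar compare-exchange instead of the _mm256_shuffle_ps path, which is a simplification. The function names are hypothetical, and AVX is enabled through a GCC/Clang target attribute and checked at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Compare X = a[0..h) with Y = a[h..2h) element-wise: smaller values go to
   the first half when sorting ascending, larger ones otherwise. */
static void half_clean_scalar(float *a, size_t h, bool up) {
    for (size_t i = 0; i < h; i++) {
        float x = a[i], y = a[i + h];
        float lo = x < y ? x : y, hi = x < y ? y : x;
        a[i] = up ? lo : hi;
        a[i + h] = up ? hi : lo;
    }
}

#ifdef HAVE_X86_SIMD
/* Same step, eight floats per instruction; h must be a multiple of 8. */
__attribute__((target("avx")))
static void half_clean_avx(float *a, size_t h, bool up) {
    for (size_t i = 0; i < h; i += 8) {
        __m256 x = _mm256_loadu_ps(a + i);      /* eight elements of X */
        __m256 y = _mm256_loadu_ps(a + i + h);  /* matching eight of Y */
        __m256 lo = _mm256_min_ps(x, y);        /* smaller values (N) */
        __m256 hi = _mm256_max_ps(x, y);        /* larger values (M) */
        if (up) { _mm256_storeu_ps(a + i, lo); _mm256_storeu_ps(a + i + h, hi); }
        else    { _mm256_storeu_ps(a + i, hi); _mm256_storeu_ps(a + i + h, lo); }
    }
}
#endif

/* Turn a bitonic sequence of length n into a sorted one. */
static void bitonic_merge(float *a, size_t n, bool up) {
    if (n < 2) return;
    size_t h = n / 2;
#ifdef HAVE_X86_SIMD
    if (h >= 8 && __builtin_cpu_supports("avx")) half_clean_avx(a, h, up);
    else half_clean_scalar(a, h, up);
#else
    half_clean_scalar(a, h, up);
#endif
    bitonic_merge(a, h, up);      /* both halves are bitonic again */
    bitonic_merge(a + h, h, up);
}

/* Sort the halves in opposite directions, which yields a bitonic sequence,
   then merge it. n must be a power of two. */
static void bitonic_sort(float *a, size_t n, bool up) {
    assert((n & (n - 1)) == 0);
    if (n < 2) return;
    bitonic_sort(a, n / 2, true);
    bitonic_sort(a + n / 2, n / 2, false);
    bitonic_merge(a, n, up);
}
```

Sorting with up = false places the hottest flows first, so the top-k flows are the first k entries of the array.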
4. Analysis of Experiments and Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Akhunzada, A.; Ahmed, E.; Gani, A.; Khan, M.K.; Imran, M.; Guizani, S. Securing software defined networks: Taxonomy, requirements, and open issues. IEEE Commun. Mag. 2015, 53, 36–44.
- Hosseini, S.; Zade, B.M.H. New hybrid method for attack detection using combination of evolutionary algorithms, SVM, and ANN. Comput. Netw. 2020, 173, 107168.
- Wu, Z.; Lu, K.; Wang, X.; Chi, W. Topology-aware network fault influence domain analysis. Comput. Electr. Eng. 2017, 57, 266–280.
- Kong, D.; Shen, Y.; Chen, X.; Cheng, Q.; Liu, H.; Zhang, D.; Liu, X.; Chen, S.; Wu, C. Combination Attacks and Defenses on SDN Topology Discovery. IEEE/ACM Trans. Netw. 2023, 31, 904–919.
- Wei, W.; Chen, Y.; Lin, Q.; Ji, J.; Wong, K.-C.; Li, J. Multi-objective evolving long—Short term memory networks with attention for network intrusion detection. Appl. Soft Comput. 2023, 139, 110216.
- Qing, W.; Hongju, C. Computer Network Security and Defense Technology Research. In Proceedings of the 2016 Eighth International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Macau, China, 11–12 March 2016; pp. 155–157.
- Zhang, H.; Shen, Y.; Thai, M.T. Robustness of power-law networks: Its assessment and optimization. J. Comb. Optim. 2016, 32, 696–720.
- Mogul, J.C.; Tourrilhes, J.; Yalagandula, P.; Sharma, P.; Curtis, A.R.; Banerjee, S. DevoFlow: Cost-effective flow management for high performance enterprise networks. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, Monterey, CA, USA, 20–21 October 2010; pp. 1–6.
- Li, J.; Li, Z.; Xu, Y.; Jiang, S.; Yang, T.; Cui, B.; Dai, Y.; Zhang, G. WavingSketch: An unbiased and generic sketch for finding top-k items in data streams. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery Data Mining, 6–10 July 2020; pp. 1574–1584.
- Alawadi, A.H.; Zaher, M.; Molnár, S. Methods for Predicting Behavior of Elephant Flows in Data Center Networks. Infocommun. J. 2019, 6, 34–41.
- Tang, L.; Huang, Q.; Lee, P.P.C. A Fast and Compact Invertible Sketch for Network-Wide Heavy Flow Detection. IEEE/ACM Trans. Netw. 2020, 28, 2350–2363.
- Huang, J.; Zhang, W.; Li, Y.; Li, L.; Li, Z.; Ye, J.; Wang, J. ChainSketch: An efficient and accurate sketch for heavy flow detection. IEEE/ACM Trans. Netw. 2023, 31, 738–753.
- Pan, Z.; Zhang, F.; Li, H.; Zhang, C.; Du, X.; Deng, D. G-SLIDE: A GPU-Based Sub-Linear Deep Learning Engine via LSH Sparsification. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 3015–3027.
- Liu, J.; Li, X.; Hu, F.Q. Performance comparison on parallel CPU and GPU algorithms for two dimensional unified gas-kinetic scheme. Adv. Appl. Math. Mech. 2020, 12, 1247–1260.
- Geng, T.; Waeijen, L.; Peemen, M.; Corporaal, H.; He, Y. MacSim: A MAC-Enabled High-Performance Low-Power SIMD Architecture. In Proceedings of the 2016 Euromicro Conference on Digital System Design (DSD), Limassol, Cyprus, 31 August–2 September 2016; pp. 160–167.
- Jakobs, T.; Kratzsch, S.; Rünger, G. Analyzing Data Reordering of a combined MPI and AVX execution of a Jacobi Method. In Proceedings of the 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Naples, Italy, 1–3 March 2023; pp. 159–163.
- Khan, S.; Rashid, M.; Javaid, F. A high performance processor architecture for multimedia applications. Comput. Electr. Eng. 2018, 66, 14–29.
- Al Hasib, A.; Natvig, L.; Kjeldsberg, P.G.; Cebrián, J.M. Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors—A Case Study. J. Low Power Electron. Appl. 2017, 7, 5.
- Mu, Q.; Cui, L.; Song, Y. The implementation and optimization of Bitonic sort algorithm based on CUDA. Comput. Sci. 2015, 40, 553–556.
- Zhu, H.; Zhang, Y.; Zhang, L.; He, G.; Liu, L.; Liu, N. SA Sketch: A self-adaption sketch framework for high-speed network: NA. Concurr. Comput. Pract. Exp. 2020, 1, e5891.
- Li, D.; Du, R.; Liu, Z.; Yang, T.; Cui, B. Multi-copy Cuckoo Hashing. In Proceedings of the IEEE 35th International Conference on Data Engineering, Macao, China, 8–11 April 2019; pp. 1226–1237.
- Yoshioka, M.; Hiraguri, T.; Yoshino, H. Performance evaluation of sketch schemes on traffic anomaly detection accuracy. IEICE Commun. Express 2017, 6, 399–404.
- Yang, T.; Zhang, H.; Wang, H.; Shahzad, M.; Liu, X.; Xin, Q.; Li, X. FID-sketch: An accurate sketch to store frequencies in data streams. World Wide Web 2019, 22, 2675–2696.
- Deng, F.; Yu, Z.; Song, H.; Zhao, R.; Zheng, Q.; Li, Z.; He, H.; Zhang, Y.; Guo, F. An efficient policy evaluation engine with locomotive algorithm. Clust. Comput. 2021, 24, 1505–1524.
- Li, S.; Luo, L.; Guo, D.; Zhang, Q.; Fu, P. A survey of sketches in traffic measurement: Design, optimization, application and implementation. arXiv 2020, arXiv:2012.07214.
- Cormode, G.; Muthukrishnan, S. An improved data stream summary: The count-min sketch and its applications. In Proceedings of the 2004 Latin American Symposium on Theoretical Informatics, Buenos Aires, Argentina, 5–8 April 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 29–38.
- Sisovic, S.; Bakaric, M.B.; Matetic, M. Reducing data stream complexity by applying Count-Min algorithm and discretization procedure. In Proceedings of the IEEE Fourth International Conference on Big Data Computing Service & Applications, Bamberg, Germany, 26–29 March 2018.
- Rottenstreich, O.; Reviriego, P.; Porat, E.; Muthukrishnan, S. Avoiding Flow Size Overestimation in the Count-Min Sketch with Bloom Filter Constructions. IEEE Trans. Netw. Serv. Manag. 2021, 18, 3662–3676.
- Yang, T.; Jiang, J.; Liu, P.; Huang, Q.; Gong, J.; Zhou, Y.; Miao, R.; Li, X.; Uhlig, S. Adaptive Measurements Using One Elastic Sketch. IEEE/ACM Trans. Netw. 2019, 27, 2236–2251.
- Tang, L.; Huang, Q.; Lee, P.P.C. MV-Sketch: A Fast and Compact Invertible Sketch for Heavy Flow Detection in Network Data Streams. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 2026–2034.
- Zhang, J.; Zhang, W.; Yuan, J.; Wang, H. Implementing bitonic sorting on optical network-on-chip with bus topology. Photonic Netw. Commun. 2020, 39, 129–134.
- Ranković, V.; Kos, A.; Milutinović, V. Bitonic Merge Sort Implementation on the Maxeler Dataflow Supercomputing System. IPSI BgD Trans. Internet Res. 2013, 9, 5–10.
- Marszałek, Z. Parallelization of fast sort algorithm. In Proceedings of the Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, 12–14 October 2017.
- Amiri, H.; Shahbahrami, A. SIMD programming using Intel vector extensions. J. Parallel Distrib. Comput. 2020, 135, 83–100.
- Nowak, R. Generalized binary search. In Proceedings of the 2008 46th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 23–26 September 2008; pp. 568–574.
- Cho, K.; Mitsuya, K.; Kato, A. Traffic data repository at the wide project. ser. USENIX 2000 FREENIX Track. USENIX. In Proceedings of the 2000 USENIX Annual Technical Conference, San Diego, CA, USA, 18–23 June 2000.
Data Set | Number of Messages | Number of Flows | Flows in Flow Table |
---|---|---|---|
data set 1 | 4,774,122 | 1,196,845 | 9339 |
data set 2 | 7,664,905 | 1,691,924 | 14,791 |
data set 3 | 10,163,203 | 1,996,525 | 18,541 |
Test Set | Key Length (B) | Original Processing Time (µs) | AVX-Optimized Processing Time (µs) | Optimize Ratio (AVX / Original) |
---|---|---|---|---|
Test set 1 | 64 | 1,173,567 | 935,560 | 79.7% |
Test set 1 | 96 | 1,647,833 | 1,039,376 | 63.1% |
Test set 1 | 128 | 1,857,836 | 1,156,706 | 62.3% |
Test set 1 | 256 | 3,257,260 | 1,699,635 | 52.2% |
Test set 2 | 64 | 1,606,498 | 1,261,235 | 78.5% |
Test set 2 | 96 | 2,361,363 | 1,733,468 | 73.4% |
Test set 2 | 128 | 2,664,560 | 2,162,273 | 81.1% |
Test set 2 | 256 | 4,709,894 | 2,429,931 | 51.6% |
Test set 3 | 64 | 2,293,531 | 1,837,161 | 80.1% |
Test set 3 | 96 | 3,109,784 | 2,089,397 | 67.2% |
Test set 3 | 128 | 3,917,605 | 2,576,302 | 65.8% |
Test set 3 | 256 | 6,321,628 | 3,399,399 | 53.8% |
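Every value in the Optimize Ratio column equals the AVX-optimized processing time divided by the original processing time, so lower is better. For test set 1, for example:

$$\frac{935{,}560}{1{,}173{,}567} \approx 0.797 = 79.7\%, \qquad \frac{1{,}699{,}635}{3{,}257{,}260} \approx 0.522 = 52.2\%$$

With 256-byte keys, the AVX version therefore needs roughly half of the original processing time.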
Process | Operation | Percent of Instruction Consumption | Total |
---|---|---|---|
Original | Sketch counts | 19.9% | 51.6% |
Original | Sketch query | 23.9% | |
Original | Flow table | 7.8% | |
AVX-optimized | Sketch counts | 10.4% | 30.4% |
AVX-optimized | Sketch query | 12.3% | |
AVX-optimized | Flow table | 7.7% | |
Basic process | Read data set | 14.8% | 18.0% |
Basic process | Others | 3.2% | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tan, L.; Wang, Y.; Yi, J.; Yang, F. Single-Instruction-Multiple-Data Instruction-Set-Based Heat Ranking Optimization for Massive Network Flow. Electronics 2023, 12, 5026. https://doi.org/10.3390/electronics12245026