The Aho-Corasick Paradigm in Modern Antivirus Engines: A Cornerstone of Signature-Based Malware Detection
Abstract
1. Introduction
2. Evolution and Optimized Versions
3. Algorithmic Foundations of the Aho-Corasick Automaton
3.1. Formal Description of the Aho-Corasick Algorithm
3.2. Algorithmic Representation and Pseudocode
Algorithm 1: Aho-Corasick (build + search)

    Input: P = {p1, p2, …, pn}    // patterns over alphabet Σ
           T = t1…tm              // text

    AC-Build(P):
        create root q0
        for each pattern p in P:
            insert p into trie via goto transitions δ(q, a)
            mark terminal state with O(q) ← O(q) ∪ {p}
        // failure links by BFS
        queue ← ∅
        for each a ∈ Σ with δ(q0, a) defined:
            f( δ(q0, a) ) ← q0
            enqueue( δ(q0, a) )
        for each a ∈ Σ with δ(q0, a) undefined:
            δ(q0, a) ← q0         // optional fallback shortcut
        while queue not empty:
            v ← dequeue()
            for each a ∈ Σ with δ(v, a) defined:
                u ← δ(v, a)
                x ← f(v)
                while δ(x, a) undefined and x ≠ q0:
                    x ← f(x)
                if δ(x, a) defined:
                    f(u) ← δ(x, a)
                else:
                    f(u) ← q0
                O(u) ← O(u) ∪ O( f(u) )    // inherit outputs
                enqueue(u)
        return automaton (Q, Σ, δ, q0, F, f, O)

    AC-Search(T):
        state ← q0
        for each symbol a in T:
            while δ(state, a) undefined and state ≠ q0:
                state ← f(state)
            if δ(state, a) defined:
                state ← δ(state, a)
            else:
                state ← q0
            if O(state) ≠ ∅:
                report all patterns in O(state) ending at current index
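The build and search procedures of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scanner from Appendix A: the function names `ac_build` and `ac_search`, the list-of-dictionaries layout for δ, and the choice to report matches as `(end_index, pattern_index)` pairs are all hypothetical. Patterns are assumed to be non-empty byte strings, and the optional root fallback shortcut is replaced by a default of state 0 on lookup.

```python
from collections import deque


def ac_build(patterns):
    """Build goto (delta), failure (f), and output (O) tables for byte patterns."""
    goto = [{}]      # goto[state][byte] -> next state; state 0 is the root q0
    fail = [0]       # failure link f(state)
    out = [set()]    # O(state): indices of patterns that end at this state

    # Phase 1: insert every pattern into the trie via goto transitions.
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(idx)  # mark terminal state

    # Phase 2: failure links by BFS; depth-1 states already fail to the root.
    queue = deque(goto[0].values())
    while queue:
        v = queue.popleft()
        for b, u in goto[v].items():
            x = fail[v]
            while b not in goto[x] and x != 0:
                x = fail[x]
            fail[u] = goto[x].get(b, 0)
            out[u] |= out[fail[u]]  # inherit outputs from the failure target
            queue.append(u)
    return goto, fail, out


def ac_search(text, automaton):
    """Return (end_index, pattern_index) for every pattern occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, b in enumerate(text):
        while b not in goto[state] and state != 0:
            state = fail[state]
        state = goto[state].get(b, 0)  # stay at the root if no transition exists
        for idx in sorted(out[state]):
            matches.append((i, idx))
    return matches


signatures = [b"he", b"she", b"his", b"hers"]
hits = ac_search(b"ushers", ac_build(signatures))
```

With the four classic example patterns, scanning `b"ushers"` reports `he` and `she` ending at index 3 and `hers` ending at index 5. The dictionary-based transition table favors clarity over speed; a production engine would more likely use a dense or compressed table.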
4. Integration in Antivirus Architectures
5. Comparative Analysis of Algorithmic Variants
5.1. CPU-Oriented Variants
5.2. GPU-Oriented Variants
5.3. FPGA and ASIC Variants
5.4. Summary and Architectural Mapping
6. Practical Example of a Reference Malware Scanner
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| AC | Aho-Corasick |
| ASIC | Application-Specific Integrated Circuit |
| AVX | Advanced Vector Extensions |
| CPU | Central Processing Unit |
| DFA | Deterministic Finite Automaton |
| FPGA | Field-Programmable Gate Array |
| FSA | Finite-State Automaton |
| GPU | Graphics Processing Unit |
| HDD | Hard Disk Drive |
| IDS | Intrusion Detection System |
| NFA | Nondeterministic Finite Automaton |
| NIDS | Network Intrusion Detection System |
| PFAC | Parallel Failure-less Aho-Corasick |
| RAM | Random Access Memory |
| SIMD | Single Instruction, Multiple Data |
| SSE | Streaming SIMD Extensions |
| SSD | Solid State Drive |
Appendix A
Appendix A.1. Minimal and Revised Scanner Versions
Appendix A.2. Complete Scanner
References
1. Aho, A.V.; Corasick, M.J. Efficient string matching: An aid to bibliographic search. Commun. ACM 1975, 18, 333–340.
2. Knuth, D.E.; Morris, J.J.H.; Pratt, V.R. Fast pattern matching in strings. SIAM J. Comput. 1977, 6, 323–350.
3. Kernighan, B.W.; Pike, R. The UNIX Programming Environment; Prentice-Hall: Upper Saddle River, NJ, USA, 1984.
4. Navarro, G.; Raffinot, M. Flexible Pattern Matching in Strings; Cambridge University Press: Cambridge, UK, 2002.
5. Charras, C.; Lecroq, T. Handbook of Exact String Matching Algorithms; King’s College Publications: Cambridge, UK, 2004.
6. Erdogan, T.; Cao, J.; Mazières, D.; Boneh, D. Hash-AV: Fast Virus Signature Matching by Cache-Resident Hashing; Stanford Applied Cryptography Group, Stanford University: Stanford, CA, USA, 2007.
7. Szor, P. The Art of Computer Virus Research and Defense; Addison-Wesley: Boston, MA, USA, 2005.
8. Tuck, N.; Sherwood, T.; Calder, B.; Varghese, G. Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection; IEEE INFOCOM: London, UK, 2004.
9. Roesch, M. Snort: Lightweight intrusion detection for networks. In Proceedings of the 13th USENIX Conference on System Administration, Seattle, WA, USA, 7–12 November 1999.
10. Vasiliadis, G.; Antonatos, S.; Polychronakis, M.; Markatos, E.P.; Ioannidis, S. Gnort: High Performance Network Intrusion Detection Using Graphics Processors. In Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (RAID 2008), Cambridge, MA, USA, 15–17 September 2008; Springer: Berlin/Heidelberg, 2008; pp. 116–134.
11. Dimopoulos, V.; Papaefstathiou, I.; Pnevmatikatos, D. A memory-efficient reconfigurable Aho-Corasick FSM implementation for intrusion detection systems. In Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (ICSAMOS), Samos, Greece, 16–19 July 2007; pp. 186–193.
12. Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M.K. Learning from examples using the Aho-Corasick algorithm. Inf. Comput. 1989, 80, 1–14.
13. Lin, C.-H.; Tsai, S.-Y.; Liu, C.-H.; Chang, S.-C.; Shyu, J.-M. Accelerating string matching using multi-threaded algorithm on GPU. In IEEE GLOBECOM Workshops; IEEE Communications Society: New York, NY, USA, 2010; pp. 1166–1170.
14. Kim, H.; Choi, K.-I. A Pipelined Non-Deterministic Finite Automaton-Based String Matching Scheme Using Merged State Transitions in an FPGA. PLoS ONE 2016, 11, e0163535.
15. Gagniuc, P.A. Antivirus Engines: From Methods to Innovations, Design, and Applications; Elsevier Syngress: Cambridge, UK, 2024.
16. Meyer, B. Incremental string matching. Inf. Process. Lett. 1985, 21, 219–227.
17. Idury, R.M.; Schäffer, A.A. Dynamic dictionary matching with failure functions. Theor. Comput. Sci. 1994, 131, 295–310.
18. Amir, A.; Farach, M.; Idury, R.M.; La Poutré, J.A.; Schäffer, A.A. Improved dynamic dictionary matching. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’93), Society for Industrial and Applied Mathematics, Austin, TX, USA, 25–27 January 1993; pp. 392–401.
19. Amir, A.; Farach, M.; Idury, R.; Lapoutre, J.; Schaffer, A. Improved dynamic dictionary matching. Inf. Comput. 1995, 119, 258–282.
20. Zha, X.; Sahni, S. Highly compressed Aho-Corasick automata for efficient intrusion detection. In Proceedings of the 2008 IEEE Symposium on Computers and Communications, Marrakech, Morocco, 6–9 July 2008; pp. 298–303.
21. Kosolobov, D.; Sivukhin, N. Compressed Multiple Pattern Matching. Leibniz Int. Proc. Inform. (LIPIcs. CPM 2019) 2019, 151, 10.
22. Tan, L.; Brotherton, B.; Sherwood, T. Bit-split string-matching engines for intrusion detection and prevention. ACM Trans. Arch. Code Optim. 2006, 3, 3–34.
23. Pao, D.; Lin, W.; Liu, B. A memory-efficient pipelined implementation of the Aho-Corasick string-matching algorithm. ACM Trans. Arch. Code Optim. 2010, 7, 10.
24. Lin, C.-H.; Liu, C.-H.; Chien, L.-S.; Chang, S.-C. Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs. IEEE Trans. Comput. 2012, 62, 1906–1916.
25. Diptarama; Ryo, Y.; Ayumi, S. AC-Automaton Update Algorithm for Semi-dynamic Dictionary Matching. Int. Symp. String Process. Inf. Retr. 2016, 9954, 110–121.
26. Song, T.; Zhang, W.; Wang, D.S.; Xue, Y.B. A Memory Efficient Multiple Pattern Matching Architecture for Network Security. In Proceedings of the IEEE INFOCOM, Phoenix, AZ, USA, 13–18 April 2008; pp. 166–170.
27. Kanda, K.; Akabe, K.; Oda, Y. Engineering faster double-array Aho-Corasick automata. arXiv 2022, arXiv:2207.13870.
28. Pungila, C. Hybrid Compression of the Aho-Corasick Automaton for Static Analysis in Intrusion Detection Systems. In International Joint Conference CISIS’12-ICEUTE’12-SOCO’12 Special Sessions. Advances in Intelligent Systems and Computing; Herrero, Á., Snášel, V., Abraham, A., Zelinka, I., Baruque, B., Quintián, H., Calvo, J.L., Sedano, J., Corchado, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 189.
29. Yata, S.; Oono, M.; Morita, K.; Fuketa, M.; Sumitomo, T.; Aoe, J. A Compact Static Double-Array Keeping Character Codes. Inf. Process. Manag. 2007, 43, 237–247.
30. Kanda, S.; Fuketa, M.; Morita, K.; Aoe, J.-I. A compression method of double-array structures using linear functions. Knowl. Inf. Syst. 2015, 48, 55–80.
31. Pungila, C.; Negru, V. Real-Time Polymorphic Aho-Corasick Automata for Heterogeneous Malicious Code Detection. In International Joint Conference SOCO’13-CISIS’13-ICEUTE’13. Advances in Intelligent Systems and Computing; Herrero, Á., Snášel, V., Abraham, A., Zelinka, I., Baruque, B., Quintián, H., Calvo, J.L., Sedano, J., Corchado, E., Eds.; Springer: Cham, Switzerland, 2014; Volume 239.
32. Oh, D.; Kim, D.; Ro, W.W. A Malicious Pattern Detection Engine for Embedded Systems. Sensors 2014, 14, 24188–24214.
33. Gazzan, M.; Alobaywi, B.; Almutairi, M.; Sheldon, F.T. A Deep Learning Framework for Enhanced Detection of Polymorphic Ransomware. Future Internet 2025, 17, 311.
34. Caviglione, L.; Gaggero, M.; Cambiaso, E.; Aiello, M. Tight arms race: Overview of current malware threats and trends in their detection. IEEE Access 2020, 9, 5371–5371.
35. Belazzougui, D. Succinct Dictionary Matching with No Slowdown. arXiv 2010, arXiv:1001.2860.
36. Bellekens, X.; Seeam, A.; Tachtatzis, C.; Atkinson, R. Trie Compression for GPU-Accelerated Multi-Pattern Matching. arXiv 2017, arXiv:1702.03657.
37. Al-Asli, M.; Ghaleb, T.A. Review of Signature-based Techniques in Antivirus Products. In Proceedings of the 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 3–4 April 2019; pp. 1–6.
38. Griffin, K.; Schneider, S.; Hu, X.; Chiueh, T.C. Automatic Generation of String Signatures for Malware Detection. In Recent Advances in Intrusion Detection. RAID 2009; Kirda, E., Jha, S., Balzarotti, D., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5758.
39. Hendrian, D.; Inenaga, S.; Yoshinaka, R.; Shinohara, A. Efficient Dynamic Dictionary Matching with DAWGs and AC-automata. Theor. Comput. Sci. 2019, 792, 161–172.
40. Li, J.; Peng, H.; Yang, E.; Hu, C.; Zhong, S.; Wang, L. Eagle+: A fast incremental approach to automaton and table online updates for cloud services. Future Gener. Comput. Syst. 2018, 80, 275–285.
41. Tumeo, A. Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching. In Data-Intensive Computing: Architectures, Algorithms and Applications; Cambridge University Press: Cambridge, UK, 2012; pp. 24–47.
42. Al-Saleh, M.I.; Al-Huthaifi, R.K. On Improving Antivirus Scanning Engines: Memory On-Access Scanner. J. Comput. Sci. 2017, 13, 290–300.
43. Çelebi, M. Accelerating Pattern Matching Using a Novel Multi-Pattern-Matching Algorithm for Deep Packet Inspection. Appl. Sci. 2023, 13.
44. Cai, Y. Multi-pattern matching algorithm for embedded computer network intrusion detection systems. Intell. Decis. Technol. 2024, 18, 185–193.
45. Huang, N.F.; Chu, Y.M.; Hsieh, C.Y.; Tsai, C.H.; Tzang, Y.J. A deterministic cost-effective string matching algorithm for network intrusion detection system. In Proceedings of the 2007 IEEE International Conference on Communications (ICC), Glasgow, UK, 24–28 June 2007; pp. 1292–1297.
46. Hsu, F.H.; Lee, C.H.; Luo, T.; Chang, T.C.; Wu, M.H. A Cloud-Based Real-Time Mechanism to Protect End Hosts against Malware. Appl. Sci. 2019, 9, 3748.
47. Vasiliadis, G.; Ioannidis, S. GrAVity: A Massively Parallel Antivirus Engine. In Recent Advances in Intrusion Detection. RAID 2010; Jha, S., Sommer, R., Kreibich, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6307.
48. Terawi, N.; Ashqar, H.I.; Darwish, O.; Alsobeh, A.; Zahariev, P.; Tashtoush, Y. Enhanced Detection of Intrusion Detection System in Cloud Networks Using Time-Aware and Deep Learning Techniques. Computers 2025, 14, 282.
49. Frisbier, G.L.; Darwish, O.; Alsobeh, A.; Al-shorman, A. Identifying the Origins of Business Data Breaches Through CTC Detection. In Network and System Security. NSS 2024; Song, H.H., Di Pietro, R., Alrabaee, S., Tubishat, M., Al-kfairy, M., Alfandi, O., Eds.; Lecture Notes in Computer Science; Springer: Singapore, 2025; Volume 15564.
50. Chirilă, A.I.; Săracin, C.G.; Deaconu, I.D.; Nicolescu, D.Ș.; Radulian, A. Remote Monitoring and Control System with Increased Operational Technology Cybersecurity Resilience. U.P.B. Sci. Bull. Ser. C 2024, 86.
51. Ghiță, A.A.; Chiroiu, M.D.; Țurcanu, D. Evaluating Students’ Performance in Cybersecurity Scenarios Using Binary Trees. U.P.B. Sci. Bull. Ser. C 2025, 87.
52. Hilgurt, S.Y. A Survey on Hardware Solutions for Signature-Based Security Systems. In Proceedings of the 1st International Workshop on Information Technologies: Theoretical and Applied Problems (ITTAP 2021), Ternopil, Ukraine, 16–18 November 2021; Volume 3039, pp. 6–23.
53. Kouzinopoulos, C.S.; Margaritis, K.G. Multiple pattern matching: Survey and experimental results. Neural Parallel Sci. Comput. 2014, 22, 563–593.
54. Elizaldea, S.; AlSabeh, A.; Mazloum, A.; Choueiri, S.; Kfoury, E.; Gomez, J.; Crichigno, J. A survey on security applications with SmartNICs: Taxonomy, challenges and future directions. J. Netw. Comput. Appl. 2025, 207, 104660.
55. Ourlis, L.; Bellala, D. SIMD Implementation of the Aho-Corasick Algorithm Using Intel® AVX2. Scalable Comput. Pract. Exp. 2019, 20, 563–576.
56. Scarpazza, D.P.; Villa, O.; Petrini, F. Peak-Performance DFA-Based String Matching on the Cell Processor. In Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA, 26–30 March 2007; pp. 1–8.
57. Kouzinopoulos, C.S.; Margaritis, K.G. String Matching on a Multicore GPU Using CUDA. In Proceedings of the PCI ’09, 2009 13th Panhellenic Conference on Informatics, Corfu, Greece, 10–12 September 2009; pp. 14–18.
58. Lee, C.-L.; Lin, Y.-S.; Chen, Y.-C. A hybrid CPU/GPU pattern-matching algorithm for deep packet inspection. PLoS ONE 2015, 10, e0139301.
59. Hsieh, C.-L.; Vespa, L.; Weng, N. A high-throughput DPI engine on GPU via algorithm/implementation co-optimization. J. Parallel Distrib. Comput. 2016, 88, 46–56.
60. Ceška, M.; Havlena, V.; Holík, L.; Korenek, J.; Lengál, O.; Matoušek, D.; Matoušek, J.; Semric, J.; Vojnar, T. Deep packet inspection in FPGAs via approximate nondeterministic automata. In Proceedings of the IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 28 April–1 May 2019.



| Variant | Target Hardware | Key Design Feature | Performance Characteristic | Strengths | Limitations | Typical Applications |
|---|---|---|---|---|---|---|
| Classic AC [1] | CPU | Explicit transition table per symbol | Linear runtime, predictable execution | Simplicity, deterministic scan time | High memory footprint | Antivirus, IDS, text search |
| Split-AC [11] | CPU (multicore) | Subautomata partitioning | Moderate memory reduction, improved cache locality | Parallelizable, efficient for SIMD | Overhead from partition management | Multi-threaded malware scanners |
| PFAC [24] | GPU | Failure-less automaton, thread-per-byte mapping | Multi-gigabit data rates | Massive parallelism, ideal for GPUs | Increased memory use | Network inspection, cloud-scale scanning |
| Bit-Split AC [22] | FPGA/ASIC | Bit-level transition decomposition | Constant-time propagation | Deterministic timing, sub-µs latency | Not efficient on CPUs or GPUs | Inline packet filtering, cyber-defense hardware |
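
The PFAC row above describes a failure-less automaton with a thread-per-byte mapping. The idea can be illustrated with a sequential Python simulation in which each iteration of the outer loop plays the role of one GPU thread. This is a sketch under stated assumptions, not the GPU kernel of [24]: the names `pfac_build` and `pfac_scan` are hypothetical, patterns are assumed to be non-empty byte strings, and matches are reported as `(start_index, pattern_index)` pairs.

```python
def pfac_build(patterns):
    """Build a plain trie; no failure links are computed or stored."""
    goto = [{}]    # goto[state][byte] -> next state; state 0 is the root
    out = [[]]     # indices of patterns that end exactly at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            nxt = goto[state].get(b)
            if nxt is None:
                goto.append({})
                out.append([])
                nxt = len(goto) - 1
                goto[state][b] = nxt
            state = nxt
        out[state].append(idx)
    return goto, out


def pfac_scan(text, trie):
    """Simulate one logical thread per starting byte of the input."""
    goto, out = trie
    matches = []
    for start in range(len(text)):           # on a GPU these run in parallel
        state = 0
        for pos in range(start, len(text)):
            state = goto[state].get(text[pos])
            if state is None:
                break                        # no failure link: the thread stops
            for idx in out[state]:
                matches.append((start, idx))
    return matches


signatures = [b"he", b"she", b"his", b"hers"]
hits = pfac_scan(b"ushers", pfac_build(signatures))
```

Because every thread only looks for patterns that begin at its own byte, no thread needs a failure transition, and most threads terminate after a few steps. The price is that each input byte may be read by several threads, which is acceptable on hardware with many lightweight threads.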
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gagniuc, P.A.; Păvăloiu, I.-B.; Dascălu, M.-I. The Aho-Corasick Paradigm in Modern Antivirus Engines: A Cornerstone of Signature-Based Malware Detection. Algorithms 2025, 18, 742. https://doi.org/10.3390/a18120742

