SoK: An In-Depth Analysis of Intrusion Detection Systems Based on System Calls

Arnoud, Lalie; Breux, Victor; Thevenon, Pierre-Henri; Gaussier, Éric

doi:10.3390/jcp6030099


¹ Université Grenoble Alpes, CEA, Leti, 38000 Grenoble, France

² Université Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France

* Author to whom correspondence should be addressed.

J. Cybersecur. Priv. 2026, 6(3), 99; https://doi.org/10.3390/jcp6030099

Submission received: 23 March 2026 / Revised: 28 May 2026 / Accepted: 2 June 2026 / Published: 6 June 2026

(This article belongs to the Topic Recent Advances in Security, Privacy, and Trust, 2nd Edition)


Abstract

The increase and professionalization of cyberattacks call for the development of relevant defense-in-depth mechanisms, of which intrusion detection systems (IDSs) are essential components. This paper provides an in-depth analysis of system call-based IDSs as intelligence for detecting malicious activities. A systematic analysis of 209 publications from the scientific literature between 1996 and early 2026 highlights trends in this field of research and defines a taxonomy presenting the different approaches proposed by researchers. Eighteen state-of-the-art methods, representative of the diversity of approaches proposed in the literature, were reproduced and evaluated on two public datasets, ADFA-LD and NGIDS-DS. The detection performance and overhead of each method are examined in great detail, opening discussions on the shortcomings of the state of the art, limitations of system call-based IDSs, and lines of research that would enable this type of detection system to meet the challenges of deployment in a real-world environment. Finally, recommendations for future work are derived from these findings.

Keywords:

systematic literature review (SLR); intrusion detection systems (IDS); system calls; machine learning (ML)

1. Introduction

The cybersecurity landscape has been evolving rapidly in recent years. On the one hand, companies are increasingly relying on external cloud technologies for their enterprise infrastructure, leading to a rise of around 75% in cyberattacks against these systems in 2024 [1]. On the other hand, the healthcare sector and hospitals are among the most targeted by ransomware [2,3], which is deployed by cybercriminals to extort money from organizations. In 2023, 389 U.S. healthcare institutions were hit by such malware, sometimes leading to critical disruptions in patient care [4]. Industry is not spared either and is also the target of cyberattacks, which are often perpetrated by government-backed hackers known as Advanced Persistent Threats (APTs) [1,2]. Infrastructures such as those governing water treatment, power generation and distribution, and transportation have historically been isolated from the Internet, and they were not designed to meet cybersecurity constraints. These critical infrastructures, which are responsible for the management of real-life critical processes, are a prime target for cybercriminals with geopolitical interests at stake [4]. Moreover, attackers are taking advantage of an ever more mature ecosystem to improve their attacks, particularly in terms of tools and evasiveness [4]. The increase and professionalization of cyberattacks call for the development of countermeasures to enhance the defense capability of computerized systems, from Information Technologies (IT) to Operational Technologies (OT). One strategy for identifying and containing attacks is to detect them before their effects become critical. Intrusion detection thus appears as a relevant defense-in-depth mechanism and a first step toward resilience, monitoring the state of protected infrastructures for signs of malicious activity.

1.1. What Is an Intrusion?

The National Institute of Standards and Technology (NIST) defines an intrusion as “a security event, or a combination of multiple security events, that constitutes a security incident in which an intruder gains, or attempts to gain, access to a system or system resource without having authorization to do so” [5]. ANSSI, the French Cybersecurity Agency, defines it as “the act of a person or object entering a defined space (physical, logical, relational) where its presence is not desired.” [6]. According to the above definitions, an intrusion can be defined as an event that needs to be closely monitored to ensure that the considered system is not altered by malicious activities.

1.2. What Is Intrusion Detection?

Intrusion detection may be defined as the process of actively detecting the presence of intruders in the system being protected. Such intrusion detection is performed by an intrusion detection system (IDS) whose role is to look for suspicious activities, using either a misuse approach, i.e., looking for known intrusion patterns, or an anomaly approach, i.e., looking for deviations from the expected behavior of the system. Alerts created by these IDSs are then collected in Security Operations Centers (SOCs) where experts analyze them in detail and respond accordingly to protect potentially compromised systems. The automated detection of cyberattacks also enables automated responses to be implemented, which reduces the attack surface and reinforces overall system security. Many research studies and existing solutions are based on the analysis of packets transiting the network to identify malicious actions. These methods, known as network-based intrusion detection systems (NIDSs), have the advantage of being non-intrusive, since they involve passive probes placed on an existing network [7]. Others propose IDSs based on the behavioral analysis of strategic targets for intrusion detection. Known as host-based intrusion detection systems (HIDSs), they base their detection on the analysis of signals internal to a device, such as logs, Hardware Performance Counter (HPC) registers, or sequences of system calls. The first scientific publication to conceptualize host-based IDSs was the article published by Dorothy E. Denning in 1987 [8], which used operating system audit logs as a source of information for detecting abnormal behavior. Further research has been carried out along this line, including the work of Theresa F. Lunt in 1993 [9].

1.3. What Are System Calls and How Can They Be Used for Intrusion Detection?

A system call is an interface between a program running in user space and the operating system (OS). Application programs use system calls to request privileged services and functionalities from the OS kernel, such as reading a file, writing to a socket or running a new process. When a computer system is running, all its applications generate sequences of system calls which represent host behavior, and these sequences can then be analyzed for intrusion detection. The studies of Forrest and her colleagues launched this line of research in 1996 with the first publication to explicitly use system call sequences for intrusion detection. They presented a method named TIDE, for Time-Delay Embedding, based on a reference dictionary of system call sequences observed during a normal and benign execution [10,11]. At runtime, an alert is raised once a predefined number of mismatches with this dictionary is reached. This method is often considered a gold standard against which to compare the performance of later research, which is mostly based on machine learning.
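The dictionary-based principle described above can be illustrated with a minimal sketch. It is not the original implementation from [10,11]: the window length `k`, the threshold `max_mismatches`, and all function names are assumptions introduced here for illustration, and system calls are represented simply as strings.

```python
from typing import Iterator, List, Sequence, Set, Tuple

Window = Tuple[str, ...]

def windows(trace: Sequence[str], k: int) -> Iterator[Window]:
    """Yield every contiguous window of k system calls in a trace."""
    for i in range(len(trace) - k + 1):
        yield tuple(trace[i:i + k])

def build_dictionary(normal_traces: List[Sequence[str]], k: int) -> Set[Window]:
    """Collect all length-k windows observed during benign executions."""
    return {w for trace in normal_traces for w in windows(trace, k)}

def count_mismatches(trace: Sequence[str], dictionary: Set[Window], k: int) -> int:
    """Count the windows of a runtime trace that are absent from the dictionary."""
    return sum(1 for w in windows(trace, k) if w not in dictionary)

def raises_alert(trace: Sequence[str], dictionary: Set[Window],
                 k: int, max_mismatches: int) -> bool:
    """Raise an alert once the (assumed) mismatch threshold is reached."""
    return count_mismatches(trace, dictionary, k) >= max_mismatches
```

In this sketch, the dictionary is built once from benign traces, and each new trace is then compared against it window by window; the choice of `k` and of the threshold is left to the user.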

With the growth of research around system call-based HIDSs, some publications have made it possible to better synthesize advances using techniques derived from language processing [12] or having been carried out on the same dataset [13]. However, to the best of our knowledge, none of them provide a complete overview of state-of-the-art approaches for intrusion detection based on system calls, a rigorous comparison of detection performances, or an analysis of the applicability of this type of HIDS to real-life applications. To overcome this deficiency and provide a comprehensive overview of the research field, the most extensive systematic analysis of the literature on intrusion detection systems based on system calls is proposed, from the studies of Forrest in 1996 [10] to the beginning of 2026, which represents 209 publications. Furthermore, to enable a fair evaluation of the different approaches proposed in the state of the art, 18 of these methods were re-implemented to experimentally compare their performance under similar conditions of execution, i.e., with the same preprocessed datasets, and with the same evaluation criteria. All materials used for literature analysis are made publicly available. This in-depth analysis of HIDS solutions based on system calls enables us to assess the applicability of such solutions in real-life environments, such as in industry, cloud, enterprise servers and personal computers. Finally, recommendations are drawn up for future research.

To summarize, the main contributions of this paper are as follows:

A systematic literature review of 209 publications on intrusion detection research using dynamic monitoring of system calls, from 1996 to early 2026, whose materials are publicly available for possible reanalysis;
The re-implementation of 18 state-of-the-art methods representing the diversity of system call-based intrusion detection approaches in the literature, offering a real evaluation of the performance of these IDSs on two public datasets, encouraging research reproducibility.

Section 2 presents other work related to the review of existing contributions and highlights the need for this paper. The methodology adopted for the literature review and the experimental evaluation of detection methods are described in Section 3. The literature is reviewed in Section 4, which is followed by the results of evaluated state-of-the-art methods in Section 5. Then, discussions are opened in Section 6 based on each analysis and the maturity of system call-based IDSs for real-life applications. The limitations of this paper are also discussed in Section 7. Finally, Section 8 concludes with the main lessons to be learned from intrusion detection research based on the analysis of sequences of system calls.

2. Related Work

Several studies summarize research carried out on host-based intrusion detection.

Liu et al. [14] propose an overview of HIDSs based on the monitoring of system calls. The authors present an overview of data processing methods and decision engines used for intrusion detection in the literature as well as associated datasets. These datasets and the evaluation metrics are not analyzed in depth because the paper does not aim to be a systematic literature review, although it offers an excellent overview of the main trends in this field. The applicability of such solutions to embedded systems is discussed, although HIDSs are potentially applicable to other use cases. Data processing through cloud and big data architectures is also discussed.

Khraisat et al. [7] cover the main features of both network- and host-based IDSs, presenting the particularities of each approach. As the subject is very broad, not all host-based detection methods explored in the literature are mentioned; the use of HPCs, for example, is omitted. Nonetheless, this paper provides a valuable overview of general intrusion detection strategies.

Bridges et al. [15] and Martins et al. [16] provide surveys on HIDSs in general, not only those focusing on system call-based detection. Bridges et al. review the different types of information that can be used for host-based detection in both Unix and Windows environments, which is illustrated by comparative analyses of existing work. On the other hand, Martins et al. observe HIDSs through their applicability to IoT systems, highlighting the properties such IDSs need for real-time detection and the main related issues. However, other applications are not discussed. Finally, they also developed multiple attacks (28) based on known CVEs and compared the scores given by open-source HIDSs when these attacks were launched against a target. While they are not as precise on syscall-based HIDSs as the present work aims to be, both studies provide a great overview of the state of the art in HIDS research.

Khandelwal et al. [13] propose a comparative analysis of several (17) system call-based HIDSs that were evaluated on the ADFA-LD [17] dataset. They provide a description of the methods used and the performance obtained as given by the authors. They also summarize the strengths and weaknesses of each presented solution. This approach is interesting because it aims to compare methods evaluated using the same dataset, so that their results should be comparable. Nevertheless, this study points out that the metrics used in each paper are different, so they cannot all be compared as is. Moreover, even when these studies use the same dataset, it is not mentioned that it can be used differently in training, validation, and testing phases, which also hampers the strict comparability of results.

Sworna et al. [12] conducted a systematic literature review leading to an analysis of 65 studies using Natural Language Processing (NLP) methods for host-based intrusion detection based on system calls. Data extracted from these studies enable the authors to classify methods according to the intrusion detection approach used, the decision algorithm and its learning mode, and the data used, together with the attacks on which the model is evaluated. Despite a thorough analysis of the most commonly used datasets, the context around the environments in which data collection took place is missing. This could have helped to identify the use cases envisaged by researchers for HIDSs, such as IoT applications or cloud computing.

Satilmiş et al. [18] analyzed 21 studies between 2020 and 2023 through a systematic review of the HIDS literature. While not focusing solely on system call-based HIDSs, this SLR summarizes the main contributions of recent key studies. The authors analyze the benefits and drawbacks of encountered evaluation methods, algorithms used for detection, and datasets. In particular, it underlines the fact that the additional performance costs generated by intrusion detection are hardly ever emphasized either for time or memory overheads. Yet the small number of studies included in the analysis, and the limited time period covered, make it difficult to conduct a complete analysis of the evolution of these IDSs.

Despite the existence of several publications summarizing the state of research in intrusion detection based on host information, few of them are systematic reviews of the literature. This type of analysis is needed to rigorously interpret all the available work and assess progress on a well-defined research question [19]. In addition, existing reviews focus on recent work, post-2011, or are based on a limited number of studies, as shown in Table 1, which provides only a partial view of the evolution of research on HIDSs. In contrast, this paper provides an in-depth analysis of 209 publications, ranging from the very first studies conducted in 1996 through to 2026, and covering the entire research field related to system call-based intrusion detection. Additionally, an effort has been made to re-implement various methods in order to properly compare their performance, which to our knowledge has not been published previously.

3. Materials and Methods

A systematic literature review requires a well-defined process for collecting the primary studies that will form the foundations of the proposed analysis. The research method for this paper follows the guidelines from [19] to present a proper assessment of the research topic by using a rigorous and verifiable methodology. The research methodology described in this section is illustrated in Figure 1 with the number of publications included at each stage indicated in parentheses.

3.1. Research Questions

This systematic literature review aims to answer the following research questions.

RQ-1: How has research on IDSs based on system calls evolved since 1996?

The objective is to identify trends in approaches developed to detect intrusions from system calls. These may be in the targets on which IDSs aim to be deployed, in the methods used to classify system call sequences, or any other trend relevant to the analysis of the research area.

RQ-2: How are system call-based IDSs built?

We seek to identify and analyze the methods used to build such IDSs. This includes the overall intrusion detection approach, whether the authors are looking to recognize known attack patterns or deviations from normal behavior, the processing of system calls, and the mechanisms that classify sequences of system calls as benign or malicious.

RQ-3: On what data are IDSs based on system calls trained and tested?

The aim is to describe the data used to evaluate the performance of these IDSs: their origin, the application case studied, and their availability for reproducibility.

RQ-4: How are IDSs based on system calls evaluated?

We are interested in how the overall performance of these IDSs is measured. This includes the metrics used to evaluate detection performance but also the measurement of the impact of intrusion detection on device performance.

Then, the experimental evaluation of some key state-of-the-art methods aims to answer the following research question:

RQ-5: How do state-of-the-art IDSs based on system call identifiers perform under the same execution conditions?

The goal is to observe the experimental performance of key methods proposed in the literature that rely solely on the analysis of system call identifiers for intrusion detection, by setting predefined datasets and common evaluation methods for all of the studied methods.

3.2. Literature Review Methodology

3.2.1. Search Strategy for Studies to Be Considered

This paper is based on a review of the scientific literature available from Scopus, IEEE Xplore and The ACM Guide to Computing Literature digital libraries. The keywords aligned with the study topic are all expressions derived from intrusion, anomaly, misuse or malware detection, with system calls, which are also abbreviated to “syscalls”. The search logic used to collect relevant publications from these libraries is shown in Figure 2, and the specific search strings applied for each library are available in Appendix A. Searches are performed on titles, abstracts and keywords for Scopus, on all metadata for IEEE Xplore, and on abstracts in The ACM Guide to Computing Literature.

Although many of the search results from the IEEE Xplore and The ACM Guide to Computing Literature libraries duplicate publications indexed in Scopus, including these libraries is not redundant because, as the collection results in Figure 1 show, not all documents are indexed in Scopus. This can be explained by indexing delays and by databases being built from abstracts rather than full texts, which leads to complementary results.

3.2.2. Inclusion and Exclusion Criteria for Primary Studies Selection

Once duplicate publications have been removed from the collection results, a number of inclusion and exclusion criteria are applied, as presented in Table 2, to identify relevant publications to study. The aim of these criteria is to exclude from the set of publications those that do not correspond to the scope of the review. These criteria are first applied by reading the titles and abstracts of the publications. The selection is then refined with the more in-depth full-text reading for data extraction.

3.2.3. Data Extraction from Primary Studies

From the full-text reading of these primary studies, a substantial amount of information is extracted to address review questions.

To answer RQ-1, bibliographical information from each publication is collected and linked to the information extracted for the other research questions to study research trends:

Year of publication;
Digital Object Identifier (DOI), if any;
Authors;
Title.

The answer to RQ-2 requires gathering all the information relating to the design of the proposed methods:

Intrusion detection approach, focusing on recognizing known intrusion patterns or deviations from a learned normal behavior;
If relevant to the method used, which learning paradigm is used to train the IDS;
Data usage, therefore defining the granularity of alerts, by entire execution trace, by predefined-length sequences of system calls, or by defined time interval;
Features extracted from system calls;
Feature reduction method, if any;
Classification method applied to the extracted features;
Other information used in conjunction with system calls;
Whether the proposed IDS is collaborative, i.e., if it is intended to be deployed on a single target or if its alerts are based on data collected from multiple hosts.

RQ-3 requests information on the data used for each study:

Data type, i.e., whether collected from real devices, testbeds, or simplified environments;
Data augmentation method, if any;
Data availability at the time of publication;
Operating system from which system calls were collected;
Use case of the host under study, i.e., desktop computer, enterprise server, IoT device, or other application environment.

Finally, IDS evaluation methods need to be analyzed to answer RQ-4:

Evaluation metrics used to assess detection performance;
Overhead measurement types and scopes, if any.

When a publication presents several intrusion detection methods, only the method corresponding to the main contribution of the authors is retained for analysis. In the case of contributions comparing several detection models, such as several machine learning models, the model that presented the best results according to the authors is the one that is considered.

3.3. Methodology for Experimental Evaluation of State-of-the-Art System Call-Based IDSs

3.3.1. Inclusion, Exclusion, and Quality Assessment Criteria for Selecting Candidate Studies for Re-Implementation

More selective inclusion and exclusion criteria are applied to the previously selected primary studies in order to examine a smaller set of methods for reproduction. Among these criteria, described in Table 3, are citation conditions; these allow us to consider key works in the field.

The quality of publications is then assessed based on a single criterion: the presence of all the information required to reproduce the work. Finally, the selection of works to be reproduced is made in a less systematic manner, aiming to prioritize methods representative of the diversity of approaches proposed in the literature and those presenting relevant research questions.

3.3.2. Data Extraction from Candidate Studies for Re-Implementation

The answer to RQ-5 requires the study of all particularities linked to the IDSs proposed by authors with the aim of reproducing their methods. Thus, the information extracted from these publications is technical information such as the following:

Training and testing data splitting method;
Pseudo-code for any proposed new algorithm;
Hyperparameters used by machine learning algorithms.

More generally, any information needed to reproduce considered methods is extracted.

4. Results: Study of Literature

In this section, we address the following:

RQ-1: How has research on IDSs based on system calls evolved since 1996?
RQ-2: How are system call-based IDSs built?
RQ-3: On what data are IDSs based on system calls trained and tested?
RQ-4: How are IDSs based on system calls evaluated?

4.1. Considered Studies for Literature Review and Experimental Evaluation

The complete list of studies and their rationale for exclusion is publicly available as described in the Data Availability Statement at the end of this paper.

As can be seen in Figure 3a, more than one quarter of the publications retrieved from the search were excluded from the analysis because their contribution did not feature an intrusion detection method (I1). Around one fifth of publications were also discarded because they were not written in English, were not accessible, or because the presented work was more thoroughly analyzed in other publications within the study set (E1–6).

This leaves 209 relevant papers for analysis.

The second set of inclusion and exclusion criteria was applied to this selection in order to reduce the number of publications whose method is a candidate for reproduction and evaluation. According to Figure 3b, around one third of the methods were dismissed because they integrated data other than system call identifiers for intrusion detection (I3). Another third of the publications were discarded because they were deemed too specific to a particular application, which would make an evaluation alongside more versatile methods of little relevance (E7), or because they had little impact on further work in this field of research (E8–9). As can be seen in Figure 4, these conditions seem appropriate since the number of publications considered per year follows the trend of the total number of publications of the first selection. Thus, 86 publications remain candidates for the evaluation.

4.2. Complete Breakdown of Studies Proposing an Intrusion Detection Method Based on System Calls

Data extracted from the 209 reviewed publications allow us to categorize intrusion detection methods using system calls according to several characteristics. In addition to seeking to answer research questions RQ-1, RQ-2, RQ-3 and RQ-4, these new taxonomies present the most explored approaches while being broad enough to allow for the characterization of future work.

4.2.1. Comprehensive Taxonomy of System Call-Based IDSs

In response to RQ-1 and RQ-2, several elements have been identified to characterize the approaches used for system call-based intrusion detection. These elements are provided in Figure 5 and detailed below. A more detailed visualization showing the number of publications involved in each of these categories, in the form of a Sankey diagram, is available in Appendix B.

Detection Type

The detection type corresponds to the approach used for intrusion detection. Two types are considered: Anomaly, which characterizes methods looking for deviations from a behavior considered as normal, and Misuse, which characterizes methods aiming to recognize a behavior considered as malicious. Anomaly detection has the advantage of detecting malicious behavior that was not present in the training dataset, whereas misuse detection is limited to identifying known malicious patterns. Misuse detection is similar to the signature-based recognition used by traditional antivirus software, which relies on predefined malicious signatures [20]. No temporal trend seems to be emerging regarding the adoption of one approach over the other, with anomaly detection being the most widely used approach in the literature.
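The difference between the two detection types can be made concrete with a toy sketch. It is only an illustration under simplifying assumptions: behavior is reduced to sets of n-grams of system call names, and the names `ngrams`, `misuse_alert`, and `anomaly_alert` are hypothetical and do not come from any reviewed method.

```python
def ngrams(trace, n):
    """Return the set of contiguous n-grams of system calls in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def misuse_alert(trace, signatures, n):
    """Misuse: alert if the trace contains any known malicious n-gram."""
    return bool(ngrams(trace, n) & signatures)

def anomaly_alert(trace, normal_profile, n):
    """Anomaly: alert if the trace contains any n-gram never seen in normal behavior."""
    return bool(ngrams(trace, n) - normal_profile)

# Toy data (assumed for illustration only).
normal_profile = ngrams(["open", "read", "write", "close"], 3)
signatures = {("execve", "socket", "connect")}
novel_attack = ["open", "mmap", "mprotect", "execve"]

print(misuse_alert(novel_attack, signatures, 3))       # no known signature matches
print(anomaly_alert(novel_attack, normal_profile, 3))  # deviates from the normal profile
```

With these toy inputs, the attack that matches no stored signature is missed by the misuse check but flagged by the anomaly check, which mirrors the trade-off described above.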

Learning Paradigm

The learning paradigm defines how the IDS is trained, if applicable. This may involve training a data representation model, a feature reduction algorithm, or the classification model. This is the case for IDSs using machine learning models, including deep learning models, that require a training phase. Supervised learning exploits data which, in the context of intrusion detection, are usually labeled as “normal” and/or “attack”. A machine learning model uses this information in its learning phase to adjust its weights or thresholds so that its predictions align with the ground truth labels associated with sequences of system calls. Sometimes, only sequences from one label (“normal” or “attack”) are provided or used for training; this paradigm is known as one-class supervised learning, and it should not be confused with semi-supervised learning, which corresponds to learning from both labeled and unlabeled data. Self-supervised learning, which is a type of supervised learning, is a method of training models in which the supervision signal (the “labels”) is automatically derived from the data itself rather than being manually provided in a dataset. Examples of self-supervised learning include language models that predict the next word in a sentence, where the target is derived directly as a shift in the input sequence. Unsupervised learning corresponds to learning from unlabeled data, which means that sequences of system calls provided to the model are identified neither as “normal” nor as “attack” during the learning phase. The model therefore cannot use this information in its learning phase and instead seeks to discover underlying patterns within the data. This type of learning is widely used for clustering and dimensionality reduction. It usually produces a score based on likelihood or a similarity measure and can therefore be adapted to binary classification problems by applying a threshold.
Reinforcement learning involves embedding an agent in an environment in which it learns a strategy by receiving rewards or penalties based on its interactions with that environment. This approach differs from traditional supervised or unsupervised learning paradigms as the aim is not to identify patterns in the data and make a prediction but rather to maximize a reward through the predictions made. Reinforcement learning thus corresponds to learning a (near-)optimal policy of actions to be taken by an agent in order to maximize a reward function. Federated learning designates a collaborative learning approach based on several local models trained from local data. In the context of intrusion detection, this can be achieved by first training a specific model for different yet similar devices, such as IoT devices [21,22,23], and then by sharing the trained models, and only the trained models, in order to make a decision. This is one way of sharing knowledge while preserving data privacy. Local models then fall into the other learning paradigms described above. When multiple learning paradigms are involved in training an IDS, the method presented is categorized by the most restrictive paradigm. For example, an IDS composed of a model learning a representation of data in an unsupervised way followed by a classifier trained using supervised learning will be categorized as supervised. Similarly, the deployment of federated learning models will be classified as such regardless of whether the learning mode is supervised or unsupervised. In our literature review, all methods implementing federated learning deployed supervised local models.
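Two ideas from the paradigms above, deriving targets as a shift of the input sequence and turning a likelihood-based score into a binary decision with a threshold, can be sketched together. The example below is a deliberately simple stand-in: it uses a smoothed bigram model rather than any deep learning model from the literature, and the function names, the add-one smoothing, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def shifted_pairs(seq):
    """Derive (input, target) pairs by shifting the sequence by one position."""
    return list(zip(seq[:-1], seq[1:]))

def fit_bigram(sequences):
    """Count next-call occurrences from unlabeled sequences."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for current, nxt in shifted_pairs(seq):
            counts[current][nxt] += 1
    return counts, vocab

def anomaly_score(seq, counts, vocab):
    """Average negative log-likelihood of each call given the previous one."""
    pairs = shifted_pairs(seq)
    if not pairs:
        return 0.0
    size = len(vocab) + 1  # one extra slot for calls unseen during training
    total = 0.0
    for current, nxt in pairs:
        row = counts.get(current, Counter())
        prob = (row[nxt] + 1) / (sum(row.values()) + size)  # add-one smoothing
        total -= math.log(prob)
    return total / len(pairs)

def classify(seq, counts, vocab, threshold):
    """Apply a threshold to the score to obtain a binary decision."""
    return "attack" if anomaly_score(seq, counts, vocab) > threshold else "normal"
```

No label is supplied at training time in this sketch: the targets come from the sequences themselves, and the threshold is the only element that maps the continuous score to a “normal” or “attack” decision.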

Looking at the number of publications per year presenting an IDS with at least one learning mechanism, as shown in Figure 6, learning was predominantly unsupervised until 2014 and has since shifted in favor of supervised learning techniques, with one exception in 2018. This can be explained by the increasing use of machine and deep learning algorithms, which grew during the same period. The arrival of self-supervised learning mechanisms, notably with the growing use of deep learning language processing algorithms for detecting abnormal system call sequences, can be seen from 2019 onwards, as can recent research in federated learning, with early work in 2023.

Trained on Attacks?

Some intrusion detection methods require sequences corresponding to attacks in their training phase. This has the advantage of enabling models to better distinguish normal behavior from malicious behavior and to effectively detect attacks similar to those in the training set. On the downside, some of these models struggle to detect attacks they have never seen before. Many methods proposed in the literature present an IDS trained on attack traces, especially in recent work, in proportion with the use of supervised learning, as depicted in Figure 6.

Data Granularity

IDSs use different inputs, such as a sequence of system calls corresponding to an entire execution of a process or just a part of it, in order to make a prediction regarding its legitimacy. A full sequence data granularity corresponds to the use by an IDS of entire traces of system calls of an execution, whether by program, i.e., from start to finish of program execution, or by data acquisition cycle, for example from startup to shutdown of the target under study. This necessarily leads to a delay in detection, as it requires waiting until the end of an execution to raise an alert. A fixed-length sequence granularity corresponds to the use of sequences containing a fixed number of system calls. Such sequences are thus shorter than full sequences so that a behavioral analysis can be performed before the end of the execution of the studied system, allowing a possible alert to be raised earlier in the course of an intrusion than an IDS that performs an analysis at the end of each program execution. A time interval granularity corresponds to the use of sets of system calls executed over a fixed time period. Sequences can therefore be of variable size, as the number of system calls during a given time window depends on the system activity at that moment. As with fixed-length sequences, analyses can be performed during runtime rather than afterwards. There are about the same number of studies proposing detection by full system call sequences as by fixed-length sequences, and there are also occasional studies analyzing system calls at fixed time intervals for intrusion detection.
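The three granularities described above can be illustrated on a toy trace. The following is a minimal sketch, assuming a trace is stored as a list of (timestamp, system call name) pairs; this format, the function names, and the example trace are hypothetical and are not taken from any of the surveyed methods.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format: (timestamp in seconds, system call name).
Event = Tuple[float, str]


def full_sequence(trace: List[Event]) -> List[str]:
    """Full-sequence granularity: the whole trace is a single input."""
    return [name for _, name in trace]


def fixed_length_windows(trace: List[Event], length: int) -> List[List[str]]:
    """Fixed-length granularity: non-overlapping chunks of `length` calls.

    An incomplete trailing chunk is dropped in this sketch.
    """
    names = [name for _, name in trace]
    return [names[i:i + length] for i in range(0, len(names) - length + 1, length)]


def time_interval_windows(trace: List[Event], period: float) -> List[List[str]]:
    """Time-interval granularity: calls grouped by fixed periods of `period` seconds.

    Window sizes vary with system activity; idle periods yield empty windows.
    """
    if not trace:
        return []
    start = trace[0][0]
    buckets: Dict[int, List[str]] = {}
    for timestamp, name in trace:
        buckets.setdefault(int((timestamp - start) // period), []).append(name)
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


trace = [(0.0, "open"), (0.1, "read"), (0.2, "read"), (1.5, "write"), (3.2, "close")]
print(full_sequence(trace))
print(fixed_length_windows(trace, 2))
print(time_interval_windows(trace, 1.0))
```

With fixed-length or time-interval windows, a decision can be made after each window rather than at the end of the execution, which is the source of the earlier alerts mentioned above.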

Features

Several features can be extracted from sequences of system calls. We summarize the main types of features below:

A first representation is the one based on the sequences themselves. Sequence-based features can take the form of raw sequences of system call identifiers, i.e., a list of temporally ordered system calls executed on the host under study, with or without system call selection [24,25,26,27,28]. Other sequence-based features are n-grams built from sequences of system calls.
A second common representation of sequences of system calls is frequency-based. In this type of approach, sequences are encoded as vectors whose components are the values of the frequency of system calls or n-grams. It includes occurrence count and Term Frequency-Inverse Document Frequency (TF-IDF) features [22,29,30,31,32,33,34,35]. The main drawback is the loss of information on the execution order of system calls in the observed sequence in exchange for adding weighting information.
Relationships can be derived from sequences of system calls, forming graph-based features. Often requiring the use of information other than system calls alone, these relationships can come from links between processes [36,37,38,39], links between resources on the studied target such as files or sockets [40,41,42,43,44,45,46], or network elements such as destination IP addresses [47,48].
The democratization of machine learning algorithms for intrusion detection has led to the embedding-based representation of system calls. An embedding is a vector representation of an object under study, in our case of each system call. Embeddings can be predefined, like One-Hot Encodings [49,50,51,52,53], but are most often learned, as with Word2Vec, GloVe, or Locally Linear Embedding [54,55,56,57,58,59,60,61,62,63].
Information on the group membership of system calls within a sequence can also serve as a feature of a detection model. Group-based features represent system calls by their belonging to a group, whether linked to their scope of action [35,46,57,64,65,66,67,68], or following the prior application of a clustering algorithm [69,70,71,72,73].
Other methods use descriptive statistics of the composition of each sequence [74,75]. These statistical description-based features are fully agnostic to system call identifiers, meaning that certain behaviors in the sequence of system calls can be learned on one system and effectively applied to another system [75].
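To make the sequence-based and frequency-based representations concrete, the sketch below extracts n-grams from a trace and derives frequency and TF-IDF vectors from them. It is an illustration only: the function names are hypothetical, and the TF-IDF weighting shown (term frequency multiplied by log(N/df)) is one common variant, not necessarily the one used in the cited studies.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def ngrams(seq: List[str], n: int) -> List[Gram]:
    """Sequence-based feature: all contiguous n-grams, in execution order."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def term_frequencies(seq: List[str], n: int = 1) -> Dict[Gram, float]:
    """Frequency-based feature: relative frequency of each n-gram in the trace."""
    grams = ngrams(seq, n)
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def tf_idf(corpus: List[List[str]], n: int = 1) -> List[Dict[Gram, float]]:
    """TF-IDF weights per trace, using tf * log(N / df) (one common variant)."""
    docs = [term_frequencies(seq, n) for seq in corpus]
    doc_freq = Counter(gram for doc in docs for gram in doc)
    n_docs = len(corpus)
    return [
        {gram: tf * math.log(n_docs / doc_freq[gram]) for gram, tf in doc.items()}
        for doc in docs
    ]


# Two traces with the same calls in a different order get identical frequency
# vectors: this is the loss of ordering information mentioned above.
print(term_frequencies(["open", "read"]) == term_frequencies(["read", "open"]))
print(ngrams(["open", "read", "write"], 2))
```

Using n-grams with n > 1 as the counted terms recovers part of the local ordering at the cost of a larger feature space.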

In addition to the above feature types, we consider in the remainder a category Other which gathers “exotic” representations of sequences of system calls [27,76,77,78,79,80,81,82,83,84], like the Soundex representation described in [85].

When research into intrusion detection using system calls first began, IDSs mainly focused on analyzing the sequences themselves or the n-grams that made up the observed sequences, as can be seen in Figure 7. Around 2015, however, research began to shift toward using information related to the frequency of system calls. This is consistent with the evolution of technologies such as multithreading, which allows multiple programs to run on a single processor core, and the increase in the number of cores per processor. Since task scheduling is performed by the operating system, even on the same benign behavior, system calls very rarely have exactly the same execution sequence across an entire system. It is also worth noting the interest in representing system calls using graphs since 2009, and the recent use of embeddings as features, especially since 2020, coinciding with the growing use of deep learning algorithms and ensemble methods for intrusion detection, as can be seen in Figure 8.

Feature Reduction

Feature reduction consists of constructing, from the extracted features, new features of reduced size or dimension. The main approaches adopted are Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), which are closely related unsupervised statistical methods that create uncorrelated variables as linear combinations of observed and possibly correlated variables [21,22,23,24,33,35,86,87]. In our case, observed variables are the features extracted from system call sequences, and the newly created variables are those given as input to the classification model. Other feature reduction methods can be used [30,88,89,90,91], such as Linear Discriminant Analysis (LDA) [92], using the encoder layer of an autoencoder model [93], Recursive Feature Elimination (RFE) [94], and rough sets [95]. No trend appears to be emerging regarding the use of feature reduction mechanisms or the type of mechanism used.

Classification Method

Because intrusion detection is a classification problem, an IDS needs at least one way of classifying observed behavior as benign or malicious based on the computed features of system call sequences and potentially other information. Classification methods allow one to raise an alert when the observed behavior is considered malicious. They therefore have two major objectives: to maximize the attack detection rate, so as to detect all the threats for which the IDS was designed, while minimizing the number of false alerts raised when no malicious action takes place. Several classification methods are presented in the state of the art, which we have tried to group into 15 main categories and subcategories.

Heuristics-based classification methods rely on observed patterns within the extracted features and most often define decision thresholds empirically on a measure of similarity or sequence coverage [25,39,42,44,80,96,97,98,99,100,101,102,103]. For instance, TIDE defines a mismatch threshold on the observed sequences compared to the reference sequences to raise an alert [10,11].
Rule-based classifiers classify system call traces against a set of predefined decision rules, producing static and deterministic classification results [47,62,68,104,105,106,107,108,109,110,111,112,113,114,115,116]. These methods require a detailed analysis of the protected system by experts to ensure that its behavior is fully characterized as well as to avoid false alarms. Rules can be boolean, which means that the classifier assesses whether the observed traces comply or not with the defined rules in order to mark them as benign or malicious. Classification can also be fuzzy rule-based [117], meaning that the set of rules assigns a degree of membership to several classes, from benign to malicious, in a multiple-label manner.
The execution of system calls can also be seen as a stochastic process which can be modeled with Markov chains [70,77,118,119,120,121], Bayesian networks [78,122], Hidden Markov Models (HMMs) [95,123,124,125,126,127,128], Hidden semi-Markov models (HsMMs) [129], non-stationary Markov chains [130], and Dynamic Bayesian Networks (DBNs) [131] in order to assign a probability score to an observed sequence of system calls so as to determine its likelihood relative to a learned normal distribution. A threshold on this value, often learned during the training phase, triggers an alert when the observed behavior is drastically different from the learned behavior.
Machine learning encompasses a large number of algorithms that can be used for classification purposes. Some methods, such as Linear Discriminant Analysis (LDA) [24,132], Logistic Regression (LR) [94,133], and Naive Bayes (NB) [86], are linear classifiers that seek to identify a separation between normal and abnormal data and hence require training on both types of data to correctly identify this boundary. Support Vector Machines (SVMs) [27,32,36,67,76,91,93,134,135,136,137] go further than linear classifiers by using kernel functions to project data into a high-dimensional space in which a linear separation is applied. A One-Class SVM (OC-SVM) [138,139] is a variant of SVMs trained only on data from one class whose decision boundary corresponds to the distribution of the learned data. This algorithm identifies, at inference, data deviating from this learned distribution and characterizes them as anomalies. Methods based on decision trees (DTs) [140,141], such as Random Forest (RF) [23,34,74,90,142,143,144] or Isolation Forest (IF) [75], define a set of conditions that differentiate normal sequences from abnormal sequences, either by a majority vote of the DTs on the nature of the sequences, or by identifying the conditions that isolate rare data from the rest of the data to characterize them as outliers. The use of DTs makes classification results interpretable because the conditions applied by the DTs can be retrieved. Boosting algorithms such as XGBoost [35,145] can also use DTs but, unlike in RF, the trees do not all have the same weight in the model’s final classification decision. The k-means [79,83,87] and k-Nearest Neighbors (kNN) [29,58,81,146,147,148,149,150,151] classifiers evaluate a function based on distance from a centroid or from neighboring data, respectively, in order to identify groups in the observed data and identify isolated data marked as anomalous. The Gaussian Mixture Model (GMM) [84] is a partitioning clustering algorithm that can assign a probability of belonging to groups of data considered similar to a given data point, thereby identifying data that deviate from these groups with a richer representation of the data than k-means clustering.
Deep learning is a subset of machine learning defined by the use of deep Artificial Neural Networks (ANNs) [152,153] with several hidden layers of neurons. A Multi-Layer Perceptron (MLP) [22,31,33,46,89,154,155,156] is a feedforward neural network, meaning that information from the inputs propagates through the network only in the direction of the output, as are Convolutional Neural Networks (CNNs) [50,57,157,158,159,160] such as WaveNet [56]. The main component of CNNs is convolutional layers, which apply a convolution kernel to the sequential data in order to identify particular local patterns. Conversely, in Recurrent Neural Networks (RNNs) [161], the network processes the sequences step by step, and the output of the recurrent cell at a given step also depends on the output of the cell at the previous step, allowing information to flow along the sequence. Among these RNN-based algorithms are Long Short-Term Memory (LSTM) [51,52,53,55,162,163] and Gated Recurrent Unit (GRU) models [60,164], which are used as classifiers of system call sequences when preserving order is important for anomaly detection. More recent architectures such as Transformers [165] are designed to process input data simultaneously and with better contextualization between the beginning and end of the sequence, unlike RNNs, which require sequential processing. Graph Neural Networks (GNNs) such as GraphSAGE [48] are networks specifically built to ingest data composed of nodes, for instance processes or files, linked by edges, like system calls. Other specific neural networks such as Extreme Learning Machine (ELM) [82,166,167,168] have been used in the literature for the classification of system call sequences. An AutoEncoder (AE) [21,43,45] is a neural network architecture designed around an encoder, whose role is to compress input data, and a decoder responsible for reconstructing the original data from the compressed representation. This type of model can be used in intrusion detection by using the reconstruction error between input and output as an anomaly score for the observed sequence of system calls. Other variants include Variational AutoEncoder (VAE) [66] and Graph AutoEncoder (GraphAE) [169].
Reinforcement prediction algorithms with, for instance, Markov reward processes [170,171,172], enable machine learning agents to be trained to recognize benign behavior from malicious behavior based on observed system call sequences.
Finally, neuro-fuzzy systems use fuzzy set theory to provide fuzzy logic to a neural network which is able to assign to new samples of data a degree of membership in several classes from least abnormal to most abnormal [85].
Colored Petri Nets are used to detect patterns in system call sequences, which are then characterized as functionalities. Chaining these functionalities together in the context of execution creates a signature for each program, which is then considered benign or malicious by comparison with a database of reference signatures [37,173,174,175,176]. This approach has the advantage of being able to track malicious actions performed on a system in addition to raising alerts.
Artificial Immune Systems (AISs) are a biology-inspired approach to intrusion detection that uses algorithms abstracting clonal selection through hypermutation processes, negative selection, immune memory, and danger theory to identify abnormal behaviors. Even though Forrest [10] is one of the first studies partly using this type of approach for anomaly detection, we instead consider it a heuristic-based classification approach, which is in line with more recent similar methods that do not refer to themselves as AISs. Typical AIS-based methods used for anomaly detection are described in [28,177,178,179,180].
Combining several models may provide better predictions than using a single model. Ensemble methods enable the training of machine learning models with varying hyperparameters [35,145,181,182]. Other combinations [30,69,71,72,88,183,184,185,186,187,188,189,190,191,192,193,194,195], such as a combination in ROC space [196,197,198], have also been studied for the detection of malicious behavior using sequences of system call identifiers.

As can be seen in Figure 8, researchers initially used heuristics, rules, and stochastic processes to distinguish benign sequences from malicious ones before turning predominantly to machine learning algorithms from 2014 and deep learning from 2020. In recent years, IDSs using classifiers that do not rely on artificial intelligence have become rare.

Collaborative

Intrusion detection can be based on the behavioral analysis of several targets at the same time—for instance, to identify rogue devices [199]. In this case, we call an IDS collaborative when several hosts are required for intrusion detection. The topic of collaborative IDSs was addressed sporadically between 2006 and 2013 [41,126,178,179,199]; then, it resumed with the use of federated learning starting in 2023 [21,22,23,169].

Additional Information Used

System call identifiers are information that can be used for intrusion detection, but they can also be used together with other information that provides additional context for detection.

Additional information may include data from the dynamic analysis of system calls, i.e., directly interceptable with system call identifiers. Examples include timestamps corresponding to the start or end of the call execution [122,132,159], arguments passed as parameters [43,55,70,77,107,112,127,200], information linked to the calling process such as the process identifier (PID) [39,148] or parent-process identifier (PPID) [47,80,174], information linked to the privileges with which the calling process wishes to access resources like EUID and EGID for Linux devices [104,201,202,203], the return value of the system call which indicates the success or failure of the call [74,120], as well as the value of the error raised in the case of a failed call [99,109].
Resource usage is a good indicator of how much a machine is being used, making it a key piece of information for identifying less evasive malware such as ransomware, whose encryption of the file system results in an abnormal increase in CPU activity, or botnets, whose activation generates significant network activity on infected hosts. Accordingly, a number of studies have incorporated host activity information into intrusion detection in conjunction with system calls. These include the CPU time devoted to each process [84,179,180,185]; the RAM used, allocated and released during execution [84,179,180,185]; input and output usage [185], such as peripherals and GPIO usage; network usage in terms of the number of active connections, the number of packets transmitted and received, or bandwidth saturation [179,180,204]; the thread status of monitored processes, that is, whether they are running, sleeping or dead [84]; host power consumption [193], which is intrinsically linked to its activity; and Hardware Performance Counters (HPC) [193], which reflect the micro-architectural state of each microprocessor, e.g., its number of correctly predicted conditional branches or its number of cache misses.
Since system calls are functions executed at the kernel level, higher-level OS Events can also be used for intrusion detection. As an example, [151] uses calls to the Android API in addition to system calls as a feature of a machine learning classifier for anomaly detection. Since the majority of Android applications make use of middleware libraries based on the Android API, indications of their compromise can be obtained by monitoring these functions. However, since an application can also directly call kernel functions, this is not a substitute for using system calls to detect malicious behavior. Ref. [28] presents an IDS using keyboard interrupt signals to detect the presence of user-space keyloggers on a virtual machine. In conjunction with system calls related to file system management and networks, the authors succeed in observing the behaviors induced by the presence of each keylogger studied through Virtual Machine Introspection (VMI).
For intrusion detection involving the monitoring of a particular process, a prior analysis of binaries enables all system calls that may be called during execution to be retrieved. In this way, the consistency between system calls from static analysis of the binaries and those actually observed at runtime can be used to detect anomalies in observed execution [139].
Other information, not falling into the previous categories, has also been used in the state of the art and designated Other in our taxonomy, including the memory address at which the assembler jump instruction to the system call is located, i.e., the address of the kernel branch instruction making the system call execution routine [51,79], or shell commands causing system call execution [204].

4.2.2. Data Used to Study System Call-Based Intrusion Detection Mechanisms

To answer RQ-1 and RQ-3, it is necessary to be able to describe the data used for the training and evaluation of intrusion detection systems. A number of features have been identified, as detailed below and illustrated in Figure 9.

Data Availability

An important property of the data used in training and evaluating IDSs is their availability. The availability of the data used by authors at the time of publication of their studies was analyzed. Public data encourage the reproducibility of the results presented by authors as well as a comparison with other studies from the literature if they were conducted using the same public data. The use of private data may be due to industrial partnerships whose artifacts cannot be made public or to the study of a use case whose specificity is not present in already public data. In this case, comparison with other intrusion detection methods becomes more difficult, even if such data may come from real installations and the results reported reflect IDS performance under operational conditions. The literature is divided almost equally between work carried out on public and private datasets. A temporal analysis does not reveal any link between the publication of a new system call dataset and an increase in the use of public data. It is also important to note that the UNM [10] and DARPA98 [205] datasets are still used in recent work even though they were made public almost three decades ago. The same applies to ADFA-LD [17], which is the most widely used public dataset according to our analysis, despite the fact that the data were collected more than 14 years ago and are therefore certainly not representative of modern computer systems anymore. Furthermore, as shown in Figure 9, datasets that were once public like UNM are no longer accessible, which makes it difficult to reproduce the findings in the future. Publications considered below for the analysis of the type of data used by authors, the operating system on which they were collected, and the host environment are those whose research uses private data. 
The reason for this choice is that given that half of the literature is based on a limited number of public datasets, considering them could bias the characteristics of the data we wish to highlight through Figure 9, Figure 10 and Figure 11.

Data Type

Data type refers to the nature of the data used for training and evaluating IDSs. Data are considered real when they come from a real environment, namely existing systems operating for the use case under study and with the usual users. Data are considered to come from a dedicated testbed when they originate from an environment that attempts to reproduce as faithfully as possible the systems used for the use case under study. Finally, data are considered emulated if they come from an environment that is simplified compared to the systems used for the use case under study. Data from executions on virtual machines are considered emulated data when the use case does not involve a cloud environment. If the use case under study involves user actions, the data are considered emulated when user behavior is approximated using automation frameworks such as Selenium. Whenever researchers use private data, they are mostly collected on testbeds built specifically for generating such data, as shown in Figure 9. A number of emulated systems have been used, probably for the ease and security of deployment when running real malware, at the expense of data quality, which is less representative of real-world use cases.

Data Augmentation

Some studies use data augmentation methods such as Generative Adversarial Networks (GANs) [154,157] and the Synthetic Minority Over-sampling Technique (SMOTE) [145,181] to generate missing data samples for an even distribution of normal and abnormal sequences of system calls or to expand datasets for training large neural networks.

Operating System

System calls constituting datasets may originate from multiple operating systems. Although most work has focused on Unix-based operating systems like FreeBSD and Linux, other work has been based on data from Windows, Android, or OSX systems. Some publications do not describe the origin of the system calls collected for their experiments.

Private data specifically collected for research come mainly from Unix environments (60 publications in our analysis, Figure 9) and Windows (40). A few isolated studies have been conducted on Android since 2014, as can be seen in Figure 10, and one study focused on OSX in 2011.

Host Environment

Regardless of the underlying operating system, data describe the host environment from which system calls are extracted and embedded.

This environment may include any of the following:

Desktop, such as a personal computer or workstation;
Server, when the target hosts a few web applications for personal use or a significant number of services as part of an enterprise server;
Cloud, when the use case focuses on virtualized services running in containers or virtual machines;
Mobile, when the subject of study is a smartphone;
IoT/I-IoT, when the target is embedded in an environment with a specialized role of data aggregation and processing, in a home automation or industrial environment.

Some studies do not focus on a specific environment for their IDS research.

The main use case covered by private datasets is office computing, as shown in Figure 11. Interest in the cloud has been growing since 2012, and interest in the IoT has been growing since 2019.

4.2.3. Evaluation Method of System Call-Based IDS

To answer RQ-1 and RQ-4, the evaluations conducted by authors were analyzed, whether they focused on computational efficiency or on measuring the detection performance of the presented methods. Findings are presented in Figure 12 and reported below.

Overhead Measurement

The evaluation of intrusion detection systems involves measuring their impact on the resources of the system on which they run. Measuring this overhead involves temporal complexity such as additional CPU load, increased latency or longer execution times; it can also involve spatial complexity such as the RAM used by the program during execution or the persistent memory footprint. These metrics can be measured at different stages of the IDS lifecycle, such as its training phase or during inference. This makes it possible to identify resources required for each phase. For example, an IDS with negligible inference overhead can be deployed on a system with limited resources, but it may be worthwhile to defer the learning phase of this solution to a more powerful machine if it requires a larger amount of computational resources. Conversely, if the target has the necessary resources, local relearning of the IDS is possible following the aggregation of new data. Only a few studies measure the overhead for their IDS, yet when they do, the measurement often focuses on inference time, i.e., once the solution has been deployed.

Evaluation Metrics

The detection performance of IDSs is measured using metrics. Some authors do not provide metrics but supply the raw predictions of their detection model during the testing phase. Detection methods that directly classify sequences as normal or malicious can have their results summarized using confusion matrices showing the number of True Positives (TP), i.e., the number of system call sequences identified by the IDS as malicious and labeled as malicious in the dataset; the number of False Positives (FP), i.e., the number of sequences identified by the IDS as malicious but labeled as normal; the number of True Negatives (TN), i.e., the number of sequences identified by the IDS as normal and labeled as normal; and the number of False Negatives (FN), i.e., the number of sequences identified by the IDS as normal but labeled as malicious. A number of metrics can be calculated from these results:

True Positive Rate (TPR): also known as recall or sensitivity, this metric corresponds to the number of sequences correctly identified as malicious by the IDS out of all the malicious sequences in the dataset.
False Positive Rate (FPR): number of sequences falsely identified as malicious out of all sequences labeled as normal.
True Negative Rate (TNR): also known as specificity, it corresponds to the number of sequences correctly identified as normal out of all sequences labeled as normal.
False Negative Rate (FNR): the number of sequences falsely identified as normal out of all sequences labeled as malicious.
Accuracy: number of sequences correctly identified as normal or malicious out of all sequences in the test dataset. It should be noted that this metric is highly dependent on the dataset used, as an IDS with no detection capability that labels all traces as normal could achieve high accuracy if the dataset contained very few malicious samples [206].
Precision: number of sequences correctly identified as malicious by the IDS out of all sequences labeled as malicious by the IDS.
F1-score: metric computed as the harmonic mean between TPR and precision, which is preferred for addressing the problem of imbalanced classes where accuracy measurement may not be representative of the IDS’s actual detection capabilities.
Matthews Correlation Coefficient (MCC): a metric that, like the F1-score, aims to address the problem of imbalanced class distribution by computing a score that takes all four entries of the confusion matrix (TP, FP, TN, and FN) into account.
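The metrics listed above follow directly from the four entries of the confusion matrix. The sketch below computes them with their standard definitions; the function name and the convention of returning 0 when a denominator is zero are choices made for this illustration.

```python
import math
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard detection metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Convention for this sketch: an undefined ratio is reported as 0.
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)        # recall / sensitivity
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr,
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, tn + fp),  # specificity
        "FNR": ratio(fn, fn + tp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "F1": ratio(2 * precision * tpr, precision + tpr),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# An IDS that labels every trace as normal, on 990 normal and 10 malicious traces:
blind = detection_metrics(tp=0, fp=0, tn=990, fn=10)
print(blind["accuracy"], blind["TPR"], blind["F1"], blind["MCC"])
```

On this imbalanced example, accuracy is high although no attack is detected, while TPR, F1-score, and MCC are all zero, which illustrates the caveat on accuracy given above.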

For detection methods that assign an anomaly score to each sequence of system calls, results can be presented using a Receiver Operating Characteristic (ROC) curve showing the detection model’s performance in terms of TPR (y-axis) and FPR (x-axis) for each threshold used to distinguish normal sequences from malicious ones. The Area Under the ROC Curve (AUC) thus represents the classification performance across several decision thresholds. The TPR obtained for a fixed FPR can then be deduced and, conversely, an FPR for a fixed TPR. Other measures, such as generality [207] or the number of virtual machines protected by the proposed detection framework [208], are occasionally used by authors to address the specific features of their solution. No particular trends were observed in measuring the impact of proposed IDSs on the system under study nor the metrics employed to evaluate the performance of detection methods.
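As an illustration of how a ROC curve and its AUC are obtained from anomaly scores, the sketch below sweeps a decision threshold over the observed scores and integrates the resulting curve with the trapezoidal rule. It assumes that a higher score means a more anomalous sequence, that label 1 denotes a malicious sequence, and that both classes are present; the function names and example values are hypothetical.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold over observed scores.

    A sequence is flagged as malicious when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

Reading the TPR at a fixed FPR, as mentioned above, amounts to selecting the point of this curve whose x-coordinate matches the tolerated false positive rate.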

5. Results: Evaluation of State-of-the-Art System Call-Based IDSs

This section addresses the following:

RQ-5: How do state-of-the-art IDSs based on system call identifiers perform under the same execution conditions?

5.1. Selected Studies

The quality of the 86 candidate publications for experimental evaluation was carefully assessed (AQ1), leading to the choice of 14 of them, which represent 15 intrusion detection methods to be evaluated alongside the 3 methods proposed by Forrest and her colleagues, namely TIDE [11], STIDE and t-STIDE [209], which have been considered by the research community as gold standard methods. The selected 18 methods are described below, and their characteristics are summarized in Table 4.

5.1.1. [M1]

Hofmeyr et al. [11] proposed to collect observed sliding windows of system calls of a predefined length k and of step 1 during a normal and benign execution and to store them as a reference dictionary of sequences of system calls. An anomaly score is then constructed at the detection stage by comparing sliding windows of length k of the observed sequence against those in the reference dictionary; sequences that do not appear are considered as mismatches, and the number of mismatches divided by the total number of subsequences constitutes the anomaly score for the entire sequence. A threshold is then set to distinguish between a normal and an abnormal sequence based on its anomaly score.
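The description above can be sketched in a few lines. This is a minimal illustration of the dictionary-and-mismatch idea as described in this section, not the authors' implementation; the function names, the toy traces, and the threshold value are hypothetical.

```python
from typing import List, Set, Tuple

Window = Tuple[str, ...]


def windows(seq: List[str], k: int) -> List[Window]:
    """Sliding windows of length k and step 1."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_dictionary(normal_traces: List[List[str]], k: int) -> Set[Window]:
    """Reference dictionary: every window observed during normal execution."""
    return {w for trace in normal_traces for w in windows(trace, k)}


def mismatch_score(seq: List[str], reference: Set[Window], k: int) -> float:
    """Anomaly score: share of windows of `seq` absent from the dictionary."""
    observed = windows(seq, k)
    if not observed:
        return 0.0
    return sum(1 for w in observed if w not in reference) / len(observed)


def is_anomalous(seq: List[str], reference: Set[Window], k: int, threshold: float) -> bool:
    """Raise an alert when the mismatch ratio exceeds the chosen threshold."""
    return mismatch_score(seq, reference, k) > threshold


reference = build_dictionary([["open", "read", "write", "close"]], k=2)
suspect = ["open", "read", "execve", "close"]
print(mismatch_score(suspect, reference, k=2))
print(is_anomalous(suspect, reference, k=2, threshold=0.5))
```

Storing the windows in a set keeps each lookup at constant expected cost, so the detection cost grows linearly with the length of the observed sequence.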

5.1.2. [M2], [M3]

Warrender et al. [209] presented two methods derived from TIDE. STIDE ([M2]) differs from TIDE at the detection stage by taking into account the number of mismatches in a temporally close window; the authors called this the Locality Frame Count (LFC). This means that the anomaly score of the entire sequence is computed from the number of consecutive sliding windows that have not been observed during normal execution, with the number of considered consecutive windows being fixed, as for the window size k. A threshold on LFC distinguishes abnormal sequences from normal ones. t-STIDE ([M3]) adds the occurrences of each sequence of system calls during normal execution to the reference dictionary. Hence, rare sequences in the reference dictionary (defined by the authors as sequences accounting for less than 0.001% of the normal execution) are also considered as mismatches and included in the LFC calculation. Again, a threshold on LFC distinguishes abnormal sequences from normal ones.
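The following sketch illustrates one possible reading of the LFC and of the rarity rule described above. It assumes that the score of a sequence is the largest number of mismatches found among any group of consecutive windows of a fixed size (the locality frame); this aggregation, the function names, and the example values are assumptions made for illustration rather than the authors' exact procedure.

```python
from collections import Counter
from typing import List, Optional, Tuple

Window = Tuple[str, ...]


def windows(seq: List[str], k: int) -> List[Window]:
    """Sliding windows of length k and step 1."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def train_counts(normal_traces: List[List[str]], k: int) -> Counter:
    """Occurrences of each window during normal execution."""
    return Counter(w for trace in normal_traces for w in windows(trace, k))


def mismatch_flags(seq: List[str], counts: Counter, k: int,
                   rare_fraction: Optional[float] = None) -> List[bool]:
    """Flag each window as a mismatch.

    With rare_fraction=None, only unseen windows are mismatches (STIDE-like).
    Otherwise, windows whose share of the training windows is below
    rare_fraction are also mismatches (t-STIDE-like; the text above quotes
    0.001%, i.e., a fraction of 1e-5).
    """
    total = sum(counts.values())
    flags = []
    for w in windows(seq, k):
        seen = counts.get(w, 0)
        rare = rare_fraction is not None and total > 0 and seen / total < rare_fraction
        flags.append(seen == 0 or rare)
    return flags


def max_lfc(flags: List[bool], frame: int) -> int:
    """Largest number of mismatches among any `frame` consecutive windows."""
    return max(
        (sum(flags[max(0, i - frame + 1):i + 1]) for i in range(len(flags))),
        default=0,
    )


counts = train_counts([["a", "b", "a", "b"]], k=2)
flags = mismatch_flags(["a", "b", "c", "a", "b"], counts, k=2)
print(flags, max_lfc(flags, frame=3))
```

An alert would then be raised when the value returned by `max_lfc` exceeds a threshold chosen on validation data.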

5.1.3. [M4]

The method proposed by Yeung and Ding [123] is based on the training of a fully connected HMM from the sequences of system calls observed through sliding windows of length k and step of 1 on a normal and benign execution. A threshold is then chosen to identify normal and abnormal sequences based on the log-likelihood of the score returned by the HMM for each observed sequence.

5.1.4. [M5]

Sharma et al. [147] represent system call sequences in the form of vectors containing the binary presence of each system call in a sequence, i.e., “1” if it is in the sequence, “0” otherwise. The authors present a new measure to quantify the similarity between two of these vectors, which is a derivative of the binary weighted cosine measure [210]. This similarity measure is used as the kernel of a k-nearest neighbors algorithm (kNN), which classifies a new sequence of system calls by examining its k closest sequences observed from the normal execution. Then, the anomaly score is computed as the average distance to these neighbors. A threshold on this score is used to classify the tested sequence as normal or abnormal.
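The scoring step can be illustrated with the sketch below. It deliberately swaps in the plain cosine similarity between binary vectors, not the weighted measure proposed by the authors, and represents each binary vector as the set of system calls present in the trace. Function names are hypothetical, and `normal_traces` is assumed to be non-empty.

```python
import math
from typing import List, Sequence, Set


def presence(trace: Sequence[int]) -> Set[int]:
    """Binary presence vector, stored as the set of system calls seen."""
    return set(trace)


def cosine_binary(a: Set[int], b: Set[int]) -> float:
    """Plain cosine similarity between two binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_score(trace: Sequence[int], normal_traces: List[Sequence[int]], k: int) -> float:
    """Average distance (1 - similarity) to the k closest benign traces."""
    query = presence(trace)
    distances = sorted(1.0 - cosine_binary(query, presence(t)) for t in normal_traces)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```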

5.1.5. [M6]

Murtaza et al. [64] do not consider raw system calls but the type of operation they are related to, as defined in the Linux kernel: architecture (10 system calls), file system (131), inter-process communication (7), kernel (127), memory management (21), networking (2), security (3), or unknown (37). Sequences of system calls are therefore replaced by sequences of corresponding operations. The proposed method, called Kernel State Modeling, measures the occurrence of each of three operations (file system, kernel, and memory management) out of the total number of operations in a sequence and compares it with the same frequencies calculated during benign execution of the considered program. If the frequency of file system operations exceeds the maximum frequency observed during training, the anomaly is confirmed. Otherwise, the difference between the frequency of the other two operations and their maximum frequency observed during training is compared to an α threshold, which is computed during the training phase. If one of these measurements is greater than the threshold, then the anomaly is also confirmed. If none of these conditions is met, the sequence is considered normal.
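The decision rule of this method can be sketched as follows, assuming that each system call has already been mapped to an operation type given as a string. The state labels, the function names and the way `alpha` is supplied are assumptions; in the original method α is learned during training.

```python
from typing import Dict, Sequence

STATES = ("file_system", "kernel", "memory_management")


def state_frequencies(ops: Sequence[str]) -> Dict[str, float]:
    """Share of each tracked operation type among all operations of a trace."""
    total = len(ops)
    return {s: (sum(1 for op in ops if op == s) / total if total else 0.0)
            for s in STATES}


def train_max(benign: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Maximum frequency of each tracked operation type over benign traces."""
    freqs = [state_frequencies(trace) for trace in benign]
    return {s: max(f[s] for f in freqs) for s in STATES}


def is_anomalous(ops: Sequence[str], max_freq: Dict[str, float], alpha: float) -> bool:
    """Apply the two-stage rule: file system first, then the alpha margin."""
    freq = state_frequencies(ops)
    if freq["file_system"] > max_freq["file_system"]:
        return True
    return any(freq[s] - max_freq[s] > alpha
               for s in ("kernel", "memory_management"))
```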

5.1.6. [M7]

Creech and Hu [166] propose to build a dictionary of “words” composed of sequences of contiguous system calls observed in the training set and of variable size, i.e., 1-grams to n-grams with a predefined n. Only words appearing at least 200 times are kept, and then they are combined in all possible ways into “phrases” of a predefined maximum word size. This constitutes discontiguous sequences of system calls from the training set, enriching the number of sequences considered in the reference frame of benign behavior. The count of reference sentences appearing in an observed sequence of system calls for each sentence size becomes the feature on which an Extreme Learning Machine (ELM) is trained and tested. A threshold on the score returned by the ELM defines the boundary between sequences considered normal and abnormal.

5.1.7. [M8]

Rather than viewing the anomaly detection problem as a binary normal/abnormal problem, Nauman et al. [207] use rough sets theory to define an in-between region in which suspicious sequences are found, i.e., sequences whose abnormality or normality is not as certain as that of other sequences. Sequences of system calls in the training set are first split into subsequences by a sliding window of step 1 and a predefined size, similar to [M1], [M2] and [M3]. Equal windows are grouped into equivalence classes, and each class is assigned a probability based on the number of windows in the class out of the total number of windows in the sequence as well as a conditional probability based on the number of malicious windows in the class out of the total number of windows in the class. Two thresholds, α and β, are optimized during the training phase and separate classes into sets of abnormal, in-between, and normal classes, respectively, based on their probability and conditional probability. A sequence of system calls is then predicted as abnormal if at least one of its windows is predicted as abnormal. A window is predicted to be abnormal if its equivalence class has a conditional probability exceeding the α threshold; it is predicted to be normal if its equivalence class has a conditional probability lower than the β threshold. Finally, if this probability lies between these two thresholds, the window is considered suspicious, but it is not classified as abnormal.
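The three-way decision on windows can be sketched as below. Two points are assumptions that the description does not settle: a window is counted as malicious when it comes from an attack trace, and a window never seen in training is treated as suspicious. All names are hypothetical, and the thresholds are passed in instead of being optimized.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Window = Tuple[int, ...]


def conditional_probabilities(labelled: List[Tuple[Sequence[int], bool]],
                              k: int) -> Dict[Window, float]:
    """Share of malicious occurrences within each equivalence class of windows."""
    total: Dict[Window, int] = defaultdict(int)
    malicious: Dict[Window, int] = defaultdict(int)
    for trace, is_attack in labelled:
        for i in range(len(trace) - k + 1):
            w = tuple(trace[i:i + k])
            total[w] += 1
            malicious[w] += int(is_attack)
    return {w: malicious[w] / total[w] for w in total}


def classify_window(w: Window, cond: Dict[Window, float],
                    alpha: float, beta: float) -> str:
    """Return 'abnormal', 'normal' or 'suspicious' for a single window."""
    p = cond.get(w)
    if p is None:
        return "suspicious"  # assumption: unseen windows fall in the in-between region
    if p > alpha:
        return "abnormal"
    if p < beta:
        return "normal"
    return "suspicious"


def is_abnormal_trace(trace: Sequence[int], cond: Dict[Window, float],
                      k: int, alpha: float, beta: float) -> bool:
    """A trace is abnormal as soon as one of its windows is abnormal."""
    return any(classify_window(tuple(trace[i:i + k]), cond, alpha, beta) == "abnormal"
               for i in range(len(trace) - k + 1))
```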

5.1.8. [M9]

Khreich et al. [198] start from the intuition that combining several anomaly detection methods allows taking advantage of the strengths of each of them. Three models, namely STIDE ([M2]), an HMM, and a One-Class Support Vector Machine (OC-SVM), are trained independently on the same training set of normal sequences of system calls. They use two different representations of system calls: raw sequences for STIDE and the HMM, and the frequency of system calls in the observed sequence for the OC-SVM. The authors propose a method called Iterative Boolean Combination, which combines the predictions of the detection methods, each taken at different thresholds, and keeps the combinations that give the best AUC in the ROC space during the validation phase. A reference FPR is then chosen, and the two points on the validation ROC curve that are adjacent to the chosen FPR are selected, knowing that the considered points each correspond to a model combination rule at predefined thresholds. At inference, the two predictions corresponding to the two selected combination rules are computed, and the final prediction is chosen randomly between those two with a higher probability of choosing the combination linked to the point that is closest to the chosen FPR.

5.1.9. [M10]

Marteau [103] presents a sequence covering algorithm called Sequence Covering for Intrusion Detection (SC4ID) which aims to find the optimal coverage of a sequence of system calls by a set of subsequences extracted from reference sequences observed during benign executions of the system. An anomaly score is constructed as a function of the number of subsequences required for optimal coverage of the sequence analyzed relative to the size of the sequence itself. A threshold then defines the limit above which the sequence is considered abnormal from the computed score.

5.1.10. [M11]

Liu [75] uses neither the raw composition of system call sequences nor their frequency within these sequences as is but rather a set of descriptive statistics on the frequency of n-grams in a sequence. Multiple values are computed, such as quantiles, maximum value, standard deviation, mean, skewness, kurtosis, standard error, and Shannon’s entropy, from all frequencies of 1- to 3-grams in the observed sequence. These values are then given as input features to an Isolation Forest (IF) model that associates an anomaly score on which a threshold is applied to differentiate sequences considered normal from those considered abnormal.
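The feature extraction step can be illustrated with the sketch below, which covers only a subset of the listed statistics (skewness, kurtosis and standard error are left out) and uses the standard library alone. Pooling the frequencies of all n-gram orders and normalizing them before computing the entropy is an assumption, as are the function names. The resulting dictionary would then be turned into a feature vector for the Isolation Forest.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Sequence


def ngram_frequencies(trace: Sequence[int], n_max: int = 3) -> List[float]:
    """Relative frequency of every distinct n-gram, for n = 1..n_max."""
    freqs: List[float] = []
    for n in range(1, n_max + 1):
        grams = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
        total = sum(grams.values())
        freqs.extend(count / total for count in grams.values())
    return freqs


def describe(freqs: List[float]) -> Dict[str, float]:
    """Descriptive statistics over the pooled n-gram frequencies."""
    q1, median, q3 = statistics.quantiles(freqs, n=4)  # needs >= 2 values
    mass = sum(freqs)
    probs = [f / mass for f in freqs]  # assumption: renormalize before entropy
    return {
        "mean": statistics.fmean(freqs),
        "std": statistics.pstdev(freqs),
        "max": max(freqs),
        "q1": q1,
        "median": median,
        "q3": q3,
        "entropy": -sum(p * math.log2(p) for p in probs),
    }
```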

5.1.11. [M12]

Wunderlich et al. [52] compare different representations of sequences of system calls as input to a recurrent neural network Long Short-Term Memory (LSTM) model, among a one-hot encoding and two embedding representations, Word2Vec and GloVe. Sequences given to the model are subsequences corresponding to a sliding window of predefined length and a step of 1 over the observed sequence of system calls. If one of these subsequences is considered abnormal, then the entire sequence is also considered abnormal. Experiments conducted by the authors showed better results with the first representation, i.e., one-hot encoding, so this is the method that was chosen. The LSTM model combined with densely connected neural network layers classifies each observed sequence as normal or abnormal following the one-hot encoding input of sequences of system calls.

5.1.12. [M13]

Zhang et al. [31] question whether an entire execution trace must be analyzed in order to detect malicious behavior. Their method therefore builds a TF-IDF representation of 2- to 4-grams from only the first 200 system calls generated by each process and gives it as input to a Multi-Layer Perceptron (MLP) model, which is responsible for the classification of the observed sequences.

5.1.13. [M14], [M15]

Ring et al. [56] describe the Trace-Level Anomaly Detection algorithm, involving a language model called WaveNet [211] trained on sequences of system calls from normal executions. An embedding layer trained at the same time as the neural network is applied to the model input to represent each system call. Given all previous system calls in a sequence, the model determines the probability distribution for the following system call in that sequence. The probability of each sequence is then calculated by multiplying the probabilities of its system calls, and a threshold on the negative log-likelihood of this probability classifies the observed sequence as normal or abnormal ([M14]). An ensemble of three WaveNets with, respectively, 128, 256 and 512 filters in each convolution layer is also proposed ([M15]). The score on which classification is based is the average of the differences between the three scores and the median of the training scores for each model, which are input into the ReLU function.
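The two scores can be written down compactly. The sketch below assumes that each model already provides the probability it assigned to every observed system call; the function names are hypothetical, and the ensemble formula follows the description above (mean of the ReLU of each score minus its training median) rather than the authors' source code.

```python
import math
from typing import Sequence


def negative_log_likelihood(step_probs: Sequence[float]) -> float:
    """Score of one trace: minus the log of the product of step probabilities."""
    return -sum(math.log(p) for p in step_probs)


def ensemble_score(scores: Sequence[float], training_medians: Sequence[float]) -> float:
    """Average over models of ReLU(score - median of that model's training scores)."""
    shifted = [max(0.0, s - m) for s, m in zip(scores, training_medians)]
    return sum(shifted) / len(shifted)
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long traces.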

5.1.14. [M16]

Subba and Gupta [33] question the relevance of using the TF-IDF of all the 5-grams of system calls in a sequence as the input feature of a classifier. Thus, they apply Singular Value Decomposition (SVD) as feature reduction to identify the most discriminating linear combinations of n-grams between normal and abnormal sequences, and then they use them as input to an MLP that is in charge of classifying observed sequences.

5.1.15. [M17]

The architecture of the intrusion detection solution proposed by Zhang et al. [57] integrates several processes. Each sequence of system calls is first completed by a generative LSTM until it reaches a predefined fixed size. The extended sequences are then used, on the one hand, to build abstraction sequences of system calls according to their scope of action, similar to [M6]; on the other hand, they are used to build differential encoding sequences, i.e., system calls prefixed with an encoding according to their previous system call. The three sequences thus constructed (extended, abstract and encoded) are represented in an embedding form by Word2Vec models learned for each of these representations. Finally, these three embeddings form the input channels for a Text-CNN model [212,213] in charge of predicting the abnormality of the observed trace.

5.1.16. [M18]

The idea of Shamim et al. [121] is to find patterns in the ordering of system calls in a sequence, corresponding to repeated program execution, as can be seen in IoT applications. A segmentation of system call sequences is performed using autocorrelation, the application of the Savitzky–Golay filter [214], and prominence to identify peaks delimiting “segments” of system calls within the observed sequence. The starting point of the longest common subsequence between obtained segments is considered the beginning of the execution of the pattern. A Markov chain is then constructed to determine the probability of each segment of system calls during normal execution. During anomaly detection, system calls are ingested sequentially to find the starting point of the pattern execution. Once this point is found, the probability of the current segment of system calls is calculated from the Markov chain transition matrix. If the identified segment is of a similar length to the one observed during training and its probability is below a certain threshold, the sequence is classified as abnormal.

5.2. Evaluation Workflow

The evaluation of reproduced methods is performed on two public datasets from the scientific literature. The ADFA-LD dataset [17] is the most widely used by studies published over the last decade, as highlighted in Section 4.2.2. ADFA-LD is a set of system call traces collected on an Ubuntu 11.04 operating system running a vulnerable Apache web server with PHP, a MySQL database, a web collaboration tool named TikiWiki, and other services including FTP and SSH [17]. According to the authors, the platform represents a common small server for file sharing, database and web services with remote access. A first considered threat model is that of a remote attacker seeking to compromise exposed services. Four attacks have been developed for this purpose, including two password brute force attacks on FTP and SSH services, an exploitation using a TikiWiki CVE [215], and a custom PHP remote file inclusion. A second considered threat model is that of an attacker using social engineering to have a poisoned executable file run on the server, probably by a technician. Both of these attacks can be used to obtain a reverse shell or to create a new superuser. As the dataset is designed for IDSs with an anomaly detection approach, it is built around three subsets: a training set and a validation set containing, respectively, 833 and 4372 traces of normal web server usage, and a dedicated test set, comprising 746 traces from multiple executions of each attack. NGIDS-DS [216] is a more recent dataset than ADFA-LD containing traces of system calls executed on a corporate server running Ubuntu 14.04 and operating storage, FTP file sharing, email, web application, NAT, DNS, and SSH services. This machine is subject to a significant set of attacks grouped by the authors into seven families: Exploits, Denial of Services, Worms, Reconnaissance, Shellcode, Backdoors, and Generic. 
The dataset is divided into four long system call traces corresponding to normal server executions and 16 additional traces following the execution of attacks on the same system. To obtain similar system call sequences between both ADFA-LD and NGIDS-DS datasets, the latter traces are divided into sequences of size 2600, which is the average size of those in ADFA-LD. This results in 34,628 sequences of 2600 system calls, which are divided into 29,783 normal sequences and 4845 attack sequences.
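The segmentation of the long NGIDS-DS traces can be sketched as follows. Dropping a trailing chunk shorter than 2600 system calls is an assumption, since the handling of remainders is not stated; the function name is hypothetical.

```python
from typing import List, Sequence


def split_trace(trace: Sequence[int], size: int = 2600) -> List[Sequence[int]]:
    """Cut a long trace into consecutive, non-overlapping chunks of `size` calls."""
    # Assumption: an incomplete final chunk is discarded.
    return [trace[i:i + size] for i in range(0, len(trace) - size + 1, size)]


# Sanity check on the reported counts: normal plus attack sequences.
assert 29_783 + 4_845 == 34_628
```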

Some selected methods require the training set to contain malicious traces, while others do not. Datasets were thus first imported as a whole without taking into account the training, validation, and attack separation of initial datasets. They were then divided accordingly into training, validation, and test sets. Training and validation sets represent 70% of the entire dataset with 20% of this subset being dedicated to validation. The other 30% constitutes the test set. For methods that have to learn only on normal traces, attack traces were divided between the validation set (20%) and test set (80%). One method ([M13]) only considers the first 200 system calls of each trace; datasets were therefore reduced to consider only traces that have at least 200 system calls, while other traces were truncated.
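In terms of proportions, the general split amounts to 56% training, 14% validation and 30% test (20% of the 70% share goes to validation). A minimal sketch, with a hypothetical function name and an arbitrary seed, could look like this; it does not cover the separate 20%/80% handling of attack traces for methods trained on normal traces only.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle, then split into train (56%), validation (14%) and test (30%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(0.30 * len(shuffled))
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(0.20 * len(rest))  # 20% of the remaining 70%
    return rest[n_val:], rest[:n_val], test
```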

Table 5 lists the content of each set and methods involved.

Once the datasets had been initialized, they were preprocessed according to the requirements of each method. Methods were implemented in Python 3.10 using the scikit-learn 1.2.1 package for machine learning algorithms and evaluation metrics, Keras/TensorFlow 2.11 for deep learning models, hmmlearn 0.2.8 for HMM models, and Gensim 4.3.1 for Word2Vec embeddings. The runtime environment was configured to accelerate the training and inference phases of deep learning models with NVIDIA CUDA, which enables calculations to be parallelized on GPUs. As a first step, we sought to reproduce the results reported by the authors under the conditions described in their publications. If these results could not be obtained, authors were contacted for insights. This was the case for several methods:

[M7]: We encountered difficulties in executing the proposed method in a suitable time. Strictly following the pseudocode given in the publication led to prohibitive computation times: the method ran for several weeks, whereas the paper reports several days of execution. After comparing the explanations with those of the thesis manuscript of the main author [217], we realized that the algorithm for generating what they call “sentences” is only based on a selection of “words” that appear at least 200 times in the training set. Once this selection had been made and the implementation optimized using suffix trees, the proposed method could be executed in a reasonable amount of time on the ADFA-LD dataset. However, we were unable to evaluate this method on the NGIDS-DS dataset due to an execution time exceeding two months, which made it unsuitable for this paper. In addition, the reported detection performance could not be reproduced despite contacting the team who had worked on the subject, who informed us that they no longer had the source code and could not help us any further.
[M14], [M15]: The main difficulty encountered was that we were unable to obtain the results presented for the ensembling method, which is claimed to achieve superior detection performance compared to using the WaveNet model alone, despite the availability of the original source code. The authors were very responsive and helpful, directing us to the right branch of the versioning repository. Their source code was used in order to avoid any errors on our side. Yet the experiments showed no significant improvement in intrusion detection performance between the single model and the ensemble model.
[M17]: The proposed method is rather complex and, although many hyperparameters are given in the publication, it is not always clear to us what they correspond to. For instance, the number of layers in the encoder and decoder of the sequence completer is not specified, nor is the length of each window or whether padding is used. A number of 150,000 epochs is mentioned for training the Word2Vec model, which seems disproportionate. The pool size for MaxPooling of the Text-CNN model is not mentioned, nor do the authors mention the use of a validation set. We therefore contacted them but received no response despite several enquiries. Without feedback on the quality of our re-implementation, the reproduced method does not reach the results presented by the authors.

The seed used to initialize machine and deep learning model weights is set before runtime. In addition, hyperparameters that are not given by the authors are selected by a grid search using the validation set so that all models are evaluated in their most favorable configuration. Each method is run 10 times on ADFA-LD and 5 times on NGIDS-DS so that results are independent of the random data split between training, validation and test sets, and it is then evaluated on a set of metrics that depends on its approach. We chose to consider only five executions for NGIDS-DS due to the number of sequences contained in the dataset, which significantly increases the training and testing time of certain methods, sometimes exceeding our defined limit of 60 days per execution. Score methods ([M1], [M2], [M3], [M4], [M5], [M7], [M9], [M10], [M11], [M14], [M15]), which compute an anomaly score for each sequence of system calls, are evaluated on AUC, TPR for fixed FPR, and FPR for fixed TPR. Raw results and ROC curves are given in Appendix C. Prediction methods ([M6], [M8], [M12], [M13], [M16], [M17], [M18]), which answer a binary classification problem between normal and attack traces, are evaluated on accuracy, balanced accuracy, precision, F1-score, TPR, and FPR, as defined in Section 4.2.3. Raw results and confusion matrices are given in Appendix D. However, these metrics do not allow for a comparison between the performance of score methods and prediction methods. Therefore, we use Normalized Discounted Cumulative Gain (NDCG), imported from research in information retrieval, as a metric evaluating the quality of the classification. The NDCG@n measures the relevance of the anomaly score associated with the n sequences that had the best predictions. The relevance score is calculated between the associated anomaly score and the true label of the sequence, which penalizes normal traces predicted as abnormal. 
In other words, the more a detection system associates high anomaly scores with truly abnormal behaviors, the higher its relevance score will be, and therefore the better its NDCG will be. Although this metric does not represent a measure of the performance of models alone, it allows methods to be ranked according to the quality of their detection on the same sequences. Finally, execution times for training and inference phases are logged. These measurements are performed one method at a time on the same machine with a dual Intel Xeon Gold 6348 (2×28 cores at 2.6 GHz) as CPU, 512 GB of RAM and an NVIDIA A100 GPU with 80 GB of VRAM, running Scientific Linux 7. Training is carried out on the GPU for deep learning models, but inference is forced per trace on CPU instead of per batch on GPU in fairness to other methods that cannot benefit from hardware acceleration and to come closer to an on-line inference that receives one trace after another.
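As an illustration of the ranking metric, the sketch below computes NDCG@n with the usual logarithmic discount, $\mathrm{DCG@}n = \sum_{i=1}^{n} rel_i / \log_2(i+1)$, normalized by the DCG of the ideal ordering. It assumes a binary relevance equal to the true label (1 for an attack trace, 0 otherwise), which simplifies the relevance score used in this paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """NDCG@n of the traces ranked by decreasing anomaly score."""
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    ideal = sorted(labels, reverse=True)
    best = dcg(ideal[:n])
    return dcg(ranked[:n]) / best if best > 0 else 0.0
```

A detector that places all attack traces at the top of its ranking obtains an NDCG@n of 1, while normal traces ranked among the first n positions lower the value.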

5.3. Detection Performances

Figure 13 shows the detection performance achieved by reproduced score methods.

A first point to note is the poor performance of [M4]. Being based on an HMM, it seems unable to model the normal behavior described in both datasets. Specifically, the authors designed this method to track system calls of a single program, whereas ADFA-LD and NGIDS-DS are built from the system calls of each program running on the studied machine. This leads us to believe that intrusion detection at the system level is too complex to be achieved through a stochastic process. On the other hand, despite being designed by its authors for anomaly detection on a single program, [M1] is surprisingly among the most effective methods. Based on sequence coverage, it inspired [M10], which offers almost the best True Positive Rates for a False Positive Rate set at 1% but also provides the worst False Positive Rates for a detection rate of 100% on both datasets. [M5] also delivers strong detection performance, achieving the third-best TPR for an FPR set at 1% and the best FPR for a 100% TPR on ADFA-LD. However, it performs significantly worse on NGIDS-DS. This can be explained by the complexity of the dataset, which represents a constantly busy enterprise server and therefore generates highly diverse system call sequences. This method, which uses a similarity measure as the kernel of a kNN algorithm, thus seems to support anomaly detection by clustering on the presence of system calls in observed sequences, but only when the use case implies that the protected system is not overloaded, unlike in NGIDS-DS. The results obtained for [M7] are encouraging, but it never achieved the 20% FPR for the 100% TPR described in the original publication despite being evaluated on the same ADFA-LD dataset. 
Exploring other combinations of system calls from the dataset likely permits the model to be more generic in characterizing normal behavior than methods such as [M1], [M2], [M3] and [M10], whose reference system call patterns are created solely from observed training data. This over-generalization also seems responsible for the low False Positive Rate at 100% detection, yet the method fails to detect traces of actual attacks when a low FPR is enforced. Moreover, this method is very complex and requires tremendous computing resources to be executed in an acceptable amount of time. In our experiments, one iteration of training, validation, and test exceeds 60 days on the NGIDS-DS dataset, making the method impractical for anomaly detection when the target behavior is too scattered. The results presented by the authors are reproduced for [M14]. Although it does not outperform the other methods on any single metric, it performs well on all of them on ADFA-LD: it achieves twice the detection rate obtained by [M7] for an FPR of 1% even though the model was trained only on benign system call sequences; it therefore manages to detect malicious sequences better than a supervised model that has been exposed to malicious patterns during the training process. The FPR for a 90% TPR is also similar between these two methods. In contrast, the model has a quite high False Positive Rate for 100% detection. Using embeddings to represent system call sequences appears to be suitable for use with neural networks, but improvements are needed. Furthermore, using multiple WaveNet models with different hyperparameters does not seem to significantly improve detection performance compared to a single model, as shown by the results of [M15], which closely follow those of [M14] on both datasets. [M11] fails to compete with other methods on ADFA-LD and has the worst AUC on this dataset apart from [M4]. 
Since the method uses a set of descriptive statistics on the frequency of system call n-grams in a sequence as detection features, there is a significant loss of information, first through the n-grams, which result in the loss of information about the order of execution of system calls, and then through frequency analysis. In contrast, it achieves the second-best AUC on NGIDS-DS, which is due in particular to its reduced FPR for 90% TPR. The “coarse-grained” analysis of system calls therefore appears to be useful when the behavior of the target under study is too complex to be tracked by a detection method based on the sequences themselves. It also has the advantage of being independent of system call identifiers, which is an interesting property that would allow a model to learn behaviors on one system that remain valid on another. The combination of heterogeneous classifiers in [M9] seems to produce positive results despite its complexity, which made it impossible to evaluate it on NGIDS-DS. Given the results of [M4] based on an HMM, the contribution of this same model in the combination proposed by [M9] is questionable. The AUC obtained by this method is higher than that of [M2], which is consistent with the way the combination of models is constructed to maximize this metric. The values at fixed TPR or FPR are better than those of [M2] alone, which is most likely due to the OC-SVM. However, the False Positive Rate for a TPR of 90% and 100% is quite high. Taking advantage of the strengths of several models therefore seems to be a good idea as long as this combination makes sense and the benefits of each model have been proven.

The detection performance of the prediction methods is presented in Figure 14.

The precision of [M13] and [M16] is especially noticeable on ADFA-LD, surpassing other methods with an extremely low FPR and good detection capability. Using TF-IDF vectors of n-grams as features of an MLP model, with or without SVD, thus seems more than appropriate on this dataset. However, these methods have no discernment capability on NGIDS-DS given the systematic characterization of sequences as normal. The cumbersome mechanism deployed for [M17] also fails to identify abnormal traces in NGIDS-DS. Its TPR is the second highest on ADFA-LD among the reproduced methods, but it also has the second highest FPR on that dataset. [M6] is also in this situation with unsatisfactory results on both datasets. One reason is probably the replacement of system calls with their operation types, which seems to overly simplify sequences. In a frequency analysis of the composition of each trace of system calls, such simplification makes it hard to differentiate between sequences resulting from legitimate behavior and those resulting from malicious behavior.

Rough sets for system call sequences classification are an interesting approach given the results of [M8], but they do not perform as well as [M16]. As shown in [M12], using recurrent neural networks like LSTM along with one-hot encoding for each system call gives a really good TPR on ADFA-LD but at the cost of a really high FPR. A noteworthy feature of this method is that the LSTM processes sub-windows of 20 system calls, and the entire system call trace is predicted as abnormal if any of these windows are predicted as abnormal. This explains the highest FPR of all the reproduced methods as well as the fact that [M12] predicts nearly every sequence as abnormal on NGIDS-DS. Finally, recognizing patterns in sequences using a Markov chain as proposed by [M18] achieves a higher TPR than [M6] on ADFA-LD and compared to most methods on NGIDS-DS, but it also has a high FPR, giving it the lowest F1-score on ADFA-LD but the highest on NGIDS-DS.

NDCG scores are presented in Figure 15 in a sorted list to help assess the quality of predictions and scores produced by each intrusion detection method. From @1 to @1000, the variance of NDCG scores increases while bias decreases. [M10], [M12], [M13], and [M16] assign an anomaly score or accurate prediction to malicious traces on ADFA-LD. However, they do not perform well on NGIDS-DS and are among the lowest in NDCG @1000 scores. Surprisingly, [M1] achieves the best results on this dataset. The derived methods [M2], [M3], and [M10] are not far behind, as is [M5]. The standard deviation of [M4], [M5], [M12], [M13], and [M17] is higher than the other methods at @1 on ADFA-LD. Still, on this dataset, [M4], [M5], [M7], [M11], [M17], and [M18] maintain large standard deviations on @50, @100, and @500, meaning that the anomaly score these methods associate to truly malicious traces is not always adequately high. On NGIDS-DS, [M5], [M10], and [M16] have a high standard deviation at NDCG@1, and [M12], [M16], and [M18] maintain it at @50, @100, and @500. As such, methods whose classification is based on heuristics obtain more relevant predictions or anomaly scores on NGIDS-DS attack traces than the deep learning-based models that perform best on ADFA-LD. This seems counterintuitive, since this type of algorithm would seem to be better suited for modeling the complexity of behavior under NGIDS-DS and thus more suitable for complex datasets. The method to be considered for a detection solution therefore depends heavily on the dataset and, by extension, on the use case in question.

All methods seem to struggle on NGIDS-DS despite having the same evaluation pipeline as ADFA-LD. In particular, grid searches are redone at each evaluation iteration to find hyperparameters that allow methods to obtain the best predictions. The fact that reproduced methods yield more contrasting results on ADFA-LD than on NGIDS-DS seems to confirm that the latter is a more difficult dataset than ADFA-LD, which we associate with it containing the system calls of a heavily used corporate server. Normal behavior then appears to be too complex to be satisfactorily characterized by the methods evaluated in this paper. Given the results of score-based and prediction-based approaches that were reproduced, it can be inferred that no single approach stands out from the others with significantly higher performance on both datasets. The analysis of raw system call identifier frequencies in a trace followed by an MLP classifier ([M13], [M16]) is a powerful method compared to the other reproduced ones, maximizing TPR while minimizing FPR, although the feature reduction based on SVD ([M16]) seems to undermine results. Representing sequences by embeddings, whether one-hot ([M12]) or learned by another model ([M14], [M15]), is relevant when used as input to a neural network. Lastly, sequence coverage methods ([M1], [M2], [M3], [M10]) are also effective and are consistent in the relevance of the given score. However, no unsupervised method performs as successfully as supervised methods for a given FPR or TPR. Furthermore, methods using multiple representations of system calls such as [M17], based on extended sequences and encoded system calls, as well as their functional abstraction, seem like a good idea even if the reimplemented method does not achieve superior detection performance over simpler methods such as [M13].

5.4. Overhead Evaluation

Observing the training and inference times for each method, as shown in Figure 16, clearly reveals that there are no consistent correlations between a method’s temporal complexity and its detection capabilities.

Method [M4] obtained the poorest detection results on the ADFA-LD dataset despite being one of the most time-consuming methods to train and infer. [M6] and [M8] are among the least costly to train and infer, as is [M1], but their prediction quality is insufficient on both datasets. Methods [M2] and [M3], derived from [M1], achieve slightly lower detection performance and training time, but they take longer to infer. Similarly, [M10] requires more time to train and infer than [M1], which is probably due to the complexity of computing the largest coverage sequence for each trace. Clustering method [M5] has low training resource requirements but a very long inference time as a result of the kNN operation. Methods [M7] and [M9] are among the most costly methods for both phases. This is why these methods could not be evaluated on the NGIDS-DS dataset: it contains many more system call sequences than ADFA-LD, which caused the execution time to increase exponentially. Because of the number of metrics computed as features for [M11] and its Isolation Forest model, this method is as time-consuming as [M13]’s MLP model to train and [M12]’s LSTM model to infer. Compared to lighter methods, [M11] requires more resources for training and inference but provides weaker detection performance on ADFA-LD. [M18] offers reasonable resource requirements at the cost of limited relevance of anomaly scores and is one of the least effective methods on ADFA-LD. Unsurprisingly, the use of deep learning models ([M12], [M14], [M15], [M16], [M17]), except [M13], generates very long learning and inference phases that could possibly be reduced using model pruning [218] and weight quantization [219] if applicable.

6. Discussion

6.1. Shortcomings of Existing Work

6.1.1. Difficulty in Reproducing Methods from the Literature

Our first observation concerns the difficulty of reproducing the work of other researchers. The information needed to reproduce the presented methods is often missing, such as hyperparameter values for machine and deep learning models. Additional work is then required to find suitable parameters, at the risk of deviating from those actually used by the authors. To address this issue, we strongly encourage authors to publish all the elements needed to reproduce their method, including source code where possible.

6.1.2. Terminological and Evaluation Inconsistencies

The growing use of algorithms from rapidly evolving fields, such as machine learning as demonstrated in Section 4.2.1, also leads to confusion in the terminology used by authors, for example in the use of the term “semi-supervised learning” to refer to single-class supervised learning. Similarly, authors do not always state the formulas used to calculate their evaluation metrics. Doing so would make it easier to identify identical metrics presented under different names, such as “detection rate,” which is synonymous with “true positive rate” in several publications [81,166,195]. Authors also do not always justify their choice of metrics, as illustrated by the analysis in Section 4.2.3: why compare the performance of a presented method using accuracy, which depends heavily on the underlying dataset [206], rather than the F1-score? We believe that the metrics used to measure IDS detection performance should depend on the needs of the considered use case. For example, monitoring a critical system will tend to favor detecting as many attacks as possible without any False Positives. In this case, measuring TPR for a fixed FPR may be relevant. If an IDS is designed to be used in conjunction with other mechanisms aimed at reducing the number of false alarms, then TPR is an interesting metric. The relevance of the anomaly score associated with an alert is also valuable information that enables incident response teams to take suitable action. In this case, an IDS that maximizes the NDCG score would be appropriate. We recommend that authors always make the raw results obtained during their evaluations available, such as ROC curves and confusion matrices, so that metrics other than those presented can be recalculated a posteriori.
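The "TPR for a fixed FPR" measurement mentioned above can be recomputed from raw anomaly scores, which is one reason why publishing them is useful. The sketch below is a minimal illustration under stated assumptions: higher scores mean more anomalous, an alert is raised when the score reaches the threshold, and labels are 1 for attack traces and 0 for normal ones. The function name and the toy data are hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Highest TPR reachable by a score threshold while keeping FPR <= max_fpr.

    Assumptions: labels are 1 (attack) or 0 (normal); an alert is raised
    when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one attack and one normal trace")

    best_tpr = 0.0
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 1
        )
        false_pos = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and y == 0
        )
        if false_pos / negatives <= max_fpr:
            best_tpr = max(best_tpr, true_pos / positives)
    return best_tpr


# Hypothetical scores for three attack traces and five normal traces.
toy_scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

tpr_no_false_alarm = tpr_at_fpr(toy_scores, toy_labels, 0.0)
tpr_at_20_percent = tpr_at_fpr(toy_scores, toy_labels, 0.2)
```

On this toy data, tolerating no false alarm leaves one attack undetected, while accepting one false alarm out of five normal traces allows all three attacks to be detected, which illustrates the trade-off that the choice of operating point expresses.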

6.1.3. Usage of Outdated Datasets

The IDS performance results presented here are based on tests performed on two datasets from the literature, namely ADFA-LD and NGIDS-DS. The first, ADFA-LD [17], is the most widely and recently used for evaluating system call-based IDSs, as analyzed in Section 4.2.2. It contains few attacks and comes from a now obsolete system: the Linux kernel from which the system calls were collected was discontinued in 2011 [220]. This directly limits the credibility of evaluations conducted by researchers, as such data do not provide a basis for comparing the performance of detection methods against modern threats and on current systems. The second dataset, NGIDS-DS [216], offers a much more comprehensive set of attacks, but the threat model is not explicitly described in its publication. It is therefore difficult to clearly identify the use case represented in these data and to comment on the very poor performance of literature methods on these data, as presented in Section 5.3. Furthermore, although more recent than ADFA-LD, the Linux kernel of the system on which the NGIDS-DS system traces were collected also saw its maintenance discontinued in 2015 [221]. Other public datasets exist [10,205,222], but they are not always up to date [223] and only partially cover the various intrusion detection application cases, such as IoT and industrial systems [12,18]. Moreover, the lack of updated datasets limits the study of concept drift in system calls for the detection task, which we did not see addressed in any of the publications analyzed for this paper.

6.2. Limitations of System Call-Based IDSs

Although intrusion detection based on system calls offers promising detection performance, this type of detection is intrinsically limited by design.

6.2.1. Possible Weaknesses in the System Call Collection Mechanism

IDSs based on the dynamic analysis of system calls are, by design, unable to detect attacks that do not invoke system calls. However, there are very few actions that an attacker can perform without invoking a system call during the entire attack chain [224]. Given this observation, the effectiveness of an IDS based on system calls is highly dependent on its implementation. For example, the retrieval of system calls generated at runtime by a user-space hook can be bypassed by an attacker making direct system calls to the kernel [225]. Thus, a malicious actor could carry out their actions without the IDS recording their behavior. Possible countermeasures include collecting system calls at a privileged level, for example through a kernel module or an eBPF program for Linux operating systems, coupled with a whitelist of programs authorized to make direct system calls [225].

6.2.2. Vulnerability to Mimicry and Adversarial Attacks

If intrusion detection is based on recognizing sequences of system call identifiers that have been learned as normal behavior, then a malicious program capable of artificially reproducing these sequences can hide its actions within sequences authorized by the IDS, for example by invoking system calls that are bound to fail. Such a program could also scramble its execution signature so that it does not match any known malicious behavior. This type of attack is known as a mimicry attack [224]. Although this threat is well known to researchers, it does not appear in any of the publicly available datasets identified in this literature review. This could explain why, in the experimental evaluation conducted, methods that are particularly sensitive to basic mimicry attacks, such as Forrest et al. [10] according to Wagner and Soto [224], achieve detection performance similar to or even better than the more complex methods based on machine learning outlined in Section 5.3. The latter, however, are not immune to adversarial attacks, as reported in several surveys [226,227,228,229]. Thus, system call identifiers alone are not sufficient for the comprehensive monitoring of protected systems. The use of other information such as arguments and return values has been studied in this regard, making the imitation of legitimate sequences a much more complex task for an attacker. In addition, adversarial attacks aimed at disrupting the machine learning models responsible for detection need to be thoroughly studied in order to build robust IDSs.
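The mimicry scenario described above can be illustrated with a deliberately simplified sliding-window detector in the spirit of sequence-based methods such as Forrest et al. [10]. This sketch is an assumption-laden toy, not the reproduced implementation: the window length, the system call names, and the traces are hypothetical, and the anomaly score is simply the fraction of windows never seen in normal traces.

```python
def windows(trace, n):
    """All contiguous windows of n system calls in a trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_normal_db(normal_traces, n):
    """Set of every window observed in traces of normal behavior."""
    return {w for trace in normal_traces for w in windows(trace, n)}


def mismatch_rate(trace, normal_db, n):
    """Fraction of windows in the trace that were never seen as normal."""
    trace_windows = windows(trace, n)
    if not trace_windows:
        return 0.0
    unseen = sum(1 for w in trace_windows if w not in normal_db)
    return unseen / len(trace_windows)


N = 3  # hypothetical window length
normal_traces = [
    ["open", "read", "write", "close", "open", "read", "write", "close"]
]
normal_db = build_normal_db(normal_traces, N)

# The attacker only needs open -> write -> close.
naive_attack = ["open", "write", "close"]
# Same goal, padded with a harmless "read" so every window looks normal.
mimicry_attack = ["open", "read", "write", "close"]

naive_score = mismatch_rate(naive_attack, normal_db, N)
mimicry_score = mismatch_rate(mimicry_attack, normal_db, N)
```

The padded trace reaches the same malicious goal while producing only windows already present in the normal database, so a detector relying solely on identifiers cannot distinguish it; taking arguments or return values into account would make such padding harder to construct.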

6.3. Towards Deployable System Call-Based IDSs

6.3.1. Learning from Attack Traces Is Difficult to Achieve on a Real Installation

The vast majority of IDSs require a stage of characterization of the normal behavior of the system on which they aim to be deployed, as we saw in Section 4.2.1. This stage requires data, in our case system calls, from the final system in operation. However, our analysis of the literature has also shown that almost half of the proposed IDSs are trained on data that include malicious behavior. Yet, it is not always possible to obtain these data due to the operational risks involved. A possible approach would be to extract features that are independent of system call identifiers or other data that are highly dependent on the underlying system, such as the method proposed by Liu et al. [14]. This would allow detection systems to be trained in controlled environments in which a known set of malicious actions can be executed without the risk of compromising or affecting the real system. This kind of approach also supports the development of generalized and scalable intrusion detection systems.

6.3.2. Detect Intrusions in Real-Time

Some detection models are designed to detect anomalies in system call sequences corresponding to an entire program execution. This limits the response capabilities following an intrusion, as the IDS will only be able to raise an alert once an attacker has already impacted the target. Furthermore, this is not compatible with certain applications, particularly IoT and industrial systems, where it is not possible to wait for a process to finish executing and then raise an alert, as most application processes never stop. It would therefore be preferable to turn to detection models capable of raising alerts while processes are running either by analyzing the execution of a specific number of system calls, referred to as fixed-length sequences data granularity in Section 4.2.1, or by regularly analyzing system calls executed over a period of time, which is referred to as time interval in our literature review. Real-time detection also requires that detection mechanisms be capable of processing data as it arrives. This means that inferences must be fast whether they are integrated into the monitored device [21,121] or offloaded to an infrastructure with more computing resources [32,89]. The process of collecting system calls can also be lightened by selecting only a few key system calls for anomaly detection [24,25,132,202] or by reducing the number of features extracted as seen in Section 4.2.1. Unfortunately, publications in the literature do not systematically provide information on the computational or memory complexity of their solutions, as can be seen in Section 5.4, which is nevertheless crucial for evaluating the applicability of proposed solutions to real-world use cases.
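The two data granularities mentioned above, fixed-length sequences and time intervals, can be sketched as generators that consume a stream of system calls and emit units of analysis while the process is still running. This is a minimal illustration with hypothetical names and data; it assumes events arrive sorted by timestamp and leaves the detection model itself out.

```python
from collections import deque


def fixed_length_windows(stream, length):
    """Yield a window as soon as `length` consecutive calls are available,
    then slide by one call at a time."""
    buffer = deque(maxlen=length)
    for syscall in stream:
        buffer.append(syscall)
        if len(buffer) == length:
            yield tuple(buffer)


def time_interval_windows(events, interval):
    """Group (timestamp, syscall) events into consecutive time intervals.

    Assumptions: events are sorted by timestamp; empty intervals are skipped.
    Yields (interval_start, list_of_syscalls).
    """
    current_index = None
    current_calls = []
    for timestamp, syscall in events:
        index = int(timestamp // interval)
        if current_index is not None and index != current_index:
            yield current_index * interval, current_calls
            current_calls = []
        current_index = index
        current_calls.append(syscall)
    if current_calls:
        yield current_index * interval, current_calls


# Hypothetical stream of system call identifiers.
toy_stream = [1, 2, 3, 4]
toy_fixed = list(fixed_length_windows(toy_stream, 3))

# Hypothetical timestamped events (seconds since start).
toy_events = [(0.1, "open"), (0.7, "read"), (1.2, "write"), (3.5, "close")]
toy_timed = list(time_interval_windows(toy_events, 1.0))
```

Note that the time-based generator only closes an interval when a later event arrives or the stream ends; a deployed collector would likely need a timer to flush intervals during periods without system calls.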

6.3.3. Deal with Frequent and Not Always Explicable Alerts

A difficulty encountered by IDSs based on system calls is related to the extensive use of machine learning in the detection process, as illustrated in Section 4.2.1. Linear models (e.g., linear discriminant analysis, SVM with linear kernel) and decision trees readily provide an explanation for the anomaly scores obtained, whereas non-linear algorithms and neural networks require additional mechanisms to make their results partially explainable [230]. This leads to alerts that cannot always be explained, which is problematic when the origin of the alerts needs to be investigated further to respond appropriately. Another difficulty inherent in the use of machine learning is the frequency of False Positives raised by these models. Indeed, consider an IDS that analyzes every second the system calls executed in the last second with a False Positive Rate of 1%, which is lower than the rates obtained by the methods evaluated in Section 5.3 on ADFA-LD; this model will raise, on average, one false alarm every 100 s, making it unmanageable for deployment in a real environment. One way to reduce these False Positives is to combine the IDS with detection models that use information other than system calls, which could then confirm or refute intrusions. However, monitoring additional information or running multiple detection models necessarily impacts the performance of the monitored system. Precise overhead measurements are then required before this type of detection system can be deployed on actual devices.
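The false alarm arithmetic above can be reproduced with a few lines. The sketch assumes that every analysis window contains only benign activity and that decisions are independent, so that the expected number of false alarms is simply the False Positive Rate multiplied by the number of decisions; the function names are illustrative.

```python
def expected_false_alarms(fpr, window_seconds, duration_seconds):
    """Expected false alarms over a period containing only benign activity."""
    decisions = duration_seconds / window_seconds
    return fpr * decisions


def mean_seconds_between_false_alarms(fpr, window_seconds):
    """Average time between two false alarms."""
    return window_seconds / fpr


# Values from the text: one decision per second, 1% False Positive Rate.
seconds_between = mean_seconds_between_false_alarms(0.01, 1)
alarms_per_day = expected_false_alarms(0.01, 1, 24 * 3600)
```

With these values the interval between false alarms is 100 s, as stated in the text, which corresponds to roughly 864 false alarms per day for a single monitored host.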

6.3.4. Interfacing with Response Mechanisms

Detecting intrusions provides knowledge of malicious actions being carried out on a system, but it does not in itself provide protection against them. A highly virulent attack can compromise a machine and spread to others very quickly before an incident response team has had time to intervene. An Intrusion Prevention System (IPS) is an IDS equipped with the ability to autonomously block malicious actions as soon as they are identified. However, just as the incident response team needs to know the source of the alert to take the appropriate measures to contain an intrusion, an IPS needs certain context from the detection mechanism to identify which actions to take. Intrusion detection is therefore not just a matter of binary classification: the relevance of the anomaly score, the explainability of alerts, and the identification of resources potentially affected by an incident are essential for cybersecurity teams and for interfacing with intrusion response mechanisms.

7. Threats to Validity

The methodology adopted for this paper, defined by Kitchenham et al. [19], identifies several biases in conducting a systematic review of the literature. To mitigate selection bias, three scientific libraries widely used by the computer science community were used to carefully select primary studies with a common research logic. In addition to the extensive coverage of published work, the inclusion and exclusion criteria for the selection of primary studies were clearly defined when the methodology was established, before any analysis of existing work, to avoid being guided in any way toward the definition of any specific criteria. Attrition bias occurs when publications are processed differently, leading to their exclusion from an analysis in which they deserve to be included. To limit it, the advice of Kitchenham et al. [19] was applied by noting the rationale for the exclusion of the studies concerned, which were cross-read by each author; this rationale is available as described in the Data Availability Statement at the end of this article. There is therefore little room for the personal interpretation of an author, which might otherwise have been reflected in this paper. We would also like to emphasize that no Large Language Model (LLM) was used in the analysis of the publications presented in this study.

A second selection bias may have occurred during the identification of methods to be reproduced for evaluation. We attempted to minimize this by strictly applying inclusion criteria, particularly the requirement that each publication contain all the information necessary to replicate the described methods. This significantly reduced the number of candidate publications. We then identified the methods that seemed to us to be the most innovative and diverse, based on the characteristics identified in the taxonomy, in order to obtain a sample of the various methods proposed in the scientific literature. Furthermore, it is possible that one or more methods have been badly re-implemented, leading to an unfair evaluation and a performance bias. This bias was mitigated by cross-checking the re-implementations between authors, all of which were made using the same Python packages and environment, in addition to cross-reading the associated publications. The implementation of each reproduced method was first validated by attempting to recover the results reported by its authors; when we were unable to do so, we contacted them, which greatly limits the unfairness of assessments. Finally, the measurement bias was limited by defining a common evaluation strategy for reproduced methods before any experiment and after analyzing the literature in order to identify what we recognized as best practices.

8. Conclusions and Recommendations for Future Work

In light of the drastic increase in the number of cyberattacks perpetrated on digital infrastructures worldwide, it is becoming necessary to develop new ways of detecting such malicious actions. Researchers have focused in particular on intrusion detection based on the dynamic analysis of system calls at runtime.

In order to characterize the different proposed approaches and identify trends in this area of research over the last few years, a systematic review of the literature of 209 publications from 1996 to early 2026 was conducted, the methodology of which is rigorously presented in Section 3. From this, five research questions have been identified:

RQ-1: How has research on IDSs based on system calls evolved since 1996?
RQ-2: How are system call-based IDSs built?
RQ-3: On what data are IDSs based on system calls trained and tested?
RQ-4: How are IDSs based on system calls evaluated?
RQ-5: How do state-of-the-art IDSs based on system call identifiers perform under the same execution conditions?

The first four research questions, RQ-1, RQ-2, RQ-3 and RQ-4, were addressed in Section 4 by analyzing the literature and defining a taxonomy comprising 16 categories from the detection method to the data and evaluation metrics used. Methods based on heuristics, rules, and stochastic processes, although widely developed in the early days of intrusion detection research, are becoming less common in recent work in favor of more complex algorithms. In particular, this paper highlighted the emergence and subsequent dominance of machine learning algorithms in the task of intrusion detection, which can be assimilated to a classification problem. The most studied targets are Unix targets in desktop environments with growing interest in Cloud and IoT use cases. Finally, the evaluation metrics favored by researchers for assessing the detection performance of proposed methods are the TPR and FPR, which are followed by accuracy, F1-score, and precision measures. The majority of publications do not present an analysis of the resources required to run presented IDSs, and when they do, they focus on a measure of temporal complexity during the inference of detection methods. The study of computational resources required to operate a proposed detection system is nonetheless crucial to put into perspective the trade-off between achieved detection performance and required resources.

The last research question, RQ-5, is addressed in Section 5 by reproducing 18 methods representing the diversity of approaches proposed in the literature according to the aforementioned taxonomy. Methods were evaluated on two datasets from the literature, ADFA-LD and NGIDS-DS, and according to the metrics identified in response to RQ-4, to which we added the NDCG calculation to rank methods according to the relevance of the alerts they raise. These results showed that analyzing system calls for intrusion detection is highly relevant, as it yields promising results as a defense-in-depth mechanism, particularly when using an anomaly detection approach whose main advantage lies in the detection of zero-day, i.e., unseen, attacks. However, no unsupervised method performs better than supervised methods, and none of them achieve satisfactory results on a difficult dataset like NGIDS-DS. Then, measuring the training and inference time of each reproduced method also revealed that there is not always a link between the computational complexity of a method and the accuracy of its detection.

We have analyzed in more depth in Section 6.1 the shortcomings found in existing work, particularly the difficulty in reproducing methods from the literature due to the lack of information in publications. The use by researchers of public datasets that are obsolete, both in the system studied and in the considered attacker model, is also cited as a potential factor undermining the credibility of the results obtained, particularly when compared to modern systems and threats. The main inherent weaknesses of IDSs based on a dynamic analysis of system calls are summarized in Section 6.2, from the implementation of system call collection mechanisms to the vulnerability to mimicry and adversarial attacks. Finally, the paper concludes in Section 6.3 by identifying the main challenges associated with deploying methods developed by researchers in real-world environments, from the training to the deployment of an on-line solution.

In summary, this paper has helped identify critical issues in the research on intrusion detection from dynamic analysis of system calls. Researchers are therefore encouraged to pursue the following avenues in future work:

Build test benches that enable the creation of public datasets representative of current systems and threats based on realistic knowledge databases such as MITRE ATT&CK and Cyber Kill Chain;
Propose detection methods capable of learning solely from normal execution traces and able to raise alerts during program execution;
Evaluate proposed IDSs on public datasets, including as much information as possible for reproducing their work and, if possible, publishing the source code;
Justify the choice and provide definitions of the metrics used to evaluate the detection performance of the methods they present;
Provide an analysis of the computational complexity and memory requirements for the execution of their method;
Propose mechanisms explaining raised alerts.

Author Contributions

Conceptualization, L.A., V.B., P.-H.T. and É.G.; methodology, L.A.; software, V.B. and L.A.; validation, L.A., V.B., P.-H.T. and É.G.; formal analysis, L.A., V.B. and É.G.; investigation, L.A. and V.B.; resources, L.A.; data curation, L.A. and V.B.; writing—original draft preparation, L.A.; writing—review and editing, L.A., V.B., P.-H.T. and É.G.; visualization, L.A. and V.B.; supervision, L.A. and P.-H.T.; project administration, P.-H.T. and L.A.; funding acquisition, P.-H.T. and É.G. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the French National Research Agency in the framework of the “Investissements d’avenir” program IRT Nanoelec (ANR-10-AIRT-05). This paper has been partially supported by the French National Research Agency under the France 2030 labels SuperviZ (ANR-22-PECY-0008) and MIAI Cluster (ANR-23-IACL-0006). The views reflected herein do not necessarily reflect the opinion of the French government.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The resources compiling the results of the literature analysis presented in Section 4 are openly available at https://github.com/LalieA/SoK_System_Call_IDS, accessed on 1 June 2026. They include the publications collected following the search for primary studies, the reasons for the inclusion or exclusion of each of them, and the characteristics identified for each of them according to the taxonomy defined in this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AIS	Artificial Immune System
AUC	Area Under the (ROC) Curve
CNN	Convolutional Neural Network
DT	Decision Tree
ELM	Extreme Learning Machine
FPR	False Positive Rate
HIDS	Host-based Intrusion Detection System
HMM	Hidden Markov Model
HPC	Hardware Performance Counter
IDS	Intrusion Detection System
IF	Isolation Forest
IoT	Internet of Things
I-IoT	Industrial Internet of Things
IPS	Intrusion Prevention System
kNN	k-Nearest Neighbors
LFC	Locality Frame Count
LSTM	Long Short-Term Memory
MLP	Multi-Layer Perceptron
NDCG	Normalized Discounted Cumulative Gain
NIDS	Network-based Intrusion Detection System
NLP	Natural Language Processing
OC-SVM	One-Class Support Vector Machine
OS	Operating System
PCA	Principal Component Analysis
RF	Random Forest
RNN	Recurrent Neural Network
ROC	Receiver Operating Characteristic
SLR	Systematic Literature Review
SOC	Security Operation Center
SVD	Singular Value Decomposition
SVM	Support Vector Machine
TF-IDF	Term Frequency-Inverse Document Frequency
TPR	True Positive Rate

Appendix A. Applied Search Strings

Referenced in Section 3.2.1.

Scopus:

TITLE-ABS-KEY(
(hids OR ((host OR "host-based" OR "host based") AND ((intrusion OR anomaly OR misuse OR malware) AND detection) OR ids)) AND ("system call" OR syscall)
)

IEEE Xplore:

("All Metadata":hids OR (("All Metadata":host OR "All Metadata":"host-based" OR "All Metadata":"host based") AND (("All Metadata":intrusion OR "All Metadata":anomaly OR "All Metadata":misuse OR "All Metadata":malware) AND "All Metadata":detection) OR "All Metadata":ids)) AND ("All Metadata":"system call" OR "All Metadata":syscall)

The ACM Guide to Computing Literature:

(hids OR ((host OR "host-based" OR "host based") AND ((intrusion OR anomaly OR misuse OR malware) AND detection) OR ids)) AND ("system call" OR syscall)

Appendix B. Number of Publications Involved in Each Category of System Call-Based IDS Taxonomy

Referenced in Section 4.2.1.

Figure A1. Number of publications involved in each category of system call-based IDS taxonomy, illustrated by a Sankey diagram; (a) Detection type, Learning paradigm, Trained on attacks? (b) Trained on attacks? Data granularity, Features, Feature reduction, Classification method; (c) Classification method, Collaborative, Additional information used.

Appendix C. ROC Curves of Reproduced Score Methods

Referenced in Section 5.

Figure A2. ROC curves of reproduced score methods on the ADFA-LD dataset.

Figure A3. ROC curves of reproduced score methods on the NGIDS-DS dataset. *: The 5 executions on the NGIDS-DS dataset could not be obtained due to excessive training, validation and testing time, exceeding 60 days per iteration.

Appendix D. Confusion Matrices of Reproduced Prediction Methods

Referenced in Section 5.

Figure A4. Confusion matrices of reproduced prediction methods on the ADFA-LD dataset.

Figure A5. Confusion matrices of reproduced prediction methods on the NGIDS-DS dataset.

References

Crowdstrike. 2024 Global Threat Report; Crowdstrike: Austin, TX, USA, 2024.
Novikava, A. Cybersecurity Statistics 2024: Key Insights and Numbers (NordLayer). Available online: https://nordlayer.com/blog/cybersecurity-statistics-of-2024/ (accessed on 1 June 2026).
Symantec. The 2024 Ransomware Threat Landscape; Symantec: San Jose, CA, USA, 2023.
Microsoft Threat Intelligence. Microsoft Digital Defense Report 2024; Microsoft: Redmond, WA, USA, 2024.
Shirey, R.W. Internet Security Glossary, Version 2. RFC 4949. 2007. Available online: https://www.rfc-editor.org/info/rfc4949/ (accessed on 1 June 2026).
Agence Nationale de la Sécurité des Systèmes d’Information (ANSSI) CyberDico de l’ANSSI. 2024. Available online: https://cyber.gouv.fr/cyberdico/ (accessed on 1 June 2026).
Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of Intrusion Detection Systems: Techniques, Datasets and Challenges. Cybersecurity 2019, 2, 20.
Denning, D.E. An intrusion-detection model. IEEE Trans. Softw. Eng. 1987, SE-13, 222–232.
Lunt, T. Detecting intruders in computer systems. In Proceedings of the 1993 Conference on Auditing and Computer Technology; 1993; Volume 61. Available online: https://www.csl.sri.com/papers/canada93/ (accessed on 1 June 2026).
Forrest, S.; Hofmeyr, S.; Somayaji, A.; Longstaff, T. A Sense of Self for Unix Processes. In Proceedings of the IEEE Symposium on Security and Privacy; IEEE: Piscataway, NJ, USA, 1996; pp. 120–128.
Hofmeyr, S.A.; Forrest, S.; Somayaji, A. Intrusion Detection Using Sequences of System Calls. J. Comput. Secur. 1998, 6, 151–180.
Sworna, Z.T.; Mousavi, Z.; Babar, M.A. NLP Methods in Host-Based Intrusion Detection Systems: A Systematic Review and Future Directions. J. Netw. Comput. Appl. 2023, 220, 103761.
Khandelwal, P.; Likhar, P.; Yadav, R.S. Machine Learning Methods Leveraging ADFA-LD Dataset for Anomaly Detection in Linux Host Systems. In Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT); IEEE: Piscataway, NJ, USA, 2022; pp. 1–8.
Liu, M.; Xue, Z.; Xu, X.; Zhong, C.; Chen, J. Host-Based Intrusion Detection System with System Calls: Review and Future Trends. ACM Comput. Surv. 2018, 51, 98.
Bridges, R.A.; Glass-Vanderlan, T.R.; Iannacone, M.D.; Vincent, M.S.; Chen, Q.G. A Survey of Intrusion Detection Systems Leveraging Host Data. ACM Comput. Surv. 2020, 52, 128.
Martins, I.; Resende, J.S.; Sousa, P.R.; Silva, S.; Antunes, L.; Gama, J. Host-Based IDS: A Review and Open Issues of an Anomaly Detection System in IoT. Future Gener. Comput. Syst. 2022, 133, 95–113.
Creech, G.; Hu, J. Generation of a New IDS Test Dataset: Time to Retire the KDD Collection. In Proceedings of the 2013 IEEE Wireless Communications and Networking Conference (WCNC) Shanghai, China; IEEE: Piscataway, NJ, USA, 2013; pp. 4487–4492.
Satilmiş, H.; Akleylek, S.; Tok, Z.Y. A Systematic Literature Review on Host-Based Intrusion Detection Systems. IEEE Access Pract. Innov. Open Solut. 2024, 12, 27237–27266.
Kitchenham, B.; Charters, S.; Budgen, D.; Brereton, P.; Turner, M.; Linkman, S.; Jørgensen, M.; Mendes, E.; Visaggio, G. Guidelines for Performing Systematic Literature Reviews in Software Engineering. 2007. Available online: https://legacyfileshare.elsevier.com/promis_misc/525444systematicreviewsguide.pdf (accessed on 1 June 2026).
Al-Asli, M.; Ghaleb, T.A. Review of Signature-based Techniques in Antivirus Products. In Proceedings of the 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
Huertas Celdrán, A.; Sánchez Sánchez, P.M.; Feng, C.; Bovet, G.; Pérez, G.M.; Stiller, B. Privacy-Preserving and Syscall-Based Intrusion Detection System for IoT Spectrum Sensors Affected by Data Falsification Attacks. IEEE Internet Things J. 2023, 10, 8408–8415.
Holubenko, V.; Silva, P. An Intelligent Mechanism for Monitoring and Detecting Intrusions in IoT Devices. In Proceedings of the 2023 IEEE 24th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Boston, MA, USA; IEEE: Piscataway, NJ, USA, 2023; pp. 470–479.
Holubenko, V.; Gaspar, D.; Leal, R.; Silva, P. Autonomous Intrusion Detection for IoT: A Decentralized and Privacy Preserving Approach. Int. J. Inf. Secur. 2025, 24, 7.
Asaka, M.; Onabuta, T.; Inoue, T.; Okazawa, S.; Goto, S. A New Intrusion Detection Method Based on Discriminant Analysis. IEICE Trans. Inf. Syst. 2001, 84, 570–577.
Pu, S.; Lang, B. An Intrusion Detection Method Based on System Call Temporal Serial Analysis. In Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues; Huang, D.S., Heutte, L., Loog, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4681, pp. 656–666.
Tran, Q.A.; Jiang, F.; Ha, Q.M. Evolving Block-Based Neural Network and Field Programmable Gate Arrays for Host-Based Intrusion Detection System. In Proceedings of the 2012 Fourth International Conference on Knowledge and Systems Engineering, Danang, Vietnam; IEEE: Piscataway, NJ, USA, 2012; pp. 86–92.
Win, T.Y.; Tianfield, H.; Mair, Q. Detection of Malware and Kernel-Level Rootkits in Cloud Computing Environments. In Proceedings of the 2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing, New York, NY, USA; IEEE: Piscataway, NJ, USA, 2015; pp. 295–300.
Huseynov, H.; Kourai, K.; Saadawi, T.; Igbe, O. Virtual Machine Introspection for Anomaly-Based Keylogger Detection. In Proceedings of the 2020 IEEE 21st International Conference on High Performance Switching and Routing (HPSR), Newark, NJ, USA; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
Xie, M.; Hu, J. Evaluating Host-Based Anomaly Detection Systems: A Preliminary Analysis of ADFA-LD. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China; IEEE: Piscataway, NJ, USA, 2013; pp. 1711–1716.
Sharafaldin, I.; Ghorbani, A.A. EagleEye: A Novel Visual Anomaly Detection Method. In Proceedings of the 2018 16th Annual Conference on Privacy, Security and Trust (PST), Belfast, Ireland; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
Zhang, X.; Niyaz, Q.; Jahan, F.; Sun, W. Early Detection of Host-based Intrusions in Linux Environment. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA; IEEE: Piscataway, NJ, USA, 2020; pp. 475–479. [Google Scholar] [CrossRef]
Liu, M.; Xue, Z.; He, X. A Unified Host-based Intrusion Detection Framework Using Spark in Cloud. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China; IEEE: Piscataway, NJ, USA, 2020; pp. 97–103. [Google Scholar] [CrossRef]
Subba, B.; Gupta, P. A Tfidfvectorizer and Singular Value Decomposition Based Host Intrusion Detection System Framework for Detecting Anomalous System Processes. Comput. Secur. 2021, 100, 102084. [Google Scholar] [CrossRef]
Melvin, A.A.R.; Kathrine, G.J.W.; Pasupathi, S.; Shanmuganathan, V.; Naganathan, R. An AI Powered System Call Analysis with Bag of Word Approaches for the Detection of Intrusions and Malware in Australian Defence Force Academy and Virtual Machine Monitor Malware Attack Data Set. Expert Syst. 2024, 41, e13029. [Google Scholar] [CrossRef]
He, J.; Tang, C.; Li, W.; Li, T.; Chen, L.; Lan, X. BR-HIDF: An Anti-Sparsity and Effective Host Intrusion Detection Framework Based on Multi-Granularity Feature Extraction. IEEE Trans. Inf. Forensics Secur. 2024, 19, 485–499. [Google Scholar] [CrossRef]
Wagner, C.; Wagener, G.; State, R.; Engel, T. Malware Analysis with Graph Kernels and Support Vector Machines. In Proceedings of the 2009 4th International Conference on Malicious and Unwanted Software (MALWARE), Montreal, QC, Canada; IEEE: Piscataway, NJ, USA, 2009; pp. 63–68. [Google Scholar] [CrossRef]
Nykodym, T.; Skormin, V.; Dolgikh, A.; Antonakos, J. Automatic Functionality Detection in Behavior-Based IDS. In Proceedings of the 2011—MILCOM 2011 Military Communications Conference, Baltimore, MD, USA; IEEE: Piscataway, NJ, USA, 2011; pp. 1302–1307. [Google Scholar] [CrossRef]
Manzoor, E.; Milajerdi, S.M.; Akoglu, L. Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1035–1044. [Google Scholar] [CrossRef]
Pandiaraja, P.; Muthumanickam, K.; Palani Kumar, R. A Graph-Based Model for Discovering Host-Based Hook Attacks. In Smart Technologies in Data Science and Communication; Ogudo, K.A., Saha, S.K., Bhattacharyya, D., Eds.; Springer Nature Singapore: Singapore, 2023; Volume 558, pp. 1–13. [Google Scholar] [CrossRef]
Kolbitsch, C.; Comparetti, P.M.; Kruegel, C.; Kirda, E.; Zhou, X.; Wang, X. Effective and Efficient Malware Detection at the End Host. In Proceedings of the USENIX Security Symposium, Montreal, QC, Canada, 10–14 August 2009; Volume 4, pp. 351–366. [Google Scholar]
Lu, H.; Wang, X.; Zhao, B.; Wang, F.; Su, J. ENDMal: An Anti-Obfuscation and Collaborative Malware Detection System Using Syscall Sequences. Math. Comput. Model. 2013, 58, 1140–1154. [Google Scholar] [CrossRef]
Muthumanickam, K.; Ilavarasan, E. Optimizing Detection of Malware Attacks through Graph-Based Approach. In Proceedings of the 2017 International Conference on Technical Advancements in Computers and Communications (ICTACC), Melmaurvathur, India; IEEE: Piscataway, NJ, USA, 2017; pp. 87–91. [Google Scholar] [CrossRef]
El Khairi, A.; Caselli, M.; Knierim, C.; Peter, A.; Continella, A. Contextualizing System Calls in Containers for Anomaly-Based Intrusion Detection. In Proceedings of the 2022 on Cloud Computing Security Workshop, Los Angeles, CA, USA; Association for Computing Machinery: New York, NY, USA, 2022; pp. 9–21. [Google Scholar] [CrossRef]
Chysi, A.; Nikolopoulos, S.D.; Polenakis, I. Detection and Classification of Malicious Software Utilizing Max-Flows between System-Call Groups. J. Comput. Virol. Hacking Tech. 2022, 19, 97–123. [Google Scholar] [CrossRef]
Guo, P. Intrusion Detection Based on Complete System Call Information. In Proceedings of the 2024 International Conference on Digital Society and Artificial Intelligence, Qingdao, China; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–5. [Google Scholar] [CrossRef]
Araujo, I.; Vieira, M. Enhancing Intrusion Detection in Containerized Services: Assessing Machine Learning Models and an Advanced Representation for System Call Data. Comput. Secur. 2025, 154, 104438. [Google Scholar] [CrossRef]
Irshad, H.; Ciocarlie, G.; Gehani, A.; Yegneswaran, V.; Lee, K.H.; Patel, J.; Jha, S.; Kwon, Y.; Xu, D.; Zhang, X. TRACE: Enterprise-Wide Provenance Tracking for Real-Time APT Detection. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4363–4376. [Google Scholar] [CrossRef]
Wang, S.; Wang, Z.; Zhou, T.; Sun, H.; Yin, X.; Han, D.; Zhang, H.; Shi, X.; Yang, J. THREATRACE: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3972–3987. [Google Scholar] [CrossRef]
Kolosnjaji, B.; Zarras, A.; Webster, G.; Eckert, C. Deep Learning for Classification of Malware System Call Sequences. In AI 2016: Advances in Artificial Intelligence; Kang, B.H., Bai, Q., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9992, pp. 137–149. [Google Scholar] [CrossRef]
Čeponis, D.; Goranin, N. Evaluation of Deep Learning Methods Efficiency for Malicious and Benign System Calls Classification on the AWSCTD. Secur. Commun. Netw. 2019, 2019, 2317976. [Google Scholar] [CrossRef]
Seo, J.; Bang, I.; You, J.; Cho, Y.; Paek, Y. SBGen: A Framework to Efficiently Supply Runtime Information for a Learning-Based HIDS for Multiple Virtual Machines. IEEE Access 2020, 8, 225356–225369. [Google Scholar] [CrossRef]
Wunderlich, S.; Ring, M.; Landes, D.; Hotho, A. Comparison of System Call Representations for Intrusion Detection. In Proceedings of the International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on European Transnational Education (ICEUTE 2019); Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J.A., Quintián, H., Corchado, E., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 951, pp. 14–24. [Google Scholar] [CrossRef]
Bhardwaj, R.; Noferesti, M.; Janecek, M.; Ezzati-Jivan, N. EMD-SCS: A Dynamic Behavioral Approach for Early Malware Detection with Sonification of System Call Sequences. In Proceedings of the 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, UK; IEEE: Piscataway, NJ, USA, 2023; pp. 1728–1737. [Google Scholar] [CrossRef]
Chawla, A.; Lee, B.; Fallon, S.; Jacob, P. Host Based Intrusion Detection System with Combined CNN/RNN Model. In ECML PKDD 2018 Workshops; Alzate, C., Monreale, A., Assem, H., Bifet, A., Buda, T.S., Caglayan, B., Drury, B., García-Martín, E., Gavaldà, R., Koprinska, I., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2019; Volume 11329, pp. 149–158. [Google Scholar] [CrossRef]
Gantikow, H.; Zohner, T.; Reich, C. Container Anomaly Detection Using Neural Networks Analyzing System Calls. In Proceedings of the 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Västerås, Sweden; IEEE: Piscataway, NJ, USA, 2020; pp. 408–412. [Google Scholar] [CrossRef]
Ring, J.H.; Van Oort, C.M.; Durst, S.; White, V.; Near, J.P.; Skalka, C. Methods for Host-based Intrusion Detection with Deep Learning. Digit. Threat. Res. Pract. 2021, 2, 26. [Google Scholar] [CrossRef]
Zhang, Y.; Luo, S.; Pan, L.; Zhang, H. Syscall-BSEM: Behavioral Semantics Enhancement Method of System Call Sequence for High Accurate and Robust Host Intrusion Detection. Future Gener. Comput. Syst. 2021, 125, 112–126. [Google Scholar] [CrossRef]
Lu, Y.; Teng, S. Application of Sequence Embedding in Host-based Intrusion Detection System. In Proceedings of the 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Dalian, China; IEEE: Piscataway, NJ, USA, 2021; pp. 434–439. [Google Scholar] [CrossRef]
Fournier, Q.; Aloise, D.; Azhari, S.V.; Tetreault, F. On Improving Deep Learning Trace Analysis with System Call Arguments. In Proceedings of the 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Madrid, Spain; IEEE: Piscataway, NJ, USA, 2021; pp. 120–130. [Google Scholar] [CrossRef]
Wan, B.; He, Y.; Liu, X.; Wang, S.; Qian, Y. Host Intrusion Detection Method Based on Short Sequence of System Call. In Proceedings of the 2023 10th International Conference on Dependable Systems and Their Applications (DSA), Tokyo, Japan; IEEE: Piscataway, NJ, USA, 2023; pp. 312–322. [Google Scholar] [CrossRef]
Baksi, R.P.; Nalka, V.; Upadhyaya, S. APT Detection of Ransomware—An Approach to Detect Advanced Persistent Threats Using System Call Information. In Proceedings of the 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, UK; IEEE: Piscataway, NJ, USA, 2023; pp. 1621–1630. [Google Scholar] [CrossRef]
Kim, Y.; Hong, S.Y.; Park, S.; Kim, H.K. Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection. IEEE Access 2025, 13, 15346–15362. [Google Scholar] [CrossRef]
Ye, J.; Yan, M.; Wu, S.; Tan, J.; Wu, J. U-SCAD: An Unsupervised Method of System Call-Driven Anomaly Detection for Containerized Edge Clouds. Future Internet 2025, 17, 218. [Google Scholar] [CrossRef]
Murtaza, S.S.; Khreich, W.; Hamou-Lhadj, A.; Couture, M. A Host-Based Anomaly Detection Approach by Representing System Calls as States of Kernel Modules. In Proceedings of the 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE), Pasadena, CA, USA; IEEE: Piscataway, NJ, USA, 2013; pp. 431–440. [Google Scholar] [CrossRef]
Murtaza, S.S.; Khreich, W.; Hamou-Lhadj, A.; Gagnon, S. A Trace Abstraction Approach for Host-Based Anomaly Detection. In Proceedings of the 2015 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Verona, NY, USA; IEEE: Piscataway, NJ, USA, 2015; pp. 1–8. [Google Scholar] [CrossRef]
Lee, T.H.; Huang, H.Y.; Juang, C. A High-Performance Deep Learning Architecture for Host-based Intrusion Detection System. In Proceedings of the 2020 IEEE Region 10 Conference (TENCON), Osaka, Japan; IEEE: Piscataway, NJ, USA, 2020; pp. 1198–1202. [Google Scholar] [CrossRef]
Vyšniūnas, T.; Čeponis, D.; Goranin, N.; Čenys, A. Risk-Based System-Call Sequence Grouping Method for Malware Intrusion Detection. Electronics 2024, 13, 206. [Google Scholar] [CrossRef]
Yin, J.; Ishikawa, Y.; Takefusa, A. A Lightweight Monitoring and Anomaly Detection Framework for IoT Devices. In Proceedings of the 2025 IEEE 49th Annual Computers, Software, and Applications Conference (COMPSAC), Toronto, ON, Canada; IEEE: Piscataway, NJ, USA, 2025; pp. 1184–1193. [Google Scholar] [CrossRef]
Cho, S.-B. Incorporating Soft Computing Techniques into a Probabilistic Intrusion Detection System. IEEE Trans. Syst. Man. Cybern. Part C (Appl. Rev.) 2002, 32, 154–160. [Google Scholar] [CrossRef]
Maggi, F.; Matteucci, M.; Zanero, S. Detecting Intrusions through System Call Sequence and Argument Analysis. IEEE Trans. Dependable Secur. Comput. 2010, 7, 381–395. [Google Scholar] [CrossRef]
Koucham, O.; Rachidi, T.; Assem, N. Host Intrusion Detection Using System Call Argument-Based Clustering Combined with Bayesian Classification. In Proceedings of the 2015 SAI Intelligent Systems Conference (IntelliSys), London, UK; IEEE: Piscataway, NJ, USA, 2015; pp. 1010–1016. [Google Scholar] [CrossRef]
Rachidi, T.; Koucham, O.; Assem, N. Combined Data and Execution Flow Host Intrusion Detection Using Machine Learning. In Intelligent Systems and Applications; Bi, Y., Kapoor, S., Bhatia, R., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 650, pp. 427–450. [Google Scholar] [CrossRef]
Yedukondalu, G.; Anand Chandulal, J.; Srinivasa Rao, M. Host-Based Intrusion Detection System Using File Signature Technique. In Innovations in Computer Science and Engineering; Saini, H.S., Sayal, R., Rawat, S.S., Eds.; Springer: Singapore, 2017; Volume 8, pp. 225–232. [Google Scholar] [CrossRef]
Da Costa, V.G.T.; Barbon, S.; Miani, R.S.; Rodrigues, J.J.P.C.; Zarpelao, B.B. Detecting Mobile Botnets through Machine Learning and System Calls Analysis. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar] [CrossRef]
Liu, Z.; Japkowicz, N.; Wang, R.; Cai, Y.; Tang, D.; Cai, X. A Statistical Pattern Based Feature Extraction Method on System Call Traces for Anomaly Detection. Inf. Softw. Technol. 2020, 126, 106348. [Google Scholar] [CrossRef]
Baras, J.S.; Rabi, M. Intrusion Detection with Support Vector Machines and Generative Models. In Information Security; Chan, A.H., Gligor, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2433, pp. 32–47. [Google Scholar] [CrossRef]
Kruegel, C.; Mutz, D.; Valeur, F.; Vigna, G. On the Detection of Anomalous System Call Arguments. In Computer Security—ESORICS 2003; Goos, G., Hartmanis, J., Van Leeuwen, J., Snekkenes, E., Gollmann, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2808, pp. 326–343. [Google Scholar] [CrossRef]
Mutz, D.; Valeur, F.; Vigna, G.; Kruegel, C. Anomalous System Call Detection. ACM Trans. Inf. Syst. Secur. 2006, 9, 61–93. [Google Scholar] [CrossRef]
Joy, J.; John, A. Host Based Attack Detection Using System Calls. In Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, Coimbatore, India; Association for Computing Machinery: New York, NY, USA, 2012; pp. 7–11. [Google Scholar] [CrossRef]
Marschalek, S.; Luh, R.; Kaiser, M.; Schrittwieser, S. Classifying Malicious System Behavior Using Event Propagation Trees. In Proceedings of the 17th International Conference on Information Integration and Web-Based Applications & Services, Brussels, Belgium; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1–10. [Google Scholar] [CrossRef]
Haider, W.; Hu, J.; Xie, M. Towards Reliable Data Feature Retrieval and Decision Engine in Host-Based Anomaly Detection Systems. In Proceedings of the 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), Auckland, New Zealand; IEEE: Piscataway, NJ, USA, 2015; pp. 513–517. [Google Scholar] [CrossRef]
Haider, W.; Hu, J.; Yu, X.; Xie, Y. Integer Data Zero-Watermark Assisted System Calls Abstraction and Normalization for Host Based Anomaly Detection Systems. In Proceedings of the 2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing, New York, NY, USA; IEEE: Piscataway, NJ, USA, 2015; pp. 349–355. [Google Scholar] [CrossRef]
Wüchner, T.; Ochoa, M.; Golagha, M.; Srivastava, G.; Schreck, T.; Pretschner, A. MalFlow: Identification of C&C Servers through Host-Based Data Flow Profiling. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy; Association for Computing Machinery: New York, NY, USA, 2016; pp. 2087–2094. [Google Scholar] [CrossRef]
Haider, W.; Moustafa, N.; Keshk, M.; Fernandez, A.; Choo, K.K.R.; Wahab, A. FGMC-HADS: Fuzzy Gaussian Mixture-Based Correntropy Models for Detecting Zero-Day Attacks from Linux Systems. Comput. Secur. 2020, 96, 101906. [Google Scholar] [CrossRef]
Cha, B.; Park, K.; Seo, J. Neural Network Techniques for Host Anomaly Intrusion Detection Using Fixed Pattern Transformation. In Computational Science and Its Applications—ICCSA 2005; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3481, pp. 254–263. [Google Scholar] [CrossRef]
Chung, M.; Cho, J.; Moon, J. An Effective Denial of Service Detection Method Using Kernel Based Data. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Cyber Security, Nashville, TN, USA; IEEE: Piscataway, NJ, USA, 2009; pp. 9–12. [Google Scholar] [CrossRef]
Xie, M.; Hu, J.; Yu, X.; Chang, E. Evaluating Host-Based Anomaly Detection Systems: Application of the Frequency-Based Algorithms to ADFA-LD. In Network and System Security; Au, M.H., Carminati, B., Kuo, C.C.J., Eds.; Springer International Publishing: Cham, Switzerland, 2014; Volume 8792, pp. 542–549. [Google Scholar] [CrossRef]
Choy, J.; Cho, S.B. Anomaly Detection of Computer Usage Using Artificial Intelligence Techniques. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence; Springer: Cham, Switzerland, 2000; pp. 31–43. [Google Scholar]
Khater, B.S.; Abdul Wahab, A.W.; Idris, M.Y.I.; Hussain, M.A.; Ibrahim, A.A.; Amin, M.A.; Shehadeh, H.A. Classifier Performance Evaluation for Lightweight IDS Using Fog Computing in IoT Security. Electronics 2021, 10, 1633. [Google Scholar] [CrossRef]
Chhaybi, A.; Lazaar, S. A Novel Wavelet-Based Model For Android Malware Detection Utilizing System Calls Features. J. Netw. Syst. Manag. 2025, 33, 58. [Google Scholar] [CrossRef]
Gyamfi, N.K.; Goranin, N. A Classical and Hybrid Machine Learning Model for Host-Based Intrusion Systems. In Proceedings of the Data Analytics and Management; Swaroop, A., Virdee, B., Correia, S.D., Polkowski, Z., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2026; Volume 1603, pp. 34–46. [Google Scholar] [CrossRef]
Canzanese, R.; Mancoridis, S.; Kam, M. Run-Time Classification of Malicious Processes Using System Call Analysis. In Proceedings of the 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, PR, USA; IEEE: Piscataway, NJ, USA, 2015; pp. 21–28. [Google Scholar] [CrossRef]
Liao, X.; Wang, C.; Chen, W. Anomaly Detection of System Call Sequence Based on Dynamic Features and Relaxed-SVM. Secur. Commun. Netw. 2022, 2022, 6401316. [Google Scholar] [CrossRef]
Canzanese, R.; Mancoridis, S.; Kam, M. System Call-Based Detection of Malicious Processes. In Proceedings of the 2015 IEEE International Conference on Software Quality, Reliability and Security, Vancouver, BC, Canada; IEEE: Piscataway, NJ, USA, 2015; pp. 119–124. [Google Scholar] [CrossRef]
Che, Z.; Ji, X. An Efficient Intrusion Detection Approach Based on Hidden Markov Model and Rough Set. In Proceedings of the 2010 International Conference on Machine Vision and Human-Machine Interface, Kaifeng, China; IEEE: Piscataway, NJ, USA, 2010; pp. 476–479. [Google Scholar] [CrossRef]
Sufatrio; Yap, R.H.C. Improving Host-Based IDS with Argument Abstraction to Prevent Mimicry Attacks. In Recent Advances in Intrusion Detection; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3858, pp. 146–164. [Google Scholar] [CrossRef]
Amer, S.H.; Hamilton, J.A. Investigating Intrusion Detection Systems That Use Trails of System Calls. In Proceedings of the 2008 International Symposium on Performance Evaluation of Computer and Telecommunication Systems; IEEE: Piscataway, NJ, USA, 2008; pp. 377–384. [Google Scholar]
Tian, X.; Cheng, X.; Duan, M.; Liao, R.; Chen, H.; Chen, X. Network Intrusion Detection Based on System Calls and Data Mining. Front. Comput. Sci. China 2010, 4, 522–528. [Google Scholar] [CrossRef]
Jewell, B.; Beaver, J. Host-Based Data Exfiltration Detection via System Call Sequences. In Proceedings of the ICIW2011—6th International Conference on Information Warfare and Security: ICIW; Academic Conferences Limited: Reading, UK, 2011; p. 134. [Google Scholar]
Alarifi, S.S.; Wolthusen, S.D. Detecting Anomalies in IaaS Environments through Virtual Machine Host System Call Analysis. In Proceedings of the 2012 International Conference for Internet Technology and Secured Transactions; IEEE: Piscataway, NJ, USA, 2012; pp. 211–218. [Google Scholar]
Milea, N.A.; Khoo, S.C.; Lo, D.; Pop, C. NORT: Runtime Anomaly-Based Monitoring of Malicious Behavior for Windows. In Runtime Verification; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7186, pp. 115–130. [Google Scholar] [CrossRef]
Gupta, S.; Kumar, P. An Immediate System Call Sequence Based Approach for Detecting Malicious Program Executions in Cloud Environment. Wirel. Pers. Commun. 2015, 81, 405–425. [Google Scholar] [CrossRef]
Marteau, P.F. Sequence Covering for Efficient Host-Based Intrusion Detection. IEEE Trans. Inf. Forensics Secur. 2019, 14, 994–1006. [Google Scholar] [CrossRef]
Nuansri, N.; Singh, S.; Dillon, T. A Process State-Transition Analysis and Its Application to Intrusion Detection. In Proceedings of the 15th Annual Computer Security Applications Conference (ACSAC’99), Phoenix, AZ, USA; IEEE: Piscataway, NJ, USA, 1999; pp. 378–387. [Google Scholar] [CrossRef]
Bowen, T.; Chee, D.; Segal, M.; Sekar, R.; Shanbhag, T.; Uppuluri, P. Building Survivable Systems: An Integrated Approach Based on Intrusion Detection and Damage Containment. In Proceedings of the DARPA Information Survivability Conference and Exposition, DISCEX’00, Hilton Head, SC, USA; IEEE: Piscataway, NJ, USA, 1999; Volume 2, pp. 84–99. [Google Scholar] [CrossRef]
Chari, S.N.; Cheng, P.C. BlueBoX: A Policy-Driven, Host-Based Intrusion Detection System. ACM Trans. Inf. Syst. Secur. 2003, 6, 173–200. [Google Scholar] [CrossRef]
Provos, N. Improving Host Security with System Call Policies. In Proceedings of the USENIX Security Symposium, Washington, DC, USA, 4–8 August 2003; pp. 257–272. [Google Scholar]
Battistoni, R.; Gabrielli, E.; Mancini, L.V. A Host Intrusion Prevention System for Windows Operating Systems. In Computer Security—ESORICS 2004; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3193, pp. 352–368. [Google Scholar] [CrossRef]
Tandon, G.; Chan, P.K. On the Learning of System Call Attributes for Host-Based Anomaly Detection. Int. J. Artif. Intell. Tools 2006, 15, 875–892. [Google Scholar] [CrossRef]
Li, P.; Park, H.; Gao, D.; Fu, J. Bridging the Gap between Data-Flow and Control-Flow Analysis for Anomaly Detection. In Proceedings of the 2008 Annual Computer Security Applications Conference (ACSAC), Anaheim, CA, USA; IEEE: Piscataway, NJ, USA, 2008; pp. 392–401. [Google Scholar] [CrossRef]
Mohanty, H.; Swamy, M.V.; Thilak, P.; Ramaswamy, S. Secured Networking by Sandboxing LINUX 2.6. In Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, USA; IEEE: Piscataway, NJ, USA, 2009; pp. 3669–3674. [Google Scholar] [CrossRef]
Lanzi, A.; Balzarotti, D.; Kruegel, C.; Christodorescu, M.; Kirda, E. AccessMiner: Using System-Centric Models for Malware Protection. In Proceedings of the 17th ACM Conference on Computer and Communications Security, Chicago, IL, USA; Association for Computing Machinery: New York, NY, USA, 2010; pp. 399–412. [Google Scholar] [CrossRef]
Liu, A.; Jiang, X.; Jin, J.; Mao, F.; Chen, J. Enhancing System-Called-Based Intrusion Detection with Protocol Context. In Proceedings of the IARIA SECURWARE, Laurent du Var, France, 21–27 August 2011. [Google Scholar]
Ming, J.; Zhang, H.; Gao, D. Towards Ground Truthing Observations in Gray-Box Anomaly Detection. In Proceedings of the 2011 5th International Conference on Network and System Security, Milan, Italy; IEEE: Piscataway, NJ, USA, 2011; pp. 25–32. [Google Scholar] [CrossRef]
Patanaik, C.K.; Barbhuiya, F.A.; Nandi, S. Obfuscated Malware Detection Using API Call Dependency. In Proceedings of the First International Conference on Security of Internet of Things, Kollam, India; Association for Computing Machinery: New York, NY, USA, 2012; pp. 185–193. [Google Scholar] [CrossRef]
Sprabery, R.; Estrada, Z.J.; Kalbarczyk, Z.; Iyer, R.; Bobba, R.B.; Campbell, R. Trustworthy Services Built on Event-Based Probing for Layered Defense. In Proceedings of the 2017 IEEE International Conference on Cloud Engineering (IC2E), Vancouver, BC, Canada; IEEE: Piscataway, NJ, USA, 2017; pp. 215–225. [Google Scholar] [CrossRef]
Sekeh, M.A.; Maarof, M.A.B. Fuzzy Intrusion Detection System via Data Mining Technique with Sequences of System Calls. In Proceedings of the 2009 Fifth International Conference on Information Assurance and Security, Xi’an, China; IEEE: Piscataway, NJ, USA, 2009; pp. 154–157. [Google Scholar] [CrossRef]
Anderson, B.; Quist, D.; Lane, T. Detecting Code Injection Attacks in Internet Explorer. In Proceedings of the 2011 IEEE 35th Annual Computer Software and Applications Conference Workshops, Munich, Germany; IEEE: Piscataway, NJ, USA, 2011; pp. 90–95. [Google Scholar] [CrossRef]
Elgraini, M.T.; Assem, N.; Rachidi, T. Host Intrusion Detection for Long Stealthy System Call Sequences. In Proceedings of the 2012 Colloquium in Information Science and Technology, Fez, Morocco; IEEE: Piscataway, NJ, USA, 2012; pp. 96–100. [Google Scholar] [CrossRef]
Sha, W.; Zhu, Y.; Chen, M.; Huang, T. Statistical Learning for Anomaly Detection in Cloud Server Systems: A Multi-Order Markov Chain Framework. IEEE Trans. Cloud Comput. 2018, 6, 401–413. [Google Scholar] [CrossRef]
Shamim, N.; Asim, M.; Baker, T.; Awad, A.I. Efficient Approach for Anomaly Detection in IoT Using System Calls. Sensors 2023, 23, 652. [Google Scholar] [CrossRef]
Xu, J.; Shelton, C.R. Intrusion Detection Using Continuous Time Bayesian Networks. J. Artif. Intell. Res. 2010, 39, 745–774. [Google Scholar] [CrossRef]
Yeung, D.Y.; Ding, Y. Host-Based Intrusion Detection Using Dynamic and Static Behavioral Models. Pattern Recognit. 2003, 36, 229–243. [Google Scholar] [CrossRef]
Qian, Q.; Xin, M. Research on Hidden Markov Model for System Call Anomaly Detection. In Intelligence and Security Informatics; Yang, C.C., Zeng, D., Chau, M., Chang, K., Yang, Q., Cheng, X., Wang, J., Wang, F.Y., Chen, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4430, pp. 152–159. [Google Scholar] [CrossRef]
Hu, J.; Yu, X.; Qiu, D.; Chen, H.H. A Simple and Efficient Hidden Markov Model Scheme for Host-Based Anomaly Intrusion Detection. IEEE Netw. 2009, 23, 42–47. [Google Scholar] [CrossRef]
Gao, D.; Reiter, M.; Song, D. Beyond Output Voting: Detecting Compromised Replicas Using HMM-Based Behavioral Distance. IEEE Trans. Dependable Secur. Comput. 2009, 6, 96–110. [Google Scholar] [CrossRef]
Alarifi, S.; Wolthusen, S. Anomaly Detection for Ephemeral Cloud IaaS Virtual Machines. In Network and System Security; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7873, pp. 321–335. [Google Scholar] [CrossRef]
Byrnes, J.; Hoang, T.; Mehta, N.N.; Cheng, Y. A Modern Implementation of System Call Sequence Based Host-based Intrusion Detection Systems. In Proceedings of the 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Atlanta, GA, USA; IEEE: Piscataway, NJ, USA, 2020; pp. 218–225. [Google Scholar] [CrossRef]
Zhengdao, Z.; Zhumiao, P.; Zhiping, Z. The Study of Intrusion Prediction Based on HsMM. In Proceedings of the 2008 IEEE Asia-Pacific Services Computing Conference, Yilan, Taiwan; IEEE: Piscataway, NJ, USA, 2008; pp. 1358–1363. [Google Scholar] [CrossRef]
Tokhtabayev, A.G.; Skormin, V.A. Non-Stationary Markov Models and Anomaly Propagation Analysis in IDS. In Proceedings of the Third International Symposium on Information Assurance and Security, Manchester, UK; IEEE: Piscataway, NJ, USA, 2007; pp. 203–208. [Google Scholar] [CrossRef]
Feng, L.; Wang, W.; Zhu, L.; Zhang, Y. Predicting Intrusion Goal Using Dynamic Bayesian Network with Transfer Probability Estimation. J. Netw. Comput. Appl. 2009, 32, 721–732. [Google Scholar] [CrossRef]
Asaka, M.; Onabuta, T.; Inoue, T.; Okazawa, S.; Goto, S. Remote Attack Detection Method in IDA: MLSI-based Intrusion Detection with Discriminant Analysis. Electron. Commun. Jpn. (Part I Commun.) 2003, 86, 50–62. [Google Scholar] [CrossRef]
Shin, Y.; Kim, K. Comparison of Anomaly Detection Accuracy of Host-based Intrusion Detection Systems Based on Different Machine Learning Algorithms. Int. J. Adv. Comput. Sci. Appl. 2020, 11. [Google Scholar] [CrossRef]
Wang, M.; Zhang, C.; Yu, J. Native API Based Windows Anomaly Intrusion Detection Method Using SVM. In Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC’06), Taichung, Taiwan; IEEE: Piscataway, NJ, USA, 2006; Volume 1, pp. 514–519. [Google Scholar] [CrossRef]
Wang, X.; Yu, W.; Champion, A.; Fu, X.; Xuan, D. Detecting Worms via Mining Dynamic Program Execution. In Proceedings of the 2007 Third International Conference on Security and Privacy in Communications Networks and the Workshops-SecureComm 2007, Nice, France; IEEE: Piscataway, NJ, USA, 2007; pp. 412–421. [Google Scholar] [CrossRef]
Simon, C.K.; Sochenkov, I.V. Evaluating Host-Based Intrusion Detection on the ADFA-WD and ADFA-WD:SAA Datasets. In Proceedings of the II International Scientific Conference “Convergent Cognitive Information Technologies” (Convergent’2017), Moscow, Russia, 24–26 November 2017. [Google Scholar]
Aldribi, A.; An-Nosaian, M. Feature Extraction Techniques for Malicious System Calls. In Proceedings of the 2025 IEEE 6th International Conference on Pattern Recognition and Machine Learning (PRML), Chongqing, China; IEEE: Piscataway, NJ, USA, 2025; pp. 200–208. [Google Scholar] [CrossRef]
Khreich, W.; Khosravifar, B.; Hamou-Lhadj, A.; Talhi, C. An Anomaly Detection System Based on Variable N-gram Features and One-Class SVM. Inf. Softw. Technol. 2017, 91, 186–197. [Google Scholar] [CrossRef]
Kishore, P.; Barisal, S.K.; Mohapatra, D.P. An Incremental Malware Detection Model for Meta-Feature API and System Call Sequence. In Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, Sofia, Bulgaria, 6–9 September 2020; pp. 629–638. [Google Scholar] [CrossRef]
Fan, C.I.; Hsiao, H.W.; Chou, C.H.; Tseng, Y.F. Malware Detection Systems Based on API Log Data Mining. In Proceedings of the 2015 IEEE 39th Annual Computer Software and Applications Conference, Taichung, Taiwan; IEEE: Piscataway, NJ, USA, 2015; pp. 255–260. [Google Scholar] [CrossRef]
Subba, B.; Biswas, S.; Karmakar, S. Host Based Intrusion Detection System Using Frequency Analysis of N-Gram Terms. In Proceedings of the TENCON 2017—2017 IEEE Region 10 Conference, Penang, Malaysia; IEEE: Piscataway, NJ, USA, 2017; pp. 2006–2011. [Google Scholar] [CrossRef]
Rauf, M.A.A.A.; Asraf, S.M.H.; Idrus, S.Z.S. Malware Behaviour Analysis and Classification via Windows DLL and System Call. J. Phys. Conf. Ser. 2020, 1529, 022097. [Google Scholar] [CrossRef]
Cavalcanti, M.; Inacio, P.; Freire, M. Performance Evaluation of Container-Level Anomaly-Based Intrusion Detection Systems for Multi-Tenant Applications Using Machine Learning Algorithms. In Proceedings of the 16th International Conference on Availability, Reliability and Security, Vienna, Austria; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1–9. [Google Scholar] [CrossRef]
Castanhel, G.R.; Heinrich, T.; Ceschin, F.; Maziero, C. Taking a Peek: An Evaluation of Anomaly Detection Using System Calls for Containers. In Proceedings of the 2021 IEEE Symposium on Computers and Communications (ISCC), Athens, Greece; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar] [CrossRef]
Joraviya, N.; Gohil, B.N.; Rao, U.P. Ab-HIDS: An Anomaly-based Host Intrusion Detection System Using Frequency of N-gram System Call Features and Ensemble Learning for Containerized Environment. Concurr. Comput. Pract. Exp. 2024, 36, e8249. [Google Scholar] [CrossRef]
Rawat, S.; Gulati, V.; Pujari, A.K. Frequency- and Ordering-based Similarity Measure for Host-based Intrusion Detection. Inf. Manag. Comput. Secur. 2004, 12, 411–421. [Google Scholar] [CrossRef]
Sharma, A.; Pujari, A.K.; Paliwal, K.K. Intrusion Detection Using Text Processing Techniques with a Kernel Based Similarity Measure. Comput. Secur. 2007, 26, 488–495. [Google Scholar] [CrossRef]
Deshpande, P.; Sharma, S.C.; Peddoju, S.K.; Junaid, S. HIDS: A Host Based Intrusion Detection System for Cloud Computing Environment. Int. J. Syst. Assur. Eng. Manag. 2018, 9, 567–576. [Google Scholar] [CrossRef]
Zhao, Y.; Kuerban, A. MDABP: A Novel Approach to Detect Cross-Architecture IoT Malware Based on PaaS. Sensors 2023, 23, 3060. [Google Scholar] [CrossRef]
Larson, U.E.; Nilsson, D.K.; Jonsson, E.; Lindskog, S. Using System Call Information to Reveal Hidden Attack Manifestations. In Proceedings of the 2009 1st International Workshop on Security and Communication Networks; IEEE: Piscataway, NJ, USA, 2009; pp. 1–8. [Google Scholar]
Yalew, S.D.; Maguire, G.Q.; Haridi, S.; Correia, M. T2Droid: A TrustZone-Based Dynamic Analyser for Android Applications. In Proceedings of the 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia; IEEE: Piscataway, NJ, USA, 2017; pp. 240–247. [Google Scholar] [CrossRef]
Rapaka, A.; Novokhodko, A.; Wunsch, D. Intrusion Detection Using Radial Basis Function Network on Sequences of System Calls. In Proceedings of the International Joint Conference on Neural Networks; IEEE: Piscataway, NJ, USA, 2003; Volume 3, pp. 1820–1825. [Google Scholar]
Ahmed, U.; Masood, A. Host Based Intrusion Detection Using RBF Neural Networks. In Proceedings of the 2009 International Conference on Emerging Technologies, Islamabad, Pakistan; IEEE: Piscataway, NJ, USA, 2009; pp. 48–51. [Google Scholar] [CrossRef]
Salem, M.; Taheri, S.; Yuan, J.S. Anomaly Generation Using Generative Adversarial Networks in Host-Based Intrusion Detection. In Proceedings of the 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA; IEEE: Piscataway, NJ, USA, 2018; pp. 683–687. [Google Scholar] [CrossRef]
Hu, Z.; Liu, L.; Yu, H.; Yu, X. Using Graph Representation in Host-Based Intrusion Detection. Secur. Commun. Netw. 2021, 2021, 6291276. [Google Scholar] [CrossRef]
Frasão, A.; Heinrich, T.; Fulber-Garcia, V.; Will, N.C.; Obelheiro, R.R.; Maziero, C.A. I See Syscalls by the Seashore: An Anomaly-based IDS for Containers Leveraging Sysdig Data. In Proceedings of the 2024 IEEE Symposium on Computers and Communications (ISCC), Paris, France; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
Kim, K. GAN Based Augmentation for Improving Anomaly Detection Accuracy in Host-Based Intrusion Detection Systems. Int. J. Eng. Res. Technol. 2020, 13, 3987. [Google Scholar] [CrossRef]
Joraviya, N.; Gohil, B.N.; Rao, U.P. DL-HIDS: Deep Learning-Based Host Intrusion Detection System Using System Calls-to-Image for Containerized Cloud Environment. J. Supercomput. 2024, 80, 12218–12246. [Google Scholar] [CrossRef]
Melvin, A.A.R.; Kathrine, J.W.; Jeyabose, A.; Cenitta, D. A Deep Learning Model Leveraging Time-Series System Call Data to Detect Malware Attacks in Virtual Machines. Int. J. Comput. Intell. Syst. 2025, 18, 58. [Google Scholar] [CrossRef]
Mishra, P.; Gupta, A.; Aggarwal, P.; Pilli, E.S. vServiceInspector: Introspection-assisted Evolutionary Bag-of-Ngram Approach to Detect Malware in Cloud Servers. Ad Hoc Netw. 2022, 131, 102836. [Google Scholar] [CrossRef]
Luckett, P.; McDonald, J.T.; Dawson, J. Neural Network Analysis of System Call Timing for Rootkit Detection. In Proceedings of the 2016 Cybersecurity Symposium (CYBERSEC), Coeur d’Alene, ID, USA; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar] [CrossRef]
Dymshits, M.; Myara, B.; Tolpin, D. Process Monitoring on Sequences of System Call Count Vectors. In Proceedings of the 2017 International Carnahan Conference on Security Technology (ICCST), Madrid, Spain; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar] [CrossRef]
Park, G.; Kim, J.; Choi, J.; Kim, J. CryptoGuard: Lightweight Hybrid Detection and Response to Host-based Cryptojackers in Linux Cloud Environments. In Proceedings of the 20th ACM Asia Conference on Computer and Communications Security, Hanoi, Vietnam; Association for Computing Machinery: New York, NY, USA, 2025; pp. 1617–1631. [Google Scholar] [CrossRef]
Nair, A.K.; Kumar, S.H.S.; Gupta, D. Androids: Android-Based Intrusion Detection System Using Federated Learning. In Proceedings of the 2025 IEEE International Conference on Information Reuse and Integration and Data Science (IRI), San Jose, CA, USA; IEEE: Piscataway, NJ, USA, 2025; pp. 172–177. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017. [Google Scholar] [CrossRef]
Creech, G.; Hu, J. A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguous and Discontiguous System Call Patterns. IEEE Trans. Comput. 2014, 63, 807–819. [Google Scholar] [CrossRef]
Anandapriya, M.; Lakshmanan, B. Anomaly Based Host Intrusion Detection System Using Semantic Based System Call Patterns. In Proceedings of the 2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India; IEEE: Piscataway, NJ, USA, 2015; pp. 1–4. [Google Scholar] [CrossRef]
Maske, S.A.; Parvat, T.J. Advanced Anomaly Intrusion Detection Technique for Host Based System Using System Call Patterns. In Proceedings of the 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India; IEEE: Piscataway, NJ, USA, 2016; pp. 1–4. [Google Scholar] [CrossRef]
Bertrand Van Ouytsel, C.H.; Dam, K.H.T.; Legay, A. Symbolic Analysis Meets Federated Learning to Enhance Malware Identifier. In Proceedings of the 17th International Conference on Availability, Reliability and Security, Vienna, Austria; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1–10. [Google Scholar] [CrossRef]
Xu, X.; Xie, T. A Reinforcement Learning Approach for Host-Based Intrusion Detection Using Sequences of System Calls. In Advances in Intelligent Computing; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3644, pp. 995–1003. [Google Scholar] [CrossRef]
Xu, X.; Luo, Y. A Kernel-Based Reinforcement Learning Approach to Dynamic Behavior Modeling of Intrusion Detection. In Advances in Neural Networks—ISNN 2007; Liu, D., Fei, S., Hou, Z.G., Zhang, H., Sun, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4491, pp. 455–464. [Google Scholar] [CrossRef]
Xu, X. Sequential Anomaly Detection Based on Temporal-Difference Learning: Principles, Models and Case Studies. Appl. Soft Comput. 2010, 10, 859–867. [Google Scholar] [CrossRef]
Tokhtabayev, A.G.; Skormin, V.A.; Dolgikh, A.M. Detection of Worm Propagation Engines in the System Call Domain Using Colored Petri Nets. In Proceedings of the 2008 IEEE International Performance, Computing and Communications Conference, Austin, TX, USA; IEEE: Piscataway, NJ, USA, 2008; pp. 59–68. [Google Scholar] [CrossRef]
Tokhtabayev, A.; Skormin, V.; Dolgikh, A. Dynamic, Resilient Detection of Complex Malicious Functionalities in the System Call Domain. In Proceedings of the 2010—MILCOM 2010 Military Communications Conference, San Jose, CA, USA; IEEE: Piscataway, NJ, USA, 2010; pp. 1349–1356. [Google Scholar] [CrossRef]
Dolgikh, A.; Nykodym, T.; Skormin, V.; Antonakos, J.; Baimukhamedov, M. Colored Petri Nets as the Enabling Technology in Intrusion Detection Systems. In Proceedings of the 2011—MILCOM 2011 Military Communications Conference, Baltimore, MD, USA; IEEE: Piscataway, NJ, USA, 2011; pp. 1297–1301. [Google Scholar] [CrossRef]
Skormin, V.; Nykodym, T.; Dolgikh, A.; Antonakos, J. Customized Normalcy Profiles for the Detection of Targeted Attacks. In Applications of Evolutionary Computation; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7248, pp. 487–496. [Google Scholar] [CrossRef]
Kim, D.-W.; Yang, J.-W.; Sim, K.-B. Adaptive Intrusion Detection Algorithm Based on Learning Algorithm. In Proceedings of the 30th Annual Conference of IEEE Industrial Electronics Society, 2004, IECON 2004, Busan, Republic of Korea; IEEE: Piscataway, NJ, USA, 2004; Volume 3, pp. 2229–2233. [Google Scholar] [CrossRef]
Ou, C.M.; Ou, C. Immunity-Inspired Host-Based Intrusion Detection Systems. In Proceedings of the 2011 Fifth International Conference on Genetic and Evolutionary Computing, Kitakyushu, Japan; IEEE: Piscataway, NJ, USA, 2011; pp. 283–286. [Google Scholar] [CrossRef]
Ou, C.M. Host-Based Intrusion Detection Systems Adapted from Agent-Based Artificial Immune Systems. Neurocomputing 2012, 88, 78–86. [Google Scholar] [CrossRef]
Ou, C.M.; Ou, C.R.; Wang, Y.T. Agent-Based Artificial Immune Systems (ABAIS) for Intrusion Detections: Inspiration from Danger Theory. In Agent and Multi-Agent Systems in Distributed Systems-Digital Economy and E-Commerce; Hakansson, A., Hartung, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 462, pp. 67–94. [Google Scholar] [CrossRef]
Chasaki, D.; Mansour, C. SDN Security through System Call Learning. In Proceedings of the 2021 11th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar] [CrossRef]
Chaudhari, A.; Gohil, B.; Rao, U.P. A Novel Hybrid Framework for Cloud Intrusion Detection System Using System Call Sequence Analysis. Clust. Comput. 2024, 27, 3753–3769. [Google Scholar] [CrossRef]
Lu, K.; Chen, Z.; Jin, Z.; Guo, J. An Adaptive Real-Time Intrusion Detection System Using Sequences of System Call. In Proceedings of the CCECE 2003—Canadian Conference on Electrical and Computer Engineering, Toward a Caring and Humane Technology (Cat. No.03CH37436); IEEE: Piscataway, NJ, USA, 2003; Volume 2, pp. 789–792. [Google Scholar]
Hoang, X.D.; Hu, J.; Bertok, P. A Multi-Layer Model for Anomaly Intrusion Detection Using Program Sequences of System Calls. In Proceedings of the 11th IEEE International Conference on Networks, ICON2003, Sydney, Australia; IEEE: Piscataway, NJ, USA, 2003; pp. 531–536. [Google Scholar] [CrossRef]
Han, S.J.; Cho, S.B. Combining Multiple Host-Based Detectors Using Decision Tree. In AI 2003: Advances in Artificial Intelligence; Goos, G., Hartmanis, J., Van Leeuwen, J., Gedeon, T.D., Fung, L.C.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2903, pp. 208–220. [Google Scholar] [CrossRef]
Raman, C.V.; Negi, A. A Hybrid Method to Intrusion Detection Systems Using HMM. In Distributed Computing and Internet Technology; Chakraborty, G., Ed.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3816, pp. 389–396. [Google Scholar] [CrossRef]
Gao, D.; Reiter, M.K.; Song, D. Behavioral Distance Measurement Using Hidden Markov Models. In Recent Advances in Intrusion Detection; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4219, pp. 19–40. [Google Scholar] [CrossRef]
Xinguang, P.; Yanyan, Z. Robust Host Anomaly Detector Using Strong Isolation. In Proceedings of the 2008 International Conference on Computer Science and Software Engineering, Wuhan, China; IEEE: Piscataway, NJ, USA, 2008; pp. 575–578. [Google Scholar] [CrossRef]
Tian, X.; Duan, M.; Sun, C.; Li, W. Intrusion Detection Based on System Calls and Homogeneous Markov Chains. J. Syst. Eng. Electron. 2008, 19, 598–605. [Google Scholar] [CrossRef]
Jiang, F.; Frater, M.; Hu, J. A Bio-inspired Host-Based Multi-engine Detection System with Sequential Pattern Recognition. In Proceedings of the 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, Sydney, Australia; IEEE: Piscataway, NJ, USA, 2011; pp. 145–150. [Google Scholar] [CrossRef]
Yolacan, E.N.; Dy, J.G.; Kaeli, D.R. System Call Anomaly Detection Using Multi-HMMs. In Proceedings of the 2014 IEEE Eighth International Conference on Software Security and Reliability-Companion, San Francisco, CA, USA; IEEE: Piscataway, NJ, USA, 2014; pp. 25–30. [Google Scholar] [CrossRef]
Li, Y.H.; Tzeng, Y.R.; Yu, F. VISO: Characterizing Malicious Behaviors of Virtual Machines with Unsupervised Clustering. In Proceedings of the 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom), Vancouver, BC, Canada; IEEE: Piscataway, NJ, USA, 2015; pp. 34–41. [Google Scholar] [CrossRef]
Bin Abbas, M.F.; Prakash, A.; Srikanthan, T. Hierarchical Framework for Runtime Intrusion Detection in Embedded Systems. In Proceedings of the 2019 TRON Symposium (TRONSHOW), Minato, Japan; IEEE: Piscataway, NJ, USA, 2019; pp. 1–9. [Google Scholar] [CrossRef]
Suratkar, S.; Kazi, F.; Gaikwad, R.; Shete, A.; Kabra, R.; Khirsagar, S. Multi Hidden Markov Models for Improved Anomaly Detection Using System Call Analysis. In Proceedings of the 2019 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar] [CrossRef]
Bouzar-Benlabiod, L.; Rubin, S.H.; Belaidi, K.; Haddar, N.E. RNN-VED for Reducing False Positive Alerts in Host-based Anomaly Detection Systems. In Proceedings of the 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), Las Vegas, NV, USA; IEEE: Piscataway, NJ, USA, 2020; pp. 17–24. [Google Scholar] [CrossRef]
Khreich, W.; Granger, E.; Sabourin, R.; Miri, A. Combining Hidden Markov Models for Improved Anomaly Detection. In Proceedings of the 2009 IEEE International Conference on Communications, Dresden, Germany; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6. [Google Scholar] [CrossRef]
Khreich, W.; Granger, E.; Miri, A.; Sabourin, R. Adaptive ROC-based Ensembles of HMMs Applied to Anomaly Detection. Pattern Recognit. 2012, 45, 208–230. [Google Scholar] [CrossRef]
Khreich, W.; Murtaza, S.S.; Hamou-Lhadj, A.; Talhi, C. Combining Heterogeneous Anomaly Detectors for Improved Software Security. J. Syst. Softw. 2018, 137, 415–429. [Google Scholar] [CrossRef]
Malan, D.J.; Smith, M.D. Exploiting Temporal Consistency to Reduce False Positives in Host-Based, Collaborative Detection of Worms. In Proceedings of the 4th ACM Workshop on Recurring Malcode, Alexandria, VA, USA; Association for Computing Machinery: New York, NY, USA, 2006; pp. 25–32. [Google Scholar] [CrossRef]
Kührer, M.; Hoffmann, J.; Holz, T. CloudSylla: Detecting Suspicious System Calls in the Cloud. In Stabilization, Safety, and Security of Distributed Systems; Felber, P., Garg, V., Eds.; Springer International Publishing: Cham, Switzerland, 2014; Volume 8756, pp. 63–77. [Google Scholar] [CrossRef]
Ko, C. Logic Induction of Valid Behavior Specifications for Intrusion Detection. In Proceedings of the 2000 IEEE Symposium on Security and Privacy, S&P 2000, Berkeley, CA, USA; IEEE: Piscataway, NJ, USA, 2000; pp. 142–153. [Google Scholar] [CrossRef]
Leu, F.Y.; Tsai, K.L.; Hsiao, Y.T.; Yang, C.T. An Internal Intrusion Detection and Protection System by Using Data Mining and Forensic Techniques. IEEE Syst. J. 2017, 11, 427–438. [Google Scholar] [CrossRef]
Bhat, M.D.; Pandita, P.A.; Chheda, H.A.; Ramteke, J. Determining User Behaviour Using System Calls To Prevent Internal Intrusions. In Proceedings of the 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India; IEEE: Piscataway, NJ, USA, 2020; pp. 40–45. [Google Scholar] [CrossRef]
Wang, W.; Guan, X.; Zhang, X. Processing of Massive Audit Data Streams for Real-Time Anomaly Intrusion Detection. Comput. Commun. 2008, 31, 58–72. [Google Scholar] [CrossRef]
Massachusetts Institute of Technology. 1998 DARPA Intrusion Detection Evaluation Dataset. Available online: https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset (accessed on 1 June 2026).
Axelsson, S. The base-rate fallacy and the difficulty of intrusion detection. ACM Trans. Inf. Syst. Secur. 2000, 3, 186–205. [Google Scholar] [CrossRef]
Nauman, M.; Azam, N.; Yao, J. A Three-Way Decision Making Approach to Malware Analysis Using Probabilistic Rough Sets. Inf. Sci. 2016, 374, 193–209. [Google Scholar] [CrossRef]
Kashkoush, M.S.; Azab, M.; Attiya, G.; Abed, A.S. Online Smart Disguise: Real-Time Diversification Evading Coresidency-Based Cloud Attacks. Clust. Comput. 2019, 22, 721–736. [Google Scholar] [CrossRef]
Warrender, C.; Forrest, S.; Pearlmutter, B. Detecting Intrusions Using System Calls: Alternative Data Models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344); IEEE: Piscataway, NJ, USA, 1999; pp. 133–145. [Google Scholar] [CrossRef]
Rawat, S.; Gulati, V.P.; Pujari, A.K.; Vemuri, V.R. Intrusion detection using text processing techniques with a binary-weighted cosine metric. J. Inf. Assur. Secur. 2006, 1, 43–50. [Google Scholar]
Oord, A.v.d.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499. [Google Scholar] [CrossRef]
Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar; Moschitti, A., Pang, B., Daelemans, W., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1746–1751. [Google Scholar] [CrossRef]
Schafer, R.W. What Is a Savitzky-Golay Filter? [Lecture Notes]. IEEE Signal Process. Mag. 2011, 28, 111–117. [Google Scholar] [CrossRef]
CVE-2012-0911. 2012. Available online: https://nvd.nist.gov/vuln/detail/CVE-2012-0911 (accessed on 1 June 2026).
Haider, W.; Hu, J.; Slay, J.; Turnbull, B.; Xie, Y. Generating Realistic Intrusion Detection System Dataset Based on Fuzzy Qualitative Modeling. J. Netw. Comput. Appl. 2017, 87, 185–192. [Google Scholar] [CrossRef]
Creech, G. Developing a High-Accuracy Cross Platform Host-Based Intrusion Detection System Capable of Reliably Detecting Zero-Day Attacks. Ph.D. Thesis, UNSW Sydney, Sydney, Australia, 2014. [Google Scholar] [CrossRef]
Hoefler, T.; Alistarh, D.; Ben-Nun, T.; Dryden, N.; Peste, A. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 2021, 22, 1–124. [Google Scholar]
Musa, A.; Kakudi, H.A.; Hassan, M.; Hamada, M.; Umar, U.; Salisu, M.L. Lightweight deep learning models for edge devices—A survey. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2025, 17, 18. [Google Scholar]
Kroah-Hartman, G. (via the Linux Kernel Mailing List Archive). Linux 2.6.38.8. Available online: https://lkml.iu.edu/hypermail/linux/kernel/1106.0/01226.html (accessed on 1 June 2026).
Kroah-Hartman, G. (via the Linux Kernel Mailing List Archive). Linux 3.19.8. Available online: https://lkml.iu.edu/hypermail/linux/kernel/1505.1/01671.html (accessed on 1 June 2026).
Grimmer, M.; Röhling, M.M.; Kreusel, D.; Ganz, S. A Modern and Sophisticated Host Based Intrusion Detection Data Set. IT-Sicherh. Voraussetzung Erfolgreiche Digit. 2019, 11, 135–145. [Google Scholar]
Kenyon, A.; Deka, L.; Elizondo, D. Are Public Intrusion Datasets Fit for Purpose Characterising the State of the Art in Intrusion Event Datasets. Comput. Secur. 2020, 99, 102022. [Google Scholar] [CrossRef]
Wagner, D.; Soto, P. Mimicry Attacks on Host-Based Intrusion Detection Systems. In Proceedings of the 9th ACM Conference on Computer and Communications Security, Washington, DC, USA; Association for Computing Machinery: New York, NY, USA, 2002; pp. 255–264. [Google Scholar] [CrossRef]
Chechik, O.; Ozer, O. A Deep Dive into Malicious Direct Syscall Detection. Available online: https://www.paloaltonetworks.com/blog/security-operations/a-deep-dive-into-malicious-direct-syscall-detection/ (accessed on 1 June 2026).
Vassilev, A. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations; Technical Report NIST AI 100-2e2025; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2025. [Google Scholar] [CrossRef]
Corona, I.; Giacinto, G.; Roli, F. Adversarial Attacks against Intrusion Detection Systems: Taxonomy, Solutions and Open Issues. Inf. Sci. 2013, 239, 201–225. [Google Scholar] [CrossRef]
Alotaibi, A.; Rassam, M.A. Adversarial Machine Learning Attacks against Intrusion Detection Systems: A Survey on Strategies and Defense. Future Internet 2023, 15, 62. [Google Scholar] [CrossRef]
He, K.; Kim, D.D.; Asghar, M.R. Adversarial Machine Learning for Network Intrusion Detection Systems: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2023, 25, 538–566. [Google Scholar] [CrossRef]
Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A perspective on explainable artificial intelligence methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304. [Google Scholar] [CrossRef]

Figure 1. Research methodology for systematic literature review and experimental evaluation of state-of-the-art methods.

Figure 2. Logical proposition used to search digital libraries for primary studies.

Figure 3. (a) Publications removed, by criterion, for the literature review. (b) Publications removed, by criterion, for method reproduction.

Figure 4. Reviewed publications per year.

Figure 5. Taxonomy of system call-based intrusion detection systems. ¹ An IDS may use several types of data in addition to system call identifiers for intrusion detection, in which case its publication is counted multiple times.

Figure 6. Learning paradigms of the state-of-the-art IDSs that rely on learning, per year of publication. For each year, the number of publications in which the presented IDS implements at least one learning paradigm is reported.

Figure 7. Features extracted from system calls in state-of-the-art IDSs per year of publication. For each year, the number of publications implementing each type of feature is reported.

Figure 8. Classification methods used by state-of-the-art IDSs per year of publication. For each year, the number of publications implementing each classification method is reported. For clarity, categories with fewer than 6 papers out of all publications analyzed have been grouped under the category “Other”.

Figure 9. Taxonomy of data used to train and test system call-based intrusion detection systems. ¹ Considering only publications using private data. ² Authors may use multiple datasets for their work, resulting in their publication being counted multiple times. ³ These datasets were publicly available but are no longer accessible as of 26 January 2026.

Figure 10. Operating system of system calls used for IDS research, considering only publications using private data, per year of publication. For each year, the number of publications using private data from each operating system is reported.

Figure 11. Host environment of system calls used for IDS research, considering only publications using private data, per year of publication. For each year, the number of publications using private data from each host environment is reported.

Figure 12. Taxonomy of evaluation methods for system call-based intrusion detection systems. ¹ Authors may use multiple overhead measurements or evaluation metrics, resulting in their publication being counted multiple times.

Figure 13. Detection performance of score methods on the ADFA-LD (top) and NGIDS-DS (bottom) datasets. ¹ The five runs on the NGIDS-DS dataset could not be completed because training, validation and testing exceeded 60 days per iteration.

Figure 14. Detection performance of prediction methods on the ADFA-LD (top) and NGIDS-DS (bottom) datasets.

Figure 15. NDCG scores of reproduced methods on the ADFA-LD (top) and NGIDS-DS (bottom) datasets. ¹ The five runs on the NGIDS-DS dataset could not be completed because training, validation and testing exceeded 60 days per iteration. ² The model combination mechanism implemented in [M9] makes the anomaly score depend on an FPR threshold that must be defined beforehand. Therefore, an NDCG score cannot be obtained for this method.
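
As an illustration of the metric named in Figure 15, the sketch below computes a normalized discounted cumulative gain (NDCG) over traces ranked by decreasing anomaly score. It assumes the common formulation with binary relevance (1 for an attack trace, 0 for a normal one) and a log2 rank discount; the exact variant used in the article is not restated here, and the function name `ndcg` is hypothetical.

```python
import math
from typing import Sequence


def ndcg(scores: Sequence[float], labels: Sequence[int]) -> float:
    """NDCG of a ranking by decreasing anomaly score.

    Assumption: binary relevance (1 = attack, 0 = normal) and a log2 discount.
    """
    # Order the labels by decreasing anomaly score (most suspicious first).
    ranked = [lab for _, lab in sorted(zip(scores, labels),
                                       key=lambda pair: pair[0],
                                       reverse=True)]
    # Discounted cumulative gain of the observed ranking.
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ranked))
    # Ideal ranking: every attack trace placed before every normal trace.
    ideal = sorted(labels, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    # With no attack trace at all the metric is undefined; return 0.0 by convention.
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Both attacks are ranked first, so the ranking is ideal.
    print(ndcg([0.9, 0.8, 0.1], [1, 1, 0]))
```

Because the computation only needs a ranking of traces by score, it does not require choosing a detection threshold.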

Figure 16. Execution time of reproduced methods for the training and inference phases on the ADFA-LD dataset, on a logarithmic scale. (a) Training time on the full training set. (b) Mean inference time per system call trace.

Table 1. Existing works analyzing host-based intrusion detection research.

Reference	Year	Scope of the Study	SLR?	Considered Studies	Covered Period	Experimental Evaluation
[14]	2018	Syscall-based HIDSs	✗	70 *	1996–2018	✗
[7]	2019	HIDSs, NIDSs	✗	70 *	1998–2018	✗
[15]	2020	HIDSs	✗	81	1993–2018	✗
[16]	2022	HIDSs for IoT	✗	18	2016–2021	✗
[13]	2022	Syscall-based HIDSs evaluated on ADFA-LD	✗	17	2013–2021	✗
[12]	2023	Syscall-based HIDSs using NLP methods	✓	65	2011–2022	✗
[18]	2024	HIDSs	✓	21	2020–2023	✗
This work	2026	Syscall-based HIDSs	✓	209	1996–2026	✓

SLR?: Whether the study is a systematic literature review (✓) or not (✗). Considered studies: Number of studies included in the analysis of the research field. Counts marked with * come from works that do not clearly indicate which studies they surveyed, so the reported number may be inaccurate; these counts have been rounded up to the nearest ten. Covered period: Time period covered by the analysis of considered studies. Experimental evaluation: Whether the study re-implements state-of-the-art methods for empirical evaluation (✓) or not (✗).

Table 2. Inclusion and exclusion criteria for primary studies selection.

Inclusion criteria	I1	Publications in which the presented intrusion detection system is based on dynamic analysis of system calls. This excludes static analysis of system calls from binaries.
Inclusion criteria	I2	Studies that have been published between 1996 [10] and early 2026.
Exclusion criteria	E1	Publications that are not accessible.
Exclusion criteria	E2	Publications that are not written in English.
Exclusion criteria	E3	Conference reviews.
Exclusion criteria	E4	PhD thesis manuscripts and posters.
Exclusion criteria	E5	Conference versions of journal papers and prior studies that have been further developed in a second publication.
Exclusion criteria	E6	Surveys that do not provide an intrusion detection method.

Table 3. Inclusion, exclusion and quality assessment criteria for selecting candidate studies for re-implementation.

Inclusion criteria	I3	Publications in which the presented IDS is only based on dynamic analysis of system call sequences from one host without further information such as arguments or return code.
Exclusion criteria	E7	Methods that are too specific to their environment, such as the monitoring of well-defined applications.
Exclusion criteria	E8	Studies published before 2020 with fewer than 35 citations *.
Exclusion criteria	E9	Studies published between 2020 and 2022 with fewer than 15 citations *.
Quality assessment criteria	AQ1	Is all the information needed to reproduce the proposed HIDS clearly defined?

* The number of citations for each publication was retrieved from Google Scholar on 11 December 2025.
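
The citation-based exclusion criteria E8 and E9 of Table 3 can be read as a simple predicate. The following is a minimal sketch, not the authors' tooling: the function name `is_excluded_by_citations` and its parameters are hypothetical, and the sketch assumes that studies published after 2022 are not subject to any citation threshold, which the table does not state explicitly.

```python
def is_excluded_by_citations(year: int, citations: int) -> bool:
    """Return True if a study is excluded by criterion E8 or E9 of Table 3."""
    if year < 2020:
        # E8: published before 2020 with fewer than 35 citations.
        return citations < 35
    if 2020 <= year <= 2022:
        # E9: published between 2020 and 2022 with fewer than 15 citations.
        return citations < 15
    # Assumption: no citation threshold applies to studies published after 2022.
    return False


if __name__ == "__main__":
    # Hypothetical (year, citation count) pairs, for illustration only.
    for year, citations in [(2015, 34), (2015, 35), (2021, 14), (2024, 0)]:
        print(year, citations, is_excluded_by_citations(year, citations))
```

A study that passes this filter still has to satisfy I3, E7 and AQ1 to be retained for re-implementation.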

Table 4. Characteristics of selected system call-based intrusion detection methods for experimental evaluation.

Method	Learning Paradigm	Trained on Attacks?	Features	Feature Reduction	Classification Method	Data Granularity
[M1]	N/A	no	sequence-based	no	heuristics-based	full sequence
[M2], [M3]	N/A	no	sequence-based	no	heuristics-based	fixed-length sequence
[M4]	unsupervised	no	sequence-based	no	stochastic process (HMM)	fixed-length sequence
[M5]	supervised	no	frequency-based	no	machine learning (kNN)	full sequence
[M6]	N/A	no	group-based (categorization), frequency-based	no	heuristics-based	full sequence
[M7]	supervised	yes	frequency-based	no	machine learning (ELM)	full sequence
[M8]	supervised	yes	sequence-based	no	rough sets	fixed-length sequence
[M9]	unsupervised	no	sequence-based (STIDE, HMM), frequency-based (OC-SVM)	no	combination	full sequence
[M10]	N/A	no	sequence-based	no	heuristics-based	full sequence
[M11]	unsupervised	no	statistical description-based	no	machine learning (IF)	full sequence
[M12]	supervised	yes	sequence-based	no	deep learning (LSTM)	fixed-length sequence
[M13]	supervised	yes	frequency-based	no	deep learning (MLP)	fixed-length sequence
[M14], [M15]	unsupervised	no	embedding-based	no	deep learning ([M14]), combination ([M15])	full sequence
[M16]	supervised	yes	frequency-based	PCA/SVD	deep learning (MLP)	full sequence
[M17]	supervised	yes	sequence-based, group-based (categorization), embedding-based (Word2Vec)	no	deep learning (Text-CNN)	full sequence
[M18]	unsupervised	no	sequence-based	no	stochastic process (Markov chain)	full sequence

Table 5. Composition of datasets for experimental evaluation of state-of-the-art methods.

Methods	Dataset	Train (Normal + Attack)	Validation (Normal + Attack)	Test (Normal + Attack)
[M1], [M2], [M3], [M4], [M5], [M6], [M9], [M10], [M11], [M14], [M15], [M18]	ADFA-LD	2914 + 0	728 + 149	1563 + 597
[M1], [M2], [M3], [M4], [M5], [M6], [M9], [M10], [M11], [M14], [M15], [M18]	NGIDS-DS	16,678 + 0	4169 + 969	8936 + 3876
[M7], [M8], [M12], [M16], [M17]	ADFA-LD	2914 + 417	728 + 104	1563 + 225
[M7], [M8], [M12], [M16], [M17]	NGIDS-DS	16,678 + 2713	4169 + 678	8936 + 1454
[M13]	ADFA-LD	1925 + 248	481 + 62	1033 + 134
[M13]	NGIDS-DS	16,678 + 2713	4169 + 678	8936 + 1454
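The counts in Table 5 can be checked for internal consistency with a few lines of arithmetic. The sketch below is illustrative only: the `SPLITS` dictionary and the group names `normal_only`, `with_attacks`, and `m13` are hypothetical labels for the three method groups of the table, and the numbers are copied from it. The splitting procedure itself is not described in this section, so the code only reports the proportions that the published counts imply.

```python
# (normal, attack) trace counts per split, copied from Table 5.
# Group names are hypothetical labels for the three method groups.
SPLITS = {
    ("normal_only", "ADFA-LD"): {"train": (2914, 0), "validation": (728, 149), "test": (1563, 597)},
    ("normal_only", "NGIDS-DS"): {"train": (16678, 0), "validation": (4169, 969), "test": (8936, 3876)},
    ("with_attacks", "ADFA-LD"): {"train": (2914, 417), "validation": (728, 104), "test": (1563, 225)},
    ("with_attacks", "NGIDS-DS"): {"train": (16678, 2713), "validation": (4169, 678), "test": (8936, 1454)},
    ("m13", "ADFA-LD"): {"train": (1925, 248), "validation": (481, 62), "test": (1033, 134)},
    ("m13", "NGIDS-DS"): {"train": (16678, 2713), "validation": (4169, 678), "test": (8936, 1454)},
}


def totals(split):
    """Return (total normal, total attack) traces over train, validation and test."""
    normal = sum(n for n, _ in split.values())
    attack = sum(a for _, a in split.values())
    return normal, attack


def normal_fractions(split):
    """Return the share of normal traces that falls into each split."""
    normal, _ = totals(split)
    return {name: n / normal for name, (n, _) in split.items()}


if __name__ == "__main__":
    for (group, dataset), split in SPLITS.items():
        normal, attack = totals(split)
        shares = {k: round(v, 3) for k, v in normal_fractions(split).items()}
        print(f"{group:13s} {dataset:9s} normal={normal} attack={attack} shares={shares}")
```

For every row, the normal traces are distributed at roughly 56% / 14% / 30% across train, validation, and test. The first two method groups draw on the same overall number of attack traces per dataset; they differ only in whether part of those traces is placed in the training split.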


