4.1. The Experimental Data
In order to facilitate an experimental performance comparison with similar studies, this section uses the system call datasets UNM_sendmail and UNM_live_lpr published by the University of New Mexico [70]. The UNM_sendmail dataset consists of 346 normal traces and 25 abnormal traces. The abnormal data contains sunsendmailcp (sccp) intrusions and decode intrusions: the sccp intrusions enable a local user to obtain root access by using special command options to make sendmail append an e-mail message to a file, and the decode intrusions enable remote users to make changes to certain files on the local system. The UNM_live_lpr dataset covers 15 months of activity and consists of 4298 normal traces and 1003 abnormal traces. The abnormal data contains lprcp symbolic-link intrusions, which exploit a vulnerability in the lpr program to control files on the host computer and tamper with their contents.
A trace is a sequence of system calls in chronological order; what constitutes one trace varies from program to program. Each trace file lists pairs of numbers: the first number is the process identifier (PID) of the executing process, and the second is the specific system call (SC). Child processes forked by a parent process are traced individually. We take UNM_sendmail as an example of the specific format of the experimental data:
The data consists of data units, each made up of a PID and an SC. In the experiment, the data is segmented by PID, and the SCs sharing the same PID are arranged together in chronological order. In the data above, the numbers 8840, 8843 and 6545 are PIDs, and the numbers 4, 2, 5, 66, …, 2 are SCs. The lists of system calls issued by 8840, 8843 and 6545 are denoted R8840 = (4, 2, 5, 66, 5, …, 6), R8843 = (115, 15, 99, 120, …, 17) and R6545 = (2, 55, …, 2) respectively.
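The segmentation step just described, grouping the SCs that share a PID in chronological order, can be sketched as follows. The input lines and values are illustrative only, echoing the PIDs mentioned in the text; the real UNM trace files are much longer.

```python
from collections import defaultdict

def group_calls_by_pid(lines):
    """Group system call numbers by PID, preserving order of appearance.

    Each input line is assumed to hold two integers: a PID and a
    system call (SC) number, as in the UNM trace format described above.
    """
    traces = defaultdict(list)
    for line in lines:
        pid, sc = line.split()
        traces[int(pid)].append(int(sc))
    return dict(traces)

# Toy input echoing the PIDs used in the text (SC values illustrative only).
raw = ["8840 4", "8840 2", "8843 115", "8840 5", "8843 15", "6545 2"]
print(group_calls_by_pid(raw))
# {8840: [4, 2, 5], 8843: [115, 15], 6545: [2]}
```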
A system call sequence is represented as a vector composed of the frequencies of its short system call sequences. Let n be the number of traces and m the total number of distinct short system call sequences. The ith element of the vector is the frequency with which the short system call sequence numbered i occurs in the system call sequence of the process. The vector corresponding to the jth trace, represented by the frequencies of short system call sequences, is denoted (f1j, ⋯, fmj). That is, the samples are represented as (Xj, yj), j = 1, ⋯, n, where Xj = (f1j, ⋯, fmj) and yj ∈ {+1, −1}.
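The frequency-vector construction can be sketched as follows, using a toy trace and k = 3 for brevity (the experiments later use length 6 with step 1). The vocabulary here is built from the toy trace itself; in practice it would be the m short sequences collected over the training traces.

```python
from collections import Counter

def short_sequences(trace, k):
    """All length-k short sequences obtained by sliding a window (step 1)."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def frequency_vector(trace, vocabulary, k):
    """Map a trace to (f_1, ..., f_m): the relative frequency of each
    short sequence in `vocabulary` within the trace."""
    counts = Counter(short_sequences(trace, k))
    total = sum(counts.values()) or 1
    return [counts[seq] / total for seq in vocabulary]

trace = [4, 2, 5, 4, 2, 5, 66]   # toy system call sequence
k = 3
vocab = sorted(set(short_sequences(trace, k)))  # the m short sequences
vec = frequency_vector(trace, vocab, k)
print(vocab)
print(vec)
```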
4.2. The Experimental Performance
The experimental data of the UNM_live_lpr process consists of 4298 normal traces and 1003 abnormal traces; the 1003 abnormal traces contain lprcp attacks that control and tamper with host files. The experimental data of the UNM_sendmail process consists of 346 normal traces and 25 abnormal traces, including 20 sccp attacks that exploit sendmail to obtain root access and 5 decode attacks that remotely modify local files.
A false alarm is the judgment of normal program behavior as abnormal behavior. If a normal traces participate in the evaluation and b of them are misjudged as abnormal, the false alarm rate is FAR = b/a. The detection rate is the proportion of detected abnormal traces among all abnormal traces: if c abnormal traces participate in the test and d of them are detected, the detection rate is DR = d/c. The missing rate is the proportion of undetected abnormal traces among all abnormal traces: if e of the c abnormal traces cannot be detected, the missing rate is MR = e/c = 1 − DR. The comprehensive detection rate CDR = DR × (1 − FAR) is used as the overall evaluation formula, where DR and FAR stand for the detection rate and false alarm rate respectively.
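A minimal sketch of these rate computations follows. The comprehensive rate is taken to be DR × (1 − FAR), which reproduces the figures reported later (e.g. 84.61% × (1 − 4.51%) ≈ 80.79% on UNM_sendmail); the raw counts passed in are illustrative only.

```python
def rates(normal_total, false_alarms, abnormal_total, detected):
    """False alarm rate, detection rate, missing rate, and the
    comprehensive detection rate CDR = DR * (1 - FAR)."""
    far = false_alarms / normal_total      # misjudged normal / all normal
    dr = detected / abnormal_total         # detected abnormal / all abnormal
    mr = 1 - dr                            # undetected abnormal / all abnormal
    cdr = dr * (1 - far)
    return far, dr, mr, cdr

# Illustrative counts chosen to reproduce DR = 84.61% and FAR = 4.51%:
far, dr, mr, cdr = rates(10000, 451, 10000, 8461)
print(round(cdr * 100, 2))  # 80.79
```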
The Gaussian kernel k(x, y) = exp(−‖x − y‖² / (2σ²)) is used in all algorithms during training. The regularization parameters can be selected according to the maximum error rate allowed on the target set, and adjusted so that the upper bounds of the positive-class and negative-class error rates are equal. A grid search is adopted to select the optimal values of the penalty parameter C and the remaining parameters. The step length and the short sequence length are set to 1 and 6 respectively.
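The kernel evaluation can be sketched as follows, assuming the standard Gaussian (RBF) form with width parameter σ; the input vectors are illustrative frequency vectors.

```python
import math

def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

x, y = [0.2, 0.4, 0.2], [0.1, 0.5, 0.2]
print(gaussian_kernel(x, x, 1.0))  # 1.0 for identical points
print(gaussian_kernel(x, y, 1.0))  # below 1, shrinking as points move apart
```

Because k(x, y) depends only on the distance between the points, a grid search over σ (together with C) trades off how locally or globally the decision boundary fits the training traces.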
Table 2 summarizes the comparison of detection performance between our method and four other methods on UNM_sendmail. The detection rates of SVM, FSVM1, FSVM2, FSVM3, and our algorithm are 61.53%, 76.92%, 76.92%, 83.21%, and 84.61% respectively. The missing rates are 38.47%, 23.08%, 23.08%, 16.23%, and 15.39% respectively. The false alarm rates are 10.14%, 9.57%, 8.17%, 6.92%, and 4.51% respectively. Among the five algorithms, ours has the highest detection rate and the lowest false alarm and missing rates.
As shown in Table 1, SVM has the lowest detection rate among the five algorithms, 79.88%. By constructing a fuzzy membership function, the other four algorithms weight the contributions of different samples to the objective function differently, which raises their average detection rate to 87.03%. The FSVM1 algorithm designs a fuzzy membership function as a linear function of the distance between a sample and its cluster center: the larger the distance, the smaller the coefficient. However, this design is sometimes unable to distinguish abnormal points effectively. The FSVM2 algorithm considers not only the distance used in FSVM1 but also the positional relation between samples: it determines the membership function according to the position of the sample relative to a hypersphere. When a sample lies inside the hypersphere, its membership is a linear function of the distance and its coefficient decreases as the distance increases; when a sample lies outside the hypersphere, it is regarded as abnormal and its membership is represented by a different function. The FSVM3 algorithm, like FSVM2, takes into account both the distance between a sample and its cluster center and the affinity between samples; the difference is that FSVM3 builds on the SVDD algorithm and introduces two different parameters to control the affinity of positive and negative samples.
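The FSVM1-style membership just described can be sketched as follows. The exact linear form and the small constant delta are assumptions for illustration, not the paper's definitions; the point is that samples far from their class cluster center receive small coefficients.

```python
import math

def cluster_center(samples):
    """Componentwise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def fsvm1_memberships(samples, delta=1e-6):
    """Linear distance-based membership (FSVM1 style, illustrative form):
    1 - d / (max_d + delta), so far-away samples get small coefficients."""
    center = cluster_center(samples)
    dists = [math.dist(s, center) for s in samples]
    r = max(dists) + delta   # class radius; delta keeps memberships positive
    return [1 - d / r for d in dists]

pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]  # last point is an outlier
mems = fsvm1_memberships(pts)
print([round(m, 3) for m in mems])
```

As the text notes, a membership based purely on this distance can fail to separate genuine outliers from informative boundary samples, which motivates the positional refinements in FSVM2 and FSVM3.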
The standard SVM treats all samples equally during training, so every sample contributes equally to the construction of the optimal classification hyperplane. As a result, when the training samples contain abnormal samples, the resulting hyperplane is not optimal. Our algorithm maximizes the sum of the distances from the hypersphere to the hyperplane, replaces the cluster center with the within-class hyperplane, and designs the membership function according to the distance from each sample to the hyperplane inside the hypersphere. As the table shows, the detection rate therefore increases with each refinement of the membership function: from 83.71% for FSVM1, to 85.56% for FSVM2, to 86.20% for WCS-FSVM, and finally to 92.63% for our algorithm.
The comprehensive detection rate formula is closer to practical reality. The comprehensive detection rates of SVM, FSVM1, FSVM2, and FSVM3 are 72.97%, 77.59%, 78.56%, and 80.23% respectively. As shown in Figure 7, our algorithm has the highest comprehensive detection rate among the five algorithms, 86.92%.
At the same time, the data in Table 2 also show that the detection rate, false alarm rate, and missing rate on the UNM_sendmail data are all lower than on the UNM_live_lpr data. The reason is the amount of data: the UNM_sendmail process data consists of 346 normal traces and 25 abnormal traces, far less than the 4298 normal and 1003 abnormal traces of the UNM_live_lpr process. The UNM_live_lpr process therefore has sufficient training data, which is conducive to pre-extracting relative boundary vectors that contain enough support vectors and to estimating the parameters reliably. In the experiments on both processes, SVM uses the same error penalty factor for all samples, which negatively affects the classification results on unbalanced samples; this is why SVM always has the lowest detection rate among all the algorithms. The FSVM1, FSVM2, FSVM3, and our algorithm are equivalent to assigning each sample its own penalty factor, so samples are treated differently and good results are obtained when classifying quantity-imbalanced samples. In other words, the membership function can be used to describe the error penalty on each sample, which is consistent with the conclusion that a fuzzy support vector machine is equivalent to a multi-penalty support vector machine.
The comprehensive detection rates of the SVM, FSVM1, FSVM2, FSVM3, and our algorithm on the UNM_sendmail process data are 55.29%, 69.58%, 70.63%, 77.45%, and 80.79% respectively. As shown in Figure 8, among the five detection algorithms, SVM has the lowest comprehensive detection rate and the weakest detection performance, while the algorithm proposed in this paper has the highest comprehensive detection rate and the strongest detection performance. The comprehensive detection performance of the FSVM1, FSVM2, and FSVM3 algorithms also varies with the fuzzy membership function, and the overall trend is consistent with the results on the UNM_live_lpr process data: detection performance improves as the fuzzy membership function improves.
As shown in Table 3 and Table 4, the length k of the system call short sequence has a direct impact on the detection performance. When k increases from 3 to 5, the detection rate increases while the missing rate and false alarm rate decrease. When k is 6, our algorithm obtains the highest detection rate, the lowest false alarm rate, and the lowest missing rate. When k is greater than 6, the detection rate of our algorithm starts to decrease, and the missing rate and false alarm rate start to increase. On the UNM_sendmail data, the detection rate, false alarm rate, and missing rate of the PVFSVM algorithm follow the same trend as on the UNM_live_lpr data.
As shown in Figure 9 and Figure 10, when k = 6 the comprehensive detection rates of our algorithm on UNM_live_lpr and UNM_sendmail are 86.92% and 80.79% respectively, higher than the comprehensive detection rates for any other value of k on either dataset. When k increases from 3 to 6, the comprehensive detection rate increases gradually; when k increases from 6 to 8, it decreases gradually.
This is consistent with Professor Forrest's conclusion on the selection of short sequence length, and with Professor Wenke Lee's information-theoretic conclusion that the best system call short sequence length is 6 or 7. When the short sequence is shorter than 6, the temporal relation between short sequence patterns is lost; when it is longer, the short sequence pattern loses local information. Either kind of information loss degrades the comprehensive detection performance. Therefore, the short sequence length is chosen as 6, which brings the detection performance of our algorithm to its optimal level.
Table 5 summarizes the comparison of detection performance between our method and eight other methods on UNM_live_lpr. The detection rates of N.bayes, Uniqueness, Hybrid Markov, Bayes 1-step Markov, IPMA, Sequence matching, Compression, Closeness, and our algorithm are 65.11%, 39.1%, 45.6%, 62.6%, 45.05%, 36.7%, 33.9%, 76.5%, and 92.63% respectively. The missing rates are 30.12%, 55.3%, 40.22%, 33.3%, 47.37%, 58.2%, 50.1%, 19.2%, and 7.37% respectively. The false alarm rates are 4.77%, 2.30%, 14.26%, 4.1%, 7.58%, 5.91%, 16%, 4.3%, and 6.16% respectively.
The Compression method and the Sequence Matching method consider the detection problem from the aspects of data reversibility and similarity matching respectively, and fail to consider the sequential characteristics between system calls; they are therefore the two algorithms with the worst comprehensive detection performance. The IPMA method and the Hybrid Markov method are complicated because they investigate the transition characteristics of system calls one by one. Although the false alarm rates of the IPMA method and the Hybrid Markov method are 7.58% and 14.26%, their missing rates are 47.37% and 40.22%. The missing rate of the Bayes 1-step Markov method is 33.3% and its false alarm rate is 4.1%, so its value under the comprehensive evaluation formula is high and its detection performance is good. Since the Uniqueness method looks at the frequency of rare system calls, its missing rate is 55.3% and its false alarm rate is 2.3%; however, it requires statistics for all the system calls that are used. N.bayes is essentially a Naive Bayesian method; it has good noise tolerance and fast computation, so its comprehensive detection performance is good, although its missing rate is still high. The Closeness method extracts user behavior patterns from the perspective of combination. It shows good detection performance under different closeness thresholds, but it ignores the characteristics of attack intensity, namely that an attacker will complete the attack task in the shortest possible time and try to behave normally the rest of the time. As shown in Figure 11, our algorithm has the highest comprehensive detection rate among the nine algorithms, 86.92%. Ours also has the highest detection rate and the lowest missing rate. The comprehensive detection rate formula is closer to practical reality. The comprehensive detection rates of N.bayes, Uniqueness, Hybrid Markov, Bayes 1-step Markov, IPMA, Sequence matching, Compression, Closeness, and our algorithm are 62%, 36.91%, 39.13%, 60.03%, 41.63%, 34.82%, 28.47%, 73.22%, and 86.92% respectively.
Table 6 summarizes the comparison of detection performance between our method and eight other methods on UNM_sendmail. The detection rates of N.bayes, Uniqueness, Hybrid Markov, Bayes 1-step Markov, IPMA, Sequence matching, Compression, Closeness, and our algorithm are 63.88%, 37.4%, 45.3%, 67.3%, 40.63%, 35.9%, 29.7%, 75.2%, and 84.61% respectively. The missing rates are 32.12%, 60.3%, 40.44%, 30.8%, 42.37%, 60.2%, 50.5%, 20.1%, and 15.39% respectively. The false alarm rates are 4%, 2.30%, 14.26%, 1.9%, 17%, 3.9%, 19.8%, 4.7%, and 4.51% respectively.
Similarly, the Compression method and the Sequence Matching method pay attention to data reversibility and similarity matching respectively, and fail to consider the sequential characteristics between system calls. The IPMA method and the Hybrid Markov method are complicated because they investigate the transition characteristics of system calls one by one. Although the false alarm rates of the IPMA method and the Hybrid Markov method are 17% and 14.26%, their missing rates are 42.37% and 40.44%. The missing rate of the Bayes 1-step Markov method is 30.8% and its false alarm rate is 1.9%, so its value under the comprehensive evaluation formula is high and its detection performance is good. Since the Uniqueness method looks at the frequency of rare system calls, its missing rate is 60.3% and its false alarm rate is 2.3%; however, it requires statistics for all the system calls that are used. N.bayes is essentially a Naive Bayesian method; it has good noise tolerance and fast computation, so its comprehensive detection performance is good, although its missing rate is still high. The Closeness method extracts user behavior patterns from the perspective of combination. It shows good detection performance under specific closeness thresholds, but it ignores the characteristics of attack intensity, namely that an attacker will complete the attack task in the shortest possible time and try to behave normally the rest of the time. As shown in Figure 12, our algorithm has the highest comprehensive detection rate among the nine algorithms, 80.79%. Ours also has the highest detection rate and the lowest missing rate. The comprehensive detection rates of N.bayes, Uniqueness, Hybrid Markov, Bayes 1-step Markov, IPMA, Sequence matching, Compression, Closeness, and our algorithm are 61.32%, 36.54%, 38.84%, 60.02%, 33.72%, 34.49%, 23.81%, 71.67%, and 80.79% respectively.