A Multi-Channel Contrastive Learning Network Based Intrusion Detection Method

Abstract: Network intrusion data are characterized by high feature dimensionality, extreme category imbalance, and complex nonlinear relationships between features and categories. As a result, existing supervised intrusion-detection models achieve poor detection accuracy in practice. To address this problem, this paper proposes a multi-channel contrastive learning network-based intrusion-detection method (MCLDM), which combines feature learning in a multi-channel supervised contrastive learning stage with feature extraction in a multi-channel unsupervised contrastive learning stage to train an effective intrusion-detection model. The objective is to investigate whether feature enrichment and the use of contrastive learning for specific classes of network intrusion data can improve the accuracy of the model. In the supervised contrastive learning stage, the model uses autoencoders to reconstruct features and to build the multi-channel data. In the subsequent unsupervised contrastive learning stage, features are extracted using triplet convolutional neural networks (TCNN) to classify the intrusion data. In experiments, MCLDM achieves 98.43% accuracy on the CICIDS17 dataset and 93.94% accuracy on the KDDCUP99 dataset.


Introduction
The network is a national infrastructure and one of the primary targets of attack in modern warfare, and defense against cyber attacks has become a growing concern for researchers. In 2020, Brazil's Light S.A. electricity company was hacked in an attempt to extort $14 million in ransom, and in late February 2022, the Internet was frequently attacked and controlled from abroad, and cross-border cyber attacks were carried out against Russia and Ukraine. A network intrusion-detection system analyzes the characteristics of network data streams to classify them as normal data streams or attack data streams. Intrusion detection remains challenging because of the high dimensionality of the data features and the extreme imbalance of intrusion categories, and existing systems exhibit low accuracy and high false alarm rates. To solve these problems, researchers have mainly focused on machine learning methods [1], deep learning methods [2], and contrastive learning [3].
Various machine learning and deep learning-based solutions have been proposed in the past decades. Among them, machine learning-based network intrusion-detection systems rely mainly on feature engineering to learn the characteristics of network intrusion data [4]. Deep learning-based network intrusion-detection approaches, on the other hand, do not rely on extensive feature engineering, but instead learn the complex features of network intrusion data through deep network structures [5].
As the application of supervised deep learning continues to evolve, deep learning has shown a significant decrease in performance when dealing with data imbalance [6], while unsupervised contrastive learning continues to narrow the gap with supervised deep learning. The purpose of contrastive learning is to judge predicted samples by computing a distance (e.g., the Euclidean distance) after reducing the distance between samples of the same class and increasing the distance between samples of different classes. Contrastive learning thus combines the hierarchical feature learning of deep learning with a self-defined sample distance in order to deal with data imbalance.
The autoencoder has attracted our interest because of its wide use. An autoencoder is a kind of neural network whose architecture includes an input layer, encoder, hidden layer, decoder, and output layer; it recodes the raw data according to the label of the data by minimizing a loss function. Recently, autoencoders have been widely used in data dimensionality reduction, data reconstruction, and data noise reduction. For example, in [7], an autoencoder is used to extract features from the raw data and shows good experimental performance. In [8], the decoder of the autoencoder is used for noise reduction in the raw data. Moreover, the autoencoder is simple and efficient to train.
Many state-of-the-art deep neural networks are used in contrastive learning [9,10]. In [11], it is proposed that the detection effectiveness of a network intrusion-detection model depends on the loss function of contrastive learning, which is one of the important components of the model. In network intrusion detection, intrusion data and anomaly data represent only a small fraction of the network data [12]. In [13], the researchers studied resampling for deep neural networks, verifying that neural network algorithms are robust in dealing with data imbalance.
Multichannel feature extraction is applied in image analysis [14] and speech recognition [15] to improve the accuracy of the model by learning the correlations between different channels. During multi-channel data enhancement, our method uses the network data stream as the raw data vector and divides the data into two classes according to the labels of the raw data, namely normal and attack network streams. The normal autoencoder is trained using normal data, and the attack autoencoder is trained using attack data. The raw data are fed into the corresponding autoencoder according to the labels to obtain a one-dimensional embedding representation as output. We compute the cross-correlation matrices of the one-dimensional autoencoder outputs and combine the different cross-correlation matrices to obtain the multi-channel data, which we use as a new description of the raw data. The multi-channel data can represent the connections existing between different features and enrich the features of the raw data. In this paper, the extraction of features from the one-dimensional embedding representation is transformed into the extraction of features from the multi-channel cross-correlation matrix, which increases the gap between different classes of data and improves the accuracy of model detection.
Contrastive learning has recently received attention in image classification, where innovative loss functions are designed to improve the poor learning performance of models under data imbalance. However, these methods are usually applied to feature extraction from multi-channel images. Therefore, we use a TCNN as the contrastive learning network to extract features from the reconstructed multichannel two-dimensional data, reducing the distance between samples of the same class and increasing the distance between samples of different classes in order to improve the accuracy of network intrusion detection.
The main innovations of the proposed model are as follows.
(a) A new network intrusion-detection model is proposed that recodes the network intrusion data using autoencoders according to the labels and couples the autoencoders with a TCNN; the model has high accuracy and a low false alarm rate and improves the security provided by intrusion detection.
(b) The features are augmented: the raw single-channel one-dimensional data are enhanced into multi-channel two-dimensional data.
(c) The problem of extreme data imbalance encountered in network intrusion detection is addressed, and we evaluate the model extensively.

Related Research
Network intrusion detection is often viewed as a binary classification problem, where network data streams are classified by choosing the model's feature extraction method and classification rules; machine learning methods such as k-Nearest Neighbor (KNN), support vector machine (SVM), and decision tree (DT) are commonly used. Zhou et al. proposed an intrusion-detection method based on the selection of the most relevant features and an integrated classifier built from Random Forest (RF), C4.5, and Penalized Attribute Forest (Forest PA), in which the final classification is obtained by voting [16]. While traditional machine learning algorithms enable network intrusion detection, these methods achieve low accuracy in experiments.
Deep learning methods learn to extract data features hierarchically, which enables the extraction of high-dimensional features from raw data. Deep learning is currently used with remarkable effect in image recognition [17] and sentiment analysis [15]. Autoencoders are widely used in deep learning; Zeng et al. stacked autoencoders and used the output of the previous autoencoder layer as the input of the next layer [18]. Sara A. Althubiti et al. applied the Long Short-Term Memory (LSTM) algorithm to a network intrusion-detection system and validated their model on the CICIDS dataset, with results demonstrating that deep learning algorithms outperform machine learning methods [19]. Lopez et al. [20] used a one-dimensional convolutional neural network for feature extraction from one-dimensional network intrusion data and achieved better experimental results. Deep learning-based network intrusion-detection algorithms are able to achieve high accuracy, but are ineffective in experiments with imbalanced network intrusion data.
The contrastive learning approach uses hierarchical learning to transform the raw data and map it to a suitable feature space. Contrastive learning is implemented with deep neural networks together with a sample distance loss function that learns the features of different classes of samples in order to alleviate the prediction error under data imbalance. Currently, contrastive learning is mainly used in face recognition and face verification [21]. As pointed out in [22], the performance of contrastive learning networks depends mainly on the defined loss function (e.g., contrastive loss, triplet loss) and the network sampling method. In the above contrastive learning methods, no data preprocessing is performed, and the training process results in a model that is not optimal.
Novel feature extraction models have been reported in recent cybersecurity research. In [23], an autoencoder was used to learn the raw data features, and a deep neural network was used to extend the new feature data; multi-channel data composed of the raw data and the new feature data were then used to train a multi-channel convolutional neural network. In [24], based on the combination of an autoencoder and contrastive learning, it is demonstrated that contrastive learning performs well in dealing with data imbalance and nonlinear data structure problems. Therefore, we propose to use autoencoders to pre-process the data and reconstruct multi-channel data in order to increase the differences between categories, and finally implement a contrastive learning network intrusion-detection method.

Model Methodology
In this section, we describe MCLDM, our proposed multichannel contrastive learning method for network intrusion detection, which combines a supervised contrastive learning method (two autoencoders) with an unsupervised contrastive learning method for multichannel feature extraction (TCNN). The symbols used in the MCLDM are shown in Table 1.

Table 1. Symbols used in the MCLDM.
Symbol                  Description
X n                     The subset of X whose labels are normal samples
X a                     The subset of X whose labels are attack samples
g n                     Normal autoencoder trained on X n
g a                     Attack autoencoder trained on X a
X n+, X n−              Output of X n via autoencoders g n, g a
X a−, X a+              Output of X a via autoencoders g n, g a
X n cor, X a cor        The cross-correlation matrix of X n, X a
X n+ cor, X a+ cor      The cross-correlation matrix of X n+, X a+
X n− cor, X a− cor      The cross-correlation matrix of X n−, X a−

Two stages are included in the MCLDM: the training stage and the prediction stage. In the training stage, the MCLDM learns the features of the raw vectors of network intrusion data, trains two types of autoencoders, recodes the normal network stream and the attack network stream, and outputs one-dimensional reconstructed vectors. The cross-correlation matrices of the reconstructed vector and of the raw vector are computed separately and then combined to form the multi-channel reconstructed data. Finally, the multi-channel reconstructed data are used as input to train the TCNN, which performs unsupervised contrastive learning according to the objective function; the output embeddings are compared and the loss function is computed to train the MCLDM. In the prediction stage, the data to be predicted are input to the MCLDM to obtain the final embedding, and the type of data stream is determined by analyzing the embedding.
As shown in Figure 1, the MCLDM training stage proceeds as follows.
(1) The training data set X is divided into the normal sample set X n and the attack sample set X a according to the data labels.
(2) The normal samples X n and the attack samples X a are used as inputs to train the autoencoders g n and g a, respectively.
(3) The normal samples X n and the attack samples X a are each input to g n and g a to obtain [X n+, X n−] and [X a+, X a−], respectively.
(4) The triples [X n, X n+, X n−] and [X a, X a+, X a−] are constructed.
(5) The cross-correlation matrices of the triples [X n, X n+, X n−] and [X a, X a+, X a−] are computed to obtain the cross-correlation matrix triplets [X n cor, X n+ cor, X n− cor] and [X a cor, X a+ cor, X a− cor], which are combined to obtain the multichannel triplets.
(6) The multichannel triplet is used as the input of the TCNN to learn the vector features of the training set.
(7) The triplet loss is obtained by comparing the embedding representations of the different classes, which allows the model to be trained under class imbalance.

Training Stage
The pseudocode of the MCLDM training stage is given in Algorithm 1. This stage analyzes the historical network intrusion data, learns the features of the network intrusion data, enriches the data features, reconstructs the multi-channel data, and maps the raw data to different vector spaces in order to distinguish normal network flows from attack network flows. Specifically, three main steps are included in MCLDM.
(1) Two independent autoencoders are constructed and trained according to the binary labels, where the labels are normal data streams and attack data streams. In this way, the output vectors of the different autoencoders are mapped into a vector space different from the raw data distribution.
(2) The autoencoders output reconstruction vectors, the cross-correlation matrices of the reconstruction vectors are calculated, and the different cross-correlation matrices are combined to obtain the multi-channel data, which include anchor points, positive samples, and negative samples. The multi-channel data are formed into a triplet.
(3) The TCNN is trained using the triplets reconstructed in the previous step.

Algorithm 1 MCLDM training stage
Input: D: training sample set {(X i, label i)}, i = 1, ..., N, with label i ∈ {normal, attack}. X represents the raw data matrix of N D-dimensional samples, X ⊂ ℝ^D, with X n ⊂ X (label = normal) and X a ⊂ X (label = attack).
Output: (g n, g a, φ): the trained intrusion-detection model
1 Begin: Initialize parameters
2 # Autoencoder training stage

Training Stage
The training stage of MCLDM pseudocode is described in Algorithm 1.This stage is to analyze the historical network intrusion data, learn the network intrusion data features, enrich the data features and reconstruct the multi-channel data, map the raw data to different vector spaces, and to realize to distinguish the normal network flow from the attack network flow.Specifically, three main stages are included in MCLDM.
(1) By constructing two independent autoencoders and training the autoencoders according to the binary labels, where the labels are normal data streams and attack data streams, we achieve mapping the different autoencoder output data vectors into a vector space different from the raw data distribution.(2) Different autoencoders output reconstruction vectors, calculate reconstruction vector cross-correlation matrix, and combine different cross-correlation matrix arrays to obtain different multi-channel vector data, which include anchor points, positive sam-

Autoencoder Training Stage
Autoencoders are deep neural networks with an architecture that learns the features of the data through encoders and decoders [17]. An autoencoder mainly consists of an input layer, encoder, hidden variable, decoder, and output layer. The data are fed to the encoder through the input layer, and the encoder compresses the raw vector data to the dimension of the hidden variable. The decoder decodes the compressed data and outputs the reconstructed vector at the output layer. The autoencoder therefore consists of two mappings: an encoder f, which maps the input vector X into the hidden variable h, denoted as h = f(X); and a decoder f′, which maps the hidden variable h into the output space, denoted as Xn = f′(h).
We select a set of N training data samples from the training set, denoted as d = {(X i, label i)}, i = 1, ..., N, where X i ∈ ℝ^D denotes the one-dimensional vector representation of the D features of training sample i, and label i denotes the label of the corresponding sample. X n = {X i, label = normal} and X a = {X i, label = attack} denote the normal network flows and the attack network flows, respectively. We train two independent autoencoders using X n and X a, respectively, where the normal network flow autoencoder is denoted as g n and the attack network flow autoencoder is denoted as g a. In real scenarios, where the attack samples are much fewer than the normal samples, we train separate autoencoders to cope with the data imbalance.
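The label-based split that precedes autoencoder training can be sketched as follows. This is a minimal illustration only: the function name split_by_label and the toy samples are hypothetical and not part of the original implementation, and the actual training of g n and g a on each subset is omitted.

```python
def split_by_label(dataset):
    """Split (features, label) pairs into the normal set X_n and the attack set X_a."""
    x_n = [list(x) for x, label in dataset if label == "normal"]
    x_a = [list(x) for x, label in dataset if label == "attack"]
    return x_n, x_a


# Toy data set with two features per sample (hypothetical values).
toy = [([0.1, 0.2], "normal"), ([0.9, 0.8], "attack"), ([0.2, 0.1], "normal")]
x_n, x_a = split_by_label(toy)
# x_n would be used to train g_n and x_a to train g_a.
```

Each subset is then passed only to its own autoencoder, so the scarce attack samples are not dominated by the normal samples during reconstruction training.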

Multi-Channel Data Construction
The raw vector data are reconstructed by the autoencoder, and the result is denoted as Xn. In principle, the reconstructed vector Xn is more concentrated and less noisy than the raw vector X. We input the raw vector X to the autoencoders g n and g a to obtain X + and X −. We then compute the cross-correlation matrices of X, X +, and X −, denoted as X cor, X + cor, and X − cor, and combine the three cross-correlation matrices to obtain the multi-channel data ([X cor, X cor], [X cor, X + cor], [X cor, X − cor]).
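The construction above can be sketched in a few lines. The text does not spell out the formula of the cross-correlation matrix, so the sketch assumes, purely for illustration, that the matrix of a D-dimensional vector is its outer product with itself; the function names correlation_matrix and multichannel_triplet are likewise hypothetical.

```python
def correlation_matrix(v):
    """Assumed definition: D x D outer product of the vector with itself."""
    return [[vi * vj for vj in v] for vi in v]


def multichannel_triplet(x, x_pos, x_neg):
    """Build ([X_cor, X_cor], [X_cor, X+_cor], [X_cor, X-_cor]) as two-channel inputs."""
    x_cor = correlation_matrix(x)
    pos_cor = correlation_matrix(x_pos)
    neg_cor = correlation_matrix(x_neg)
    anchor = [x_cor, x_cor]       # both channels hold the raw-vector matrix
    positive = [x_cor, pos_cor]   # second channel from the same-class autoencoder
    negative = [x_cor, neg_cor]   # second channel from the other-class autoencoder
    return anchor, positive, negative


anchor, positive, negative = multichannel_triplet([1.0, 2.0], [1.0, 1.0], [0.0, 3.0])
```

Under this assumption, a D-dimensional flow becomes three two-channel D x D inputs, which is the shape a two-dimensional convolutional network expects.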

TCNN Training Stage
The convolutional neural network consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. The convolutional layer extracts the feature information from the input layer, the pooling layer reduces the dimensionality of the input data, and the fully connected layer flattens the two-dimensional data into a one-dimensional vector. We use the constructed triplet ([X cor, X cor], [X cor, X + cor], [X cor, X − cor]) to train a TCNN. Because we are dealing with multichannel reconstructed data, we choose AlexNet as the convolutional neural network, so that the TCNN consists of AlexNet and fully connected layers. The TCNN processes the three elements of the triplet with a feedforward network whose weights are shared. In this stage, we use the triplet loss function proposed in [10] to train MCLDM, as shown in Equation (1).
The TCNN minimizes the distance between the anchor samples and the positive samples, i.e., minimizes Dis ap, and maximizes the distance between the anchor samples and the negative samples, i.e., maximizes Dis an. In traditional neural network algorithms, classification is achieved by predicting the probability of a category, but the predicted probability does not work well when dealing with data imbalance. Therefore, converting the predicted category probabilities into predicted distances mitigates the effect of data imbalance on the model, where Dis an and Dis ap denote the Euclidean distances between the final output embeddings.
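The behaviour of the loss can be illustrated with a small sketch. It assumes the common hinge form max(Dis ap − Dis an + margin, 0) with Euclidean distances; the exact form of Equation (1) in [10] is not reproduced here, and the margin value of 0.2 is a hypothetical choice.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (assumed form): zero once Dis_an exceeds Dis_ap by the margin."""
    dis_ap = euclidean(anchor, positive)
    dis_an = euclidean(anchor, negative)
    return max(dis_ap - dis_an + margin, 0.0)


well_separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

A triplet whose negative already lies far from the anchor contributes no loss, while a triplet whose positive is farther away than its negative is penalized in proportion to the violation.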

Prediction Stage
In the prediction stage, as shown in Figure 2, the query test sample Y a is input to the autoencoders g n and g a, which output [Y n, Y n+, Y n−]; the corresponding cross-correlation matrices [Y a cor, Y a+ cor, Y a− cor] are calculated and combined to obtain the multi-channel triplet. The multi-channel triplet is input to the TCNN, the distances between the output triplet-embedding representations are calculated, and finally, the predicted category is determined according to these distances.
Electronics 2023, 12, x FOR PEER REVIEW
The MCLDM prediction stage is described in Algorithm 2. Consider a query sample Y ∈ ℝ^D. In the first step, the data are reconstructed by the normal autoencoder g n and the attack autoencoder g a. In the second step, the multi-channel data are constructed. In the third step, the embedding representations output by the TCNN are calculated to make the decision, as shown in Equations (2) and (3).
Finally, the distances Dis n and Dis a are compared. If Dis n < Dis a, the query sample is classified as a normal network data stream; otherwise, it is classified as an attack network data stream.
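The final comparison can be sketched as below. Since Equations (2) and (3) are not shown here, the sketch assumes that Dis n is the Euclidean distance between the anchor embedding and the embedding of the channel pair built from the g n reconstruction, and that Dis a is the analogous distance for g a; the function name predict and the example embeddings are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(anchor_emb, normal_emb, attack_emb):
    """Return 'normal' if Dis_n < Dis_a, otherwise 'attack' (ties go to 'attack')."""
    dis_n = euclidean(anchor_emb, normal_emb)
    dis_a = euclidean(anchor_emb, attack_emb)
    return "normal" if dis_n < dis_a else "attack"


label = predict([0.1, 0.1], [0.1, 0.2], [0.9, 0.9])
```

Because the rule uses a strict inequality, a sample that is equally close to both embeddings is treated as an attack, which errs on the side of raising an alarm.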

Implementation Details
MCLDM is implemented in Python 3.8, and the deep neural networks are built with TensorFlow 2.8 and Keras 2.8; Scikit-learn is used for data preprocessing. For training, we used the tree-structured Parzen estimator algorithm implemented in the Hyperopt library for automatic hyperparameter tuning; 20% of the dataset is used as the test set, and our procedure uses random sampling to select the validation set. The hyperparameter values for the automatic search using the tree-structured Parzen estimator are shown in Table 2. The CPU of the device we use is an Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60 GHz with 15 cores/GPU. The GPU is an RTX3080 with 10 GB of video memory. The operating system of the device is Ubuntu 20.08. Each autoencoder includes three fully connected layers with 32, 16, and 32 neurons, and two dropout layers to prevent overfitting of the neural network. The neurons in each layer use ReLU as the activation function, which speeds up the training of the network.
The TCNN is composed of three AlexNet networks with fully connected layers, all sharing weights. Each convolutional neural network is a deep neural network consisting of five convolutional layers, three pooling layers, one Flatten layer, three fully connected layers, and two Dropout layers. The activation function of the first two fully connected layers is ReLU, which speeds up the network training, and the activation function of the final embedding representation layer is Sigmoid, so that each dimension of the output embedding representation lies in [0, 1]. Finally, the Euclidean distance between the output embeddings is calculated, and the data are classified as normal or attack samples according to this distance.

Experimental Validation
For the experimental evaluation, we use two benchmark datasets, CICIDS17 and KDDCUP99, which cover a long timespan and are sufficient to validate the applicability of MCLDM to network intrusion detection.

Dataset Description
CICIDS17 was collected by the Canadian Institute for Cybersecurity in 2017 and contains a total of 5 days of network intrusion logs collected from 3 July 2017 to 7 July 2017, where each network traffic sample includes 79 features. The dataset includes eight data types: one normal class and seven attack classes. The data contain 100,000 training samples and 900,000 test samples. The amount of normal traffic is much larger than the amount of attack data (80% vs. 20%) in both the training and test sets.
The KDDCUP99 dataset was adopted in the KDD competition in 1999 and has been frequently used to evaluate network intrusion-detection models since then. The dataset consists of five data types: one normal class and four attack classes. The dataset consists of 494,021 training samples and 31,029 test samples. The amount of normal traffic is much smaller than the amount of attack data (19.7% vs. 80.5%) in both the training and test sets. The specific dataset description is shown in Table 3, in which we also mark the imbalance of each dataset. The data are unbalanced during both training and testing. The input data are normalized as shown in Equation (4):
X std = (X − X min) / (X max − X min)    (4)
where X std denotes the representation of the input data sample X after normalization, X min denotes the minimum value of a given feature, and X max denotes the maximum value of that feature.
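The normalization described above can be sketched for a single feature column as follows, assuming it is standard min-max scaling. The function name min_max_normalize is hypothetical, and mapping a constant feature to zeros is an assumption made here only to avoid division by zero.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] using its minimum and maximum."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # Assumed convention: a constant feature carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(value - x_min) / (x_max - x_min) for value in column]


scaled = min_max_normalize([2.0, 4.0, 6.0])
```

In practice the minimum and maximum would be computed on the training set and reused for the test set, so that both are scaled consistently.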

Evaluation Metrics
The Accuracy, Precision, Recall, and F1-Score metrics used to evaluate the performance of the proposed MCLDM are calculated as in Table 4. All results are calculated from four variables: TN denotes the number of correctly predicted normal samples, TP denotes the number of correctly predicted attack samples, FN denotes the number of attack samples incorrectly predicted as normal, and FP denotes the number of normal samples incorrectly predicted as attack.
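For reference, the four metrics can be computed from the confusion counts as in the sketch below. It uses the standard textbook definitions, with attack as the positive class; the function name classification_metrics and the example counts are hypothetical, and returning 0.0 for an empty denominator is an assumed convention.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-Score from the four confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 8 attacks detected, 10 normal flows passed, 2 false alarms.
metrics = classification_metrics(tp=8, tn=10, fp=2, fn=0)
```

On imbalanced data, Accuracy alone can look high even when few attacks are detected, which is why Recall and F1-Score are reported alongside it.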

Observation Model Using Performance Metrics
We validated the MCLDM using the Accuracy, Precision, Recall, and F1-Score evaluation metrics separately, as shown in Figure 3, and MCLDM performed well on each evaluation metric. The Accuracy, Recall, and F1-Score on dataset CICIDS17 are better than those on dataset KDDCUP99, and the Precision on dataset KDDCUP99 is better than that on CICIDS17. Precision was 99.69%, Recall was 92.69%, and F1-Score was 96.06%. In the data set KDDCUP99, the Accuracy was 98.43%, Precision was 98.65%, Recall was 97.17%, and F1-Score was 97.93%. We measured the detection time of each sample while varying the size of the CICIDS and KDDCUP99 test sets separately, and we found that changing the size of the data did not affect the test results of the samples.

Embedding Analysis
We analyze the embeddings output by MCLDM to show how MCLDM performs contrastive learning to distinguish normal network flows from attack network flows. MCLDM makes normal and attack data streams easier to distinguish by decreasing the Euclidean distance between the anchor point and the positive sample embedding representation, i.e., decreasing Dis n, while increasing the Euclidean distance between the anchor point and the negative sample, i.e., increasing Dis a. Figures 5 and 6 show the difference in the values of Dis n and Dis a during training and testing in datasets CICIDS17 and KDDCUP99, respectively. By analyzing Figures 4 and 5, it is clear that the distance between the anchor embedding and the positive sample embedding is much smaller than that between the anchor embedding and the negative sample embedding. Comparing the embedding representations of the training and test sets shows that both achieve the expected results.
To assess the contribution of each component, we use four architectural configurations as baselines, which are defined by removing the autoencoder, multichannel data construction, and convolutional neural network from MCLDM. We have considered the following four architectures.
(d) ACNN: It is a structure similar to CNN, but differs from the MCLDM structure in that it lacks the step of multi-channel data reconstruction.
We verified the performance of MCLDM, NN, ANN, CNN, and ACNN on datasets CICIDS17 and KDDCUP99, respectively. Table 5 shows the Accuracy and F1-score on the different datasets, where MCLDM outperforms the other structures. This indicates that the combination of autoencoder, multichannel data reconstruction, and convolutional neural network is beneficial for obtaining better accuracy in network intrusion detection. To verify that MCLDM remains robust under class imbalance, we performed a validation using dataset CICIDS17, which was collected in a realistic network scenario and is itself imbalanced, with 20% attack data and 80% normal data. We performed the validation by adjusting the amount of attack data in the dataset; the experimental data include all normal data and different percentages of the attack samples. We conducted four experiments using 100%, 75%, 50%, and 25% of the attack samples, respectively, so that different degrees of data imbalance are obtained by reducing the number of attack samples used for training. The F1-scores of the MCLDM, NN, ANN, CNN, and ACNN structures change as shown in Figure 7. We find that the F1-score decreases for all algorithms, but MCLDM continues to outperform the other structures at every degree of data imbalance, which indicates that MCLDM can handle data imbalance scenarios.
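The imbalance scenarios described above can be reproduced in outline with the sketch below, which keeps all normal samples and a random fraction of the attack samples. The function name subsample_attacks, the fixed seed, and the toy sample counts are hypothetical; the original sampling procedure may differ.

```python
import random


def subsample_attacks(normal, attack, fraction, seed=0):
    """Keep every normal sample and a random `fraction` of the attack samples."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = max(1, round(len(attack) * fraction))
    return list(normal), rng.sample(list(attack), keep)


# Toy 80% / 20% split mirroring the class ratio reported for CICIDS17.
normal_ids = list(range(80))
attack_ids = list(range(100, 120))
sizes = {f: len(subsample_attacks(normal_ids, attack_ids, f)[1])
         for f in (1.0, 0.75, 0.5, 0.25)}
```

Each reduced training set would then be used to retrain every compared structure, with the F1-score measured on the unchanged test set.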

Comparison with Competitive Rivals
We train and test on CICIDS17 and KDDCUP99 and compare the proposed MCLDM with competing methods from the recent state-of-the-art literature in terms of the Accuracy and F1-score metrics.
The Accuracy and F1-score of MCLDM on all datasets are reported in Table 3, and the comparison in Table 6 shows that MCLDM obtains good results on both metrics on CICIDS17 and KDDCUP99. Good experimental results were also obtained in [24] using an AE combined with deep metric learning (DML), where DML is a paradigm of contrastive learning. On CICIDS17, MCLDM outperforms the latest competing model by 0.19% in Accuracy and 2.23% in F1-score. On KDDCUP99, MCLDM exceeds the latest competing model by 0.44% in Accuracy and 0.26% in F1-score.

Conclusions
In this study, we propose a novel model for network intrusion detection. The model uses autoencoders to reconstruct and denoise the raw one-dimensional network stream data, calculates the cross-correlation matrix of the reconstructed data, combines the cross-correlation matrices to obtain multi-channel data, and finally uses a TCNN to extract features from the multi-channel data. Contrastive learning is carried out with the prescribed loss function, and the trained model shows good accuracy on the test set. The main idea is that, after noise reduction, the data are combined into multi-channel data through the cross-correlation calculation, so that the feature difference between different types of data increases, which helps to distinguish normal network flows from abnormal ones. Compared with other techniques, MCLDM, which is built on a TCNN, deals well with the data imbalance problem. MCLDM also has limitations: it transforms network intrusion detection into a binary classification problem and does not perform multi-class classification, so the specific classes of network intrusions are not identified. Our next task is therefore to classify network intrusions into specific categories.
We evaluated the effectiveness of MCLDM using two benchmark datasets. The experimental analysis demonstrates the effectiveness of the proposed MCLDM approach, in particular when the data are imbalanced. The experiments show that the multi-channel data constructed by class-specific autoencoders, combined with unsupervised contrastive learning, help to separate the different classes of data and thereby achieve network intrusion detection.

# TCNN training stage
Φ ← train TCNN(Triplet)
Return g_n, g_a, Φ

Electronics 2023, 12, x FOR PEER REVIEW

used as the input of the TCNN to learn the vector features of the training set. (7) The triplet loss is obtained by calculating different class-embedding representations to realize the training of the model in the state of data type imbalance.

Figure 2. MCLDM prediction stage.

Finally, compare the distance between

MCLDM is implemented in Python 3.8, and the framework used for the deep neural network is TensorFlow 2.8 with Keras 2.8, where the API for data preprocessing includes

5.3.1. Model Accuracy and Loss Variation
The Accuracy and loss of the proposed MCLDM during the training stage are plotted in Figure 4. Both change very rapidly over the first epochs of training, indicating that MCLDM has good learning performance. As the number of epochs increases, the changes in Accuracy and loss gradually level off. When the Accuracy and loss curves of MCLDM start to deviate from this plateau, we stop the training of MCLDM.
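One common way to turn this kind of stopping criterion into a rule is patience-based early stopping on the loss curve. The sketch below is an assumption about how such a rule could look, not the procedure used in the paper: the function name `stopping_epoch`, the `patience` and `min_delta` parameters, and the example loss history are all hypothetical.

```python
def stopping_epoch(losses, patience=3, min_delta=1e-3):
    """Return the epoch index at which training would stop, or None.

    Training stops once the loss has failed to improve on the best value
    seen so far by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss   # meaningful improvement: reset the counter
            stale = 0
        else:
            stale += 1    # plateau or degradation
            if stale >= patience:
                return epoch
    return None


# Hypothetical loss history: fast initial drop, plateau, then a slow rise.
history = [0.90, 0.50, 0.30, 0.25, 0.25, 0.26, 0.27, 0.30]
print(stopping_epoch(history))
```

Keras offers a comparable mechanism through its `EarlyStopping` callback; whether the authors used it is not stated in the text.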

Figure 4. Accuracy and loss variation when the models are trained on the CICIDS17 and KDDCUP99 datasets.

and testing datasets CICIDS17 and KDDCUP99, respectively. By analyzing Figures 4 and 5, it is o that the distance between the anchor embedding and the positive sample embed


Figure 5. Embedding analysis on the CICIDS17 dataset: training set (left) and test set (right).

Figure 6. Embedding analysis on the KDDCUP99 dataset: training set (left) and test set (right).

5.3.3. Ablation Study
We further analyze the following factors, which together contribute to the good accuracy of MCLDM in network intrusion detection. (1) The autoencoder achieves data reconstruction and the synthesis of additional information. (2) The multi-channel representation enriches the features of the raw data and increases the gap between different classes of data. (3) The TCNN contributes to the accuracy of MCLDM.
(a) NN: It is a fully connected network formed by the last three layers of the MCLDM architecture.
(b) ANN: It is composed of an autoencoder and the NN structure. Because it contains an autoencoder, it shows the advantage of using one.
(c) CNN: It is composed of the CNN structure in MCLDM, and its principle is similar to that of NN. The input sample $[X]$ must be converted to $[X_{cor}]$ before training the model.
(d) ACNN: It is a structure similar to CNN, but differs from the MCLDM structure in that it lacks the step of multi-channel data reconstruction.

Figure 7. F1-score values of MCLDM, NN, ANN, CNN, and ACNN with the number of attacks in the CICIDS17 dataset.

and build a triplet training sample. We denote $[X_{cor}, X_{cor}]$ as an Anchor sample, $[X_{cor}, X^{+}_{cor}]$ as a Positive sample, and $[X_{cor}, X^{-}_{cor}]$ as a Negative sample. When label = normal, we specify $[X^{n}_{cor}, X^{n}_{cor}]$ as the Anchor sample, $[X^{n}_{cor}, X^{n+}_{cor}]$ as the Positive sample, and $[X^{n}_{cor}, X^{n-}_{cor}]$ as the Negative sample. When label = abnormal, we specify $[X^{a}_{cor}, X^{a}_{cor}]$ as the Anchor sample, $[X^{a}_{cor}, X^{a+}_{cor}]$ as the Positive sample, and $[X^{a}_{cor}, X^{a-}_{cor}]$ as the Negative sample. Through multi-channel data reconstruction, feature extraction from one-dimensional vectors is transformed into feature extraction from two-dimensional multi-channel data, which increases the gap between different categories and helps to cope with imbalanced training data.
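To make the role of the Anchor, Positive, and Negative samples concrete, the sketch below computes a commonly used margin-based triplet loss on already-computed embeddings. It is an illustration under assumptions: the exact loss prescribed for MCLDM is not given in this passage, and the `margin` value, the Euclidean distance, and the two-dimensional toy embeddings are hypothetical choices.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the negative embedding is at least `margin`
    farther from the anchor than the positive embedding is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy embeddings (hypothetical): the positive lies close to the anchor,
# the negative lies far away, so this triplet is already well separated.
anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (3.0, 4.0)
print(triplet_loss(anchor, positive, negative))   # well-separated triplet
print(triplet_loss(anchor, negative, positive))   # roles swapped: penalized
```

Minimizing a loss of this form pulls Anchor and Positive embeddings together and pushes Anchor and Negative embeddings apart, which is the separation effect the embedding analysis figures are meant to show.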

Table 3. Description of the data set.

For both datasets, we split the training set into a training sample and a test sample by random stratified sampling, where the training sample is 80% of the training set and the test sample is 20% of the training set. In the experiments, we normalize the data. The purpose of normalization is to reduce the training time of MCLDM and to accelerate its convergence. After one-hot encoding the data, we use the min-max scaler in Scikit-learn, with Equation (