Article
Peer-Review Record

Network Intrusion Detection Based on an Efficient Neural Architecture Search

Symmetry 2021, 13(8), 1453; https://doi.org/10.3390/sym13081453
by Renjian Lyu 1, Mingshu He 2,*, Yu Zhang 2, Lei Jin 1 and Xinlei Wang 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 10 June 2021 / Revised: 22 July 2021 / Accepted: 3 August 2021 / Published: 9 August 2021
(This article belongs to the Section Computer)

Round 1

Reviewer 1 Report

The authors of this work propose a model that applies Neural Architecture Search (NAS) to the field of network traffic classification and searches for the optimal architecture suitable for traffic detection based on the network traffic dataset. Each layer of the depth model is constructed according to the principle of maximum coding rate attenuation, which has strong consistency and symmetry in structure.

Overall, the application of NAS in the field of intrusion detection is interesting. However, some concerns need to be addressed before the work can be accepted for publication:

  • The authors need to compare their work with recent intrusion detection approaches such as:
    • Resource-aware detection and defense system against multi-type attacks in the cloud: Repeated Bayesian Stackelberg game. IEEE Transactions on Dependable and Secure Computing (2019).
    • How to distribute the detection load among virtual machines to maximize the detection of distributed attacks in the cloud? In 2016 IEEE International Conference on Services Computing (SCC), pp. 316-323. IEEE, 2016.
    • Machine learning algorithms in context of intrusion detection. In 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp. 369-373. IEEE, 2016.
    • An adaptive ensemble machine learning model for intrusion detection. IEEE Access 7 (2019): 82512-82521.
    • A new hybrid approach for intrusion detection using machine learning methods. Applied Intelligence 49, no. 7 (2019): 2735-2761.
  • It is advised that the authors evaluate the performance of their solution on more recent intrusion detection datasets (e.g., 2020, 2019).

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we have revised the paper, and the details are described below.

 

[Comment] 1. Moderate English changes required.

[Response] Thank you very much for your valuable suggestion. We modified several descriptions to improve the quality. Some of them are shown as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Before revision: In addition, we introduce a surrogate model in the search task to improve the efficiency of NAS.

After revision: In addition, we introduce a surrogate model in the search task.

17

Before revision: The work of these studies often focuses on feature selection

After revision: These studies often focused on feature selection

31

Before revision: but there are few studies on the topological structure of classification models.

After revision: but there were few studies on the topological structure of classification models.

33

Before revision: that are carefully designed by researchers in the field of image recognition.

After revision: that were carefully designed by researchers in the field of image recognition.

35

Before revision: This causes the research focus of researchers in the field of traffic classification

After revision: This causes the focus of researchers in the field of traffic classification

39

Before revision: The continuous advancement of NAS research and application has made people realize that

After revision: With the continuous advancement of NAS research and application, people have realized that

42-43

Before revision: surrogate model was adopted to predict the performance of candidate architectures to navigate the direction of the architecture search task.

After revision: surrogate model was adopted to predict the performance of candidate architectures which can navigate the direction of the architecture search task.

65

Before revision: the network architecture most suitable for traffic datasets can be better discovered

After revision: the network architecture most suitable for traffic datasets can be easily discovered

83

Before revision: the search space of the network architecture is improved by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset and improve the performance of the search model.

After revision: by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset, the performance of the search model is improved, so as to improve the search space of the network architecture.

92-95

Before revision: strategy

After revision: strategies

102

Before revision: proposes

After revision: proposed

112

Before revision: uses

After revision: used

114

Before revision: which used several deep learning models to learn the different data distributions of clusters.

After revision: which used several deep learning models to learn different data distributions of clusters.

141

Before revision: in the traffic classification task to improve the classification performance of the model.

After revision: in the traffic classification task, which improved the classification performance of the model.

145

Before revision: which is key to

After revision: which is also the key to

149

Before revision: takes

After revision: took

168

Before revision: Combined with a genetic algorithm

After revision: Combined with the genetic algorithm

170

Before revision: Darts [33] weakens the discrete search space into a continuous search space and searches a high-performance network architecture with complex graphical topology.

After revision: Darts [33] weakened the discrete search space into a continuous search space and searches the high-performance network architecture with complex graphical topology.

172-173

Before revision: At the same time

After revision: Meanwhile

174

Before revision: studies

After revision: studied

174

Before revision: optimizes

After revision: optimized

175

Before revision: to more rapidly and effectively find the appropriate architecture

After revision: to find the appropriate architecture more rapidly and effectively

178

Before revision: proposes

After revision: proposed

181

Before revision: uses

After revision: used

182

Before revision: which enables the model to effectively balance global exploration and local exploration.

After revision: which enabled the model to balance global exploration and local exploration more effectively.

189-190

Before revision: the candidate architecture is trained on the training dataset, and its performance indicators are obtained on the verification dataset

After revision: the candidate architecture was trained on the training dataset, and its performance indicators were obtained on the verification dataset

193-194

Before revision: consumes

After revision: consumed

195

Before revision: regards

After revision: regarded

197

Before revision: some artificial designs, such as skip connections, are introduced

After revision: some artificial designs are introduced, such as skip connections

216-217

Before revision: the relevant operation block in the Inception is introduced to

After revision: the relevant operation block is introduced in the Inception to

236

Before revision: different receptive fields to obtain

After revision: different receptive fields, which can obtain

236

Before revision: how to more rapidly and effectively find the appropriate network architecture.

After revision: how to find the appropriate network architecture more rapidly and effectively

249-250

Before revision: The evolutionary algorithm is a widely used algorithm in architecture search.

After revision: The evolutionary algorithm has been widely used in architecture search.

250

Before revision: The optimization objectives in the architecture search process are usually multiple objectives.

After revision: The optimization objectives in the architecture search process are usually multiple.

251

Before revision: MOPSO will choose one as the global optimal solution according to the crowding degree

After revision: MOPSO will choose one according to the crowding degree

305

Before revision: Large capacity attacks

After revision: High capacity attacks

394

Before revision: The data in the ISCXIDS2012

After revision: The data in ISCXIDS2012

411

Before revision: Based on the concept of a configuration file

After revision: Based on the concept of configuration file

414

Before revision: The hexadecimal number is

After revision: The hexadecimal numbers are

449

Before revision: The sub algebra items generated by each iteration number 40

After revision: The number of sub algebra items generated by each iteration is 40

467-468

Before revision: one...another...

After revision: the first...the second...

489-491

Before revision: the tau index of none of the three surrogate models remains the best.

After revision: none of the three surrogate models’ tau index remains the best.

581

Before revision: In the experiment, we use the AS model, which adaptively selects different surrogate models in the search task, instead of the above three surrogate models.

After revision: In the experiment, instead of using the above three surrogate models, we use the AS model, which adaptively selects different surrogate models in the search task.

583-584

Before revision: is not high

After revision: is low

606

Before revision: which is not good

After revision: which is poor

610

Before revision: the Pareto front

After revision: the Pareto frontier

615

Before revision: high

After revision: higher

620

Before revision: Compared with the general...

After revision: First, compared with the general...

635

Before revision: In addition

After revision: Third

646

Before revision: and (3) expansion of the number

After revision: (3) expansion of the number

657

 

[Comment] 2. The authors need to compare their work with recent intrusion detection approaches such as:

  • Resource-aware detection and defense system against multi-type attacks in the cloud: Repeated Bayesian Stackelberg game. IEEE Transactions on Dependable and Secure Computing (2019).
  • How to distribute the detection load among virtual machines to maximize the detection of distributed attacks in the cloud? In 2016 IEEE International Conference on Services Computing (SCC), pp. 316-323. IEEE, 2016.
  • Machine learning algorithms in context of intrusion detection. In 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp. 369-373. IEEE, 2016.
  • An adaptive ensemble machine learning model for intrusion detection. IEEE Access 7 (2019): 82512-82521.
  • A new hybrid approach for intrusion detection using machine learning methods. Applied Intelligence 49, no. 7 (2019): 2735-2761.

[Response] Thank you very much for your valuable suggestion. We have carefully studied the references you provided. Owing to the design of the overall architecture of our model, it is not suitable for datasets composed of a limited set of features (KDD99 and NSL-KDD). However, we found that this work in the field of anomaly detection has very important reference value. Therefore, we added these references to the citations and selected other papers for comparative experiments. The specific amendments are as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Ref. [42] proposed a repeated Bayesian Stackelberg game based on machine learning technology, which improved the detection performance of cloud-based systems and has high operational efficiency. Ref. [43] proposed a resource-aware maxmin game theoretical model, which improved the detection probability of distributed attacks across multiple users' virtual machines (VMs), reduced the false positive rate of the anomaly detection system, and improved the utilization efficiency of resources in the detection process. Ref. [44] compared the classification performance of different machine learning algorithms on the KDD99 dataset; the algorithms include SVM, naive Bayes, J.48 and decision table. Ref. [45] designed an adaptive ensemble machine learning model, which integrated decision tree, random forest, KNN, DNN and other basic classifiers; an accuracy of 85.2% was achieved by adopting an adaptive voting algorithm on the NSL-KDD dataset. Ref. [46] proposed a hybrid layered intrusion detection system, which combined different machine learning algorithms and feature selection techniques to achieve higher accuracy and a lower false positive rate on the NSL-KDD dataset.

121-135

CIC-DDoS2019: In the CIC-DDoS2019 dataset, researchers analyzed new attacks that can be executed using TCP/UDP-based protocols at the application layer and proposed a new classification: reflection-based DDoS attacks and exploitation-based attacks. Both accomplish the attack by using a legitimate third-party component to hide the attacker's identity. In the former attack type, the attack can be executed through application-layer protocols, using a transport-layer protocol, that is, the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP), or a combination of the two. For the latter, the attack can also be performed through application-layer protocols, using transport-layer protocols such as TCP and UDP. The dataset uses a B-Profile system to describe the abstract behavior of human interaction and to generate natural benign background traffic in the proposed testbed. The dataset builds abstract behaviors for 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols. To facilitate the test, we used only the first day's traffic data in our study and limited the number of samples to alleviate data imbalance. The specific distribution is shown in Table 4.

418-431

In addition, we compare recent research methods in the anomaly detection field in Table 11. From the comparison results, we can see that our model achieves a considerable improvement in performance indicators on different datasets, which demonstrates the effectiveness of the NAS model.

628-631

In addition, we have added two new tables to introduce the newly added dataset and the supplementary comparison, respectively. The table captions are "Table 4. Data distribution of dataset CIC-DDoS2019." and "Table 11. Experiment results and comparation of different datasets."

 

[Comment] 3. It is advised that the authors evaluate the performance of their solution on more recent intrusion detection datasets (e.g., 2020, 2019).

[Response] Thank you very much for your valuable suggestion. In addition to the original two datasets, we added one more dataset as a comparative experiment, namely CIC-DDoS2019.

 

Thanks again for your efforts. In addition to the above revisions, more than 50 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

 

Author Response File: Author Response.pdf

Reviewer 2 Report

In my opinion, the subject of the article is important and necessary. Detection of intruders in the network is especially important nowadays when due to a pandemic most people work remotely. The methods of detecting intruders in the network should be constantly verified and developed to prevent security incidents. The use of various methods of artificial intelligence for this purpose is the right solution. This article is well written. The content is understandable for the reader. I can indicate two suggestions for change in the article. Figures 7 and 9 are hardly legible to me. The authors also have a problem with the MDPI template. The drawings are not in the correct order. The authors did not specify the journal to which the article was submitted. The pdf with the article generates an extra blank page. 

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we have revised the paper, and the details are described below.

 

[Comment] 1. In my opinion, the subject of the article is important and necessary. Detection of intruders in the network is especially important nowadays when due to a pandemic most people work remotely. The methods of detecting intruders in the network should be constantly verified and developed to prevent security incidents. The use of various methods of artificial intelligence for this purpose is the right solution. This article is well written. The content is understandable for the reader.

[Response] Thank you very much for your valuable suggestion, and we are pleased to receive your recognition of our work. To further improve the quality of the paper, we modified several descriptions. Some of them are shown as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Before revision: In addition, we introduce a surrogate model in the search task to improve the efficiency of NAS.

After revision: In addition, we introduce a surrogate model in the search task.

17

Before revision: The work of these studies often focuses on feature selection

After revision: These studies often focused on feature selection

31

Before revision: but there are few studies on the topological structure of classification models.

After revision: but there were few studies on the topological structure of classification models.

33

Before revision: that are carefully designed by researchers in the field of image recognition.

After revision: that were carefully designed by researchers in the field of image recognition.

35

Before revision: This causes the research focus of researchers in the field of traffic classification

After revision: This causes the focus of researchers in the field of traffic classification

39

Before revision: The continuous advancement of NAS research and application has made people realize that

After revision: With the continuous advancement of NAS research and application, people have realized that

42-43

Before revision: surrogate model was adopted to predict the performance of candidate architectures to navigate the direction of the architecture search task.

After revision: surrogate model was adopted to predict the performance of candidate architectures which can navigate the direction of the architecture search task.

65

Before revision: the network architecture most suitable for traffic datasets can be better discovered

After revision: the network architecture most suitable for traffic datasets can be easily discovered

83

Before revision: the search space of the network architecture is improved by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset and improve the performance of the search model.

After revision: by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset, the performance of the search model is improved, so as to improve the search space of the network architecture.

92-95

Before revision: strategy

After revision: strategies

102

Before revision: proposes

After revision: proposed

112

Before revision: uses

After revision: used

114

Before revision: which used several deep learning models to learn the different data distributions of clusters.

After revision: which used several deep learning models to learn different data distributions of clusters.

141

Before revision: in the traffic classification task to improve the classification performance of the model.

After revision: in the traffic classification task, which improved the classification performance of the model.

145

Before revision: which is key to

After revision: which is also the key to

149

Before revision: takes

After revision: took

168

Before revision: Combined with a genetic algorithm

After revision: Combined with the genetic algorithm

170

Before revision: Darts [33] weakens the discrete search space into a continuous search space and searches a high-performance network architecture with complex graphical topology.

After revision: Darts [33] weakened the discrete search space into a continuous search space and searches the high-performance network architecture with complex graphical topology.

172-173

Before revision: At the same time

After revision: Meanwhile

174

Before revision: studies

After revision: studied

174

Before revision: optimizes

After revision: optimized

175

Before revision: to more rapidly and effectively find the appropriate architecture

After revision: to find the appropriate architecture more rapidly and effectively

178

Before revision: proposes

After revision: proposed

181

Before revision: uses

After revision: used

182

Before revision: which enables the model to effectively balance global exploration and local exploration.

After revision: which enabled the model to balance global exploration and local exploration more effectively.

189-190

Before revision: the candidate architecture is trained on the training dataset, and its performance indicators are obtained on the verification dataset

After revision: the candidate architecture was trained on the training dataset, and its performance indicators were obtained on the verification dataset

193-194

Before revision: consumes

After revision: consumed

195

Before revision: regards

After revision: regarded

197

Before revision: some artificial designs, such as skip connections, are introduced

After revision: some artificial designs are introduced, such as skip connections

216-217

Before revision: the relevant operation block in the Inception is introduced to

After revision: the relevant operation block is introduced in the Inception to

236

Before revision: different receptive fields to obtain

After revision: different receptive fields, which can obtain

236

Before revision: how to more rapidly and effectively find the appropriate network architecture.

After revision: how to find the appropriate network architecture more rapidly and effectively

249-250

Before revision: The evolutionary algorithm is a widely used algorithm in architecture search.

After revision: The evolutionary algorithm has been widely used in architecture search.

250

Before revision: The optimization objectives in the architecture search process are usually multiple objectives.

After revision: The optimization objectives in the architecture search process are usually multiple.

251

Before revision: MOPSO will choose one as the global optimal solution according to the crowding degree

After revision: MOPSO will choose one according to the crowding degree

305

Before revision: Large capacity attacks

After revision: High capacity attacks

394

Before revision: The data in the ISCXIDS2012

After revision: The data in ISCXIDS2012

411

Before revision: Based on the concept of a configuration file

After revision: Based on the concept of configuration file

414

Before revision: The hexadecimal number is

After revision: The hexadecimal numbers are

449

Before revision: The sub algebra items generated by each iteration number 40

After revision: The number of sub algebra items generated by each iteration is 40

467-468

Before revision: one...another...

After revision: the first...the second...

489-491

Before revision: the tau index of none of the three surrogate models remains the best.

After revision: none of the three surrogate models’ tau index remains the best.

581

Before revision: In the experiment, we use the AS model, which adaptively selects different surrogate models in the search task, instead of the above three surrogate models.

After revision: In the experiment, instead of using the above three surrogate models, we use the AS model, which adaptively selects different surrogate models in the search task.

583-584

Before revision: is not high

After revision: is low

606

Before revision: which is not good

After revision: which is poor

610

Before revision: the Pareto front

After revision: the Pareto frontier

615

Before revision: high

After revision: higher

620

Before revision: Compared with the general...

After revision: First, compared with the general...

635

Before revision: In addition

After revision: Third

646

Before revision: and (3) expansion of the number

After revision: (3) expansion of the number

657

 

[Comment] 2. I can indicate two suggestions for change in the article. Figures 7 and 9 are hardly legible to me.

[Response] Thank you very much for your valuable suggestion. We readjusted the figure formatting in the paper, adjusted the font sizes and changed the output format of the figures (from PNG to PDF) so that they display more clearly. In addition, we enlarged the figures to a more appropriate size.

 

[Comment] 3. The authors also have a problem with the MDPI template.

[Response] Thank you very much for your valuable suggestion. We checked the template format and revised some errors.

 

[Comment] 4. The drawings are not in the correct order.

[Response] Thank you very much for your valuable suggestion. We found that Figure 8 and Figure 9 were not in the right order, so we rearranged the figures so that they appear in the correct order.

 

[Comment] 5. The authors did not specify the journal to which the article was submitted.

[Response] Because each layer of our depth model is constructed according to the principle of maximum coding rate attenuation, which has strong consistency and symmetry in structure, we intend to submit our article to Symmetry.

 

[Comment] 6. The pdf with the article generates an extra blank page.

[Response] Thank you very much for your valuable suggestion. We have removed the extra blank page generated due to format problems.

 

Thanks again for your efforts. In addition to the above revisions, more than 50 modifications have been made to improve the quality of this paper. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

 

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have presented an architecture-search-based machine learning approach for network traffic detection. Following are the comments for improvement.

1) There are various typos in the equations and language. Please proofread. For example, the F1 score should have addition in the denominator, and there is a missing equation reference (??) on page 8.

2) Some equation notation is not explained, such as arch in Algorithm 1, and equation (5) is not explained. The updates of gbest and pbest are not explained.

3) Is equation (4) correct? Maximization on the right equates to minimization on the left. Please explain if it is.

4) Please explain the rationale for choosing n×n kernels over (n+2)×(n+2) kernels. Similarly for 1×n and n×1 convolution kernels over n×n.

5) Clarify the novelty of the solution, as it seems to be an ensemble model with a surrogate to choose the best solution. Are there other similar ensemble models? Please state the novelty and any time overhead incurred over traditional ensemble methods.

6) Please explain the three surrogate models and the algorithms used.

Author Response

Dear reviewer,

 

First of all, thank you for your patient guidance. Your constructive comments will further improve our work. According to the comments, we have revised the paper, and the details are described below.

 

[Comment] 1. Moderate English changes required.

[Response] Thank you very much for your valuable suggestion. We modified several descriptions to improve the quality. Some of them are shown as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Before revision: In addition, we introduce a surrogate model in the search task to improve the efficiency of NAS.

After revision: In addition, we introduce a surrogate model in the search task.

17

Before revision: The work of these studies often focuses on feature selection

After revision: These studies often focused on feature selection

31

Before revision: but there are few studies on the topological structure of classification models.

After revision: but there were few studies on the topological structure of classification models.

33

Before revision: that are carefully designed by researchers in the field of image recognition.

After revision: that were carefully designed by researchers in the field of image recognition.

35

Before revision: This causes the research focus of researchers in the field of traffic classification

After revision: This causes the focus of researchers in the field of traffic classification

39

Before revision: The continuous advancement of NAS research and application has made people realize that

After revision: With the continuous advancement of NAS research and application, people have realized that

42-43

Before revision: surrogate model was adopted to predict the performance of candidate architectures to navigate the direction of the architecture search task.

After revision: surrogate model was adopted to predict the performance of candidate architectures which can navigate the direction of the architecture search task.

65

Before revision: the network architecture most suitable for traffic datasets can be better discovered

After revision: the network architecture most suitable for traffic datasets can be easily discovered

83

Before revision: the search space of the network architecture is improved by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset and improve the performance of the search model.

After revision: by filtering suitable operation blocks and introducing new operation blocks to adapt to the network traffic dataset, the performance of the search model is improved, so as to improve the search space of the network architecture.

92-95

Before revision: strategy

After revision: strategies

102

Before revision: proposes

After revision: proposed

112

Before revision: uses

After revision: used

114

Before revision: which used several deep learning models to learn the different data distributions of clusters.

After revision: which used several deep learning models to learn different data distributions of clusters.

141

Before revision: in the traffic classification task to improve the classification performance of the model.

After revision: in the traffic classification task, which improved the classification performance of the model.

145

Before revision: which is key to

After revision: which is also the key to

149

Before revision: takes

After revision: took

168

Before revision: Combined with a genetic algorithm

After revision: Combined with the genetic algorithm

170

Before revision: Darts [33] weakens the discrete search space into a continuous search space and searches a high-performance network architecture with complex graphical topology.

After revision: Darts [33] weakened the discrete search space into a continuous search space and searches the high-performance network architecture with complex graphical topology.

172-173

Before revision: At the same time

After revision: Meanwhile

174

Before revision: studies

After revision: studied

174

Before revision: optimizes

After revision: optimized

175

Before revision: to more rapidly and effectively find the appropriate architecture

After revision: to find the appropriate architecture more rapidly and effectively

178

Before revision: proposes

After revision: proposed

181

Before revision: uses

After revision: used

182

Before revision: which enables the model to effectively balance global exploration and local exploration.

After revision: which enabled the model to balance global exploration and local exploration more effectively.

189-190

Before revision: the candidate architecture is trained on the training dataset, and its performance indicators are obtained on the verification dataset

After revision: the candidate architecture was trained on the training dataset, and its performance indicators were obtained on the verification dataset

193-194

Before revision: consumes

After revision: consumed

195

Before revision: regards

After revision: regarded

197

Before revision: some artificial designs, such as skip connections, are introduced

After revision: some artificial designs are introduced, such as skip connections

216-217

Before revision: the relevant operation block in the Inception is introduced to

After revision: the relevant operation block is introduced in the Inception to

236

Before revision: different receptive fields to obtain

After revision: different receptive fields, which can obtain

236

Before revision: how to more rapidly and effectively find the appropriate network architecture.

After revision: how to find the appropriate network architecture more rapidly and effectively

249-250

Before revision: The evolutionary algorithm is a widely used algorithm in architecture search.

After revision: The evolutionary algorithm has been widely used in architecture search.

250

Before revision: The optimization objectives in the architecture search process are usually multiple objectives.

After revision: The optimization objectives in the architecture search process are usually multiple.

251

Before revision: MOPSO will choose one as the global optimal solution according to the crowding degree

After revision: MOPSO will choose one according to the crowding degree

305

Before revision: Large capacity attacks

After revision: High capacity attacks

394

Before revision: The data in the ISCXIDS2012

After revision: The data in ISCXIDS2012

411

Before revision: Based on the concept of a configuration file

After revision: Based on the concept of configuration file

414

Before revision: The hexadecimal number is

After revision: The hexadecimal numbers are

449

Before revision: The sub algebra items generated by each iteration number 40

After revision: The number of sub algebra items generated by each iteration is 40

467-468

Before revision: one...another...

After revision: the first...the second...

489-491

Before revision: the tau index of none of the three surrogate models remains the best.

After revision: none of the three surrogate models’ tau index remains the best.

581

Before revision: In the experiment, we use the AS model, which adaptively selects different surrogate models in the search task, instead of the above three surrogate models.

After revision: In the experiment, instead of using the above three surrogate models, we use the AS model, which adaptively selects different surrogate models in the search task.

583-584

Before revision: is not high

After revision: is low

606

Before revision: which is not good

After revision: which is poor

610

Before revision: the Pareto front

After revision: the Pareto frontier

615

Before revision: high

After revision: higher

620

Before revision: Compared with the general...

After revision: First, compared with the general...

635

Before revision: In addition

After revision: Third

646

Before revision: and (3) expansion of the number

After revision: (3) expansion of the number

657

 

[Comment] 2. There are various typos in the equations and language. Please proofread. For example, the F1 score should have addition in the denominator, and there is a missing equation reference (??) on page 8.

[Response] Thank you very much for your valuable suggestion. We corrected the formula for the F1 score, changed the original "Macro-F1" on page 14 to "Weight-f1", and added the missing formula on page 8.
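For reference, the standard F1-score definition, with the sum (not a product) in the denominator, which the corrected formula presumably follows, is:

```latex
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```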

 

[Comment] 3. Some equation notation is not explained, such as arch in Algorithm 1, and equation (5) is not explained. The updates of gbest and pbest are not explained.

[Response] Thank you very much for your valuable suggestion. We explained the variables in Algorithm 1 and added some explanation of the algorithm. At the same time, we also explained the variables in equation (5). The specific amendments are as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

In MOPSO, a certain number of particles are first initialized randomly, and the fitness (the multi-objective optimization index) is calculated. The individual optimal solution and the global optimal solution of each particle are then initialized. Next, the algorithm updates the position and velocity of each particle according to the velocity formula, as in (5), and the position formula, as in (6), where r1 and r2 are random numbers, w represents the inertia factor, c1 represents the local velocity factor, c2 represents the global velocity factor, pbest represents the individual optimal solution, and gbest represents the global optimal solution. After the velocity and position of the particles are updated, the particle fitness is recalculated, and the individual optimal solution pbest and the global optimal solution gbest are updated according to the fitness. Finally, the iteration is repeated until it converges or reaches the maximum number of iterations to obtain high-quality search results.

308-319
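For reference, a minimal sketch of the velocity and position updates described above, written as a single-objective simplification with standard PSO conventions; the placeholder fitness function, bounds and parameter values are illustrative assumptions, not the paper's, and in MOPSO gbest would instead be chosen from the Pareto-optimal set by crowding degree:

```python
import numpy as np

rng = np.random.default_rng(0)
w, c1, c2 = 0.5, 1.5, 1.5          # inertia, local and global velocity factors
n_particles, dim = 8, 4

def fitness(x):                    # placeholder objective (lower is better)
    return np.sum(x ** 2, axis=-1)

pos = rng.uniform(-1.0, 1.0, (n_particles, dim))   # particle positions
vel = np.zeros_like(pos)
pbest = pos.copy()                                  # individual optimal solutions
pbest_fit = fitness(pbest)
gbest = pbest[np.argmin(pbest_fit)].copy()          # global optimal solution

for _ in range(50):
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # Eq. (5): velocity update driven by pbest and gbest
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    # Eq. (6): position update
    pos = pos + vel
    # Recalculate fitness, then update pbest and gbest accordingly
    fit = fitness(pos)
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmin(pbest_fit)].copy()

print(gbest, pbest_fit.min())      # best solution found and its fitness
```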

 

 

 

[Comment] 4. Is equation (4) correct? Maximization on the right equates to minimization on the left. Please explain if it is.

[Response] Thank you very much for your valuable suggestion. Equation (4) was not explained in much detail in the previous version, so we have added an explanation to the original text for ease of understanding. The specific amendments are as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Eq. (4) shows another method: the Tchebycheff method (te). Here z* is the reference point, and λ_i|f_i(x) − z_i*| is equivalent to a coordinate transformation. Different from the weighted aggregation of the first method, the Tchebycheff method compares maximum values. That is, given a set of weights λ and an input x, the maximum value of λ_i|f_i(x) − z_i*| over the objectives is taken (the right side of the equation), and then, according to the minimum-objective optimization principle, the smaller of these maximum values is selected (the left side of the equation); here x is the independent variable. One disadvantage of this method is that its aggregate function is not smooth for continuous multi-objective optimization problems, but its performance is still better than that of the ws method.

281-289
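For reference, the standard Tchebycheff scalarization takes the following form, where λ is the weight vector, the f_i are the objectives, z* is the reference point, and m is the number of objectives; the paper's Eq. (4) presumably uses equivalent notation:

```latex
\min_{x} \; g^{\mathrm{te}}(x \mid \lambda, z^{*})
  = \min_{x} \; \max_{1 \le i \le m} \left\{ \lambda_i \, \bigl| f_i(x) - z_i^{*} \bigr| \right\}
```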

 

[Comment] 5. Please explain the rationale for choosing n×n kernels over (n+2)×(n+2) kernels. Similarly for 1×n and n×1 convolution kernels over n×n.

[Response] Thank you very much for your valuable suggestion. We have explained the reasons for choosing different convolution kernels in lines 345 to 354. The specific amendments are as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

With the same receptive field size, the number of parameters of one (n+2)×(n+2) convolution kernel is (n+2)×(n+2), while that of two stacked n×n convolution kernels is 2×n×n. To make the model lightweight, we use the superposition of two n×n convolution kernels to replace the (n+2)×(n+2) convolution kernel in the searchable operation structure. At the same time, for some large convolution kernels, a 1×n convolution kernel and an n×1 convolution kernel are added to replace an n×n convolution kernel to reduce the number of parameters, as the number of parameters of the former is (1×n)+(n×1), which is less than the n×n of the latter. In addition, depthwise separable convolutions and whole convolutions of different sizes are used to replace the general convolution kernel so that the model parameters are reduced when the receptive field is the same.

345-354
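For reference, a quick arithmetic check of the parameter-count comparisons above (single input/output channel, bias ignored; n = 3 is an assumed example size, not a value from the paper):

```python
# Parameter counts for the kernel substitutions discussed above.
n = 3

stacked = 2 * n * n            # two stacked n x n kernels
single = (n + 2) * (n + 2)     # one (n+2) x (n+2) kernel with the same receptive field
print(stacked, "<", single)    # 18 < 25

factorized = 1 * n + n * 1     # a 1 x n kernel followed by an n x 1 kernel
square = n * n                 # one n x n kernel
print(factorized, "<", square) # 6 < 9
```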

 

[Comment] 6. Clarify the novelty of the solution, as it seems to be an ensemble model with a surrogate to choose the best solution. Are there other similar ensemble models? Please state the novelty and any time overhead incurred over traditional ensemble methods.

[Response] Thank you very much for your valuable suggestion. There are many types of ensemble models, whose purpose is to integrate the learning ability of each model and improve the generalization ability of the final model. Common model-ensembling methods are voting, averaging, stacking and blending. In our study, the voting method is used; that is, in each iteration, voting is based on the correlation coefficient between the predicted results and the actual results, and the surrogate model with the larger correlation coefficient is selected to predict the model performance. The advantage of voting is that it is the most intuitive and convenient of all ensemble learning methods.

 

[Comment] 7. Please explain the three surrogate models and the algorithms used.

[Response] Thank you very much for your valuable suggestion. We have divided the original Section 2.5 "Surrogate model" into three parts, which introduce the application of the surrogate model, the three different surrogate models, and the adaptive switching (AS) selection mechanism, respectively. The specific amendments are as follows:

Revised content (the corresponding line numbers in the revised manuscript follow each entry):

Because substantial computing resources are needed to iteratively optimize the candidate architectures one by one to make them converge, we introduce a surrogate model to predict the performance of the model in the process of model architecture search using a genetic algorithm. The input of the surrogate model is the neural network architecture encoding (as shown in Figure 1), and the output is a prediction of the neural network's performance (such as accuracy, F1 value, etc.).

We use three different prediction surrogate models: the multi-layer perceptron (MLP) [15], classification and regression trees (CART) [36], and the Gaussian process (GP) [37]. An MLP generally has three layers: an input layer, a hidden layer and an output layer. The hidden layer and the input layer are generally fully connected, while the hidden layer to the output layer is generally softmax regression. CART is a kind of decision tree; the CART algorithm can be used to create both classification trees and regression trees. In this study, in order to predict the performance of the model (a discrete value), a regression tree is established. The steps of the GP model for the regression task are: (1) determine the Gaussian process; (2) determine the expression of the prediction points according to the posterior probability; (3) solve the hyperparameters by maximum likelihood; (4) input data to obtain the prediction results.

It cannot be guaranteed that every surrogate model will perform well in different classification tasks. We therefore use an adaptive switching (AS) selection mechanism: the three kinds of surrogate models are trained at the same time, and in each iteration the best prediction model is selected (by comparing the correlation between the predicted and actual values), so that the appropriate model is chosen adaptively through cross selection.

321-343
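For reference, a minimal sketch of the adaptive switching idea described above: train the three surrogates on (architecture encoding, measured score) pairs and keep, for the current iteration, the one whose predictions correlate best with the actual values on held-out architectures. The synthetic data, the Kendall-tau criterion and the scikit-learn model choices here are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor              # CART-style regression tree
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.random((60, 10))                                     # stand-in architecture encodings
y = X @ rng.random(10) + 0.05 * rng.standard_normal(60)     # stand-in performance scores
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

surrogates = {
    "MLP": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "CART": DecisionTreeRegressor(max_depth=5, random_state=0),
    "GP": GaussianProcessRegressor(),
}

# Fit every surrogate, then pick the one whose ranking of held-out
# architectures agrees best with the measured scores.
taus = {}
for name, model in surrogates.items():
    model.fit(X_tr, y_tr)
    taus[name], _ = kendalltau(model.predict(X_val), y_val)

best = max(taus, key=taus.get)      # surrogate used for this search iteration
print(taus, "->", best)
```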

 

Thanks again for your efforts. In addition to the above revisions, more than 50 modifications have been made to improve the quality of this paper. For easy review, we highlighted the changes in the PDF. If there are any questions, please feel free to contact me.

 

Kind Regards,

Mingshu He

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have made a considerable effort to answer the reviewers' comments. Therefore, I recommend the acceptance of the manuscript.
