Article
Peer-Review Record

Detecting Browser Drive-By Exploits in Images Using Deep Learning

Electronics 2023, 12(3), 473; https://doi.org/10.3390/electronics12030473
by Patricia Iglesias *, Miguel-Angel Sicilia * and Elena García-Barriocanal *
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 October 2022 / Revised: 19 December 2022 / Accepted: 20 December 2022 / Published: 17 January 2023
(This article belongs to the Special Issue High Accuracy Detection of Mobile Malware Using Machine Learning)

Round 1

Reviewer 1 Report

The article is devoted to the issue of detecting steganography in images. It's as if I'm back in 2011. It was then that I first read about LSB (a method of steganography in which the lower bits of one of the RGB colors in a pixel are replaced with the bits of an encoded text). However, the topic of hide-and-seek will always be relevant. The structure of the article differs from that adopted by MDPI for research articles (Introduction, Models and Methods, Experiments, Discussion, Conclusions). The level of English is acceptable. The article is short and easy to read. The article has one figure (of acceptable quality) and not a single formula. The article cites 30 sources, not all of which are relevant.
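For reference, the LSB technique the reviewer describes can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper under review; function names and the toy image are invented for the example:

```python
import numpy as np

def lsb_embed(pixels: np.ndarray, message: bytes) -> np.ndarray:
    """Hide message bits in the least significant bits of a pixel array.
    Assumes the image has at least 8 * len(message) values."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    stego = pixels.flatten().copy()
    stego[: bits.size] = (stego[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return stego.reshape(pixels.shape)

def lsb_extract(pixels: np.ndarray, n_bytes: int) -> bytes:
    """Read the message back out of the first n_bytes * 8 LSBs."""
    bits = pixels.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

# toy 4x4 RGB "image": embedding changes each touched value by at most 1
img = np.random.default_rng(0).integers(0, 256, (4, 4, 3), dtype=np.uint8)
stego = lsb_embed(img, b"hi")
assert lsb_extract(stego, 2) == b"hi"
assert np.max(np.abs(stego.astype(int) - img.astype(int))) <= 1
```

The per-pixel distortion of at most one intensity level is exactly why LSB payloads are invisible to the eye yet detectable statistically.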

1. Let's start over. And the first thing that catches your eye is the "browser drive-by exploits" that the authors are struggling with. What is this attack? A search in the text of the article showed that this beast is found only in the title. Steganography? No, this is a harsh reality. The authors think that these vulnerabilities are well known, common and classified. I am an old cryptozoologist, so I immediately went to fill in the gap in my knowledge on my favorite sites. These are the bulletins of the National Computer Incident Coordination Center, the opencve.io website and the cvetrends.com website. And you know, I did not find “browser drive-by exploits” among the current world threats. I ask the authors to justify the relevance of the search for these particular vulnerabilities. I also ask you to clearly classify these vulnerabilities based on open sources.

2. Here, https://github.com/Panda-Lewandowski/StegMachine is the original tool. It allows you to generate data, perform a visual attack on each channel separately, calculate the RS and SPA scores, and visualize the chi-square results. The authors of this sinister device are not going to stop there. Only the authors and their kind AI in the form of a CNN can save the world. I ask you to prove the effectiveness of the authors' approach in an active confrontation. Moreover, from lines 202-208 it turns out that the authors themselves formed the dataset used to test their offspring. That's not sporting.
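The chi-square attack the reviewer mentions is the classic pairs-of-values test: sequential LSB embedding equalizes the counts of each histogram pair (2k, 2k+1). A minimal NumPy sketch of the statistic (not StegMachine's implementation; a low statistic suggests equalized pairs and hence likely embedding):

```python
import numpy as np

def chi_square_stat(channel: np.ndarray) -> tuple[float, int]:
    """Pairs-of-values chi-square statistic for one 8-bit channel.
    Full sequential LSB embedding drives each (2k, 2k+1) histogram pair
    toward equality, pushing the statistic toward 0; natural images
    typically score much higher."""
    hist = np.bincount(channel.ravel(), minlength=256)
    even, odd = hist[0::2].astype(float), hist[1::2].astype(float)
    expected = (even + odd) / 2.0
    mask = expected > 0  # skip pairs that never occur
    stat = float(np.sum((even[mask] - expected[mask]) ** 2 / expected[mask]))
    return stat, int(mask.sum() - 1)  # statistic and degrees of freedom
```

In a full implementation the statistic is converted to a p-value via the chi-square survival function and computed over sliding windows to localize the payload.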

3. As I said, I am an old cryptozoologist. Ugh, cryptographer. Back in the warm, lamp year of 2011, I developed simple rules for the competent use of LSB. Here they are:

- Embed the message in randomly chosen bytes;
- Minimize the amount of embedded information as much as possible (remember Uncle Hamming);
- Use ±1 coding;
- Choose pictures with noisy LSBs;
- Use images that have not appeared anywhere before.

I ask the authors to show that their offspring can withstand my experience. At the same time, there will be more figures in the article.
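The first and third of these rules amount to LSB matching at key-selected positions: instead of overwriting the LSB, a pixel is randomly nudged by ±1 whenever its LSB disagrees with the message bit. A minimal sketch under those assumptions (the integer seed stands in for a shared stego key; nothing here is from the paper under review):

```python
import numpy as np

def lsb_match_embed(pixels: np.ndarray, message: bytes, key: int) -> np.ndarray:
    """±1 embedding (LSB matching) at key-selected random positions."""
    rng = np.random.default_rng(key)
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = pixels.flatten().astype(np.int16)
    # random, non-repeating positions; reproducible from the key
    pos = rng.choice(flat.size, size=bits.size, replace=False)
    for p, b in zip(pos, bits):
        if (flat[p] & 1) != b:
            step = rng.choice([-1, 1])       # random direction flips the LSB
            if flat[p] == 0:
                step = 1                      # stay inside [0, 255]
            elif flat[p] == 255:
                step = -1
            flat[p] += step
    return flat.astype(np.uint8).reshape(pixels.shape)

def lsb_match_extract(pixels: np.ndarray, n_bytes: int, key: int) -> bytes:
    """Recover the message using the same key-derived positions."""
    rng = np.random.default_rng(key)
    pos = rng.choice(pixels.size, size=n_bytes * 8, replace=False)
    return np.packbits(pixels.flatten()[pos] & 1).tobytes()
```

Because the ±1 change preserves the pixel's LSB parity flip while scattering positions, the pairs-of-values signature exploited by the chi-square attack is largely destroyed, which is the point of the reviewer's rules.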

4. And now, jokes aside. Steganography is a mathematical discipline. I ask you to analytically substantiate the architecture given in Fig. 1. Why is it like this? Why hasn't it been optimized? Why did the authors, when solving the recognition problem, not calculate the probabilities of errors of the first and second kind? Why didn't they build ROC curves? Why, when using a deep neural network, did they not use a control sample? How can the authors prove the statistical stability of the quality indicators demonstrated by their offspring in a real setting? Why, finally, did they not indicate their place of work at the beginning of the article?

Author Response

First of all, we thank the reviewer for his/her suggestions for improving the paper. We explain below how each of them has been addressed: 

Comment

Changes in the article

I did not find “browser drive-by exploits” among the current world threats. I ask the authors to justify the relevance of the search for these particular vulnerabilities. I also ask you to clearly classify these vulnerabilities based on open sources.

We have added a description of polyglot attacks and how they are performed. We have also added examples of important attacks during 2020 and 2021 based on browser exploits.

I ask you to prove the effectiveness of the authors' approach in an active confrontation. Moreover, from lines 202-208 it turns out that the authors themselves formed the dataset used to test their offspring. That's not sporting.

We have added more images based on other steganography techniques: random LSB embedding (Fermat and Fibonacci sequences) and the F5 algorithm. We have also added a dataset from the University of Iowa composed of real F5 steganography images.

We have described in more detail how we perform training and validation; the training dataset is completely different from the validation dataset, which ensures the accuracy of the reported performance metrics.

[Regarding rules for a competent use of LSB] I ask the authors to show that their offspring can withstand my experience. At the same time, there will be more figures in the article.

We acknowledge the effectiveness of statistical techniques. We have reviewed the provided GitHub repository, but we were not able to make it work. However, we have reviewed the code, and it is based on the analysis of the different layers and on visual attacks.

We have added an article, Jin (2022), which describes how a CNN is able to create deep feature maps to detect the special features needed to classify faces. Based on the same idea, we believe that the network is able to learn features that visual attacks or statistical methods could miss.

To verify the effectiveness of the model, we have added F5 images, since F5 is more robust to this type of steganalysis than LSB, and the model is still able to detect it. Section 3.2 (Experimental Setup) describes how the new dataset is included.

 

I ask you to analytically substantiate the architecture given in Fig. 1. Why is it like this? Why hasn't it been optimized? Why did the authors, when solving the recognition problem, not calculate the probabilities of errors of the first and second kind? Why didn't they build ROC curves?

We have added the reason for using a CNN, the reason for not adding a massive preprocessing step, the optimization method performed, and the validation accuracy and ROC curves to explain the performance.

Why, when using a deep neural network, did they not use a control sample?

We have rewritten the section explaining how training and validation are performed, to clarify that the training dataset and the validation dataset are completely different and how they were built. We have also added the ROC curves.

How can the authors prove the statistical stability of the quality indicators demonstrated by their offspring in a real setting?

We added the performance metrics graphs for each experiment to show the reason for moving to another approach.

Why, finally, did they not indicate their place of work at the beginning of the article?

We have followed the paper template and have not seen any place for the place of work. In any case, the place of work is the University of Alcalá, Spain, and the domain is uah.es, to which all of our email addresses belong.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors should carefully revise the paper according to the following comments.

1. The reason for adopting the neural network structure shown in Figure 1 should be described in depth.

2. The technical details of the paper should be supplemented, including the training method of the deep learning model.

3. Training process should be shown in the paper.

4. More experiments and comparison should be performed.

5. Are the experimental settings of the model the same across the different databases?

6. Following papers should be cited in the paper. “Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process,” IEEE Access, vol. 6, pp. 15844-15869, 2018. “Cloud shape classification system based on multi-channel cnn and improved fdm.” IEEE Access 8 (2020): 44111-44124. “Deep Facial Diagnosis: Deep Transfer Learning From Face Recognition to Facial Diagnosis,” in IEEE Access, vol. 8, pp. 123649-123661, 2020. “Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking.” Medical Image Analysis 71 (2021): 102048. “Pseudo RGB-Face Recognition,” in IEEE Sensors Journal, 2022, doi: 10.1109/JSEN.2022.3197235.

Author Response

First of all, we thank the reviewer for his/her suggestions for improving the paper. We explain below how each of them has been addressed: 

Comment

Responses & Changes in the article

The reason for adopting the neural network structure shown in Figure 1 should be described in depth.

We have rewritten the different sections of the article, and we have included a section describing the CNN and the reasons for choosing this type of NN (Section 3.1, Description of the Approach). We have also added the proposed articles, which explain the reasons for using a CNN for classification and how CNNs are able to create deep feature maps.

In addition, we have added an article (Wu (2020)) that explains how redundant filters affect the performance of a CNN; based on that article, we explain the reason for having only a rescaling step in the preprocessing.

The technical details of the paper should be supplemented, including the training method of the deep learning model.

We have rewritten the article to clarify how training and validation have been performed. For clarity, we have added Figure 1, "Framework for Blind Steganalysis", which shows that the training dataset and the validation dataset are completely different.

Training process should be shown in the paper.

We have rewritten the section to better clarify the training process.

More experiments and comparison should be performed.

We have worked more deeply on the model training, and we have rewritten the experimental setup, adding new datasets according to different steganalysis criteria. Previous works on steganalysis have been updated.

We have also obtained and reported more metrics to analyze the results.

Whether the experimental settings of the model are the same in different databases

We have not changed the hyperparameters between the different models; they are always the same. We have included some clarifications about this in the description of the CNN in Section 3.1, Description of the Approach.

Following papers should be cited in the paper. “Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process,” IEEE Access, vol. 6, pp. 15844-15869, 2018. “Cloud shape classification system based on multi-channel cnn and improved fdm.” IEEE Access 8 (2020): 44111-44124. “Deep Facial Diagnosis: Deep Transfer Learning From Face Recognition to Facial Diagnosis,” in IEEE Access, vol. 8, pp. 123649-123661, 2020. “Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking.” Medical Image Analysis 71 (2021): 102048. “Pseudo RGB-Face Recognition,” in IEEE Sensors Journal, 2022, doi: 10.1109/JSEN.2022.3197235

We have analyzed the references to improve the work and have reported on them accordingly in the paper. We have also added additional references to justify the reason for using this type of NN.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript proposes a deep-learning-based framework for detecting steganography in images, accounting for the specific situation in which the images and the malicious content are delivered using the least-significant-bit (LSB) technique. Evaluation on different datasets verified the effectiveness of the proposed framework.

 

Major comments:

1. Discussion of the evaluation results is not sufficient, e.g., the low accuracy of testing Coco Gray.

2. In evaluation chapter, there’s no comparison to other state-of-the-art methods. Also, it’s not discussed thoroughly why the proposed framework could achieve a better result compared to previous work.

3. The evaluation result does not include any discussion of training / testing time cost.

 

Minor comments:

1. Please review the manuscript to fix the grammar errors; some sentences are not easy to read, e.g., Lines 155-157.

 

Author Response

First of all, we thank the reviewer for his/her suggestions for improving the paper. We explain below how each of them has been addressed: 

Comment

Responses & Changes in the article

Discussion of the evaluation results is not sufficient, e.g., the low accuracy of testing Coco Gray.

We have included Figure 4, which shows the training accuracy and the validation accuracy for each epoch. As can be observed, the curve is a horizontal line, so it appears that no learning takes place during training.

In evaluation chapter, there’s no comparison to other state-of-the-art methods. Also, it’s not discussed thoroughly why the proposed framework could achieve a better result compared to previous work.

We have rewritten Section 2.4 to explain in more detail the differences between previous proposals and ours. The most similar one is Reinel (2021); for that reason, we explain in detail the differences between that proposal and ours and the improvements that our work brings. We have also added the main differences from previous steganalysis CNN models and, based on Wu (2020), we try to better explain the cause of our model's performance.

The evaluation result does not include any discussion of training / testing time cost.

We have rewritten Section 3, describing in more detail the framework we used for training and testing. We have also added Figure 1, "Framework of Blind Steganalysis", to clarify that the training dataset and the validation dataset are completely different: the validation dataset is not used for training, only for calculating the performance of the model (validation accuracy and ROC).
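An ROC curve and its AUC can be computed from validation-set scores alone. The following is an illustrative NumPy sketch, not the authors' actual tooling (which is not specified in the record); tied scores are not handled specially:

```python
import numpy as np

def roc_curve(labels, scores):
    """ROC points (FPR, TPR) for a binary stego/clean classifier,
    sweeping the decision threshold down the sorted scores."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores))   # descending score
    hits = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - hits) / (1 - hits).sum()))
    return fpr, tpr

def auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    fpr, tpr = roc_curve(labels, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0 (perfect separation)
```

Unlike a single accuracy figure, the full curve shows the trade-off between false positives (clean images flagged as stego) and misses at every operating point, which is what the reviewers asked for.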

Please review the manuscript to avoid all the grammar errors, some sentences are not easy to read, e.g., Line 155-157

We have removed these lines and rewritten the section.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I formulated the following basic recommendations for the basic version of the article:

1. I did not find “browser drive-by exploits” among the current world threats. I ask the authors to justify the relevance of the search for these particular vulnerabilities. I also ask you to clearly classify these vulnerabilities based on open sources.

2. I ask you to prove the effectiveness of the authors' approach in an active confrontation. Moreover, from lines 202-208 it turns out that the authors themselves formed the dataset used to test their offspring. That's not sporting.

3. [Regarding rules for a competent use of LSB] I ask the authors to show that their offspring can withstand my experience. At the same time, there will be more figures in the article.

4. I ask you to analytically substantiate the architecture given in Fig. 1. Why is it like this?

Why hasn't it been optimized?

Why did the authors, when solving the recognition problem, not calculate the probabilities of errors of the first and second kind?

Why didn't they build ROC curves?

Why, when using a deep neural network, did they not use a control sample?

How can the authors prove the statistical stability of the quality indicators demonstrated by their offspring in a real setting?

Why, finally, did they not indicate the place of work at the beginning of the article?

The authors consistently answered all of them. I liked the compact and informative style of their answers. I recommend the updated version of the article for publication and wish the authors further creative success.

Author Response

Thanks a lot! We have corrected the grammar errors and checked the size of the images.

Reviewer 2 Report

The results of the paper are impressive. The overall writing and organization has been acceptable. Therefore, this paper is recommended for publication.

Author Response

Thanks a lot for the review! We have revised the grammar throughout the paper and resized Figure 4 as suggested.

Reviewer 3 Report

1. Please double check all the figures, most of them are too small, and the two subfigures of Figure 4 are too close to each other.

2. It’s better to also mention whether the code of the proposed method will be open source.

Author Response

Thanks a lot for the review! We have revised the grammar throughout the paper and resized Figure 4 as suggested.
