Article
Peer-Review Record

Hybrid Inductive Model of Differentially and Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique and Convolutional Neural Network

Appl. Sci. 2022, 12(22), 11795; https://doi.org/10.3390/app122211795
by Sergii Babichev 1,2,*,†,‡, Lyudmyla Yasinska-Damri 3,‡, Igor Liakh 4,‡ and Jiří Škvor 1,†,‡
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 6 November 2022 / Revised: 14 November 2022 / Accepted: 18 November 2022 / Published: 20 November 2022
(This article belongs to the Special Issue Applications of Artificial Intelligence in Biomedical Data Analysis)

Round 1

Reviewer 1 Report

This manuscript proposes a novel hybridised method for extracting differentially expressed and co-expressed gene expression profiles, in which a clustering technique and convolutional neural networks are employed for the task of interest. To validate the performance of the proposed method, the comprehensive open dataset GSE19188 was used, with satisfactory results. Overall, the topic of this research is interesting, and the manuscript is well organised and written. Detailed comments are provided as follows.

1. The contribution and innovation of the manuscript should be stated clearly in the abstract and introduction.

2. Broaden and update the literature review on convolutional neural networks or deep learning in engineering applications, e.g. "Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network" and "Vision-based concrete crack detection using a hybrid framework considering noise effect".

3. In general, the performance of a CNN model depends heavily on the hyperparameter settings. How did the authors set the network hyperparameters in this research to achieve the optimal prediction performance?

4. Please give the evaluation metrics of the proposed method in detail.

5. A comparative study against other similar methods should be included.

6. More future research should be included in the conclusion.

Author Response

This manuscript proposes a novel hybridised method for extracting differentially expressed and co-expressed gene expression profiles, in which a clustering technique and convolutional neural networks are employed for the task of interest. To validate the performance of the proposed method, the comprehensive open dataset GSE19188 was used, with satisfactory results. Overall, the topic of this research is interesting, and the manuscript is well organised and written. Detailed comments are provided as follows.

First, we would like to thank the reviewer for both the positive evaluation and the remarks, which undoubtedly help to improve the manuscript. In the revised version of the manuscript, we have tried to address all remarks as fully as possible.

  1. The contribution and innovation of the manuscript should be stated clearly in the abstract and introduction.

Thanks for the remark. We have added the main contribution of the research at the end of both the abstract and introduction:

In our view, the proposed hybrid inductive model increases objectivity when forming the subsets of differentially expressed and co-expressed gene expression profiles for their further application in various disease diagnosis systems and in gene regulatory network reconstruction.

  2. Broaden and update the literature review on convolutional neural networks or deep learning in engineering applications, e.g. "Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network" and "Vision-based concrete crack detection using a hybrid framework considering noise effect".

The application of both convolutional neural networks and deep learning techniques in various fields of scientific research is discussed in [21,22]. In [21], the authors considered the application of a 2D convolutional neural network to the torsional capacity evaluation of reinforced concrete beams. A CNN-based model for diagnosing surface cracks in concrete structures was considered in [22]. In these studies, the authors demonstrated the performance of CNNs in solving complex problems.

  3. In general, the performance of a CNN model depends heavily on the hyperparameter settings. How did the authors set the network hyperparameters in this research to achieve the optimal prediction performance?

Thanks for the question. This research continues our previous work [30], in which we considered various types of CNN with various combinations of hyperparameters for classifying objects whose attributes are gene expression data. In that work, we also assessed the stability of the CNN to the noise component. The results showed better performance for the 1D two-layer CNN with 32 filters and a kernel size of 8. The max-pooling size in this case was 2. We used this CNN structure in the current research. The filter sizes were, of course, adapted to the length of the input vector of gene expressions; these sizes are presented in the experimental part. In response to this remark, we have added the following information to the manuscript.

The 1D two-layer convolutional neural network was used as the classifier within the framework of our research. This choice is based on our previous results, presented in [30]. In that research, we studied various CNN topologies with various combinations of hyperparameters to classify samples whose attributes are gene expressions. The results showed better performance for the 1D two-layer CNN with 32 filters and a kernel size of 8. The max-pooling size in this case was 2. We used this CNN structure when implementing the classification procedure. The filter size was adapted to the length of the gene expression vector during the simulation.
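The effect of adapting the filter size to the length of the gene expression vector can be illustrated with simple shape arithmetic. The following is only a sketch: the response states the number of layers, the 32 filters, the kernel size of 8 and the pooling size of 2, whereas the unpadded ("valid") convolution, the stride of 1, pooling after each convolutional layer, and the function names are assumptions made here for illustration.

```python
def conv1d_out_len(n, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1D convolution (assumed padding mode)."""
    if n < kernel_size:
        raise ValueError("input is shorter than the kernel")
    return (n - kernel_size) // stride + 1


def maxpool1d_out_len(n, pool_size):
    """Output length of non-overlapping max pooling (stride equals pool size)."""
    return n // pool_size


def two_layer_cnn_shapes(input_len, filters=32, kernel_size=8, pool_size=2):
    """Trace (name, length, channels) through two conv + pool stages.

    The layer ordering is an assumption; the source gives only the
    hyperparameter values, not the full topology.
    """
    shapes = [("input", input_len, 1)]
    n = input_len
    for layer in (1, 2):
        n = conv1d_out_len(n, kernel_size)
        shapes.append((f"conv{layer}", n, filters))
        n = maxpool1d_out_len(n, pool_size)
        shapes.append((f"pool{layer}", n, filters))
    return shapes


if __name__ == "__main__":
    # 200 is a hypothetical vector length, not a value from the manuscript.
    for name, length, channels in two_layer_cnn_shapes(200):
        print(f"{name:6s} length={length:4d} channels={channels}")
```

Under these assumptions, a vector shorter than the kernel raises an error at the first layer, which is one reason the kernel size has to track the input length.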

  4. Please give the evaluation metrics of the proposed method in detail.

Thanks for the remarks. We have added detailed information about the applied hybrid proximity metric of gene expression profiles.

The hybrid proximity metric combines the mutual information of gene expression profiles (GEPs), evaluated using various methods of Shannon entropy calculation, with Pearson's chi-squared test. As shown in [2], object classification results differ depending on which GEP proximity metric is applied when forming the mutually correlated gene expression data. To increase the objectivity of evaluating the distance between GEPs, the authors proposed a stepwise procedure for forming the proximity metric. In the first step, the mutual information (based on various methods of Shannon entropy calculation) and the Pearson's chi-squared criterion values are evaluated for the given pair of GEPs. Then, the Harrington desirability function is applied to form the general proximity metric. A larger value of the general proximity metric, in this case, corresponds to a greater proximity of this pair of GEPs.
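The second step of this procedure can be sketched as follows. Harrington's desirability function maps a dimensionless value Y to exp(-exp(-Y)), and the general desirability is commonly taken as the geometric mean of the individual ones. The bounds of the Y scale, the linear rescaling of each criterion, the assumption that a larger criterion value means greater proximity, and all names below are illustrative assumptions, not details confirmed by the response.

```python
import math

# Assumed bounds of the dimensionless Harrington scale (not given in the source).
Y_MIN, Y_MAX = -2.0, 5.0


def private_desirability(x, x_min, x_max):
    """Map one criterion value to (0, 1) with Harrington's function."""
    x = min(max(x, x_min), x_max)  # clamp to the assumed criterion range
    y = Y_MIN + (Y_MAX - Y_MIN) * (x - x_min) / (x_max - x_min)
    return math.exp(-math.exp(-y))


def general_proximity(criteria, bounds):
    """Geometric mean of the private desirabilities of all criteria.

    `criteria` holds, for one pair of profiles, values such as mutual
    information estimates and a chi-squared statistic; `bounds` holds a
    hypothetical (min, max) range for each of them.
    """
    ds = [private_desirability(x, lo, hi) for x, (lo, hi) in zip(criteria, bounds)]
    return math.exp(sum(math.log(d) for d in ds) / len(ds))
```

Because each private desirability increases monotonically with its criterion, a pair of profiles that scores higher on every criterion also receives a higher general proximity, which matches the interpretation given in the paragraph above.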

  5. A comparative study against other similar methods should be included.

Thanks for the remark. We have added a comparative analysis at the end of the Discussion subsection.

A comparison with other research in this subject area [1] allows us to draw conclusions about the performance of the proposed technique. In most cases presented in the review [1], high classification accuracy is achieved when using a small number of genes. The proposed method allows us to form subsets of differentially expressed and co-expressed gene expression profiles that contribute to high classification accuracy of the investigated samples. Moreover, the allocated subsets of GEPs can be used at the next step in a hybrid model for diagnosing diseases such as various types of cancer, Alzheimer's disease, Parkinson's disease, etc., so that a more objective decision about the patient's state can be made using the classification results obtained on various subsets of the allocated genes.

  6. More future research should be included in the conclusion.

Thanks for the remark. We have added this information.

Further perspectives of the authors' research are the application of the proposed technique within a hybrid model for diagnosing various diseases and in gene regulatory network reconstruction based on the allocated subsets of differentially expressed and co-expressed genes.

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript is devoted to the problem of forming subsets of differentially expressed and co-expressed gene expression profiles, which can be used at the next stage both for creating various disease diagnosis systems and for gene regulatory network reconstruction based on the allocated genes. The topic is interesting and topical. The manuscript is well structured; it contains all the necessary sections for this type of publication. However, in my opinion, there are some shortcomings which should be corrected before the manuscript is accepted. Below, I present my remarks.

1. The manuscript would be improved if the unsolved part of the general problem were identified at the end of the Literature Review section.

2. Please correct the indices in formulas 4-6 (max_expr).

3. In my opinion, it would be better to give a brief description of the hybrid metric of mutual information used in the authors' research.

4. At the end of Section 3, please add information about the CNN used. This information is missing.

The research results are presented correctly, and they are interesting. To my mind, the manuscript can be accepted after minor revision.

 

Author Response

The manuscript is devoted to the problem of forming subsets of differentially expressed and co-expressed gene expression profiles, which can be used at the next stage both for creating various disease diagnosis systems and for gene regulatory network reconstruction based on the allocated genes. The topic is interesting and topical. The manuscript is well structured; it contains all the necessary sections for this type of publication. However, in my opinion, there are some shortcomings which should be corrected before the manuscript is accepted. Below, I present my remarks.

First, we would like to thank the reviewer for both the positive evaluation and the remarks, which undoubtedly help to improve the manuscript. In the revised version of the manuscript, we have tried to address all remarks as fully as possible.

  1. The manuscript would be improved if the unsolved part of the general problem were identified at the end of the Literature Review section.

Thanks for the remark. We have added this information.

The unsolved part of the general problem is the absence of objective techniques for allocating large numbers of differentially expressed and co-expressed gene expression profiles that contribute to high classification accuracy of the studied objects.

  2. Please correct the indices in formulas 4-6 (max_expr).

Thanks for the remark. We have corrected the indices in these formulas.

  3. In my opinion, it would be better to give a brief description of the hybrid metric of mutual information used in the authors' research.

Thanks for the remarks. We have added information about the applied hybrid proximity metric of gene expression profiles.

The hybrid proximity metric combines the mutual information of gene expression profiles (GEPs), evaluated using various methods of Shannon entropy calculation, with Pearson's chi-squared test. As shown in [2], object classification results differ depending on which GEP proximity metric is applied when forming the mutually correlated gene expression data. To increase the objectivity of evaluating the distance between GEPs, the authors proposed a stepwise procedure for forming the proximity metric. In the first step, the mutual information (based on various methods of Shannon entropy calculation) and the Pearson's chi-squared criterion values are evaluated for the given pair of GEPs. Then, the Harrington desirability function is applied to form the general proximity metric. A larger value of the general proximity metric, in this case, corresponds to a greater proximity of this pair of GEPs.
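The first step, evaluating mutual information and the chi-squared criterion for a pair of profiles, might look like the sketch below. It uses equal-width binning and the plug-in (maximum-likelihood) Shannon entropy estimator, which is only one possible method; the response does not say which estimators or which discretization the authors used, so the binning scheme, the number of bins and the function names are assumptions.

```python
import math
from collections import Counter


def discretize(profile, bins=3):
    """Equal-width binning of a numeric profile into integer labels (assumed scheme)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in profile]


def entropy(counts, n):
    """Plug-in Shannon entropy in bits from category counts."""
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two equally long label sequences."""
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    return h_a + h_b - h_ab


def chi_squared(a, b):
    """Pearson's chi-squared statistic of the contingency table of a and b."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    stat = 0.0
    for i in ca:
        for j in cb:
            expected = ca[i] * cb[j] / n
            stat += (cab[(i, j)] - expected) ** 2 / expected
    return stat
```

Both quantities are zero for independent label sequences and grow with the strength of the association, so under these assumptions they can be rescaled and combined in the same direction.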

 

 

  4. At the end of Section 3, please add information about the CNN used. This information is missing.

Thanks for the remark. This research continues our previous work [30], in which we considered various types of CNN with various combinations of hyperparameters for classifying objects whose attributes are gene expression data. In that work, we also assessed the stability of the CNN to the noise component. The results showed better performance for the 1D two-layer CNN with 32 filters and a kernel size of 8. The max-pooling size in this case was 2. We used this CNN structure in the current research. The filter sizes were, of course, adapted to the length of the input vector of gene expressions; these sizes are presented in the experimental part. In response to this remark, we have added the following information to the manuscript.

The 1D two-layer convolutional neural network was used as the classifier within the framework of our research. This choice is based on our previous results, presented in [30]. In that research, we studied various CNN topologies with various combinations of hyperparameters to classify samples whose attributes are gene expressions. The results showed better performance for the 1D two-layer CNN with 32 filters and a kernel size of 8. The max-pooling size in this case was 2. We used this CNN structure when implementing the classification procedure. The filter size was adapted to the length of the gene expression vector during the simulation.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

All the technical issues have been well addressed by the authors. I do not have further comments.
