Article
Peer-Review Record

Computer Aided Classifier of Colorectal Cancer on Histopathological Whole Slide Images Analyzing Deep Learning Architecture Parameters

Appl. Sci. 2023, 13(7), 4594; https://doi.org/10.3390/app13074594
by Elena Martínez-Fernandez 1, Ignacio Rojas-Valenzuela 1, Olga Valenzuela 2,* and Ignacio Rojas 1
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 17 February 2023 / Revised: 17 March 2023 / Accepted: 29 March 2023 / Published: 5 April 2023

Round 1

Reviewer 1 Report

In this article, a study was conducted to detect colorectal cancer in histopathological images. For this purpose, classification was carried out by making various improvements to the VGG19 model. The innovative aspect of the study is limited. With the improvements and best parameters applied to the VGG19 model, a classification accuracy of 96.4% was achieved.

Strengths

-> The proposed method was compared with other studies and very successful results were obtained (in Table 5).

-> The findings are effectively supported by graphs, tables, and diagrams and are easy to understand.

 

Some improvements should be made to the article.

-> The superiority of the proposed method over other methods should be clearly stated.

-> The abstract must be made stronger. The abstract should be reviewed again.

-> The Conclusions should be reviewed again. The original aspect of the study and its difference from other studies should be clearly explained. (The conclusion should be explored better and it needs to contemplate the eventual restrictions of the developed technique to address future works in this area.)

-> The proposed method in Table 5 is compared with other methods. However, detailed information about the table should be presented. It should be explained why better results are obtained than with other methods.

-> Table 5 should be placed in the experimental results section instead of the results section.

Author Response

We thank Reviewer 1 for the constructive comments, which have undoubtedly allowed us to improve the paper. We have addressed all of them and modified the paper accordingly. Our detailed answers follow:

 

REVIEWER 1

 

Comments and Suggestions for Authors

In this article, a study was conducted to detect colorectal cancer in histopathological images. For this purpose, classification was carried out by making various improvements to the VGG19 model. The innovative aspect of the study is limited. With the improvements and best parameters applied to the VGG19 model, a classification accuracy of 96.4% was achieved.

Strengths

-> The proposed method was compared with other studies and very successful results were obtained (in Table 5).

-> The findings are effectively supported by graphs, tables, and diagrams and are easy to understand.

 Some improvements should be made to the article.

Reviewer 1. P1: The superiority of the proposed method over other methods should be clearly stated.

Reply: Thank you very much for your comments. We agree with the reviewer on this matter. It is indeed very important to state the characteristics and advantages of the proposed method and to compare it with other methods and papers in the bibliography. Throughout the manuscript, we have highlighted the importance of a detailed analysis of certain hyperparameters of a deep learning system (the learning rate is especially relevant), since they have a great impact on the accuracy of the system. Finally, a new section has been added to analyze in more detail the results obtained with our method and with other methods presented in the bibliography. The new section is “5.6. Comparison with other methodologies”.

Reviewer 1. P2: The abstract must be made stronger. The abstract should be reviewed again.

Reply: Thank you very much for your suggestion. We fully agree and for this reason a new abstract has been written, which highlights in more detail the proposed methodology and the results of the work carried out. The new abstract is:

“Diagnosis of different pathologies and stages of cancer using whole histopathology slide images (WSI) is the gold standard for determining the degree of tissue metastasis. The use of deep learning systems in the field of medical images, especially histopathology images, is becoming increasingly important.

Training and optimization of deep neural network models involve fine-tuning parameters and hyperparameters such as the learning rate, batch size (BS), and boost to improve the performance of the model in task-specific applications. Tuning hyperparameters is a major challenge in designing deep neural network models, as they have a large impact on performance.

This paper analyzes how the parameters and hyperparameters of a deep learning architecture affect the classification of colorectal cancer (CRC) histopathology images, using the well-known VGG19 model. The paper also discusses the pre-processing of these images, such as the use of color normalization and stretching transformations on the data set. Among these hyperparameters, the most important is the learning rate (LR). Different strategies for the optimization of the LR are analyzed (both static and dynamic), and a new experiment is proposed in which the LR is varied after each layer of the neural network, combined with decreasing variations across epochs (highlighting the relevance of dynamic strategies over a fixed LR).

The results obtained are very remarkable: in simulation, the system achieves 96.4% accuracy on test images (for nine different tissue classes) using the triangular cyclic learning rate.”
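For context, the triangular cyclic learning rate mentioned in the abstract follows the cyclical schedule popularized by Smith's CLR method. A minimal sketch in plain Python; the base_lr, max_lr, and step_size values are illustrative assumptions, not the settings reported in the paper:

```python
import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=500):
    """Triangular cyclical learning rate: the LR climbs linearly from
    base_lr to max_lr over step_size iterations, then descends back,
    repeating this triangle wave every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# The LR peaks mid-cycle and returns to base_lr at cycle boundaries:
# triangular_clr(0) -> 1e-4, triangular_clr(500) -> 1e-2, triangular_clr(1000) -> 1e-4
```

Periodically raising the LR lets the optimizer escape sharp minima and plateaus, which is one motivation the cyclical-LR literature gives for schedules of this shape.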

Reviewer 1. P3: The Conclusions should be reviewed again. The original aspect of the study and its difference from other studies should be clearly explained. (The conclusion should be explored better and it needs to contemplate the eventual restrictions of the developed technique to address future works in this area.)

Reply: Thank you very much again for your suggestion. In fact, both the abstract and the conclusions have been rewritten and extensively improved, taking your comments into account. The new Conclusions are:

“This paper systematically analyzes various parameters, hyperparameters, and methods for training and optimizing deep learning systems for multiclass classification. The dataset used for training and testing includes various healthy tissues and colorectal cancer. It is important to note that gradient descent is widely used in large-scale optimization problems in machine learning; in particular, it plays an important role in computing and tuning the connection weights of deep learning models. Gradient-based optimization methods have hyperparameters that admit a virtually infinite number of configurations. Determining the values and optimization methodology of the hyperparameters of a deep learning system is currently a challenge that is important for the behavior and precision of the system. Moreover, these hyperparameters also affect the computation time and cost. In this paper, the performance of a deep learning model based on the well-known VGG19 structure was evaluated, using three different methods for its training: learning from scratch (i.e., all parameters composing the different levels of the neural network, including the CNN levels, are tuned during the learning phase), transfer learning (using a VGG19 system previously optimized on other classification problems, all parameters are optimized/tuned with the images of the new problem), and transfer learning associated with frozen layers (only a subset of the parameters belonging to the last layers is optimized).

We analyzed (using different error metrics) how these strategies affect both the time necessary to train the neural system and its accuracy. The system that requires the most time is learning from scratch. The system that learns the fastest (less than half the time of learning from scratch) is transfer learning + frozen layers (with a total of 50% of the original layers frozen, i.e., not modified). In terms of accuracy, transfer learning produced the best results. Therefore, this strategy was used for the following analysis in this article: the effect of the learning rate.

The learning rate is one of the most important hyperparameters in a neural network and, of course, in deep learning models.

In this article, various strategies for LR optimization, both static and dynamic, have been analyzed in depth. Different dynamic cyclic functions (triangular, triangular drop, exponential, sinusoidal) were used to test the performance of our model for the classification of histopathological images. Four different frameworks for LR decay were also used: Decay 1 (decay after a complete epoch), step decay, polynomial decay, and piecewise constant decay. The different LR scheduling methods have no significant impact on the computation time required, but from the point of view of accuracy, the best method is polynomial decay.
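The four LR decay frameworks listed above can be sketched in plain Python; the initial rates, drop factors, and epoch counts below are illustrative assumptions rather than the values used in the experiments:

```python
def epoch_decay(epoch, initial_lr=1e-2, decay=0.05):
    """Decay 1: shrink the LR after each complete epoch."""
    return initial_lr / (1 + decay * epoch)

def step_decay(epoch, initial_lr=1e-2, drop=0.5, epochs_per_drop=10):
    """Step decay: multiply the LR by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def polynomial_decay(epoch, initial_lr=1e-2, final_lr=1e-4,
                     total_epochs=100, power=2.0):
    """Polynomial decay: slide from initial_lr to final_lr along a
    polynomial curve of the given power."""
    frac = min(epoch, total_epochs) / total_epochs
    return (initial_lr - final_lr) * (1 - frac) ** power + final_lr

def piecewise_constant_decay(epoch, boundaries=(30, 60),
                             values=(1e-2, 1e-3, 1e-4)):
    """Piecewise constant decay: hold the LR fixed inside each interval
    delimited by `boundaries`, dropping to the next value at each one."""
    for boundary, value in zip(boundaries, values):
        if epoch < boundary:
            return value
    return values[-1]
```

All four take the current epoch and return the LR for that epoch, which is the interface a typical scheduler callback expects.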

Finally, discriminative fine-tuning is also analyzed as a novel technique proposed in this paper in conjunction with dynamic LR strategies. Discriminative fine-tuning allows tuning the layers of the deep learning model with different learning rates.
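A common way to realize discriminative fine-tuning is to give the output layer the largest LR and divide it by a constant factor for each earlier layer. A minimal sketch under that assumption; top_lr and factor are illustrative, not the values used in the paper:

```python
def discriminative_lrs(n_layers, top_lr=1e-3, factor=2.6):
    """One learning rate per layer: the top (task-specific) layer trains
    at top_lr, and each earlier layer at top_lr / factor**depth, so the
    generic low-level features change more slowly than the classifier head."""
    return [top_lr / factor ** (n_layers - 1 - i) for i in range(n_layers)]
```

The returned list is ordered from the input-most to the output-most layer, so an optimizer can assign each parameter group its own rate.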

The results obtained are very remarkable: in simulation, the system achieves an accuracy of 96.4% and an AUC close to 1 on test images (for nine different tissue classes), using the triangular cyclic learning rate.

As future work, the LR strategy and deep learning model can be trained and tested on other cancer datasets for classification, taking into account the possible adaptations and restrictions of each new problem.”

Reviewer 1. P4: The proposed method in Table 5 is compared with other methods. However, detailed information about the table should be presented. It should be explained why better results are obtained than with other methods. Table 5 should be placed in the experimental results section instead of the results section.

Reply: Thank you very much for your comment. Your comment is very interesting, because detailed information about the different methods compared in Table 5 is helpful for the potential reader. For this reason, a new section has been created: "5.6. Comparison with other methods" (as mentioned in the reply to P1), which contains the following new information:

“Table 5 summarizes the relevant methods in the bibliography that use deep learning models for multiclass classification. Table 5 includes strategies for dynamic modification of the learning rate, such as cyclical learning rates (Smith et al. [10]), polynomial learning rates (Purnendu et al. [7]), and dynamic learning rates (Anil et al. [9]). There are also static learning rate methods, such as that of Anil et al. [8], and methodologies based on transfer learning (such as that of Alinsaif et al. [14]). We believe that the proposed methodology achieves excellent accuracy on test images thanks to a detailed and comprehensive study of the optimal strategy for the learning rate.”

As we completely agree that it is necessary to detail information about the different methods compared and the proposed methodology, Section “2. Related work” now includes a more detailed analysis of some of the methods presented in Table 5 (the most prominent and most cited in the bibliography). The new information in Section 2 is:

“Kather et al. [11] presented a new dataset of 5,000 histological images of human colorectal cancer that includes eight different tissue types. Ten anonymized H&E-stained CRC tissue slides were obtained from the pathology archive at the University Medical Center Mannheim (Heidelberg University, Mannheim, Germany). Contiguous tissue areas were manually labelled and tessellated, resulting in 625 non-overlapping tissue tiles of size 150 × 150 pixels per type. The following eight tissue types were selected for analysis: tumour epithelium, simple stroma, complex stroma, immune cells, debris, normal mucosal glands, adipose tissue, and background (the resulting 625 × 8 = 5,000 images together formed the training and testing set for the classification problem). The authors used four classification strategies (1-nearest neighbour, linear SVM, radial-basis function SVM, and decision trees) and found that the radial-basis function (RBF) support vector machine (SVM) performed best (87.4% accuracy for multiclass tissue separation).

Ciompi et al. [12] proposed a CRC tissue classification system based on convolutional networks (ConvNets). They used data from two different sources: a cohort of whole-slide images of rectal cancer samples (a set of 74 histological slides from 74 patients), and a dataset of colorectal cancer images and patches (from 10 patients, using 5,000 patches of 150 × 150 pixels, the dataset of [11]). They investigated the importance of staining normalisation (applying staining normalisation to training and test data removes most sources of variability due to staining) in classifying CRC tissues in H&E-stained images and achieved 79.7% accuracy.

Bianconi et al. [13] used several data sets for the experimental analysis of a novel method called IOCLBP, which is based on a simple-to-implement yet highly discriminative local descriptor for color images. The authors demonstrated the superiority of IOCLBP, alone and/or in combination with LCC, over related methods (LBP variants) for the classification of binary and multiclass problems. One such problem is the so-called Epistroma dataset: histological images of colorectal cancer from 643 patients admitted to Helsinki University Central Hospital, Finland, between 1989 and 1998. The tissue samples were stained with diaminobenzidine and hematoxylin and divided into two classes: epithelium (825 samples) and stroma (551 samples). The size of the images varied from 172 × 172 pixels to 2372 × 2372 pixels. In this binary colorectal cancer classification problem, the accuracy achieved was 93.4%.
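IOCLBP builds on the classic local binary pattern (LBP) descriptor, extending its center-versus-neighbour comparisons to intra- and inter-channel comparisons on color images. A sketch of the underlying grayscale 3 × 3 LBP code (the IOCLBP colour extension itself is not reproduced here):

```python
def lbp_code(patch):
    """Classic 3x3 local binary pattern: compare each of the 8 neighbours
    of the centre pixel with the centre (clockwise from the top-left) and
    pack the results of the >= comparisons into a single byte."""
    center = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]  # clockwise neighbours
    code = 0
    for bit, (row, col) in enumerate(order):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code

# A texture descriptor is then the histogram of these codes over an image.
```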

Alinsaif et al. [14] used different deep learning models (SqueezeNet, MobileNet, ResNet, and DenseNet) to present two different approaches: 1) generating features from pre-trained models (i.e., without fine-tuning); 2) fine-tuning the CNN from pre-trained models. The second approach was effective and provided better classification results. When training an SVM on deep features, the authors applied ILFS to obtain a reduced subspace of features while achieving high accuracy. The authors used different problems or datasets to analyze the results, including the data from Kather et al. [11] and the Epistroma data (previously discussed for Bianconi et al. [13]). Based on SVM classification using the best-scoring deep features from different pre-trained models, the best result obtained for the Kather problem, with 1,000 features, was an accuracy of 95.4% and an AUC of 0.906. For the Epistroma problem (simpler, since it is two-class), an accuracy of 99.06% and an AUC of 0.997 were obtained with 250 features, using DenseNet in both datasets.

However, none of these studies worked with a dataset of nine different CRC tissue classes (presented in [15] and used in this paper).”

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Overall, the paper is interesting in scope and highlights how the pathological basis of disease links to clinical treatment; however, I am highly concerned about the overall drafting of the paper. The abstract and introduction describe the role of VGG19 and how it connects to CRC histopathology, and describe an accuracy of 96.4% using a known library. There are also multiple typos throughout, confusing CCR and CRC, and a conclusion that relates to lymphoma classification rather than CRC diagnosis and classification. I would recommend a thorough re-write of this paper involving either a gastroenterologist, a colorectal surgeon, or a pathologist, as well as a formal review of the English language in the text, rewriting it in a scientific journal format in more formal language. Otherwise, I do see that this paper could add to the existing body of available literature.

Author Response

We thank Reviewer 2 for their constructive comments. We have addressed all of them and modified the paper accordingly. Our detailed answers follow:

 

REVIEWER 2

 

Reviewer 2. P1: Overall, the paper is interesting in scope and highlights how the pathological basis of disease links to clinical treatment; however, I am highly concerned about the overall drafting of the paper. The abstract and introduction describe the role of VGG19 and how it connects to CRC histopathology, and describe an accuracy of 96.4% using a known library.

Reply: Thank you very much for your suggestion. We fully agree and for this reason a new abstract has been written, which highlights in more detail the proposed methodology and the results of the work carried out. The new abstract is:

“Diagnosis of different pathologies and stages of cancer using whole histopathology slide images (WSI) is the gold standard for determining the degree of tissue metastasis. The use of deep learning systems in the field of medical images, especially histopathology images, is becoming increasingly important.

Training and optimization of deep neural network models involve fine-tuning parameters and hyperparameters such as the learning rate, batch size (BS), and boost to improve the performance of the model in task-specific applications. Tuning hyperparameters is a major challenge in designing deep neural network models, as they have a large impact on performance.

This paper analyzes how the parameters and hyperparameters of a deep learning architecture affect the classification of colorectal cancer (CRC) histopathology images, using the well-known VGG19 model. The paper also discusses the pre-processing of these images, such as the use of color normalization and stretching transformations on the data set. Among these hyperparameters, the most important is the learning rate (LR). Different strategies for the optimization of the LR are analyzed (both static and dynamic), and a new experiment is proposed in which the LR is varied after each layer of the neural network, combined with decreasing variations across epochs (highlighting the relevance of dynamic strategies over a fixed LR).

The results obtained are very remarkable: in simulation, the system achieves 96.4% accuracy on test images (for nine different tissue classes) using the triangular cyclic learning rate.”

 

The sections “1. Introduction” and “2. Related work” have been modified and improved. In particular, more information on the compared methods has been included in Section 2.

 

Reviewer 2. P2: There are also multiple typos throughout, confusing CCR and CRC, and a conclusion that relates to lymphoma classification rather than CRC diagnosis and classification. I would recommend a thorough re-write of this paper involving either a gastroenterologist, a colorectal surgeon, or a pathologist, as well as a formal review of the English language in the text, rewriting it in a scientific journal format in more formal language.

Reply: Thank you very much for your comments, and sorry for the mistake (confusing CCR and CRC). We have carried out a thorough review of the entire manuscript, corrected the typos found, and tried to improve all sections. We have also rewritten the conclusions of the work, as we agree they should be improved. The new conclusions are:

“This paper systematically analyzes various parameters, hyperparameters, and methods for training and optimizing deep learning systems for multiclass classification. The dataset used for training and testing includes various healthy tissues and colorectal cancer. It is important to note that gradient descent is widely used in large-scale optimization problems in machine learning; in particular, it plays an important role in computing and tuning the connection weights of deep learning models. Gradient-based optimization methods have hyperparameters that admit a virtually infinite number of configurations. Determining the values and optimization methodology of the hyperparameters of a deep learning system is currently a challenge that is important for the behavior and precision of the system. Moreover, these hyperparameters also affect the computation time and cost. In this paper, the performance of a deep learning model based on the well-known VGG19 structure was evaluated, using three different methods for its training: learning from scratch (i.e., all parameters composing the different levels of the neural network, including the CNN levels, are tuned during the learning phase), transfer learning (using a VGG19 system previously optimized on other classification problems, all parameters are optimized/tuned with the images of the new problem), and transfer learning associated with frozen layers (only a subset of the parameters belonging to the last layers is optimized).

We analyzed (using different error metrics) how these strategies affect both the time necessary to train the neural system and its accuracy. The system that requires the most time is learning from scratch. The system that learns the fastest (less than half the time of learning from scratch) is transfer learning + frozen layers (with a total of 50% of the original layers frozen, i.e., not modified). In terms of accuracy, transfer learning produced the best results. Therefore, this strategy was used for the following analysis in this article: the effect of the learning rate.

The learning rate is one of the most important hyperparameters in a neural network and, of course, in deep learning models.

In this article, various strategies for LR optimization, both static and dynamic, have been analyzed in depth. Different dynamic cyclic functions (triangular, triangular drop, exponential, sinusoidal) were used to test the performance of our model for the classification of histopathological images. Four different frameworks for LR decay were also used: Decay 1 (decay after a complete epoch), step decay, polynomial decay, and piecewise constant decay. The different LR scheduling methods have no significant impact on the computation time required, but from the point of view of accuracy, the best method is polynomial decay.

Finally, discriminative fine-tuning is also analyzed as a novel technique proposed in this paper in conjunction with dynamic LR strategies. Discriminative fine-tuning allows tuning the layers of the deep learning model with different learning rates.

The results obtained are very remarkable: in simulation, the system achieves an accuracy of 96.4% and an AUC close to 1 on test images (for nine different tissue classes), using the triangular cyclic learning rate.

As future work, the LR strategy and deep learning model can be trained and tested on other cancer datasets for classification, taking into account the possible adaptations and restrictions of each new problem.”

Reviewer 2. P3: Otherwise, I do see that this paper could add to the existing body of available literature.

Reply: Thank you very much for your comments. We have tried to greatly improve the new version of the manuscript, which should benefit future readers' understanding of the paper.

Author Response File: Author Response.pdf

Reviewer 3 Report

The main comments about the work are: 

- the abstract is quite short and could benefit from more information         

- Figure 1 could be described in the text. Figures 1-3 are quite unnecessary

- An inclusion of the number of patients/instances in some way, evaluation methods, and models for the related work would be necessary when describing related work

- Text from lines 52-68 does not belong in the related work section in its current form, it is much more suited for the introduction section. Furthermore, the description of the related work, and what the proposed paper provides as a novelty is poorly written. In addition, Table 5 has no place in the conclusion, it could possibly be placed in the related work section or discussion, but either way, all papers mentioned in the table should be adequately described in the related work section.

- the database should be properly cited, please look at the instructions for citing that are available at the link you provided

- line 96, in what way are the classes balanced? And how was this done for the training and testing set?

- line 97 “figure” should start with a capital letter

- line 106, a “standard desktop workstation” is quite vague, if you want to be precise, you can give the CPU and RAM characteristics and that should be enough

- line 116 “Along” starts with a capital letter

- lines 118-124, the description and giving context for the VGG19 are fine, but there are many details that are redundant in this paragraph, for example, the description of the organization of images, while there is also a bold claim that the results obtained using VGG19 are generally optimal. Your proposed research focuses on discussing the influence of hyperparameters, and you should understand that different CNN architectures could also provide better results than the VGG19. Instead of stating that VGG19 is generally optimal, you could cite several papers where the VGG19 architecture was used and state in which research fields its implementation gave good results.

- The “Machine Learning Models” section is unnecessary, especially in its current form, as you only use one model, VGG19. You could create a single subsection that would include the description of the model, parameters, and hyperparameters.

- The fixed number of epochs without a validation set, the use of the SGD optimizer (without trying any others, especially Adam, which is quite common), and the other (hyper)parameters that are fixed in this research raise serious doubts about the method in this paper. The entire section is difficult to follow, and it seems that only the learning rate and the schedulers are analyzed in this paper, which is fine, but the influence of the other parameters and the arbitrary fixed number of epochs (instead of using early stopping) make the contribution of this paper questionable.

Author Response

We thank Reviewer 3 for the constructive comments, which have allowed us to substantially improve the manuscript. We have addressed all of them and amended the document accordingly. Our detailed answers follow:

REVIEWER 3:

Comments and Suggestions for Authors

The main comments about the work are: 

Reviewer 3. P1: the abstract is quite short and could benefit from more information         

Reply: Thank you very much for your suggestion. We fully agree that the abstract in the first version was short and could have used more information. For this reason, a new abstract has been written, in which the proposed methodology and the results of the work carried out are presented in more detail. The new abstract is:

“Diagnosis of different pathologies and stages of cancer using whole histopathology slide images (WSI) is the gold standard for determining the degree of tissue metastasis. The use of deep learning systems in the field of medical images, especially histopathology images, is becoming increasingly important.

Training and optimization of deep neural network models involve fine-tuning parameters and hyperparameters such as the learning rate, batch size (BS), and boost to improve the performance of the model in task-specific applications. Tuning hyperparameters is a major challenge in designing deep neural network models, as they have a large impact on performance.

This paper analyzes how the parameters and hyperparameters of a deep learning architecture affect the classification of colorectal cancer (CRC) histopathology images, using the well-known VGG19 model. The paper also discusses the pre-processing of these images, such as the use of color normalization and stretching transformations on the data set. Among these hyperparameters, the most important is the learning rate (LR). Different strategies for the optimization of the LR are analyzed (both static and dynamic), and a new experiment is proposed in which the LR is varied after each layer of the neural network, combined with decreasing variations across epochs (highlighting the relevance of dynamic strategies over a fixed LR).

The results obtained are very remarkable: in simulation, the system achieves 96.4% accuracy on test images (for nine different tissue classes) using the triangular cyclic learning rate.”

Reviewer 3. P2: Figure 1 could be described in the text. Figures 1-3 are quite unnecessary.

Reply: Thank you very much for your comments. Section “1. Introduction” has been modified. Figure 1 is now described in more detail in the text:

“Figure 1 shows the evolution of the number of papers indexed on the Web of Science platform (previously known as Web of Knowledge, a platform that provides access to several databases of scientific journals and conference proceedings) in the field of colorectal cancer and deep learning models.

Figure 2 shows an analysis of publications (2014-2022) by research area (a paper may be attributed to more than one research area). However, as can be seen from [3], there are far fewer studies that address the impact of changing hyperparameters in Deep Learning, although adjusting them can drastically change the results obtained.”

Regarding Figure 3, it shows a block diagram of the WSI classification pipeline for pathology with the VGG19 deep learning model for the multiclass problem addressed in this work, and it may help the future reader. The text commenting on that figure is:

“Figure 3 shows a typical classification model pipeline, presenting common techniques such as data acquisition/split, image preprocessing, whole slide image (WSI) tiling, and evaluation with a test set for the multiclass problem presented in this paper.”
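WSI tiling, one of the pipeline steps Figure 3 depicts, amounts to cutting the slide into fixed-size, non-overlapping patches. A minimal sketch of the tile-coordinate computation; the 150-pixel tile size matches the patch size mentioned for the Kather dataset, and discarding partial border tiles is an assumption:

```python
def tile_grid(height, width, tile=150):
    """Top-left coordinates of non-overlapping tile x tile windows that
    fit entirely inside an image of size height x width; partial tiles
    at the right/bottom borders are discarded."""
    return [(row, col)
            for row in range(0, height - tile + 1, tile)
            for col in range(0, width - tile + 1, tile)]

# e.g. a 450 x 300 region yields a 3 x 2 grid of 150-pixel tiles
```

Each coordinate pair can then be used to crop a patch, which is what the classifier actually consumes instead of the full gigapixel slide.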

Reviewer 3. P3: An inclusion of the number of patients/instances in some way, evaluation methods, and models for the related work would be necessary when describing related work.

Reply: We appreciate this comment; the inclusion of more information on the number of patients/instances, the evaluation methods, and the models used in the related work is indeed important for reading the manuscript. For this reason, we have made profound changes and significantly improved Section “2. Related work”, which now reads:

“Kather et al. [11] presented a new dataset of 5,000 histological images of human colorectal cancer that includes eight different tissue types. Ten anonymized H&E-stained CRC tissue slides were obtained from the pathology archive at the University Medical Center Mannheim (Heidelberg University, Mannheim, Germany). Contiguous tissue areas were manually labelled and tessellated, resulting in 625 non-overlapping tissue tiles of size 150 × 150 pixels per type. The following eight tissue types were selected for analysis: tumour epithelium, simple stroma, complex stroma, immune cells, debris, normal mucosal glands, adipose tissue, and background (the resulting 625 × 8 = 5,000 images together formed the training and testing set for the classification problem). The authors used four classification strategies (1-nearest neighbour, linear SVM, radial-basis function SVM, and decision trees) and found that the radial-basis function (RBF) support vector machine (SVM) performed best (87.4% accuracy for multiclass tissue separation).

Ciompi et al. [12] proposed a CRC tissue classification system based on convolutional networks (ConvNets). They used data from two different sources: a cohort of whole-slide images of rectal cancer samples (a set of 74 histological slides from 74 patients) and a dataset of colorectal cancer images and patches (from 10 patients, using 5,000 patches of 150 × 150 pixels; the dataset of [11]). They investigated the importance of staining normalisation in classifying CRC tissues in H&E-stained images (applying staining normalisation to both training and test data removes most sources of variability due to staining) and achieved 79.7% accuracy.

Bianconi et al. [13] used several datasets for an experimental analysis of a novel method called IOCLBP, which is based on a simple-to-implement yet highly discriminative local descriptor for colour images. The authors demonstrated the superiority of IOCLBP, alone and/or in combination with LCC, over related methods (LBP variants) for the classification of binary and multiclass problems. One such problem is the so-called Epistroma dataset: histological images of colorectal cancer from 643 patients admitted to Helsinki University Central Hospital, Finland, between 1989 and 1998. The tissue samples were stained with diaminobenzidine and hematoxylin and divided into two classes: epithelium (825 samples) and stroma (551 samples). The size of the images varied from 172 × 172 pixels to 2372 × 2372 pixels. In this binary colorectal cancer classification problem, the accuracy achieved was 93.4%.

Alinsaif et al. [14] used different deep learning models (SqueezeNet, MobileNet, ResNet, and DenseNet) in two different approaches: (1) generating features from pre-trained models (i.e., without fine-tuning); (2) fine-tuning the CNNs from pre-trained models. The second approach was effective and provided better classification results. When training an SVM on deep features, the authors applied ILFS to obtain a reduced feature subspace while achieving high accuracy. The authors used different problems or datasets to analyze the results, including the data of Kather et al. [11] and the Epistroma dataset (previously discussed for Bianconi et al. [13]). Based on SVM classification using the best-scoring deep features from different pre-trained models, the best result for the Kather problem, with 1,000 features, was an accuracy of 95.4% and an AUC of 0.906. For the Epistroma problem (simpler, since it is binary), an accuracy of 99.06% and an AUC of 0.997 were obtained with 250 features, using DenseNet in both datasets.

However, none of these studies worked with a dataset of 9 different CRC tissue classes (presented in [15] and used in this paper).”

Reviewer 3. P4: Text from lines 52-68 does not belong in the related work section in its current form, it is much more suited for the introduction section. Furthermore, the description of the related work, and what the proposed paper provides as a novelty is poorly written.

Reply: We appreciate this comment, and we fully agree that the text from lines 52-68 does not belong in the related work section in its current form and is better placed in the introduction. The introduction has therefore been modified. Regarding the comment on the related work, we also fully agree that it needed improvement, as discussed above (Reply to Reviewer 3, P3). We have substantially clarified the main objective of this paper in the manuscript, highlighting its novelty and the methodology used.

Reviewer 3. P5:  In addition, Table 5 has no place in the conclusion, it could possibly be placed in the related work section or discussion, but either way, all papers mentioned in the table should be adequately described in the related work section.

Reply: Thank you very much for your comments. We fully agree and believe that this proposal greatly improves the quality of the work. Detailed information about the different methods compared in Table 5 is helpful for the potential reader. For this reason, a new section has been created, "5.6. Comparison with other methods", which contains the following new information:

“Table 5 summarizes the relevant methods from the bibliography that use deep learning models for multiclass classification. Table 5 includes strategies for dynamic modification of the learning rate, such as cyclical learning rates (Smith et al. [10]), polynomial learning rates (Purnendu et al. [7]), and dynamic learning rates (Anil et al. [9]). There are also static learning rate methods, such as that of Anil et al. [8], and methodologies based on transfer learning (such as that of Alinsaif et al. [14]). We believe that the proposed methodology achieves its excellent accuracy on test images through a detailed and comprehensive study of the optimal strategy for the learning rate.”

As we completely agree that it is necessary to detail information about the different methods compared, Section “2. Related Work” now analyses in more detail some of the methods presented in Table 5 (the most prominent and most cited in the bibliography).
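For readers unfamiliar with the learning rate strategies compared in Table 5, the cyclical and polynomial schedules can be sketched in plain Python. This is a minimal illustration with hypothetical hyperparameter values, not the exact settings used in the paper or in the cited works:

```python
import math

def cyclical_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    # Triangular cyclical schedule (Smith-style): the learning rate
    # oscillates linearly between base_lr and max_lr with period
    # 2 * step_size training steps.
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

def polynomial_lr(step, total_steps, base_lr=1e-2, end_lr=1e-5, power=2.0):
    # Polynomial decay: the learning rate falls from base_lr to end_lr
    # over total_steps, with the curvature controlled by `power`.
    frac = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1 - frac) ** power + end_lr
```

A static learning rate is simply the degenerate case where the scheduler returns the same value at every step; the comparison in Table 5 is essentially between these families of schedules.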

Reviewer 3. P6: The database should be properly cited, please look at the instructions for citing that are available at the link you provided

Reply: Thank you for this clarification and apologies for the error. We have now cited the database correctly (using the bibliographic reference standard, which is consistent for all references in the paper).

Reviewer 3. P7: line 96, in what way are the classes balanced? And how was this done for the training and testing set?

Reply: Thank you very much for your comments. We have included in Section: “3. Materials and Dataset“ the following paragraph that explains it:

“For training the deep learning model, code was generated to balance all classes (mainly based on data augmentation of the classes with a lower number of patches; more detail in Section 5).”
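As an illustration of the balancing strategy described in the quoted paragraph, a minimal sketch follows. The `augment` argument is a hypothetical stand-in for the paper's actual augmentation pipeline (e.g. random flips/rotations of a patch); minority classes are oversampled with augmented copies until every class matches the largest one:

```python
import random

def balance_by_augmentation(patches_by_class, augment, seed=0):
    """Oversample minority classes with augmented copies until every
    class has as many patches as the largest class.

    patches_by_class: dict mapping class label -> list of patches.
    augment: any patch -> patch transform.
    """
    rng = random.Random(seed)
    target = max(len(p) for p in patches_by_class.values())
    balanced = {}
    for label, patches in patches_by_class.items():
        extra = [augment(rng.choice(patches))
                 for _ in range(target - len(patches))]
        balanced[label] = list(patches) + extra
    return balanced
```

Balancing only the training split this way (leaving validation and test untouched) avoids leaking augmented copies of a patch across splits.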

Reviewer 3. P8: line 97 “figure” should start with a capital letter

Reply: Thank you for pointing out this error. It has already been modified in the final manuscript.

Reviewer 3. P9: line 106, a “standard desktop workstation” is quite vague, if you want to be precise, you can give the CPU and RAM characteristics and that should be enough

Reply: Thank you very much for your comments. Indeed, more precise information on the characteristics of the computational system used is important, especially since the paper also indicates the computational time. Therefore, the following paragraph has been added to Section: “3. Materials and Dataset”:

“Neural network preprocessing, training, and deployment were done in Python (version 3.9.0) on one workstation with an 11th Gen Intel(R) i7-11700K CPU at 3.6 GHz, 128 GB RAM, and an Nvidia RTX 3060 GPU.”


Reviewer 3. P10: line 116 “Along” starts with a capital letter

Reply: Thank you for pointing out this error. This has been fixed in the new manuscript.

Reviewer 3. P11: lines 118-124, the description and giving context for the VGG19 are fine, but there are many details that are redundant in this paragraph, for example, the description of the organization of images, while there is also a bold claim that the results obtained using VGG19 are generally optimal. Your proposed research focuses on discussing the influence of hyperparameters, and you should understand that different CNN architectures could also provide better results than the VGG19. Instead of stating that VGG19 is generally optimal, you could cite several papers where the VGG19 architecture was used and state in which research fields its implementation gave good results.

Reply: Thank you very much for your comments. We have modified Section 4 and therefore deleted what previously began as Subsection 4.1, which contained information about VGG19 and other deep learning models. We agree that it is relevant to cite several papers in which the VGG19 model has been successfully used. The following paragraph was added:

“4.1 VGG19: Parameters and Hyperparameters

VGG19 has been applied very successfully in the literature for the classification of histopathological images [16,17], and specifically for the analysis of colorectal cancer in digital pathology images [18].”

Reviewer 3. P12: The “Machine Learning Models” section is unnecessary, especially in its current form, as you only use one model, VGG19. You could create a single subsection that would include the description of the model, parameters, and hyperparameters.

Reply: Thank you very much for your suggestion. We have deleted the subsection "Machine Learning Models" and, since this paper focuses only on VGG19, created a single subsection that describes in detail how the structure of the VGG19 network is modified for the multiclass problem analyzed and defines the parameters and hyperparameters used in the simulations. The new subsection is "4.1. VGG19: Parameters and Hyperparameters".

Reviewer 3. P13:  The fixed number of epochs, without a validation set, using the SGD optimizer (and not trying any others, especially Adam which is quite common) and other (hyper)parameters that are fixed in this research raise serious doubts about the method in this paper. The entire section is difficult to follow, and it seems that only the learning rate and the schedulers are analyzed in this paper, which is fine, but considering the influence of other parameters and the arbitrary fixed number of epochs (instead of using early stopping) make the findings of this paper have a questionable contribution.

Reply: Thank you very much for your comments. Three subsets of the main dataset are used in this paper: training, validation, and test. The histopathological images in the training, validation, and test datasets were randomly selected. This is represented graphically in Figure 3, where a block diagram shows a typical classification model pipeline that includes common techniques such as (random) data acquisition/splitting, image preprocessing, whole slide image (WSI) tiling, and final evaluation with a test set for the multiclass problem. A maximum number of epochs is indeed specified, but early stopping may of course occur before it is reached. Widely referenced Python libraries have been used to train the deep learning models, complementing the methodology presented in this paper. The Adam optimizer was also analyzed, but was not discussed in order to keep the document simple. Both the abstract and all sections of the manuscript have been significantly modified (including the last section, “6. Conclusions”) to make the paper clearer and more understandable.
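The early-stopping behaviour referred to in this reply (the paper relies on standard library implementations of it) amounts to patience-based monitoring of the validation loss. A minimal sketch, with hypothetical default values:

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved by at
    least `min_delta` for `patience` consecutive epochs (mirrors the
    behaviour of common deep learning library callbacks)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        # Call once per epoch; returns True when training should stop.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

Training then runs for at most the fixed maximum number of epochs, breaking out of the loop as soon as `step` returns `True`.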

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The article is eligible for publication with this version.

Reviewer 2 Report

The authors have addressed the major concerns raised in initial review sufficiently and improved the readability of their study significantly in the updated draft. While minor changes and edits to the abstract and main text of the article remain, the authors have made the article readable for the scientific community, and while the findings are general and applicable to range of audience, may be of interest to those in the field of CRC pathology.
