Peer-Review Record

Ensemble of Deep Convolutional Neural Networks for Classification of Early Barrett’s Neoplasia Using Volumetric Laser Endomicroscopy

Appl. Sci. 2019, 9(11), 2183; https://doi.org/10.3390/app9112183
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Received: 29 April 2019 / Revised: 17 May 2019 / Accepted: 21 May 2019 / Published: 28 May 2019
(This article belongs to the Special Issue Optical Coherence Tomography and its Applications)

Round 1

Reviewer 1 Report

Review

This is a manuscript that discusses a method to automatically identify/detect Barrett’s neoplasia in OCT imaging data using deep convolutional neural networks. While in vivo microscopy via optical coherence tomography is a powerful diagnostic methodology currently used in the best clinics/hospitals worldwide, interpretation of the imaging data remains challenging and is associated with a significant failure rate.

A comparison with existing algorithms was performed, and the authors report an improvement. However, a comparison of the method with the gold standard, manual evaluation, is not presented in a direct form. The algorithm demonstrates some promising results and could be of interest to readers working on medical image analysis and automated diagnostics. However, the results need to be presented more clearly, particularly by avoiding general terms and presenting more solid quantitative results.

If the authors provide a major revision, I think the paper could prove to be interesting and useful to an audience, making it acceptable for publication in the Journal.

Major critiques:

(1) Not all abbreviations are introduced and explained: UAC (line 7).

(2) While the article is interesting, it presents neither sufficiently solid results and quantitative data for the algorithm, nor the state of the art within the field of automated evaluation of in vivo OCT and a comparison to it.

(3) The abstract might be rewritten to be more self-descriptive. In particular, volumetric laser endomicroscopy in the context of the manuscript is a conventional OCT, and this term might be introduced from the beginning, probably already in the title.

(4) The manuscript and its introduction are missing a clear rationale regarding the nature of the problem. While it is true that Barrett’s esophagus (BE) occurs at the distal end of the esophagus and is a precursor of esophageal adenocarcinoma, this is not made clear in the introduction (lines 14-18).

(5) The authors are using too many general/qualitative terms without providing solid quantitative numbers: how many centers participated in the current study (line 6 introduces information which is only quantified at line 62), what is the improvement with respect to the previous work (line 7 introduces it, Table 2 quantifies it), with how many other algorithms was a comparison performed, and how much larger a set was used compared to the previous works (line 54).

(6) Additional references might be introduced: sampling error of biopsy (line 21).

(7) Images are missing scale bars and gray-scale bars, e.g., Figure 1 and Figure 4. Thus, they are not self-descriptive.

(8) A qualitative comparison of histology and OCT imaging data might be of interest to the reader.



Author Response

Dear editor/reviewer of Applied Sciences,

Thank you for reviewing our submission and providing us with valuable comments. We have revised the article such that all concerns are addressed and the suggested alterations are incorporated. Below, we respond to these concerns in a point-wise fashion, where our responses are in italic dark blue. Quotes from the former version of the manuscript are in italic red text, whereas quotes from the new manuscript are in italic green text.

Yours sincerely,

Roger Fonolla and Thom Scheeve,

Also on behalf of the co-authors

Reviewer 1

This is a manuscript that discusses a method to automatically identify/detect Barrett’s neoplasia in OCT imaging data using deep convolutional neural networks. While in vivo microscopy via optical coherence tomography is a powerful diagnostic methodology currently used in the best clinics/hospitals worldwide, interpretation of the imaging data remains challenging and is associated with a significant failure rate.

A comparison with existing algorithms was performed, and the authors report an improvement. However, a comparison of the method with the gold standard, manual evaluation, is not presented in a direct form. The algorithm demonstrates some promising results and could be of interest to readers working on medical image analysis and automated diagnostics. However, the results need to be presented more clearly, particularly by avoiding general terms and presenting more solid quantitative results.

If the authors provide a major revision, I think the paper could prove to be interesting and useful to an audience, making it acceptable for publication in the Journal.

We would like to thank the reviewer for his appreciation of our work, the presented manuscript and the overall positive review.

 

Major critiques:

(1) Not all abbreviations are introduced and explained: UAC (line 7).

We would like to thank the reviewer for identifying and pointing out these errors in the manuscript and we have corrected them accordingly.

Line 8 (Abstract)

Old:

“We achieve an AUC = 0.96”

Now:

“We achieve an area under the receiver operating characteristic curve (AUC) of 0.96”
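For readers unfamiliar with the metric, the AUC introduced above equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal illustrative sketch with toy labels and scores (not the study's data):

```python
# Illustrative only: compute the area under the ROC curve via its
# rank-statistic interpretation (probability that a positive outscores
# a negative, counting ties as half). Toy data, not the paper's results.

def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```

In practice a library routine such as scikit-learn's `roc_auc_score` would be used; the pairwise count above is only to make the definition concrete.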

 

(2) While the article is interesting, it presents neither sufficiently solid results and quantitative data for the algorithm, nor the state of the art within the field of automated evaluation of in vivo OCT and a comparison to it.

To the best of our knowledge, we have identified all the studies that have used VLE to identify in-vivo esophageal BE cancer. If we have overlooked any study, we kindly ask the reviewer to point it out to us.

In the field of in-vivo OCT analysis, state-of-the-art work has focused mainly on retinal images, from segmentation to classification of the retinal layers or macular diseases. The problem that we present is different from what is presented in the literature, since in-vivo classification of BE with OCT has only been investigated by Scheeve et al., whose study used only 18 patients and much less data than we present in the proposed manuscript.

Compared to the state-of-the-art, we performed our experiments on two different datasets: the first 22 patients were used to train the algorithm, and the 23 remaining patients were treated as an unseen dataset, used uniquely for testing the algorithm. With this setup, we supply quantitative data in the form of a total of 8,772 VLE images for the training dataset (Table 1, now added to the data description) and 7,191 VLE images for the unseen test set (now added to the data description). Following this approach, we achieved an area under the receiver operating characteristic curve of 0.96 on the test set, which is far superior to what other studies obtained. The already published paper of Scheeve et al. uses only a single dataset of 18 patients and only 1 VLE image per ROI (single-frame), instead of 51 images per ROI (multi-frame) as we present in this work. We believe that this is well explored throughout the manuscript.

In conclusion, we thank the reviewer for his constructive criticism, and we have added the following details in the manuscript:

Section 2.2 Data Collection and Description, Line 101:

“Out of the total cohort of patients, the first 22 patients were used as the training dataset (134 NDBE and 38 HGD/EAC ROIs, totalling 8,772 VLE images) and the remaining 23 were treated as the unseen test dataset (99 NDBE and 42 HGD/EAC ROIs, totalling 7,191 VLE images)”
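The patient-level split quoted above can be sketched as follows. The helper and the sequential patient IDs are hypothetical, but the key property matches the description: every VLE frame of a given patient lands entirely on one side of the split, so the test set is truly unseen.

```python
# Hypothetical sketch of the patient-level split described in the response:
# the first 22 patients form the training set, the remaining 23 the unseen
# test set. Splitting by patient (not by frame) prevents data leakage.

def split_by_patient(patient_ids, n_train=22):
    """Partition patient IDs into train/test groups by order of inclusion."""
    train = [p for p in patient_ids if p <= n_train]
    test = [p for p in patient_ids if p > n_train]
    return train, test

patients = list(range(1, 46))  # 45 patients in total
train, test = split_by_patient(patients)
print(len(train), len(test))   # 22 training patients, 23 test patients
```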

Section 3. Results, Line 246:

“Moreover, we show that using a multi-frame approach for classifying an ROI increases the confidence of our algorithm, compared to single-frame analysis (0.90 vs 0.86 with the Scheeve et al. [15] method, and 0.96 vs 0.91 with our proposed DCNN).”

 

(3) The abstract might be rewritten to be more self-descriptive. In particular, volumetric laser endomicroscopy in the context of the manuscript is a conventional OCT, and this term might be introduced from the beginning, probably already in the title.

We appreciate the suggestions of the reviewer. To emphasize the nature of VLE technology, we have modified the abstract; it now contains a sentence mentioning VLE in the context of OCT.

Abstract, Line 2:

“Volumetric laser endomicroscopy (VLE) is a novel technology incorporating a second-generation form of optical coherence tomography [...]”

We believe that the title includes the right amount of information to be kept for the reader and adding additional information will lead to unnecessary confusion, which we prefer to avoid.

 

(4) The manuscript and its introduction are missing a clear rationale regarding the nature of the problem. While it is true that Barrett’s esophagus (BE) occurs at the distal end of the esophagus and is a precursor of esophageal adenocarcinoma, this is not made clear in the introduction (lines 14-18).

We thank the reviewer for his suggestion and we have followed it by modifying the first paragraph of the Introduction and changed it into:

Old:

“Esophageal adenocarcinoma (EAC) is among the most common and lethal cancers in the world. EAC has shown a rapid increase since the late 1980s and it is estimated that the number of new esophageal cancer cases will be doubled by 2030 [1]. Barrett's esophagus (BE) is a condition in which normal squamous epithelium is replaced by metaplastic columnar epithelium and is associated with an increased risk of developing EAC [2]. Patients diagnosed with BE currently undergo regular surveillance with white-light endoscopy (WLE) with the aim to detect early high-grade dysplasia (HGD) and intramucosal adenocarcinoma. However, an unknown number of early neoplastic lesions are missed because of their subtle appearances, or as a result of sampling errors during biopsy.”

New:

“Esophageal adenocarcinoma (EAC) is among the most common and lethal cancers in the world. EAC has shown a rapid increase since the late 1980s and it is estimated that the number of new esophageal cancer cases will be doubled by 2030 [1]. Barrett's esophagus (BE) is a condition in which normal squamous epithelium at the distal end of the esophagus is replaced by metaplastic columnar epithelium due to overexposure to gastric acid and it is associated with an increased risk of developing EAC [2]. For this reason, patients diagnosed with BE currently undergo regular surveillance with white-light endoscopy (WLE) with the aim to detect early high-grade dysplasia (HGD) and intramucosal adenocarcinoma. It is important to detect these lesions early, as curative treatment is still possible at this stage by a minor endoscopic intervention. However, early neoplastic lesions are regularly missed because of their subtle appearances, or as a result of sampling errors during biopsy [3-5].”

 

(5) The authors are using too many general/qualitative terms without providing solid quantitative numbers: how many centers participated in the current study (line 6 introduces information which is only quantified at line 62), what is the improvement with respect to the previous work (line 7 introduces it, Table 2 quantifies it), with how many other algorithms was a comparison performed, and how much larger a set was used compared to the previous works (line 54).

We acknowledge the point that the reviewer is making; we have thoroughly gone through the manuscript again and quantified all general terms at the place in the manuscript where we introduce them.

We have extended the abstract to include the exact number of patients we use as well as the number of patients used in the compared work.

Old

“In this work, we train an ensemble of deep convolutional neural networks to detect neoplasia in BE patients, using a dataset of images acquired with VLE in a multicenter study. We achieve an AUC = 0.96 on the unseen test dataset and we compare our results with previous work done with VLE analysis. Our method for detecting neoplasia in BE patients facilitates future advances on patient treatment and provides clinicians with new assisting solutions to process and better understand VLE data.”

New

“In this work, we train an ensemble of deep convolutional neural networks to detect neoplasia in 45 BE patients, using a dataset of images acquired with VLE in a multi-center study. We achieve an area under the receiver operating characteristic curve (AUC) of 0.96 on the unseen test dataset and we compare our results with previous work done with VLE analysis, where only AUC of 0.90 was achieved via cross-validation on 18 BE patients. Our method for detecting neoplasia in BE patients facilitates future advances on patient treatment and provides clinicians with new assisting solutions to process and better understand VLE data.”

Moreover, we have incorporated the following additions:

·       Line 53-55 we have added the specific machine learning methods used in the work of Swager et al. and Scheeve et al.

“Both studies investigated the features using several machine learning methods, such as support vector machine, random forest or AdaBoost, and showed successful results towards BE neoplasia assessment.”

·       Line 56-57 we have added the number of patients used in the state-of-the-art.

“However, the results were only obtained in a small patient population (29 endoscopic resections and 18 VLE laser-marked ROIs, respectively), […]”

·       Line 59 we have added the number of patients we used in our study to be compared with the state-of-the-art.

“In this study, we extend the work of Scheeve et al. [15] by using a larger dataset of 45 patients, [...]”

Throughout the manuscript, we have added quantitative numbers and modified explanations to avoid using general terms, especially in Section 3. Results.

Old:

“Our study is based on experiments with a more robust dataset, which allows the DCNN to learn a wider range of features. Moreover, we show that using a multi-frame approach for classifying an ROI increases the the confidence of our algorithm, compared to a single-frame analysis.”

New:

“Our study is based on experiments with a more robust dataset, provided by increasing to 45 the number of BE patients, which allows the DCNN to learn a wider range of features. Moreover, we show that using a multi-frame approach for classifying an ROI increases the confidence of our algorithm, compared to single-frame analysis (0.90 vs 0.86 with Scheeve et al. [15] method, and 0.96 vs 0.91 with our proposed DCNN).”

In Line 82 we specified the number of centers that have participated in the study: Amsterdam UMC (AMC; Amsterdam, The Netherlands), the Catharina Hospital (CZE; Eindhoven, The Netherlands), and the St. Antonius Hospital (ANZ; Nieuwegein, The Netherlands).

 

(6) Additional references might be introduced: sampling error of biopsy (line 21).

We would like to thank the reviewer for pointing out this omission, and we have added three additional references that support the claim.

In Line 26 we add references to the studies on sampling error:

Tschanz et al., Arch. Pathol. Lab. Med. 2005 [3]; Gordon et al., Gastrointest. Endosc. 2014 [4]; Schölvinck et al., Endoscopy 2017 [5].

 

(7) Images are missing scale bars and gray-scale bars, e.g., Figure 1 and Figure 4. Thus, they are not self-descriptive.

We agree with the reviewer that the scale of the presented examples was difficult to understand. We have added color bars and scale bars in both Figure 1 and Figure 4 (in the revised manuscript, the latter is found as Figure 5).

    (SEE REPORT NOTES FOR THE IMAGE)

“Figure 1. [Best viewed in color] Example of preprocessing applied to each VLE frame. At the left side of the image the balloon line (red line) is located by calculating the average intensity of the whole image (cyan line) and then using the first derivative (yellow line) to extract the end point of the balloon pixel (green star). Scale bars: 0.5 mm.”
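The balloon-line detection described in the caption can be illustrated with a simplified sketch; the toy intensities and the exact edge criterion below are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of the balloon-line detection in the Figure 1 caption:
# average the intensity per depth row (the 'cyan line'), take the first
# derivative (the 'yellow line'), and locate the strongest intensity drop
# as the end of the balloon (the 'green star').

def find_balloon_edge(image):
    """image: list of depth rows, each a list of lateral grayscale values."""
    # Mean intensity per depth row.
    profile = [sum(row) / len(row) for row in image]
    # First (forward) derivative of the intensity profile.
    deriv = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
    # Index just past the largest negative jump: first row beyond the balloon.
    return min(range(len(deriv)), key=lambda i: deriv[i]) + 1

# Toy frame: bright balloon region (rows 0-2) followed by darker tissue.
img = [[200, 210], [205, 195], [190, 200], [40, 50], [45, 42]]
print(find_balloon_edge(img))  # row index where the balloon ends
```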

    (SEE REPORT NOTES FOR THE IMAGE)

“Figure 5. [Best viewed in color] Several VLE frames and its corresponding class activation maps (CAM). Images (a) and (b) belong to ROIs with HGD, represented as CAM in images (e) and (f). Images (c) and (d) correspond to ROIs with NDBE, with its CAM in images (g) and (h). Scale bars: 0.5 mm. Color bar (a)-(d): Pixel intensity. Color bar (e)-(h): Class activation intensity.”

 

(8) A qualitative comparison of histology and OCT imaging data might be of interest to the reader.

Although the point brought by the reviewer is very interesting, we think that it is beyond the scope of the manuscript. We would like to refer the reader to the publication of Swager et al. (which is already referred to in the manuscript under Ref. [7]), where the reader can find an analysis of a VLE – histology match as well as images supporting the findings, like the one we show below:

    (SEE REPORT NOTES FOR THE IMAGE)

In Section 1. Introduction we added a mention to the work of Swager et al. comparing VLE-histology images.

General comment:

*We want to point out that we have as well corrected some minor grammatical errors throughout the manuscript.


Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript “Ensemble of Deep Convolutional Neural Networks for classification of early Barrett’s neoplasia using volumetric laser endomicroscopy” by Fonollà et al. is devoted to the development of neural-network-assisted classification of non-dysplastic and neoplastic BE patients based on VGG16 NNs. The manuscript is well-written in general and can be of interest to a broad scientific audience. It can be published if the following deficiencies are addressed:

1.      While the NN side was explained reasonably well, the preprocessing steps are a little bit unclear. I would recommend adding a diagram with all preprocessing steps

2.      Line 141: It is unclear why authors moved from grayscale images to quasi-RGB images with triplicated grayscale channels. Did you use any pre-training?

3.      Authors should comment on how all images from 1 measurement were labeled. It is unclear whether all 51 images were labeled the same: NDBE or HGD?

4.      Were all 3 DCNNs identical? with different weights seeded?

 

Minor deficiencies:

5.      Line 44: Acronyms (e.g., NDBE, and HGD) should be defined on their first appearance

6.      Line 93: In “In the previous works of Klomp et al. and Scheeve et al.,” the explicit references should be included

7.      Line 204: Incorrect reference to Equation (3).

8.      Authors need to explain what “single-frame” and “multi-frame” is in their context.


Author Response

Dear editor/reviewer of Applied Sciences,

Thank you for reviewing our submission and providing us with valuable comments. We have revised the article such that all concerns are addressed and the suggested alterations are incorporated. Below, we respond to these concerns in a point-wise fashion, where our responses are in italic dark blue. Quotes from the former version of the manuscript are in italic red text, whereas quotes from the new manuscript are in italic green text.

Yours sincerely,

Roger Fonolla and Thom Scheeve,

Also on behalf of the co-authors

Reviewer 2

The manuscript “Ensemble of Deep Convolutional Neural Networks for classification of early Barrett’s neoplasia using volumetric laser endomicroscopy” by Fonollà et al. is devoted to the development of neural-network-assisted classification of non-dysplastic and neoplastic BE patients based on VGG16 NNs. The manuscript is well-written in general and can be of interest to a broad scientific audience. It can be published if the following deficiencies are addressed:

We would like to thank the reviewer for his appreciation of our work, the presented manuscript and the overall positive review.

1.      While the NN side was explained reasonably well, the preprocessing steps are a little bit unclear. I would recommend adding a diagram with all preprocessing steps

We would like to thank the reviewer for this suggestion. Accordingly, we have added the following diagram at the end of Section 2.4.2 to help the reader better understand the preprocessing steps.

    (SEE REPORT NOTES FOR THE IMAGE)

2.      Line 141: It is unclear why authors moved from grayscale images to quasi-RGB images with triplicated grayscale channels. Did you use any pre-training?

We would like to thank the reviewer for pointing out the insufficient clarification of the preprocessing steps. As the reviewer hints at, we indeed used a pre-trained VGG16 with ImageNet weights, which is further explained in the Training section. To avoid any confusion, we have changed the following paragraphs to help the reader understand the steps we followed in our work.

·       In Section 2.4.2. Preprocessing DCNN we have extended the following paragraph

Old text:

“Each of the networks were initialized using ImageNet weights [29], hence each image was resized to 224 X 224 pixels to match the DCNN input shape. Furthermore, VLE images are originally embedded into the grayscale space, we triplicated the gray channels to represent the image in the RGB space. As final step, the dataset was normalized by subtracting the mean and dividing by the standard deviation specified by the original ImageNet weights.”

New text:

“Each of the networks were initialized using pre-trained ImageNet weights [29]. One limitation of using a pre-trained model is that the associated architecture cannot be changed, since the weights are originally trained for a specific input configuration. Hence, to match the requirements of the pre-trained ImageNet weights, each image was resized to 224 X 224 pixels. In addition, the dataset was normalized by subtracting the mean and dividing by the standard deviation, specified by the pre-trained ImageNet weights. As final step, the grayscale channel of each VLE image was triplicated to simulate the RGB input requirement of the pre-trained model (Figure 2).”
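A minimal sketch of the normalization and channel triplication described in the quoted paragraph. The per-channel mean/std values below are the commonly used ImageNet statistics and are an assumption here; the resize to 224 x 224 would be done with an image library and is omitted.

```python
# Sketch, not the paper's code: normalize a grayscale VLE frame with
# per-channel ImageNet statistics (assumed standard values) and triplicate
# the single gray channel to mimic the RGB input a pre-trained VGG16 expects.

IMAGENET_MEAN = (0.485, 0.456, 0.406)  # commonly used values (assumption)
IMAGENET_STD = (0.229, 0.224, 0.225)

def gray_to_pseudo_rgb(frame):
    """frame: 2-D list of grayscale values in [0, 1] -> 3 normalized channels."""
    channels = []
    for mean, std in zip(IMAGENET_MEAN, IMAGENET_STD):
        # Same gray values in every channel, each normalized per channel.
        channels.append([[(v - mean) / std for v in row] for row in frame])
    return channels

rgb = gray_to_pseudo_rgb([[0.5, 0.6], [0.7, 0.8]])
print(len(rgb))  # three channels, as the pre-trained model requires
```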

 

3.      Authors should comment on how all images from 1 measurement were labeled. It is unclear whether all 51 images were labeled the same: NDBE or HGD?

We thank the reviewer for the suggestion. We agree with the reviewer that further explanation should be added into the setup of the presented experiments.

Therefore, we have interchanged the two sections: VLE Imaging System (formerly Section 2.2) is now Section 2.1, and Data Collection and Description (formerly Section 2.1) is now Section 2.2.

We have replaced the following paragraph in the Section 2.2. Data Collection and Description, as well as added three new references that support the new explanations:

Old:

“For each patient, one or several regions of interest (ROIs) were extracted. VLE images in the ROIs were extracted from the VLE data, guided by the VLE-histology correlations. The histopathological correlation was assumed to apply over 1.25 mm (25 cross-sectional images) in both vertical directions (i.e, distal and proximal), resulting in 51 images per region.”

New:

“For each patient, one or several regions of interest (ROIs) were extracted in the following manner. First, four-quadrant laser-mark pairs were placed at 2-cm intervals using the VLE system, according to the Seattle biopsy protocol [20,21]. Next, a full VLE scan was performed, after which the VLE balloon was retracted from the esophagus. Then, regular endoscopy was used to obtain biopsies in between the laser-mark pairs. Finally, ROIs were cropped from the full scan in between the same laser-mark pairs, and were labeled according to pathology outcome, ensuring histology-correlation [22] of the extracted ROIs. The histopathological correlation was assumed to apply over 1.25 mm, conform a small biopsy specimen, comprising 25 cross-sectional images in both vertical directions (i.e., distal and proximal), and thus resulting in 51 images per ROI.”

 

4.      Were all 3 DCNNs identical? with different weights seeded?

The reviewer addresses a very interesting issue.

1.       Yes, all the DCNNs were trained identically until convergence using the same optimizer hyperparameters. The only remarkable difference arises due to the cyclic learning rate, where each DCNN might converge at a different cyclic point.

2.       Referring back to the reviewer’s earlier point on pre-training in Line 141 (question 2 of the reviewer), we have already clarified that we performed pre-training. Since in this work we take advantage of a pre-trained network, the weights are not seeded differently and are initialized in exactly the same way. In the text we have covered this issue by stating:

Section 2.4.2 Preprocessing DCNN, Line 154: "Each of the networks were initialized using pre-trained ImageNet weights [32]."
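As background for the cyclic learning rate mentioned in point 1, a sketch of a triangular cyclic schedule of the kind commonly used; the bounds and step size below are placeholders, not the paper's values.

```python
# Illustrative sketch of a triangular cyclic learning rate (Smith, 2017).
# The rate ramps linearly from base_lr up to max_lr and back, so identically
# initialized networks stopped at different cycle points can end up at
# different minima. Hyperparameter values here are placeholders.

def cyclic_lr(iteration, base_lr=1e-4, max_lr=1e-3, step_size=100):
    """Learning rate at a given iteration under a triangular cyclic policy."""
    cycle = iteration // (2 * step_size)          # which full cycle we are in
    x = abs(iteration / step_size - 2 * cycle - 1)  # distance from the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

print(cyclic_lr(0))    # start of cycle: base rate
print(cyclic_lr(100))  # mid-cycle: peak rate
```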

 

Minor deficiencies:

5.      Line 44: Acronyms (e.g., NDBE, and HGD) should be defined on their first appearance

We have checked this and found this was done in Section 1: HGD, defined in Line 20 (old manuscript) / Line 22 (updated manuscript). NDBE, defined in Line 44 (old manuscript) / Line 48 (updated manuscript).

6.      Line 93: In “In the previous works of Klomp et al. and Scheeve et al.,” the explicit references should be included.

We have done this.

7.      Line 204: Incorrect reference to Equation (3).

We would like to thank the reviewer for identifying and pointing out these errors in the manuscript and we have corrected them accordingly.

8.      Authors need to explain what “single-frame” and “multi-frame” is in their context.

This is indeed a useful suggestion, and we have extended the paragraph in Section 2.3 with the aim to better clarify the work done by Scheeve et al., where we first refer to the terms single-frame and multi-frame.

Old:

"In the previous works of Klomp et al. and Scheeve et al., several clinically-inspired quantitative image features were developed, the layer histogram (LH) and gland statistics (GS), to detect BE neoplasia in single frames. For a fair comparison with these studies, and per our request to the authors, we present our results by extending the single-frame analysis, further referred to as multi-frame analysis. The development of the clinically-inspired features has been described previously [12,17,18]. A summary of the methodology for the multi-frame analysis is given in the following sections."

New:

"In the previous works of Klomp et al. [23] and Scheeve et al. [15], several clinically-inspired quantitative image features were developed, the layer histogram (LH) and gland statistics (GS), to detect BE neoplasia in single frames. We refer to analysing one VLE image in a ROI to predict BE neoplasia as single-frame analysis. In Scheeve et al. [15], a single VLE image per ROI was used to compute the resulting prediction for each ROI. For a fair comparison with these studies, and per our request to the authors, we present our results by extending the single-frame analysis to 51 VLE images per ROI, further referred to as multi-frame analysis. The development of the clinically-inspired features has been described previously [15,23,24]. A summary of the methodology for the multi-frame analysis is given in the following sections.”
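To make the single-frame vs multi-frame distinction concrete, a sketch in which each of the 51 frames in an ROI receives a per-frame score and the ROI-level prediction pools them. Mean pooling is an assumption for illustration only; the paper's exact aggregation is not reproduced here.

```python
# Hypothetical sketch: single-frame analysis scores one VLE image per ROI,
# while multi-frame analysis aggregates scores over all 51 frames
# (25 distal + center + 25 proximal). Mean pooling is an assumed choice.

def roi_score(frame_scores):
    """Aggregate per-frame neoplasia probabilities into one ROI-level score."""
    return sum(frame_scores) / len(frame_scores)

# Toy per-frame probabilities for one ROI: 25 distal, 1 center, 25 proximal.
scores = [0.8] * 25 + [0.9] + [0.7] * 25
print(len(scores))          # 51 frames per ROI, as described in the text
print(roi_score(scores))    # pooled ROI-level score
```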

 

General comment:

*We want to point out that we have as well corrected some minor grammatical errors throughout the manuscript.

 


Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thank you for the opportunity to review the updated manuscript. I appreciate the careful revision the authors have done and the accurate implementation of all suggestions/comments. The authors have strengthened the paper. It is my belief that the manuscript is substantially improved after the edits, making it acceptable for publication.
