Article

Performance Evaluation of Deep Learning-Based Prostate Cancer Screening Methods in Histopathological Images: Measuring the Impact of the Model’s Complexity on Its Processing Speed

by Lourdes Duran-Lopez 1,2,3,4,*, Juan P. Dominguez-Morales 1,2,4, Antonio Rios-Navarro 1,2,4, Daniel Gutierrez-Galan 1,2,4, Angel Jimenez-Fernandez 1,2,4, Saturnino Vicente-Diaz 1,2,4 and Alejandro Linares-Barranco 1,2,3,4

1 Robotics and Tech. of Computers Lab, Universidad de Sevilla, 41012 Seville, Spain
2 Escuela Técnica Superior de Ingeniería Informática (ETSII), Universidad de Sevilla, 41012 Seville, Spain
3 Escuela Politécnica Superior, Universidad de Sevilla, 41012 Seville, Spain
4 Smart Computer Systems Research and Engineering Lab (SCORE), Research Institute of Computer Engineering (I3US), Universidad de Sevilla, 41012 Seville, Spain
* Author to whom correspondence should be addressed.
Sensors 2021, 21(4), 1122; https://doi.org/10.3390/s21041122
Submission received: 13 January 2021 / Revised: 30 January 2021 / Accepted: 1 February 2021 / Published: 5 February 2021
(This article belongs to the Topic Scientific Advances in STEM: From Professor to Students)

Abstract: Prostate cancer (PCa) is the second most frequently diagnosed cancer among men worldwide, with almost 1.3 million new cases and 360,000 deaths in 2018. Its mortality is estimated to double by 2040, mostly in countries with limited resources. These numbers suggest that recent trends in deep learning-based computer-aided diagnosis could play an important role, serving as screening methods for PCa detection. These algorithms have already been applied to histopathological images in many works, in which authors tend to focus on achieving high accuracy when classifying between malignant and normal cases. Such results are commonly obtained by training very deep and complex convolutional neural networks, which require high computing power and resources not only for training, but also for inference. As the number of cases rises in regions with limited resources, reducing prediction time becomes increasingly important. In this work, we measured the performance of current state-of-the-art models for PCa detection with a novel benchmark and compared the results with PROMETEO, a custom architecture that we proposed. The results of this comprehensive comparison show that using dedicated models for specific applications could be of great importance in the future.

1. Introduction

Prostate cancer (PCa) is the second most common cancer and the fifth leading cause of cancer death in men (GLOBOCAN [1]). In 2018, almost 1.3 million cases and around 360,000 deaths were registered worldwide due to this malignancy. According to the World Health Organization (WHO), PCa cases will increase worldwide, with 1,017,712 new cases estimated for 2040. Most of these cases will be registered in Africa, Latin America, the Caribbean and Asia, and this rise appears to be related to increased life expectancy [2].
To diagnose PCa, digital rectal examination (DRE) is the primary test for the initial clinical assessment of the prostate. Then, the prostate-specific antigen (PSA) test is used as a screening method to investigate an abnormal prostatic nodule found during the DRE. Finally, in the case of an abnormal DRE and elevated PSA results, a trans-rectal ultrasound-guided biopsy is performed to obtain samples of the prostate tissue [3]. These tissue samples are then scanned, resulting in gigapixel-resolution images called whole-slide images (WSIs), which are analyzed and diagnosed by pathologists.
Due to the sharp increase in new cases, and driven by the impact of Artificial Intelligence (AI) in recent years [4,5], several computer-aided diagnosis (CAD) systems have been developed to speed up the process of PCa diagnosis. A CAD system is an automatic or semi-automatic algorithm whose purpose is to assist doctors in the interpretation of medical images by providing a second opinion in the diagnosis. Among the different AI algorithms, deep learning (DL), and particularly convolutional neural networks (CNNs), has become very popular in recent years [6]. These have been applied to several fields of medical image analysis, such as disorder classification [7], lesion/tumor classification [8], disease recognition [9] and image construction/enhancement [10], among others.
DL algorithms have also been applied to other medical image analysis fields such as histopathology, in which WSIs are used. Since a CNN cannot take a whole WSI as input due to its large size, a common approach is to divide the image into small subimages called patches. This procedure has been widely used to develop CAD systems in this field.
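As an illustration of this patch-based approach, the following minimal sketch (not the authors' code) extracts non-overlapping patches from a WSI using the OpenSlide Python bindings. For simplicity it reads from the highest-resolution pyramid level, whereas this work uses patches of 100 × 100 pixels at 10× magnification.

import openslide

PATCH_SIZE = 100   # patch side in pixels
LEVEL = 0          # pyramid level to read from (0 = highest resolution)

def iter_patches(wsi_path):
    """Yield (x, y, patch) tuples covering the whole slide."""
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[LEVEL]
    for y in range(0, height - PATCH_SIZE + 1, PATCH_SIZE):
        for x in range(0, width - PATCH_SIZE + 1, PATCH_SIZE):
            # read_region expects the top-left corner in level-0 coordinates
            region = slide.read_region((x, y), LEVEL, (PATCH_SIZE, PATCH_SIZE))
            yield x, y, region.convert("RGB")  # drop the alpha channel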
Recently, many researchers have investigated the application of CAD systems to the diagnosis of PCa in WSIs. Ström et al. [11] developed a DL-based CAD system that performs a binary classification distinguishing between malignant and normal tissue. The classification was performed using an ensemble of 30 widely used InceptionV3 models [12] pretrained on ImageNet, achieving areas under the curve (AUC) of 0.997 and 0.986 on the validation and test subsets, respectively. For areas detected as malignant, the authors trained another ensemble of 30 InceptionV3 CNNs to discriminate between different PCa Gleason grading system (GGS) scores, achieving a mean pairwise kappa of 0.62 at slide level. Campanella et al. [13] presented a CAD system to detect malignant areas in WSIs. The classification was performed with the well-known ResNet34 model [14] together with a recurrent neural network (RNN) for tumor/normal classification, achieving an AUC of 0.986 at slide level. In a previous study [15], we proposed a CAD system focused on performing a patch-level classification of histopathological images between normal and malignant tissue. The proposed architecture, called PROMETEO, consists of four convolution stages (convolution, batch normalization, activation and pooling layers) and three fully connected layers. The network achieved 99.98% accuracy, 99.98% F1 score and 0.999 AUC on a separate test set at patch level after being trained with a 3-fold cross-validation method.
These previous works achieved competitive results in terms of accuracy, precision and other commonly-used evaluation metrics. However, to the best of our knowledge, most state-of-the-art works do not prioritize the speed of the CAD system as an important factor. Many of them trained and tested very complex, well-known networks without taking into account the computational cost and the time required to perform the whole process. Since these algorithms are not intended to replace pathologists but to assist them in their task, in some cases it is better to prioritize the speed of the analysis, sacrificing some precision so that the expert gets a faster and more dynamic response from the system.
In this paper, a novel benchmark was designed in order to measure the processing and prediction time of a CNN architecture for a PCa screening task. First, the proposed benchmark was run for the PROMETEO architecture on different computing platforms in order to measure the impacts that their hardware components have on the WSI processing time. Then, using the personal computer (PC) configuration that achieved the best performance, the benchmark was run with different state-of-the-art CNN models, comparing them in terms of average prediction time both at patch level and at slide level, and also reporting the slowdown when compared to PROMETEO.
The rest of the paper is structured as follows: Section 2 introduces the materials and methods used in this work, including the dataset (Section 2.1), the CNN models (Section 2.2) and the proposed benchmark (Section 2.3). Then, the results obtained are presented in Section 3, divided into two experiments: first, the performance of the proposed CNN model is evaluated on different platforms, and then it is compared to state-of-the-art, widely-known CNN architectures. Section 4 and Section 5 present the discussion and the conclusions of this work, respectively.

2. Materials and Methods

2.1. Dataset

In this work, a dataset with WSIs obtained from three different hospitals was used. These cases consisted of Hematoxylin and Eosin (H&E)-stained slides globally diagnosed as either normal or malignant.
From Virgen de Valme Hospital (Seville, Spain), 27 normal and 70 malignant cases obtained by means of needle core biopsy were digitized into WSIs. Clínic Barcelona Hospital (Barcelona, Spain) provided 100 normal and 129 malignant WSIs, also obtained by means of needle core biopsy. Finally, from Puerta del Mar Hospital (Cádiz, Spain), 65 malignant (26 from needle core biopsy and 39 from incisional biopsy) and 79 normal (33 from needle core biopsy and 46 from incisional biopsy) WSIs were obtained. Table 1 summarizes the WSIs considered in the dataset.

2.2. CNN Models

Different CNN models were considered in this work in order to compare their performance using the benchmark proposed in Section 2.3. Three architectures from state-of-the-art DL-based PCa detection works were compared, along with other well-known CNN architectures. The first is the custom CNN model, called PROMETEO, which we proposed in [15], where we also demonstrated that applying stain-normalization algorithms to the patches in order to reduce color variability could improve the generalization of the model when predicting new unseen images from different hospitals and scanners. The second CNN architecture considered in this work is the well-known ResNet34 model [14], which was used by Campanella et al. in [13]. The third is InceptionV3, introduced in [12], which was used by Ström et al. [11].
Apart from these three CNN models, other widely-known architectures were evaluated with the same benchmark, comparing their performance in terms of execution time with the rest of the networks for the same task. These were VGG16 and VGG19 [16], MobileNet [17], DenseNet121 [18], Xception [19] and ResNet101 [14].
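As a rough sketch of how such a comparison can be set up, the snippet below instantiates the off-the-shelf architectures available in tf.keras.applications with a binary tumor/normal head on 100 × 100 patches. This is an assumption for illustration, not the training setup used in the referenced works; ResNet34 is omitted because it is not bundled with Keras, and the printed parameter counts will differ slightly from Table 2 depending on the classification head attached.

import tensorflow as tf

INPUT_SHAPE = (100, 100, 3)  # patch size used in this work

CANDIDATES = {
    "VGG16": tf.keras.applications.VGG16,
    "VGG19": tf.keras.applications.VGG19,
    "MobileNet": tf.keras.applications.MobileNet,
    "DenseNet121": tf.keras.applications.DenseNet121,
    "Xception": tf.keras.applications.Xception,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "ResNet101": tf.keras.applications.ResNet101,
}

for name, ctor in CANDIDATES.items():
    backbone = ctor(include_top=False, weights=None,
                    input_shape=INPUT_SHAPE, pooling="avg")
    # Binary tumor/normal head on top of the global-average-pooled features
    outputs = tf.keras.layers.Dense(2, activation="softmax")(backbone.output)
    model = tf.keras.Model(backbone.input, outputs)
    print(f"{name}: {model.count_params():,} parameters")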

2.3. Benchmark

In this work, a novel benchmark was designed in order to measure and compare the performance of different CNN models and platforms on a PCa screening task. To make the benchmark easy to share with other researchers so that it can be run on different computers, a reduced set of WSIs was chosen from the dataset presented in Section 2.1. Since the complete dataset takes up more than 300 gigabytes (GB) of hard drive space, only 40 WSIs were considered, yielding a benchmark of around 50 GB that is much easier to distribute. These 40 WSIs were randomly selected from all three hospitals and scanners, so that the benchmark represents the diversity of the dataset well.
The benchmark performs a set of processing steps, which are detailed next (see Figure 1). First, as introduced in Section 1, since a CNN cannot take a whole WSI as input due to its large size, each image is divided into small subimages called patches (100 × 100 pixels at 10× magnification in this case), which are read from the WSI. This step is called read; apart from extracting the patches from the input WSI, it discards those corresponding to background (identified as D in the figure). Then, in the scoring step, each patch is given a score depending on three factors: the amount of tissue it contains, the percentage of pixels within the H&E hue range, and the dispersion of the saturation and brightness channels. This score allows discarding patches corresponding to unwanted areas, such as pen marks, external agents and patches with small amounts of tissue, among others. In Figure 1, patches discarded in this step are highlighted in red, while those that pass the scoring filter are highlighted in green. The third step, stain normalization, performs a color normalization of each patch based on Reinhard's stain-normalization algorithm [20,21] in order to reduce color variability between samples. In the prediction step, which is the last one, each patch is fed to a trained CNN, which classifies it as either malignant or normal tissue. Deeper insights into these steps are given in [15]. When the execution of the benchmark finishes, it reports both the hardware and system information of the computer used to run it and the results of the execution. These results consist of the mean execution time and standard deviation of each of the four processes (read, scoring, stain normalization and prediction) shown in Figure 1 and presented in [15], both at patch level and at WSI level.
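To make the four timed stages concrete, the following minimal sketch (assuming NumPy, OpenCV and a trained Keras patch classifier model; not the PROMETEO implementation) chains them together and accumulates the per-stage time. The scoring rule is a toy tissue-fraction test standing in for the three-factor score described above, the normalization matches per-channel statistics to a reference slide in the spirit of Reinhard's method (here in OpenCV's LAB space), and iter_patches is the extraction sketch from Section 1.

import time
import numpy as np
import cv2

def score_patch(patch_rgb, tissue_thresh=0.5):
    """Toy stand-in for the scoring step: keep a patch if enough of it is
    tissue (background in H&E slides is close to white)."""
    gray = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2GRAY)
    return float(np.mean(gray < 220)) >= tissue_thresh

def reinhard_normalize(patch_rgb, ref_mean, ref_std):
    """Reinhard-style normalization: match the per-channel mean/std of the
    patch to reference statistics computed the same way from a reference image."""
    lab = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    mean, std = lab.mean(axis=(0, 1)), lab.std(axis=(0, 1)) + 1e-6
    lab = (lab - mean) / std * ref_std + ref_mean
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2RGB)

def process_wsi(wsi_path, model, ref_mean, ref_std):
    """Run read -> score -> stain normalization -> prediction on one WSI and
    return the accumulated time (in seconds) spent in each stage."""
    times = {"read": 0.0, "score": 0.0, "stain_norm": 0.0, "prediction": 0.0}
    kept = []
    for _, _, patch in iter_patches(wsi_path):  # extraction sketch from Section 1
        t0 = time.perf_counter()
        rgb = np.asarray(patch)                 # read: decode the patch pixels
        times["read"] += time.perf_counter() - t0

        t0 = time.perf_counter()
        keep = score_patch(rgb)                 # score: filter unwanted patches
        times["score"] += time.perf_counter() - t0
        if not keep:
            continue

        t0 = time.perf_counter()
        kept.append(reinhard_normalize(rgb, ref_mean, ref_std))
        times["stain_norm"] += time.perf_counter() - t0

    t0 = time.perf_counter()
    if kept:                                    # prediction: malignant vs. normal
        model.predict(np.stack(kept).astype(np.float32) / 255.0, verbose=0)
    times["prediction"] += time.perf_counter() - t0
    return times

Averaging the per-stage times returned by process_wsi over the 40 benchmark WSIs would produce a report analogous in structure to Tables A2 and A3.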

3. Results

The CNN-based PROMETEO architecture described in Section 2.2 was proposed and evaluated in terms of accuracy and many other evaluation metrics in [15]. In this work, we evaluated that model in terms of execution time per patch and per WSI.
First, the same architecture was tested on different platforms using the benchmark proposed in Section 2.3. These results allow us to measure and quantify the impact of different hardware components on the whole processing and prediction pipeline, which is useful for designing an edge-computing prostate cancer detection system. Then, the benchmark was used to evaluate the performance of different state-of-the-art CNN architectures on the computing platform that achieved the best results in the first experiment.
Fourteen different PC configurations were used to evaluate the performance of the PROMETEO architecture introduced in Section 2.2. The hardware specifications (central processing unit (CPU) and graphics processing unit (GPU)) of these computers are listed in Table A1 of Appendix A. Figure 2 shows the average patch processing time for each of the fourteen configurations, reporting the mean time of each step performed when processing a patch (see Section 2.3). As can be seen, prediction is the most time-consuming step in most cases, but it is greatly reduced in configurations that include a GPU.
Figure 3 depicts the average and standard deviation of the execution time needed per WSI when running the benchmark on the fourteen PC configurations. As in Figure 2, each of the steps of the whole process is shown. As can be seen, reading the whole WSI patch by patch is the step that takes the most time on most of the devices (mainly on those configurations without a GPU). This might seem contradictory given Figure 2, but it is important to note that, in that step, all patches from a WSI are read and analyzed, while not all of them are processed in the following steps. Unwanted areas, such as background regions with no tissue, are discarded before being scored. Then, only the patches that are not background and pass the scoring step are stain normalized and predicted by the CNN.

3.1. PROMETEO Evaluation

The sum of the average execution times of the four processing steps for each WSI is shown in Figure 4. The best case (device M) takes 22.56 ± 5.67 s on average to perform the whole process per WSI, of which the prediction step represents only 4.20 ± 1.73 s.
The execution times obtained and used for generating the plots presented in this subsection are detailed in Table A2 of Appendix A.

3.2. Performance Comparison for Different State-of-the-Art Models

After evaluating the PROMETEO architecture on different PCs using the benchmark designed for this work, the same network was compared to other widely-known architectures. For this purpose, the same computer (device M) was used in order to perform a fair comparison. The same benchmark used in the previous evaluation (see Section 3.1) was executed on computer M (see Table A1) for each of the CNN architectures mentioned in Section 2.2. The CNNs considered are PROMETEO [15], ResNet34 and ResNet101 [14], InceptionV3 [12], VGG16 and VGG19 [16], MobileNet [17], DenseNet121 [18] and Xception [19].
The average patch processing time per step can be seen in Figure 5 for each of these architectures. Since the choice of architecture has no effect on the first three steps (reading the patch from the WSI, scoring it in order to discard unwanted patches, and normalizing it), the times needed to perform them are similar across all the cases reported in the figure. This is not the case for the prediction time, which directly depends on the complexity of the network.
Figure 6 reports the combined processing time that device M takes to process a WSI on average, together with its corresponding standard deviation. The same effect explained in Section 3.1, where the WSI-level reading step takes much longer relative to the rest of the subprocesses than the patch-level reading step does, can also be observed in this figure. It is important to mention that the model proposed by the authors is the fastest in terms of prediction time, completing the whole process in 22.56 ± 5.67 s per WSI on average.
Table 2 presents a summary of the results obtained for each architecture, focusing on the prediction step, which is the only one affected by changing the CNN architecture. Moreover, the number of trainable parameters and the slowdown are also reported. The latter is calculated by dividing the average prediction time per WSI of the corresponding CNN by that obtained with PROMETEO, so the improvement in prediction time of PROMETEO over the rest of the architectures can be clearly seen. The proposed model predicts 2.55× faster than the CNN used in [13] and 11.68× faster than the one used in [11]. It is also important to mention that, in the latter, the authors used not a single InceptionV3 model but an ensemble of 30 of them; the figures and tables here report the execution times for a single network. When compared to the other widely-known architectures, PROMETEO is between 7.41× and 12.50× faster.
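Written out explicitly, and taking the ResNet34 row of Table 2 as a worked example:

$$ \mathrm{slowdown}(m) \;=\; \frac{\bar{t}_{\mathrm{pred}}^{\,\mathrm{WSI}}(m)}{\bar{t}_{\mathrm{pred}}^{\,\mathrm{WSI}}(\mathrm{PROMETEO})}, \qquad \mathrm{slowdown}(\mathrm{ResNet34}) \;=\; \frac{10.712\ \mathrm{s}}{4.201\ \mathrm{s}} \;\approx\; 2.55\times $$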
The execution times obtained and used for generating the plots presented in this subsection are detailed in Table A3 of Appendix B.

4. Discussion

In order to design a fast edge-computing platform for PCa detection, an evaluation of the proposed CNN was performed. This allowed us to compare different hardware components and configurations and to measure their impact when processing WSIs. Apart from the figures presented in Section 3.1, two specific cases are highlighted in Figure 7. Figure 7a shows the impact that the frequency of the CPU has on the whole process when using the same computer. As can be seen, all four processing steps clearly benefit from a faster CPU. On the other hand, Figure 7b compares two cases in which the same configuration is used, except for the GPU, which was removed in one of them. As expected, the GPU greatly accelerated the prediction time (by around three times in this case). Therefore, in order to build a low-cost edge-computing platform for PCa diagnosis, this analysis could be useful for deciding which component the funds should be invested in. As explained above, all patches from a WSI have to be read, but not all of them have to be predicted, since the majority correspond to background and are discarded first. Therefore, the CPU has a higher impact than the GPU on the whole process.
When comparing PROMETEO to other state-of-the-art CNN models, the former achieved the fastest prediction time, being between 2.55 and 12.50 times faster than the rest. Although the results in terms of accuracy and other commonly-used DL metrics cannot be compared directly, since the authors in [11,13,15] used different datasets, all of them reported state-of-the-art results for PCa detection. In [15], the authors compared PROMETEO to many of the models used in this work in terms of accuracy when using the same dataset for training and testing, showing that similar results were obtained.
The use of transfer learning in CNNs for medical image analysis has become commonplace, and most current research relies on this approach to avoid having to design, train and validate a custom CNN model from scratch for each specific task. Transfer learning has achieved state-of-the-art results in many different fields and is also faster than training a custom CNN from scratch [22]. However, this technique commonly relies on very deep CNNs, which, as shown in this work, leads to a higher computational cost when predicting an input image and, therefore, a slower processing time. Some specific tasks, such as DL-based PCa screening, could benefit from designing shallower custom CNN models from scratch, providing a faster response to pathologists in order to help them in this laborious process. With the increases in the number of cases and in the mortality produced by PCa, this factor could become even more relevant in the future.
As an alternative, cloud computing has provided powerful computational resources for big data processing and machine learning models [23]. Recent works have focused on accelerating CNN-based medical image processing tasks by using cloud solutions. While it is true that processing images using GPUs and tensor processing units (TPUs) in the cloud is faster than on any local edge-computing device, one aspect is not commonly taken into account when stating this: the time required to upload the image to the cloud. This depends on many factors and is not easy to predict. Moreover, when digitizing histological images, scanners store them on a local hard drive, each taking up around 1 GB. As an example, with an upload speed of 300 Mbps, it would take more than 27 s under ideal conditions just to upload the WSI to the cloud, which is more than the time it would take to fully process the image on a local platform.
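Worked out explicitly (treating 1 GB as 8 × 1024 megabits and ignoring protocol overhead):

$$ t_{\mathrm{upload}} \;=\; \frac{8 \times 1024\ \mathrm{Mb}}{300\ \mathrm{Mb/s}} \;\approx\; 27.3\ \mathrm{s} $$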
To design a fast, low-cost, edge-computing platform, both the hardware components and the CNN model design have to be taken into account. Optimizing these two aspects led to a very short WSI processing time compared to current DL-based solutions, without penalizing the performance of the system in terms of accuracy. In the near future, the authors would like to build a custom bare-bones platform based on the evaluations performed in this work and test it in some of the hospitals that collaborated in this project.

5. Conclusions

In this work, we have presented a comprehensive evaluation of the performance of PROMETEO, a previously-proposed DL-based CNN architecture for PCa detection in histopathological images, which achieved 99.98% accuracy, 99.98% F1 score and 0.999 AUC on a separate test set at patch level.
Our proposed model outperforms other widely-used state-of-the-art CNN architectures such as ResNet34, InceptionV3, VGG16, VGG19, MobileNet, DenseNet121, Xception and ResNet101 in terms of prediction time. PROMETEO takes 22.56 s to predict a WSI on average, including the preprocessing steps needed, using an Intel® Core™ i7-8700K (Intel, Santa Clara, CA, USA) and an NVIDIA® GeForce™ GTX 1080 Ti (NVIDIA, Santa Clara, CA, USA). If we focus only on the prediction time, PROMETEO is between 2.55 and 12.50 times faster than any of the other architectures considered.
The promising results obtained suggest that edge-computing platforms and custom CNN designs could play important roles in the future of AI-based medical image analysis, helping pathologists in their laborious tasks by providing faster responses.

Author Contributions

Conceptualization, L.D.-L. and J.P.D.-M.; methodology, L.D.-L.; software, L.D.-L. and J.P.D.-M.; validation, L.D.-L., J.P.D.-M., A.R.-N., D.G.-G. and A.J.-F.; formal analysis, L.D.-L. and J.P.D.-M.; investigation, L.D.-L. and J.P.D.-M.; resources, L.D.-L., J.P.D.-M., A.R.-N., D.G.-G., A.J.-F., S.V.-D. and A.L.-B.; data curation, L.D.-L. and J.P.D.-M.; writing—original draft preparation, L.D.-L. and J.P.D.-M.; writing—review and editing, L.D.-L., J.P.D.-M., A.R.-N., D.G.-G., A.J.-F., S.V.-D. and A.L.-B.; visualization, L.D.-L.; supervision, J.P.D.-M., S.V.-D. and A.L.-B.; project administration, S.V.-D. and A.L.-B.; funding acquisition, S.V.-D. and A.L.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Spanish grant (with support from the European Regional Development Fund) MIND-ROB (PID2019-105556GB-C33), the EU H2020 project CHIST-ERA SMALL (PCI2019-111841-2) and by the Andalusian Regional Project PAIDI2020 (with FEDER support) PROMETEO (AT17_5410_USE).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Gabriel Jimenez-Moreno and Luis Muñoz-Saavedra for executing the benchmark on their computers and reporting its performance to us. We would also like to thank Antonio Felix Conde-Martin and the Pathological Anatomy Unit of Virgen de Valme Hospital in Seville (Spain) for their support in the PROMETEO project, together with VITRO S.A., along with providing us with annotated WSIs from the same hospital. We would finally like to thank Puerta del Mar Hospital (Cádiz, Spain) and Clínic Barcelona Hospital (Barcelona, Spain) for providing us with diagnosed WSIs from different patients.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AI      Artificial Intelligence
AUC     Area Under Curve
CAD     Computer-Aided Diagnosis
CNN     Convolutional Neural Network
CPU     Central Processing Unit
DL      Deep Learning
DRE     Digital Rectal Examination
GB      Gigabyte
GGS     Gleason Grading System
GPU     Graphics Processing Unit
H&E     Hematoxylin and Eosin
PC      Personal Computer
PCa     Prostate Cancer
PSA     Prostate-Specific Antigen
RNN     Recurrent Neural Network
TPU     Tensor Processing Unit
WHO     World Health Organization
WSI     Whole-Slide Image

Appendix A. PROMETEO Evaluation

Table A1. Hardware specifications (CPU and GPU) of the different computers used in the PROMETEO evaluation.

Device | CPU | GPU
A | Intel® Core™ i7-8850U @ 1.80 GHz (4 cores, 8 threads) | —
B | Intel® Core™ i9-7900X @ 3.30 GHz (10 cores, 20 threads) | —
C | Intel® Core™ i7-6700HQ @ 1.20 GHz (4 cores, 8 threads) | —
D | Intel® Core™ i7-6700HQ @ 2.60 GHz (4 cores, 8 threads) | —
E | Intel® Core™ i5-6500 @ 3.20 GHz (4 cores, 4 threads) | —
F | Intel® Core™ i7-4770K @ 3.50 GHz (4 cores, 8 threads) | —
G | Intel® Core™ i7-8700K @ 3.70 GHz (6 cores, 12 threads) | —
H | Intel® Core™ i7-4970 @ 3.60 GHz (4 cores, 8 threads) | —
I | Intel® Core™ i9-7900X @ 3.30 GHz (10 cores, 20 threads) | NVIDIA® GeForce™ GTX 1080 Ti (11 GB GDDR5X)
J | AMD® Ryzen™ 9 3900X @ 4.20 GHz (12 cores, 24 threads) | NVIDIA® GeForce™ GTX 1080 Ti (11 GB GDDR5X)
K | Intel® Core™ i5-6500 @ 3.20 GHz (4 cores, 4 threads) | NVIDIA® GeForce™ GT 730 (2 GB GDDR5)
L | Intel® Core™ i7-4770K @ 3.50 GHz (4 cores, 8 threads) | NVIDIA® GeForce™ GTX 1080 Ti (11 GB GDDR5X)
M | Intel® Core™ i7-8700K @ 3.70 GHz (6 cores, 12 threads) | NVIDIA® GeForce™ GTX 1080 Ti (11 GB GDDR5X)
N | Intel® Core™ i7-4970 @ 3.60 GHz (4 cores, 8 threads) | NVIDIA® GeForce™ RTX 2060 (6 GB GDDR6)
Table A2. PROMETEO evaluation results. The average (Avg) and standard deviation (Std) of the execution times (in seconds) are shown for each of the four processes presented in Section 2.3 (Figure 1), both at patch level and at slide (WSI) level.

Patch level:
Device | Read Avg | Read Std | Score Avg | Score Std | Stain Norm. Avg | Stain Norm. Std | Prediction Avg | Prediction Std
A | 0.00120757 | 0.00311363 | 0.00585298 | 0.00502059 | 0.00512905 | 0.00418173 | 0.01730045 | 0.00850617
B | 0.00068973 | 0.00150109 | 0.00306787 | 0.0037015 | 0.00288733 | 0.00062438 | 0.01441587 | 0.00175146
C | 0.00182337 | 0.00503892 | 0.00807697 | 0.00628436 | 0.00729532 | 0.00634226 | 0.02249318 | 0.0100416
D | 0.00103901 | 0.00228084 | 0.00437608 | 0.00080403 | 0.00404258 | 0.00098854 | 0.01435914 | 0.00211298
E | 0.00082695 | 0.00167953 | 0.00421429 | 0.00068222 | 0.00400594 | 0.00090984 | 0.01223286 | 0.00216942
F | 0.00083281 | 0.00172759 | 0.00413688 | 0.00075105 | 0.0038354 | 0.00094666 | 0.01479934 | 0.00293383
G | 0.00062777 | 0.00133961 | 0.00292148 | 0.00038463 | 0.00270451 | 0.00067159 | 0.0102172 | 0.00150291
H | 0.00084291 | 0.00175864 | 0.00398322 | 0.00065638 | 0.0037566 | 0.00093803 | 0.03768491 | 0.00818776
I | 0.00069517 | 0.0015282 | 0.00299673 | 0.0003936 | 0.00285997 | 0.00066557 | 0.00345098 | 0.01023957
J | 0.00062976 | 0.00137508 | 0.00428461 | 0.00027188 | 0.00246943 | 0.00060632 | 0.00275394 | 0.00906872
K | 0.00084153 | 0.00173514 | 0.00417708 | 0.00059019 | 0.00397734 | 0.00091936 | 0.01973523 | 0.01713999
L | 0.00091354 | 0.00194007 | 0.00449805 | 0.00163909 | 0.00421455 | 0.00178876 | 0.0420623 | 0.01316229
M | 0.0006116 | 0.00129119 | 0.00286777 | 0.00039014 | 0.00265452 | 0.00068521 | 0.0030545 | 0.03559015
N | 0.0008502 | 0.00498445 | 0.00405446 | 0.0007176 | 0.00375849 | 0.00097299 | 0.04150535 | 0.01602984

WSI level:
Device | Read Avg | Read Std | Score Avg | Score Std | Stain Norm. Avg | Stain Norm. Std | Prediction Avg | Prediction Std
A | 25.8035576 | 4.33829335 | 7.68691321 | 2.50054828 | 6.74114185 | 2.1823579 | 22.1192591 | 6.90573801
B | 13.3892811 | 2.29629633 | 3.59321811 | 1.04956324 | 3.38756321 | 0.98944353 | 16.8213335 | 4.91918301
C | 45.2693313 | 8.99897452 | 12.1239614 | 3.77223369 | 10.8517522 | 3.36864663 | 31.9982855 | 9.74059672
D | 19.5256697 | 3.39723828 | 4.94245166 | 1.44889791 | 4.59097219 | 1.34596044 | 16.6959336 | 4.87815493
E | 16.0484505 | 2.76278471 | 4.94652829 | 1.4439207 | 4.70010431 | 1.37218657 | 14.3399619 | 4.1861481
F | 15.8376234 | 2.74964356 | 4.7714866 | 1.3976735 | 4.42655776 | 1.29581397 | 16.9997911 | 4.98414625
G | 12.1361434 | 2.0832663 | 3.42516114 | 1.00071198 | 3.17680098 | 0.92726679 | 11.9159923 | 3.48494303
H | 15.8986914 | 2.81638821 | 4.5458395 | 1.34199731 | 4.29495389 | 1.26743003 | 42.2879135 | 12.595224
I | 13.4663605 | 2.32710305 | 3.51854302 | 1.02719857 | 3.35297541 | 0.97965636 | 4.29145549 | 1.29811287
J | 12.3392324 | 2.12166155 | 5.039479 | 1.47022453 | 2.91945068 | 0.85082711 | 3.35472003 | 1.00963885
K | 16.462836 | 2.81694585 | 4.89897847 | 1.430121 | 4.67706547 | 1.36512005 | 23.4278429 | 6.84671918
L | 17.743488 | 3.08314193 | 5.29275112 | 1.55927849 | 4.97026951 | 1.46232287 | 5.16441015 | 1.56004368
M | 11.8833887 | 2.0393166 | 3.36354507 | 0.98195641 | 3.11514971 | 0.90982144 | 4.20128196 | 1.7395392
N | 16.6332854 | 2.85486699 | 4.75020676 | 1.39576506 | 4.41236681 | 1.29468824 | 9.19340531 | 4.4530511

Appendix B. Comparison between Different CNN Architectures

Table A3. Execution time comparison between different architectures. The average (Avg) and standard deviation (Std) of the execution times (in seconds) are shown for each of the four processes presented in Section 2.3 (Figure 1), both at patch level and at slide (WSI) level.

Patch level:
Architecture | Read Avg | Read Std | Score Avg | Score Std | Stain Norm. Avg | Stain Norm. Std | Prediction Avg | Prediction Std
PROMETEO | 0.000612 | 0.001291 | 0.002868 | 0.00039 | 0.002655 | 0.000685 | 0.003054 | 0.004845
ResNet34 | 0.000612 | 0.001279 | 0.002844 | 0.000405 | 0.002676 | 0.000686 | 0.008982 | 0.010086
InceptionV3 | 0.000621 | 0.001317 | 0.002915 | 0.00041 | 0.002691 | 0.001136 | 0.039772 | 0.013828
VGG16 | 0.000635 | 0.001341 | 0.002931 | 0.000427 | 0.002785 | 0.000691 | 0.028664 | 0.009241
VGG19 | 0.000628 | 0.001313 | 0.002931 | 0.000425 | 0.00278 | 0.000682 | 0.029931 | 0.009305
MobileNet | 0.000612 | 0.001278 | 0.00285 | 0.00042 | 0.002688 | 0.001111 | 0.025689 | 0.010986
DenseNet121 | 0.000611 | 0.001284 | 0.002879 | 0.000389 | 0.002687 | 0.000683 | 0.042489 | 0.01686
Xception | 0.0006 | 0.001261 | 0.00288 | 0.000374 | 0.002758 | 0.000655 | 0.03405 | 0.011789
ResNet101 | 0.000607 | 0.001265 | 0.002839 | 0.000398 | 0.002701 | 0.00067 | 0.043287 | 0.014679

WSI level:
Architecture | Read Avg | Read Std | Score Avg | Score Std | Stain Norm. Avg | Stain Norm. Std | Prediction Avg | Prediction Std
PROMETEO | 11.88339 | 2.039317 | 3.363545 | 0.981956 | 3.11515 | 0.909821 | 4.201282 | 1.739539
ResNet34 | 11.92095 | 2.045551 | 3.333316 | 0.973793 | 3.148634 | 0.918751 | 10.71205 | 3.134596
InceptionV3 | 12.10135 | 2.076446 | 3.415152 | 0.997178 | 3.168544 | 0.925316 | 46.72138 | 13.64997
VGG16 | 12.37371 | 2.140448 | 3.45475 | 1.007557 | 3.280768 | 0.957531 | 34.92197 | 10.16074
VGG19 | 12.34846 | 2.116793 | 3.44631 | 1.005834 | 3.266729 | 0.953449 | 36.25006 | 10.5361
MobileNet | 11.96025 | 2.044115 | 3.383497 | 0.986745 | 3.208441 | 0.936362 | 31.11017 | 9.030854
DenseNet121 | 11.82392 | 2.035413 | 3.373127 | 0.985149 | 3.148681 | 0.919557 | 51.48291 | 14.94588
Xception | 11.68313 | 1.987459 | 3.375536 | 0.986863 | 3.235289 | 0.945187 | 41.76486 | 12.17527
ResNet101 | 11.84637 | 2.035472 | 3.327598 | 0.971357 | 3.171104 | 0.925785 | 52.51713 | 15.26661

References

  1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
  2. Rawla, P. Epidemiology of prostate cancer. World J. Oncol. 2019, 10, 63.
  3. Borley, N.; Feneley, M.R. Prostate cancer: Diagnosis and staging. Asian J. Androl. 2009, 11, 74.
  4. Hamet, P.; Tremblay, J. Artificial intelligence in medicine. Metabolism 2017, 69, S36–S40.
  5. Ahuja, A.S. The impact of artificial intelligence in medicine on the future role of the physician. PeerJ 2019, 7, e7702.
  6. Shen, D.; Wu, G.; Suk, H.I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
  7. Shi, J.; Zheng, X.; Li, Y.; Zhang, Q.; Ying, S. Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease. IEEE J. Biomed. Health Inform. 2017, 22, 173–183.
  8. Dou, Q.; Chen, H.; Yu, L.; Zhao, L.; Qin, J.; Wang, D.; Mok, V.C.; Shi, L.; Heng, P.A. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Trans. Med. Imaging 2016, 35, 1182–1195.
  9. Duran-Lopez, L.; Dominguez-Morales, J.P.; Corral-Jaime, J.; Vicente-Diaz, S.; Linares-Barranco, A. COVID-XNet: A custom deep learning system to diagnose and locate COVID-19 in chest X-ray images. Appl. Sci. 2020, 10, 5683.
  10. Bahrami, K.; Shi, F.; Rekik, I.; Shen, D. Convolutional neural network for reconstruction of 7T-like images from 3T MRI using appearance and anatomical features. In Deep Learning and Data Labeling for Medical Applications; Springer: Berlin/Heidelberg, Germany, 2016; pp. 39–47.
  11. Ström, P.; Kartasalo, K.; Olsson, H.; Solorzano, L.; Delahunt, B.; Berney, D.M.; Bostwick, D.G.; Evans, A.J.; Grignon, D.J.; Humphrey, P.A.; et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: A population-based, diagnostic study. Lancet Oncol. 2020, 21, 222–232.
  12. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  13. Campanella, G.; Hanna, M.G.; Geneslaw, L.; Miraflor, A.; Silva, V.W.K.; Busam, K.J.; Brogi, E.; Reuter, V.E.; Klimstra, D.S.; Fuchs, T.J. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 2019, 25, 1301–1309.
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  15. Duran-Lopez, L.; Dominguez-Morales, J.P.; Conde-Martin, A.F.; Vicente-Diaz, S.; Linares-Barranco, A. PROMETEO: A CNN-Based Computer-Aided Diagnosis System for WSI Prostate Cancer Detection. IEEE Access 2020, 8, 128613–128628.
  16. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  17. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  18. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  19. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
  20. Reinhard, E.; Adhikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 2001, 21, 34–41.
  21. Magee, D.; Treanor, D.; Crellin, D.; Shires, M.; Smith, K.; Mohee, K.; Quirke, P. Colour normalisation in digital histopathology images. In Proceedings of the Optical Tissue Image Analysis in Microscopy, Histopathology and Endoscopy (MICCAI Workshop), London, UK, 24 September 2009; pp. 100–111.
  22. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. arXiv 2019, arXiv:1911.02685.
  23. Zhang, Q.; Bai, C.; Chen, Z.; Li, P.; Yu, H.; Wang, S.; Gao, H. Deep learning models for diagnosing spleen and stomach diseases in smart Chinese medicine with cloud computing. Concurr. Comput. Pract. Exp. 2019, e5252.
Figure 1. Block diagram detailing each of the steps considered for processing a whole-slide image (WSI) in the proposed benchmark.
Figure 2. PROMETEO average patch processing time (in seconds) per step for each of the hardware configurations detailed in Table A1.
Figure 3. PROMETEO average WSI processing time (in seconds) and standard deviation per step for each of the hardware configurations detailed in Table A1.
Figure 4. PROMETEO average WSI processing time (in seconds) and standard deviation of the hardware configurations detailed in Table A1.
Figure 5. Average patch processing time (in seconds) per step for each of the CNN architectures using computer M (see Table A1).
Figure 6. Average WSI processing time (in seconds) and standard deviation for each of the CNN architectures using computer M (see Table A1).
Figure 7. Impacts of the CPU and the GPU on the different WSI processing steps. (a) Same PC, different CPU frequency. Left: 1.2 GHz; right: 2.6 GHz. (b) Same PC. Left: without using GPU; right: using GPU.
Table 1. Dataset summary.

Hospital | Normal WSIs | Malignant WSIs | Total
Virgen de Valme Hospital | 27 | 70 | 97
Clínic Hospital | 100 | 129 | 229
Puerta del Mar Hospital | 79 | 65 | 144
Table 2. Average patch and WSI prediction time, slowdown and number of trainable parameters for each of the CNN architectures considered in this work.

Model | Avg. Prediction Time (Patch) | Avg. Prediction Time (WSI) | Slowdown * | Trainable Parameters
PROMETEO | 3.054 ± 4.845 ms | 4.201 ± 1.739 s | — | 1,107,010
ResNet34 | 8.982 ± 10.086 ms | 10.712 ± 3.134 s | 2.55× | 21,800,107
InceptionV3 | 41.301 ± 44.282 ms | 49.076 ± 14.353 s | 11.68× | 23,851,784
VGG16 | 28.664 ± 9.241 ms | 34.921 ± 10.160 s | 8.31× | 138,357,544
VGG19 | 29.931 ± 9.305 ms | 36.250 ± 10.536 s | 8.63× | 143,667,240
MobileNet | 25.689 ± 10.986 ms | 31.110 ± 9.030 s | 7.41× | 4,253,864
DenseNet121 | 42.489 ± 16.859 ms | 51.483 ± 14.945 s | 12.25× | 8,062,504
Xception | 34.050 ± 11.789 ms | 41.764 ± 12.175 s | 9.94× | 22,910,480
ResNet101 | 43.287 ± 14.679 ms | 52.517 ± 15.266 s | 12.50× | 44,707,176
* Calculated using the average prediction time per WSI and taking the PROMETEO architecture as reference. A slowdown of A× means that the corresponding model is A times slower than PROMETEO.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
