Convolutional Neural Network in the Evaluation of Myocardial Ischemia from CZT SPECT Myocardial Perfusion Imaging: Comparison to Automated Quantification

This study analyzes CZT SPECT myocardial perfusion images collected at Chang Gung Memorial Hospital, Kaohsiung Medical Center, in Kaohsiung. It focuses on classifying myocardial perfusion images for coronary heart disease with convolutional neural network techniques. In these grayscale images, the distribution of blood flow in the heart contains the most important features. Therefore, a data-driven preprocessing procedure is developed to extract the area of interest. After the surrounding noise is removed, a three-dimensional convolutional neural network model classifies whether the patient has coronary heart disease. The prediction accuracy, sensitivity, and specificity are 87.64%, 81.58%, and 92.16%, respectively. The prototype system is expected to greatly reduce the time physicians need to interpret images and write reports, and it can assist clinical experts in diagnosing coronary heart disease accurately in practice.


Introduction
Heart disease is the leading cause of death globally, and coronary artery disease (CAD) is the most common type of heart disease. Over 20,000 people die from cardiovascular diseases annually. The mortality rate due to CAD is higher than that of other kinds of cardiovascular disease. CAD occurs when the blood vessels supplying the heart muscle become hardened and narrowed. Reducing the prevalence, morbidity, and mortality related to CAD is an important public health issue, given the significant disease burden and its contribution to total health care costs. Early detection of CAD may improve life expectancy and quality of life by preventing myocardial infarction (MI) and heart failure.
Noninvasive testing methods are used as first-line diagnostic and prognostic tools for patients presenting with chest pain or other symptoms of CAD to improve risk stratification of patients for CAD and to guide subsequent tests and interventions. Noninvasive diagnostics include functional tests such as exercise electrocardiography (ECG) [1], stress echocardiography [2], radionuclide myocardial perfusion imaging (MPI) using single photon emission computed tomography (SPECT), or positron emission tomography (PET) [3,4]. There are also noninvasive anatomic tests, including coronary CT angiography (CCTA) [5], coronary magnetic resonance angiography (MRA) [6], and coronary artery calcium scoring (CACS) [7].
Appl. Sci. 2021, 11, 514
SPECT MPI is a widely used technique for the risk stratification and diagnosis of CAD. There are software packages available for quantification of relative perfusion data using normal databases to define the summed stress score (SSS) and total perfusion deficit (TPD), such as 4DMSPECT (4DM) [8], Emory Cardiac Toolbox (ECTb), and Cedars-Sinai Quantitative Perfusion SPECT (QPS) [9]. Automated quantification has provided similar overall diagnostic accuracy when compared to visual interpretation by expert observers for the detection of obstructive CAD.
Recently developed MPI SPECT using a Cadmium Zinc Telluride (CZT) camera has proven highly efficient by reducing isotope dose and imaging time [10]. It maintains image quality and diagnostic accuracy equal to those of MPI SPECT using a conventional gamma camera. The Convolutional Neural Network (CNN) is a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures. Owing to the rapid application of deep learning in medicine [11], nuclear cardiology has recently started to adopt this technique as well. The deep learning approach learns an appropriate combination of features of abnormalities from a large number of annotated images, and it differs from the statistical approaches in software packages, which use regional count distributions based on means and deviations. According to our literature review, there is only one published study addressing the application of CNN to the prediction of obstructive CAD, based on the classification of Technetium (Tc)-99m MPI images obtained from CZT cameras [12]. The purpose of this study is to propose a CNN-based model that uses Thallium (Tl)-201 MPI SPECT images to predict myocardial ischemia and to compare the diagnostic accuracy of this model against automated software package grading. This research has obtained IRB approval from Chang Gung Medical Foundation, IRB No. 201801122B0C501.

Image Preprocessing
Each subject has fifty cross-sectional heart images, and each image has 70 × 70 pixels. In these images, the myocardium supplied by the coronary arteries appears as a ring-shaped region that opens toward the lower left. In clinical diagnosis, physicians determine whether the subject has a myocardial defect from how saturated the brightness of this myocardial ring is. A subject tends to be healthy if the brightness of the ring is saturated, as shown in Figure 1. On the other hand, an apparent dark part in the ring indicates that the subject suffers from coronary heart disease, as shown in Figure 2. This set of data requires a series of image preprocessing steps to extract the myocardial ring and remove the surrounding noise. First, the surrounding non-myocardial parts are removed by an initial cropping. Second, the images near the center of the cross-sectional sequence are selected by the image selection method. Finally, the myocardium is aligned to the center of the image by a second cropping. The overall process is shown in Figure 3.
The cross-sectional images also contain non-myocardial parts from the organs around the coronary arteries [13]. Any muscle that has absorbed the tracer shows up in the image. Because both image selection and cropping rely on brightness, the brightness of these organs causes interference. As the initial image cropping, we retain specific rows and columns of each image. Each image has 41 × 36 pixels after cropping, as shown in Figure 4. This step removes most of the non-myocardial muscle, which is not relevant to the diagnosis.
The myocardium covers more area in images close to the center of the sequence. Disease symptoms are usually more obvious there, which means the image features of healthy and unhealthy patients differ most near the sequence center. Because the heart position is not aligned across subjects, the heart sequence center does not necessarily appear in the middle of the 50 images of a sequence.
The following two methods are used to find the sequence center image for each subject. The first method is based on the brightness sum. In the first step, for each image, all pixel values of the two images before and after it are added up, corresponding to Equation (1), where $(\mathrm{Image})_{ijk}$ denotes the pixel value in the $i$-th row and $j$-th column of the $k$-th image of a subject. This step avoids the situation in which the selected image is very bright but the two images before and after it are very dark, which is not consistent with the characteristics of the heart sequence center image. In the second step, the image with the maximum brightness sum is found. This image is regarded as the sequence center $m_1$ obtained by the first method, which corresponds to Equation (2).
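The brightness sum method can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `brightness_sum_center` and the `half_window` parameter are hypothetical, and the exact neighborhood used in Equation (1) is not recoverable from the text, so the window here simply covers each image plus `half_window` neighbors on each side.

```python
import numpy as np

def brightness_sum_center(volume, half_window=1):
    """Return the index m1 of the image with the largest windowed brightness sum.

    volume: array of shape (K, H, W), the K cross-sectional images of one subject.
    half_window: number of neighboring images on each side that are included
                 in the sum (an assumption; the paper's exact window is unknown).
    """
    n_images = volume.shape[0]
    # Total brightness of every single image.
    per_image = volume.reshape(n_images, -1).sum(axis=1)
    # Add the brightness of the neighbors so that an isolated bright image
    # surrounded by dark images does not win.
    windowed = np.array([
        per_image[max(0, k - half_window): k + half_window + 1].sum()
        for k in range(n_images)
    ])
    return int(np.argmax(windowed))
```

With `half_window=0` the function reduces to picking the single brightest image, which is exactly the failure case that the neighbor sum is meant to avoid.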
The second method is based on a brightness threshold. When the initial image cropping cannot completely remove the non-heart organs from the image, these organs appear particularly bright and account for a large proportion of the total image brightness. In this situation, the first method is more likely to select the wrong sequence center. These non-heart organs are usually located at the edges of the image, and even if they are very bright, they occupy only a small number of pixels. Because the second method depends on the number of pixels rather than on total brightness, it can still select the correct heart sequence center image. In the opposite situation, where the myocardial area that absorbs the tracer well is small, the "brightness sum" method can choose the right heart sequence center image.
In the first step, every pixel is divided by the average pixel value of the images that belong to the subject. In the next step, the pixels that exceed the threshold value are identified, and the rows and columns that contain them are counted. Suppose there are two matrices, $C$ and $R$, where $C_{jk}$ is the value stored for the $j$-th column and $R_{ik}$ is the value stored for the $i$-th row of the $k$-th image, as used in Equations (3) and (4). $C_{jk}$ or $R_{ik}$ is set to one when at least one pixel in that column or row passes the threshold value. The threshold value is set to 3 by tuning in this paper. In the final step, the number of rows and the number of columns that pass the threshold are multiplied to obtain a value named the "threshold area" for each image. The image with the maximum threshold area is treated as the heart sequence center image $m_2$ by this method, as in Equation (5). The two methods thus yield two heart sequence center images, numbered $m_1$ and $m_2$. The average of $m_1$ and $m_2$ is taken as the final central image of the sequence. The 15 images nearest the central image are selected and fed into the final model. The Results section verifies that 15 images is the best choice.
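A minimal sketch of the threshold area method and the final selection step is given below. It rests on several assumptions that the text does not settle: "passing" the threshold is taken to mean strictly greater than, the average of $m_1$ and $m_2$ is rounded down, and the 15-image window is clamped to the ends of the sequence. The function names are hypothetical.

```python
import numpy as np

def threshold_area_center(volume, threshold=3.0):
    """Return the index m2 of the image with the largest threshold area.

    volume: array of shape (K, H, W) for one subject.
    """
    # Divide every pixel by the subject's mean pixel value.
    normalized = volume / volume.mean()
    above = normalized > threshold                # shape (K, H, W)
    # R_ik = 1 if row i of image k has at least one pixel above the threshold.
    rows_hit = above.any(axis=2).sum(axis=1)      # rows flagged per image
    # C_jk = 1 if column j of image k has at least one pixel above the threshold.
    cols_hit = above.any(axis=1).sum(axis=1)      # columns flagged per image
    area = rows_hit * cols_hit                    # "threshold area" per image
    return int(np.argmax(area))

def select_around_center(volume, m1, m2, n_images=15):
    """Select n_images consecutive images around the averaged center."""
    center = (m1 + m2) // 2                       # rounding down is an assumption
    start = center - n_images // 2
    # Clamp so the window stays inside the sequence (an assumption).
    start = max(0, min(start, volume.shape[0] - n_images))
    return volume[start:start + n_images]
```

Because the area counts flagged rows and columns rather than summing intensities, a single very bright pixel at the image edge contributes an area of only 1, which matches the motivation given for this second method.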
The myocardium is not in the middle of most images, even after initial image cropping and image selection, because the subjects' heart positions differ slightly. These images therefore need to be cropped again. The cropping is performed separately for rows and columns. At this point every image has 41 × 36 pixels. In the first step, two vectors $A$ and $B$ are defined in Equations (6) and (7). $A_p$ is the sum of the pixel brightness over a number of consecutive rows starting from the $p$-th row, and it serves as the representative brightness sum of the $p$-th row. $B_q$ is the corresponding sum over a number of consecutive columns starting from the $q$-th column. The sums span 19 rows or 16 columns, values chosen by tuning in this paper; by observation, this many rows covers the entire myocardium. The purpose is to find the starting point of the crop. In Equations (8) and (9), the row with the maximum value in $A$ is chosen as the first cropping row, and the column with the maximum value in $B$ is chosen as the first cropping column:

$$\text{First cropping row: } y = \arg\max_{p} A_p \quad (8)$$

$$\text{First cropping column: } x = \arg\max_{q} B_q \quad (9)$$

The first cropping row and column form the top-left corner of the crop. In the second step, a number of rows and columns, determined by experiment, are selected forward from the first cropping row and column. As shown in Figure 5, the myocardium appears in the middle of the fully cropped image. Finally, pixel normalization is applied to every image, which reduces the chance of the neural network model getting stuck during training.
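The second cropping step could look roughly like the sketch below. Several details are assumptions rather than facts from the paper: images are taken to be stored as 41 rows by 36 columns, each row sum spans the full image width (and each column sum the full height), only window positions that fit inside the image are considered, and the crop origin is clamped so the output keeps the requested size. The name `second_crop` and its parameters are hypothetical; the default output size of 20 × 16 is the optimum reported in Table 3.

```python
import numpy as np

def second_crop(image, win_rows=19, win_cols=16, out_rows=20, out_cols=16):
    """Crop a (H, W) image so that the brightest block lands in the middle."""
    h, w = image.shape
    row_sums = image.sum(axis=1)   # brightness of each row
    col_sums = image.sum(axis=0)   # brightness of each column
    # A[p]: brightness of win_rows consecutive rows starting at row p.
    A = np.array([row_sums[p:p + win_rows].sum()
                  for p in range(h - win_rows + 1)])
    # B[q]: brightness of win_cols consecutive columns starting at column q.
    B = np.array([col_sums[q:q + win_cols].sum()
                  for q in range(w - win_cols + 1)])
    y = int(np.argmax(A))          # first cropping row, Equation (8)
    x = int(np.argmax(B))          # first cropping column, Equation (9)
    # Keep the crop inside the image (an assumption).
    y = min(y, h - out_rows)
    x = min(x, w - out_cols)
    return image[y:y + out_rows, x:x + out_cols]
```

Note that `np.argmax` returns the first maximum when several window positions tie, so a bright block smaller than the window is not guaranteed to be perfectly centered.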

Model Prediction
There are 979 subjects in this study, of whom 601 are healthy and 378 are unhealthy. The subjects are separated into two parts. First, 550 healthy and 340 unhealthy subjects are randomly chosen as the training and validation set. Second, the other 51 healthy and 38 unhealthy subjects are assigned to the testing set. In the experiment stage, five-fold cross-validation is used for hyperparameter adjustment [14]. The best parameters, which include the image preprocessing settings and the hyperparameters of the 3D convolutional neural network [15], are chosen by the average of the five accuracies. With the best setting selected, the model trained on the training and validation sets is used to predict the test set. Current deep learning technology for image classification usually uses convolutional neural network models. These models convolve over the image to capture its similarities. They need relatively few parameters to train, which reduces training time and alleviates memory shortages. Thus, this study uses convolutional neural network models.
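The split described above can be sketched as follows. The label coding (0 for healthy, 1 for unhealthy), the random seeds, and the function names are assumptions for illustration; the paper does not state whether the five folds are stratified, so this sketch uses a plain shuffled split.

```python
import numpy as np

def split_subjects(labels, n_test_healthy=51, n_test_unhealthy=38, seed=0):
    """Split subject indices into a development set and a held-out test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    healthy = rng.permutation(np.flatnonzero(labels == 0))
    unhealthy = rng.permutation(np.flatnonzero(labels == 1))
    test = np.concatenate([healthy[:n_test_healthy],
                           unhealthy[:n_test_unhealthy]])
    dev = np.concatenate([healthy[n_test_healthy:],
                          unhealthy[n_test_unhealthy:]])
    return dev, test

def five_fold(dev_indices, n_folds=5, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(dev_indices), n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]
```

With 601 healthy and 378 unhealthy subjects, this gives 890 development subjects (712 for training and 178 for validation in each fold) and 89 test subjects.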

Results
The experiments tune several options and parameters. Each experiment adjusts one factor while the other factors are held fixed at their default values. The factors to be adjusted are the selected range in image selection, the crop size in image cropping 2, filter size, pooling size, activation function [16], optimizer, and dropout rate [17]. All of the following experiments use five-fold cross-validation. The optimal result of each experiment is used as the default value in the next experiment.

Optimal Hyper Parameters for Image Preprocessing and Model
The purpose of the first experiment is to find the optimal selected range in image selection, by changing the number of images selected after the center image is found. The optimal number of selected images, shown in Table 1, is fifteen. This result means that selecting fewer images loses information about the disease and lowers prediction accuracy, whereas selecting more images adds noise that does not help the prediction. The purpose of the second experiment is to determine whether both image selection methods are necessary. The optimal method of image selection, shown in Table 2, is the comprehensive judgment that combines the two methods. The purpose of the third experiment is to find the optimal selection size in image cropping 2. The optimal selection size, shown in Table 3, is 20 × 16.

Optimal Model Dimension
After the optimal parameters of image preprocessing are found, the purpose of this experiment is to determine whether a 2D model or a 3D model is better. For the 2D model, the third dimension of the filter size and pooling size was removed. The 3D model performs better than the 2D model, as shown in Table 4. This result means that the disease cannot be judged from individual images in isolation; the diagnosis needs to be determined from a series of images.

Optimal Model Hyper Parameters
The purpose of this experiment is to find the optimal filter size and pooling size in the model. The optimal pooling size, shown in Table 5, is 2 × 2 × 1.
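To illustrate what a 2 × 2 × 1 pooling size does to the data, the NumPy sketch below applies non-overlapping max pooling to a single-channel volume. The axis order (height, width, slice) is an assumption, as is the use of max pooling rather than another pooling type; the function name `max_pool_3d` is hypothetical.

```python
import numpy as np

def max_pool_3d(volume, pool=(2, 2, 1)):
    """Non-overlapping max pooling over a (height, width, slices) volume."""
    h, w, d = volume.shape
    ph, pw, pd = pool
    h2, w2, d2 = h // ph, w // pw, d // pd
    # Drop any remainder that does not fill a complete pooling block.
    trimmed = volume[:h2 * ph, :w2 * pw, :d2 * pd]
    # Split each axis into (blocks, block_size) and take the max per block.
    blocks = trimmed.reshape(h2, ph, w2, pw, d2, pd)
    return blocks.max(axis=(1, 3, 5))
```

Under these assumptions, a 20 × 16 × 15 input becomes 10 × 8 × 15: the in-plane resolution is halved while all 15 slices are kept, so information along the image sequence is not pooled away.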

The purpose of this experiment is to find the optimal activation function, optimizer, and dropout rate in the 3D convolutional neural network model. The results are shown in Tables 6 and 7. The optimal activation function is "Relu" and the optimal optimizer is "SGD".

Testing Set Prediction
After the optimal hyperparameters are found, the final model is built, and the weights of the prediction model are determined by training on the training and validation sets. The 3D convolutional neural network model structure is shown in Figure 6. The iteration process curve is shown in Figure 7. Since we use five-fold cross-validation to check model robustness, five execution results are obtained from the five iterations. The loss and accuracy curves for the validation sets in the five iterations are plotted in Figure 7 as {val_loss1, val_loss2, val_loss3, val_loss4, val_loss5} and {val_acc1, val_acc2, val_acc3, val_acc4, val_acc5}. The mean curves of the five iterations for the validation and training sets are also plotted in Figure 7 as {val_loss_mean, train_loss_mean} and {val_acc_mean, train_acc_mean}. The training and validation losses decline steadily, and the training and validation accuracies rise steadily. Training stops before overfitting. Finally, we evaluate the performance of the resulting prediction model on the specified test set. The test set is the same for all runs. The prediction results on the testing set for the different image preprocessing steps are shown in Table 8. The confusion matrix with the three preprocessing steps combined is shown in Table 9. Four normal cases are misclassified as abnormal, and seven abnormal cases are misclassified as normal.
The testing set is processed by image cropping 1, image selection, and image cropping 2, and the prediction accuracy rises gradually. The result of combining all three steps is better than the others, which demonstrates the importance of image preprocessing. The accuracy, sensitivity, and specificity are 0.8764, 0.8158, and 0.9216, respectively. The lower sensitivity means that some unhealthy subjects are not detected, which needs to be addressed in future work.
Table 9. Confusion matrix with the three preprocessing steps combined.

                      Actual normal    Actual abnormal
Predict normal             47                 7
Predict abnormal            4                31
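The reported accuracy, sensitivity, and specificity follow directly from the counts in Table 9. The short check below recomputes them, treating "abnormal" as the positive class; the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tn, fp, fn, tp):
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # abnormal cases correctly detected
    specificity = tn / (tn + fp)   # normal cases correctly cleared
    return accuracy, sensitivity, specificity

# Counts from Table 9: 47 normal predicted normal, 4 normal predicted abnormal,
# 7 abnormal predicted normal, 31 abnormal predicted abnormal.
acc, sens, spec = diagnostic_metrics(tn=47, fp=4, fn=7, tp=31)
print(round(acc, 4), round(sens, 4), round(spec, 4))  # 0.8764 0.8158 0.9216
```

The 7 missed abnormal cases out of 38 are what pull the sensitivity below the specificity.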
The way the model judges myocardial defects is visualized with Grad-CAM heat maps. Starting from the convolutional neural network model architecture, the gradients are propagated back to the filter outputs of the convolutional layer through the backpropagation operation. A heat map matrix is obtained by summing the filter outputs with these weights, and the heat map is overlaid on the original image for visualization. The 15 Grad-CAM maps of each subject are presented separately. For normal images, the model focuses on continuous areas, as shown in Figure 8. For abnormal images, the model focuses on both sides of the myocardial defect, as shown in Figure 9.
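The weighting step of Grad-CAM can be sketched in NumPy as below. This sketch assumes the standard Grad-CAM formulation and that the activations of the chosen convolutional layer and the gradients of the class score with respect to them have already been extracted from the trained network by some framework-specific means, which is not shown. The array layout (channels, slices, height, width) and the function name `grad_cam` are assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine feature maps into a heat map using gradient-derived weights.

    activations, gradients: arrays of shape (channels, slices, height, width).
    Returns a heat map of shape (slices, height, width) scaled to [0, 1].
    """
    # One weight per channel: the gradient averaged over all positions.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the spatial resolution of the convolutional layer, so it would still need to be resized to the input image size before being overlaid on the original images.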

Discussion and Conclusions
According to the guidelines of the American Heart Association (AHA), myocardial perfusion imaging (MPI) is an important reference for determining the necessity of performing invasive cardiac catheterization. With the breakthroughs in technology and hardware in recent years, artificial intelligence has gradually been applied to assist in the interpretation of various medical images. However, the application of machine learning in the field of nuclear medicine imaging has not yet received enough attention. This research project aims to apply machine learning to construct an AI diagnosis aid system for nuclear medicine cardiac perfusion imaging that helps to enhance the efficiency of physicians' clinical diagnosis [18]. The automatic MPI diagnosis aid system in this project plans to use the large amount of MPI data gathered from 2007 to 2016 as input for training the model. Both supervised and unsupervised learning techniques would be applied in model building. MPI data from 2017 are used as the criterion for evaluating the efficacy of the model. When a new MPI image is input, the model will classify the image into one of three categories: normal, abnormal, or unpredictable. If the MPI data are classified as unpredictable, the system would warn the physicians, who would then make the diagnosis themselves. Otherwise, the physicians only need to double-check the MPI data that have been classified as normal or abnormal.
In this study, we collected the dataset accumulated at Kaohsiung Chang Gung Memorial Hospital. All images are original images without artificial modification, so there is no distortion due to human factors. The proposed classification method achieved a prediction accuracy, sensitivity, and specificity of 87.64%, 81.58%, and 92.16%, respectively. These results can be implemented in computer-aided systems for heart diseases in future studies. The system will provide useful reference information for physicians in clinical practice, and physicians will continue to feed their interpretations back to refine the prediction model, improving the accuracy of the examination report and extending the approach to other image prediction applications in the future. The system is expected to be used in other hospitals of the Chang Gung Medical Foundation, and there are plans to cooperate with other hospitals to expand the model sample and strengthen its predictive ability. It is expected to become a leader in nuclear medicine research.