A Shape Approximation for Medical Imaging Data

This study proposes a shape approximation approach to portray the regions of interest (ROI) from medical imaging data. An effective algorithm to achieve an optimal approximation is proposed based on the framework of Particle Swarm Optimization. The convergence of the proposed algorithm is derived under mild assumptions on the selected family of shape equations. The issue of detecting Parkinson’s disease (PD) based on the Tc-99m TRODAT-1 brain SPECT/CT images of 634 subjects, with 305 female and an average age of 68.3 years old from Kaohsiung Chang Gung Memorial Hospital, Taiwan, is employed to demonstrate the proposed procedure by fitting optimal ellipse and cashew-shaped equations in the 2D and 3D spaces, respectively. According to the visual interpretation of 3 experienced board-certified nuclear medicine physicians, 256 subjects are determined to be abnormal, 77 subjects are potentially abnormal, 174 are normal, and 127 are nearly normal. The coefficients of the ellipse and cashew-shaped equations, together with some well-known features of PD existing in the literature, are employed to learn PD classifiers under various machine learning approaches. A repeated hold-out with 100 rounds of 5-fold cross-validation and stratified sampling scheme is adopted to investigate the classification performances of different machine learning methods and different sets of features. The empirical results reveal that our method obtains 0.88 ± 0.04 classification accuracy, 0.87 ± 0.06 sensitivity, and 0.88 ± 0.08 specificity for test data when including the coefficients of the ellipse and cashew-shaped equations. Our findings indicate that more constructive and useful features can be extracted from proper mathematical representations of the 2D and 3D shapes for a specific ROI in medical imaging data, which shows their potential for improving the accuracy of automated PD identification.


Introduction
In the medical field, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and single-photon emission computed tomography (SPECT) have been used for preventive medicine, screening for disease, and confirming a diagnosis. These techniques are capable of providing the three-dimensional (3D) structure of a region of interest (ROI) by a series of two-dimensional (2D) images sequentially ordered in time.
In clinical practice, doctors use their domain knowledge to select a subset of the 2D images from the series. The selected images are superimposed into a single 2D image, which is called a 2D-combined image hereafter, for disease diagnosis. For example, Prashanth et al. [1], Taylor et al. [2], Oliveira et al. [3], Nicastro et al. [4], and Iwabuchi et al. [5] constructed classifiers for identifying Parkinson's disease (PD) by extracting features from 2D-combined SPECT images. However, this approach may lose some useful information or features for classification since a 2D-combined image is a projection of the original 3D structure on a specific 2D space. Recently, some methodologies have been proposed to draw information from medical 3D images. For instance, Ziegler et al. [6] used multispectral structural magnetic resonance imaging (MRI) tools to measure substantia nigra volume loss before basal forebrain degeneration in early PD. Palumbo et al. [7] proposed a volumetric 3D region of interest (ROI) of putamen and caudate nucleus by BasGan software. Prashanth et al. [8] proposed to draw 'indirect 3D features' by building a regression surface of an ROI based on the 2D-combined image. Hsu et al. [9] found that the classification performance for identifying PD is improved by calculating the volumes of an ROI in 3D space. Cheng et al. [10] adopted the 3D features such as volume, surface area, and diameter to describe the geometric characteristics of the volumes of interest to assist the diagnosis of idiopathic PD. Xu et al. [11] assessed the longitudinal volume change of different hippocampal subfields in PD patients with and without cognitive decline using 3D T1-weighted MRI.
Most of the aforementioned studies of drawing 3D features focused on computing the volumes of the ROI and applying the quantities derived from the volumes of the ROI to PD identification [6,7,[9][10][11]. Although the geometric features such as volume and the volume-related quantities are valuable proxies representing the 3D shapes of the ROI, they still cannot fully capture the desired shapes since different shapes of the ROI could have the same volumes or volume-related quantities. This fact could restrict the classification performances of these volume-based methods. In addition, the 3D shapes of the ROI did affect nuclear medicine physicians' judgment in practice. In visual interpretation, the normal striatal shape looks cashew-shaped in 3D space. The common pattern of abnormalities in PD is usually a decrease uptake in the putamen (comma-shaped) initially and progressing anteriorly to caudate nucleus over time. The striatal shape becomes oval-shaped gradually with disease progression. Consequently, if there is a way to capture the 3D shapes of the ROI for both normal and abnormal subjects more specifically and directly than just computing the volumes of the ROI, we might have the chance to obtain more useful shape features for classification. To fill this gap and unlike the indirect approach of [8] to obtain '3D features', we propose to directly characterize the 3D shape of an ROI by a suitable mathematical representation. The coefficients of the fitted equations are then adopted to learn classifiers to reflect the influence of the 3D shapes of the ROI. To the best of our knowledge, this is the first work using a family of mathematical equations to portray the 3D shapes of the ROI for normal and abnormal subjects simultaneously.
Throughout this paper, we adopt the issue of identifying PD to illustrate our approach. Parkinsonian syndromes (PS) are a group of movement disorders characterized by tremor, bradykinesia, and rigidity. PD is the most common neurodegenerative movement disorder and accounts for 75% of all PS. PD with early death of dopaminergic neurons in the substantia nigra pars compacta and dopamine deficiency within the basal ganglia leads to classical parkinsonian motor symptoms [12,13]. It is important to differentiate between PS caused by nigrostriatal dopaminergic degeneration and nondegenerative causes of parkinsonism such as essential tremor, vascular parkinsonism, psychogenic parkinsonism, or drug-induced parkinsonism. Dopamine transporter (DAT) imaging can detect presynaptic dopamine neuronal dysfunction and is useful in the differentiation between conditions with and without presynaptic dopaminergic deficit. Currently, SPECT radioligands for DAT, such as I-123 FP-CIT and Tc-99m TRODAT-1, have been available for daily clinical practice [14,15]. In practice, the healthy cases and abnormal cases have different appearances of uptakes of the ROI in 2D-combined and 3D imaging data. For example, a healthy subject has comma-shaped uptakes in his/her 2D-combined SPECT image while a subject with PD has dotted-shaped uptakes. In addition, a healthy subject has cashew-shaped uptakes around his/her striatum in 3D space while a subject with PD has sphere-shaped uptakes [8]. As a result, we propose to depict the appearances of uptakes of the ROI by proper mathematical formulas for the corresponding 2D and 3D shapes, which create a possibility to identify the differences between healthy and abnormal subjects by the associated coefficients contained in the mathematical representations. Specifically, for the issue of identifying PD, we fit ellipses for the uptakes in the 2D-combined image and cashew-shaped 3D equation for the uptakes in the 3D space for each subject under the framework of minimizing an associated loss function. Nevertheless, when conducting these two shape fittings, we have to face multidimensional optimization problems, which usually have no closed-form solution and one can only rely on numerical approximations [16].
In this study, we adopt the Particle Swarm Optimization (PSO) algorithm proposed by [17] to deal with the considered shape fittings. The PSO algorithm is inspired by the behavior of social organisms in groups, such as bird and fish schooling or ant colonies. It has been applied to numerous areas and has been shown to be capable of solving multidimensional optimization problems effectively (see [18][19][20][21] and the references therein). Another challenge encountered in the shape fitting process is to compute the Euclidean distance (also called L2 distance) between an observation and a given shape equation in R 2 or R 3 space. Morera et al. [22] proposed to calculate the L2 distance from a point to a quadratic surface by a partial differential equation (PDE) procedure. However, their PDE procedure is difficult to be extended to higher-order surfaces like the cashew-shaped equations considered in this study. To overcome this challenge, we propose a "flashlight-searching" (FS) algorithm to compute the distance for higher-order cases. The FS algorithm is shown to be capable of producing accurate results for the quadratic examples in [22]. In addition, under some mild assumptions on the selected family of shape equations, the convergence of the proposed FS algorithm is derived, which reveals the effectiveness and easy-for-implementation of the PSO-FS algorithm. Once conducting the proposed shape fitting procedure, the coefficients of the optimal 2D ellipse and cashew-shaped 3D equation are employed to learn classifiers for PD.
Between January 2017 and May 2018, 634 patients who underwent Tc-99m TRODAT-1 brain SPECT/CT in the Department of Nuclear Medicine, Kaohsiung Chang Gung Memorial Hospital, Taiwan, were enrolled in this retrospective study. The Chang Gung Medical Foundation Institutional Review Board approved this retrospective study and waived the requirement for written informed consent. The TRODAT SPECT images were interpreted by three experienced-board-certified nuclear medicine physicians, and were assigned into normal or abnormal accordingly. Six popular classification methods such as logistic regression (LR), linear discriminant analysis (LDA), naive Bayes (NB), random forests (RF), support vector machine (SVM), and extreme gradient boost (XGB) are adopted to learn classifiers for PD using the coefficients contained in the 2D and 3D shape-fitting equations. The 6 classifiers are usually adopted in many comparison studies for medical data [3,9,[23][24][25]. The numerical results reveal that these coefficients did improve the classification accuracies for most of the considered classifiers. Beyond the 6 machine learning classifiers considered in this study, deep learning (DL) is one of the hottest technology applied to classification problem in many fields. Recently, Dolz et al. [26] proposed a 3D and fully convolutional neural network method for subcortical brain structure segmentation in MRI, and yielded satisfactory segmentations. The objective of this study is to show that suitable shape equations of the ROI are useful for classification. Intuitively, the shape equations of the ROI will be more precise if better segmentations are used. Therefore, it will be interesting to construct shape equations based on Dolz et al. [26]'s segmentations. Nevertheless, it is beyond the scope of this work, and we leave it to our future study.
The remainder of this paper is organized as follows. Section 2 describes the proposed criteria for determining suitable shape equations to approximate the regions where we are interested in. Section 3 introduces an easy-to-implement algorithm for obtaining the optimal solutions of the considered criteria. The convergence of the proposed algorithm is also derived. The numerical results of applying the proposed approach to PD imaging data are presented in Section 4. Discussions are presented in Section 5. Technical calculations and proofs are given in the Appendix A.

Optimization Criteria
In this section, we introduce the proposed criteria for shape fitting in 2D and 3D spaces. Shape analysis and surface fitting are useful and promising methods for extracting features, which can be used to develop diagnostic models [8,[27][28][29]. To construct a classifier for PD identification, Prashanth et al. [8] noticed that the shapes of uptakes in the 2D-combined images are quite different between normal and PD subjects. They employed a pair of ellipses to describe the ROI for a 2D-combined PD image, and many features are then calculated from the fitted ellipse equations. For example, the lengths of major and minor axes, the eccentricity, orientation, roundness, area, etc. of an ellipse can be computed directly from the mathematical representation of the fitted ellipses. Moreover, they proposed to use the following polynomial equation of order 3 to extract 3D features from the 2D-combined images: f (x, y) = p 00 + p 10 x + p 01 y + p 20 x 2 + p 11 xy + p 02 y 2 + p 30 x 3 + p 21 x 2 y + p 12 xy 2 + p 03 y 3 , where f (x, y) is the grayscale with respect to the point (x, y) on the 2D plane and the coefficients p ij , i, j ∈ {0, 1, 2, 3} are estimated by minimizing the L2 loss. Figure 1 presents the fitting results of normal and PD subjects by using Prashanth et al. [8]'s approach with (1). Note that for demonstration, we present the color version of the SPECT images throughout this study. Prashanth et al. [8] used several characteristics calculated from the fitted ellipses and the coefficients in (1) as features to learn an SVM classifier. Their numerical results reveal that the classification accuracy for PD can be effectively improved. One interesting point should be noticed that Prashanth et al. [8] did not construct a model for the real shapes of uptakes in the 3D space directly, but established a regression model for the grayscale surface by using the 2D-combined image of each subject solely. This motivates us to investigate that whether we could obtain more useful features extracted from modeling the real shapes of uptakes in the 3D space according to the series of 2D scanned images of each subject. The details are illustrated in the following. Let p i , i = 1, 2, . . . , n denote the boundary points of the ROI. In Figure 2, the boundary points with grayscale value greater than 0.6 of two images are marked in black, where the left one is an image of a normal subject and the right one is an image of a subject with PD. In addition, let Ψ(·; θ) = 0 denote an equation which is suitable to well depict the shape of the ROI, where θ are unknown parameters. For example, Figure 2 shows that ellipse equations are suitable to depict the shapes of the ROI shown in a 2D-combined image and Figure 3 reveals that cashew-shaped equations are suitable to depict the structures of the ROI for a normal case or a PD case in 3D space. Intuitively, normal subjects should have similar 2D and 3D shapes while the subjects with PD should have different shapes from normal subjects. Potentially, the coefficient θ and the properties of the equation Ψ(·; θ) = 0 could provide useful information for separating normal and abnormal cases. Nevertheless, the location of the center and the rotation angles of the surface of the uptakes for each subject (even for normal subjects) would have slight differences, which increase the complexity of fitting a suitable surface equation for the boundary points in practical implementation. In this study, we adopt the image data of PD to demonstrate the performances of the proposed method. As shown in Figures 2 and 3, the ellipse equations and cashew-shaped equations are selected to depict the ROI in 2D and 3D spaces, respectively. The details of the two families of equations for including the considerations mentioned above and the corresponding optimization criteria for 2D and 3D cases are introduced in the following. For 2D surface fitting, an ellipse equation in which x = (x, y), denote the set of points on the ellipse. An optimization criterion for finding a suitable H(θ 2D ), abbreviated by H 2D , for the observed boundary points p i , i = 1, . . . , n, in a 2D-combined image: where d(p i , H 2D ) = min x∈H 2D d(p i , x), and d(p i , x) = p i − x denote the Euclidean distance between p i and x.

For 3D surface fitting, a cashew-shaped equation is defined as
in which x = (x, y, z) and θ 0 3D = (a, b, c, d) ∈ [0, 1] 4 . In order to consider the translation, rotation, and scaling of the equation, we do the following:

1.
Rotate x by the following rotation matrix M with three radians θ x , θ y , θ z ∈ [0, 2π] 3 : Furthermore, by translating and scaling x by t = (t x , t y , t z ) ∈ R 3 and s > 0, respectively, the general form of a cashew-shaped equation is represented by Based on the representation shown in (5), let H(θ 3D ) = {x : Ψ 3D (x; θ 3D ) = 0} denote the set of points on the cashew-shaped. Similar to (3), an optimization criterion for finding a suitable H(θ 3D ), abbreviated by H 3D , for the observed boundary points p i , i = 1, . . . , n, in 3D space: where d( The idea for determining optimal shape equations through the criteria shown in Equations (3) and (6) for 2D and 3D cases, respectively, is based on minimizing the L2 distance of the observed points to an ellipse or a cashew-shaped equation, which is similar to the classical least-squares (LS) regression. However, the classical LS cannot be applied directly for the equations Ψ 2D = 0 and Ψ 3D = 0. For example, the variable y in the equation Ψ 2D = 0 is not a function of x, which makes the shape approximation problem different from classical regression. Therefore, we propose an algorithm to find the optimal solutions of (3) and (6), and the details are introduced in the next section.

The Proposed Algorithm
For the boundary points p i , i = 1, . . . , n, and a determined family of surface equation Ψ(·; θ) = 0, we propose to use an optimal surface satisfying Ψ(·; θ * ) = 0 which has the minimal L2 distance to the boundary points p i , i = 1, . . . , n, to approximate the shapes of the uptakes of the ROI as shown in (3) and (6). Since the optimal solution θ * is a high-dimensional vector and there is usually no closed-form representation for θ * , a PSO framework is employed to obtain θ * and an effective scheme, called "flashlight-searching" (FS), is proposed to calculate the distance from a point to a surface in each PSO iteration. Herein, the proposed algorithm is abbreviated by "PSO-FS". In the following, we briefly introduce the PSO procedure.
Kennedy et al. [17] proposed an optimization algorithm, called PSO, which searches the optimal solution through particles and emulates the interaction of information sharing among particles iteratively. In each iteration, the updated search position of each particle is determined by the best position in its historical trajectory and the best position found in the trajectories of all particles. Specifically, we denote the position of the i-th particle in R d by a d-dimensional vector x i and denote its velocity vector by v i , i = 1, . . . , n. In the (t + 1)-th iteration, each particle changes its position according to the new velocity. That is, the position x t+1 i is defined as: where P t i and G t denote the best position of the i-th particle and the best position of all particles up to the t-th iteration, respectively. The parameters ω, (c 1 , c 2 ) and (r 1 , r 2 ) are inertia weight, two positive constants and two random parameters within [0, 1], respectively. The PSO algorithm is repeated until a predetermined stopping criterion is reached or the change rate of the particles approaches zero.
For example, suppose that the objective is to find the minimum of a surface f : R 2 → R, where the contour plots of f with the paths of 5 particles for the first 4 iterations of PSO are shown in Figure 4. In the four panels of Figure 4, the solid points denote the locations of the 5 particles in each iteration and the gray points combined with dashed lines denote the corresponding path of each particle. For each particle, the red arrow denotes the velocity of the i-th particle v t+1 i in the t-th iteration, and the black, green, and yellow dotted-arrows denote the 3 components , respectively, on the right-hand side of (8) to obtain v t+1 i . In addition, the green circle represents P t i and the particle with red cross represents G t . From Figure 4, one can find that the particles did share their information with others to determine their searching directions in each iteration and all of them tend to approach the point with minimal f gradually.
In this study, we employ the PSO procedure to obtain the optimal solutions of the criteria (3) and (6) in 2D and 3D spaces, respectively. For example, to proceed with the PSO procedure for searching θ * 2D defined in (3), the vector x t i defined in (7) is set to be x t i = θ t 2D,i , where θ t 2D,i denotes the position of the i-th particle at the t-th PSO iteration. In addition, the P t i and G t defined in (8) are computed from Ψ 2D (·) defined in (2).
Nevertheless, to calculate (3) and (6) in each PSO iteration, we have to calculate d(p i , H), which usually does not have a closed-form representation and a numerical method can help to compute the distance. For example, Morera et al. [22] proposed a numerical method under the framework of PDE to compute the L2 distance from a point to a quadric surface. However, their approach heavily relies on the properties of quadric equations when handling the corresponding PDEs and is not trivial to be extended to the cases of higher order surfaces, like the cashew-shaped equations considered in this study. Therefore, we propose an easy-to-implement FS algorithm instead to overcome the difficulty. The details of the FS algorithm are illustrated in the following.
For any given which allows the case of more than one and each J i is a connected set. A schematic diagram of these notations is given in Figure 5 for the special case of D p containing only one x * . The FS algorithm for computing d(p i , H) proceeds as follows. combined with dashed lines denote the corresponding path of each particle, and the red arrows denote the velocity of each particle determined by the black, green, and yellow dotted-arrows, which represent the three directions of the components on the right-hand side of (8). In the t-th iteration, the points with green circle denote P t i for i = 1, . . . , 5, and the point with red cross denotes G t , t = 1, . . . , 4, where P t i and G t are defined in (8).

1.
Set an r > 0 such that for all y ∈ B r (p), we have Ψ(y) > 0 or Ψ(y) < 0, where B r (p) is defined similarly to B (x * ). Let x 0 ∈ H denote the initial point for estimating d(p, H) and x 0 can be obtained by letting where y r = arg min y {Ψ(y) : y ∈ B b r (p)} and B b r (p) denotes the set of boundary points of B r (p).

2.
Let τ be a predetermined tolerance, and set m = 0.
(i) Compute the tangent plane to Ψ at x m and denote it by Γ m . Let {x m,j , j = 1, . . . , k} be a set of points on Γ m and ||x m,j − x m || = δ, where δ is a predetermined positive constant. (ii) Letx m,j ∈ J andx m,j = c j (x m,j − p) + p as defined in (10), where c j ∈ R, j = 1, . . . , k. Let and n x , which denotes the normal vector to Ψ at x. If 1 − G(x m+1 ) < τ, setd(p, H) = d(p, x m+1 ) and stop, whered(p, H) denotes the estimate of d(p, H). Otherwise, set m = m + 1 and repeat Steps (i) and (ii). Figure 6 presents a schematic diagram of the FS algorithm at the m-th iteration. The idea of the FS algorithm comes from the gradient descent approach. In Step 1 of the algorithm, the initial point x 0 is determined by looking around the neighborhood of p and selecting the searching direction by y r , which can be treated as a hint of the direction with high possibilities to reach an x * ∈ D p . Similarly, the algorithm selects the next searching direction around x m at the m-th iteration in the Step 2(i). This searching method mimics the scenario when one looks for a way out by a flashlight in a dark room, where r and δ in Steps 1 and 2(i), respectively, play the roles of different irradiation distances of the flashlight. Therefore, we name the above scheme by FS algorithm.  Figure 5, x m is the point at m-th iteration, Γ m is the tangent plane of Ψ at x m , x m,1 , and x m,2 are points satisfying ||x m,j − x m || = δ, δ is a predetermined positive constant, andx m,j is the points in J satisfyingx m,j = c j (x m,j − p) + p, and c j ∈ R, j = 1, . . . , k.
By using the FS algorithm in each iteration of the PSO procedure, the optimal solutions of the criteria (3) and (6) can be obtained with the boundary points p i , i = 1, . . . , n. In the rest of this section, we derive the convergence of the FS algorithm. To begin, assume the following conditions. C1. Ψ is continuously differentiable, denoted by Ψ ∈ C 1 . C2. For any fixed p ∈ R 3 , x ∈ J, and d(p, x) > d(p, H), there exists a 2-dimensional neighborhood ζ (x) being a 2-dimensional open ball of radius ζ ∈ (0, ] centered at x, such that d(p, x) > d(p, κ x ) and for any x a and x b ∈ κ x ∩ J i , i = 1, 2, . . . , and for The above two conditions are set on the function Ψ, which means that these two conditions can be checked for any selected family of shape equations. Specifically, C1 is a smoothness condition, which is satisfied by the equations with polynomial, exponential, logarithmic, and trigonometric components. C2 is a local convexity condition defined on a 2D subspace in R 3 . An intuitive way to understand C2 is to cut a cashew, for example, with a knife and check whether C2 is satisfied by the shape of the boundary of the cross-section.
Based on C1 and C2, we have the following result. The proof the Theorem 1 is given in the Appendix A. Most importantly, Theorem 1 indicates that the proposed FS algorithm is capable of computing an accurate distance from a fixed point to a given surface when the surface equation satisfies the smoothness and local convexity conditions as shown in C1 and C2, respectively.

Numerical Results
In this section, we introduce the details of extracting features from the ellipses and cashew-shaped surfaces. In addition, 6 popular classification methods are employed to investigate whether these features are useful for identifying PD subjects from the SPECT images. The SPECT images of 634 subjects with 305 female and an average age of 68.3 years old collected from Kaohsiung Chang Gung Memorial Hospital between January 2017 and May 2018 are employed for our investigation. For each subject, 3 SPECT images are selected and combined to a 2D-combined image, and 4 SPECT images are selected for fitting cashew-shaped surfaces according to physicians' expertise and experience. Three experienced board-certified nuclear medicine physicians are invited to give their opinions on the 2D-combined image of each subject. Blinded to patient clinical symptoms and histories, the physicians determine the images independently according to the guideline. The final consensus result of each subject was assigned when at least 2 physicians achieved a common agreement, where 256 subjects were assigned to be abnormal (grade 2 or 3), 77 subjects were potentially abnormal (grade 1), 174 were normal, and 127 were nearly normal, which includes the slightly degenerated cases without neurodegenerative disorder of PD. For simplicity, we combine the first 2 categories as abnormal and the last two categories as normal, denoted by 1 and 0, respectively. Figure 2 shows examples of 2D-combined images for the above 2 labels. Before collecting the shape features, a sensitivity study for the settings of (c 1 , c 2 , ω) in (8) is provided. In the literature, the parameters c 1 , c 2 , and w are set differently for different data [30][31][32]. In these studies, c 1 and c 2 are tuned between 1 and 3, and ω is tuned between 0.1 and 1.2. In the following, we conduct a sensitivity analysis with setting c 1 ∈ {0.8, 1.0, 1.2}, c 2 ∈ {1.6, 2.0, 2.4}, and ω ∈ {0.4, 0.5, 0.6} based on our experience when applying PSO to our data for the subjects shown in Figures 2 and 7. Table 1 reports the results of these 27 different combinations of (c 1 , c 2 , ω). From the means and standard deviations presented in Table 1, we found that the values of n −1 ∑ n i=1 d(p i , Ψ 2D ) and n −1 ∑ n i=1 d(p i , Ψ 3D ), which are the values of the objective functions defined in (3) and (6), respectively, are not sensitive to the 27 different settings. To save the computational costs, we adopt c 1 = 1, c 2 = 2 and ω = 0.5 in the following comparison study.  Once the parameters in Equations (3) and (6) are estimated by the convergent values of the proposed algorithm with the SPECT images for each subject, we treat these parameters of the ellipse and cashew-shaped equations as the features extracted from 2D and 3D ROIs, respectively. To investigate whether the features obtained from the ellipse and cashew-shaped equations are useful, we consider the following 6 different sets of features and compare their classification performances for PD identification.  (3), where 100 particles are randomly set centered on an initial estimator of θ 2D . Figure 2 presents approximated ellipses in red of a normal subject and a subject with PD, which reveal that the PSO-FS algorithm is capable of obtaining satisfactory ellipses to approximate the shapes of uptakes. Set 3 The features extracted from the cashew-shaped equations (22 features): The procedure for fitting a 3D structure is similar to the above 2D shape fitting process. Figure 7 presents the sequences of 2D images in the upper panel and the corresponding cashew-shaped surfaces in 3D space in the lower panel of a normal subject and a subject with PD, where the 2D images in the red rectangles are recommended by the physicians and are used to construct the cashew-shaped surfaces. The fitting results shown in Figure 7 display that the PSO-FS algorithm is capable of obtaining satisfactory cashew-shaped equations to approximate the shapes of uptakes. Set 4 The union of Sets 1 and 2. Set 5 The union of Sets 1, 2, and 3. Set 6 The 34 features defined in Table 1 of [8].

1.
We randomly split the data into training and test sets with the proportions of 80% and 20%, respectively, under a stratified sampling scheme, where the proportions of normal, nearly normal, potentially abnormal, and abnormal subjects in the training and test sets are the same.

2.
In the training set, we conduct a stratified 5-fold CV framework on each of the SVM, NB, RF, XGB, LR, and LDA classifiers by R-packages svm(), naiveBayes(), randomForest(), xgboost(), glm(), and lda(), respectively. The tuning parameters in each of the SVM, RF, and XGB methods are determined by computing the validation accuracies under several settings of tuning parameters and selecting the one with the highest validation accuracy. For each of the SVM, RF, and XGB classifiers, we set 5 candidates for each tuning parameters centered at the default values of the corresponding R packages. Therefore, we have 6 candidate classifiers, where each of them has the best 5-fold CV performances for the training data.

3.
Let the classifier with the best 5-fold CV accuracy from the 6 candidates be our final selection. Learn this classifier again with all the training data and use it to compute the classification performances of the test data. 4.
Repeat the above 3 steps 100 times. Compute the mean and standard deviation (SD) of accuracy (ACC), sensitivity (SEN), specificity (SPE), and GM under the 5-fold CV framework of the 6 classifiers for training data, and compute the mean, SD, and 95% confidence interval (CI) of the 4 measurements for test data based on the results of the 100 rounds.  Tables 2 and 3, respectively. In Table 2, the highest AVEs of the ACC, SEN, SPE, and GM values among the 6 classifiers for each set of features are marked in bold, where the RF classifier has more robust performances than other classifiers for the 6 sets of features on the average. In Table 3, the highest AVEs of the ACC, SEN, SPE, and GM values among the 6 sets of features are marked in bold, where Set 5 has the highest ACC, SEN, and GM values. Interestingly, in the 100 rounds, the SVM and RF classifiers are selected to be our final classifier for test data most frequently with the features contained in Set 1. However, this is not the case for other sets of features. For example, the RF and LR classifiers are preferred for Set 3; the RF and LDA classifiers are preferred for Set 6. This phenomenon reveals that different sets of features or different data might need different classifiers to highlight their impact of classification, which is also the main reason why we use a 5-fold CV scheme to select our final classifier in step 3 of the above procedure.  In view of Tables 2 and 3, we found that the classification performances did not depend on the number of features solely. For example, the ACCs and GMs of Set 1 are higher than those of Sets 2 and 3 in most cases although Set 1 contains less number of features than Set 2 or Set 3. Nevertheless, when we combine the features in Sets 1 and 2 into Set 4 or the features in Sets 1-3 into Set 5, the classification performances are improved significantly in terms of paired-wise t-tests as shown in the previous question. These phenomena reveal that the features in Set 1 did provide useful and important information for identifying PD but there is still room for improvement. The numerical results in Tables 2 and 3 reveal that the 2D and 3D features in Sets 2 and 3, respectively, extracted from the proposed mathematical representations show their potential for improving the accuracy of automated PD identification.
Moreover, in Table 3, the average ACCs for test data based on 100 rounds with the 6 sets of features are 0.858, 0.792, 0.825, 0.873, 0.880, and 0.858, respectively. Since we use the same hold-out and 5-fold CV samples to learn classifiers with each of the 6 feature sets in each round, we adopt paired-wise t-tests on the ACCs of 4 cases, 'Set 6 vs. Set 1', 'Set 1 vs. Set 4', 'Set 6 vs. Set 5', and 'Set 4 vs. Set 5', to test whether these sets of features provide significantly different ACCs for test data. The p-values of these 4 cases are 0.44, 2.3 × 10 −7 , 4.3 × 10 −11 , and 7.3 × 10 −4 , respectively, which indicate that Set 5 Set 4 Set 6 ≈ Set 1, where A B denotes that the population mean of A is significantly higher than the population mean of B, and A ≈ B denotes that population means of A and B have no significant difference. Figure 8 presents the scatter plots of the ACCs of the 4 cases for test data based on 100 rounds, which reveal consistent phenomena with the statistical test results since most points in Figure 8b-d are located below the 45-degree line.

Discussion
A shape approximation approach is proposed to portray the ROI from medical images. An effective PSO-FS algorithm for obtaining a suitable approximation is proposed under the framework of PSO. The convergence of the PSO-FS algorithm is derived under mild assumptions on the selected family of equations. We apply the proposed procedure to the issue of detecting PD from the SPECT, where ellipse and cashew-shaped equations are selected to approximate the ROI in the 2D and 3D spaces, respectively. The coefficients of the ellipse and cashew-shaped equations, together with some well-known features of PD existing in the literature, are employed to learn PD classifiers under 6 commonly used machine learning approaches. The empirical results reveal that the coefficients of the ellipse and the cashew-shaped equations are capable of improving classification performances.
According to the findings in our empirical results, the proposed method has shown its potential to be applied to other medical imaging data to extract important features contained in the corresponding 2D and 3D images. Most importantly, once a suitable family of equations is selected to depict the regions of uptakes of a specific ROI based on the suggestions made by experts, the PSO-FS algorithm can help to obtain the optimal equation effectively if the selected family of equations satisfies some mild conditions. Therefore, more constructive and useful features can be computed according to the corresponding mathematical representations of the 2D and 3D shapes, which provide researchers the opportunities to gain more insights into the associated problems.
Moreover, although our empirical study provides numerical evidence to show the usefulness of the 2D and 3D shape approximations of the ROI for PD identification, the proposed method still has the following limitations: • For a specific ROI, the proposed method needs to select a suitable family of mathematical representation, like the ellipse and cashew-shaped equations for identifying PD in 2D and 3D spaces, respectively. However, we may not have enough understanding or knowledge to derive useful features directly from a mathematical representation of any shape of the ROI, like the cashew-shaped equation used in this study. Therefore, we just adopt the coefficients of the shape equation as features for classification. Although these coefficients can represent the shape of the ROI, it may not be the most effective way for a suitable classifier to learn. More studies are needed to dig more insights for this issue. • This study and [8] both adopted the family of ellipse equations to portray the ROI observed from an 2D-combined image for PD identification. However, the ROI for normal subjects should be comma-shaped, which is not an ellipse. Due to the reason that we know ellipses better than comma-shaped equations, the family of ellipse equations is selected to approximate the ROI. This selection, of course, has systematic biases (or called model risk), which might reduce the classification performance. • In the proposed PSO-FS algorithm, we need to compute the distance of each boundary point of the ROI to a given shape equation in each PSO iteration. This procedure is computationally expensive, especially when training a classifier with a huge number of boundary points. A more efficiently computing way should be developed in the future.

Conflicts of Interest:
The authors declare no conflict of interest. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Chang Gung Medical Foundation Institutional Review Board (no. 202001334B0). The IRB approves the waiver of the participants' consent.

Appendix A. Derivation of the Cashew-Shaped Equation (5) Satisfying C2
The objective is to show that at the m-th iteration of the FS algorithm with the cashew-shaped equation Ψ 3D = 0 defined in (5) Without loss of generality, we consider Ψ 3D (·) with s = 1 and (t x , t y , t z ) = (0, 0, 0) in (5), and consider the case of p and x m having the same y-coordinate, where p and x m are denoted by (p x , c y , p z ) and (m x , c y , m z ), respectively. Since d(p, x m ) > d(p, H), which indicates that the vector p − x m = (p x − m x , 0, p z − m z ) is not orthogonal to the tangent plane to Ψ 3D = 0 at x m . Hence, there must exist one 2D plane, denoted by L m , such that p, x m ∈ L m , and the vector (p x − m x , 0, p z − m z ) in L m is not orthogonal to the tangent line to Ψ 3D = 0 restricted in L m at x m , which indicates that d(p, x m ) > d(p,Ĵ m ), whereĴ m denotes the intersection of Ψ 3D = 0 and L m . The rest of the proof is to show that there exists anx m inĴ m such that C2 holds.
To achieve this goal, we first rotate the coordinate system such that the equation of the plane L m is represented by y = c y in the new coordinate system with a rotation matrix M 1 , which is defined similarly to the M in (5). Therefore, p, x m andx m have the same y-coordinate in the new system. For example, Figure A1a presents a schematic diagram, where x m andx m are two points on a cashew-shaped surface Ψ 3D = 0, and p is an external point of {x : Ψ 3D ≤ 0}. If we rotate Ψ 3D = 0 to a new coordinate system, denoted byx = (x,ŷ,ẑ) as shown in Figure A1b, with a proper rotation matrix, we have that p, x m , andx m have the sameŷ values. Apparently, to prove that Ψ 3D = 0 satisfies C2 in Figure A1a is equivalent to prove it under the scenario in Figure A1b. In addition, since Ψ 3D (·) is transformed from Ψ 0 3D (·) defined in (4), letx T = M 1 · M · x T denote the coordinate in thex-coordinate system and the corresponding equation ofĴ m is represented as Ψ 0 3D (M −1 · M −1 1 · (x, c y ,ẑ) T ) = 0, which is a polynomial equation ofx andẑ. Consequently, for a fixedp x , which is the x-coordinate of p in thê x-coordinate system, there exists a ζ > 0 such thatx m = (m x , c y ,m z ) ∈Ĵ m , d(x m ,x m ) = ζ, and C2 holds in thex-coordinate system, which indicates that C2 also holds in the original x-coordinate system since x T = (M 1 M) −1xT . Therefore, the cashew-shaped equations defined in (5) also satisfy C2. Lemma A2. Assume that C1 and C2 hold. Then we have 1. G is continuous on J i , i = 1, 2, . . ..

2.
If x ∈ J and d(p, x) > d(p, H), there exists another point x ∈ J x ⊆ J such that d(p, x ) < d(p, x).
Proof. Without loss of generality, we consider the case of v x and n x being normalized vectors. Hence, we have G(x) = cos(θ x ) = |<v x , n x >|, for all x ∈ J, where <a, b> denotes the inner product of two vectors a and b. For any x 0 ∈ J, the triangular inequality leads to In addition, C1 leads to for all ε 1 > 0 and ε 2 > 0, there exist δ 1 > 0 and δ 2 > 0 such that and ||n x − n x 0 || < ε 2 , for all ||x − x 0 || < δ 2 , where ||x − x 0 || < δ 1 (or ||x − x 0 || < δ 2 ) indicates that x belongs to the J i containing x 0 . Consequently, let ε 1 = ε 2 = ε/2 and δ = min(δ 1 , δ 2 ). By (A5)-(A7), we have that for any ε > 0 there exists a δ > 0 such that |G(x) − G(x 0 )| < ε for all ||x − x 0 || < δ. Therefore, G is a continuous function defined on J i , i = 1, 2, . . .. For a point x ∈ J, let Γ x denote the tangent plane to Ψ at x. The existence of Γ x is provided by C1. In addition, if x is an interior point of J, there exists a ζ > 0 such that B 2 ζ (x) ∩ H ⊆ J, where B ζ (x) denotes a 2-dimensional open ball of radius ζ centered at x. Denote B (2) ζ (x) ∩ H by κ x for short. In particular, if x is a boundary point of J, we defined κ x as the intersection of B (2) ζ (x) ∩ H and J. Furthermore, if d(p, x) = d(p, H), we have G(x) < 1. Consequently, there exists anx ∈ Γ x such that G(x) < G(x), which is equivalent to d(p,x) < d(p, x), and anx ∈ κ x associated tox, wherex denotes the intersection of κ x and the line formed byx and p.
According to C2, we consider the following two cases: wherex is redefined as the intersection of κ x and the line formed by p and y. Therefore, by (A9) and (A10), we conclude that there exists anx ∈ κ x such that d(p,x ) < d(p, x) if C1 and C2 hold. The second result of Lemma A2 indicates that in the Step 2 of the FS algorithm, one can find a better estimate of d(p, H) than d(p, x) if d(p, x) > d(p, H) by setting a small enough δ. According to Lemmas A1 and A2, we prove Theorem 1 in the following.