Pathological Brain Detection by a Novel Image Feature—Fractional Fourier Entropy

Aim: Detecting pathological brain conditions early is a core procedure, as it gives patients enough time for treatment. Traditional manual detection is cumbersome, expensive, or time-consuming. In this paper, we aim to offer a system that can automatically identify pathological brain images. Method: We propose a novel image feature, viz., Fractional Fourier Entropy (FRFE), which combines the Fractional Fourier Transform (FRFT) with Shannon entropy. Afterwards, Welch's t-test (WTT) and Mahalanobis distance (MD) were harnessed to select distinguishing features. Finally, we introduced an advanced classifier: the twin support vector machine (TSVM). Results: A 10 × K-fold stratified cross-validation test showed that the proposed "FRFE + WTT + TSVM" method yielded accuracies of 100.00%, 100.00%, and 99.57% on datasets containing 66, 160, and 255 brain images, respectively. Conclusions: The proposed "FRFE + WTT + TSVM" method is superior to 20 state-of-the-art methods.

Pathological brain detection (PBD) is of essential importance. It can help physicians make decisions and avoid wrong judgements about subjects' conditions. Magnetic resonance imaging (MRI) features high resolution on the soft tissues of subjects' brains, generating massive datasets [1]. At present, there are numerous works on the use of brain magnetic resonance (MR) images for solving PBD problems [2,3].
Due to the enormous volume of imaging data from the human brain, traditional manual techniques are tedious, time-consuming, or costly. Therefore, it is necessary to develop a novel computer-aided diagnosis (CAD) system [4] to help patients receive treatment in time.
In the last decade, many methods from different countries were presented with the same goal of detecting pathological brains [5][6][7][8] (more references are introduced in Section 2). Most of them have two stages: (1) feature extraction, which extracts efficient features that can distinguish pathological from healthy brains (feature reduction can be skipped if the size of the feature set is reasonable); and (2) classification, which constructs a classifier using the extracted (and reduced) features.
For the first stage, feature extraction, the latest solutions transform the brain image by the discrete wavelet transform (DWT) [9], which has proven superior to the traditional Fourier transform. Nevertheless, DWT comes across the problem of choosing the best decomposition level and the optimal wavelet function.
For the final stage, classification, recent approaches tend to use the feed-forward neural network (FNN) or support vector machine (SVM), which are becoming popular in the fields of classification and detection [10]. However, SVM has the limitation that its hyperplanes must be parallel. This parallelism restrains the classification performance.
To solve the above two problems, we propose two improvements. On the one hand, we propose a novel image feature, Fractional Fourier Entropy (FRFE), which is based on two steps: (1) the use of the Fractional Fourier Transform (FRFT) to replace the traditional Fourier transform; and (2) Shannon entropy to extract features from the FRFT spectra.
On the other hand, we suggest removing the hyperplane parallelism restraint [11]. We introduce for this purpose two non-parallel SVMs: the generalized eigenvalue proximal SVM (GEPSVM) and the twin support vector machine (TSVM).
In the remainder of this paper, Section 2 offers the state of the art. Section 3 describes the materials used. Section 4 presents the extracted features and how important features are selected. Section 5 offers the mechanisms of the standard support vector machine and the non-parallel support vector machines. Section 6 covers the experimental design. Section 7 provides the results. Discussions are presented in Section 8. Finally, Section 9 is devoted to our conclusions.

State-of-the-Art
Recent PBD methods are of two types. One treats a 3D dataset as a whole, and the other selects the most important slice from the 3D data. The former needs to scan the whole brain, which is expensive and time-consuming. The latter only needs to scan the focus-related slice, which is cheap and rapid. In this study, we focus on the latter.
Chaplot et al. [12] were the first to apply DWT to PBD problems. Their classifiers were SVM and the self-organizing map (SOM). El-Dahshan et al. [13] employed a 3-level discrete wavelet transform. Dong et al. [14] proposed the use of a novel scaled conjugate gradient (SCG) approach for PBD. Zhang and Wu [15] proposed employing a kernel support vector machine (KSVM). Saritha et al. [16] combined the wavelet transform with Shannon entropy, and named the novel feature wavelet-entropy (WE). They harnessed spider-web plots (SWPs) with the aim of decreasing the number of WEs, and employed the probabilistic NN (PNN) for classification. Zhang et al. [17,18] found that spider-web plots had no effect on PBD. Das et al. [19] suggested using the Ripplet transform (RT) for PBD; their classifier is a least-squares support vector machine (LS-SVM). Zhang et al. [20] employed particle swarm optimization (PSO) to find the optimal parameters of a kernel support vector machine. El-Dahshan et al. [21] proposed the use of a feedback pulse-coupled neural network to segment brain images. Following Saritha's work, Zhou et al. [22] employed WE with a Naive Bayes classifier (NBC). Zhang et al. [23] employed the discrete wavelet packet transform (DWPT) to replace DWT, and Tsallis entropy (TE) to replace Shannon entropy (SE). Yang et al. [24] employed wavelet-energy as the feature. Damodharan and Raghavan [25] used tissue segmentation to detect neoplasms in brains. Guang-Shuai et al. [26] employed both wavelet-energy and SVM; the overall accuracy was less than 83%. Wang et al. [27] employed a genetic algorithm (GA) to solve the task of PBD. Nazir et al. [28] proposed performing image denoising first; their overall accuracy was higher than 91%. Harikumar and Kumar [29] used an ANN, with optimal performance achieved by use of a db4 wavelet and a radial basis function (RBF) kernel. Wang et al. [30] suggested the use of the stationary wavelet transform (SWT). Zhang et al. [31] offered a new hybridization of biogeography-based optimization (BBO) and particle swarm optimization (PSO), termed HBP for short. Farzan et al. [32] used the longitudinal percentage of brain volume changes (PBVC) over a two-year follow-up, and its intermediate counterparts in early 6-month and late 18-month tests, as features. Their experimental results showed that SVM with RBF performed best, with an accuracy of 91.7%, higher than K-means at 83.3%, fuzzy c-means (FCM) at 83.3%, and linear SVM at 90%. Zhang et al. [33] used two types of features: Hu moment invariants (HMI) and wavelet entropy. Munteanu et al. [34] employed Proton Magnetic Resonance Spectroscopy (MRS) data in order to distinguish mild cognitive impairment (MCI) and Alzheimer's disease (AD) from healthy controls. Savio and Grana [35] employed regional homogeneity to build a CAD system for detecting schizophrenia based on resting-state functional magnetic resonance imaging (fMRI). Zhang et al. [36] proposed employing a three-dimensional discrete wavelet transform to extract features from structural MRI, with the aim of detecting Alzheimer's disease and mild cognitive impairment.
The contribution of this paper is to use fractional Fourier entropy and non-parallel SVMs, with the aim of developing a novel PBD system whose classification performance is superior to the above approaches.

Materials
At present, there are three benchmark datasets of different sizes, viz., D66, D160, and D255. All of them were used in our tests. All datasets contain T2-weighted MR brain images acquired along the axial axis with a size of 256 × 256. Readers can download them from the Medical School of Harvard University website. The first two datasets consist of samples of seven types of diseases (meningioma, AD, AD plus visual agnosia, Huntington's disease, sarcoma, Pick's disease, and glioma) along with normal brain images. The last dataset, D255, contains all seven types of diseases mentioned above, plus four additional diseases (multiple sclerosis, cerebral toxoplasmosis, chronic subdural hematoma, and herpes encephalitis). Figure 1 shows samples of the brain images. The cost of predicting pathological images as normal is severe, as the treatment of patients may be deferred. On the other hand, the cost of misclassifying normal as abnormal is less serious, since other diagnostic means can remedy the error.
This cost-sensitivity (CS) problem can be addressed by changing the class distribution at the beginning stage, since the original data is accessible. That is, we intentionally pick more abnormal brains than normal brains for the dataset, which biases the classifier toward pathological brains and thereby addresses the CS problem.

Feature Extraction and Selection
The difference between the Fourier transform (FT) [37] and its variant, the fractional FT (FRFT) [38], is that the FRFT can analyze nonstationary signals, which the FT cannot. Moreover, the FRFT transforms a signal into a unified time-frequency domain.

Basic Concept
The α-order fractional Fourier transform (FRFT) of a signal x(t) is denoted by X_α:

$$X_\alpha(u) = \int_{-\infty}^{+\infty} x(t)\, K_\alpha(t, u)\, dt$$

where α is real-valued, t denotes time, u frequency, and K the transform kernel. Writing θ = απ/2 for the rotation angle, the kernel is:

$$K_\alpha(t, u) = \sqrt{1 - j\cot\theta}\; \exp\!\left(j\pi\left(t^2\cot\theta - 2tu\csc\theta + u^2\cot\theta\right)\right)$$

where j denotes the imaginary unit. To solve the problem that cot and csc diverge when θ takes multiples of π, we take the limit and obtain [39]:

$$K_\alpha(t, u) = \delta(t - u)\ \text{if}\ \theta = 2m\pi, \qquad K_\alpha(t, u) = \delta(t + u)\ \text{if}\ \theta = (2m + 1)\pi$$

where δ represents the Dirac delta function and m an arbitrary integer. Some scholars equivalently express the transform in terms of the angular frequency ω instead of u.
In this study, the 2D-FRFT was performed on 2D brain images. Due to the separability of the FRFT kernel [40], the 2D-FRFT can be implemented by first applying the 1D-FRFT to the rows and then to the columns. Hence, the 2D-FRFT has two orders, named α and β. Under the condition α = β = 0, the FRFT degrades to the identity operator; under the condition α = β = 1, the FRFT becomes the conventional FT.

Fractional Fourier Domain
An example of the FRFT of a one-dimensional signal tri(t) is shown in Figure 2, in order to illustrate how the order α influences the fractional Fourier domain. Note that the frequency spectrum of tri(t) is the square of the sinc function, i.e., sinc²(u). We can observe that the FRFT output lies in an intermediate domain between time and frequency, viz., a unified time-frequency domain.

Weighted-Type FRFT
The weighted-type fractional Fourier transform (WFRFT) is the simplest implementation of the FRFT. It replaces the continuous variable t with its discrete version n, and the variable u with its discrete version k. The WFRFT takes the form [41,42]:

$$F^{\alpha} = A_0(\alpha)\, I + A_1(\alpha)\, F + A_2(\alpha)\, F^2 + A_3(\alpha)\, F^3 \qquad (6)$$

where F denotes the DFT matrix and the weights are:

$$A_l(\alpha) = \frac{1}{4}\sum_{k=0}^{3} \exp\!\left(\frac{j\pi k(l - \alpha)}{2}\right), \quad l = 0, 1, 2, 3 \qquad (7)$$

Then, the WFRFT of a signal x can be defined as X_α = F^α x. We can observe from Equation (6) that the WFRFT can be treated as a linear weighted combination of the identity matrix (I), the DFT matrix (F), the time-inverse matrix (F²), and the inverse DFT (IDFT) matrix (F³) [43].
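As a concreteness check, here is a minimal pure-Python sketch of the weighted-type FRFT matrix (the function names are ours, not the authors'); it builds F^α as a weighted sum of the four DFT powers and reproduces the limiting cases stated above (α = 0 gives the identity, α = 1 gives the conventional DFT):

```python
import cmath

def dft_matrix(n):
    """Unitary n x n DFT matrix F; with this normalization F^4 = I."""
    w = cmath.exp(-2j * cmath.pi / n)
    s = 1.0 / (n ** 0.5)
    return [[s * w ** (r * c) for c in range(n)] for r in range(n)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def wfrft_matrix(alpha, n):
    """Weighted-type FRFT: F^alpha = sum_l A_l(alpha) * F^l, l = 0..3."""
    f = dft_matrix(n)
    powers = [[[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]]
    for _ in range(3):
        powers.append(mat_mul(powers[-1], f))  # F^1, F^2 (flip), F^3 (IDFT)
    # A_l(alpha) = (1/4) * sum_{k=0}^{3} exp(j*pi*k*(l - alpha)/2)
    a = [sum(cmath.exp(1j * cmath.pi * k * (l - alpha) / 2) for k in range(4)) / 4
         for l in range(4)]
    return [[sum(a[l] * powers[l][i][j] for l in range(4))
             for j in range(n)] for i in range(n)]
```

Because these weights are exact spectral projections of the DFT, the sketch also satisfies the additivity property F^a F^b = F^{a+b}, which is what makes the fractional order meaningful.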

Shannon Entropy
In information theory, the Shannon entropy is the expected information content (IC) received in a message; entropy is a measure of the unpredictability of IC [44]. Suppose X is a discrete random variable whose values fall within the set (x₁, x₂, ..., x_n) with probability mass function P(X). The entropy H is defined as:

$$H(X) = E\left[-\log_q(P(X))\right] \qquad (9)$$

where E represents the expected value operator and q the logarithm base. Equation (9) can be written for a finite sample in the explicit form:

$$H(X) = -\sum_{i=1}^{n} P(x_i)\log_q P(x_i) \qquad (10)$$

The units of entropy H are bits when q = 2, nats when q = e, and hartleys when q = 10. When P(x_i) equals zero, we adopt the convention 0 · log_q 0 = 0.
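A minimal pure-Python sketch of Equation (10) follows (the helper name is ours); in our pipeline, the probabilities come from a normalized fractional Fourier spectrum:

```python
from math import log

def shannon_entropy(p, q=2.0):
    """H = -sum_i p_i * log_q(p_i), adopting the convention 0 * log(0) = 0."""
    assert abs(sum(p) - 1.0) < 1e-9, "input must be a probability distribution"
    return -sum(pi * log(pi, q) for pi in p if pi > 0)
```

For a uniform distribution over four symbols with q = 2 this returns 2 bits, the maximum for four outcomes.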

Feature Selection
We employ a two-sample location test with the aim of selecting the most important FRFEs from the 25 candidates. Student's t-test is the most popular such method, but it assumes "equal variances" of the two data sets [45]. This "equal variances" assumption is not warranted here and can be discarded, while testing "equal means" is exactly what we need. Therefore, we used Welch's t-test (WTT), an adaptation of Student's t-test that only checks whether the two populations have equal means [46]. WTT is widely used in various applications to select important features [47][48][49]. The WTT score is computed as:

$$w(p, q) = \frac{\mu_p - \mu_q}{\sqrt{\sigma_p^2 / n_p + \sigma_q^2 / n_q}} \qquad (13)$$

where μ denotes the sample mean, σ² the sample variance of a particular feature, n the sample size, and w the WTT score. The null hypothesis in this work is that the FRFE values of pathological and healthy brains have the same means (equal variances are not of concern); the alternative hypothesis is that they have unequal means. WTT was performed at the 95% confidence level. The selected FRFEs are then used as input features for the following classification.
Mahalanobis distance (MD) [50] is another popular feature selection method. MD measures the distance between the datasets of two different classes. Its definition is:

$$m(p, q) = \sqrt{(\mu_p - \mu_q)^T\, \Sigma^{-1}\, (\mu_p - \mu_q)} \qquad (14)$$

where m represents the MD score and Σ is the pooled covariance matrix obtained from C_p and C_q, the covariance matrices of the characteristic vectors in class p and class q, respectively.
In this work, we treat the WTT score w and the MD score m as measures of the distinguishability of individual features between the two classes: the higher the score, the more distinguishable the feature.
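For a single feature, the two scores can be sketched in pure Python (function names are ours; the MD shown is the scalar per-feature variant, using a pooled variance in place of the covariance matrix):

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # unbiased sample variance

def wtt_score(p, q):
    """Welch's t statistic for one feature sampled in two classes."""
    return (mean(p) - mean(q)) / sqrt(var(p) / len(p) + var(q) / len(q))

def md_score(p, q):
    """Scalar per-feature Mahalanobis-style distance with pooled variance."""
    pooled = (len(p) * var(p) + len(q) * var(q)) / (len(p) + len(q))
    return abs(mean(p) - mean(q)) / sqrt(pooled)
```

Ranking the 25 FRFEs by either score and keeping the top-scoring ones implements the selection step described above.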

Support Vector Machine (SVM)
Suppose there is an N-sample training set of dimension p. Let x_i denote a p-dimensional data point and y_i the corresponding class label, with a value of either −1 or +1 denoting that the sample belongs to class 1 or class 2. Our aim is to build a hyperplane that separates the first class from the second; this hyperplane is the desired SVM, and it is usually (p − 1)-dimensional.
A hyperplane can be written as wx − b = 0, where w and b denote the weights and the bias. A positive slack vector ξ = (ξ₁, ..., ξ_i, ..., ξ_N) is introduced to measure the degree of misclassification of sample x_i. Then, the optimal hyperplane corresponding to the soft-margin SVM is yielded by solving:

$$\min_{w, b, \xi}\ \frac{1}{2}\|w\|^2 + c\, o^T \xi \quad \text{s.t.}\ \ y_i(w x_i - b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$

where c represents the error penalty and o is an N-dimensional vector of ones. The two margin hyperplanes of a standard SVM are constrained to be parallel; scholars have tended to drop this parallelism and have proposed non-parallel support vector machines (NPSVMs).
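For intuition, the soft-margin objective above can be minimized by subgradient descent; a pure-Python sketch follows (names are ours; production implementations solve the dual QP instead):

```python
def train_linear_svm(xs, ys, c=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM: subgradient descent on
    0.5*||w||^2 + c * sum_i max(0, 1 - y_i*(w.x_i - b))."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
            if margin < 1:  # hinge loss active: push the point past the margin
                w = [wi - lr * (wi - c * y * xi) for wi, xi in zip(w, x)]
                b -= lr * (c * y)
            else:           # only the regularizer contributes
                w = [wi - lr * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Classify x by the sign of the decision function w.x - b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - b >= 0 else -1
```

On linearly separable toy data this recovers a separating hyperplane after a few hundred epochs.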

NPSVM I-Generalized Eigenvalue Proximal SVM
Mangasarian and Wild [51] proposed GEPSVM, which yielded better performance than standard support vector machines [52,53]. Let samples from class 1 be denoted X₁ and samples from class 2 be denoted X₂. GEPSVM builds two nonparallel planes:

$$x^T w_1 - b_1 = 0, \qquad x^T w_2 - b_2 = 0$$

each of which is close to the samples of its own class and far from the samples of the other class. Taking the first plane as the example (the second plane can be obtained in a similar way), this leads to minimizing the ratio:

$$\min_{(w, b) \neq 0}\ \frac{\|X_1 w - e b\|^2}{\|X_2 w - e b\|^2}$$

where e is a vector of ones. A Tikhonov regularization term is included to decrease the norm of z = [w; b]:

$$\min_{z \neq 0}\ \frac{\|X_1 w - e b\|^2 + t\,\|z\|^2}{\|X_2 w - e b\|^2}$$

where t represents a nonnegative Tikhonov factor. This problem can be solved by the "Rayleigh Quotient (RQ)" approach, i.e., as a generalized eigenvalue problem.

NPSVM II-Twin Support Vector Machine
Jayadeva et al. [54] were the first to propose TSVM. Reports have shown that TSVM outperforms both SVM and GEPSVM [55][56][57]. Another advantage of TSVM is that its training is roughly four times faster than that of conventional SVM [54]. TSVM is constructed by solving two quadratic programming (QP) problems:

$$\min_{w_1, b_1, \xi}\ \frac{1}{2}\|X_1 w_1 + e_1 b_1\|^2 + c_1\, o_2^T \xi \quad \text{s.t.}\ \ -(X_2 w_1 + e_2 b_1) + \xi \ge o_2,\ \ \xi \ge 0$$

$$\min_{w_2, b_2, \eta}\ \frac{1}{2}\|X_2 w_2 + e_2 b_2\|^2 + c_2\, o_1^T \eta \quad \text{s.t.}\ \ (X_1 w_2 + e_1 b_2) + \eta \ge o_1,\ \ \eta \ge 0$$

where o_i (i = 1, 2) are vectors of ones as in Equation (19), and c_i (i = 1, 2) are positive penalty parameters. The constraints require each hyperplane to be at a distance of at least one from the points of the other class. The first and second terms in each objective represent the sum of squared distances from the hyperplane to its own class, and the sum of error variables, respectively.
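Once the two proximal planes are trained, the TSVM decision rule is simply nearest-plane assignment; a minimal sketch (names are ours; planes written as w·x + b = 0):

```python
from math import sqrt

def tsvm_predict(x, w1, b1, w2, b2):
    """Assign x to the class whose proximal plane w.x + b = 0 is nearer."""
    def dist(w, b):
        # point-to-plane distance |w.x + b| / ||w||
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / sqrt(sum(wi * wi for wi in w))
    return 1 if dist(w1, b1) <= dist(w2, b2) else 2
```

This is the step that replaces the single sign test of a standard SVM: each class gets its own plane, and a query point belongs to the class whose plane fits it best.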

K-Fold Stratified Cross Validation
Obeying common convention, and considering the advantages of stratified cross-validation (SCV), 6-fold SCV was employed for D66 and 5-fold SCV for the other two datasets. Table 1 lists the SCV settings of the three datasets. Note that the true class here represents the abnormal brains, and the false class the normal brains. Note also that D160 and D255 are divided into five folds, while D66 is divided into six folds. This is because of stratification: the D66 dataset contains 48 pathological brains and 18 healthy brains, so a 6-fold partition guarantees that each fold includes eight pathological brains and three healthy brains. If we divided D66 into five folds, stratification could not be guaranteed.
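A minimal sketch of the stratified split (the function name is ours): round-robin assignment within each class preserves the class ratio per fold, so D66 with 48 pathological and 18 healthy brains yields six folds of 8 + 3 samples each:

```python
import random

def stratified_folds(labels, k, seed=0):
    """Partition sample indices into k folds, preserving the class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):   # round-robin within the class
            folds[j % k].append(i)
    return folds
```

Each of the 10 repetitions reshuffles (a different seed) and re-partitions, which is what "10 × K-fold SCV" denotes in the experiments below.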

Implementation
The proposed PBD system contains three successive components: FRFE, WTT, and a classifier (SVM, GEPSVM, or TSVM). Figure 3 shows the diagram, and Table 2 presents the pseudocode, where offline learning trains the classifier and online prediction is used to predict new brain images.


Offline learning
Step I Feature Extraction: Fractional Fourier Entropy (FRFE) extraction was performed on all ground-truth images: twenty-five different WFRFTs were carried out, with α and β drawn from the set [0.6, 0.7, 0.8, 0.9, 1.0], and entropy was extracted from the 25 fractional Fourier spectra.
Step II Feature Selection: Welch's t-test (WTT) was employed to select the most important FRFEs among the 25 candidates at the 95% confidence level.
Step III Classifier Training: The chosen FRFEs, with their class labels, were fed in to train the SVM and the two NPSVMs.
Step IV Classifier Evaluation: Evaluate the classification performance based on a 10 times K-fold SCV, and report which classifier performs best.

Online prediction
Step I Feature Extraction: A new query image is decomposed and its 25 FRFEs are extracted.
Step II Feature Selection: Select the most important FRFEs from the 25 candidates.
Step III Query Image Prediction: Input the selected FRFEs of the query image to the reported best classifier, so as to determine whether the query brain is pathological or healthy.
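As an illustrative sketch of offline Step I (not the authors' code; `transform2d` is a placeholder for a 2D WFRFT implementation, and computing entropy over normalized spectrum magnitudes is our reading of the FRFE definition), the 25-feature extraction loop might look like:

```python
from math import log

ANGLES = [0.6, 0.7, 0.8, 0.9, 1.0]  # candidate alpha and beta values

def frfe_features(image, transform2d):
    """Return 25 FRFE features for one image: the Shannon entropy (q = 2)
    of the normalized magnitude spectrum of each (alpha, beta) transform."""
    features = []
    for alpha in ANGLES:
        for beta in ANGLES:
            spectrum = transform2d(image, alpha, beta)
            mags = [abs(v) for row in spectrum for v in row]
            total = sum(mags)
            probs = [m / total for m in mags]  # normalize into a distribution
            features.append(-sum(p * log(p, 2) for p in probs if p > 0))
    return features
```

Feeding the resulting 25-dimensional vectors (one per image) into Welch's t-test then yields the reduced feature set used for classification.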

Results and Discussion
The programs were developed in-house on the basis of the signal processing toolbox of 64-bit Matlab 2014a (The MathWorks, Natick, MA, USA). The simulation experiments were run on a P4 IBM computer equipped with a 3.2 GHz processor, 8 GB RAM, and the Windows 7 operating system.


FRFE Results
In the second experiment, we calculated the FRFE of each ground-truth image. The mean and standard deviation (SD) for pathological and healthy brains are listed in Table 3. In each cell, the upper numbers represent the mean and SD of the FRFEs of pathological brains, and the lower numbers those of healthy brains.

Feature Selection
Then, using either WTT or MD, we finally obtained the same result, i.e., we selected 12 distinguishable features, as shown in Table 4, where S indicates that the corresponding feature is selected and X that it is unselected. Remember that the values of both α and β lie within the range [0, 1] because of their periodicity and symmetry.

Feature Comparison
To demonstrate the performance of the FRFE, we compared it with "wavelet entropy" and "wavelet energy". Reference [22] used seven wavelet-entropy values as features, with NBC for detection. Reference [26] used seven wavelet-energy values as features, with SVM for detection. In that paper, it is shown that seven features obtain the highest accuracy, and adding more features does not improve the classification performance.
For a fair comparison, we also combined FRFE with NBC and with SVM. These two methods are termed "FRFE + WTT + NBC" and "FRFE + WTT + SVM". We ran K-fold SCV 10 times on the three datasets. The comparison results are listed in Tables 5 and 6.

SVM versus Non-Parallel SVMs
To compare the performance of the standard SVM and the two NPSVMs, we used the 12 selected FRFE features. Again, K-fold SCV was run 10 times over the three datasets. We recorded the accuracy of each K-fold SCV and averaged the results of the 10 runs. The results of FRFE + WTT + SVM, FRFE + WTT + GEPSVM, and FRFE + WTT + TSVM are listed in Table 7. In the following experiments, TSVM is the default classifier.

Best Proposed Approach
We analyzed the results obtained. Taking the best of our proposed approaches, FRFE + WTT + TSVM, as an example, its first run of 5-fold SCV is listed in Table 8, and its accuracy results over all 10 runs are listed in Table 9. The evaluation (sensitivity, specificity, and precision) is listed in Table 10.

Discussion
Figure 4 indicates that as both orders increase to one, the FRFT approaches the standard FT. Conversely, as both orders decrease to zero, the FRFT degrades to the identity operator, which does not contain any frequency-spectral information.
The proposed FRFE measures the information content in the fractional Fourier domain (FRFD), which extends the standard Fourier domain. From another point of view, FRFE is a measure of diversity or unpredictability. A limitation is that the FRFE value is not absolute: it depends on the model over the FRFD.
Table 5 shows that, with NBC, wavelet entropy obtained accuracies of 92.58%, 91.87%, and 90.51% on D66, D160, and D255, respectively, while FRFE + WTT + NBC obtained accuracies of 97.12%, 95.94%, and 95.69%, which are higher. We can therefore conclude that FRFE performs better than wavelet entropy. Table 6 shows that wavelet energy with SVM achieves accuracies of 82.58%, 80.13%, and 77.76% over the three datasets, whereas FRFE + WTT + SVM yields accuracies of 100.00%, 99.69%, and 98.98%. This suggests that FRFE is significantly better than wavelet energy. Comparing "FRFE + WTT + SVM" in Table 6 with "FRFE + WTT + NBC" in Table 5, another finding is that SVM is superior to NBC. The reason is that SVM works well for high-dimensional problems with relatively few instances, due to its regularization [58].
Results in Table 7 indicate that GEPSVM is superior to the standard SVM. Both obtain perfect detection on D66. For D160, the accuracy of GEPSVM is higher than that of SVM by 0.31%; for D255, by 0.20%. Meanwhile, TSVM is superior to GEPSVM: the accuracy of TSVM is 0.39% higher than that of GEPSVM on D255.
The parallel-hyperplane setting restrains the standard SVM from generating complicated and flexible decision boundaries. NPSVMs discard this setting, so their performance is much better than that of SVM. TSVM resembles GEPSVM in spirit, since both drop parallelism. Their difference is that TSVM uses a simpler formulation than GEPSVM, and can be solved by merely two QP problems. Our results align with the finding of Kumar and Gopal [59] that the "generalization performance of TSVM is better than GEPSVM and conventional SVM". Nevertheless, Ding et al. [60] claimed that TSVM has a lower generalization ability, so it is too early to settle the classification performance of TSVM before more rigorous tests are implemented.
In total, our proposed FRFE + WTT + TSVM predicts 2539 cases successfully and fails on 11 cases in the 10 × 5-fold SCV for D255. Remember that D255 contains 220 pathological brains and 35 healthy brains, so in total there are 2200 pathological and 350 healthy instances after 10 repetitions. Of the 2200 pathological instances, our method predicts 2191 successfully, misclassifying nine pathological instances as healthy. Of the 350 healthy instances, our method predicts 348 successfully, misclassifying two healthy instances as pathological. Therefore, the sensitivity of our method is 99.59%, its specificity is 99.43%, and its precision is 99.91% (see Table 10).
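The counts above can be checked with a small helper (names are ours):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts
    (pathological = positive class)."""
    sensitivity = tp / (tp + fn)            # pathological correctly detected
    specificity = tn / (tn + fp)            # healthy correctly cleared
    precision = tp / (tp + fp)              # flagged cases truly pathological
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, precision, accuracy
```

With the D255 counts (tp = 2191, fn = 9, tn = 348, fp = 2), this reproduces the 99.59% / 99.43% / 99.91% figures and the overall 99.57% accuracy.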
Table 11 lists the comparison results. The first column lists the abbreviated names of the different algorithms. The second column lists the number of features used in each method. The third column lists the number of runs: all new algorithms were run 10 times, except some older algorithms that were run five times as reported in the literature [19]. The last three columns list the classification accuracy over D66, D160, and D255, respectively. Table 11 shows that D66 contains so few instances that many algorithms achieve an accuracy of 100%. For D160, four published algorithms achieve perfect classification, namely RT + PCA + LS-SVM [19], DWPT + TE + GEPSVM [23], SWT + PCA + HPA-FNN [30], and WE + HMI + GEPSVM + RBF [33], as does the proposed "FRFE + WTT + TSVM". D255 is the most difficult dataset, and no method yields perfect classification on it. Among all algorithms, the proposed "FRFE + WTT + TSVM" yields the highest accuracy, 99.57%.
Compared with "SWT + PCA + HPA-FNN [30]", which has an accuracy of 99.45% on D255, our method increases accuracy by about 0.12%. Although the improvement is slight, it is obtained over 10 repetitions of 5-fold stratified cross-validation, which means the improvement is robust and reliable. SWT + PCA + HPA-FNN [30] used seven features and argued that seven features is the best combination, with new features not improving its accuracy. Using 14 features (double that of SWT + PCA) does not cost our method much time, since 14 features are not a burden to classifiers on current computers.
Our contributions are: (i) we are the first to propose the novel image feature "Fractional Fourier Entropy (FRFE)"; (ii) WTT is employed to select important FRFEs; and (iii) the proposed "FRFE + WTT + TSVM" system is superior to 20 state-of-the-art methods with respect to pathological brain detection.

Conclusions and Future Research
In this paper, we proposed a novel image feature, Fractional Fourier Entropy (FRFE), and then used Welch's t-test (WTT) to select important FRFEs for developing a PBD system. We tested four classifiers (NBC, SVM, GEPSVM, and TSVM). The simulation results showed that the proposed "FRFE + WTT + TSVM" yields better results than the other three proposed variants (FRFE + WTT + NBC, FRFE + WTT + SVM, and FRFE + WTT + GEPSVM) and than 20 state-of-the-art approaches. Our PBD system may be further applied to brains with more complicated pathological conditions.
In the future, research may be performed on the following points: (1) developing evaluation methods to measure the influence of different values of α and β; (2) considering the least-squares technique to further improve the performance of SVM and the NPSVMs; (3) applying the FRFE to other pattern recognition problems, such as fruit classification [61] and tea classification [62]; (4) testing kernel methods [63]; (5) introducing mutual entropy [64] to test its performance in feature selection; (6) applying our method to X-ray [65], AD [66], and CT images; (7) gathering more medical image data and testing support vector data description (SVDD) [67], which is commonly used for detecting novel data or outliers; and (8) applying swarm intelligence approaches [68] to help train classifiers.

Figure 2.
Figure 2. Illustration of how the FRFT changes with α, whose value varies from zero to one (the real and imaginary parts are shown as black and blue lines, respectively).


Figure 3 .
Figure 3. Diagram of our method.

Figure 4
Figure 4 illustrates the 25 WFRFT decomposition results for a healthy brain. Both α and β fall within the set [0.6, 0.7, 0.8, 0.9, 1.0]. The spectra are log-enhanced and mixed with pseudo-color for a clearer view.


Table 2 .
Pseudocode of our method.


Table 3 .
Mean and SD of Two Different Brains.


Table 9 .
Accuracy Results of each run.

Table 11 .
Comparison with other methods based on 10 × K-fold SCV (# stands for number).