Neighboring Discriminant Component Analysis for Asteroid Spectrum Classification

Abstract: With the rapid development of aeronautic and deep space exploration technologies, a large volume of high-resolution asteroid spectral data has been gathered, which provides diagnostic information for identifying different categories of asteroids as well as their surface composition and mineralogical properties. However, owing to the noise of observation systems and the ever-changing external observation environments, the observed asteroid spectral data always contain noise and outliers exhibiting inseparable pattern characteristics, which poses great challenges to the precise classification of asteroids. In order to alleviate the problem and to improve the separability and classification accuracy for different kinds of asteroids, this paper presents a novel Neighboring Discriminant Component Analysis (NDCA) model for asteroid spectrum feature learning. The key motivation is to transform the asteroid spectral data from the observation space into a feature subspace wherein the negative effects of outliers and noise are minimized while the key category-related knowledge in the asteroid spectral data can be well explored. The effectiveness of the proposed NDCA model is verified on real-world asteroid reflectance spectra measured over the wavelength range from 0.45 to 2.45 µm, and promising classification performance has been achieved by the NDCA model in combination with different classifier models, such as the nearest neighbor (NN), support vector machine (SVM) and extreme learning machine (ELM).

... noised samples in order to alleviate the overfitting problem and to find the significant category-related features such that the classification performance can be improved. The goals are technically achieved by simultaneously maximizing the neighboring between-class scatter, minimizing the within-class scatter and preserving the neighboring principal components. Experimental results on reflectance spectra measured across wavelengths ranging from 0.45 to 2.45 µm show the effectiveness of the proposed model in combination with different baseline classifier models, including NN, SVM and ELM. The highest classification accuracy is achieved using the ELM classifier, which also verifies the superiority of ELM for multiclass classification problems.


Introduction
Deep space exploration is a focus of space activities around the world; it aims to explore the mysteries of the universe, search for extraterrestrial life and acquire new knowledge [1][2][3]. Planetary science plays an increasingly important role in the high-quality and sustainable development of deep space exploration [4,5]. Asteroids, a special kind of celestial body revolving around the sun, are of great scientific significance for studying the origin and evolution of the solar system, exploring mineral resources and protecting the safety of the earth, owing to their large number, different individual characteristics and special orbits [6][7][8]. Studies have shown that the thermal radiation from an asteroid mainly depends on its size, shape, albedo, thermal inertia and surface roughness [9,10]. Asteroids of different types (such as the S-type, V-type, etc.) in different regions (such as the Jupiter trojans, Hungaria group, etc.) show different spectral characteristics, which establishes the foundation for identifying different kinds of asteroids via remote spectral observation [11,12]. For example, near-infrared data can reveal diagnostic compositional information, and the salient features in the 1 and 2 µm bands can be used to indicate the presence or absence of olivine and pyroxene [12]. Astronomers have developed many remote observation methods for asteroids, such as spectral and polychromatic photometry, infrared and radio radiation methods [13][14][15][16]. Thus, a large volume of asteroid visible and near-infrared spectral data has been collected with the development of space and ground-based telescope observation technologies, which has driven great progress in asteroid taxonomy based on spectral characteristics [17][18][19][20].
The Eight-Color Asteroid Survey (ECAS) is the most remarkable ground-based asteroid observation survey, which gathered the spectrophotometric observations of about 600 large asteroids [14]. However, very few small main-belt asteroids have been observed due to their faintness. With the appearance of charge-coupled device (CCD), it has been possible to study the large-scale spectral data of small main-belt asteroids with a diameter less than 1 km [21]. The first phase of the Small Main belt Asteroid Spectroscopic Survey (SMASSI) was implemented from 1991 to 1993 at the Michigan-Dartmouth-MIT Observatory [15,20]. The main objective of SMASSI was to measure the spectral properties for small and medium-sized asteroids, and it primarily focuses on the objects in the inner main belt aiming to study the correlations between meteorites and asteroids. Based on the survey, abundant spectral measurements for 316 different asteroids have been collected. In view of the successes of SMASSI, the second phase of the Small Main-belt Asteroid Spectroscopic Survey (SMASSII) mainly focused on gathering an even larger and internally consistent asteroid dataset with spectral observations and reductions, which were carried out as consistently as possible [20]. Thus, SMASSII has provided a new basis for studying the composition and structure of the asteroid belt [9].
For asteroid taxonomy, Tholen et al. applied the minimal tree method in combination with principal component analysis (PCA) in order to classify nearly 600 asteroid spectra from the ECAS [14]. For a more comprehensive and accurate classification of asteroids, DeMeo et al. developed an extended taxonomy to characterize visible and near-infrared wavelength spectra [20]. The asteroid spectral data used for the taxonomy are based on the reflectance spectral characteristics measured in the wavelength range from 0.45 to 2.45 µm with 379-688 bands. In summary, the dataset comprises 371 objects with both visible and near-infrared data. The SMASSII dataset provided most of the visible wavelength spectra, and the near-infrared spectral measurements from 0.8 to 2.5 µm were obtained by using SpeX, the low-resolution to medium-resolution near-infrared spectrograph and imager at the 3-m NASA IRTF on Mauna Kea, Hawaii [20]. A detailed description of the dataset is given in Table 1. Based on the dataset, DeMeo et al. presented the taxonomy, as well as the method and rationale, for the class definitions of different kinds of asteroids. Specifically, three main complexes, i.e., the S-complex, C-complex and X-complex, were defined based on empirical spectral characteristics/features, such as the spectral curve slope, absorption bands and so on. Nevertheless, the question of how to automatically discover the key category-related spectral characteristics/features for different kinds of asteroids remains an open problem [9,22,23]. Meanwhile, owing to the noise of observation systems and the ever-changing external conditions, the observed spectral data usually contain noise and distortions, which cause spectrum mixture due to the random perturbation of electronic observation devices. As a result, the observed asteroid spectral data often show inseparable pattern characteristics [24,25].

Remote Sens. 2021, 13, 3306
Furthermore, the observed spectral data always have wide bands, such as the visible and near-infrared bands. Thus, the reflectance at one wavelength is usually correlated with the reflectance of the adjacent wavelengths [26]. Accordingly, the adjacent spectral bands are usually redundant, and some bands may not contain discriminant information for asteroid classification. Moreover, the abundant spectral information will result in high data dimensionality containing useless or even harmful information and bring about the "curse of dimensionality" problem, i.e., under a fixed and limited number of training samples, the classification accuracy of spectral data might decrease when the dimensionality of spectral feature increases [27]. Therefore, it is necessary to develop effective low-dimensional asteroid spectral feature learning methods and find the latent key discriminative knowledge for different kinds of asteroids, which will be very beneficial for the precise classification of asteroids.
Machine learning techniques have developed rapidly in recent years for spectral data processing and applications, such as the classification and target detection [28][29][30][31][32][33][34][35]. For example, the classic PCA has been applied to extract meaningful features from the observed spectral data without using the prior label information. PCA is also useful for asteroid and meteorite spectra analysis due to the fact that many of the variables, i.e., the reflectance at different wavelengths, are highly correlated [15,20,36]. Linear discriminant analysis (LDA) can make full use of the label priors by concurrently minimizing the within-class scatter and maximizing the between-class scatter in a dimension-reduced subspace [37]. In addition to the above statistics-based methods, some geometry theory-based methods have also been proposed for the problem of data dimensionality reduction. For example, the locality preserving projections (LPP) assume that neighboring samples are likely to share similar labels, and the affinity relationships among samples should be preserved in subspace learning/dimension reduction [38]. Locality preserving discriminant projections (LPDP) have also been developed with locality and Fisher criterions, which can be seen as a combination of LDA and LPP [39,40].
In order to define the class boundaries for asteroid classification, traditional methods always determine the spectral features empirically by relying on the presence or absence of specific features, such as the spectral curve slope, absorption wavelengths and so on, which might be intricate and less reliable. Based on the well-labeled asteroid spectral dataset described in Table 1, the main objective of this paper is to study the pattern characteristics of different categories of asteroids from the perspective of data-driven machine learning and to develop an efficient asteroid spectral feature learning and classification method in a supervised fashion, as shown in Figure 1. Specifically, it is assumed that not only the specified absorption bands, such as the 1 µm and 2 µm bands, but also all the other spectral wavelengths might carry useful diagnostic information for asteroid category identification and contribute to the accurate classification of different kinds of asteroids. As a result, the spectral data spanning the visible to near-infrared wavelengths, i.e., from 0.45 to 2.45 µm, are treated as a whole in order to automatically discover the key category-related discriminative information for efficient asteroid spectral feature learning and classification by using a supervised data-driven machine learning methodology. The novelties and contributions of this paper are summarized below.
(1) Instead of empirically determining spectral features via the presence or absence of specific spectral features to define asteroid class boundaries, this paper presents a novel supervised Neighboring Discriminant Component Analysis (NDCA) model for discriminative asteroid spectral feature learning. The model simultaneously maximizes the neighboring between-class scatter and data variances and minimizes the neighboring within-class scatter, which alleviates the overfitting problem caused by outliers and enhances the discrimination and generalization ability of the model.
(2) With the neighboring discrimination learning strategy, the proposed NDCA model has stronger robustness to abnormal samples and outliers, and the generalization performance can thus be improved. In addition, the NDCA model transforms the ...

The remainder of this paper is structured as follows. Section 2 introduces related work on subspace learning/dimension reduction and machine learning classifier models. The proposed NDCA model is introduced in detail in Section 3. Section 4 contains the experimental results and discussions. The final conclusion is given in Section 5.

Notations Used in This Paper
In this paper, the observed asteroid visible and near-infrared spectroscopy dataset is denoted as $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$, comprising N spectral samples with dimensionality D from C classes. $N_i$ is the number of samples in the i-th class. The label matrix for X is denoted as $T = [t_1, t_2, \ldots, t_N] \in \mathbb{R}^{C \times N}$, with $t_i$ as the label vector for $x_i$. The label of each sample in X is coded as a C-dimensional vector: the j-th entry of $t_i$ is +1 and the remaining entries are 0, which indicates that sample $x_i$ belongs to the j-th category. The basic idea of linear low-dimensional feature learning, i.e., dimension reduction, is to automatically learn an optimal transformation matrix $P = [p_1, p_2, \ldots, p_d] \in \mathbb{R}^{D \times d}$ with $d < D$, which projects the observed spectral data from the original D-dimensional observation space into a lower d-dimensional feature subspace and yields the low-dimensional meaningful features $Y \in \mathbb{R}^{d \times N}$ of X via $Y = P^{T} X = [y_1, y_2, \ldots, y_N]$. Table 2 summarizes the important notations used in this paper. In the process of low-dimensional feature learning, the key data knowledge and information, such as the discriminative structures, should be preserved and enhanced. Meanwhile, the noise and redundant information should be removed and suppressed.

Principal component analysis (PCA) is a widely applied unsupervised statistical dimension reduction and feature learning method, which focuses on maximizing the variance of the data with significant principal components [33]. A formulation for PCA can be derived by solving the following least squares problem:

$$\min_{P} \left\| X - P P^{T} X \right\|_F^2 \quad \text{s.t.} \quad P^{T} P = I_d, \tag{1}$$

where $\|\cdot\|_F^2$ denotes the squared Frobenius norm of a matrix, and $I_d$ is an identity matrix of size d. Formula (1) is equivalent to maximizing the variance of the transformed data as follows [33]:

$$\max_{P} \operatorname{tr}\left( P^{T} X X^{T} P \right) \quad \text{s.t.} \quad P^{T} P = I_d. \tag{2}$$
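The variance-maximization view of PCA can be illustrated with a short NumPy sketch. It assumes the samples are stored as columns, as in the notation above, and that the data are mean-centered before the covariance is formed; the function name `pca_projection` and the random stand-in data are illustrative only and are not taken from the paper.

```python
import numpy as np

def pca_projection(X, d):
    """Return (P, mean) for column-sample data X of shape (D, N).

    P has shape (D, d) and holds the eigenvectors of the sample covariance
    that correspond to the d largest eigenvalues.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                               # centre each spectral band
    cov = Xc @ Xc.T / X.shape[1]                # (D, D) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    return eigvecs[:, order], mean

# Random stand-in for a 41-band spectral dataset with 200 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(41, 200))
P, mean = pca_projection(X, d=9)
Y = P.T @ (X - mean)                            # low-dimensional features, (9, 200)
```

Because the columns of `P` are orthonormal eigenvectors, the constraint $P^{T}P = I_d$ holds by construction, and the variances of the rows of `Y` appear in non-increasing order.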
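Unlike PCA, LDA is a supervised dimension reduction method that aims to maximize the separability between different classes and enhance the compactness within each class under the guidance of label information, as described below [34,[41][42][43]:

$$\max_{P} \frac{\operatorname{tr}\left( P^{T} S_B P \right)}{\operatorname{tr}\left( P^{T} S_W P \right)}, \tag{3}$$

where $S_W$ and $S_B$ are, respectively, the within-class and between-class scatter matrices of the data, which are calculated in the following way [34,[41][42][43]:

$$S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_{ij} - \mu_i)(x_{ij} - \mu_i)^{T}, \tag{4}$$

$$S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^{T}, \tag{5}$$

where $x_{ij}$ is the j-th sample of the i-th class, and $\mu_i$ and $\mu$ are the mean of the samples in the i-th class and the mean of all the samples in X, respectively.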
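As a concrete illustration of the two scatter matrices, the following NumPy sketch computes them from column-sample data and an integer label vector. It uses the textbook definitions, with the between-class term weighted by the class size; the function name `lda_scatter` and the toy data are assumptions made for this example.

```python
import numpy as np

def lda_scatter(X, labels):
    """Within-class (S_W) and between-class (S_B) scatter for X of shape (D, N)."""
    D = X.shape[0]
    mu = X.mean(axis=1, keepdims=True)          # mean of all samples
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for c in np.unique(labels):
        X_c = X[:, labels == c]                 # samples of class c
        mu_c = X_c.mean(axis=1, keepdims=True)  # class mean
        S_W += (X_c - mu_c) @ (X_c - mu_c).T
        S_B += X_c.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    return S_W, S_B

# Toy data: 6 features, 3 classes with 20 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 60))
labels = np.repeat(np.arange(3), 20)
S_W, S_B = lda_scatter(X, labels)
```

With this weighting, the two matrices add up to the total scatter of the data about the global mean, which is a convenient sanity check.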

Classifier Models for Spectral Data Classification
Classifier models, such as NN, SVM [44] and ELM [45][46][47][48], have been commonly used in the machine learning and pattern recognition communities in order to recognize and classify spectral data. In particular, the extreme learning machine (ELM) is a recently developed machine learning paradigm for generalized single hidden layer feedforward neural networks and has been widely studied and applied due to its unique characteristics, such as high learning speed, good generalization and universal approximation abilities [47]. The most noteworthy characteristic of ELM is that the weights between the input and the hidden layers are randomly generated without further adjustment. In the objective function of ELM, $\beta \in \mathbb{R}^{L \times C}$ denotes the output weights connecting the hidden layer and the output layer, $\xi = [\xi_1, \xi_2, \ldots, \xi_N]^{T} \in \mathbb{R}^{N \times C}$ is the prediction error matrix with respect to the training data, and $H \in \mathbb{R}^{N \times L}$ is the hidden layer output matrix, which is computed as

$$H = \begin{bmatrix} h(w_1^{T} x_1 + b_1) & \cdots & h(w_L^{T} x_1 + b_L) \\ \vdots & \ddots & \vdots \\ h(w_1^{T} x_N + b_1) & \cdots & h(w_L^{T} x_N + b_L) \end{bmatrix},$$

where $h(\cdot)$ is the activation function in the hidden layer, for example the sigmoid function, and $W = [w_1, \ldots, w_L] \in \mathbb{R}^{D \times L}$ and $b = [b_1, \ldots, b_L]^{T} \in \mathbb{R}^{L}$ refer to the randomly generated input weights and biases, respectively. The output weight matrix $\beta$ transforms the data from the L-dimensional hidden layer space into the C-dimensional label space and is calculated analytically.

With the optimal output weight matrix $\beta^{*}$ obtained, the predicted label for a new test sample z can be computed as

$$\operatorname{label}(z) = \arg\max_{1 \le j \le C} \left( h(z)\, \beta^{*} \right)_j,$$

where $h(z)$ is the hidden layer output for test sample z.
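A minimal ELM sketch following this description is given below. It assumes the commonly used ridge-regularized closed-form solution for the output weights; the regularization coefficient `reg`, the uniform range of the random weights and the function names are assumptions of this example and may differ from the settings used in the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elm_train(X, T, L=50, reg=10.0, seed=0):
    """Train an ELM on X (D, N) with one-hot labels T (C, N).

    Input weights W (D, L) and biases b (L,) are drawn at random and kept fixed.
    Output weights beta (L, C) come from a ridge-regularised least-squares fit
    (assumed form): beta = (H^T H + I / reg)^{-1} H^T T^T.
    """
    rng = np.random.default_rng(seed)
    D = X.shape[0]
    W = rng.uniform(-1.0, 1.0, size=(D, L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = sigmoid(X.T @ W + b)                              # hidden outputs, (N, L)
    beta = np.linalg.solve(H.T @ H + np.eye(L) / reg, H.T @ T.T)
    return W, b, beta

def elm_predict(Z, W, b, beta):
    """Predict class indices for test samples Z of shape (D, N_test)."""
    H = sigmoid(Z.T @ W + b)
    return np.argmax(H @ beta, axis=1)

# Two well-separated toy classes in 2-D.
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(3.0, 0.5, size=(2, 50)),
               rng.normal(-3.0, 0.5, size=(2, 50))])
y = np.array([0] * 50 + [1] * 50)
T = np.eye(2)[:, y]                                       # one-hot labels, (C, N)
W, b, beta = elm_train(X, T)
pred = elm_predict(X, W, b, beta)
```

Only `beta` is learned; `W` and `b` stay at their random initial values, which is what makes training a single linear solve.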

The Proposed Neighboring Discriminant Component Analysis Model: Formulation and Optimization
The remotely observed asteroid spectral data usually contain noise and outliers, which mix different categories of asteroids and make them inseparable. In addition, learning with outliers easily causes overfitting, which decreases the generalization ability of machine learning models on testing samples. Thus, the key problem is to identify the outliers, select the most valuable samples for learning the low-dimensional feature subspace and preserve the key discriminative data knowledge for different classes of asteroids.
To this end, the idea of neighboring learning is introduced to find a neighboring group of valuable samples from all the training samples as well as from the samples in each class, and the outliers and noised samples are excluded from dimension reduction learning in order to enhance the generalization ability of the model. As shown in Figure 2, the normalized asteroid spectral data are first taken as input, as in (a). Secondly, (b) finds the neighboring samples in each asteroid class in order to characterize the neighboring within-class and between-class properties of the data. Meanwhile, the neighboring samples among all the samples are found for the neighboring principal components in order to preserve the most valuable data information, as in (c). With the basic principles of (b) and (c), a clearer class boundary can be found to alleviate the overfitting problem caused by the outliers and noised samples and to enhance the neighboring and discriminative information of the data for efficient spectral feature learning, as shown in (d). In order to achieve this goal, the neighboring between-class and within-class scatter matrices need to be calculated in order to characterize the neighboring discriminative properties of the observed asteroid spectra.

Neighboring between-class scatter matrix $S_{Nb}$ computation: Firstly, calculate the global centroid $m_b = \frac{1}{N}\sum_{i=1}^{N} x_i$ of all the samples in the training dataset X and find the $N_b = R_b \cdot N$ neighboring samples to $m_b$ by using the between-class neighboring ratio $R_b$ ($0 < R_b < 1$). Thus, the global neighboring samples $X_b$ can be obtained, with class-wise counts $[N_{b1}, N_{b2}, \ldots, N_{bc}, \ldots, N_{bC}]$, where $N_{bc}$ is the number of neighboring samples in the c-th class used for computing the neighboring between-class scatter matrix. Secondly, compute the local centroid $m_{bc} = \frac{1}{N_{bc}}\sum_{j=1}^{N_{bc}} x_{bcj}$ for the c-th class, where $x_{bcj}$ is the j-th sample in the c-th class of the neighboring samples $X_b$. Finally, the neighboring between-class scatter matrix is calculated from these centroids (Equation (10)).

At the same time, the $N_b$ global neighboring samples are used to calculate the covariance matrix (Equation (11)).
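The neighbor selection and the two resulting matrices can be sketched as follows. Several details are assumptions of this example rather than the paper's stated formulas: Euclidean distance is used to rank samples, $R_b \cdot N$ is rounded to the nearest integer, the between-class terms are weighted by the class counts $N_{bc}$, and the covariance is taken about the mean of the selected neighbors. The names `neighboring_between_scatter` and `S_Nt` are hypothetical.

```python
import numpy as np

def neighboring_between_scatter(X, labels, R_b):
    """Sketch of the neighbouring between-class scatter and neighbour covariance.

    X: (D, N) column samples, labels: (N,) integer classes, R_b: ratio in (0, 1].
    Returns (S_Nb, S_Nt, idx), where idx indexes the selected neighbours X_b.
    """
    D, N = X.shape
    m_b = X.mean(axis=1, keepdims=True)                 # global centroid of all samples
    dist = np.linalg.norm(X - m_b, axis=0)              # Euclidean distance (assumed)
    n_b = max(1, int(round(R_b * N)))
    idx = np.argsort(dist)[:n_b]                        # N_b samples nearest to m_b
    X_b, y_b = X[:, idx], labels[idx]

    S_Nb = np.zeros((D, D))
    for c in np.unique(y_b):
        X_bc = X_b[:, y_b == c]
        m_bc = X_bc.mean(axis=1, keepdims=True)         # local centroid of class c
        S_Nb += X_bc.shape[1] * (m_bc - m_b) @ (m_bc - m_b).T   # N_bc weighting (assumed)

    mean_b = X_b.mean(axis=1, keepdims=True)
    S_Nt = (X_b - mean_b) @ (X_b - mean_b).T / n_b      # covariance of the neighbours
    return S_Nb, S_Nt, idx

# Toy data with one gross outlier in the last column.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
X[:, 99] += 1000.0
labels = np.repeat(np.arange(4), 25)
S_Nb, S_Nt, idx = neighboring_between_scatter(X, labels, R_b=0.9)
```

In the toy data the outlier lies far from the global centroid, so it falls outside the selected 90% of samples and does not contribute to either matrix.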
Neighboring within-class scatter matrix $S_{Nw}$ computation: Firstly, calculate the basic local centroid $m_{wc} = \frac{1}{N_c}\sum_{i=1}^{N_c} x_{ci}$ for each class of samples, where $x_{ci}$ is the i-th sample in the c-th class of X, and then find the sample group containing the $N_{wc} = R_w \cdot N_c$ neighboring samples to $m_{wc}$ in the c-th class by using the within-class neighboring ratio $R_w$ ($0 < R_w < 1$). Secondly, refine the local centroid of each class using the samples in the obtained neighboring group $X_{wc}$. Finally, compute the neighboring within-class scatter matrix (Equation (12)) about the refined centroid $m_{wc} = \frac{1}{N_{wc}}\sum_{i=1}^{N_{wc}} x_{wci}$ of each class, where $x_{wci}$ is the i-th sample of the c-th class in the neighboring group.

By jointly considering Equations (10)-(12) in a dimension-reduced subspace, the optimization problem of Equation (13) is formulated. The details of deriving Equation (13) from Equations (10)-(12) are shown in Appendix A. In Equation (13), γ and µ are the tradeoff parameters for balancing the corresponding components of the objective function, from which one can observe that, in the subspace formed by P, the goals of neighboring between-class scatter maximization, within-class scatter minimization and neighboring principal components preservation can be achieved simultaneously. Accordingly, the side effects of outliers and noised samples are suppressed to the largest extent. As a result, the global and local neighboring discriminative structures and principal components are enhanced and preserved by the neighboring learning mechanism. Furthermore, optimization problem (13) can be transformed into model (14) by introducing an equality constraint [49], in which a constant is used to ensure a unique solution for model (13). The objective function of model (14) can then be rewritten as the unconstrained objective (15) by introducing the Lagrange multiplier λ.
Then, the partial derivative of the objective function (15) with respect to P is calculated and set to zero. This results in an eigenvalue decomposition problem, from which the projection matrix $P = [p_1, p_2, \ldots, p_d]$ can be acquired: it is composed of the eigenvectors corresponding to the d largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$.
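The within-class step and the final eigen-decomposition can be sketched together as below. The within-class part follows the description above (basic centroid, neighbor selection, refined centroid). The solver is only one simple instantiation: it assumes a trace-difference criterion, maximizing $\operatorname{tr}(P^{T}(S_{Nb} - \gamma S_{Nw} + \mu S_{Nt})P)$ under $P^{T}P = I_d$, which is not necessarily the exact form of Equations (13)-(15). The function names and the symbol `S_Nt` for the neighbor covariance are hypothetical.

```python
import numpy as np

def neighboring_within_scatter(X, labels, R_w):
    """Sketch of the neighbouring within-class scatter for X of shape (D, N)."""
    D = X.shape[0]
    S_Nw = np.zeros((D, D))
    for c in np.unique(labels):
        X_c = X[:, labels == c]
        m_c = X_c.mean(axis=1, keepdims=True)            # basic local centroid
        dist = np.linalg.norm(X_c - m_c, axis=0)
        n_wc = max(1, int(round(R_w * X_c.shape[1])))
        X_wc = X_c[:, np.argsort(dist)[:n_wc]]           # neighbouring group
        m_wc = X_wc.mean(axis=1, keepdims=True)          # refined centroid
        S_Nw += (X_wc - m_wc) @ (X_wc - m_wc).T
    return S_Nw

def solve_projection(S_Nb, S_Nw, S_Nt, gamma, mu, d):
    """Top-d eigenvectors of M = S_Nb - gamma * S_Nw + mu * S_Nt (assumed criterion)."""
    M = S_Nb - gamma * S_Nw + mu * S_Nt
    M = (M + M.T) / 2.0                                  # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(M)                 # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order], eigvals[order]

# Toy data: 4 features, 3 classes with 20 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 60))
labels = np.repeat(np.arange(3), 20)
S_Nw = neighboring_within_scatter(X, labels, R_w=0.8)
```

Under the assumed criterion the matrix `M` is symmetric, so `numpy.linalg.eigh` applies and the returned columns are orthonormal. A ratio-type criterion would instead lead to a generalized eigenproblem.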
Once the above optimal projection matrix P is calculated, the training data are projected into the subspace using P in order to acquire the low-dimensional discriminative feature of the observed spectral data. Afterwards, a classifier model is trained using the dimension-reduced training data. For testing, an asteroid spectral sample with unknown label is firstly transformed into the subspace by using the optimal projection matrix P and then classified by the trained classifier model.
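The train-then-test flow described above can be sketched with a plain nearest neighbor classifier. The projection matrix used here is only a stand-in with orthonormal columns (the first d columns of an identity matrix), not a learned NDCA projection, and the function names are illustrative.

```python
import numpy as np

def project(P, X):
    """Map column samples X (D, N) into the d-dimensional subspace: Y = P^T X."""
    return P.T @ X

def nn_classify(Y_train, labels_train, Y_test):
    """1-nearest-neighbour labels for Y_test (d, N_test) given Y_train (d, N_train)."""
    diff = Y_test[:, :, None] - Y_train[:, None, :]      # (d, N_test, N_train)
    d2 = (diff ** 2).sum(axis=0)                         # squared distances
    return labels_train[np.argmin(d2, axis=1)]

# Toy data: 10 features, 3 classes with 10 samples each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 30))
labels_train = np.repeat(np.arange(3), 10)
P = np.eye(10)[:, :4]                                    # stand-in projection, d = 4

Y_train = project(P, X_train)                            # features used to fit the classifier
pred = nn_classify(Y_train, labels_train, project(P, X_train))
```

The same `P` must be applied to training and test spectra, so that both live in the same feature subspace before the classifier compares them.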

Preprocessing for the Asteroid Spectral Data
As shown in Table 3, a part of the samples described in Table 1 was used in the study. Data preprocessing was performed to preliminarily reduce the influence of noise for ease of classification. Firstly, the original spectral data were filtered and smoothed by using a data filtering method, such as the moving average filter. Secondly, the discrete spectrum measurements were fitted using a high-order polynomial. Thirdly, the fitted spectral curves within the wavelength range from 0.45 to 2.45 µm were sampled with a fixed step interval. Several examples of the original spectra, smoothed spectra and fitted spectra for different kinds of asteroids are shown in Figures 3-5, from which one can see that the abnormal noise in some spectral bands was suppressed to a certain extent.

Table 3. # Samples per class: 6, 45, 16, 16, 22, 8, 199, 17, 32; Total: 361.
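The three preprocessing steps can be sketched as follows. The window length, the polynomial degree and the edge-padding strategy are assumptions chosen for illustration, since the text specifies only a moving average filter and a high-order polynomial. The 0.05 µm sampling step is the one reported in the experimental setup.

```python
import numpy as np
from numpy.polynomial import Polynomial

def preprocess_spectrum(wavelength, reflectance, window=5, degree=8,
                        lo=0.45, hi=2.45, step=0.05):
    """Smooth, fit and resample one spectrum; returns (grid, fitted values).

    window must be odd so that the smoothed curve keeps the input length.
    """
    pad = window // 2
    padded = np.pad(reflectance, pad, mode="edge")           # repeat edge values
    smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")
    poly = Polynomial.fit(wavelength, smoothed, degree)      # least-squares fit
    n_points = int(round((hi - lo) / step)) + 1
    grid = np.linspace(lo, hi, n_points)                     # 0.45, 0.50, ..., 2.45
    return grid, poly(grid)

# Synthetic spectrum with a linear slope, used only to exercise the function.
wl = np.linspace(0.45, 2.45, 400)
refl = 1.0 + 0.2 * wl
grid, fitted = preprocess_spectrum(wl, refl)
```

With a 0.05 µm step over 0.45 to 2.45 µm, each spectrum is reduced to 41 values, which matches the feature dimensionality used in the experiments.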

Experimental Setup and Results
As previously mentioned, the smoothed asteroid spectral curves were fitted using a high-order polynomial, which was further sampled in the wavelength region from 0.45 to 2.45 µm with a step interval of 0.05 µm, yielding 41 measurements for each asteroid spectrum. In order to validate the effectiveness of the proposed method, the data from each class were first divided approximately equally into five groups, as shown in Figure 6. Afterwards, the groups of samples from different classes were merged into five folds, so that each fold contains one group of samples from every class. The data partition of the five folds is given in Table 4. Then, the 5-fold cross validation (CV) strategy was adopted for the performance evaluation of different methods. Specifically, four folds of samples were used as the training dataset, and the remaining fold was used for testing; thus, five experiments were carried out. A detailed description of the five experiment settings is shown in Table 5, and the individual and average classification accuracies of different methods on the five experiments are reported. All the experiments were conducted under the same settings and computing platform, so a fair comparison between different methods can be guaranteed. The proposed NDCA model was compared with several representative subspace learning methods, including PCA, LDA, LPP and LPDP. Moreover, the sampled raw asteroid spectral data without feature learning were also included for comparison. In addition, some baseline classifier models, such as the nearest neighbor (NN), SVM and ELM, were adopted in the experiments for the classification of the asteroid features.

Figure 6. Five-fold cross-validation scheme for asteroid spectral data: the i-th fold (i = 1, 2, ..., 5) is used for testing and the remaining four folds are used for training.
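The class-wise partition into five folds can be sketched as below. The per-class counts are the sample numbers listed for the dataset (361 in total); the random shuffling, the seed and the function name `stratified_folds` are assumptions of this example, and the actual assignment of samples to folds in Table 4 may differ.

```python
import numpy as np

def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds, class by class."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Each class is divided into n_folds nearly equal groups.
        for k, group in enumerate(np.array_split(idx, n_folds)):
            folds[k].extend(group.tolist())
    return [np.array(sorted(f)) for f in folds]

counts = [6, 45, 16, 16, 22, 8, 199, 17, 32]      # samples per class, 361 in total
labels = np.repeat(np.arange(len(counts)), counts)
folds = stratified_folds(labels)

for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Fit the feature learner and classifier on train_idx, evaluate on test_idx.
```

Because even the smallest class has at least five samples, every fold contains at least one sample of every class.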
The performance of different dimension reduction methods under gradually increasing reduced dimension d (from 2 to 41 with an interval of 1) by using different baseline classifier models, i.e., NN, SVM and ELM, is illustrated in Figures 7-9. In addition, the highest classification accuracy of the comparative methods under varying dimensions for each experiment is reported in Tables 6-8, respectively. Based on the experimental results, all the comparative methods tend to achieve improved classification performance as the feature dimension grows. For the proposed NDCA method, the classification accuracy increases first, then decreases and finally tends to be stable. This could be because too many feature dimensions might introduce redundant or harmful information and decrease the classification performance. It is also notable that the classification performance of LPP stabilizes first and then increases when the feature dimension increases to about 33 in the case of SVM and ELM. Meanwhile, when LPP is combined with the NN classifier, the classification performance increases first and finally tends to be stable. Even though LPP only achieves comparable classification performance at a relatively high dimensionality, its best classification accuracies in combination with NN, SVM and ELM reach 89.7565%, 92.8158% and 94.4711%, respectively.

Generally speaking, the proposed NDCA method yields the best classification accuracies of 94.1971%, 93.6377% and 95.1895% with different classifiers. Table 9 further summarizes the performance improvement of the proposed NDCA method over the comparative methods by using different classifiers. Specifically, the maximal performance improvement of the NDCA method is 4.9886% in comparison with the raw features and the PCA method by using NN as the classifier, and the minimal performance improvement is 0.4521% when compared with the LPDP plus ELM method. In summary, the average performance improvement over all the experimental settings is 2.045%. Therefore, the effectiveness and superiority of the proposed NDCA method can be clearly observed from the experimental verifications.
In addition, the results show that the raw data without feature learning achieve the worst classification performance among all the comparative methods. In contrast, the proposed NDCA model achieves the highest classification accuracy in combination with different classifier models. Moreover, it should be noted that the highest accuracy is achieved when the feature dimension is around nine. Thus, the optimal reduced dimension d can be searched around the total number of categories in the asteroid spectral dataset.

Table 9. Performance improvement between different pairs of methods by using different classifiers.

Columns: Classifiers; Comparison Pairs <Ours, Raw>, <Ours, PCA>, <Ours, LDA>, <Ours, LPP>, <Ours, LPDP>
Furthermore, the scatter points of the first two dimensions obtained by the different methods are visualized in Figure 10 to give an intuitive view of the low-dimensional feature learning performance. From Figure 10e, it can be observed that the scatter points derived by the proposed NDCA model show better within-class compactness and between-class separation, with relatively clearer category boundaries. In contrast, the scatter points obtained by the comparative methods are seriously mixed between different classes, especially the "K", "L" and "Q" classes, which results in lower classification performance. Accordingly, the spectral characteristics within each class and the discrimination between different classes of asteroids are better explored and enhanced by the proposed NDCA model. In combination with off-the-shelf classifier models, the class boundaries between different kinds of asteroid spectral data can be found more easily, which leads to promising generalization and classification performance.
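The visual impression of "within-class compactness" and "between-class separation" can also be summarized by a single number. The following is a minimal sketch, not part of the original experiments: it computes the ratio of the between-class scatter trace to the within-class scatter trace for a two-dimensional embedding. The function name `scatter_ratio` and the synthetic three-class data are assumptions made purely for illustration; the real input would be the first two components produced by each method.

```python
import numpy as np

def scatter_ratio(Z, y):
    """Return Tr(S_b) / Tr(S_w) for features Z (n_samples x d) and labels y.

    Larger values mean compact classes that lie far from each other.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    overall_mean = Z.mean(axis=0)
    within, between = 0.0, 0.0
    for label in np.unique(y):
        Zc = Z[y == label]
        class_mean = Zc.mean(axis=0)
        # Spread of the samples around their own class mean.
        within += ((Zc - class_mean) ** 2).sum()
        # Size-weighted distance of the class mean from the overall mean.
        between += len(Zc) * ((class_mean - overall_mean) ** 2).sum()
    return between / within

# Synthetic stand-in for two embeddings of the same three classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
centers_far = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
centers_near = centers_far / 6.0
noise = rng.normal(size=(150, 2))
well_separated = centers_far[labels] + noise   # clear class boundaries
mixed = centers_near[labels] + noise           # heavily overlapping classes

print(scatter_ratio(well_separated, labels), scatter_ratio(mixed, labels))
```

A higher ratio for one embedding than for another is consistent with, but does not replace, the classification accuracies reported in the tables.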

Figure 10. Visualization of the scatter points of the first two components acquired by different methods. By comparison, the proposed NDCA model shows better within-class compactness and between-class separation characteristics.

Analysis for NDCA Parameters
Apart from the dimensionality d of the derived feature subspace, the proposed NDCA model has several other key parameters, including the between-class neighboring ratio Rb, the within-class neighboring ratio Rw and the balance parameters γ and µ in the model formulation (13). Different parameter settings lead to different performance, so parameter sensitivity analyses were conducted to show how the classification performance varies with these parameters. Specifically, the four parameters were divided into two groups, i.e., (γ, µ) and (Rw, Rb). Among them, γ and µ were selected from the candidate set {10^g, g = −4, −3, −2, −1, 0, 1, 2, 3, 4}, while Rw and Rb were selected from the candidate set {0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1}. As shown in Figures 11-13, the average classification performance surfaces in sub-figures (a) are smoother and more stable over a wide range of parameter settings, which means that the classification is not very sensitive to the setting of the parameter pair (γ, µ). By contrast, the classification performance changes more sharply with the parameter pair (Rw, Rb).
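The two-group sensitivity analysis described above amounts to evaluating the classifier over a grid of parameter pairs. The sketch below only illustrates that procedure: the candidate sets are the ones stated in the text, while `sweep` and `toy_accuracy` are hypothetical names, and `toy_accuracy` is a made-up placeholder surface standing in for "train NDCA plus a classifier and return the average accuracy".

```python
import itertools
import numpy as np

# Candidate sets as stated in the text.
gamma_mu_values = [10.0 ** g for g in range(-4, 5)]            # 10^-4 ... 10^4
ratio_values = [round(0.5 + 0.05 * k, 2) for k in range(11)]   # 0.5 ... 1.0

def sweep(evaluate, first_values, second_values):
    """Evaluate every parameter pair; return the surface and the best pair."""
    surface = np.empty((len(first_values), len(second_values)))
    pairs = itertools.product(enumerate(first_values), enumerate(second_values))
    for (i, a), (j, b) in pairs:
        surface[i, j] = evaluate(a, b)
    i_best, j_best = np.unravel_index(np.argmax(surface), surface.shape)
    return surface, (first_values[i_best], second_values[j_best])

def toy_accuracy(rw, rb):
    # Hypothetical placeholder, NOT a result from the paper: a smooth
    # surface that peaks at (0.8, 0.9) so the sweep has something to find.
    return 1.0 - (rw - 0.8) ** 2 - (rb - 0.9) ** 2

surface, best = sweep(toy_accuracy, ratio_values, ratio_values)
print(surface.shape, best)
```

The same `sweep` call with `gamma_mu_values` for both axes would produce the (γ, µ) surface, with the other pair held fixed.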

Analysis for ELM Classifier Parameters
The former experiments show that the proposed NDCA method generally achieves higher classification accuracy in combination with ELM. As shown in formulation (6), ELM has two key hyper-parameters, i.e., the number of hidden neurons L and the balance parameter α. Figure 14 shows how the classification performance changes with different settings of L and α. In general, as the number of hidden neurons increases, the classification accuracy first increases and then tends to be stable. In the experiments, the number of hidden neurons L in ELM is empirically set to around 9000. As for the trade-off parameter α, the classification accuracy first improves when α increases from 10^−5 to 10 and then degrades when α increases from 10 to 10^5. In the experiments, α can be set to around 10, for which promising performance can be expected.

From the above experimental results, we observe the following:

(1) The benefits of feature learning for asteroid spectrum classification. In the experiments shown in Tables 6-8, the raw observed spectral data without feature learning were directly fed into the classifier models, i.e., NN, SVM and ELM, for classification. The average classification accuracies achieved by NN, SVM and ELM were 89.2085%, 91.7047% and 92.6963%, respectively, which were generally the worst among all the comparative methods. In contrast, the same classifier models achieved some improvement after feature learning. For example, LPP plus NN, SVM and ELM achieved improved classification accuracies of 89.7565%, 92.8158% and 94.4711%, respectively.
(2) The improvements obtained by the proposed NDCA model are mainly due to the following two aspects. Firstly, the NDCA model is a supervised dimension reduction method and inherits the merits of the existing methods, which can fully utilize label knowledge in order to find the key category-related information of spectral data for discriminative asteroid spectral feature learning and classification. Secondly, the introduction of the neighboring learning methodology can significantly reduce the side effects of outliers and noised samples and thereby alleviate the overfitting problem, which enhances the robustness of the learnt low-dimensional features and finally improves the generalization ability and classification performance of the proposed model in testing.

(3) The superiority of ELM. Three baseline classifier models, including NN, SVM and ELM, were used in the experiments. In particular, the best results are obtained by NDCA plus ELM with a classification accuracy of about 95.19%, which is generally superior to the compared classifier models. To the best of our knowledge, this work is the first attempt to apply ELM to asteroid spectrum classification, and very competitive performance has been achieved, which can provide new application scenarios and perspectives for the ELM community.

(4) Future work discussion. First, future work will consider employing feature selection methods in order to study the asteroid spectral characteristics. Distinct from feature learning/extraction methods, which adopt the idea of data transformation, feature/band selection methods use the idea of selection and aim to automatically select a small subset of representative spectral bands in order to remove spectral redundancy while simultaneously preserving the significant spectral knowledge.
Since feature selection is performed in the original observation space, the selected bands have clearer physical meanings and better interpretability. As a result, feature/band selection is an important technique for spectral dimensionality reduction and has room for further improvement. Second, the visualization in Figure 10 of the scatter points of the first two components acquired by different methods showed that some classes of asteroid spectra with limited training samples are seriously mixed and overlapped. One possible reason is that the numbers of training samples from different classes were unbalanced. For example, the 'S' class has 199 samples, while the 'A' class has only six. When classifying data with a complex class distribution, a regular learning algorithm has a natural tendency to favor the majority class because it assumes a balanced class distribution or equal misclassification costs. As a result, the sample imbalance problem leads to learning bias, and the generalization ability of the obtained model is thus restricted. Future work should therefore address the class imbalance problem, e.g., by sampling or algorithmic methods that establish a more balanced data distribution, which may improve the accuracy of asteroid spectral data analysis.
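Returning to the ELM hyper-parameter analysis above, the roles of the number of hidden neurons L and the trade-off parameter α can be illustrated with a compact regularized ELM. This is a sketch under stated assumptions, not the implementation used in the experiments: formulation (6) is not reproduced in this section, so the code assumes the common ridge-regularized closed form in which the output weights solve (HᵀH + I/α)β = HᵀT, with α weighting the training error. The class name `SimpleELM`, the tanh activation and the synthetic three-class data are all illustrative choices.

```python
import numpy as np

class SimpleELM:
    """Single-hidden-layer ELM: random hidden weights, ridge-solved output weights."""

    def __init__(self, n_hidden=100, alpha=10.0, seed=0):
        self.n_hidden = n_hidden  # number of hidden neurons (L in the text)
        self.alpha = alpha        # trade-off parameter (assumed to weight the fit term)
        self.seed = seed

    def _hidden(self, X):
        # Hidden-layer output matrix H; the hidden weights are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # One-hot target matrix T, one column per class.
        targets = (y[:, None] == self.classes_[None, :]).astype(float)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Assumed closed form: (H^T H + I / alpha) beta = H^T T.
        regularized_gram = H.T @ H + np.eye(self.n_hidden) / self.alpha
        self.beta = np.linalg.solve(regularized_gram, H.T @ targets)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Synthetic three-class data standing in for learned spectral features.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + 0.5 * rng.normal(size=(300, 2))
order = rng.permutation(300)
train, test = order[:200], order[200:]

model = SimpleELM(n_hidden=50, alpha=10.0).fit(X[train], y[train])
accuracy = np.mean(model.predict(X[test]) == y[test])
print(accuracy)
```

Under this assumed form, a very small α shrinks the output weights towards zero, while a very large α removes the regularization almost entirely, which is one plausible reading of the rise-then-fall behavior reported for α.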

Conclusions
This paper has introduced a novel supervised NDCA learning model for asteroid spectral feature learning and classification. The key idea is to distinguish the outliers and noised samples in order to alleviate the overfitting problem and to find the significant category-related features such that the classification performance can be improved. These goals are technically achieved by simultaneously maximizing the neighboring between-class scatter, minimizing the within-class scatter and preserving the neighboring principal components. Experimental results on reflectance spectra measured over the wavelength range from 0.45 to 2.45 µm show the effectiveness of the proposed model in combination with different baseline classifier models, including NN, SVM and ELM. The highest classification accuracy is achieved using the ELM classifier, which also supports the suitability of ELM for multiclass classification problems.
Author Contributions: All the authors made significant contributions to the study. T.G. and X.-P.L. conceived and designed the global structure and methodology of the manuscript; T.G. analyzed the data and wrote the manuscript. Y.-X.Z. and K.Y. provided some valuable advice and proofread the manuscript. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available upon request from the author.

1. Deriving $\mathrm{Tr}(P^{T} S_{Nb} P)$ from Equation (10).
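Denote $m_{bc}$ and $m_{b}$ as the local and global centroids in the original space. Similarly, $\tilde{m}_{bc}$ and $\tilde{m}_{b}$ denote the local and global centroids in the feature space, which can be calculated as $\tilde{m}_{bc} = P^{T} m_{bc}$ and $\tilde{m}_{b} = P^{T} m_{b}$ by using the subspace projection matrix $P$. The neighboring between-class scatter matrix in the feature space is described below.

$$\tilde{S}_{Nb} = \sum_{c=1}^{C} N_{c} (\tilde{m}_{bc} - \tilde{m}_{b})(\tilde{m}_{bc} - \tilde{m}_{b})^{T} \tag{A1}$$

Since $\tilde{m}_{bc} = P^{T} m_{bc}$ and $\tilde{m}_{b} = P^{T} m_{b}$, the following two formulations can be obtained via Equation (A1).

$$\tilde{S}_{Nb} = \sum_{c=1}^{C} N_{c} (P^{T} m_{bc} - P^{T} m_{b})(P^{T} m_{bc} - P^{T} m_{b})^{T} \tag{A2}$$

$$\tilde{S}_{Nb} = P^{T} \left[ \sum_{c=1}^{C} N_{c} (m_{bc} - m_{b})(m_{bc} - m_{b})^{T} \right] P \tag{A3}$$

It can be easily observed that $\sum_{c=1}^{C} N_{c} (m_{bc} - m_{b})(m_{bc} - m_{b})^{T}$ is the neighboring between-class scatter matrix in the original space, i.e., Equation (10) in the paper, restated as follows.

$$S_{Nb} = \sum_{c=1}^{C} N_{c} (m_{bc} - m_{b})(m_{bc} - m_{b})^{T} \tag{A4}$$

Thus, Equation (A3) can be rewritten as below.

$$\tilde{S}_{Nb} = P^{T} S_{Nb} P \tag{A5}$$

Furthermore, the trace of Equation (A5) is used for the optimization of the subspace projection matrix $P$, resulting in the following formulation.

$$\mathrm{Tr}(P^{T} S_{Nb} P) \tag{A6}$$

Following the above derivations from Equations (A1)-(A6), the component $\mathrm{Tr}(P^{T} S_{Nb} P)$ in Equation (13) can be obtained based on Equation (10).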
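The key step of this derivation, that the between-class scatter of the projected centroids equals the projection of the original scatter matrix, can be checked numerically. The sketch below is only a sanity check under illustrative assumptions: the centroids, the class sizes N_c and the projection matrix P are random stand-ins, and the helper name `weighted_scatter` is hypothetical. How the neighboring centroids are actually selected in Equation (10) is not modeled here, since the identity holds for any centroids.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, C = 6, 2, 3                            # original dim, subspace dim, classes
N_c = np.array([5, 8, 4])                    # illustrative class sizes
local_centroids = rng.normal(size=(C, D))    # rows play the role of m_bc
global_centroid = rng.normal(size=D)         # plays the role of m_b
P = rng.normal(size=(D, d))                  # arbitrary projection matrix

def weighted_scatter(centroids, center, weights):
    """Sum over classes of weight * (m_c - m)(m_c - m)^T, centroids stored as rows."""
    diffs = centroids - center
    return (diffs * weights[:, None]).T @ diffs

# Scatter in the original space, then projected as P^T S P.
S_Nb = weighted_scatter(local_centroids, global_centroid, N_c)
projected_scatter = P.T @ S_Nb @ P

# Scatter computed directly from the projected centroids P^T m.
S_Nb_feature = weighted_scatter(local_centroids @ P, global_centroid @ P, N_c)

print(np.allclose(S_Nb_feature, projected_scatter))
```

Because the two matrices agree, their traces agree as well, which is the quantity that enters the objective.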

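2. Deriving $\mathrm{Tr}(P^{T} X_{b} X_{b}^{T} P)$ from Equation (11).

Denote $\tilde{x}_{bi}$ as the low-dimensional feature of $x_{bi}$ projected by $P$, i.e., $\tilde{x}_{bi} = P^{T} x_{bi}$, and let $N_{b}$ be the number of samples in $X_{b}$. With the idea of PCA, the variance of the projected data, expressed through the following matrix, is maximized.

$$\sum_{i=1}^{N_{b}} \tilde{x}_{bi} \tilde{x}_{bi}^{T} \tag{A7}$$

$$\sum_{i=1}^{N_{b}} (P^{T} x_{bi})(P^{T} x_{bi})^{T} \tag{A8}$$

Equation (A8) can be transformed into the following form.

$$P^{T} \left[ \sum_{i=1}^{N_{b}} x_{bi} x_{bi}^{T} \right] P \tag{A9}$$

Here, $\sum_{i=1}^{N_{b}} x_{bi} x_{bi}^{T}$ is the covariance matrix of the dataset $X_{b}$ shown in Equation (11) and can be expressed as $X_{b} X_{b}^{T}$. Therefore, Equation (A9) is formulated in the following form.

$$P^{T} X_{b} X_{b}^{T} P \tag{A10}$$

Furthermore, the trace of Equation (A10) is used for optimization as described below.

$$\mathrm{Tr}(P^{T} X_{b} X_{b}^{T} P) \tag{A11}$$

In this way, the component $\mathrm{Tr}(P^{T} X_{b} X_{b}^{T} P)$ in Equation (13) is obtained based on Equation (11) via the derivations from Equations (A7)-(A11).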
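The PCA-style term can be illustrated numerically as well. The following sketch assumes a centered data matrix with samples stored as columns and a projection matrix with orthonormal columns; the orthonormality constraint is an assumption made here for the comparison and is not stated in this section. It checks that Tr(Pᵀ X_b X_bᵀ P) equals the total squared norm of the projected samples, and that the leading eigenvectors of X_b X_bᵀ give a value at least as large as a random orthonormal basis. All variable names and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, d = 5, 40, 2
# Synthetic data with unequal variance per dimension; columns are samples.
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.1])
X_b = rng.normal(size=(D, N)) * scales[:, None]
X_b -= X_b.mean(axis=1, keepdims=True)       # center each dimension

def trace_objective(P, X):
    """Tr(P^T X X^T P) for a projection matrix P of shape (D, d)."""
    return np.trace(P.T @ X @ X.T @ P)

# Leading eigenvectors of X X^T (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(X_b @ X_b.T)
P_pca = eigvecs[:, -d:]

# A random orthonormal basis for comparison (reduced QR gives a D x d matrix).
P_rand, _ = np.linalg.qr(rng.normal(size=(D, d)))

# Total squared norm of the projected samples P^T x_bi.
projected_energy = np.sum((P_pca.T @ X_b) ** 2)

print(trace_objective(P_pca, X_b), trace_objective(P_rand, X_b))
```

With orthonormal columns, the value reached by the leading eigenvectors equals the sum of the d largest eigenvalues, which is the familiar PCA result that this term of the objective builds on.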

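3. Deriving $\mathrm{Tr}(P^{T} S_{Nw} P)$ from Equation (12).

Denote $x_{wc}$ and $m_{wc}$ as the samples and within-class centroid for the $c$-th class in the original space. Similarly, $\tilde{x}_{wc}$ and $\tilde{m}_{wc}$ denote the samples and within-class centroid for the $c$-th class in the feature space, which can be calculated as $\tilde{x}_{wc} = P^{T} x_{wc}$ and $\tilde{m}_{wc} = P^{T} m_{wc}$ using the projection matrix $P$. The neighboring within-class scatter in the feature space is described below.

$$\tilde{S}_{Nw} = \sum_{c=1}^{C} \sum_{i=1}^{N_{wc}} (\tilde{x}_{wc} - \tilde{m}_{wc})(\tilde{x}_{wc} - \tilde{m}_{wc})^{T} \tag{A12}$$

Substituting $\tilde{x}_{wc} = P^{T} x_{wc}$ and $\tilde{m}_{wc} = P^{T} m_{wc}$ into Equation (A12), the following two formulations can be successively obtained.

$$\tilde{S}_{Nw} = \sum_{c=1}^{C} \sum_{i=1}^{N_{wc}} (P^{T} x_{wc} - P^{T} m_{wc})(P^{T} x_{wc} - P^{T} m_{wc})^{T} \tag{A13}$$

$$\tilde{S}_{Nw} = P^{T} \left[ \sum_{c=1}^{C} \sum_{i=1}^{N_{wc}} (x_{wc} - m_{wc})(x_{wc} - m_{wc})^{T} \right] P \tag{A14}$$

It can be observed that $\sum_{c=1}^{C} \sum_{i=1}^{N_{wc}} (x_{wc} - m_{wc})(x_{wc} - m_{wc})^{T} = S_{Nw}$ is the neighboring within-class scatter matrix in the original space, i.e., Equation (12). Thus, Equation (A14) can be rewritten as described below.

$$\tilde{S}_{Nw} = P^{T} S_{Nw} P \tag{A15}$$

Similarly, the trace of Equation (A15) is used for the optimization of the subspace projection matrix $P$.

$$\mathrm{Tr}(P^{T} S_{Nw} P) \tag{A16}$$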