Study on Mutual Information and Fractal Dimension-Based Unsupervised Feature Parameters Selection: Application in UAVs

In this study, the redundant and irrelevant features contained in a multi-dimensional feature parameter set reduce the information fusion performance of subspace learning algorithms. To solve this problem, a mutual information (MI) and fractal dimension-based unsupervised feature parameter selection method is proposed. The key to this method is an importance ordering algorithm that comprehensively considers the relevance and redundancy of features; a fractal dimension-based feature parameter subset evaluation criterion is then adopted to obtain the optimal feature parameter subset. To verify the validity of the proposed method, a brushless direct current (DC) motor performance degradation test was designed. Vibration sample data recorded during motor performance degradation served as the data source, and motor health-fault diagnosis capacity and motor state prediction effect were the evaluation indexes used to compare the information fusion performance of the subspace learning algorithms before and after the use of the proposed method. According to the comparison results, the proposed method is able to eliminate highly redundant feature parameters that are weakly correlated with the rest of the feature parameter set, thereby enhancing the information fusion performance of the subspace learning algorithms.


Introduction
With the development of scientific and technological research, research objects in various fields such as mechanical engineering, data mining, image processing, information retrieval, and genome engineering are becoming increasingly complex. Therefore, the volume of experimentally acquired data, such as product fault data, genetic data, and high-definition image information, has increased exponentially, as has the number of feature dimensions [1]. Multidimensional feature parameters usually exhibit sparsity. The information carried by different feature parameters overlaps and complements each other, and data description faces various problems, such as poor overall identification, heavy calculation, difficulty of visualization, and incorrect conclusions. To this end, subspace learning algorithms, such as Principal Component Analysis (PCA) [2], Kernel Principal Component Analysis (KPCA) [3], Linear Discriminant Analysis (LDA) [4], Locality Preserving Projections (LPP) [5], and Locally Linear Embedding (LLE) [6], have gradually been applied to the information fusion of multidimensional feature data. The remainder of this paper is organized as follows. Section 2 introduces mutual information, fractal dimension, and the proposed UFS-MIFD method; Sections 3 and 4 describe the motor vibration data acquisition and the extraction and selection of motor vibration feature parameters; Section 5 evaluates the output of the subspace learning algorithm obtained in Section 4 from the perspectives of the motor health-fault diagnosis effect and motor state prediction; conclusions of this study and prospects for further studies are presented in Section 6.

Mutual Information (MI)
Mutual information is defined based on information entropy. It measures the interdependence between two features; that is, it represents the information shared by both features. Suppose that there is a feature parameter set F comprising n feature parameters f_1, f_2, ..., f_n. According to information entropy theory, the mutual information between feature parameters f_i and f_j can be defined as:

I(f_i; f_j) = H(f_i) − H(f_i | f_j)    (1)

where H(f_i) is the information entropy of feature f_i (see Equation (2)) [16,17]:

H(f_i) = −Σ P(f_i) log P(f_i)    (2)

in which P(f_i) is the probability of feature variable f_i taking each of its possible values, measuring the uncertainty of the value of f_i; H(f_i | f_j) is the conditional entropy (see Equation (3)), namely the uncertainty remaining in f_i when the value of another feature f_j is known:

H(f_i | f_j) = −Σ_{f_j} P(f_j) Σ_{f_i} P(f_i | f_j) log P(f_i | f_j)    (3)

In fact, however, the relevance between the feature parameters in the feature parameter set and their redundancy cannot be measured directly by MI alone, for which the mRMR criterion from the supervised setting is required to measure the relevance and redundancy of features.
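As an illustration of these definitions, the sketch below estimates MI between two sampled feature parameters with a simple histogram (plug-in) estimator. The bin count and the use of base-2 logarithms are assumptions for illustration, not choices made in this study.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in estimate of I(x; y) in bits from samples, via a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probability P(x, y)
    px = pxy.sum(axis=1)                       # marginal P(x)
    py = pxy.sum(axis=0)                       # marginal P(y)
    nz = pxy > 0                               # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])))

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
noise = rng.normal(size=5000)
# A feature shares far more information with its own copy than with independent noise.
print(mutual_information(a, a), mutual_information(a, noise))
```

Because I(f_i; f_i) = H(f_i), the first value approximates the (discretized) entropy of the feature, while the second stays near zero up to estimation bias.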

Fractal Dimension
Fractals are ubiquitous in nature. Because a dataset contains a limited number of data points, it shows fractal features only within a certain scale range, namely the scale-free interval, in which the local distribution and the global distribution of the dataset share a similar structure or properties. In this case, the dataset can be analyzed using fractal theory [13,14,19-21]. FD is the quantitative index of fractal theory. A variety of methods can be used to calculate the FD of a dataset, of which the box-counting method is easy to implement and widely used; FD was therefore calculated using the box-counting method in this paper. With this method, the dataset is covered using hypercubes with a scale of ε, thereby obtaining the FD of the dataset. Within the scale-free interval [ε_1, ε_2], the FD of a feature parameter set X with N dimensions can be calculated using the following Equation (4):

FD = −lim_{ε→0} [ln N(ε) / ln ε]    (4)

where ε is the side length of the hypercube and N(ε) is the minimum number of hypercubes with side length ε that cover X. The points (ln ε, ln N(ε)) are plotted in double-logarithmic coordinates based on the equation above, and the least squares method is used to fit the points within the scale-free interval [ε_1, ε_2]; the slope of the fitted line gives the FD of the dataset.
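The box-counting procedure can be sketched as follows. The set of candidate box sizes standing in for the scale-free interval is a hypothetical example, not the values used in the study.

```python
import numpy as np

def box_counting_fd(X, epsilons):
    """Estimate the fractal dimension of a point set X (n_samples x n_dims).

    For each box side length eps, count the occupied hypercubes N(eps); the FD
    is the least-squares slope of log N(eps) against log(1/eps).
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)  # normalise to the unit cube
    counts = []
    for eps in epsilons:
        cells = np.floor(X / eps).astype(int)              # box index of each point
        counts.append(len({tuple(c) for c in cells}))      # number of occupied boxes
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(epsilons)), np.log(counts), 1)
    return slope

line = np.column_stack([np.linspace(0, 1, 4000)] * 2)      # points on a straight line
print(round(box_counting_fd(line, [1/4, 1/8, 1/16, 1/32]), 2))  # → 1.0 (a line has FD 1)
```

A filled planar region estimated the same way comes out close to 2, matching the intuition that FD tracks the intrinsic dimension of the data.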

UFS-MIFD Method
The fundamental theories mentioned in Section 2.1 were extended in this paper, and a UFS-MIFD algorithm was developed by drawing on the mRMR idea from the supervised setting. To begin with, the relevancy, conditional relevancy, and redundancy between feature parameters [7] were defined and calculated; considering them together yielded an mRMR-style criterion for feature parameter importance ordering, by which the feature parameters contained in the feature parameter set were ordered. The less important a feature parameter, the lower its relevancy to the overall feature parameter set and the higher its redundancy. Next, feature subsets of the ordered parameter set were selected according to the FD-based feature subset evaluation criterion, thereby eliminating the feature parameters with low relevancy and high redundancy from the feature parameter set. The algorithmic process is as follows. First, the feature parameters in the n-dimensional original feature parameter set F = [f_1, f_2, ..., f_n] were ordered by importance stepwise, with the ordered feature set G initialized as an empty set.
Step 1: The average MI between the whole feature parameter set F and every feature f_i (i = 1, 2, ..., n) was calculated using Equation (5):

Rel(f_i) = (1/n) Σ_{k=1}^{n} I(f_i; f_k)    (5)

Thus, the first important feature in G was g_1 = f_{l_1}, where l_1 = arg max_{1≤i≤n} Rel(f_i). This feature minimized the uncertainty of the remaining features in F.
Step 2: To obtain the second important feature in G, F = [f_1, f_2, ..., f_n] was replaced by F = [f_1, f_2, ..., f_j, ..., f_{n−1}] (i.e., g_1 was removed). Each feature f_j, where j = 1, 2, ..., n − 1, was taken in turn from F to calculate its relevancy Rel(f_j) with F, the conditional relevancy Rel(g_1 | f_j) between g_1 in G and f_j, and the redundancy Red(f_j; g_1) of f_j with respect to g_1, of which Rel(f_j) was defined as the average MI between f_j and F [7]:

Rel(f_j) = (1/n) [ H(f_j) + Σ_{1≤k≤n−1, k≠j} I(f_j; f_k) ]    (6)

where H(f_j) signifies the information f_j contains and Σ_{k≠j} I(f_j; f_k) is the information shared by f_j and the other parameters in F. The larger Σ_{k≠j} I(f_j; f_k) was, the less new information the other parameters could provide. Therefore, if the feature parameter with the largest Rel(f_j) was selected, the information loss in the corresponding parameter set was minimal. The conditional relevancy Rel(g_1 | f_j) between f_j and g_1 (Equation (7)) and the redundancy Red(f_j; g_1) of f_j with respect to g_1 (Equation (8)) were defined as in [7]. Thus, the importance evaluation criterion E(f_j) (Equation (9)) was obtained by taking the relevance between f_j and F and the redundancy of f_j with respect to G into overall consideration. Supposing l_2 = arg max_{1≤j≤n−1} E(f_j), f_j ∈ F, the second feature in G was g_2 = f_{l_2}.
Step 3: Similarly, to obtain the p-th important feature in G, F was replaced by F = [f_1, f_2, ..., f_j, ..., f_{n−p+1}]. Each feature f_j, where j = 1, 2, ..., n − p + 1, was taken in turn from F. The relevance Rel(f_j) between f_j and F, the conditional relevance Rel(g_m | f_j) between each g_m in G (m = 1, 2, ..., p − 1) and f_j, and the redundancy Red(f_j; g_m) of f_j with respect to g_m were calculated using Equations (6)-(8). The importance evaluation criterion E for feature parameter f_j was then obtained, as in Step 2, by taking the relevance between f_j and F and the redundancy of f_j with respect to G into overall consideration. Supposing l_p = arg max_{1≤j≤n−p+1} E(f_j), the p-th feature in G was g_p = f_{l_p}. Step 3 was repeated until all the feature parameters in the original feature parameter set F were ordered by their importance, that is, until the ordered feature parameter set G was obtained.
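The ordering loop of Steps 1-3 can be sketched as below. This is a deliberately simplified variant: the score uses only average-MI relevance minus average-MI redundancy, omitting the conditional relevancy term of the criterion E, and the histogram bin count is an assumption.

```python
import numpy as np

def mi_matrix(F, bins=8):
    """Pairwise plug-in mutual information (bits) between the columns of F."""
    n = F.shape[1]
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pxy, _, _ = np.histogram2d(F[:, i], F[:, j], bins=bins)
            pxy /= pxy.sum()
            px, py = pxy.sum(1), pxy.sum(0)
            nz = pxy > 0
            M[i, j] = np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))
    return M

def order_features(F, bins=8):
    """Greedy importance ordering: highest average MI with the unordered set,
    penalised by redundancy with the already-ordered set G."""
    M = mi_matrix(F, bins)
    remaining, ordered = list(range(F.shape[1])), []
    while remaining:
        def score(j):
            rel = M[j, remaining].mean()                    # relevance (self-MI included)
            red = M[j, ordered].mean() if ordered else 0.0  # redundancy w.r.t. chosen set
            return rel - red
        best = max(remaining, key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered

rng = np.random.default_rng(2)
base = rng.normal(size=(2000, 1))
F = np.hstack([base, base + 0.01 * rng.normal(size=(2000, 1)),  # near-duplicate of column 0
               rng.normal(size=(2000, 2))])                      # two independent noise columns
print(order_features(F))
```

The near-duplicate column is pushed to the end of the ordering: it is highly relevant but almost entirely redundant once its twin has been selected, which is exactly the behavior the criterion is designed to produce.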
Step 4: On that basis, subsets of the ordered feature parameter set G were selected using the FD-based feature parameter subset evaluation criterion proposed in this study. The main idea was to eliminate, one at a time, the feature parameter with the least influence on the feature parameter set, retaining the feature parameter subsets for which the difference between the local fractal dimension and the overall fractal dimension satisfied a certain threshold. The steps are given as follows: (1) The FD of the N-dimensional ordered feature parameter set G was calculated, denoted as frac(G).
(2) With the N-th feature parameter g_N eliminated from G, the remaining N − 1 feature parameters constituted a new feature parameter subset S_{N−1}. To distinguish it from frac(G), the fractal dimension frac(S_{N−1}) of S_{N−1} was named the local fractal dimension. The difference r = frac(G) − frac(S_{N−1}) was then calculated. If |r| ≤ η (η being the threshold parameter), S_{N−1} was considered similar to G: although the N-th feature parameter had been eliminated, its removal made no appreciable difference to G, which suggested that it was a highly redundant parameter weakly correlated with G. (3) Let frac(G) = frac(S_{N−1}), G = G − {g_N}, and N = N − 1. The calculation in step (2) was repeated until |r| > η. At that point, the current feature parameter subset was the optimal feature parameter subset.
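Step 4's backward elimination can be sketched as follows, with a compact box-counting frac() standing in for the FD computation. The box-size set, the threshold value, and the toy data (two informative features followed by exact duplicates) are assumptions for illustration.

```python
import numpy as np

def frac(X, epsilons=(1/4, 1/8, 1/16, 1/32)):
    """Box-counting fractal dimension: slope of log N(eps) vs log(1/eps)."""
    X = np.asarray(X, float)
    X = (X - X.min(0)) / (np.ptp(X, axis=0) + 1e-12)
    counts = [len({tuple(c) for c in np.floor(X / e).astype(int)}) for e in epsilons]
    return np.polyfit(np.log(1 / np.asarray(epsilons)), np.log(counts), 1)[0]

def select_subset(G, eta=0.05):
    """Drop the least-important (last) feature while the FD change stays within eta."""
    fd = frac(G)
    while G.shape[1] > 1:
        candidate = G[:, :-1]                  # eliminate the current last feature g_N
        r = fd - frac(candidate)
        if abs(r) > eta:                       # FD changed too much: keep the feature, stop
            break
        G, fd = candidate, frac(candidate)     # feature was redundant: accept smaller subset
    return G

rng = np.random.default_rng(3)
x, y = rng.uniform(size=3000), rng.uniform(size=3000)
# importance-ordered set: two informative features, then exact duplicates of them
G = np.column_stack([x, y, x.copy(), y.copy()])
S = select_subset(G, eta=0.05)
print(S.shape)  # → (3000, 2): the duplicated columns are eliminated
```

Removing a duplicated column leaves the box counts, and hence the FD, unchanged (r = 0), while removing an informative column drops the FD by roughly one, which trips the threshold and stops the elimination.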
The flow diagram of the proposed method is shown in Figure 2.



Motor Vibration Data Acquisition and Signal Analysis
In this paper, the power motor (the U8 disc-type brushless DC motor from T-MOTOR) of an unmanned multi-rotor gyroplane was taken as the research object, and a test was designed to monitor the vibration signals during motor operation. These vibration signals were used as the sample data for characterizing motor performance degradation and verifying the proposed method. The test system is shown in Figure 3. The working process was as follows: the single-chip microcomputer, controlled by the control module of the computer, sent pulse-width modulation (PWM) signals to the digital speed regulator that controlled motor operation; motor vibration signals along the X, Y, and Z-axes were acquired using the acceleration sensor and stored in the storage module of the computer; the modules of the test system were powered by the system power unit. The test was carried out at a 22.2 V rated operating voltage and 100% throttle, and the test conditions are shown in Table 1. This motor performance degradation test lasted 1062 h, during which 1416 sample signals (each lasting 0.5 s) were captured and recorded at a time interval of 45 min along the X, Y, and Z-axes.
As shown in Figure 4, the motor sample under test ran basically stably during 0-1016 h, but an abrupt change of its operating state was observed during 1017-1062 h. Such abnormality continued without any sign of weakening or disappearing. As shown in Figure 5, electron microscopy suggested noticeable abrasion on the surfaces of the inner and outer bearing races and bearing balls of the motor sample under test, which indicated that the motor sample under test had failed. Therefore, the motor vibration data acquired during 0-1016 h was taken as the initial input data.

Motor Vibration Feature Extraction and Selection
The features of vibrational data acquired during motor operation were extracted from the perspectives of degradation description and life evaluation. In this study, the feature parameter extraction methods included time domain feature parameter extraction method [22], frequency domain feature parameter extraction method [23], wavelet packet band energy (WPBE) feature parameter extraction method [24], and entropy measure-based feature parameter extraction method [25]. The commonly used time domain feature parameters were mean value, variance (VAR), peak, root mean square (RMS), skewness, kurtosis, pulse, margin, waveform, and peak value; the commonly-used frequency domain feature parameters included gravity frequency (GF), mean-square frequency (MSF), and frequency variance (FV). Entropy-based feature parameters included amplitude spectrum entropy (ASE) and Hilbert marginal spectrum entropy (HMSE).
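A few of the named feature parameters can be computed as in this sketch. The sampling rate, record length, and exact formulas (e.g. the crest factor written as peak/RMS, and FV as the spectral variance MSF − GF²) follow common usage and are assumptions rather than definitions taken from this study.

```python
import numpy as np

def time_domain_features(x):
    """A few of the time-domain feature parameters listed above, for one record."""
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "mean": np.mean(x),
        "var": np.var(x),
        "peak": np.max(np.abs(x)),
        "rms": rms,
        "skewness": np.mean((x - x.mean()) ** 3) / np.std(x) ** 3,
        "kurtosis": np.mean((x - x.mean()) ** 4) / np.std(x) ** 4,
        "crest": np.max(np.abs(x)) / rms,            # peak factor
    }

def frequency_domain_features(x, fs):
    """Gravity frequency, mean-square frequency, and frequency variance."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    p = power / power.sum()                          # normalised power spectrum
    gf = np.sum(freqs * p)                           # spectral centroid (GF)
    msf = np.sum(freqs ** 2 * p)                     # mean-square frequency (MSF)
    return {"GF": gf, "MSF": msf, "FV": msf - gf ** 2}

fs = 2000
t = np.arange(0, 0.5, 1 / fs)                        # a 0.5 s record, as in the test
x = np.sin(2 * np.pi * 120 * t) + 0.1 * np.random.default_rng(4).normal(size=t.size)
print(time_domain_features(x)["rms"], frequency_domain_features(x, fs)["GF"])
```

For this synthetic 120 Hz tone plus weak noise, the gravity frequency lands near 120 Hz and the RMS near the sine's 0.707, which is a quick sanity check of the definitions.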
With the aforementioned feature parameter extraction methods, the feature parameters of the vibration data along the X, Y, and Z-axes were extracted, thus obtaining triaxial 24-dimensional feature parameters. The triaxial operating state features of the motor under test are shown in Figure 6 (taking RMS, MSF, and Hilbert-Huang transform (HHT) energy spectrum entropy as examples). It can be seen that the feature parameters along the X, Y, and Z-axes differ from each other. According to the definition of mutual information given in Section 2.1, the information shared by the feature parameters along the X, Y, and Z-axes was measured using the mutual information index. The distribution of mutual information between the various feature parameters is shown in Figure 7 (taking the X-axis as an example), where the horizontal axis denotes an arbitrary combination of two of the 24-dimensional feature parameters along the X-axis; there are thus 576 combinations. Each point represents the mutual information between two feature parameters in the 24-dimensional feature parameter set of the motor along the X-axis, with its numerical value shown by gradient colors. According to the calculations, the mutual information between the feature parameters along the X-axis was larger than 0, and the numerical value of mutual information between any two feature parameters differed, which indicated that the information carried by the various feature parameters along the X-axis overlapped, with a certain relevance between them. Similar calculations showed that the mutual information between the feature parameters along the Y and Z-axes, again with differing numerical values, was also larger than 0, evidencing that the information along the Y and Z-axes likewise overlapped with a certain relevance.
The UFS-MIFD algorithm proposed in Section 2.2 was used to order the original feature parameter sets of the motor under test along the X, Y, and Z-axes by importance. The results of the importance ordering of feature parameters along the three axes, namely G_X, G_Y, and G_Z, are shown in Figure 8a-c, respectively.
It can be seen that the peak was the most important feature parameter in the original feature parameter sets along the X and Y-axes, while MSF was the most important feature parameter in the original feature parameter set along the Z-axis. Figure 8 also suggests significant differences between the feature parameters in the feature parameter sets along the three axes, which reflects the difference between feature parameters along the various axes.
The importance-ordered feature parameter sets of the motor under test along the X, Y, and Z-axes, namely G_X, G_Y, and G_Z, were evaluated based on the feature parameter subset evaluation criterion described in Step 4 of Section 2.2, with the threshold parameter η = 0.05. Eventually, the feature subset S_X of the X-axis contained the first 17 feature parameters of G_X; similarly, the feature subset S_Y contained the first 16 feature parameters of G_Y, and the feature subset S_Z contained the first 13 feature parameters of G_Z, as shown in Table 2.
It is generally believed that the major feature information can be covered by the first two-dimensional feature parameters fused by a subspace learning method. In this study, the operating state information of the motor under test was fused, following the process of feature information fusion based on subspace learning shown in the third part of Figure 9, using subspace learning methods such as KPCA [3], PCA [2], LPP [5], and LDA [4]. Thus, the two-dimensional integrated feature parameters of the motor operating states were obtained. The final fusion result is shown in Figure 9.
It could be seen that the motor operating degradation paths described by KPCA, PCA, and LPP fluctuated less than that by LDA, which evidenced that the KPCA, PCA, and LPP performed better in describing the motor operating state than LDA.

Health-Fault Diagnosis of Motor
As shown in Figure 10, the "health-fault" states of the motor under test were identified based on the feature fusion results of the motor operating state obtained in Section 4. Before the use of UFS-MIFD, information fusion of the original feature parameter set was performed using the aforementioned four subspace learning methods; the health-fault states obtained from this information fusion, plotted against the two-dimensional integrated feature parameters F_1 and F_2, are shown in Figure 10a. After the use of UFS-MIFD, information fusion of the optimal feature parameter subsets S_X, S_Y, and S_Z was performed using the same four subspace learning methods; the "health-fault" states obtained, plotted against the two-dimensional integrated feature parameters F_1* and F_2*, are shown in Figure 10b. It can be seen that an even better health-fault state diagnosis was obtained using the two-dimensional integrated motor parameters after feature selection. In the following, a quantitative evaluation of the diagnostic results is made. The quantitative evaluation of the health-fault state diagnosis shown in Figure 10 was carried out using the cluster evaluation index D [26]:

D = [tr(S_w1) + tr(S_w2)] / tr(S_b)

where S_w1 and S_w2 represent the within-class scatter matrices (covariance matrices) of the health and fault state samples, which characterize the distribution of the state sample points around their mean values; tr(S_w1) and tr(S_w2) are the traces of the within-class scatter matrices of the two state samples, and a smaller value means a more concentrated internal distribution of the state samples and better aggregation; S_b is the between-class scatter matrix of the health and fault state samples, which characterizes the distribution of the two classes of state samples in the space.
The expression of S_b is given as follows:

S_b = Σ_{i=1}^{2} P(i) (m_i − m)(m_i − m)^T

where P(i) is the prior probability of the i-th class of state samples, m_i is the mean vector of the i-th class, and m is the overall mean vector; tr(S_b) is the trace of the between-class scatter matrix of the two classes of state samples. A larger tr(S_b) suggests a more scattered distribution of the two classes of state samples, which helps to distinguish the motor states. Therefore, the health-fault state diagnosis evaluation index D can be expressed as the ratio between the sum of the traces of the within-class scatter matrices of the two classes of state samples and the trace of the between-class scatter matrix. A smaller D indicates better efficacy of the subspace learning algorithm in distinguishing the health-fault states. The evaluation results for the health-fault state diagnosis shown in Figure 10 are given in Table 3. It can be seen from Table 3 that the information fusion performance of the four subspace learning methods (KPCA, PCA, LPP, and LDA) improved after using UFS-MIFD for feature selection, which enabled them to distinguish the motor health-fault states more correctly and clearly. In addition, the degree of performance enhancement is related to the choice of the subspace learning algorithm.
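The cluster evaluation index D can be sketched as below for two labelled sample clouds; the synthetic "health" and "fault" clusters are illustrative, not the motor data.

```python
import numpy as np

def cluster_index_D(health, fault):
    """D = (tr(Sw1) + tr(Sw2)) / tr(Sb): smaller means tighter, better-separated classes."""
    m1, m2 = health.mean(axis=0), fault.mean(axis=0)
    m = np.vstack([health, fault]).mean(axis=0)        # overall mean
    p1 = len(health) / (len(health) + len(fault))      # prior probability P(1)
    p2 = 1.0 - p1
    Sw1 = np.cov(health, rowvar=False)                 # within-class scatter (covariance)
    Sw2 = np.cov(fault, rowvar=False)
    Sb = p1 * np.outer(m1 - m, m1 - m) + p2 * np.outer(m2 - m, m2 - m)
    return (np.trace(Sw1) + np.trace(Sw2)) / np.trace(Sb)

rng = np.random.default_rng(5)
health = rng.normal(0, 0.3, size=(200, 2))
far_fault = rng.normal(5, 0.3, size=(200, 2))          # well separated from health
near_fault = rng.normal(1, 0.3, size=(200, 2))         # overlapping with health
print(cluster_index_D(health, far_fault), cluster_index_D(health, near_fault))
```

The well-separated pair yields a much smaller D than the overlapping pair, matching the interpretation that a smaller D means clearer health-fault discrimination.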

State Prediction of Motor
Motor state prediction was conducted using the Elman neural network prediction method based on the discussion above. As shown in Figure 11, Elman is a typical dynamic recurrent neural network. Unlike common neural network structures, Elman additionally contains an association (context) layer designed to memorize the output value of the hidden layer at the previous moment. It is equivalent to an operator with a one-step delay, which provides the whole network with a dynamic memory function. The mathematical model of the Elman neural network is as follows:

x(k) = f(ω^x x_c(k) + ω^u u(k − 1))
x_c(k) = x(k − 1) + α x_c(k − 1)
y(k) = g(ω^y x(k))

where u(k − 1) is the input of the input layer node; x(k) is the output of the hidden layer node; y(k) is the output of the output layer node; x_c(k) is the feedback state vector; ω^x, ω^u, and ω^y are the connection weight matrices from the association layer to the hidden layer, from the input layer to the hidden layer, and from the hidden layer to the output layer, respectively; g(·) is the transfer function of the neurons in the output layer; f(·) is the transfer function of the neurons in the hidden layer, for which the Sigmoid function is usually used; and α is the self-feedback gain factor, where 0 < α < 1.
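A forward pass of this model can be sketched as follows. The weight shapes, the tanh hidden transfer, and the linear output layer are assumptions for illustration; training (weight fitting) is omitted.

```python
import numpy as np

def elman_forward(u_seq, Wx, Wu, Wy, alpha=0.5):
    """One forward pass of the Elman model: the association (context) layer feeds
    back the previous hidden state, giving the network a one-step memory."""
    hidden = Wx.shape[0]
    x_prev = np.zeros(hidden)                  # x(k-1)
    x_c = np.zeros(hidden)                     # context state x_c
    outputs = []
    for u in u_seq:
        x_c = x_prev + alpha * x_c             # x_c(k) = x(k-1) + alpha * x_c(k-1)
        x = np.tanh(Wx @ x_c + Wu @ np.atleast_1d(u))  # hidden layer output x(k)
        y = Wy @ x                             # linear output layer y(k)
        outputs.append(float(y[0]))
        x_prev = x
    return outputs

rng = np.random.default_rng(6)
Wx = rng.normal(0, 0.3, (4, 4))                # context -> hidden
Wu = rng.normal(0, 0.3, (4, 1))                # input -> hidden
Wy = rng.normal(0, 0.3, (1, 4))                # hidden -> output
print(elman_forward([1.0, 0.0, 0.0], Wx, Wu, Wy))
```

Feeding an impulse followed by zeros still produces nonzero later outputs, which demonstrates the dynamic memory contributed by the context layer.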

Figure 11. Elman neural network structure (input layer, hidden layer, association layer, and output layer).
In this study, the two-dimensional integrated feature information of the motor operating states was predicted. The first 1234 points of the feature parameters were used to train the Elman neural network model, thus obtaining an Elman neural network training model with 50 points taken as the input and one point as the output. The data from the 1235-th to the 1294-th points served as verification data to verify model precision and adjust the parameters. The remaining 60 points after the 1294-th point were predicted using the aforementioned model. The root mean square error (RMSE) was used to measure the error between the predicted results and the observed values, based on the following formula [27]:

RMSE = sqrt[ (1/n) Σ_{i=1}^{n} (X_pre,i − X_obs,i)^2 ]

where X_pre,i is the predicted value, X_obs,i is the observed value, and n is the number of points to be predicted. The prediction results are shown in Table 4. These results suggest enhanced fusion feature prediction precision for all four subspace learning algorithms after using UFS-MIFD for feature selection, which again indicates that UFS-MIFD contributes to the performance enhancement of subspace learning algorithms.
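The 50-in/1-out windowing and the RMSE measure can be sketched as below; the helper names are illustrative.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def make_windows(series, width=50):
    """Sliding windows: `width` points in, the next point out (50-in/1-out above)."""
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    y = np.array(series[width:])
    return X, y

X, y = make_windows(list(range(100)), width=50)
print(X.shape, y.shape)                                   # → (50, 50) (50,)
print(rmse([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))             # → sqrt(4/3) ≈ 1.1547
```

Each row of X is one 50-point input window and the matching entry of y is the point to be predicted, which is the shape of training pair the Elman model above consumes.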

Conclusions
To overcome the decline in information fusion performance of subspace learning algorithms caused by the redundant and irrelevant features in a multidimensional feature parameter set, a mutual information and fractal dimension-based unsupervised feature selection algorithm is studied. The UFS-MIFD method is proposed by combining several theories and methods, including original feature extraction, mutual information, and fractal theory, in response to the long computing time, high time complexity, and possible failure to identify the optimal solution that plague previous unsupervised feature selection algorithms. With this method, a feature importance ordering algorithm that takes the relevance and redundancy of features into overall consideration is developed, and the optimal feature subset is identified by eliminating the highly redundant feature parameters with low relevance to the whole feature parameter set, based on the fractal dimension-based feature subset evaluation criterion. In addition, a performance degradation test of the brushless DC motor of a multi-rotor UAV is designed to verify the proposed method based on vibration signal data. The information fusion performance of subspace learning algorithms before and after the use of UFS-MIFD is compared by measuring the motor health-fault diagnosis capacity and the motor state prediction effect. The comparison results suggest that UFS-MIFD can enhance the information fusion performance of subspace learning methods. Not only is the proposed method able to reduce the negative influence of irrelevant and redundant features and excessive dimensionality on subsequent algorithms and decisions, thereby enhancing the precision and stability of subsequent research results, but it is also of high engineering value, since it can be used for the feature selection of large volumes of unlabeled data.
Given the limited data on the motor under test, however, there is still room for improving and optimizing the proposed method as the number of test subjects and the sample size increase. Moreover, because the application in this paper is specific, the method has so far been validated only for the feature selection of vibration signals of similar UAV operating systems; it is not yet clear whether it will behave the same for different types of signals in other applications. Therefore, the adaptability and universality of the proposed method will be further discussed and investigated in future research.