A Comparative Study of Information-Based Source Number Estimation Methods and Experimental Validations on Mechanical Systems

This paper investigates one eigenvalue decomposition-based source number estimation method, and three information-based source number estimation methods, namely the Akaike Information Criterion (AIC), Minimum Description Length (MDL) and Bayesian Information Criterion (BIC), and improves BIC as Improved BIC (IBIC) to make it more efficient and easier for calculation. The performances of the abovementioned source number estimation methods are studied comparatively with numerical case studies, which contain a linear superposition case and a both linear superposition and nonlinear modulation mixing case. A test bed with three sound sources is constructed to test the performances of these methods on mechanical systems, and source separation is carried out to validate the effectiveness of the experimental studies. This work can benefit model order selection, complexity analysis of a system, and applications of source separation to mechanical systems for condition monitoring and fault diagnosis purposes.


Introduction
In many physical systems, the measured signals can be modeled as a superposition of a finite number of the sources with additive environmental noises, and many signal processing methods such as principal component analysis (PCA) [1], blind source separation (BSS) [2,3] and independent component analysis (ICA) [4][5][6] have benefited from this model and achieved wide use in engineering applications. A key and primary issue of these subjects is the estimation of the number of unknown sources from the mixed signals before an effective source separation. Furthermore, it is still challenging to estimate the source number for mechanical systems due to the complicated mixing of the sources and the transmission effects of the mechanical structures.
In the past decades, many researchers have focused their interests on source number estimation methods and their engineering applications, and proposed many approaches to solve this problem. Ye et al. [7] studied the general BSS problem satisfying m greater than or equal to n and gave the validations by computer simulations on artificially synthesized data. Fishler et al. [8] studied a MDL-type estimator that was robust against deviation from the assumption of equal noise level across the array. Bai et al. [9] proposed an information-based method to estimate the number of independent dipole sources from electroencephalograms (EEGs). Jiang et al. [10] proposed a new source number estimation method called beam eigenvalue method (BEM). Huang et al. [11,12] proposed to utilize the minimum mean square error (MMSE) of the multistage Wiener filter to calculate the required description length for encoding the observed data, instead of relying on the eigenvalues of the data covariance matrix. Hu et al. [13] proposed a sound source number and directions estimation method under a multisource reverberant environment and gave experimental validations. Ma et al. [14] proposed a source number estimation method based on modified K-means clustering. Cheng et al. [15] proposed an independent component analysis-based source number estimation methods and applied it to mechanical systems. Han et al. [16] proposed a source number estimation method based on uniform linear arrays (ULAs) and the newly proposed nested arrays. Dosso et al. [17] considered localizing an unknown number of ocean acoustic sources when the properties of the environment are poorly known. Sadhu et al. [18] proposed a decentralized model identification method utilizing the concepts of sparse blind source separation and parallel factor decomposition, which can solve underdetermined blind source separation problems. All the above studies have provided effective ways to estimate the number of sources from different types of the sources. However, the studies on the source number estimation for mechanical systems are very few, and there is still a lot of work to do before an effective application of source number estimation methods to mechanical systems can be put forth.
Unlike the eigenvalue decomposition-based source number estimation method which requires a threshold, information criteria-based methods do not need any parameters for adaptively estimating the number of sources from the mixed signals, and the algorithms are also easier for calculation and perform efficiently in the applications. The key issue on the information-based methods is to find the extremum values of the constructed objective functions based on information criteria, such as Akaike Information Criterion (AIC) [19,20], Minimum Description Length (MDL) [21][22][23], and Bayesian Information Criterion (BIC) [24,25]. However, it has been shown that AIC suffers from the computational problem [26]. MDL also suffers from high computational load and performs well only in the presence of spatially and temporally white noise [27,28]. Furthermore, they have a shortcoming in that the estimation performance is sensitive to the signal to noise ratio (SNR) and the sampling length, and thus the results may be unreliable. Studies on the source number estimation by BIC are rarely found, and the traditional BIC will overflow in the calculation if the sampling length is too large. Furthermore, all the three information-based methods are rarely applied to estimate the source number of mechanical systems whose sources are normally mixed according to linear superposition and nonlinear modulation.
Therefore, this paper studies comparatively the performances of both eigenvalue decomposition-based and information-based source number estimation methods on mechanical sound signals, and improves BIC as IBIC to make it easier for calculation and efficient for the data with a large sampling length. Both linear superposition and nonlinear modulation are considered in the numerical case studies, and a test-bed with three sources is constructed to test the performances of the eigenvalue decomposition-based and information-based methods on source number estimation for mechanical systems. This study can benefit for the model order selection, complexity analysis of a system, and applications of source separation to mechanical systems for condition monitoring and fault diagnosis purposes.
The remainder of this paper is organized as follows: in Section 2, we introduce the theoretical background and investigate the mathematical mechanisms of the eigenvalue decomposition-based source number estimation method, and information-based source number estimation methods entitled as AIC, MDL, and IBIC. In Section 3, we test the performances of these methods on typical mechanical signals with both a linear superposition and a nonlinear modulation. In Section 4, a test bed with three sound sources is constructed to further test the performances of these methods on real mechanical systems, and the effectiveness of the experimental studies is validated by source separation and spectral analysis. Finally, Section 5 summarizes the conclusions.

Source Number Estimation Methods
Consider m observed mixed signals measured in different locations and composed by n source signals . Assume that the observed signals can be described by the following model with white noises of and mixing matrix of : As the source signals and mixing mode are normally unknown for many physical systems, a crucial problem associated with this model is to estimate the number n of source signals from an N finite set of observations before an effective source separation. Constitute from an N finite set of observations. Then the covariance matrix of is given by ( is the expected function). Denoting the eigenvalues of R by , which is a log-likelihood function used to estimate the maximum likelihood of source number n is defined as follows: Where γ is a threshold. The benefit of eigenvalue decomposition is that the source number can be estimated just based on the distributions of eigenvalues, and the crucial step is just a reasonable threshold γ. However, different types of mixed signals have different distributions of eigenvalues, which makes it impossible for a threshold γ for all the applications. Now we comparatively introduce and investigate another three information-based source number estimation methods which can determine the source number adaptively.
(2) Akaike Information Criterion (AIC) The information theoretic criterion for the model order selection or source number estimation, introduced by Akaike [19] is used to determine the number of signals which gives the minimum AIC, defined by: The first term, −2N(m−n)lgL(n), is the well-known log-likelihood of the maximum likelihood estimator of the parameters of the model. The second term, 2n(2m−n), is the bias correction term inserted so as to make AIC an unbiased estimator.
(3) Minimum Description Length (MDL) [19] Inspired by Akaike's work, Rissanen [21] proposed a different approach to select the model or estimate the source number based on the concept of the shortest code length for the data. It can be achieved by the following criterion: Note that apart from a factor of 2, the first term is identical to the corresponding one in AIC, while the second term has an extra factor of ½ lgN.
(4) Bayesian information criterion (BIC) [29] Minka [24] proposed another criterion for estimation of the dimensionality of the data (source number) on basis of Bayesian model selection. It is referred to as the Minka Bayesian model selection (MIBS) which is given by (assume that is the positive scale parameter, λ is the eigenvalue of the covariance matrix of mixed signals X, m is the number of mixed signals X, n is the number of sources S, and N is the data length): where: and is identical with except for where . In order to estimate the latent dimensionality of the data (source number), we choose the value n that maximizes Equation (5). The simplification of MIBS is the BIC approximation, which drops all terms that do not grow with N: In practice, it causes overflow in calculating Equation (7) when the value of N is large (Normally N is a big number). Therefore, we take the logarithm to overcome the overflow problem and entitle it as the improved BIC (IBIC), which significantly decreases the calculating time without a loss of the accuracy. The objective function of the IBIC is as follow equation:

Numerical Case Study
In this section, we numerically generate typical signals of mechanical systems to comparatively study the effectiveness of the different source number estimation methods. These generated source signals consider the modulation effects of mechanical systems, and the mixed signals are composed of the sources through a linear superposition and a weak nonlinear mixing. The generating functions of the source signals are listed below: In the numerical case study, is a sinusoidal signal that simulates the simple harmonic vibration of mechanical systems; is a frequency modulation signal that simulates the frequency modulation effects of mechanical systems; is an amplitude modulation signal that simulates the amplitude modulation effects of mechanical systems; is a white noise signal that simulates the noises produced by the structural transmission and environment. The waveforms of the source signals are shown in Figure 1.

Source Number Estimation for Linearly Mixed Signals
Since the number of the mixed signals should be no less than the number of the source signals for an accurate source separation or system identification, and the source number estimation methods based on the information criteria also require more mixed signals, in the numerical case study we provide six mixed signals composed by the given source signals with a linear superposition matrix A given by:  Figure 2 displays the waveforms and spectra of the mixed signals, which indicates that it is a difficult task to directly estimate the source number for complex waveforms and complex Fourier spectra with many major components. Therefore, source number estimation methods are required to reveal the complexity of the mixed signals.  Table 1 lists the eigenvalues' distribution of the covariance matrix for the mixed signals. Obviously eigenvalues decrease significantly from to , even , which means that there are 4 principal components contained in the mixed signals (from the definition of principal component analysis [1]). Therefore, the threshold γ can be determined as and thus there has . The result also shows that the threshold γ can be easily determined and the source number estimation based on eigenvalue decomposition is effective for the given linear superposition case.  Figure 3 shows the source number estimation by the information-based source number estimation methods: as the source number n increases, the normalized objective functions of AIC and MDL decrease fast from n = 1 to n = 4, and obtain the minimum values −0.0809 and −0.0772 as n = 4, while the normalized objective function of IBIC changes greatly from n = 3 to n = 5, and obtains the maximum value 1.0000 as n = 4. From the definitions of the information-based methods, all these methods accurately evaluate the source number n* = 4 for the given numerical case with the linear superposition. It should be noted that AIC and MDL obtain very similar results from n = 1 to n = 5, and the normalized values for n = 4 are very close to that for n = 5, while the normalized values of IBIC for n = 4 is obviously far from n = 3 and n = 5, which means that the IBIC is more robust and reliable than AIC and MDL for the given case. Therefore, it can be concluded that all the four source number estimation methods are effective for the given numerical case study, and the eigenvalue decomposition-based method and IBIC are more robust and reliable than AIC and MDL as they have very wide boundaries to accurately determine the source number. However, the eigenvalue decomposition-based method requires a reasonable threshold γ normally determined by prior knowledge of the system or experiences, while the other three methods can adaptively estimate the source number.

Source Number Estimation for Weakly Nonlinearly Mixed Signals
In this section, a nonlinearity mixing factor σ on the modulation sources is considered in the mixing process, and thus the performances on the nonlinearly mixed signals of all the source number estimation methods are comparatively studied. The mixed signals are composed by the sources in Figure 1 with both a linear superposition and a nonlinear modulation mixing, and their generating functions are shown in Equation (11) The nonlinearity mixing factor σ reveals the modulation effects of the mechanical systems with many sources, and the nonlinear mixing mode is always considered as a technical bottleneck for source separation or source number estimation. Therefore, the factor σ is considered as a parameter to test the performances of source number estimation algorithms. It should be noted that the nonlinearity mixing factor σ is given an initial value 0.0001 to satisfy the logarithm function in Equations (4), (5) and (9), and the information-based methods are comparatively studied as the nonlinearity mixing factor σ increases.
The accuracy rates of the given three information-based source number estimation methods are displayed in Figure 4, which shows that AIC, MDL and IBIC fail to correctly estimate the source number when the factor σ is up to 0.0013, 0.0015, and 0.0974, respectively. The comparative study results indicate that AIC and MDL give similar performances and they are all sensitive to the nonlinearity mixing factor σ ( ), and MDL performs a little better than AIC toward the nonlinearity mixing factor σ but not significantly, while IBIC performs more robustly toward the nonlinearity mixing factor σ ( ). Therefore, IBIC performances more robustly and reliably toward the nonlinear mixing effects of mechanical systems, and this property guarantees more wide engineering applications of IBIC as most physical systems have nonlinear mixing effects. The waveforms of the mixed signals for are shown in Figure 5, which also shows that it cannot directly estimate the source number just from the complicated waveforms.   Figure 6 displays the performances of AIC as the nonlinearity mixing factor σ equals to 0.0013, 0.0015, and 0.0974: AIC decreases significantly from n = 1 to n = 5 and obtains the minimum value -0.2014 as n = 5 for the case. Furthermore, AIC has similar values for the and cases, and obtains minimum values of -0.0742 and -0.0749 as n = 5. However, for the case, AIC obtains a minimum value -0.0756 as n = 4, while AIC = 0.7322 as n = 3 and AIC = -0.0739 as n = 5. Therefore, AIC fails to correctly estimate the source number as the nonlinearity mixing factor σ increases to 0.0013, and the objective function decreases fast for the case, which means that the nonlinear modulation effects influence the accuracy rates of AIC greatly.   Figure 7, it can be seen that AIC and MDL perform closely for all these three cases, and the only difference is that MDL is a little more robust to the nonlinearity mixing factor σ than AIC.   IBIC values close to each other for case, which means that the maximum values of IBIC are difficult to be determined and IBIC becomes less robustly and reliably. Table 2 displays the eigenvalues of covariance matrix with different factors σ: the related eigenvalues are very close to each other for σ = 0.000, 0.0013, and 0.0015 cases, while is up to 0.9791 for the case. From the definition of eigenvalue decomposition-based source number estimation method, causes AIC failure for the case, and causes MDL failure for the case. However, IBIC begins to fail for the case, which also indicates that IBIC is much more robust to the modulation effects than AIC and MDL. Furthermore, the distributions of all the eigenvalues show that it is not difficult to set a threshold for the eigenvalue decomposition-based source number estimation method. However, normally it is very difficult to set a reasonable γ without any prior knowledge of the sources and their distributions. Therefore, it can be concluded that the eigenvalue decomposition-based source number estimation method is difficult to carry out without any prior knowledge of the sources, while the information-based methods can adaptively and accurately estimate the source number for the linear superposition cases. However, for the cases with nonlinear modulation effects, IBIC performs more robustly and reliably than AIC and MDL, which reveals more wide engineering applications of IBIC.

Experimental Study
In general, it is difficult to directly measure the source signals in most mechanical systems due to the limited accessibility, and thus signal processing is often required to separate and recover the source information from the mixed signals normally measured by remote sensors. Then, these separated source signals can be used for further purposes such as a condition monitoring and a fault diagnosis of mechanical systems. However, a source number estimation from the measured and mixed signals should be carried out for a prior knowledge to source separation or complexity analysis of the systems. In this section, we apply the source number estimation methods mentioned above to a mechanical system shown in Figure 9 to demonstrate and benchmark their performance on mechanical systems.

Introduction of the Test Bed
Aiming at vibration and noise source number estimation for mechanical systems, this study designs a test bed based on a shell structure, which is composed by an end cover, a shell, clapboards, and supports. The whole test bed is supported by four rubber air springs, which can reduce the influences of environmental noises. There are three sound sources: two of them are loudspeakers controlled by the signal generators, and the other one is a motor controlled by the frequency converter. The structure and photo of the test bed are shown in Figure 9. Since vibration and noises of thin shell structures can approximately be governed by linear differential equations [30], we consider the test bed with linear but weakly nonlinear features.
Six sound pressure sensors are used to measure the sound information, and they are installed in different directions of the test bed with a distance of 0.5 m. A HBM Gen2i data acquisition system is applied to collect the sound data from these six sensors. The framework of the measuring system is shown in Figure 10, and the testing parameters are listed in Table 3.

Source Number Estimation
The sound source signals are measured with just one source working at the parameters given in Table 4, and thus three sound sources can be measured as the references to test the source number estimation methods. The waveforms of the source signals are shown in Figure 11, which clearly displays the periodic features of the source signals. As all the three sources are working together, the signals from all the sound pressure sensors around are the mixed signals, and their waveforms are shown in Figure 12. Obviously it is very difficult to correctly estimate the source information from the mixed signals due to complicated waveforms. Both eigenvalue decomposition-based source number estimation method and information-based source number estimation methods are applied to estimate the sound source number of the given test bed. The eigenvalues of the covariance matrix for the mixed signals in different parameters are shown in Table 4. Obviously, it is difficult to determine the source number just based on the distributions of the eigenvalues as , and are very close to each other. The results of source number estimation by information-based methods are shown in Figure 13, which clearly shows that AIC and MDL obtain minimum values −0.9806 and −0.8171 as n = 4, while IBIC also obtains the maximum value 1.0000 as n = 4. Therefore, all the information-based source number estimation methods provide an estimated source number 4 for the given experimental study. However, there are only three sources in the test bed. The inconsistent results confuse us seriously but  The fast ICA algorithm [6,31] is applied to source separation from the given six mixed signals, and the parameter of the source number is given an initial value 4. Thus we obtain four independent components from the mixed signals, and their waveforms are shown in Figure 14, which displays that the waveforms of three separated components are similar to that of the related source signals in Figure 11, and the other separated signal has complex waveforms and is obviously different from the given source signals.  The spectra of the source signals and the separated components are displayed in Figure 15 and Figure 16. Comparing Figure 15 with Figure 16, the spectrum of the separated component 1 is similar to that of the source signal 1 with a same significant frequency of 3,000 Hz, and the spectrum of the separated component 2 is similar to that of the source signal 2 with a same significant frequency of 1,600 Hz. Furthermore, the spectrum of the separated component 3 is close to that of the source signal 3, and the major frequency energy is contained in 500-1,500 Hz. However, the spectrum of the separated component 4 is very complicated from 0 to 5,000 Hz, which is similar to the spectra of white noises. Therefore, the separated component 4 that we consider as an environmental noise has a considerable energy compared with the other three sources, and causes the source number to be 4 rather than 3. After the source separation and spectral analysis, the correct source number of the given test bed should be 4 rather than 3, which provides convincing evidence for the accuracy of all the information-based methods.
In general, like the similar values of the eigenvalues, it is very difficult for the eigenvalue decomposition-based method to accurately and robustly estimate the source number without any prior information about the sources. However, all the information-based methods correctly and adaptively estimate the source number of the test bed, which further reveals that the mixing mode of the sound sources tends to be a linear superposition, and thus guarantees that the information-based methods are effective to the sound signals.

Conclusions
This paper investigates both eigenvalue decomposition-based and information-based source number estimation methods, and improves BIC as IBIC to make it more efficient and easier to calculate. Furthermore, their performances on nonlinear modulation effects of mechanical systems are studied comparatively with numerical case studies and experimental studies.
In the numerical case study with a linear superposition case, the eigenvalue decomposition-based method has a wide band to determine the threshold , and all the three information-based methods accurately estimate the source number. Furthermore, IBIC performs more robustly and effectively as obvious differences of objective functions. For the weakly nonlinearly mixing case, AIC and MDL are very sensitive to the modulation factor σ and become inaccurate with a very small σ, while IBIC performs more robustly and reliably toward σ, which means that IBIC is more effective to mechanical systems. In the experimental studies, a test bed with three sound sources is constructed to test the performances of the above methods. The eigenvalue decomposition-based method is difficult to determine the threshold as the last three eigenvalues are very close to each other. However, all the three information-based methods obtain extremum values as n = 4. Using a source separation, four independent components are extracted from the mixed signals, and the other source which has a considerable energy with the given three sources is considered as an environmental noise. Therefore, all the information-based methods are effectively to sound source number estimation for the given test bed.
Generally, IBIC performs more robustly and reliably toward the nonlinear modulation effects than AIC and MDL, while eigenvalue decomposition-based methods normally require prior information about the sources, and becomes confused when the eigenvalues are very close to each other. Furthermore, the results of information-based methods for the test bed also indicate that the mixing mode of the sound sources tends to be a linear superposition. This study can benefit for model order selection, complexity analysis of a system, and applications of source separation to mechanical systems for condition monitoring and fault diagnosis purposes.