An Acoustic Signal Enhancement Method Based on Independent Vector Analysis for Moving Target Classification in the Wild

In this paper, we study how to improve the performance of moving target classification by using an acoustic signal enhancement method based on independent vector analysis (IVA) in the unattended ground sensor (UGS) system. Inspired by the IVA algorithm, we propose an improved IVA method based on a microphone array for acoustic signal enhancement in the wild, which adopts a particular multivariate generalized Gaussian distribution as the source prior, an adaptive variable step strategy for the learning algorithm and discrete cosine transform (DCT) to convert the time domain observed signals to the frequency domain. We term the proposed method as DCT-G-IVA. Moreover, we design a target classification system using the improved IVA method for signal enhancement in the UGS system. Different experiments are conducted to evaluate the proposed method for acoustic signal enhancement by comparing with the baseline methods in our classification system under different wild environments. The experimental results validate the superiority of the DCT-G-IVA enhancement method in the classification system for moving targets in the presence of dynamic wind noise.


Introduction
In wild environments, the unattended ground sensor (UGS) system is usually employed to acquire military intelligence about intruding targets by detecting and processing their image, acoustic and seismic signals, etc. [1,2]. Compared with other signals, acoustic signals provide a simple, portable and easily implementable scheme. Hence, acoustic signals are widely used in UGS [3][4][5]. In particular, the classification of moving targets using a microphone array is of great importance in UGS. However, the acoustic target classification module of a UGS system deployed in a wild environment faces the great challenge of complicated and dynamic noise interference [6].
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speaker and speech recognition [7][8][9]. Our previous studies have shown that they can also achieve satisfactory accuracy in moving acoustic target classification [10,11]. Nevertheless, in wild environments, the acoustic signals of moving targets are usually contaminated by strong wind noise, which is ubiquitous because windshields cannot block it completely. Moreover, MFCC features are extremely sensitive to noise interference, which can severely degrade the performance of the classification system because of the mismatch between the training and test datasets.
To improve the robustness of the features and thereby achieve satisfactory classification accuracy, signal enhancement methods are adopted to improve the quality of the acoustic signal by emphasizing the desired component and suppressing the interference noise [12][13][14]. The enhancement methods can be categorized into two groups: single-channel techniques and multi-channel techniques. The most-used method with a single microphone is the Wiener filter [15,16]; however, its amount of noise reduction is in general proportional to the amount of target signal degradation. In addition, the Wiener filter requires prior knowledge of the power spectra of the signal and the noise, which is extremely difficult to obtain in practical applications.
Therefore, the enhancement methods based on a microphone array using multi-channel signals are more applicable. The most-used classical methods are the delay-and-sum beamformer (DS) and the minimum variance distortionless response beamformer (MVDR). DS is a widely-used beamforming technology for its simplicity, and several microphone array systems based on DS have been implemented since the 1980s [17][18][19]. Although DS has a simple structure, it requires a large number of microphones to achieve a high performance. However, the microphone arrays in UGS usually consider a relatively small number of microphones. MVDR is very popular for speech enhancement applications [20][21][22], while it is extremely sensitive to the location of source and microphone gains. Thus, it is not suited to moving acoustic targets. Furthermore, to employ DS and MVDR in the microphone array system, the directions of arrival (DOAs) of the acoustic sources are required, then each source signal is separately obtained using the directivity of the array. This means that the beamformers are quite dependent on the estimation of the DOA. Hence, to avoid the limitation of DOA estimation and other disadvantages, blind source separation (BSS) methods have become attractive in the signal enhancement process in recent years [23][24][25][26].
Independent component analysis (ICA) is a classical and the most-used algorithm for the BSS issue [27][28][29]. It performs very well in the instantaneous mixing problem, but such a mixing condition is rare and unrealistic. In practice, the signals observed in the wild are convolutive mixtures, which means that they are mixed with time delays and convolutions. As an extension of the conventional frequency domain ICA [30,31], IVA is designed to retain the dependency within individual source signals, while removing the dependency between different source vectors [32,33]. Instead of a univariate source prior, the dependency in each source vector is retained by adopting a multivariate source prior in IVA. Moreover, IVA separates source signals by estimating an instantaneous unmixing matrix on each frequency bin. By utilizing the dependencies between frequency bins, it solves the frequency domain BSS problem [34] effectively without suffering from the inherent permutation problem [35] between the frequencies. Owing to these advantages, IVA is well suited to convolutive blind source separation (CBSS) problems. In wild environments, the target signal is contaminated by complicated and strong acoustic noise, especially wind noise, which is dynamic and non-additive. When enhancing the moving vehicle signals collected by the microphone array in the wild, the observed signals can be considered convolutive mixtures of interference noise and the target signal. Hence, we adopt the IVA method to obtain the enhanced signals for the subsequent moving target classification in UGS.
Furthermore, the nonlinear score function is of vital importance in the IVA algorithm, which can be derived from the source prior [36]. Aside from the multivariate Laplace distribution in the original IVA, several different multivariate distributions have been adopted as the source prior recently [37][38][39]. Especially, the performance of the multivariate generalized Gaussian distribution when it is applied to the IVA application has been studied [40]. In this paper, we adopt a particular multivariate generalized Gaussian distribution as the signal source prior, which can better preserve the dependency across different frequency bins. Moreover, the corresponding nonlinear score function additionally contains an expression about frequency domain energy correlation between the elements of each source vector. Thus, the score function contains more information for the dependency structure, which can thereby better preserve the inter-frequency dependency to improve separation performance. Therefore, our source prior performs better in UGS for moving targets signals, and the experimental results corroborate its superiority.
Moreover, when we employ the improved IVA method for moving target classification in the UGS system, there is also a problem about the contradiction between convergence speed and computational cost in separation process. Therefore, we propose an adaptive variable step strategy in the learning process of the unmixing matrix, which is able to achieve a fast convergence optimization with less computational cost.
In the conventional IVA algorithm, discrete Fourier transform (DFT) is generally used to get the information of the signals about the frequency domain. Here, we convert the time domain signals into the frequency domain using DCT [41] instead of DFT in our work. The reason for choosing DCT is that it is superior to DFT for the transformation of real signals, such as an acoustic signal. For a real signal, DFT gives a complex spectrum and leaves nearly one half of the data unused. In contrast, DCT generates real spectrums for real signals and, hence, avoids the unnecessary computation of redundant data [42], which is more favorable in the UGS system.
Combining a particular multivariate generalized Gaussian distribution source prior, an adaptive variable step strategy and DCT with the IVA algorithm, this paper presents a robust signal enhancement method, termed DCT-G-IVA, which aims to improve the performance of moving target classification in wild environments. In addition, we design a classification system for moving targets with the improved IVA enhancement method. The experimental results reveal that the classification system with our enhancement method outperforms those with the baseline enhancement methods, indicating the superiority of our proposed method.
In general, this paper has the following contributions:
1. Introducing the IVA algorithm to acoustic signal enhancement based on a microphone array in wild environments.
2. Presenting an improved IVA method, DCT-G-IVA, which adopts a special multivariate generalized Gaussian distribution as the source prior and an adaptive variable step strategy for the learning algorithm, and which employs DCT instead of DFT to convert the time domain observations to the frequency domain.
3. Designing a moving target classification system with the aforementioned DCT-G-IVA enhancement method in UGS and achieving a satisfactory classification accuracy.
Aside from the present section, the remainder of this paper is organized as follows. In Section 2, we formulate the problem that our signal enhancement method solves. Section 3 describes the proposed IVA algorithm, which achieves signal enhancement based on a microphone array in the wild. The system of the target classification is further illustrated in Section 4. In Section 5, the experiments are conducted to evaluate our acoustic signal enhancement method in the target classification system. Concluding remarks are presented in Section 6.

Signal Model and Problem Formulation
In this section, we formulate the observed signals obtained by the microphone array system. Furthermore, we present the IVA problem for the multichannel acoustic signals. To begin with, we introduce some indispensable notations used in this paper.

• The superscript * denotes the conjugate of the complex number.
• The superscript H denotes the Hermitian transpose of the matrix, and the superscript T denotes the transpose of the matrix.
• The italic E[·] denotes the statistical expectation.
• Plain characters denote scalar variables; boldfaced lowercase characters denote vector variables; and boldfaced uppercase characters denote matrix variables.
We consider a signal model in which an M-element microphone array captures convolved observations in the wild. Suppose that N source signals $s_i(t)$, $i = 1, \ldots, N$, are mixed and collected at the M sensors. The observed signal $x_j(t)$ at the j-th sensor is formed as:
$$x_j(t) = \sum_{i=1}^{N} \sum_{l=0}^{L-1} a_{ij}(l)\, s_i(t-l), \quad j = 1, \ldots, M, \tag{1}$$
where $a_{ij}(l)$ represents the impulse response from source i to sensor j, which has length L. Here, we assume that the number of sources N is already known, and the number of sensors M is no less than N.
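The convolutive mixing model described above can be illustrated with a minimal sketch. The function name `convolutive_mix`, the array layout `filters[i, j] = a_ij` and the random toy data are assumptions made for illustration only; they are not taken from the experimental setup.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix N sources into M sensor signals with FIR impulse responses.

    sources: (N, T) array of source signals s_i(t).
    filters: (N, M, L) array, filters[i, j] holds a_ij(l) for l = 0..L-1.
    Returns an (M, T) array with x_j(t) = sum_i sum_l a_ij(l) s_i(t - l).
    """
    n_src, n_samples = sources.shape
    n_mics = filters.shape[1]
    x = np.zeros((n_mics, n_samples))
    for j in range(n_mics):
        for i in range(n_src):
            # Linear convolution, truncated to the original signal length.
            x[j] += np.convolve(sources[i], filters[i, j])[:n_samples]
    return x

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))         # N = 2 toy sources (hypothetical data)
a = 0.5 * rng.standard_normal((2, 4, 8))   # M = 4 sensors, L = 8 taps
x = convolutive_mix(s, a)
```

Each sensor signal is the sum of all sources filtered by their own impulse responses, which is why a single instantaneous mixing matrix cannot describe the time domain observations.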
Besides, the presence of additive noise n(t) within the above mixing model significantly complicates the BSS problem. It can be reduced by preprocessing, such as denoising the observed signals through the regularization approach [43]. In the wild environment, the dominant interference on the moving target signals is wind noise, which is non-additive and convolved with the source signals; therefore, we neglect n(t) and drop it in the following work.
The separation system typically comprises a set of finite impulse response (FIR) filters $w_{ij}(l)$ of length L that produce N separated signals:
$$y_i(t) = \sum_{j=1}^{M} \sum_{l=0}^{L-1} w_{ij}(l)\, x_j(t-l), \quad i = 1, \ldots, N, \tag{2}$$
at the outputs. The separation filters $w_{ij}(l)$ must be estimated blindly, i.e., without knowledge of $s_i(t)$ and $a_{ij}(l)$.
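When an observed signal is converted to the frequency domain, the convolution becomes multiplicative. In the following, f denotes the frequency bin, and the sources are mixed in the frequency domain by the M × N unknown non-singular mixing matrix $\mathbf{A}^f$ in the corresponding frequency bin. Then, we have the matrix representation of the problem as:
$$\mathbf{x}^f = \mathbf{A}^f \mathbf{s}^f, \tag{3}$$
where $\mathbf{x}^f = [x_1^f, \ldots, x_M^f]^T$ and $\mathbf{s}^f = [s_1^f, \ldots, s_N^f]^T$ are the observation and source vectors in the f-th frequency bin, and $\mathbf{A}^f$ is the frequency domain response function matrix corresponding to $a_{ij}(l)$, which should be invertible.
The purpose of the IVA algorithm is to compute the estimation of the source signals. We have the estimated source vector in each frequency bin as:
$$\mathbf{y}^f = \mathbf{W}^f \mathbf{x}^f, \tag{4}$$
where $\mathbf{W}^f$ is the unmixing matrix in the f-th frequency bin. Here, for convenience, we assume from here on that $\mathbf{x}^f$ is already preprocessed to be zero-mean and white by eigenvalue decomposition (EVD); the remaining task is to rotate the whitened data to find the independent components.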
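The zero-mean and whitening preprocessing mentioned above can be sketched as follows. This is a minimal illustration under the assumption that whitening is applied independently in each frequency bin using the sample covariance; the helper name `whiten` and the toy mixture are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center and whiten the observations of one frequency bin via EVD.

    x: (M, T) array. Returns (z, V) where z = V @ (x - mean) has identity
    sample covariance.
    """
    xc = x - x.mean(axis=1, keepdims=True)          # remove the mean
    cov = xc @ xc.conj().T / xc.shape[1]            # sample covariance
    eigval, eigvec = np.linalg.eigh(cov)            # EVD of a Hermitian matrix
    V = np.diag(1.0 / np.sqrt(eigval)) @ eigvec.conj().T
    return V @ xc, V

rng = np.random.default_rng(1)
mix = np.array([[1.0, 0.6], [0.3, 1.0]])            # toy instantaneous mixture
x = mix @ rng.standard_normal((2, 5000))
z, V = whiten(x)
cov_z = z @ z.T / z.shape[1]                        # close to the identity matrix
```

After this step only a rotation of the data remains to be estimated, which is what the iterative update of the unmixing matrix is meant to find.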

Independent Vector Analysis
Here, we mainly discuss the cost function and the learning algorithm of IVA. Typically, IVA is achieved by minimizing the mutual information among the estimated source component vectors (SCVs) as:
$$J_{\mathrm{IVA}} = \sum_{n=1}^{N} \left[ \sum_{f=1}^{F} H(y_n^f) - I(\mathbf{y}_n) \right] - \sum_{f=1}^{F} \log \left| \det \mathbf{W}^f \right| - C, \tag{5}$$
where $\mathbf{y}_n = [y_n^1, \ldots, y_n^F]^T$ is the n-th estimated SCV over the F frequency bins and H(·) is the (Shannon relative) entropy [44], which is defined as Equation (6). $I(\mathbf{y}_n)$ is the mutual information within the n-th estimated SCV, and C is a constant term, which does not depend on the estimated signals, but only on the observations. For ease of notation, we drop the subscript of $J_{\mathrm{IVA}}$ in the rest of this article. Equation (5) shows that minimizing the cost function simultaneously minimizes the entropy of all components and maximizes the mutual information within each SCV. It is also evident that the mutual information is responsible for resolving the permutation ambiguity across multiple datasets, since without the mutual information of the SCVs, the cost function would be identical to that of ICA.
$$H(\mathbf{y}_n) = -E\left[ \log q(\mathbf{y}_n) \right], \tag{6}$$
where $q(\mathbf{y}_n)$ is the probability density function (PDF) of vector $\mathbf{y}_n$. Since $H(\mathbf{y}_n) = \sum_{f} H(y_n^f) - I(\mathbf{y}_n)$, we can rewrite the cost function as:
$$J = -\sum_{n=1}^{N} E\left[ \log q(\mathbf{y}_n) \right] - \sum_{f=1}^{F} \log \left| \det \mathbf{W}^f \right| - C. \tag{7}$$
When the cost function is minimized, the dependency between different source vectors $\mathbf{s}_i$ should be removed, but the interrelationship between the components within each vector should be retained. The inter-frequency dependency is modeled by the PDF of the source.
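In order to enhance the observed acoustic signals of moving targets, the unmixing matrices are updated in every frequency bin. Let η denote the iteration step in the learning algorithm. The (k + 1)-th update procedure is formulated as:
$$\mathbf{W}^f(k+1) = \mathbf{W}^f(k) + \eta\, \Delta\mathbf{W}^f(k). \tag{8}$$
To compute the unmixing matrix, we apply the natural gradient (NG), which is well known as a fast convergence method [45]. The natural gradient learning rule is given as:
$$\Delta w_{ij}^f = \sum_{n=1}^{N} \left( \Lambda_{in} - R_{in}^f \right) w_{nj}^f, \tag{9}$$
$$R_{in}^f = E\left[ \varphi^f(y_i^1, \ldots, y_i^F)\, (y_n^f)^* \right], \tag{10}$$
where $\Delta\mathbf{W}^f \equiv \{\Delta w_{ij}^f\}$ is the gradient matrix. $R_{in}^f$ is the scored correlation at the current frequency bin; $\Lambda_{ii}$ is equal to $R_{ii}^f$; and $\Lambda_{in}$ is zero when i is not equal to n. Moreover, $\varphi^f(y_i^1, \ldots, y_i^F)$ denotes the nonlinear score function of the i-th source in the f-th frequency bin, which can be obtained as:
$$\varphi^f(y_i^1, \ldots, y_i^F) = -\frac{\partial \log q(y_i^1, \ldots, y_i^F)}{\partial y_i^f}, \tag{11}$$
where $\varphi^f$ is the (negative) derivative of the logarithm of the source prior.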
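One iteration of the natural gradient update described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes M = N after preprocessing, a `(F, N, T)` array layout, a spherical Laplace score with unit variance, and expectations replaced by sample averages; the function names are hypothetical.

```python
import numpy as np

def laplace_score(y, eps=1e-12):
    """y: (F, N, T). Score of a spherical Laplace prior; the norm is taken
    over all frequency bins so that the bins of one source stay coupled."""
    norm = np.sqrt(np.sum(np.abs(y) ** 2, axis=0, keepdims=True) + eps)
    return y / norm

def natural_gradient_step(W, x, eta, score=laplace_score):
    """One update W^f <- W^f + eta * (Lambda^f - R^f) W^f for every bin f.

    W: (F, N, N) unmixing matrices; x: (F, N, T) whitened observations.
    """
    y = np.einsum("fij,fjt->fit", W, x)                         # y^f = W^f x^f
    phi = score(y)                                              # score values, (F, N, T)
    R = np.einsum("fit,fnt->fin", phi, y.conj()) / x.shape[2]   # scored correlation
    Lam = R * np.eye(R.shape[1])                                # diagonal part of R
    dW = np.einsum("fin,fnj->fij", Lam - R, W)                  # gradient matrix
    return W + eta * dW

rng = np.random.default_rng(0)
F, N, T = 16, 2, 400
x = rng.laplace(size=(F, N, T))          # toy stand-in for whitened observations
W = np.tile(np.eye(N), (F, 1, 1))        # identity initialization in every bin
W_next = natural_gradient_step(W, x, eta=0.1)
```

Because the diagonal of Λ − R is zero, the update only acts on the cross terms between different outputs, so the scale of each separated signal is left untouched by a single step.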

The Multivariate Generalized Gaussian Source Prior
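In order to keep the inter-frequency dependency of each source, the original IVA algorithm exploits a multivariate Laplace distribution as:
$$q(\mathbf{s}_i) = \alpha \exp\left( -\sqrt{ (\mathbf{s}_i - \boldsymbol{\mu}_i)^H \boldsymbol{\Sigma}_i^{-1} (\mathbf{s}_i - \boldsymbol{\mu}_i) } \right), \tag{12}$$
where $\boldsymbol{\mu}_i$ denotes the mean vector of the i-th signal, which is assumed to be a zero vector, $\boldsymbol{\Sigma}_i$ is the covariance matrix, and α is the scale parameter. In this paper, we refer to the IVA with the above multivariate Laplace distribution source prior as the L-IVA method. As such, the nonlinear score function to extract the i-th source at the f-th frequency bin can be formulated as:
$$\varphi^f(y_i^1, \ldots, y_i^F) = \frac{y_i^f}{(\sigma_i^f)^2 \sqrt{ \sum_{f'=1}^{F} \left| y_i^{f'} / \sigma_i^{f'} \right|^2 }}, \tag{13}$$
where $\sigma_i^f$ denotes the standard deviation of the i-th source at the f-th frequency bin, which is usually set as one. Obviously, Equation (13) is a multivariate function, and the dependency between the frequency bins is thereby accounted for in the learning process.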
In our work, we give the formulation of the family of multivariate generalized Gaussian distributions as:
$$q(\mathbf{s}_i) \propto \exp\left( -\left[ \frac{ (\mathbf{s}_i - \boldsymbol{\mu}_i)^H \boldsymbol{\Sigma}_i^{-1} (\mathbf{s}_i - \boldsymbol{\mu}_i) }{\alpha} \right]^{\beta} \right), \tag{14}$$
where $\alpha, \beta \in \mathbb{R}^+$ are the scale and shape parameters, respectively, and $\boldsymbol{\Sigma}_i$ is a diagonal matrix, which implies that each frequency bin sample is uncorrelated with the others. If α is set properly, it becomes the multivariate Gaussian distribution when β = 1. Moreover, letting α = 1 and β = 1/2, we obtain the multivariate Laplace distribution of Equation (12). The previous work in [40] indicates that if β < 1, the multivariate generalized Gaussian distribution has a heavier tail, which can be an advantage in separating nonstationary acoustic signals. Consequently, to apply a better nonlinear score function using Equation (14), β is chosen as 1/3, and α is chosen as one for simplicity in our IVA method. A theoretical justification of why a shape parameter of 1/3 performs better, together with experimental studies showing that it achieves the best separation performance, is given in [46].
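Moreover, the acoustic signal is real-valued. Then, we have the source prior as:
$$q(\mathbf{s}_i) \propto \exp\left( -\left[ (\mathbf{s}_i - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{s}_i - \boldsymbol{\mu}_i) \right]^{1/3} \right). \tag{15}$$
This particular source prior can better preserve the inter-frequency dependencies compared with the original multivariate Laplace source prior. We term the IVA method using the prior of Equation (15) G-IVA. Then, with zero mean and unit variance in every frequency bin, the nonlinear score function can be derived from the aforementioned source prior as:
$$\varphi^f(y_i^1, \ldots, y_i^F) = \frac{2\, y_i^f}{3 \sqrt[3]{ \left( \sum_{f'=1}^{F} (y_i^{f'})^2 \right)^{2} }}. \tag{16}$$
All the L-IVA and G-IVA methods in this paper adopt an adaptive variable step strategy presented in the following.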
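The two score functions can be compared numerically with a short sketch. It assumes a prior proportional to exp(−(Σ y²)^β) with zero mean and unit variance, so that β = 1/2 gives the Laplace score and β = 1/3 gives the generalized Gaussian score; the function names are hypothetical.

```python
import numpy as np

def score_l_iva(y, eps=1e-12):
    """Laplace prior (beta = 1/2): y_f / sqrt(sum_f y_f^2). y has shape (F, T)."""
    energy = np.sum(np.abs(y) ** 2, axis=0, keepdims=True)
    return y / np.sqrt(energy + eps)

def score_g_iva(y, eps=1e-12):
    """Generalized Gaussian prior (beta = 1/3, alpha = 1):
    2 y_f / (3 * (sum_f y_f^2)^(2/3))."""
    energy = np.sum(np.abs(y) ** 2, axis=0, keepdims=True)
    return 2.0 * y / (3.0 * (energy + eps) ** (2.0 / 3.0))

# One time frame of a source vector with F = 2 frequency bins.
y = np.array([[3.0], [4.0]])
phi_l = score_l_iva(y)     # both bins share the norm sqrt(3^2 + 4^2) = 5
phi_g = score_g_iva(y)     # both bins share the factor (3^2 + 4^2)^(2/3)
```

In both cases every frequency bin is divided by a function of the total energy of the source vector, which is how the multivariate prior couples the bins; the two priors differ only in how strongly large-energy frames are attenuated.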

Adaptive Variable Step for IVA
Due to slow and poor convergence through nonlinear optimization, the conventional IVA method inherently has a significant disadvantage, particularly when an improper initialization is adopted in the learning process of the unmixing matrix. To be specific, when a large step size is set in the learning algorithm, a fast convergence speed may be acquired, while the optimal solution is also easy to miss. However, with a small step size, the global optimal solution can be reached, but at the cost of a slow convergence speed. Even worse, the learning process may be too slow to reach convergence, which leads to the failure of the separation.
In order to make a good tradeoff between the computational cost and fast-convergence optimization, here we propose an adaptive variable step strategy in the updating process of the unmixing matrix $\mathbf{W}^f$, instead of the fixed iterative step size in the conventional IVA. The step size is adjusted adaptively according to the control variable $\Delta J = J(k+1) - J(k)$, i.e., the change of the cost function between consecutive iterations, together with two empirical parameters $c_1$ and $c_2$. With the above adaptive variable step strategy, our IVA algorithm can automatically select the step size to achieve a faster convergence with fewer iterations. In addition, this adaptive variable step strategy is easy to implement with a low computational cost, which is suited to the UGS system.
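The idea of driving the step size with ∆J can be illustrated with a toy example. The exact adjustment rule is not reproduced in this text, so the sketch below assumes a simple multiplicative rule (enlarge the step by $c_1$ when the cost decreases, shrink it by $c_2$ otherwise); this rule, the function names and the quadratic toy cost are all assumptions for illustration, not the authors' formula.

```python
def adapt_step(eta, delta_j, c1=1.15, c2=1.22):
    """Return the next step size from delta_j = J(k+1) - J(k) (assumed rule)."""
    if delta_j < 0:
        return eta * c1      # cost decreased: enlarge the step
    return eta / c2          # cost did not decrease: shrink the step

def minimize_toy_cost(w0=0.0, eta0=0.1, tol=1e-9, max_iter=200):
    """Gradient descent on the toy cost J(w) = (w - 3)^2 with an adaptive step."""
    cost = lambda w: (w - 3.0) ** 2
    grad = lambda w: 2.0 * (w - 3.0)
    w, eta, j_old = w0, eta0, cost(w0)
    for k in range(1, max_iter + 1):
        w = w - eta * grad(w)
        j_new = cost(w)
        delta_j = j_new - j_old
        eta = adapt_step(eta, delta_j)
        j_old = j_new
        if abs(delta_j) < tol:       # same stopping threshold as in the text
            break
    return w, k

w_opt, n_iter = minimize_toy_cost()
```

A rule of this kind costs one comparison and one multiplication per iteration, which is in line with the low computational overhead claimed for the strategy.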

Parameters' Selection
We have proposed an adaptive variable step strategy for the IVA learning process. Here, we discuss how to select the empirical parameters $c_1$ and $c_2$. We choose two speech signals from the TIMIT database [47] as the source signals. After applying a random convolutive mixing, we run the IVA method with the adaptive variable step strategy to separate the mixtures. The separation performance is assessed using the inter-symbol-interference (ISI), which is computed from R, the correlation matrix between the original speech signals and the separated signals, where $R_{ik}$ is the correlation coefficient of the i-th source and the k-th separated signal. The notation $\max_k |R_{ik}|$ denotes the largest absolute value in the i-th row of R, which corresponds to the source signal. Ideally, the correlation coefficient of a source and the corresponding separated signal equals one, and ISI(R) equals zero. In our simulation experiments, the parameter $c_1$ is fixed at 1.15, and $c_2$ is varied from 0.01 to 2; the results are shown in Table 1. Moreover, we also compare the number of iterations needed for convergence. During the learning process, the IVA algorithm runs until the decrement of the cost function ∆J is less than $10^{-9}$. When $c_2$ is greater than three, the adaptive IVA method fails to converge.
From Table 1, we can see that the appropriate range of values for $c_2$ is 1.2 to 1.4. Moreover, we fix the parameter $c_2$ to 1.22 and vary $c_1$ from 0.1 to 3. The results are shown in Table 2, from which we find that the appropriate range of values for $c_1$ is 1.0 to 1.2. Besides, when $c_1$ is greater than four, the adaptive IVA method fails to converge. The adaptive IVA algorithm is not very sensitive to $c_1$ and $c_2$, yet proper parameters still need to be selected. For comparison, the IVA method with a fixed iterative step needs more than 100 iterations to reach convergence.
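A correlation-based interference index of the kind described above can be sketched as follows. The exact ISI expression is not reproduced in this text; the row-wise form used here (sum of each row divided by its largest element, minus one, averaged over the rows) is an assumed, commonly used variant, and the name `isi_rows` is hypothetical.

```python
import numpy as np

def isi_rows(R):
    """Row-wise interference index of a correlation matrix R (assumed form).

    For every source i, the residual correlation with the other outputs is
    sum_k |R_ik| / max_k |R_ik| - 1; the index is the mean over all sources.
    """
    A = np.abs(np.asarray(R, dtype=float))
    row_max = A.max(axis=1)
    return float(np.mean(A.sum(axis=1) / row_max - 1.0))

perfect = isi_rows(np.eye(2))                        # ideal separation
leaky = isi_rows([[1.0, 0.5], [0.2, 1.0]])           # residual cross-correlation
```

With this form, a perfect separation (each source correlated with exactly one output) yields zero, and the index grows as more energy of a source leaks into the other outputs.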

DCT versus DFT
From a computational viewpoint, unlike the DFT, which uses sine and cosine functions, DCT uses only cosine functions to express the signal. Typically, the DCT X(k) of the time domain signal x(n) is formulated as:
$$X(k) = c_d(k) \sum_{n=0}^{N-1} x(n) \cos\left[ \frac{\pi (2n+1) k}{2N} \right], \quad k = 0, \ldots, N-1, \tag{19}$$
where N denotes the length of the time domain signal. Furthermore, the inverse discrete cosine transform (IDCT) is straightforward: by summing the basis functions in the frequency domain weighted by the corresponding amplitudes, we obtain the values of the time domain elements. The IDCT x(n) of the frequency domain signal X(k) is computed by:
$$x(n) = \sum_{k=0}^{N-1} c_d(k)\, X(k) \cos\left[ \frac{\pi (2n+1) k}{2N} \right], \quad n = 0, \ldots, N-1, \tag{20}$$
where k is the frequency bin index, and the coefficient $c_d(k)$ is given by:
$$c_d(k) = \begin{cases} \sqrt{1/N}, & k = 0, \\ \sqrt{2/N}, & 1 \le k \le N-1. \end{cases} \tag{21}$$
The energy of most natural signals, such as acoustic signals and image signals, is concentrated in the low frequency domain. The decorrelation performance of the DCT is close to that of the Karhunen-Loeve (KL) transform, which has the optimal decorrelation property [48]. Moreover, DCT coefficients have a better "energy concentration" feature than DFT coefficients, namely we can express the time domain data x(n) with fewer DCT coefficients.
From a practical viewpoint, although the computational complexities of direct DFT and DCT evaluation are both $O(N^2)$, DCT only uses cosine functions to express the signal, namely it consumes about half the computation and memory that DFT consumes. Furthermore, IDCT can better reconstruct the time domain data of acoustic signals. Considering such advantages of DCT, we adopt the short-time DCT to get the frequency domain information of the acoustic signals for our application.
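The DCT and IDCT pair can be checked numerically with a small sketch. It assumes the orthonormal DCT-II convention (scaling √(1/N) for k = 0 and √(2/N) otherwise) and evaluates the transform as a direct matrix product; `dct_matrix` is a hypothetical helper, not a library call.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    t = np.arange(n)[None, :]            # time index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)           # scaling of the k = 0 row
    return C

C = dct_matrix(8)
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0])
X = C @ x          # real-valued DCT coefficients
x_rec = C.T @ X    # IDCT recovers the time domain samples
```

The coefficients are real for a real input, so no conjugate-symmetric half of a complex spectrum has to be stored, and because the matrix is orthogonal the inverse is simply its transpose.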

The System of Moving Target Classification
In this section, we design a classification system for moving targets in the wild based on the microphone array and the proposed improved IVA method. The purpose of our study is to provide a reliable and stable classification system for moving targets using the proposed DCT-G-IVA method, which enhances the acoustic signals in the presence of dynamic wind noise in the UGS system. The implemented moving target classification system is shown in Figure 1. We evaluate the proposed enhancement algorithm by comparing it with the baseline enhancement methods in the MFCC + GMM classification framework, which is composed of the classic MFCC feature and the popular Gaussian mixture model (GMM) classifier [49,50]. When using the IVA algorithm to enhance the observed signal, we consider the actual observed signal as a convolutive mixture of the real target signal and other interference signals. The system presented in Figure 1 starts with the adaptive IVA algorithm, which enhances the observations of the moving targets obtained by the microphone array. Because the moving vehicle signals are enhanced before feature extraction, the MFCC features are more robust, and the classification performance can be improved effectively. A significant advantage of IVA is that it does not require much a priori knowledge of the target signal or the noise signal. Hence, the IVA method is suited to acoustic signal enhancement in the wild, where the interference noise is dynamic and complicated.
In IVA methods, slow convergence can cause signal distortion and thereby result in a separation failure. Thus, we propose the adaptive variable step approach for updating the unmixing matrix in our IVA method, which is illustrated in Figure 2. It is noteworthy that we also use a particular multivariate generalized Gaussian distribution as the source prior, which yields G-IVA. To validate its performance, we compare G-IVA with the original L-IVA in the classification experiments described in Section 5. Meanwhile, we adopt DCT instead of the DFT used in the original IVA algorithm to reduce the computation and memory cost in UGS.

Experimental Description
In this paper, we collected our acoustic samples of observed signals by the microphone array developed in [51][52][53]. Combining the microelectromechanical systems (MEMS) technology with the uniform circular array (UCA), a four-element MEMS microphone (ADMP504) UCA with a radius of R = 0.02 m was deployed. Then, the inputs of these microphones were transferred into separate channels of a four-channel 16-bit simultaneous ADC, which was sampled at a rate of 8192 Hz. The microphone array system we adopted is shown in Figure 3a, and the experimental scenario is also depicted in Figure 3b, where the UCA was placed 10 m away from the road. In the meantime, the different experimental environments are shown in Figure 4.
In our IVA-based separation process for signal enhancement, we use a 1024-point DCT and a Hanning window to convert the time domain observed signals to the frequency domain. The length of the window is 1024 with a 75% overlap. Besides, a 1024-point fast Fourier transform (FFT) is used to compute the DFT in the STFT domain for comparison. When the adaptive variable step strategy is used, the two empirical parameters $c_1$ and $c_2$ are set to 1.15 and 1.22, respectively, in the subsequent experiments. During the learning process, the IVA algorithms run until the decrement of the cost function ∆J is less than $10^{-9}$. Besides, we compare DCT-G-IVA, DFT-G-IVA, DCT-L-IVA and DFT-L-IVA in our classification experiments. In the classification system, the number of filters used for MFCC feature extraction is 24. When training the model, the number of Gaussian functions in the GMM is 10. This combination achieved the best performance in our experiments.
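The framing described above (1024-sample Hanning window, 75% overlap, 1024-point DCT) can be sketched as follows. The orthonormal DCT-II convention, the direct matrix evaluation and the function name `short_time_dct` are assumptions for illustration; the one-second random signal merely stands in for a recorded channel.

```python
import numpy as np

def short_time_dct(x, frame_len=1024, overlap=0.75):
    """Short-time DCT of a 1-D signal; returns an (n_frames, frame_len) array."""
    hop = int(frame_len * (1.0 - overlap))        # 256 samples for 75% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    n = np.arange(frame_len)
    # Orthonormal DCT-II matrix: rows are frequency bins, columns are samples.
    C = np.sqrt(2.0 / frame_len) * np.cos(
        np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * frame_len))
    C[0, :] = np.sqrt(1.0 / frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames @ C.T                           # one row of coefficients per frame

fs = 8192                                         # sampling rate used in the experiments
rng = np.random.default_rng(0)
signal = rng.standard_normal(fs)                  # one second of toy data
S = short_time_dct(signal)
```

For one second of data at 8192 Hz this yields 29 frames of 1024 real coefficients each, one vector per frame and channel for the subsequent bin-wise unmixing.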

Datasets
To show the robustness of our improved IVA algorithms in the signal enhancement application, experiments are conducted using hundreds of sample signals collected under different wild environments. The composition of our labeled sample set is shown in Table 3. In our experiment, we have collected no fewer than 224 audio signals from three kinds of vehicles and noise. Among them, tracked vehicles (TV) usually have the longest detection distance. The duration of each collected audio sample is different; thus, we consider a frame-based classification. Accordingly, in the classification process, each audio sample is divided into frames, and the frame size is 1024 with no overlap between adjacent frames. Figure 5 shows the spectra of the noise signal and different vehicle signals, which were collected in a suburban district around Nanjing in December 2015, when the wind power level was around five. From Figure 5a, we notice that the spectrum of wild noise (mostly wind noise) covers a wide range. In fact, interference noise is ubiquitous in real-world environments and can seriously impair the quality and intelligibility of target signals. Meanwhile, it is noticeable that the spectrum of the moving vehicle target mainly resides in the low frequency part in Figure 5b-d. Moreover, the spectral content of a vehicle signal is approximately regular and is mainly dominated by the engine and exhaust system [54]. The spectral contents of interest are composed of a limited number of pitches and harmonics; thus, there are some distinct spectrum lines in the spectra of vehicle signals. An unknown target can hardly be recognized if its harmonics are contaminated by wind noise. In Figure 5, we can almost distinguish the spectrum lines of the TV and truck in the presence of wind noise, while the spectrum lines of the car are difficult to find, which indicates that the noise contaminates the vehicle signals in wild environments.

Result Analysis
In our classification experiments, we train the classifiers on the training sets with different sizes to get a comprehensive comparison [55]. Namely, we randomly select the training sets from our labeled datasets at different percentages. Moreover, ten-fold cross-validation is employed to evaluate the classification performance. The classification accuracies of our classification experiments are presented in Table 4. As mentioned previously, the results of target classification are frame-based.
The classification results in Table 4 validate the superiority of adopting the proposed DCT-G-IVA method for signal enhancement. For comparison, we also adopt two microphone array-based enhancement methods, DS and the average-channel signal, as baseline algorithms. DS is a classical approach for signal enhancement with microphone array sensors, and we employ the multiple signal classification (MUSIC) algorithm to estimate the DOAs for DS [56]. As shown in Table 4, compared with the baseline enhancement algorithms, our IVA-based methods contribute better to the classification.
Moreover, DCT is preferable to DFT in the transformation of the acoustic signal. Hence, the results of the last four columns in Table 4 show that the IVA methods adopting DCT perform better in the classification experiments. In particular, the classification accuracy with the DCT-G-IVA method reaches up to 96.33%. Figure 6 compares the classification probabilities of each target type using the two L-IVA signal enhancement methods, which use the DFT and DCT, respectively, to get the frequency domain information of moving target signals. Likewise, Figure 7 compares the classification probabilities of each target type using the two G-IVA signal enhancement methods, which use the DFT and DCT, respectively. The horizontal and vertical axes of Figures 6 and 7 represent the percentages of the training sets in our datasets and the classification probabilities of noise, TV and wheeled vehicles, respectively. Here, the wheeled vehicles refer to the car and truck in Table 3. We randomly select the training sets from our labeled datasets at different percentages, while the test sets are the remaining sample data. The classification probability of each type denotes the mean probability of correct classification for all target signals of this type, which reflects the classification performance from another perspective. Besides, the results of classification probabilities are signal-based. Because the observed signal is a mixture of many signals and is a more Gaussian signal due to the central limit theorem, the multivariate generalized Gaussian distribution source prior can model the signals more accurately and exploit the frequency domain energy correlation within each source vector.
Moreover, the particular multivariate generalized Gaussian distribution we adopted has a heavier tail than the original multivariate Laplace distribution, and it can preserve the dependency across different frequency bins, thereby utilizing more information about the dependency structure and providing improved source separation performance. This accounts for the results in Figures 6 and 7 and Table 4, in which the G-IVA algorithms perform better than the L-IVA algorithms. Furthermore, DCT-G-IVA presents the best performance in Figure 7: the mean classification probabilities of noise, tracked vehicle and wheeled vehicle reach 0.9850, 0.9794 and 0.9845, respectively.
All of the experiments were executed in the MATLAB R2015b environment on an industrial computer (8-core, 3.6-GHz frequency and 8-GB memory) to process the datasets detailed in Section 5.2. The execution times of separation using the four IVA methods are shown in Table 5. Furthermore, the convergence of the adaptive variable step IVA was very fast, taking fewer than 38 iterations. To be specific, the G-IVA only took 34 iterations, while the L-IVA took 37 iterations to converge. In contrast, the IVA with a fixed step took around 300 iterations to converge. Besides, DCT only uses cosine functions to express the signal, namely it consumes about half the computation and memory of the DFT. Hence, our proposed DCT-G-IVA method can achieve a better enhancement performance, and thereby a higher classification accuracy, with a lower computational consumption. The results in Tables 4 and 5 validate the above points. In addition, when using FFT to compute DFT coefficients, the sample size needs to be limited to $2^n$, $n \in \mathbb{Z}^+$, although this restriction can be worked around by zero padding. However, DCT does not limit the sample size, which is more convenient and applicable for the hardware implementation in the UGS system.

Conclusions
In this paper, a method for acoustic signal enhancement in the wild based on a microphone array and IVA has been proposed. We term our proposed method DCT-G-IVA, which adopts a multivariate generalized Gaussian distribution as the source prior, an adaptive variable step strategy for the learning algorithm and DCT instead of DFT to convert the time domain observed signals to the frequency domain. Moreover, we employ the improved IVA method in the UGS system to enhance the acoustic signals collected by the microphone array and then classify the moving targets using the enhanced signals. Experiments are conducted to evaluate the DCT-G-IVA method for acoustic signal enhancement by comparing it with the baseline methods in the MFCC + GMM classification system under different wild environments. According to the experimental results, the adaptive G-IVA and L-IVA algorithms outperform the classic baseline enhancement methods in the wild environment in the presence of wind noise. In particular, DCT-G-IVA performs best, with the highest classification accuracy of 96.33% and a relatively low computational cost. Finally, the results suggest that the proposed signal enhancement method is well suited to the microphone array deployed in UGS, with a high classification accuracy and good robustness against dynamic wind noise and other interferences. Furthermore, the proposed method could also inspire other applications, such as speaker recognition in a noisy environment.