Time–Frequency Mask-Aware Bidirectional LSTM: A Deep Learning Approach for Underwater Acoustic Signal Separation

Underwater acoustic signal separation is a key technique for underwater communications. Existing methods are mostly model-based, cannot accurately characterize practical underwater acoustic communication environments, and are suitable only for binary signal separation rather than multivariate signal separation. Recurrent neural networks (RNNs), by contrast, show a powerful ability to extract the features of temporal sequences. Inspired by this, we present a data-driven approach to underwater acoustic signal separation using deep learning. We use a bidirectional long short-term memory (Bi-LSTM) network to learn the features of a time–frequency (T-F) mask, and propose a T-F-mask-aware Bi-LSTM for signal separation. Taking advantage of the sparseness of the T-F representation, the designed Bi-LSTM network extracts discriminative features for separation, which further improves separation performance. In particular, the method breaks through the limitations of existing approaches: it not only achieves good results in multivariate separation but also effectively separates signals mixed with 40 dB Gaussian noise. The experimental results show that the method achieves a 97% preserved-signal ratio (PSR), and the average similarity coefficient of multivariate signal separation remains stable above 0.8 under high-noise conditions. It should be noted that our model can only handle known signals, such as test signals used for calibration.


I. INTRODUCTION
At present, underwater acoustic communication [1] mainly uses sonar technology to detect, locate, and identify underwater targets. However, sonar technology must overcome noise such as ship noise and ocean noise [2]-[4]. Source separation is an effective way to suppress such noise [5]-[8], and it has attracted tremendous research interest from both academia and industry. Among source separation methods, blind source separation (BSS) is a classical approach [9]-[11], consisting of a mathematical model, an objective function, a separation algorithm, and evaluation criteria [12], [13]. In BSS research, two approaches are commonly studied and employed. One is based on independent component analysis (ICA) [14], which works well when the number of sources N is less than or equal to the number of sensors M; ICA is not limited to linear instantaneous mixing and is also used to solve the separation of convolutive and even nonlinear mixtures. The other relies on the sparseness of the source signals and works well when N exceeds M, such as the binary T-F mask approach [15]. The binary T-F mask approach extracts a signal by computing a binary masking matrix for that signal. It has the advantage of real-time operation, and in recent years it has also been applied to underwater acoustic separation in combination with underwater sound characteristics.
In view of the underdetermined nature of underwater acoustic communication, this paper studies the sparsity-based binary time-frequency mask method. The traditional binary T-F mask method relies on features designed manually from the observation signals. Due to outliers and anisotropic variance in the feature distribution, traditional feature extraction has clear limitations: it works only for binary signal separation, performs poorly for multiple-signal separation, and cannot meet separation-accuracy requirements. Current improvements to the binary T-F masking method remain at the level of feature design [16]-[18]. However, it is not easy for human experts to design good features: such handcrafted features are easily affected by outliers and impose strict requirements on the placement of the sources.
As an alternative, on top of traditional binary T-F masking, extracting the original features of underwater acoustic sources with deep neural networks has shown good performance. Deep learning has been used to solve image recognition, natural language processing (NLP), and even communication problems [19], and deep learning approaches [20]-[23] have also made breakthroughs in signal separation. We therefore extract the features of underwater acoustic signals by means of deep learning. The main contributions of this work are as follows: (1) We propose a deep learning method based on Bi-LSTM. Exploiting the powerful feature extraction capability of RNNs, it not only improves the performance of binary signal separation but also achieves good results in ternary and multivariate separation experiments. This overcomes the single-target limitation of previous deep-learning-based separation methods.
(2) We improve the training samples with the idea of embedding: each T-F point is embedded into a high-dimensional space so that it can be represented as a vector, and energy-based reference labels are then added to the training samples. This makes the T-F points of different sources more distinct and clustering easier during neural network learning.
(3) We have carried out extensive experiments on the separation performance of the method, using both randomly generated unknown noise and actually collected marine noise. The experimental results show that the method can effectively separate the noise as long as the number of clusters K is increased. This proves that the method retains good robustness and scalability in a sufficiently complex real marine environment.
The rest of the paper is organized as follows. In Section II, we introduce the traditional system model of underwater acoustic source separation. In Section III, we present the proposed approach, including offline training and the online test. Section IV presents the experiments. Finally, conclusions are drawn in Section V.

II. MAINSTREAM METHOD: BINARY TIME-FREQUENCY MASKING METHOD
A. Sparsity Condition
The binary T-F mask approach separates underwater acoustic signals according to the auditory masking principle, exploiting the underwater acoustic source that dominates the energy in a given T-F region. Although the target signals received by the system overlap in frequency to varying degrees, the main energy of different target signals is usually concentrated in different frequency bands. The binary mask approach can therefore exploit this property to separate underwater acoustic signals by clustering the T-F bins. To cluster these T-F bins, the traditional method computes features manually from the observation signals.
Under the sparsity (W-disjoint orthogonality, WDO) assumption, at most one source is active at each T-F point, i.e.,

X_i(t, f) · X_j(t, f) ≈ 0 for all i ≠ j and all (t, f),

where X_i(t, f) is the Short-Time Fourier Transform (STFT) of signal x_i(t). By using the STFT, the time-domain signals are transformed into the T-F domain, where they satisfy the sparsity property. The geometric features used for clustering are calculated under this constraint.
This condition can also be understood as follows: the T-F overlap occupies only a relatively small portion of any one underwater acoustic signal, so ignoring the information in the overlapped region does not affect the recovery of the whole signal.

B. Signal Separation Steps In Underdetermined Case
This approach is summarized in Fig. 1. Based on the sparsity condition of absolute energy dominance, in the underdetermined case, the binary T-F masking method for underwater acoustic blind separation proceeds as follows: (1) STFT. Let the sampling frequency of the observation signal be f_s, and convert the time-domain signal x(t) into a T-F representation using a T-point STFT:

X(t, f) = Σ_{r=0}^{T−1} x(tL + r) win(r) e^{−j2πfr/T},

where t is the time-frame index, f is the frequency bin, T is the window length, and L is the hop (shift) length of the window. win(r) denotes the window function; commonly used choices are the rectangular, Hanning, and Hamming windows. In the subsequent Inverse Short-Time Fourier Transform (ISTFT), we use the Hanning window to keep the parameters consistent.
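As a minimal sketch under illustrative assumptions (the window length, hop, sampling rate, and test tone below are not from the paper), step (1) can be implemented directly from the windowed-DFT definition above:

```python
import numpy as np

def stft(x, T=256, L=64):
    """T-point windowed STFT with hop length L and a Hanning window,
    following the definition in step (1)."""
    win = np.hanning(T)
    n_frames = (len(x) - T) // L + 1
    X = np.empty((n_frames, T // 2 + 1), dtype=complex)
    for t in range(n_frames):
        X[t] = np.fft.rfft(x[t * L : t * L + T] * win)  # one frame -> one row
    return X  # rows: time frames, columns: frequency bins

fs = 8000                                   # illustrative sampling rate
tt = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * tt)            # a 1 s, 440 Hz test tone
X = stft(x)
peak_bin = int(np.argmax(np.abs(X[10])))    # dominant bin of one frame
```

Bin k corresponds to frequency k·f_s/T, so the test tone lands near bin 440/(8000/256) ≈ 14, which is where the magnitude of each frame peaks.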
(2) Feature extraction. The source representation X(t, f), satisfying the sparsity condition, is obtained by the STFT, and the feature vector Θ(t, f) is calculated from it. In this feature space, different sources differ in ways that can be measured by distance. Θ(t, f) is generally composed of a geometric level-ratio feature α(t, f) and a phase-difference feature φ(t, f) between the observed signals.
Taking two observation signals X_1(t, f) and X_2(t, f) as an example, the level ratio α(t, f) and the phase difference φ(t, f) can be calculated as

α(t, f) = |X_2(t, f)| / |X_1(t, f)|, φ(t, f) = arg( X_2(t, f) / X_1(t, f) ),

and the feature vector is Θ(t, f) = [α(t, f), φ(t, f)]^T. The phase difference is usually normalized by the frequency, i.e., φ(t, f)/f is used in place of φ(t, f), to avoid the frequency-permutation problem.
Extended to the case of multiple observation signals, the level ratio α(t, f) and the phase difference φ_j(t, f) are expressed as

α_j(t, f) = |X_j(t, f)| / A(t, f), φ_j(t, f) = (1 / (β_j f)) arg( X_j(t, f) / X_B(t, f) ),

where A(t, f) is the normalization coefficient of the level ratio; β_j = β = 4πd_max/c, j = 1, ..., n, is the weight coefficient of the phase difference; the subscript B denotes the label of the reference observation signal; c is the sound propagation speed; and d_max is the maximum distance between the reference observation signal and the other observation signals.
Θ(t, f) can then be expressed in complex form and normalized, which yields the feature-vector representation for the multi-observation case. These equations show that the extracted features Θ(t, f) are influenced by many different factors.
(3) Cluster analysis. Clustering the feature vectors Θ(t, f) yields m clusters C_1, ..., C_m corresponding to the m source signals. Past clustering methods include manual clustering [24], kernel density estimation [25], and maximum-likelihood (ML) gradient search [26]. Because K-means clustering is simple, convenient, and converges quickly, it has become the most commonly used method for cluster analysis. K-means minimizes the sum Υ of squared Euclidean distances (ED) between the feature vectors of each source signal and the corresponding cluster center c_k, automatically dividing the samples into m clusters:

Υ = Σ_{k=1}^{m} Σ_{Θ(t,f) ∈ C_k} ||Θ(t, f) − c_k||².

First, m cluster centers c_1, c_2, ..., c_m are randomly initialized, and each feature vector is assigned iteratively: the feature vector Θ(t, f) is placed in the cluster whose mean vector c_k is closest,

C_k = { Θ(t, f) : ||Θ(t, f) − c_k|| ≤ ||Θ(t, f) − c_j|| for all j }.

Then the mean of all feature vectors belonging to C_k is calculated and the cluster center is corrected:

c_k = (1 / |C_k|) Σ_{Θ(t,f) ∈ C_k} Θ(t, f).

Substituting the updated mean vectors recomputes the objective function Υ. If Υ has converged, the iteration ends and the sets C_k, k = 1, 2, ..., m, corresponding to each source are obtained.
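The K-means loop of step (3) can be sketched as follows; the synthetic two-cloud data and the initialization seed are assumptions chosen only for illustration:

```python
import numpy as np

def kmeans(theta, m, n_iter=50, seed=0):
    """Plain K-means on feature vectors theta (n_points x dim) with m clusters:
    assign each vector to the nearest centre, then recompute the centres."""
    rng = np.random.default_rng(seed)
    centers = theta[rng.choice(len(theta), m, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every vector to every centre
        d = np.linalg.norm(theta[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)            # nearest-centre assignment
        for k in range(m):
            if np.any(labels == k):
                centers[k] = theta[labels == k].mean(axis=0)
    return labels, centers

# two well-separated synthetic feature clouds standing in for two sources
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
labels, centers = kmeans(pts, 2)
```

With well-separated clouds the assignment stabilizes after a few iterations, matching the convergence criterion on Υ described above.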
(4) Binary T-F masking. Using the clustering results, a binary T-F masking matrix is constructed. The binary T-F masking matrix consists of 0 and 1 values and has the same size as the T-F matrix; this is similar to the binary test in spectrum sensing [27]-[30]. Each mask value is set to 1 or 0 according to whether the corresponding T-F point belongs to the target signal:

M_k(t, f) = 1 if Θ(t, f) ∈ C_k, and M_k(t, f) = 0 otherwise,

indicating whether the information at that T-F point belongs to the source signal.
Substituting the masking matrix gives the spectrum of the estimated signal, Y_k(t, f) = M_k(t, f) X(t, f). (5) Inverse Short-Time Fourier Transform (ISTFT). After obtaining the T-F domain estimate, the final step recovers the time-domain signal y_k(t) using the ISTFT and the overlap-add method [31]. When using the ISTFT, the parameters must be the same as those of the STFT in equation (2). Here A is a constant related to the window function, with A = 0.5T/L when the Hanning window is used, and y_k^m(t) denotes the contribution of the m-th windowed frame, evaluated at r = t − mL.
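Steps (4)-(5) can be illustrated on a toy frequency-disjoint mixture; the ideal low-pass mask and all signal parameters below are illustrative assumptions, and the 1/A = L/(0.5T) scaling matches the Hanning-window constant quoted above:

```python
import numpy as np

def stft(x, T=256, L=64):
    win = np.hanning(T)
    n = (len(x) - T) // L + 1
    return np.array([np.fft.rfft(x[t*L:t*L+T] * win) for t in range(n)])

def istft(X, T=256, L=64):
    """Overlap-add ISTFT; dividing by A = 0.5*T/L undoes the summed
    Hanning windows (the constant given in the text)."""
    y = np.zeros((len(X) - 1) * L + T)
    for t in range(len(X)):
        y[t*L:t*L+T] += np.fft.irfft(X[t], T)
    return y / (0.5 * T / L)

fs = 8000
tt = np.arange(fs) / fs
mix = np.sin(2*np.pi*440*tt) + np.sin(2*np.pi*2000*tt)  # disjoint toy mixture
X = stft(mix)
freqs = np.arange(X.shape[1]) * fs / 256
mask = (freqs < 1000.0).astype(float)   # ideal binary mask for the 440 Hz source
y = istft(X * mask)                     # masked spectrum back to the time domain
```

In the interior of the signal the reconstruction closely tracks the 440 Hz component alone, since the mask zeros the bins carrying the 2000 Hz interferer.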

C. Evaluation Of Separation Performance
To verify the separation performance of the algorithm after adding noise, we simulated the binary time-frequency masking method. The T-F masking method requires the signals to satisfy the WDO (energy-dominance) condition, so LFM signals were selected for simulation, which makes it convenient to alias the signals in time and frequency. The detailed experimental procedure is described in Section IV. The results show that when there is no noise, each signal can be recovered well and the method correctly divides the T-F region of each signal. Once noise is added, performance deteriorates: the estimated masking matrix not only loses some information of the signal itself but also picks up T-F information from other signals.

III. PROPOSED METHOD
In recent years, deep learning has been successfully applied to speech separation [32]-[34], but these previous attempts generally assume that the number and types of sources are fixed. For underwater acoustic signal separation, we must consider two problems: 1) the model should separate arbitrary kinds of underwater acoustic sources, i.e., the generalization problem; and 2) the model should separate arbitrary numbers of underwater acoustic sources, i.e., the scalability problem. Unlike previous attempts, in this article we use deep learning to learn a mapping of the input that is amenable to clustering, which helps overcome both shortcomings. The architecture of the proposed method is illustrated in Fig. 3. To achieve good separation performance after clustering, the clustering features must be discriminative. In recent years, many studies have used deep neural networks [35] to obtain powerful representations for clustering [19], [28], [36]-[42], achieving good results in image recognition and NLP. Their common idea is to embed the original data features into a new feature space in which the transformed features are more suitable for clustering. Besides the target underwater acoustic signal, ship-radiated noise and marine environmental noise are also present in the sonar system. Because of varying degrees of attenuation in the ocean, the main energy of these noises is concentrated at different frequencies; the main sound-source frequencies are shown in Table I. For a communication sonar receiving transmitted signals from other sonar platforms, the receiving bandwidth is about 100 Hz to 3000 Hz, and the receiver has prior knowledge of these detection signals [19]. According to these characteristics of underwater acoustic signals, if a neural network is used to "divide" the different types of signals in the underwater audio frequency domain, and Fourier-transform signal processing is then used to restore the signals, the target signal can finally be separated.
A. Embedding Model
According to the embedding principle, the role of the deep neural network used in this section is to map the original features (the raw T-F features) of the measured data into a new feature space. Each T-F point is converted into a vector whose position in the new space depends on the amount of energy at that T-F point. These vectors are then "divided" into a number of reasonable groups based on the distances between them: T-F vectors belonging to the same underwater sound source are similar, so their mutual distances are smallest, while T-F vectors belonging to different sources lie far apart. Finally, they can easily be divided by a simple clustering algorithm.
Suppose a mixed underwater acoustic signal is transformed by the STFT to obtain the original T-F feature X ∈ R^{T×F}, where T is the number of time frames and F is the number of frequency bins.
Taking the logarithmic amplitude spectrum 20 log10(|X_{t,f}|) as the input of the network (denoted |X| below for convenience), |X| can also be regarded as a sequence [χ_1, χ_2, ..., χ_T] of spectral vectors χ_i ∈ R^F over consecutive time frames. The deep neural network is parameterized by ω, and the features it generates are expressed as

Θ = f_ω(|X|) ∈ R^{TF×K}.

The goal of training is that the row vectors of the network output Θ can be divided among the different underwater sources: vectors θ_j belonging to the same source should be at small distances from one another, while vectors belonging to different sources should be far apart, so as to achieve the separation of the underwater sound.
Assume the mixture in the water area is composed of C kinds of underwater sound sources, x(t) = Σ_{c=1}^{C} α_c s_c(t). The loss function of the model can then be set as

L(Θ, Y) = ||ΘΘ^T − YY^T||_F²,

where ||·||_F² is the squared Frobenius norm [43]. In the process of minimizing this loss, two vectors assigned to the same water source are pulled closer and closer together, while two vectors assigned to different water sources are pushed farther and farther apart. At the same time, since (YP)(YP)^T = YY^T for any permutation matrix P, the method is independent of the label arrangement across all training samples.
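A minimal sketch of this loss (assuming Y is the one-hot energy-dominance label matrix defined later) uses the low-rank expansion ||ΘΘ^T − YY^T||_F² = ||Θ^TΘ||_F² − 2||Θ^TY||_F² + ||Y^TY||_F², which avoids forming the large TF × TF affinity matrices:

```python
import numpy as np

def dc_loss(theta, Y):
    """||Theta Theta^T - Y Y^T||_F^2 via the low-rank expansion.
    theta: (N, K) embeddings; Y: (N, C) one-hot dominance labels."""
    a = np.sum((theta.T @ theta) ** 2)   # ||Theta^T Theta||_F^2
    b = np.sum((theta.T @ Y) ** 2)       # cross term
    c = np.sum((Y.T @ Y) ** 2)           # ||Y^T Y||_F^2
    return a - 2.0 * b + c

# a perfect embedding: same-source rows identical, cross-source rows orthogonal
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
loss_perfect = dc_loss(Y.copy(), Y)      # Theta Theta^T == Y Y^T, so loss is 0
```

The permutation invariance noted above also holds here: swapping the columns of Y leaves YY^T, and hence the loss, unchanged.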

B. Offline Training: Test Network Design Based on RNN, LSTM and Bi-LSTM Respectively
Input and reference-label processing: First, randomly take between 2 and C underwater acoustic audio files from the file library and mix them according to equation (20). Each audio file is de-meaned before entering network training. The mixing coefficient α is drawn uniformly at random from the interval [3/4, 1].
According to equation (2), the mixed signal undergoes an STFT with a 32 ms window length and an 8 ms time shift, and the log amplitude spectrum X is taken. A 16 s audio clip can thus be split into 500 samples of size 706. At the same time, the logarithmic amplitude spectrum of each source signal making up the mixture is taken, and the energies at each time-frequency point are compared to form the reference label Y with the same shape as X. To ensure local accuracy, each iteration consists of time-step sequences drawn from multiple input samples of X and Y, with 50% overlap between consecutive sequences, forming mini-batch pairs for training.
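The 50%-overlap segmentation described above can be sketched as follows; the 100-step sequence length mirrors the test stage, and the 500 × 129 spectrogram shape is only an example:

```python
import numpy as np

def overlapped_sequences(X, seq_len=100):
    """Cut a (T, F) log-spectrogram into fixed-length time-step sequences
    with 50% overlap between consecutive sequences."""
    hop = seq_len // 2
    starts = range(0, X.shape[0] - seq_len + 1, hop)
    return np.stack([X[s:s + seq_len] for s in starts])

X = np.zeros((500, 129))        # e.g. 500 frames of a 129-bin spectrogram
batch = overlapped_sequences(X)
```

Each overlapping sequence becomes one element of a mini-batch, so every frame except those at the edges is seen in two training sequences.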
In the offline training phase, to introduce the proposed Bi-LSTM structure more clearly and highlight its advantages over other neural networks, we tested three structures: RNN, LSTM, and Bi-LSTM. Since LSTM is closely related to Bi-LSTM, the following section first gives a brief description of the adopted LSTM structure, followed by a detailed introduction of Bi-LSTM. Feature extraction can be performed using an RNN; in particular, we use LSTM networks, which are an improvement of the RNN [44]. LSTMs can form a deep network by stacking, repeating the loop body at each time step to enhance the expressive ability of the model. The parameters of the loop body within each layer are shared across time, while different layers may use different parameters. A schematic of the network structure for underwater acoustic separation using multi-layer LSTM is shown in Fig. 4. Through stacking, the neural network can learn deeper representations and finally embed them into the K-dimensional feature space.

Structure 2 (Bi-LSTM-based):
The transmission in the RNN and LSTM network structures is one-way, from front to back; that is, the state at time t can only capture information from the past sequence x_1, ..., x_{t−1} and the current input x_t. For some problems, however, the prediction of the output may depend on the entire sequence. In speech recognition, for example, some words have multiple interpretations that must be judged from the surrounding context; processing the speech therefore needs to reference both past and future pronunciation information to obtain a more accurate result.
The same problem can arise in the underwater sound field. In underwater acoustic communication, for example, sound waves are used instead of radio waves because radio waves attenuate severely under water, and the received sequences exhibit the same context dependence. We therefore adopt the Bi-LSTM, which consists of two LSTMs of the same size whose time series start from opposite ends.
Fig. 5 shows the structure of the underwater-acoustic separation network based on Bi-LSTM, where h→(t) represents the state of the forward sub-LSTM that propagates information from t = 1 to T (to the right), and h←(t) represents the state of the backward sub-LSTM in which information moves from t = T to 1 (to the left); the latter is obtained by substituting the time-reversed sequence into equations (24)-(28).
The specific operation of the unidirectional sub-LSTM layer is as follows. Given an input sequence x_1, ..., x_T, the model is computed iteratively from t = 1 to T:

i_t = σ(W_xi x_t + W_hi h_{t−1} + b_i),
f_t = σ(W_xf x_t + W_hf h_{t−1} + b_f),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t−1} + b_c),
o_t = σ(W_xo x_t + W_ho h_{t−1} + b_o),
h_t = o_t ⊙ tanh(c_t),

where the W and b terms are weights and biases; i, f, o, and c are the input gate, forget gate, output gate, and cell activation vector, respectively; and σ is the logistic sigmoid function.
Therefore, at each time point t, the output unit can obtain information about the past of the input sequence through h→(t) and about its future through h←(t). After the two sub-LSTM layers, we use a dense layer to obtain the output Θ_t for input X_i:

Θ_t = φ(W h_t^l + b),

where h_t^l is the output of the final LSTM layer and φ is the ReLU activation function. By minimizing the loss value, the parameters adapt as the learning process advances.
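A numpy sketch of one sub-LSTM step and the bidirectional pass follows; the stacked gate-weight packing and toy sizes are assumptions, and the dense output layer of the real network is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One sub-LSTM step: input/forget/output gates and cell update.
    W packs the four gate weight blocks as (4H, D+H); b is (4H,)."""
    H = len(h_prev)
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2*H])         # forget gate
    o = sigmoid(z[2*H:3*H])       # output gate
    g = np.tanh(z[3*H:])          # candidate cell activation
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def bilstm(X, W, b, H):
    """Run forward and time-reversed sub-LSTMs and concatenate their states."""
    h, c, fwd = np.zeros(H), np.zeros(H), []
    for x in X:
        h, c = lstm_step(x, h, c, W, b)
        fwd.append(h)
    h, c, bwd = np.zeros(H), np.zeros(H), []
    for x in X[::-1]:
        h, c = lstm_step(x, h, c, W, b)
        bwd.append(h)
    return np.hstack([np.array(fwd), np.array(bwd)[::-1]])

rng = np.random.default_rng(0)
D, H, T = 5, 4, 6                         # toy sizes, not the paper's 600 cells
W = rng.normal(0.0, 0.1, (4*H, D+H))
out = bilstm(rng.normal(size=(T, D)), W, np.zeros(4*H), H)
```

Each time step of the output concatenates the forward and backward states, so every row carries context from both directions of the sequence.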
In the following experiments, we extracted the features of the underwater acoustic signal using the above three networks (RNN, LSTM, and Bi-LSTM) in the offline training phase.
In the online test phase, combining the STFT with the binary time-frequency masking method, we obtained the corresponding experimental data for the three networks. The experiments show that the Bi-LSTM structure performs best.

C. Online Test
Different models are trained and applied to the traditional binary T-F masking framework.
The processing flow of the method is basically the same as that of the binary T-F masking method. The main steps are as follows: (1) Select underwater acoustic signals from the test set and mix them to obtain a mixed underwater acoustic signal. The mixture is de-meaned and normalized, and then subjected to the STFT (with parameters consistent with those of the training phase), finally giving |X| as the input. (2) Using the trained model, the original feature X of the signal is transformed into the new embedded feature Θ. Since the network outputs the new feature as a tensor of dimension T × F × K, in actual processing we reshape the data into dimension TF × K to facilitate the subsequent cluster analysis. (3) Cluster analysis: the feature Θ is clustered using the K-means algorithm. (4) T-F masking: according to the sets Ω_k obtained by clustering, the corresponding binary T-F masking matrices M_k(t, f) are constructed and substituted into equation (17), yielding T-F domain estimates of the source signals. (5) Time-domain recovery: the estimated source spectra S_k(t, f) are passed through the ISTFT according to equation (18) to obtain the time-domain waveforms s_k(t) of the source signals.
The clustering algorithm classifies the features output by the neural network so that vectors θ belonging to the same underwater sound source fall into one group. Each "similar" vector is set to 1 and the dissimilar vectors to 0; the resulting array is then reshaped into a T × F matrix, which is the binary masking matrix corresponding to that source.
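Steps (2)-(4) of the online test can be sketched as follows; the toy embedding, the deterministic centre initialization, and the cluster count are illustrative assumptions:

```python
import numpy as np

def masks_from_embedding(theta, m, n_iter=30):
    """Reshape the (T, F, K) network output to (T*F, K), cluster the row
    vectors, and build one binary T-F mask per cluster. Centres are
    initialised from the first and last rows for determinism."""
    T, F, K = theta.shape
    rows = theta.reshape(T * F, K)
    centers = rows[[0, -1]].copy() if m == 2 else rows[:m].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(rows[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(m):
            if np.any(labels == k):
                centers[k] = rows[labels == k].mean(axis=0)
    return [(labels == k).reshape(T, F).astype(float) for k in range(m)]

# toy embedding: the two halves of the T-F plane map to two distinct points
theta = np.zeros((4, 8, 2))
theta[:, :4] = [1.0, 0.0]
theta[:, 4:] = [0.0, 1.0]
M = masks_from_embedding(theta, 2)
```

The resulting masks partition the T-F plane: each point is assigned to exactly one source, as required for step (4).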

IV. EXPERIMENTS

A. Experimental Conditions
The experiments selected hydroacoustic audio data from the ShipsEar database as data samples [45]. Since its establishment, the database has been used for research on ship noise reduction, detection, and identification, especially for applications of deep learning technology [46]-[48].
The hydroacoustic data in this database were collected off the Atlantic coast of northwestern Spain by researcher David and colleagues from the University of Vigo, Spain.
The composition of the database is shown in Table II. Sonar audio, ship-radiated noise, and background noise form the A, B, and C signals, respectively, and each audio clip selected for testing is about 6 seconds long. The sampling rate is unified to 44100 Hz.
In addition, we also simulated the binary time-frequency masking method; by comparing the results of binary and multiple separation, the superior performance of the proposed method is demonstrated. For the binary time-frequency masking simulation, three LFM signals are selected, which facilitates aliasing the signals in time and frequency. The three simulated LFM signals have a sampling frequency of 50 kHz and a duration of 1 s; the specific parameters are shown in Table III.
In the training stage, we train the model with a maximum mixture number of 3: in every iteration we randomly select two or three files from the training set to mix. We then use the model to separate every possible underwater acoustic mixture. The network structure has two LSTM layers with 600 hidden cells each and a fully connected layer with 100 cells, corresponding to the embedding dimension K. Stochastic gradient descent with momentum 0.9 and a fixed learning rate of 10^−5 is used for training. The ReLU function is used as the activation function for the output layer. To prevent over-fitting and improve the generalization ability of the model, the dropout rates of the input layer and the hidden layers are set to 0.2 and 0.5, respectively, and L2 regularization with parameter 10^−6 is added to the network. The model is trained for 30 iterations.
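The optimizer settings above (momentum 0.9, fixed learning rate 10^−5, L2 weight 10^−6) correspond to the following per-parameter update; the quadratic toy objective and its larger learning rate are only for demonstration:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=1e-5, mom=0.9, l2=1e-6):
    """One update with the stated settings: momentum 0.9, fixed learning
    rate 1e-5, and L2 regularization weight 1e-6."""
    g = grad + 2.0 * l2 * w        # gradient of loss + l2 * ||w||^2
    v = mom * v - lr * g           # momentum-smoothed velocity
    return w + v, v

# sanity check on f(w) = ||w||^2 with a larger toy learning rate:
# the iterate should spiral in toward the minimiser at the origin
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(200):
    w, v = sgd_momentum_step(w, v, 2.0 * w, lr=0.05)
norm_after = np.linalg.norm(w)
```

With momentum the velocity accumulates gradient history, which smooths updates over the overlapping mini-batch sequences described earlier.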
In the test stage, the input feature X is the log magnitude spectrum of the mixed underwater acoustic signal, computed with an STFT of 32 ms frame length, 8 ms window shift, and the square root of the Hanning window. The mixture is split into 100-frame segments with half overlap to ensure the local accuracy of the output feature Θ. The masks are obtained by clustering the row vectors of Θ, with the number of clusters set to the number of sources in the mixture.

B. Metrics
To evaluate the quality of the source separation, we use three quantitative criteria: 1) the Preserved-Signal Ratio (PSR ∈ [0, 1]), representing how well the mask preserves the target source, PSR_k = ||M_k X_k||² / ||X_k||²; 2) the Signal-to-Interference Ratio (SIR ∈ [0, ∞)), representing how well the mask suppresses the interfering sources, SIR_k = ||M_k X_k||² / ||M_k Σ_{i≠k} X_i||²;
3) the similarity coefficient ξ, estimating the similarity between the estimated signal y_i(t) and the source signal x_j(t):

ξ_ij = |Σ_t y_i(t) x_j(t)| / sqrt( Σ_t y_i²(t) · Σ_t x_j²(t) ).

1) Simulation of the binary time-frequency masking method: From the plots of the estimated signals, the source signals can basically be recovered using the binary time-frequency masking method. The No. 2 source signal is aliased with another source signal, so its information is affected somewhat, but it can basically be recovered from the mixed signal.
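The three metrics defined above can be sketched as follows; the forms used are the standard binary-mask definitions and may differ in detail from the paper's equations (30) and (31), and the toy spectra are assumptions:

```python
import numpy as np

def psr(mask, S_target):
    """Preserved-Signal Ratio: fraction of target energy kept by the mask."""
    return np.sum(np.abs(mask * S_target)**2) / np.sum(np.abs(S_target)**2)

def sir(mask, S_target, S_interf):
    """Signal-to-Interference Ratio of the masked output (inf when the
    interference is suppressed completely)."""
    interf = np.sum(np.abs(mask * S_interf)**2)
    if interf == 0:
        return float('inf')
    return np.sum(np.abs(mask * S_target)**2) / interf

def similarity(y, x):
    """Similarity coefficient xi between estimate y(t) and source x(t)."""
    return abs(np.sum(y * x)) / np.sqrt(np.sum(y**2) * np.sum(x**2))

# disjoint toy spectra: the ideal mask keeps all target energy (PSR = 1)
S1 = np.array([[2.0, 0.0], [0.0, 0.0]])
S2 = np.array([[0.0, 0.0], [0.0, 3.0]])
mask = (np.abs(S1) > np.abs(S2)).astype(float)
```

A fully suppressed interferer drives the SIR denominator to zero, which is exactly the infinite-SIR case discussed in the two-source experiments below.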
The correlation coefficient ξ, PSR, and SIR were measured under different signal-to-noise ratios; the results are shown in Table IV. When there is no noise, each signal can be recovered well: the PSR and SIR values indicate that the method correctly divides the time-frequency region of each signal, i.e., the obtained masking matrix accurately covers the signal's time-frequency information. Once noise is added, performance deteriorates: the reduction in PSR is small, but the reduction in SIR is the most obvious. This means that after adding noise, the estimated masking matrix not only loses some information of the signal itself but also picks up time-frequency information from other signals.
2) Binary and multivariate signal separation using the proposed method: Next, we separate mixtures of two sources. The visualization result can be seen in Fig. 9. We list all possible combinations and observe the corresponding separation results. In Table V, we report the demixing performance for separating two unknown sources, using the metrics in equations (30) and (31). The results show that our method performs well on separating two known sources, and, unlike many deep-learning-based separation algorithms, it also extends well to unknown-source separation without any specific adaptation. SIR is infinite when the interfering sources are suppressed so thoroughly that the denominator of equation (31) approaches 0.
Furthermore, we separate mixtures of three sources. Fig. 10 and Fig. 11 show examples of separating three-source mixtures. Comparing Fig. 9 and Fig. 10, it can be seen that the time-frequency points of each source are basically found. The overlap between source C and source A in the time-frequency domain is relatively large; however, signal A dominates in energy at these overlapped points, so it is not disturbed by the other signals and can basically be recovered, while some information of signal C is lost. In practice, the loss of the sonar echo signal matters more than the loss of background noise, so sacrificing part of signal C is permissible in applications. Signal B overlaps least with signals A and C in the frequency domain, and its separation performance is the best. To demonstrate that the deep learning method achieves a real breakthrough in underwater acoustic source separation, we also show results with the traditional binary T-F mask approach. In Table VI, the first entry is our approach and the second the traditional approach. Our proposed method clearly outperforms the traditional method, which cannot separate sources C and A well.
Moreover, compared with Table V, the performance does not drop much when we separate more sources, so the proposed model can scale up to more sources. It is therefore appropriate for real-world applications where the number of sources is not fixed.
In addition, considering that the mixed signal will be interfered with by other unknown noise in actual processing, Gaussian noise at 0-40 dB is added to the mixed signal to analyze the separation performance under different SNR conditions, with the similarity coefficient used as the metric and the traditional binary T-F masking method as the baseline. Under unknown noise, the separated signal still carries noise, which affects performance; however, it is found that the noise can be separated out as long as the number of clusters is increased accordingly.

Fig. 3: Flow chart of the proposed method for underwater acoustic signal separation. The input is the whole amplitude information |X| of the underwater acoustic signal; the output is the cluster-oriented K-dimensional embedding feature learned by the neural network. During training, the network sequentially maps the spectral information χ_i at each time step into the new feature space, finally outputting an F × K-dimensional vector. This can be viewed as encoding each T-F point of the original T-F feature χ_i, with each encoded T-F point represented by a row vector θ_j of dimension K. Here θ_j is a unit vector, i.e., |θ_j|_2 = 1.
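The unit-norm constraint |θ_j|_2 = 1 on the embedding rows can be enforced by a simple row normalization, sketched here for illustration:

```python
import numpy as np

def normalize_rows(theta):
    """Scale every embedding row vector theta_j to unit length |theta_j|_2 = 1."""
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)

theta = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = normalize_rows(theta)
```

Normalizing the rows makes the clustering depend only on the directions of the embeddings, not their magnitudes.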
Before sending mixed signals to network training, the energy of each source signal at each time-frequency point is compared. A reference label Y ∈ R^{TF×C} is set to divide the time-frequency points: the energies of the C underwater sound sources are compared at each point, and the energy-dominant source marks that point. For example, if the energy of the c-th (c ∈ {1, 2, ..., C}) underwater sound source dominates at the n-th (n ∈ {1, 2, ..., TF}) time-frequency point, then y_{n,c} = 1.
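The energy-dominance labelling can be sketched as follows; the toy two-source spectrograms are assumptions for illustration:

```python
import numpy as np

def dominance_labels(sources):
    """Build Y in R^{(T*F) x C}: y[n, c] = 1 iff source c has the largest
    magnitude at T-F point n. sources: (C, T, F) array of spectrograms."""
    C, T, F = sources.shape
    dominant = np.abs(sources).argmax(axis=0).reshape(-1)   # winner per point
    Y = np.zeros((T * F, C))
    Y[np.arange(T * F), dominant] = 1.0
    return Y

# two toy sources, each dominating a different half of the T-F plane
A = np.array([[3.0, 0.1], [3.0, 0.1]])
B = np.array([[0.1, 2.0], [0.1, 2.0]])
Y = dominance_labels(np.stack([A, B]))
```

Each row of Y is one-hot, so exactly one source claims every time-frequency point, matching the energy-dominance rule above.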

Fig. 6: Time-domain waveforms and time-frequency diagrams of the source signals.

Fig. 9: Visualization of separating two-unknown-source mixtures. (a), (c), and (e) are the spectra of sources A, B, and C, respectively; (b), (d), and (f) show the separation results of the pairwise mixtures of A, B, and C. Compared with the original spectra, every source is separated almost perfectly.

TABLE I: Major Ocean Sound Source Frequencies

TABLE II: The Composition of the Database

TABLE III: LFM Signal Parameters for the Binary Time-Frequency Masking Simulation

TABLE IV: Separation Performance at Different SNRs

TABLE V: The Demixing Performance in the Experiment

TABLE VI: Comparison of the Demixing Performance in Experiment 3 (top) with That of the Conventional T-F Mask Approach (bottom)