Efficient Overdetermined Independent Vector Analysis Based on Iterative Projection with Adjustment

Abstract: In this paper, a computationally efficient optimization algorithm for independent vector analysis (IVA) is proposed to accelerate convergence and improve overdetermined convolutive blind speech separation. Iterative projection with adjustment (IPA) is investigated as a means of estimating the unmixing matrix for overdetermined IVA (OverIVA). IPA combines the iterative projection (IP) and iterative source steering (ISS) algorithms to jointly update one row and one column of the mixing matrix, which yields computationally efficient blind source separation. It does so by updating one demixing filter and jointly adjusting all the other sources along its current direction. Motivated by these advantages, this paper proposes a modified algorithm for OverIVA that fully exploits the computational efficiency of the IPA optimization scheme. Experimental results show that the proposed OverIVA-IPA algorithm converges faster and performs better than existing state-of-the-art algorithms.


Introduction
Blind source separation (BSS) [1] refers to unmixing or extracting latent sources from observed mixtures with minimal prior information. It has become a versatile technology with diverse applications, for example to speech signals [2,3], biomedical signals [4], and digital communication signals [5,6]. Independent component analysis (ICA) [7,8] is one of the most basic approaches to BSS. ICA is an unsupervised, data-driven technique that separates linearly mixed signals by maximizing non-Gaussianity. The frequency-domain independent component analysis (FD-ICA) model [9] was proposed for convolutive mixtures to avoid the high computational cost of applying ICA directly in the time domain. In FD-ICA, the observed signal is transformed from the time domain to the frequency domain by the short-time Fourier transform (STFT), and ICA is then applied to estimate the unmixing matrix at each frequency. However, FD-ICA suffers from the permutation ambiguity problem: the order of the separated outputs may differ from one frequency to the next.
To solve the permutation ambiguity of ICA, independent vector analysis (IVA) [10,11] has been proposed and gained remarkable attention from scholars. IVA is an extension of ICA for the separation of multiple parallel mixtures. It resolves the random permutation ambiguity of signal separation outputs by exploiting statistical dependencies across datasets to generalize ICA to multiple datasets. IVA preserves the statistical dependency within a frequency source vector and minimizes statistical dependencies between them.
IVA naturally avoids the permutation problem without any pre-processing or post-processing during learning. Traditional IVA algorithms update the separation matrix with gradient-based methods [10] or a fast fixed-point algorithm [12]. Gradient-based updates require tuning parameters such as the step size, which trades off convergence speed against stability. To achieve faster convergence, a hyperparameter-free iterative projection (IP) algorithm was proposed for auxiliary-function-based IVA (AuxIVA) [13]. More recently, further fast-converging optimization algorithms have been proposed for AuxIVA, for example IP2 [14], iterative source steering (ISS) [15], ISS2 [16], and iterative projection with adjustment (IPA) [17]. IPA combines IP and ISS and addresses a limitation they share: when IP or ISS updates one source, the other sources can only be corrected in a later iteration. IPA is reported to outperform these algorithms in terms of convergence speed and separation performance.
Overdetermined BSS is the case where the number of non-stationary sources N is smaller than the number of microphones M, i.e., M > N. In the multi-source case (N ≥ 2), early approaches handle the overdetermined situation by selecting the N best channels [18,19] or by reducing the number of channels to N with principal component analysis (PCA) [20,21]. Unfortunately, these methods risk removing the source signal of interest and reducing separation performance. For the case of a single source, several independent vector extraction (IVE) methods [22,23] have been proposed. On this basis, IVA was extended to the overdetermined situation (M > N), leading to the overdetermined independent vector analysis algorithm (OverIVA) [24]. The traditional OverIVA relies on the orthogonality constraint (OC), which ignores the sample correlation between the target source signal and the noise signal and therefore limits separation performance. To solve this problem, an OverIVA variant [25] was proposed that uses only the independence between source signals and the stationarity of Gaussian noise for source separation. Recently, algorithms such as IP and ISS have been combined with OverIVA to achieve efficient overdetermined BSS [24,25,26,27].
In this paper, an efficient approach to BSS is proposed, which we call OverIVA-IPA. It combines the techniques of OverIVA-IP and OverIVA-ISS with that of AuxIVA-IPA, achieving high efficiency while ensuring convergence. IP and ISS keep all other sources fixed while updating one of them, which means that further correction can only happen at the next iteration. IPA combines the advantages of IP and ISS to jointly update the mixing matrix of the source signals: when updating the demixing filter of one source, it simultaneously corrects the demixing filters of all other sources accordingly. We therefore apply a modified IPA technique to the OverIVA algorithm to update the source part of the demixing matrix as well as the orthogonal noise part. Finally, we validate the method in convolutive speech separation experiments. The results show that OverIVA-IPA converges faster and performs better than the existing OverIVA-IP, OverIVA-IP2, and OverIVA-ISS methods.
The rest of this paper is organized as follows. We describe the background of the overdetermined BSS problem, AuxIVA-IP, OverIVA-IP, AuxIVA-ISS, OverIVA-ISS, and AuxIVA-IPA in Section 2. In Section 3, the proposed algorithm is derived and the time complexity of the algorithm is analyzed. In Section 4, we show the comparative experimental results of different algorithms and conduct related analysis. Section 5 concludes the full text.

Overdetermined Blind Source Separation Model
In general, a BSS algorithm consists of a cost function and an optimization method. The cost function is constructed from the assumed characteristics of the sources and the separation criterion. The purpose of BSS is to find a suitable linear transformation, the separation matrix $W$, by optimizing the cost function; the source signals are then recovered by applying the estimated separation matrix to the observations. According to the relationship between the number of sources $N$ and the number of sensors $M$, BSS system models can be divided into three categories: underdetermined ($M < N$), determined ($M = N$), and overdetermined ($M > N$). In this paper, $(\cdot)^H$, $(\cdot)^T$, $\det(\cdot)$, $|\cdot|$, $(\cdot)^*$, and $(\cdot)^{-1}$ denote the conjugate transpose, transpose, determinant, absolute value, complex conjugate, and inverse of $(\cdot)$, respectively.

Cost Function
In previous studies [24,25], it is usually assumed that the source signal follows a non-Gaussian distribution, such as a circularly symmetric Laplace distribution or a time-varying Gaussian distribution. In this paper, the Laplace source prior model is used, with mean $\mu = 0$, variance $2b^2 = 2$, skewness 0, and kurtosis 3. The noise signal $z$ is treated as stationary Gaussian noise, as in [25]. In the resulting cost function, $G(\cdot)$ is a contrast function determined by the distribution of the source signal $s_{ft}$, the sample covariance matrix of the observed signal is also used, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. Because this cost function is difficult to minimize directly with respect to $W_f$, an upper bound of the cost function is minimized instead. In this upper bound, $\varphi(r_{nt})$ depends on the definition of the contrast function $G(\cdot)$, $r_{nt} \in \mathbb{R}$ is a real-valued auxiliary variable, and $w_{fn}$ is the $n$th unmixing vector at frequency $f$ in the unmixing matrix $W_f$.

It should be noted that the random variables in the cost function are multivariate, and each source signal is also multivariate. During separation, the optimization algorithm must preserve the statistical dependence within each source vector while minimizing the statistical dependence between different source vectors. This avoids problems such as permutation ambiguity and allows the source signals to be separated.
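The weighted covariance matrices that enter the upper bound can be illustrated with a short sketch. It assumes the convention $\varphi(r) = G'(r)/(2r)$ with $G(r) = r$ for the Laplace prior, so that $\varphi(r) = 1/(2r)$, and it takes $r_{nt}$ to be the norm of source $n$ over all frequencies at frame $t$. The function name, the array layout, and the small constant `eps` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def weighted_covariances(X, Y, eps=1e-10):
    """Sketch of V[f, n] = (1/T) * sum_t phi(r_nt) x_ft x_ft^H.

    X: (F, M, T) complex STFT of the mixtures (assumed layout).
    Y: (F, N, T) complex STFT of the current source estimates.
    Returns V with shape (F, N, M, M).
    """
    T = X.shape[2]
    # r[n, t]: norm of source n across all frequencies at frame t
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))
    # Laplace prior under the assumed convention: phi(r) = 1 / (2 r)
    phi = 1.0 / np.maximum(2.0 * r, eps)
    # Weighted outer products x x^H, averaged over the T frames
    return np.einsum("nt,fat,fbt->fnab", phi, X, X.conj()) / T
```

Each `V[f, n]` is Hermitian and positive semidefinite by construction, which is what the row and column updates discussed later rely on.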

ISS and IP of OverIVA and AuxIVA
In the overdetermined case, Ref. [24] conjectures that the strongest source signals have a strongly non-Gaussian distribution, while noise mixed with other weaker sources has a distribution closer to Gaussian, which guarantees linear independence between the vectors (in this paper, the source signals follow a Laplace distribution). ISS and IP are matrix update rules that can be applied directly in the determined case. In the overdetermined case, however, these updates alone cannot directly extract the target signal from the noise subspace. The lower part $J_f$ of the matrix must also be modified so that the noise subspace remains orthogonal.
To derive the parameter estimation algorithm of OverIVA, we rely on the proven propositions from [26,28]. Proposition 1. For any local optimum, a new $U_f$ can be found without changing the value of the cost function (6). Following Proposition 1, Formula (3) can be simplified. The goal is then to estimate $W_f$ and $J_f$ (when $M > N$), or only $W_f$ (when $M = N$), that minimize (6).

Iterative Projection
In OverIVA-IP [25], the rows of the unmixing matrix $W_f$ are updated in order by IP [14]. Based on Proposition 2, an orthogonality condition is imposed between the source subspace and the noise subspace. Through the equivalence relation of Formula (9), this condition is expressed with the matrices $E_1$ and $E_2$.
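As a minimal sketch of the IP row update, the code below uses the rule commonly stated in the AuxIVA literature, $w_k \leftarrow (W V_k)^{-1} e_k$ followed by the normalization $w_k \leftarrow w_k / \sqrt{w_k^H V_k w_k}$. It is written for a square demixing matrix whose rows are $w_m^H$; in the overdetermined setting the square matrix would be the full matrix including the noise part. This reading, the function name, and the row convention are assumptions made for illustration.

```python
import numpy as np

def ip_update(W, V_k, k):
    """One IP step for row k of a square demixing matrix W (M x M).

    Rows of W are assumed to hold the conjugate-transposed filters w_m^H.
    V_k is the (M x M) weighted covariance matrix of source k.
    """
    M = W.shape[0]
    e_k = np.zeros(M, dtype=complex)
    e_k[k] = 1.0
    # w_k = (W V_k)^{-1} e_k, computed by solving a linear system
    w_k = np.linalg.solve(W @ V_k, e_k)
    # Normalize so that w_k^H V_k w_k = 1
    w_k = w_k / np.sqrt(np.real(w_k.conj() @ V_k @ w_k))
    W_new = W.astype(complex)  # astype returns a copy
    W_new[k, :] = w_k.conj()
    return W_new
```

After the step, the new filter has unit energy under $V_k$ and is orthogonal, in the $V_k$-weighted sense, to all other rows. Those rows are left untouched, which is why any correction to them has to wait for a later update.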

Iterative Source Steering
In OverIVA-ISS [27], the columns of the unmixing matrix $W_f$ are updated in order by ISS [15].
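The ISS step can be sketched directly on the separated signals at one frequency. The sketch assumes the rank-1 form usually given for ISS: every source $m$ is updated as $y_m \leftarrow y_m - v_m y_k$, with $v_m = \sum_t \varphi_{mt} y_{mt} y_{kt}^* / \sum_t \varphi_{mt} |y_{kt}|^2$ for $m \neq k$ and $v_k = 1 - (\tfrac{1}{T}\sum_t \varphi_{kt} |y_{kt}|^2)^{-1/2}$. The function name and array shapes are illustrative assumptions.

```python
import numpy as np

def iss_update(Y, phi, k):
    """One ISS step at a single frequency.

    Y:   (N, T) complex current source estimates.
    phi: (N, T) positive weights phi(r_nt).
    k:   index of the source whose direction is used for the step.
    Returns the updated signals and the step vector v, so that the
    demixing matrix would change as W <- W - v w_k^H.
    """
    T = Y.shape[1]
    y_k = Y[k]
    # den[m] = sum_t phi_mt |y_kt|^2,  num[m] = sum_t phi_mt y_mt conj(y_kt)
    den = np.sum(phi * np.abs(y_k) ** 2, axis=1)
    num = np.sum(phi * Y * y_k.conj(), axis=1)
    v = num / den
    # Special case m = k: pure rescaling of source k
    v[k] = 1.0 - 1.0 / np.sqrt(den[k] / T)
    Y_new = Y - v[:, None] * y_k[None, :]
    return Y_new, v
```

Because the step only needs weighted inner products of the current signals, no weighted covariance matrix has to be formed explicitly.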

Iterative Projection with Adjustment
In AuxIVA-IPA [17], IP-style and ISS-style updates are performed jointly on the entire demixing matrix $W_f$. The algorithm completely re-estimates the $k$th unmixing filter and adjusts all other filters by taking steps aligned with the current estimate of source $k$. The update rule left-multiplies $W_f$, the estimate of the separation matrix from the previous iteration, by the update matrix $T_k(u, q)$, which by definition modifies one row and one column in each iteration. In its definition, $\bar{E}_k$ is the $M \times (M-1)$ matrix containing all canonical basis vectors except the $k$th,

$$\bar{E}_k = [e_1 \cdots e_{k-1} \; e_{k+1} \cdots e_M], \quad (18)$$

$I$ denotes the identity matrix, and $e_k$ denotes the $k$th unit vector.

The column vector $q$ is obtained by solving a minimization problem over $q \in \mathbb{C}^{M-1}$, given by (19)-(24). Here $V_k$ denotes the $k$th weighted covariance matrix, transformed as $V_k \leftarrow ((W_f V_k W_f^H)^{-1})^*$. $A$ and $C$ denote the matrices obtained by the corresponding transformation of the $m$th and $k$th weighted covariance matrices $V_m$ and $V_k$, $b$ and $g$ denote the vectors obtained by the corresponding transformation of $V_m$ and $V_k$, and $o$ denotes the scalar obtained by the corresponding transformation of $V_k$.

The row vector $u$ is updated by (25), in which the vector $e_k - \bar{E}_k q^*$ appears and $\theta \in [0, 2\pi]$ is an arbitrary phase. The optimal $q$ is computed through (19)-(24) and then substituted into (25) to obtain the optimal row vector $u$. The separation matrix is then updated with the optimal column vector $q$ and row vector $u$.
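To make the structure of the update matrix concrete, the following sketch builds a matrix that differs from the identity in exactly one row and one column. It assumes the form $T_k(u, q) = I + e_k (u - e_k)^H + \bar{E}_k q^* e_k^T$; the exact sign and conjugation conventions are an assumption here and should be checked against [17]. Solving for the optimal $u$ and $q$ is not shown.

```python
import numpy as np

def ipa_update_matrix(k, u, q):
    """Build an update matrix with row k set by u and column k set by q.

    Assumed form: T = I + e_k (u - e_k)^H + E_bar_k conj(q) e_k^T,
    with u in C^M and q in C^(M-1).
    """
    M = u.shape[0]
    I = np.eye(M, dtype=complex)
    e_k = I[:, k]
    # E_bar_k: M x (M-1), all canonical basis vectors except e_k (Eq. (18))
    E_bar = np.delete(I, k, axis=1)
    row_part = np.outer(e_k, (u - e_k).conj())   # replaces row k by u^H
    col_part = np.outer(E_bar @ q.conj(), e_k)   # fills column k off the diagonal
    return I + row_part + col_part
```

Left-multiplying the demixing matrix by such a matrix replaces filter $k$ (the IP-style part) and adds a multiple of the old filter $k$ to every other filter (the ISS-style part) in a single step.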

OverIVA-IPA
In blind source separation, the key to fast separation is to reduce the cost function (7) as much as possible in as few iterations as possible. The previously proposed block coordinate descent algorithms IP, IP2, and ISS fix part of the separation matrix and then minimize the cost function over the remaining free variables. IP and ISS update one row or one column of the unmixing matrix at a time, and IP2 updates two rows at a time. However, when the other source vectors are poorly separated, overall separation performance may suffer.

We therefore propose an IPA-based OverIVA algorithm. In OverIVA-IP and OverIVA-ISS, updating row by row or column by column naturally separates the target sources one by one and requires only a further update of the background noise. In contrast, the IPA algorithm jointly performs IP-style and ISS-style updates to achieve more efficient BSS. The proposed algorithm combines the convergence advantage of IPA with the orthogonality constraint of OverIVA. The update built from $T_k(u, q)$, given by (16)-(25), is applied repeatedly until the cost function converges to a stationary point. Because the IPA method cannot be applied directly to the entire matrix, it is applied only to the source part, and the orthogonality constraint is used to update the remaining noise part. In the update process, the IPA method ensures that each iteration properly decreases the cost function until final convergence. For the initial value of the matrix $W_f$, we find that the identity matrix is a satisfactory choice. The final algorithm, OverIVA-IPA, which alternately applies updates to $W_f$ and $J_f$, is detailed in Algorithm 1.
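The noise-part update that follows each source-part update can be sketched as below. It assumes that the noise rows have the form $U_f = [J_f, -I_{M-N}]$ and that the orthogonality constraint reads $U_f C_f W_f^H = 0$, with $C_f$ the sample covariance of the mixture. Under these assumptions $J_f = (E_2 C_f W_f^H)(E_1 C_f W_f^H)^{-1}$, where $E_1$ and $E_2$ select the first $N$ and last $M-N$ rows. The function name is hypothetical, and this is not a transcription of Algorithm 1.

```python
import numpy as np

def update_noise_part(W, C):
    """Recompute J so that U = [J, -I] satisfies U C W^H = 0.

    W: (N, M) source rows of the demixing matrix.
    C: (M, M) sample covariance matrix of the mixture.
    Returns J with shape (M - N, N).
    """
    N = W.shape[0]
    CWh = C @ W.conj().T        # (M, N)
    top = CWh[:N]               # E_1 C W^H, shape (N, N)
    bottom = CWh[N:]            # E_2 C W^H, shape (M - N, N)
    # J = bottom @ inv(top), obtained from the transposed linear system
    return np.linalg.solve(top.T, bottom.T).T
```

In an alternating scheme of the kind described above, this function would be called after every change to the source rows, so that the noise subspace stays orthogonal to the current source estimates.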

Computational Complexity
When the number of time frames $T$ is greater than the number of microphones $M$, the running time is dominated by the computation of the weighted covariance matrices $V_{fn}$. In this case, the weighted covariance matrices of the $N$ sources are computed in each iteration, and the computational complexity of OverIVA-IP and OverIVA-IP2 is $O(FTNM^2)$. The IPA algorithm does not increase this complexity in essence: it additionally requires a matrix inversion, two matrix multiplications, and an eigenvalue decomposition. The computational complexity of OverIVA-IPA is therefore also $O(FTNM^2)$. OverIVA with ISS is a special case, since (13) admits an efficient computation, and its complexity is $O(FTNM)$.

Numerical Experiment
We compare the performance of the proposed OverIVA-IPA algorithm with the existing OverIVA-IP, OverIVA-ISS, and OverIVA-IP2 algorithms on convolutive blind source separation in the STFT domain. Performance is evaluated in terms of SI-SIR, SI-SDR, and separated spectrograms. The comparison is carried out for different numbers of sources and different signal-to-interference-and-noise ratios (SINR).

Experimental Environment Settings
To synthesize the mixtures used for evaluation, we simulate the impulse responses of 1000 random rectangular rooms with the pyroomacoustics Python package [29]. Each room has walls between 6 m and 10 m long and a ceiling height between 2.8 m and 4.5 m. The simulated reverberation time is sampled uniformly between approximately 60 ms and 450 ms. The sources and the microphone array are placed at random at least 50 cm away from the walls, at heights between 1 m and 2 m. The array is circular and regular, as shown in Figure 1. The three axes in Figure 1 denote the length, width, and height of the room, respectively; × denotes the microphone array and • denotes the interferer signals, with the source signals shown by a separate marker. The number of sources is set to N = 2, the source signals are selected from the CMU ARCTIC speech corpus [30], and 5 additional interference sources are used to generate diffuse noise. The number of microphones is M = 4, 6, 8, and the distance between adjacent microphones is 10 cm. All sound sources are located farther from the array than the critical distance of the room, which is the distance at which direct sound and reverberant energy are equal and which is computed from the volume $V$ of the room.

The SINR is defined in terms of $\sigma_n^2$, $\sigma_i^2$, and $\sigma_w^2$, the variances of the target source, interferer, and white noise, respectively, so that the specified SINR is obtained at a reference microphone. After simulating propagation, the variance of the target source is fixed at $\sigma_n^2 = 1$ at the reference microphone. In the comparison experiments, the first microphone is selected as the reference, and its SINR value is fixed. Separation is studied at SINR values of 5 dB, 15 dB, and 25 dB. Simulations use a sampling rate of 16 kHz, and the STFT uses a 4096-point Hamming window with 3/4 overlap.
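Two of the quantities in this setup are simple to compute and help in reading the parameters. The sketch below assumes the common approximation $d_c \approx 0.057\sqrt{V / T_{60}}$ for the critical distance (in metres, with $V$ in cubic metres and $T_{60}$ in seconds), which may differ from the exact expression used here, and derives the STFT hop size from the window length and overlap. Both function names are illustrative.

```python
import math

def critical_distance(volume_m3, t60_s):
    """Approximate critical distance in metres (assumed formula)."""
    return 0.057 * math.sqrt(volume_m3 / t60_s)

def stft_timing(fs_hz, win_len, overlap):
    """Return (hop in samples, window duration in s, hop duration in s)."""
    hop = int(round(win_len * (1.0 - overlap)))
    return hop, win_len / fs_hz, hop / fs_hz

# Settings from the text: 16 kHz sampling, 4096-point window, 3/4 overlap
hop, win_s, hop_s = stft_timing(16000, 4096, 0.75)
# hop is 1024 samples: a 256 ms window advanced in 64 ms steps
```

The 256 ms window is longer than much of the 60 ms to 450 ms reverberation range, which is the regime in which a frequency-wise instantaneous mixing model is a reasonable approximation.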

Experimental Simulation Results
In the experimental simulation, the performance of the OverIVA algorithms is evaluated using the multivariate Laplacian source prior model. We tested OverIVA optimized by the IP, IP2, ISS, and IPA methods. We use the scale-invariant signal-to-distortion ratio (SI-SDR), the scale-invariant signal-to-interference ratio (SI-SIR), and signal spectrograms as separation performance metrics. SI-SDR measures how much the target signal is degraded, while SI-SIR indicates how much of the other sources remains. A high SI-SDR indicates both good separation and high quality. A high SI-SIR indicates good separation, but not necessarily preservation of the target source. They are defined as follows. Let $S \in \mathbb{R}^{T \times M}$ be the matrix containing the $M$ time-domain ground-truth reference signals in its columns. Let $\hat{s} \in \mathbb{R}^T$ be the estimated signal, and $s$ one of the columns of $S$. Then

$$\text{SI-SDR} = 10 \log_{10} \frac{\|\alpha s\|^2}{\|\alpha s - \hat{s}\|^2}, \qquad \text{SI-SIR} = 10 \log_{10} \frac{\|\alpha s\|^2}{\|S b\|^2},$$

where $\alpha = \hat{s}^T s / \|s\|^2$ and $b = (S^T S)^{-1} S^T (\alpha s - \hat{s})$.

Figures 2 and 3 show the separation performance of OverIVA-IP, OverIVA-IP2, OverIVA-ISS, and OverIVA-IPA. Figure 2 uses SI-SDR as the performance index, and Figure 3 uses SI-SIR. Across these metrics, the proposed OverIVA-IPA method is superior to the other methods in almost all experimental environments, most clearly in the 5 dB and 15 dB environments. It is also the fastest algorithm to reach high SI-SDR and SI-SIR values. The exception is the six-microphone case in Figure 2, where its performance is comparable to that of OverIVA-IP2 in the 15 dB environment. Table 1 shows that OverIVA-IPA responds fastest: it reaches a stable SI-SDR value in the fewest iterations, and its computational efficiency is several times higher than that of the other algorithms. Table 2 shows the stabilized SI-SDR values of the four methods. The results show that IPA is as effective as the other methods in minimizing the cost function, performs better, and converges faster.

Table 1.
OverIVA-IP     35  20  15  27  16  13  26  16  12
OverIVA-IP2    10   8   7   9   8   8   9   9   8
OverIVA-ISS    29  17  13  28  15  13  26  14  12
OverIVA-IPA     7   5   5   6   5   5   6

Figure 4 shows the spectrograms of the signals separated by each method in the six-microphone, 25 dB environment. The spectrograms show that the proposed OverIVA-IPA algorithm separates the source signals from the mixture better than the other algorithms and preserves more detail in the separated spectrograms.
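The two scale-invariant metrics can be computed with a few lines of NumPy. The sketch follows the quantities $\alpha$ and $b$ given in the text and assumes the usual ratios, $\|\alpha s\|^2 / \|\alpha s - \hat{s}\|^2$ for SI-SDR and $\|\alpha s\|^2 / \|S b\|^2$ for SI-SIR, both expressed in dB. The function name and argument order are illustrative.

```python
import numpy as np

def si_sdr_sir(s_hat, S, idx):
    """Return (SI-SDR, SI-SIR) in dB for one estimated signal.

    s_hat: (T,) estimated time-domain signal.
    S:     (T, M) ground-truth reference signals in its columns.
    idx:   column of S that s_hat is compared against.
    """
    s = S[:, idx]
    alpha = (s_hat @ s) / (s @ s)          # optimal scaling of the reference
    target = alpha * s
    residual = target - s_hat
    # b = (S^T S)^{-1} S^T (alpha s - s_hat), via least squares
    b = np.linalg.lstsq(S, residual, rcond=None)[0]
    interference = S @ b
    si_sdr = 10 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))
    si_sir = 10 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))
    return si_sdr, si_sir
```

Rescaling `s_hat` by any nonzero constant leaves both values unchanged, which is the sense in which the metrics are scale-invariant. When the residual lies entirely in the span of the references, the two values coincide.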

Summary and Prospect
We propose an overdetermined independent vector analysis (OverIVA) algorithm optimized with the iterative projection with adjustment (IPA) algorithm. The algorithm applies efficient updates from auxiliary-function-based IVA (AuxIVA), and its complexity is of the same order as that of the iterative projection (IP) algorithm. In numerical experiments, we investigated the performance of OverIVA under the different update rules for the separation of realistically simulated speech mixtures. The experimental results show that the proposed OverIVA-IPA algorithm is superior to the other algorithms in almost all tested environments. Future work will focus on applying the algorithm to real systems and evaluating its real-time performance.

Conflicts of Interest:
No potential conflict of interest was reported by the authors.