Reformulating the Binary Masking Approach of Adress as Soft Masking

Abstract: Binary masking forms the basis for a number of source separation approaches that have been successfully applied to the problem of de-mixing music sources from a stereo recording. A well-known problem with binary masking is that, when music sources overlap in the time-frequency domain, only one of the overlapping sources can be assigned the energy in a particular time-frequency bin. To overcome this problem, we reformulate the classical pan-pot source separation problem for music sources as a non-negative quadratic program. This reformulation gives rise to an algorithm, called Redress, which extends the popular Adress algorithm. It works by defining an azimuth trajectory for each source based on its spatial position within the stereo field. Redress allows for the allocation of energy in one time-frequency bin to multiple sources. We present results that show that for music recordings Redress improves the SNR, SAR, and SDR in comparison to the Adress algorithm.


Introduction
Audio source separation is the problem of extracting the sources that have been combined to form an observed audio mixture. Applications include speech enhancement and recognition [1] and hearing aid devices [2]. The separation of music sources from audio mixtures, namely Music Source Separation (MSS), has applications such as (1) creating audio effects for remixing and DJing; (2) automatic transcription of pitched instruments; and (3) key-signature and chord detection [3]. Time-Frequency (TF) two-channel methods for audio source separation have left an indelible mark on the field [4][5][6]. These methods are based on one or both of the following estimators: weighted, power-weighted, or non-weighted relative attenuation estimation, and delay estimation. A unifying framework for these estimators is given in [7]. We consider the two-channel mixing case, as most commercial music is in stereo format according to [3] (formats such as 5.1 are not considered). Stereo mixtures allow for the spatial positioning of sources in a sound field; sources are perceived as originating from the left, centre, or right, etc. Positioning is achieved via pan-pot mixing, which scales the contribution of each source to each channel. The relative delay between channels is less important for MSS [5] than for speech source separation [4], which uses both relative delay and attenuation estimates to de-mix sources.
Music sources are typically sparse. This means that in the majority of TF bins the source has little energy. The source's TF support is the remaining TF bins, which are small in number. Sparsity of the underlying sources, coupled with independence of occurrence of the sources, should mean that in general sources do not overlap in TF; the success of Adress [5] bears testament to the conjecture that this is approximately true. However, two-channel TF methods do not generally work well in the TF bins where the sources overlap. Overlap arises for a number of reasons in MSS: harmonic signals are composed of a fundamental frequency and its harmonics; harmony causes a large degree of overlap in the frequency content of sources locally in time; percussive signals have a flatter spectrum than [...]

[...] $f_3 = 300$ Hz in Figure 1 (rows 1 and 2). Both sources are scaled by 2 to simplify notation, which yields the expressions
$$s_1(t) = 2\sin(2\pi f_1 t) + 2\sin(2\pi f_3 t), \tag{1}$$
$$s_2(t) = 2\sin(2\pi f_2 t) + 2\sin(2\pi f_3 t). \tag{2}$$
A stereo mixture of these sources is produced by pan-mixing, that is, by weighting the contribution of each source on the left channel and the right channel. The left and right channel signals, namely $x_1(t)$ and $x_2(t)$, produced using the weights $\alpha$ and $\gamma$, are defined as
$$x_1(t) = s_1(t) + \alpha s_2(t), \tag{3}$$
$$x_2(t) = \gamma s_1(t) + s_2(t), \tag{4}$$
and are illustrated in Figure 1 (rows 3 and 4). The weights lie in the intervals $0 < \alpha < 1$ and $0 < \gamma < 1$.

Appealing to the Fourier transform, we can express the source signals, $s_1(t)$ and $s_2(t)$, more compactly in the frequency domain. The sources are denoted $S_1(f)$ and $S_2(f)$, where $f$ denotes frequency and $\delta(\cdot)$ is the delta function:
$$S_1(f) = \frac{1}{j}\left[\delta(f - f_1) - \delta(f + f_1) + \delta(f - f_3) - \delta(f + f_3)\right],$$
$$S_2(f) = \frac{1}{j}\left[\delta(f - f_2) - \delta(f + f_2) + \delta(f - f_3) - \delta(f + f_3)\right].$$
As a consequence, the mixtures, $X_1(f)$ and $X_2(f)$, can also be expressed compactly as
$$X_1(f) = S_1(f) + \alpha S_2(f), \qquad X_2(f) = \gamma S_1(f) + S_2(f).$$

Figure 1. Rows 1 and 2 illustrate the sources, $s_1(t)$ and $s_2(t)$; rows 3 and 4 illustrate pan-mixed mixtures of the sources, $x_1(t)$ and $x_2(t)$.

Adress
In the Adress algorithm, the authors construct a frequency-azimuth plane as a first step towards separating the sources, $s_1(t)$ and $s_2(t)$, from the mixtures, $x_1(t)$ and $x_2(t)$. The frequency-azimuth plane is constructed by varying an independent variable, $g$, over the range $0 \le g \le 1$ and computing the magnitude of the difference between one frequency-domain mixture and the other mixture scaled by $g$. It is necessary to perform this scaling twice, with the roles of $X_1(f)$ and $X_2(f)$ swapped, to preserve the symmetry of the frequency-azimuth plane. The two halves of the frequency-azimuth plane are produced by computing
$$A_1(f) = |X_1(f) - g X_2(f)|, \tag{8}$$
$$A_2(f) = |X_2(f) - g X_1(f)|. \tag{9}$$
Concatenating the components, $A_1(f)$ and $A_2(f)$, produces the entire frequency-azimuth plane, which is defined as
$$A(f) = \left[\,A_1(f),\; A_2(f)\,\right]. \tag{10}$$
Given this concatenation of components and the role of $g$, which can assume values in the range $0 \le g \le 1$ in each of these components, when we plot the frequency-azimuth plane we denote the range of $g$ to be $-1 \le g \le 1$ to capture this symmetry. In addition, we refer to the frequency-azimuth plane as the azimugram in the following sections. To de-mix the pan-mixed mixtures presented above, we only need to consider three frequency components, $f_1$, $f_2$, and $f_3$. The azimugram can be defined in closed form for these three components by substituting the mixtures into Equations (8) and (9):
$$A_1(f) = |(1 - g\gamma)\,S_1(f) + (\alpha - g)\,S_2(f)|, \tag{11}$$
$$A_2(f) = |(\gamma - g)\,S_1(f) + (1 - g\alpha)\,S_2(f)|. \tag{12}$$
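The construction of the two azimugram halves can be sketched on the three-bin example. This is a minimal illustration rather than the authors' implementation: it assumes the halves take the form $|X_1 - gX_2|$ and $|X_2 - gX_1|$, uses $\alpha = 0.35$ and $\gamma = 0.4$ from the worked example, picks a hypothetical gain resolution `BETA = 100`, and represents each source by a single positive-frequency coefficient per bin.

```python
ALPHA, GAMMA = 0.35, 0.40   # pan weights taken from the worked example
BETA = 100                  # assumed gain resolution: g = 0/BETA, ..., BETA/BETA

def azimugram_halves(X1, X2, beta=BETA):
    """Return the gain grid and both halves |X1 - g X2| and |X2 - g X1|."""
    gains = [m / beta for m in range(beta + 1)]
    A1 = [[abs(x1 - g * x2) for g in gains] for x1, x2 in zip(X1, X2)]
    A2 = [[abs(x2 - g * x1) for g in gains] for x1, x2 in zip(X1, X2)]
    return gains, A1, A2

def null_gain(row, gains):
    """Gain at which one frequency bin of the azimugram is smallest."""
    m = min(range(len(row)), key=row.__getitem__)
    return gains[m]

c = -1j                     # positive-frequency coefficient of 2 sin(2 pi f t)
S1 = [c, 0, c]              # s1 occupies bins f1 and f3
S2 = [0, c, c]              # s2 occupies bins f2 and f3
X1 = [a + ALPHA * b for a, b in zip(S1, S2)]   # x1 = s1 + alpha * s2
X2 = [GAMMA * a + b for a, b in zip(S1, S2)]   # x2 = gamma * s1 + s2

gains, A1, A2 = azimugram_halves(X1, X2)
nulls = {
    "f1": null_gain(A2[0], gains),   # s1 only: null in the second half
    "f2": null_gain(A1[1], gains),   # s2 only: null in the first half
    "f3": null_gain(A1[2], gains),   # shared bin: null at neither pan weight
}
print(nulls)
```

On this grid the shared bin has its minimum at a gain that matches neither pan weight, which is the situation the following sections address.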
In Equations (11) and (12) the gain satisfies the relationship $|g| \le 1$. The locations of nulls in the azimugram are important for source separation. For the three frequencies $f_1$, $f_2$, and $f_3$ the null locations are given by the gains
$$g = \gamma \ \text{when } f = f_1, \qquad g = \alpha \ \text{when } f = f_2, \qquad g = \frac{1+\alpha}{1+\gamma} \ \text{or} \ \frac{1+\gamma}{1+\alpha} \ \text{when } f = f_3.$$

The operation of the Adress algorithm is now summarized for discrete-time source signals that have been sampled at a rate which satisfies the Nyquist-Shannon sampling theorem. Adress computes windowed Discrete Fourier Transforms, $X_1(k)$ and $X_2(k)$, of the left and right channel mixture signals. The index $k$ denotes the frequency bin indices, $k = 0, 1, \ldots, K$. Figure 2 illustrates the positive frequencies of the magnitude spectra of the sources and mixtures, showing the frequency components that overlap and those that do not. We use a set of gain values, $g = \frac{0}{\beta}, \frac{1}{\beta}, \ldots$, where $\beta = \frac{M}{2}$, to construct the frequency-azimuth matrix, $A \in \mathbb{R}^{K \times M}$. The matrix $A$ is generally transformed in order to produce peaks at the locations of nulls in $A$, yielding the matrix $\hat{A}$,
$$\hat{A}[k, m] = \begin{cases} r[k], & \text{if } A[k, m] = \min\{A[k, :]\}, \\ 0, & \text{otherwise}, \end{cases}$$
where we introduce $r[k] = \max\{A[k, :]\} - \min\{A[k, :]\}$ to simplify the presentation. The motivation for this transformation is to simplify null/peak detection for the user.

Figure 3 illustrates the resulting matrix, $\hat{A}$, for the two-source mixture introduced above. The peaks located at $\{f, g\} = \{100, -0.35\}$ and $\{200, 0.4\}$ correspond to the frequency components of the sources that do not overlap. Recall that the sign of $g$ is changed in order to distinguish between the right and left components of the azimugram (the value −0.35 is a gain of 0.35, and the negative sign indicates which TF mixture is scaled relative to the other one). Reconstruction of the component sources is achieved by assigning TF bins to sources depending on the location of the nulls in the frequency-azimuth plane, using an azimuth subspace width parameter $H$.
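The null-to-peak transformation can be illustrated with a short sketch. It assumes the rule that each row keeps the value $r[k] = \max\{A[k,:]\} - \min\{A[k,:]\}$ at the position(s) of its minimum and is zero elsewhere; the function name `nulls_to_peaks` is hypothetical.

```python
def nulls_to_peaks(A):
    """Turn each row's minimum (null) into a peak of height max - min."""
    A_hat = []
    for row in A:
        lowest, highest = min(row), max(row)
        r = highest - lowest                      # r[k] in the text
        A_hat.append([r if value == lowest else 0.0 for value in row])
    return A_hat

# Two frequency bins, three gain positions each.
A = [[0.75, 0.25, 0.5],
     [1.0, 0.5, 0.0]]
print(nulls_to_peaks(A))
```

Deeper nulls give taller peaks, so a bin occupied by a single source stands out more clearly than a bin whose minimum is shallow.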

Problem
A null is produced at the gain location, $g = \alpha$, in the frequency-azimuth plane by the mixture signals described above. This null corresponds to the source, $s_2(t)$, in the mixture, which excites the frequency $f_2 = 200$ Hz (cf. Figure 3). The source $s_2(t)$ also excites the frequency $f_3 = 300$ Hz. The null in the $f_3 = 300$ Hz frequency band is generally not located at the same gain as the null of the $f_2 = 200$ Hz component, i.e., $\alpha$. It is located at either $\frac{1+\alpha}{1+\gamma}$ or $\frac{1+\gamma}{1+\alpha}$, depending on the relative scale of $\alpha$ and $\gamma$. In Figure 3, this peak is located at $\frac{1+0.35}{1+0.4} \approx 0.96$. Note that we have used the scale value 0.35 in the numerator and not −0.35, as the negative sign is only used to facilitate plotting both components of the azimugram side-by-side in Figure 3. The scaling of the source signals in the mixtures produces another null at $g = \gamma$ at the frequency $f_1 = 100$ Hz. This null arises due to the source, $s_1(t)$, which is scaled by $\gamma$ in the second mixture, $x_2$. A null is generally not located at the gain $g = \gamma$ in the $f_3 = 300$ Hz band. Co-occupation of the $f_3$ bin by the two source signals (which are scaled by different gains) causes the Adress algorithm to assign all of the energy of the frequency $f_3$ to one of the sources. The other source receives none of the energy. The decision on which source obtains the energy is based on the distance of the null of the $f_3$ component from the nulls of the other frequency components of the $s_1$ and $s_2$ source signals. For example, Figure 4 illustrates the recovered magnitude spectra for the two sources obtained using the Adress algorithm.

The absence of the $f_3$ frequency component in one of the signals is a deficit. In this example, 50% of the frequency components of one of the sources will be missing, and the other source will have a magnitude that is too large for that frequency component, because that source has incorrectly been assigned all of the energy in the $f_3$ frequency bin. This type of problem has been identified and called Frequency-Azimuth Smearing in [5], but it has not been solved.
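The location of the null in the shared bin follows from a short calculation. The sketch below assumes, as in the motivating example, that both sources contribute the same complex amplitude $c$ at $f_3$ and that the first azimugram component has the form $|X_1 - gX_2|$ with $X_1 = S_1 + \alpha S_2$ and $X_2 = \gamma S_1 + S_2$.

```latex
\begin{aligned}
|X_1(f_3) - g\,X_2(f_3)| &= |(1+\alpha)\,c - g\,(1+\gamma)\,c| \\
                         &= |c|\,\bigl|(1+\alpha) - g\,(1+\gamma)\bigr| = 0
   \iff g = \frac{1+\alpha}{1+\gamma}, \\
\alpha = 0.35,\ \gamma = 0.4: \qquad g &= \frac{1.35}{1.40} \approx 0.964 .
\end{aligned}
```

When $\alpha > \gamma$ this value exceeds 1, and the null instead appears in the other component, $|X_2 - gX_1|$, at $g = \frac{1+\gamma}{1+\alpha}$.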
Methods for trying to ensure that overlap does not happen rely on measuring the level of disjointness of sources in the TF domain; a common measure is Windowed Disjoint Orthogonality (WDO) [4,6]. WDO has been used to determine which parametrization of the Short-Time Fourier Transform (STFT) gives the least overlapping representation of the source signals in the TF mixtures [4,6].

The Adress estimates have further drawbacks. Secondly, the magnitudes of the components depend on the channel used to de-mix the sources, which is a disadvantage. The intensity of the $f_1$ component relative to the $f_3$ component differs depending on which mixture is used to de-mix the sources. Finally, the $f_3$ component that is assigned to $s_1$ includes the energy from the $s_2$ source as well as the $s_1$ source.

We now present a solution to the problem of overlapping frequency components presented above that is motivated by re-considering the Adress mixing model.

Separation via Azimuth Trajectories
We introduce Redress, which uses azimuth trajectories to recover the source magnitude spectra.

Definition 1.
An azimuth trajectory is the response recorded in the frequency-azimuth matrix, A, as a function of the gain, g, which has been varied over its range 0 ≤ g ≤ 1 in Equations (8) and (9).
We only consider the positive frequencies due to the symmetry of the TF representation. In the two-source case, the mixtures are $X_1(f) = S_1(f) + \alpha S_2(f)$ and $X_2(f) = \gamma S_1(f) + S_2(f)$. In this discussion, one source dominates on the left channel and the other dominates on the right channel in order to address both scenarios. The first component of the azimugram is defined as:
$$A_1(f, g_m) = |X_1(f) - g_m X_2(f)| = |(1 - g_m\gamma)\,S_1(f) + (\alpha - g_m)\,S_2(f)|.$$
The pan-mixing model does not delay sources and, thus, there is no relative delay between the two microphones. Each term can be expressed as a real-valued scalar, $c$, times a complex value, $z = (a + bj)$. The absolute value of the product, $|cz|$, can be re-expressed as the product of the absolute value of the scalar and the absolute value of the complex value, $|cz| = |c||z|$. Two cases warrant examination.
• When $f = f_1$ the azimugram can be simplified as
$$A_1(f_1, g_m) = |1 - g_m\gamma|\,|S_1(f_1)|,$$
the product of $s_1$'s azimuth trajectory $h_1(g) = |(1 - g_m\gamma)|$ and $s_1$'s spectral content $|\delta(f_1 - f_1)|$ at that frequency.
• When $f = f_3$ the azimugram can be expressed as
$$A_1(f_3, g_m) = |(1 - g_m\gamma)\,S_1(f_3) + (\alpha - g_m)\,S_2(f_3)|;$$
however, from the triangle inequality, it holds that
$$A_1(f_3, g_m) \le |1 - g_m\gamma|\,|S_1(f_3)| + |\alpha - g_m|\,|S_2(f_3)|. \tag{20}$$
Expressing this portion of the azimugram as the product of source trajectories, $h_1(g) = |(1 - g_m\gamma)|$ and $h_2(g) = |(\alpha - g_m)|$, and their corresponding source spectral content is therefore an approximation.
For mixtures consisting of an arbitrary number of sources, which in turn have different magnitudes in each of the TF bins, the error introduced by assuming that the inequality in Equation (20) is in fact an equality depends on the location of the sources that occupy the bins, the values of $\alpha$ and $\gamma$ in Equation (20), and the magnitude spectra in those frequency bins.
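The size of this approximation error can be checked numerically for a single shared bin. The sketch below is illustrative only: it assumes the two-source mixing model with $\alpha = 0.35$ and $\gamma = 0.4$, gives both sources the same complex amplitude, and uses a hypothetical helper, `exact_and_bound`, to return both sides of the triangle inequality.

```python
ALPHA, GAMMA = 0.35, 0.40   # pan weights from the worked example

def exact_and_bound(g, s1, s2, alpha=ALPHA, gamma=GAMMA):
    """Return (exact azimugram value, trajectory-times-magnitude sum) for one bin."""
    c1, c2 = 1 - g * gamma, alpha - g                # signed pan coefficients
    exact = abs(c1 * s1 + c2 * s2)                   # left side of the inequality
    bound = abs(c1) * abs(s1) + abs(c2) * abs(s2)    # right side (factorized form)
    return exact, bound

s = -1j                     # both sources share this amplitude in the bin
for g in (0.2, 0.8):
    exact, bound = exact_and_bound(g, s, s)
    print(f"g={g:.1f}  exact={exact:.2f}  factorized={bound:.2f}")
```

For gains below $\alpha$ the two coefficients have the same sign and the two values coincide; once $g$ exceeds $\alpha$ the coefficients partially cancel and the factorized form overestimates the azimugram.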
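Continuing with discrete TF representations of the signals, and cognisant of the approximation introduced by the triangle inequality, we approximate the azimugram component $A_1(k, m)$ with the following factorization in the two-source case:
$$A_1(k, m) \approx \sum_{r=1}^{2} W[k, r]\,H[r, m].$$
The two matrices, $W$ and $H$, are defined as
$$W[k, 1] = \delta_{kk_1} + \delta_{kk_3}, \qquad W[k, 2] = \delta_{kk_2} + \delta_{kk_3},$$
$$H[1, m] = |1 - g_m\gamma|, \qquad H[2, m] = |\alpha - g_m|,$$
where the bins $k_1$, $k_2$, and $k_3$ correspond to the frequencies $f_1$, $f_2$, and $f_3$, and $\delta_{kk_i}$ is the Kronecker delta.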
A similar factorization can be constructed for the other component of the azimugram A 2 (k, m).
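The two-source factorization can be written out explicitly for the three occupied bins. This is a sketch under stated assumptions: unit source magnitudes in the occupied bins, in-phase content in the shared bin, $\alpha = 0.35$, $\gamma = 0.4$, and a hypothetical gain resolution `BETA = 100`. It compares the product $WH$ with the first azimugram component.

```python
ALPHA, GAMMA, BETA = 0.35, 0.40, 100
gains = [m / BETA for m in range(BETA + 1)]

# Rows of H: azimuth trajectories of s1 and s2 in the first azimugram component.
H = [[abs(1 - g * GAMMA) for g in gains],
     [abs(ALPHA - g) for g in gains]]

# Rows of W: bins k1, k2, k3; columns: magnitude of s1 and s2 in that bin.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def matmul(P, Q):
    """Plain-Python matrix product of nested lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

WH = matmul(W, H)

# Exact first component: magnitude of the signed combination in each bin.
A1 = [[abs((1 - g * GAMMA) * w1 + (ALPHA - g) * w2) for g in gains]
      for w1, w2 in W]

# Largest overestimate of W H relative to A1, per bin.
max_gap = [max(wh - a for wh, a in zip(row_wh, row_a))
           for row_wh, row_a in zip(WH, A1)]
print(max_gap)
```

In this example the product matches the azimugram in the two bins occupied by a single source and overestimates it in the shared bin, in line with the triangle inequality.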

Reconstruction
In the case of an arbitrary number of sources, $R$, the azimugram $A \in \mathbb{R}_+^{K \times M}$ is approximated by the product of the source magnitude TF spectra in the $\tau$-th time window and the azimuth trajectories of those sources. The azimuth trajectories matrix is pre-computed. It is formed by adding a row to the matrix $H \in \mathbb{R}_+^{R \times M}$ for each source, given estimates of the azimuths of each source, $a = \{a_1, a_2, \ldots, a_R\}$. These azimuths are either selected by the user (cf. [5]) or estimated using a relative attenuation estimator (cf. [7]). If the $r$-th source dominates on one channel the trajectory is
$$h_r(g_m) = |1 - g_m a_r|. \tag{22}$$
If the $r$-th source dominates on the other channel the trajectory is
$$h_r(g_m) = |a_r - g_m|. \tag{23}$$
The azimuth trajectories matrix, given the azimuth estimates $a$, is
$$H = \begin{bmatrix} h_1(g_1) & \cdots & h_1(g_M) \\ \vdots & & \vdots \\ h_R(g_1) & \cdots & h_R(g_M) \end{bmatrix}. \tag{24}$$

The recovery of the magnitude spectra of the source signals is achieved by solving the non-negative quadratic programme
$$\min_{W \ge 0} \; \|A - WH\|_F^2, \tag{25}$$
where $W \in \mathbb{R}_+^{K \times R}$, for the azimugram computed for the $\tau$-th window of the analyzed mixtures $X_1[k, \tau]$ and $X_2[k, \tau]$. We use the update proposed by Lee and Seung in [17], i.e., $W \leftarrow W \odot (AH^T) \oslash (WHH^T)$, where $\odot$ and $\oslash$ denote element-wise multiplication and division. The resulting $W$ factor is approximately equal to the source magnitude TF spectra:
$$W \approx \left[\,|S_1[k, \tau]|,\; |S_2[k, \tau]|,\; \ldots,\; |S_R[k, \tau]|\,\right]. \tag{26}$$
The reconstruction of the discrete-time $r$-th source is achieved by forming an estimate of its entire magnitude TF spectrum, by taking the $r$-th column from each estimate of $W$ and forming
$$|\hat{S}_r[k, \tau]| = W_\tau[k, r], \tag{27}$$
where $W_\tau$ denotes the estimate of $W$ for the $\tau$-th window. We use the mixture phase to reconstruct the discrete-time source. Algorithm 1 summarizes the workflow of the Redress algorithm.

Figure 5 illustrates the time-frequency magnitude spectra of both sources learned using Redress. Both sources are now assigned a frequency component in the $f_3 = 300$ Hz bin. In comparison, Adress is unable to assign energy in the $f_3 = 300$ Hz bin to both source signals. Figure 6 illustrates the separated azimugrams that result from the Redress approach. In summary, the factorization of the azimugram learned by Redress allows both of the component azimugrams to have peaks at the correct gain for the 300 Hz component.
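The multiplicative update with a fixed trajectory matrix can be sketched on the three-bin example. This is a hedged illustration, not the authors' MATLAB implementation. It assumes the azimugram concatenates both halves, $|X_1 - gX_2|$ and $|X_2 - gX_1|$, so that each row of $H$ concatenates one source's trajectory over both halves. It also assumes unit source magnitudes, an all-ones initialization, and a hypothetical gain resolution `BETA = 100`; the small constant `eps` guards against division by zero and is likewise an assumption.

```python
ALPHA, GAMMA, BETA = 0.35, 0.40, 100
gains = [m / BETA for m in range(BETA + 1)]

def matmul(P, Q):
    """Plain-Python matrix product of nested lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def squared_error(A, W, H):
    """Squared Frobenius norm of A - W H."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, matmul(W, H))
               for a, b in zip(row_a, row_b))

def solve_w(A, H, iterations=100, eps=1e-12):
    """Multiplicative update W <- W * (A H^T) / (W H H^T) with H held fixed."""
    Ht = [list(col) for col in zip(*H)]
    AHt = matmul(A, Ht)                  # K x R, constant across iterations
    HHt = matmul(H, Ht)                  # R x R, constant across iterations
    W = [[1.0] * len(H) for _ in A]      # strictly positive starting point
    errors = [squared_error(A, W, H)]
    for _ in range(iterations):
        WHHt = matmul(W, HHt)
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, AHt, WHHt)]
        errors.append(squared_error(A, W, H))
    return W, errors

# Each row of H: one source's trajectory over the first and second halves.
H = [[abs(1 - g * GAMMA) for g in gains] + [abs(GAMMA - g) for g in gains],
     [abs(ALPHA - g) for g in gains] + [abs(1 - g * ALPHA) for g in gains]]

# Toy azimugram: bins k1, k2, k3 hold (s1 only), (s2 only), (both, in phase).
amplitudes = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
A = [[abs((1 - g * GAMMA) * s1 + (ALPHA - g) * s2) for g in gains]
     + [abs((GAMMA - g) * s1 + (1 - g * ALPHA) * s2) for g in gains]
     for s1, s2 in amplitudes]

W, errors = solve_w(A, H)
for name, row in zip(("k1", "k2", "k3"), W):
    print(name, [round(w, 3) for w in row])
```

In this toy case the bins occupied by one source converge towards that source alone, while the shared bin receives a non-zero weight for both sources. The weights in the shared bin need not equal the true magnitudes, since the factorization is only approximate there.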

Algorithm 1: Summary of the Redress Algorithm
Result: Separated sources $s_1, s_2, \ldots$
Compute the STFT of both channels, $x_1$ and $x_2$;
Compute the frequency-azimuth plane by constructing $A_1$ and $A_2$ (Equations (8) and (9));
Form the azimuth trajectory matrix, $H$, using Equations (22)-(24);
Recover the sources by solving the NQP in Equation (25);
Extract the sources from the columns of the source magnitude TF spectra matrix, $W$, in Equation (26);
Reconstruct the source magnitude spectra using Equation (27);
Use the mixture phase to reconstruct the discrete-time sources;
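The final step of the algorithm, re-using the mixture phase, can be sketched for a single frame. The text does not state which channel supplies the phase, so the sketch simply takes the bins of one mixture channel as input; the function name `apply_mixture_phase` is hypothetical, and the inverse STFT that would follow is not shown.

```python
import cmath

def apply_mixture_phase(magnitudes, mixture_bins):
    """Pair each estimated magnitude with the phase of the matching mixture bin."""
    return [cmath.rect(mag, cmath.phase(x))
            for mag, x in zip(magnitudes, mixture_bins)]

# One frame with two bins: estimated source magnitudes and mixture STFT values.
estimated = apply_mixture_phase([10.0, 0.5], [3 + 4j, -2j])
print(estimated)
```

Each output bin keeps the estimated magnitude and inherits the angle of the mixture bin, so no phase has to be estimated for the individual sources.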

Evaluation
We generated stereo pan-mixed mixtures using up to four of the original stems (bass, drums, other instruments, and vocals) provided in the Demixing Secrets Dataset (DSD) [20] in order to evaluate the performance of Redress. According to [3], the DSD dataset has gained traction as an evaluation dataset for source separation problems. Evaluation was carried out using an FFT size of 4096 samples, with a hop size of 2048 samples, and a sampling frequency of 44.1 kHz in order to be consistent with [5]. The analysis window used was a Hamming window. We ran the Redress algorithm for 100 iterations. All of the mixtures were created by down-mixing any stereo stems to mono and then remixing them with two, three, or four different azimuth positions, depending on the number of sources in the mixture. The source-count parameter, $R$, of Redress was set to the number of sources in the mixture. For both Adress and Redress, we assumed that the azimuths $a$ were known. Both Adress and Redress were implemented in Matlab 2018b (9.5.0.944444 (R2018b), Mathworks, Natick, MA, USA).
The purpose of this evaluation was to test the hypothesis that Redress could improve the source estimates achieved by a binary masking approach when the sources overlapped in the TF domain. We increased the number of source signals present in the mixtures from two to four in order to vary the level of overlap in the TF domain. Four was the maximum number of sources available for each track in the DSD dataset. The baseline method used in our comparison was the Adress algorithm, which uses a binary masking approach for separation.

Separation Example: Redress
We present a separation example using a four-source mixture and the Redress algorithm. Figure 7 illustrates a 10 s excerpt of the original four source components of Patrick Talbot's "Set Me Free" from the DSD dataset. Two components are panned left and the other two components are panned right. Figure 8 illustrates the estimates of these sources achieved by Redress. The separations are of high quality; there are very few visual differences between the original sources and the estimates of them. The bass, other instruments, and voice waveforms contain little or no evidence of the percussion/drums events. Similarly, the drums waveform contains little or none of the components of the harmonic instruments.

We illustrate the magnitude TF representations of the original sources and the estimates of these source signals in Figures 9 and 10. The drums have significant spectral energy from 0-15 kHz. The voice and instruments are compactly supported in the range 0-5 kHz. Both the other instruments and the voice have energy in the frequency range 5-15 kHz. This energy overlaps with much of the drums' energy. Figure 9 illustrates that the estimates of the sources (drums, voice, and other instruments) have energy in this range, which illustrates the benefit of Redress over a binary masking approach. The bass and other instruments have been assigned less of this energy. In a binary mask-based source separation approach, only one source can be assigned energy in a TF bin, which can cause the resulting magnitude TF spectra to have gaps if that energy is assigned to another source.
On listening to the recovered waveforms, we conclude that the quality is high. There are some audible artifacts where ideal separation is not achieved, which we now describe. Some traces of the higher frequencies of the drums can be heard on one other estimated source, the other instruments waveform. Some components of the voice waveform are present in the bass waveform. However, the drums waveform is very good; both the low and high frequency components are present. Similarly, the voice waveform is excellent in spite of the fact that some of the higher-frequency energy of the voice is assigned to the bass.

This experiment suggests that TF bins can be assigned to multiple source estimates, which is not possible using binary masking approaches. The Redress algorithm seeks to minimize the reconstruction error and, to do this, it may assign the energy in a TF bin to multiple sources. From the triangle inequality (Equation (20)), the factorization that we have proposed is not exact. It is exact when sources are disjoint in TF; it is approximately correct when sources are not disjoint, and the quality of the approximation depends on the magnitudes of the sources sharing the TF bin and their panning weights. We now investigate the approximation achieved by Redress and the binary masking approach used by Adress.

Tables 1-3 report the Signal-to-Noise Ratios (SNRs) of the source estimates achieved over a range of mixing scenarios. In the first case, two of the four sources are mixed and the other two sources are not included in the mixtures (cf. Table 1). In Table 1, the SNRs for the estimates of the two sources in the mixtures are indicated, and a dash indicates that the corresponding source was not present in the mixture. Similarly, in the second case, three of the four sources are included in the mixture (cf. Table 2), and in the final case all four sources are included (cf. Table 3).
In general, the SNRs of the estimated sources decrease when the number of sources increases, which can be attributed to the increased TF overlap of the sources when more sources are present. This first result demonstrates that Redress is affected by increased TF overlap. On average, Redress improves the SNR of the reconstructed sources by 3, 1, and 0.1 dB in the two-, three-, and four-source mixtures, respectively, in comparison with the sources recovered by Adress. Note that the results for the Adress algorithm are improved by artificial means in these experiments. When the sources are reconstructed by Adress, we have the choice of reconstructing from the left channel or the right channel. The choice of the left channel over the right channel has a significant bearing on the SNR of the result. To compare Redress with the best possible Adress result, we use knowledge of the original sources: we compute the SNR of each source estimated from each of the two channels and choose the maximum value. In the Redress case, we do not leverage knowledge about the true sources in order to improve the estimate achieved. Even though Adress is given this advantage, Redress outperforms Adress in terms of the SNR of the recovered sources.
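The oracle channel selection used for the Adress baseline can be sketched as follows. The text does not give its SNR formula, so the sketch assumes the common definition $10\log_{10}(\lVert s\rVert^2 / \lVert s - \hat{s}\rVert^2)$; the function names `snr_db` and `best_channel_snr` are hypothetical.

```python
import math

def snr_db(reference, estimate):
    """SNR in dB, assuming 10*log10(signal energy / error energy)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def best_channel_snr(reference, left_estimate, right_estimate):
    """Oracle choice: keep the better of the two per-channel reconstructions."""
    return max(snr_db(reference, left_estimate),
               snr_db(reference, right_estimate))

reference = [1.0, 0.0, -1.0, 0.0]
left = [0.9, 0.0, -0.9, 0.0]      # close to the reference
right = [0.5, 0.0, -0.5, 0.0]     # further from the reference
print(best_channel_snr(reference, left, right))
```

Because the selection needs the true source, it is available only in an evaluation setting, which is why it counts as an advantage granted to the baseline.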

Overlapping Sources in TF
In these experiments, the Adress algorithm was parameterized with 201 equally spaced azimuth positions and a target azimuth range of 20 azimuth positions. These parameters were used in the evaluation of Adress in [8]. Figure 11 displays boxplots of the measurements obtained for each mixture class, i.e., 2, 3, or 4 sources, using the MATLAB toolbox BSS EVAL [21]. The measurements considered included (1) the Source-to-Distortion Ratio (SDR), which we interpret as a global quality assessment; (2) the Source-to-Artifacts Ratio (SAR), which in this case is related to the level of musical noise introduced into the source estimates; and (3) the Source-to-Interference Ratio (SIR), which measures the interference from other sources in the estimated sources. In general, all of the measurements decreased as the number of sources increased, due to the increasing likelihood of sources overlapping in TF. Adress yielded higher SIRs than Redress. This is unsurprising, as Redress was developed to be able to assign contributions in TF bins to multiple sources, which has the consequence of decreasing its SIRs when compared to the binary masking approach, Adress. A disadvantage of binary masking approaches is that the aggressive assignment of each TF bin to one source only typically increases the level of distortion and artifacts introduced into the source estimates. Figure 11 illustrates that Redress generally yields higher SDRs and SARs than Adress. The SDR and SAR capture the overall sound quality of the separated source signals. Artifacts, mainly musical noise, are reduced by Redress's ability to assign TF bins to multiple sources. As a consequence, the TF spectra of the sources tend to have fewer isolated bins with energy. This is evident from the TF representations of the separated sources in Figure 9.

Figure 11. SDR, SIR, and SAR of Redress and Adress, grouped by number of sources in the mixture.

Discussion and Future Work
One of the main contributions of this paper was to reformulate the binary masking approach of Adress as a soft masking problem and then to solve this problem using a non-negative quadratic programme. The SAR, SNR, and SDR scores achieved by Redress were better than those of the Adress algorithm under the condition that the number of sources was known. In the experiments, not all sources were playing at the same time and with the same intensity. The challenge of a time-varying number of sources was also faced by the Adress algorithm. It would be interesting to consider how both algorithms could adaptively set the number of sources in order to improve their performance when the assumed number of sources is incorrect. In many cases, the Adress algorithm implementation allows the user to choose target azimuths for sources as a way to solve this problem. The Redress algorithm could also benefit from similar user supervision.
Regarding the SIR performance of both algorithms, the SIR performance was better for the original Adress algorithm than the proposed algorithm, Redress. This would imply that Adress provides better separation of the sources. Thus, Redress seems to make separation worse, but the overall sound quality better. It is important to consider the target application of Redress. The separated sources learned by Adress are typically re-mixed in order to reduce the effects of the musical noise introduced by the algorithm as a consequence of the binary masking approach it takes to source separation. In effect, Adress separates the sources and then remixes the sources in order to reduce the artifacts and distortions (measured using SARs and SDRs) that arise as a consequence of the uncompromising nature of binary masking.
One of the motivations for the interest in Adress was its low computational complexity and, thus, the possibility of implementing it in real time. In many cases, Adress is not used in real time, as the user selects an appropriate azimuth based on an inspection of a simplified form of the azimugram. In comparison, our implementation of Redress in the current paper uses 100 iterations of an NMF-style update, which was proposed by Lee and Seung, i.e., $W \leftarrow W \odot (AH^T) \oslash (WHH^T)$, in order to reconstruct the source signals. A first analysis of the computational complexity of the Redress approach can be summarized as follows. We only consider computations that are different to those computed by Adress. The matrix $H$ can be pre-computed by the system for all possible azimuths. Therefore, it can be stored as a look-up table, and so it does not pose a computational burden for a real-time implementation of the approach. Similarly, the matrix product, $HH^T$, can be computed off-line and stored in a look-up table for run-time. The computational cost of Redress involves two matrix products, $AH^T$ and $WHH^T$, and one element-wise matrix product and division. The complexity of these terms is set by the size of the FFT, the number of sources to detect, and the number of azimuth positions. We posit that this computational load should not be a barrier to implementing Redress in a manner that has the same responsive performance as Adress, even when this update is iterated for 100 steps.
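A rough per-frame operation count makes the dependence on these sizes concrete. The sketch below is a back-of-the-envelope estimate under stated assumptions. It counts scalar multiplications only and treats each element-wise division as one multiplication. It also assumes that $HH^T$ is pre-computed and that $AH^T$ is computed once per frame, since neither $A$ nor $H$ changes during the iterations. The function name and the example sizes are hypothetical: $K = 2049$ corresponds to keeping the non-negative frequency bins of a 4096-point FFT, and $M = 201$ borrows the number of azimuth positions used for Adress.

```python
def redress_multiply_count(K, M, R, iterations):
    """Approximate scalar multiplications per frame for the W update."""
    ah_t = K * M * R                       # A @ H^T, once per frame
    per_iteration = K * R * R + 2 * K * R  # W @ (H H^T), then multiply and divide
    return ah_t + iterations * per_iteration

print(redress_multiply_count(K=2049, M=201, R=4, iterations=100))
```

Under these assumptions the cost grows linearly in the number of frequency bins and azimuth positions, and the per-iteration term grows quadratically in the number of sources, which is small in the mixtures considered here.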

Conclusions
In this paper, we investigated whether it was possible to use pre-computed source azimuth trajectories as activation functions in a pan-pot de-mixing problem. We showed that, by doing so, the de-mixing problem could be reformulated as a non-negative quadratic program, which allowed the Redress source separation algorithm to assign energy to multiple sources that shared a TF bin. The results suggested that Redress decreased the artifacts that were introduced into the source estimates. This came at the cost of increasing the level of interference from other sources in those estimates. Redress has a number of advantages that we summarize now. The problem that binary masking had with regard to the $f_3$ frequency, the shared frequency bin in the motivating example in Section 2, has been addressed. Redress allocates some of the energy of the mixture at $f_3$ to both sources. The allocation is generally not the correct solution, but it is preferable to allocating all of the energy to one source and none to the other sources. With regard to the parameter selection that is required for Adress, we selected the value $H$, which is an azimuth range used to partition the frequency-azimuth plane, by testing a number of different settings for the best result. Redress only required that the number of sources, $R$, be set for it to out-perform Adress. No other testing to optimize parameters was required. In addition, with Redress there was no ambiguity about which channel should be used to de-mix the sources. When sources are reconstructed, the Adress algorithm can use either mixture to recover the source signals. In our experiments, we reconstructed the sources using both mixture magnitude TF spectra and then picked the one with the best SNR, SIR, SDR, or SAR, depending on the context. Redress estimated the source magnitude spectra using both mixtures, and so no choice is available or required.
Finally, Redress outperformed Adress in terms of the achieved SNR, SDR, and SAR for mixtures consisting of two to four sources.