Big Data Blind Separation

Data or signal separation is one of the critical areas of data analysis. In this work, the problem of non-negative data separation is considered. The problem can be briefly described as follows: given X ∈ R^{m×N}, find A ∈ R^{m×n} and S ∈ R_{+}^{n×N} such that X = AS. Specifically, the problem with sparse, locally dominant sources is addressed in this work. Although the problem is well studied in the literature, a test to validate the locally dominant assumption is not yet available. Moreover, the typical approaches available in the literature extract the elements of the mixing matrix sequentially. In this work, a mathematical modeling-based approach is presented that can simultaneously validate the assumption and separate the given mixture data. In addition, a correntropy-based measure is proposed to reduce the model size. The approach presented in this paper is suitable for big data separation. Numerical experiments are conducted to illustrate the performance and validity of the proposed approach.


Introduction
Transforming data into information is a key research direction of the current scientific age. During the data collection phase, it is often the case that the data cannot be collected from the actual data generating locations (sources). Typically, a nearby physically connected location (station) that is accessible can be used for the data collection. If the station is influenced by more than one source, then the data collected at the station provides mixed information from the multiple sources (mixture data). This creates a challenging problem of identifying the source data from the given mixture data. Such a problem is typically known as a data (or signal) separation problem.
In this paper, a linear mixing type data separation problem is considered. The generative model of the problem in its standard form can be written as:

X = AS, (1)

where X ∈ R^{m×N} denotes the given mixture matrix, A ∈ R^{m×n} is the unknown mixing matrix, and S ∈ R^{n×N} denotes the unknown source matrix. The problem can be further classified into overdetermined (m > n), underdetermined (m < n), and square or determined (m = n) cases. The overdetermined case can be transformed into the square case by using the Principal Component Analysis (PCA) method [1,2]. The underdetermined case often results in loss of information or redundancy in the representation. Usually, approximate recovery is performed based on prior probability assumptions. Typically, Gaussian or Laplacian priors are used to estimate the mixing matrix. The idea is to identify the directions of maximum density via clustering; these directions correspond to the columns of the mixing matrix [3,4]. Once the mixing matrix is estimated, the source matrix is obtained by solving a series of least squares problems. In this paper, the data separation problem for the m ≥ n cases will be considered.
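As a point of reference, the generative model X = AS can be simulated in a few lines. The sketch below is a hypothetical numpy illustration (the variable names are ours, not from the paper's code); it generates a mixture that satisfies the non-negativity and unit-norm assumptions discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 4, 3, 10                      # sensors, sources, samples

A = rng.random((m, n))                  # mixing matrix (unknown in practice)
A /= np.linalg.norm(A, axis=0)          # unit-norm columns (an assumption below)
S = rng.random((n, N))                  # non-negative sources

X = A @ S                               # observed mixture matrix, shape (m, N)
```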

The following basic assumptions are made throughout this work:

1. Every column of the source matrix is non-negative.
2. The source matrix has full row rank.
3. The mixing matrix has full column rank, and m ≥ n.
4. The rows of the source matrix and the columns of the mixing matrix have unit norm.
5. The source matrix is sparse.
The first assumption provides a mathematical advantage in designing the solution algorithms. Basically, the non-negativity assumption transforms the BSS problem into a convex programming problem [24][25][26]. In addition, non-negative source signals are very common in sound and image analysis. The next two assumptions ensure that the problem is recoverable (solvable). The fourth assumption is perhaps the limit of all BSS approaches, and is related to scalability and uniqueness (see [27]). The fifth assumption is the key sparsity assumption of the SCA-related approaches. Different scenarios of the SCA problem arise with different structures of the sparsity. The typical structures of sparsity discussed in the SCA literature can be classified as:

• Locally Dominant Case: In addition to the basic assumptions, for a given row r of S, there exists at least one unique column c such that:

s_{r,c} > 0 and s_{j,c} = 0 for all j ≠ r. (2)

• Locally Latent Case: In addition to the basic assumptions, for a given row r of S, there exist at least (n − 1) linearly independent and unique columns C_r such that:

s_{r,c} = 0 for all c ∈ C_r. (3)

• General Sparse Case: This is the default case, with no additional structure imposed on the sparsity of S.
The first case is one of the widely known cases in the SCA literature (see [28,29] for recent literature reviews). The second case is new to the SCA literature, and only a few recent papers address it [30]. The first two cases have identifiability conditions, which assure identification and recovery of the source signals. If the conditions of the first two cases do not apply to the given data, then the SCA problem belongs to the last, general case. Typically, the general case may not have perfect identification (apart from the scalability and uniqueness issues). For the general case, minimum volume-based approaches [26,31,32] and extreme direction-based approaches [33] have been proposed to approximately recover the source matrix. Some special cases of sparse structure, apart from the above cases, that allow recovery of the original source data have also been studied in the literature; for example, see [34].
In addition to the above, conditions on X that improve the separability of sources have been studied in the literature [35,36]. Methods that exploit spectral variability can be seen in [37]. Time series (frequency and transformation analysis) based methods to identify sparse sources have been presented in [4,38]. Further methods developed on the assumptions of SCA can be found in [39,40]. The prominent application areas of SCA include, but are not limited to, the following: Blind Hyperspectral Unmixing (BHU) [41], chemical analysis [42], Nuclear Magnetic Resonance (NMR) spectroscopy [43], etc. Figure 1 portrays various linear BSS methods available in the literature. The grayed areas in the figure represent the research areas that will not be considered further in this paper. Thus, the branches unrelated to the paper that emerge from the grayed areas are ignored in the figure. One of the critical gaps found in the SCA literature is the unavailability of a method to test the locally dominant assumption from the mixture data (X). In this paper, a novel mathematical programming and correntropy-based approach is presented for testing the locally dominant source assumption. The proposed approach also provides a solution for the locally dominant SCA problem.

Blind Signal Separation
Throughout this paper, the following notation styles are used. A capital letter boldface character, like B, indicates a matrix. A small letter boldface character with or without a subscript, like b_r, indicates the rth column vector of matrix B. A small letter boldface character with a special subscript, like b_{p•}, indicates the transpose of the pth row vector of matrix B. A non-bold small letter character, like b_{p,r}, represents the pth row, rth column element of matrix B.
The rest of the paper is organized as follows: Section 2 introduces the locally dominant case. Specifically, it displays the existing formulations from the literature, and presents the proposed novel formulation. A correntropy-based ranking method to eliminate the non-extreme data points is developed in Section 3. By incorporating the proposed model and the proposed ranking method, a tailored solution approach for the big data separation problem is developed in Section 4. A numerical study to assert the performance of the proposed approach is illustrated in Section 5. Finally, the paper is concluded with discussions in Section 6.

Locally Dominant Case
Consider the following determined or square (m = n) version of the SCA model:

X = AS. (4)

Each column x_i for i = 1, . . . , N of the mixture matrix (X) can be represented as follows:

x_i = Σ_{j=1}^{n} a_j s_{j,i}, (5)

where a_j is the jth column of the mixing matrix (A), and s_{j,i} is the jth row, ith column element of the source matrix (S). Equation (5) highlights that every column vector of X is a linear combination of the column vectors of A. Since the source matrix (S) is non-negative (i.e., s_{j,i} ≥ 0 for all i and j), the combination is a conic combination. Thus, the columns of X are spanned by the columns of A. In other words, the extreme column vectors of X are the columns of A. Therefore, the locally dominant case boils down to the identification of the extreme vectors of X [24]. If all the columns of X are non-negative, then normalizing every column of X with respect to Norm-1 makes the columns of X coplanar. That is, all the columns are contained in the following lower dimensional plane:

{x ∈ R^m : e^T x = 1}, (6)

where e is the vector of all ones. Now, the extreme points of X on the lower dimensional plane correspond to the columns of A. If some of the elements of X are negative, then the columns of X are projected onto a suitable lower dimensional plane. There are many approaches in the literature that are designed to work on this lower dimensional plane (affine hull) [25,44]. The advantage of working on this plane is that the extreme vector columns of X form the vertices of a lower dimensional simplex. Thus, identifying the extreme points results in the identification of the mixing matrix. Next, a few well known mathematical formulations and solution approaches for SCA from the literature are presented.
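The coplanarity argument can be checked numerically: after Norm-1 normalization, every non-negative mixture column satisfies e^T x = 1. Below is a minimal sketch under the same non-negativity assumptions (toy data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 3))                  # non-negative mixing matrix
S = rng.random((3, 200))                # non-negative sources
X = A @ S                               # all columns of X are non-negative

X1 = X / X.sum(axis=0)                  # Norm-1 normalization of each column
# every normalized column now lies on the plane e^T x = 1
```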

Conventional Formulations
One of the earliest mathematical formulations that identifies the extreme vectors of X was proposed in [24]. The idea is to pick one column of X, say x_c, and check the possibility of it being an extreme vector. The formulation corresponding to x_c is given as follows:

min. : || x_c − Σ_{i≠c} α_i x_i ||_2^2 (7)
s.t. : α_i ≥ 0, i = 1, . . . , N, i ≠ c, (8)

where α_i ≥ 0, ∈ R is the variable that corresponds to the weight of x_i ∈ X for i = 1, . . . , N.
The key idea exploited in the formulation is that the extreme vectors cannot be represented by a non-negative weighted combination of the other data vectors. The above formulation is a least squares minimization problem. In the worst case, the formulation has to be executed N times, i.e., once for each x_c, c = 1, . . . , N. In the best case, the formulation has to be executed n times. A nonzero value of the objective function indicates that the vector x_c is an extreme vector of X.
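This sequential test can be prototyped with a non-negative least squares solver. The following is a hedged sketch using scipy.optimize.nnls; the helper `is_extreme` and the toy data are our own constructions, not the authors' code:

```python
import numpy as np
from scipy.optimize import nnls

def is_extreme(X, c, tol=1e-8):
    """True if column c of X cannot be written as a non-negative
    combination of the remaining columns (nonzero residual)."""
    others = np.delete(X, c, axis=1)
    _, residual = nnls(others, X[:, c])   # min ||others @ alpha - x_c||, alpha >= 0
    return residual > tol

V = np.eye(3)                                            # three extreme vectors
W = V @ np.array([[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]])   # two interior points
X = np.hstack([V, W])
flags = [is_extreme(X, c) for c in range(X.shape[1])]
# flags → [True, True, True, False, False]
```

Only the three extreme columns produce a nonzero residual, matching the criterion stated above.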
Another approach, called Convex Analysis of Mixtures of Non-negative Sources (CAMNS), that works on the affine hull is presented in [25]. The solution approach of CAMNS involves two major steps. In the first step, the parameters C ∈ R^{N×(n−1)} and d ∈ R^N of the affine hull are estimated as follows:

d = (1/m) Σ_{j=1}^{m} x_{j•}, U = [x_{1•} − d, . . . , x_{m•} − d], (10)
C = [q_1(UU^T), . . . , q_{n−1}(UU^T)], (11)

where x_{j•} is the jth row of X, and q_ℓ(UU^T) denotes the eigenvector associated with the ℓth largest eigenvalue of UU^T. In Equation (10), vector d is subtracted from all the rows of X, and in Equation (11), matrix C contains the columns corresponding to the eigenvectors associated with the largest (n − 1) eigenvalues of UU^T. Basically, this first step is similar to the dimensionality reduction process executed in PCA. In the second step, the following mathematical model is repeatedly solved:

max/min : r^T (Cα + d) (12)
s.t. : c_{i•}^T α + d_i ≥ 0, i = 1, . . . , N, (13)

where r ∈ R^N is a generated vector, c_{i•} is the ith row of C, and α ∈ R^{n−1} is the unknown variable. The above formulation exploits the notion that the optimal solution of a linear program exists at the extreme points. For a given r, the formulation is solved twice: once as the maximization problem and once as the minimization problem. This is done in order to get one (or possibly two) new extreme point(s) in every iteration. The (n − 1) extreme points are identified by resolving the above formulation repeatedly with respect to different r vectors. The crux of this method is hidden in the generation of r and the convergence of the approach, which are the main concepts presented in [25]. Notice that the above two methodologies extract the columns of A sequentially. Likewise, the other typical approaches in the literature identify the columns of A sequentially [28,29,45]. Therefore, there is no mechanism to validate the locally dominant assumption from X. One of the main objectives of this paper is to build such a mechanism using a mathematical formulation-based approach. In the following subsection, a novel formulation is presented that can provide the above mechanism.

Envelope Formulation
In this section, a mathematical model that can simultaneously identify all the extreme vectors of X under the locally dominant assumption is proposed. To the best of our knowledge, an exact method that identifies all the extreme vectors in a single shot is unavailable.
Let X_2 be the data obtained after normalizing each column of X with respect to Norm-2. Let y_i ∈ X_2 be the ith column of X_2. Let a^T y = b be a plane that is inclined in such a way that all the columns of X_2 are contained in one halfspace of the hyperplane, and the origin is in the other halfspace. Such a hyperplane is referred to as a linear envelope in this work. The envelope can be written as:

a^T y = b, (14)

where a ∈ R^m corresponds to the normal vector of the envelope, and b ≥ 0 is a constant. Out of the infinitely many possible representations of the above envelope, an envelope with b = 1 is selected for further analysis. Now, the distance between any vector y_i and the envelope can be written as:

d(y_i) = |a^T y_i − b| / ||a||_2 (15)
       = (a^T y_i − 1) / ||a||_2. (16)

Equation (16) follows from Equation (14) with b = 1, since every column of X_2 satisfies a^T y_i ≥ 1. Ignoring the denominator (||a||_2) in Equation (16) results in a proportional or scaled distance:

p(y_i) = a^T y_i − 1. (17)

Among the infinitely many possible linear envelopes, the aim is to find the tightest or supporting envelope. A formulation that can identify the tightest envelope is given as follows:

min. : Σ_{i=1}^{N} p(y_i) (18)
s.t. : a^T y_i ≥ 1, i = 1, . . . , N, (19)

where a ∈ R^m is the unknown variable. The above formulation can be equivalently written as:

Formulation (20):
min. : μ^T a
s.t. : a^T y_i ≥ 1, i = 1, . . . , N,

where μ ∈ R^m is defined as μ = Σ_{i=1}^{N} y_i. It is assumed that the duplicate and/or all-zero columns of X_2 are removed before the execution of the above model. The aim of the above model is to find the envelope that has the minimum distance with respect to all the columns of X_2. The above formulation is linear, and needs to be executed only once to identify all the extreme vectors. That is, at the optimal solution, the data points (y_i's) corresponding to the active constraints in Equation (19) correspond to the extreme vectors of X_2. It can be seen that the above formulation is always feasible. Furthermore, due to the design of the constraint given in Equation (19), the problem is linear. That is, the design allows the usage of the proportional distance (Equation (17)) instead of the nonlinear actual distance (Equation (15)).
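The envelope formulation is an ordinary LP and can be prototyped with an off-the-shelf solver. Below is a hedged sketch using scipy.optimize.linprog on toy data whose extreme directions are the coordinate axes; all names are illustrative, and the appended identity columns guarantee that the extreme vectors are present in the data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
X = rng.random((3, 500))                    # non-negative mixtures of 3 sources
Y = X / np.linalg.norm(X, axis=0)           # X_2: Norm-2 normalized columns
Y = np.hstack([Y, np.eye(3)])               # ensure the extreme columns appear

mu = Y.sum(axis=1)                          # mu = sum of the columns y_i
res = linprog(c=mu,                         # min  mu^T a
              A_ub=-Y.T,                    # s.t. a^T y_i >= 1 for all i
              b_ub=-np.ones(Y.shape[1]),
              bounds=[(None, None)] * 3,    # a is a free variable
              method="highs")

active = np.isclose(Y.T @ res.x, 1.0, atol=1e-6)   # active constraints
# the active columns are exactly the three appended extreme vectors
```

A single solve flags all extreme vectors at once, in contrast to the sequential tests of the previous subsection.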
In addition to that, from LP theory, only m constraints will be active at the optimal solution. Let Θ be the matrix containing the columns of X corresponding to the m active constraints. Identifying the columns of A requires the following additional step: compute

q_i = Θ^{−1} x_i, i = 1, . . . , N. (21)

If q_i ≥ 0 for i = 1, . . . , N, then Θ corresponds to A. However, if any element of q_i is strictly less than 0 for any i = 1, . . . , N, then the locally dominant assumption is invalid for the given X. Thus, this serves as a test for the existence of the locally dominant assumption.
The above test works for typical data mixing without noise. However, image data is usually integer-valued, so rounding takes place at the source or mixture level. Therefore, the rounding effect needs to be incorporated into the above condition. A heuristic method is proposed in this paper to incorporate the rounding effect. Let ν = 0.5|Θ^{−1} e|, where e ∈ R^n is a vector of all ones. The notion is that rounding creates an error of magnitude strictly less than 0.5 in each pixel element. Therefore, the check for image data is as follows: If q_i + ν ≥ 0 for i = 1, . . . , N, then Θ corresponds to A.
Furthermore, the above idea can be extended to mixing scenarios containing noise. For instance, a level of tolerance can be used to analyze the noisy mixture data. Let ψ ∈ R^m, ψ ≥ 0, be a tolerance parameter selected by the user. Then, based on the earlier discussion, the check for noisy data is as follows: If q_i + ψ ≥ 0 for i = 1, . . . , N, then Θ corresponds to A. The precise value of ψ may not be available for a given scenario. Hence, the parameter is selected empirically based on trial experiments. In Section 5, an experiment that highlights the usage of the parameter ψ for noisy data is illustrated.
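The three variants of the test (ψ = 0, ψ = ν, user-specified ψ) share the same check, which could be sketched as follows. The helper name is hypothetical, and the small constant 1e-10 is only a floating-point slack of ours, not part of the paper's condition:

```python
import numpy as np

def locally_dominant_test(Theta, X, psi=0.0):
    """True if every q_i = Theta^{-1} x_i satisfies q_i + psi >= 0."""
    Q = np.linalg.solve(Theta, X)           # computes all q_i at once
    return bool((Q + psi >= -1e-10).all())

Theta = np.eye(3)
X_good = Theta @ np.random.default_rng(3).random((3, 50))  # valid mixture
X_bad = X_good.copy()
X_bad[0, 0] = -1.0                          # forces a negative weight
```

Here `locally_dominant_test(Theta, X_good)` passes, while the perturbed `X_bad` fails the check, signalling that the locally dominant assumption does not hold for that data.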

Point Correntropy
Correntropy is a generalized correlation measure based on the concept of entropy. It is typically used in detecting local similarity between two random variables. Roughly speaking, it is a kernelized version of the conventional correlation measure. The measure first appeared in [46,47], and its usage as a cost function was illustrated in [48][49][50][51][52]. The optimization properties of the cost function are presented in [53]. The correntropy cost function (or the correntropic loss) for N errors is defined as:

L(ε) = β [ 1 − (1/N) Σ_{i=1}^{N} k(ε_i) ], (22)

where β > 0 is a scaling parameter, ε ∈ R^N is an array of errors, and k(·) is the transformation kernel function with parameter σ. In this work, a Gaussian kernel is selected, i.e.,

k(ε_i) = exp( −ε_i^2 / (2σ^2) ). (23)

The loss in Equation (22) is readily separable with respect to the sample errors, and can be rewritten as:

L(ε) = (1/N) Σ_{i=1}^{N} l(ε_i), with l(ε_i) = β [ 1 − k(ε_i) ], (24)

where l(ε_i) will be referred to as the point estimate of the correntropic loss. Let ε_i be the error corresponding to the ith column vector of X. For a given kernel parameter σ, the point estimate provides similarity information of the ith vector with respect to the other data vectors of X. Based on the geometry of the vectors, the extreme vectors' similarity with respect to the central vectors is typically less than that of the other, non-extreme vectors of X. Thus, the point estimate of the correntropic loss function can be used as a measure to differentiate extreme and non-extreme vectors of X.
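For vector-valued errors ε_i = y_i − c, the point estimate with a Gaussian kernel can be computed in one vectorized pass. The sketch below drops the scaling parameter β, since it does not affect the ranking of the points; the data and names are illustrative:

```python
import numpy as np

def point_correntropy(Y, c, sigma):
    """Point estimate of the correntropic loss for each column of Y."""
    sq = ((Y - c[:, None]) ** 2).sum(axis=0)     # ||eps_i||^2 for every column
    return 1.0 - np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(4)
Y = rng.random((3, 100))                         # toy data columns
c = Y.mean(axis=1)                               # a central point
loss = point_correntropy(Y, c, sigma=0.5)
keep = loss >= np.percentile(loss, 50)           # 50-percentile criterion
```

Columns far from the center (the candidate extreme vectors) receive a larger loss and therefore survive the 50-percentile cut.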

Solution Methodology
Our goal is to develop a geometric separation method for the non-negative data mixing problem that can be applied to "Big Data" scenarios. The concepts developed in Sections 2 and 3 are tailored with respect to big data, and the following solution approach is proposed. The summary of the proposed approach is illustrated in Figure 2 and Algorithm 1.

Algorithm 1:
Data: Given X ∈ R^{m×N}.
Result: Find A ∈ R^{m×n} and S ∈ R_{+}^{n×N} such that X = AS.
1. X_2 = normalize(X); remove all zero columns and duplicate columns from X_2, so that X_2 ∈ R^{m×Ñ}; let y_i be the ith column of X_2.
2. Estimate σ from the sample S.
3. Obtain X_R by removing columns of X_2 according to the 50 percentile point correntropy criterion; let X_E = X_2 \ X_R.
4. a = solution of LP Formulation (20) with respect to data X_R.
5. While a^T y_i < 1 for some i = 1, . . . , Ñ: move w to X_R, where w ∈ X_E is the column with maximum infeasibility with respect to the LP constraints, and recompute a as the solution of LP Formulation (20) with respect to the updated X_R.
6. Let Θ be the matrix containing the columns of X corresponding to the active constraints at the optimal solution of Formulation (20).
7. Calculate q_i = Θ^{−1} x_i for i = 1, . . . , N.
8. Set ψ equal to 0 for non-noisy non-image data mixing, equal to ν for non-noisy image data mixing, or equal to the user-specified value for noisy mixing.
9. If q_i + ψ ≥ 0 for i = 1, . . . , N, then A = Θ; otherwise, declare the locally dominant assumption invalid and stop.

Data Ranking: As seen earlier, the extreme vectors of X_2 contain all the relevant information that is needed for separation (i.e., identifying A and S). The other data vectors are redundant in identifying the mixing matrix. Thus, the point estimate of the correntropic loss can be evaluated at all the data points with respect to the central columns of X_2. The data points that have a low value of the correntropic loss can be removed from the data set X_2 without losing any information. The major issues in implementing the above idea are as follows: how to select the right value for σ, and how to define the error ε_i corresponding to y_i for i = 1, . . . , N.
The value of σ represents the kernel width, and should be large enough to contain the central vectors. However, it should be small enough to exclude the extreme vectors. In the following, we propose a practical method to estimate the value of σ. Let S be a sample of column indices randomly selected from {1, . . . , N}, and let δ_j = max_{i1,i2 ∈ S} {y_{j,i1} − y_{j,i2}}, where y_{j,i1} is the jth element of y_{i1}. Based on trial experiments, we found that σ = Σ_{j=1}^{n} δ_j / (√2 n) is a good choice for the kernel width. Furthermore, the value of c should correspond to the center of the columns of X_2. An approximate estimate for c is c = (g + h)/2, where g_j = max_{i∈S} {y_{j,i}} and h_j = min_{i∈S} {y_{j,i}}. Thus, a simple (and practical for big data) estimate of the error for y_i is ε_i = y_i − c for i = 1, . . . , N. Based on the trial experiments, it can be concluded that the larger the size of S, the better the estimation. Furthermore, the strategy to eliminate the columns is as follows: all the columns of X_2 that have a point estimate value lower than the 50th percentile are removed from further consideration.

Handling a Large Number of Constraints: From Formulation (20), it can be seen that big data corresponds to a large number of constraints. However, only m constraints are active, and the rest of the constraints are redundant (i.e., they will never be active). The proposed data ranking method eliminates a good amount of the redundant constraints, depending upon the distribution of columns in the data set. Let X_R ⊆ X_2 be the data matrix obtained after eliminating possible central vectors, and let X_E ⊂ X_2 be the eliminated central vectors. If the columns of X_2 are reorganized, then there exists a partition such that X_2 = [X_R | X_E]. Once the tightest envelope is obtained for X_R by solving Formulation (20), the envelope is validated with respect to X_E. If all the columns of X_E fall in the same halfspace (i.e., Equation (14) is feasible with respect to X_2), then it is guaranteed that none of the extreme vectors of X_2 were eliminated. However, if any infeasibility is detected, then the column of X_E with maximum infeasibility is added to the LP, and the LP is resolved.
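The ranking and elimination steps above could be prototyped as follows. This is a sketch under stated assumptions: the normalization σ = Σ_j δ_j / (√2 n) is our reading of the kernel-width estimate, and the data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 3, 1000
Y = rng.random((n, N))
Y /= np.linalg.norm(Y, axis=0)              # X_2: Norm-2 normalized columns

sample = rng.choice(N, size=200, replace=False)   # index sample S
g = Y[:, sample].max(axis=1)                      # row-wise max over the sample
h = Y[:, sample].min(axis=1)                      # row-wise min over the sample
sigma = (g - h).sum() / (np.sqrt(2.0) * n)        # kernel-width estimate
c = (g + h) / 2.0                                 # approximate center

sq = ((Y - c[:, None]) ** 2).sum(axis=0)          # ||eps_i||^2 = ||y_i - c||^2
loss = 1.0 - np.exp(-sq / (2.0 * sigma ** 2))     # point correntropic loss
keep = loss >= np.percentile(loss, 50)            # 50-percentile criterion
X_R, X_E = Y[:, keep], Y[:, ~keep]                # retained vs eliminated columns
```

X_R then feeds Formulation (20); any kept-out column of X_E that violates a^T y_i ≥ 1 at the solution is moved back into X_R and the LP is resolved.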
Performance Index: In order to evaluate the performance of the proposed approach, distance-based metrics will be used. Specifically, two metrics (one for the mixing matrix, and the other for the source matrix) will be used in this work. The following error measure is for the mixing matrix:

e_A = (1/n) Σ_{j=1}^{n} || a_j − a_{[j]} ||_2, (26)

where a_j is the jth column of the original mixing matrix A, and a_{[j]} is the column corresponding to a_j, obtained from the recovered mixing matrix. The corresponding columns are identified by the Hungarian algorithm [54]. The source matrix is obtained as follows:

s_i = arg min_{s ≥ 0} || x_i − A s ||_2^2, i = 1, . . . , N,

where A is the recovered mixing matrix. The above approach can be replaced by S = A^{−1}X whenever A^{−1} exists. Similar to e_A, the following error measure is for the source matrix:

e_S = (1/n) Σ_{j=1}^{n} || s_{j•} − s_{[j•]} ||_2, (27)

where s_{j•} is the jth row of the original source matrix S, and s_{[j•]} is the row corresponding to s_{j•}, obtained from the recovered source matrix.
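The column matching inside e_A can be carried out with scipy's implementation of the Hungarian algorithm. The sketch below is a hedged illustration (the 1/n averaging is our reading of the metric, and the helper name is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mixing_error(A_true, A_rec):
    """e_A: mean distance between columns of A_true and the best-matched
    columns of the recovered matrix A_rec (Hungarian matching)."""
    cost = np.linalg.norm(A_true[:, :, None] - A_rec[:, None, :], axis=0)
    rows, cols = linear_sum_assignment(cost)       # optimal column pairing
    return cost[rows, cols].sum() / A_true.shape[1]

A = np.eye(3)
A_perm = A[:, [2, 0, 1]]       # same columns in a different order
# mixing_error(A, A_perm) → 0.0 (matching undoes the permutation)
```

The matching step makes the metric invariant to the column permutation ambiguity that is inherent to BSS.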

Numerical Experiments
In order to illustrate the performance of the proposed approach, numerical experiments are presented. The experiments are divided into four groups. In the first group of experiments, simulated non-noisy data is used to test the performance and sensitivity of the proposed approach. In the second group, image data mixtures are used to test the applicability of the proposed approach on real image data. The third and fourth groups of experiments compare the proposed approach with well known SCA methods from the literature. In all the instances of this section, the following specifications were used: S = {1, . . . , N}; all the random mixing matrices contain columns with unit norm; the LP resolving step is skipped in order to identify the number of instances in which the extreme vectors were eliminated. The LPs were solved via the dual simplex method, using the state-of-the-art CPLEX 12.0 solver [55]. All the instances were solved on an Intel Xeon 2.4 GHz workstation, with 16 logical processors and 32 GB of RAM.

Simulated Data Separation
Setup: Given n and N, a random source matrix S that satisfies the locally dominant assumption is generated. For the generated S matrix, 100 random mixing matrices A are generated. Then, using the X = AS equation, 100 random mixture matrices are generated. On each mixture matrix, the proposed approach is implemented. This study is executed for all the following combinations of n = 5, 7, 9 and 11, and N = 1000, 5000, 10,000, 50,000, 100,000, 500,000 and 1,000,000. In addition to that, this study is also executed for n = 10, 20, 40, 60, 80 and 100 when N = 1,000,000. The value of ψ is set to zero in this experiment.
Results: Using one mixture matrix as input, and using the proposed approach, matrices A and S are recovered. This experiment is repeated 100 times for a given combination of n and N. The performance of the proposed approach is displayed in Table 1. The column corresponding to mErrA (vErrA) indicates the mean (variance) of error e_A over the 100 instances. Similarly, columns mErrS and vErrS correspond to the mean and variance of error e_S, respectively. The column corresponding to mTime (vTime) indicates the mean (variance) of the solution time per instance in seconds (milliseconds) over the 100 instances. In addition, the column corresponding to mRed (vRed) indicates the mean (variance) of the percentage of columns eliminated over the 100 instances. Finally, the column corresponding to nMiss indicates the number of times the extreme vectors were eliminated based on the 50 percentile criterion. Since the mixtures are clean (i.e., no noise is added to the mixture data), the recovery is perfect. This can be seen from the very low average errors (mErrA and mErrS) over the 100 iterations. Furthermore, the method is consistent in the recovery of the matrices, as justified by the low variance in the errors (vErrA and vErrS). The suitability and applicability of the proposed approach to big data can be seen from the solution time. For instance, Figures 3 and 4 illustrate the average time in seconds required to solve one instance of the proposed approach for N data points and n data sources. The behavior of the solution time with respect to log_10(N) is exponential; in other words, the solution time increases linearly with respect to N. Furthermore, from Figure 4, it can be observed that the solution time is linear with respect to n. Thus, the algorithm is suitable for big data scenarios. Moreover, the 50 percentile criterion removes exactly 50% of the data points in all the cases, with zero variance.
This is due to the fact that the source matrices are uniformly randomly generated. Due to the uniform generation of the source matrices, none of the extreme vectors were eliminated.

Image Mixture Separation
Setup: In the following experiments, image data available from the literature and online repositories are considered (see Table 2). Each source image of an image set is reshaped into one row vector. Then, the reshaped images are stacked row-wise to generate the S matrix. The source matrices are pre-processed in order to satisfy the locally dominant assumption. Next, for each source matrix S, 100 random A matrices are generated, and correspondingly 100 random X matrices are analyzed using the proposed approach. Table 2 summarizes the details of the image sets that are considered in this subsection. The first column conveys the name of the image set that is being considered. The column corresponding to n indicates the total number of sources, and the column corresponding to N indicates the total number of data points (or column vectors) in X. The value of ψ is set to ν in this experiment. Results: Table 3 displays the results after executing the proposed approach on the 100 mixture instances of each image set. The columns follow notation similar to the earlier experiment, except that the vTime column units are in seconds. Moreover, Figures 5-9 depict the results. Low mean errors (e_A and e_S) over the 100 runs are obtained for all the image sets. This shows that the method precisely recovers the A and S matrices. A low value in the corresponding variance column indicates the high level of consistency of the proposed approach. The solution time, specifically for the fingerprint data set, indicates the applicability of the proposed approach for big data with complex image mixing scenarios. Based on the results, it can be seen that the 50 percentile criterion eliminates a good amount (more than 50%) of the redundant columns. However, in some instances (at most 7 percent in one instance), the criterion eliminated some of the extreme vectors.

Setup:
Given n and N, a random source matrix S that does not satisfy the locally dominant assumption is generated in this experiment. For the generated S matrix, 100 random mixing matrices A are generated. Then, using the X = AS equation, 100 random mixture matrices are generated. This study is executed for n = 5, 7, 9, 11, 13 and 15, and for N = 10,000. The value of ψ is set to zero in this experiment. Well known methods from the SCA literature are compared with the proposed approach. The methods used for the comparison are N-FINDR [44], VCA [45], and MVSA [56]. The objective of this experiment is to highlight that the above three (like the other typical algorithms in the literature) do not have the capability to test the locally dominant assumption from the knowledge of X. To the best of our knowledge, only an exhaustive search similar to the one presented in [24] can perform such a test. However, the proposed approach does not require such an exhaustive search.
Results: Table 4 displays the results after executing the proposed and selected approaches on the 100 randomly generated mixture instances. The column corresponding to mErrA (vErrA) indicates the mean (variance) of error e A over the 100 instances for the three methods used from the literature. The lines in the column corresponding to ErrA indicate that the proposed approach was unable to identify any mixing matrix. The reason for the lines is the non-existence of the locally dominant assumption in the source data. This information is captured in the column corresponding to TnMiss. The numbers in the TnMiss column indicate the total number of times the proposed approach exited with the "no locally dominant sources" token. Based on the results, it can be seen that the other algorithms try to find the best match for the columns of A. However, they are unable to validate the locally dominant assumption. This is due to the fact that no such test is available in the literature. However, in all the scenarios, the proposed approach was able to conclude that the input data is not a mixture of sources that contain the locally dominant signals.

Setup:
In this experiment, a random source matrix S that satisfies the locally dominant assumption is generated. For the generated S matrix, 100 random mixing matrices A are generated. Then, using the X = AS equation, 100 random mixture matrices are generated. In each mixture matrix, 5% of the columns are randomly selected, and uniform noise between 0 and 0.01 is added to all the elements of the selected columns. This study is executed for n = 5, 7, 9, 11, 13 and 15, and for N = 10,000. Well known methods from the SCA literature are compared with the proposed approach. The methods used for the comparison are N-FINDR [44], VCA [45], and MVSA [56]. The objective of this experiment is to comparatively assess the performance of the proposed and selected methods on noisy data. In this experiment, the value of ψ is defined as follows: ψ = ρ|Θ^{−1} e|, where e ∈ R^n is a vector of all ones, and ρ takes the values 0, 0.2, 0.4, . . . , 1.
Results: Table 5 displays the results after executing the proposed and selected approaches on the 100 randomly generated noisy mixture instances. The columns corresponding to VCA, MVSA and N-FINDR present the average error e A over the 100 instances. The proposed approach is executed 100 times for each value of ρ, and the average error e A for each value of ρ is archived. The column corresponding to the proposed approach presents the best of the average errors e A over the values of ρ. Table 6 ( Figure 10) indicates the total number of times the method fails (succeeds) to identify the mixing matrix, for various values of ρ. Based on the low value of error in the proposed column, and the trends depicted in Figure 10, it can be seen that the proposed approach recovers A and S in the majority of the noisy instances for higher values of ρ. Moreover, as n increases, the complexity of mixing increases, and thus the proposed approach requires a higher value of ρ for the recovery of A and S.

Discussion and Conclusions
The SCA approaches are relatively new to the BSS problem when compared to the ICA approaches. The main critique that often appears with respect to locally dominant SCA approaches is the validity of the locally dominant criterion. In this paper, a mathematical modeling-based approach is proposed that can validate the existence of the locally dominant criterion from the given mixture matrix. That is, the formulation can be used not only to identify the mixing matrix, but also to validate the assumption presented in Equation (2). Although the approach is proposed for the determined case, it can also be applied to the overdetermined cases. The number of columns of the matrix X is proportional to the number of constraints in the proposed LP. Thus, big data often leads to LPs with many redundant constraints. We propose the usage of interior point methods when the total number of constraints is very high [57]. Moreover, LP decomposition-based approaches for SCA can also be developed to improve the solution time [58]. In addition, LP presolve theory [59] can be adapted to eliminate the redundant constraints in the proposed SCA approach. Roughly speaking, the point correntropic ranking method may be seen as a novel probabilistic approach for removing redundant LP constraints. The proposed method of estimating the point correntropy is computationally cheap, and can be applied to big data scenarios. From the simulated data study, it can be seen that if the input data is uniformly distributed, then the 50 percentile criterion can be raised to a higher value. However, from the image data sets, it is clear that real world data is rarely uniformly distributed. Thus, the 50 percentile criterion is a good estimate to avoid the LP resolving. From the comparative experiments, it can be concluded that the proposed approach validates the locally dominant assumption in both non-noisy and noisy mixing scenarios.
To summarize, the proposed approach provides new insights into the BSS problem.