Article

Informed Weighted Non-Negative Matrix Factorization Using αβ-Divergence Applied to Source Apportionment

1  Laboratoire LISIC–EA 4491, Université du Littoral Côte d’Opale, F-62228 Calais, France
2  Laboratoire UCEIV–EA 4492, Université du Littoral Côte d’Opale, SFR CONDORCET FR CNRS 3417, F-59140 Dunkerque, France
*  Author to whom correspondence should be addressed.
Current address: Savencia Group, F-78220 Viroflay, France.
Entropy 2019, 21(3), 253; https://doi.org/10.3390/e21030253
Submission received: 19 January 2019 / Revised: 21 February 2019 / Accepted: 26 February 2019 / Published: 6 March 2019
(This article belongs to the Special Issue Information Theory Applications in Signal Processing)

Abstract: In this paper, we propose informed weighted non-negative matrix factorization (NMF) methods using an αβ-divergence cost function. The available information comes from the exact knowledge/boundedness of some components of the factorization—which are used to structure the NMF parameterization—together with the row sum-to-one property of one matrix factor. In this contribution, we extend our previous work, which partly involved some of these aspects, to αβ-divergence cost functions. We derive new update rules which extend the previous ones and take into account the available information. Experiments conducted for several operating conditions on realistic simulated mixtures of particulate matter sources show the relevance of these approaches. Results from a real dataset campaign are also presented and validated with expert knowledge.

1. Introduction

Source apportionment consists of estimating the particulate matter (PM) sources present in the ambient air together with their relative concentrations. A source is fully characterized by a profile which gathers the m chemical species’ proportions (expressed in ng/μg) that constitute it. Usually, several, say n, PM samples are collected using an automated sampler, and then characterized to assess their chemical composition. Each of them can be written as a mixture of p profiles, with different concentrations (expressed in ng/m³). Mathematically, if we denote by X, G, and F the non-negative n × m data matrix, n × p contribution matrix, and p × m profile matrix, respectively, the collected data read
$$X \approx G \cdot F. \qquad (1)$$
While being known under the name of (blind) source separation in the signal/image processing community, Equation (1) is called the receptor model in the chemistry community. In practice, the latter should satisfy the following properties [1]:
  • The entries of G and F are non-negative (one cannot assume a negative mass in G nor a negative proportion of chemical species in F).
  • The product G · F must fit the data matrix X.
  • When one entry of the product (G·F)_{ij} does not fit the entry x_{ij}, we should then check that
    $$x_{ij} \geq (G \cdot F)_{ij}, \qquad (2)$$
    i.e., the estimated mass of a chemical species in a sample should not be above the corresponding measured one.
As a consequence, estimating the unknown matrices G and F is mainly performed using positive matrix factorization (PMF) [2] and, in particular, using its popular version from the US Environmental Protection Agency.
Independently from the PMF investigations done by the chemistry community, Equation (1) has been massively considered by the signal/image processing and the machine learning communities which processed it with non-negative matrix factorization (NMF) techniques [3].
The general idea behind NMF is to minimize a discrepancy measure between X and the estimated product G · F . Such a problem has been extensively studied in the past years. Historically—apart from pioneering work [4]—most methods are based on an alternating optimization of the factor matrices. NMF has been massively investigated because of the more interpretable results it provides when compared with methods without sign constraints. NMF was successfully applied to many fields, e.g., hyperspectral unmixing [5,6], astrophysics [7,8], fluorescence spectroscopy for agro-food analysis [9], audio signals [10], or environmental data processing [11].
It should be noticed that NMF is flexible and can take into consideration additional assumptions to provide a better estimation of the NMF factors. In the literature, assumptions such as sparseness [12,13], fixed row and/or column sums [13,14], structure in the matrix factors [15,16], or orthogonality constraints [17] were investigated.
Solving Equation (1) can be performed by appropriately choosing a discrepancy measure between X and G·F. When this measure is the Frobenius norm of their difference, the possible presence of a few outliers may corrupt the NMF results. As a consequence, robust NMF methods were proposed to deal with a predefined number of outliers. While some of them decompose the data matrix into the sum of a low-rank and a sparse matrix—where the latter contains the outlying component [18]—most of them consider modified cost functions as dissimilarity measures, which gave rise to flexible and robust algorithms, e.g., Bregman-NMF [19], α-NMF [20], β-NMF [21,22], αβ-NMF [23], Correntropy-NMF [24], and Huber-NMF [25] (it should be noticed that the Huber cost function has also been considered for robust PMF [26]).
Lastly, it should be noticed that in receptor models, each data point x_{ij} is provided with an uncertainty measure σ_{ij} and PMF actually solves a weighted optimization problem [4,26]. Weighted extensions of NMF have also been considered, e.g., to enhance the factorization [27] or to deal with missing entries [28,29]. However, it is known that both the PMF [30] and the standard NMF techniques face some convergence issues (however, the convergence of NMF is guaranteed under some separability assumptions [3] which are not satisfied in practice in the considered application and which are thus out of the scope of this paper) [3].
As a consequence, we investigated the enhancement provided by informed NMF. In Ref. [31], the use of a Gaussian plume model enables us to assess the presence or absence of some punctual sources, depending on wind measurements and on source and sensor locations, which allowed us to fix some entries of G to zero. In the absence of a punctual source, such information should be dropped. In Ref. [32], an informed NMF-based weighted criterion takes into consideration the known values of some terms of F (informed NMF has also been proposed in [33], where the known entries are seen as a penalization term in the NMF optimization problem) in order to improve the separation. For that purpose, we introduced a specific parameterization for NMF methods using a Frobenius norm. This approach can be considered as a flexible NMF counterpart of [34], in between blind source separation—where no information on F is provided—and regression, where F is fully known. While it was shown in practice to be less sensitive than blind NMF to convergence issues, this method can still be affected by outliers, which are present in many receptor modeling problems.
In this paper, we thus extend our previous work [32] by (i) investigating and discussing several αβ-divergence expressions, (ii) exploring different data normalization procedures combined with set values (as profiles are chemical species proportions, the rows of F are normalized), and (iii) adding minimum and maximum bounds to some of the unknown values of F. The methods we propose in this paper were partially introduced in [35,36], in the framework of the β-divergence only. We generalize here [35,36] to the αβ-divergence and we provide a detailed study of their performance on both realistic simulations and a real data campaign.
The remainder of the paper is structured as follows. We recall some properties of the αβ-divergence in Section 2. Section 3 introduces our proposed NMF parameterization—which highlights the special structure of the profile matrix in the NMF algorithm—while Section 4 is dedicated to the problem formulation. We introduce our proposed methods in Section 5 and test them in Section 6. Lastly, we conclude on the proposed work in Section 7. Appendix A introduces update rules for an alternative informed αβ-NMF method.

2. Robust Cost Functions

2.1. Introduction to Modified Cost Functions

Chemical data often contain some particular measurements whose characteristics substantially differ from those which are commonly observed. From a signal processing point of view, such data may be considered as outliers which may degrade the performance of classical algorithms using the Frobenius norm in their cost function. Such an issue is often addressed in the field of robustness, where the challenge is to design new algorithms which take the above corrupted data into account.
Apart from the low-rank plus sparse decomposition [18], robust NMF algorithms using modified cost functions were investigated. Indeed, these robust functions provide less penalization to large entries of the residual matrix, which is defined as
$$R \triangleq X - G \cdot F. \qquad (3)$$
Among them, the Huber cost function provides a differentiable connection between the ℓ₂ and ℓ₁ norms, according to the residual value with respect to an adaptive cutoff parameter. Another popular modified cost function is the correntropy measure [24], which is a bounded and non-convex discrepancy measure.
In contrast with the above measures, the α β -divergence is not a norm as it is not symmetrical. Figure 1 shows an example of the behavior of such functions which are penalizing the values of the residual in different ways. As mentioned earlier, the α β -divergence is the only cost function to present a possible asymmetric behavior around the null residual value. Hopke [1] highlighted the need for methods dedicated to chemical source apportionment which enforce a positive residual value. This situation fits well with the configuration described in Figure 1.

2.2. α β -Divergence

The αβ-divergence (for special values of α, β, the reader is invited to consult [23]) is a parametric discrepancy measure which may be used to evaluate the gap between two scalar quantities p and q, i.e., for α ≠ 0, β ≠ 0, and α + β ≠ 0,
$$D^{\alpha,\beta}(p\,\|\,q) = -\frac{1}{\alpha\beta}\left( p^{\alpha} q^{\beta} - \frac{\alpha}{\alpha+\beta}\, p^{\alpha+\beta} - \frac{\beta}{\alpha+\beta}\, q^{\alpha+\beta} \right). \qquad (4)$$
Special values of the parameters lead to very famous divergence measures [23], such as α-divergences or β-divergences [21]. These divergences differ from classical norms in the sense that they satisfy some common properties—e.g., non-negativity—while others, such as symmetry, scalability, and the triangular inequality, are not satisfied.
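For illustration, a minimal NumPy sketch of Equation (4) is given below; the function name and input handling are ours, and only the generic case α ≠ 0, β ≠ 0, α + β ≠ 0 is covered (the limit cases are detailed in [23]):

```python
import numpy as np

def ab_divergence(p, q, alpha=0.6, beta=0.9):
    """Alpha-beta divergence of Eq. (4), summed over the entries of p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    s = alpha + beta
    d = -(p**alpha * q**beta - alpha / s * p**s - beta / s * q**s) / (alpha * beta)
    return d.sum()

# The divergence vanishes when both arguments coincide:
print(ab_divergence([1.0, 2.0], [1.0, 2.0]))  # ~ 0.0
```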
Cichocki et al. [23] studied the influence of the parameters α and β on the robustness of the estimated data (they also establish general connections between the general αβ-divergence and the scaled α/(α+β)-order α-divergence with an (α+β)-zoom of its arguments). To this aim, they express the sensitivity to outliers by computing the derivative with respect to an unknown parameter, here replaced for simplicity with an entry of F, namely F_{rj}, i.e.,
$$\frac{\partial D^{\alpha,\beta}(X\,\|\,\hat{X})}{\partial F_{rj}} = -\sum_i \frac{\partial \hat{X}_{ij}}{\partial F_{rj}}\; \underbrace{(\hat{X}_{ij})^{\alpha+\beta-1}}_{\text{weight}}\; \underbrace{\ln_{1-\alpha}\!\left(X_{ij}/\hat{X}_{ij}\right)}_{\alpha\text{-zoom}}, \qquad (5)$$
where
$$\hat{X}_{ij} \triangleq \sum_r G_{ir} F_{rj} = (G \cdot F)_{ij}, \qquad (6)$$
and
$$\ln_{1-\alpha}(z) = \begin{cases} \dfrac{z^{\alpha} - 1}{\alpha} & \text{if } \alpha \neq 0, \\ \ln(z) & \text{if } \alpha = 0. \end{cases} \qquad (7)$$
Considering Equation (6), the expression ∂X̂_{ij}/∂F_{rj} = G_{ir} may be substituted into Equation (5), leading to
$$\frac{\partial D^{\alpha,\beta}(X\,\|\,\hat{X})}{\partial F_{rj}} = -\sum_i G_{ir}\; \underbrace{(\hat{X}_{ij})^{\alpha+\beta-1}}_{\alpha\beta\text{ weight}}\; \underbrace{\ln_{1-\alpha}\!\left(X_{ij}/\hat{X}_{ij}\right)}_{\alpha\text{-zoom}}. \qquad (8)$$
For the sake of comparison, sensitivity equations of M-estimators [37] are usually designed for the weighted Frobenius cost function (corresponding to α = 1 and β = 1 in an αβ-divergence), i.e.,
$$\frac{\partial D^{1,1}(X\,\|\,\hat{X})}{\partial F_{rj}} = -\sum_i G_{ir}\; \underbrace{W_{ij}}_{\text{weight}}\; \underbrace{(X - G \cdot F)_{ij}}_{\text{residual entry}}, \qquad (9)$$
where W i j accounts for the general entry of the weight matrix. This weight is usually viewed as a confidence index into the corresponding data. As a consequence, a large residual together with a large weight leads to large modifications in the estimates. In the frame of Equation (8), the weight entry reads
$$W_{ij} = \frac{(\hat{X}_{ij})^{\alpha+\beta-1}\,\ln_{1-\alpha}\!\left(X_{ij}/\hat{X}_{ij}\right)}{R_{ij}} = \underbrace{(\hat{X}_{ij})^{\alpha+\beta-1}}_{\alpha\beta\text{ weight}}\; \underbrace{\frac{1}{\alpha}\,\frac{(X_{ij})^{\alpha} - (\hat{X}_{ij})^{\alpha}}{(\hat{X}_{ij})^{\alpha}\,(X_{ij} - \hat{X}_{ij})}}_{\alpha\text{-zoom weight}}. \qquad (10)$$
Figure 2 describes the α-zoom weight as a function of the ratio X_{ij}/X̂_{ij} for different values of α. It turns out that α < 1 provides a small weight to large values of the ratio X_{ij}/X̂_{ij}. In other words, this situation does not induce big changes in the estimates. Outliers such that X_{ij} ≫ X̂_{ij} will thus be allowed in this context.
Equation (8) combines two effects, namely an α-zoom and an αβ-weight effect. When α > 1, the emphasis of the α-zoom is put on larger values of the ratio X_{ij}/X̂_{ij}, while the emphasis is put on smaller values of this ratio when α < 1. These properties are recalled in Table 1. The αβ-weight effect in (X̂_{ij})^{α+β−1} is expressed as a function of α + β in Table 2.
To summarize, α can be used to control the influence of large or small ratios in the estimator through the α -zoom, while β provides some control on the weighting of the ratios depending on the demand to better fit to larger or smaller values of the model [23]. Gathering these properties, the space of values ( α , β ) may be partitioned in several areas as described in Figure 3.
Each zone allows a certain kind of outliers. Areas 1 and 2 allow outliers of the form X_{ij} > X̂_{ij} for large and small amplitudes of X̂_{ij}, respectively. Areas 3 and 4 accept outliers such that X_{ij} < X̂_{ij} for large and small amplitudes of X̂_{ij}, respectively. Areas 1 and 3 favor a better fit to small values of X while areas 2 and 4 favor a better fit to large ones. As a consequence, for the considered application, we propose to favor a better fit for major species with respect to minor species. This leads to considering the case α + β > 1. Secondly, if the estimation does not fit the data, we prefer keeping situations where X_{ij} > X̂_{ij} holds, as explained in Section 1 and in [1]. This results in the choice α < 1. These two conditions give rise to an area of interest, namely area 2, which is kept throughout the article (for convexity reasons in NMF [23], area 2 should be restricted to β < 1).

2.3. Existing NMF Methods with Parametric Divergences

NMF methods are formulated as the global minimization of a cost function under the non-negativity of both factors G and F. Aside from pioneering work [4], NMF is classically performed through an iterative procedure which alternately minimizes—for a fixed F (respectively G)—a discrepancy between X and G·F. Multiplicative update rules were first proposed in [38] for the Frobenius norm and the Kullback–Leibler divergence. While being easy to implement, multiplicative algorithms only ensure that the cost function does not increase within iterations, which is not sufficient for getting a limit point. The study of NMF convergence through the Karush–Kuhn–Tucker (KKT) conditions was explored by Lin [39]; stationarity is only a necessary condition for a local minimum. Moreover, some limit points which are not stationary may exist, especially if some components of F and G are initialized to zero.
Moreover, most algorithms are sensitive to the initialization and to the presence of outliers. Parametric divergences may reduce the influence of this last drawback by an appropriate choice of the hyperparameters.
Cichocki et al. [20] proposed multiplicative update rules with α -divergence. The developed rules were based on the majorization-minimization (MM) strategy [40] but they may also be obtained in a heuristic way by using the KKT conditions or partial derivatives of the cost function as well.
Févotte and Idier [21] proposed to use the β -divergence as a cost function and derived different kinds of rules according to three different strategies involving the heuristic approach, the majorization-minimization strategy [40] and a new one called majorization-equalization. This last strategy provides a larger step size and a faster convergence. Hennequin et al. stated that the β -divergence could be viewed as a special case of Bregman divergence [41], thus leading us to apply Bregman divergence theorems to β -divergence. Cichocki et al. [23] proposed NMF based on generalized α β -divergences in the framework of majorization-minimization (MM).
Extending the work in [27,42] on the one hand and in [23] on the other hand, we introduced in [35] a weighted β-NMF (β-WNMF) defined for β ∈ [0, 1]. It is straightforward to extend it to a weighted αβ-NMF, which amounts to minimizing a weighted αβ-divergence,
$$\min_{G \geq 0,\, F \geq 0} D_W^{\alpha,\beta}\!\left(X \,\|\, G \cdot F\right) \triangleq \min_{G \geq 0,\, F \geq 0} \sum_{i,j} W_{ij}\, D^{\alpha,\beta}\!\left(x_{ij} \,\|\, (G \cdot F)_{ij}\right), \qquad (11)$$
and yields
$$F^{k+1} = F^{k} \circ \mathcal{N}_F^{\alpha,\beta}(G^{k}, F^{k}), \qquad G^{k+1} = G^{k} \circ \mathcal{N}_G^{\alpha,\beta}(G^{k}, F^{k}), \qquad (12)$$
where
$$\mathcal{N}_F^{\alpha,\beta}(G, F) \triangleq \left(\frac{G^T \cdot \left(W \circ X^{\circ\alpha} \circ (G \cdot F)^{\circ(\beta-1)}\right)}{G^T \cdot \left(W \circ (G \cdot F)^{\circ(\alpha+\beta-1)}\right)}\right)^{\circ\frac{1}{\alpha}}, \qquad (13)$$
$$\mathcal{N}_G^{\alpha,\beta}(G, F) \triangleq \left(\frac{\left(W \circ X^{\circ\alpha} \circ (G \cdot F)^{\circ(\beta-1)}\right) \cdot F^T}{\left(W \circ (G \cdot F)^{\circ(\alpha+\beta-1)}\right) \cdot F^T}\right)^{\circ\frac{1}{\alpha}}, \qquad (14)$$
and X ∘ Y and X ⊘ Y respectively denote the componentwise product and division between two matrices X and Y. W is a weight matrix used to model the uncertainties σ_{ij} associated with the data samples x_{ij}, and whose general element is set to w_{ij} ≜ σ_{ij}^{−(α+β)}. This approach encompasses several other methods, especially αβ-NMF [23] if W = 1_{nm}, i.e., for any i and j, w_{ij} = 1, and β-NMF [21] if additionally α = 1.
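As an illustration, a minimal NumPy sketch of one iteration of Equations (12)–(14) could read as follows; the function name and the small constant added to avoid divisions by zero are ours, not part of the paper:

```python
import numpy as np

EPS = 1e-12  # small numerical safeguard (ours, not in the paper)

def wnmf_ab_step(X, G, F, W, alpha=0.6, beta=0.9):
    """One multiplicative update of weighted alpha-beta NMF, cf. Eqs. (12)-(14)."""
    lam = alpha + beta - 1.0
    GF = G @ F + EPS
    # Update F at fixed G (Eq. (13)).
    F = F * ((G.T @ (W * X**alpha * GF**(beta - 1.0)))
             / (G.T @ (W * GF**lam) + EPS))**(1.0 / alpha)
    GF = G @ F + EPS
    # Update G at fixed (new) F (Eq. (14)).
    G = G * (((W * X**alpha * GF**(beta - 1.0)) @ F.T)
             / ((W * GF**lam) @ F.T + EPS))**(1.0 / alpha)
    return G, F
```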
Apart from multiplicative updates, NMF methods based on the alternating direction method of multipliers (ADMM) were recently proposed [43] for their ability to perform distributed computations on large-scale data. In particular, Sun and Févotte introduced an approach based on the β-divergence [22], while Zhu and Honeine [24] proposed a correntropy-based approach for large deviations. Such fast approaches are not required for the considered chemical application, where the global computation time is not an issue.

3. Constraint Parameterization

In this paper, we assume the values of some components of the profile matrix F to be provided or bounded by experts. We thus propose a formalism which takes into account this knowledge. It extends our previous parameterization [32] which only considered equality constraints.
Let Ω E and Ω I be two p × m binary matrices which inform the presence/absence of equality and inequality constraints on each element f i j of the matrix F, respectively, i.e.,
$$\omega_{ij}^{E} = \begin{cases} 1 & \text{if } f_{ij} \text{ is known}, \\ 0 & \text{otherwise}, \end{cases} \qquad \omega_{ij}^{I} = \begin{cases} 1 & \text{if } f_{ij} \text{ is bounded}, \\ 0 & \text{otherwise}. \end{cases} \qquad (15)$$
We then define the p × m binary matrices Ω̄^E and Ω̄^I as Ω̄^E ≜ 1_{pm} − Ω^E and Ω̄^I ≜ 1_{pm} − Ω^I, where 1_{pm} is the p × m matrix of ones. By construction, we obtain
$$\Omega^{E} \circ \Omega^{I} = 0, \qquad \Omega^{I} \leq \bar{\Omega}^{E}. \qquad (16)$$
We denote by Φ E the p × m sparse matrix of set values, i.e.,
$$\Phi^{E} \triangleq F \circ \Omega^{E}. \qquad (17)$$
Please note that φ i j E —the ( i , j ) -th element of Φ E —is equal to zero when ω i j E = 0 . We can easily prove that
$$\Phi^{E} \circ \Omega^{E} = \Phi^{E}, \qquad \Phi^{E} \circ \bar{\Omega}^{E} = 0. \qquad (18)$$
Similarly, we define Φ^{I+} and Φ^{I−} as the p × m sparse matrices of upper and lower bounds, respectively (equality constraints could be considered as inequalities with the same upper and lower bounds; however, in some preliminary tests, we found our proposed approaches to outperform those using bound constraints only), i.e.,
$$\Phi^{I-} \leq F \circ \Omega^{I} \leq \Phi^{I+}. \qquad (19)$$
Let f i and φ i E be the i-th column of F and Φ E , respectively. A column f i may be expressed as
$$f_i = \varphi_i^{E} + \Gamma_i\,\theta_i, \qquad (20)$$
where θ_i and Γ_i are respectively the (p − l_i) × 1 vector of free parameters and the p × (p − l_i) orthonormal basis of free parameters [32]. From Equation (20), we define Δf_i as
$$\Delta f_i \triangleq f_i - \varphi_i^{E} = \Gamma_i\,\theta_i, \qquad (21)$$
and Δ F as the matrix gathering each column Δ f i , i.e.,
$$\Delta F \triangleq F - \Phi^{E}. \qquad (22)$$
Following the stages in [32]—which combine Equations (17), (18) and (22)—we obtain the matrix form of Equation (20):
$$F = \Omega^{E} \circ \Phi^{E} + \bar{\Omega}^{E} \circ \Delta F. \qquad (23)$$
This expression of F highlights its specific structure, as F is expressed as the sum of its set and free parts. Moreover, combining Equations (16) and (23) leads to
$$F = \Omega^{E} \circ \Phi^{E} + \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \Delta F + \bar{\Omega}^{E} \circ \Omega^{I} \circ \Delta F, \qquad (24)$$
which shows that the free part of F may be decomposed into a bounded part and an unconstrained one.
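As an illustration of this parameterization, a small NumPy sketch of Equations (17), (22) and (23) with hypothetical sizes and masks could read:

```python
import numpy as np

# Hypothetical toy sizes: p = 2 sources, m = 4 species.
p, m = 2, 4
F = np.random.rand(p, m)                   # current profile matrix

Omega_E = np.array([[1, 0, 0, 0],          # Eq. (15): known entries of F
                    [0, 0, 1, 0]], float)
Phi_E = np.array([[0.2, 0, 0.0, 0],        # Eq. (17): their set values
                  [0.0, 0, 0.0, 0]], float)
Omega_E_bar = 1.0 - Omega_E                # mask of the free entries

Delta_F = F - Phi_E                        # free part, Eq. (22)
F_struct = Omega_E * Phi_E + Omega_E_bar * Delta_F   # Eq. (23): set + free parts
```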

4. General Problem Formulation

The proposed informed NMF methods consist of estimating the matrices G and F in order to get an approximate factorization (1) under the above constraints, i.e.,
$$\min_{G \geq 0,\, F \geq 0} D_W^{\alpha,\beta}\!\left(X \,\|\, G \cdot F\right) \quad \text{s.t.} \quad F \circ \Omega^{E} = \Phi^{E}, \quad \Phi^{I-} \leq F \circ \Omega^{I} \leq \Phi^{I+}, \quad F \cdot 1_{mm} = 1_{pm}, \qquad (25)$$
where the weighted divergence D_W^{α,β}(·‖·) is defined in Equation (11). The first constraint ensures that some predefined components of F are set, while the second one forces the selected components to be bound-constrained. The last condition enforces each row of F to be normalized, i.e., Σ_{j=1}^{m} f_{ij} = 1 for all i = 1, …, p (please note that the normalization met in remote sensing [44]—where the sum of each row of F is equal to one—is not similar, except in a noiseless case in the framework of exact factorization. Moreover, the normalization also differs from the one met in mobile sensor calibration [13], as the normalization is only approximately satisfied in the latter).
The main challenge in Equation (25) consists of finding solutions which satisfy all the above constraints. The first constraint leads us to consider the parameterization (23) that we used in [32]. By substituting the parameterization (23), Equation (25) becomes a constrained NMF with respect to G and ΔF, i.e.,
$$\min_{G \geq 0,\, \Delta F \geq 0} D_W^{\alpha,\beta}\!\left(X \,\|\, G \cdot (\Omega^{E} \circ \Phi^{E}) + G \cdot (\bar{\Omega}^{E} \circ \Delta F)\right) \quad \text{s.t.} \quad \Phi^{I-} \leq \Delta F \circ \Omega^{I} \leq \Phi^{I+}, \quad \Delta F \cdot 1_{mm} = 1_{pm} - \Phi^{E} \cdot 1_{mm}. \qquad (26)$$
The last condition is derived from the last one in Equation (25) combined with Equation (22).
In the case of bound constraints only, no dedicated parameterization exists, but projective methods have been developed [45]. The row sum-to-one constraint has been taken into account by using a special parameterization in [14]. However, dealing with all the constraints at the same time is a difficult task. We thus propose a less elegant, yet efficient strategy which consists of considering them sequentially. By dropping the bound constraints, we obtain the following reduced problem:
$$\min_{G \geq 0,\, \Delta F \geq 0} D_W^{\alpha,\beta}\!\left(X \,\|\, G \cdot (\Omega^{E} \circ \Phi^{E}) + G \cdot (\bar{\Omega}^{E} \circ \Delta F)\right) \quad \text{s.t.} \quad \Delta F \cdot 1_{mm} = 1_{pm} - \Phi^{E} \cdot 1_{mm}. \qquad (27)$$
As an alternative to the above problem, please note that by combining Equations (1) and (23), we obtain
$$X - G \cdot (\Omega^{E} \circ \Phi^{E}) \approx G \cdot (\bar{\Omega}^{E} \circ \Delta F). \qquad (28)$$
We can thus derive a slightly different problem, i.e.,
$$\min_{G \geq 0,\, \Delta F \geq 0} D_W^{\alpha,\beta}\!\left(X - G \cdot (\Omega^{E} \circ \Phi^{E}) \,\|\, G \cdot (\bar{\Omega}^{E} \circ \Delta F)\right) \quad \text{s.t.} \quad \Delta F \cdot 1_{mm} = 1_{pm} - \Phi^{E} \cdot 1_{mm}, \qquad (29)$$
which yields slightly different update rules. We proposed in [35] some multiplicative update rules to solve Equation (29) in the case of the β-divergence only. The extension to the αβ-divergence is derived in Appendix A.
As explained above, instead of looking for the solution of Equation (26) directly, we sequentially consider each additional set of information, i.e., we first estimate Δ F and F that we then normalize and project onto the set of admissible solutions (or that we project and then normalize, respectively) within iterations.

5. Proposed Informed α β -NMF Methods

5.1. Weighted α β -NMF with Set Constraints

In this section, we first aim to solve Equation (27) without the sum-to-one constraint. The whole strategy follows the majorization-minimization technique [40] and consists of (i) finding a majorizing function which is convex with respect to the unknown parameters, and (ii) minimizing this auxiliary function instead of the original one.
Proposition 1.
Update rules for the free part of the profile matrix are
$$F^{k+1} \triangleq F^{k} \circ \bar{\Omega}^{E} \circ \mathcal{N}_F^{\alpha,\beta}(G^{k}, F^{k}), \qquad (30)$$
where, denoting λ ≜ α + β − 1, we define
$$\mathcal{N}_F^{\alpha,\beta}(G, F) \triangleq \left(\frac{G^T \cdot \left(W \circ X^{\circ\lambda} \circ (X - G \cdot \Phi^{E})^{\circ(1-\beta)} \circ \left(G \cdot (F \circ \bar{\Omega}^{E})\right)^{\circ(\beta-1)}\right)}{G^T \cdot \left(W \circ X^{\circ\lambda} \circ (X - G \cdot \Phi^{E})^{\circ(-\lambda)} \circ \left(G \cdot (F \circ \bar{\Omega}^{E})\right)^{\circ\lambda}\right)}\right)^{\circ\frac{1}{\alpha}}. \qquad (31)$$
Proof. 
We consider a column of the data since the divergence may be split into independent partial divergences. Using the notations defined in Section 3, we hereafter drop the index i for the vectors f̲_i, φ̲_i^E, θ̲_i (with Δf̲_i = Γ_i θ̲_i), and for the matrix Γ_i. Let k be the current iteration index and let us define
$$U \triangleq G\,\Gamma. \qquad (32)$$
Expression (32) together with Equation (21) provide
$$D_{\underline{w}}^{\alpha,\beta}\!\left(\underline{x} \,\|\, G\underline{\varphi}^{E} + G\,\Delta\underline{f}\right) = D_{\underline{w}}^{\alpha,\beta}\!\left(\underline{x} \,\|\, G\underline{\varphi}^{E} + U\underline{\theta}\right). \qquad (33)$$
The weighted α β -divergence between two corresponding column vectors reads
$$D_{\underline{w}}^{\alpha,\beta}\!\left(\underline{x} \,\|\, G\underline{f}\right) = \sum_i w_i\, x_i^{\alpha+\beta}\; h^{\alpha,\beta}\!\left(\frac{(G\underline{\varphi}^{E})_i + \sum_j u_{ij}\theta_j}{x_i}\right), \qquad (34)$$
where, for α ≠ 0, β ≠ 0, and α + β ≠ 0,
$$h^{\alpha,\beta}(z) \triangleq \frac{1}{\alpha\beta}\left(\frac{\alpha}{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\, z^{\alpha+\beta} - z^{\beta}\right). \qquad (35)$$
Provided that h^{α,β}(1) = 0 and noticing that h^{α,β}(z) is convex for z ≥ 0 and β ∈ [min(1, 1 − α); max(1, 1 − α)] [23], Jensen's inequality may be applied twice, i.e.,
$$h^{\alpha,\beta}\!\left(\frac{(G\underline{\varphi}^{E})_i + \sum_j u_{ij}\theta_j}{x_i}\right) \leq \frac{(\underline{x} - G\underline{\varphi}^{E})_i}{x_i}\; h^{\alpha,\beta}\!\left(\frac{\sum_j u_{ij}\theta_j}{(\underline{x} - G\underline{\varphi}^{E})_i}\right) \qquad (36)$$
and
$$h^{\alpha,\beta}\!\left(\frac{\sum_j u_{ij}\theta_j}{(\underline{x} - G\underline{\varphi}^{E})_i}\right) \leq \sum_j \frac{u_{ij}\theta_j^{k}}{\sum_l u_{il}\theta_l^{k}}\; h^{\alpha,\beta}\!\left(\frac{\theta_j \sum_l u_{il}\theta_l^{k}}{(\underline{x} - G\underline{\varphi}^{E})_i\,\theta_j^{k}}\right), \qquad (37)$$
where the superscript k is the current iteration number and θ j is the j-th element of the free parameter vector θ ̲ introduced in Equation (20). Equation (34) together with expressions (36) and (37) yield the following auxiliary function:
$$H_{2,w}^{\alpha,\beta}(\theta_j, \theta_j^{k}) = \sum_i w_i\, x_i^{\alpha+\beta-1}\, (\underline{x} - G\underline{\varphi}^{E})_i \sum_j \frac{u_{ij}\theta_j^{k}}{\sum_l u_{il}\theta_l^{k}}\; h^{\alpha,\beta}\!\left(\frac{\theta_j \sum_l u_{il}\theta_l^{k}}{(\underline{x} - G\underline{\varphi}^{E})_i\,\theta_j^{k}}\right). \qquad (38)$$
Canceling its gradient $\frac{\partial H_{2,w}^{\alpha,\beta}(\theta_j,\theta_j^{k})}{\partial \theta_j}$ leads to the optimum, i.e.,
$$\left(\frac{\theta_j}{\theta_j^{k}}\right)^{\!\alpha} = \frac{\sum_i w_i\, u_{ij}\, x_i^{\lambda}\, (\underline{x} - G\underline{\varphi}^{E})_i^{\,1-\beta}\, \left(\sum_l u_{il}\theta_l^{k}\right)^{\beta-1}}{\sum_i w_i\, u_{ij}\, x_i^{\lambda}\, (\underline{x} - G\underline{\varphi}^{E})_i^{\,-\lambda}\, \left(\sum_l u_{il}\theta_l^{k}\right)^{\lambda}}, \qquad (39)$$
which reads in its vector form
$$\left(\underline{\theta} \oslash \underline{\theta}^{k}\right)^{\circ\alpha} = \frac{U^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(1-\beta)} \circ (U\underline{\theta}^{k})^{\circ(\beta-1)}\right]}{U^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(-\lambda)} \circ (U\underline{\theta}^{k})^{\circ\lambda}\right]}. \qquad (40)$$
By combining Equation (21) with the above relationship, we derive the expression of one column of the matrix F :
$$\underline{f}^{\,k+1} \oslash \underline{f}^{\,k} = \left(\frac{\Gamma\, U^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(1-\beta)} \circ (U\underline{\theta}^{k})^{\circ(\beta-1)}\right]}{\Gamma\, U^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(-\lambda)} \circ (U\underline{\theta}^{k})^{\circ\lambda}\right]}\right)^{\circ\frac{1}{\alpha}}. \qquad (41)$$
By replacing U according to Equation (32), and by noticing that Γ Γ T = diag ( ω ̲ ¯ E ) , it results in the new update rule:
$$\underline{f}^{\,k+1} \triangleq \underline{f}^{\,k} \circ \underline{\bar{\omega}}^{E} \circ \mathcal{N}_{\underline{f}^{\,k}}, \qquad (42)$$
where
$$\mathcal{N}_{\underline{f}^{\,k}} \triangleq \left(\frac{G^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(1-\beta)} \circ (G\,\Delta\underline{f}^{\,k})^{\circ(\beta-1)}\right]}{G^T\!\left[\underline{w} \circ \underline{x}^{\circ\lambda} \circ (\underline{x} - G\underline{\varphi}^{E})^{\circ(-\lambda)} \circ (G\,\Delta\underline{f}^{\,k})^{\circ\lambda}\right]}\right)^{\circ\frac{1}{\alpha}}. \qquad (43)$$
Similarly to [35], we derive the update rules by writing the matrix form of Equation (43), which completes the proof. □
Appendix A proposes the update rules for problem (29). These rules are close to those introduced above, with some differences in the multiplicative mask. We show in Appendix A that the update rules proposed in the main part of this paper extend the ones proposed in Appendix A by iteratively updating the weights.
Update rules for G correspond to an unconstrained αβ-WNMF driven by Equation (12), since no information is available on G. Their validity is only guaranteed within the convex domain, i.e., for β ∈ [min(1, 1 − α); max(1, 1 − α)]. Outside this domain, some additional assumptions on the reconstructed data are needed to ensure the local convexity property [23]. As we chose to set α and β so that they belong to area 2 in Figure 3, the convexity domain reduces the possible area to the intersection between area 2 and the half-plane β ≤ 1.
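For illustration, a NumPy sketch of the update of Proposition 1 (Equations (30) and (31), as reconstructed above) could read as follows; the clipping of the residual and the small constant are purely numerical safeguards of ours:

```python
import numpy as np

EPS = 1e-12  # numerical safeguard (ours, not part of the paper's rules)

def update_F_set_constrained(X, G, F, W, Phi_E, Omega_E_bar, alpha=0.6, beta=0.9):
    """One update of the free part of F, cf. Eqs. (30)-(31) as reconstructed above."""
    lam = alpha + beta - 1.0
    R = np.maximum(X - G @ Phi_E, EPS)        # residual w.r.t. the set part of F
    GF_free = G @ (F * Omega_E_bar) + EPS     # reconstruction due to the free part
    num = G.T @ (W * X**lam * R**(1.0 - beta) * GF_free**(beta - 1.0))
    den = G.T @ (W * X**lam * R**(-lam) * GF_free**lam) + EPS
    return F * Omega_E_bar * (num / den)**(1.0 / alpha)   # Eq. (30)
```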

5.2. Normalization Procedures

In the considered application, the rows of the profile matrix are summed to one. This case is different from the one encountered in hyperspectral unmixing [44]—since our constraint cannot be split into independent vectorial subproblems—and in mobile sensor calibration [13] as the sum-constraint is only approximately satisfied in the latter. As a consequence, in our previous work [32,35], we used to normalize the matrices G and F in each iteration, after estimating them. We reformulate these steps below (see Section 5.2.1) while we investigate an alternative normalization procedure in Section 5.2.2. They are introduced in the framework of the above approach but the rules may be applied to our previous work [32,35] as well.

5.2.1. Classical Normalization

Let us define F ˜ as the normalized profile matrix and G ˜ the corresponding scaled contribution matrix. In order to hold the sum-to-one property, Lantéri et al. [14] proposed a change of variables under the form (please note that the normalization constraint can also be solved as a penalization term in the NMF problem formulation [13]. This setting is interesting when the sum constraint is approximately satisfied, which is not the case for the considered application).
$$\tilde{F}_{ij} = \frac{F_{ij}}{\sum_{j=1}^{m} F_{ij}}, \qquad (44)$$
which may be written under matrix form as
$$\tilde{F} = F \oslash \left(F \cdot 1_{mm}\right). \qquad (45)$$
This equation enables to normalize the rows of F whereas the symmetric version enables to scale the columns of G correspondingly, i.e.,
$$\tilde{G} = G \circ \left[1_{nm} \cdot F^{T}\right]. \qquad (46)$$
The product G ˜ · F ˜ then reads
$$\tilde{G} \cdot \tilde{F} = \left(G \circ \left[1_{nm} \cdot F^{T}\right]\right) \cdot \left(F \oslash \left(F \cdot 1_{mm}\right)\right), \qquad (47)$$
which results in the expression of its general entry:
$$(\tilde{G} \cdot \tilde{F})_{ij} = \sum_k G_{ik} \left(\sum_l F_{kl}\right) \frac{F_{kj}}{\sum_l F_{kl}} = \sum_k G_{ik} F_{kj} = (G \cdot F)_{ij}. \qquad (48)$$
This means that the matrix product is maintained throughout the normalization process. Since the cost function to minimize only depends on this product, this property ensures the same decrease as in the unconstrained case within iterations.
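A minimal NumPy sketch of this classical normalization (Equations (44)–(46)) could read; the function name is ours:

```python
import numpy as np

def classical_normalization(G, F):
    """Row-normalize F and rescale the columns of G accordingly, cf.
    Eqs. (44)-(46); the product G @ F is left unchanged."""
    row_sums = F.sum(axis=1, keepdims=True)   # (p, 1) vector of row sums of F
    return G * row_sums.T, F / row_sums
```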
The normalized expression of F—denoted F ˜ —at iteration k + 1 then reads
$$\tilde{F}^{k+1} \triangleq \frac{\Phi^{E} \circ \Omega^{E} + \Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}}{\left[\Phi^{E} \circ \Omega^{E} + \Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right] \cdot 1_{mm}}, \qquad (49)$$
where
$$\mathcal{R}_F^{\alpha,\beta} \triangleq \begin{cases} \mathcal{M}_F^{\alpha,\beta}(G, F) & \text{for Problem (29)}, \\ \mathcal{N}_F^{\alpha,\beta}(G, F) & \text{for Problem (27)}, \end{cases} \qquad (50)$$
and where ΔF̃^k stands for the free part of the normalized matrix F̃^k, defined by ΔF̃^k ≜ F̃^k ∘ Ω̄^E. Noticing that
$$\Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} = \tilde{F}^{k} \circ \bar{\Omega}^{E}, \qquad (51)$$
we express Equation (49) with respect to F ˜ k :
$$\tilde{F}^{k+1} \triangleq \frac{\Omega^{E} \circ \Phi^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}}{\left[\Omega^{E} \circ \Phi^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right] \cdot 1_{mm}}. \qquad (52)$$
Similarly, we derive the update rules for G ˜ k + 1 , i.e.,
$$\tilde{G}^{k+1} \triangleq \tilde{G}^{k} \circ \mathcal{N}_G^{\alpha,\beta}(\tilde{G}^{k}, F^{k+1}) \circ \left[1_{nm} \cdot \left(\Omega^{E} \circ \Phi^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right)^{T}\right], \qquad (53)$$
where N G α , β ( G ˜ k , F k + 1 ) —defined in Equation (14)—is computed with the unnormalized matrix F k + 1 which reads
$$F^{k+1} = \Omega^{E} \circ \Phi^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}. \qquad (54)$$
Equations (52) and (53) thus provide the update rules for our first normalized and constrained WNMF method denoted α β -N 1 -constrained and weighted NMF (CWNMF) below. Although the set profiles are lost within iterations due to the normalization process, we noticed in preliminary tests that they were recovered asymptotically.

5.2.2. Alternative Normalization

As an alternative, we now propose a second normalization which keeps the set constraints on F within iterations. Starting with Equation (30) that we normalize, it turns out that
$$\Delta\tilde{F}^{k+1} \triangleq \frac{\Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(\Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}} \circ \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right), \qquad (55)$$
where (1_{pm} − Φ^E · 1_{mm}) accounts for the matrix involving the sum of the free components for each source, and the other part of the expression represents the different proportions within the free profiles. Using property (51), alternative update rules may be derived:
$$\tilde{F}^{k+1} \triangleq \Omega^{E} \circ \Phi^{E} + \frac{\bar{\Omega}^{E} \circ \tilde{F}^{k} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(\bar{\Omega}^{E} \circ \tilde{F}^{k} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}} \circ \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right). \qquad (56)$$
This normalization keeps the constraints verified within iterations but may move along directions different from the steepest descent direction. During this process, the contribution matrix does not require a scale factor as in the first method, since the scale factor is only applied to the free parameters of F. We then estimate G^{k+1} using the unconstrained rules defined in Equations (12) and (14). The update rules (12) and (56) are associated with our second normalized and constrained WNMF method, denoted αβ-N2-CWNMF below.
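A minimal NumPy sketch of this second normalization (Equation (56)), applied to an already updated and unnormalized profile matrix, could read (variable names are ours):

```python
import numpy as np

def alternative_normalization(F_new, Phi_E, Omega_E, Omega_E_bar):
    """Rescale only the free entries of each row of the updated (unnormalized)
    profile matrix F_new so that the full row sums to one, cf. Eq. (56)."""
    free = F_new * Omega_E_bar                        # free part of the profiles
    free_sum = free.sum(axis=1, keepdims=True)        # current mass of the free part
    target = 1.0 - Phi_E.sum(axis=1, keepdims=True)   # mass left to the free part
    return Omega_E * Phi_E + free * (target / free_sum)
```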

5.2.3. Description of Algorithm Acronyms

We proposed above update rules for two normalized and constrained WNMF methods. However, we also proposed different update rules in Appendix A, to which the above normalizations can be applied. As these methods minimize the divergence between G·ΔF and the residual X − G·Φ^E, we add a “-R” to their acronym. Table 3 outlines the necessary information for each method. The pseudo code for the αβ-N_x-CWNMF(-R) methods is shown in Algorithm 1.
Algorithm 1 α β -N x -constrained weighted non-negative matrix factorization (CWNMF) residual (-R) method.
i ← 0
while i ≤ N do
  Update F at fixed G according to Equation (52) or (56)
  Update G at fixed F according to Equation (12) or (53)
  i ← i + 1
end while

5.3. Bound-Constrained Normalized and Weighted α β -NMF

We now focus on problem (25), which involves several kinds of constraints that should be satisfied simultaneously. To our knowledge, only Lin [45] deals with bound constraints and proposes to adapt the stepsize of projected gradient techniques in order to decrease the cost function while holding the constraints. However, that work was devoted to bound constraints only, and its solution does not suit our problem with normalization.
As explained above, we propose to tackle them by applying a projection onto the admissible domain. Bound constraints act as safety barriers which prevent unrealistic solutions. However, the combination of normalization and projection should be applied in a predefined order. We thus propose below two structures:
  • the bound constraint projection followed by a normalization stage,
  • or the normalization followed by the projection.

5.3.1. Informed NMF with Bound Constraints and Normalization

In this subsection, we consider update rules for the N2-CWNMF methods. The same kind of procedure can be applied to the N1-CWNMF approaches proposed above. We assume that we get at iteration k a normalized matrix F̃^k and an unscaled contribution matrix G^k (indeed, no scaling is applied to G in N2-CWNMF, as explained in Section 5.2.2). Combining Equations (24) and (54) provides
$$F^{k+1} = \Phi^{E} \circ \Omega^{E} + \Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta} + \Delta\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \Omega^{I} \circ \mathcal{R}_F^{\alpha,\beta}, \qquad (57)$$
which may be simplified by using Equation (51), i.e.,
$$F^{k+1} = \Phi^{E} \circ \Omega^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \Omega^{I} \circ \mathcal{R}_F^{\alpha,\beta}. \qquad (58)$$
Applying the bound constraint then consists of
$$F^{k+1} = \Phi^{E} \circ \Omega^{E} + \tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta} + \mathcal{P}_{\Omega^{I}}\!\left(\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right), \qquad (59)$$
where $\mathcal{P}_{\Omega^{I}}(\cdot)$ is the projection operator defined entrywise by
$$\mathcal{P}_{\Omega^{I}}(U) \triangleq \begin{cases} \Omega^{I} \circ \Phi^{I-} & \text{if } \Omega^{I} \circ U \leq \Phi^{I-}, \\ \Omega^{I} \circ \Phi^{I+} & \text{if } \Omega^{I} \circ U \geq \Phi^{I+}, \\ \Omega^{I} \circ U & \text{otherwise}. \end{cases} \qquad (60)$$
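For illustration, a NumPy sketch of this projection operator could read; it assumes that the bound matrices are zero on the unbounded entries, as implied by their sparsity:

```python
import numpy as np

def project_bounds(U, Omega_I, Phi_I_minus, Phi_I_plus):
    """Entrywise projection P_{Omega^I} of Eq. (60): clip the bounded entries of U
    to [Phi_I_minus, Phi_I_plus] and mask out the unbounded ones."""
    return Omega_I * np.clip(U, Phi_I_minus, Phi_I_plus)
```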
The second normalization proposed in Section 5.2.2 consists of scaling the free part without changing the set components, which reads
$$\tilde{F}^{k+1} = \Phi^{E} \circ \Omega^{E} + \frac{\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta} + \mathcal{P}_{\Omega^{I}}\!\left(\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right)}{\left(\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta} + \mathcal{P}_{\Omega^{I}}\!\left(\tilde{F}^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right)\right) \cdot 1_{mm}} \circ \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right). \qquad (61)$$
This rule keeps the sum-to-one constraint and the set values. The bound constraints may be lost within iterations because of the normalization, but were found to be asymptotically recovered in our tests. The associated update for G follows the unconstrained one and does not have to be corrected by a scale factor, i.e.,
$$G^{k+1} = G^{k} \circ \mathcal{N}_G^{\alpha,\beta}(G^{k}, \tilde{F}^{k}), \qquad (62)$$
where N G α , β ( G k , F ˜ k ) has been introduced in Equation (14). The rules (61) and (62)—associated to our informed NMF approach named α β -BN 2 -CWNMF—do not ensure the cancellation of the gradient of Equation (38) along iterations but they preserve two set of constraints among the three ones. Let us recall that the approach using the first proposed normalization—denoted α β -BN 1 -CWNMF—can be derived with the same strategy. The pseudo code for α β -BN 1 -CWNMF method is shown in Algorithm 2.
Algorithm 2 α β -BN 1 -CWNMF method
i ← 0
while i ≤ N do
  Update F at fixed G according to Equation (61)
  Update G at fixed F according to Equation (62)
  i ← i + 1
end while

5.3.2. Informed NMF with Normalization and Bound Constraints

The same procedure as above can be applied in the reverse order, so that the bound projection is applied as the last step of an iteration. When applied to Equation (58), the second normalization provides
$$\tilde{F}^{k+1} = \Omega^{E} \circ \Phi^{E} + \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right) \circ \left[\bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \frac{F^{k} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(\bar{\Omega}^{E} \circ F^{k} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}} + \bar{\Omega}^{E} \circ \Omega^{I} \circ \frac{F^{k} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(\bar{\Omega}^{E} \circ F^{k} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}}\right]. \qquad (63)$$
The projection stage then leads to the unnormalized profile
$$F^{k+1} = \Omega^{E} \circ \Phi^{E} + \frac{F^{k} \circ \bar{\Omega}^{E} \circ \bar{\Omega}^{I} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(F^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}} \circ \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right) + \mathcal{P}_{\Omega^{I}}\!\left(\frac{F^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}}{\left(F^{k} \circ \bar{\Omega}^{E} \circ \mathcal{R}_F^{\alpha,\beta}\right) \cdot 1_{mm}} \circ \left(1_{pm} - \Phi^{E} \cdot 1_{mm}\right)\right). \qquad (64)$$
Equations (62) and (64) account for the update rules of this last method, denoted αβ-N2B-CWNMF. The pseudo code for the αβ-N2B-CWNMF method is shown in Algorithm 3.
Algorithm 3 α β -N 2 B-CWNMF method
i ← 0
while i ≤ N do
  Update F at fixed G according to Equation (64)
  Update G at fixed F according to Equation (62)
  i ← i + 1
end while
Please notice that only set and bound constraints are checked within iterations. Convergence towards a limit point ensures that limit matrices keep all the desired properties. As explained above, the same procedure with our first considered normalization may be applied, thus yielding an approach named α β -N 1 B-CWNMF.

6. Experimental Results

In this section, the enhancement provided by our methods is investigated on both simulations and a real data campaign. In these tests, we aim to identify the sources (by their chemical profile) contributing to the total atmospheric suspended PM as well as to quantify their contributions. In both the simulations and the real dataset, we consider atmospheric particles with a diameter equal to or below 10 μm (PM10). In practice, these particles are trapped in a filter which is changed every 24 h. Each filter is then analyzed by chemists who derive the masses of several chemical species of interest for the considered application, i.e., for evaluating the impact of marine traffic on air pollution in a port city. The species under study are divided into 16 metal tracers—i.e., Al, Cr, Fe, Mn, P, Sr, Ti, Zn, V, Ni, Co, Cu, Cd, Sb, La, and Pb—8 water-soluble ionic species—i.e., Na⁺, NH₄⁺, K⁺, Mg²⁺, Ca²⁺, Cl⁻, NO₃⁻, and SO₄²⁻—carbon compounds—either organic (OC) or elementary (EC)—levoglucosan, and polyols.
In all these experiments, except when we tested the influence of these parameters, we set the values of α and β to 0.6 and 0.9, respectively. Indeed, such a couple of values lies in the recommended area 2 defined in Figure 3. Moreover, we found in preliminary tests that these values of α and β provided a better performance. As a consequence, we do not make them vary in the remainder of this section.
Moreover, the signal-to-noise ratio (SNR) enabled us to evaluate the data set and is defined as:
$$\text{SNR}(X) = \frac{1}{m} \sum_{j=1}^{m} \text{SNR}_j(\underline{x}_j) = \frac{1}{m} \sum_{j=1}^{m} 10 \log_{10}\!\left(\frac{\sum_{i=1}^{n} x_{ij}^{2}}{\sum_{i=1}^{n} e_{ij}^{2}}\right), \qquad (65)$$
where x i j and e i j stand for the ( i , j ) -th non-noisy data and the individual noise. This index is widely used in the literature [46].
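A minimal NumPy sketch of Equation (65) could read; the function name is ours:

```python
import numpy as np

def snr_db(X_clean, E):
    """Average per-species SNR in dB, cf. Eq. (65); X_clean is the noiseless
    n x m data matrix and E the corresponding noise matrix."""
    return np.mean(10.0 * np.log10((X_clean**2).sum(axis=0) / (E**2).sum(axis=0)))
```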

6.1. Realistic Simulations

From the validated profile and contribution matrices obtained during the real campaign [47], simulation data were built by taking into account the individual uncertainty provided by the real campaign. In these simulations, the data matrix X thus consisted of a 278 × 28 matrix—which correspond to the chemical composition (28 species) of 278 PM samples—associated with individual uncertainties, which are those provided by the chemical analysis.
In addition, we also considered several cases with outliers. It is assumed that outliers come from an additional positive individual contamination.
The mathematical model of the outliers was driven by a random vector idx_outliers containing the locations of the outliers in the data matrix. For these locations, a multiplicative model was used, depending on the trial number i (between 1 and 400):
$$X_1(idx\_outliers) = (1 + i \cdot ratio)\; X(idx\_outliers), \qquad (66)$$
where X₁ (resp. X) accounts for the noiseless data with outliers (resp. the noiseless data without outliers). The variable ratio is a parameter which may be tuned in order to get an SNR after adding the outliers ranging from 15 dB to 70 dB. In our tests, the outlier deviation increased with the trial number i. In other words, for a low trial number, the multiplicative factor remained close to 1 in order to keep the SNR large. The effect of such outliers essentially depends on their location: if an outlier acts on a large entry of the data matrix, its impact on the SNR will be greater.
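A minimal NumPy sketch of this contamination model (Equation (66)) could read, assuming that idx_outliers stores flat indices of the data matrix:

```python
import numpy as np

def add_outliers(X, idx_outliers, trial, ratio):
    """Multiplicative contamination of Eq. (66); trial is the trial number i."""
    X1 = X.copy()
    X1.flat[idx_outliers] = (1.0 + trial * ratio) * X1.flat[idx_outliers]
    return X1
```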
Then, a noise has to mimic the chemical measurement process, which only gives a concentration value together with an uncertainty, so that every value within this interval is equally possible. A uniform noise designed on a limited support was therefore proposed. This support may be truncated on the left side if the uncertainty is greater than the corresponding data.
Among the 278 samples, 10 and 20 outliers were considered. Practically, we noticed that the signal-to-noise ratio (SNR) index then dropped in the worst case by 4 dB if the set of 20 outliers is taken into consideration with respect to the no outlier case.

6.1.1. Source Profiles

In this study, 10 sources are highlighted. Among them, some are purely natural or purely anthropogenic, while others became anthropized. Table 4 describes the major species present in each source profile. Species other than those listed in the corresponding source profile may be considered as negligible. Please note that—as we here consider simulations—the real profile matrix is perfectly known and is provided in Table A1. Also, one should notice that each source profile is presented in per mil notation, i.e., it sums to one thousand instead of 1; the only difference is a scale factor equal to 1000.

6.1.2. Equality Constraints

Equality constraints or set values enable us to inform the algorithm about some entries of the profile matrix. This knowledge is taken into account by specifying the matrices Ω^E and Φ^E, which are available in Appendix B. It should be stressed that the only knowledge used here is the absence of some compounds in some source profiles. As a result, the matrix Φ^E reduces to 0_{10×28}. It then follows that our informed methods with residuals were identical to those without residuals. As a consequence, we do not test the latter in the simulations below.

6.1.3. Initialization

An approximate prior knowledge of F was used as a starting point for each informed NMF algorithm. Table A3 gathers the different entries used. Then, a weighted quadratic estimation of the initial contribution matrix G [31] was performed so that each method has the same initial factors.

6.1.4. Performance Evaluation

Several performance indexes are available in the literature. However, in this work, only the mixing-error ratio (MER) index [52] is considered (please note that, while specifically designed for measuring the estimation accuracy of a mixing matrix, the MER may also be used as a signal-to-interference ratio (SIR) when applied to the profile matrix, and more specifically to F^T). It was computed over each column of G. For each source j, a scalar quantity MER_j expressed in dB may be obtained.
For one exact vector g ̲ j and its estimate g ̲ ^ j , it is possible to write g ̲ ^ j under the form
$$\hat{\underline{g}}_j = \hat{\underline{g}}_j^{\,coll} + \hat{\underline{g}}_j^{\,orth}, \qquad (67)$$
where ĝ̲_j^{coll} and ĝ̲_j^{orth} are respectively collinear and orthogonal to the exact vector g̲_j. This decomposition allows us to express the MER of source j, denoted MER_j, defined as
$$\text{MER}_j = 10 \log_{10}\!\left(\frac{\|\hat{\underline{g}}_j^{\,coll}\|^2}{\|\hat{\underline{g}}_j^{\,orth}\|^2}\right). \qquad (68)$$
Infinite values mean exact separation, while 0 dB corresponds to an angle equal to 45°. These values may be gathered into a vector which summarizes the performance for each source. Generally, a global indicator is obtained by averaging each index over all sources, i.e.,
$$\text{MER} = \frac{1}{p} \sum_{j=1}^{p} \text{MER}_j. \qquad (69)$$
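For illustration, a minimal NumPy sketch of Equations (67) and (68) for one source could read; the function name is ours:

```python
import numpy as np

def mer_db(g_true, g_est):
    """MER of one source in dB, cf. Eqs. (67)-(68): split the estimated
    contribution into components collinear and orthogonal to the true one."""
    g_coll = (g_est @ g_true) / (g_true @ g_true) * g_true
    g_orth = g_est - g_coll
    return 10.0 * np.log10((g_coll @ g_coll) / (g_orth @ g_orth))
```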
In all the cases under study, the MER index [52] (the results and the Matlab interpretation codes are available at http://www-lisic.univ-littoral.fr/~delmaire/recherche.html) was represented as a function of the input SNR. In this study, intensive computations were performed with ten thousand iterations for each method over 400 tests. In our comparison, we dropped the PMF method as it is only available as a user interface (see https://www.epa.gov/air-research/positive-matrix-factorization-model-environmental-data-analyses), which prevents running several tests in a single command. Moreover, even for a single test, our expertise shows that PMF requires the uncertainties to be increased in order to perform a computation, which did not make sense in this case. As a consequence, nine methods were selected and tested: among them, three are uninformed, two account for our informed methods with set values, while the four remaining ones are our informed methods with bounds.
To give an idea of the results, we chose to display the road traffic profile estimation in the case where the input SNR is equal to 24 dB (Figure 4). Species are represented in descending order of the real profile. We can notice that, for this source, αβ-N1CWNMF appears better than the other methods.
In our tests, the input SNR ranges from 15 to 70 dB. We decided to display only the performance of the methods for 20 outliers as shown in Figure 5 since the other tests provide similar results. The statistical performance is provided in Figure 5 by specifying the standard deviation in each slice of SNR and for each method.
Let us first analyze the enhancement provided by the non-informed NMF methods. One notices that the robust αβ-WNMF [23] performs very poorly in all cases. Its standard deviation appears very large for a wide range of SNRs. Besides, RNMF—which stands for a robust NMF method [18]—behaves correctly for low SNRs, while its performance surprisingly decreases for large SNRs. Moreover, we experimented with a sparse NMF (SNMF) method [53] including a β-divergence cost function together with an L1 sparsity penalty on one factor. We selected one trial and tested the performance for the parameter β ranging from 0 to 2. The optimal value β = 0.5 was selected over 400 trials for the case of 20 outliers. SNMF provides inconsistent solutions in every slice of SNR.
We then analyzed the performance of our proposed informed methods. Let us first focus on the two informed methods with set values which were experimented, i.e., the αβ-N1CWNMF and αβ-N2CWNMF methods. Their performance appeared to be very similar in all the simulations. In practice, their MER was approximately equal to the SNR in every input SNR slice, which was expected according to our experience in preliminary tests.
The four informed NMF methods with bound constraints behaved similarly, except in a few slices where the SNR is large. Indeed, in low SNR, they are slightly better than α β -N1CWNMF (the gap is not visible due to the scale), but they outperform all the other tested methods as soon as the SNR becomes greater than 40 dB. The low gap in low SNR is essentially due to the fact that we inform F while the performance index is measured on G. In noisy tests—i.e., for a low SNR—the estimated matrix G does not benefit from the additional information on F, because of the important noise in X. However, we noticed an improvement on F for these tests, even if we cannot safely measure it, as the profiles might be correlated.
On the contrary, for medium and large SNR, the MER enhancement was significant for every bound-constrained informed NMF method. More precisely, α β -N1BCWNMF and α β -N2BCWNMF outperformed all the other methods with a significant gap as soon as the SNR increased.
We also explored, in the synthetic example, the use of a large range of α and β parameters within area 2, such that 0.5 ≤ α ≤ 1 and 0.5 ≤ β ≤ 1. We noticed that the MER index of the αβ-WNMF method was very sensitive to the choice of the αβ parameters and also to the trial number. A successful tuning of these parameters was somewhat difficult.
On the other hand, we experimented with the same operating conditions for N1CWNMF. We observe in Figure 6 that the results are more stable than for the uninformed method. In this case, the choice of α and β appears quite insensitive and the method remains satisfactory.
In addition, we may wonder how the constraints affect the results. First, we can potentially use 117 set values and roughly 60 bound constraints. We decided to inspect the influence of dropping set values only. For that purpose, we progressively turned on one set value at a time for each column, according to the increasing order of the row index, until the 117 constraints were reached. We plot the MER performance according to the number of constraints in Figure 7.
Contrary to what might be expected, adding constraints may sometimes suddenly degrade the performance, or conversely improve it. There seem to be sets of constraints which fit the situation better. This conclusion is quite surprising, and the design of appropriate constraints seems to be an open question.
To conclude, these methods provide a good performance in every situation and are thus better-suited for the considered application.

6.2. Real Data Case

The real data campaign was conducted by Dr C. Roche during her Ph.D. thesis [54], within the UCEIV laboratory (Université du Littoral Côte d’Opale). The first goal of this thesis was to study how much the shipping traffic in the English Channel, one of the most important in the world, can contribute to the atmospheric PM 10 concentration in coastal area, such as the Hauts–de–France region. In her work, some characteristic species of maritime traffic emissions have been evidenced. Then, some flexible bound profiles and set profile entries were proposed. Using this knowledge, the challenge was to implement an informed NMF method—as those developed in [32]—in order to reconstruct the PM origin.
Contrary to [54], we here would like to drop some of the bound information and to test whether or not the new methods that we propose in this paper are still competitive.

6.2.1. Context

A sampling campaign was conducted using a Digitel DA80 sampler over a long period—i.e., 16 months—at Cape Gris–Nez and over a shorter period—i.e., three months—in the port of Calais, which enabled us to get 278 sample measurements. Cape Gris–Nez and Calais are two coastal sites in the eastern part of the English Channel. The first one is a rural site, whereas the port of Calais is the second busiest in passenger traffic in Europe, with 10.8 million passengers and over 80 arrivals and departures of ferries per day on average in 2014 [55].
The DA80 device (see Figure 8) is able to trap PM on filters, which are stored and a posteriori analyzed for their chemical composition. A special sieve enabled us to select only PM10, i.e., PM whose diameter is lower than 10 μm. The machine is also able to record wind conditions and time. The sampling period was chosen equal to 24 h. Over this period, meteorological conditions and concentration levels were highly varying. Thus, after analyzing the filters, several data files were available to address the pollution source apportionment problem.

6.2.2. Input Data

Appendix C provides the operating conditions for the runs which were performed. Based on the expert knowledge provided by chemists and on the information described in Table 4, a matrix Ω^E—which defines 55 set value locations (among 278 profile entries)—is provided in Table A5. In the same way as in Section 6.1, the matrix Φ^E is equal to 0_{p×m}. In addition, the initial profile matrix is chosen by an expert and is provided in Appendix C.

6.2.3. Results Evaluation

The results were obtained in the case of 10 identified sources and 10⁴ iterations for each method. The profiles under study are specified in Table 4. However, their estimation remains a difficult task for several reasons, which are listed hereafter:
  • Data are corrupted by an unknown number of outliers. Their origin may be of various kinds, e.g., the presence of a new source which affects the data at some sparse moments.
  • Data are very noisy. In particular, an additional overall pollution—whose level highly varies over time—can not be assigned to a particular source and can significantly decrease the overall SNR.
  • Some source profiles may be geometrically close, only a few tracer species are able to distinguish them.
Even if a database with source profiles is available at http://source-apportionment.jrc.ec.europa.eu/Specieurope/sources.aspx, a universal profile for a given source does not exist. When comparing our results, they all appear highly consistent with the published ones in the Specieurope database. We display the source profiles in descending order of expected species (this ordering was designed by the chemist co-authors of the paper). A correct source profile is then displayed as decreasing proportions from the left to the right of each figure. On the contrary, a large proportion on the right part of a profile plot implies that the estimation has partly failed.
Among the 10 source profiles, some are well recovered. We only show in Figure 9 the estimated sea traffic source profile, as it is difficult to recover. As mentioned above, it is expected that the proportions decrease from the left to the right side of the figure. The order has been built based on ship profiles from the European database and from the literature [50,51,54]. To process these data, we compare the enhancement provided by two non-informed methods, i.e., the αβ-WNMF and the β-RNMF [18], and three informed methods, i.e., the method used in [54], our αβ-N1CWNMF method, and its bound-constrained extension αβ-N1BCWNMF. Other bound methods were dropped since they turned out in Section 6.1 to behave roughly similarly to αβ-N1BCWNMF. Note that Roche [54] used 67 constraints, while we only use 63 and 65 bound constraints in the tested bound-constrained informed αβ-NMF methods, respectively.
It may be noticed that the blind NMF methods, i.e., the αβ-WNMF and the β-RNMF, and our αβ-N1CWNMF method overestimate the SO₄²⁻ and NO₃⁻ species while underestimating the OC and EC compounds. The estimated sea traffic profile thus appears not to be very realistic with these methods. Besides, the bound-constrained WNMF methods behave similarly and report good estimations for the major species. However, these estimations reach the proposed bounds for the Fe, NO₃⁻, and SO₄²⁻ species among the 28 species under study. For example, SO₄²⁻ is limited by the maximum value provided in Table A4. Finally, these bound-constrained NMF methods outperform all the other methods for the sea traffic reconstruction.
Using the estimation provided by each method, it is possible to reconstruct each species' concentration, and especially the V and Ni compounds, since they are tracers of the sea traffic activity [54]. In other words, the V and Ni species can only be found in the sea traffic source. Moreover, the V over Ni ratio is often assumed to be between 2 and 3 [51]; it is found to be between 1.2 and 1.5 for the three bound-constrained WNMF methods, which is close to the expected ratio.
To confirm this fact, we plot the reconstruction of the V species in Figure 10. It shows that this compound is mainly due to sea traffic: more than 98 percent of the V species originates from the sea traffic source, which is consistent with the chemists' expectations.

7. Conclusions

In this paper, we tackled an informed non-negative matrix factorization problem where the profile matrix lives in a specific subspace. We proposed several informed NMF methods combining the αβ-divergence and a specific structure of one matrix factor provided by the considered problem. This work extends our previous informed NMF [32]—assuming some values of one of the factor matrices to be known—by considering generalized divergences, and by leading to alternative update rules and normalizations. The update rules may be viewed as projective multiplicative updates applied to a special structure of the profile matrix. The relevance of these extensions was shown on realistic simulations of natural and industrial PM source apportionment—with various input SNR conditions and various numbers of outliers—and on a real data case. In practice, these informed methods are more robust than blind NMF and provide results which are consistent with the chemical expertise, even in the presence of several outliers. In future work, we will investigate new soft constraints to inform NMF and alternatives to multiplicative updates.

Author Contributions

Conceptualization, G.D., M.P., A.L., and G.R.; methodology, G.D., M.O., and A.L.; software, G.D., M.O., and A.L.; validation, G.D., M.O., M.P., G.R., F.L., and D.C.; formal analysis, G.D. and M.O.; investigation, G.D. and M.O.; resources, F.L. and D.C.; data curation, G.D., M.O., F.L., and D.C.; writing—original draft preparation, G.D. and M.P.; visualization, G.D. and M.O.; supervision, G.R. and D.C.; project administration, G.R. and D.C.; funding acquisition, G.D., G.R., F.L., and D.C.

Funding

This research was funded partly by the integrated steel and mining company ArcelorMittal, and partly by the DREAL Nord-Pas-de-Calais agency within the ECUME project.

Acknowledgments

The experiments presented in this paper were carried out using the CALCULCO computing platform, supported by SCoSI/ULCO.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CWNMF: Constrained and weighted NMF
KKT: Karush–Kuhn–Tucker
EC: Elemental carbon
MER: Mixing-error ratio
MM: Majorization-minimization
NMF: Non-negative matrix factorization
OC: Organic carbon
PM: Particulate matter
RNMF: Robust NMF
SIR: Signal-to-interference ratio
SNR: Signal-to-noise ratio
WNMF: Weighted NMF

Appendix A. Update Rules for Problem (29)

In this appendix, we aim to develop update rules for Problem (29), for which we drop the sum-to-one constraint. The structure of the proof follows the same steps as in Section 5.1, within an MM strategy.
Proposition A1.
Update rules for the free part of the profile matrix are
$$ F^{k+1} \leftarrow F^{k} \odot \bar{\Omega}_E \odot M_F^{\alpha,\beta}\big(G^{k},F^{k}\big), \qquad \text{(A1)} $$
where
$$ M_F^{\alpha,\beta}(G,F) \triangleq \left[ \frac{G^{T}\Big[ W \odot (X - G\,\Phi_E)^{\odot\alpha} \odot \big(G\,(F^{k}\odot\bar{\Omega}_E)\big)^{\odot(\beta-1)} \Big]}{G^{T}\Big[ W \odot \big(G\,(F^{k}\odot\bar{\Omega}_E)\big)^{\odot\lambda} \Big]} \right]^{\odot\frac{1}{\alpha}}. \qquad \text{(A2)} $$
Proof. 
We focus on one column of the data matrix and hereafter drop the index $i$ for the vectors $\underline{f}_i$, $\underline{\varphi}_i^{E}$, $\underline{\theta}_i$, and for the matrix $\Gamma_i$. Let us define the residual vector $\underline{r}$ as
$$ \underline{r} \triangleq \underline{x} - G\,\underline{\varphi}^{E}. \qquad \text{(A3)} $$
Combining Equations (21) and (32) leads to
$$ D_{\underline{w}}^{\alpha,\beta}\big(\underline{r}\,\|\,G\,\Delta\underline{f}\big) = D_{\underline{w}}^{\alpha,\beta}\big(\underline{r}\,\|\,U\underline{\theta}\big) = \sum_i w_i\, r_i^{\alpha+\beta}\; h_{\alpha,\beta}\!\left(\frac{\sum_j u_{ij}\theta_j}{r_i}\right), \qquad \text{(A4)} $$
where $U$ is defined in Equation (32) and $h_{\alpha,\beta}(z)$ has been defined in Equation (35).
Using the convexity of $h_{\alpha,\beta}(z)$ for $z \geq 0$ and $\beta \in [\min(1,1-\alpha),\max(1,1-\alpha)]$ [23], Jensen's inequality may be applied once, resulting in
$$ h_{\alpha,\beta}\!\left(\frac{\sum_j u_{ij}\theta_j}{r_i}\right) \leq \sum_j \frac{u_{ij}\theta_j^{k}}{\sum_l u_{il}\theta_l^{k}}\; h_{\alpha,\beta}\!\left(\frac{\theta_j \sum_l u_{il}\theta_l^{k}}{r_i\,\theta_j^{k}}\right), \qquad \text{(A5)} $$
where the superscript $k$ is the current iteration number and $\theta_j$ is the $j$-th element of the free parameter vector $\underline{\theta}$ introduced in (20). Equations (A4) and (A5) yield the majorizing function
$$ H_{1,w}^{\alpha,\beta}(\theta_j,\theta_j^{k}) = \sum_i w_i\, r_i^{\alpha+\beta} \sum_j \frac{u_{ij}\theta_j^{k}}{\sum_l u_{il}\theta_l^{k}}\; h_{\alpha,\beta}\!\left(\frac{\theta_j \sum_l u_{il}\theta_l^{k}}{r_i\,\theta_j^{k}}\right). \qquad \text{(A6)} $$
Cancelling its gradient $\frac{\partial H_{1,w}^{\alpha,\beta}(\theta_j,\theta_j^{k})}{\partial \theta_j}$ leads to
$$ \left(\frac{\theta_j}{\theta_j^{k}}\right)^{\alpha} = \frac{\sum_i w_i\, u_{ij}\, r_i^{\alpha}\, \big(\sum_l u_{il}\theta_l^{k}\big)^{\beta-1}}{\sum_i w_i\, u_{ij}\, \big(\sum_l u_{il}\theta_l^{k}\big)^{\lambda}}, \qquad \text{(A7)} $$
which reads in vector form
$$ \left(\frac{\underline{\theta}}{\underline{\theta}^{k}}\right)^{\odot\alpha} = \frac{U^{T}\big[\underline{w} \odot \underline{r}^{\odot\alpha} \odot (U\underline{\theta}^{k})^{\odot(\beta-1)}\big]}{U^{T}\big[\underline{w} \odot (U\underline{\theta}^{k})^{\odot\lambda}\big]}. \qquad \text{(A8)} $$
By combining the definition (21) with the above relationship, we derive the expression of one column of the matrix $F$:
$$ \frac{\underline{f}^{k+1}}{\underline{f}^{k}} = \left[ \frac{\Gamma\, U^{T}\big[\underline{w} \odot \underline{r}^{\odot\alpha} \odot (U\underline{\theta}^{k})^{\odot(\beta-1)}\big]}{\Gamma\, U^{T}\big[\underline{w} \odot (U\underline{\theta}^{k})^{\odot\lambda}\big]} \right]^{\odot\frac{1}{\alpha}}. \qquad \text{(A9)} $$
By replacing $U$ according to Equation (32), and by noticing that $\Gamma\,\Gamma^{T} = \mathrm{diag}(\bar{\underline{\omega}}^{E})$, we obtain the new update rule
$$ \underline{f}^{k+1} \leftarrow \underline{f}^{k} \odot \bar{\underline{\omega}}^{E} \odot M_{\underline{f}^{k}}^{\alpha,\beta}, \qquad \text{(A10)} $$
where $M_{\underline{f}^{k}}^{\alpha,\beta}$ accounts for
$$ M_{\underline{f}^{k}}^{\alpha,\beta} \triangleq \left[ \frac{G^{T}\big[\underline{w} \odot \underline{r}^{\odot\alpha} \odot (G\underline{f}^{k})^{\odot(\beta-1)}\big]}{G^{T}\big[\underline{w} \odot (G\underline{f}^{k})^{\odot\lambda}\big]} \right]^{\odot\frac{1}{\alpha}}. \qquad \text{(A11)} $$
Combining Equations (18) and (22) allows us to express
$$ \Delta F^{k} = \Delta F^{k} \odot \bar{\Omega}_E = F^{k} \odot \bar{\Omega}_E. \qquad \text{(A12)} $$
Combining the matrix form of Equations (A11) and (A12) then yields the update rule
$$ F^{k+1} \leftarrow \Phi_E + F^{k} \odot \bar{\Omega}_E \odot M_F^{\alpha,\beta}\big(G^{k},F^{k}\big), \qquad \text{(A13)} $$
where $M_F^{\alpha,\beta}(G,F)$ is defined in Equation (A2). This relation completes the proof. □
As explained in Section 5.1, it should be noticed that both variants of the weighted αβ-divergence cost function lead to almost identical update rules. In fact, $N_F^{\alpha,\beta}(G,F)$ in Equation (31) and $M_F^{\alpha,\beta}(G,F)$ in Equation (A2) coincide if $W$ in the latter expression is replaced by
$$ W \leftarrow W \odot \frac{X^{\odot\lambda}}{(X - G\,\Phi_E)^{\odot\lambda}}. \qquad \text{(A14)} $$
As $G$ is updated at each NMF iteration, the value of $W$ in Equation (A14) varies at each iteration as well, which means that the update rules proposed in the main part of this paper extend the ones proposed in this appendix by iteratively updating the weights.
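To make the appendix more concrete, the following NumPy sketch implements one pass of the updates (A13) and (A14). It assumes that λ = α + β − 1 and that all products, powers, and quotients are entrywise; the function names and the small clamping constant used to keep the multiplicative updates well defined are our own choices, so this is a minimal illustration rather than the exact implementation used in the paper.

```python
import numpy as np

EPS = 1e-12  # small constant avoiding zero divisions/powers (our choice, not from the paper)

def update_F(X, G, F, W, Phi_E, Omega_bar_E, alpha, beta):
    """One informed multiplicative update of the profile matrix, Equations (A2)/(A13):
    only the free entries (flagged by Omega_bar_E) are updated multiplicatively,
    while the set entries are reset to their known values Phi_E."""
    lam = alpha + beta - 1.0                      # assumed definition of lambda
    F_free = F * Omega_bar_E                      # free part of the profile matrix
    R = np.maximum(X - G @ Phi_E, EPS)            # residual with respect to the set values
    GF = np.maximum(G @ F_free, EPS)
    num = G.T @ (W * R ** alpha * GF ** (beta - 1.0))
    den = np.maximum(G.T @ (W * GF ** lam), EPS)
    M = (num / den) ** (1.0 / alpha)              # Equation (A2)
    return Phi_E + F_free * M                     # Equation (A13)

def update_W(W, X, G, Phi_E, alpha, beta):
    """Weight update of Equation (A14), linking the two cost-function variants."""
    lam = alpha + beta - 1.0
    return W * (np.maximum(X, EPS) / np.maximum(X - G @ Phi_E, EPS)) ** lam
```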

Appendix B. Operating Conditions for the Simulations

Table A1 specifies the different entries of the true profile matrix used in the simulations.
Prior information is provided through the specification of both Ω_E and Φ_E. We choose set values that only specify the absence of some species in the profile matrix; as a consequence, Φ_E = 0_{p×m}. The positions of these known zero entries are provided in Table A2.
The chemists are able to provide an initial profile matrix, which is given in Table A3. It is to be noted that the same initial matrix is applied to both the informed and non-informed methods, and that the known zeros in F are initialized as 1.00 × 10⁻¹¹.
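For illustration, the following sketch shows how these priors might be assembled in NumPy from a 0/1 table of known absent species, mirroring the initialization described above; the function and variable names are ours and purely illustrative.

```python
import numpy as np

def build_masks_and_init(F_init, known_zero_mask, eps=1e-11):
    """Builds the informed-NMF inputs from a table of known zero entries:
    Omega_E flags the set entries, Omega_bar_E the free ones, Phi_E holds the
    set values (all zeros here), and the known zeros of the initial profile
    matrix are replaced by a tiny positive value, as done in Table A3."""
    Omega_E = known_zero_mask.astype(float)       # 1 where a species is known to be absent
    Omega_bar_E = 1.0 - Omega_E                   # complementary mask (free entries)
    Phi_E = np.zeros_like(F_init)                 # set values are zeros in the simulations
    F0 = np.where(Omega_E == 1.0, eps, F_init)    # tiny positive init for multiplicative updates
    return Omega_E, Omega_bar_E, Phi_E, F0
```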
Table A1. Theoretical source profile used in the simulations.
Profiles | Al | Cr | Fe | Mn | P | Sr | Ti | Zn | V | Ni | Co | Cu | Cd | Sb
Sea0.00190002.5 × 10 4 0.203400000000
Aged sea07.2351 × 10 5 000.50.41.877 × 10 4 0001.785 × 10 4 1.7941 × 10 5 00
Crustal119.138.589 × 10 5 77.351.7823.06800.78468.91211.8680.350300.02760.008100
nitrates4.00 × 10 3 2 × 10 5 3.50.110.0749000.7742007.0408 × 10 4 0.16.486 × 10 3 0.01975
sulfate05 × 10 5 00.028250.05313000.1334000.0032878.00 × 10 6 00
Biomass0.00102.5540.0552701.016 × 10 5 00.1415000000.0385
Road traffic0039.04140.14042.6590010.908001.00 × 10 8 2.771200.8964
Sea traffic0.0011471.2012 × 10 4 0.1002000.02179.42 × 10 5 07.49205.53480.18291.752 × 10 4 1.315 × 10 6 0
Biogenic000014.5280.043088.941 × 10 4 0000005.2 × 10 4
Metal64.43033.332780.16330.72000100.151.51.550
Bis | La | Pb | Na⁺ | NH₄⁺ | K⁺ | Mg²⁺ | Ca²⁺ | Cl⁻ | NO₃⁻ | SO₄²⁻ | OC | EC | Levo. | Polyols
Sea00297.03010.7132.759.183581.02069.080000
Aged sea00.12800430101.00 × 10 2 39515030000
Crustal0.059401.8333 × 10 4 4.36 × 10 5 55301.81049.9539.96384.92000
nitrates7.178 × 10 4 0.20750216.263.2001.21 × 10 5 730.73045009.027 × 10 11
sulfate00.07290260.834.43008.66 × 10 8 0680.5953.84000
Biomass00.10072.6502.85 × 10 12 12.260.00111.6725.4835.1656.84692.1091.1469.781.477 × 10 7
Road traffic0.01213.35305.14 × 10 10 39.8403.00 × 10 8 3.40 × 10 8 50.1960.22301.13488.8100
Sea traffic0.0941000.0626000075.17300.69500.76109.8700
Biogenic005.0230.096829.056000.2975020.094854.020076.83
Metal0.221522.95000000050.000000
Table A2. Matrix Ω_E used in the simulations.
Ω_E | Al Cr Fe Mn P Sr Ti Zn V Ni Co Cu Cd Sb
Sea | 0 1 1 1 0 0 1 1 1 1 1 1 1 1
Aged sea | 1 0 1 1 0 0 0 1 1 1 0 0 1 1
Crustal | 0 0 0 0 0 0 0 0 0 1 0 0 1 1
nitrates | 0 0 0 0 0 1 1 0 1 1 0 0 0 0
sulfate | 1 0 1 0 0 1 1 0 1 1 0 0 1 1
Biomass | 0 1 0 0 1 0 1 0 1 1 1 1 1 0
Road traffic | 1 1 0 0 0 1 1 0 1 1 0 0 1 0
Sea traffic | 0 0 0 1 1 0 0 1 0 0 0 0 0 1
Biogenic | 1 1 1 1 0 0 0 1 1 1 1 1 1 0
Metal | 0 0 0 0 0 0 1 1 1 0 0 0 0 1
Ω_E | La Pb Na⁺ NH₄⁺ K⁺ Mg²⁺ Ca²⁺ Cl⁻ NO₃⁻ SO₄²⁻ OC EC Levo. Polyols
Sea | 1 1 0 1 0 0 0 0 1 0 1 0 1 1
Aged sea | 1 0 0 1 0 0 0 0 0 0 0 1 1 1
Crustal | 0 0 0 0 0 0 0 1 0 0 0 1 1 1
nitrates | 0 0 1 0 0 0 1 0 0 1 0 0 1 0
sulfate | 1 0 1 0 0 0 1 0 1 0 0 0 1 1
Biomass | 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Road traffic | 0 0 1 0 0 1 0 0 0 0 0 0 1 1
Sea traffic | 0 0 1 0 1 0 1 0 0 0 0 0 1 1
Biogenic | 1 1 0 0 0 0 1 0 0 0 0 1 1 0
Metal | 0 0 1 0 1 0 1 1 1 0 1 1 1 1
Table A3. Matrix F_init used in the simulations.
F_init | Al | Cr | Fe | Mn | P | Sr | Ti | Zn | V | Ni | Co | Cu | Cd | Sb
Sea0.21.00 × 10 11 1.00 × 10 11 1.00 × 10 11 0.010.81.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11
Aged sea1.00 × 10 11 0.0011.00 × 10 11 1.00 × 10 11 110.011.00 × 10 11 1.00 × 10 11 1.00 × 10 11 0.010.011.00 × 10 11 1.00 × 10 11
Crustal2000.00115022220221.00 × 10 11 0.0010.00011.00 × 10 11 1.00 × 10 11
nitrates1.00 × 10 5 2.00 × 10 6 810.41.00 × 10 11 1.00 × 10 11 41.00 × 10 11 1.00 × 10 11 0.0010.50.010.2
sulfate1.00 × 10 11 1.00 × 10 4 1.00 × 10 11 1.00 × 10 4 0.51.00 × 10 11 1.00 × 10 11 0.41.00 × 10 11 1.00 × 10 11 0.011.00 × 10 4 1.00 × 10 11 1.00 × 10 11
Biomass51.00 × 10 11 1029.43 × 10 11 0.0011.00 × 10 11 11.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.006 × 10 10
Road traffic1.00 × 10 11 1.00 × 10 11 5011.00 × 10 + 0 1.00 × 10 11 1.00 × 10 11 241.00 × 10 11 1.00 × 10 11 1.00 × 10 11 41.00 × 10 11 2
Sea traffic0.011.00 × 10 4 0.41.00 × 10 11 1.00 × 10 11 0.11.00 × 10 4 1.00 × 10 11 181011.00 × 10 3 1.00 × 10 4 1.00 × 10 11
Biogenic1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 57.96 × 10 10 7.96 × 10 10 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 1.00 × 10 11 7.96 × 10 10
Metal737065050351.00 × 10 11 1.00 × 10 11 1.00 × 10 11 301341.00 × 10 11
La | Pb | Na⁺ | NH₄⁺ | K⁺ | Mg²⁺ | Ca²⁺ | Cl⁻ | NO₃⁻ | SO₄²⁻ | OC | EC | Levo. | Polyols
Sea1.00 × 10 11 1 × 10 9 3205 × 10 5 1038115501 × 10 5 701 × 10 5 1 × 10 5 9.98 × 10 11 9.98 × 10 11
Aged sea1.00 × 10 11 0.012501 × 10 8 14015150320210121.00 × 10 11 9.99 × 10 11 9.99 × 10 11
Crustal0.00011 × 10 7 0.00010.000110102501.00 × 10 11 30302901.00 × 10 11 1.00 × 10 10 1.00 × 10 10
nitrates0.20.51 × 10 10 30051.00 × 10 11 1.00 × 10 11 0.26001.00 × 10 11 801.00 × 10 11 1.00 × 10 10 1.00 × 10 10
sulfate1.00 × 10 11 0.11.00 × 10 8 305101.00 × 10 11 1.00 × 10 11 1.00 × 10 3 1.00 × 10 11 5841001.00 × 10 11 1.00 × 10 10 1 × 10 11
Biomass1.00 × 10 11 13287253866666651070579.43 × 10 11
Road traffic19.991.00 × 10 10 1.00 × 10 8 570.000491.00 × 10 6 1.00 × 10 11 79.99802604309.99 × 10 11 9.99 × 10 11
Sea traffic0.51.00 × 10 8 1 × 10 11 1.00 × 10 2 1 × 10 11 1 × 10 11 1.00 × 10 11 1 × 10 11 1102504501608.37 × 10 11 8.37 × 10 11
Biogenic1.00 × 10 11 7.96 × 10 10 11941.00 × 10 11 7.96 × 10 10 558001 × 10 11 7.96 × 10 10 170
Metal1401.00 × 10 11 1.00 × 10 2 1.00 × 10 11 1.00 × 10 2 1.00 × 10 11 0.0050.001700.001641.64 × 10 10 1.64 × 10 10 1.64 × 10 10

Appendix C. Real Data Operating Conditions

When considering the real dataset, let us emphasize that the profile matrix is not known in advance. The chemists are able to provide an initial profile matrix, which is given in Table A4. As in the previous appendix, the same initial matrix is applied to both the informed and non-informed methods; ϵ denotes a very small quantity which makes the initialization very close to the informed case, where the set values are zeros.
Table A4. Matrix F_init used in the real data case.
F_init | Al | Cr | Fe | Mn | P | Sr | Ti | Zn | V | Ni | Co | Cu | Cd | Sb
Sea0.19 ϵ ϵ ϵ 1.008.00 ϵ ϵ ϵ ϵ ϵ ϵ ϵ ϵ
Aged sea0.100.010.500.011.008.000.020.02 ϵ ϵ 1.000.010.010.01
Crustal266.670.141502.00 ϵ 2.00200.500.500.070.070.07 ϵ 0.01
nitrates0.980.98300.98 ϵ 0.980.9820 ϵ 0.980.98100.980.98
sulfate1.001.00301.00 ϵ 1.0015.0020 ϵ 1.001.001.001.001.00
Biomass4.00 ϵ 9.001.00 ϵ 1.001.0010 ϵ ϵ ϵ 1.00 ϵ ϵ
Road traffic201.00505.00 ϵ 1.00 ϵ 505.00105.00505.0050
Sea traffic10 ϵ 10 ϵ ϵ ϵ ϵ 5.0055.0055.0030 ϵ ϵ ϵ
Biogenic0.01 ϵ 0.01 ϵ 20 ϵ ϵ ϵ ϵ ϵ ϵ 1.00 ϵ ϵ
Metal808035840818.00404030301.00405030
La | Pb | Na⁺ | NH₄⁺ | K⁺ | Mg²⁺ | Ca²⁺ | Cl⁻ | NO₃⁻ | SO₄²⁻ | OC | EC | Levo. | Polyols
Sea ϵ ϵ 320.00 ϵ 10.0040.0010.00540.08 ϵ 70.000.640.09 ϵ ϵ
Aged Sea ϵ 0.01250.00 ϵ 10.0025.0010.00200.00275.30210.008.001.00 ϵ ϵ
Crustal ϵ 0.1410.003.00100.0070.14210.007.0020.0035.0790.0012.62 ϵ ϵ
nitrates ϵ 0.98 ϵ 200.000.980.9840.000.98547.30 ϵ 100.0040.00 ϵ ϵ
sulfate ϵ 20.00 ϵ 200.0034.001.0040.001.00 ϵ 554.0060.0016.00 ϵ ϵ
Biomass0.000.942.8328.3170.004.7237.7466.0570.0066.05500.6169.2956.46 ϵ
Road traffic ϵ 10.00 ϵ 10.0010.00 ϵ 21.002.0080.0040.00271.73303.27 ϵ ϵ
Sea traffic15.0010.00 ϵ 10.00 ϵ ϵ 20.00 ϵ 10.0030.00580.00160.00 ϵ ϵ
Biogenic ϵ ϵ 1.001.005.004.001.00 ϵ 5.005.00760.0050.00 ϵ 146.98
Metal1.0080.00 ϵ 1.0048.0010.005.00 ϵ ϵ 10.00 ϵ ϵ ϵ ϵ
Prior set-value information is provided through the specification of both Ω_E and Φ_E. The set-value configuration is the same as the one presented in [54]. As a consequence, Φ_E = 0_{p×m}. The positions of the set entries are provided in Table A5.
Table A5. Matrix Ω_E used in the real data case.
Ω_E | Al Cr Fe Mn P Sr Ti Zn V Ni Co Cu Cd Sb
Sea | 0 1 0 1 0 0 1 1 1 1 1 1 1 1
Aged sea | 0 0 0 0 0 0 0 0 1 1 0 0 0 0
Crustal | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
nitrates | 0 0 0 0 0 0 0 0 1 0 0 0 0 0
sulfate | 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Biomass | 0 1 0 0 0 0 0 0 1 1 0 0 0 0
Road traffic | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sea traffic | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Biogenic | 0 0 0 1 0 0 0 0 1 1 0 0 1 0
Metal | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ω_E | La Pb Na⁺ NH₄⁺ K⁺ Mg²⁺ Ca²⁺ Cl⁻ NO₃⁻ SO₄²⁻ OC EC Levo. Polyols
Sea | 1 1 0 1 0 0 0 0 1 0 0 0 1 1
Aged sea | 0 0 0 1 0 0 0 0 0 0 0 0 1 1
Crustal | 0 0 0 0 0 0 0 0 0 0 0 0 1 1
nitrates | 0 0 1 0 0 0 0 0 0 1 0 0 1 0
sulfate | 0 0 1 0 0 0 0 0 1 0 0 0 1 0
Biomass | 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Road traffic | 0 0 1 0 0 1 0 0 0 0 0 0 1 1
Sea traffic | 0 0 1 0 0 0 0 0 0 0 0 0 1 1
Biogenic | 1 1 0 0 0 0 0 0 0 0 0 0 1 0
Metal | 0 0 1 0 0 0 0 1 1 0 1 1 1 1
Prior bound information is provided through the specification of Ω_I together with Φ_I⁺ and Φ_I⁻. For the sake of concision, Table A6 only gathers the bound information, each cell providing the Φ_I⁺/Φ_I⁻ pair; a minimal sketch of how such bounds may be exploited is given after the table.
Table A6. Matrices Φ_I⁺/Φ_I⁻ used in the real data case.
Φ_I⁺/Φ_I⁻ | Al | Cr | Fe | Mn | P | Sr | Ti | Zn | V | Ni | Co | Cu | Cd | Sb
Sea0000020/000000000
Aged sea0000020/000000000
Crustal400/500200/100040/0.0010000000
nitrates00000000000000
sulfates00000000000000
Biomass100/0.0010100/0.00100000000000
Road traffic0075/1000050/0.100015/0.000001015/0.000001
Sea traffic0070/0.10000070/570/550/0.00001000
Biogenic00000000000000
Metal00000000000000
La | Pb | Na⁺ | NH₄⁺ | K⁺ | Mg²⁺ | Ca²⁺ | Cl⁻ | NO₃⁻ | SO₄²⁻ | OC | EC | Levo. | Polyols
Sea00400/200050/550/1550/5720/3600100/300000
Aged sea0000000250/0500/50500/500000
Crustal0000150/5150/5500/50050/040/00000
nitrates000800/500000950/20000000
sulfates000800/5000000950/2000000
Biomass0010/040/0100/15/0100/0.001100/0.001150/1150/0750/100200/500
Road traffic00020/000010/060/1080/20300/150800/25000
Sea Traffic30/00020/000020/075/0300/10700/100200/5000
Biogenic005/05/00005/05/020/0850/500000
Metal00000000060/100000
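As announced above, here is a minimal NumPy sketch of one way such bound information can be exploited, namely by clipping the bounded entries of the profile matrix after each update. This simple projection is our own illustration and not necessarily the exact bound-handling scheme derived in the paper; the function and variable names are hypothetical.

```python
import numpy as np

def project_bounds(F, Omega_I, Phi_I_plus, Phi_I_minus):
    """Clips the bounded entries of the profile matrix to [Phi_I_minus, Phi_I_plus].
    Only the entries flagged by Omega_I (the bounded ones, cf. Table A6) are touched;
    the remaining entries of F are left unchanged."""
    F_clipped = np.clip(F, Phi_I_minus, Phi_I_plus)
    return np.where(Omega_I == 1, F_clipped, F)
```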

References

  1. Hopke, P. Review of receptor modeling methods for source apportionment. J. Air Waste Manag. Assoc. 2016, 66, 237–259. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Paatero, P. The Multilinear Engine—A Table-Driven, Least Squares Program for Solving Multilinear Problems, Including the n-Way Parallel Factor Analysis Model. J. Comput. Graph. Stat. 1999, 8, 854–888. [Google Scholar] [CrossRef]
  3. Gillis, N. The why and how of nonnegative matrix factorization. In Regularization, Optimization, Kernels, and Support Vector Machines; Chapman and Hall/CRC: Palo Alto, CA, USA, 2014; pp. 257–291. [Google Scholar]
  4. Paatero, P.; Tapper, U. Positive matrix factorization: A non negative factor model with optimal utilization of error estimates of data values. Environmetrics 1994, 5, 111–126. [Google Scholar] [CrossRef]
  5. Parra, L.C.; Spence, C.; Sajda, P.; Ziehe, A.; Müller, K.R. Unmixing hyperspectral data. Adv. Neural Inf. Process. Syst. 2000, 12, 942–948. [Google Scholar]
  6. Zdunek, R. Regularized nonnegative matrix factorization: Geometrical interpretation and application to spectral unmixing. Int. J. Appl. Math. Comput. Sci. 2014, 24, 233–247. [Google Scholar] [CrossRef] [Green Version]
  7. Igual, J.; Llinares, R. Nonnegative matrix factorization of laboratory astrophysical ice mixtures. IEEE J. Sel. Top. Signal Process. 2008, 2, 697–706. [Google Scholar] [CrossRef]
  8. Berné, O.; Joblin, C.; Deville, Y.; Smith, J.; Rapacioli, M.; Bernard, J.; Thomas, J.; Reach, W.; Abergel, A. Analysis of the emission of very small dust particles from Spitzer spectro-imagery data using blind signal separation methods. Astron. Astrophys. 2007, 469, 575–586. [Google Scholar] [CrossRef] [Green Version]
  9. Gobinet, C.; Perrin, E.; Huez, R. Application of non-negative matrix factorization to fluorescence spectroscopy. In Proceedings of the 12th European Signal Processing Conference (EUSIPCO’04), Vienna, Austria, 6–10 September 2004; pp. 1095–1098. [Google Scholar]
  10. Févotte, C.; Vincent, E.; Ozerov, A. Single-channel audio source separation with NMF: Divergences, constraints and algorithms. In Audio Source Separation; Springer: Cham, Switzerland, 2018; pp. 1–24. [Google Scholar]
  11. Puigt, M.; Delmaire, G.; Roussel, G. Environmental signal processing: New trends and applications. In Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN’17), Bruges, Belgium, 26–28 April 2017; pp. 205–214. [Google Scholar]
  12. Hoyer, P. Non-negative matrix factorization with sparseness constraint. J. Mach. Learn. Res. 2004, 5, 1457–1469. [Google Scholar]
  13. Dorffer, C.; Puigt, M.; Delmaire, G.; Roussel, G. Informed Nonnegative Matrix Factorization Methods for Mobile Sensor Network Calibration. IEEE Trans. Signal Inf. Process. Netw. 2018, 4, 667–682. [Google Scholar] [CrossRef] [Green Version]
  14. Lantéri, H.; Theys, C.; Richard, C.; Févotte, C. Split Gradient Method for Nonnegative Matrix Factorization. In Proceedings of the 18th European Signal Processing Conference, Aalborg, Denmark, 23–27 August 2010. [Google Scholar]
  15. Meganem, I.; Deville, Y.; Hosseini, S.; Déliot, P.; Briottet, X. Linear-Quadratic Blind Source Separation Using NMF to Unmix Urban Hyperspectral Images. IEEE Trans. Signal Process. 2014, 62, 1822–1833. [Google Scholar] [CrossRef]
  16. Dorffer, C.; Puigt, M.; Delmaire, G.; Roussel, G. Nonlinear Mobile Sensor Calibration Using Informed Semi-Nonnegative Matrix Factorization with a Vandermonde Factor. In Proceedings of the 2016 IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janerio, Brazil, 10–13 July 2016. [Google Scholar]
  17. Yoo, J.; Choi, S. Nonnegative Matrix Factorization with Orthogonality Constraints. J. Comput. Sci. Eng. 2010, 4, 97–109. [Google Scholar] [CrossRef]
  18. Févotte, C.; Dobigeon, N. Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization. IEEE Trans. Image Process. 2015, 24, 4810–4819. [Google Scholar] [CrossRef] [PubMed]
  19. Dhillon, S.; Sra, S. Generalized nonnegative matrix approximations with Bregman divergences. In Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 283–290. [Google Scholar]
  20. Cichocki, A.; Lee, H.; Kim, Y.; Choi, S. Nonnegative matrix factorization with alpha-divergence. Pattern Recognit. Lett. 2008, 29, 1433–1440. [Google Scholar] [CrossRef]
  21. Févotte, C.; Idier, J. Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Comput. 2011, 23, 2421–2456. [Google Scholar] [CrossRef]
  22. Sun, D.; Fevotte, C. Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 6201–6205. [Google Scholar]
  23. Cichocki, A.; Cruces, S.; Amari, S. Generalized Alpha-Beta divergences and their application to robust nonnegative matrix factorization. Entropy 2011, 13, 134–170. [Google Scholar] [CrossRef]
  24. Zhu, F.; Halimi, A.; Honeine, P.; Chen, B.; Zheng, N. Correntropy Maximization via ADMM—Application to Robust Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1–12. [Google Scholar] [CrossRef]
  25. Chreiky, R.; Delmaire, G.; Puigt, M.; Roussel, G.; Abche, A. Informed split gradient Non-negative Matrix Factorization using Huber cost function for source apportionment. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology, Limassol, Cyprus, 12–14 December 2016. [Google Scholar]
  26. Paatero, P. Least squares formulation of robust non-negative factor analysis. Chemom. Intell. Lab. Syst. 1997, 37, 23–35. [Google Scholar] [CrossRef]
  27. Ho, N.D. Non Negative Matrix Factorization Algorithms and Applications. Ph.D. Thesis, Université Catholique de Louvain, Louvain-la-Neuve, Belgium, June 2008. [Google Scholar]
  28. Zhang, S.; Wang, W.; Ford, J.; Makedon, F. Learning from incomplete ratings using non-negative matrix factorization. In Proceedings of the 2006 SIAM International Conference on Data Mining, Bethesda, MD, USA, 20–22 April 2006; pp. 549–553. [Google Scholar]
  29. Dorffer, C.; Puigt, M.; Delmaire, G.; Roussel, G. Fast nonnegative matrix factorization and completion using Nesterov iterations. In Proceedings of the 13th International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, 21–23 February 2017; pp. 26–35. [Google Scholar]
  30. Viana, M.; Pandolfi, A.; Minguillo, M.C.; Querol, X.; Alastuey, A.; Monfort, E.; Celades, I. Inter-comparison of receptor models for PM source apportionment: Case study in an industrial area. Atmos. Environ. 2008, 42, 3820–3832. [Google Scholar] [CrossRef]
  31. Plouvin, M.; Limem, A.; Puigt, M.; Delmaire, G.; Roussel, G.; Courcot, D. Enhanced NMF initialization using a physical model for pollution source apportionment. In Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014), Brugge, Belgium, 23–25 April 2014; pp. 261–266. [Google Scholar]
  32. Limem, A.; Delmaire, G.; Puigt, M.; Roussel, G.; Courcot, D. Non-negative matrix factorization under equality constraints—a study of industrial source identification. Appl. Numer. Math. 2014, 85, 1–15. [Google Scholar] [CrossRef]
  33. Choo, J.; Lee, C.; Reddy, C.; Park, H. Weakly supervised nonnegative matrix factorization for user-driven clustering. Data Min. Knowl. Discov. 2015, 29, 1598–1621. [Google Scholar] [CrossRef]
  34. De Vos, M.; Van Huffel, S.; De Lathauwer, L. Spatially Constrained ICA Algorithm with an Application in EEG Processing. Signal Process. 2011, 91, 1963–1972. [Google Scholar] [CrossRef]
  35. Limem, A.; Delmaire, G.; Puigt, M.; Roussel, G.; Courcot, D. Non-negative matrix factorization using weighted beta divergence and equality constraints for industrial source apportionment. In Proceedings of the 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Southampton, UK, 22–25 September 2013. [Google Scholar]
  36. Limem, A.; Puigt, M.; Delmaire, G.; Roussel, G.; Courcot, D. Bound constrained weighted NMF for industrial source apportionment. In Proceedings of the 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Reims, France, 21–24 September 2014. [Google Scholar]
  37. Zhang, Z. Parameter estimation techniques: A tutorial with application to conic fitting. Image Vis. Comput. 1997, 15, 59–76. [Google Scholar] [CrossRef]
  38. Lee, D.; Seung, H. Learning the parts of objects by non negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef] [PubMed]
  39. Lin, C.J. On the Convergence of Multiplicative Update Algorithms for Non-negative Matrix Factorization. IEEE Trans. Neural Netw. 2007, 18, 1589–1596. [Google Scholar]
  40. Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
  41. Hennequin, R.; David, B.; Badeau, R. Beta-Divergence as a Subclass of Bregman Divergence. IEEE Signal Process. Lett. 2011, 18, 83–86. [Google Scholar] [CrossRef] [Green Version]
  42. Guillamet, D.; Vitria, J.; Schiele, B. Introducing a weighted non-negative matrix factorization for image classification. Pattern Recognit. Lett. 2003, 24, 2447–2454. [Google Scholar] [CrossRef]
  43. Xu, Y.; Yin, W.; Wen, Z.; Zhang, Y. An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China 2012, 7, 365–384. [Google Scholar] [CrossRef] [Green Version]
  44. Heinz, D.C.; Chang, C. Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39, 529–545. [Google Scholar] [CrossRef]
  45. Lin, C.J. Projected Gradients Methods for Non-Negative Matrix Factorization. Neural Comput. 2007, 19, 2756–2779. [Google Scholar] [CrossRef] [PubMed]
  46. Moussaoui, S. Séparation de Sources Non-Négatives. Application au Traitement des Signaux de Spectroscopie. Ph.D. Thesis, Université Henri Poincaré, Nancy, France, 2005. (In French). [Google Scholar]
  47. Roche, C.; Ledoux, F.; Borgie, M.; Delmaire, G.; Roussel, G.; Puigt, M.; Courcot, D. Origins of PM10 in northern coast of France: A one year study to estimate maritime contributions in the Strait of Dover. In Proceedings of the 22nd European Aerosol Conference, Tours, France, 4–9 September 2016. [Google Scholar]
  48. Kfoury, A. Origin and Physicochemical Behaviour of Atmospheric PM2.5 in Cities Located in the Littoral Area of the Nord-Pas-de-Calais Region, France. Ph.D. Thesis, Université du Littoral Côte d’Opale, Dunkerque, France, May 2013. [Google Scholar]
  49. Kfoury, A.; Ledoux, F.; Roche, C.; Delmaire, G.; Roussel, G.; Courcot, D. PM2.5 source apportionment in a French urban coastal site under steelworks emission influences using constrained non-negative matrix factorization receptor model. J. Environ. Sci. 2016, 40, 114–128. [Google Scholar] [CrossRef] [PubMed]
  50. Waked, A.; Favez, O.; Alleman, L.Y.; Piot, C.; Petit, J.E.; Delaunay, T.; Verlinden, E.; Golly, B.; Besombes, J.L.; Jaffrezo, J.L.; et al. Source apportionment of PM10 in a north-western Europe regional urban background site (Lens, France) using positive matrix factorization and including primary biogenic emissions. Atmos. Chem. Phys. 2014, 14, 3325–3346. [Google Scholar] [CrossRef]
  51. Becagli, S.; Sferlazzo, D.M.; Pace, G.; di Sarra, A.; Bommarito, C.; Calzolai, G.; Ghedini, C.; Lucarelli, F.; Meloni, D.; Monteleone, F.; et al. Evidence for heavy fuel oil combustion aerosols from chemical analyses at the island of Lampedusa: A possible large role of ships emissions in the Mediterranean. Atmos. Chem. Phys. 2012, 12, 3479–3492. [Google Scholar] [CrossRef]
  52. Vincent, E.; Araki, S.; Bofill, P. The 2008 Signal Separation Evaluation Campaign: A community-based approach to large-scale evaluation. In Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation (ICA 2009), Paraty, Brazil, 15–18 March 2009; pp. 734–741. [Google Scholar]
  53. Le Roux, J.; Hershey, J.R.; Weninger, F. Sparse NMF–Half-baked or Well Done? Technical Report TR2015-023; Mitsubishi Electric Research Labs (MERL): Cambridge, MA, USA, 2015. [Google Scholar]
  54. Roche, C. Etude des Concentrations et de la Composition des PM10 sur le Littoral du Nord de la France—Evaluation des Contributions Maritimes de L’espace Manche-Mer du Nord. Ph.D. Thesis, Université du Littoral Côte d’Opale, Dunkerque, France, 2016. [Google Scholar]
  55. Ledoux, F.; Roche, C.; Cazier, F.; Beaugard, C.; Courcot, D. Influence of ship emissions on NOx, SO2, O3 and PM concentrations in a North-Sea harbor in France. J. Environ. Sci. 2018, 71, 56–66. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Behavior of several dissimilarity measures with respect to the residual value.
Figure 2. The α-zoom weight.
Figure 3. Different areas as a function of α and β.
Figure 4. Estimation of the road traffic profile.
Figure 5. MER vs. input signal-to-noise ratio (SNR). The case with 20 outliers.
Figure 6. Mixing-error ratio (MER) index for the N₁ constrained and weighted NMF (CWNMF) vs. α and β.
Figure 7. MER versus constraint number. The case with 20 outliers.
Figure 8. Digitel DA80 high volume sampler used for data acquisition (source of the right plot: Digitel).
Figure 9. Estimation of the sea traffic source profile.
Figure 10. V species reconstruction over Cape Gris–Nez.
Table 1. Properties of α-zoom.
$\alpha$ | $0 < X_{ij}/\hat{X}_{ij} < 1$ | $X_{ij}/\hat{X}_{ij} > 1$
$\alpha > 1$ | small zoom | large zoom
$\alpha < 1$ | large zoom | small zoom
Table 2. Weighting effect on the αβ-divergence.
$\alpha+\beta$ | $0 < \hat{X}_{ij} < 1$ | $\hat{X}_{ij} > 1$
$\alpha+\beta < 1$ | large weighting | small weighting
$\alpha+\beta > 1$ | small weighting | large weighting
Table 3. Our different non-negative matrix factorization (NMF) methods with normalization.
Acronym | F | G | Mask on F | Mask on G
αβ-N₁-CWNMF-R | Equation (52) | Equation (53) | $M_F^{\alpha,\beta}(\tilde{G}^k,\tilde{F}^k)$ | $N_G^{\alpha,\beta}(\tilde{G}^k,F^{k+1})$
αβ-N₁-CWNMF | Equation (52) | Equation (53) | $N_F^{\alpha,\beta}(\tilde{G}^k,\tilde{F}^k)$ | $N_G^{\alpha,\beta}(\tilde{G}^k,F^{k+1})$
αβ-N₂-CWNMF-R | Equation (56) | Equation (12) | $M_F^{\alpha,\beta}(G^k,\tilde{F}^k)$ | $N_G^{\alpha,\beta}(G^k,\tilde{F}^{k+1})$
αβ-N₂-CWNMF | Equation (56) | Equation (12) | $N_F^{\alpha,\beta}(G^k,\tilde{F}^k)$ | $N_G^{\alpha,\beta}(G^k,\tilde{F}^{k+1})$
Table 4. Features of the different source profiles.
Profiles | Type | Major Species | References
Sea salts | Natural | Cl⁻, Na⁺, SO₄²⁻, Mg²⁺, K⁺, Ca²⁺, Sr | [48]
Crustal dust | Natural | Al, Ca²⁺, Fe, K⁺, OC, Ti, NO₃⁻, Na⁺ | [49]
Primary biogenic emission | Natural | OC, EC, Polyols, P | [50]
Aged sea salts | Anthropised | NO₃⁻, Na⁺, SO₄²⁻, Mg²⁺, K⁺, OC, Ca²⁺, Sr, Cl⁻ | [50]
Secondary nitrates | Anthropised | NO₃⁻, OC, NH₄⁺, EC, Ca²⁺, Fe, Zn, Cu | [50]
Secondary sulfates | Anthropised | SO₄²⁻, NH₄⁺, OC, Ca²⁺, K⁺, Fe, Pb, Zn | [49]
Biomass combustion | Anthropogenic | OC, EC, Levoglucosan, NO₃⁻, K⁺, Zn | [50]
Road traffic | Anthropogenic | EC, OC, NO₃⁻, Cu, Sb, Zn, Fe | [50]
Sea traffic | Anthropogenic | OC, EC, V, Ni, Co, SO₄²⁻, NH₄⁺, NO₃⁻ | [50,51]
Rich metal source | Anthropogenic | Fe, Al, Cr, Pb, Zn, Mn | [50]
