# Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data Sets

- (i) The numbers of emergency hospital admissions for cardiovascular disease (CVD), myocardial infarction (MI), congestive heart failure (CHF), respiratory disease, and diabetes, collected in 26 US communities for the years 2000–2003 [3].
- (ii) The Compressed Mortality File (CMF), a county-level national mortality and population database spanning the years 1968–2010. The table contains death counts for 13 age categories.

#### 2.2. Methods

#### 2.2.1. Non-Negative Factorization of a General Matrix

Consider first a matrix **V** that contains only non-negative entries. **V** can be approximated by a sum of k rank-1 bilinear forms:

$$\mathbf{V} = \sum_{q=1}^{k} \mathbf{w}_{q}\mathbf{h}_{q}^{T}$$

more conveniently written as **V** = **WH**^{T}, with **W** = [**w**_{q}]_{1 ≤ q ≤ k} and **H** = [**h**_{q}]_{1 ≤ q ≤ k}. The vectors **w**_{q} and **h**_{q} are referred to as the components of the factorization model. The NMF of **V** can be used to obtain **W** and **H**, and thereby guarantees that all elements of **W** and **H** are non-negative, by contrast with the ordinary SVD of **V**, which returns **W** and **H** with mixed signs.

Consider now a general matrix **V**, which we would like to analyze with an NMF approach. If **V** is of mixed signs, a preliminary transformation is required to satisfy the non-negativity requirement. Two approaches can be applied. The first is a novel approach that we refer to as the PosNegNMF approach:

- (i) Split **V** into its positive and negative parts: **V** = **V**_{+} − **V**_{−}, where **V**_{+} contains the positive entries (other entries being replaced by 0) and **V**_{−} contains the absolute values of the negative entries (other entries being replaced by 0).
- (ii) When the rows are the observations and the columns are the variables, use the horizontally concatenated matrix **V**_{PN} = [**V**_{+} **V**_{−}] in order to give equal weight to low and high features while characterizing the clusters of observations.
- (iii) Apply the NMF clustering to the concatenated matrix: **V**_{PN} = **WH**^{T} (note that writing **V** = **W**(**H**_{+} − **H**_{−})^{T}, where **H** = [**H**_{+} **H**_{−}], corresponds to the semi-NMF model).

The second is an affine approach:

- (i) Subtract from each column of **V** its minimum, yielding a non-negative matrix **V**_{0} (the column minima define the baseline).
- (ii) Apply the NMF clustering to **V**_{0}.

Several algorithms can be used to estimate **W** and **H**. The simplest method uses multiplicative updating rules [4] to minimize the sum of squares of the elements of **V** − **WH**^{T}, which we will refer to as the residual sum of squares (Appendix A). Note that in contrast to the analysis of **V**_{PN}, which includes all the information contained in **V**, the analysis of **V**_{0} disregards the baseline, and so it includes only a part of the information contained in **V**. We will show in the results section that this loss of information may have dramatic consequences on the quality and interpretability of the clustering results.
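The PosNeg transformation in steps (i) and (ii) can be sketched in a few lines of NumPy (a minimal illustration; the function name and toy matrix are ours):

```python
import numpy as np

def posneg(V):
    """Split a mixed-sign matrix V into non-negative parts and
    concatenate them horizontally: V_PN = [V_plus | V_minus],
    so that V = V_plus - V_minus."""
    V = np.asarray(V, dtype=float)
    V_plus = np.where(V > 0, V, 0.0)    # positive entries, 0 elsewhere
    V_minus = np.where(V < 0, -V, 0.0)  # |negative entries|, 0 elsewhere
    return np.hstack([V_plus, V_minus])

V = np.array([[1.0, -2.0],
              [-3.0, 4.0]])
V_PN = posneg(V)
# V_PN doubles the number of columns and contains only non-negative entries
```

Note that every entry of **V** produces a zero in one of the two blocks, which is why the transformed matrix is always at least half zeros (a point revisited in the Limitations section).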

#### 2.2.2. NMF Clustering and Reordering

The NMF uses **W** to cluster observations, as traditionally done with SVD-based methods, such as the latent semantic indexing method used in document clustering [5]. The simple clustering scheme described in the introduction is based on the direct link of each homogeneous block in **V** with a particular column of **W**, whose largest elements point to this block. However, neither approach is independent of the chosen scaling system, which is arbitrary. To address this problem, component leverages can be calculated. The component leverage represents the ability of a row or column to exert an influence specifically on a particular component, without a corresponding increase in its influence on the other components. Leverage column vectors have all elements in the interval (0, 1). They are strongly correlated with the NMF column vectors **w**_{q} and **h**_{q}, but are much less affected by the choice of a scaling system. Technical details for the calculation of leverages are given in Appendix B.

#### 2.2.3. Stability and Specific Clustering Contribution of NMF Clusters

#### 2.2.4. Rank of the NMF Factorization

The rank k of the factorization can be selected by examining the residual sum of squares of **V** − **WH**^{T}, where k takes values from 1 up to a large value (e.g., half the number of rows or columns), in combination with the analysis of cluster stability. The plot of each criterion as a function of k is called a scree plot. An optimal k should correspond to both a low residual sum of squares and a high cluster stability. Additionally, the scree plot of the cluster SCC can be used to confirm the choice based on the first two criteria. Ultimately, selecting k is a difficult decision, and an important step on the path to a better understanding of the nature of **V**.
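As an illustration of the scree-plot idea, the residual of the best rank-k approximation can be computed cheaply from the singular values (by the Eckart–Young theorem; this is a lower bound on the NMF residual at the same rank). The toy matrix below is ours:

```python
import numpy as np

# Scree-plot sketch: the tail sum of squared singular values equals the
# residual sum of squares of the best rank-k approximation of V.
rng = np.random.default_rng(0)
V = rng.random((20, 8))
s = np.linalg.svd(V, compute_uv=False)
rss = [float((s[k:] ** 2).sum()) for k in range(1, 9)]  # k = 1..8
# plot rss against k and look for an elbow; a low residual alone is not
# enough, cluster stability at the same k should also be high
```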

#### 2.2.5. Normalization of Contingency Tables

- (i) For each cell, the contingency ratio is calculated by forming the ratio of the observed count over the expected count, assuming the independence of rows and columns.
- (ii) Further normalization steps include the subtraction of the expected ratio under the assumption of independence (=1), yielding a mixed-sign matrix, and a subsequent scaling of rows and columns to ensure homogeneous cell variances.
- (iii) The SVD is applied to the normalized matrix.
- (iv) A biplot based on the first two SVD components is then produced, allowing for a simultaneous clustering of the rows and columns of the table, which we will refer to as the SVD clustering. Note that PCA clustering refers to the same approach, since PCA's eigenvectors are the column singular vectors [7].
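Steps (i)–(iii) can be sketched as follows. The correspondence-analysis-style scaling used here to homogenize cell variances is one common choice and may differ in detail from the authors' implementation:

```python
import numpy as np

def normalized_ratios(N):
    """Normalize a contingency table: contingency ratios minus 1,
    then CA-style row/column scaling by the square roots of the
    row and column masses (one common variance-homogenizing choice)."""
    N = np.asarray(N, dtype=float)
    total = N.sum()
    r = N.sum(axis=1) / total               # row masses
    c = N.sum(axis=0) / total               # column masses
    expected = total * np.outer(r, c)       # counts under independence
    ratio = N / expected                    # contingency ratio, step (i)
    centered = ratio - 1.0                  # subtract expected ratio (=1)
    return centered * np.sqrt(np.outer(r, c))  # scale rows and columns

N = np.array([[20, 10],
              [10, 20]])
Z = normalized_ratios(N)
U, s, Vt = np.linalg.svd(Z)                 # step (iii): SVD of normalized matrix
```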

#### 2.3. Software

## 3. Results

#### 3.1. Hospital Admissions Data

#### 3.1.1. PosNegNMF Clustering

#### 3.1.2. Affine NMF Clustering

#### 3.1.3. Correspondence Analysis

#### 3.1.4. Additional Remarks

#### 3.2. Compressed Mortality File

## 4. Discussion

#### 4.1. Performance

#### 4.2. Alternative Approaches

As an alternative to the simultaneous clustering of the rows and columns of **V**, we could also use standard clustering methods to first cluster the rows, and then the columns of **V**. A rank-1 bilinear form approximation provides a natural way of performing these operations. Approximating the matrix **V** by **V** = **wh**^{T}, where **w** and **h** are the row and column marker vectors respectively, then reordering the rows of **V** by the decreasing values of the elements of **w** and the columns by the decreasing values of the elements of **h**, leads to a matrix with elements that tend to go from the largest in the top left to the smallest in the bottom right. The bilinear approximation also provides a way to cluster the matrix into internally homogeneous rectangles. Sorting the elements of **w** into descending order and separating them into k maximally homogeneous clusters leads to the reordered **V** having rows that are maximally homogeneous, to the extent that the bilinear approximation matches them well, i.e., the elements of the error matrix **V** − **wh**^{T} are small enough. Applying the same operation to separate the elements of **h** into m maximally homogeneous clusters does the same for the columns, while separating the elements of both the row and column markers will cluster the matrix **V** into km internally homogeneous blocks. This approach was used in Liu et al. [11], where a robust incomplete-data implementation of SVD was used to obtain the approximants **w** and **h**. The segmentation of the row and column markers was carried out using the dynamic programming optimization method from Hawkins [12].
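A minimal sketch of the rank-1 reordering step follows. Plain SVD is used for the approximants rather than the robust incomplete-data implementation of [11], and the change-point segmentation of [12] is not reproduced:

```python
import numpy as np

# Rank-1 reordering sketch: approximate V by w h^T using the leading
# SVD pair, then sort rows and columns by decreasing marker values.
rng = np.random.default_rng(0)
V = rng.random((6, 5))
U, s, Vt = np.linalg.svd(V)
w = np.abs(U[:, 0]) * s[0]   # row markers (abs handles the sign
h = np.abs(Vt[0, :])         # ambiguity of the leading singular pair)
row_order = np.argsort(-w)   # decreasing row markers
col_order = np.argsort(-h)   # decreasing column markers
V_reordered = V[np.ix_(row_order, col_order)]
# values now tend to decrease from top-left to bottom-right
```

Segmenting the sorted `w` and `h` into k and m internally homogeneous groups (e.g., by dynamic programming) then yields the km-block clustering described above.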

#### 4.3. Limitations

When NMF is used to recover the profiles (the rows of **H**) of a mixture of true real-world sources, and their contributions (the columns of **W**) to observed samples (the rows of **V**), finding "the right solution", and determining the uniqueness of the solution, are critical goals. These two problems are essentially different, along the lines of Shmueli [18]: to explain, through cluster analysis, or to predict, through estimates of mixture contributions. Whether PosNegNMF improves on existing mixture deconvolution methods will require further investigation. Positive Matrix Factorization (PMF) [19] has been mostly applied to mixture problems, such as determining the sources of atmospheric pollution. The NMF and PMF approaches actually solve the same mathematical problem with very similar algorithms. However, PMF is more oriented towards mixture deconvolution problems, yielding accurate mixture contributions, whereas NMF is more oriented towards pattern discovery, yielding interpretable clusters. It should also be noted that the PosNeg transformation generates a sparse matrix, with at least half of the cells having zero counts. Obviously, matrices that are naturally sparse are not amenable to the PosNeg transformation.

## 5. Conclusions

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
CA | Correspondence Analysis |
NMF | Non-Negative Matrix Factorization |
PMF | Positive Matrix Factorization |
SVD | Singular Value Decomposition |
CVD | Cardiovascular Disease |
MI | Myocardial Infarction |
CHF | Congestive Heart Failure |
SCC | Specific Clustering Contribution |

## Appendix A: Estimation of the NMF Model Components W and H

The following constraints apply to **W** and **H**: **W**_{iq} > 0 and **H**_{jq} > 0, ∀ i, j, q, where i, j, and q are the row, column, and component indexes. The multiplicative updating rules are:

$$\mathbf{W}_{iq} \leftarrow \mathbf{W}_{iq}\,\frac{(\mathbf{V}\mathbf{H})_{iq}}{(\mathbf{W}\mathbf{H}^{T}\mathbf{H})_{iq}},\quad \forall\, i, q$$

$$\mathbf{H}_{jq} \leftarrow \mathbf{H}_{jq}\,\frac{(\mathbf{W}^{T}\mathbf{V})_{qj}}{(\mathbf{W}^{T}\mathbf{W}\mathbf{H}^{T})_{qj}},\quad \forall\, j, q$$

These rules alternately update the elements of **W** and **H** using specific multiplicative factors that relate to the current quality of the approximation; see Lin [20] for further details regarding the properties of this algorithm and extensions using projected gradient methods. An affine NMF variant of these updating rules can be found in Laurberg [21]; however, this variant applies only to non-negative matrices, whereas the affine approach that uses the minimum of each column applies to mixed-sign matrices. Note that the NMF model components are not ordered: the order in which they appear in the factorization can change, depending on the initialization and the algorithm used.
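These updating rules translate directly into NumPy; the sketch below is our own minimal transcription, with a small constant guarding the denominators against division by zero:

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0):
    """Multiplicative updates for V ~ W H^T. Returns W (n x k),
    H (p x k), and the residual sum of squares per iteration."""
    rng = np.random.default_rng(seed)
    n, p = V.shape
    W = rng.random((n, k))
    H = rng.random((p, k))
    eps = 1e-12
    history = []
    for _ in range(n_iter):
        # W_iq <- W_iq (V H)_iq / (W H^T H)_iq
        W *= (V @ H) / (W @ (H.T @ H) + eps)
        # H_jq <- H_jq (W^T V)_qj / (W^T W H^T)_qj
        H *= (V.T @ W) / (H @ (W.T @ W) + eps)
        history.append(float(((V - W @ H.T) ** 2).sum()))
    return W, H, history

rng = np.random.default_rng(42)
V = rng.random((10, 6))
W, H, history = nmf_multiplicative(V, 3)
# the residual sum of squares is non-increasing across iterations
```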

## Appendix B: Calculation of Leverages

The columns of **W** or **H** are typically constrained to sum to unity, as a convenient way of eliminating the degeneracy associated with the invariance of **WH** under the transformation **W** → **WΛ**, **H** → **Λ**^{−1}**H**, where **Λ** is a diagonal matrix defining a particular scaling system. It should be noted that the sum-to-unity constraint is arbitrary. We will show that the choice of a particular scaling system can affect the clustering process.

To illustrate, consider a two-component model with components **H**_{1} and **H**_{2}. In Figure B1a, the sample coordinates (x, y) correspond to the mixture coefficients: Sample = x**H**_{1} + y**H**_{2}. The red circles correspond to samples with y > x, thus more similar to **H**_{2}; the blue crosses correspond to samples with y < x, thus more similar to **H**_{1}. Now, suppose that the x and y vectors are scaled by the square roots of their respective sums of squares, yielding a new coordinate system. When this system is used, three samples originally closer to **H**_{1} appear wrongly clustered with the samples originally closer to **H**_{2} (Figure B1b, red crosses). Indeed, for these particular samples, y > x in the new coordinate system, whereas y < x in the original system. If, instead, the distance to **H**_{1} or **H**_{2} is defined by the Euclidean distance to the extremity of each axis (represented by an arrow), the clustering remains unchanged in either coordinate system.

**Figure B1.** (**a**) Clustering in the original coordinate system; (**b**) clustering using scaled coordinates.

The leverage of a row or column with respect to a component **H**_{q} is defined in terms of the corresponding column of **W** or **H**; leverages lie in the interval (0, 1). They are little affected by the chosen scaling system, thus allowing for a more reliable clustering.

The leverage with respect to **H**_{q} is a function of $\underset{j}{\mathrm{max}}\left(\mathbf{W}\left(j,q\right)\right)$, so it can be severely affected by outliers. The following iterative algorithm estimates a robust maximum:

- (1) Initialize the robust estimate by the maximum of each component.
- (2) For each vector component q:
  - For each row i of **W**, calculate the probability p(i, q) and the row score (Equations (C2) and (C1), respectively, Appendix C).
  - Force the row score to 0 if p(i, q) < 1/k.
  - Update Robust Max(q) by the weighted mean of **W**(i, q), where the mean is taken over all samples satisfying $\mathbf{W}\left(i,q\right) > 95^{th}\,\mathrm{quantile}_{j}\left(\mathbf{W}\left(j,q\right)\right)$, and the weights are the row scores. The idea is that rows with higher row scores should weigh more in the max estimation.
  - Replace all **W**(i, q) > Robust Max(q) by Robust Max(q).
- (3) Repeat step (2) until convergence.

**Figure B2.** Estimation of a robust max on each component vector (blue line on top of each histogram).

## Appendix C: Stability and Specific Clustering Contribution of NMF Clusters

To assess cluster stability, the rows of **V** are resampled with replacement, and the rows of **W** are resampled in exactly the same way as in **V**. **H** is then re-estimated, while the resampled **W** remains fixed. Column leverages are re-estimated after each run of this resampling scheme, giving rise to different clusters of columns, and the frequency at which a column falls into a particular cluster can be calculated. The higher the frequency, the more stable the column is with respect to this cluster. Conversely, the updated **H** can be used to re-estimate **W**, which gives rise to different clusters of rows. Similarly, the frequency at which a row falls into a particular cluster can be calculated.

Using the leverages of **W**, we define the row score for the ith row (Equation (C1)). The mean row score over the rows of **W** provides an overall indicator of the Specific Clustering Contribution (SCC).
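The column-stability scheme can be sketched as follows. Since the leverage and row-score equations are not reproduced here, columns are assigned to the component with the largest **H** entry, a simplified stand-in for the leverage-based assignment:

```python
import numpy as np

def cluster_stability(V, W, H, n_boot=50, seed=0):
    """Bootstrap sketch: resample the rows of V and W identically,
    re-estimate H with W held fixed (a few multiplicative passes per
    bootstrap, for brevity), and count how often each column keeps its
    original cluster. Returns a per-column stability frequency."""
    rng = np.random.default_rng(seed)
    n = V.shape[0]
    base = H.argmax(axis=1)              # original column clusters
    hits = np.zeros(V.shape[1])
    eps = 1e-12
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n) # resample rows with replacement
        Vb, Wb = V[idx], W[idx]          # same resampling for V and W
        Hb = H.copy()
        for _ in range(20):              # re-estimate H, Wb fixed
            Hb *= (Vb.T @ Wb) / (Hb @ (Wb.T @ Wb) + eps)
        hits += (Hb.argmax(axis=1) == base)
    return hits / n_boot

rng = np.random.default_rng(1)
W = rng.random((30, 2))
H = rng.random((8, 2))
V = W @ H.T                              # noise-free toy example
freq = cluster_stability(V, W, H)        # frequencies in [0, 1] per column
```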

## References

1. Fogel, P.; Young, S.S.; Hawkins, D.M.; Ledirac, N. Inferential, robust non-negative matrix factorization analysis of microarray data. *Bioinformatics* **2007**, 23, 44–49.
2. Ding, C.H.Q.; Li, T.; Jordan, M.I. Convex and semi-nonnegative matrix factorizations. *IEEE Trans. Pattern Anal. Mach. Intell.* **2010**, 32, 44–55.
3. Zanobetti, A.; Franklin, M.; Koutrakis, P.; Schwartz, J. Fine particulate air pollution and its components in association with cause-specific emergency admissions. *Environ. Health* **2009**, 8.
4. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. *Nature* **1999**, 401, 788–791.
5. Xu, W.; Liu, X.; Gong, Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 28 July–1 August 2003.
6. Kim, H.; Park, H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. *Bioinformatics* **2007**, 23, 1495–1502.
7. Fogel, P.; Hawkins, D.M.; Beecher, C.; Luta, G.; Young, S.S. A tale of two matrix factorizations. *Am. Stat.* **2013**, 67, 207–218.
8. Greenacre, M.J. Tying Up the Loose Ends in Simple Correspondence Analysis. 2001. Available online: http://dx.doi.org/10.2139/ssrn.1001889 (accessed on 20 July 2007).
9. SAS Institute Inc. *SAS® Technical Report A-108, Cubic Clustering Criterion*; SAS Institute Inc.: Cary, NC, USA, 1983; p. 56.
10. Zhang, S.; Wang, W.; Ford, J.; Makedon, F. Learning from incomplete ratings using non-negative matrix factorization. *SIAM Conf. Data Min.* **2006**, 6, 548–552.
11. Liu, L.; Hawkins, D.M.; Ghosh, S.; Young, S.S. Robust singular value decomposition analysis of microarray data. *Proc. Natl. Acad. Sci. USA* **2003**, 100, 13167–13172.
12. Hawkins, D.M. Fitting multiple change-points to data. *Comput. Stat. Data Anal.* **2001**, 37, 323–341.
13. Hawkins, D.M. *Topics in Applied Multivariate Analysis*; Cambridge University Press: New York, NY, USA, 1982.
14. Ng, A.Y.; Jordan, M.I.; Weiss, Y. On spectral clustering: Analysis and an algorithm. *Adv. Neural Inf. Process. Syst.* **2002**, 2, 849–856.
15. Von Luxburg, U. A tutorial on spectral clustering. *Stat. Comput.* **2007**, 17, 395–416.
16. Atkins, J.E.; Boman, E.G.; Hendrickson, B. A spectral algorithm for seriation and the consecutive ones problem. *SIAM J. Comput.* **1998**, 28, 297–310.
17. Liiv, I. Seriation and matrix reordering methods: An historical overview. *Stat. Anal. Data Min.* **2010**, 3, 70–91.
18. Shmueli, G. To explain or to predict? *Stat. Sci.* **2010**, 25, 289–310.
19. Paatero, P. Least squares formulation of robust non-negative factor analysis. *Chemom. Intell. Lab. Syst.* **1997**, 37, 23–35.
20. Lin, C.-J. Projected gradient methods for nonnegative matrix factorization. *Neural Comput.* **2007**, 19, 2756–2779.
21. Laurberg, H. On affine non-negative matrix factorization. *Proc. IEEE Int. Conf. Acoust. Speech Signal Process.* **2007**, 2, 653–656.
22. Devarajan, K. Nonnegative matrix factorization: An analytical and interpretive tool in computational biology. *PLoS Comput. Biol.* **2008**, 4, e1000029.
23. Boutsidis, C.; Gallopoulos, E. SVD based initialization: A head start for nonnegative matrix factorization. *Pattern Recognit.* **2008**, 41, 1350–1362.

**Figure 2.** NMF clustering and re-ordering of hospital admissions by city and cause. Red: high count; blue: low count.

**Figure 6.** Correspondence analysis biplot of hospital admissions by city and cause (PosNegNMF clusters are represented by the city label colors).

Cluster | High Counts | Low Counts |
---|---|---|
1 | Respiratory | CVD, CHF, MI |
2 | MI | CHF, Diabetes |
3 | MI, CVD | Respiratory |
4 | CHF | Diabetes, MI |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fogel, P.; Gaston-Mathé, Y.; Hawkins, D.; Fogel, F.; Luta, G.; Young, S.S.
Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health. *Int. J. Environ. Res. Public Health* **2016**, *13*, 509.
https://doi.org/10.3390/ijerph13050509
