# Guided Semi-Supervised Non-Negative Matrix Factorization


## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Classical Non-Negative Matrix Factorization

#### 2.2. Semi-Supervised NMF

#### 2.3. Guided NMF

#### 2.4. Topic Supervised NMF

## 3. Proposed Method

**Algorithm 1.** GSSNMF with multiplicative updates.
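As a rough illustration of what multiplicative updates for a model of this kind can look like, the following sketch assumes an objective of the form $\|X-WH\|_F^2+\lambda\|Y-WB\|_F^2+\mu\|Z-CH\|_F^2$, where $X$ is the data matrix, $Y$ a seed-word matrix, and $Z$ a label matrix. The matrix names, shapes, random initialization, the function name `gssnmf_sketch`, and the omission of any mask for unlabeled documents are all assumptions made here; it is not a transcription of Algorithm 1.

```python
import numpy as np

def gssnmf_sketch(X, Y, Z, k, lam, mu, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical multiplicative-update sketch for the assumed objective
    ||X - WH||_F^2 + lam * ||Y - WB||_F^2 + mu * ||Z - CH||_F^2."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k))            # words x topics
    H = rng.random((k, n))            # topics x documents
    B = rng.random((k, Y.shape[1]))   # topics x seed topics
    C = rng.random((Z.shape[0], k))   # classes x topics

    def loss():
        return (np.linalg.norm(X - W @ H) ** 2
                + lam * np.linalg.norm(Y - W @ B) ** 2
                + mu * np.linalg.norm(Z - C @ H) ** 2)

    history = [loss()]
    for _ in range(n_iter):
        # Each factor is scaled by (negative gradient part) / (positive part);
        # eps guards against division by zero.
        W *= (X @ H.T + lam * Y @ B.T) / (W @ (H @ H.T) + lam * W @ (B @ B.T) + eps)
        H *= (W.T @ X + mu * C.T @ Z) / (W.T @ W @ H + mu * C.T @ C @ H + eps)
        B *= (W.T @ Y) / (W.T @ W @ B + eps)
        C *= (Z @ H.T) / (C @ (H @ H.T) + eps)
        history.append(loss())
    return W, H, B, C, history

# Toy data (made up): 30 words, 20 documents, 2 seed topics, 3 classes.
rng = np.random.default_rng(1)
X = rng.random((30, 20))
Y = np.zeros((30, 2)); Y[0, 0] = 1.0; Y[1, 1] = 1.0   # one seed word per seed topic
Z = np.eye(3)[:, rng.integers(0, 3, size=20)]          # one-hot label per document
W, H, B, C, history = gssnmf_sketch(X, Y, Z, k=4, lam=0.3, mu=0.006)
```

The parameter values in the toy call are placeholders; in practice $\lambda$ and $\mu$ would be tuned over a grid.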

## 4. Experiments

#### 4.1. Pre-Processing of the CIP Dataset

`max_df = 0.8`, `min_df = 0.04`, and `max_features = 700` in the function `TfidfVectorizer`.
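A minimal sketch of how these settings might be passed to scikit-learn's `TfidfVectorizer` is shown here. The toy corpus is made up for illustration and merely stands in for the pre-processed documents; transposing the result to a words-by-documents layout is an assumption about how the matrix would then be fed to an NMF model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; "court" appears in every document.
docs = [
    "gang member shot victim court",
    "burglary robbery court instruction",
    "eyewitness lineup photo court",
    "murder manslaughter instruction court",
    "gang robbery murder court",
]

vectorizer = TfidfVectorizer(
    max_df=0.8,        # drop terms that occur in more than 80% of documents
    min_df=0.04,       # drop terms that occur in fewer than 4% of documents
    max_features=700,  # keep at most 700 terms, ranked by corpus term frequency
)
X_docs = vectorizer.fit_transform(docs)   # sparse, shape (n_documents, n_terms)
vocab = sorted(vectorizer.vocabulary_)    # surviving terms
X = X_docs.T                              # words x documents (assumed layout)
```

With five documents, `max_df = 0.8` removes any term present in all five, so `court` is filtered out while rarer terms such as `gang` remain.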

#### 4.2. Classification on the CIP Dataset

#### 4.3. Topic Modeling on the CIP Dataset

#### 4.4. Pre-Processing of the 20 Newsgroups Dataset

`max_df = 0.8`, and `max_features = 2000` in the function `TfidfVectorizer`.

#### 4.5. Classification on the 20 Newsgroups Dataset

#### 4.6. Topic Modeling on the 20 Newsgroups Dataset

Note that TS-NMF’s rank has to equal the number of classes; as a result, it only has a rank 9 topic modeling result.

## 5. Conclusions and Future Works

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. GSSNMF Algorithm: Multiplicative Updates Proof

## Appendix B. GSSNMF Topic Modeling: Details on μ Values

**Table A1.** Mean of averaged coherence scores from 10 independent trials (mean ${\mathcal{C}}_{\mathrm{avg}}$) of Guided NMF and GSSNMF given $\lambda$ and the best-performing $\mu$ for each $\lambda$, by rank.

Rank 6

$\lambda$ | Guided NMF mean ${\mathcal{C}}_{\mathrm{avg}}$ | GSSNMF best $\mu$ | GSSNMF mean ${\mathcal{C}}_{\mathrm{avg}}$ |
---|---|---|---|
0.05 | 1235.799 | 0.017 | 1238.159 |
0.1 | 1236.479 | 0.006 | 1233.213 |
0.15 | 1223.286 | 0.011 | 1236.314 |
0.2 | 1227.161 | 0.01 | 1238.868 |
0.25 | 1238.161 | 0.006 | 1234.622 |
0.3 | 1231.811 | 0.014 | 1232.488 |
0.35 | 1234.488 | 0.019 | 1234.992 |
0.4 | 1224.315 | 0.015 | 1233.806 |
0.45 | 1215.942 | 0.002 | 1237.543 |
0.5 | 1219.138 | 0.009 | 1236.933 |
0.55 | 1217.428 | 0.019 | 1237.721 |
0.6 | 1225.177 | 0.009 | 1235.567 |
0.65 | 1221.387 | 0.011 | 1234.732 |
0.7 | 1233.618 | 0.014 | 1231.956 |
0.75 | 1225.619 | 0.019 | 1234.159 |
0.8 | 1232.224 | 0.014 | 1232.239 |
0.85 | 1219.339 | 0.02 | 1233.272 |
0.9 | 1220.92 | 0.002 | 1232.433 |
0.95 | 1220.591 | 0.0 | 1233.243 |
1.0 | 1199.539 | 0.01 | 1233.594 |

Rank 7

$\lambda$ | Guided NMF mean ${\mathcal{C}}_{\mathrm{avg}}$ | GSSNMF best $\mu$ | GSSNMF mean ${\mathcal{C}}_{\mathrm{avg}}$ |
---|---|---|---|
0.05 | 1160.539 | 0.004 | 1175.973 |
0.1 | 1161.856 | 0.008 | 1169.722 |
0.15 | 1163.338 | 0.004 | 1172.093 |
0.2 | 1156.654 | 0.006 | 1172.746 |
0.25 | 1161.297 | 0.013 | 1175.38 |
0.3 | 1167.535 | 0.006 | 1181.253 |
0.35 | 1161.732 | 0.018 | 1170.002 |
0.4 | 1168.508 | 0.004 | 1173.216 |
0.45 | 1160.197 | 0.01 | 1173.244 |
0.5 | 1160.156 | 0.002 | 1174.586 |
0.55 | 1165.1 | 0.017 | 1170.422 |
0.6 | 1166.476 | 0.019 | 1170.292 |
0.65 | 1150.758 | 0.017 | 1171.758 |
0.7 | 1157.302 | 0.008 | 1169.872 |
0.75 | 1163.242 | 0.008 | 1167.135 |
0.8 | 1157.301 | 0.017 | 1166.257 |
0.85 | 1159.55 | 0.005 | 1163.406 |
0.9 | 1144.468 | 0.017 | 1165.967 |
0.95 | 1154.814 | 0.018 | 1166.376 |
1.0 | 1167.511 | 0.011 | 1169.715 |

Rank 8

$\lambda$ | Guided NMF mean ${\mathcal{C}}_{\mathrm{avg}}$ | GSSNMF best $\mu$ | GSSNMF mean ${\mathcal{C}}_{\mathrm{avg}}$ |
---|---|---|---|
0.05 | 1101.472 | 0.013 | 1113.19 |
0.1 | 1099.322 | 0.007 | 1111.402 |
0.15 | 1093.238 | 0.014 | 1118.944 |
0.2 | 1104.311 | 0.017 | 1108.933 |
0.25 | 1112.251 | 0.018 | 1112.88 |
0.3 | 1095.652 | 0.01 | 1108.676 |
0.35 | 1112.29 | 0.003 | 1113.42 |
0.4 | 1097.472 | 0.018 | 1113.481 |
0.45 | 1108.421 | 0.001 | 1110.225 |
0.5 | 1104.963 | 0.015 | 1107.234 |
0.55 | 1103.96 | 0.02 | 1108.842 |
0.6 | 1099.296 | 0.012 | 1112.723 |
0.65 | 1103.613 | 0.004 | 1111.863 |
0.7 | 1100.494 | 0.018 | 1107.633 |
0.75 | 1101.705 | 0.003 | 1109.97 |
0.8 | 1096.231 | 0.006 | 1105.261 |
0.85 | 1099.701 | 0.017 | 1110.706 |
0.9 | 1087.417 | 0.001 | 1108.98 |
0.95 | 1093.507 | 0.013 | 1104.385 |
1.0 | 1083.938 | 0.019 | 1102.194 |

Rank 9

$\lambda$ | Guided NMF mean ${\mathcal{C}}_{\mathrm{avg}}$ | GSSNMF best $\mu$ | GSSNMF mean ${\mathcal{C}}_{\mathrm{avg}}$ |
---|---|---|---|
0.05 | 1055.147 | 0.02 | 1058.959 |
0.1 | 1053.454 | 0.009 | 1064.597 |
0.15 | 1052.940 | 0.017 | 1064.211 |
0.2 | 1057.455 | 0.016 | 1065.838 |
0.25 | 1058.728 | 0.02 | 1060.095 |
0.3 | 1044.547 | 0.015 | 1062.604 |
0.35 | 1061.362 | 0.001 | 1061.362 |
0.4 | 1054.063 | 0.002 | 1063.328 |
0.45 | 1042.789 | 0.003 | 1057.803 |
0.5 | 1048.559 | 0.003 | 1060.999 |
0.55 | 1046.531 | 0.014 | 1060.221 |
0.6 | 1048.027 | 0.014 | 1056.21 |
0.65 | 1042.196 | 0.013 | 1057.367 |
0.7 | 1050.172 | 0.013 | 1056.088 |
0.75 | 1045.215 | 0.012 | 1060.503 |
0.8 | 1048.137 | 0.002 | 1057.536 |
0.85 | 1040.848 | 0.007 | 1061.34 |
0.9 | 1051.050 | 0.004 | 1061.306 |
0.95 | 1055.246 | 0.001 | 1055.246 |
1.0 | 1044.768 | 0.02 | 1052.106 |


**Figure 2.** Heatmap representation of the Macro F1-score averaged over 10 independent trials with $\mu ,\lambda \in \{0.0005,0.0006,\cdots ,0.0012\}$: (**a**) displays the results for SSNMF with $\mu \in \{0.0005,0.0006,\cdots ,0.0012\}$; (**b**) highlights the best scores for GSSNMF as $\lambda$ ranges from $0.0005$ to $0.0012$; (**c**) shows the performance of GSSNMF for different values of $\mu$ and $\lambda$ in the predefined range. The heatmap shows that, for each $\mu$, one can choose $\lambda$ such that GSSNMF outperforms SSNMF.

**Figure 3.** Difference between the label matrices reconstructed using SSNMF and GSSNMF and the actual crime label matrix ($\mu =0.0011,\lambda =0.0007$). A light pixel indicates a correct label assignment, while a dark pixel indicates an incorrect one.

**Figure 4.** Comparison of the Guided NMF mean ${\mathcal{C}}_{\mathrm{avg}}$ score (over 10 independent trials) and the highest GSSNMF mean ${\mathcal{C}}_{\mathrm{avg}}$ score (over 10 independent trials) for each $\lambda$ tested.

**Figure 6.** Macro F1-scores for SSNMF and GSSNMF applied to the 20 Newsgroups data, averaged over 10 independent trials with $\mu \in [0.05,0.12]$, $\lambda \in [0.695,0.702]$: (**a**) displays the results for SSNMF with $\mu \in [0.05,0.12]$; (**b**) shows the best performance for GSSNMF as $\lambda$ ranges from $0.695$ to $0.702$; (**c**) shows the performance of GSSNMF for different combinations of $\mu$ and $\lambda$ in the predefined range. The maximal values and the heatmap show that, for each $\mu$, there are choices of $\lambda$ such that GSSNMF outperforms SSNMF.

Guided NMF Results ($\lambda =0.4$)

Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 |
---|---|---|---|---|---|---|
gang | burglari | murder | accomplic | gang | instruct | identif |
member | sexual | shot | corrobor | member | murder | eyewit |
crip | strike | hous | robberi | expert | manslaught | photo |
activ | admiss | detect | instruct | beer | lesser | lineup |
phone | instruct | vehicl | murder | estrada | theori | suggest |
murder | threat | phone | codefend | hispan | degre | suspect |
photo | object | apart | abet | tattoo | passion | photograph |
territori | discret | robberi | commiss | intent | abet | pack |
associ | impos | want | special | men | voluntari | procedur |
shot | sex | firearm | conspiraci | robberi | premedit | expert |
**Coherence Score $\mathcal{C}$ per Topic:** | | | | | | |
1112.94 | 1388.307 | 1290.817 | 921.023 | 1109.453 | 1123.185 | 1090.895 |

Averaged Coherence Score ${\mathcal{C}}_{\mathrm{avg}}$: 1148.089

GSSNMF Results ($\lambda =0.3,\mu =0.006$)

Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 |
---|---|---|---|---|---|---|
murder | instruct | detect | identif | gang | gang | burglari |
accomplic | manslaught | phone | eyewit | member | member | strike |
corrobor | lesser | probat | photo | crip | expert | sexual |
vehicl | murder | waiver | lineup | associ | beer | robberi |
robberi | self | plea | suggest | expert | hispan | impos |
shot | passion | interrog | suspect | activ | shot | discret |
abet | theori | confess | photograph | intent | estrada | punish |
intent | voluntari | interview | pack | premedit | men | sex |
degre | heat | admiss | procedur | prove | tattoo | feloni |
hous | spont | transcript | reliabl | firearm | male | threat |
**Coherence Score $\mathcal{C}$ per Topic:** | | | | | | |
1215.826 | 1024.617 | 1188.975 | 1184.333 | 1180.321 | 1084.33 | 1281.146 |

Averaged Coherence Score ${\mathcal{C}}_{\mathrm{avg}}$: 1165.65
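The averaged scores reported above can be checked with a short calculation. The sketch assumes that ${\mathcal{C}}_{\mathrm{avg}}$ is the arithmetic mean of the per-topic coherence scores, which is consistent with the tabulated values; the helper name `averaged_coherence` is hypothetical.

```python
# Per-topic coherence scores copied from the rank 7 results above.
guided_scores = [1112.94, 1388.307, 1290.817, 921.023, 1109.453, 1123.185, 1090.895]
gssnmf_scores = [1215.826, 1024.617, 1188.975, 1184.333, 1180.321, 1084.33, 1281.146]

def averaged_coherence(scores):
    """Arithmetic mean of per-topic coherence scores (assumed definition)."""
    return sum(scores) / len(scores)

guided_avg = averaged_coherence(guided_scores)
gssnmf_avg = averaged_coherence(gssnmf_scores)
print(round(guided_avg, 3))  # 1148.089
print(round(gssnmf_avg, 3))  # 1165.65
```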

Rank | 6 | 7 | 8 | 9 |
---|---|---|---|---|
Classical NMF ${\mathcal{C}}_{\mathrm{avg}}$ | 1247.219 | 1181.154 | 1133.603 | 1070.618 |
Guided NMF ${\mathcal{C}}_{\mathrm{avg}}$ | 1227.161 | 1167.535 | 1112.29 | 1058.728 |
GSSNMF ${\mathcal{C}}_{\mathrm{avg}}$ | 1238.868 | 1181.253 | 1118.944 | 1065.837 |
Best GSSNMF $\mu$ | 0.016 | 0.006 | 0.014 | 0.01 |
Best GSSNMF $\lambda$ | 0.2 | 0.3 | 0.15 | 0.2 |

Top 10 Words from Selected TS-NMF Topics

Topic 1 | Topic 2 | Topic 3 | Topic 4 |
---|---|---|---|
gang | gang | gang | accomplic |
member | estrada | member | gang |
instruct | expert | circumstanti | robberi |
expert | member | premedit | instruct |
assault | tattoo | murder | corrobor |
beer | identif | instruct | identif |
object | opin | deliber | intent |
intent | territori | accomplic | special |
injuri | primari | intent | feloni |
men | activ | shot | abet |

Class | Topics | Seed Words |
---|---|---|
Computers | comp.graphics, comp.sys.mac.hardware | graphics, hardware |
Science | sci.crypt, sci.med, sci.space | cryptography, medical, space |
Politics | talk.politics.guns | guns |
Religion | talk.religion.misc | god |
Recreation | rec.motorcycles, rec.sport.baseball | motorcycle, baseball |

**Table 5.** Coherence of topics generated by Classical NMF, Guided NMF, TS-NMF, and GSSNMF for 20 Newsgroups data.

Rank | 6 | 7 | 8 | 9 |
---|---|---|---|---|
Classical NMF ${\mathcal{C}}_{\mathrm{avg}}$ | 980.860 | 940.967 | 874.404 | 832.409 |
Guided NMF ${\mathcal{C}}_{\mathrm{avg}}$ | 858.361 | 798.737 | 741.057 | 714.796 |
TS-NMF ${\mathcal{C}}_{\mathrm{avg}}$ | - | - | - | 856.786 |
GSSNMF ${\mathcal{C}}_{\mathrm{avg}}$ | 984.443 | 942.678 | 881.626 | 843.399 |
Best GSSNMF $\mu$ | 0.0085 | 0.012 | 0.012 | 0.0001 |
Best GSSNMF $\lambda$ | 0.1 | 0.3 | 0.3 | 0.5 |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, P.; Tseng, C.; Zheng, Y.; Chew, J.A.; Huang, L.; Jarman, B.; Needell, D. Guided Semi-Supervised Non-Negative Matrix Factorization. *Algorithms* **2022**, *15*, 136.
https://doi.org/10.3390/a15050136


**Chicago/Turabian Style**

Li, Pengyu, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, and Deanna Needell. 2022. "Guided Semi-Supervised Non-Negative Matrix Factorization" *Algorithms* 15, no. 5: 136.
https://doi.org/10.3390/a15050136