# Continuous Semi-Supervised Nonnegative Matrix Factorization

## Abstract

## 1. Introduction

## 2. Relation to Current Work and Contributions

## 3. Model

#### 3.1. Formulation

**Remark 1.**

**Remark 2.**

#### 3.2. Theoretical Results

**Proposition 1.**

**Theorem 1.**

**Remark 3.**

**Proof of Proposition 1.**

**Proof of Theorem 1.**

#### 3.3. Algorithm

**Algorithm 1:** Overall CSSNMF algorithm.

**Input:** a matrix $X\in {\mathbb{R}}_{\ge 0}^{n\times m}$; a vector $Y\in {\mathbb{R}}^{n\times 1}$; a positive integer $r\in \mathbb{N}$; a scalar $\lambda \ge 0$; a relative error tolerance $\tau >0$; and a maximum number of iterations $maxIter$.

**Output:** minimizers of Equations (6)–(9): a nonnegative matrix $W\in {\mathbb{R}}_{\ge 0}^{n\times r}$, a nonnegative matrix $H\in {\mathbb{R}}_{\ge 0}^{r\times m}$, and a vector $\theta \in {\mathbb{R}}^{(r+1)\times 1}$.

1. $relErr=\infty$, $err=\infty$
2. Elementwise, $W\sim \mathrm{Unif}([0,\|X\|_{\infty}))$, $H\sim \mathrm{Unif}([0,\|X\|_{\infty}))$, $\theta \sim \mathrm{Unif}([0,\|X\|_{\infty}))$
3. $iter=0$
4. **while** $relErr>\tau$ and $iter<maxIter$ **do**
5. $\quad W\leftarrow newW$ as per Algorithm 2
6. $\quad H\leftarrow newH$ as per Algorithm 3
7. $\quad \theta \leftarrow new\theta$ as per Algorithm 4
8. $\quad$ Normalize $W$, $H$, and $\theta$ as per Algorithm 5
9. $\quad errTemp={F}^{(\lambda )}(W,H,\theta ;X,Y)$
10. $\quad$ **if** $err<\infty$ **then**
11. $\quad \quad relErr\leftarrow |err-errTemp|/err$
12. $\quad$ **end if**
13. $\quad err\leftarrow errTemp$
14. $\quad iter\leftarrow iter+1$
15. **end while**
16. **return** $W,H,\theta$
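The loop above can be sketched in Python. Since Equations (6)–(9) are not reproduced in this excerpt, the objective `f_lambda` below assumes the standard weighted-sum form $\|X-WH\|_F^2+\lambda \|Y-\overline{W}\theta \|^2$; the $W$- and $H$-updates (Algorithms 2 and 3, whose bodies are also not shown here) are passed in as callables, and the normalization step is omitted for brevity.

```python
import numpy as np

def f_lambda(W, H, theta, X, Y, lam):
    # Assumed objective: Frobenius reconstruction error plus lam times the
    # squared regression error (a sketch; Eqs. (6)-(9) are not shown here).
    Wbar = np.hstack([np.ones((W.shape[0], 1)), W])  # augment with intercept column
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.linalg.norm(Y - Wbar @ theta) ** 2

def cssnmf(X, Y, r, lam, tau=1e-6, max_iter=500, update_W=None, update_H=None):
    # Skeleton of Algorithm 1; update_W/update_H stand in for Algorithms 2-3.
    n, m = X.shape
    rng = np.random.default_rng(0)
    hi = np.abs(X).max()  # ||X||_inf scale used for initialization
    W = rng.uniform(0, hi, (n, r))
    H = rng.uniform(0, hi, (r, m))
    theta = rng.uniform(0, hi, (r + 1, 1))
    err = rel_err = np.inf
    it = 0
    while rel_err > tau and it < max_iter:
        if update_W is not None:
            W = update_W(W, H, theta, X, Y)
        if update_H is not None:
            H = update_H(W, H, theta, X, Y)
        # Algorithm 4: least-squares theta via the pseudoinverse of [e | W]
        theta = np.linalg.pinv(np.hstack([np.ones((n, 1)), W])) @ Y
        err_temp = f_lambda(W, H, theta, X, Y, lam)
        if err < np.inf:
            rel_err = abs(err - err_temp) / err
        err = err_temp
        it += 1
    return W, H, theta
```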

**Algorithm 2:** Updating $W$.

**Algorithm 3:** Updating $H$.

**Algorithm 4:** Updating $\theta$.

**Input:** a vector $Y\in {\mathbb{R}}^{n\times 1}$ and a matrix $W\in {\mathbb{R}}_{\ge 0}^{n\times r}$.

**Output:** a new value for $\theta$.

1. $e={(1,1,\dots ,1)}^{\mathrm{T}}\in {\mathbb{R}}^{n\times 1}$
2. $\overline{W}=\left[\begin{array}{ccc}e& |& W\end{array}\right]$
3. **return** ${\overline{W}}^{+}Y$
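Algorithm 4 is a direct least-squares solve; a minimal sketch using NumPy's Moore–Penrose pseudoinverse:

```python
import numpy as np

def new_theta(W, Y):
    # Algorithm 4: intercept and topic weights via the pseudoinverse
    # of the augmented matrix [e | W].
    e = np.ones((W.shape[0], 1))
    W_bar = np.hstack([e, W])          # prepend a column of ones
    return np.linalg.pinv(W_bar) @ Y   # theta in R^{(r+1) x 1}
```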

**Algorithm 5:** Normalization process.

**Input:** a matrix $W\in {\mathbb{R}}_{\ge 0}^{n\times r}$, a matrix $H\in {\mathbb{R}}_{\ge 0}^{r\times m}$, and a vector $\theta \in {\mathbb{R}}^{(r+1)\times 1}$.

**Output:** new values for $W$, $H$, and $\theta$.

1. Let $S\in {\mathbb{R}}_{\ge 0}^{r\times 1}$ be the vector of row sums of $H$.
2. $S\leftarrow \mathrm{diag}(S)$
3. $W\leftarrow WS$
4. $H\leftarrow {S}^{-1}H$
5. ${\theta}_{2:(r+1)}\leftarrow {S}^{-1}{\theta}_{2:(r+1)}$
6. **return** $W$, $H$, and $\theta$
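Algorithm 5 can be sketched as follows. The rescaling makes each row of $H$ sum to one while leaving both the product $WH$ and the regression predictions $\overline{W}\theta$ unchanged:

```python
import numpy as np

def normalize(W, H, theta):
    # Algorithm 5: absorb the row sums of H into W (and into the topic
    # weights of theta), so that each row of H sums to one.
    s = H.sum(axis=1)            # row sums of H
    S = np.diag(s)
    W = W @ S
    H = np.linalg.inv(S) @ H     # rows of H now sum to 1
    theta = theta.copy()
    theta[1:] = np.linalg.inv(S) @ theta[1:]   # rescale all but the intercept
    return W, H, theta
```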

**$W$ and $H$ fixed.**

**$W$ and $\theta$ fixed.**

**$H$ and $\theta$ fixed.**

**Algorithm 6:** Prediction process.

**Input:** a matrix $H\in {\mathbb{R}}_{\ge 0}^{r\times m}$, a vector $\theta \in {\mathbb{R}}^{(r+1)\times 1}$, and a vector $x\in {\mathbb{R}}^{1\times m}$.

**Output:** model prediction for the response variable, $\widehat{y}$.

1. Compute $w=\mathrm{arg}\,{\mathrm{min}}_{w\in {\mathbb{R}}_{\ge 0}^{1\times r}}{\|wH-x\|}^{2}$.
2. Compute $\widehat{y}={\theta}_{1}+w{\theta}_{2:(r+1)}$.
3. **return** $\widehat{y}$
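A minimal sketch of Algorithm 6 in Python, using `scipy.optimize.nnls` for the nonnegative least-squares step (for convenience, $\theta$ is taken here as a flat array of length $r+1$):

```python
import numpy as np
from scipy.optimize import nnls

def predict(H, theta, x):
    # Algorithm 6: project a new document x onto the learned topics by
    # nonnegative least squares, then apply the linear regression weights.
    w, _ = nnls(H.T, x)                  # solves min ||H^T w - x||, w >= 0
    return theta[0] + w @ theta[1:]      # y_hat = theta_1 + w . theta_{2:(r+1)}
```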

## 4. Synthetic Datasets

#### 4.1. Generating Synthetic Data

- We fix values of $n=100,m=40$, $M=20$, and $r=4.$
- We then define ${\eta}_{x}={\eta}_{y}=4.$
- We pick $W\in {\mathbb{R}}_{\ge 0}^{n\times r}$ such that each entry is $\sim \mathrm{Unif}([0,M))$. We likewise choose $H\in {\mathbb{R}}_{\ge 0}^{r\times m}.$
- We set $X=WH$.
- We pick $\theta \in {\mathbb{R}}^{(r+1)\times 1}$ such that each element is $\sim Unif([-M/2,M/2)).$
- We set $Y=\overline{W}\theta .$
- We perturb X with noise $\sim {\mathcal{D}}_{X}$ and Y with noise $\sim {\mathcal{D}}_{Y}$.
- Any negative X-entries are set to 0.

The noise distributions ${\mathcal{D}}_{X}$ and ${\mathcal{D}}_{Y}$ are taken as either:

- elementwise $\sim \mathcal{N}(0,{\eta}_{x}^{2})$ and $\sim \mathcal{N}(0,{\eta}_{y}^{2})$, or
- elementwise $\sim \mathrm{Unif}([0,{\eta}_{x}))$ and $\sim \mathrm{Unif}([0,{\eta}_{y}))$.
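The generation steps above can be sketched directly (with the factor matrix denoted $W$, since $X=WH$ is then formed from it); the Gaussian-noise variant is shown, and the uniform variant swaps in `rng.uniform(0, eta)` draws:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, M, r = 100, 40, 20, 4
eta_x = eta_y = 4

# Ground-truth factors, regression weights, and responses.
W = rng.uniform(0, M, (n, r))
H = rng.uniform(0, M, (r, m))
X = W @ H
theta = rng.uniform(-M / 2, M / 2, (r + 1, 1))
W_bar = np.hstack([np.ones((n, 1)), W])      # [e | W]
Y = W_bar @ theta

# Perturb with noise and clip any negative X-entries to zero.
X_noisy = np.maximum(X + rng.normal(0, eta_x, X.shape), 0)
Y_noisy = Y + rng.normal(0, eta_y, Y.shape)
```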

#### 4.2. Investigation

## 5. Rate My Professors Dataset

#### 5.1. Pre-Processing

We vectorized the review text using TF-IDF with the parameters **min_df=0.01**, **max_df=0.15**, **stop_words='english'**, **norm='l1'**, and **lowercase=True**. We found the ratings were not balanced: there were 57 on the interval [1, 2), 235 on the interval [2, 3), 494 on the interval [3, 4), and 629 on the interval [4, 5]. To balance the dataset, we extracted only a random subset of 57 reviews in each interval (all ratings on [1, 2) were used). Overall, we obtained a corpus matrix $X$ that was $228\times 1635$. The open right end of the intervals ensures data are not duplicated.
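The balancing step can be sketched as follows; `balance_by_interval` is a hypothetical helper name, and `n_per_bin=57` matches the size of the smallest rating interval reported above:

```python
import numpy as np

def balance_by_interval(ratings, n_per_bin=57, seed=0):
    # Subsample review indices so each rating interval [1,2), [2,3),
    # [3,4), [4,5] contributes exactly n_per_bin reviews.
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings)
    bins = [(1, 2), (2, 3), (3, 4), (4, 5)]
    keep = []
    for i, (lo, hi) in enumerate(bins):
        if i == len(bins) - 1:
            # last interval is closed on the right, so a rating of 5 is included
            idx = np.flatnonzero((ratings >= lo) & (ratings <= hi))
        else:
            idx = np.flatnonzero((ratings >= lo) & (ratings < hi))
        keep.extend(rng.choice(idx, size=n_per_bin, replace=False))
    return np.array(keep)
```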

#### 5.2. Choice of Topic Number and Regression Weight

#### 5.3. Prediction

#### 5.4. Topics Identified

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 1.** Illustration of the decreasing objective function at $r=3$ topics for $\lambda \in \{0\}\cup \{{10}^{i/2}\mid i\in \mathbb{Z}\cap [-2,2]\}$.

**Figure 2.** Regression errors with varying regression weight $\lambda$ for different numbers of topics $r$: Gaussian noise (**a**–**g**) and uniform noise (**h**–**n**). The $\lambda$ values used are the set $\{0\}\cup \{{10}^{i/2}\mid i\in \mathbb{Z}\cap [-8,8]\}$. For each $\lambda$ and $r$, fifty trials were run and the regression errors corresponding to the best overall objective function ${F}^{(\lambda )}$ were recorded. Points with $\lambda >0$ for which the regression error exceeds 1.5 times the regression error at $\lambda =0$ are not displayed. The dashed horizontal line is the estimated minimal mean regression error. The dashed vertical line is the transition point between a linear and logarithmic x-scale.

**Figure 3.** Regression errors at various noise levels $\eta$ with varying regression weight $\lambda$. The true number of topics is $r=4$. The top row (**a**–**c**) depicts the training errors, and the bottom row (**d**–**f**) depicts the testing errors. The first column (**a**,**d**) is for a low-rank approximation with $r=3$; the second column (**b**,**e**) is for an approximation of the correct rank $r=4$; and the third column (**c**,**f**) is for when the number of topics used, $r=5$, is larger than the true number of topics. For each $\lambda$, $\eta$, and $r$, fifty trials were run, and the regression errors corresponding to the best overall objective function ${F}^{(\lambda )}$ were recorded. Points with $\lambda >0$ for which the regression error exceeds 1.5 times the regression error at $\lambda =0$ are not displayed. The dashed vertical line is the transition point between a linear and logarithmic x-scale.

**Figure 4.** Errors in training and validation on the Rate My Professors dataset with $r=11$ topics. Points with $\lambda >0$ for which the regression error exceeds 1.5 times the regression error at $\lambda =0$ are not displayed. The dashed vertical line is the transition point between a linear and logarithmic x-scale.

**Figure 5.** Histograms of the predicted mean rating for various ranges of true ratings: [1, 2] in (**a**), [2, 3] in (**b**), [3, 4] in (**c**), and [4, 5] in (**d**). The vertical dashed lines represent the mean values. The predicted and true means are as follows: 2.206 and 1.543 for ratings in [1, 2]; 3.233 and 2.529 for ratings in [2, 3]; 3.594 and 3.593 for ratings in [3, 4] (the lines are indistinguishable); and 4.576 and 4.494 for ratings in [4, 5].

**Figure 7.** Topics with positive $\theta$-weights. The $\theta$-weight is given as the topic weight. The strength of each word is given numerically beside each of the top 10 words.

**Figure 8.** Topics with negative $\theta$-weights. The $\theta$-weight is given as the topic weight. The strength of each word is given numerically beside each of the top 10 words.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lindstrom, M.R.; Ding, X.; Liu, F.; Somayajula, A.; Needell, D.
Continuous Semi-Supervised Nonnegative Matrix Factorization. *Algorithms* **2023**, *16*, 187.
https://doi.org/10.3390/a16040187
