Article

Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM

1 Fraunhofer IAIS, 53757 Sankt Augustin, Germany
2 Department of Computer Science, University of Bonn, 53113 Bonn, Germany
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Mach. Learn. Knowl. Extr. 2021, 3(1), 123-167; https://doi.org/10.3390/make3010007
Submission received: 30 November 2020 / Revised: 8 January 2021 / Accepted: 13 January 2021 / Published: 19 January 2021
(This article belongs to the Special Issue Selected Papers from CD-MAKE 2020 and ARES 2020)

Abstract
Unsupervised topic extraction is a vital step in automatically extracting concise content information from large text corpora. Existing topic extraction methods lack the capability of linking relations between these topics, which would further help text understanding. Therefore, we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.

1. Introduction

Matrix factorization methods have always been a staple in many natural language processing (NLP) tasks. Factorizing a matrix of word co-occurrences can create both low-dimensional representations of the vocabulary, so-called word embeddings [1,2], that carry semantic and topical meaning within them, as well as representations of meaning that go beyond single words to latent topics.
Decomposition into Directional Components (DEDICOM) is a matrix factorization technique that factorizes a square, possibly asymmetric, matrix of relationships between items into a loading matrix of low-dimensional representations of each item and an affinity matrix describing the relationships between the dimensions of the latent representation (see Figure 1 for an illustration).
We introduce a modified row-stochastic variation of DEDICOM, which allows for interpretable loading vectors, and apply it to different matrices of word co-occurrence statistics created from Wikipedia-based semi-artificial text documents. Our algorithm produces low-dimensional word embeddings, where one can interpret each latent factor as a topic that clusters words into meaningful categories. Hence, we show that row-stochastic DEDICOM successfully combines the task of learning interpretable word embeddings and extracting representative topics.
We further derive a similar model for the factorization of three-dimensional data tensors, which represent word co-occurrence statistics for text corpora with an intrinsic structure that allows for some separation of the corpus into subsets (e.g., a news corpus structured by time).
An interesting aspect of this type of factorization is the interpretability of the affinity matrix. An entry in the matrix directly describes the relationship between the topics of the respective row and column and one can therefore use this tool to extract topics that a certain text corpus deals with and analyze how these topics are connected in the given text.
In this work we first describe the aforementioned DEDICOM algorithm and provide details on the modified row-stochasticity constraint and on optimization. We further expand our model to factorize three-dimensional tensors and introduce a multiplicative update rule that facilitates the training procedure. We then present results of various experiments on both semi-artificial text documents (combinations of Wikipedia articles) and real text documents (movie reviews and news articles) that show how our approach is able to capture hidden latent topics within text corpora, cluster words in a meaningful way and find relationships between these topics within the documents.
This paper is an extension of previous work [3]. In addition to the algorithms and experiments described there, we here add the extension of the DEDICOM algorithm to three-dimensional tensors, introduce a multiplicative update rule to increase training stability and present new experiments on two additional text corpora (Amazon reviews and New York Times news articles).

2. Related Work

Matrix factorization describes the task of compressing the most relevant information from a high-dimensional input matrix into multiple low-dimensional factor matrices, with either approximate or exact input reconstruction (see for example [4] for a theoretical overview of common methods and their applications). In this work we consider the DEDICOM algorithm, which has a long history of providing an interpretable matrix or tensor factorization, mostly for rather low-dimensional tasks.
First described in [5], it has since been applied to the analysis of social networks [6], email correspondence [7] and video game player behavior [8,9]. DEDICOM has also been successfully applied to NLP tasks such as part-of-speech tagging [10]; however, to the best of our knowledge, we provide the first implementation of DEDICOM for simultaneous word embedding learning and topic modeling.
Many works deal with the task of putting constraints on the factor matrices of the DEDICOM algorithm. In [7,8], the authors constrain the affinity matrix R to be non-negative, which aids interpretability and improves convergence behavior if the matrix to be factorized is non-negative. However, their approach relies on the Kronecker product between matrices in the update step, solving a linear system of size $n^2 \times k^2$, where n denotes the number of items in the input matrix and k the number of latent factors. These dimensions make the application to text data, where n describes the number of words in the vocabulary, a computationally futile task. Constraints on the loading matrix A include non-negativity (see [7]) and column-orthogonality (see [8]).
In contrast, we propose a new modified row-stochasticity constraint on A, which is tailored to generate interpretable word embeddings that carry semantic meaning and represent a probability distribution over latent topics.
The DEDICOM algorithm has previously been applied to tensor data as well, for example in [11], in which the authors apply the algorithm on general multirelational data by computing an exact solution for the affinity matrix. Both [6,7] explore a slight variation of our tensor DEDICOM approach to analyze relations in email data and [12] apply a similar model on non-square input tensors.
Previous matrix factorization based methods in the NLP context mostly dealt with either word embedding learning or topic modeling, but not with both tasks combined.
For word embeddings, the GloVe [2] model factorizes an adjusted co-occurrence matrix into two matrices of the same dimension. The work is based on a large text corpus with a vocabulary of $n = 400{,}000$ and produces word embeddings of dimension $k = 300$. In order to maximize performance on the word analogy task, the authors adjusted the co-occurrence matrix to the logarithmized co-occurrence matrix and added bias terms to the optimization objective.
A model conceived around the same time, word2vec [13], calculates word embeddings not from a co-occurrence matrix but directly from the text corpus using the skip-gram or continuous-bag-of-words approach. More recent work [1] has shown that this construction is equivalent to matrix factorization on the pointwise mutual information (PMI) matrix of the text corpus, which makes it very similar to the GloVe model described above.
Both models achieve impressive results on word embedding related tasks like word analogy; however, the large dimensionality of the word embeddings makes interpreting the latent factors of the embeddings impossible.
On the topic modeling side, matrix factorization methods are routinely applied as well. Popular algorithms like non-negative matrix factorization (NMF) [14], singular value decomposition (SVD) [15,16] and principal component analysis (PCA) [17] compete against the probabilistic latent Dirichlet allocation (LDA) [18] to cluster the vocabulary of a word co-occurrence or document-term matrix into latent topics. (More recent expansions of these methods can be found in [19,20].) Yet, we empirically show that the implicitly learned word embeddings of these methods lack semantic meaning in terms of the cosine similarity measure.
We benchmark our approach qualitatively against these methods in Section 4.3 and in Appendix A and Appendix B.

3. Constrained DEDICOM Models

In this section we provide a detailed theoretical view of the different constrained DEDICOM algorithms utilized for factorizing word co-occurrence based positive pointwise mutual information matrices and tensors.
We first consider the case of a two-dimensional input matrix S (see Figure 1a) in Section 3.1. We then present an extension of the algorithm for three-dimensional input tensors $\bar{S}$ (see Figure 1b) in Section 3.2. Finally, we derive a multiplicative update rule for non-negative tensor DEDICOM.

3.1. The Row-Stochastic DEDICOM Model for Matrices

For a given language corpus consisting of n unique words $X = \{x_1, \ldots, x_n\}$ we calculate a co-occurrence matrix $W \in \mathbb{R}^{n \times n}$ by iterating over the corpus on a word token level with a sliding context window of specified size. Then
$W_{ij} = \#\,\{\text{word } i \text{ appears in the context of word } j\}. \qquad (1)$
Note that the word context window can be applied symmetrically or asymmetrically around each word. We choose a symmetric context window, which implies a symmetric co-occurrence matrix, $W_{ij} = W_{ji}$.
We then transform the co-occurrence matrix into the pointwise mutual information (PMI) matrix, which normalizes the counts in order to extract meaningful co-occurrences from the matrix. Co-occurrences of words that occur regularly in the corpus are decreased, since their appearance together might be nothing more than a statistical phenomenon, while the co-occurrence of words that appear less often in the corpus gives us meaningful information about the relations between words and topics. We define the PMI matrix as
$\mathrm{PMI}_{ij} := \log W_{ij} + \log N - \log N_i - \log N_j, \qquad (2)$
where $N := \sum_{i,j=1}^{n} W_{ij}$ is the sum of all co-occurrence counts of W, $N_i := \sum_{j=1}^{n} W_{ij}$ the row sum and $N_j := \sum_{i=1}^{n} W_{ij}$ the column sum.
Since the co-occurrence matrix W is symmetrical, the transformed PMI matrix is symmetrical as well. Nevertheless, DEDICOM is able to factorize both symmetrical and non-symmetrical matrices. We expand details on symmetrical and non-symmetrical relationships in Section 3.3.
Additionally, we want all entries of the matrix to be non-negative; our final matrix to be factorized is therefore the positive PMI (PPMI)
$S_{ij} = \mathrm{PPMI}_{ij} = \max\{0, \mathrm{PMI}_{ij}\}. \qquad (3)$
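To make the transformation in Equations (2) and (3) concrete, the following is a minimal NumPy sketch of the PPMI construction, assuming a co-occurrence matrix W has already been computed; function and variable names are ours, not taken from the released implementation.

```python
import numpy as np

def ppmi_from_cooccurrence(W: np.ndarray) -> np.ndarray:
    """Transform a co-occurrence matrix W into a positive PMI matrix, cf. Equations (2) and (3)."""
    N = W.sum()                      # total co-occurrence count
    N_i = W.sum(axis=1)              # row sums
    N_j = W.sum(axis=0)              # column sums
    with np.errstate(divide="ignore"):   # log(0) -> -inf; such entries are clipped below
        pmi = np.log(W) + np.log(N) - np.log(N_i)[:, None] - np.log(N_j)[None, :]
    return np.maximum(pmi, 0.0)      # PPMI: clip negative (and -inf) entries to zero
```

The sketch assumes every vocabulary word co-occurs at least once, which holds when the vocabulary is restricted to the most frequent tokens.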
Our aim is to decompose this matrix using row-stochastic DEDICOM as
$S \approx A R A^T, \quad \text{with} \quad S_{ij} \approx \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} R_{bc} A_{jc}, \qquad (4)$
where $A \in \mathbb{R}^{n \times k}$, $R \in \mathbb{R}^{k \times k}$, $A^T$ denotes the transpose of A, and $k \ll n$. The literature often refers to A as the loading matrix and R as the affinity matrix. A gives us for each word i in the vocabulary a vector of size k, the number of latent topics we wish to extract. The square matrix R then provides the possibility to interpret the relationships between these topics.
Empirical evidence has shown that the algorithm tends to favor columns unevenly, such that a single column receives a lot more weight in its entries than the other columns. We try to balance this behavior by applying a column-wise z-normalization on A, such that all columns have zero mean and unit variance.
In order to aid interpretability we wish each word embedding to be a distribution over all latent topics, i.e., entry $A_{ib}$ in the word-embedding matrix provides information on how much topic b describes word i.
To implement these constraints we therefore apply a row-wise softmax operation over the column-wise z-normalized A matrix by defining $A' \in \mathbb{R}^{n \times k}$ as
$A'_{ib} := \frac{\exp(\bar{A}_{ib})}{\sum_{b'=1}^{k} \exp(\bar{A}_{ib'})}, \quad \bar{A}_{ib} := \frac{A_{ib} - \mu_b}{\sigma_b}, \quad \mu_b := \frac{1}{n}\sum_{i=1}^{n} A_{ib}, \quad \sigma_b := \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(A_{ib} - \mu_b\right)^2}, \qquad (5)$
and optimizing A for the objective
$S \approx A' R (A')^T. \qquad (6)$
Note that after applying the row-wise softmax operation all entries of $A'$ are non-negative.
To judge the quality of the approximation (6) we apply the Frobenius norm, which measures the difference between S and $A' R (A')^T$. The final loss function we optimize our model for is therefore given by
$L(S, A, R) = \left\| S - A' R (A')^T \right\|_F^2 \qquad (7)$
$= \sum_{i=1}^{n}\sum_{j=1}^{n} \left( S_{ij} - \left[ A' R (A')^T \right]_{ij} \right)^2 \qquad (8)$
with
$\left[ A' R (A')^T \right]_{ij} = \sum_{b=1}^{k}\sum_{c=1}^{k} A'_{ib} R_{bc} A'_{jc} \qquad (9)$
and $A'$ defined in (5).
To optimize the loss function we train both matrices using alternating gradient descent similar to [8]. Within each optimization step we apply
$A \leftarrow A - f_\theta(\nabla A, \eta_A), \quad \text{where} \quad \nabla A = \frac{\partial L(S, A, R)}{\partial A}, \qquad (10)$
$R \leftarrow R - f_\theta(\nabla R, \eta_R), \quad \text{where} \quad \nabla R = \frac{\partial L(S, A, R)}{\partial R}, \qquad (11)$
with $\eta_A, \eta_R > 0$ being individual learning rates for both matrices and $f_\theta(\cdot)$ representing an arbitrary gradient based update rule with additional hyperparameters $\theta$. For our experiments we employ automatic differentiation methods. For details on the implementation of the algorithm above refer to Section 4.2.
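To make the alternating scheme concrete, the following is a condensed PyTorch sketch, not the authors' released implementation; function names and defaults are ours (the learning rates and epoch count follow Section 4.2). It applies the row-stochasticity constraint (5) inside the loss and alternates Adam updates on A and R as in (10) and (11).

```python
import torch

def row_softmax(A):
    # Constraint (5): column-wise z-normalization followed by a row-wise softmax.
    A_bar = (A - A.mean(dim=0)) / A.std(dim=0, unbiased=False)
    return torch.softmax(A_bar, dim=1)

def train_row_stochastic_dedicom(S, k, epochs=15000, lr_A=1e-3, lr_R=1e-2):
    n = S.shape[0]
    A = (torch.rand(n, k) * 2).requires_grad_()              # U(0, 2) initialization
    R = (torch.rand(k, k) * 2 * S.mean()).requires_grad_()   # scaled by the element mean of S, cf. Eq. (40)
    opt_A = torch.optim.Adam([A], lr=lr_A)
    opt_R = torch.optim.Adam([R], lr=lr_R)
    for _ in range(epochs):
        # update A with R fixed
        opt_A.zero_grad()
        A_p = row_softmax(A)
        ((S - A_p @ R.detach() @ A_p.T) ** 2).sum().backward()
        opt_A.step()
        # update R with A fixed
        opt_R.zero_grad()
        A_p = row_softmax(A).detach()
        ((S - A_p @ R @ A_p.T) ** 2).sum().backward()
        opt_R.step()
    return row_softmax(A).detach(), R.detach()
```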

3.2. The Constrained DEDICOM Model for Tensors

In this section we extend the model described above to three-dimensional tensors as input data. As above, the input describes the co-occurrences of vocabulary items in a text corpus. However, we consider additionally structured text: instead of one matrix describing the entire corpus, we unite multiple $n \times n$ matrices of co-occurrences into one tensor $\bar{S} \in \mathbb{R}^{t \times n \times n}$. Each of the t slices then consists of an adjusted PPMI matrix for a subset of the text corpus. This structure could originate, for instance, from different data sources (e.g., different Wikipedia articles), from different topical subsets of the data source (e.g., reviews for different articles), or describe time slices (e.g., news articles for certain time periods).
To construct the PPMI tensor we again take a vocabulary $X = \{x_1, \ldots, x_n\}$ over the entire corpus. For each subset l we then calculate a co-occurrence matrix $\bar{W}_l \in \mathbb{R}^{n \times n}$ as described above. Stacking these matrices yields the co-occurrence tensor $\bar{W} \in \mathbb{R}^{t \times n \times n}$.
When transforming slice $\bar{W}_l$ into a PMI matrix we want to use information from the entire corpus. We therefore calculate the column, row and total sums not only on the corresponding subset but on the entire text corpus. Therefore
$\overline{\mathrm{PMI}}_{lij} := \log \bar{W}_{lij} + \log N - \log N_i - \log N_j, \qquad (12)$
where $N := \sum_{l=1}^{t}\sum_{i,j=1}^{n} \bar{W}_{lij}$ is the sum of all co-occurrence counts of $\bar{W}$, $N_i := \sum_{l=1}^{t}\sum_{j=1}^{n} \bar{W}_{lij}$ the row sum and $N_j := \sum_{l=1}^{t}\sum_{i=1}^{n} \bar{W}_{lij}$ the column sum.
Finally we define the positive pointwise mutual information tensor as
$\bar{S}_{lij} = \overline{\mathrm{PPMI}}_{lij} = \max\{0, \overline{\mathrm{PMI}}_{lij}\}. \qquad (13)$
We decompose this input tensor into a matrix $A \in \mathbb{R}^{n \times k}$ and a tensor $\bar{R} \in \mathbb{R}^{t \times k \times k}$, such that
$\bar{S} \approx A \bar{R} A^T, \qquad (14)$
where we multiply each slice of $\bar{R}$ with A and $A^T$ to reconstruct the corresponding slice of $\bar{S}$:
$\bar{S}_{lij} \approx \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} \bar{R}_{lbc} A_{jc}. \qquad (15)$
We keep our naming convention for A as the loading matrix and $\bar{R}$ as the affinity tensor, since again A gives us for each word i in the vocabulary a vector of size k, and for each slice l the square matrix $\bar{R}_l := (\bar{R}_{lij})_{i,j=1}^{k}$ provides information on the relationships between the topics in the l-th input slice.
Analogous to (7) we construct a loss function
$L(\bar{S}, A, \bar{R}) = \left\| \bar{S} - A \bar{R} A^T \right\|_F^2 \qquad (16)$
$= \sum_{l=1}^{t}\sum_{i=1}^{n}\sum_{j=1}^{n} \left( \bar{S}_{lij} - \left[ A \bar{R}_l A^T \right]_{ij} \right)^2 \qquad (17)$
$= \sum_{l=1}^{t} L(\bar{S}_l, A, \bar{R}_l) \qquad (18)$
with
$\left[ A \bar{R}_l A^T \right]_{ij} = \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} \bar{R}_{lbc} A_{jc}. \qquad (19)$
Note that in this framework, the DEDICOM algorithm described in the previous section is equivalent to tensor DEDICOM with t = 1 .
Update steps can then be taken via alternating gradient descent on A and $\bar{R}$. As in the previous section, one can now add additional constraints to A and $\bar{R}$ and calculate the gradients as in (10), using automatic differentiation methods. Taking update steps of size $\eta_A$ and $\eta_{\bar{R}}$, respectively, leads to an eventual convergence to some local or global minimum of the loss (16) with respect to the original or constrained A and $\bar{R}$.
Alternatively, constraints can be added to A and $\bar{R}$ by methods like projected gradient descent or the Frank–Wolfe algorithm [21], which either adjust the respective matrix or tensor to satisfy the constraints after the gradient step or modify the gradient step itself such that the matrix or tensor never leaves the constrained region.
However, empirical results show that automatic differentiation methods lead to slow and unstable training convergence and worse qualitative results when applying the mentioned constraints on the factor matrices and tensors. We therefore derive an alternative method of applying alternating gradient descent to A and $\bar{R}$ based on multiplicative update rules. This not only improves training stability and convergence behavior but also leads to better qualitative results (see Section 4.3 and Figure 2).
We derive the gradients for A and R analytically and set the learning rates $\eta_A$ and $\eta_{\bar{R}}$ individually: as $\eta^A_{ij}$ for each element (i, j) of matrix A and as $\eta^{\bar{R}}_{lij}$ for each element (l, i, j) of tensor $\bar{R}$, such that the resulting update step is an element-wise multiplication of the respective matrix or tensor.
We derive the updates for the matrix algorithm first and later extend them for the tensor case. For detailed derivations refer to Appendix B. For A we derive the gradient analytically as
$\frac{\partial L(S, A, R)}{\partial A} = -2 \left( S A R^T + S^T A R - A \left( R A^T A R^T + R^T A^T A R \right) \right). \qquad (20)$
Therefore the update step is
$A_{ij} \leftarrow A_{ij} + \eta^A_{ij} \, 2 \left( \left[ S A R^T + S^T A R \right]_{ij} - \left[ A \left( R A^T A R^T + R^T A^T A R \right) \right]_{ij} \right). \qquad (21)$
If we now choose $\eta^A$ as
$\eta^A_{ij} := \frac{A_{ij}}{2 \left[ A \left( R^T A^T A R + R A^T A R^T \right) \right]_{ij}}, \qquad (22)$
the update (21) becomes
$A_{ij} \leftarrow A_{ij} \, \frac{\left[ S^T A R + S A R^T \right]_{ij}}{\left[ A \left( R^T A^T A R + R A^T A R^T \right) \right]_{ij}}. \qquad (23)$
For R we derive the gradient analytically as
$\frac{\partial L(S, A, R)}{\partial R} = -2 \left( A^T S A - A^T A R A^T A \right). \qquad (24)$
Therefore the update step is
$R_{ij} \leftarrow R_{ij} + \eta^R_{ij} \, 2 \left( \left[ A^T S A \right]_{ij} - \left[ A^T A R A^T A \right]_{ij} \right). \qquad (25)$
Choose
$\eta^R_{ij} := \frac{R_{ij}}{2 \left[ A^T A R A^T A \right]_{ij}}, \qquad (26)$
and the update (25) becomes
$R_{ij} \leftarrow R_{ij} \, \frac{\left[ A^T S A \right]_{ij}}{\left[ A^T A R A^T A \right]_{ij}}. \qquad (27)$
Since $S_{ij} \geq 0$ for all i, j, in both (23) and (27) each element of the multiplier matrix is positive if both $A > 0$ and $R > 0$ in all entries. Therefore, initializing both matrices with positive values results in an update step that keeps the elements of A and R positive.
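As an illustration, the following is a minimal NumPy sketch of the matrix-case multiplicative updates (23) and (27); the small eps term guarding against division by zero and all names are our additions.

```python
import numpy as np

def dedicom_multiplicative(S, k, epochs=300, eps=1e-9, seed=0):
    """Sketch of the multiplicative DEDICOM updates (23) and (27) for a non-negative matrix S."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.uniform(0, 2, size=(n, k))   # positive initialization keeps all entries positive
    R = rng.uniform(0, 2, size=(k, k))
    for _ in range(epochs):
        AtA = A.T @ A
        # update A, Equation (23)
        numer_A = S.T @ A @ R + S @ A @ R.T
        denom_A = A @ (R.T @ AtA @ R + R @ AtA @ R.T) + eps
        A *= numer_A / denom_A
        # update R, Equation (27), using the freshly updated A
        AtA = A.T @ A
        R *= (A.T @ S @ A) / (AtA @ R @ AtA + eps)
    return A, R
```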
To extend this rule to tensor DEDICOM, note that the analytical derivatives carry over to $\bar{R}$ and A by considering each slice $\bar{S}_l$ and $\bar{R}_l$ individually:
$\frac{\partial L(\bar{S}_l, A, \bar{R}_l)}{\partial \bar{R}_l} = -2 \left( A^T \bar{S}_l A - A^T A \bar{R}_l A^T A \right), \qquad (28)$
$\frac{\partial L(\bar{S}_l, A, \bar{R}_l)}{\partial A} = -2 \left( \bar{S}_l^T A \bar{R}_l + \bar{S}_l A \bar{R}_l^T - A \left( \bar{R}_l^T A^T A \bar{R}_l + \bar{R}_l A^T A \bar{R}_l^T \right) \right). \qquad (29)$
Since by (18) we have $L(\bar{S}, A, \bar{R}) = \sum_{l=1}^{t} L(\bar{S}_l, A, \bar{R}_l)$, we can derive the full gradients as
$\frac{\partial L(\bar{S}, A, \bar{R})}{\partial \bar{R}} = -2 \left( A^T \bar{S} A - A^T A \bar{R} A^T A \right), \qquad (30)$
$\frac{\partial L(\bar{S}, A, \bar{R})}{\partial A} = -\sum_{l=1}^{t} 2 \left( \bar{S}_l^T A \bar{R}_l + \bar{S}_l A \bar{R}_l^T - A \left( \bar{R}_l^T A^T A \bar{R}_l + \bar{R}_l A^T A \bar{R}_l^T \right) \right). \qquad (31)$
For A we set $\eta^A$ as
$\eta^A_{ij} := \frac{A_{ij}}{2 \sum_{l=1}^{t} \left[ A \left( \bar{R}_l^T A^T A \bar{R}_l + \bar{R}_l A^T A \bar{R}_l^T \right) \right]_{ij}}. \qquad (32)$
Then the update step is
$A_{ij} \leftarrow A_{ij} - \eta^A_{ij} \, \frac{\partial L(\bar{S}, A, \bar{R})}{\partial A_{ij}} \qquad (33)$
$= A_{ij} \, \frac{\sum_{l=1}^{t} \left[ \bar{S}_l^T A \bar{R}_l + \bar{S}_l A \bar{R}_l^T \right]_{ij}}{\sum_{l=1}^{t} \left[ A \left( \bar{R}_l^T A^T A \bar{R}_l + \bar{R}_l A^T A \bar{R}_l^T \right) \right]_{ij}}. \qquad (34)$
For R ¯ we again set
$\eta^{\bar{R}}_{lij} := \frac{\bar{R}_{lij}}{2 \left[ A^T A \bar{R}_l A^T A \right]_{ij}}, \qquad (35)$
and the update (25) becomes
$\bar{R}_{lij} \leftarrow \bar{R}_{lij} \, \frac{\left[ A^T \bar{S}_l A \right]_{ij}}{\left[ A^T A \bar{R}_l A^T A \right]_{ij}}. \qquad (36)$
Equations (23) and (27) provide multiplicative update rules that ensure the non-negativity of A and R without any additional constraints. Equations (33) and (36) provide the corresponding rules for matrix A and tensor $\bar{R}$ in tensor DEDICOM.

3.3. On Symmetry

The DEDICOM algorithm is able to factorize both symmetrical and asymmetrical matrices S. For a given matrix A, the symmetry of R dictates the symmetry of the product $A R A^T$, since
$(A R A^T)_{ij} = \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} R_{bc} A_{jc} = \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} R_{cb} A_{jc} \qquad (37)$
$= \sum_{c=1}^{k}\sum_{b=1}^{k} A_{jc} R_{cb} A_{ib} = (A R A^T)_{ji} \qquad (38)$
if $R_{cb} = R_{bc}$ for all b, c. We therefore expect a symmetric matrix S to be decomposed into $A R A^T$ with a symmetric R, which is confirmed by our experiments. Factorizing a non-symmetric matrix leads to a non-symmetric R: the asymmetric relations between items lead to asymmetric relations between the latent factors. The same relations hold for each slice $\bar{S}_l$ and $\bar{R}_l$ in tensor DEDICOM.

3.4. On Interpretability

We have
$S_{ij} \approx \sum_{b=1}^{k}\sum_{c=1}^{k} A_{ib} R_{bc} A_{jc}, \qquad (39)$
i.e., we can estimate the probability of co-occurrence of two words $w_i$ and $w_j$ from the word embeddings $A_i$ and $A_j$ and the matrix R, where $A_i$ denotes the i-th row of A.
If we want to predict the co-occurrence between words $w_i$ and $w_j$, we consider the latent topics that make up the word embeddings $A_i$ and $A_j$, and sum up each component of $A_i$ with each component of $A_j$, weighted by the relationship weights given in R.
Two words are likely to have a high co-occurrence if their word embeddings have larger weights in topics that are positively connected by the R matrix. Likewise, a negative entry $R_{bc}$ makes it less likely for words with high weight in the topics b and c to occur in the same context. See Figure 3 for an illustrated example.
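As a toy numerical illustration of Equation (39) with made-up values (k = 3, not taken from Figure 3): two words that load on topics which R connects positively receive a high predicted co-occurrence.

```python
import numpy as np

# Hypothetical 3-topic example: word i loads mostly on topic 0, word j on topic 2.
A_i = np.array([0.8, 0.1, 0.1])
A_j = np.array([0.1, 0.1, 0.8])
R = np.array([[ 1.0, -0.2,  0.9],    # topics 0 and 2 are positively related (R[0, 2] = 0.9),
              [-0.2,  1.0, -0.5],    # so words concentrated on these two topics are predicted
              [ 0.9, -0.5,  1.0]])   # to co-occur frequently
print(A_i @ R @ A_j)                 # approx. 0.69 -> high predicted co-occurrence
```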
Having an interpretable embedding model provides value beyond analysis of the affinity matrix of a single document. The worth of word embeddings is generally measured in their usefulness for downstream tasks. Given a prediction model based on word embeddings as one of the inputs, further analysis of the model behavior is facilitated when latent input dimensions easily translate to semantic meaning.
In most word embedding models, the embedding vector of a single word is not particularly useful in itself. The information only lies in its relationship (i.e., closeness or cosine similarity) to other embedding vectors. For example, an analysis of the change of word embeddings and therefore the change of word meaning within a document corpus (for example a news article corpus) can only show how various words form different clusters or drift apart over time. Interpretability of latent dimensions would provide tools to also consider the development of single words within the given topics.
All considerations above hold for the three-dimensional tensor case, in which we analyze a slice $\bar{R}_l$ together with the common word embedding matrix A to gain insight into the input data slice $\bar{S}_l$.

4. Experiments and Results

In the following section we describe our experimental setup in full detail (our Python implementation to reproduce the results is available at https://github.com/LarsHill/text-dedicom-paper; additionally, we provide a snapshot of our versions of the applied public datasets, i.e., the Wikipedia articles and Amazon reviews) and present our results on the simultaneous topic (relation) extraction and word embedding learning task. We compare these results against competing matrix and tensor factorization methods for topic modeling, namely NMF (including a Tucker-2 variation compatible with tensors), LDA and SVD.

4.1. Data

We conducted our experiments on three orthogonal text datasets which cover different text domains and allow for a thorough empirical analysis of our proposed methods.
The first corpus leveraged triplets of individual Wikipedia articles. The articles were retrieved as raw text via the official Wikipedia API using the wikipedia-api library. We differentiated between thematically similar (e.g., “Dolphin” and “Whale”) and thematically different articles (e.g., “Soccer” and “Donald Trump”). Each article triplet was categorized into one of three classes: All underlying Wikipedia articles were thematically different, two articles were thematically similar and one was different, and all articles were thematically similar. The previous paper [3] contained an extensive evaluation over 12 triples of articles in the supplementary material. In this work we focused on the three triples described in the previous main paper, namely
  • “Soccer”, “Bee”, “Johnny Depp”,
  • “Dolphin”, “Shark”, “Whale”, and
  • “Soccer”, “Tennis”, “Rugby”.
Depending on whether the article triplets were represented as an input matrix or tensor, they were processed differently. In the case of a matrix input, all three articles were concatenated to form a new artificially generated document. In the case of a tensor input, the articles remained individual documents, which later represented slices in the tensor representation.
To analyze the topic extraction capability of constrained DEDICOM also on text that is more prone to grammatical and syntactical errors, we utilized a subset of the Amazon review dataset [22]. In particular, we restricted ourselves to the “movie” product category and created a corpus consisting of six text documents holding the concatenated reviews of the films “Toy Story 1”, “Toy Story 3”, “Frozen”, “Monsters, Inc.”, “Kung Fu Panda” and “Kung Fu Panda 2”, respectively. Grouping the reviews by movie affiliation enabled us to generate a tensor representation of the corpus, which we factorized via non-negative tensor DEDICOM to analyze topic relations across movies. Table 1 lists the number of reviews per movie and shows that, based on review count, “Kung Fu Panda 1” was the most popular among the six films.
The third corpus represented a complete collection of New York Times news articles ranging from 1st September 2019 to 31st August 2020. The articles were taken from the New York Times website and covered a wide range of sections (see Table 2).
Instead of grouping the articles by section we binned and concatenated them by month yielding 12 news documents containing monthly information (see Table 3 for details on the article count per month). Thereby, the factorization of tensor DEDICOM allowed for an analysis of topic relations and their changes over time.
Before transforming the text documents into matrix or tensor representations we applied the following textual preprocessing steps. First, the whole text was lower-cased. Second, we tokenized the text using the word tokenizer from the nltk library and removed common English stop words, including contractions such as “you’re” and “we’ll”. Lastly, we removed all remaining punctuation and deleted digits, single characters and multi-spaces (see Table 4 for an overview of corpora statistics after preprocessing).
Next, we utilized all preprocessed documents in a corpus to extract a fixed-size vocabulary of the n = 10,000 most frequent tokens. Since our dense input tensor was of dimensionality $t \times n \times n$, a larger vocabulary size would have led to a significant increase in memory consumption. Based on the total number of unique corpus words reported in Table 4, a maximum vocabulary size of n = 10,000 was reasonable for the three Wikipedia corpora and the Amazon reviews corpus. Only the New York Times dataset could potentially have benefited from a larger vocabulary size.
Based on this vocabulary, a symmetric word co-occurrence matrix was calculated for each of the corpus documents. When generating the matrix we only considered context words within a symmetric window around the base word. Analysis in [2,3] showed that window sizes in the range of 6 to 10 had little impact on performance. Thus, following our implementation in [3], we chose a window size of 7, the default in the original GloVe implementation. As in [2], each context word only contributed 1/d to the total word pair count, given it was d words apart from the base word. To avoid any bias or prior information from the structure and order of the concatenated Wikipedia articles, reviews or news articles, we randomly shuffled the vocabulary before creating the co-occurrence matrix. As described in Section 3, we then transformed the co-occurrence matrix to a positive PMI matrix. If the corpus consisted of just one document, the generated PPMI matrix functioned as input for the row-stochastic DEDICOM algorithm. If the corpus consisted of several documents (e.g., one news document per month), the individual PPMI matrices were stacked into a tensor, which in turn represented the input for the non-negative tensor DEDICOM algorithm.
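The following is a short sketch of the windowed co-occurrence construction just described (symmetric window of size 7, 1/d weighting); the tokenizer output, function name and vocabulary handling are ours.

```python
import numpy as np

def cooccurrence_matrix(tokens, vocab, window=7):
    """Symmetric co-occurrence counts with 1/d weighting for a context word d positions away."""
    word2id = {w: i for i, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        if word not in word2id:
            continue
        i = word2id[word]
        for d in range(1, window + 1):          # only look forward; add both directions below
            if pos + d >= len(tokens) or tokens[pos + d] not in word2id:
                continue
            j = word2id[tokens[pos + d]]
            W[i, j] += 1.0 / d                  # each pair contributes 1/d in both directions
            W[j, i] += 1.0 / d
    return W
```

The resulting matrix can then be passed to the PPMI transformation sketched in Section 3.1.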
The next section sheds light upon the training process of row-stochastic DEDICOM, non-negative tensor DEDICOM and the above mentioned competing matrix and tensor factorization methods, which will be benchmarked against our results in Section 4.3 and in the Appendix A and Appendix B.

4.2. Training

As thoroughly outlined in Section 3 we trained both the row-stochastic DEDICOM and non-negative tensor DEDICOM with the alternating gradient descent paradigm.
In the case of a matrix input and a row-stochasticity constraint on A, we utilized automatic differentiation from the PyTorch library to perform update steps on A and R. First, we initialized the factor matrices $A \in \mathbb{R}^{n \times k}$ and $R \in \mathbb{R}^{k \times k}$ by randomly sampling all elements from a uniform distribution centered around 1, $U(0, 2)$. Note that after applying the softmax operation on A, all rows of $A'$ were stochastic. Therefore, scaling R by
$\bar{s} := \frac{1}{n^2} \sum_{i,j}^{n} S_{ij}, \qquad (40)$
would result in the initial decomposition $A' R (A')^T$ yielding reconstructed elements in the range of $\bar{s}$, the element mean of the PPMI matrix S, thus speeding up convergence. Second, A and R were iteratively updated employing the Adam optimizer [23] with constant individual learning rates of $\eta_A = 0.001$ and $\eta_R = 0.01$ and hyperparameters $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 1 \times 10^{-8}$. Both learning rates were identified through an exhaustive grid search. We trained for num_epochs = 15,000 until convergence, where each epoch consisted of an alternating gradient update with respect to A and R. Algorithm 1 illustrates the training procedure just described.
Algorithm 1 The row-stochastic DEDICOM algorithm
1: initialize $A, R \sim U(0, 2) \cdot \bar{s}$  ⊳ See Equation (40) for the definition of $\bar{s}$
2: initialize $\beta_1, \beta_2, \epsilon$  ⊳ Adam algorithm hyperparameters
3: initialize $\eta_A, \eta_R$  ⊳ Individual learning rates
4: for i in 1, …, num_epochs do
5:    Calculate loss $L = L(S, A, R)$  ⊳ See Equation (7)
6:    $A \leftarrow A - \mathrm{Adam}_{\beta_1, \beta_2, \epsilon}(\nabla A, \eta_A)$, where $\nabla A = \frac{\partial L}{\partial A}$
7:    $R \leftarrow R - \mathrm{Adam}_{\beta_1, \beta_2, \epsilon}(\nabla R, \eta_R)$, where $\nabla R = \frac{\partial L}{\partial R}$
8: return $A'$ and R, where $A' = \mathrm{row\_softmax}(\mathrm{col\_norm}(A))$  ⊳ See Equation (5)
In the case of a tensor input and an additional non-negativity constraint on $\bar{R}$, we noticed inferior training performance with automatic differentiation methods. Hence, due to faster and more stable training convergence and improved qualitative results, we updated A and $\bar{R}$ iteratively via the derived multiplicative update rules enforcing non-negativity. Again, we initialized $A \in \mathbb{R}^{n \times k}$ and $\bar{R} \in \mathbb{R}^{t \times k \times k}$ by randomly sampling all elements from a uniform distribution centered around 1, $U(0, 2)$. In order to ensure that the initialized components yielded a reconstructed tensor whose elements were in the same range as the input, we calculated an appropriate scaling factor for each tensor slice $\bar{S}_l$ as
$\alpha_l := \left( \frac{\bar{s}_l}{k^2} \right)^{\frac{1}{3}}, \quad \text{where} \quad \bar{s}_l := \frac{1}{n^2} \sum_{i,j}^{n} \bar{S}_{lij}. \qquad (41)$
Next, we scaled A by $\bar{\alpha} = \frac{1}{t} \sum_{l=1}^{t} \alpha_l$ and each slice $\bar{R}_l$ by $\alpha_l$ before starting the alternating multiplicative update steps for num_epochs = 300. The detailed derivation of the update rules is found in Section 3.2 and their iterative application in the training process is described in Algorithm 2.
Algorithm 2 The non-negative tensor DEDICOM algorithm
1: initialize $A, \bar{R} \sim U(0, 2)$
2: scale A by $\bar{\alpha}$ and $\bar{R}_l$ by $\alpha_l$  ⊳ See Equation (41) for the definitions of $\bar{\alpha}$ and $\alpha_l$
3: for i in 1, …, num_epochs do
4:    Calculate loss $L = L(\bar{S}, A, \bar{R})$  ⊳ See Equation (17)
5:    $A_{ij} \leftarrow A_{ij} \, \frac{\left[ \sum_{l=1}^{t} \left( \bar{S}_l A \bar{R}_l^T + \bar{S}_l^T A \bar{R}_l \right) \right]_{ij}}{\left[ A \sum_{l=1}^{t} \left( \bar{R}_l A^T A \bar{R}_l^T + \bar{R}_l^T A^T A \bar{R}_l \right) \right]_{ij}}$
6:    $\bar{R}_{lij} \leftarrow \bar{R}_{lij} \, \frac{\left[ A^T \bar{S}_l A \right]_{ij}}{\left[ A^T A \bar{R}_l A^T A \right]_{ij}}$
7: return A and $\bar{R}$
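For illustration, the following is a NumPy sketch mirroring Algorithm 2 above, assuming a dense PPMI tensor S of shape (t, n, n); the eps safeguard and all names are ours, not the authors' released code.

```python
import numpy as np

def tensor_dedicom(S, k, epochs=300, eps=1e-9, seed=0):
    """Non-negative tensor DEDICOM via multiplicative updates; returns A (n, k) and R (t, k, k)."""
    rng = np.random.default_rng(seed)
    t, n, _ = S.shape
    A = rng.uniform(0, 2, size=(n, k))
    R = rng.uniform(0, 2, size=(t, k, k))
    s_bar = S.reshape(t, -1).mean(axis=1)        # per-slice element means
    alpha = (s_bar / k ** 2) ** (1 / 3)          # Equation (41)
    A *= alpha.mean()                            # scale A by the mean of the slice factors
    R *= alpha[:, None, None]                    # scale each slice of R by its own factor
    for _ in range(epochs):
        AtA = A.T @ A
        num = sum(S[l].T @ A @ R[l] + S[l] @ A @ R[l].T for l in range(t))
        den = A @ sum(R[l].T @ AtA @ R[l] + R[l] @ AtA @ R[l].T for l in range(t)) + eps
        A *= num / den                           # update (33)/(34)
        AtA = A.T @ A
        for l in range(t):
            R[l] *= (A.T @ S[l] @ A) / (AtA @ R[l] @ AtA + eps)   # update (36)
    return A, R
```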
We implemented NMF, LDA and SVD using the sklearn library. In all cases the learnable factor matrices were initialized randomly and default hyperparameters were applied during training. For NMF the multiplicative update rule from [14] was utilized.
Figure 4 shows the convergence behavior of the row-stochastic matrix DEDICOM training process and the final loss of NMF and SVD. Note that LDA optimizes a different loss function, which is why its calculated loss is not comparable and is therefore excluded. We see that the final loss of DEDICOM was located just above the other losses, which is reasonable when considering the row-stochasticity constraint on A and the reduced parameter count of $nk + k^2$ compared to NMF ($2nk$) and SVD ($2nk + k^2$).
To also have a benchmark model for our constrained tensor DEDICOM methods to compare against, we implemented a Tucker-2 variation of NMF, named tensor NMF (TNMF), which factorized the input tensor $\bar{S}$ as
$\bar{S}_l \approx \bar{W}_l H. \qquad (42)$
Its training procedure closely followed the above described alternating gradient descent approach for non-negative tensor DEDICOM. However, due to the two-way factorization (three-way for DEDICOM), the scaling factor $\alpha_l$ to properly initialize $\bar{W}$ and H had to be adapted to
$\alpha_l := \left( \frac{\bar{s}_l}{k} \right)^{\frac{1}{2}}, \quad \text{where} \quad \bar{s}_l := \frac{1}{n^2} \sum_{i,j}^{n} \bar{S}_{lij}. \qquad (43)$
Analogous to Figure 4, we compared the training stability and convergence speed of our implemented tensor factorization methods. In particular, Figure 2 visualizes the reconstruction loss development for non-negative tensor DEDICOM trained via multiplicative update rules, row-stochastic tensor DEDICOM trained with automatic differentiation and the Adam optimizer, and tensor NMF. It could be clearly observed that row-stochastic tensor DEDICOM converged much more slowly than the other two models, which were trained with multiplicative update rules (where learning rates are implicit and did not have to be tuned).

4.3. Results

In the following, we present our results of training the above mentioned constrained DEDICOM factorizations on different text corpora to simultaneously learn interpretable word embeddings and meaningful topic clusters and their relations.
First, we focused our analysis on row-stochastic matrix DEDICOM applied to the synthetic Wikipedia text documents described in Section 4.1. For compactness reasons we primarily considered the document “Soccer, Bee and Johnny Depp”, set the number of topics to k = 6 and refer to Appendix A.1 for the other article combinations and competing matrix factorization results. Second, we extended our evaluation to the tensor representation of the Wikipedia documents ( t = 3 , one article per tensor slice) and compared the performance of non-negative (multiplicative updates) and row-stochastic (Adam updates) tensor DEDICOM. Lastly, we applied non-negative tensor DEDICOM to the binned Amazon movie and New York Times news corpora to investigate topic relations across movies and over time. We again point the interested reader to Appendix A for additional results and the comparison to tensor NMF.
In the first step, we evaluated the quality of the learned latent topics by assigning each word embedding $A_i \in \mathbb{R}^{1 \times k}$ to the latent topic dimension that holds the maximum value in $A_i$, e.g.,
$A_i = \begin{pmatrix} 0.05 & 0.03 & 0.02 & 0.14 & 0.70 & 0.06 \end{pmatrix}, \quad \operatorname{argmax}(A_i) = 5, \qquad (44)$
and thus, $A_i$ was matched to Topic 5. Next, we sorted the words within each topic in decreasing order of their matched topic probability. Table 5 shows the overall number of allocated words and the resulting top 10 words per topic, together with each matched probability.
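As a small sketch of this assignment and ranking step (A here stands for the learned row-stochastic embedding matrix and vocab for the ordered vocabulary; names are ours):

```python
import numpy as np

def topics_from_embeddings(A, vocab, top_n=10):
    """Assign each word to its argmax topic and rank words within each topic by that probability."""
    assignments = A.argmax(axis=1)                    # hard assignment as in Equation (44)
    topics = {}
    for topic in range(A.shape[1]):
        idx = np.where(assignments == topic)[0]
        ranked = idx[np.argsort(-A[idx, topic])]      # sort descending by matched probability
        topics[topic] = [(vocab[i], float(A[i, topic])) for i in ranked[:top_n]]
    return topics
```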
Indicated by the high assignment probabilities, one can see that columns 1, 2, 4, 5 and 6 represent distinct topics, which can easily be interpreted. Topics 1 and 4 were related to soccer, where 1 focused on the game mechanics and 4 on the organizational and professional aspect of the game. Topics 2 and 6 clearly referred to Johnny Depp, where 2 focused on his acting career and 6 on his difficult relationship with Amber Heard. The fifth topic obviously related to the insect “bee”. In contrast, Topic 3 did not allow for any interpretation, and all assignment probabilities were significantly lower than for the other topics.
Further, we analyzed the relations between the topics by visualizing the trained R matrix as a heatmap (see Figure 5c).
One thing to note was the symmetry of R, which was a first indicator of a successful reconstruction $S \approx A R A^T$ (see Section 3.3). In addition, the main diagonal elements were consistently blue (positive), which suggested a high distinction between the topics. Although not very strong, one could still see a connection between Topics 2 and 6, indicated by the light blue entry $R_{26} = R_{62}$. While the suggested relation between Topics 1 and 4 was not clearly visible, element $R_{14} = R_{41}$ was the least negative one for Topic 1. In order to visualize the topic cluster quality, we utilized Uniform Manifold Approximation and Projection (UMAP) [24] to map the k-dimensional word embeddings to a 2-dimensional space. Figure 5a illustrates this low-dimensional representation of A, where each word is colored based on the above described word-to-topic assignment. In conjunction with Table 5 one could nicely see that Topics 2 and 6 (Johnny Depp) and Topics 1 and 4 (Soccer) were close to each other. Hence, Figure 5a implicitly shows the learned topic relations as well.
As an additional benchmark, Figure 5b plots the same 2-dimensional representation, but now each word is colored based on the original Wikipedia article it belonged to. Words that occurred in more than one article were not considered in this plot.
Directly comparing Figure 5a,b shows that row-stochastic DEDICOM not only recovered the original articles but also found entirely new topics, which in this case represented subtopics of the articles. Let us emphasize that for all thematically similar article combinations, the found topics were usually not subtopics of a single article, but rather novel topics that might span multiple Wikipedia articles (see for example Table A2 in the Appendix A). As mentioned at the top of this section, we are not only interested in learning meaningful topic clusters, but also in training interpretable word embeddings that capture semantic meaning.
Hence, we selected within each topic the two most representative words and calculated the cosine similarity between their word embeddings and all other word embeddings stored in A. Table 6 shows the four nearest neighbors based on cosine similarity for the top two words in each topic. We observed a high thematic similarity between words with large cosine similarity, indicating the usefulness of the rows of A as word embeddings.
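A sketch of the cosine-similarity lookup used for this evaluation is given below; the function name and interface are ours.

```python
import numpy as np

def nearest_neighbors(A, vocab, query, top_n=4):
    """Return the top_n words whose embeddings (rows of A) have the highest cosine similarity to `query`."""
    word2id = {w: i for i, w in enumerate(vocab)}
    A_norm = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-length rows
    sims = A_norm @ A_norm[word2id[query]]                  # cosine similarity to the query word
    order = np.argsort(-sims)                               # descending similarity
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != query][:top_n]
```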
In comparison to DEDICOM, other matrix factorization methods also provided a useful clustering of words into topics, with varying degrees of granularity and clarity. However, the application of these methods as word embedding algorithms mostly failed on the word similarity task, with words close in cosine similarity seldom sharing the thematic similarity we have seen in DEDICOM. This can be seen in Table A1, which shows for each method, NMF, LDA and SVD, the resulting word-to-topic clustering and the cosine nearest neighbors of the top two word embeddings per topic. While the individual topics extracted by NMF looked very reasonable, its word embeddings did not seem to carry any semantic meaning based on cosine similarity; e.g., the four nearest neighbors of “ball” were “invoke”, “replaced”, “scores” and “subdivided”. A similarly nonsensical picture can be observed for the other main topic words. LDA and SVD performed slightly better on the similar word task, although not all similar words appeared to be sensible, e.g., “children”, “detective”, “crime”, “magazine” and “barber”. In addition, some topics could not be clearly defined due to mixed word assignments, e.g., Topic 4 for LDA and Topic 1 for SVD.
Before shifting our analysis to the Amazon movie review and the New York Times news corpus we investigated factorizing the tensor representation of the “Soccer, Bee and Johnny Depp” Wikipedia documents. In particular, we compared the qualitative factorization results of row-stochastic and non-negative tensor DEDICOM trained with automatic differentiation and multiplicative update rules, respectively. Table 7 and Table 8 in conjunction with Figure 6 and Figure 7 show the extracted topics and their relations for both methods.
It could be seen that non-negative tensor DEDICOM yielded a more interpretable affinity tensor $\bar{R}$ (Figure 7) due to its enforced non-negativity. For example, it clearly highlighted the bee-related Topics 1, 3 and 5 in the affinity tensor slice corresponding to the article “Bee”. Moreover, all extracted topics in Table 8 were distinct and their relations were well represented in the individual slices of $\bar{R}$. In contrast, Topic 6 in Table 7 did not represent a meaningful topic, which was also indicated by the low probability scores of the ranked topic words. Although the results of the similar word evaluation were arguably better for row-stochastic tensor DEDICOM (see Table 9 and Table 10), we prioritized topic extraction and relation quality. That is why, in the further analysis of the Amazon review and New York Times news corpora, we restricted our evaluation to non-negative tensor DEDICOM.
As described in Section 4.1, our Amazon movie review corpus comprised human-written reviews for six famous animation films. Factorizing its PPMI tensor representation with non-negative tensor DEDICOM and the number of topics set to k = 10 revealed not only movie-specific subtopics but also general topics that spanned several movies. For example, Topics 1, 9 and 10 in Table 11 could uniquely be related to the films “Frozen”, “Toy Story 1” and “Kung Fu Panda 1”, respectively, whereas Topic 5 concerned bonus material on a DVD, which held true for all films. The latter could also be seen in Figure 8, where Topic 5 was highlighted in each movie slice (strongly in the top and lightly in the bottom row). In the same sense, one could observe that Topic 3 was present in both “Kung Fu Panda 1” and “Kung Fu Panda 2”, which is reasonable considering the topic depicted the general notion of a fearsome warrior.
Figure 9 and Table 12 refer to our experimental results on the dataset of New York Times news articles. We saw a diverse array of topics extracted from the text corpus, ranging from US-politics (Topics 4, 6, 7) to natural disasters (Topic 8), Hollywood sexual assault allegations (Topic 10) and the COVID epidemic both from a medical view (Topic 3) and a view on resulting restrictions to businesses (Topic 9).
The corresponding heatmap allowed us to infer when certain topics were most relevant during the past year. While the entries relating to the COVID pandemic remained light blue for the first half of the heatmap, we saw the articles picking up on the topic around March 2020, when the effects of the Coronavirus started hitting the US. Even comparatively smaller events, like the conviction of Harvey Weinstein and the death of George Floyd triggering the racism debate in the US, could be recognized in the heatmap, with a large deviation of Topic 10 around February 2020 and Topic 4 around June 2020.
Further empirical results on the Amazon review and New York Times news corpora, such as two-dimensional UMAP representations of the embedding matrix A and extracted topics from tensor NMF, can be found in Appendix A.3 and Appendix A.4, respectively. For example, Table A21 shows that the tensor NMF factorization also extracted high quality topics but lacked the interpretable affinity tensor $\bar{R}$, which is crucial in order to properly comprehend a topic development over time.

5. Conclusions and Outlook

We propose a constrained version of the DEDICOM algorithm that is able to factorize the pointwise mutual information matrices of text documents into meaningful topic clusters all the while providing interpretable word embeddings for each vocabulary item. Our study on semi-artificial data from Wikipedia articles has shown that this method recovers the underlying structure of the text corpus and provides topics with thematic granularity, meaning the extracted latent topics are more specific than a simple clustering of articles. A comparison to related matrix factorization methods has shown that the combination of relation aware topic modeling and interpretable word embedding learning given by our algorithm is unique in its class.
Extending this algorithm to factorize three-dimensional input tensors allows for the study of changes in the relations between topics across subsets of a structured text corpus, e.g., news articles grouped by time period. Algorithmically, this can be solved via alternating gradient descent, either by automatic gradient methods or by applying multiplicative update rules, which decrease training time drastically and enhance training stability.
Due to memory constraints from matrix multiplications of high dimensional dense tensors our proposed approach is currently limited in vocabulary size or time dimension.
In further work we aim to develop algorithms capable of leveraging sparse matrix multiplications to avoid the above mentioned memory constraints. In addition, we plan to expand on the possibilities of constraining the factor matrices and tensors when applying a multiplicative update rule and to further analyze the behavior of the factor tensors, for example by utilizing time series analysis to discover temporal relations between extracted topics and to potentially identify trends. Finally, further analysis may include additional quantitative comparisons of our proposed methods’ topic modeling performance with competing approaches.

Author Contributions

Conceptualization, D.B.; Methodology, L.H.; Project administration, L.H. and D.B.; Supervision, C.B. and R.S.; Writing—original draft, L.H. and D.B. All authors have read and agreed to the published version of the manuscript.

Funding

The authors of this work were supported by the Competence Center for Machine Learning Rhine Ruhr (ML2R), which is funded by the Federal Ministry of Education and Research of Germany (grant no. 01IS18038C). We gratefully acknowledge this support.

Data Availability Statement

Publicly available datasets (Amazon reviews, Wikipedia articles) were analyzed in this study. This data can be found here: https://github.com/LarsHill/text-dedicom-paper. The NYT news article data presented in this study are available on request from the corresponding author. The data are not publicly available due to potential copyright concerns.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Additional Results

Appendix A.1. Additional Results on Wikipedia Data as Matrix Input

Articles: “Soccer”, “Bee”, “Johnny Depp”.
Table A1. For each evaluated matrix factorization method we display the top 10 words for each topic and the five most similar words based on cosine similarity for the two top words from each topic.
 | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6
NMF | #619 | #1238 | #628 | #595 | #612 | #389
1 | ball | bees | film | football | heard | album
2 | may | species | starred | cup | depp | band
3 | penalty | bee | role | world | court | guitar
4 | referee | pollen | series | fifa | alcohol | vampires
5 | players | honey | burton | national | relationship | rock
6 | team | insects | character | association | stated | hollywood
7 | goal | food | films | international | divorce | song
8 | game | nests | box | women | abuse | released
9 | player | solitary | office | teams | paradis | perry
10 | play | eusocial | jack | uefa | stating | debut
0 | ball | bees | film | football | heard | album
1 | invoke | odors | burtondirected | athenaeus | crew | jones
2 | replaced | tufts | tone | paralympic | alleging | marilyn
3 | scores | colour | landau | governing | oped | roots
4 | subdivided | affected | brother | varieties | asserted | drums
0 | may | species | starred | cup | depp | band
1 | yd | niko | shared | inaugurated | refer | heroes
2 | ineffectiveness | commercially | whitaker | confederation | york | bowie
3 | tactical | microbiota | eccentric | gold | leaders | debut
4 | slower | strategies | befriends | headquarters | nonindian | solo
LDA | #577 | #728 | #692 | #607 | #663 | #814
1 | film | football | depp | penalty | bees | species
2 | series | women | children | heard | flowers | workers
3 | man | association | life | ball | bee | solitary
4 | played | fifa | role | direct | honey | players
5 | pirates | teams | starred | referee | pollen | colonies
6 | character | games | alongside | red | food | eusocial
7 | along | world | actor | time | increased | nest
8 | cast | cup | stated | goal | pollination | may
9 | also | game | burton | scored | times | size
10 | hollow | international | playing | player | larvae | egg
0 | film | football | depp | penalty | bees | species
1 | charlie | cup | critical | extra | bee | social
2 | near | canada | february | kicks | insects | chosen
3 | thinking | zealand | script | inner | authors | females
4 | shadows | activities | song | moving | hives | subspecies
0 | series | women | children | heard | flowers | workers
1 | crybaby | fifa | detective | allison | always | carcases
2 | waters | opera | crime | serious | eusociality | lived
3 | sang | exceeding | magazine | allergic | varroa | provisioned
4 | cast | cuju | barber | cost | wing | cuckoo
SVD | #1228 | #797 | #628 | #369 | #622 | #437
1 | bees | depp | game | cup | heard | beekeeping
2 | also | film | ball | football | court | increased
3 | bee | starred | team | fifa | divorce | honey
4 | species | role | players | world | stating | described
5 | played | series | penalty | european | alcohol | use
6 | time | burton | play | uefa | paradis | wild
7 | one | character | may | national | documents | varroa
8 | first | actor | referee | europe | abuse | mites
9 | two | released | competitions | continental | settlement | colony
10 | pollen | release | laws | confederation | sued | flowers
0 | bees | depp | game | cup | heard | beekeeping
1 | bee | iii | correct | continental | alleging | varroa
2 | develops | racism | abandoned | contested | attempting | animals
3 | studied | appropriation | maximum | confederations | finalized | mites
4 | crops | march | clear | conmebol | submitted | plato
0 | also | film | ball | football | court | increased
1 | although | waters | finely | er | declaration | usage
2 | told | robinson | poised | suffix | issued | farmers
3 | chosen | scott | worn | word | restraining | mentioned
4 | stars | costars | manner | appended | verbally | aeneid
Articles: “Dolphin”, “Shark”, “Whale”.
Table A2. Top half lists the top 10 representative words per dimension of the basis matrix A, bottom half lists the five most similar words based on cosine similarity for the two top words from each topic.
Topic 1 (#460) | Topic 2 (#665) | Topic 3 (#801) | Topic 4 (#753) | Topic 5 (#854) | Topic 6 (#721)
1 | shark (0.665) | calf (0.428) | ship (0.459) | conservation (0.334) | water (0.416) | dolphin (0.691)
2 | sharks (0.645) | months (0.407) | became (0.448) | countries (0.312) | similar (0.374) | dolphins (0.655)
3 | fins (0.487) | calves (0.407) | poseidon (0.44) | government (0.309) | tissue (0.373) | captivity (0.549)
4 | killed (0.454) | females (0.399) | riding (0.426) | wales (0.304) | body (0.365) | wild (0.467)
5 | million (0.451) | blubber (0.374) | dionysus (0.422) | bycatch (0.29) | swimming (0.357) | behavior (0.461)
6 | fish (0.448) | young (0.37) | ancient (0.42) | cancelled (0.288) | blood (0.346) | bottlenose (0.453)
7 | international (0.442) | sperm (0.356) | deity (0.412) | eastern (0.287) | surface (0.344) | sometimes (0.449)
8 | fin (0.421) | born (0.355) | ago (0.398) | policy (0.286) | oxygen (0.34) | human (0.421)
9 | fishing (0.405) | feed (0.349) | melicertes (0.395) | control (0.285) | system (0.336) | less (0.42)
10 | teeth (0.398) | mysticetes (0.341) | greeks (0.394) | imminent (0.282) | swim (0.336) | various (0.418)
0 | shark (1.0) | calf (1.0) | ship (1.0) | conservation (1.0) | water (1.0) | dolphin (1.0)
2 | sharks (0.981) | calves (0.978) | dionysus (0.995) | south (0.981) | prey (0.964) | dolphins (0.925)
3 | fins (0.958) | females (0.976) | riding (0.992) | states (0.981) | swimming (0.959) | sometimes (0.909)
4 | killed (0.929) | months (0.955) | deity (0.992) | united (0.978) | allows (0.957) | another (0.904)
5 | fishing (0.916) | young (0.948) | poseidon (0.987) | endangered (0.976) | swim (0.947) | bottlenose (0.903)
0 | sharks (1.0) | months (1.0) | became (1.0) | countries (1.0) | similar (1.0) | dolphins (1.0)
2 | shark (0.981) | born (0.992) | old (0.953) | eastern (0.991) | surface (0.992) | behavior (0.956)
3 | fins (0.936) | young (0.992) | later (0.946) | united (0.989) | brain (0.97) | sometimes (0.945)
4 | tiger (0.894) | sperm (0.985) | ago (0.939) | caught (0.987) | sound (0.968) | various (0.943)
5 | killed (0.887) | calves (0.984) | modern (0.937) | south (0.979) | object (0.965) | less (0.937)
Articles: “Dolphin”, “Shark”, “Whale”.
Table A3. For each evaluated matrix factorization method we display the top 10 words for each topic and the five most similar words based on cosine similarity for the two top words from each topic.
 | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6
NMF | #492 | #907 | #452 | #854 | #911 | #638
1 | blood | international | evidence | sonar | ago | calf
2 | body | killed | selfawareness | may | teeth | young
3 | heart | states | ship | surface | million | females
4 | gills | conservation | dionysus | clicks | mysticetes | captivity
5 | bony | new | came | prey | whales | calves
6 | oxygen | united | another | use | years | months
7 | organs | shark | important | underwater | baleen | born
8 | tissue | world | poseidon | sounds | cetaceans | species
9 | water | endangered | mark | known | modern | male
10 | via | islands | riding | similar | extinct | female
0 | blood | international | evidence | sonar | ago | calf
1 | travels | proposal | flaws | poisoned | consist | uninformed
2 | enters | lipotidae | methodological | signals | specialize | primary
3 | vibration | banned | nictating | ≈– | legs | born
4 | tolerant | iniidae | wake | emitted | closest | leaner
0 | body | killed | selfawareness | may | teeth | young
1 | crystal | law | legendary | individuals | fuel | brood
2 | blocks | consumers | humankind | helping | lamp | lacking
3 | modified | pontoporiidae | helpers | waste | filterfeeding | accurate
4 | slits | org | performing | depression | krill | consistency
LDA | #650 | #785 | #695 | #815 | #635 | #674
1 | killed | teeth | head | species | meat | air
2 | system | baleen | fish | male | whale | using
3 | endangered | mysticetes | dolphin | females | ft | causing
4 | often | ago | fin | whales | fisheries | currents
5 | close | jaw | eyes | sometimes | also | sounds
6 | sharks | family | fat | captivity | ocean | groups
7 | countries | water | navy | young | threats | sound
8 | since | includes | popular | shark | children | research
9 | called | allow | tissue | female | population | clicks
10 | vessels | greater | tail | wild | bottom | burst
0 | killed | teeth | head | species | meat | air
1 | postures | dense | underside | along | porbeagle | australis
2 | dolphinariums | cetacea | grooves | another | source | submerged
3 | town | tourism | eyesight | long | activities | melbourne
4 | onethird | planktonfeeders | osmoregulation | sleep | comparable | spear
0 | system | baleen | fish | male | whale | using
1 | dominate | mysticetes | mostly | females | live | communication
2 | close | distinguishing | swim | aorta | human | become
3 | controversy | unique | due | female | cold | associated
4 | agree | remove | whole | position | parts | mirror
SVD | #1486 | #544 | #605 | #469 | #539 | #611
1 | dolphins | water | shark | million | poseidon | dolphin
2 | species | body | sharks | years | became | meat
3 | whales | tail | fins | ago | ship | family
4 | fish | teeth | international | whale | riding | river
5 | also | flippers | killed | two | evidence | similar
6 | large | tissue | fishing | calf | melicertes | extinct
7 | may | allows | fin | mya | deity | called
8 | one | air | law | later | ino | used
9 | animals | feed | new | months | came | islands
10 | use | bony | conservation | mysticetes | made | genus
0 | dolphins | water | shark | million | poseidon | dolphin
1 | various | vertical | corpse | approximately | games | depicted
2 | finding | unlike | stocks | assigned | phalanthus | makara
3 | military | chew | galea | hybodonts | statue | capensis
4 | selfmade | lack | galeomorphii | appeared | isthmian | goddess
0 | species | body | sharks | years | became | meat
1 | herd | heart | mostly | acanthodians | pirates | contaminated
2 | reproduction | resisting | fda | spent | elder | harpoon
3 | afford | fit | lists | stretching | mistook | practitioner
4 | maturity | posterior | carcharias | informal | wealthy | pcbs
Articles: “Soccer”, “Tennis”, “Rugby”.
Table A4. Top half lists the top 10 representative words per dimension of the basis matrix A, bottom half lists the five most similar words based on cosine similarity for the two top words from each topic.
Topic 1 (#539) | Topic 2 (#302) | Topic 3 (#563) | Topic 4 (#635) | Topic 5 (#650) | Topic 6 (#530)
1 | may (0.599) | leads (0.212) | tournaments (0.588) | greatest (0.572) | football (0.553) | net (0.644)
2 | penalty (0.576) | sole (0.205) | tournament (0.517) | tennis (0.497) | rugby (0.542) | shot (0.629)
3 | referee (0.564) | competes (0.205) | events (0.509) | female (0.44) | south (0.484) | stance (0.553)
4 | team (0.517) | extending (0.204) | prize (0.501) | ever (0.433) | union (0.47) | stroke (0.543)
5 | goal (0.502) | fixing (0.203) | tour (0.497) | navratilova (0.405) | wales (0.459) | serve (0.537)
6 | kick (0.459) | triggered (0.203) | money (0.488) | modern (0.401) | national (0.446) | rotation (0.513)
7 | play (0.455) | bleeding (0.202) | cup (0.486) | best (0.4) | england (0.438) | backhand (0.508)
8 | ball (0.452) | fraud (0.202) | world (0.467) | wingfield (0.394) | new (0.416) | hit (0.507)
9 | offence (0.444) | inflammation (0.202) | atp (0.464) | sports (0.39) | europe (0.406) | forehand (0.499)
10 | foul (0.443) | conditions (0.201) | men (0.463) | williams (0.389) | states (0.404) | torso (0.487)
0 | may (1.0) | leads (1.0) | tournaments (1.0) | greatest (1.0) | football (1.0) | net (1.0)
2 | goal (0.98) | tiredness (1.0) | events (0.992) | female (0.98) | union (0.98) | shot (0.994)
3 | play (0.959) | ineffectiveness (1.0) | tour (0.989) | ever (0.971) | rugby (0.979) | serve (0.987)
4 | penalty (0.954) | recommences (1.0) | money (0.986) | navratilova (0.967) | association (0.96) | hit (0.984)
5 | team (0.953) | mandated (1.0) | prize (0.985) | tennis (0.962) | england (0.958) | stance (0.955)
0 | penalty (1.0) | sole (1.0) | tournament (1.0) | tennis (1.0) | rugby (1.0) | shot (1.0)
2 | referee (0.985) | discretion (1.0) | events (0.98) | greatest (0.962) | football (0.979) | net (0.994)
3 | kick (0.985) | synonym (1.0) | event (0.978) | female (0.953) | union (0.975) | serve (0.987)
4 | offence (0.982) | violated (1.0) | atp (0.974) | year (0.951) | england (0.961) | hit (0.983)
5 | foul (0.982) | layout (1.0) | money (0.966) | navratilova (0.949) | wales (0.949) | stance (0.98)
Figure A1. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded). (c) Colored heatmap of affinity matrix R.
Figure A2. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded). (c) Colored heatmap of affinity matrix R.
Articles: “Soccer”, “Tennis”, “Rugby”.
Table A5. For each evaluated matrix factorization method we display the top 10 words for each topic and the 5 most similar words based on cosine similarity for the 2 top words from each topic.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
NMF #511#453#575#657#402#621
1netrefereenationaltournamentsracketsrules
2shotpenaltysouthdoublesballswingfield
3servemayfootballsinglesmadedecember
4hitkickcupeventssizegame
5stancecardeuropetourmustsports
6strokelistedfifaprizestringslawn
7backhandfoulunionmoneystandardmodern
8ballmisconductwalesatpsyntheticgreek
9serverredafricamenleatherfa
10serviceoffencenewgrandwidthfirst
0netrefereenationaltournamentsracketsrules
1defensiveretakenserbiabrunopressurisationcollection
2closerinterferencegoldwoodiesbecomehourglass
3somewheredismissednortheliminatedequivalentsunhappy
4centerfullyheadquarterssoaressizeoriginated
0shotpenaltysouthdoublesballswingfield
1rotatedpriorasiancombiningexpressexperimenting
2executeyellowargentinabeckerozllanelidan
3strivedurationlaexclusivelybladderattended
4curveprimarykongwoodbridgelengthantiphanes
LDA #413#518#395#776#616#501
1usednetwimbledonworldpenaltyclubs
2forehandballepiskyroscupscorerugby
3useserveoccurstournamentsgoalschools
4largeshotgrassfootballteamnavratilova
5notableopponentromanfifaendforms
6alsohitbcnationalplayersplaying
7westernlinesoccurinternationalmatchsport
8twohandedserveradeuropegoalsgreatest
9doublesserviceislandtournamenttimeunion
10injurymaybelievedstatesscoredwar
0usednetwimbledonworldpenaltyclubs
1secondsmistakenresultbritishmeasuresees
2restrictionsdiagonaldeterminedcancelledcrossedpapua
3althoughhollowexistscombinedrequiringadmittance
4useperpendicularwinwiiteammateforces
0forehandballepiskyroscupscorerugby
1twohandedlongromanmultiplepenaltyunion
2gripsdeucebcinlinebarpublic
3facetiouslypositionislandfifafouledtook
4woodbridgeallowsbelievedmanufacturedhourpublished
SVD #1310#371#423#293#451#371
1playersnettournamentsstrokegreatestballs
2playerballsinglesforehandeverrackets
3tennisshotdoublesstancefemalesize
4alsoservetourpowerwingfieldsquare
5playopponentslambackhandwilliamsmade
6footballmayprizetorsonavratilovaleather
7teamhitmoneygripgameweight
8firstservicegrandrotationsaidstandard
9onehittingeventstwohandedserenawidth
10rugbylinerankingusedsportspast
0playersnettournamentsstrokegreatestballs
1breakingpacemastersrotateslivedpanels
2onereachlowestachievefemalesewn
3runningunderhandeventsfacebiggestentire
4oftenairtouraddspotentialleather
0playerballsinglesforehandeverrackets
1utilizekeepindiantwohandedautobiographymeanwhile
2givehandsdoublesbeginsjacklaminated
3convertedpassprobackhandconsistentwood
4toucheitherrankingsachievegonzalesstrings

Appendix A.2. Additional Results on Wikipedia Data as Tensor Input

Wikipedia Articles “Soccer”, “Bee”, “Johnny Depp”–DEDICOM Automatic gradient method.
Figure A3. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Soccer”, “Bee”, “Johnny Depp”—DEDICOM Multiplicative Update Rules.
Figure A4. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Dolphin”, “Shark”, “Whale”—DEDICOM Multiplicative Update Rules.
Table A6. Each column lists the top 10 representative words per dimension of the basis matrix A.
Topic 1
#226
Topic 2
#628
Topic 3
#1048
Topic 4
#571
Topic 5
#1267
Topic 6
#554
1cellsmysticetessharkbonydolphinwhaling
(1.785)(1.808)(3.019)(1.621)(3.114)(3.801)
2brainwhalessharksblooddolphinsiwc
(1.624)(1.791)(2.737)(1.452)(2.908)(2.159)
3lightfeedfinsfishbottlenoseaboriginal
(1.561)(1.427)(1.442)(1.438)(1.629)(2.098)
4conebaleenkilledgillsmeatcanada
(1.448)(1.33)(1.407)(1.206)(1.403)(1.912)
5allowodontocetesendangeredteethbehaviormoratorium
(1.32)(1.278)(1.377)(1.088)(1.399)(1.867)
6greaterconsisthammerheadbodycaptivityindustry
(1.292)(1.162)(1.269)(1.043)(1.298)(1.855)
7slightlywaterconservationsystemriverus
(1.269)(1.096)(1.227)(1.027)(1.281)(1.838)
8earkrilltradeskeletoncommonbelugas
(1.219)(1.05)(1.226)(1.008)(1.275)(1.585)
9corneatoothedwhitetipcalledselfawarenesswhale
(1.158)(1.003)(1.203)(0.99)(1.248)(1.542)
10rodspermfinningtissueoftengb£
(1.128)(0.991)(1.184)(0.875)(1.218)(1.528)
Table A7. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0cellsmysticetessharkbonydolphinwhaling
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1sensitiveunbornnativeedgeshybridmāori
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2conegrindtlmirabilehybridizationtrips
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3rodcounterpartspredators—organismsmatchesyangtzepredominantly
(0.998)(1.0)(1.0)(1.0)(1.0)(1.0)
4corneasthreechamberedcretaceousturbulencegrampusrevenue
(0.998)(1.0)(1.0)(1.0)(1.0)(1.0)
0brainwhalessharksblooddolphinsiwc
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1receiveextendedreminiscenthydrodynamicsuperpoddistinction
(0.998)(0.996)(1.0)(0.998)(1.0)(1.0)
2equalizerbrydeelectricalscatteringmasturbationbillion
(0.998)(0.996)(1.0)(0.998)(1.0)(1.0)
3lobesclosesinducedreminderinteractionspain
(0.997)(0.996)(1.0)(0.998)(1.0)(1.0)
4cleareffectscoarselyflowsstressfulcompetition
(0.997)(0.996)(1.0)(0.998)(1.0)(1.0)
Figure A5. Colored heatmap of affinity tensor R.
Figure A6. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Soccer”, “Tennis”, “Rugby”—DEDICOM Multiplicative Update Rules.
Figure A7. Colored heatmap of affinity tensor R.
Table A8. Each column lists the top 10 representative words per dimension of the basis matrix A.
Topic 1
#441
Topic 2
#861
Topic 3
#412
Topic 4
#482
Topic 5
#968
Topic 6
#57
1rugbytitlesracketsnetpenaltydoubles
(2.55)(1.236)(2.176)(2.767)(1.721)(2.335)
2unionwtawingfieldshotfootballsingles
(2.227)(1.196)(1.536)(2.586)(1.701)(2.321)
3walescircuitmodernserveteamtournaments
(1.822)(1.123)(1.513)(2.393)(1.507)(2.245)
4georgiafuturesrackethitlawstennis
(1.682)(1.122)(1.43)(1.978)(1.462)(1.752)
5fijiearnthstancerefereegrand
(1.557)(1.104)(1.355)(1.945)(1.449)(1.662)
6samoaofferlawnservicefifaevents
(1.474)(1.096)(1.316)(1.83)(1.439)(1.648)
7zealandmixedcenturystrokemayslam
(1.458)(1.089)(1.236)(1.797)(1.435)(1.623)
8newdrawsstringsservergoalplayer
(1.414)(1.085)(1.179)(1.761)(1.353)(1.344)
9tongaatpyieldedbackhandcompetitionsprofessional
(1.374)(1.072)(1.121)(1.692)(1.345)(1.328)
10southchallengerballsforehandassociationsplayers
(1.369)(1.07)(1.101)(1.554)(1.288)(1.316)
Table A9. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0rugbytitlesracketsnetpenaltydoubles
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1irelandhopmanproximalhitorganiserssingles
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2uniondressinterlacedformallyelapsedtournaments
(1.0)(1.0)(1.0)(1.0)(1.0)(0.985)
3backfiredtennischannelharryoffensivepolitegrand
(1.0)(0.998)(1.0)(1.0)(1.0)(0.975)
4kilopascalsseouldeservesdeeplymodestslam
(1.0)(0.998)(1.0)(1.0)(1.0)(0.971)
0unionwtawingfieldshotfootballsingles
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1rugbyhelpsproximalrequirescircumferencedoubles
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2irelandhamiltoninterlacedbackwardstouchlinetournaments
(1.0)(1.0)(1.0)(1.0)(1.0)(0.985)
3backfiredweeksharryentailsanctionsgrand
(1.0)(1.0)(1.0)(1.0)(1.0)(0.975)
4zealandcoupledeservestorsohomeslam
(1.0)(1.0)(1.0)(1.0)(0.999)(0.971)
Figure A8. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Soccer”, “Tennis”, “Rugby”—TNMF.
Table A10. Each column lists the top 10 representative words per dimension of the basis matrix A.
Topic 1
#275
Topic 2
#505
Topic 3
#607
Topic 4
#459
Topic 5
#816
Topic 6
#559
1greatestracketstournamentsnetfootballpenalty
(39.36)(33.707)(29.126)(29.789)(27.534)(27.793)
2evermoderneventsshotrugbyreferee
(26.587)(24.281)(25.327)(27.947)(24.037)(23.632)
3femaleballstourserveuniongoal
(25.52)(22.016)(23.488)(25.722)(21.397)(23.072)
4navratilovawingfieldprizehitsouthmay
(24.348)(20.923)(21.823)(21.344)(20.761)(22.978)
5besttennisatpstancenationalteam
(24.114)(19.863)(21.124)(20.75)(19.586)(21.258)
6williamsstringsmoneyservicefifakick
(22.207)(18.602)(20.667)(19.7)(19.331)(21.052)
7serenaracketdoublesserverwalesfoul
(21.256)(18.369)(19.919)(19.051)(18.627)(19.018)
8saidmaderankingstrokeleaguelisted
(20.666)(17.622)(19.736)(18.781)(18.31)(17.736)
9martinayieldedusbackhandcupfree
(20.153)(17.284)(19.431)(17.809)(17.015)(17.702)
10budgethmastersballassociationgoals
(20.111)(16.992)(18.596)(17.2)(16.721)(17.209)
Table A11. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0greatestracketstournamentsnetfootballpenalty
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1illustratedgardenuslobmidlothianwhole
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2johanssonconstructionearnedreceivingalcockcorner
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3wiltonyieldedparticipatingrotatescapitaloffender
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
4jonathanenergyreceivesaddsrepresentativesstoke
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
0evermoderneventsshotrugbyreferee
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1deserveddesignjuniorslobberslangdismissed
(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)
2statedversionbowlunablecolonistsshowing
(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)
3femaleshapecomprisedaltersevenasidestoppage
(1.0)(0.998)(1.0)(1.0)(1.0)(1.0)
4contemporariesstitchedcarloapplyingseldomlayout
(1.0)(0.998)(1.0)(1.0)(1.0)(1.0)
Figure A9. (a) 2-dimensional representation of word embeddings H colored by topic assignment. (b) 2-dimensional representation of word embeddings H colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Dolphin”, “Shark”, “Whale”—TNMF.
Table A12. Each column lists the top 10 representative words per dimension of the basis matrix H.
Topic 1
#675
Topic 2
#996
Topic 3
#279
Topic 4
#491
Topic 5
#1190
Topic 6
#663
1whalingsharksyoungkilleddolphinmysticetes
(34.584)(30.418)(33.823)(23.214)(35.52)(24.404)
2whalefishbornsharkdolphinsflippers
(25.891)(23.648)(27.62)(22.6)(31.881)(22.059)
3whalesbonyoviductstatesbottlenoseodontocetes
(21.653)(19.689)(23.706)(21.24)(18.198)(21.621)
4belugaspreyviviparityendangeredbehaviorwater
(20.933)(18.785)(23.694)(20.976)(18.003)(21.087)
5aboriginalteethembryosconservationselfawarenesstail
(19.44)(18.242)(22.966)(20.398)(16.48)(18.268)
6iwcbloodcontinuefinsmeatmya
(19.226)(16.521)(21.752)(18.641)(16.02)(17.79)
7canadagillscalvesnewoftenbaleen
(18.691)(13.34)(21.25)(18.445)(15.687)(17.189)
8arctictissueblubberinternationalcaptivitylimbs
(17.406)(12.927)(21.094)(18.4)(15.452)(16.56)
9industrybodyeggdrumriverallow
(16.837)(12.691)(20.735)(17.587)(14.68)(16.552)
10rightskeletonfluidsfinningcommontoothed
(16.766)(12.52)(20.662)(17.321)(14.389)(16.489)
Table A13. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0whalingsharksyoungkilleddolphinmysticetes
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1antarcticaloangettingalzheimerbehaviorsdigits
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2spainleopardinsulationqueenslandfamiliarstreamlined
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3carodogfishharshalspantropicalarchaeocete
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
4excludedlifespansprimarycontroltestdefines
(1.0)(0.999)(1.0)(1.0)(1.0)(0.999)
0whalefishbornsharkdolphinsflippers
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1reasonlikegettingfigurelevelsexpel
(1.0)(0.997)(1.0)(1.0)(1.0)(1.0)
2respectedlifetimeyoungsourcesmoderatecompress
(1.0)(0.992)(1.0)(1.0)(0.999)(1.0)
3divinitycontentleanervideoinjuriesprotocetus
(0.999)(0.992)(1.0)(0.998)(0.999)(1.0)
4takenhazardousinsulationdogfishesseemsnostrils
(0.998)(0.992)(1.0)(0.997)(0.998)(1.0)
Figure A10. (a) 2-dimensional representation of word embeddings H colored by topic assignment. (b) 2-dimensional representation of word embeddings H colored by original Wikipedia article assignment (words that occur in more than one article are excluded).
Wikipedia Articles “Soccer”, “Bee”, “Johnny Depp”—TNMF.
Table A14. Each column lists the top 10 representative words per dimension of the basis matrix H.
Topic 1
#793
Topic 2
#554
Topic 3
#736
Topic 4
#601
Topic 5
#740
Topic 6
#616
1filmballhoneyfootballheardspecies
(37.29)(27.588)(29.778)(32.591)(37.167)(32.973)
2starredmayinsectsfifadeppeusocial
(23.821)(25.768)(27.784)(25.493)(30.275)(25.001)
3rolepenaltybeesworldcourtfemales
(23.006)(25.04)(27.679)(25.414)(20.771)(24.24)
4seriesplayersbeecupdivorcesolitary
(19.563)(24.063)(26.936)(24.925)(17.289)(21.173)
5burtonrefereefoodassociationsuednest
(18.694)(23.649)(23.44)(22.331)(16.105)(20.198)
6playedteamflowersnationalstatedmales
(17.583)(22.9)(22.374)(20.958)(15.984)(18.3)
7charactergoalpollinationwomenalcoholworkers
(16.646)(22.859)(18.09)(20.668)(15.238)(17.16)
8successplayerlarvaeinternationalstatingtypically
(16.41)(22.054)(17.73)(20.16)(15.199)(16.886)
9filmsplaypollentournamentparadiscolonies
(15.74)(21.774)(17.666)(18.26)(14.98)(16.528)
10boxgamepredatorsuefaallegedqueens
(15.024)(20.471)(17.634)(18.029)(14.971)(16.427)
Table A15. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0filmballhoneyfootballheardspecies
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1avrilofficialstriangulumenteredobtainedprogressive
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2officeinvokeconsumptionmostcountersuedhalictidae
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3landauheadingcopperexcessdepthstemperate
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
4chamberlaintwohalvesmightukmismanagementspring
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
0starredmayinsectsfifadeppeusocial
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1raiminoninternationalbloomsoceaniacityunfertilized
(1.0)(0.992)(1.0)(1.0)(1.0)(1.0)
2candidateredeatssudamericanatributefemales
(1.0)(0.991)(1.0)(1.0)(1.0)(1.0)
3hardwickerequiredcatchingwidenedmickpaper
(1.0)(0.991)(1.0)(1.0)(1.0)(1.0)
4peteryddiseaseoverseeelvishibernate
(1.0)(0.989)(1.0)(1.0)(1.0)(1.0)

Appendix A.3. Additional Results on Amazon Review Data as Tensor Input

Amazon Reviews—DEDICOM Multiplicative Update Rules.
Figure A11. (a) 2-dimensional representation of word embeddings H colored by topic assignment. (b) 2-dimensional representation of word embeddings H colored by original review article.
Figure A12. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original review article.
Figure A13. (a) 2-dimensional representation of word embeddings H colored by topic assignment. (b) 2-dimensional representation of word embeddings H colored by original review article.
Table A16. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6Topic 7Topic 8Topic 9Topic 10
0annashenlegendarylasseterdiscscreamscodemikewoodypo
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1christophcanonssacredandrewthxcertifiedharvestedconfirmboggripspanda
(1.0)(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.985)
2readinessyeohfulfillstantonpresentationspeciallytraineddiscountchasedspyfu
(1.0)(1.0)(0.999)(1.0)(0.999)(1.0)(1.0)(0.996)(1.0)(0.983)
3carrotswolfrosteregglestonupgradescreamprocessingbrowserflairjosieblack
(1.0)(1.0)(0.999)(1.0)(0.999)(1.0)(1.0)(0.996)(1.0)(0.983)
4povertyweaponmegafanuncreditedfeaturettescorporationpopupslotsupurbkung
(1.0)(1.0)(0.999)(1.0)(0.998)(1.0)(1.0)(0.995)(1.0)(0.981)
0elsapeacockvalleydirectorbirdsenergyemailcrystalbuzzmaster
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1shipwreckshenprayingproducerpressedpoweredconfirmozhistshifu
(1.0)(1.0)(0.999)(0.998)(1.0)(0.998)(1.0)(1.0)(1.0)(0.999)
2marriagemcbridekimteasergadgetfrightenedcodemaewaynewarrior
(1.0)(1.0)(0.997)(0.997)(1.0)(0.997)(1.0)(1.0)(1.0)(0.999)
3idenayeohchorgumglobesclassicallyscreamsfwiwceliareuniteddragon
(1.0)(1.0)(0.997)(0.997)(1.0)(0.996)(1.0)(1.0)(1.0)(0.998)
4proddingmichellepreyingrousingstarzscarryandroidcristalhockeymartial
(1.0)(1.0)(0.997)(0.997)(1.0)(0.996)(1.0)(1.0)(1.0)(0.994)
Table A17. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6Topic 7Topic 8Topic 9Topic 10
0annashenlegendarylasseterdiscscreamscodemikewoodypo
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1christophcanonssacredandrewthxcertifiedharvestedconfirmboggripspanda
(1.0)(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.985)
2readinessyeohfulfillstantonpresentationspeciallytraineddiscountchasedspyfu
(1.0)(1.0)(0.999)(1.0)(0.999)(1.0)(1.0)(0.996)(1.0)(0.983)
3carrotswolfrosteregglestonupgradescreamprocessingbrowserflairjosieblack
(1.0)(1.0)(0.999)(1.0)(0.999)(1.0)(1.0)(0.996)(1.0)(0.983)
4povertyweaponmegafanuncreditedfeaturettescorporationpopupslotsupurbkung
(1.0)(1.0)(0.999)(1.0)(0.998)(1.0)(1.0)(0.995)(1.0)(0.981)
0elsapeacockvalleydirectorbirdsenergyemailcrystalbuzzmaster
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1shipwreckshenprayingproducerpressedpoweredconfirmozhistshifu
(1.0)(1.0)(0.999)(0.998)(1.0)(0.998)(1.0)(1.0)(1.0)(0.999)
2marriagemcbridekimteasergadgetfrightenedcodemaewaynewarrior
(1.0)(1.0)(0.997)(0.997)(1.0)(0.997)(1.0)(1.0)(1.0)(0.999)
3idenayeohchorgumglobesclassicallyscreamsfwiwceliareuniteddragon
(1.0)(1.0)(0.997)(0.997)(1.0)(0.996)(1.0)(1.0)(1.0)(0.998)
4proddingmichellepreyingrousingstarzscarryandroidcristalhockeymartial
(1.0)(1.0)(0.997)(0.997)(1.0)(0.996)(1.0)(1.0)(1.0)(0.994)
Amazon Reviews—TNMF.
Table A18. Each column lists the top 10 representative words per dimension of the basis matrix H.
Topic 1
#590
Topic 2
#1052
Topic 3
#456
Topic 4
#350
Topic 5
#4069
Topic 6
#733
Topic 7
#1140
Topic 8
#582
Topic 9
#423
Topic 10
#605
1annawoodydirectorallenwidescreencodemastermikefilmscreams
(109.81)(134.366)(100.622)(83.875)(34.484)(88.94)(89.686)(88.628)(58.514)(87.782)
2elsabuzzlasseterhanksouttakesemailpocrystalanimationenergy
(106.148)(120.93)(93.134)(77.313)(30.724)(73.645)(85.79)(82.472)(53.628)(78.113)
3olafandyandrewtimdiscpromoshifubillycharactersworld
(59.353)(105.728)(81.452)(75.511)(30.688)(67.483)(82.465)(79.244)(46.861)(73.888)
4trollstoysstantonricklesextrasamazonwarriorgoodmanfilmsmonstropolis
(58.811)(98.523)(80.132)(74.217)(30.09)(64.978)(75.609)(76.831)(44.937)(73.484)
5kristofflightyearjohntomversionspromotiondragonsullypixarmonsters
(56.309)(68.336)(73.004)(72.401)(27.894)(58.631)(74.235)(75.612)(44.873)(71.721)
6hanssidpetejimincludedfreetaiwazowskievencity
(55.628)(52.34)(70.556)(69.776)(27.455)(58.207)(71.721)(71.588)(44.176)(71.352)
7frozencowboydoctervarneymaterialpromotionallungrandallanimatedpower
(54.257)(48.588)(64.734)(66.053)(26.887)(57.738)(70.786)(69.695)(43.492)(70.642)
8queenspaceralphslinkyeditionclickfurioussulleyalsomonster
(53.956)(47.88)(54.884)(62.326)(26.546)(55.373)(63.232)(69.604)(43.484)(70.197)
9sisterroomjoepotatocontainsdownloadoogwayjamesdvdcloset
(52.749)(42.655)(53.7)(62.237)(25.386)(50.788)(60.879)(68.574)(42.736)(61.451)
10icetoyranftmrextrapurchasefivebuscemiwellscare
(49.71)(42.042)(53.41)(61.801)(25.144)(50.327)(59.259)(66.028)(40.124)(61.243)
Table A19. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6Topic 7Topic 8Topic 9Topic 10
0annawoodydirectorallenwidescreencodemastermikefilmscreams
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1marriageaccientlyproducertrustworthybenefactorscardfuriouslongtimefilmselectrical
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.995)(1.0)(1.0)
2trollslimpjacksonargumentspioneersconfirmationdragoncyclopsfirstscreamprocessing
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.995)(0.994)(1.0)
3fleesjealouseyrabsonhankskeepcaseassumingshifuslotanimatedchlid
(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)(0.995)(0.993)(1.0)
4christianswellscomposerknowitallredoneandroidwarriorhumanlikeanimationshortage
(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)(0.994)(0.989)(1.0)
0elsabuzzlasseterhanksouttakesemailpocrystalanimationenergy
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1marriagerecivenathantrustworthystoryboardingpromomartialbillyfilmsupply
(1.0)(0.999)(1.0)(1.0)(0.993)(1.0)(0.998)(1.0)(0.989)(1.0)
2healszorgofficeralleninformativeavailfighttalkativestorypowered
(1.0)(0.999)(1.0)(1.0)(0.991)(1.0)(0.997)(0.999)(0.987)(1.0)
3marryinglimpcunninghamtomcontentsflixsterartscompetitorscenescollect
(1.0)(0.997)(1.0)(1.0)(0.99)(1.0)(0.996)(0.999)(0.987)(1.0)
4feministaccientlyderryberryargumentslogoconfirmingadopteddevilishlyalsoscreams
(1.0)(0.997)(1.0)(1.0)(0.99)(1.0)(0.995)(0.999)(0.984)(0.999)

Appendix A.4. Additional Results on the New York Times News Article Data as Tensor Input

New York Times News Articles—DEDICOM Multiplicative Update Rules.
Table A20. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6Topic 7Topic 8Topic 9Topic 10
0suleimaniloansmasksfloydcontributedconfederateukrainestormrestaurantsweinstein
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1qassimspendsanitizerbrutalityalanstatuelutsenkostormssalonsraped
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2iransmallbusinesswipespoliceedmondsonmonumentsukrainiansisaiascafespredatory
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3iranianrentclothsystemicmervoshstatuesyovanovitchlandfallpubsmann
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
4militiasincentiveshomemadekneeemilyhonoringburismaforecastersnightclubssciorra
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
0iranuniversityprotectiveminneapolisreportingstatuesondlandhurricanebarssexual
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1qassimjeromegownsbreonnarabinmonumentszelenskybahamasdiningrape
(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2suleimanioxfordventilatorskuengcontributedstatuesvolkerhurricanestheatersmetoo
(1.0)(0.998)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3iraniancolumbiarespiratorsfloydkeithconfederategiulianiforecastersvenuessexually
(1.0)(0.998)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
4militiaseconomicssuppliespolicechokshihonoringquidlandfallmallsmann
(1.0)(0.998)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
New York Times News Articles—TNMF.
Table A21. Each column lists the top 10 representative words per dimension of the basis matrix A.
Topic 1
#977
Topic 2
#360
Topic 3
#420
Topic 4
#4192
Topic 5
#489
Topic 6
#405
Topic 7
#1135
Topic 8
#748
Topic 9
#108
Topic 10
#1166
1floydcontributediranmasksshipsyriasenatorrestaurantsbloomukraine
(82.861)(137.058)(80.007)(40.463)(87.178)(94.686)(77.285)(93.511)(110.77)(79.611)
2policereportingsuleimanipatientscrewsyrianstormbarsjuliesondland
(64.649)(84.889)(78.78)(34.77)(70.694)(82.565)(43.439)(64.889)(103.282)(61.099)
3protestersmichaeliranianventilatorsaboardkurdishhurricanereopeneditedtestimony
(63.588)(76.156)(72.581)(34.299)(67.464)(82.013)(42.213)(57.541)(100.159)(49.982)
4minneapoliskatieiraqprotectivepassengersturkeyiowastoreslostestified
(63.216)(63.146)(63.27)(33.719)(65.895)(80.374)(41.985)(55.435)(95.747)(49.959)
5protestsemilygenloanscruiseturkishrepublicangymsgraduatedzelensky
(61.585)(60.696)(50.966)(28.178)(63.535)(75.912)(40.993)(50.487)(93.51)(48.427)
6georgealanstrikesuppliesprincesskurdsgovtheatersangelesambassador
(53.378)(59.499)(49.026)(27.15)(45.714)(63.639)(37.689)(49.866)(92.349)(46.086)
7brutalitynicholasiraqiglovesflightfightersbuttigiegclosedberkeleyweinstein
(44.051)(55.899)(46.027)(26.724)(45.375)(62.313)(37.1)(46.544)(85.653)(45.053)
8officerscochraneqassimequipmentnasaforcesdemocratindoorgrewukrainian
(43.581)(52.045)(45.861)(26.252)(43.306)(57.659)(37.087)(44.541)(84.145)(43.716)
9racismbenmajrespiratorynavytroopsrepresentativesalonstodaygiuliani
(43.457)(41.414)(44.921)(25.447)(40.396)(54.045)(35.807)(41.99)(41.818)(42.674)
10demonstrationsmaggiebaghdadtestingastronautsisisbernieshopscaliforniasexual
(42.547)(41.328)(44.867)(24.179)(37.073)(53.743)(35.228)(40.311)(38.807)(39.966)
Table A22. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6Topic 7Topic 8Topic 9Topic 10
0floydcontributediranmasksshipsyriasenatorrestaurantsbloomukraine
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1demonstrationsshearsuleimaniprovidersaboardisiswydenshopsgraduatedvolker
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2systemicannieretaliationdistressedcapsuleceasefireiowatakeouteditedinquiry
(1.0)(1.0)(1.0)(1.0)(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)
3protestsmazzeiqassimtobaccodiamondfighterssteyernightclubsberkeleytranscript
(1.0)(1.0)(1.0)(1.0)(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)
4defundkittyrevengeselfemployeddragonsyrianklobucharpubsgrewinvestigations
(1.0)(1.0)(1.0)(1.0)(1.0)(0.999)(1.0)(1.0)(1.0)(1.0)
0policereportingsuleimanipatientscrewsyrianstormbarsjuliesondland
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1systemicluisstriketreatingaboardalassadcarolinareopengarcettitestifying
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.993)(1.0)
2peacefulbeachymajinfectioncapsulereceprubiononessentialgraduatedmick
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.988)(1.0)
3peacefullykaplanirandevelopprincesserdoganhampshirenaileditedquid
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.988)(1.0)
4kneeglueckretaliationrepaycruisekurdslandfalltakeoutberkeleyimpeachment
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)(0.988)(1.0)
Figure A14. 2-dimensional representation of word embeddings A colored by topic assignment.
Figure A15. 2-dimensional representation of word embeddings A colored by topic assignment.

Appendix B. Matrix Derivatives

In this section we derive the derivatives in (20) and (24) analytically.
We write the loss in trace form as
$$
L(S, A, R) = \left\lVert S - A R A^T \right\rVert_F^2
= \operatorname{tr}\!\left[ \left( S - A R A^T \right)^{T} \left( S - A R A^T \right) \right]
= \operatorname{tr}\!\left( Q^{T} Q \right),
$$
where we abbreviate $Q := S - A R A^T$. Then
$$
\begin{aligned}
\mathrm{d}L
&= \mathrm{d}\operatorname{tr}\!\left( Q^{T} Q \right)
= \operatorname{tr}\!\left( \mathrm{d}\!\left( Q^{T} Q \right) \right)
= \operatorname{tr}\!\left( (\mathrm{d}Q)^{T} Q + Q^{T}\,\mathrm{d}Q \right)
= 2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}Q \right) \\
&= 2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}\!\left( S - A R A^T \right) \right)
= 2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}S \right) - 2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}\!\left( A R A^T \right) \right)
= -2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}\!\left( A R A^T \right) \right),
\end{aligned}
$$
since $S$ is constant and therefore $\mathrm{d}S = 0$.
Differential in terms of $\mathrm{d}R$:
$$
\mathrm{d}L
= -2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}\!\left( A R A^T \right) \right)
= -2\operatorname{tr}\!\left( Q^{T} A\,(\mathrm{d}R)\,A^T \right)
= -2\operatorname{tr}\!\left( A^T Q^{T} A\,\mathrm{d}R \right),
$$
which yields
$$
\frac{\partial L}{\partial R}
= -2\,A^T Q A
= -2\,A^T\!\left( S - A R A^T \right)\!A
= -2\left( A^T S A - A^T A R A^T A \right).
$$
Differential in terms of $\mathrm{d}A$:
$$
\begin{aligned}
\mathrm{d}L
&= -2\operatorname{tr}\!\left( Q^{T}\,\mathrm{d}\!\left( A R A^T \right) \right)
= -2\operatorname{tr}\!\left( Q^{T} (\mathrm{d}A)\, R A^T + Q^{T} A R\,(\mathrm{d}A)^{T} \right) \\
&= -2\operatorname{tr}\!\left( R A^T Q^{T}\,\mathrm{d}A + R^{T} A^T Q\,\mathrm{d}A \right)
= -2\operatorname{tr}\!\left( \left( R A^T Q^{T} + R^{T} A^T Q \right)\mathrm{d}A \right),
\end{aligned}
$$
which yields
$$
\begin{aligned}
\frac{\partial L}{\partial A}
&= -2\left( Q A R^{T} + Q^{T} A R \right)
= -2\left[ \left( S - A R A^T \right) A R^{T} + \left( S - A R A^T \right)^{T} A R \right] \\
&= -2\left[ S A R^{T} + S^{T} A R - A\left( R A^T A R^{T} + R^{T} A^T A R \right) \right].
\end{aligned}
$$
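To make the use of these gradients concrete, the following NumPy snippet is a minimal sketch of alternating gradient descent on $L(S, A, R)$. It is an illustration only: the function name, the fixed learning rate, the iteration count and the simple clipping step used to keep the factors non-negative are assumptions, not the exact optimization setup (e.g., Adam-based updates with constraint handling) used in the paper.

```python
# Minimal sketch: alternating gradient descent for L(S, A, R) = ||S - A R A^T||_F^2,
# using the gradients derived in Appendix B. All hyperparameters are illustrative.
import numpy as np

def dedicom_gradient_descent(S, k, lr=1e-5, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))          # loading matrix
    R = rng.random((k, k))          # affinity matrix

    for _ in range(n_iter):
        Q = S - A @ R @ A.T                          # residual with current A, R
        grad_R = -2.0 * (A.T @ Q @ A)                # dL/dR from Appendix B
        R -= lr * grad_R
        np.maximum(R, 0.0, out=R)                    # crude non-negativity projection (assumption)

        Q = S - A @ R @ A.T                          # residual with updated R
        grad_A = -2.0 * (Q @ A @ R.T + Q.T @ A @ R)  # dL/dA from Appendix B
        A -= lr * grad_A
        np.maximum(A, 0.0, out=A)
    return A, R

# Toy usage on a small planted factorization:
rng = np.random.default_rng(1)
A0, R0 = rng.random((50, 4)), rng.random((4, 4))
S = A0 @ R0 @ A0.T
A, R = dedicom_gradient_descent(S, k=4)
print(np.linalg.norm(S - A @ R @ A.T))
```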

References

  1. Levy, O.; Goldberg, Y. Neural Word Embedding as Implicit Matrix Factorization. In Advances in Neural Information Processing Systems; NIPS’14; MIT Press: Cambridge, MA, USA, 2014; Volume 2, pp. 2177–2185. [Google Scholar]
  2. Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
  3. Hillebrand, L.P.; Biesner, D.; Bauckhage, C.; Sifa, R. Interpretable Topic Extraction and Word Embedding Learning Using Row-Stochastic DEDICOM. In Machine Learning and Knowledge Extraction—4th International Cross-Domain Conference, CD-MAKE; Lecture Notes in Computer Science; Springer: Dublin, Ireland, 2020; Volume 12279, pp. 401–422. [Google Scholar]
  4. Symeonidis, P.; Zioupos, A. Matrix and Tensor Factorization Techniques for Recommender Systems; Springer International Publishing: New York, NY, USA, 2016. [Google Scholar] [CrossRef]
  5. Harshman, R.; Green, P.; Wind, Y.; Lundy, M. A Model for the Analysis of Asymmetric Data in Marketing Research. Market. Sci. 1982, 1, 205–242. [Google Scholar] [CrossRef]
  6. Phan, A.H.; Cichocki, A.; Dinh, T.V. Nonnegative DEDICOM Based on Tensor Decompositions for Social Networks Exploration. Aust. J. Intell. Inf. Process. Syst. 2010, 12, 10–15. [Google Scholar]
  7. Bader, B.W.; Harshman, R.A.; Kolda, T.G. Pattern Analysis of Directed Graphs Using DEDICOM: An Application to Enron Email; Office of Scientific & Technical Information Technical Reports; Sandia National Laboratories: Albuquerque, NM, USA, 2006. [Google Scholar]
  8. Sifa, R.; Ojeda, C.; Cvejoski, K.; Bauckhage, C. Interpretable Matrix Factorization with Stochasticity Constrained Nonnegative DEDICOM. In Proceedings of the KDML-LWDA, Rostock, Germany, 11–13 September 2017. [Google Scholar]
  9. Sifa, R.; Ojeda, C.; Bauckhage, C. User Churn Migration Analysis with DEDICOM. In Proceedings of the 9th ACM Conference on Recommender Systems; RecSys ’15; Association for Computing Machinery: New York, NY, USA, 2015; pp. 321–324. [Google Scholar]
  10. Chew, P.; Bader, B.; Rozovskaya, A. Using DEDICOM for Completely Unsupervised Part-of-Speech Tagging. In Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, Boulder, CO, USA, 5 June 2009; Association for Computational Linguistics: Boulder, CO, USA, 2009; pp. 54–62. [Google Scholar]
  11. Nickel, M.; Tresp, V.; Kriegel, H.P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; Association for Computing Machinery: Bellevue, WA, USA, 2011; pp. 809–816. [Google Scholar]
  12. Sifa, R.; Yawar, R.; Ramamurthy, R.; Bauckhage, C.; Kersting, K. Matrix- and Tensor Factorization for Game Content Recommendation. KI-Künstl. Intell. 2019, 34, 57–67. [Google Scholar] [CrossRef]
  13. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. Adv. Neural Inf. Process. Syst. 2013, 26, 3111–3119; arXiv:1310.4546. [Google Scholar]
  14. Lee, D.D.; Seung, H.S. Algorithms for Non-Negative Matrix Factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems; NIPS’00; MIT Press: Cambridge, MA, USA, 2000; pp. 535–541. [Google Scholar]
  15. Furnas, G.W.; Deerwester, S.; Dumais, S.T.; Landauer, T.K.; Harshman, R.A.; Streeter, L.A.; Lochbaum, K.E. Information Retrieval Using a Singular Value Decomposition Model of Latent Semantic Structure. In ACM SIGIR Forum; ACM: New York, NY, USA, 1988. [Google Scholar]
  16. Wang, Y.; Zhu, L. Research and implementation of SVD in machine learning. In Proceedings of the 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan, China, 24–26 May 2017; pp. 471–475. [Google Scholar]
  17. Jolliffe, I. Principal Component Analysis; John Wiley and Sons Ltd.: Hoboken, NJ, USA, 2005. [Google Scholar]
  18. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  19. Lebret, R.; Collobert, R. Word Embeddings through Hellinger PCA. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014; pp. 482–490. [Google Scholar]
  20. Nguyen, D.Q.; Billingsley, R.; Du, L.; Johnson, M. Improving Topic Models with Latent Feature Word Representations. Trans. Assoc. Comput. Linguist. 2015, 3, 299–313. [Google Scholar] [CrossRef]
  21. Frank, M.; Wolfe, P. An algorithm for quadratic programming. Naval Res. Logist. Q. 1956, 3, 95–110. [Google Scholar] [CrossRef]
  22. Ni, J.; Li, J.; McAuley, J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 188–197. [Google Scholar]
  23. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  24. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
Figure 1. (a) The DEDICOM algorithm factorizes a square matrix $S \in \mathbb{R}^{n \times n}$ into a loading matrix $A \in \mathbb{R}^{n \times k}$ and an affinity matrix $R \in \mathbb{R}^{k \times k}$. (b) The tensor DEDICOM algorithm factorizes a three-dimensional tensor $\bar{S} \in \mathbb{R}^{t \times n \times n}$ into a loading matrix $A \in \mathbb{R}^{n \times k}$ and a three-dimensional affinity tensor $\bar{R} \in \mathbb{R}^{t \times k \times k}$.
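As a concrete reading of Figure 1b, the short NumPy sketch below reconstructs every slice of a co-occurrence tensor from one shared loading matrix and a slice-specific affinity matrix; the dimensions and random factors are toy values, not the paper's data.

```python
# Sketch of the tensor DEDICOM reconstruction in Figure 1b:
# each slice is approximated as S_hat[i] = A @ R_bar[i] @ A.T with a shared A.
import numpy as np

t, n, k = 3, 500, 6                      # toy sizes: slices, vocabulary, latent topics
A = np.random.rand(n, k)                 # shared word loadings
R_bar = np.random.rand(t, k, k)          # one affinity matrix per slice

S_hat = np.einsum('nk,tkl,ml->tnm', A, R_bar, A)
assert S_hat.shape == (t, n, n)
```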
Figure 2. Reconstruction loss development during tensor factorization training. The x-axis plots the number of epochs on a logarithmic scale, the y-axis plots the corresponding reconstruction error for each method.
Figure 3. The affinity matrix R describes the relationships between the latent factors. Illustrated here are two word embeddings, corresponding to the words $w_i$ and $w_j$. Darker shades represent larger values. In this example we predict a large co-occurrence at $S_{ii}$ and $S_{jj}$ because of the large weight on the diagonal of the R matrix. We predict a low co-occurrence at $S_{ij}$ and $S_{ji}$ since the large weights on $A_{i1}$ and $A_{j3}$ interact with low weights on $R_{13}$ and $R_{31}$.
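The prediction mechanism described in the caption can be checked with a few lines of NumPy; the numbers below are made up purely for illustration.

```python
# Illustration of Figure 3: the predicted co-occurrence of words i and j is a_i^T R a_j.
import numpy as np

R = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])           # affinity between latent factors (toy values)
a_i = np.array([0.9, 0.05, 0.05])         # word i loads mostly on factor 1
a_j = np.array([0.05, 0.05, 0.9])         # word j loads mostly on factor 3

print(a_i @ R @ a_i)   # large predicted S_ii, driven by the diagonal entry R[0, 0]
print(a_i @ R @ a_j)   # small predicted S_ij, since R[0, 2] and R[2, 0] are (near) zero
```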
Figure 4. Reconstruction loss development during matrix factorization training. The x-axis plots the number of epochs, the y-axis plots the corresponding reconstruction error for each method.
Figure 5. (a) 2-dimensional representation of word embeddings A colored by topic assignment. (b) 2-dimensional representation of word embeddings A colored by original Wikipedia article assignment (words that occur in more than one article are excluded). (c) Colored heatmap of affinity matrix R.
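The 2-dimensional views shown in Figure 5a,b can be reproduced along the following lines. This is a hedged sketch: it assumes UMAP [24] as the projection method, uses the argmax over the topic dimensions as the coloring, and substitutes a random placeholder for the trained loading matrix A.

```python
# Sketch: project the rows of the loading matrix A to 2-D and color by argmax topic.
# Assumes UMAP [24] (umap-learn) as the dimensionality reduction; A is a placeholder.
import numpy as np
import matplotlib.pyplot as plt
import umap

A = np.random.rand(500, 6)                 # stand-in for the trained loading matrix
topics = A.argmax(axis=1)                  # topic assignment per word
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(A)

plt.scatter(coords[:, 0], coords[:, 1], c=topics, cmap='tab10', s=5)
plt.show()
```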
Figure 6. Colored heatmap of affinity tensor R̄, trained on the Wikipedia data represented as input tensor using automatic gradient methods.
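Heatmaps such as the ones in Figures 6–9 can be rendered with a few lines of Matplotlib; the sketch below uses a random stand-in for the trained affinity tensor and makes no assumption about the exact color scheme or layout used in the paper.

```python
# Sketch: one heatmap per slice of an affinity tensor R_bar (random stand-in values).
import numpy as np
import matplotlib.pyplot as plt

R_bar = np.random.rand(3, 6, 6)            # placeholder affinity tensor (t x k x k)
fig, axes = plt.subplots(1, R_bar.shape[0], figsize=(12, 4))
for i, ax in enumerate(axes):
    im = ax.imshow(R_bar[i], cmap='viridis')
    ax.set_title(f'slice {i}')
fig.colorbar(im, ax=axes.ravel().tolist())
plt.show()
```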
Figure 7. Colored heatmap of affinity tensor R̄, trained on the Wikipedia data represented as input tensor using multiplicative update rules.
Figure 8. Colored heatmap of affinity tensor R̄, trained on the Amazon review data represented as input tensor using multiplicative update rules.
Figure 9. Colored heatmap of affinity tensor R̄, trained on the New York Times news article data represented as input tensor using multiplicative update rules.
Table 1. Amazon movie review corpus grouped by movie and number of reviews per slice of input tensor.
Movie              # Reviews
Toy Story 1        2491
Monsters, Inc.     3203
Kung Fu Panda 1    6708
Toy Story 3        1209
Kung Fu Panda 2    1208
Frozen             1292
Table 2. New York Times news corpus composition by section and number of articles.
Section         # Articles
Politics        3204
U.S.            2610
Business        1624
New York        1528
Europe          988
Asia Pacific    839
Health          598
Technology      551
Middle East     443
Science         440
Economy         339
Elections       240
Climate         239
World           233
Africa          124
Australia       113
Canada          104
Table 3. New York Times news corpus grouped by month and number of articles. This corresponds to the number of articles per slice of input tensor.
Month             # Articles
September 2019    1586
October 2019      1788
November 2019     1623
December 2019     1461
January 2020      1725
February 2020     1602
March 2020        1937
April 2020        1712
May 2020          1713
June 2020         1828
July 2020         1814
August 2020       1886
Table 4. Overview of word count statistics after preprocessing for all datasets. Columns represent from left to right the total number of words per corpus, total number of unique words per corpus, average number of total words per article, average number of unique words per article and the cutoff frequency of the 10,000th most common word. Wikipedia article combinations: Dolphin, Shark, Whale (DSW), Soccer, Bee, Johnny Depp (SBJ), Soccer, Tennis, Rugby (STR).
Dataset           Total         Unique     Avg. Total    Avg. Unique    Cutoff
Amazon Reviews    252,400       15,560     15.2          13.4           1
Wikipedia DSW     14,500        4376       4833.3        2106.0         1
Wikipedia SBJ     10,435        4034       3478.3        1600.3         1
Wikipedia STR     11,501        3224       3833.7        1408.0         1
New York Times    12,043,205    141,591    582.5         366.5          118
Table 5. The top 10 representative words per dimension of the basis matrix A, trained on the Wikipedia data as input matrix using automatic gradient methods.
Topic 1
#619
Topic 2
#1238
Topic 3
#628
Topic 4
#595
Topic 5
#612
Topic 6
#389
1ballfilmsalazarcupbeesheard
(0.77)(0.857)(0.201)(0.792)(0.851)(0.738)
2penaltystarredgeoffreyfootballspeciescourt
(0.708)(0.613)(0.2)(0.745)(0.771)(0.512)
3mayrolerushfifabeedepp
(0.703)(0.577)(0.2)(0.731)(0.753)(0.505)
4refereeseriesbrentonworldpollendivorce
(0.667)(0.504)(0.199)(0.713)(0.658)(0.454)
5goalburtonhardwickenationalhoneyalcohol
(0.66)(0.492)(0.198)(0.639)(0.602)(0.435)
6teamcharacterthwaitesuefainsectsparadis
(0.651)(0.465)(0.198)(0.623)(0.576)(0.42)
7playersplayedcatherinecontinentalfoodrelationship
(0.643)(0.451)(0.198)(0.582)(0.536)(0.419)
8playerdirectorkayateamsnestsabuse
(0.639)(0.45)(0.198)(0.576)(0.529)(0.41)
9playsuccessmelfieuropeansolitarystating
(0.606)(0.438)(0.198)(0.57)(0.513)(0.408)
10gamejackraimiassociationeusocialstated
(0.591)(0.434)(0.198)(0.563)(0.505)(0.402)
Table 6. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed. Matrix A, trained on the Wikipedia data as input matrix using automatic gradient methods.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0ballfilmsalazarcupbeesheard
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1penaltystarredgeoffreyfifabeecourt
(0.994)(0.978)(1.0)(0.995)(0.996)(0.966)
2refereerolerushnationalspeciesdivorce
(0.992)(0.964)(1.0)(0.991)(0.995)(0.944)
3mayburtonbardemworldpollenalcohol
(0.989)(0.937)(1.0)(0.988)(0.986)(0.933)
4goalseriesbrentonuefahoneyabuse
(0.986)(0.935)(1.0)(0.987)(0.971)(0.914)
0penaltystarredgeoffreyfootballspeciescourt
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1refereerolerushfifabeesdivorce
(0.999)(0.994)(1.0)(0.994)(0.995)(0.995)
2goalseriessalazarnationalbeealcohol
(0.998)(0.985)(1.0)(0.983)(0.99)(0.987)
3playerburtonbrentoncuppollenabuse
(0.997)(0.981)(1.0)(0.983)(0.99)(0.982)
4ballfilmthwaitesworldinsectssettlement
(0.994)(0.978)(1.0)(0.982)(0.977)(0.978)
Table 7. Top 10 representative words per dimension of the basis matrix A, trained on the Wikipedia data as input tensor using automatic gradient methods.
Topic 1
#481
Topic 2
#661
Topic 3
#414
Topic 4
#457
Topic 5
#316
Topic 6
#1711
1hindgamefilmheardbeesdisorder
(0.646)(0.83)(0.941)(0.844)(0.922)(0.291)
2segmentsfootballstarredcourtbeecollapse
(0.572)(0.828)(0.684)(0.566)(0.868)(0.29)
3bacteriaplayersroledivorcehoneyattrition
(0.563)(0.782)(0.624)(0.51)(0.756)(0.285)
4legsballseriesdeppinsectslosses
(0.562)(0.777)(0.562)(0.508)(0.68)(0.284)
5antennaeteamburtonsuedfoodinvertebrate
(0.555)(0.771)(0.547)(0.48)(0.634)(0.283)
6femalesmaycharacterstatingspeciesrate
(0.549)(0.696)(0.499)(0.45)(0.599)(0.283)
7wingsplaysuccessalcoholnestsbusinesses
(0.547)(0.692)(0.494)(0.449)(0.596)(0.282)
8smallcompetitionsplayedparadisflowersvirgil
(0.538)(0.677)(0.483)(0.446)(0.571)(0.282)
9groupsmatchfilmsallegedpolleniridescent
(0.527)(0.672)(0.482)(0.445)(0.56)(0.282)
10malespenaltyboxstatedlarvaedetail
(0.518)(0.664)(0.465)(0.444)(0.529)(0.281)
Table 8. Top 10 representative words per dimension of the basis matrix A, trained on the Wikipedia data as input tensor using multiplicative update rules.
Topic 1
#521
Topic 2
#249
Topic 3
#485
Topic 4
#871
Topic 5
#445
Topic 6
#1469
1speciesgamehoneyallowinsectsdepp
(3.105)(3.26)(2.946)(0.668)(2.794)(2.419)
2eusocialfootballbeeorganisedpollenfilm
(2.524)(3.05)(2.01)(0.662)(2.239)(2.115)
3solitaryplayersbeekeepingwinnerflowersrole
(2.279)(2.699)(1.933)(0.632)(2.019)(1.32)
4nestballbeesofficiallynectarstarred
(2.118)(2.475)(1.704)(0.626)(1.656)(1.3)
5femalesmayincreasedwinswaspsactor
(1.993)(2.447)(1.589)(0.617)(1.602)(1.155)
6workersteamhumanslevelwingsseries
(1.797)(2.424)(1.515)(0.613)(1.588)(1.126)
7nestsassociationwildfreemanyburton
(1.787)(1.92)(1.415)(0.6)(1.588)(1.112)
8coloniesplaymitesconstitutehindplayed
(1.722)(1.834)(1.4)(0.596)(1.577)(1.068)
9eggrefereecolonyregulationhairsheard
(1.692)(1.809)(1.35)(0.595)(1.484)(1.005)
10maleslawsbeekeepersprestigiouspollinatingsuccess
(1.664)(1.792)(1.332)(0.594)(1.467)(0.981)
Table 9. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed. Matrix A, trained on the Wikipedia data as input tensor using automatic gradient methods.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0hindgamefilmheardbeesdisorder
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1segmentsfootballstarredcourtbeecollapse
(0.995)(1.0)(0.968)(0.954)(0.999)(1.0)
2legsplayersroledivorcehoneylosses
(0.994)(0.999)(0.954)(0.925)(0.99)(1.0)
3antennaeballseriessuedinsectsattrition
(0.993)(0.999)(0.951)(0.907)(0.976)(0.999)
4wingsteamburtonallegedfoodbusinesses
(0.992)(0.998)(0.945)(0.897)(0.97)(0.999)
0segmentsfootballstarredcourtbeecollapse
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1antennaegameroledivorcebeesdisorder
(1.0)(1.0)(0.993)(0.996)(0.999)(1.0)
2wingsplayersseriessuedhoneylosses
(0.999)(0.999)(0.978)(0.991)(0.995)(0.999)
3bacteriaballburtonallegedinsectspesticide
(0.999)(0.999)(0.975)(0.981)(0.984)(0.998)
4legsteamfilmalcoholfoodbusinesses
(0.998)(0.999)(0.968)(0.981)(0.976)(0.998)
Table 10. For the most significant two words per topic, the four nearest neighbors based on cosine similarity are listed. Matrix A, trained on the Wikipedia data as input tensor using multiplicative update rules.
Topic 1Topic 2Topic 3Topic 4Topic 5Topic 6
0speciesgamehoneyallowinsectsdepp
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1easierfootballboatwrightsemancipationultravioletcharlie
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
2tinyplayersgladebroadlymechanicsinfiltrate
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
3halictidaeassociationtutankhamundisabilitiesexploitthenwife
(0.999)(1.0)(1.0)(1.0)(1.0)(1.0)
4provisionteamoracletotalswallowstourist
(0.999)(0.997)(1.0)(1.0)(1.0)(1.0)
0eusocialfootballbeeorganisedpollenfilm
(1.0)(1.0)(1.0)(1.0)(1.0)(1.0)
1oligoceneplayerssubfamiliescomeshoneybeesstarred
(1.0)(1.0)(0.995)(1.0)(1.0)(1.0)
2architecturegameinternalshowsenlargedsmoking
(1.0)(1.0)(0.994)(1.0)(0.998)(0.999)
3uncommonassociationstudiedattentionsimpledislocated
(1.0)(1.0)(0.994)(1.0)(0.998)(0.999)
4termedteamcladogramdeductionsdroveinjuries
(1.0)(0.997)(0.99)(1.0)(0.998)(0.999)
Table 11. Top 10 representative words per dimension of the basis matrix A, trained on the Amazon review data as input tensor using multiplicative update rules.
Topic 1
#528
Topic 2
#445
Topic 3
#1477
Topic 4
#1790
Topic 5
#1917
Topic 6
#597
Topic 7
#789
Topic 8
#670
Topic 9
#1599
Topic 10
#188
1annashenlegendarylasseterdiscscreamscodemikewoodypo
(4.215)(4.21)(1.459)(2.887)(1.367)(3.779)(3.292)(4.325)(6.12)(5.737)
2elsapeacockvalleydirectorbirdsenergyemailcrystalbuzzmaster
(4.087)(2.668)(1.448)(2.392)(1.343)(3.315)(2.781)(4.055)(5.484)(5.276)
3olafoldmantempleandrewwidescreenmonstropolispromobillyandyshifu
(2.315)(2.627)(1.31)(2.158)(1.327)(3.13)(2.645)(3.911)(4.355)(4.707)
4trollsgarykimstantonouttakesworldfreegoodmantoysdragon
(2.241)(2.423)(1.307)(2.119)(1.238)(3.109)(2.343)(3.812)(4.119)(4.344)
5frozenlordfearsomespecialextrasmonsterspromotionsullylightyearwarrior
(2.196)(2.201)(1.288)(1.892)(1.185)(3.047)(2.279)(3.728)(3.334)(4.274)
6kristoffweaponteacherpetedvdcitypromotionalwazowskiallentai
(2.155)(1.469)(1.288)(1.612)(1.142)(2.994)(2.266)(3.513)(2.752)(4.082)
7queenwolfbattleranftincludedmonsteramazonrandalltimlung
(2.055)(1.405)(1.264)(1.564)(1.13)(2.978)(2.129)(3.404)(2.609)(3.993)
8hansinnerdukjoeshortpowerclicksulleyhanksfive
(2.054)(1.38)(1.257)(1.564)(1.101)(2.919)(2.024)(3.396)(2.471)(2.952)
9sisteryeohtrainfeaturegamesscaredownloadbuscemicowboyoogway
(1.904)(1.359)(1.221)(1.555)(1.1)(2.614)(1.889)(3.266)(2.407)(2.918)
10icemichellewarriorsralphtourclosetinstructionsjamesspacefurious
(1.839)(1.354)(1.22)(1.518)(1.031)(2.555)(1.877)(3.264)(2.375)(2.806)
Table 12. Top 10 representative words per dimension of the basis matrix A, trained on the New York Times news article data as input tensor using multiplicative update rules.
Topic 1
#454
Topic 2
#5984
Topic 3
#567
Topic 4
#562
Topic 5
#424
Topic 6
#330
Topic 7
#515
Topic 8
#297
Topic 9
#431
Topic 10
#436
1suleimaniloansmasksfloydcontributedconfederateukrainestormrestaurantsweinstein
(2.812)(0.618)(3.261)(3.376)(4.565)(3.226)(3.191)(2.76)(2.948)(3.442)
2iranuniversityprotectiveminneapolisreportingstatuesondlandhurricanebarssexual
(2.593)(0.551)(2.823)(2.551)(2.788)(2.649)(2.881)(2.622)(2.021)(2.71)
3iraqoilglovespolicemichaelstatueszelenskywindsreopenrape
(2.453)(0.549)(2.516)(2.255)(2.707)(2.416)(2.133)(1.715)(1.684)(2.102)
4iranianbillionventilatorsgeorgekatiemonumentsambassadortropicalgymsassault
(2.408)(0.54)(2.22)(2.088)(2.324)(1.815)(1.976)(1.606)(1.654)(1.861)
5iraqiloansurgicalprotestsalanmonumentukrainianstormsstoresjury
(1.799)(0.468)(2.032)(1.936)(2.292)(1.375)(1.789)(1.439)(1.638)(1.513)
6baghdadbondsgownsbrutalityemilyflaggiulianicoasttheaterscharges
(1.604)(0.466)(1.965)(1.765)(2.165)(1.206)(1.755)(1.415)(1.627)(1.409)
7qassimpaymentsequipmentracismnicholasrichmondvolkerlaurasalonspredatory
(1.599)(0.456)(1.86)(1.579)(2.096)(1.109)(1.754)(1.259)(1.438)(1.387)
8strikeeditedsupplieskneecochranesymbolsinvestigationsisaiasclosedharvey
(1.597)(0.452)(1.816)(1.435)(1.934)(1.089)(1.602)(1.217)(1.424)(1.35)
9gentrilliongearkillingrappeportremovetestifiedcategoryshopsguilty
(1.513)(0.451)(1.742)(1.429)(1.613)(1.058)(1.584)(1.192)(1.325)(1.312)
10majgraduatedmaskofficersmaggieremovaltestimonylandfallindoorsex
(1.504)(0.449)(1.502)(1.405)(1.529)(1.003)(1.558)(1.106)(1.247)(1.3)
