Incremental Nonnegative Tucker Decomposition with Block-Coordinate Descent and Recursive Approaches

Nonnegative Tucker decomposition (NTD) is a robust method used for nonnegative multilinear feature extraction from nonnegative multi-way arrays. The standard version of NTD assumes that all of the observed data are accessible for batch processing. However, the data in many real-world applications are not static or are represented by a large number of multi-way samples that cannot be processing in one batch. To tackle this problem, a dynamic approach to NTD can be explored. In this study, we extend the standard model of NTD to an incremental or online version, assuming volatility of observed multi-way data along one mode. We propose two computational approaches for updating the factors in the incremental model: one is based on the recursive update model, and the other uses the concept of the block Kaczmarz method that belongs to coordinate descent methods. The experimental results performed on various datasets and streaming data demonstrate high efficiently of both algorithmic approaches, with respect to the baseline NTD methods.

Nonnegative Tucker decomposition (NTD) is a particular case of the Tucker decomposition in which the nonnegativity constraints are imposed onto the core tensor and all the factor matrices. Due to such constraints, the multi-way features are parts-based representations, easier for interpretation and may have a physical meaning if an input multi-way array contains only nonnegative entries. NTD has been used in multiple applications, including image classification [35][36][37][38], clustering [39], hyperspectral image denoising and compression [40,41], audio pattern extraction [42], image fusion [43], EEG signal analysis [44,45], etc.
There are also various algorithmic approaches to NTD [46], which are mostly based on the well-known alternating optimization strategy. This approach constitutes a convex relaxation to the NP-hard problem, but it involves the Kronecker product of all but one factor matrices across each mode. In consequence, a huge Kronecker product matrix must be stored in the RAM memory and processed in each alternating step, which is computationally intractable for large-scale tensors. To tackle this problem, a variety of computational issues have been proposed in the literature, partially summarized in Section 1.1.
In this study, we assume that observed data can be classified into the following categories: (1) observed data are composed of a large number of multi-way arrays that form a one-mode long static tensor; (2) observed data belong to a class of dynamic multiway streaming data, i.e., the multi-way samples are not observed in the form of one batch, but as a sequence of samples, where one or few samples are observed for each time instant. In this case, all of the factor matrices and the core tensor may also be considered as dynamic objects with time-varying features. Splitting the data observed in a given window into a currently observed sample or a short-time batch of the latest samples and the historical samples, which were observed in the past, the Tucker decomposition model can be split accordingly, which facilitates the factor updating process.
We propose two algorithmic approaches to update the factors in NTD incrementally with currently observed samples. In both approaches, the factor that corresponds to the mode of ordering the samples and the core tensor are updated in the same way. A new upcoming sample or a block of samples increments this factor matrix by a new row or a block of rows, respectively. This step can be performed with any nonnegatively constrained least-squares (NNLS) solver. The core tensor can also be readily updated with any NNLS solver considering only the upcoming data sample. The main difference between the proposed algorithms persists in updating the remaining factors.
In the first approach, the factor matrices are updated with the recursive strategy involving the nonnegatively constrained Gauss-Seidel method [47]. The factor matrix for each mode is estimated from the normal equation that is additively updated with the components, resulting only from processing a new data sample. This strategy considerably reduces the size of the Kronecker product that is computed for the baseline NTD model.
The second approach is motivated by the Kaczmarz-online updating strategy that was proposed in [48] for the online nonnegative matrix factorization (ONMF) model. The Kaczmarz method [49] is addressed for solving a linear least-squares problem by performing cyclic projections onto hyperplanes generated by successive linear equations (row actions). The block Kaczmarz (BK) method [50,51] is based on the similar cyclic strategy but in each step a block of rows is processed instead of a single equation. Motivated by this concept, we propose to update the factor matrices by performing block-coordinate descent updates for each upcoming data sample. This approach has a linear convergence rate and can be applied both to the NTD model, as well as without the nonnegativity constraints.
Using the objective function given by the Euclidean distance, both above-mentioned computational approaches boil down to alternating approximations of the small-scale computational problems with symmetric system matrices and nonnegativity constraints, and finally they can be solved with any NNLS solver. The symmetry is guaranteed by the fundamental properties of normal equations.
The experimental results were performed on both static as well as dynamic data. In the first case, a long sequence of nonnegatively constrained multi-way arrays were generated randomly. In the other case, a benchmark of spectral signals was used to generate mixed observed samples, assuming the spectral signals are time-varying.
The remainder of this paper is organized as follows. Section 1.1 reviews related studies on incremental or online tensor decomposition methods. Section 1.2 presents the notations and the preliminaries to fundamental mathematical operations on tensors. It also contains a short description of the baseline Tucker decomposition method. The proposed incremental Tucker decomposition model and two algorithmic strategies are presented in Section 2.
Numerical experiments performed on large-scale static and dynamic tensorial data are presented and discussed in Section 3. The final section provides concluding statements.

Related Works
Incremental versions of various tensor decomposition models have been extensively studied in the last decade, including online and incremental CP [52,53], online TT [54], dynamic approximation to HT [55], etc. There are many approaches and computational strategies for the Tucker decomposition.
The first versions of streaming Tucker decomposition were proposed by Sun et al. [56,57] in 2006. They were addressed for window-based tensor analysis [57] and dynamic tensor analysis [56]. Both can be regarded as extensions of the baseline Tucker decomposition model to incremental processing. However, orthogonal factor matrices in these models are computed with the standard singular value decomposition (SVD) applied to additively updated covariance matrices, which is their bottleneck. The above models have been next generalized and deeply studied in [58]. The Tucker decomposition with an incremental SVD was discussed in [59]. Gujral et al. [60] proposed the sampling-based batch incremental tensor decomposition (SamBaTen) that extends the CP model to an incremental version.
Another computational approach for processing large-scale streaming tensors with the Tucker decomposition was proposed by Malik and Becker [61]. It is the TensorSketch algorithm that is based on sketching Kronecker products in the Tucker decomposition using randomization techniques. This method is also addressed for updating factor matrices with orthogonality constraints. The Tucker decomposition was also extended for processing large-scale sparse tensors by Oh et al. [11] who proposed the P-TUCKER algorithm. It performs alternating least squares (ALS) updates with a row-wise rule, which is a memorysaving strategy for updating factor matrices and can be easily parallelized for multi-core processing. The updated factors with ALS are then orthogonalized using the alternating QR factorization. Another version of a scalable and incremental version of the Tucker decomposition was proposed by Xiao et al. [49]. In their approach, the input tensor may be dynamic and long in each mode. It is partitioned into timestamp-based small-scale subtensors along each mode, and then the factor matrices are updated with the ALS-based rule considering the factor matrices from previous timestamps and the auxiliary matrices from the current timestamp. The modified Gram-Schmidt orthogonalization is then used to impose the orthogonality constraints on the estimated factors. Other versions of the online Tucker decomposition were also studied in the unpublished work [62]. There are also statistical approaches to the online Tucker decomposition, e.g., Bayesian streaming sparse Tucker decomposition [63] and online stochastic decompositions [64]. Recently, Chachlakis et al. [65] proposed the dynamic L 1 -Tucker decomposition algorithm for dynamic and outlier-resistant Tucker analysis of multi-way streaming data. Their approach is also not addressed for NTD.
To the best of our knowledge, there is no incremental version of NTD. In this study, we propose a new computational approach to NTD that is inspired by the online version of NMF proposed in [48].

Preliminaries
Notations: tensors, matrices, vectors, and scalars are denoted by calligraphic uppercase letters (e.g., Y), boldface uppercase letters (e.g., Y), lowercase boldface letters (e.g., y), and non-bold letters (e.g., y), respectively. For a matrix Y, y i,j denotes the (i, j)-th element, and y j or y j stand for the j-th column or row, respectively. The set of non-negative real numbers is denoted by R + . The subtensor obtained from Y by selecting its t-th batch subtensor across the N-th mode is denoted Y (t) ∈ R . Let Y = [y i 1 ,i 2 ,...,i N ] ∈ R I 1 ×I 2 ×...×I N be the N-way array, which will be referred to as the N-modal tensor.
Unfolding, also known as matricization, is the way of getting a matrix from a multiway array by reshaping the data. The mode-n matricization of Y, denoted by Y (n) ∈ R I n ×∏ p =n I p , rearranges mode-n fibers placing them as the columns of matrix Y (n) . Tensor entries (i 1 , i 2 , . . . , i N ) are mapped to matrix's entries (i n , j), where , . . . , Y (L) } be a set of N-way arrays, such as ∀l : Y (l) ∈ R I 1 ×...×I n−1 ×I (l) n ×I n+1 ×...×I N , where l = 1, . . . , L. The concatenation of these multi-way arrays along the n-th mode is expressed by: where I n = ∑ L l=1 I (l) n .
The standard model of the Tucker decomposition of Y has the form: where G = [g j 1 ,j 2 ,...,j N ] ∈ R J 1 ×J 2 ×...×J N , for J n ≤ I n and n = 1, . . . , N, is the core tensor, and J n ] = [u i n ,j n ] ∈ R I n ×J n is the n-th factor matrix containing the features across the n-th mode of Y. The symbols × n and • stand for the tensor-matrix product along the n-th mode, and the outer product, respectively. Tensor G has the multi-linear rank (J 1 , J 2 , . . . , J N ), i.e., ∀n : rank(G (n) ) = J n , where G (n) is the unfolded version of G along the n-th mode.
Nonnegative Tucker decomposition (NTD) is a special case of the Tucker decomposition where the nonnegativity constraints are imposed onto all the factor matrices and the core tensor, i.e., ∀i n , j n : u i n ,j n ≥ 0, g j 1 ,j 2 ,...,j N ≥ 0 for n = 1, . . . , N.
To estimate the factors in NTD, the alternating optimization scheme is typically used, where model (3) is unfolded along the n-th mode in each step. Thus: is a nonnegative matrix, and ⊗ denotes the Kronecker product.
Let D Y (n) || U (n) G (n) U ⊗ −n T be the objective function that expresses dissimilarity between Y (n) and the unfolded version of the Tucker model. The optimization problem for estimation of the n-th factor in NTD can be presented in the generative form as the nonnegatively constrained nonlinear optimization problem: where Core tensor G can be easily estimated by vectorizing the model (3). Thus where is vectorized core tensor G. The problems (5) and (7) can be solved using various constrained optimization solvers. In particular, when D(·||·) is expressed by the Euclidean distance, problems (5), and (7) become the least-squares problems with nonnegativity constraints. The choice of the Euclidean distance is also motivated by the assumption that the residual error has a normal distribution, which is usually justified in practice. Assuming a wider class of data distributions, the β-divergence seems to be the best choice since it combines many well-known dissimilarity measures, including the generalized Kullback-Leibler divergence and the Itakura-Saito distance [1]. However, the proposed methodology explicitly involves the formulation of normal equations, which limits the profits of using other statistical measures than the Euclidean distance.

Incremental Tucker Decomposition
The proposed incremental NTD (INTD) is based on the assumption that the observed multi-way data can be expressed by a sequence of sub-tensors with respect to one modeusually the mode representing the time. The tensorial sequence can be regarded as one long-tail tensor (with one long mode) or as a dynamic sequence that comes online and needs to be processed sequentially. For both cases, the factor matrices and the core tensor in the proposed INTD can be updated incrementally. This section presents the model and the algorithmic issues for the proposed INTD.

Model
Without loss of generality, let us assume observed tensor Y ∈ R I 1 ×I 2 ×...×I N + can be regarded as a sequence of N-way arrays concatenated along the N-th mode. Thus: be the N-way array representing the historical data, i.e., the multi-way arrays concatenated since the first tensor Y (1) until the (t − 1)-th array. We assume that Y (t) ∈ R is the latest tensor which arrives in the t-th time slot or the t-th subwindow. Following this assumption, model (8) takes the form: For the data having the structure in (9), the INTD has the following form: is the factor containing the past features of Y along the N-th mode, and U contains the current features that are valid in the t-th time slot.
According to the unfolding rule in (1) and using (4), we have: Formula (11) implies that the current sample Y (t) after unfolding with respect to the n-th mode (n < N) is located as the last I (t) N columns in Y (n) , whereas this sample after unfolding with respect to the N-th mode is located as the last I (t) N rows of Y (n) according to (12).

Recursive Algorithm
In NTD, factor matrices U (n) for n < N can be easily estimated from the strongly over-determined system of linear equations in (4). Assuming that the objective function in (5) is expressed by the Euclidean distance, system (4) can be transformed to the normal equations as follows: (13), we have for the t-th sample: where and The sign < ·, · > −n stands for the inner product ( The inner product of A and B along all but the n-th mode is of tensors along all, but the n-th mode. Note that Formula (14) can be regarded as an iterative update rule with respect to the step t. Since Q n for the t-sample can be expressed in a similar manner: and Since matrix Q (t) n is square and symmetric, i.e., Q n , factor U (n) can be updated for ∀n < N by solving the following equation: To estimate factor matrix U (N) t , the system of linear equations in (12) is used. Note that where and Transforming system (23) to the normal equations, we have: Following the similar calculations as in (14) and (17), we obtain: Matrix U (N) t can be readily estimated from the system: using any nonnegatively constrained linear least-squares solver.
Applying the concept of the Gauss-Seidel method for least-squares problems, factors {U (n) } can be updated with the following block-coordinate descent update rule: where q (n,t) j n j n is the j n -th diagonal entry of Q Following the similar strategy to estimate tensor G, we have: which can be equivalently rewritten in the vectorized form: where g = vec{G} ∈ R ∏ p J p + . Vector g, estimated from (30) using any NNLS solver, is then tensorized to G.
Due to the scaling ambiguity, the columns in factors must be scaled to the unit norm, i.e., l 1 -norm. For n < N, the scaling was performed by mapping: ∀n < N : is the j-th column of U (n) . For n = N, the updated block is U (N) t , and we assume that blockŨ (N) has been already normalized. Hence, we have the The samples {Y (t) } can be proceeded sequentially for t = 1, 2, . . ., however, this process must be initialized by performing a batch decomposition on a small initial samplẽ Y using any NTF algorithm. Having the objects {G, U (1) , . . . ,Ũ (n) } estimated fromỸ, the matrices P Compute U (n) using rule (27); 12 Scale U (n) ← U (n) diag ||u To compute (16) and (19), we need O(I , respectively. The computation of G is the most computationally involving when computing the left-hand side in (29), and it can be approximated by O ∑ N n=1 ∏ n p=1 I p ∏ N s=n J s . Assuming ∀n : J n < I n , all these costs can be upper-bounded by the term: where α and β are some constants that are not dependent on I (t) N .

Remark 2.
Following the similar analysis as in Remark 1 for the full batch NTD, and assuming the same rules for updating {U (n) } and G, its computational complexity can be upper-bounded by O Kα + βKI N ∏ N−1 p=1 I p , where K is the number of alternating steps. Assuming the batch data are divided into L equal size multi-way samples, i.e., I N = LI it is thus obvious that the full batch NTD has a higher computational complexity than Algorithm 1 if K > 1.

Block Kaczmarz Algorithm
Model (11) for n < N can be equivalently expressed in the transposed form: and Matrix has a block-structure. Assuming that rank(Z (n) ) = J n and rank(Z (n) t ) = J n , the k-th iterative update of matrix U (n)T for k = 0, 1, . . . can be obtained using the block Kaczmarz method with nonnegativity constraints: where Z can be estimated from (23) using the same computational approach as presented in Section 2.2. Similarly, tensor G can be directly obtained from (30). The block Kaczmarz NTD (BK-NTD) for updating objects {G, U (1) , . . . , U (n) t } for t = 1, 2, . . . is presented by Algorithm 2, which additionally has the inner iterations.
Sweeping over the blocks across the N-th mode, the convergence rate of rule (36) can be analyzed using the concept of row paving in [50].   Compute (23) with any NNLS solver; Compute Z (n) t using (35); 10 Update U (n) with rule (36), // block Kaczmarz update rule 11 Scale Compute g = NNLS(Q, vec(Z )); Assuming in each iterative step k, a new block for t k is selected, and rank(Z where U (n) * is the unique least-squares solution to the problem min The proof of Lemma 1 can be easily obtained considering the result given in [50]. Lemma 1 shows that the convergence rate of rule (36) is linear.  Note that U (n) andḠ contain the time-varying or dynamic features, assuming t represents the time.
Thus, the corresponding model will be referred to as the dynamic Tucker decomposition model, which has the following form with respect to the n-th mode: where ∀n < N :

Numerical Simulations
The proposed algorithms were extensively tested using various benchmarks of synthetic data and observation scenarios.

Setup
The following benchmarks were used in the experiments: • Benchmark I: the observed dataset Y ∈ R I 1 ×I 2 ×I 3 + was created according to model (3).
Factor matrices U (n) = [u i n ,j n ] ∈ R I n ×J n + for n = 1, 2, 3 were generated using the zeromean normal distribution, where ∀n, i n , j n : u i n ,j n = max{0,û i n ,j n } andû i n ,j n ∼ N (0, 1). Thus, such factors have the sparsity of 50%. • Benchmark II: the observed dataset Y ∈ R I 1 ×I 2 ×I 3 + was also created using model (3). Factor matrices U (1) and U (2) were generated similarly as in Benchmark I. Each column in factor U (3) expresses the periodic Gaussian pulse train obtained by a repetition of Gaussian-modulated sinusoidal pulse, circularly shifted with a randomly selected time lag. The negative entries are replaced with a zero-value. One waveform of such a signal is plotted in Figure 1a. The frequency and the phase of the sinusoid signal was set individually for each component/column. Assuming U (3) represents source signals, the fibers in Y along the third mode can be regarded as mixed signals with a multi-linear mixing operator. • Benchmark III: the observed dataset Y ∈ R I 1 ×I 2 ×I 3 + was created according to the dynamic NTD model in (38). For the whole set of temporal samples (t = 1, . . . , T), factors U (1) t and U (2) t take the form of three-way arrays U (1) ∈ R I 1 ×J 1 ×T + and U (2) ∈ R I 2 ×J 2 ×T + that contain J 1 and J 2 components, respectively. Each component is obtained by a linear interpolation of a pair of the spectral signals taken from the file AC10_art_spectr_noi in Matlab toolbox NMFLAB for Signal Processing [66]. Hence, I 1 and I 2 refer to spectral resolutions, while T denotes the number of time slots within the interpolated window. The example of one such component is illustrated in Figure 1b in the form of a 3D plot. Assuming the components in U (1) and U (2) are time-varying spectral signals, model (38) can be used to represent a time-varying multi-linear mixing process. The proposed algorithms: BK-NTD (Algorithm 2) and RI-NTD (Algorithm 1) were compared with the HALS-NTD [67], which is regarded in the literature as one of the most efficient algorithms for NTD. The maximum number of iterations in the HALS-NTD was set to 100 and the tolerance for early termination was equal to 10 −5 . In the incremental versions of NTD, we setĨ 3 = 100 (the number of samples in an initial block) and I (t) n = 100 (the number of samples in a regular block). The number of inner iterations in BK-NTD was set to 5, i.e., K i = 5. The maximum number of iterations in the NNLS algorithm both for BK-NTD as well as for RI-NTD was set to 300.
The BK-NTD and RI-NTD were implemented in MATLAB 2016b, and the HALS-NTD was taken from the Matlab toolbox TENSORBOX (https://faculty.skoltech.ru/people/ anhhuyphan (accessed on 1 August 2018). All of the algorithms were run on a machine supplied with a 4-core Intel Core i7 CPU, 64 GB RAM, and an SSD drive.
To analyze the susceptibility of the algorithms to noisy perturbations, the synthetic data were perturbed with an additive zero-mean Gaussian noise N (0, σ 2 ) with the variance σ 2 selected to satisfy a given level of signal-to-noise ratio (SNR).
The following tests were performed using the above mentioned benchmarks: • Test A: Benchmark A was used in this test with settings: I 1 = I 2 = 100 and J 1 = J 2 = J 3 = 10 in all the observed scenarios. The test was carried out for I 3 ∈ {10 3 , 10 4 , 10 5 }.
No noisy perturbations were used in this test. Since the optimization problem in the NTD model is non-convex and, hence, sensitive to initialization, we performed the Monte Carlo (MC) analysis with 10 runs. In each run, a new sample of dataset and a new initializer were randomly selected. Figure 2 presents the results obtained in Test A. The averaged SIR performance versus the length of the analyzed tensor with respect to the last mode (the third one) is illustrated in Figure 2a in the form of a bar plot. The standard deviation in the MC runs is marked with the whiskers. The averaged ET in Test A is illustrated in Figure 2b.

Results
The results obtained in Test B are depicted in Figure 3. In this test, we demonstrate how the tested algorithms are prone to noisy perturbations. The averaged SIR performance and the normalized residual error versus the power of noisy disturbances are illustrated in Figure 3a,b, respectively. Figure 4a,b present the averaged SIR values and the ET versus the number of observed samples (I 3 ) in Test C, respectively. Figure 5a illustrates the averaged SIR performance versus the SNR values. The corresponding normalized residual errors are depicted in Figure 5b.  Figures 6 and 7 show the results obtained in Test E in which only BK-NTD was used, but in the online version expressed by model (38). In this case, the SIR was computed for U (3) and the last samples in U (1) and U (2) since the temporal resolution decreases with respect to the original factors (due to sequential batch processing). The SIR performance versus the number of entries along the third mode is illustrated in Figure 6a for noise-free and noisy data. The corresponding averaged ET values are presented in Figure 6b.    The original and estimated spectral signals that are represented by factors U (1) and U (2) are displayed in Figure 7 in the form of 2D plots with scaled colors. Note that u (1) Figure 7a presents the same signal as shown in Figure 1b, but in a more compact and informative way, especially for subjective evaluation of time-varying peak profiles.

Discussion
The results demonstrate multiple advantages of the proposed incremental versions of NTD. First, these methods are much faster than batch processing, especially when the observed data contain large numbers of samples. Figures 2b and 4b show that BK-NTD is at least 10 times faster than HALS-NTD, and RI-NTD is about 10 times faster than BK-NTD. The computational complexities of the proposed methods increase linearly with I N , and the experimental results confirm the theoretical analyses in Remarks 1, 2, and 3. Second, the high speed of processing is neither obtained due to a decrease in the quality of model fitting nor the SIR performance. On the contrary, BK-NTD and RI-NTD in Tests A and B (Figures 2 and 3) have considerably higher SIR performance than HALS-NTD if SNR ≥ 10 [dB], at a significantly lower level of the residual error. For a very strong noise (SNR = 0 [dB]) all algorithms have a similar SIR performance, but HALS-NTD reaches a lower residual error. Benchmark II is more difficult for the proposed algorithms than Benchmark I. However, BK-NTD considerably outperforms HALS-NTD for noise-free data with I N = 10 4 , as demonstrated in Test C. For other test cases (in Test C and D), there is no significant difference in the SIR performance between BK-NTD and HALS-NTD. RI-NTD yields the factors of a similar quality as the other algorithms, but only for strongly noisy data (SNR ≤ 10 [dB]). Comparing RI-NTD with BK-NTD, we observed that the former is much faster, but has a better SIR performance only for very noisy data from Benchmark II.
The next advantage of the proposed algorithms is that they can process nearly an infinitely long sequence of multi-way samples, which is intractable for batch NTD algorithms, e.g., such as HALS-NTD. We were unable to run HALS-NTD when I 3 = 10 5 due to the limit of RAM memory (we used 64 GB RAM), while the limit for the incremental versions of NTD restricts mostly to processing time in practice. Due to the incremental processing of multi-way data, factors U (1) and U (2) can capture a dynamic characteristics of the underlying processes. In other words, the factors can vary with time or along with the mode representing the samples. The results obtained in Test E confirmed that BK-NTD demonstrates a high efficiency in recovering dynamic factors U (1) and U (2) , even for strongly noisy data. As shown in Figure 6a, the difference in the SIR performance between the noise-free and noisy data with SNR = 10 [dB] is statistically not significant. The quality of estimation depends on the length of an input sequence but the SIR performance weakly changes if the sequences are not shorter than 10 4 samples (see Figure 6a). The quality of the estimated factors can also be confirmed by the results shown in Figure 7. Similarly as in Tests A and B, the time complexity is linear in terms of the number of samples, which is confirmed by the results plotted in Figure 6b.

Conclusions
In this study, we proposed new computational algorithms for an incremental version of the NTD model. One of them is based on the concept of row-action projections with the block Kaczmarz method, and the other uses the recursive nonnegative least-squares updates. The algorithms were applied both for performing the NTD of a long sequence of multi-way samples as well as for extracting time-varying multi-linear features. All of the proposed solutions were validated experimentally with respect to multiple benchmarks and criteria, including the quality of estimated factors and the runtime. The numerical results demonstrated that both algorithms are at least ten times faster than the batch version of NTD (HALS-NTD), keeping at least the same quality of estimated factors. In a few test cases, the block Kaczmarz algorithm considerably outperforms the baseline version of NTF with respect to the quality of the estimated factors and the residual error, even for moderately noisy data. The block Kaczmarz method also demonstrated a high efficiency in processing a dynamic NTD model with all of the time-varying estimated factors. This approach can find a number of potential applications, e.g., for solving a blind source separation problem with a time-varying mixing operator or for a spectral signal unmixing problem with dynamic endmember spectra.
To summarize, we proposed efficient computational approaches for incremental processing of a long sequence of multi-way samples with the NTD model and for estimating time-varying features in the dynamic NTD model. The efficiency is confirmed with multiple experimental results. Obviously, the proposed methodology can be extended in the future, e.g., to tackle partially orthogonal time-varying features or as an extension to other tensor decomposition models, such as tensor networks.