HOSVD-Based Algorithm for Weighted Tensor Completion

Matrix completion, the problem of completing missing entries in a data matrix with low-dimensional structure (such as rank), has seen many fruitful approaches and analyses. Tensor completion is the tensor analog that attempts to impute missing tensor entries from similar low-rank type assumptions. In this paper, we study the tensor completion problem when the sampling pattern is deterministic and possibly non-uniform. We first propose an efficient weighted Higher Order Singular Value Decomposition (HOSVD) algorithm for the recovery of the underlying low-rank tensor from noisy observations and then derive the error bounds under a properly weighted metric. Additionally, the efficiency and accuracy of our algorithm are both tested using synthetic and real datasets in numerical simulations.


Introduction
In many data-rich domains such as computer vision, neuroscience, and social networks, tensors have emerged as a powerful paradigm for handling the data deluge. In recent years, tensor analysis has gained more and more attention. To a certain degree, tensors can be viewed as the generalization of matrices to higher dimensions, and thus multiple questions from matrix analysis extend naturally to tensors. Similar to matrix decomposition, the problem of tensor decomposition (decomposing an input tensor into several less complex components) has been widely studied both in theory and application (see e.g., [1][2][3]). Thus far, the problem of low-rank tensor completion, which aims to complete missing or unobserved entries of a low-rank tensor, is one of the most actively studied problems (see e.g., [4][5][6][7]). It is noteworthy that, due to various unpredictable or unavoidable factors, multidimensional datasets are commonly raw and incomplete, so often only a small subset of the entries of a tensor is available. It is, therefore, natural to address the above issue using tensor completion in modern data-driven applications, in which data are naturally represented as a tensor, such as image/video inpainting [5,8], link prediction [9], and recommendation systems [10], to name a few.
In the past few decades, the matrix completion problem, which is a special case of tensor completion, has been extensively studied. In matrix completion, there are mature algorithms [11], theoretical foundations [12][13][14], and various applications [15][16][17][18] that pave the way for solving the tensor completion problem for higher-order tensors. Recently, Foucart et al. [19] proposed a simple algorithm for matrix completion with general deterministic sampling patterns, and raised the following questions: given a deterministic sampling pattern Ω and corresponding (possibly noisy) observations of the matrix entries, what type of recovery error can we expect? In what metric? How can we efficiently implement recovery? These questions were investigated in [19] by introducing an appropriate weighted error metric for matrix recovery of the form ‖H ∘ (M̂ − M)‖_F, where M is the true underlying low-rank matrix, M̂ is the recovered matrix, H is a best rank-one approximation of the sampling pattern Ω, and ∘ denotes the Hadamard (entrywise) product. In this regard, similar questions arise for the problem of tensor completion with deterministic sampling patterns. Unfortunately, as is often the case, moving from the matrix setting to the tensor setting presents non-trivial challenges, and notions such as rank and SVD need to be re-defined and re-evaluated. We address these extensions for the completion problem here.
Motivated by the matrix case, we propose an appropriate weighted error metric for tensor recovery of the form ‖H ∘ (T̂ − T)‖_F, where T is the true underlying low-rank tensor, T̂ is the recovered tensor, and H is an appropriate weight tensor. In the existing work, the error is limited to the form ‖T̂ − T‖_F, which corresponds to the case where all entries of H are 1, so that H can be considered a CP rank-one tensor. This motivates us to rephrase the questions mentioned above as follows.
Main questions. Given a sampling pattern Ω and noisy observations T + Z on Ω, for what rank-one weight tensor H can we efficiently find a tensor T̂ so that ‖H ∘ (T̂ − T)‖_F is small compared to ‖H‖_F? And how can we efficiently find such a weight tensor H, or determine that a fixed H has this property?

Contributions
Our main goal is to provide an algorithmic tool, theoretical analysis, and numerical results that address the above questions. In this paper, we propose a simple weighted Higher Order Singular Value Decomposition (HOSVD) method. Before implementing the weighted HOSVD algorithm, we first approximate the sampling pattern Ω by a rank-one tensor H. We can achieve high accuracy if ‖H − H^(−1) ∘ 1_Ω‖_F is small, where H^(−1) denotes the element-wise inverse. Finally, we present empirical results on synthetic and real datasets. The simulation results show that when the sampling pattern is non-uniform, the use of weights in the weighted HOSVD algorithm is essential, and that the output of the weighted HOSVD algorithm provides a very good initialization for the total variation minimization algorithm, dramatically reducing the number of iterations without losing accuracy. In doing so, we extend the weighted matrix completion results of [19] to the tensor setting.

Organization
The paper is organized as follows. In Section 2, we give a brief review of related work and concepts for tensor analysis, instantiate notations, and state the tensor completion problem under study. Our main results are stated in Section 3 and the proofs are provided in Appendices A and B. The numerical results are provided and discussed in Section 4.

Related Work, Background, and Problem Statement
In this section, we give a brief overview of the works that are related to ours, introduce some necessary background information about tensors, and finally give a formal statement of the tensor completion problem under study. The related work can be divided into two lines: one based on matrix completion problems, which leads to a discussion of weighted matrix completion and related work, and one based on tensor analysis, in which we focus on CP and Tucker decompositions.

Matrix Completion
The matrix completion problem is to determine a complete d_1 × d_2 matrix M from its partial entries on a subset Ω ⊆ [d_1] × [d_2]. We use 1_Ω to denote the matrix whose entries are 1 on Ω and 0 elsewhere, so that the entries of M_Ω = 1_Ω ∘ M equal those of M on Ω and 0 elsewhere, where ∘ denotes the Hadamard product. Various works aim to understand matrix completion with respect to the sampling pattern Ω. For example, the works in [20][21][22] relate the sampling pattern Ω to a graph whose adjacency matrix is given by 1_Ω and show that, as long as Ω is suitably close to an expander, efficient recovery is possible when the given matrix M is sufficiently incoherent. Mathematically, the task of understanding when there exists a unique low-rank matrix M completing M_Ω, as a function of the sampling pattern Ω, is very important. In [23], the authors give conditions on Ω under which there are only finitely many low-rank matrices that agree with M_Ω, and the work of [24] gives a condition under which the matrix can be locally uniquely completed. The work in [25] generalizes the results of [23,24] to the setting where sparse noise is added to the matrix. The works [26,27] study when rank estimation is possible as a function of a deterministic pattern Ω. Recently, [28] gave a combinatorial condition on Ω that characterizes when a low-rank matrix can be recovered up to a small error in the Frobenius norm from observations on Ω, and showed that nuclear norm minimization approximately recovers M whenever recovery is possible, where the nuclear norm of M is defined as ‖M‖_* := ∑_{i=1}^r σ_i, with σ_1, …, σ_r the non-zero singular values of M.
All the works mentioned above operate in the setting where recovery of the entire matrix is possible, but in many cases full recovery is impossible. Ref. [29] uses an algebraic approach to answer the question of when an individual entry can be completed. Many works (see e.g., [30,31]) introduce a weight matrix to capture recovery of the desired entries. The work [21] shows that, for any weight matrix H, there is a deterministic sampling pattern Ω and an algorithm that returns M̂ from the observations M_Ω such that ‖H ∘ (M̂ − M)‖_F is small. The work [32] generalizes the algorithm in [21] to find the "simplest" matrix that is correct on the observed entries. Succinctly, these works give a way of measuring which deterministic sampling patterns Ω are "good" with respect to a weight matrix H. In contrast to these two works, [19] is interested in whether one can find a weight matrix H and an efficient algorithm producing an estimate M̂ of an underlying low-rank matrix M from a sampling pattern Ω and noisy observations on Ω. In particular, one of our theoretical results generalizes the upper bounds for weighted recovery of low-rank matrices from deterministic sampling patterns in [19] to upper bounds for weighted tensor recovery. The details of the connection between our result and the matrix-setting result of [19] are discussed in Section 3.

Tensor Completion Problem
Tensor completion is the problem of filling in the missing elements of partially observed tensors. Similar to the matrix completion problem, low rankness is often a necessary hypothesis to restrict the degrees of freedom of the missing entries for the tensor completion problem. Since there are multiple definitions of the rank of a tensor, this completion problem has several variations.
The most common tensor completion problems [5,33] may be summarized as follows (the different notions of rank are defined later in this section).
Definition 1 (Low-rank tensor completion (LRTC) [7]). Given a low-rank (CP rank, Tucker rank, or other rank) tensor T and a sampling pattern Ω, the low-rank completion of T is given by the solution of the following optimization problem:

min_X rank*(X)  subject to  X_Ω = T_Ω,

where rank* denotes the specific tensor rank assumed at the beginning.
In the literature, there are many variants of LRTC, but most are based on the following questions: (1) What type of rank should one use (see e.g., [34][35][36])? (2) What additional restrictions on the observations can one assume (see e.g., [5,37,38])? (3) Under what conditions can one expect a unique and exact completion (see e.g., [34])?
In the rest of this section, we instantiate some notation and review basic operations and definitions related to tensors. Then some tensor decomposition-based algorithms for tensor completion are stated. Finally, we present the formal statement of the problem under study.

Preliminaries and Notations
Tensors, matrices, vectors, and scalars are denoted in different typefaces for clarity below. In the sequel, calligraphic boldface capital letters are used for tensors, capital letters for matrices, lowercase boldface letters for vectors, and regular letters for scalars. The set of the first d natural numbers is denoted by [d] := {1, …, d}. For X ∈ R^{d_1×···×d_n} and α ∈ R, X^(α) represents the element-wise power operator, i.e., (X^(α))_{i_1···i_n} = X^α_{i_1···i_n}. 1_Ω ∈ R^{d_1×···×d_n} denotes the tensor with entries 1 on Ω and 0 otherwise. We write X ≻ 0 to denote a tensor with X_{i_1···i_n} > 0 for all i_1, …, i_n. Moreover, we say that Ω ∼ W if the sampling set Ω is drawn at random so that (i_1, …, i_n) ∈ Ω with probability W_{i_1···i_n}. We include here some basic notions relating to tensors, and refer the reader to e.g., [2] for a more thorough survey.

Definition 2 (Tensor).
A tensor is a multidimensional array. The number of dimensions of a tensor is called its order (the individual dimensions are also called modes). The space of real tensors of order n and size d_1 × ··· × d_n is denoted R^{d_1×···×d_n}. The elements of a tensor X ∈ R^{d_1×···×d_n} are denoted X_{i_1···i_n}.
An n-order tensor X can be matricized in n ways by unfolding it along each of its n modes. The definition of the matricization of a given tensor is stated below.

Definition 3 (Matricization/unfolding of a tensor). The mode-k matricization/unfolding of a tensor X ∈ R^{d_1×···×d_n} is the matrix, denoted X_(k), whose columns are the vectors obtained from X by fixing all indices except the k-th one. The mapping X → X_(k) is called the mode-k unfolding operator.

Example 1. Let X ∈ R^{3×4×2}. Its mode-1 unfolding X_(1) is the 3 × 8 matrix whose columns are the mode-1 fibers X_{:,j,l}, one for each of the 8 index pairs (j, l).

Definition 4 (Folding operator). Suppose that X is a tensor. The mode-k folding operator applied to a matrix M = X_(k), denoted fold_k(M), is the inverse of the mode-k unfolding operator.
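To make the unfolding and folding operators concrete, here is a minimal NumPy sketch. The exact column ordering of the unfolding is a convention; this sketch uses NumPy's default row-major ordering of the remaining modes, which may differ from some references but satisfies fold_k(X_(k)) = X.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: rows indexed by mode k, columns by all remaining modes."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    """Mode-k folding: inverse of unfold, restoring a tensor of the given shape."""
    rest = [d for i, d in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape([shape[k]] + rest), 0, k)
```

For instance, for X ∈ R^{3×4×2}, unfold(X, 1) has shape (4, 6), and folding it back recovers X.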
Definition 5 (∞-norm). Given X ∈ R^{d_1×···×d_n}, the norm ‖X‖_∞ is defined as ‖X‖_∞ = max_{i_1,…,i_n} |X_{i_1···i_n}|. The unit ball under the ∞-norm is denoted by B_∞.
Definition 6 (Frobenius norm). The Frobenius norm of a tensor X ∈ R^{d_1×···×d_n} is defined as ‖X‖_F = (∑_{i_1,…,i_n} X²_{i_1···i_n})^{1/2}.

Definition 7 (Max-norm for matrix). Given X ∈ R^{d_1×d_2}, the max-norm of X is defined as ‖X‖_max = min_{X=UV^T} ‖U‖_{2,∞} ‖V‖_{2,∞}.

Definition 8 (Product operations).

• Outer product: Let a_1 ∈ R^{d_1}, …, a_n ∈ R^{d_n}. The outer product of these n vectors is the tensor X = a_1 ⊗ ··· ⊗ a_n ∈ R^{d_1×···×d_n} with entries X_{i_1···i_n} = (a_1)_{i_1} ··· (a_n)_{i_n}. A tensor X ∈ R^{d_1×···×d_n} has rank one if it can be written as the outer product of n vectors.

• Kronecker product of matrices: The Kronecker product of A ∈ R^{I×J} and B ∈ R^{K×L} is denoted by A ⊗ B. The result is a matrix of size (KI) × (JL) whose (i, j) block is a_{ij} B.

• Khatri-Rao product: Given matrices A ∈ R^{d_1×r} and B ∈ R^{d_2×r}, their Khatri-Rao product is denoted by A ⊙ B. The result is the matrix of size (d_1 d_2) × r given by A ⊙ B = [a_1 ⊗ b_1, …, a_r ⊗ b_r], where a_i and b_i stand for the i-th columns of A and B, respectively.

• Hadamard product: Given X, Y ∈ R^{d_1×···×d_n}, their Hadamard product X ∘ Y ∈ R^{d_1×···×d_n} is defined by element-wise multiplication, i.e., (X ∘ Y)_{i_1···i_n} = X_{i_1···i_n} Y_{i_1···i_n}.

• Mode-k product: Let X ∈ R^{d_1×···×d_n} and U ∈ R^{d_k×J}. The multiplication of X along its mode k by U is denoted Y = X ×_k U, with Y_{i_1···i_{k−1} j i_{k+1}···i_n} = ∑_{s=1}^{d_k} X_{i_1···i_{k−1} s i_{k+1}···i_n} U_{sj}.

Definition 9 (Tensor (CP) rank [1,39]). The (CP) rank of a tensor X, denoted rank(X), is defined as the smallest number of rank-one tensors that generate X as their sum. We use K_r to denote the cone of rank-r tensors.
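The product operations of Definition 8 can be checked numerically; a small NumPy sketch (the mode-k product below follows the convention above, contracting mode k of X with the rows of U):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # I x J
B = rng.standard_normal((5, 4))   # K x L (here L = J so the Khatri-Rao product is defined)

K = np.kron(A, B)                 # Kronecker product, size (3*5) x (4*4)

# Khatri-Rao product: column-wise Kronecker product, size (3*5) x 4
KR = np.stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])], axis=1)

X = rng.standard_normal((3, 4, 2))
Y = rng.standard_normal((3, 4, 2))
H = X * Y                         # Hadamard product is entrywise multiplication

def mode_k_product(X, U, k):
    """Y = X x_k U with U in R^{d_k x J}: contract mode k of X with the rows of U."""
    return np.moveaxis(np.tensordot(X, U, axes=([k], [0])), -1, k)

U = rng.standard_normal((4, 6))   # d_2 x J, acting on mode k = 1
Z = mode_k_product(X, U, 1)       # shape (3, 6, 2)
```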
Given matrices M_k ∈ R^{d_k×r}, we use ⟦M_1, …, M_n⟧ to denote the CP representation of a tensor X, i.e., X = ∑_{i=1}^r M_1(:, i) ⊗ ··· ⊗ M_n(:, i). Unlike the matrix case, the rank of a tensor is not presently well understood. Additionally, the task of computing the rank of a tensor is NP-hard [40]. Next we introduce an alternative notion of tensor rank, which is easy to compute.

Definition 10 (Tensor Tucker rank [39]). Let X ∈ R^{d_1×···×d_n}. The tuple (r_1, …, r_n) ∈ N^n with r_k = rank(X_(k)) is called the Tucker rank of the tensor X. We use K_r to denote the cone of tensors with Tucker rank r.
Tensor decompositions are powerful tools for extracting meaningful, latent structures in heterogeneous, multidimensional data (see e.g., [2]). In this paper, we focus on the two most widely used decomposition methods: CP and HOSVD. For a more comprehensive introduction, readers are referred to [2,41,42].

CP-Based Method for Tensor Completion
The CP decomposition was first proposed by Hitchcock [1] and further discussed in [43]. The formal definition of the CP decomposition is the following.
Definition 11 (CP decomposition). Given a tensor X ∈ R^{d_1×···×d_n}, its CP decomposition is an approximation by n loading matrices A_k ∈ R^{d_k×r}, k = 1, …, n, such that

X ≈ ∑_{i=1}^r A_1(:, i) ⊗ ··· ⊗ A_n(:, i) = ⟦A_1, …, A_n⟧,

where r is a positive integer denoting an upper bound on the rank of X and A_k(:, i) is the i-th column of the matrix A_k. If we unfold X along its k-th mode, we have

X_(k) ≈ A_k (A_n ⊙ ··· ⊙ A_{k+1} ⊙ A_{k−1} ⊙ ··· ⊙ A_1)^T.

Here the ≈ sign means that the algorithm should find an optimal X̂ of the given rank such that the distance ‖X − X̂‖_F between the low-rank approximation and the original tensor is minimized.
Given an observation set Ω, the main idea of tensor completion for a low-rank tensor T is to iterate the imputation

X̂ = 1_Ω ∘ T + (1 − 1_Ω) ∘ X,

where X = ⟦A_1, …, A_n⟧ is the interim low-rank approximation based on the CP decomposition and X̂ is the recovered tensor used in the next iteration for decomposition. At each iteration, we usually estimate the matrices A_k using the alternating least squares optimization method (see e.g., [44][45][46]).
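This imputation loop can be sketched for a 3-order tensor as follows, with the alternating least squares updates written via the standard CP normal equations; the function name and loop structure are ours, not from the paper:

```python
import numpy as np

def cp_als_complete(T_obs, mask, r, n_iter=50, seed=0):
    """Sketch of CP-based completion: alternate CP-ALS factor updates on the
    current imputation, then re-impute the unobserved entries from the
    low-rank model: X_hat = mask * T_obs + (1 - mask) * X_lowrank."""
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((d, r)) for d in T_obs.shape]
    X = T_obs.copy()  # zero-filled start on the unobserved entries
    for _ in range(n_iter):
        # ALS updates: MTTKRP divided by the Hadamard product of Gram matrices
        A[0] = np.einsum('ijk,jr,kr->ir', X, A[1], A[2]) @ np.linalg.pinv(
            (A[1].T @ A[1]) * (A[2].T @ A[2]))
        A[1] = np.einsum('ijk,ir,kr->jr', X, A[0], A[2]) @ np.linalg.pinv(
            (A[0].T @ A[0]) * (A[2].T @ A[2]))
        A[2] = np.einsum('ijk,ir,jr->kr', X, A[0], A[1]) @ np.linalg.pinv(
            (A[0].T @ A[0]) * (A[1].T @ A[1]))
        X_low = np.einsum('ir,jr,kr->ijk', *A)
        X = mask * T_obs + (1 - mask) * X_low  # keep observed entries, impute the rest
    return X
```

By construction, the output agrees exactly with the observations on Ω; only the unobserved entries are imputed from the CP model.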

HOSVD-Based Method for Tensor Completion
The Tucker decomposition was proposed by Tucker [47] and further developed in [48,49].
Definition 12 (Tucker decomposition). Given an n-order tensor X, its Tucker decomposition is defined as an approximation by a core tensor C ∈ R^{r_1×···×r_n} multiplied along each mode by n factor matrices A_k ∈ R^{d_k×r_k}, k = 1, …, n, such that X ≈ C ×_1 A_1 ×_2 ··· ×_n A_n = ⟦C; A_1, …, A_n⟧, where r_k is a positive integer denoting an upper bound on the rank of the matrix X_(k).
If we unfold X along its k-th mode, we have X_(k) ≈ A_k C_(k) (A_n ⊗ ··· ⊗ A_{k+1} ⊗ A_{k−1} ⊗ ··· ⊗ A_1)^T. The Tucker decomposition is a widely used tool for tensor completion. One popular method to implement it is the higher-order SVD (HOSVD) [47]. The main idea of HOSVD is:

1. Unfold X along mode k to obtain the matrix X_(k);
2. Find the economic SVD decomposition X_(k) = U_k Σ_k V_k^T;
3. Set A_k = U_k;
4. Compute C = X ×_1 A_1^T ×_2 ··· ×_n A_n^T.

If we want to find a Tucker rank r = [r_1, …, r_n] approximation of the tensor X via the HOSVD process, we simply replace A_k by the first r_k columns of U_k.
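The HOSVD steps above can be sketched compactly in NumPy; here the mode-k product helper contracts mode k of the tensor with the rows of the given matrix, so applying U_k produces the core and applying U_k^T maps back:

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def modeprod(X, U, k):
    """Contract mode k of X with the rows of U (mode k gets size U.shape[1])."""
    return np.moveaxis(np.tensordot(X, U, axes=([k], [0])), -1, k)

def hosvd(X, ranks):
    """Truncated HOSVD: U_k = top-r_k left singular vectors of X_(k);
    core C = X x_1 U1^T ... x_n Un^T; approximation C x_1 U1 ... x_n Un."""
    Us = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :r]
          for k, r in enumerate(ranks)]
    C = X
    for k, U in enumerate(Us):
        C = modeprod(C, U, k)          # project onto the top singular subspaces
    Xhat = C
    for k, U in enumerate(Us):
        Xhat = modeprod(Xhat, U.T, k)  # low-Tucker-rank approximation of X
    return C, Us, Xhat
```

When X has exact Tucker rank (r_1, …, r_n), the truncated HOSVD reproduces X exactly.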

Tensor Completion Problem under Study
In our setting, it is supposed that T is an unknown tensor in K_r ∩ βB_∞ (CP rank at most r, entries bounded by β) or K_r ∩ βB_∞ (Tucker rank r, entries bounded by β). Fix a sampling pattern Ω ⊆ [d_1] × ··· × [d_n] and the weight tensor W. Our goal is to design an algorithm that gives provable guarantees for a worst-case T, even one adapted to Ω.
In our algorithm, the observed data are corrupted by Gaussian noise. From the observations, the goal is to learn something about T. In this paper, instead of measuring our recovered results against the underlying true tensor in the standard Frobenius norm ‖T̂ − T‖_F, we are interested in learning T in a weighted Frobenius norm, i.e., in developing an efficient algorithm to find T̂ so that

‖W^(1/2) ∘ (T̂ − T)‖_F

is as small as possible for some weight tensor W. When measuring the weighted error, it is important to normalize appropriately in order to interpret the error bounds. In our results, we always normalize the error bounds by ‖W^(1/2)‖_F. It is noteworthy that

‖W^(1/2) ∘ (T̂ − T)‖²_F / ‖W^(1/2)‖²_F = (∑_{i_1,…,i_n} W_{i_1···i_n} (T̂ − T)²_{i_1···i_n}) / (∑_{i_1,…,i_n} W_{i_1···i_n}),

which gives a weighted average of the per-entry squared error. Generally, our problem can be formally stated as below.
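The weighted error metric and its normalization are simple to compute; a short sketch (the function name is ours):

```python
import numpy as np

def weighted_error(T_hat, T, W):
    """Normalized weighted Frobenius error ||W^(1/2) o (T_hat - T)||_F / ||W^(1/2)||_F.
    Its square is the W-weighted average of the per-entry squared errors."""
    num = np.linalg.norm(np.sqrt(W) * (T_hat - T))
    den = np.linalg.norm(np.sqrt(W))
    return num / den
```

With W identically 1, this reduces to the root mean squared per-entry error, i.e., ‖T̂ − T‖_F / √(d_1 ··· d_n).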

Problem: Weighted Universal Tensor Completion

Parameters: dimensions d_1, …, d_n; a sampling pattern Ω ⊆ [d_1] × ··· × [d_n]; a rank parameter r (CP or Tucker); a noise parameter σ > 0; a weight tensor W with strictly positive entries; a bound β > 0.

Goal: Design an efficient algorithm A with the following guarantees:
• A runs in polynomial time;
• With high probability over the choice of Z, A returns an estimate T̂ of T so that

‖W^(1/2) ∘ (T̂ − T)‖_F ≤ δ ‖W^(1/2)‖_F,

where δ depends on the problem parameters.
Remark 1 (Strictly positive W). The requirement that W_{i_1···i_n} be strictly greater than zero is a generic condition. In fact, if W_{i_1···i_n} = 0 for some (i_1, …, i_n), then, since W is rank one, some factor entry of W along a mode k at index i_k is zero, and we can reduce the problem to a smaller one by removing the corresponding slice of mode k.

Main Results
In this section, we state informal versions of our main results. With a fixed sampling pattern Ω and weight tensor W, we can find T̂ by solving the following optimization problem:

T̂ = argmin_{X: rank(X) = r} ‖W^(1/2) ∘ X − W^(−1/2) ∘ (T_Ω + Z_Ω)‖_F,   (2)

or, with a Tucker rank constraint,

T̂ = argmin_{X: Tucker rank(X) = r} ‖W^(1/2) ∘ X − W^(−1/2) ∘ (T_Ω + Z_Ω)‖_F.   (3)

It is known that solving (2) is NP-hard [40]. However, there are polynomial-time algorithms that find approximate solutions of (2) which are (empirically) close to the actual solution of (2) in the Frobenius norm. In our numerical experiments, we solve (2) via the CP-ALS algorithm [43]. To solve (3), we use the HOSVD process [48]. Assume that T has Tucker rank r = [r_1, …, r_n]. Let

A_i = (W^(−1/2) ∘ (T_Ω + Z_Ω))_(i)

and set U_i to be the matrix of the first r_i left singular vectors of A_i. Then the estimated tensor is of the form

T̂ = W^(−1/2) ∘ ((W^(−1/2) ∘ (T_Ω + Z_Ω)) ×_1 U_1 U_1^T ×_2 ··· ×_n U_n U_n^T).

In the following, we call this the weighted HOSVD algorithm.
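Under our reading of the weighted HOSVD procedure (rescale the zero-filled observations by W^(−1/2), project each mode onto the top singular vectors, then undo the W^(1/2) scaling, since W^(1/2) ∘ T̂ plays the role of the HOSVD approximation of the rescaled observations), a minimal NumPy sketch looks as follows; the function names are ours:

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def modeprod(X, U, k):
    """Contract mode k of X with the rows of U."""
    return np.moveaxis(np.tensordot(X, U, axes=([k], [0])), -1, k)

def weighted_hosvd(Y_obs, W, ranks):
    """Sketch of the weighted HOSVD estimate. Y_obs is the zero-filled noisy
    observation T_Omega + Z_Omega; W is a strictly positive rank-one weight
    tensor; ranks is the input Tucker rank (r_1, ..., r_n)."""
    A = Y_obs / np.sqrt(W)            # W^(-1/2) o (T_Omega + Z_Omega)
    Us = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r]
          for k, r in enumerate(ranks)]
    X = A
    for k, U in enumerate(Us):
        X = modeprod(X, U @ U.T, k)   # project mode k onto its top singular subspace
    return X / np.sqrt(W)             # undo the remaining W^(1/2) scaling
```

As a sanity check, with full observations, no noise, and unit weights, the estimate reproduces any tensor of the given Tucker rank exactly.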

General Upper Bound
Suppose that the optimal solution T̂ of (2) or (3) can be found. We would like to give an upper bound estimate for ‖W^(1/2) ∘ (T̂ − T)‖_F for a suitable weight tensor W.

Theorem 1. Let W = w_1 ⊗ ··· ⊗ w_n ∈ R^{d_1×···×d_n} have strictly positive entries, and fix Ω ⊆ [d_1] × ··· × [d_n]. Suppose that T ∈ R^{d_1×···×d_n} has rank r for problem (2), or Tucker rank r = [r_1, …, r_n] for problem (3), and let T̂ be the optimal solution of (2) or (3), respectively. Suppose that Z_{i_1···i_n} ∼ N(0, σ²). Then, with probability at least 1 − 2^{−|Ω|/2} over the choice of Z, the weighted error ‖W^(1/2) ∘ (T̂ − T)‖_F satisfies the bound stated and proved in Appendix A.

Notice that the upper bound in Theorem 1 holds for the optimal output T̂ of problems (2) and (3), and is thus general. However, it contains no rank information about the underlying tensor T. To introduce the rank information of the underlying tensor T, we restrict our analysis to Problem (3) by considering the HOSVD process in the sequel.

Results for Weighted HOSVD Algorithm
In this section, we begin by giving a general upper bound for the weighted HOSVD algorithm.

General Upper Bound for Weighted HOSVD
Theorem 2 (Informal, see Theorem A1). Let W = w_1 ⊗ ··· ⊗ w_n ∈ R^{d_1×···×d_n} have strictly positive entries, and fix Ω ⊆ [d_1] × ··· × [d_n]. Suppose that T ∈ R^{d_1×···×d_n} has Tucker rank r = [r_1, …, r_n] and that Z_{i_1···i_n} ∼ N(0, σ²), and let T̂ be the estimate of the solution of (3) obtained via the HOSVD process. Then, with high probability over the choice of Z, the weighted error ‖W^(1/2) ∘ (T̂ − T)‖_F satisfies the bound of Theorem A1, where a ≲ b means that a ≤ cb for some universal constant c > 0.

Remark 2.
The upper bound in [19] has an analogous form in the matrix case, where M̂ is obtained by considering truncated SVD decompositions. Notice that in our result, when n = 2, the upper bound reduces to one of the same shape, with factors involving r and a logarithmic factor in d_1 and d_2. Since the quantity µ in our work is much bigger than µ_1 in [19], the bound in our work is weaker than the one in [19]. The reason is that, in order to obtain a general bound for all tensors, we cannot apply the fact that the optimal approximations M̂ of a given matrix in the spectral norm and the Frobenius norm coincide.

Case Study: When Ω ∼ W
To understand the bounds mentioned above, we also study the case when Ω ∼ W such that ‖(W^(1/2) − W^(−1/2) ∘ 1_Ω)_(k)‖_2 is small for k = 1, …, n. Even though the samples are taken randomly in this case, our goal is to understand our upper bounds for a deterministic sampling pattern Ω. To make sure that ‖(W^(1/2) − W^(−1/2) ∘ 1_Ω)_(k)‖_2 is small, we need to assume that each entry of W is not too small. For this case, we have the following main results.
Theorem 3 (Informal, see Theorems A2 and A7).
• Upper bound: Suppose Ω ∼ W. Then the following holds with high probability: for our weighted HOSVD algorithm A, for any Tucker rank-r tensor T with ‖T‖_∞ ≤ β, A returns T̂ = A(T_Ω + Z_Ω) so that, with high probability over the choice of Z, the weighted error ‖W^(1/2) ∘ (T̂ − T)‖_F satisfies the upper bound of Theorem A2, where r = max_k{r_k} and d = max_k{d_k}.
• Lower bound: If, additionally, W is flat (its entries are close to one another), then for our weighted HOSVD algorithm A there exists some T ∈ K_r ∩ βB_∞ so that, with probability at least 1/2 over the choice of Z, the weighted error satisfies the lower bound of Theorem A7, where C is a constant measuring the "flatness" of W.
Remark 3. The formal statements in Theorems A2 and A7 are more general than the statements in Theorem 3.

Simulations for Uniform Sampling Pattern
In this section, we test the performance of our weighted HOSVD algorithm when the sampling pattern arises from uniform random sampling. Consider a tensor T of the form T = C ×_1 U_1 ×_2 ··· ×_n U_n, and let Ω be the sampling pattern generated by uniform sampling. In this simulation, we compare numerical results of using the HOSVD procedure to solve three problems: (4), using the original samples; (5), weighting the samples by the constant p = |Ω| / ∏_{k=1}^n d_k; and (6), weighting the samples by W as in our weighted HOSVD algorithm. First, we generate a synthetic sampling set Ω with sampling rate SR := |Ω| / ∏_{k=1}^n d_k = 30% and find a weight tensor W by computing a non-negative rank-one CP approximation of 1_Ω via the alternating least squares method. Next, we generate synthetic tensors T ∈ R^{d_1×···×d_n} of the form C ×_1 U_1 ×_2 ··· ×_n U_n with n = 3, 4 and rank(T_(i)) = r for i = 1, …, n, where r varies from 2 to 10. Then we add mean-zero Gaussian random noise Z with σ = 10^−2, producing the observed tensor Y = T + Z. We then solve the tensor completion problems (4), (5), and (6) by the HOSVD procedure. For each fixed low-rank tensor, we average over 20 tests and measure error in the weighted Frobenius norm. The simulation results are reported in Figures 1 and 2. Figure 1 shows the results for a tensor of size 100 × 100 × 100 and Figure 2 for a tensor of size 50 × 50 × 30 × 30, where the weighted error is of the form ‖W^(1/2) ∘ (T̂ − T)‖_F / ‖W^(1/2)‖_F. These figures demonstrate that using our weighted samples is more efficient than using the original samples. For the uniform sampling case, the p-weighted samples and the W-weighted samples exhibit similar performance.
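The weight tensor construction used here, namely fitting a non-negative rank-one tensor to the sampling indicator 1_Ω by alternating least squares, can be sketched as follows for a 3-order pattern; the objective min ‖1_Ω − w_1 ⊗ w_2 ⊗ w_3‖_F over non-negative rank-one tensors is our reading of the construction, and the function name is ours:

```python
import numpy as np

def rank1_weight(mask, n_iter=30, seed=0):
    """Fit a nonnegative rank-one tensor W = w1 o w2 o w3 to the sampling
    indicator `mask` by ALS. Each update is a closed-form least-squares step:
    contract mask with the other two factors and divide by their Gram scalars.
    Nonnegativity is preserved automatically since mask >= 0 and the factors
    start positive."""
    rng = np.random.default_rng(seed)
    w1, w2, w3 = (np.abs(rng.standard_normal(d)) + 0.1 for d in mask.shape)
    for _ in range(n_iter):
        w1 = np.einsum('ijk,j,k->i', mask, w2, w3) / ((w2 @ w2) * (w3 @ w3))
        w2 = np.einsum('ijk,i,k->j', mask, w1, w3) / ((w1 @ w1) * (w3 @ w3))
        w3 = np.einsum('ijk,i,j->k', mask, w1, w2) / ((w1 @ w1) * (w2 @ w2))
    return np.einsum('i,j,k->ijk', w1, w2, w3)
```

When the sampling probabilities are themselves exactly rank one, this ALS fit recovers them exactly.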

Simulation for Non-Uniform Sampling Pattern
To generate a non-uniform sampling pattern with sampling rate 30%, we first generate a CP rank-one tensor H and let Ω ∼ H. Then we repeat the process of Section 4.1. The simulation results are shown in Figures 3 and 4. As shown in the figures, the results using our proposed weighted samples are more efficient than those using the p-weighted samples.

Remark 4.
When we use the HOSVD procedure to solve (4), (5), and (6), we need (an estimate of) the Tucker rank as input. Instead of inputting the true rank of the underlying tensor, we can also use the rank estimated from the decay of the singular values of the unfolded matrices of the sampled tensor Y_Ω, which we call the SV-rank. The simulation results for the non-uniform sampling pattern with the SV-rank as input are reported in Figure 5. The simulation shows that the weighted HOSVD algorithm performs more efficiently than using the p-weighted samples or the original samples. Comparing Figure 5 with Figure 3, we observe that using the estimated rank as input for the HOSVD procedure performs even better than using the true rank. This observation suggests a way to find a "good" input rank for the HOSVD procedure.
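The text does not pin down the exact rule for reading a rank off the singular-value decay, so the sketch below uses one simple heuristic of our own choosing: for each mode, pick the index with the largest gap ratio σ_i / σ_{i+1} of the unfolding's singular values.

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def sv_rank(Y, max_rank=None):
    """Heuristic per-mode rank estimate from the singular-value decay of the
    unfoldings of the (zero-filled) observed tensor: choose, for each mode,
    the index with the largest ratio sigma_i / sigma_{i+1}."""
    ranks = []
    for k in range(Y.ndim):
        s = np.linalg.svd(unfold(Y, k), compute_uv=False)
        m = max_rank if max_rank is not None else len(s) - 1
        ratios = s[:m] / np.maximum(s[1:m + 1], 1e-12)  # floor avoids division by ~0
        ranks.append(int(np.argmax(ratios)) + 1)
    return ranks
```

On a noiseless tensor of exact Tucker rank, the largest gap sits right at the true rank in each mode.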

Remark 5.
Although we only provide guarantees on the performance in the weighted Frobenius norm (and accordingly report the weighted error), our procedures exhibit good empirical performance even in the usual relative error when the Tucker rank of the tensor is relatively low. However, we observe that the advantage of the weighted HOSVD scheme tends to diminish in terms of relative error as the Tucker rank increases. This result is not surprising, since the entries are treated unequally in scheme (6). We therefore leave the investigation of the relative error and its dependence on the tensor rank for future work.

Test for Real Data
In this section, we test our weighted HOSVD algorithm for tensor completion on three videos; see [50]. The dataset is the tennis-serve data from an Olympic Sports Dataset [51], which can be downloaded from http://vision.stanford.edu/Datasets (accessed 10 May 2021). There are many videos in the zip file, and we choose three of them: "d2P_zx_JeoQ_00120_00515.seq" (video 1), "gs3sPDfbeg4_00082_00229.seq" (video 2), and "VADoc-AsyXk_00061_0019.seq" (video 3). All three are color videos. In our simulation, we use the same setup as in [50] and choose 30 frames evenly from each video. Each frame is scaled to 360 × 480 × 3, so each video is transformed into a 4-D tensor of size 360 × 480 × 3 × 30. The first frame of each video after preprocessing is illustrated in Figure 6. We run experiments at sampling rates of 10%, 30%, 50%, and 80%, generating both uniform and non-uniform sampling patterns Ω. In our implementation, we use the SV-rank of T_Ω as the input rank. For each generated sampling pattern, we find a weight tensor W and compute estimates T̂_1 and T̂_2 from (4) and (6), respectively, using the input Tucker rank r. The entries of T̂_1 and T̂_2 on Ω are forced to equal the observed data. The signal-to-noise ratio (SNR) is computed and the simulation results are reported in Tables 1 and 2. As shown in the tables, applying the HOSVD process to (6) gives a better result than applying the HOSVD process to (4) directly, regardless of the uniformity of the sampling pattern. Finally, we test the proposed weighted HOSVD algorithm on the real candle video data named "candle_4_A" [52] (the dataset can be downloaded from the Dynamic Texture Toolbox at http://www.vision.jhu.edu/code/, accessed 10 May 2021). We test the relation between the relative error and the sampling rate using r = (5, 5, 5) as the input rank for the HOSVD algorithm. The relative errors are presented in Figure 7.
The simulation results also show that the proposed weighted HOSVD algorithm implements tensor completion efficiently. (Figure 7 caption: relation between relative error and sampling rate for the dataset "candle_4_A" using [5, 5, 5] as the input rank for the HOSVD process; the left panel records the relative error for the uniform sampling pattern and the right panel for the non-uniform sampling pattern. The sampling error stands for the relative error between the original video and the video with masked entries set to zero, and hence approximately equals √(1 − SR), where SR is the sampling rate.)

The Application of Weighted HOSVD on Total Variation Minimization
As shown in the previous simulations, the weighted HOSVD decomposition provides better tensor completion results than plain HOSVD. Many algorithms are sensitive to initialization, and real applications may impose higher accuracy requirements, so it is worthwhile to combine our weighted HOSVD with other algorithms to further improve performance. In this section, we consider the application of the weighted HOSVD decomposition to the total variation minimization algorithm. As a traditional approach, total variation minimization (TVM) is broadly applied in studies of image recovery and denoising, with the earliest research tracing back to 1992 [53]. Later studies combined TVM with other low-rank approximation algorithms, such as nuclear norm minimization (see e.g., [54][55][56]) and HOSVD (e.g., [57][58][59]), in order to achieve better performance in image and video completion tasks.
Motivated by matrix TV minimization, we propose a tensor TV minimization, summarized in Algorithm 1. In Algorithm 1, the Laplacian operator computes the divergence of the gradients along all dimensions at each entry of the tensor. The shrink operator simply moves the input towards 0 by a distance λ; formally, shrink(x, λ) = sign(x) · max(|x| − λ, 0). For the initialization of X_0 in Algorithm 1, we assign X_0 to be the output of the weighted HOSVD algorithm (HOSVD-w).
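The shrink operator is plain soft-thresholding and has a one-line NumPy version:

```python
import numpy as np

def shrink(x, lam):
    """Soft-thresholding: move each entry of x toward 0 by lam, clamping at 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

For example, with λ = 1, the entries −3, −0.5, 0, and 2 map to −2, 0, 0, and 1, respectively.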
Applying the same experimental setting as in Section 4.3, we evaluate the performance of this combined approach as well as the regular HOSVD approach, measuring performance by the signal-to-noise ratio (SNR). The simulation results are reported in Table 1. As shown in Table 1, total variation minimization can be applied to further improve the result of (6). Specifically, TVM with 0 as initialization performs similarly to TVM with HOSVD-w as initialization when the observed rate is high, but the HOSVD-w initialization improves the performance of TVM when the observed rate is very low (e.g., 10%). Additionally, we compare the decay of the relative error when using the weighted HOSVD output as initialization versus the default initialization (X_0 = 0). The iterative results are shown in Figure 8: using the result from weighted HOSVD as initialization notably reduces the number of TV-minimization iterations needed to reach the convergence threshold (‖X_k − X_{k−1}‖_F < 10^−4).

Conclusions
In this paper, we propose a simple but efficient algorithm, the weighted HOSVD algorithm, for recovering an underlying low-rank tensor from noisy observations. For this algorithm, we provide upper and lower error bounds that measure the difference between the estimates and the true underlying low-rank tensor. The efficiency of our proposed weighted HOSVD algorithm is also demonstrated by numerical simulations. Additionally, the result of our weighted HOSVD algorithm can be used as an initialization for the total variation minimization algorithm, which substantially reduces the number of iterations and leads to improved overall reconstruction performance (see our conference paper [60]). It would be interesting for future work to combine the weighted HOSVD algorithm with other algorithms to achieve more accurate results for tensor completion in many settings.

Acknowledgments:
The authors take pleasure in thanking Hanqin Cai, Keaton Hamm, Armenak Petrosyan, Bin Sun, and Tao Wang for comments and suggestions on the manuscript.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Proof for Theorem 1
In this appendix, we provide the proof for Theorem 1.

Appendix B. Proof of Theorems 2 and 3
In this appendix, we provide the proofs of the results related to the weighted HOSVD algorithm. The general upper bound for weighted HOSVD in Theorem 2 is restated and proved in Appendix B.1. The results in Theorem 3, for sampling patterns Ω generated according to the weight tensor W, are treated in Appendix B.2.
Notice that Therefore, we have for k = 1, · · · , n. Let us consider the case k = 1; the other cases can be derived similarly. Using the fact that T_(1) has rank r_1 and To bound ‖(W^(−1/2) ∘ Z_Ω)_(1)‖_2, we consider where e_{i_k} is the i_k-th standard basis vector of R^{d_k}. Please note that ∑_{i_1,···,i_n} (1_{(i_1,···,i_n)∈Ω}/W_{i_1···i_n}) e_{i_1}(e_{i_2} ⊗ ··· ⊗ e_{i_n})^T (e_{i_2} ⊗ ··· ⊗ e_{i_n}) e_{i_1}^T = ∑_{i_1,···,i_n} Similarly, By ([61], Theorem 1.5), for any t > 0, We conclude that with probability at least 1 − 1/(d_1 + ∏_{j≠1} d_j), Similarly, we have, with probability at least 1 − 1/(d_k + ∏_{j≠k} d_j), for k = 2, · · · , n. Plugging all of these into (A2), we obtain the bound stated in the theorem.
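The argument above relies on the fact that the mode-k unfolding T_(k) of a tensor with Tucker rank (r_1, · · · , r_n) has matrix rank at most r_k. A quick numerical sanity check of this fact (the core and factor names and sizes are ours, chosen for illustration):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Build a Tucker-rank-(2, 2, 2) tensor from a core C and factors U1, U2, U3.
rng = np.random.default_rng(0)
C = rng.standard_normal((2, 2, 2))
U = [rng.standard_normal((d, 2)) for d in (5, 6, 7)]
T = np.einsum('abc,ia,jb,kc->ijk', C, U[0], U[1], U[2])

# Each unfolding T_(k) has rank at most r_k = 2.
for k in range(3):
    assert np.linalg.matrix_rank(unfold(T, k)) <= 2
```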
Next let us give the formal statement for the upper bounds in Theorem 3.

Appendix B.2.2. Lower Bound
To deduce the lower bound, we have to construct a finite subset S in the cone K r so that we can approximate the minimal distance between two different elements in S. Before we prove the lower bound, we need the following theorems and lemmas.

Theorem A3 (Hanson-Wright inequality).
There is some constant c > 0 so that the following holds. Let ξ ∈ {0, ±1}^d be a vector with mean-zero, independent entries, and let F be any matrix with zero diagonal. Then P(|ξ^T F ξ| > t) ≤ 2 exp(−c min(t²/‖F‖_F², t/‖F‖)). Theorem A4 (Fano's inequality). Let F = {f_0, · · · , f_n} be a collection of densities on K, and suppose that A : K → {0, · · · , n}. Suppose there is some β > 0 such that for any i ≠ j, The following lemma specializes Fano's inequality to our setting; it is a generalization of ([19], Lemma 19). It shows that, for any reconstruction algorithm on a set K ⊆ R^{d_1×···×d_n}, with probability at least 1/2 there exist elements of K for which the weighted reconstruction error is bounded below by a quantity independent of the algorithm. Lemma A3. Let K ⊆ R^{d_1×···×d_n}, and let S ⊆ K be a finite subset with |S| > 16. Let Ω ⊆ [d_1] × · · · × [d_n] be a sampling pattern. Let σ > 0, choose κ ≤ σ√(log |S|)/(4 max_{T∈S} ‖T_Ω‖_F), and suppose that κS ⊆ K.
Then, for any algorithm A : R^Ω → R^{d_1×···×d_n} that takes as input T_Ω + Z_Ω for T ∈ K and outputs an estimate T̂ of T, there is some X ∈ K so that (A5) holds with probability at least 1/2.
Proof. Consider the set κS = {κT : T ∈ S}, which is a scaled version of S. By our assumption, κS ⊆ K.
Recall that the Kullback-Leibler (KL) divergence between two multivariate Gaussians with common covariance σ²I is KL(N(μ_1, σ²I) ‖ N(μ_2, σ²I)) = ‖μ_1 − μ_2‖²/(2σ²). Suppose that A is as in the statement of the lemma. Define an algorithm A′ : R^Ω → R^{d_1×···×d_n} as follows: for any Y ∈ R^Ω, if there exists T ∈ κS such that then set A′(Y) = T (notice that if such a T exists, then it is unique); otherwise, set A′(Y) = A(Y). Then, by Fano's inequality, there is some T ∈ κS so that Finally, we observe that which completes the proof.
To understand the lower bound κ² min_{T≠T′∈S} ‖H ∘ (T − T′)‖_F² in (A5), we construct a specific finite subset S of the cone of Tucker-rank-r tensors in the following lemma.
Lemma A4. There is a set S ⊆ K ∩ γB_∞ so that:
1. The set has size |S| ≥ N, for (2(‖w_k‖_2/‖w_k‖_1)√(r_k log(r_k)) + 2(‖w_k‖_∞/‖w_k‖_1) r_k log(r_k)
Proof. Let Ψ ⊆ {±1}^{r_1×···×r_n} be a set of random ±1-valued tensors chosen uniformly at random with replacement, of size 4N. Choose U_k ∈ {±1}^{d_k×r_k}, to be determined below, for all k = 1, · · · , n. Let First, we estimate ‖T_Ω‖_F and ‖T‖_∞. Please note that where the expectation is over the random choice of C. Then, by Markov's inequality, We also have ‖T‖_∞ = max_{i_1,···,i_n} |B_{i_1···i_n}| · |∑_{j_1,···,j_n} C_{j_1···j_n} U_1(i_1, j_1) ··· U_n(i_n, j_n)|.
Using the fact that |B_{i_1···i_n}| ≤ 1 and a union bound over all ∏_{k=1}^n d_k values of (i_1, · · · , i_n), we conclude that Thus, for a tensor T ∈ S, the probability that both of ‖T‖_∞ ≤ (1/2) ∏_{k=1}^n r_k log 8 Thus, by a Chernoff bound, it follows that, with probability at least 1 − exp(−CN) for some constant C, there are at least |S|/4 tensors T ∈ S such that all of these hold. Let S′ ⊆ S be the set of such T's. The set guaranteed in the statement of the lemma will be S′, which satisfies both items 1 and 2 of the lemma and is also contained in K ∩ γB_∞.
Thus, we consider item 3: we will show that it holds for all of S with high probability; in particular, it then holds for the subset selected above, which completes the proof of the lemma.
Fix T ≠ T′ ∈ S, and write where ξ is the vectorization of (C − C′)/2. Thus, each entry of ξ is independently 0 with probability 1/2, or ±1 with probability 1/4 each. Rearranging the terms, we have where D_k denotes the d_k × d_k diagonal matrix with w_k on the diagonal.
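The claim about the entries of ξ follows from a direct computation: if C and C′ have independent uniform ±1 entries, two entries agree (giving 0) with probability 1/2, and disagree in each direction with probability 1/4. A quick empirical check:

```python
import numpy as np

# Each entry of xi = (C - C')/2 is 0 w.p. 1/2 (entries agree) and ±1 w.p.
# 1/4 each (entries disagree in one of two directions).
rng = np.random.default_rng(0)
n = 200_000
c = rng.choice([-1, 1], size=n)
c_prime = rng.choice([-1, 1], size=n)
xi = (c - c_prime) // 2            # values lie in {-1, 0, 1}
assert abs(np.mean(xi == 0) - 0.50) < 0.01
assert abs(np.mean(xi == 1) - 0.25) < 0.01
```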
To understand (A6), we need to understand the matrix ⊗_{k=1}^n (U_k)^T D_k U_k. The diagonal of this matrix is ∏_{k=1}^n ‖w_k‖_1 · I. We will choose the matrices U_k, k = 1, · · · , n, so that the off-diagonal terms are small.
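The diagonal claim is easy to verify numerically: for U with ±1 entries and D = diag(w), each diagonal entry of UᵀDU equals ∑_j w_j · U(j, i)² = ‖w‖_1, since U(j, i)² = 1. A sketch (dimensions chosen arbitrarily):

```python
import numpy as np

# diag(U^T D U) = ||w||_1 * ones, because each diagonal entry is
# sum_j w_j * U(j, i)^2 and U(j, i)^2 = 1 for ±1-valued U.
rng = np.random.default_rng(0)
d, r = 8, 3
U = rng.choice([-1.0, 1.0], size=(d, r))
w = rng.random(d)                  # nonnegative weights, so ||w||_1 = sum(w)
M = U.T @ np.diag(w) @ U
assert np.allclose(np.diag(M), np.sum(w))
```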
Theorem A5. There are matrices U_k ∈ {±1}^{d_k×r_k}, k = 1, · · · , n, such that: Proof. By ([19], Claim 22), there exist matrices U_k ∈ {±1}^{d_k×r_k} such that: Using (a) and the fact that ⊗_{k=1}^n ((U_k)^T D_k U_k) By (b) and the fact that ⊗_{k=1}^n ((U_k)^T D_k U_k) Having chosen the matrices U_k, k = 1, · · · , n, we can now analyze the expression (A6).

Theorem A6.
There are constants c, c′ so that, with probability at least we have ∏_{k=1}^n (r_k ‖w_k‖_1).

Proof.
We break ‖H ∘ (T − T′)‖_F² into two terms: For the first term (I), we will use the Hanson-Wright inequality (see Theorem A3). In our case, the matrix F = 4(⊗_{k=1}^n (U_k)^T D_k U_k − ∏_{k=1}^n ‖w_k‖_1 I). The Frobenius norm of this matrix is bounded by The operator norm of F is bounded by Thus, the Hanson-Wright inequality implies that (2‖w_k‖_2 √(r_k log(r_k)) + 2‖w_k‖_∞ r_k log(r_k) Plugging in t = (1/2) ∏_{k=1}^n r_k ‖w_k‖_1, and replacing the constant c with a different constant c′, we have Next, we turn to the second term (II). We write and bound the error term 4 r_k is a zero-mean subgaussian random variable, and thus satisfies for all for some constant c″. Thus, for any t > 0, we have Thus, Combining (A7) and (A8), we conclude that with probability at least (2(‖w_k‖_2/‖w_k‖_1) √(r_k log(r_k)) + 2(‖w_k‖_∞/‖w_k‖_1) r_k log(r_k) + By a union bound over all of the points in S, we establish items 1 and 3 of the lemma. Now we are ready to prove the lower bound in Theorem 3. First, we give a formal statement of the lower bound in Theorem 3 by introducing the constant C′ to characterize the "flatness" of W.
Suppose that we choose each (i_1, · · · , i_n) ∈ [d_1] × · · · × [d_n] independently with probability W_{i_1···i_n} to form a set Ω ⊆ [d_1] × · · · × [d_n]. Then, with probability at least 1 − exp(−C · m) over the choice of Ω, the following holds. Let σ, β > 0 and let K_r ⊆ R^{d_1×···×d_n} be the cone of tensors with Tucker rank r = (r_1, · · · , r_n). For any algorithm A : R^Ω → R^{d_1×···×d_n} that takes as input T_Ω + Z_Ω, with T ∈ K_r ∩ βB_∞ and Z_{i_1···i_n} ∼ N(0, σ²), and outputs a guess T̂ for T, there is some T ∈ K_r ∩ βB_∞ so that with probability at least 1/2 over the randomness of A and the choice of Z. Above, c and C are constants which depend only on C′.
Proof. We instantiate Lemma A4 with H = W^(1/2) and B being the tensor whose entries are all 1. Let S be the set guaranteed by Lemma A4. We have for T ≠ T′ ∈ S. Using the assumption that the w_k are flat, the size of the set S is at least (2(‖w_k‖_2/‖w_k‖_1) √(r_k log(r_k)) + 2(‖w_k‖_∞/‖w_k‖_1) r_k log(r_k) (2C′ √(r_k log(r_k)/d_k) + 2C′ r_k log(r_k)/d_k + 1) − 1 where C″ depends on c and C. Set (2C′ √(r_k log(r_k)/d_k) + 2C′ r_k log(r_k)/d_k + 1) − 1 so this is a legitimate choice of κ in Lemma A3. Next, we verify that κS ⊆ K ∩ βB_∞. Indeed, we have so κS ⊆ βB_∞, and every element of S has Tucker rank r by construction.
Then Lemma A3 concludes that if A works on K_r ∩ βB_∞, then there is a tensor T ∈ K_r ∩ βB_∞ so that (2C′ √(d_k r_k log(r_k)) + 2C′ r_k log(r_k) + d_k)) − (2C′ √(r_k log(r_k)/d_k) + 2C′ r_k log(r_k)/d_k + 1) − 1 Additionally, by Lemma A2, we conclude that where c̃ depends on the above constants.