On The Synergy Between Nonconvex Extensions of The Tensor Nuclear Norm for Tensor Recovery

Low-rank tensor recovery has attracted much attention among various tensor recovery approaches. Unlike the matrix rank, the tensor rank has several definitions, e.g., the CP rank and the Tucker rank. Many low-rank tensor recovery methods focus on the Tucker rank. Since the Tucker rank is nonconvex and discontinuous, many relaxations of it have been proposed, e.g., the tensor nuclear norm, the weighted tensor nuclear norm, and the weighted tensor Schatten-$p$ norm. In particular, the weighted tensor Schatten-$p$ norm has two parameters, the weight and $p$, and the tensor nuclear norm and the weighted tensor nuclear norm are special cases obtained for particular choices of these parameters. However, whether the effects of the weighting and $p$ are synergistic has not been discussed in detail. In this paper, we propose a novel low-rank tensor completion model using the weighted tensor Schatten-$p$ norm to reveal the relationship between the weight and $p$. To clarify whether complex methods such as the weighted tensor Schatten-$p$ norm are necessary, we compare them with a simple method using rank-constrained minimization. We found that the simple method did not outperform the complex methods unless the rank of the original tensor was known accurately. If the ideal weights are available, $p = 1$ is sufficient, whereas $p < 1$ is necessary when the weights are obtained from observations. These results are consistent with existing reports.

the average rank of these matrices is calculated. The Tucker rank is very difficult to handle because of its nonconvexity and discontinuity.
To address this problem, the tensor nuclear norm, a convex surrogate of the Tucker rank, was proposed [1], [3]. Methods based on it replace the rank of each unfolding matrix with its nuclear norm, which is known to be the tightest convex surrogate (the convex envelope on the unit spectral-norm ball) of the matrix rank [16].
On the other hand, the weighted nuclear norm and the Schatten-$p$ norm have been proposed as alternative surrogates of the matrix rank [17]-[20]. Both generalize the nuclear norm and usually perform better than it in low-rank matrix recovery. Following this trend, the weighted tensor nuclear norm and the tensor Schatten-$p$ norm have also been proposed [8], [12]. They extend the weighted nuclear norm and the Schatten-$p$ norm, respectively, to tensors, and they generally perform better in low-rank tensor recovery as well. However, to use them effectively, we need to select appropriate weights and values of $p$.
The ideal (oracle) weights for the weighted nuclear norm are the inverses of the singular values of the original matrix. This is because the weighted nuclear norm with these oracle weights, evaluated at the original matrix, equals its rank: each nonzero singular value $\sigma_i$ contributes $(1/\sigma_i)\sigma_i = 1$. In general, obtaining the singular values of the original matrix is difficult. Therefore, in practice, we need a method to estimate them in order to determine the weights [19]. On the other hand, the parameter $p$ of the Schatten-$p$ norm is generally determined heuristically, and in most cases $p < 1$ is employed [8], [9], [18]. Note that both the weighted tensor nuclear norm and the tensor Schatten-$p$ norm (with $p < 1$) are in general nonconvex, as are their matrix counterparts. Some natural questions now arise: Are the effects of weighting the singular values and of the Schatten-$p$ extension synergistic, or does one of them encompass the other? Can a simple rank-constrained minimization, which is also a nonconvex optimization, compete with these advanced and complicated methods?
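The two observations above can be checked numerically. The sketch below assumes NumPy and a randomly generated rank-3 matrix; the sizes and the tolerance `rtol` are illustrative choices, not values from this paper. With oracle weights $w_i = 1/\sigma_i$, the weighted nuclear norm of the original matrix equals its rank, whereas the unweighted quantity $\sum_i \sigma_i^p$ only approaches the rank as $p \to 0$.

```python
import numpy as np

def nonzero_singular_values(X, rtol=1e-10):
    """Singular values of X, dropping numerically zero ones (<= rtol * largest)."""
    s = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return s[s > rtol * s[0]]

rng = np.random.default_rng(0)
# Rank-3 matrix built as a product of random 30x3 and 3x20 factors.
X_org = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))

s = nonzero_singular_values(X_org)
w_oracle = 1.0 / s                      # oracle weights: inverse singular values
weighted_nn = np.sum(w_oracle * s)      # every term equals 1, so this is the rank

# Unweighted Schatten-p quantity raised to the power p: sum_i sigma_i**p.
schatten_p = {p: np.sum(s ** p) for p in (1.0, 0.5, 0.1, 0.01)}

print(len(s), weighted_nn)
print(schatten_p)
```

Dropping the numerically zero singular values matters here: with a small $p$, round-off values of order $10^{-15}$ would otherwise each contribute almost 1 to $\sum_i \sigma_i^p$.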
In this paper, to answer these questions, we propose a novel general constrained optimization problem that combines the weighting and the Schatten-$p$ extension for tensors, and we develop an efficient algorithm to solve it. We performed extensive experiments, and the results showed that if the oracle weights are available, the combination of $p = 1$ and the weighting is the most effective choice in all cases. We also found that the combination of $p = 1/2$ and the weighting is effective when the weights are estimated from degraded measurements. The rank-constrained minimization performs well as long as the rank of the original tensor is known; if the correct rank is unknown, its performance drops sharply.
The main contributions of this paper are summarized as follows:
• We propose a general constrained optimization problem and an efficient solver for analyzing the relationship between the weighting of singular values and the Schatten-$p$ extension for tensors.
• We show that the weighting and the Schatten-$p$ extension are synergistic and that the effective value of $p$ depends on how the weights are determined.
• We show that the rank-constrained minimization cannot outperform the advanced methods unless the true rank of the original tensor is known, and that its performance is sensitive to the rank values used as constraints.

II. LOW-RANK TENSOR COMPLETION
In what follows, $\mathbb{N}$, $\mathbb{R}$, and $\mathbb{R}_+$ denote the sets of all nonnegative integers, all real numbers, and all nonnegative real numbers, respectively. We use capital calligraphic letters for tensors, capital bold letters for matrices, and lowercase bold letters for column vectors.
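In this paper, we assume that the observation model of tensor recovery can be described as

$$\mathcal{Y} = \mathcal{A}_\Omega(\mathcal{X}_{\mathrm{org}} + \mathcal{V}), \qquad (1)$$

where $\mathcal{Y}, \mathcal{X}_{\mathrm{org}}, \mathcal{V} \in \mathbb{R}^{n_1 \times \cdots \times n_N}$ are an $N$-th order observation tensor, an $N$-th order low-rank original tensor, and an $N$-th order random tensor whose entries are i.i.d. Gaussian variables with zero mean and known variance $\sigma_n^2$, respectively. The degradation operator is defined as

$$\left(\mathcal{A}_\Omega(\mathcal{X})\right)_{i_1,\dots,i_N} = \begin{cases} x_{i_1,\dots,i_N}, & (i_1,\dots,i_N) \in \Omega, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

where $\Omega$ is the set of indices of the observable entries.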
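The observation model can be simulated as follows. This is a minimal NumPy sketch, assuming the form $\mathcal{Y} = \mathcal{A}_\Omega(\mathcal{X}_{\mathrm{org}} + \mathcal{V})$, that $\Omega$ is stored as a boolean mask (`omega`, True for observed entries), and that unobserved entries of $\mathcal{Y}$ are zero. The function name `A_omega` is hypothetical, and `X_org` is only a random placeholder, not a low-rank tensor.

```python
import numpy as np

def A_omega(X, omega):
    """Degradation operator: keep entries where omega is True, set the rest to 0."""
    return np.where(omega, X, 0.0)

rng = np.random.default_rng(0)
shape = (8, 8, 8)
X_org = rng.random(shape)                  # placeholder for the original tensor
sigma_n = 0.1                              # known noise standard deviation
V = sigma_n * rng.standard_normal(shape)   # i.i.d. N(0, sigma_n^2) noise
omega = rng.random(shape) >= 0.4           # about 60% of the entries observed

Y = A_omega(X_org + V, omega)              # Y = A_Omega(X_org + V)
```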
If we can assume that the original tensor is low rank, it can be estimated by finding a tensor that is close to the observation tensor and also low rank. In particular, assuming that the rank of the original tensor is known and that the noise variance is 0, estimating the original tensor amounts to finding a tensor whose rank is identical to that of the original tensor and whose known elements match the observation tensor, i.e., finding a tensor in the following set:

$$\left\{ \mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_N} \;\middle|\; \mathrm{rank}_m(\mathcal{X}) = r_m \ (m = 1, \dots, N),\ \mathcal{A}_\Omega(\mathcal{X}) = \mathcal{Y} \right\}, \qquad (3)$$

where $\mathrm{rank}_m(\mathcal{X}) = \mathrm{rank}(\mathrm{unfold}_m(\mathcal{X}))$ for $m = 1, \dots, N$. Here, $\mathrm{rank}$ denotes the matrix rank, and $r_m$ denotes the rank of the $m$-th mode unfolding of the original tensor.
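The mode-wise ranks $\mathrm{rank}_m(\mathcal{X})$ can be computed from the unfoldings. The following NumPy sketch is an illustration, not the authors' code; the helper names `unfold` and `multilinear_rank` are hypothetical, and the test tensor is built in Tucker form with random factors so that its mode ranks are known.

```python
import numpy as np

def unfold(X, m):
    """Mode-m unfolding: mode m indexes the rows; the remaining modes are flattened
    into the columns. The column order may differ from other conventions, which
    does not affect the rank or the singular values."""
    return np.moveaxis(X, m, 0).reshape(X.shape[m], -1)

def multilinear_rank(X):
    """[rank_1(X), ..., rank_N(X)] with rank_m(X) = rank(unfold_m(X))."""
    return [int(np.linalg.matrix_rank(unfold(X, m))) for m in range(X.ndim)]

rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 4))
U1, U2, U3 = (rng.standard_normal((n, r)) for n, r in [(10, 2), (11, 3), (12, 4)])
# Tucker-form tensor: core x_1 U1 x_2 U2 x_3 U3.
X = np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)

print(unfold(X, 1).shape)      # (11, 120)
print(multilinear_rank(X))     # [2, 3, 4] for generic random factors
```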
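However, a set defined by rank equalities, as in Eq. (3), is in general hard to handle. Thus, we employ an alternative set with inequality constraints instead of Eq. (3):

$$\left\{ \mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_N} \;\middle|\; \mathrm{rank}_m(\mathcal{X}) \le r_m \ (m = 1, \dots, N),\ \mathcal{A}_\Omega(\mathcal{X}) = \mathcal{Y} \right\}. \qquad (4)$$

The sets in Eqs. (3) and (4) cannot be used directly to estimate the original tensor because they generally contain multiple tensors. Additionally, if the observation process includes noise ($\sigma_n \neq 0$), they may be empty. Thus, we measure the $\ell_2$ distance to the observation tensor, whose square corresponds to the negative log-likelihood of the Gaussian distribution up to scaling and an additive constant, and take the solution of the following minimization problem as the estimated tensor:

$$\min_{\mathcal{X}} \ \|\mathcal{A}_\Omega(\mathcal{X}) - \mathcal{Y}\|_2^2 \quad \text{s.t.} \quad \mathrm{rank}_m(\mathcal{X}) \le r_m \ (m = 1, \dots, N), \qquad (5)$$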
where $\|\cdot\|_2$ is the $\ell_2$ norm of a tensor, defined as the square root of the sum of the squares of its elements. Although this is a nonconvex optimization problem, we can solve it efficiently by using the alternating direction method of multipliers (ADMM) [21], which was developed for convex optimization problems but is also effective in practice for nonconvex ones [22]-[24].
Eqs. (3) and (5) include the matrix rank of each unfolded tensor, which is very difficult to handle because it is not only nonconvex but also discontinuous. Moreover, it is unrealistic to assume that the rank $r_m$ of each unfolding of the original tensor is known.
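The weighted tensor Schatten-$p$ norm (WTSPN) has been proposed as a nonconvex but continuous surrogate of the tensor rank:

$$\|\mathcal{X}\|_{\mathbf{w},p} = \sum_{m=1}^{N} \gamma_m \left\|\mathrm{unfold}_m(\mathcal{X})\right\|_{\mathbf{w}_m,p}^{p}, \qquad (6)$$

where $\gamma_m \ge 0$ are mode weights, $\mathbf{w}_m$ is the weight vector for the $m$-th mode, and $\|\cdot\|_{\mathbf{w},p}^{p}$ is the weighted Schatten-$p$ norm raised to the power $p$ (WSPN) [20]. The WTSPN is generally nonconvex; it coincides with the weighted tensor nuclear norm [12] when $p = 1$, with the tensor Schatten-$p$ norm [8] when $\mathbf{w}$ is uniform (all elements of $\mathbf{w}$ take the same value), and with the tensor nuclear norm [1], [3] when $p = 1$ and $\mathbf{w}$ is uniform.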
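The $m$-th mode unfolding operator for an $N$-th order tensor,

$$\mathrm{unfold}_m : \mathbb{R}^{n_1 \times \cdots \times n_N} \to \mathbb{R}^{n_m \times \prod_{k \neq m} n_k}, \qquad (7)$$

arranges the mode-$m$ fibers of a tensor as the columns of a matrix. The WSPN is described as

$$\|\mathbf{X}\|_{\mathbf{w},p}^{p} = \sum_{i=1}^{R} w_i\, \sigma_i(\mathbf{X})^{p}, \qquad (8)$$

where $\sigma_i(\mathbf{X})$ is the $i$-th largest singular value of $\mathbf{X}$, $\mathbf{w} = (w_1, \dots, w_R)^\top \in \mathbb{R}_+^{R}$, and $R$ is the smaller of the row and column dimensions of $\mathbf{X}$. The WSPN is a generalization of the nuclear norm and the weighted nuclear norm [17], [19], which are often used in low-rank matrix recovery.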
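A direct way to evaluate the WSPN and the WTSPN is sketched below with NumPy. It assumes the mode-weighted sum form $\sum_m \gamma_m \|\mathrm{unfold}_m(\mathcal{X})\|_{\mathbf{w}_m,p}^p$ with $\gamma_m = 1/N$ by default (the value used in the experiments); the function names `wspn` and `wtspn` are hypothetical. With uniform unit weights and $p = 1$, the result reduces to the average of the nuclear norms of the unfoldings.

```python
import numpy as np

def unfold(X, m):
    """Mode-m unfolding (mode m as rows, remaining modes flattened as columns)."""
    return np.moveaxis(X, m, 0).reshape(X.shape[m], -1)

def wspn(M, w, p):
    """Weighted Schatten-p norm raised to the power p: sum_i w_i * sigma_i(M)**p.
    w must have length min(M.shape); sigma_i are in descending order."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(np.asarray(w, dtype=float) * s ** p))

def wtspn(X, weights, p, gamma=None):
    """Mode-weighted sum of the WSPNs of all unfoldings (assumed form).
    gamma defaults to 1/N for every mode."""
    N = X.ndim
    if gamma is None:
        gamma = [1.0 / N] * N
    return sum(g * wspn(unfold(X, m), w, p)
               for m, (g, w) in enumerate(zip(gamma, weights)))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6, 7))
uniform = [np.ones(min(unfold(X, m).shape)) for m in range(X.ndim)]

tnn = wtspn(X, uniform, p=1.0)    # tensor nuclear norm (average over modes)
tspn = wtspn(X, uniform, p=0.5)   # tensor Schatten-1/2 quantity (raised to 1/2)
```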
As mentioned in Section I, the proper weights of the WTSPN and the proper value of p have not been investigated in detail.Revealing them is one of the objectives of this paper.
On the other hand, when the original tensor is recovered from observations following Eq. (1) with the WTSPN as the regularization term and the $\ell_2$ norm as the fidelity term, the optimal hyperparameter balancing the two terms varies with the parameters $\mathbf{w}$ and $p$ of the regularization term, even if the noise variance does not change. This makes a fair comparison difficult. In addition, a parameter that is this difficult to tune is undesirable in practice. Therefore, in the next section, we propose a method that avoids this problem.

III. PROPOSED METHOD
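To solve the above problem, we propose a method using an $\ell_2$ ball constraint and WTSPN minimization. Specifically, we formulate the following minimization problem:

$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_{\mathbf{w},p} \quad \text{s.t.} \quad \mathcal{A}_\Omega(\mathcal{X}) \in B\left(\mathcal{Y}, \sigma_n\sqrt{|\Omega|}\right), \qquad (9)$$

where $B(\mathcal{Y}, r)$ is the $\ell_2$ ball with center $\mathcal{Y}$ and radius $r$:

$$B(\mathcal{Y}, r) = \left\{ \mathcal{X} \;\middle|\; \|\mathcal{X} - \mathcal{Y}\|_2 \le r \right\}, \qquad (10)$$

and $|\Omega|$ is the number of elements of the set $\Omega$. By using the ball constraint, the appropriate parameter (the radius) can be determined from the noise variance alone [25]-[27], which is convenient when comparing various regularization terms, as in this paper.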
Since we assume that the standard deviation $\sigma_n$ of the noise is known, we can expect the realization of the noise $\mathcal{V}$ on the observed entries to lie approximately within the hypersphere of radius $\sigma_n\sqrt{|\Omega|}$, because $\mathbb{E}\|\mathcal{A}_\Omega(\mathcal{V})\|_2^2 = \sigma_n^2|\Omega|$. The constraint in Eq. (9) reflects this fact. This formulation allows us to compare the performance of different regularization terms fairly, provided that the noise variance is known. In general, Eq. (9) is a nonconvex optimization problem, which makes it difficult to find a globally optimal solution. As mentioned in Section II, ADMM performs well empirically on nonconvex optimization problems. Therefore, we solve Eq. (9) using ADMM. The proposed algorithm is shown in Algorithm 1.
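The choice of radius can be checked empirically. This NumPy sketch draws Gaussian noise on a $40 \times 40 \times 40$ tensor with a missing rate of 0.4 (settings borrowed from the experiments for illustration) and compares $\|\mathcal{A}_\Omega(\mathcal{V})\|_2$ with $\sigma_n\sqrt{|\Omega|}$.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (40, 40, 40)
sigma_n = 1.0
omega = rng.random(shape) >= 0.4          # observed-entry mask (missing rate 0.4)
V = sigma_n * rng.standard_normal(shape)  # i.i.d. Gaussian noise

noise_norm = np.linalg.norm(V[omega])     # ||A_Omega(V)||_2
radius = sigma_n * np.sqrt(omega.sum())   # sigma_n * sqrt(|Omega|)
ratio = noise_norm / radius
print(ratio)  # very close to 1
```

Because the ratio fluctuates slightly around 1, a given noise realization may fall just inside or just outside the ball; the radius is best read as the typical noise level on the observed entries.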
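The subproblem in line 5 of Algorithm 1 is

$$\operatorname*{argmin}_{\mathbf{X}} \ \lambda\gamma_m \|\mathbf{X}\|_{\mathbf{w}_m,p}^{p} + \frac{1}{2}\left\|\mathbf{X} - \left(\mathrm{unfold}_m(\mathcal{X}^{(k+1)}) + \mathbf{Z}^{(k)}_{1,m}\right)\right\|_2^2, \qquad (11)$$

which is nonconvex, although one of its solutions can be written as [9], [18], [20]

$$\mathbf{U}\, S_{\lambda\gamma_m\mathbf{w}_m,\,p}(\boldsymbol{\Sigma})\, \mathbf{V}^\top, \qquad (12)$$

where $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top = \mathrm{unfold}_m(\mathcal{X}^{(k+1)}) + \mathbf{Z}^{(k)}_{1,m}$ is a singular value decomposition (SVD) and $S_{\mathbf{w},p}(\cdot)$ is a weighted thresholding operator.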
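The following NumPy sketch illustrates this singular value step under stated assumptions. The scalar problem $\min_{x \ge 0} w x^p + \frac{1}{2}(x - y)^2$ is solved for general $0 < p \le 1$ by the generalized soft-thresholding (GST) rule of Zuo et al. (ICCV 2013), which we substitute here for the closed-form thresholding of [28]; for $p = 1$ it reduces to soft thresholding. The function names are hypothetical, and any scaling of the weights (such as the factor $\lambda\gamma_m$) is left to the caller.

```python
import numpy as np

def gst_scalar(y, w, p, iters=100):
    """argmin_{x >= 0} w * x**p + 0.5 * (x - y)**2 for y >= 0, w >= 0, 0 < p <= 1,
    via generalized soft-thresholding."""
    if w == 0.0:
        return float(y)
    # Threshold: for y <= tau, x = 0 is the global minimizer.
    base = 2.0 * w * (1.0 - p)
    tau = base ** (1.0 / (2.0 - p)) + w * p * base ** ((p - 1.0) / (2.0 - p))
    if y <= tau:
        return 0.0
    x = float(y)
    for _ in range(iters):
        # Fixed-point iteration on the stationarity condition x = y - w*p*x**(p-1).
        x = y - w * p * x ** (p - 1.0)
    return x

def weighted_sv_threshold(M, w, p):
    """U S_{w,p}(Sigma) V^T: threshold each singular value with its own weight.
    w[i] is paired with the i-th largest singular value."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.array([gst_scalar(si, wi, p) for si, wi in zip(s, w)])
    return (U * s_thr) @ Vt   # scale the columns of U, then recombine
```

Applying the scalar rule to each singular value independently is the solution form cited above; for weighted nuclear norms this is reported to be exact when the weights are nondecreasing, which holds for inverse-singular-value weights.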

Algorithm 1 Proposed algorithm
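[Algorithm 1 listing lost in extraction. Recoverable fragments: an initialization step; a loop that runs while a stopping criterion is not satisfied; an inner loop over $m = 1, \dots, N$ whose line 5 is the WSPN subproblem; a projection step in line 8; and the update $\lambda \leftarrow 0.99\lambda$.]

Each diagonal element of the weighted thresholding operator applied to a rectangular diagonal matrix $\mathbf{Y}$, $(S_{\mathbf{w},p}(\mathbf{Y}))_{i,i}$, is defined as a solution to the following minimization problem:

$$\left(S_{\mathbf{w},p}(\mathbf{Y})\right)_{i,i} \in \operatorname*{argmin}_{x \ge 0} \ w_i x^{p} + \frac{1}{2}\left(x - Y_{i,i}\right)^2. \qquad (13)$$

The solution of Eq. (13) is the soft thresholding $\max(Y_{i,i} - w_i, 0)$ when $p = 1$, and the closed-form thresholding proposed in [28] when $p \in \{1/2, 2/3\}$. The first term in line 8 of Algorithm 1 is a metric projection onto the set $B(\mathcal{Y}, \sigma_n\sqrt{|\Omega|})$. The metric projection onto a closed set $C$ is defined as

$$P_C(\mathcal{X}) = \operatorname*{argmin}_{\mathcal{Z} \in C} \|\mathcal{Z} - \mathcal{X}\|_2,$$

and the projection onto the $\ell_2$ ball has the closed-form solution

$$P_{B(\mathcal{Y}, r)}(\mathcal{X}) = \begin{cases} \mathcal{X}, & \|\mathcal{X} - \mathcal{Y}\|_2 \le r, \\ \mathcal{Y} + \dfrac{r}{\|\mathcal{X} - \mathcal{Y}\|_2}(\mathcal{X} - \mathcal{Y}), & \text{otherwise}, \end{cases}$$

with $r = \sigma_n\sqrt{|\Omega|}$. The set $\Omega^c$ in the second term of line 8 is the complement of $\Omega$, i.e., the set of indices of the missing entries. The operator $\mathcal{A}_{\Omega^c}$ keeps the entries indexed by $\Omega^c$ and sets all other entries to zero:

$$\left(\mathcal{A}_{\Omega^c}(\mathcal{X})\right)_{i_1,\dots,i_N} = \begin{cases} x_{i_1,\dots,i_N}, & (i_1,\dots,i_N) \in \Omega^c, \\ 0, & \text{otherwise}. \end{cases}$$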
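The projection step can be sketched in NumPy as follows. We assume that $\mathcal{Y}$ is zero outside $\Omega$ and that line 8 amounts to projecting the observed entries onto the ball while passing the missing entries through unchanged; under that assumption, the combined step is the exact projection onto $\{\mathcal{X} \mid \mathcal{A}_\Omega(\mathcal{X}) \in B(\mathcal{Y}, r)\}$. The function names are hypothetical.

```python
import numpy as np

def proj_ball(X, Y, r):
    """Metric projection of X onto B(Y, r) = {Z : ||Z - Y||_2 <= r}."""
    d = X - Y
    nrm = np.linalg.norm(d)        # l2 norm over all entries
    if nrm <= r:
        return X.copy()
    return Y + (r / nrm) * d       # move X radially onto the sphere

def proj_observed_ball(X, Y, omega, r):
    """Projection onto {X : A_Omega(X) in B(Y, r)}, assuming Y is zero outside Omega:
    observed entries are projected onto the ball, missing entries are kept."""
    out = X.copy()
    out[omega] = proj_ball(X[omega], Y[omega], r)
    return out
```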

IV. EXPERIMENTAL COMPARISON
A. Setting
In Section I, we posed two questions:
• Are the effects of weighting the singular values and of the Schatten-$p$ extension synergistic, or does one encompass the other?
• Is a simple rank-constrained minimization insufficient?
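To answer these questions, we performed experiments using artificial tensors. Each $N$-th order artificial tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_N}$ is generated by the Tucker model

$$\mathcal{X} = \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N,$$

where $\mathcal{S} \in \mathbb{R}^{r_1 \times \cdots \times r_N}$ is the core tensor, $\mathbf{U}_k \in \mathbb{R}^{n_k \times r_k}$ ($k = 1, \dots, N$) are the factor matrices, and $\times_k$ denotes the mode-$k$ product. The elements of $\mathcal{S}$ and $\mathbf{U}_k$ are drawn uniformly from the intervals $[0, 1]$ and $[-0.5, 0.5]$, respectively. Finally, we normalize $\mathcal{X}$ so that the difference between its maximum and minimum elements is 1.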
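The generation procedure can be sketched in NumPy as follows. This is an illustrative reimplementation, not the authors' code; the helper names are hypothetical, and we only rescale by the range (max minus min), since the text does not state whether the tensor is also shifted.

```python
import numpy as np

def mode_product(T, U, m):
    """Mode-m product T x_m U for U of shape (n_m, r_m)."""
    # tensordot puts the new mode first; moveaxis returns it to position m.
    return np.moveaxis(np.tensordot(U, T, axes=(1, m)), 0, m)

def tucker_artificial(shape, ranks, rng):
    """Random Tucker tensor: core ~ U[0, 1], factors ~ U[-0.5, 0.5], range scaled to 1."""
    X = rng.uniform(0.0, 1.0, size=ranks)           # core tensor S
    for m, (n, r) in enumerate(zip(shape, ranks)):
        U = rng.uniform(-0.5, 0.5, size=(n, r))     # factor matrix U_m
        X = mode_product(X, U, m)
    return X / (X.max() - X.min())                  # make max - min equal to 1

rng = np.random.default_rng(0)
X = tucker_artificial((40, 40, 40), (4, 4, 4), rng)
```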
As mentioned in Section I, the widely used ideal weights for the singular values are the inverses of the singular values of the unfolded original tensor $\mathcal{X}_{\mathrm{org}}$.
Throughout this paper, we define our ideal weights $\mathbf{w}^{\mathrm{Id}}$ by Eq. (20), where $R$ is the smaller of the row and column dimensions of $\mathrm{unfold}_j(\mathcal{X}_{\mathrm{org}})$. Since the ideal weights are not always optimal in terms of recovery performance, we introduce a parameter $\alpha$ that adds flexibility to the weights. However, the true singular values are not available in practical applications.
A method that does not require the true singular values uses singular values obtained from the observations to estimate the ideal weights. One such method defines the weights by Eq. (21), where $\tilde{\mathcal{Y}}$ is the tensor obtained by filling the missing elements of the observation tensor $\mathcal{Y}$ with the average of its observed entries. We refer to these weights as the observation weights $\mathbf{w}^{\mathrm{Obs}}$.
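The construction of $\tilde{\mathcal{Y}}$ and the singular values it provides can be sketched in NumPy as follows. The weight formula of Eq. (21) itself is not reproduced here; the sketch stops at the singular values of the unfoldings of $\tilde{\mathcal{Y}}$, which are its inputs. The function names, tensor size, and missing rate are illustrative assumptions.

```python
import numpy as np

def mean_fill(Y, omega):
    """Y-tilde: missing entries (omega False) replaced by the mean of the observed entries."""
    return np.where(omega, Y, Y[omega].mean())

def unfolding_singular_values(T):
    """Singular values (descending) of every mode-m unfolding of T."""
    return [np.linalg.svd(np.moveaxis(T, m, 0).reshape(T.shape[m], -1), compute_uv=False)
            for m in range(T.ndim)]

rng = np.random.default_rng(0)
shape = (10, 10, 10)
omega = rng.random(shape) >= 0.8          # missing rate 0.8
Y = np.where(omega, rng.random(shape), 0.0)
Y_tilde = mean_fill(Y, omega)
sv = unfolding_singular_values(Y_tilde)   # inputs to the observation-weight rule
```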
The tensor Schatten-$p$ norm without weights is a special case of the WTSPN. Therefore, the WTSPN with uniform weights $\mathbf{w}^{\mathrm{Uni}}$, whose elements all take the same value, can be used as the tensor Schatten-$p$ norm.

In the following experiments, we use three types of weights (the ideal weights $\mathbf{w}^{\mathrm{Id}}$, the observation weights $\mathbf{w}^{\mathrm{Obs}}$, and the uniform weights $\mathbf{w}^{\mathrm{Uni}}$) to reveal how the weighting and the Schatten-$p$ extension affect the performance of the WTSPN. The weight determination parameter $\alpha$ varies in increments of 0.25 over the range $[1, 4]$. The Schatten-$p$ parameter is chosen from $p \in \{1/2, 2/3, 1\}$, and each element of $\boldsymbol{\gamma}$ is set to $1/N$, where $N$ is the order of the target tensor. In all cases, the ADMM parameter is set to $\lambda = 100$.

We compare the performance of Algorithm 1 using $\mathbf{w}^{\mathrm{Id}}$, $\mathbf{w}^{\mathrm{Obs}}$, and $\mathbf{w}^{\mathrm{Uni}}$ with that of the rank-constrained minimization. The performance of each method is evaluated by the error in Eq. (23), where $\hat{\mathcal{X}}$ denotes the estimated tensor obtained by each method.

B. Results and Discussion
We performed recovery from observed tensors with missing rates of 0.4 and 0.8 and noise standard deviations $\sigma_n$ of 0 and 1; the results for the combinations of these parameters are shown in panels (a)-(d) of Figs. 1, 2, 3, and 4. The horizontal axis of each graph is the parameter $\alpha$ used in Eqs. (20) and (21) to determine the weights, and the vertical axis is the error of each method defined by Eq. (23). The red, green, blue, and yellow lines show the results of Algorithm 1 with $\mathbf{w}^{\mathrm{Id}}$ (Id in the legends), with $\mathbf{w}^{\mathrm{Obs}}$ (Obs), and with $\mathbf{w}^{\mathrm{Uni}}$ (Uni), and of the rank-constrained minimization in Eq. (5) (RC), respectively. For Id, Obs, and Uni, different line types correspond to different values of $p$. Similarly, for RC, different line types correspond to different target ranks $r$.
Fig. 1 shows the results when the size of the original tensor is 40 × 40 × 40 and the rank is [4, 4, 4]. From graphs (a)-(d) in Fig. 1, one can see the following:
• In the case of Id, if we choose α < the choice of p does not have much effect on the performance. The slowest degradation of performance as α changes is obtained at p = 1.
• In the case of Obs, p = 1/2 gives the best result in all of (a)-(d). These results are consistent with those of previous studies [8], [9], [18], [20].
• Regardless of p, Id and Obs perform equal to or better than Uni in all cases.
• In all cases, RC performs worst unless we can choose the correct rank r.
Fig. 2 shows the results when only the tensor rank is changed to [5, 5, 5]; the size of the tensor is still 40 × 40 × 40. The results show a trend similar to that in Fig. 1. From the results in Figs. 1 and 2, we conclude that, for third-order tensors, the effects of the choice of weights, $p$, and algorithm on performance do not depend on the rank, at least for the ranks tested.
To examine whether the order of the tensor affects the common trend in Figs. 1 and 2, we performed experiments on fourth-order tensors. The results are shown in Figs. 3 and 4, where the original tensors are both of size 16 × 16 × 16 × 16 and their ranks are [2, 2, 2, 2] and [3, 3, 3, 3], respectively. For the fourth-order tensors, the common trend did not change when the rank was varied, and it matches the trend in Figs. 1 and 2. From these observations, the relationship we found between the weighting and the Schatten-$p$ extension, as well as the performance gap between the proposed algorithm and the rank-constrained minimization, holds consistently across the tensor ranks and orders tested.
From these results, we can conclude the following:
• It is sufficient to use p = 1 if weights close to the ideal weights can be estimated in some way.
• It is better to set a small p if the weights estimated from the degraded singular values are not reliable.
• Simple methods using rank constraints are very sensitive to the choice of the ranks used as constraints and cannot outperform complex methods such as the proposed algorithm unless the original ranks can be estimated correctly.

V. CONCLUSION
In this paper, to reveal the relationship between the weighting and the Schatten-$p$ extension, we proposed a general tensor recovery model that combines them and developed an algorithm to solve it.
Through experiments on artificial data using the proposed algorithm, we evaluated how the presence or absence of the weighting and of the Schatten-$p$ extension affects the recovery performance in various situations.
As a result, we found that the simple rank-constrained minimization method cannot outperform complex methods such as the proposed algorithm unless the rank $r$ used in the constraint is chosen properly. The relationship between the weighting and the Schatten-$p$ extension in the WTSPN varies with how accurately the ideal weights can be estimated. The Schatten-$p$ extension has little effect on performance if the ideal weights are available. On the other hand, the Schatten-$p$ extension and the weighting of singular values are synergistic if the weights must be determined from heavily degraded observations.
Our conclusions are summarized in the flowchart in Fig. 5, where "weighting" indicates the use of weights $\mathbf{w}$ determined from estimates of the singular values of the unfolded original tensor. This flowchart implies that if only limited information about the rank (or the singular values) of the original tensor is available, complex methods are needed to obtain good results.

APPENDIX
In Section II, we stated that Eq. (5) can be solved efficiently by ADMM but did not give a specific algorithm. The algorithm for solving Eq. (5) is shown in Algorithm 2.
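[Algorithm 2 listing lost in extraction. Recoverable fragments: an initialization that includes $\mathcal{Z}^{(0)}_2 = 0$; a loop that runs while a stopping criterion is not satisfied; and an inner loop over $m$ whose line 5 is the rank projection below.]

The subproblem in line 5 of Algorithm 2 is

$$\mathrm{proj}_{\{\mathbf{X} \mid \mathrm{rank}(\mathbf{X}) \le r_m\}}\left(\mathrm{unfold}_m(\mathcal{X}^{(k+1)}) + \mathbf{Z}^{(k)}_{1,m}\right), \qquad (24)$$

which is nonconvex because the set $\{\mathbf{X} \mid \mathrm{rank}(\mathbf{X}) \le r_m\}$ is nonconvex. However, one of the solutions of Eq. (24) can be obtained as

$$\mathbf{U}\, T_{r_m}(\boldsymbol{\Sigma})\, \mathbf{V}^\top,$$

where $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top = \mathrm{unfold}_m(\mathcal{X}^{(k+1)}) + \mathbf{Z}^{(k)}_{1,m}$ is an SVD and $T_{r_m}(\cdot)$ is a truncation operator. The truncation operator for a rectangular diagonal matrix $\mathbf{Y}$, $T_r(\mathbf{Y})$, is defined as

$$\left(T_r(\mathbf{Y})\right)_{i,i} = \begin{cases} Y_{i,i}, & i \le r, \\ 0, & \text{otherwise}. \end{cases}$$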
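The rank projection can be sketched in NumPy as a truncated SVD: keeping the $r$ largest singular values gives a nearest matrix of rank at most $r$ in the Frobenius norm (Eckart-Young). The function name `truncate_rank` and the test matrix are illustrative.

```python
import numpy as np

def truncate_rank(M, r):
    """Projection onto {X : rank(X) <= r}: U T_r(Sigma) V^T, keeping the r largest
    singular values and zeroing the rest."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = s.copy()
    s[r:] = 0.0                    # T_r: zero out all but the r largest values
    return (U * s) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 6))
P = truncate_rank(M, 2)
```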

Fig. 1: Experimental results for the original tensor of order 3 and rank 4. Panels (a)-(d) show the results for different missing rates and standard deviations $\sigma_n$ of the observation noise. The horizontal axis of each graph is the parameter $\alpha$ used for determining the weights, and the vertical axis is the error between the original tensor and the tensor estimated by each method. Red, green, and blue indicate the results of the proposed method (Algorithm 1) with the weight vectors $\mathbf{w}^{\mathrm{Id}}$, $\mathbf{w}^{\mathrm{Obs}}$, and $\mathbf{w}^{\mathrm{Uni}}$, respectively, and yellow indicates the result of the rank-constrained minimization. The line type corresponds to the value of the parameter $p$ or $r$.

Fig. 2: Experimental results for the original tensor of order 3 and rank 5. The results for different missing rates and standard deviations are shown in (a)-(d), and the meanings of the axes and of the line colors and types are the same as in Fig. 1. Although the rank of the original tensor is different, the relative performance of all methods shows a trend similar to that in Fig. 1. This supports our conclusion in Section IV-B that the results are independent of the rank of the original tensor.

Fig. 3: Experimental results for the original tensor of order 4 and rank 2. The results for different missing rates and standard deviations are shown in (a)-(d), and the meanings of the axes and of the line colors and types are the same as in Fig. 1. Although the rank and the order of the original tensor are different, the relative performance of all methods shows a trend similar to that in Fig. 1. This supports our conclusion in Section IV-B that the results are independent of the rank and the order of the original tensor.

Fig. 4: Experimental results for the original tensor of order 4 and rank 3. The results for different missing rates and standard deviations are shown in (a)-(d), and the meanings of the axes and of the line colors and types are the same as in Fig. 1. Although the rank and the order of the original tensor are different, the relative performance of all methods shows a trend similar to that in Fig. 1. This supports our conclusion in Section IV-B that the results are independent of the rank and the order of the original tensor.

Fig. 5: Flowchart for choosing a method and its parameters for the low-rank tensor recovery problem.