Spectrally Sparse Tensor Reconstruction in Optical Coherence Tomography Using Nuclear Norm Penalisation

Abstract: Reconstruction of 3D objects from various tomographic measurements is an important problem which can be naturally addressed within the mathematical framework of 3D tensors. In Optical Coherence Tomography, the reconstruction problem can be recast as a tensor completion problem. Following the seminal work of Candès et al., the approach followed in the present work is based on the assumption that the rank of the object to be reconstructed is naturally small, and we leverage this property by using a nuclear norm-type penalisation. In this paper, a detailed study of nuclear norm penalised reconstruction using the tubal Singular Value Decomposition of Kilmer et al. is proposed. In particular, we introduce a new, efficiently computable definition of the nuclear norm in the Kilmer et al. framework. We then present a theoretical analysis, which extends previous results by Koltchinskii, Lounici and Tsybakov. Finally, this nuclear norm penalised reconstruction method is applied to real data reconstruction experiments in Optical Coherence Tomography (OCT). In particular, our numerical experiments illustrate the importance of penalisation for OCT reconstruction.


Motivations and Contributions
3D image reconstruction from subsamples is a difficult problem at the intersection of the fields of inverse problems, computational statistics and numerical analysis. Following the Compressed Sensing paradigm [1], the problem can be tackled using sparsity priors when sparsity can be shown to be present in the image to be recovered. Compressed Sensing has evolved from the recovery of a sparse vector [2] to the recovery of spectrally sparse matrices [3]. The problem we consider in the present paper was motivated by Optical Coherence Tomography (OCT), where a 3D volume is to be recovered from a small set of measurements along rays across the volume, making the problem a kind of tensor completion problem.
The possibility of efficiently addressing the reconstruction problem in Compressed Sensing, Matrix Completion and their avatars greatly depends on formulating it as a problem which can be relaxed into a convex optimisation problem. For example, the sparsity of a vector is measured by the number of non-zero components in that vector. This quantity is a non-convex function of the vector, but a surrogate can easily be found in the form of the ℓ1-norm. Spectral sparsity of a matrix is measured by the number of non-zero singular values and, similarly, a convex surrogate is the ℓ1-norm of the vector of singular values, also called the nuclear norm.
The main problem with tensor recovery is that the spectrum is not a well-defined quantity, and several approaches have been proposed for defining it [4,5]. Different nuclear norms have also been proposed in the literature, based on the various definitions of the spectrum [6-8]. A very interesting approach was proposed in References [9,10]. This approach is interesting in several ways:
• the 3D tensors are considered as matrices of 1D vectors (tubes), and the approach uses a tensor product similar to the classical matrix product, after replacing multiplication of entries by, for example, convolution of the tubes;
• the SVD is fast to compute;
• a specific and natural nuclear norm can be easily defined.
Motivated by the 3D reconstruction problem, our goal in the present paper is to study the natural nuclear norm penalised estimator for the tensor completion problem. Our approach extends the method proposed in [11] to the framework developed by [10]. The main contribution of the paper is:

• to present the most natural definition of the nuclear norm in the framework of Reference [10];
• to compute the subdifferential of this nuclear norm;
• to present a precise mathematical study of the nuclear norm penalised least-squares reconstruction method;
• to illustrate the efficiency of the approach on the OCT reconstruction problem with real data.

Matrix Completion
After the many successes of Compressed Sensing in inverse problems, matrix and tensor completion problems have recently taken the stage and become the focus of extensive research activity. Completion problems have applications in collaborative filtering [12], Machine Learning [13,14], sensor networks [15], subspace clustering [16], signal processing [8], and so forth. The problem is intractable if the matrix to recover does not have any particular structure. An important discovery of References [17,18] is that when the matrix to be recovered has low rank, it can be recovered from only a few observations [18], and nuclear norm penalised estimation is then a tractable convex problem.
The use of the nuclear norm as a convex surrogate for the rank was first proposed in Reference [19] and further analysed in a series of breakthrough papers [3,11,17,20-22].

Tensor Completion
The matrix completion problem was recently generalised to the problem of tensor completion. A very nice overview of tensors and algorithms is given in Reference [23]. Tensors play a fundamental role in statistics [24] and, more recently, in Machine Learning [13]. They can be used for multidimensional time series [25], analysis of seismic data [6], Hidden Markov Models [13], Gaussian Mixture based clustering [26], Phylogenetics [27], and much more. Tensor completion is, however, a more difficult problem from many points of view. First, the rank is NP-hard to compute in full generality [5]. Second, many different Singular Value Decompositions are available. The Tucker decomposition [5] extends the useful sum-of-rank-one decomposition to tensors but is NP-hard to compute in general. An interesting algorithm was proposed in References [28,29]; see also the very interesting Reference [30]. Another SVD was proposed in Reference [31], with the main advantage that the generalised singular vectors form an orthogonal family. The usual diagonal matrix is replaced with a so-called "core" tensor with nice orthogonality properties and a very simple structure in the case of symmetric [32] or orthogonally decomposable tensors [33].
Another very interesting approach, called the t-SVD was proposed recently in [10] for 3D tensors with applications to face recognition [34] and image deblurring, computerized tomography [35] and data completion [36]. In the t-SVD framework, one dimension is assumed to be of a different nature from the others, such as, for example, time. In this setting, the decomposition resembles the SVD closely by using a diagonal tensor instead of a diagonal matrix as for the standard 2D SVD. The t-SVD was also proved to be amenable to online learning [37]. One other interesting advantage of the t-SVD is that the nuclear norm is well defined and its subdifferential is easy to compute.
The t-SVD proposed in Reference [10] is a very attractive representation of tensors in many settings such as image analysis, multivariate time series, and so forth. Obviously, it is not so attractive for the study of symmetric moment or cumulant tensors as studied in Reference [13], but for large image sequences and times series, this approach seems to be extremely relevant.

Sparsity
One of the main discoveries of the past decade is that sparsity may help in certain contexts where data are difficult or expensive to acquire [1]. When the object A 0 to recover is a vector, the fact that it may be sparse in a certain dictionary will dramatically improve recovery, as demonstrated by the recent breakthroughs in high dimensional statistics on the analysis of estimators obtained via convex optimization, such as the LASSO [38][39][40], the Adaptive LASSO [41] or the Elastic Net [42]. When A 0 is a matrix, the property of having a low rank may be crucial for the recovery as proved in References [11,43]. Extensions to the tensor case are studied in References [7,44,45]. Tensor completion using the t-SVD framework has been analyzed in Reference [36]. In particular, our results complement and improve on the results in Reference [36]. Reference [46] deserves special mention for using more advanced techniques based on real algebraic geometry and sums of squares decompositions.
The usual estimator in a sparse recovery problem is a nuclear norm penalised least squares estimator such as the one studied here and defined by (27). Several other types of nuclear norms have been used for tensor estimation and completion; in particular, several nuclear norms can be naturally defined, as, for example, in References [7,47,48]. It is interesting to notice that sparsity-promoting penalisation of the least squares estimator crucially relies on the structure of the subdifferential of the norm involved; see, for instance, References [11,49] or Reference [50]. In Reference [7], a subset of the subdifferential is studied and then used in order to establish the efficiency of the penalised least squares approach. Another interesting approach is the one in [48], where an outer approximation of the subdifferential is given. In the matrix setting, the works in References [51,52] are well known for providing a neat characterisation of the subdifferential of matrix norms or, more generally, of functions of the matrix enjoying enough symmetries. In the 3D or higher dimensional setting, however, the situation is much less well understood. The relationships between the tensor norms and the norms of the flattenings are intricate, although some good bounds relating one to the other can be obtained, as, for example, in Reference [53].
The extension of previous results on low rank matrix reconstruction to the tensor setting is nontrivial but is shown to be relatively easy to obtain once the appropriate background is given. In particular, our theoretical analysis will generalise the analysis in Reference [11]. In order to do this, we provide a complete characterisation of the subdifferential of the nuclear norm. Our results will be illustrated by computational experiments for solving a problem in Optical Coherence Tomography (OCT).

Plan of the Paper
Section 2 presents the necessary mathematical background about tensors and sparse recovery. Section 3 introduces the measurement model and presents our nuclear-norm penalised estimator.
In Section 4 we prove our main theoretical result. In Section 5, our approach is finally illustrated in the context of Optical Coherence Tomography, and some computational results based on real data are presented.

Background on Tensors, t-SVD
In this section, we present what is meant by the notion of tensor and the various generalisations of the common objects in linear algebra to the tensor setting. In particular, we will introduce the Singular Value Decomposition proposed in Reference [10] and some associated Schatten norms.

Basic Notations for Third-Order Tensor
In this section, we recall the framework introduced by Kilmer and Martin [9,10] for a very special class of tensors which is particularly adapted to our setting.

Slices and Transposition
For a third-order tensor A, its (i, j, k)th entry is denoted by A ijk .

Definition 1. The kth-frontal slice of A is defined as A^(k) := A(:, :, k), and the jth-transversal slice of A is defined as A(:, j, :). A tubal scalar (t-scalar) is an element of R^(1×1×n3) and a tubal vector (t-vector) is an element of R^(n1×1×n3).

Definition 2 (Tensor transpose). The conjugate transpose of a tensor A ∈ R^(n1×n2×n3) is the tensor A^t obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through n3, the conjugate transposed frontal slice A^(1) remaining first.

Definition 3 (The "dot" product). The dot product A · B between two tensors A ∈ R^(n1×n2×n3) and B ∈ R^(n2×n4×n3) is the tensor C ∈ R^(n1×n4×n3) whose kth slice C^(k) is the matrix product of the slice A^(k) with the slice B^(k): C^(k) = A^(k) B^(k).

We will also need the canonical inner product.
Definition 4 (Inner product of tensors). If A and B are third-order tensors of the same size n1 × n2 × n3, then the inner product between A and B is defined as ⟨A, B⟩ := Σ_{i,j,k} A_ijk B_ijk = (1/n3) ⟨Â, B̂⟩ (notice the normalisation constant of the FFT).
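To make the slice-wise definitions concrete, the tensor transpose (Definition 2) and the "dot" product (Definition 3) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the array axes follow the (n1, n2, n3) convention of the text.

```python
import numpy as np

def t_transpose(A):
    """Conjugate-transpose each frontal slice, then reverse the order of
    slices 2..n3 while keeping slice 1 first (Definition 2)."""
    n1, n2, n3 = A.shape
    At = np.zeros((n2, n1, n3), dtype=A.dtype)
    At[:, :, 0] = A[:, :, 0].conj().T
    for k in range(1, n3):
        At[:, :, k] = A[:, :, n3 - k].conj().T
    return At

def dot_product(A, B):
    """Slice-wise "dot" product of Definition 3: C^(k) = A^(k) B^(k)."""
    n1, _, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3), dtype=np.result_type(A, B))
    for k in range(n3):
        C[:, :, k] = A[:, :, k] @ B[:, :, k]
    return C
```

A quick sanity check is that transposing twice returns the original tensor, and that, for a real tensor, the transpose acts in the Fourier domain as a slice-wise conjugate transpose.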

Convolution and Fourier Transform
Definition 5 (t-product for circular convolution). The t-product A * B of A ∈ R^(n1×n2×n3) and B ∈ R^(n2×n4×n3) is the n1 × n4 × n3 tensor whose (i, j)th tube is given by C(i, j, :) = Σ_{l=1}^{n2} A(i, l, :) * B(l, j, :), where * denotes the circular convolution between two tubes of the same size.
Definition 6 (Identity tensor). The identity tensor J ∈ R n 1 ×n 1 ×n 3 is defined to be a tensor whose first frontal slice J 1 is the n 1 × n 1 identity matrix and all other frontal slices J i , i = 2, . . . , n 3 are zero.
Definition 7 (Fourier transform of a tensor). Â is the tensor obtained by taking the Fast Fourier Transform (FFT) along the third dimension; we will use the MATLAB-style convention Â = fft(A, [ ], 3) for the one-dimensional FFT along the 3rd dimension. Naturally, one can recover A from Â via the inverse FFT, A = ifft(Â, [ ], 3).

Definition 8 (Inverse of a tensor). The inverse of a tensor A ∈ R^(n×n×n3) is the tensor A^(-1) satisfying A * A^(-1) = A^(-1) * A = J, where J is the identity tensor of size n × n × n3.

Remark 1.
It is proved in Reference [10] that, for any tensors A ∈ R^(n1×n2×n3) and B ∈ R^(n2×n4×n3), we have A * B = C ⇔ Â · B̂ = Ĉ.
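Remark 1 is what makes the t-product cheap in practice: instead of n1·n4·n2 circular convolutions of tubes, one FFT, n3 small matrix products and one inverse FFT suffice. The following sketch (an illustration under our own naming, not the authors' implementation) computes the t-product both ways:

```python
import numpy as np

def t_product(A, B):
    """t-product via Remark 1: FFT along the tubes, slice-wise matrix
    products in the Fourier domain, inverse FFT."""
    n1, _, n3 = A.shape
    n4 = B.shape[1]
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.zeros((n1, n4, n3), dtype=complex)
    for k in range(n3):
        Ch[:, :, k] = Ah[:, :, k] @ Bh[:, :, k]
    return np.real(np.fft.ifft(Ch, axis=2))

def t_product_direct(A, B):
    """Direct Definition 5: tube (i, j) of C is the sum over l of the
    circular convolution of tube A(i, l, :) with tube B(l, j, :)."""
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for j in range(n4):
            for l in range(n2):
                a, b = A[i, l, :], B[l, j, :]
                for t in range(n3):
                    C[i, j, t] += sum(a[s] * b[(t - s) % n3] for s in range(n3))
    return C
```

Both routes agree up to floating-point error, which is exactly the content of Remark 1.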

The t-SVD
We finally arrive at the definition of the t-SVD.
Definition 10 (Tensor Singular Value Decomposition: t-SVD). For M ∈ R^(n1×n2×n3), the t-SVD of M is given by M = U * S * V^t, where U and V are orthogonal tensors of size n1 × n1 × n3 and n2 × n2 × n3 respectively, and S is a rectangular f-diagonal tensor of size n1 × n2 × n3; the entries of S are called the singular values of M. This t-SVD can be obtained using the Fourier transform, by computing the matrix SVD of each frontal slice of M̂ and taking the inverse FFT of the resulting factors. The t-SVD is illustrated in Figure 1 below. Notice that the diagonal elements of S, i.e., S(i, i, :), are tubal scalars as introduced in Definition 1. They will also be called tubal eigenvalues.
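Following Definition 10, the t-SVD can be computed by one FFT along the tubes, a matrix SVD per Fourier slice and an inverse FFT. The sketch below is illustrative, not the authors' code; for a real input tensor it additionally enforces the conjugate symmetry of the Fourier slices so that the factors U, S, V come out real.

```python
import numpy as np

def t_svd(M):
    """t-SVD of a real tensor M: SVD each Fourier-domain frontal slice,
    mirror the conjugate-symmetric slices, then invert the FFT."""
    n1, n2, n3 = M.shape
    Mh = np.fft.fft(M, axis=2)
    Uh = np.zeros((n1, n1, n3), dtype=complex)
    Sh = np.zeros((n1, n2, n3), dtype=complex)
    Vh = np.zeros((n2, n2, n3), dtype=complex)
    half = n3 // 2
    for k in range(half + 1):
        u, s, vh = np.linalg.svd(Mh[:, :, k])
        Uh[:, :, k] = u
        for i in range(min(n1, n2)):
            Sh[i, i, k] = s[i]          # f-diagonal tensor of singular values
        Vh[:, :, k] = vh.conj().T
    for k in range(half + 1, n3):
        # conjugate symmetry of the FFT of a real tensor
        Uh[:, :, k] = Uh[:, :, n3 - k].conj()
        Sh[:, :, k] = Sh[:, :, n3 - k].conj()
        Vh[:, :, k] = Vh[:, :, n3 - k].conj()
    U = np.real(np.fft.ifft(Uh, axis=2))
    S = np.real(np.fft.ifft(Sh, axis=2))
    V = np.real(np.fft.ifft(Vh, axis=2))
    return U, S, V
```

The factors satisfy M = U * S * V^t, which can be checked slice-wise in the Fourier domain.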

Definition 11. The spectrum σ(A) of the tensor A is the tubal vector given by σ(A)(i, :) = S(i, i, :) for i = 1, . . . , min{n1, n2}.

Some Natural Tensor Norms
Using the previous definitions, it is easy to define some generalisations of the usual matrix norms.
Definition 12 (Tensor Frobenius norm). The Frobenius norm induced by the inner product defined above is given by ‖A‖_F = ⟨A, A⟩^(1/2).

Definition 13 (Tensor spectral norm). The tensor spectral norm ‖A‖_∞ is defined as the maximum of the ℓ2-norms of the tubal singular values, ‖A‖_∞ = max_i ‖σ(A)(i, :)‖_2, where ‖·‖_2 is the ℓ2-norm.

Proposition 1.
Let M be an n1 × n2 × n3 tensor. Then ‖M‖_∞ can be computed slice-wise from F(M), where F corresponds to the Fast Fourier Transform.
Definition 14 (Tubal nuclear norm). The tensor nuclear norm of a tensor A, denoted ‖A‖, is the sum of the ℓ2-norms of the tubal singular values S(i, i, :) of A. Note that, by Parseval's identity, it is equivalent to define the tubal nuclear norm in the Fourier domain. Recall moreover that the Ŝ(i, i, j) are all non-negative, due to the fact that Û^(k) Ŝ^(k) (V̂^(k))^t is the SVD of the kth slice of Â.
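Under this reading of Definition 14 (each tubal singular value aggregated by its ℓ2-norm, with a 1/√n3 Parseval factor matching the normalised inner product of Definition 4 — an assumption on the exact normalisation), the tubal nuclear norm and its dual spectral norm can be sketched as:

```python
import numpy as np

def fourier_singular_tubes(A):
    """Row i collects the ith singular value of every Fourier-domain
    frontal slice, i.e., the tube S(i, i, :) in the Fourier domain."""
    n3 = A.shape[2]
    Ah = np.fft.fft(A, axis=2)
    return np.stack([np.linalg.svd(Ah[:, :, k], compute_uv=False)
                     for k in range(n3)], axis=1)

def tubal_nuclear_norm(A):
    """Sum of the l2-norms of the tubal singular values; 1/sqrt(n3) is
    the Parseval normalisation (an assumption, see lead-in)."""
    S = fourier_singular_tubes(A)
    return float(np.sum(np.linalg.norm(S, axis=1)) / np.sqrt(A.shape[2]))

def tubal_spectral_norm(A):
    """Dual norm: the largest l2-norm among the tubal singular values."""
    S = fourier_singular_tubes(A)
    return float(np.max(np.linalg.norm(S, axis=1)) / np.sqrt(A.shape[2]))
```

For n3 = 1 both quantities reduce to the usual matrix nuclear and spectral norms, which gives a simple consistency check.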
Proposition 2 (Trace duality property). Let A, B be n1 × n2 × n3 tensors. Then |⟨A, B⟩| ≤ ‖A‖ ‖B‖_∞. The result is obtained by taking the maximum of the ‖F(B)_jj‖_2 and the sum over j of the ‖Ŝ_jj‖_2, and applying (12) and the inverse FFT.

Proof. Again by the Cauchy–Schwarz inequality, we have the stated bound; taking this t-SVD amounts to solving the system of equations determining the α_j. With this result, the remainder of the proof follows directly from the proof of Proposition 2.

Rank, Range and Kernel
The rank, the range and the kernel are extremely important notions for matrices. They will play a role in our analysis of the penalised least squares tensor recovery procedure as well.
As noticed in Reference [10], a tubal scalar may have all its entries different from zero but still be non-invertible. According to the definition, a tubal scalar a ∈ R^(1×1×n3) is invertible if there exists a tubal scalar b such that a * b = b * a = e. Equivalently, the Fourier transform â of a has no coefficient equal to zero. We can define the tubal rank ρ_i of S(i, i, :) as the number of non-zero components of Ŝ(i, i, :). Then, the easiest way to define the rank of a tensor is by means of the notion of multirank as follows.
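These notions are easy to check numerically: a tubal scalar is invertible iff its FFT has no zero coefficient, and the multirank collects the ranks of the Fourier-domain frontal slices. The sketch below is illustrative; the tolerance `tol` is an implementation choice, not part of the definitions.

```python
import numpy as np

def is_invertible_tubal_scalar(a, tol=1e-10):
    """A tubal scalar a in R^{1x1xn3} is invertible iff its FFT has no
    (near-)zero coefficient."""
    return bool(np.all(np.abs(np.fft.fft(a.ravel())) > tol))

def multirank(A, tol=1e-10):
    """Multirank: the vector of ranks of the Fourier-domain frontal slices
    (our reading of the multirank mentioned in the text)."""
    Ah = np.fft.fft(A, axis=2)
    return [int(np.sum(np.linalg.svd(Ah[:, :, k], compute_uv=False) > tol))
            for k in range(A.shape[2])]
```

A constant tube of ones has FFT (n3, 0, . . . , 0), illustrating the remark that all entries can be non-zero while the tube is still non-invertible.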
We now define the range of a tensor.

Definition 16.
Let j denote the number of invertible tubal eigenvalues and let k denote the number of non-zero non-invertible tubal eigenvalues. The range R(M) of a tensor M ∈ R^(n1×n2×n3) is defined as

Definition 17. Let j denote the number of invertible tubal eigenvalues. The kernel K(M) of a tensor M ∈ R^(n1×n2×n3) is defined as

The Observation Model
In the model considered hereafter, the observed data are Y_1, . . . , Y_n given by the model Y_i = ⟨X_i, A_0⟩ + ξ_i, where the notation ⟨·, ·⟩ stands for the canonical scalar product of tensors. This can be seen as a tensor regression problem: X_i, i = 1, . . . , n are tensors in R^(n1×n2×n3) and ξ_i, i = 1, . . . , n are independent zero-mean random variables. Assume that the frontal faces X^(i) are i.i.d. uniformly distributed on the set, where e_k(n) are the canonical basis vectors in R^n.
Our goal is to recover the tensor A 0 based on the data Y i , i = 1, . . . , n only for n as small as possible.
Definition 18. For any tensors A, B ∈ R n 1 ×n 2 ×n 3 , we define the scalar product and the bilinear form and will denote by M the tensor given by

The Estimator
The approach proposed in Reference [11] for low rank matrix estimation, which will be extended to tensor estimation in the present paper, consists in minimising Â_λ = argmin_{A∈A} (1/n) Σ_{i=1}^n (Y_i − ⟨X_i, A⟩)² + λ ‖A‖, where ‖·‖ is the tubal tensor nuclear norm introduced in Definition 14. Recall that nuclear norm penalisation is widely used in sparse estimation when the matrix to be recovered is low rank [1]. Following the success of the application of sparsity to low rank matrix recovery, several extensions of the matrix nuclear norm were proposed in the literature [7,8,47], etc. Another type of nuclear norm was proposed in Reference [36], based on the tubal framework of Kilmer [10]. Our estimator is another nuclear norm penalisation based estimator. As explained in Section 2, the nuclear norm used in the present paper has some advantages over other norms in the context of tubal low rank tensors, and the resulting estimator is most relevant in the many applications where the tensor to be recovered is the sum of a small number of rank-one tensors.
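For concreteness, the penalised objective minimised by Â_λ can be evaluated as follows. This is a sketch under the assumption that the sampling tensors X_i are entry indicators, as in the completion model; the nuclear norm is passed in as a callable so that any of the norms discussed above can be plugged in.

```python
import numpy as np

def objective(A, X_list, Y, lam, nuclear_norm):
    """Penalised least-squares objective: empirical squared error over the
    n observations Y_i = <X_i, A_0> + xi_i, plus lam times a nuclear norm.
    `nuclear_norm` is any callable mapping a tensor to a float."""
    n = len(Y)
    residuals = [Y[i] - np.sum(X_list[i] * A) for i in range(n)]
    return sum(r * r for r in residuals) / n + lam * nuclear_norm(A)
```

With a perfect fit and a zero penalty the objective vanishes, which gives a quick sanity check before plugging in the tubal nuclear norm.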

Main Results
In this section, we provide our main results. First, in Section 4.2, we give a complete characterisation of the subdifferential of the nuclear norm (Definition 14). Then, we propose a statistical study of the nuclear-norm penalised estimatorÂ λ in Section 4.3.

Orthogonal Invariance
It is easy to see that the t-nuclear norm is orthogonally invariant. Indeed, consider two orthogonal tensors O_1 ∈ R^(n1×n1×n3) and O_2 ∈ R^(n2×n2×n3). Since the product of two orthogonal tensors is itself orthogonal, we have ‖O_1 * A * O_2‖ = ‖A‖.

Support of a Tensor
Write the t-SVD of a tensor A as in Definition 10, where (U^(1), . . . , U^(min{n1,n2})) is a family of orthonormal matrices in R^(n1×n1), (V^(1), . . . , V^(min{n1,n2})) a family of orthonormal matrices in R^(n2×n2), and the s_kk = S(k, k, :) form the spectrum of A.
We also let S_j^⊥, j = 1, 2, denote the orthogonal complements of S_j, and P_{S_j}, j = 1, 2, the projectors onto the linear subspaces S_j of tubal tensors.

The Subdifferential of the t-nuclear Norm
Our first result is a characterisation of the subdifferential of ‖·‖. Recall first the particular case of the matrix nuclear norm ‖·‖_*, whose subdifferential is given by Corollary 2.5 in [52]. This result is established in [52] using the Von Neumann inequality and, in particular, the equality case of this inequality.

Von-Neumann's Inequality for Tubal Tensors
Theorem 1. Let A, B be n1 × n2 × n3 tensors. Then ⟨A, B⟩ ≤ ⟨S_A, S_B⟩, where S_A is a rectangular f-diagonal tensor containing all the singular values of A. Equality holds in (21) if and only if A and B have the same singular tensors.
Proof. Let F denote the Fast Fourier Transform. We have ⟨A, B⟩ = (1/n3) ⟨F(A), F(B)⟩. Thus, using the Von Neumann inequality for matrices slice by slice, we get the result, where the last equality stems from the isometry property of the Fast Fourier Transform. Notice that the Von Neumann inequality was extended in [54] to general tensors and exploited in [55] for the computation of the subdifferential of some tensor norms. In comparison, the case of the t-nuclear norm only needs an appropriate use of the matrix Von Neumann inequality.

Lewis's Characterization of the Subdifferential
Theorem 2. Let f : R^(min{n1,n2}) → R be a convex function. Then the subdifferential of f ∘ σ admits the representation (22). The proof is exactly the same as in [52], Theorem 2.4.

Theorem 3. Suppose that the function f : R^(min{n1,n2}) → R is convex. Then the representation of Theorem 2 holds for tensors. The proof is the same as in [52], Corollary 2.5, where the matrix Von Neumann inequality (and more precisely, the exact characterisation of the equality case) is replaced with the Von Neumann inequality for tubal tensors given by Theorem 1.

Theorem 4. Let T denote the set of tubal singular values of X which are non-zero. The subdifferential of the t-nuclear norm is given by (23).

Proof. We only need to rewrite (22) using the well-known formula for the subdifferential of the Euclidean norm. We provide the details for the sake of completeness.
Let S X be a t-vector in R n 1 ×1×n 3 .
where in the second equality we used [56], Chapter 16, Section 1, Proposition 16.8.
As is well known, the subgradient of the ℓ2-norm is the unit ball at the origin and the normalised vector elsewhere; we plug this formula into (24). Let T be the set of indices j for which the tubal scalar σ(X)_j = 0. Then U * D(µ) * V is of the above form, and the result follows.

Error Bound
Our error bound on tubal tensor recovery, following the approach of Koltchinskii, Lounici and Tsybakov [11], is given in the following theorem.

Theorem 5. Let A ⊆ R^(n1×n2×n3) be a convex set of tensors and let Â_λ be defined by (27), where we recall that ‖·‖ denotes the tensor nuclear norm (see Definition 14). Assume that there exists a constant ρ > 0 such that (28) holds for all tensors A ∈ A − A. Let λ ≥ 2 ‖M‖_∞; then the following error bound holds.

Proof. We follow the proof of Reference [11] and provide all the details for the sake of completeness.
Let us compute the directional derivative of L n atÂ λ .
The optimality condition for Â_λ implies DL_n(Â_λ; A − Â_λ) ≥ 0 for all A ∈ A. On the other hand, by compactness, G ∈ ∂‖·‖(Â_λ); see [57]. Combining (31) with (30), we obtain the basic inequality. Consider an arbitrary tensor A ∈ A of tubal rank r with spectral representation as above, where s_jj = S(j, j, :), and with support (S_1, S_2). It thus follows from (32) that the error splits accordingly. By the monotonicity of subdifferentials of convex functions, we have ⟨Ĝ − G, Â_λ − A⟩ ≥ 0 (cf. [58], Chapter 4, Section 3, Proposition 9). Furthermore, by (23), the following representation holds, where W is an arbitrary tensor with ‖W‖_∞ ≤ 1, and, using Lemma 3, we can choose W such that the cross term is controlled, where in the first equality we used that A has support (S_1, S_2). For this particular choice of W, and using the corresponding identity, we bound the remaining terms. Now, to find an upper bound on 2⟨M, Â_λ − A⟩, we use a decomposition into the projections onto the support and its complement. This implies, due to trace duality, the corresponding bound. Due to Proposition 3, we have (41), and, using assumption (28), it follows from (38) and (40) that the error is controlled. We then deduce the result from (42). In Appendix A, it is proved that we can set ρ = m n_1 n_2 n_3^{3/2} (43).

Numerical Experiments
The proposed methods were numerically validated using 3D OCT images. OCT is widely studied as a medical imaging modality in many clinical applications and in fundamental research. For numerous clinical purposes, OCT is considered a very interesting technique for in situ tissue characterization, known as "optical biopsy" (as opposed to conventional physical biopsy). OCT operates on the principle of low-coherence interferometry, providing micrometer spatial resolution at several-MHz A-scan (1D optical core in the z direction) acquisition rates. In these experiments, we found that depth is a coordinate of a different nature from the other two, so the tubal SVD approach appeared particularly relevant. One way of circumventing the problem in the case where the three coordinates have the same properties is to perform the reconstruction three times using the proposed method and take the average of the results.

Benefits of Subsampling for OCT
The OCT imaging device, like most medical imaging systems, obeys two key requirements for the successful application of compressed sensing methods: (1) medical images are naturally compressible by sparse coding in an appropriate transform domain (e.g., wavelet or shearlet transforms) and (2) OCT scanning systems (e.g., Spectral-Domain OCT, the most widely used) naturally acquire encoded samples rather than direct pixel samples (e.g., in spatial-frequency encoding). Therefore, the images resulting from Spectral-Domain OCT are sparse in their native representation, hence lending themselves well to various implementations of the ideas from Compressed Sensing theory.
Certainly, OCT enables high-speed A-scan and B-scan acquisitions (Figure 2B) but presents a serious issue when it comes to acquiring a C-scan volume. Therefore, especially in the case of biological sample/tissue examination, using OCT volumes poses the problem of frame-rate acquisition (generating artifacts) as well as of data transfer, i.e., several hundred MB for each volume (Figure 2B).
Indeed, OCT volume data can be considered as an n1 × n2 × n3 tensor of voxels. Thereby, the mathematical methods and materials related to the study of tensors are well suited to 3D OCT data.

Experimental Set-Up
An OCT imaging system (Telesto-II, 1325 nm, spectral-domain OCT) from THORLABS (Figure 3) is used to validate the proposed reconstruction method. The axial (resp. lateral) resolution is 5.5 µm (resp. 7 µm), with an imaging depth of up to 3.5 mm. The Telesto-II allows a maximum field-of-view of 10 × 10 × 3.5 mm³ with a maximum A-scan (optical core) acquisition rate of 76 kHz.

Validation Scenario
The method proposed in this paper was implemented in a MATLAB framework, without taking into account code optimisation or computation time. In order to validate the approach experimentally, we acquired different OCT volumes (C-scan images) of realistic biological samples: for example, a part of a grape (Figure 4 (left)) and a part of a fish eye retina (Figure 4 (right)). The size of the OCT volume used in these numerical validations, for both samples, is 281 × 281 × 281 voxels. To assess the performance of the proposed algorithm, we constructed several undersampled 3D volumes using 30%, 50%, 70% and 90% of the original OCT volume. To do this, we created two types of 3D masks. The first consists of pseudo-random binary masks M_v using density random sampling, in which the data are selected in a vertical manner (Figure 5 (left)). For the second type of mask, instead of vertical subsampling of the data, we designed oblique masks M_o, as shown in Figure 5 (right), which are more appropriate for certain imaging systems such as Magnetic Resonance Imaging (MRI), Computerized Tomography (CT-scan) and, in certain instances, OCT imaging systems as well.
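A minimal way to build a vertical mask M_v of a prescribed density is to sample whole A-scans (tubes along the depth axis) at random. The sketch below is illustrative only and is not the authors' exact mask generator.

```python
import numpy as np

def vertical_mask(shape, rate, seed=0):
    """Pseudo-random binary mask keeping roughly `rate` of the A-scans:
    each (x, y) position is kept or dropped as a whole, so the mask is
    constant along the depth axis (a sketch of the vertical masks M_v)."""
    n1, n2, n3 = shape
    rng = np.random.default_rng(seed)
    keep = rng.random((n1, n2)) < rate          # one decision per A-scan
    return np.repeat(keep[:, :, None], n3, axis=2).astype(float)
```

An oblique mask M_o could be obtained similarly by shifting the kept positions from slice to slice, but the exact geometry used in the experiments is not specified in the text.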

Obtained Results
The validation scenarios were carried out as follows: the studied method was applied to each OCT volume (i.e., grape or fish eye) using various subsampling rates (i.e., 30%, 50%, 70% and 90%). In each case, either the vertical M_v or the oblique M_o 3D binary masks were used. The results obtained with our nuclear norm penalised reconstruction approach are discussed below. Figures 6 and 7 depict the different reconstructed OCT volumes using the oblique and the vertical binary masks. Instead of illustrating the fully reconstructed OCT volume, we choose to show 2D images (the 100th xy slice of the reconstructed volumes) for a better visualization, with the naked eye, of the quality of the obtained results. It can be highlighted that the sharpness of the boundaries is well preserved; however, some features are lost in the zones where the image has low intensity. This is a common effect of most compressed sensing and inpainting methods. In order to improve the quality of the recovered data, conventional filter-based post-processing could also be used to enhance contrast. Figure 6. [sample: grape, mask: vertical]-Reconstructed OCT images (only a 2D slice is shown in this example). Each row corresponds to an under-sampling rate: 30% (1st row), 50% (2nd row), 70% (3rd row) and 90% (4th row). The first column represents the initial OCT image, the second column the under-sampled data used for the reconstruction, and the third column the recovered OCT images. Figure 7. [sample: grape, mask: oblique]-Reconstructed OCT images (only a 2D slice is shown in this example). Each row corresponds to an under-sampling rate: 30% (1st row), 50% (2nd row), 70% (3rd row) and 90% (4th row). The first column represents the initial OCT image, the second column the under-sampled data used for the reconstruction, and the third column the recovered OCT images.

Fish Eye Sample
We also performed validation experiments using an optical biopsy of a fish eye (see the image sequences depicted in Figures 8 and 9). Each row corresponds to an under-sampling rate: 30% (1st row), 50% (2nd row), 70% (3rd row) and 90% (4th row). The first column represents the initial OCT image, the second column the under-sampled data used for the reconstruction, and the third column the recovered OCT images.

The Singular Value Thresholding Algorithm
In this section, we describe the algorithm used for the computation of our estimator, namely a tubal tensor version of the fixed point algorithm of Reference [59]. This algorithm is a very simple and scalable iterative scheme which converges to the solution of (17). Each iteration consists of two successive steps:
• a singular value thresholding step, where all tubal singular values with ℓ2-norm below the level 2λ are set to zero and the remaining larger singular values are reduced by an offset of 2λ;
• a relaxed gradient step.
In mathematical terms, the algorithm works as follows. In the setting of our algorithm, the Shrinkage operator proceeds in four steps:
• compute the circular Fourier transform F(A^(l−1)) of the tubal components of A^(l−1);
• compute the SVD of all the slices of F(A^(l−1)) and form the tubal singular values;
• set to zero the tubal singular values whose ℓ2-norm lies below the level 2δλ and shrink the others by 2δλ;
• recompose the spectrally thresholded tensor and take the inverse Fourier transform of the tubal components.
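The four steps above can be sketched as follows. This is an illustrative implementation, not the authors' MATLAB code: each tubal singular value is scaled so that its ℓ2-norm is reduced by the threshold `tau` (group soft-thresholding), and the 1/√n3 Parseval factor is an assumption about the normalisation.

```python
import numpy as np

def tubal_shrink(A, tau):
    """Shrinkage operator: FFT along tubes, SVD of every Fourier slice,
    group soft-thresholding of the tubal singular values, inverse FFT."""
    n1, n2, n3 = A.shape
    Ah = np.fft.fft(A, axis=2)
    out = np.zeros_like(Ah)
    svds = []
    svals = np.zeros((min(n1, n2), n3))
    for k in range(n3):
        u, s, vh = np.linalg.svd(Ah[:, :, k], full_matrices=False)
        svds.append((u, vh))
        svals[:, k] = s
    # l2-norm of each tubal singular value (Parseval factor assumed)
    norms = np.linalg.norm(svals, axis=1) / np.sqrt(n3)
    # zero the small tubes, shrink the others by tau
    scale = np.where(norms > tau, 1.0 - tau / np.maximum(norms, 1e-300), 0.0)
    for k in range(n3):
        u, vh = svds[k]
        out[:, :, k] = (u * (svals[:, k] * scale)) @ vh
    return np.real(np.fft.ifft(out, axis=2))
```

With `tau = 0` the operator is the identity, and for a very large `tau` it maps everything to zero, which gives two easy sanity checks.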
On the other hand, P Ω is the operator that assigns to the entries indexed by Ω the observed values and leaves the other values unchanged.
The convergence analysis of [59] directly applies to our setting.

Analysis of the Numerical Results
In order to quantitatively assess the results obtained using the different numerical validation scenarios and OCT images, we implemented two image similarity scores extensively employed in the image processing community. In particular, we use • the Peak Signal-to-Noise Ratio (PSNR), computed as PSNR = 10 log10(d²/MSE), where d is the maximal pixel value in the initial OCT image and the MSE (mean-squared error) is obtained by averaging the squared differences, with I_o and I_r representing an initial 2D OCT slice (selected from the OCT volume) and the recovered one, respectively.

• The second image similarity score is the Structural Similarity Index (SSIM), which measures the degree of similarity between two images. It is based on the computation of three terms, namely the luminance l, the contrast c and the structure s, where µ_{I_r}, µ_{I_o}, σ_{I_r}, σ_{I_o} and σ_{I_r,I_o} are the local means, standard deviations and cross-covariance of the images I_r and I_o. The variables C_1, C_2, C_3 are used to stabilise divisions with weak denominators.
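Both scores can be sketched directly from their definitions. The SSIM below is the global single-window variant with the usual constants C1 = (0.01 d)² and C2 = (0.03 d)² (standard defaults, an assumption here since the text does not specify them).

```python
import numpy as np

def psnr(I_o, I_r):
    """PSNR in dB between the original slice I_o and the recovered I_r,
    with d taken as the maximal pixel value of the original image."""
    mse = np.mean((I_o.astype(float) - I_r.astype(float)) ** 2)
    d = float(np.max(I_o))
    return 10.0 * np.log10(d ** 2 / mse)

def ssim_global(I_o, I_r, d=255.0):
    """Global (single-window) SSIM combining luminance, contrast and
    structure; C1, C2 are the standard stabilising constants (assumed)."""
    x = I_o.astype(float).ravel()
    y = I_r.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    C1, C2 = (0.01 * d) ** 2, (0.03 * d) ** 2
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

For identical images the SSIM equals 1, and a unit perturbation of a constant 8-bit image gives the textbook PSNR of about 48.13 dB.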

Illustrating the Rôle of the Nuclear Norm Penalisation
In order to understand the rôle of the nuclear norm penalisation in the reconstruction, we have performed several experiments with different values of the hyper-parameter λ which are reported in Figure 10. This figure shows the performance of the estimator as a function of the ratio between the largest singular value and the smallest selected singular value. This ratio is an implicit function of λ, which is more intuitive than λ for interpreting the results. The smaller the ratio, the smaller the number of singular values incorporated in the estimation.
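The ratio discussed above can be read off the tubal singular value norms for a given λ. The following is a small hypothetical helper (the name and the 2λ selection level mirror the description of the estimator in the text):

```python
import numpy as np

def retained_ratio(sv_norms, lam):
    """Ratio of the largest tubal singular value norm to the smallest one
    retained by the estimator at selection level 2*lam."""
    s = np.sort(np.asarray(sv_norms, dtype=float))[::-1]  # decreasing order
    kept = s[s > 2.0 * lam]  # tubal singular values that survive thresholding
    return s[0] / kept[-1]
```

A smaller λ retains more singular values, hence a larger ratio; sweeping λ and recording this ratio produces the horizontal axis of Figure 10.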
The corresponding values are reported in Table 1. The results of these experiments show that different weights for the nuclear norm penalisation yield different errors. In these experiments, one sees that the MSE is optimised at a ratio equal to 8 and the PSNR is maximised at 8 as well. The SSIM is maximal for values of the ratio equal to 7 and 8. An estimation which does not account for the intrinsic complexity of the object to recover will clearly fail to reconstruct the 3D images properly.
Figure 10. Evolution of the PSNR, SSIM and MSE performance criteria as a function of the ratio between the largest singular value and the smallest singular value selected by the penalised estimator.

Performance Results
Tables 2 and 3 summarise the numerical values of the MSE, PSNR and SSIM computed for each test, that is, using different undersampling rates and masks, for our two different test samples (grape and fish eye retina). The parameter λ was chosen using the simple and efficient method proposed in [60]. As expected, the error decreases as a function of the percentage of observed pixels. The results also show that the estimator is not very sensitive to the orientation of the mask. There seems to be a phase transition around the 70% level, above which the reconstruction accuracy suddenly improves in terms of PSNR and SSIM, but the method still works satisfactorily at smaller sampling rates.

Conclusions and Perspectives
In this paper, we studied tensor completion problems based on the framework proposed by Kilmer et al. [10]. We provided a theoretical analysis of the nuclear norm penalised estimator. These theoretical results were validated numerically using realistic OCT data (volumes) of biological samples. These encouraging results on real datasets demonstrate the relevance of the low rank assumption for practical applications. Further research will be undertaken on devising fast algorithms and on incorporating other penalties, e.g., penalties based on the sparsity of the shearlet transform [61].
Funding: This work has been supported by ANR NEMRO (ANR-14-CE17-0013).

Acknowledgments:
The authors would like to thank S. Aeron from Tufts University for interesting discussions and for kindly sharing his code at an initial stage of this research. They would also like to thank the reviewers for their help in improving the paper by their numerous comments and criticisms.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Some Technical Results
Appendix A.1. Calculation of ρ and a Value of λ Such That $\|M\|_{S_\infty} \le \lambda$ with High Probability
Appendix A.1.1. Computation of ρ
Using the fact that $n = m \times n_3$, where $m$ is the number of pixels considered, we have, with $i = \{j_1, j_2, j_3\}$,
$$\mathbb{E}\,\big\langle A, X_{j_1, j_2, j_3} \big\rangle^2 = \frac{1}{n_1 n_2} \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} A_{j_1, j_2, j_3}^2.$$
In this part, we show that a possible value for the coefficient λ is given by the bound of Lemma A1 below. In order to prove this bound, we will need the large deviation inequality given by the following result.
Proposition A1 (Theorem 1.6 in [62]). Let $Z_1, \ldots, Z_n$ be independent random matrices of dimension $n_1 \times n_2$ that satisfy $\mathbb{E}[Z_i] = 0$ and $\|Z_i\|_{S_\infty} \le U$ almost surely, for some constant $U$ and all $i = 1, \ldots, n$. Define
$$\sigma_Z = \max\left\{ \left\| \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[ Z_i Z_i^\top \big] \right\|_{S_\infty}^{1/2},\ \left\| \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[ Z_i^\top Z_i \big] \right\|_{S_\infty}^{1/2} \right\}.$$
Then, for all $t > 0$, with probability at least $1 - e^{-t}$, we have
$$\left\| \frac{1}{n} \sum_{i=1}^{n} Z_i \right\|_{S_\infty} \le 2 \max\left\{ \sigma_Z \sqrt{\frac{t + \log(m)}{n}},\ U\, \frac{t + \log(m)}{n} \right\}, \qquad m = n_1 + n_2.$$
The next lemma gives a bound on the stochastic error for tensor completion.
Lemma A1. Let $X^{(i)}$ be i.i.d. uniformly distributed on $\mathcal{X}$. Then, for any $t > 0$, the following bound holds with probability at least $1 - e^{-t}$.
In order to prove this lemma, we will need the following lemma.
Lemma A2. Draw $X^{(j)}_i$ with a uniform random position on $\{1, \ldots, n_1\} \times \{1, \ldots, n_2\}$, with null entries everywhere except one entry equal to 1 at position $(k, l)$.
Proof of Lemma A1. Consider tensor completion under uniform sampling at random with Gaussian errors. Recall that in this case we assume that the pairs $(X_i, Y_i)$ are i.i.d. Using the facts that $Y_i = \langle X_i, A_0 \rangle + \varepsilon_i$ and $\mathbb{E}[\varepsilon_i X_i \mid X_i] = 0$, we obtain a decomposition into two terms, $\Lambda_1$ and $\Lambda_2$, which we treat separately in the lemma and proposition below. Before proceeding, notice that $\Lambda_1$ is the Schatten norm of a quadratic function of $X_1, \ldots, X_n$. Furthermore, by Property 1, $\langle X_i, A_0 \rangle$ is the Fourier transform of the tube $(i_1, i_2)$ at frequency $k_3$. Define the operator $\Gamma^{(j)}$ which takes slice $j$ of a tensor (i.e., of the tensor $F(X_i)$) and places it at the same position in a null tensor. We then have the following proposition.
Proposition A2. Let $T$ be a tensor which is null except at slice $j$. Then, using (A7), the bound (A8) holds.
Proof. We apply Proposition A1 to the random variables above. Using the duality of the trace and Jensen's inequality, the required moment bounds follow; thus, (A8) follows from (A1).
We now bound $\Lambda_2$. For this purpose, we use the proposition below, which is an immediate consequence of the matrix Gaussian inequality of Theorem 1.6 of [62].
Proposition A3. Let $Z_1, \ldots, Z_n$ be independent random variables with dimensions $n_1 \times n_2$, and let $\{\varepsilon_i\}$ be a finite sequence of independent standard normal variables. Define $\sigma_Z$ as in Proposition A1. Then, for all $t \ge 0$, with probability at least $1 - e^{-t}$,
$$\left\| \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i Z_i \right\|_{S_\infty} \le \sigma_Z \sqrt{\frac{2\,(t + \log(m))}{n}}, \qquad \text{where } m = n_1 + n_2.$$
Using this fact and (A7), the bound on $\Lambda_2$ follows.