A Fast Self-Learning Subspace Reconstruction Method for Non-Uniformly Sampled Nuclear Magnetic Resonance Spectroscopy

: Multidimensional nuclear magnetic resonance (NMR) spectroscopy is one of the most crucial detection tools for molecular structure analysis and has been widely used in biomedicine and chemistry. However, the development of NMR spectroscopy is hampered by long data collection time. Non-uniform sampling empowers rapid signal acquisition by collecting a small subset of data. Since the sampling rate is lower than that of the Nyquist sampling ratio, undersampling artifacts arise in reconstructed spectra. To obtain a high-quality spectrum, it is necessary to apply reasonable prior constraints in spectrum reconstruction models. The self-learning subspace method has been shown to possess superior advantages than that of the state-of-the-art low-rank Hankel matrix method when adopting high acceleration in data sampling. However, the self-learning subspace method is time-consuming due to the singular value decomposition in iterations. In this paper, we propose a fast self-learning subspace method to enable fast and high-quality reconstructions. Aided by parallel computing, the experiment results show that the proposed method can reconstruct high-fidelity spectra but spend less than 10% of the time required by the non-parallel self-learning subspace method.


Introduction
Multidimensional nuclear magnetic resonance (NMR) spectroscopy plays an important role in the fields of biomedicine and chemistry [1][2][3]. However, long data acquisition time [4][5][6] has to be solved, and one of the effective approaches to reduce the time is to acquire partial data with nonuniform sampling (NUS) [7][8][9][10][11][12]. To obtain a full spectrum, reconstruction needs to incorporate priors, and the most common ones are sparsity [13,14] and low rankness [15][16][17]. Sparsity [18][19][20] assumes the spectrum has few non-zero values in the spectrum but will not be satisfied at low-intensity broad peaks [21]. The low rank approach transforms the time-domain NMR signal (called free induction decay (FID)) into a Hankel matrix, and explores the low rankness of this matrix. Therefore, it is also called a low-rank Hankel matrix (LRHM) method. Since the peak width will not affect the rank, LRHM offers better reconstructions for these challenging peaks, even though low-intensity peaks, referring to small singular values in the matrix rank minimization, may be compromised or even lost in LRHM reconstructions [22]. This problem also exists in general low-rank matrix reconstruction and has been alleviated by introducing the truncated nuclear norm (TNN) to preserve the small singular values in the reconstruction [23]. Recently, we proposed a self-learning subspace (SLS) [24] method to mitigate the spectra degradation of LRHM at high acceleration factors. Beyond the singular values, we found that the subspace of the Hankel matrix corresponds to spectral peaks, and the TNN provides one approach to incorporate the subspace information. With the help of TNN, we establish the concept of signal subspace reconstruction in the low-rank Hankel matrix. The signal space is divided into two subspaces, an obvious, easily-restored strong signal subspace and an unstable, hard-to-estimate weak signal subspace [24]. This method includes two iterations loops: The outer loops update the signal subspace with prior constraints, and the inner one performs updates of signal reconstruction under the given subspace. Experimental results show that SLS achieves as high-quality reconstructions as does LRHM, and notably, produces better reconstructions of low-intensity peaks [24].
The SLS approach, however, requires intensive computations of singular value decomposition (SVD). In the iterative reconstruction, SVD will be called approximately 100 to 200 times, which is very time-consuming. In addition, since the computational complexity of SVD is proportional to the three power of the matrix size [25], the lengthy SVD computations may be impractical due to the data size and dimensions rapidly increasing in high-dimensional NMR spectroscopy.
In this work, we propose an accelerated algorithm for the state-of-the-art self-learning subspace method by introducing matrix factorization [26,27] to avoid SVD. Results on synthetic and realistic NMR data show that compared with the SLS, the fast SLS approach saves considerable time without sacrificing spectrum quality and enables faster reconstruction with parallel computing.

Related Work
Mathematically, a 1D FID can be modeled as the finite sum of damping exponential functions as follows [28][29][30][31]: where J is the number of spectral peaks, where  denotes an operator converting the undersampled FID x into a Hankel matrix x  ,  is an undersampling operator, *  represents the matrix nuclear norm as the sum of singular values, 2  represents the 2 l norm that measures data consistency, and  is a regularization parameter that balances the low rankness and the data consistency.
However, the LRHM model may sacrifice low-intensity peak recovery, which is related to small singular values. As we know, the information of a spectral peak includes its intensity, central frequency, and line shape. The nuclear norm only focuses on intensity. In order to utilize the extra information, including central frequency and line shape, hidden in the Hankel matrix, the SLS method was proposed [24]. The SLS method built a subspace framework to separate the original signal into a relatively strong signal subspace and a weak one. Then, peaks that lie in the weak subspace are particularly protected by enforcing them to have prior spectral peak shapes. The selflearning subspace reconstruction model is expressed as [24]

A B
of the l th outer iteration is more accurate than the initial estimate. Machine learning typically includes supervised learning, semi-supervised learning, and unsupervised learning, which use all, partial, and non-labeled samples, respectively. Transduction learning [32] is another type that predicts specific test samples by learning specific training samples. The proposed method belongs to unsupervised learning since no labeled samples or specific test samples are used. The singular value decomposition is adopted to learn the subspace of the Hankel matrix. The self-learning process means no prior or label information is provided in advance, and we only learn the subspace space information from the initial time-domain signal that is filled with zeros, and we update this information from the intermediate reconstruction. In practice, prior or label spectra are not accessible in the biomedical magnetic resonance experiment. Thus, unsupervised selflearning is a reasonable choice in this application.
Nevertheless, the application of SLS to higher dimensional NMR spectroscopy experiments is impeded by the expensive computation of SVD demanded by computing the nuclear norm term. Now, to reduce the computation time, we will introduce the matrix factorization and derive the fast algorithm.

Methods
For a given matrix, which can be factorized into two matrices, m r   P  and n r   Q  (size r is unlimited), its nuclear norm can be computed as the following [33,34]: where F  denotes the Frobenius norm of a matrix and H  means conjugate transpose.
In this work, we propose a self-learning subspace matrix factorization (SLSMF) method to accelerate the state-of-the-art SLS method: It requires no SVD computations in the inner loop, which greatly speeds up the reconstruction. Next, we will solve the optimization problem of Equation (6). The augmented Lagrange form of Equation (6) where D is a dual variable used to improve the convergence speed of the algorithm,

A B
to determine the best subspace under the prior information. Once the subspace is obtained, in the inner one, in order to solve Equation (7), it is converted into four sub-problems by adopting alternating direction methods of multipliers (ADMM) [35]: The solution is: (2) Fixing The solution is: (3) Fixing The solution is: (4) Fixing The alternating iterations in Equation (8) stop if the number of iterations k reaches the maximal number K , or the normalized successive difference +1 is smaller than that of a given tolerance tol  . The pseudocode of this reconstruction algorithm is shown in Table 1. Initialization: Input y ,  ,  , set outer maximal iterations times =5 L , convergence condition In detail, the reconstruction tasks for different slices are assigned to multiple processing cores [36] and can be computed simultaneously (Figure 1).

Experiments and Results
We compare the proposed SLSMF method with the other three state-of-the-art NMR reconstruction methods, including the LRHM, the LRHM with matrix factorization (LRHMF) [26], and the SLS. Spectra reconstruction experiments were conducted on both synthetic 1D NMR spectra ( Figure 2) and real 2D NMR spectra (Figure 3-5). All reconstruction algorithms were performed using MATLAB 2017b (Mathworks Inc., Natick, MA, USA) on a computational server with two E5-2650v4 CPUs (12 core/CPU) and 160 GB RAM.
The important spectra parameters, including two 2D spectra of small, large, and intrinsically disordered proteins and one 2D spectrum of solid NMR, are listed in Table 2. More details can be found in the experimental descriptions below.

2) While
End for;  Figure 5 Note: The NUS and reconstruction are performed on the indirect dimension, which is noted as t1.
The 2D 1 H-15 N HSQC spectrum of GB1 was prepared with following conditions: The GB1 object sample is 2 mM U-15 N, 20%-13 C GB1 in 25 mM PO4, pH 7.0 with 150 mM NaCl, and 5% D2O. Data were collected using a phase-cycle selected HSQC (hsqcfpf3gpphwg in Bruker library) at 298 K on a Bruker Avance 600 MHz spectrometer using a room temp HCN TXI probe, equipped with a z-axis gradient system. The fully sampled spectrum consists of 1676  170 complex points; the direct dimension ( 1 H) has 1676 data points, while the indirect dimension ( 15 N) has 170 data points.
The 2D 1 H-15 N best-TROSY spectrum of ubiquitin was acquired at 298.2 K temperature on an 800 MHz Bruker spectrometer and was described in a previous paper [37]. The fully sampled spectrum consists of 683  128 complex points; the direct dimension ( 1 H) has 683 data points, while the indirect dimension ( 15 N) has 128 data points.
The solid-state NMR spectrum was acquired from an imidazole sample at room temperature on a 21.1 T Bruker AVANCE-III spectrometer. The sample was spinning at 60 kHz inside a 1.3 mm probe. The 2D 1 H-1 H double-quantum spectrum was recorded using a symmetry-based R1225 pulse sequence with a recycle delay of 5 s. Data were recorded by co-adding 16 transients with 300 t1 increments. The number of fully sampled data points in the indirect dimension is 150.
In addition, to quantify the reconstruction error, a relative 2 l -norm error (RLNE) defined as: where x is the reconstructed signal and x is the fully sampled signal, is adopted.

Reconstruction of Synthetic 1D NMR Spectra
The synthetic 1D NMR spectrum is generated according to Equation (1), and its specific parameter settings are shown in Table 3. The total number of data points is 512, and 31 complex data points are acquired (means 6% NUS data). Then, simulated Gaussian noise with a mean of zero and a standard deviation of 0.005 is added to the FID. Here, the maximum value of the noise is lower than that of the lowest spectral peak.
The reconstruction results are shown in Figure 2. Under very limited sampled data, the first and second reconstructed spectral peaks are severely damaged in LRHM low-fidelity reconstructions, while the SLS and the proposed method reconstruct the spectral peaks with high fidelity (see the supplementary materials Table S1 for the correlation of each peak in Figure 2). In Table 4, it is clearly shown that the robustness of the proposed method is consistent with SLS, and the Pearson correlation coefficient is much higher than LRHM at low intensity spectral peaks. The proposed method can accelerate the reconstruction of the best method, SLS, without compromising the spectra quality. Thus, it is correct and with the expectation that no significant difference between Figure 2f, the reconstructed spectra using SLSMF, and Figure 2e, the reconstructed spectra using SLS. Accordingly, LRHMF only accelerates the algorithm of LRHM, thus Figure 2c,d should be the same. In addition, in the supplementary materials, we displayed more reconstruction results of traditional methods with moderate-fidelity ( Figure S2) and high-fidelity ( Figure S3) from 100 simulation data trails and the correlation of their corresponding spectral peaks. Since the 1D reconstruction is fast (several seconds) for all compared methods, there is no need to compare the computation time. Table 3. Parameter settings of the synthetic 1D NMR spectrum.

Reconstruction of 2D NMR Spectra
Reconstruction experiments with realistic NMR are conducted on 15% NUS 2D 1 H-15 N HSQC spectra of GB1, 20% NUS 2D 1 H-15 N best-TROSY spectra of ubiquitin, and 15% NUS 2D 1 H-1 H solidstate spectra. Figures 3a, 4a, and 5a are fully sampled 2D 1 H-15 N HSQC spectra of GB1, 2D 1 H-15 N best-TROSY spectra of ubiquitin, and 2D 1 H-1 H solid-state spectrum, respectively. Results show that all methods can reconstruct a majority of spectral peaks well. However, some peaks in best-TROSY experiments shown by arrows in Figure 4b,c are lost in LRHM and LRHMF. LRHM and LRHMF also produce some pseudo peaks in the reconstructed GB1 and solid spectra, as shown by arrows in Figures 3b,c and 5b,c. By contrast, SLS and SLSMF methods provide faithful reconstructions.
What is more, Figure 6 illustrates that the proposed method performs better in reconstructing low-intensity peaks. For example, LRHM and LRHMF weaken the reconstructed spectral peak near 125 ppm (Figure 6b-1,c-1) ), while SLSMF obtains the spectrum with peak intensity and line shape close to the fully sampled spectrum (Figure 6e-1).
Regression analyses (Figures 7-9) also exhibit that the SLSMF outperforms LRHM and LRHMF. It is shown that the SLSMF method gets a higher correlation than LRHM and LRHMF, especially in low-intensity peaks (Figures 7e-h, 8e-h, and 9e-h). Besides, lower RLNEsby SLSMF can be achieved than in LRHM and LRHMF, and RLNEs of SLSMF decrease much faster ( Figure 10). For instance, we can see that after the update of the subspace, the RLNEs of SLSMF drop almost linearly, implying that the introduction of the signal subspace can help reconstruct high-fidelity spectra more quickly and accurately with the same sampling rate. The above observations indicate that in terms of reconstruction time and reconstruction quality, the proposed approach permits considerable improvements from the SLS and low-rank Hankel matrix methods. From Figure 11, we can see that compared to the SLS approach, the proposed method significantly speeds up the reconstruction time about 2-4 times (Figure 11a). If equipped with parallel computing architecture, the acceleration in the reconstruction time of SLSMF would be striking. Take the GB1 experiment as an example: With the parallel computing SLSMF, the time spent on reconstructing spectra only requires 4.6% of the computation of the SLS without parallelism. However, if the data dimension is too small, parallel computing is not recommended because it must pay for a certain time on switching the parallel pool, i.e., a mechanism used to control multiple processing cores in software. For example, for the reconstruction of the solid-state spectra, using the LRHMF only takes about 22 s, but it takes 26 s under parallel computing, whereas switching the parallel pool spends about 24 s. We also ran methods on the personal computer and the computation time, as reported in Figure 12. The calculation time with the personal computer also confirms that the proposed method can accelerate the reconstruction and can further reduce the time by parallel computing.  In the fast NUS NMR, high-fidelity reconstruction of the spectrum is most important, followed by the rest of other performances, such as the computation time. Our work is to design a new algorithm to reduce the computation time while achieving the best spectra. To emphasize our proposed method's advantages in both reconstruction quality and speed, in Table 5 we list the advantages and disadvantages of the comparison method.        Note: R 2 denotes the square of the Pearson correlation coefficient. The correlation of low-intensity peaks and all peaks are averaged on all spectra in Figures 7-9. The reconstruction time are averaged on all spectra in Figures 11 and 12. The Pearson correlation coefficients of low-intensity peaks and all peaks are greater than 0.95 and 0.99, respectively, indicating high-fidelity reconstruction.

Conclusions
In this paper, a self-learning subspace matrix factorization (SLSMF) method is introduced to speed up the faithful reconstruction of highly-accelerated NMR spectroscopy. Comparisons with the state-of-the-art self-learning subspace method and the low-rank method imply that the new approach remarkably reduces computation time without sacrificing spectrum quality.