Signal Deconvolution and Noise Factor Analysis Based on a Combination of Time-Frequency Analysis and Probabilistic Sparse Matrix Factorization.

Nuclear magnetic resonance (NMR) spectroscopy is commonly used to characterize molecular complexity because it produces informative atomic-resolution data on the chemical structure and molecular mobility of samples non-invasively by means of various acquisition parameters and pulse programs. However, analyzing the accumulated NMR data of mixtures is challenging due to noise and signal overlap. Therefore, data-cleansing steps, such as quality checking, noise reduction, and signal deconvolution, are important processes before spectrum analysis. Here, we have developed an NMR measurement informatics tool for data cleansing that combines short-time Fourier transform (STFT; a time-frequency analytical method) and probabilistic sparse matrix factorization (PSMF) for signal deconvolution and noise factor analysis. Our tool can be applied to the original free induction decay (FID) signals of a one-dimensional NMR spectrum. We show that the signal deconvolution method reduces the noise of FID signals, increasing the signal-to-noise ratio (SNR) about tenfold, and that its application to diffusion-edited spectra allows signals of macromolecules and unsuppressed small molecules to be separated by the length of the T2* relaxation time. Noise factor analysis of NMR datasets identified correlations between SNR and acquisition parameters, revealing major experimental factors that can lower SNR.


In this study, signal deconvolution was applied to free induction decays (FIDs) of one-dimensional (1D) nuclear magnetic resonance (NMR) to separate components and improve the signal-to-noise ratio (SNR). The mathematical theory underlying this signal deconvolution is based on the combined methods of short-time Fourier transform (STFT) and probabilistic sparse matrix factorization (PSMF).

In Fourier transform (FT) NMR spectroscopy, an FID is the NMR signal generated by non-equilibrium nuclear spin magnetization precessing about the magnetic field. This non-equilibrium magnetization can be generated by applying a pulse of resonant radiofrequency close to the Larmor frequency of the nuclear spins in the sample. An FID is usually a sum of multiple decaying oscillatory signals, which return to equilibrium at different rates, that is, with different relaxation time constants. Analysis of the relaxation times of an FID for a sample gives significant insight into the chemical composition, structure, and mobility of that sample. FIDs acquired by NMR measurement are composed of many signals derived from the sample and several types of noise, such as external noise, physical vibration, power supply noise, and internal noise of the spectrometer due to thermal noise. Therefore, an FID signal can be modeled as sample-derived signal plus noise (Equation (S1)). For each signal, the transverse magnetization follows

$$M_{xy}(t) = M_0\, e^{-t/T_2^{*}} \tag{S2}$$

where $M_0$ is the initial transverse magnetization at t = 0 immediately after the 90° pulse. That is, the transverse magnetization $M_{xy}(t)$ decays exponentially according to Equation (S2): the shorter the relaxation time $T_2^{*}$, the more rapid the decay.

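If an FID has more than one component, the FID will be the sum of contributions from each component (Equation (S3)):

$$f(t) = \sum_{i} M_{0,i}\, e^{-t/T_{2,i}^{*}} \tag{S3}$$

where $f(t)$ is the FID, and $M_{0,i}$ and $T_{2,i}^{*}$ are the initial transverse magnetization and the relaxation time of component $i$.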
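The multi-component picture can be illustrated numerically. The following is a minimal sketch, assuming NumPy; the function name `simulate_fid`, the dwell time, the example frequencies, and the complex Gaussian noise model are illustrative assumptions and are not taken from the original method. Each component is written as a complex oscillation with an exponential T2* decay, because the text describes the FID as a sum of decaying oscillatory signals.

```python
import numpy as np

def simulate_fid(amplitudes, freqs_hz, t2_stars, n_points=4096, dwell=1e-4,
                 noise_sd=0.0, seed=0):
    """Sum of exponentially decaying complex oscillations plus Gaussian noise.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_points) * dwell          # acquisition time axis in seconds
    fid = np.zeros(n_points, dtype=complex)
    for m0, freq, t2 in zip(amplitudes, freqs_hz, t2_stars):
        # one component: oscillation at `freq`, envelope m0 * exp(-t / T2*)
        fid += m0 * np.exp(2j * np.pi * freq * t) * np.exp(-t / t2)
    # assumed noise model: independent Gaussian noise on both channels
    noise = noise_sd * (rng.standard_normal(n_points)
                        + 1j * rng.standard_normal(n_points))
    return t, fid + noise

# A fast-decaying (short T2*) and a slow-decaying (long T2*) component.
t, fid = simulate_fid(amplitudes=[1.0, 0.5], freqs_hz=[100.0, 400.0],
                      t2_stars=[0.05, 0.5], noise_sd=0.02)
```

With `noise_sd=0.0` the magnitude of a single-component FID reduces to the exponential envelope alone, which makes the effect of a shorter T2* easy to inspect.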

STFT of an FID signal $f(t)$ can be written as:

$$F(\tau, \omega) = \int_{-\infty}^{\infty} f(t)\, w(t - \tau)\, e^{-i\omega t}\, dt \tag{S5}$$

where the window function $w(t - \tau)$ is first used to extract the part of $f(t)$ around $t = \tau$ locally, and then FT of the segment is performed over $t$ (Equation (S5)) [3]. By moving the center position of the window function sequentially, all of the FTs at different times can be obtained.
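The sliding-window procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the Hann window, the window length, and the hop size are assumed values, since the text here does not state which window function or parameters were used, and `stft_matrix` is a hypothetical name.

```python
import numpy as np

def stft_matrix(signal, window_len=256, hop=64):
    """Slide a window along the signal and Fourier-transform each segment.

    Returns a complex matrix with one row per frequency bin and one column
    per window position.
    """
    signal = np.asarray(signal)
    window = np.hanning(window_len)                  # assumed window function
    starts = range(0, len(signal) - window_len + 1, hop)
    # cut out each local segment and taper it with the window
    frames = np.array([signal[s:s + window_len] * window for s in starts])
    # FT of every segment; transpose so that columns index time
    return np.fft.fft(frames, axis=1).T

# Example: a decaying complex oscillation sampled at 1024 points.
n = np.arange(1024)
demo = np.exp(2j * np.pi * 32 * n / 256) * np.exp(-n / 300.0)
spectrogram = stft_matrix(demo)
```

Each column of `spectrogram` is the spectrum of one windowed segment, so the decay of a component appears as a fall in magnitude along the corresponding row.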

Applying Euler's formula (Equation (S6)),

$$e^{i\theta} = \cos\theta + i\sin\theta \tag{S6}$$

shows that the STFT value $F(\tau, \omega)$ is complex and composed of two signals, a real part $\mathrm{Re}\,F(\tau, \omega)$ and an imaginary part $\mathrm{Im}\,F(\tau, \omega)$, whose phases differ by 90° from each other (Figure S1, Equations (S7) and (S8)). To change a complex value into an absolute value, the following equation is applied (Equation (S9)):

$$|F(\tau, \omega)| = \sqrt{[\mathrm{Re}\,F(\tau, \omega)]^{2} + [\mathrm{Im}\,F(\tau, \omega)]^{2}} \tag{S9}$$

For PSMF [4], positive-valued matrices are needed and the original signal values must be converted to their logarithmic form for optimal analysis. To convert Equation (S9) to a positive logarithmic form, the following equation is applied (Equation (S10)):

$$X(\tau, \omega) = \log\left(|F(\tau, \omega)| + 1\right) \tag{S10}$$

where $|F(\tau, \omega)|$ is the absolute value of the STFT from Equation (S9) and $X(\tau, \omega)$ denotes the converted value.

In our method using PSMF, we focus on sparse factorizations and on properly accounting for uncertainties while computing the factorization. Thus, signal deconvolution is formulated as finding the factorization of the data matrix (Equation (S11)). When considering the separation of signal and noise, Equation (S11) can be described as the sum of a signal component, a noise component, and residuals (Equation (S12)). Equation (S12) estimates that the signal component ( … (S15)

To evaluate SNR, both noise-removed and noise-only FIDs are converted to signal and noise spectra, respectively, by applying standard FT. SNR is calculated as the ratio of the signal peak intensity to the noise value by using the method of Mnova (Equation (S16)) [6]. The noise value is calculated as the standard deviation of the signal-free region (Equation (S17)), computed from the number of points in that region, the value of each digital point in that region, and the average of the digital points in that region.
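The SNR evaluation can be sketched as follows. This is a minimal sketch under stated assumptions, not the Mnova routine itself: the function name `snr` is hypothetical, the signal-free region is passed in by the caller, and the population standard deviation (`ddof=0`) is an assumed choice of normalization.

```python
import numpy as np

def snr(spectrum, noise_region):
    """Peak intensity divided by the standard deviation of a signal-free region.

    `spectrum` is a real-valued 1D spectrum; `noise_region` is a slice that
    selects points the caller considers free of signal.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    # assumed normalization: population standard deviation (ddof=0)
    noise = np.std(spectrum[noise_region])
    return spectrum.max() / noise

# Example: one noisy decaying oscillation, Fourier transformed to a spectrum.
rng = np.random.default_rng(0)
t = np.arange(2048) * 1e-3
fid = (np.exp(2j * np.pi * 100 * t - t / 0.2)
       + 0.05 * (rng.standard_normal(2048) + 1j * rng.standard_normal(2048)))
spectrum = np.fft.fft(fid).real
# the second half of the frequency axis holds no peak in this example
print(snr(spectrum, slice(1024, 2048)))
```

Applying the same function to the raw and the denoised spectrum, with the same signal-free region, gives two values that can be compared directly.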

Finally, the relative SNR is the ratio of the SNR after denoising ($\mathrm{SNR}_{\mathrm{denoised}}$) to the original SNR ($\mathrm{SNR}_{\mathrm{raw}}$), which is calculated as follows (Equation (S18)):

$$\mathrm{Relative\ SNR} = \frac{\mathrm{SNR}_{\mathrm{denoised}}}{\mathrm{SNR}_{\mathrm{raw}}} \tag{S18}$$

In order to obtain a theoretical SNR index based on acquisition parameters, the theoretical SNR value (calcSNR) was calculated by using a previously described formula (Equation (S19)) [7], where C is the number of spins in the system (sample concentration/number of protons), the two gyromagnetic ratio terms are those of the excited nucleus and of the detected nucleus, NS is the number of scans, B is the external magnetic field, and $T_2$ is the transverse relaxation time.

Table S1. Relative SNR of this spectrum is 1.14-fold.

SNR-raw, SNR of raw data; SNR-denoised, SNR of denoised data; RelativeSNR, relative SNR; Total-int, total intensity; ShortT2*-int, intensity of short T2* signal; LongT2*-int, intensity of long T2* signal; ShortT2*/Total, ratio of intensity of short T2* signal to total intensity; Noise-raw, noise of raw data; Noise-denoised, noise of denoised data; GPZ, gradient pulse in the z-axis; RG, receiver gain; NS, number of scans; DE, pre-scan delay; SW, spectral width; O1, the offset of the transmitter frequency; LOCKED, if LOCK is on, value is 1, if not, value is 0.

Figure S9: Histogram of the composition of the separated signal in diffusion-edited NMR data.
We investigated the relationship between the composition of the separated signal and the gradient pulse in the z-axis (GPZ) parameter of diffusion-edited NMR. The histogram shows the relative SNR in NMR data measured using two different GPZ values. Shown is the ratio of the sum of short T2* signal intensity to total intensity for GPZ = 36.6% (blue) and for GPZ = 80% (red).
The average value in each pulse sequence is indicated.