Sensor Configuration and Algorithms for Power-Line Interference Suppression in Low Field Nuclear Magnetic Resonance

Low field (LF) nuclear magnetic resonance (NMR) shows potential advantages to study pure heteronuclear J-coupling and observe the fine structure of matter. Power-line harmonics interferences and fixed-frequency noise peaks might introduce discrete noise peaks into the LF-NMR spectrum in an open environment or in a conductively shielded room, which might disturb J-coupling spectra of matter recorded at LF. In this paper, we describe a multi-channel sensor configuration of superconducting quantum interference devices, and measure the multiple peaks of the 2,2,2-trifluoroethanol J-coupling spectrum. For the case of low signal to noise ratio (SNR) < 1, we suggest two noise suppression algorithms using discrete wavelet analysis (DWA), combined with either least squares method (LSM) or gradient descent (GD). The de-noising methods are based on spatial correlation of the interferences among the superconducting sensors, and are experimentally demonstrated. The DWA-LSM algorithm shows a significant effect in the noise reduction and recovers SNR > 1 for most of the signal peaks. The DWA-GD algorithm improves the SNR further, but takes more computational time. Depending on whether the accuracy or the speed of the de-noising process is more important in LF-NMR applications, the choice of algorithm should be made.


Introduction
In the past decades, low field (LF) nuclear magnetic resonance (NMR) and magnetic resonance imaging (MRI) research has got particular attention with the goal to become a complementary supplement to conventional high-field NMR/MRI. Some potential applications of LF NMR based on both induction coil [1] and superconducting quantum interference devices (SQUID) [2,3] were investigated. LF NMR also has some successful applications, for instance, to investigate how gas hydrate accumulates and dissociates in the pore space [4]. If only low magnetic fields are needed, one can build an inexpensive and portable NMR/NMR system. As we know, the area integral under the NMR line will not change with the detection field (B 0 ) strengths and signal-to-noise ratio (SNR) is field noise components dominate the magnetic field noise [26] and we focus on suppressing these noise components. (3) The interferences from all detectors should be strongly spatially correlated.

Sensor Configuration and Measurement Sequence
We configure a 4-channel SQUID sensors system which is placed in a commercial liquid helium dewar. The schematic diagram of the system can be found in [26]. The signal sensor, as shown in Figure 1, is composed of a 2nd-order gradiometer connecting to SQUID module (so-called S module in Figure 1) with 22 mm diameter and 50 mm baseline. 2nd-order gradiometers which consist of two 1st-order gradiometers connected with opposite wire-winding direction, are usually used in unshielded or conductively shielded environment to suppress both homogeneous field noise and 1st-order gradient noise. Our S module consist of an input coil and a DC SQUID with 680 nH mutual inductance. The distance between the room temperature sample and the lowest coil of the 2nd-order gradiometer is 15 mm. The reference sensors consist of three orthogonal magnetometers with 2 mm diameter connecting to S modules. They are placed about 60 mm above the sample to make sure that the amplitude of the sample NMR signal picked up by the magnetometers is below 12 fT/ √ Hz, which is the value of our measured environmental white noise above 3 kHz. Note that the reference sensors can also include 1st-order gradiometers in the case of strong environmental gradients. Our LF-NMR experiments are performed in an unshielded environment. The sample is 2,2,2-trifluoroethanol with NMR grade 99.8%. The 2,2,2-trifluoroethanol is placed in a 20 mL polyvinyl chloride bottle. A 0.65 T permanent magnet pair is employed to supply the prepolarization (B p ) field. A motorized transportation device is used to transfer the sample from the polarizing field to the measurement location underneath the SQUID sensors. A more detailed description of the system (see Figure A1 in Appendix A) can be found in [27].
The experiment starts by prepolarizing the sample in the permanent magnet pair for a time T p = 5 s, as shown in Figure 2. After prepolarization, the sample is transported to the outer bottom of the dewar in a transportation time of T tran = 550 ms. In order to get the NMR signal at the Larmor frequency f L of 5 kHz (B 0 =117 µT), a π/2 and a π excitation field pulse are applied as excitation (B 1 ) fields. T FID is the time when the free induction decay (FID) signal relaxes completely. We record the signal with the frequency encoding gradient G z switched on to bring down the SNR to a low level on purpose, e.g., to obtain an SNR of about one, in order to study the efficiency of our de-noising approach. The SQUID readout electronics is kept in the reset state during 2 ms of delay time after finishing the π pulse. Subsequently, spin echoes are recorded for a duration of T a .

Discrete Wavelet Analysis-Least Squares Method (DWA-LSM) Based De-Noising
Wavelets are mathematical functions that cut up data into different levels in time domain. Each level is matched with a frequency range [28]. Although there are many applications of wavelets in NMR signal noise reduction, the appropriate de-noising technique is specific for one particular spectrum and cannot easily be generalized. Two important factors affect the DWA-based de-noising quality. The first one is the selection of the wavelet functions. The Daubechies (db) wavelet, coif wavelet and sym wavelet are generally suitable for retaining the signal. In order to limit the complexity of computation, the db wavelet is chosen [29]. The db4 wavelet attains the smallest entropy, which means the most critical information of the signal will be retained [20]. The db3 also attains a small entropy, and it can smoothen the spectrum, sharpen multi-peak shapes and at the same time maintain the peaks' locations [30]. In addition, no signal peak will be eliminated, and weak peaks that are embedded in the noise will be recovered. So we choose db3 wavelet for interference reduction. The second key factor is the determination of a decomposition level. White noise estimation is often used to determine the decomposition level for signal with white noise [20]. Here, both the wanted signal (spin echo) and unwanted signal (power-line interference) can be seen as 'signal' whose features should be retained during DWA, since this is very important for the further de-noising processes. Although in our case the noise is not dominated by white noise, we want to ensure that white noise can be sufficiently weakened at each level, which helps us to focus on suppressing the power-line interferences. The optimal decomposition level is 5 for a white noise estimation based method [29]. But if the duration time of the spin echo signal is short, for example for the case of short spin-spin (T 2 ) relaxation time, the frequency resolution of the signal will be limited by the duration time. In that case, the Gabor wavelets will be a better choice because they provide the best compromise between simultaneous time and frequency signal representations [31].
By using DWA to decompose the recorded spin echo signal B echo and the three-axis interference components B x,y,z (a simplified representation of B x , B y , B z ) into different levels in time domain, we can process the different harmonic interferences independently. We assume that the signal frequency range is from f 1 to f 2 , which depends on the Larmor frequency, the J-coupling strength, and the signal levels, which cover the frequency range from f 1 to f 2 , are the levels ranging from n 1 to n 2 . The de-noising process will be only implemented within the levels from n 1 to n 2 , which helps us speed up the process. Now the decomposed B echo at the i th level can be described as: where S i (j) is the pure echo signal at the i th level, N 2G i (j) the noise at the i th level picked up by the 2nd-order gradiometer, and j the number of data points ranging from 1 to 100000. The sampling rate of the data acquisition is 100 kHz and the data recording time T a is 1 s. The key goal is to remove N 2G i (j) from B echo (j) at each level. As mentioned, S(j) and N 2G (j) are uncorrelated, because S(j) is transient and N 2G (j) is a continuously periodic signal dominated by the homogeneous magnetic field noise from x, y and z directions. So when the echo signal ends, the B echo in the i th level can be described as: Here, we assume that the echo signal ends at the data point m. Now the noise suppression coefficients k i x,y,z at the i th level can be obtained by fitting the N 2G i (j) and the B x,y,z i (j) based on the LSM with the data points ranging from m+1 to 100000, as given below: Finally, we use k x,y,z to suppress the noise at each level and sum up the de-noised signals of each level. The total de-noised output B out (j) can be written as: A detailed description of the noise reduction process based on the DWA-LSM can be found in [26]. The effect of the LSM-based noise suppression may be limited by two reasons: (1) some near-field interference could not be suppressed by the method based on spatial correlation, (2) the least squares method finds the globally optimal solution, so the suppression coefficients matrix k can be easily dominated by the interferences with strong intensity, but the weak interferences will be ignored. However, the process of the LSM-based noise suppression method is simple and therefore suited for online de-noising.

Discrete Wavelet Analysis -Gradient Descending (DWA-GD) Based De-Noising
We suggest the GD algorithm after 1D DWA to determine the suppression coefficients matrix k x,y,z for further suppression of the weak interferences. Before determining k x,y,z , we propose normalization for data pre-processing, before applying the process of Equation (4). Normalization is particularly useful for the algorithm involving neural networks and clustering, and it helps prevent attributes with initially large range from outweighing attributes with initially smaller range [32]. We choose the min-max normalization because it can preserve the relationships among the original data values [33], and the relative amplitudes and the arrangements of the data will be retained. The formulation of the min-max normalization can be described as: where X is the original data, X the new value linearly transformed from X, and X max and X min the maximal and minimal values in X. It can be easily seen that when X = X min , X = 0, and when X = X max , X = 1. The entire range of X values from X min and X max is mapped to the range from 0 to 1. The advantage is that the weight of each data point is the same.
After the normalization of the recorded spin echo data and the three orthogonal interference components in each level, the k x,y,z can then be fitted based on the GD optimization algorithm. Note that the determination of k can be regarded as a linear regression model because the suppression is focused on suppressing the homogeneous magnetic field noise. In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables (three orthogonal interference components B x,y,z ). By setting α, β, γ as the independent variables andŶ as the dependent variable, a linear relationship between these variables in each level can be described as: whereŶ is the predicted value for given values of α, β and γ. a, b and c are the slopes (the suppression coefficients in our case), and d is the intercept. Now the challenge is to determine the values of a, b, c, and d. The GD algorithm starts by introducing the mean squared error function ∆, known as the loss function, to calculate the difference between the actual Y (the recorded spin echo signal) and the predictedŶ value [34,35]: where i is the index of the data point. Then we calculate the partial derivatives of the loss function with respect to a, b, c, and d, and plug in the values of α, β, γ, and Y: The starting points of a 0 , b 0 , c 0 , and d 0 can be zero or random numbers. Now we update the current values of a c , b c , c c , and d c using the partial derivatives and the step length L: In the first step, the step length L 1 can be chosen a little bit larger to speed up convergence. We repeat this process from Equations (7) to (9) until the loss function is smaller than a given threshold ε 1 , then we get the first approximation of values a 1 , b 1 , c 1 , d 1 . The threshold ε 1 should not be too small at the first time, which can also speed up convergence. Now the first approximation of values a 1 , b 1 , c 1 , and d 1 are close to the optimal values, but still need to be improved. Note that the results are very sensitive to the choice of the starting points a 0 , b 0 , c 0 , and d 0 , and setting them to zero or to random numbers is not the best option. In order to solve this problem, we subsequently use the first approximation values of a 1 , b 1 , c 1 , and d 1 to be the starting points, and repeat the process again. It can be regarded as the introduction of a feedback to improve the starting points. In this round, the step length L 2 and the threshold ε 2 should be smaller to obtain precise results. The optimal suppression coefficients can be obtained. Then we repeat the process in Equation (4) to remove the noise from B echo by using the optimal suppression coefficients.

Numerical Simulation
We firstly verified the effectiveness of the two noise suppression algorithms using a simulated spin echo signal with two different groups, as shown in Figure 3a, whose strength distributions are 1:2:1 (S 1-3 ), and 1:5:1 (S 4-6 ), respectively. The duration of the simulated signal is 0.45 s and the data acquisition time is 1 s. To simulate the practical noise performance, we added some real noise picked up by the reference sensor, which contains both white noise and power-line interferences, with different noise strengths to achieve different SNR. The noise-added signals are depicted in Figure 3b, c and d, with SNR = 0.15, 0.6 and 3, respectively. Using these artificial signals covered with real noise, we can verify the de-noising effectiveness in case of low SNR. We use the ratio of the energy spectral density integral under the smallest signal peak (S 4 ) and one of the strongest harmonic peaks (N 2 ) which doesn't overlap with the signal peaks, for determination of the SNR. The noise of the synthesized signals was suppressed based on the two algorithms described in Section 2.2. Figure 4a-c show the noise suppression effects for different values of SNR. We can see that the noise peaks are greatly suppressed. In the case of lowest SNR, N 1-3 have been suppressed by a factor of 88%, 83% and 85%, respectively, by using DWA-LSM. By using DWA-GD, N 1-3 have been suppressed by a factor of 95%, 90% and 90%, respectively. The suppression factors can be calculated by two steps: (1) Calculate the residual noise defined as the difference between de-noised signal and pure signal. (2) Calculate the suppression factors by using the residual noise spectrum and pure noise spectrum. Figure 4d shows the signals with SNR = 0.15 before and after de-noising and the pure signal in time domain. It clearly illustrates the noise reduction in time domain. Table 1 shows the SNR changes during the de-noising process for different values of SNR. In all cases, the SNR has been significantly improved. Compared with LSM, GD yields further noise suppression. However, the GD consists of several steps and the process takes 10 min in case of SNR = 0.15, and several minutes in the other two cases (10,0000 data points). This is because the initial value of the added noise is different when we use the same step length to converge to the optimal point. A larger initial value will even take more time. For comparison, the LSM process can be finished within ten seconds. The number of data points also influences the computing time, based on the Bachmann-Landau notation [36]. When the data points double, the computing complexity will be 4 times longer than before since both the GD and the LSM involve square operations. Furthermore, the result is very sensitive to the step length set in the GD. If the step size is too large, the output might not converge to the optimal solution. If the step is too small, the convergence will be very slow. The SNR is limited by the residual noise whose amplitude is about 5% of the original noise amplitude in Figure 3b. The accuracy of the loss function and the step length are the main reasons that limit further noise reduction. However, higher accuracy and smaller step length will take more time and memory. In practical application, if the amplitudes of the noise peaks are significantly smaller than the amplitudes of the signal peaks, comparable to the white noise, a lower accuracy is acceptable. Note that before the suppression but after addition of the noise, the amplitudes of S 1 and S 5 reduce in Figure 3b-d. The lower the SNR, the larger the reduction. After the suppression, the amplitudes of S 1 and S 5 recover from the reduction. For example, compared to the pure signal, the amplitudes of S 1 and S 5 reduce by a factor of 4.5% and 5.9% in Figure 3b, and recover to 0.7% and 1% in Figure 4a. It is clear that the frequencies of S 1 and S 5 are very close to the noise peaks, so that their amplitudes can be affected when we suppress the noise in the time domain. The amplitude can be increased or decreased, depending on the phases of signal and noise. It can be explained that in the extreme case, when two signals with the same frequency but different phases pass through a linear system at the same time, the amplitude and phase of the output signal become a combination of these two signals. When one of the two signals is removed, the amplitude and phase of the remaining one will be recovered from the combination. Since S 2,3,4,6 are stable and S 1,5 change no more than 1%, it can be concluded that 99% NMR information has been preserved.

Spectral Correlation Coefficients
Firstly, we study the spectral correlation among the noise spectra of the different sensors, the magnetometer, the 1st-and 2nd-order gradiometers, between 3 and 5 kHz, as shown in Figure 5. The results show that there are two harmonics every 100 Hz, so if the signal bandwidth or J-coupling bandwidth is more than 50 Hz, the interference will overlap with the signal spectrum. The noise spectral correlation coefficient among the detectors can then be calculated by using the Pearson correlation coefficient (PCC), as shown in Table 2. The PCC is a measure of the strength of the association between two variables [37]. The number between −1 and 1 indicates the extent to which two variables are linearly related. The values of 1 and −1 imply complete positive and negative correlations, respectively. A value of 0 implies that the two variables are uncorrelated. We found correlation coefficients above 0.89, and the correlation coefficient between the magnetometers and the second-order gradiometer reached a value of 0.97. This means that the frequencies of the harmonics picked up by different sensors are the same, and the relative output amplitudes of the harmonics from each sensor are similar.
Although the values of the correlation coefficients will change with the measurement position, all the three coefficients at the three measurement positions amount to more than 0.80. In order to achieve a good spectral correlation coefficient, the measurement system should be located far away from near-field sources (e.g., neighboring electronic devices).

Interference Suppression Based on the DWA-LSM
Based on the good spectral correlation between the 2nd-order gradiometer and the magnetometer, we choose the 2nd-order gradiometer as the signal sensor to record spin echo signals of 20 mL 2,2,2-trifluoroethanol at 5 kHz proton Larmor frequency. The power-line harmonics interferences are recorded by three orthogonal magnetometers at the same time. In order to verify our de-noising method at low SNR, we implement a 2 µT/m frequency encoding gradient during the measurement to reduce the SNR. Figure 6 shows the heteronuclear J-coupling spectra of the 2,2,2-trifluoroethanol NMR signal and the 50 Hz harmonics interferences between 4650 and 5050 Hz which are picked up by the signal sensor. The fluorine and the proton groups of peaks are well separated, with center frequencies of about 4708 Hz and 5004 Hz, respectively. The linear broadband detector SQUID can easily record the signal groups from two different nuclei. We can clearly observe 3 fluorine peaks (F 1 -F 3 ) and 5 proton peaks (H 1 -H 5 ). As expected, the strength distribution of 1:2:1 for triplet fluorine peaks due to the coupling of the three fluorine nuclei with the two protons can be easily observed. The OH is uncoupled, so the proton spectrum appears as five proton peaks. The proton-fluorine coupling strength is about 9 Hz, which is in good agreement with measurement results at high-field [38] and in the earth's magnetic field [39]. The line width of the NMR spectra is 1 Hz, which is limited by the data acquisition time 1 s and by the frequency encoding gradient. However, the interferences (N 1 -N 3 ) destroy the spectrum of the fine structure of matter and make it impossible to get accurate parameters and structure estimation from the NMR spectrum. We also use the ratio of the energy spectral density integral under each signal peak and N 2 for determination of SNR. For most of the signal peaks, the SNR is much less than 1. For noise suppression in the case of SNR < 1 and noise distribution covering the signal frequency range (e.g., N 1 , N 3 ), we firstly decompose the B echo and the B x,y,z using 1D DWA and suppress the noise based on the LSM. Figure 7 shows the suppression effect. The noise peaks N 1 and N 4 were removed, the amplitudes of N 2 and N 3 were significantly suppressed by a factor of 2.6 and 3.4, respectively. We assume the noise peaks are removed when their amplitudes are below the white noise. Both the frequencies and the amplitudes of most of the fluorine and proton peaks are almost unchanged. However, some noise peaks are still present and their amplitudes are comparable with the weakest proton peak H 1 . The energy spectral density integral under N 2 has been suppressed by a factor of 6.5. For most of the signal peaks, the SNR has become more than 1, but the integrals under F 1 and H 2 have changed by 6.7% and −6%, respectively. The reasons can be explained from two aspects: (1) as mentioned in 2.3, the signal amplitudes can be affected by nearby noise peaks. (2) The fitting error due to nonlinear items in the interference may disturb the signals close to the noise peaks. In general, the interference peaks need to be further suppressed.

Optimizing the Determination of the Suppression Coefficients
In order to further improve the SNR, we implement normalization for data pre-processing and suppress the interferences based on the DWA-GD. As shown in Figure 8, all the interference peaks are further suppressed as compared to Figure 7, and the amplitude of N 2 has been reduced by a factor of 2.8 and N 3 has been removed. The weakest proton peak H 1 is larger than all interference peaks. Table 3 lists the calculated SNR and the energy spectral density integrals under the signal and noise peaks. The integral under N 2 has been suppressed by a factor of 7.3 when using DWA-GD de-noising process compared to that with DWA-LSM. With DWA-GD, we improve the SNR of all signal peaks to be larger than 1. The integrals under F 1 and H 2 become a bit larger, but the total change is no more than 10%, which is much less than the change of the area under N 2 . The rest of the fluorine and proton peaks are retained well. Compared to the LSM, the GD can achieve a better noise reduction effect and a wider signal frequency bandwidth, and the min-max normalization ensures that the data points are not overwhelmed by each other. However, compared with the numerical simulation in 2.3, the GD takes 20-30 min, even more than the 10 min in the simulation, and the noise peaks amplitude suppression factors reduce from 95% in the simulation to 87% in the practical application. This is because the noise recorded at the signal sensor is slightly different than that at the reference sensor, but is the same in the simulation since the noise in the synthesized signal is taken from the reference sensor. The different white noise in the signal and reference sensors makes the noise distribution more complicated than in the synthesized signal, which takes more computing time and reduces the suppression factor. For the case that SNR is not too low, people can choose the LSM only to speed up the process with an acceptable noise suppression effect.  Table 3. The calculated energy spectral density integral under the signal and noise peaks, as well as the SNR before and after the de-noising process by using the DWA-LSM and DWA-GD.

Conclusions
To study pure heteronuclear J-coupling in ultra-low field in an unshielded environment with power-line harmonics interference, we suggested two noise suppression methods based on the DWA-LSM and the DWA-GD optimal algorithms. We firstly verified the effectiveness of two de-noising methods on reducing the noise in the synthesized signal. We then demonstrated the DWA-LSM suppression procedure using spectral data recorded from a 2,2,2-trifluoroethanol sample. The result showed that the SNR had been significantly improved, while the signal peaks which were close to the interference peaks changed no more than 7%. In order to further suppress the interferences, we developed a second method to determine the suppression coefficients based on the DWA-GD. Before determining the suppression coefficients, we used the min-max normalization to ensure that every data point had the same weight. The result illustrated that the amplitude of the noise peak N 2 was suppressed by a factor of 7.28 and the SNR was improved by a factor of 47 if we ignored the change in the energy spectral density integral under F 1 and H 2 which was no more than 10%. With suitable choice of starting point and step length, both algorithms show good robustness in case of low SNR. Depending on whether accuracy or speed of the de-noising process is more important in LF-NMR applications, the choice of algorithm should be made. The algorithm should work in the case of multi-peaks J-coupling spectra with low SNR at ultra-low field, and also for MRI in the unshielded or conductively shielded case.