Enhancing Compression Level for More Efficient Compressed Sensing and Other Lessons from NMR Spectroscopy

Modern nuclear magnetic resonance (NMR) spectroscopy is based on two- and higher-dimensional experiments that allow molecular structures to be solved, i.e., the relative positions of individual atoms to be determined very precisely. However, rich chemical information comes at the price of long data acquisition times (up to several days). This problem can be alleviated by compressed sensing (CS), a method that has revolutionized many fields of technology. It is known that CS performs most efficiently when the measured objects are highly compressible, which in the case of an NMR signal means that its frequency-domain representation (spectrum) has a low number of significant points. However, many NMR spectroscopists are not aware that various well-known signal acquisition procedures enhance compressibility and thus should be used prior to CS reconstruction. In this study, we discuss such procedures and show to what extent they are complementary to CS approaches. We believe that the survey will be useful not only for NMR spectroscopists but will also inspire the broader signal processing community.


Introduction
Nuclear magnetic resonance (NMR) spectroscopy is currently one of the most versatile techniques of chemical and physical analysis. Its range of applications is impressively broad: from the analysis of small-molecule structures in all states of matter [1], through the characterization of complex natural mixtures [2,3], including applications to medical screening (metabolomics) [4], up to biological studies of the structure and dynamics of proteins and ribonucleic acids [5]. The introduction of the Fourier transform (FT) in 1966 [6] became a cornerstone of modern NMR spectroscopy, which is based on the measurement of a free induction decay (FID) signal in the time domain. The FID is induced in the receiver coil of an NMR spectrometer by the oscillating effective magnetization of nuclear magnetic moments polarized by an external magnetic field and excited by a radio-frequency (RF) pulse. Importantly, the precession frequency depends not only on the nuclear magnetic moment and the induction of the external magnetic field but also on the electronic surroundings of a nucleus, which cause a shielding or deshielding effect. Thus, the frequencies emitted by the sample are of interest to chemists, who can deduce molecular structures from them. Formally, the precession frequency ($\omega$) depends on the external magnetic field ($B_0$), the shielding tensor ($\sigma$) and the magnetic moment, which is the product of the gyromagnetic ratio $\gamma$ and the spin vector $I$.
The FID signal $s(t)$ typically takes the form of a sum of decaying oscillatory components:
$$s(t) = \sum_{k=1}^{K} A_k \, e^{i(\omega_k t + \varphi)} \qquad (2)$$
The number of components $K$ is equal to the number of groups of nuclei differing in resonance frequency. Each component has its amplitude ($A_k$) and frequency ($\omega_k$). The imaginary part of a frequency corresponds to the decay rate of the signal (transverse relaxation). The phase error $\varphi$ stems from various experimental imperfections and is typically either constant or linearly dependent on the frequency ($\varphi = \varphi_0 + \varphi_1 \omega$).
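The signal model described above can be illustrated with a minimal sketch. It assumes two components with hypothetical amplitudes, on-grid frequencies and a common decay rate, and it uses a plain O(n^2) DFT so that only the Python standard library is needed. The function names `synth_fid` and `dft` are illustrative and do not come from any NMR software.

```python
import cmath
import math

def synth_fid(n, components, phase=0.0):
    """Sample s(t) = sum_k A_k * exp(i * (omega_k * t + phase)) at t = 0..n-1.

    components: list of (amplitude, frequency_bin, decay_rate) tuples
    (hypothetical illustration values). The complex frequency is
    omega_k = 2*pi*frequency_bin/n + i*decay_rate, so its imaginary
    part gives the exponential decay.
    """
    fid = []
    for t in range(n):
        value = 0j
        for amplitude, frequency_bin, decay_rate in components:
            omega = complex(2 * math.pi * frequency_bin / n, decay_rate)
            value += amplitude * cmath.exp(1j * (omega * t + phase))
        fid.append(value)
    return fid

def dft(x):
    """Plain O(n^2) discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

# Two decaying components; the second has half the amplitude of the first.
fid = synth_fid(64, [(1.0, 10, 0.05), (0.5, 25, 0.05)])
spectrum = dft(fid)
tallest = max(range(64), key=lambda f: abs(spectrum[f]))
print("tallest peak at bin", tallest)
```

Because of the decay, each component appears in the spectrum as a Lorentzian line of finite width rather than as a single non-zero point, which is why such spectra are compressible rather than strictly sparse.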
The concept of multidimensional data acquisition [7] opened the way to the measurement of $N$-dimensional FID signals that are functions of several time variables $t_1, t_2, \ldots, t_N$. Such signals are built of products of $N$ components similar to those in Equation (2):
$$s(t_1, \ldots, t_N) = \sum_{k=1}^{K} A_k \prod_{l=1}^{N} e^{i(\omega_{k,l} t_l + \varphi_l)} \qquad (3)$$
where the $k$-index in $\omega_{k,l}$ corresponds to the component's number, $l$ corresponds to the dimension of the signal (there are $N$ dimensions in total, one direct and the others indirect) and $\varphi_l$ is the phase error in dimension $l$. $N$-dimensional spectra contain useful information, not only about the resonance frequencies $\omega_{k,l}$ but also about the interactions exploited to trigger excitation transfer between nuclei. This allows the structure of a studied molecule to be resolved, i.e., it allows one to determine which nuclei are connected by single or multiple chemical bonds (transfer via spin-spin couplings), which are close in space (transfer via dipole-dipole cross-relaxation), etc. [1].
However, the acquisition of multidimensional NMR data is very time-consuming, which is problematic because of the high costs of NMR hardware maintenance, the chemical instability of some samples [8] and the processes occurring in them [3]. The problem of lengthy acquisition stems from a combination of three facts. First, according to the Nyquist-Shannon sampling theorem [9], the sampling rate must be at least equal to the bandwidth of the signal. Second, the spectral resolution is proportional to the maximum time sampled [10]. Both requirements must be fulfilled in all spectral dimensions, which means that the number of data points grows exponentially with the dimensionality of a spectrum, reaching many thousands. Finally, every sampled data point in the indirect dimensions ($t_1, t_2, \ldots, t_{N-1}$) is acquired as a separate, one-dimensional FID signal $s(t_N)$. The excited spin system needs to recover (or at least approach) its equilibrium state before the next point is acquired, which usually takes up to a few seconds. Multiplied by several thousand indirect-dimension points, this leads to NMR experiments lasting up to several days.
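The arithmetic behind this estimate can be sketched as follows. The numbers are purely illustrative assumptions (64 points per indirect dimension, 2 s per one-dimensional FID, a single scan, no quadrature detection), and the function names are hypothetical.

```python
def indirect_points(points_per_dim, n_dims):
    """Number of indirect-dimension points of an n_dims-dimensional experiment.

    One dimension is direct, so n_dims - 1 dimensions are sampled point by point.
    """
    return points_per_dim ** (n_dims - 1)

def experiment_hours(points_per_dim, n_dims, seconds_per_fid):
    """Total time if every indirect point costs one FID of seconds_per_fid."""
    return indirect_points(points_per_dim, n_dims) * seconds_per_fid / 3600.0

# Exponential growth of the measurement time with dimensionality.
for n_dims in (2, 3, 4):
    points = indirect_points(64, n_dims)
    hours = experiment_hours(64, n_dims, 2.0)
    print(n_dims, "dimensions:", points, "points,", round(hours, 1), "hours")
```

Under these assumptions, each additional indirect dimension multiplies the experimental time by the number of points sampled in that dimension.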
Many methods have been proposed to alleviate the problem of lengthy sampling in multidimensional NMR experiments. Currently, the majority of them are based on the concept of sparse non-uniform sampling (NUS), where certain sampling points are removed from the sampling schedule and reconstructed mathematically on the basis of various assumptions about the resulting spectrum. The assumptions may include: maximum entropy of a spectrum [11], presence of empty regions [12], minimum number of FID components [13] or minimum number of meaningful spectral points ("sparsity") [14][15][16]. The last assumption is a central pillar of the compressed sensing (CS) method that has conquered many branches of technology and science [17], including the chemical sciences [18]. The sparsest spectrum is found by minimizing a penalty function that is a sum of two terms: the first measures the agreement of the spectrum with the measured data, and the second, expressed by the $\ell_p$-norm ($0 < p \le 1$) of the spectrum, corresponds to its sparseness. The minimum can be found by algorithms such as iterative soft thresholding (IST) [14,19] or iteratively re-weighted least squares (IRLS) [20]. Approaches related to the well-known CLEAN method also resemble $\ell_p$-norm minimization [21]. A concise presentation of the CS theory can be found below in Section 2.
Other techniques include projection spectroscopy based on co-sampling of several indirect time dimensions [22], covariance spectroscopy based on non-Fourier analysis of conventionally sampled data [23], extrapolation of such data using linear prediction [24] and attempts to remove aliasing caused by sampling below the Nyquist rate [25]. Although their effectiveness is also based on the compressibility of the spectrum, their relationship to compressed sensing is loose and thus they are outside the scope of this study.
In this work, we examine the relation between the sampling level and the compressibility of a spectrum in the context of various NMR experiments. We survey the data acquisition and signal processing tricks that enhance the compression level and show, for the first time, that it is related to the amount of data required to obtain a credible spectral reconstruction with CS methods. Although this relation stems from CS theory, it has never been practically verified and demonstrated to the NMR community. On the other hand, readers from outside the NMR field may be inspired by the experimental tricks that enhance compressibility and use their analogues in different contexts. Thus, we find it beneficial to share lessons from NMR spectroscopy with experts in the broadly defined field of signal processing.

Theory
Compressed sensing theory is based on the concepts of sparse representation and compressibility of a signal. These two features depend on the chosen basis of the signal vector space $V$. The basis $v_1, \ldots, v_n$ is usually referred to as a dictionary. Examples of dictionaries that are both often used by practitioners and covered by the CS theory are the Fourier basis and the wavelet basis [26,27]. As shown by Qu et al. [16], the former is more efficient in the case of NMR spectra. Given a dictionary $v_1, \ldots, v_n$, we say that a signal $s \in V$ is $k$-sparse if it can be written as $\lambda_1 v_1 + \ldots + \lambda_n v_n$ where at least $n - k$ of the coefficients $\lambda_j$ are equal to zero. The set of $k$-sparse signals is denoted by $\Sigma_k$. Please note that, by simple combinatorics, $\Sigma_k$ is a union of $\binom{n}{k}$ vector subspaces of $V$ but it is not a convex subset of $V$.
With this in mind, let us formulate the central problem of CS theory. For simplicity of formulation, we shall consider the case of $V = \mathbb{C}^n$, for which we fix one of the standard dictionaries (e.g., the Fourier dictionary is very useful in the NMR context). For a class of $k$-sparse signals $s = (s_1, s_2, \ldots, s_n) \in \Sigma_k$, the CS theory determines the minimal number of coordinates $s_j$ of the signal $s$ required for an efficient determination of the remaining ones. The efficiency requirement excludes the brute-force search strategy, which consists of checking all possible $l$-sparse signals with $l \le k$ that satisfy the measurement constraint and finding the sparsest one among them. The inefficiency of this approach becomes evident when one estimates the time needed to carry it out for signals of standard size, e.g., $n = 512$, and small sparsity, e.g., $k = 10$. The time required to solve such a problem by brute force would be of the order of hundreds of years (see, e.g., an estimation in [28], p. 54). The CS theory provides a useful alternative to the brute-force search strategy by replacing a non-convex problem (recall the non-convexity of $\Sigma_k$) with its convex version.
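The combinatorial size of the brute-force search can be checked with a short sketch. It counts only the supports of size exactly $k$; how this count translates into computing time depends entirely on the assumed number of candidate supports checked per second, which is a hypothetical parameter here and is not taken from [28].

```python
import math

def support_count(n, k):
    """Number of candidate supports of size exactly k among n coordinates."""
    return math.comb(n, k)

def brute_force_years(n, k, checks_per_second):
    """Time to visit every support once at an assumed (hypothetical) rate."""
    seconds = support_count(n, k) / checks_per_second
    return seconds / (365.25 * 24 * 3600)

print("supports to check for n = 512, k = 10:", support_count(512, 10))
```

Each support additionally requires solving a small least-squares problem, so the true cost per check is well above one elementary operation.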
Before describing the CS approach in more detail, let us note that sparsity is undoubtedly too strong a condition from the practitioner's standpoint. Indeed, a signal acquired in a given experiment is contaminated with measurement noise, and the noisy part of the signal rules out sparsity with respect to the standard dictionaries. Nevertheless, CS is still useful owing to the approximate sparsity of a signal, in this work referred to as compressibility.
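Qualitatively speaking, a signal is compressible if it can be well approximated by a sparse representation. This feature can be expressed quantitatively by means of the distance of the signal from the subset $\Sigma_k$ of $k$-sparse signals. The measure of this distance can be chosen in many different ways. One possibility is to use the $\ell_p$-norm with respect to the dictionary, $\|s\|_{p,v_i} = \left(\sum_{i=1}^{n} |\lambda_i|^p\right)^{1/p}$ for $s = \lambda_1 v_1 + \ldots + \lambda_n v_n$, and define
$$\sigma_k(s)_{p,v_i} = \min_{x \in \Sigma_k} \|s - x\|_{p,v_i}. \qquad (4)$$
If $V = \mathbb{C}^n$ and $(e_1, e_2, \ldots, e_n)$ is the canonical basis of $\mathbb{C}^n$, then the norm $\|\cdot\|_{p,e_i}$ will be denoted $\|\cdot\|_p$:
$$\|s\|_p = \Big(\sum_{j=1}^{n} |s_j|^p\Big)^{1/p}.$$
Expanding a signal $s$ in a dictionary $v_1, \ldots, v_n$ as $s = \lambda_1 v_1 + \ldots + \lambda_n v_n$, the best $k$-term approximation is obtained by keeping the $k$ largest components among the coordinates $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and setting the others to zero. Compressed sensing provides a methodology for recovering compressible signals from a reduced number of measurements with high probability. High probability refers to the fact that the recovery of the signal provided by the CS method is (approximately) exact with very high probability, i.e., a wrong recovery is possible but very improbable.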
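The best $k$-term approximation and the resulting distance from $\Sigma_k$ can be sketched in a few lines. The coefficient vector below is an arbitrary illustration, and the function names are hypothetical.

```python
def best_k_term(coeffs, k):
    """Keep the k largest-magnitude coefficients and set the rest to zero."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0 for i, c in enumerate(coeffs)]

def sigma_k(coeffs, k, p=1.0):
    """l_p distance between a coefficient vector and its best k-term approximation."""
    approx = best_k_term(coeffs, k)
    return sum(abs(c - a) ** p for c, a in zip(coeffs, approx)) ** (1.0 / p)

coeffs = [5, -0.1, 3, 0.2, 0]      # two large entries and a small "tail"
print(best_k_term(coeffs, 2))       # the two dominant coefficients survive
print(sigma_k(coeffs, 2))           # what is left measures the compressibility
```

A small value of `sigma_k` relative to the norm of the full vector indicates that the signal is well approximated by a $k$-sparse one.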
To describe the convexification of the CS problem, let us consider the function $\|\cdot\|_{0,v_i}$ returning the number of non-zero components in the $(v_i)$-expansion. This is often referred to as the $\ell_0$-norm (although it is not a norm in the mathematical sense). Fix $m \le n$ together with a subset $J \subset \{1, \ldots, n\}$ of cardinality $m$, and for every $j \in J$ fix $s_j \in \mathbb{C}$. Using the $\ell_0$-norm, one formulates the main CS problem as follows:
$$\text{minimize } \|x\|_{0,v_i} \quad \text{subject to } x_j = s_j \text{ for all } j \in J. \qquad (5)$$
The CS theory is based on the fundamental insight that the $\ell_0$-problem (5) can be replaced by its $\ell_1$-version, at least for a certain class of bases [29]:
$$\text{minimize } \|x\|_{1,v_i} \quad \text{subject to } x_j = s_j \text{ for all } j \in J. \qquad (6)$$
To be more precise, the solution of problem (5) coincides with the solution of (6) provided that a sufficiently sparse ($k$-sparse) solution $s$ exists and the measurement matrix satisfies the so-called uniform uncertainty principle (UUP), also known as the restricted isometry property (RIP). The measurement matrix assigned to a basis $(v_i)_{i \in \{1,\ldots,n\}}$ of $\mathbb{C}^n$ and a fixed measurement schedule $J \subset \{1, \ldots, n\}$, $|J| = m$, is the matrix $A = (a_{ji})_{j \in J,\, i \in \{1,\ldots,n\}}$ with $m$ rows and $n$ columns such that for all $j \in J$
$$(\lambda_1 v_1 + \ldots + \lambda_n v_n)_j = \sum_{i=1}^{n} a_{ji} \lambda_i.$$
We say that $A$ satisfies the uniform uncertainty principle for $k$-sparse vectors if there is a constant $\delta > 0$ (sufficiently small) such that
$$(1-\delta)\,\|\lambda\|_2^2 \le \|A\lambda\|_2^2 \le (1+\delta)\,\|\lambda\|_2^2 \qquad (7)$$
for any $\lambda = (\lambda_i) \in \mathbb{C}^n$ which has at least $n - k$ zero coordinates, or in other words when $\lambda_1 v_1 + \ldots + \lambda_n v_n \in \Sigma_k$. If this condition is satisfied with a sufficiently small constant $\delta$ and if the $k$-sparse solution exists, then the solutions of (5) and (6) are equal. The restricted isometry property of $A$ is therefore desired by practitioners. For specific dictionaries, this condition holds with high probability in the sense that, given the sparseness level $k$ and a random measurement schedule $J$ of size $m$, the RIP holds with high probability for sufficiently large $m$.
An example of a precise criterion in the case of the Fourier basis was given in [26], where the authors proved that if the number of measurements $m$ is sufficiently large with respect to $k$ and $\log n$ (inequality (8)), then a random schedule $J$ leads to a measurement matrix $A$ that satisfies the uniform uncertainty principle with probability $1 - O(n^{-\rho})$. In particular, we can control the level of RIP-probability of the partial Fourier transform by choosing a sufficiently large $m$, and the good news is that the required number of measurements $m$ is linear in $k$ up to a $\log(n)$ factor.
The above discussion considers the idealized case of the recovery of sparse vectors by the CS method. However, practice, in particular in NMR, immediately leads to a non-sparse CS context. The first reason is the measurement error (noise), which forces the strict equality $x_j = s_j$ in (6) to be replaced by an equality up to a certain error, $x_j \approx s_j$. Usually $\approx$ is expressed in terms of the $\ell_2$-norm:
$$\Big(\sum_{j \in J} |x_j - s_j|^2\Big)^{1/2} \le \eta. \qquad (9)$$
Moreover, in certain areas of applied CS, for instance in NMR, the strict sparseness assumption must be replaced by the compressibility of signals even in the noiseless case. Indeed, the standard Lorentzian peaks present in an NMR spectrum have infinite support in the frequency domain. These two facts justify the replacement of the strictly sparse CS problem (6) by its relaxed, approximately sparse, noisy version
$$\text{minimize } \|x\|_{1,v_i} \quad \text{subject to } \Big(\sum_{j \in J} |x_j - s_j|^2\Big)^{1/2} \le \eta, \qquad (10)$$
where $\eta$ reflects the level of the measurement errors. As proved within CS theory [30], the reconstruction algorithms are stable, i.e., for a small error level and for a compressible vector $s$ the solution of (10) is close to $s$. More precisely, this happens if the measurement matrix has the restricted isometry property; the error of the solution of the $\ell_1$-CS problem is then measured by $\sigma_k(x)_{1,v_i}$, see (4). In other words, the $k$ largest $\lambda$'s in the expansion $x = \lambda_1 v_1 + \ldots + \lambda_n v_n$ will be recovered with high accuracy, see [29].
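The iterative soft thresholding (IST) mentioned in the Introduction can be sketched for a strictly sparse, noiseless toy signal. This is a minimal illustration, not the implementation of [14,19]: it minimizes the unconstrained penalty form (data misfit plus $\tau$ times the $\ell_1$-norm) instead of the constrained problem, uses a fixed threshold, and assumes hypothetical values for the signal length, the schedule, $\tau$ and the number of iterations.

```python
import cmath
import math
import random

def udft(x, sign):
    """Unitary DFT for sign=-1 and its inverse for sign=+1 (plain O(n^2))."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[a] * cmath.exp(sign * 2j * math.pi * a * b / n)
                        for a in range(n)) for b in range(n)]

def soft(z, tau):
    """Complex soft threshold: shrink the magnitude by tau, keep the phase."""
    mag = abs(z)
    return z * (1.0 - tau / mag) if mag > tau else 0j

def ist(samples, schedule, n, tau, iterations):
    """Estimate an n-point spectrum from FID values known only on `schedule`."""
    x = [0j] * n
    for _ in range(iterations):
        model = udft(x, +1)                      # current spectrum as an FID
        residual = [0j] * n
        for t, y in zip(schedule, samples):      # misfit on measured points only
            residual[t] = y - model[t]
        update = udft(residual, -1)              # gradient step with unit step size
        x = [soft(xi + ui, tau) for xi, ui in zip(x, update)]
    return x

n = 64
true_spectrum = [0j] * n
true_spectrum[10], true_spectrum[25] = 4.0, 3.0  # two on-grid "peaks"
full_fid = udft(true_spectrum, +1)
schedule = sorted(random.Random(0).sample(range(n), 40))
samples = [full_fid[t] for t in schedule]

estimate = ist(samples, schedule, n, tau=0.1, iterations=60)
top_two = sorted(range(n), key=lambda f: abs(estimate[f]), reverse=True)[:2]
print("two largest reconstructed points:", sorted(top_two))
```

The soft threshold slightly reduces the peak amplitudes; practical implementations usually lower the threshold from iteration to iteration to limit this bias.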
CS in the NMR context is often applied to the recovery of spectra consisting of peaks with significantly different amplitudes (e.g., an NMR spectrum may contain a dominating peak whose amplitude is even four orders of magnitude larger than the amplitudes of other peaks, see NOESY spectra). In such a case, many points contributing to the bottom part of the large (Lorentzian!) peak can be "more significant" (have higher values) than smaller peaks and thus be reconstructed first. From the spectroscopist's point of view, the hierarchy of importance is the opposite: the small, "off-diagonal" peaks in NOESY carry the most important information. Such non-linearity of the reconstruction is also the reason why the signal-to-noise ratio (SNR) in NMR spectra reconstructed with CS is not informative [31,32]. Depending on the reconstruction parameters, the apparent noise level can vary from zero (high sparsity enforced) to far higher than the actual one (too low sparsity enforced, causing incomplete removal of "NUS artifacts"). This effect can be seen in all Figures below.
To summarize this part, the concept of approximate sparsity and its relationship with the amount of data to be measured is crucial in the context of an NMR experiment. Thermal noise, the dynamic range of peak intensities and their linewidths, and the fact that the signal is complex (but only its real part is of interest) create a specific framework for the application of CS to reconstruct missing points in the FID signal.
Keeping this in mind, we move to the practical considerations: a set of Lessons about the effective use of CS in experimental NMR.

Lesson 1: Reduce the Number of Peaks
As can be seen from inequality (8), the number of sampling points required for an efficient CS reconstruction (m) is dependent on the number of important spectral points (k). In the language of NMR spectroscopy, k is, roughly speaking, the number of points contributing to peaks. Thus, experimental techniques reducing the number of peaks to the necessary minimum not only improve spectral resolution but also allow reconstruction of the spectrum with lower m and shorten the experimental time.
Many such techniques have been proposed. "Pure-shift NMR" (PS-NMR) certainly belongs to the most spectacular improvements of the last decade [33,34]. The idea of PS-NMR is to remove the multiplet structure of NMR spectra by suppressing the effect of J-couplings, i.e., interactions between nuclear spins transferred within the molecule via the nearby chemical bonds. The splittings of peaks caused by J-couplings can be informative but reduce the resolution and require more sampling points for a proper CS reconstruction (since more peaks are present in a spectrum). With the splittings removed, fewer data points are required for the reconstruction (see (8)). The simulation in Figure 1 shows this effect. The fact that PS-NMR naturally fits compressed sensing reconstruction has been discussed extensively by Aguilar and coworkers [35]. Importantly, while the couplings between nuclei of the same kind (i.e., homonuclear) can be removed using a selective echo pulse-sequence block in both the direct and indirect dimensions of an NMR spectrum, pseudo-random NUS is feasible only in the latter case. In the direct dimension, a pure-shift experiment can be performed by sampling "chunks" of an FID signal. However, as shown in several papers [36][37][38][39], such data can also be used as an input for CS algorithms, although the RIP (cf. Equation (7)) is worse and thus more sampling points are required. Interestingly, such "burst sampling" has been reported by some authors to also have certain advantages over other sampling schemes in the indirect dimensions [40].
Besides the pure-shift approach, the number of spectral peaks can also be reduced by more selective coherence transfer in correlation experiments. The selectivity is achieved by adjusting the delay time $\Delta$ in the coherence transfer block or an additional coherence-selection delay. The coherence of spins coupled with a constant $J$ evolves in an oscillatory manner, typically as $\sin^n(\pi J \Delta)$, where $n$ is the number of nuclei J-coupled to the nucleus from which the transfer starts. The classic example is multiplicity selection [41,42], which exploits the fact that the coherence transfer between interacting nuclei A and X depends on the multiplicity ($n$) of a spin system AX$_n$. For instance, CH groups can be selectively excited in a 2D $^{1}$H-$^{13}$C Heteronuclear Single-Quantum Correlation (HSQC) spectrum, making CH$_2$ and CH$_3$ peaks invisible. Similarly, one can make the transfer selective by exploiting the differences in $J$ between various pairs of nuclei. In 3D HNCA, the basic experiment used to establish sequential connectivities in spectra of proteins, the excitation is initially transferred from amide hydrogens to amide nitrogens. Then, from each $^{15}$N nucleus the transfer may go two ways: to the $\alpha$ carbon of the same ($i$) and of the preceding ($i-1$) amino acid residue. This is caused by the fact that the N-C$^{\alpha}$ coupling constants for both pathways are similar (typically 11 Hz for N$_i$-C$^{\alpha}_{i}$ and 7 Hz for N$_i$-C$^{\alpha}_{i-1}$) and $\Delta$ can be set to a value in between. However, variants of the experiment with an exclusive transfer to one C$^{\alpha}$ also exist and have been used in combination with non-uniform sampling [43], also because of the better compressibility of the spectrum and thus the lower sampling level required.
Figure 1. [...] In this case, the sampling level turned out to be too low, resulting in a wrong reconstruction. A fully sampled (512 points) pure-shift experiment shows multiplets collapsed into singlets (spectrum (C)). The corresponding reconstructed spectrum (D), obtained using the same sub-sampling scheme and reconstruction parameters as for (B), proves to be of good quality. The reduced number of signal components (enhanced compressibility) allowed for reliable signal reconstruction using the same number of sampling points.
Another approach to reducing the number of peaks in protein spectra was proposed by Dötsch and colleagues [44]. They modified a CBCA(CO)NH pulse sequence [45] to acquire a signal for amino acid types selected on the basis of their topology. Only the desired amino acid residues give signals in such spectra, which facilitates the analysis and allows efficient low-level non-uniform sampling [46].
Other compressibility-enhancing pulse sequence blocks trigger an exponential signal decay due to diffusion or relaxation and suppress signals selectively on the basis of differences in the decay rate. The diffusion filter is based on a gradient echo block added to the standard NMR pulse sequence [47]. Used for mixtures of chemical compounds, it suppresses the signal from quickly diffusing smaller molecules (although it also exists in a reverse mode [48]). A somewhat opposite effect is achieved by a $T_2$-filter (Carr-Purcell-Meiboom-Gill sequence block [49]), which suppresses signals from nuclei with short transverse relaxation times (typically belonging to larger molecules). Diffusion filtering and multiplicity selection, as well as their effect on the required number of sampling points, are shown in practice in Section 3.6.

Lesson 2: Minimize Dynamic Range
A high dynamic range of the amplitudes of signal components does not constitute a significant problem for CS reconstruction in the case of signals with a strictly sparse representation. Real FIDs, however, contain noise and are represented by Lorentzian peaks in the Fourier domain, whose half-width is not negligible but proportional to the signal decay rate. The consequence is an imperfect performance of CS algorithms, which are usually based on an iterative deconvolution of a point spread function (PSF) from the spectrum (for the meaning of the PSF in the NMR context see [50][51][52][53]). For example, one of the most classical CS algorithms, orthogonal matching pursuit (OMP), does this by seeking the FT basis function (an "atom") giving the highest inner product with the FID. Then the approximation that uses only that atom is subtracted from the signal and the process is repeated. Other algorithms, such as iterative (soft or hard) thresholding, are based on a very similar concept [21]. The noise obviously disturbs the approximation and makes artifact removal less complete. Additionally, as mentioned in Section 2 above, the algorithm will tend to improve the bottom points at the sides of large Lorentzian peaks rather than reconstruct weaker components. Thus, whenever possible, it is crucial to avoid a high dynamic range of components in spectra reconstructed with CS.
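The greedy "pick an atom, subtract, repeat" loop described above can be sketched as plain matching pursuit acting on non-uniformly sampled points. This is a simplification: the orthogonal variant (OMP) additionally re-fits all selected amplitudes by least squares after each pick, which is omitted here. The toy signal is noiseless, non-decaying and on-grid, and all names and numbers are hypothetical.

```python
import cmath
import math
import random

def matching_pursuit(samples, schedule, n, n_atoms):
    """Greedy selection of Fourier atoms from non-uniformly sampled FID points.

    Each step picks the atom exp(2*pi*i*f*t/n) with the largest inner product
    with the residual and subtracts that single-atom approximation.
    Returns a list of (frequency_bin, amplitude) pairs in order of selection.
    """
    residual = list(samples)
    m = len(schedule)
    picked = []
    for _ in range(n_atoms):
        inner = [sum(r * cmath.exp(-2j * math.pi * f * t / n)
                     for r, t in zip(residual, schedule)) for f in range(n)]
        best = max(range(n), key=lambda f: abs(inner[f]))
        amplitude = inner[best] / m              # best fit using this atom alone
        residual = [r - amplitude * cmath.exp(2j * math.pi * best * t / n)
                    for r, t in zip(residual, schedule)]
        picked.append((best, amplitude))
    return picked

n = 64
schedule = sorted(random.Random(1).sample(range(n), 40))
fid = [1.0 * cmath.exp(2j * math.pi * 10 * t / n)
       + 0.5 * cmath.exp(2j * math.pi * 25 * t / n) for t in schedule]
picked = matching_pursuit(fid, schedule, n, n_atoms=2)
print([(f, round(abs(a), 2)) for f, a in picked])
```

The amplitude estimated for the stronger atom is perturbed by the leakage of the weaker one through the point spread function of the schedule. With decaying (Lorentzian) components and a much larger amplitude ratio, the residual of the strong peak can exceed the weak peak, which is the dynamic-range problem discussed in this Lesson.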
Very large and very tiny peaks are found together in spectra of mixtures of chemical compounds. It might happen, however, that some of the compounds are not interesting for the spectroscopist and can be suppressed in a spectrum. This is easy to achieve if the interesting and non-interesting compounds differ significantly in molecular size. As mentioned above, the diffusion and relaxation filters can be useful for this purpose. Unfortunately, small and large peaks may be found even in pure, single-compound samples. This is the case for spectra based on the nuclear Overhauser effect (NOESY and ROESY). An additional difficulty arises from the fact that peak intensities (especially those of small peaks) are the most informative parameters and thus must be reconstructed with high fidelity. This is troublesome, as series of tiny off-diagonal peaks are accompanied by huge diagonal peaks (even $10^4$ times larger). To deal with this difficulty, diagonal-free NOESY experiments have been proposed [54] and shown to be particularly effective when combined with non-uniform sampling [55][56][57][58].
Figure 2 shows this effect on a 1D cross-section along the indirect dimension of a simulated NOESY spectrum. Notably, reducing the dynamic range of a measured object is also beneficial in CS applications other than NMR. In a nice example of the identification of bacterial species in a mixture by a single Sanger-sequencing reaction [59], Amir and Zuk suggested taking the square root of the data to reduce the differences between "peaks".

Lesson 3: Pre-Processing
The reconstruction of missing points in NUS data from NMR experiments is an intermediate step in the data processing workflow. The workflow usually starts with the conventional procedures performed in the direct dimension: filtering, apodization, zero-filling, phasing, Fourier transform and baseline correction [60]. Importantly, before the CS reconstruction of the indirect-dimension points is performed, one can apply procedures that make the frequency representation more compressible. These include: removal of the imaginary part of a spectrum (virtual echo, VE) [61], removal of an assumed modulation in the FID (virtual decoupling, VD) [62][63][64] and combination of in-phase and anti-phase (IPAP) complementary sub-signals [65,66].
The first trick, virtual echo, is based on the fact that the phase of the signal in the indirect dimensions ($\varphi$ in Equation (3)) is usually known a priori, and thus the signal can be phased before the reconstruction. Consequently, the imaginary part of the spectrum is not needed for the CS procedure and can be removed. This is beneficial, as the imaginary part of a Lorentzian line has a dispersive shape and is less sparse than the absorptive real part. The effect is achieved by combining the FID signal with its conjugate, mirror-reflected "copy" in each dimension [61]. Alternatively, a similar effect can be achieved by modifying the minimized term in Equation (10) to calculate only the real part of $x$ [67]. Figure 3 demonstrates the concept of VE.
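The construction of a virtual echo can be sketched for a one-dimensional signal as follows. The sketch assumes that the FID is already phased, so that its first point is real, and it stores negative times in circular (DFT) order; the function names and test values are hypothetical. The second helper mirrors a sampling schedule in the same way as the signal.

```python
import cmath
import math

def virtual_echo(fid):
    """Extend a phased FID s(0..n-1) to negative times with s(-t) = conj(s(t)).

    The result has 2n-1 points in circular (DFT) order:
    indices 0..n-1 hold t >= 0, indices n..2n-2 hold t = -(n-1)..-1.
    """
    n = len(fid)
    return list(fid) + [fid[t].conjugate() for t in range(n - 1, 0, -1)]

def virtual_echo_schedule(schedule, n):
    """Mirror a sampling schedule consistently with virtual_echo."""
    total = 2 * n - 1
    return sorted(set(schedule) | {total - t for t in schedule if t > 0})

def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

# One decaying component with zero phase (hypothetical frequency and decay).
fid = [cmath.exp((2j * math.pi * 0.1 - 0.05) * t) for t in range(32)]
plain = dft(fid)
echo = dft(virtual_echo(fid))
print("largest imaginary value without VE:", max(abs(v.imag) for v in plain))
print("largest imaginary value with VE:   ", max(abs(v.imag) for v in echo))
```

The extended signal is Hermitian-symmetric, so its spectrum is purely real up to rounding errors, and the dispersive component that would otherwise have to be reconstructed is absent.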
Virtual decoupling is the removal of the manifestation of scalar couplings, i.e., the cosine modulation of the FID, by dividing the signal by the assumed $\cos(\pi J t)$ function [63,64] or by an equivalent modification of the algorithm [62]. The operation makes the spectrum sparser but is based on two strict assumptions. First, all FID components must share the same modulation (i.e., the same $J$). Second, division by zero must be avoided, either by regularization or by omitting the zeros of the modulation from the sampling schedule [68]. In contrast to virtual echo, virtual decoupling is beneficial even in the case of fully sampled data, where no reconstruction is required. It leads to resolution and sensitivity enhancement, as broad multiplets collapse into narrower and higher singlets. However, the requirement of a constant $J$ among all spin systems is rarely fulfilled. Typical examples are limited to adjacent carbons in 3D HNCA [64] (C$^{\alpha}$-C$^{\beta}$ coupling) or HC-CH TOCSY spectra [63] (coupling between carbon atoms belonging to a methyl and a neighboring group).
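The division by the assumed modulation can be sketched as follows. The sketch takes the simpler of the two options mentioned above and drops the points where the modulation is close to zero; the tolerance `eps`, the coupling constant and the sampling interval are hypothetical illustration values.

```python
import cmath
import math

def virtual_decouple(samples, times, j_hz, eps=0.1):
    """Divide out an assumed cos(pi*J*t) modulation, skipping near-zero points.

    Returns (kept_times, decoupled_samples). Points where |cos(pi*J*t)| < eps
    are dropped rather than divided, in the spirit of omitting the zeros of
    the modulation from the sampling schedule.
    """
    kept_times, decoupled = [], []
    for s, t in zip(samples, times):
        modulation = math.cos(math.pi * j_hz * t)
        if abs(modulation) >= eps:
            kept_times.append(t)
            decoupled.append(s / modulation)
    return kept_times, decoupled

# A decaying singlet and the doublet obtained by cosine modulation with J = 35 Hz.
times = [i * 0.001 for i in range(64)]
singlet = [cmath.exp((2j * math.pi * 100.0 - 5.0) * t) for t in times]
doublet = [s * math.cos(math.pi * 35.0 * t) for s, t in zip(singlet, times)]

kept_times, decoupled = virtual_decouple(doublet, times, 35.0)
print(len(times) - len(kept_times), "points dropped near the zeros of the modulation")
```

The retained points coincide with the unmodulated singlet, so a subsequent reconstruction has to recover one peak instead of two. The dropped points simply become part of the set of missing data.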
NMR spectra are sometimes combined from subsets, as is done in the case of the IPAP method [65,66]. Two experiments are recorded, the first providing in-phase doublets and the second anti-phase doublets. Then, they are added, which cancels the doublet components of opposite sign. In standard FT processing, it does not matter whether the addition is performed on FIDs or on spectra. For CS, however, the former is more beneficial, since it makes the spectrum more compressible (reduces the number of peaks) before the reconstruction.
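The effect of adding in-phase and anti-phase data in the time domain can be sketched with idealized, non-decaying signals. The sketch assumes that the anti-phase data set has already been phase-corrected so that its two lines are absorptive with opposite signs; the bin positions and function names are hypothetical.

```python
import cmath
import math

def ipap_combine(in_phase, anti_phase):
    """Return the (sum, difference) FIDs of in-phase and anti-phase data sets."""
    total = [a + b for a, b in zip(in_phase, anti_phase)]
    diff = [a - b for a, b in zip(in_phase, anti_phase)]
    return total, diff

def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def significant_bins(fid, threshold=1e-6):
    """Indices of spectral points whose magnitude exceeds the threshold."""
    return [f for f, v in enumerate(dft(fid)) if abs(v) > threshold]

n, centre, half_split = 64, 20, 3
carrier = [cmath.exp(2j * math.pi * centre * t / n) for t in range(n)]
in_phase = [c * math.cos(2 * math.pi * half_split * t / n)
            for t, c in enumerate(carrier)]
anti_phase = [1j * c * math.sin(2 * math.pi * half_split * t / n)
              for t, c in enumerate(carrier)]

total, diff = ipap_combine(in_phase, anti_phase)
print("in-phase doublet:", significant_bins(in_phase))
print("sum:", significant_bins(total), "difference:", significant_bins(diff))
```

Each combined FID contains a single component, so the object passed to a CS algorithm has half as many peaks as either of the original data sets.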
The reduction of the number of components by sample preparation, sophisticated signal excitation or pre-processing is not limited to NMR and can enhance CS reconstruction in other fields.
Inspiring examples can be found in CS video processing, where common features of neighboring frames can be found by motion estimation, helping to enhance sparsity [69]. The sub-regions of interest in a reconstructed object can also be explicitly defined to reduce the number of significant points, as shown in the field of angiography [70].
Figure 3. [...] Both signals were sub-sampled using the same sampling scheme of 48 random points. Importantly, the sampling scheme also undergoes the operation of VE in the same way as the signal. The missing points were reconstructed using 40 iterations of the CS-IRLS algorithm. The resulting spectra indicate that VE pre-processing leads to a better-quality spectrum (F). At this level of sampling, the spectrum reconstructed without VE pre-processing (C) suffers from characteristic phase distortions (see black arrows in the corresponding panel). The dotted line in panels (C) and (F) shows the real part of the fully sampled spectrum.

Lesson 4: Match Sampling with the Decay
A vital aspect of NUS in the indirect dimensions of an NMR experiment is that the sampling schedule can be completely arbitrary. For example, its density can be modulated according to an assumed function. Besides the J-modulation mentioned above, a useful trick is to avoid large gaps in the sampling schedule [50,71] by approaches known from other fields, such as jittered sampling [72] or Poisson-disk sampling [73]. Some authors show that gaps should be avoided at the beginning and at the end of a signal [74].
The oldest and most commonly applied modulation of NUS density is relaxation-matched sampling, introduced by Barna et al. [75]. The gains in signal-to-noise ratio (SNR) have been analyzed in detail by other authors [31,32,76]. The reason for the SNR improvement is quite simple: an FID signal decays exponentially in time, while the noise level remains constant. Thus, the initial sampling points have a higher SNR. The measurement sensitivity aspect of the problem is therefore simple. However, the situation becomes more complicated when analyzed from the point of view of CS theory. It can be shown that although relaxation matching improves SNR, it worsens the restricted isometry property of the measurement matrix [77]. Thus, there is a certain balance between the two effects. This fact is also connected to the observation that CS works more efficiently for the interpolation of data than for extrapolation [78].
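One simple way to draw a relaxation-matched schedule is weighted sampling without replacement, with weights following the expected signal envelope. The sketch below is only an illustration of the idea and not the specific scheme of [75]; the decay rate, the schedule size and the choice to always keep the first point are assumptions.

```python
import math
import random

def relaxation_matched_schedule(n, m, decay_rate, seed=0):
    """Draw m distinct sampling points out of n (m <= n) with probability
    proportional to exp(-decay_rate * t), always keeping t = 0."""
    rng = random.Random(seed)
    weights = {t: math.exp(-decay_rate * t) for t in range(1, n)}
    chosen = [0]
    while len(chosen) < m:
        points = list(weights)
        pick = rng.choices(points, weights=[weights[t] for t in points])[0]
        chosen.append(pick)
        del weights[pick]                        # sample without replacement
    return sorted(chosen)

schedule = relaxation_matched_schedule(n=256, m=64, decay_rate=1.0 / 64)
early = sum(1 for t in schedule if t < 128)
print(early, "points in the first half,", len(schedule) - early, "in the second")
```

A faster assumed decay concentrates the points at the beginning of the signal, which raises the sensitivity but leaves the tail to be extrapolated. This is the balance between SNR and the properties of the measurement matrix discussed above.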
Interestingly, the sensitivity benefit from relaxation-matched CS can be so strong that it can even lead to results better than those of a fully sampled experiment acquired in the same time. The simulation in Figure 4 shows this effect.
In many applications outside the NMR field, the sampling density can be adjusted "on-the-fly", i.e., during the measurement. Such adaptive sampling is well established in image processing, where the sampling density is a function of local image variance [79,80]. In NMR spectroscopy such examples, although feasible, are still lacking.

Figure 4. A simulation illustrating the benefit of relaxation-matched non-uniform sampling on signal reconstruction. A signal of 1024 points containing 2 components of equal amplitudes (A) was artificially contaminated with white noise such that the peaks in the corresponding spectrum (B) were hardly visible. The blue spectrum superimposed in (B,D,F) is obtained from the noiseless signal (A) to mark the correct positions of the hidden peaks (for better visualization, the peak intensities in the blue spectra are normalized to half the intensity of the maximum peak in the corresponding black spectrum). The same 2-component signal was sub-sampled to 256 random points (C) and to 256 points selected according to the relaxation-matched probability (E). The continuous black line in (C,E) stands for the full signal, whereas the red markers correspond to the sub-sampled points. Both sub-sampled sets of points were used for reconstruction with 40 iterations of the IRLS algorithm. Importantly, the noise added to the sub-sampled signals (C,E) was 2 times lower than for signal (A). This is because 256 points can be acquired with 4 times more scans within the same total experimental time, so the SNR of the acquired samples is 2 times higher. The reconstructed spectrum (D), obtained from the random non-uniform sampling strategy (C), shows no improvement, while the spectrum (F), obtained from the relaxation-matched non-uniform sampling strategy (E), shows significantly better visibility of the peaks.
As described above in the text, the relaxation-matched sampling strategy (E) leads to better results in such cases because more samples are collected in the initial part of the signal, where the SNR is higher.
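
The relaxation-matched selection of points and the scans/SNR arithmetic from the caption of Figure 4 can be sketched in a few lines of Python. This is a minimal illustration and not the schedule generator used for the simulation: the function names, the decay constant `t2_points` and the seed are assumptions made here, and the weighted draw without replacement (Efraimidis-Spirakis keys) is just one possible way to implement it. Only the sizes (256 of 1024 points) are taken from the caption.

```python
import math
import random


def relaxation_matched_schedule(n_total, n_keep, t2_points, seed=0):
    """Pick n_keep of n_total grid points with weights exp(-t / T2).

    Weighted sampling without replacement: every index gets the key
    u ** (1 / weight), and the n_keep largest keys are selected.
    """
    rng = random.Random(seed)
    keys = []
    for i in range(n_total):
        weight = math.exp(-i / t2_points)  # density follows the signal envelope
        keys.append((rng.random() ** (1.0 / weight), i))
    chosen = sorted(keys, reverse=True)[:n_keep]
    return sorted(i for _, i in chosen)


def snr_gain_per_point(n_total, n_keep):
    """SNR gain per sampled point when the saved time is spent on extra scans."""
    scans_ratio = n_total / n_keep  # 1024 / 256 = 4 times more scans
    return math.sqrt(scans_ratio)   # noise averages down as sqrt(scans)


# t2_points = 300 is a hypothetical decay constant (in grid points).
schedule = relaxation_matched_schedule(n_total=1024, n_keep=256, t2_points=300.0)
early = sum(1 for i in schedule if i < 512)  # points in the first half of the FID
print(len(schedule), early, snr_gain_per_point(1024, 256))
```

With an exponentially decaying weight, most of the selected points fall in the early, high-SNR part of the signal, which is the property that the relaxation-matched strategy relies on.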

Lesson 5: Non-Stationarity
The parameters of typical NMR signals (frequencies, amplitudes, relaxation rates) do not vary in time when stable samples are measured. Sometimes, however, under non-stationary conditions, the frequency in some spectral dimensions varies in time [81]. Since sampling of the indirect dimensions can be arbitrary (e.g., pseudo-random), the effective frequency-time dependencies can be complicated. As demonstrated [8,81], an FID frequency varying in time leads to lineshape distortions in the case of "chronological sampling", i.e., (t = 0, ∆t, 2∆t, 3∆t . . .), and to noise-like artifacts in the case of "shuffled" sampling. Figure 5 shows that NUS of a non-stationary signal can lead to spectral quality even better than that of fully sampled data. This can be explained by the fact that frequency variations within an FID, occurring, e.g., due to a chemical reaction in the sample, are reduced because less time is needed for data collection. This means that compressed sensing should be the method of choice for samples whose state varies during the experiment.

[81,82]. Importantly, the best spectrum is obtained with 12.5% sampling (E, far better than with full sampling A). All the NUS data sets, except for the 100% NUS one, were reconstructed with 40 iterations of the CS-IRLS algorithm, and their corresponding spectra are plotted in black.
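
The time argument made above can be illustrated with a back-of-the-envelope sketch. It assumes, purely for illustration, that the resonance frequency drifts linearly with wall-clock time. The drift rate, the time per increment, the full grid of 256 increments and the function name are all hypothetical; only the 12.5% sampling level is taken from the text.

```python
def frequency_spread(n_increments, time_per_increment_s, drift_hz_per_s):
    """Range of frequencies met during acquisition under a linear drift model."""
    total_time_s = n_increments * time_per_increment_s
    return drift_hz_per_s * total_time_s


# Hypothetical numbers: 60 s per increment, drift of 0.001 Hz per second.
full = frequency_spread(256, 60.0, 0.001)  # fully sampled indirect dimension
nus = frequency_spread(32, 60.0, 0.001)    # 12.5% of 256 increments
print(full, nus, full / nus)
```

Under this simple model the frequency range covered during the experiment scales with the number of acquired increments, so measuring one eighth of the points shrinks the drift seen within one data set by the same factor.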
In the scientific literature, one can find several interesting examples of the application of NUS/CS in NMR spectroscopy to the monitoring of physical and chemical processes. The monitoring of processes involving biomolecules (e.g., proteins) is particularly interesting. In a paper by Bermel et al. [83], a NUS NMR experiment accompanied by CS reconstruction was successfully applied to monitor protein dynamics as a function of temperature. The application of NUS and CS in their work allowed precise tracking of peak positions and intensities during sample heating. Reactions occurring in complex mixtures have also been extensively investigated by NUS/CS NMR spectroscopy [3,84]. Usually, such mixtures require at least a 2D NMR experiment to resolve the important peaks in a spectrum, which makes monitoring with conventional sampling troublesome. One may also apply NUS/CS NMR spectroscopy to track other chemical reactions when both good temporal resolution and the benefits of 2D NMR experiments are required [85,86].
Obviously, the fact that NUS/CS experiments are quick compared to full sampling also provides benefits in disciplines other than NMR. A good example has been discussed by Vasanawala et al. [87], where the authors applied CS in pediatric MR imaging. Since children are rarely able to stay still during the measurement, the undersampled data are often of better quality (resolution) despite the need for reconstruction.

Practical Example
In this section, we finally move to an experimental example of a 2D HSQC spectrum acquired with NUS and reconstructed using CS (see Figure 6). The 2D HSQC is one of the main workhorses of structural identification, acquired in huge numbers in most NMR labs. Thus, optimization of its speed and quality is important.
As described in Section 3.1, NMR offers a variety of pulse sequence "blocks" that can reduce the number of observed peaks in a spectrum to the necessary minimum and hence increase its compressibility. In this section, we verify the relation between the number of peaks in a spectrum (compressibility) and the reconstruction quality. For that purpose, we acquired 3 variants of the 13C HSQC experiment characterized by a different number of peaks in the spectrum. The acquired 2D NMR signals for each HSQC variant were artificially sub-sampled by selecting random points from the full data in the t1 (13C) dimension and reconstructed back to the original size. The reconstructed spectra from the corresponding sub-sampled HSQC experiments are depicted in Figure 6: standard unedited 13C HSQC (Figure 6b), 13C HSQC with CH-only editing (Figure 6c) and 13C HSQC with CPMG filter (Figure 6d). A fully sampled, unedited 13C HSQC spectrum is also depicted (Figure 6a) and serves as a quality reference for the reconstructed spectra (Figure 6b-d). The 13C HSQC NMR experiments used in this study employ the same core HSQC pulse sequence [88], which allows single-quantum 1H-13C correlation signals to be observed. Adding appropriate filters (multiplicity editing, Figure 6c, and CPMG, Figure 6d) to the core HSQC sequence (Figure 6a,b) allowed us to reduce the number of components in the signal. The filters were chosen with regard to the physicochemical properties of the substances being measured. The sample used for the experiments was a mixture of sucrose and heparin dissolved in D2O. Both compounds are saccharides, but their molecular weights (MW) differ significantly: heparin is a polysaccharide with MW in the range from 6000 up to 20,000 g/mol, while sucrose is a disaccharide with MW = 342.3 g/mol. We used this fact to suppress the signals of fast-relaxing nuclei belonging to the large heparin molecules by means of a CPMG relaxation filter (Figure 6d).
We also used the fact that the structures of sucrose and heparin consist mainly of CH and CH2 chemical sites, which are the source of the 1H-13C single-quantum correlation signals. We employed the multiplicity-editing block to suppress the signals that arise from CH2 chemical sites, so that only the signals corresponding to CH sites were visible.

The benefits of using editing and filtering "blocks" in 2D NUS NMR experiments can be seen by comparing the stacked spectra in Figure 6. The numerous 1H-13C correlation signals in the unedited 13C HSQC spectrum (Figure 6a) were poorly reconstructed using 24 out of 256 t1 sub-samples (Figure 6b). The effect is visible on the heparin signals near 3.55/75.0 ppm and 3.65/80.0 ppm (marked with the black arrows in Figure 6). A reduction of the number of peaks in the spectrum allowed for a more reliable reconstruction at the same 24-point sampling level for the 13C HSQC with CH-only editing (Figure 6c) and the 13C HSQC with CPMG filter (Figure 6d).

Figure 6. A reference unedited 13C HSQC spectrum with conventional sampling of a matrix of 256 t1 (13C dimension) × 3348 t2 (1H dimension) points (a) and the reconstructed spectra obtained using only 24 t1 sub-samples from the corresponding experiments: unedited 13C HSQC (b), 13C HSQC with CH-only editing (c), 13C HSQC with CPMG filter (d). The missing data for (b-d) were reconstructed with the CS-based IRLS algorithm using 40 iterations. The virtual-echo method was applied in all the reconstructions. The processing was performed using the mddnmr software [89]. The concentration of each compound in the sample was adjusted to yield similar peak heights in the 1H NMR spectrum (ca. 0.6 mg/mL of sucrose and 14.6 mg/mL of heparin).
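
The artificial sub-sampling described above (24 random t1 increments retained out of 256) can be sketched as follows. This is only an illustration under stated assumptions and not the processing performed with mddnmr: the function name, the seed, the list-of-rows data layout and the small toy matrix standing in for the 256 × 3348 data set are all hypothetical.

```python
import random


def subsample_t1(fid_2d, n_keep, seed=0):
    """Keep n_keep randomly chosen t1 rows of a 2D FID; mark the rest as missing.

    fid_2d is a list of t1 rows, each a list of complex t2 points.
    Returns the sorted indices of the kept rows and a copy of the data
    in which every discarded row is replaced by None.
    """
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(fid_2d)), n_keep))
    kept_set = set(kept)
    masked = [row if i in kept_set else None for i, row in enumerate(fid_2d)]
    return kept, masked


# Toy 256 x 8 matrix standing in for the measured 2D signal.
toy = [[complex(i, j) for j in range(8)] for i in range(256)]
kept, masked = subsample_t1(toy, n_keep=24)
print(len(kept), sum(row is not None for row in masked))
```

The rows marked as missing are the ones that a CS algorithm such as IRLS would then have to reconstruct; the reconstruction itself is not part of this sketch.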

Conclusions
Due to the high maintenance costs of high-resolution NMR spectroscopy, it is beneficial to apply sparse sampling techniques in multidimensional measurements and thus save experimental time. However, since the number of sampling points required for an efficient reconstruction grows with the "complexity" of a spectrum (number of peaks and dynamic range of intensity), it is recommended to minimize this complexity before the CS reconstruction. This can be achieved using dedicated acquisition and processing techniques. It is also noteworthy that in some cases, such as strongly decaying or non-stationary signals, sparse sampling followed by reconstruction leads to results superior to full sampling followed by Fourier transform. In this work we summarized those often unnoticed aspects of compressed sensing in NMR.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: