This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

One of the most significant challenges in the comparative analysis of Nuclear Magnetic Resonance (NMR) metabolome profiles is the occurrence of shifts between peaks across different spectra, caused for example by fluctuations in pH, temperature, instrument factors and ion content. Proper alignment of spectral peaks is therefore often a crucial preprocessing step prior to downstream quantitative analysis. Various alignment methods have been developed specifically for this purpose. Other methods were originally developed to align other data types (GC, LC, SELDI-MS,

Although Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful analytical tool for quantitative metabolomics profiling, robust differential analysis is hampered by shifts in the resonance frequencies of peaks. A variety of factors, often related to imperfect control of experimental conditions, contribute to inconsistent peak shifts, including physicochemical interactions and differences in pH, temperature, background matrix or ionic strength [

A simple and popular solution to extract intensities from multiple spectra prior to comparative analysis is spectral bucketing or binning. Binning consists of dividing the spectra into small buckets (typically 0.04 ppm), which are ideally large enough to encompass peak shift variations [
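The fixed-width bucketing described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `bin_spectrum`, the `origin` parameter and the toy peak positions are assumptions made for the example; only the 0.04 ppm bucket width comes from the text.

```python
import math

def bin_spectrum(ppm, intensity, bin_width=0.04, origin=0.0):
    """Sum intensities into fixed-width buckets along the ppm axis.

    Bucket k covers [origin + k*bin_width, origin + (k+1)*bin_width).
    Using a shared origin keeps bucket indices comparable across spectra.
    """
    buckets = {}
    for x, y in zip(ppm, intensity):
        k = math.floor((x - origin) / bin_width)
        buckets[k] = buckets.get(k, 0.0) + y
    return buckets

# Toy case 1: the same peak sits at 1.01 ppm in one spectrum and 1.02 ppm
# in the other. Both positions fall into the same bucket, so the shift is absorbed.
a = bin_spectrum([1.01, 1.05], [10.0, 1.0])
b = bin_spectrum([1.02, 1.05], [10.0, 1.0])
print(a == b)

# Toy case 2: a peak at 1.03 ppm versus 1.05 ppm straddles a bucket boundary,
# so the same peak ends up in two different buckets.
c = bin_spectrum([1.03], [10.0])
d = bin_spectrum([1.05], [10.0])
print(c.keys() == d.keys())
```

The second toy case illustrates why a shift of the same order as the bucket width can still separate one peak into different buckets.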

Hypothetical examples of how binning addresses peak shifts. (

To deal with these problems, various improvements to the binning approach have been developed. For example, instead of using fixed bin sizes, Davis

A more direct way to process and compare spectra with peak shifts is peak alignment. A number of peak alignment approaches have been specifically developed for NMR spectroscopy. Other methods were originally proposed for similar data, such as LC-MS or GC-MS spectra. In this review we discuss and compare the available peak alignment methods that are applicable to NMR spectra directly, without special adaptation.

NMR spectrum alignment is a process to correct for variations in the position of peaks across NMR spectra, by introducing a series of shifts that individual data points undergo. The process is illustrated in

Example of a spectral region before and after alignment using CluPA [

The calculation of the optimal set of shifts to align spectra is computationally non-trivial, and a number of choices need to be made along this process, each with repercussions on the final outcome. A list of NMR alignment methods is presented in

While most NMR alignment methods work directly on the data points of the spectra, some approaches work with representatives of the spectra instead, by first converting spectra into peak lists. The size of such a list of peaks extracted or “picked” from the spectrum is much smaller than the original spectrum. This improves computational performance of the subsequent alignment, or allows for computationally more demanding techniques to be used.
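To make the size reduction concrete, the sketch below converts a spectrum into a peak list using the simplest possible rule: a thresholded local maximum. The function name `pick_peaks`, the threshold value and the toy intensities are assumptions for illustration; the peak pickers used by the reviewed methods are more elaborate.

```python
def pick_peaks(intensity, threshold):
    """Return indices of local maxima whose height exceeds `threshold`."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        # A point is a peak if it rises above its left neighbour and does
        # not fall below its right neighbour.
        if y > threshold and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append(i)
    return peaks

# Hypothetical 10-point spectrum with two peaks above the noise level.
spectrum = [0.0, 0.1, 2.0, 0.2, 0.0, 0.1, 5.0, 4.0, 0.3, 0.0]
peaks = pick_peaks(spectrum, threshold=1.0)
print(peaks, len(peaks), "peaks from", len(spectrum), "data points")
```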

List of methods and their features.

Short Name | Full Name | Reference | Technique | Target Function | Peak Picking? | Number of Parameters | Original Applied Data | Segment-Wise? | Pair-Wise? | Correction Method | Software |
---|---|---|---|---|---|---|---|---|---|---|---|
PLF | Partial Linear Fit | [ | Segmentation model by consecutive peaks distances less than window size D | Sum of squared differences in intensity | No | 2 (window | 1D NMR | Yes | Yes | Shift | NA |
COW | Correlation Optimized Warping | [ | Dynamic programming | Pearson correlation coefficient | No | 2 ( | Chromatographic data | Yes | Yes | Insert and deletion | (1) |
PAGA | Peak alignment by genetic algorithm | [ | Genetic algorithm | Pearson correlation coefficient | No | 6 - Based on GA (normalize geometric ranking | 1D NMR | Yes | Yes | Shift & Insert and deletion | NA |
PARS | Peak alignment using Reduced Set | [ | Breadth first search (BFS), Dynamic Programming (DP), complexity reduced dynamic programming (crDP) | Euclidean distances | Yes | 2 (search window size, mismatch weight) | 1D NMR, Gas Chromatography | No | Yes | Shift | (+) |
DTW | Dynamic Time Warping | [ | Dynamic programming | Squared Euclidean distance | No | 2 ( | Chromatographic data | No | Yes | Insert and deletion | (1) |
PABS | Peak alignment by Beam search | [ | Beam search algorithm | Pearson correlation coefficient | No | 3 (ranges of segment number, sideway movement and interpolation) | 1D NMR | Yes | Yes | Shift & Insert and deletion | (+) |
PAPCA (*) | Peak alignment by PCA | [ | Principal Component Analysis | CORREL | No | 1 (correlation threshold 0.8) | 1D NMR | No | No | Shift | (+) |
PTW | Parametric Time Warping | [ | Global polynomial model | Root mean squared (RMS) | No | 1 (degree of polynomial warping function) | Chromatographic data | No | Yes | Polynomial model | (2) |
PAFFT | Peak alignment by FFT | [ | FFT + segmentation model by equal size segments | FFT cross-correlation | No | 2 (segment size: | Chromatographic data | Yes | Yes | Shift | (3) |
RAFFT | Recursive alignment by FFT | [ | FFT + Recursive segmentation model from global to local | FFT cross-correlation | No | 1 (max. allowable shift) | Chromatographic data | Yes | Yes | Shift | (3) |
SpecAlign | NA | [ | Sliding windows | Minimal matched peak distances | No | 1 (window size | Mass Spectrometry | No | Yes | Insert and deletion | (3) |
FW | Fuzzy Warping | [ | Fuzzy logic for matching most intense peaks | Maximize fuzzy membership Gaussian function | Yes | 1 (the number of most intense peaks) | 1D NMR | No | Yes | Insert and deletion | (4) |
GFHT | Generalized Fuzzy Hough Transform | [ | Hough transform | Hough score | Yes | 3 (expansion factor alpha, step size, lower vote threshold) | 1D NMR | No | No | NA | NA |
RSPA | Recursive segment-wise peak alignment | [ | Recursive segmentation model | FFT cross-correlation | Yes | 6 (peak height threshold, splitting threshold, min. segment size, validation of segment alignment, max. allowable shift, alignment acceptance) | 1D NMR | Yes | Yes | Shift & Insert and deletion | (+) |
PCANS | Progressive Consensus Alignment of NMR Spectra | [ | Segmentation model + dynamic programming + progressive consensus alignment | Scoring by similarity between peaks calculated by height, half height and position of peaks | Yes | 5 | 1D NMR | Yes | No | Shift | (5) |
BAA (*) | Bayesian approach for alignment | [ | Bayesian modeling | Bayesian estimation | No | 3 (noise variance, two parameter values in diagonal entries of diagonal covariance matrix) | 1D NMR | No | Yes | Polynomial model | NA |
icoshift | interval correlation shifting | [ | Segmentation model by equal size segments or manually selecting segments | FFT cross-correlation | No | 2 (the number of intervals or the length of interval | 1D NMR | Yes | Yes | Shift & Insert and deletion | (6) |
CluPA | hierarchical Cluster-based Peak Alignment | [ | Segmentation model by hierarchical clustering | FFT cross-correlation | Yes | 1 (max. allowable shift) | 1D NMR | Yes | Yes | Shift | (7) |

(*): This name is not from the authors, but assigned by us for convenience; NA: The software implementation information is not available; (+): The implementation of the algorithm can be requested from the authors; (1):

A general framework of Nuclear Magnetic Resonance (NMR) spectrum alignment methods. The stacked blocks with white background represent possible methodological variations.

Many effective and advanced peak picking algorithms are available. In all cases, accurate peak detection is required to build a quality alignment. For a discussion and comparison of peak picking methods we refer the reader elsewhere [

In general, peak lists are used to compute how individual data points of each spectrum should be shifted to optimally align all input spectra. First, the extracted peak lists of different spectra are compared to find corresponding peaks. To align these peaks, a set of shifts is computed, which are subsequently applied to the intact spectra. Methods differ in how they find corresponding peaks and their regions, in how shifts are computed, and in how they are applied.

The first example is PARS [

In FW [

PCANS [

RSPA [

GFHT [

The major advantage of an intermediate peak-picking step is the reduced data size. Consequently, these methods are generally faster than methods working on whole spectra, like COW [

A second criterion by which we can classify alignment approaches is whether a reference spectrum is needed. In pairwise methods, a reference spectrum is selected to which all the other spectra are subsequently aligned. With inter-sample methods, all samples are taken into account for the alignment.

Most NMR alignment methods are based on pairwise approaches, which are generally less complex. In pairwise methods, a reference spectrum is selected or created first. The other spectra are then aligned to this reference one by one. The reference spectrum should be representative of the whole dataset and ideally contains all peaks of interest. Because the reference strongly affects the final alignment, a number of reference selection approaches have been proposed. There are generally two reference types: a virtual spectrum that is artificially created from the dataset, or a spectrum selected directly from the dataset.

A virtual reference spectrum can be built in different ways. The reference spectrum may be a median or average spectrum constructed from the dataset [
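A median-based virtual reference of the kind mentioned above can be sketched in a few lines. The helper name `median_reference` and the three toy spectra are assumptions for illustration; the sketch only requires that all spectra share the same number of data points.

```python
from statistics import median

def median_reference(spectra):
    """Build a virtual reference as the point-wise median of the spectra."""
    n = len(spectra[0])
    if any(len(s) != n for s in spectra):
        raise ValueError("all spectra must have the same number of points")
    # zip(*spectra) yields, for each data point, its value in every spectrum.
    return [median(column) for column in zip(*spectra)]

# Three hypothetical spectra; the second one has its peak shifted by one point.
spectra = [
    [0.0, 1.0, 5.0, 1.0, 0.0],
    [0.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 1.0, 0.0],
]
reference = median_reference(spectra)
print(reference)
```

Replacing `median` with `statistics.mean` gives the average-spectrum variant; the median is less sensitive to a single strongly shifted spectrum.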

Alternatively, users can select the reference from multiple trials. In FW [

Even though reference-based approaches are relatively simple and popular, they have some disadvantages. Due to the variability between spectra, not all important peaks are present in every individual spectrum, and thus they may be missing from the selected reference. Significant differences may also exist between spectra depending on the group they belong to. The quality of the results therefore depends on the selected reference spectrum.

Although most alignment methods follow pairwise approaches that depend on a reference, there are a few that can do alignment without a reference.

PAPCA [

By considering the whole spectral data as an image, GFHT [

A third inter-sample method is PCANS [

The next distinction between different alignment workflows can be made according to whether they align complete spectra or smaller segments.

A first group of methods considers the whole spectra for alignment. For example PTW [

These methods usually become slow as the size of the spectra increases. To address this performance problem, a class of methods was developed to divide spectra into smaller corresponding segments, to which the alignment is subsequently applied. PLF [

Alignment is an optimization problem, in which a set of parameters needs to be estimated. A typical factor in optimization techniques is the “target function”, which is the criterion by which candidate or partial solutions are evaluated throughout the alignment process. Even though different NMR alignment methods have different underlying principles, they often use similar optimization criteria. A common criterion is the Pearson correlation coefficient, which can be maximized between segment pairs [
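As an illustration of a target function in use, the sketch below scans every integer shift within a maximum allowable shift and keeps the one that maximizes the Pearson correlation between a target segment and the reference segment. It is an exhaustive search written for clarity and does not reproduce any specific published method; the names `pearson`, `shift_segment`, `best_shift` and the edge-padding rule are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat segment carries no shape information
    return sxy / math.sqrt(sxx * syy)

def shift_segment(segment, s):
    """Shift right by s points (left if s < 0), padding with edge values."""
    n = len(segment)
    return [segment[min(max(i - s, 0), n - 1)] for i in range(n)]

def best_shift(reference, target, max_shift):
    """Return (shift, correlation) maximizing the target function."""
    best_s, best_r = 0, pearson(reference, target)
    for s in range(-max_shift, max_shift + 1):
        r = pearson(reference, shift_segment(target, s))
        if r > best_r:
            best_s, best_r = s, r
    return best_s, best_r

# Hypothetical segments: the target peak lies two points to the right.
reference = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]
s, r = best_shift(reference, target, max_shift=3)
print(s, round(r, 3))
```

The cost of this direct scan grows with the segment length times the number of candidate shifts, which is one reason several methods in the table evaluate cross-correlation via the FFT instead.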

After finding the corresponding points or segments in spectra, the alignment methods need to correct the misalignment. A first class of methods uses stretching/compression (or insertion/deletion) to correct the misaligned spectra. Usually, stretching/compression is done by a linear interpolation to fit the corresponding segments in the reference [
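Stretching or compressing a segment by linear interpolation can be sketched as a resampling step. The function name `resample_linear` and the toy values are assumptions; the sketch maps a segment onto a new number of points while keeping its first and last values fixed.

```python
def resample_linear(segment, new_length):
    """Stretch or compress `segment` to `new_length` points.

    Each output point is placed at an evenly spaced fractional position in
    the input and interpolated linearly between its two nearest neighbours.
    """
    n = len(segment)
    if n < 2 or new_length < 2:
        raise ValueError("need at least two points on each side")
    out = []
    for j in range(new_length):
        pos = j * (n - 1) / (new_length - 1)  # fractional index in the input
        i = min(int(pos), n - 2)              # left neighbour, kept in range
        frac = pos - i
        out.append(segment[i] * (1 - frac) + segment[i + 1] * frac)
    return out

stretched = resample_linear([0.0, 2.0, 4.0], 5)             # insertion
compressed = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0], 3)  # deletion
print(stretched)
print(compressed)
```

Unlike a rigid shift, this correction changes the spacing between data points, so peak widths inside the segment are altered as well.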

After alignment, the aligned spectra need to be evaluated to assess the quality of alignment methods. Below we discuss different levels of evaluation of aligned spectrum sets.

Visualization is a powerful approach to rapidly assess the properties of a dataset, and in the context of this review, to evaluate the quality of an alignment procedure. We can visualize a number of relevant features in a few different ways, as illustrated in

A good alignment usually leads to an increased correspondence between spectra. Inter-spectrum similarity is thus a useful criterion for the evaluation of alignments. The most popular approach to evaluate inter-spectrum similarity consists of comparing average Pearson correlation coefficients of spectra before and after alignment [
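The before/after comparison of average Pearson correlation can be sketched as follows. The function names and the two toy datasets are assumptions for illustration; real spectra would contain thousands of points and noise.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_correlation(spectra):
    """Average Pearson correlation over all pairs of spectra."""
    pairs = list(combinations(spectra, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical dataset: one peak that drifts by one point per spectrum.
before = [
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 0.0],
]
# The same dataset after an ideal alignment.
after = [
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0, 0.0],
]
score_before = mean_pairwise_correlation(before)
score_after = mean_pairwise_correlation(after)
print(round(score_before, 3), round(score_after, 3))
```

A higher score after alignment indicates increased correspondence, but it cannot by itself show that the peaks that now coincide belong to the same metabolite.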

Examples of evaluation by using visualizations for a region in the Wine data [

Principal Component Analysis (PCA) is a technique to project high-dimensional data onto linearly uncorrelated vectors, or principal components, in such a way that the first component captures the largest share of the variance in the data, with subsequent components capturing progressively less. PCA is a natural way to express data and discover data patterns based on their similarities. Because PCA results differ before and after alignment, they can be used to evaluate an alignment in several ways. Vogels

In studies where metabolome profiles are used to compare or classify different sample classes, the classification accuracy itself gives an indication on alignment quality. A good alignment should improve the accuracy. In general, any classifier that is used for classifying metabolome profiles can be used for this purpose, for example SVM [

Instead of using classifiers as a black box, we can also evaluate alignment according to specific properties derived from a classification model. For example, back-scaled loading coefficients of an OPLS classifier have been used [

There are a number of other metrics to evaluate alignment quality. One is the relative standard deviation of peak intensity as in GFHT [
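For orientation, the relative standard deviation (RSD) itself is the sample standard deviation divided by the mean, usually expressed as a percentage. The sketch below shows only this generic definition on hypothetical intensity readings of one peak across spectra; how GFHT applies the metric is not reproduced here, and the function name `relative_std` is an assumption.

```python
from statistics import mean, stdev

def relative_std(values):
    """Relative standard deviation in percent: 100 * stdev / mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical intensities of the same peak read at a fixed position
# in three spectra.
rsd_consistent = relative_std([10.0, 10.0, 10.0])
rsd_variable = relative_std([8.0, 10.0, 12.0])
print(rsd_consistent, rsd_variable)
```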

Comparing the speed of all NMR alignment methods is not trivial, since the computational time of some algorithms depends on their parameter settings. For example, searching in PABS [

SpecAlign [

Most NMR alignment methods rely on a set of user-defined parameters. Optimizing these parameters is a challenge for most users, and different data sets may require different parameter settings. In practice, most users try a few parameter sets and select the one that yields the best result, without a guarantee that it is the best possible set. The more parameters a method requires, the more difficult it becomes to use. Consequently, some methods attempt to reduce the number of user-set parameters while sacrificing as little alignment performance as possible. An overview of the numbers of parameters of several algorithms is presented in

1D ^{1}H-NMR is used in the majority of NMR-based metabolome profiling studies because it is a fast approach to determine the biomolecular constituents of a sample. When working on complex biological samples however, there is often significant overlap between different signals. 2D NMR, usually in the combination ^{1}H-^{13}C, is a good alternative to overcome this problem. 2D NMR improves the understanding of the structure of an organic compound, but it is also affected by peak shifting problems. Since 1D NMR alignment methods cannot be applied directly on 2D NMR data, dedicated 2D NMR alignment methods are needed. Only a few methods are available for this type of data. Binning can be used to compare imperfectly aligned 2D NMR datasets, but has the disadvantages discussed earlier for 1D NMR. Since 2D NMR datasets can be considered as images, image-processing techniques from the computer vision field could be applied for finding matching points in the image. However, applying them to high-resolution 2D NMR data remains a challenge. Zheng

NMR spectrum alignment remains a difficult problem for which there is no gold standard solution. For example the problem of peak order changes mentioned by Csenki

TNV acknowledges support by a BOF interdisciplinary grant of the University of Antwerp.

The authors declare no conflict of interest.
