The Use of Infrared Spectroscopy for the Quantification of Bioactive Compounds in Food: A Review

Infrared spectroscopy (wavelengths ranging from 750–25,000 nm) offers a rapid means of assessing the chemical composition of a wide range of sample types, for both qualitative and quantitative analyses. Its use in the food industry has increased significantly over the past five decades and it is now an accepted analytical technique for the routine analysis of certain analytes. Furthermore, it is commonly used for routine screening and quality control purposes in numerous industry settings, albeit not typically for the analysis of bioactive compounds. Using the Scopus database, a systematic search of the literature published in the five years between 2016 and 2020 identified 45 studies using near-infrared and 17 studies using mid-infrared spectroscopy for the quantification of bioactive compounds in food products. The most common bioactive compounds assessed were polyphenols, anthocyanins, carotenoids and ascorbic acid. Numerous factors affect the accuracy of the developed model, including the analyte class and concentration, matrix type, instrument geometry, wavelength selection and spectral processing/pre-processing methods. Additionally, only a few studies were validated on independently sourced samples. Nevertheless, the results demonstrate some promise of infrared spectroscopy for the rapid estimation of a wide range of bioactive compounds in food matrices.


Infrared Spectroscopy
Infrared (IR) spectroscopy is a well-established tool in analytical chemistry, offering a non-invasive, non-destructive and rapid means of assessing the chemical composition of a wide range of sample types. For the purposes of analytical spectroscopy, the infrared spectrum can be divided into three main regions: near-infrared (NIR; 750-2500 nm), mid-infrared (MIR; 4000-400 cm⁻¹) and far-infrared (400-10 cm⁻¹; rarely used in the food analysis sector). Historically, NIR spectroscopy (NIRS) has been, and continues to be, utilised more than MIR spectroscopy (MIRS) in the food industry due to its lower cost, greater penetrative power (i.e., lower absorption by the sample) that allows for more representative sampling [1] and reduced sample preparation times [2]. Wavelengths in the NIR region are absorbed due to the overtone and combination bands of IR-active bonds, rather than their fundamental tones.
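Because the NIR region is quoted here in wavelength (nm) and the MIR and far-infrared regions in wavenumber (cm⁻¹), a small conversion helper can make the region boundaries easier to compare. The sketch below is illustrative only; the function names are assumptions and not part of any cited method. It relies on the standard relationship that the wavenumber in cm⁻¹ equals 10⁷ divided by the wavelength in nm.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nanometres."""
    return 1e7 / wavenumber_cm1


# Region boundaries as quoted in the text.
nir_limits_nm = (750.0, 2500.0)
mir_limits_cm1 = (4000.0, 400.0)

for nm in nir_limits_nm:
    print(f"NIR limit {nm:.0f} nm = {nm_to_wavenumber(nm):.0f} cm^-1")
for cm1 in mir_limits_cm1:
    print(f"MIR limit {cm1:.0f} cm^-1 = {wavenumber_to_nm(cm1):.0f} nm")
```

Under this conversion, the long-wavelength NIR limit (2500 nm) coincides with the start of the MIR region (4000 cm⁻¹), and the MIR limit of 400 cm⁻¹ corresponds to 25,000 nm.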
Compared to other analytical methods, the main advantages of IR spectroscopy are its speed, relatively low price of the instrument, and the fact that it is typically non-destructive and non-invasive, lowering or eliminating sample preparation time [3,4]. Furthermore, IR spectroscopy is highly sensitive, requires a small amount of sample and allows users to analyse samples from a wide variety of matrix types, including solids, powders, films, gels, liquids and gases [3], and does not produce any waste [5]. Conversely, the challenges involve interpreting spectra from complex mixtures and the need to create and maintain robust calibration models for quantitative analysis [3]. Briefly, a robust model refers to one which can be used year-after-year without losing accuracy over time, or when applied to different population groups (e.g., different varieties, different geographic locations).
In addition, IR spectroscopy (particularly NIRS) is best suited for the analysis of macroconstituents (usually those present at concentrations of ~0.5% or higher). Below this concentration range, it is difficult to separate out the signal of the analyte from the rest of the spectral peaks. In many cases reporting the detection of analytes at much lower concentrations, it is likely that NIRS is actually detecting a different analyte present at macro-levels, the concentration of which is correlated with the targeted analyte. This is known as a secondary, or surrogate, correlation [4]. In many cases, this correlation may be unavoidable due to both analytes absorbing in similar regions [6]. In other situations, it may be the only way through which IR spectroscopy can be used to estimate the microconstituent concentration. The use of such secondary correlations is acceptable in many cases, as long as the correlation holds true for all samples analysed. Some publications have reported that these correlations may change between different sample populations or harvest years [6], which may explain the poor performance of independent test sets found in some studies using IR spectroscopy for the analysis of microconstituents.
Despite these limitations, the speed and cost-effectiveness of IR spectroscopy have led to its adoption across many sectors of the food industry. This review focuses on the application of IR spectroscopy (both MIR and NIR) for the quantitative assessment of bioactive compounds in foodstuffs. It concludes with a contemporary perspective on the future of IR spectroscopy for the analysis of bioactive compounds in the food industry and highlights key areas where further research is required.

Key Absorbance Peaks in the MIR and NIR Regions
As previously mentioned, one of the major challenges of working with IR spectroscopy is interpreting the spectra. To aid researchers in this process, this section provides some information on the aetiological functional groups responsible for observed peaks at different locations in the MIR and NIR regions.
The peak locations of some MIR bonds of particular importance for food analysis are provided in Figure 1 and Table 1. Additionally, the absorption bands in the NIR region are shown in Figure 2. The NIR region contains overtones, meaning that absorption peaks from a single bond occur repeatedly throughout the NIR spectrum, at different levels of attenuation (Figure 2). In addition, combination bands can occur in the far-NIR region (>2000 nm) when two or more fundamental vibrations are excited simultaneously [7].
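To illustrate why a single bond produces repeated peaks in the NIR region, the following sketch estimates overtone positions under a simple harmonic approximation, in which the n-th overtone lies near (n + 1) times the fundamental wavenumber. The fundamental value used (an O-H stretch near 3400 cm⁻¹) and the function name are assumptions for illustration; real bands are shifted from these estimates by anharmonicity and matrix effects.

```python
def overtone_wavelength_nm(fundamental_cm1: float, overtone_order: int) -> float:
    """Harmonic estimate of a band position in nm.

    overtone_order = 0 is the fundamental, 1 the first overtone, and so on.
    """
    return 1e7 / (fundamental_cm1 * (overtone_order + 1))


FUNDAMENTAL_CM1 = 3400.0  # assumed, illustrative O-H stretching fundamental

for order in (0, 1, 2):
    label = "fundamental" if order == 0 else f"overtone {order}"
    position = overtone_wavelength_nm(FUNDAMENTAL_CM1, order)
    print(f"{label}: approx. {position:.0f} nm")
```

In this approximation the fundamental falls in the MIR region, while the first and second overtones fall within the 750-2500 nm NIR window.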

Sample Presentation
In order to gain an accurate assessment of the sample matrix using infrared spectroscopy techniques, it is essential that the portion of the sample that the instrument "sees" is representative of the whole sample. Furthermore, due to the wide range of sample types which can be analysed using IR spectroscopy (such as solids, liquids, films, gels and powders), there are a variety of sample presentation methods that have been adopted for IR spectroscopy.
Perhaps the simplest form of sample presentation is the full transmittance mode (180° light-sample-detector). This is also the only method for which the Beer-Lambert law holds true. In this presentation mode, the IR light enters one side of the sample and some wavelengths are absorbed by the sample, with the remaining light measured as it exits the other side of the sample. As long as the light path is sufficiently short, transmittance mode ensures that the emitted light has an opportunity to interact with nearly all of the analytes present in the light path. Consequently, it is usually quite representative of the true matrix composition. However, it is only suitable for analysing relatively thin samples due to the high level of absorbance in aqueous-based matrices. As shown by the Beer-Lambert law, increasing the light path length will proportionally increase the absorbance, making it more difficult to detect the signal of the resultant spectra. For example, a path length of only a few millimetres is often required when using transmittance NIR spectroscopy for the analysis of aqueous-based solutions. Due to path length limitations, the use of transmittance spectroscopy for the analysis of solid or powder substances can be more challenging compared to reflectance modes; however, analysis of whole fruits is possible using higher incidence light intensities and more sensitive detectors [11,12].
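The path length constraint described above follows directly from the Beer-Lambert law, A = ε·c·l, with the transmitted fraction of light given by 10^(−A). The sketch below uses hypothetical values for the absorptivity and concentration (they are not taken from any cited study) to show that absorbance scales linearly with path length while the light reaching the detector falls off exponentially.

```python
def absorbance(absorptivity: float, concentration: float, path_length_cm: float) -> float:
    """Beer-Lambert law: A = epsilon * c * l."""
    return absorptivity * concentration * path_length_cm


def transmitted_fraction(absorbance_value: float) -> float:
    """Fraction of incident light transmitted, T = 10^(-A)."""
    return 10.0 ** (-absorbance_value)


# Hypothetical, illustrative values only.
EPSILON = 0.5   # absorptivity, L mol^-1 cm^-1
CONC = 2.0      # concentration, mol L^-1

for path_cm in (0.1, 0.2, 0.5, 1.0, 2.0):
    a = absorbance(EPSILON, CONC, path_cm)
    t = transmitted_fraction(a)
    print(f"path {path_cm:.1f} cm: A = {a:.2f}, transmitted = {t:.1%}")
```

Doubling the path length doubles the absorbance but squares the transmitted fraction, which is why thick or strongly absorbing samples quickly leave too little light to measure.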

One variation of the full transmittance mode is partial transmission spectroscopy, also known as interactance spectroscopy. This refers to the mode where the infrared light is partially transmitted through the sample matrix, before being detected by another sensor at the matrix surface, but located adjacent to the source. These instruments utilise a physical barrier between the light source and detector to prevent the detector from receiving any IR light reflected from the sample surface (see Figure 3). The benefits of this method are a reduced path length compared to full transmittance mode, and increased interaction between the IR light and the sample matrix compared to reflectance geometry.
Reflectance mode is one of the most commonly used presentation modes in IR spectroscopy applications, particularly for NIRS. In this mode, the infrared light enters one side of the sample and interacts with the sample matrix as it penetrates into the sample. The majority of non-absorbed light is then reflected back to the surface of the sample, where it is detected by the instrument sensor. Some non-absorbance scattering of the IR light can also occur, which can bias the resultant spectra. One of the main advantages of reflectance mode is its one-dimensionality (i.e., the instrument only needs access to the sample surface in one location, as opposed to transmittance spectroscopy where both sides of the sample must be accessible), allowing it to be used in a much broader range of applications compared to transmittance spectroscopy. However, it is reliant on the assumption that the composition of the surface material is representative of the entire sample matrix [4].
Within the food sector, reflectance NIR spectroscopy is widely reported in publications for the analysis of horticultural produce [13,14] and in the grains industry [15,16]. There are no commercial instruments designed to use this geometry mode for the analysis of whole fruits, as fruit skin composition (e.g., thickness, starch/fibre content, chlorophyll content) can change in populations from year to year, depending on other variables such as rainfall, amount of sunlight, application of fertiliser, etc. In turn, this variability in skin composition would interfere with the NIR spectra and reduce the robustness of the model, which is designed to only measure the internal composition of the fruit. However, reflectance NIR spectroscopy is commonly used for the analysis of ground grain products in industry/commercial settings, as the surfaces of these samples are generally quite representative of the entire sample.
Diffuse or body reflectance mode is also commonly used by NIR spectroscopists. It functions similarly to regular reflectance spectroscopy, but benefits from increased interaction between the IR light and the sample compared with specular (surface) reflectance modes (Figure 3).
A diagrammatical summary of the main sample presentation modes used in IR spectroscopy is given in Figure 3. As each sample presentation mode has its drawbacks and benefits [17,18], the optimum method will depend on the sample matrix and intended application.

Data Processing
The final stage in the use of infrared spectroscopy for analytical purposes is the processing of the spectral data. In many cases, the signal of the desired analyte may be obscured by other matrix components present in much higher concentrations, such as water or carbohydrate-based structures. The use of modern mathematical data analysis techniques, termed chemometrics, can aid in uncovering minor analyte signals and developing optimum models for the quantification of the analytes. However, it is important to note that no amount of data analysis or chemometrics can "uncover" an analyte if the signal from the analyte is either not present or too low to be detected by the instrument. The exception to this occurs when a secondary correlation exists between the analyte and a macroconstituent that can be detected by NIRS (see Section 1.1).

Spectral Pre-Processing
Typically, IR spectra are subjected to pre-processing before they can be used for quantitative analytical purposes. The aim of this procedure is to remove spectroscopic artefacts from the measurement process, such as random noise, scatter or baseline drift [19,20]. The effects of these artefacts are particularly detrimental when attempting to analyse complex mixtures or analytes present in very low concentrations [21].
A variety of spectral pre-processing methods are available. These include smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), normalisation by range (NBR) and the calculation of derivatives [22]. As previous authors have reviewed the range of available spectral pre-processing methods in detail [23,24], only a brief summary of the most commonly used pre-processing methods is presented here.
Standard normal variate (SNV) is a normalisation-based pre-processing method. In this pre-processing method, the mean value of each spectrum is calculated and this constant value is subtracted across the entire spectrum, before the spectrum is divided by the standard deviation of the entire spectrum.
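The SNV procedure described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name and the example spectrum are assumptions, and the sample standard deviation (n − 1 denominator) is used here, whereas some software packages may use a different convention.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre each spectrum on its own mean,
    then scale by its own standard deviation."""
    centre = mean(spectrum)
    spread = stdev(spectrum)  # sample SD; an assumed convention
    return [(value - centre) / spread for value in spectrum]


# Hypothetical absorbance values at consecutive wavelengths.
raw = [0.20, 0.25, 0.40, 0.70, 0.45, 0.30, 0.22]

# The same spectrum with a multiplicative and an additive distortion,
# loosely mimicking scatter differences between two scans.
distorted = [1.5 * value + 0.3 for value in raw]

largest_gap = max(abs(a - b) for a, b in zip(snv(raw), snv(distorted)))
print(f"largest difference after SNV: {largest_gap:.2e}")
```

Because SNV is applied to each spectrum independently, it does not require a reference spectrum from the rest of the dataset.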
Calculating the derivative of spectra is another common approach to account for baseline shift or amplitude differences in the spectra. First and second derivatives are the most commonly used. Although higher order derivatives, such as the third derivative, have been successfully used in some applications [25–27], there is an accompanying decrease in the signal-to-noise ratio as the derivative order increases [25].
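As a rough illustration of how derivatives account for baseline effects, the sketch below uses plain central finite differences on a hypothetical spectrum. This is a simplification: in practice, derivatives are usually computed with smoothing filters precisely because simple differencing amplifies noise. Function names and data are assumptions for illustration.

```python
def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative (end points are dropped)."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
        for i in range(1, len(spectrum) - 1)
    ]


def second_derivative(spectrum, step=1.0):
    """Central-difference second derivative (end points are dropped)."""
    return [
        (spectrum[i + 1] - 2.0 * spectrum[i] + spectrum[i - 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical absorbance peak and two baseline-distorted copies of it.
peak = [0.0, 0.1, 0.4, 1.0, 0.4, 0.1, 0.0]
offset = [value + 0.5 for value in peak]                               # constant shift
sloped = [value + 0.5 + 0.05 * i for i, value in enumerate(peak)]      # linear drift

d1_gap = max(abs(a - b) for a, b in zip(first_derivative(peak), first_derivative(offset)))
d2_gap = max(abs(a - b) for a, b in zip(second_derivative(peak), second_derivative(sloped)))
print(f"first derivative, constant offset removed: gap = {d1_gap:.2e}")
print(f"second derivative, linear drift removed:   gap = {d2_gap:.2e}")
```

The first derivative cancels a constant baseline offset, while the second derivative also cancels a linear baseline drift.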
Finally, it is important to note that pre-processing methods are often combined. For example, typical pre-processing of spectra for use in analytical spectroscopy could involve calculating the SNV of the spectra, before subsequently calculating the first derivative of the SNV-processed spectra.
The choice of optimum spectral pre-processing methods is poorly defined and strongly dependent upon both the matrix type and analyte of interest. Furthermore, the need for and choice of pre-processing method may also vary with the sample size of the population [21]. In the absence of definitive guidelines, trial and error is often the best approach when developing new applications for infrared spectroscopy.

Data Analysis Methods
For quantitative applications of IR spectroscopy, regression modelling is among the most commonly used data analysis methods. One of the first chemometric methods applied in quantitative IR spectroscopy applications was multiple linear regression (MLR), which attempts to predict the analyte concentration from the spectral absorbance at several different wavelengths. However, it cannot be used for the analysis of entire spectra, due to the high multicollinearity of the datapoints comprising the spectra.
Partial least squares (PLS) regression is a derivative of MLR suited to datasets with high levels of multicollinearity, such as infrared spectra [28]. Through a variety of algorithms, the key contributing variables are identified and weighted such that the wavelengths most closely correlated with the analyte concentration have the greatest contribution to the PLS model [28]. PLS regression is widely used for the development of IR spectroscopy models across the food science sector [29–31].
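To make the handling of multicollinearity concrete, the following is a minimal sketch of single-response PLS (PLS1) using the NIPALS algorithm, written with the standard library only. It is one common formulation rather than the specific algorithm used in any of the cited studies, and the function names and toy data are assumptions. The two predictor columns are perfectly collinear, a situation in which ordinary MLR has no unique solution.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns the means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [value - y_mean for value in y]                        # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: covariance of each variable with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no remaining covariance with y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one sample by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Toy data: the second "wavelength" is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
y = [4.0, 7.0, 10.0, 13.0, 16.0]  # y = 3 * x1 + 1

model = pls1_fit(X, y, n_components=1)
prediction = pls1_predict(model, [6.0, 12.0])
print(f"prediction for x1 = 6: {prediction:.3f}")
```

A single latent component captures the shared variation of both columns, so the model remains well defined even though the predictors carry redundant information.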
In recent years, there has also been an emerging interest in machine learning techniques, such as artificial neural networks (ANNs), support vector machines (SVMs) and deep learning [32–35]. These non-linear techniques look for patterns within the data in order to optimise model weighting and extract the desired information from the data. As more datapoints are added to the dataset, the model can be updated over time in order to provide more accurate prediction results.
As with spectral pre-processing, the optimum chemometric technique may vary depending on the sample matrix and/or analyte [36,37].

Functional Foods
Recent years have seen an expansion of the "functional food" market, where foods are purchased for their health-benefiting effects rather than as a source of basic nutrition and energy [38–40]. For example, the consumption of juice from Queen Garnet plums has been shown to reduce oxidative stress [41] and reduce the risk of blood clot formation in clinical trials [42], while polyphenolics isolated from chickpeas have been found to provide anti-cancer effects, particularly against colorectal cancer [43]. If high levels of such health-benefiting compounds can be demonstrated in a particular crop, consumers may pay a price premium for such a product, particularly if they are familiar with the concepts of functional foods [44]. For example, Spanish consumers reported that they would pay ~55% extra for resveratrol-enriched wine [45], as this compound has purported benefits for cardiovascular health. This willingness to pay a premium for healthier food has been mirrored in several other studies [46–48], albeit with typically lower price premiums reported (e.g., 10-15% higher than the regular price).
Even if there is no market for a functional food in its unprocessed form, such produce also has potential for the development of value-added foods and ingredients [49–51], marketed on the basis of their levels of health-benefiting compounds. Examples of foods experiencing a considerable rise in popularity due to their reported health benefits include the so-called ancient grains (such as chia, quinoa, millet and spelt), pulse crops (including mungbeans, chickpeas, faba beans and lentils), as well as numerous other crops [52–54]. For instance, the plum variety Queen Garnet was developed and marketed with a sole emphasis on its exceptionally high levels of anthocyanins, which possess antioxidative and anti-thrombotic properties [41,55,56]. Another well-known example is the açaí berry from South America, popularised due to its high anthocyanin content and antioxidant capacity [57].

Definition of Bioactive Compounds
There is no clear literature consensus on the definition of bioactive compounds, with Guaadaoui et al. [58] proposing them to be "compounds which have the capability and the ability to interact with one or more component(s) of living tissue by presenting a wide range of probable effects". However, from a consumer's perspective, bioactive compounds are generally regarded as compounds which promote good health or provide health-benefitting effects. This is more similar to the consensus statement from the 23rd Hohenheim Consensus Meeting, which stated that "bioactive compounds are essential and non-essential compounds (e.g., vitamins or polyphenols) that occur in nature, are part of the food chain, and can be shown to have an effect on human health" [59]. Such bioactive compounds may also be referred to as "nutraceuticals" [60], which reflects their presence in the human diet.

Classes of Bioactive Compounds
The majority of bioactive compounds can be broadly classified as phytochemicals (compounds that are produced by plants), although some (such as fatty acids) are also found in animal-based foods. There are numerous classes of bioactive compounds (Figure 4), each with their own distinct biological activities and health benefits. These include polyphenols, flavonoids, carotenoids, phytosterols, phytoestrogens, alkaloids, glucosinolates, anthocyanins, terpenoids and others [61,62]. Each compound class is characterised by distinct structural features in their chemical composition. For example, polyphenols display the presence of multiple phenol groups, while all flavonoids comprise two phenyl rings and a heterocyclic ring containing an embedded oxygen heteroatom.
It could be considered that compounds which show antioxidant activity form a class of bioactive compounds. However, a structurally diverse array of compounds may exhibit antioxidant activity, including polyphenols, anthocyanins, flavonoids and carotenoids.
For this reason, this review excluded studies solely reporting quantification of the total antioxidant capacity (TAC) of samples, as, in many cases, the antioxidant activity of a matrix cannot be directly related to the concentration of a specific structural class of bioactive compounds [64]. Nevertheless, this does not detract from the importance of TAC as a potential indicator of crude biological activity. Although the concept of TAC has been criticised by several researchers as a result of its lack of specificity [65,66], numerous clinical trials have indicated a positive relationship between a greater intake of antioxidants and reduced levels of oxidative stress and inflammatory markers [67–70], reduced all-cause mortality (in non-elderly cohorts) [71,72] and reduced risk of adverse cardiovascular events, particularly ischaemic stroke [73–77].

Current Analytical Methods
There are numerous analytical methods available for the quantification of bioactive compounds, depending on the physical and chemical properties of the specific class of compound(s) of interest.
For example, terpenoids and other volatile compounds are commonly analysed by gas chromatography coupled with mass spectrometry (GC-MS), which uses a mobile inert gas phase and a stationary column phase to separate the compounds of interest [78,79].
Non-volatile compounds, such as polyphenols, anthocyanins, flavonoids and carotenoids, are typically analysed using the related technique of liquid chromatography coupled with mass spectrometry (LC-MS) [80–82]. As with GC, the column contains the stationary phase, while a liquid mobile phase carrying the analyte flows through the column. The relative affinity of the analyte for the mobile and stationary phases allows for its separation from other matrix constituents. Finally, the mass spectrometry module is used to identify the analyte based on its molecular weight.
An alternative approach to hyphenated techniques is coupling GC or LC separation to FTIR detection. This allows individual compounds to be separated in the gas or liquid phase, before collecting FTIR spectra from each eluting compound, providing detailed structural information on the functional group of the analyte(s). GC-FTIR has proved to be effective in identifying and quantifying separated compounds in foodstuffs [83,84]. More recently, the use of a new FTIR interface allowed the detection, identification and quantification of trace components at the nanogram level [85,86]. The use of the same interface coupled to liquid chromatography [87] opens the way to more applications for LC-amenable constituents.
In cases where the compounds of interest are known and pure standards are available for comparative purposes, high-performance liquid chromatography (HPLC) with ultraviolet-visible detection may suffice [88]. This method works in the same way as LC-MS, but uses absorbance in the ultraviolet-visible region to detect the eluting compounds, rather than mass spectrometry.
Colorimetric methods, such as the Folin-Ciocalteu assay, may also be used for the analysis of total phenolics, or for the quantification of anthocyanins using the pH differential method [89]. However, these methods are less specific compared to separation-based techniques, such as HPLC and GC-MS.
More recently, there has been interest in using rapid, non-invasive, stand-alone analytical techniques, such as IR spectroscopy, for the prediction of bioactive compounds [90–92]. This emerging area of research is the focus of this review.

Previous Work and Aims
Although several previous reviews have focused on the use of IR spectroscopy for the estimation of specific groups of bioactive compounds, such as antioxidants [93,94] and phenolics [95], there are no contemporary reviews in the last decade on this technique for the quantification of bioactive compounds in food products. For instance, the review by McGoverin et al. [96] on this topic is over ten years old, with numerous IR-related papers published during the ensuing period. Similarly, the review by Pallone et al. [97] on the use of vibrational spectroscopy in food analysis included only seven studies quantifying constituents which could be classified as "bioactive" compounds. Hence, this paper aims to review the contemporary literature reporting the estimation or quantification of bioactive compounds in food matrices.

Methods
The Scopus database (https://www.scopus.com/; accessed on 11 October 2021) was used to search for articles published between 2016 and 2020 containing the following terms in their title, abstract or keywords sections:

• Any of the following: near infrared OR mid infrared OR spectroscopy;
• AND food;
• AND bioactive OR phenolic OR antioxidant OR anthocyanin;
• AND quantification OR determination OR measurement.
In this way, articles pertaining to the quantification of bioactive constituents of functional foods using infrared spectroscopy were listed.
Articles up to and including 31 December 2020 were considered, with the search limited to articles published in the 5 years prior (i.e., 1 January 2016 to 31 December 2020). The titles and abstracts of all articles were manually screened to find relevant articles for inclusion in this review.
Inclusion criteria were as follows:
• Original studies published in the last 5 years (between 2016 and 2020);
• Quantified a compound or group of compounds with recognised health-benefiting effects, above that expected from basic nutritional needs;
• The matrix was a food or potential food product.

Scientific Effort (2016-2020)
The scientific effort over the past five years is summarised in Table 2 (for NIRS) and Table 3 (for MIRS). The information presented in the tables includes the type of matrix analysed, analyte(s) investigated, sample size of the calibration and validation sets, wavelength range used in the optimised model, and statistical method used for analysis of the spectra. All fruit and vegetable samples were analysed fresh and intact, unless otherwise stated in the table. The test set column shows whether the authors used a dependent test set for the model validation (i.e., samples from the same population as the calibration set) or independent test set (i.e., samples drawn from a different population to the calibration set, such as from a different year, season or geographic location). The cross-validation statistics (R²CV and RMSECV) are reported in the corresponding columns for all studies. In cases where the study also included an independent test set, the R²p and RMSEP for the test set are reported in the test set column. Finally, the notes column provides information about the sample population details and notable findings of the study.
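For readers less familiar with these statistics, the sketch below shows one common way of computing a coefficient of determination and a root mean square error, together with a leave-one-out (LOO) cross-validation loop. It is illustrative only: the data are hypothetical, a single-variable linear model stands in for the multivariate models used in the reviewed studies, and R² is computed here as 1 − SSres/SStot, whereas some studies report the squared correlation coefficient instead. Applying the same two functions to predictions for a held-out test set would give R²p and RMSEP.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def loo_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Hypothetical absorbance at one wavelength and reference concentrations.
signal = [0.10, 0.21, 0.29, 0.42, 0.50]
reference = [1.0, 2.0, 3.1, 4.0, 5.2]

cv_pred = loo_predictions(signal, reference)
print(f"R2_CV = {r_squared(reference, cv_pred):.3f}, RMSECV = {rmse(reference, cv_pred):.3f}")
```

Cross-validation of this kind reuses samples from the calibration population, which is why it cannot substitute for an independent test set when assessing robustness.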

Matrix Type
Interrogation of the included studies by matrix type revealed that NIR spectroscopy was most commonly used for the analysis of bioactive compounds in fruit matrices, followed by aromatic plants, grains/pulses and beverages (Table 4). In contrast, MIR spectroscopy was most often reported for the analysis of beverages, likely due to the ease of presentation for this sample type.

Optical Geometry
The majority of publications (58%) using NIR spectroscopy for the prediction of bioactive compounds used reflectance or diffuse reflectance geometry. A further 16% of studies used hyperspectral imaging in reflectance mode. Only 20% of studies used transmittance and 9% used transflectance, the majority of which were performed on beverage or oil samples. However, it should be cautioned that the vast majority of studies were not validated through independent test set validation and hence have not shown their robustness in "real-world" use; consequently, the optical geometry types used in the academic studies reported here may not reflect the optical geometry of instruments used commercially.
All of the MIR spectroscopy studies except one [92] used an Attenuated Total Reflection (ATR) sampling platform, which requires samples to be placed in close contact with the ATR crystal. In general, studies comparing both NIRS and MIRS tended to show similar accuracy between these two techniques. The simple sample preparation for MIRS (particularly when using MIR-ATR), combined with its generally high accuracy, would seem to make it suitable for a wide range of applications.
The number of calibration samples ranged from 10 to 387 (mean = 83 ± 72 samples), while the size of the validation set ranged from 5 to 182 (mean = 37 ± 32 samples). The majority of studies used a dependent test set (65%) or did not use any test set (24%), while only 9% of studies used an independent test set for validation of the developed model.
Within the four NIRS studies utilising an independent test set, one used transmittance [112], while the others used reflectance [13,98] or diffuse reflectance geometry [109]. Cunha Júnior et al. [98] sourced their test set from the following season to the calibration set, while Ncama et al. [13] used test set samples from a geographically distinct farm (~400 km away) and Cozzolino et al. [109] used commercially sourced samples for validation purposes. The study by Tilahun et al. [112] could arguably be classified as using a semi-independent test set, as the authors used samples from a different harvest time point within the same season and from the same location. Interestingly, all four of these NIRS studies were performed on fruit rather than other food matrices.
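The distinction between dependent and independent test sets can be sketched in code: rather than drawing test samples at random from the full dataset, every sample belonging to a held-out group (such as a season or farm) is reserved for testing. The function name, record fields and values below are hypothetical and serve only to illustrate the idea.

```python
def split_by_group(samples, group_key, held_out):
    """Send every sample whose group is in `held_out` to the test set."""
    held_out = set(held_out)
    calibration = [s for s in samples if s[group_key] not in held_out]
    test = [s for s in samples if s[group_key] in held_out]
    return calibration, test


# Hypothetical sample records with a harvest season and a reference value.
samples = [
    {"id": 1, "season": 2018, "reference": 12.1},
    {"id": 2, "season": 2018, "reference": 10.4},
    {"id": 3, "season": 2019, "reference": 11.7},
    {"id": 4, "season": 2019, "reference": 9.8},
]

calibration, test = split_by_group(samples, "season", held_out=[2019])
print([s["id"] for s in calibration], [s["id"] for s in test])
```

Because no season contributes to both sets, the test set error reflects how the model handles a population it has not seen, which a random split of the pooled samples cannot show.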
In most of these studies, the test set validation statistics were moderately poorer than the cross-validation statistics. For example, Cunha Júnior et al. [98] found an R 2 CV of 0.89-0.91 and RMSECV of 2.5-2.9 g/kg, compared to an R 2 p of 0.74-0.88 and RMSEP of 5.1-6.8 g/kg. Similarly, the RMSEP for the prediction of lycopene content in tomato fruit was moderately higher at 1.79 mg/kg compared to the RMSECV of 1.56 mg/kg [112]. However, the performance of the test set from Cozzolino et al. [109] was much worse, with an R 2 p of 0.73 and RMSEP of 4733 mg/100 g (compared to an R 2 CV of 0.93 and RMSECV of 1839 mg/100 g for cross-validation).
Using MIRS, Hu et al. [31] found that the test set statistics for the prediction of (+)-catechin, (+)-epicatechin and total phenolics in chocolate were quite comparable to the cross-validation statistics. However, the RMSEP for prediction of total antioxidant capacity (TAC) in the same samples was 3-12 times higher than the RMSECV, suggesting that MIRS was not suitable for the accurate estimation of TAC in this matrix. These few examples illustrate how over-optimistic the reported results are likely to be when no test set, or only a dependent test set, is used for model validation.
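One simple way to express this optimism is the ratio of RMSEP to RMSECV, which is dimensionless and therefore comparable across analytes. The sketch below applies it to two of the NIRS results quoted above; the function name and the choice of this ratio as a summary are illustrative assumptions rather than a metric used by the cited authors.

```python
def inflation_ratio(rmsecv, rmsep):
    """Ratio of test set error to cross-validation error (1.0 = no inflation)."""
    if rmsecv <= 0:
        raise ValueError("RMSECV must be positive")
    return rmsep / rmsecv


# (label, RMSECV, RMSEP) as quoted in the text; the units cancel in the ratio.
reported = [
    ("lycopene in tomato [112]", 1.56, 1.79),
    ("ascorbic acid in Kakadu plum powder [109]", 1839.0, 4733.0),
]

ratios = {label: inflation_ratio(cv, p) for label, cv, p in reported}
for label, ratio in ratios.items():
    print(f"{label}: RMSEP/RMSECV = {ratio:.2f}")
```

On these figures the tomato model's error rose by roughly 15% on the semi-independent test set, whereas the Kakadu plum model's error was about 2.6 times its cross-validation value.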

Chemometric Techniques
Nearly all of the publications used partial least squares (PLS) regression or some derivative of this regression technique for model development. Tschannerl et al. [118] used Support Vector Regression (SVR), a quantitative form of Support Vector Machines (SVM), for the prediction of total phenolic content in barley malt samples. However, only 10 samples were investigated in that study, with no independent test set used. Zhang et al. [106] also used SVR for the prediction of phenolic content in wine grape skins and seeds, demonstrating that for most analytes, the use of SVR gave better results than PLS or principal component regression (PCR). Xiao et al. [102] used a Least Squares Support Vector Machine (LS-SVM) algorithm for the prediction of total phenolics in white and red grapes, again with better results found compared to the standard PLS algorithm. Finally, Ding et al. [113] compared the use of Radial Basis Function Neural Networks (RBF-NN) and PLS in dehydrated tomato samples, finding that RBF-NN performed better for lycopene, total phenolic content and total antioxidant capacity as measured by the DPPH and ABTS assays, while PLS performed better for the prediction of total antioxidant capacity via the FRAP method. Notably, this was the only study to use an artificial neural network (ANN) or deep learning algorithm (in this case, RBF-NN), although machine learning is a topic of increasing interest for other areas of IR spectroscopy [149,150].

Trends by Analyte Class
Another major aspect of interest to researchers in this field is the types of bioactive analyte(s) that have been measured using IR spectroscopy. Consequently, Table 5 presents a break-down of the studies included in this review by the compound class of the reported analytes. Additionally, the major classes are discussed in the following sections.
Table 5. Number of studies included in this review, broken down by analyte class. Note that if the same study investigated multiple matrices or investigated more than one analyte class in the same matrix, it was counted separately.

Polyphenols
The greatest number of studies examined for the purpose of this review focused on predicting the total polyphenol content, or the content of specific polyphenol compounds present in the matrix (Table 5), with over half of all investigations focused on these analytes. There is an ongoing interest in biochemical characterisation and quantification of polyphenols across a wide range of food products, given that compounds from this class have been associated with a wide range of potential health-benefiting effects [151][152][153][154][155], particularly in improving cardiovascular health [156][157][158][159][160]. Consequently, the rapid prediction of total polyphenol content using infrared spectroscopy could greatly benefit the effectiveness and robustness of the quality assurance of functional food products [40,96].
Ferrer-Gallego et al. [161] provided a recent review of the use of vibrational spectroscopy in the prediction of the phenolic composition of grapes and wines, although other food matrices were not considered in their review. The authors considered that this technique showed considerable promise for this purpose, although noted that future studies on grapes and wine should incorporate a wider range of environmental and genotypic variation.
Some authors have reported difficulty in creating robust models for the prediction of total polyphenols using infrared spectroscopy. For example, Martín-Tornero et al. [162] found that NIRS and MIRS could only be used as a screening method for the total polyphenol content in grape leaves, due to the high prediction errors associated with the models created. These authors used a dependent test set (leaves collected from different locations within the same vineyards). In blackberry fruit, the best model for total phenolics reported by Toledo-Martín et al. [100] had an R 2 CV of 0.69 and RMSECV of 169 mg/100 g. Again, the cross-validation samples used in this study were randomly selected from the same population as the calibration samples; consequently, the model performance on an independent population would likely be poorer still. Similar results in terms of model accuracy were found by Rodríguez-Pulido et al. [110] in raspberries, Trapani et al. [127] in olive paste and Hernández-Hernández et al. [131] in cocoa bean, while quite poor cross-validation results were found by Nogales-Bueno et al. [133] for the prediction of total phenolic content (TPC) in coffee bean using NIR hyperspectral imaging. As the mean TPC of the samples was 3.6% w/w, the poor performance appears more attributable to the reproducibility of sample presentation or the wavelength selection, rather than the concentration of the analyte.
In contrast, Tzanova et al. [101] and Jara-Palacios et al. [105] reported quite good findings for the prediction of total polyphenol content in grapes and grape pomace, respectively (R 2 CV = 0.87-0.97; RMSECV = 9.6-21 mg/100 g), indicating that the instrument choice, geometry and data processing techniques may have an influence in addition to the matrix type. However, it is important to note that none of the aforementioned studies on the prediction of total phenolic content used an independent test set; therefore, the results should be taken with caution.
There do not appear to have been any studies that focused on the IR quantitation of specific phenolic compounds or total phenolic content in model systems; hence, it is difficult to know what limit of detection and level of error to expect when using IR spectroscopy for this purpose. Although Abbas et al. [9] used MIRS for the qualitative identification of 36 phenolic compounds (presented in powder form), they did not attempt the quantitation of these compounds in a model matrix.

Anthocyanins
The second-most common analyte type investigated using infrared spectroscopy was anthocyanins. Most of these studies (13 out of 17) looked at the total anthocyanin content, while only 4 studies attempted the prediction of specific anthocyanins. As a class of flavonoids, anthocyanins are less abundant than total polyphenols, so would be expected to be a more challenging target for infrared spectroscopy. Anthocyanins are brightly coloured and absorb light at around 520 nm; hence, it may be thought that they could be detected using the visible wavelengths of Vis-NIR instruments. Surprisingly, however, all but one of the studies using NIRS for the measurement of anthocyanins excluded the visible light region from the optimised models, indicating that the infrared region actually contained most of the functional information pertaining to the anthocyanin content. Given the low concentration of anthocyanins, their prediction through NIRS is likely to rely upon secondary correlations with other matrix constituents.
Most studies using NIRS reported reasonably high accuracies for anthocyanin prediction in fresh sample matrices (R 2 CV = 0.72-0.98; RMSECV = 9-13 mg/100 g), while MIRS performed similarly well for the estimation of anthocyanin content in soybean, grape juice and red wine. However, Rodríguez-Pulido et al. [110] found reduced model linearity using NIRS in raspberry fruit (R 2 CV = 0.63), although the RMSECV obtained was roughly comparable at 12 mg/100 g.
Studies attempting the prediction of individual anthocyanins in red grapes [103] and wine [92,147] found that the concentrations of most of these compounds could be predicted with only slightly lower accuracy compared to the total anthocyanin content. Given the very low concentrations of many of these compounds, it is likely that the created models were indirectly measuring their concentration via their correlation with other, more abundant compounds which are more readily detected using infrared spectroscopy (possibly the predominant individual anthocyanin compounds present in the sample). Somewhat confusingly, many of the studies reported the anthocyanin content in units of mg/L of the sample extracts, rather than being correctly reported in mg/g or mg/100 g of the intact fruit from which the infrared spectra were obtained. Hence the results of these models should be interpreted with some degree of prudence. Future researchers in this area should be aware of and avoid this common pitfall.
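The unit issue noted above can be made concrete with a short conversion from an extract concentration (mg/L) to a content per 100 g of the extracted sample. This is only a sketch: it assumes that the total extract volume and the extracted sample mass are known, that extraction is exhaustive, and that no further dilution was applied before measurement. The function name and the example values are hypothetical.

```python
def extract_to_sample_basis(conc_mg_per_l, extract_volume_ml, sample_mass_g):
    """Convert an extract concentration (mg/L) to mg per 100 g of sample."""
    if extract_volume_ml <= 0 or sample_mass_g <= 0:
        raise ValueError("volume and mass must be positive")
    # Total analyte mass recovered in the extract (mg).
    analyte_mg = conc_mg_per_l * (extract_volume_ml / 1000.0)
    # Express relative to the mass of sample extracted, scaled to 100 g.
    return analyte_mg / sample_mass_g * 100.0


# Hypothetical example: 50 mg/L measured in 25 mL of extract from 5 g of fruit.
content = extract_to_sample_basis(50.0, 25.0, 5.0)
print(f"{content:.1f} mg/100 g")
```

In this example the extract contains 1.25 mg of analyte recovered from 5 g of fruit, corresponding to 25 mg/100 g. The same mg/L value would correspond to a different content if the solvent-to-sample ratio changed, which is why extract-based units are difficult to compare between studies.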

Carotenoids
In contrast to the trends observed for anthocyanins, most studies investigating carotenoids using infrared spectroscopy attempted the prediction of specific carotenoid compounds (β-carotene, lycopene) rather than the total carotenoid content. Additionally, all of the studies attempting carotenoid prediction were performed using NIRS.
Toledo-Martín et al. [100] found acceptable results for the total carotenoid content in blackberry (R 2 CV = 0.76, RMSECV = 0.01 mg/100 g), with the carotenoid model outperforming that developed for total phenolic content in the same crop. Higher model accuracies (R 2 CV > 0.9; RMSECV < 0.01 mg/100 g) were reported for β-carotene content in carrot [115] and Marsh grapefruit [13], as well as for total carotenoids in honey [139].

Ascorbic Acid
Studies using infrared spectroscopy (NIRS or MIRS) for the estimation of ascorbic acid content were performed in Kakadu plum powder [109], carrot [115], frozen guava pulp [107], cashew apple and guava nectar [135], and soft drinks [138]. Most models showed reasonable accuracy (R 2 CV = 0.7-0.98; RMSECV = 4-7 mg/100 g). Due to the exceptionally high ascorbic acid content in Kakadu plum (mean content of 14,323 mg/100 g), the RMSECV values of Cozzolino et al. [109] were much higher at 1811-1839 mg/100 g. The model linearity was quite high (R 2 CV = 0.91-0.93), with an RPD of 4.0-4.1, although the independent test set validation (comprising commercially purchased samples of Kakadu plum powder) gave a high RMSEP. None of the other aforementioned studies validated their models using independent test sets; they used only dependent test sets (comprising randomly selected samples from the full dataset).
The study by Cozzolino et al. [109] was also the only study to compare the performance of NIRS and MIRS for predicting ascorbic acid content, finding a slightly improved accuracy of NIRS compared to MIRS in dried Kakadu plum powder.
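Because absolute RMSECV values scale with the analyte concentration, it can help to express the Kakadu plum figures in relative terms. The sketch below computes the RMSECV as a percentage of the mean content and uses the usual definition of the ratio of performance to deviation (RPD = SD / RMSE) to back-calculate the implied standard deviation of the reference values. The function names are illustrative, and pairing the lower RPD with the higher RMSECV (and vice versa) is an assumption, since the source reports only ranges.

```python
def relative_rmse_percent(rmse, mean_reference):
    """RMSE expressed as a percentage of the mean reference value."""
    if mean_reference <= 0:
        raise ValueError("mean reference value must be positive")
    return 100.0 * rmse / mean_reference


def implied_sd(rpd, rmse):
    """Standard deviation implied by RPD = SD / RMSE."""
    return rpd * rmse


mean_content = 14323.0  # mg/100 g, mean ascorbic acid content quoted in the text

for rmsecv in (1811.0, 1839.0):
    print(f"RMSECV {rmsecv:.0f} mg/100 g = "
          f"{relative_rmse_percent(rmsecv, mean_content):.1f}% of the mean")

# Assumed pairing of the reported ranges: (RPD 4.0, RMSECV 1839), (RPD 4.1, RMSECV 1811).
sd_estimates = [implied_sd(4.0, 1839.0), implied_sd(4.1, 1811.0)]
print([round(sd) for sd in sd_estimates])
```

Under these assumptions, the cross-validation error is about 13% of the mean content, and both pairings imply a reference standard deviation of roughly 7400 mg/100 g.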

Other Analytes
Other bioactive compounds assessed using infrared spectroscopy included chlorophylls, fatty acid esters, squalene and tocopherols (compounds related to vitamin E) in olive oil, piperine in black pepper, caffeine in black tea, and theobromine in cocoa bean. In general, good results were found for caffeine and for most tocopherols and fatty acids, while moderately accurate results were found for theobromine, squalene, chlorophylls and piperine (R 2 CV = 0.7-0.8). However, it should be noted that most of these analytes were only investigated in a single study. Nevertheless, these results support the use of infrared spectroscopy as an adaptable tool for the rapid estimation of a wide range of bioactive compounds in food-based matrices.

Future Directions
As found throughout this review, IR spectroscopy shows considerable potential for the quantification and relative prediction of the levels of bioactive components in food matrices. Although the research has mostly been presented as proof-of-concept work and/or conducted under controlled laboratory conditions, interest and applications in this field are likely to continue to grow. A brief discussion on several particular aspects worth noting is provided here.
Hyperspectral imaging is a rapidly growing area of research in the food science sector, particularly for the determination of food quality and safety [163][164][165][166], but also for authentication purposes [167,168]. This technique can collect near-infrared spectra from each pixel in a photograph (creating a 'hypercube' dataset), allowing for analysis of the spatial variation of the analyte, in addition to its mean concentration in the sample. Consequently, hyperspectral imaging could potentially be used for the quantification of bioactive compounds [169]. Indeed, several of the studies reviewed here applied hyperspectral imaging for the estimation of anthocyanins and phenolic acids in grapes and grape byproducts [103][104][105][106], and for the estimation of phenolics in barley malt [118] and coffee beans [133]. However, there has been limited uptake of hyperspectral imaging systems in industrial settings to date, owing to challenges such as obtaining reproducible sample presentation, minimising the effects of ambient light, and the complexity of data analysis [166]. Furthermore, hyperspectral imaging can only be used with a reflectance geometry. Additionally, the cost of these instruments remains quite high; thus, they are only used for applications which have a need for spatial information.
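To illustrate the hypercube structure described above, the following sketch stores a tiny image as nested lists indexed by row, column and spectral band, and averages the spectra of the pixels selected by a mask (for example, pixels belonging to the sample rather than the background). The data layout, function name and reflectance values are hypothetical; real instruments and software typically use array libraries and their own file formats.

```python
def mean_spectrum(hypercube, mask=None):
    """Average the spectra of all pixels (or only masked pixels) in a
    rows x columns x bands hypercube stored as nested lists."""
    n_bands = len(hypercube[0][0])
    totals = [0.0] * n_bands
    count = 0
    for i, row in enumerate(hypercube):
        for j, pixel in enumerate(row):
            if mask is not None and not mask[i][j]:
                continue  # skip pixels outside the region of interest
            for b, value in enumerate(pixel):
                totals[b] += value
            count += 1
    if count == 0:
        raise ValueError("no pixels selected")
    return [t / count for t in totals]


# Hypothetical 2 x 2 image with 3 spectral bands per pixel.
cube = [
    [[0.10, 0.20, 0.30], [0.30, 0.40, 0.50]],
    [[0.20, 0.30, 0.40], [0.00, 0.00, 0.00]],  # last pixel: background
]
mask = [[True, True], [True, False]]

sample_spectrum = mean_spectrum(cube, mask)
print([round(v, 3) for v in sample_spectrum])
```

The mean spectrum of the masked region can be passed to a calibration model in the same way as a conventional NIR spectrum, while applying the model to each pixel individually would give a map of the predicted analyte distribution.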
The use of IR spectroscopy as a real-time, online (or "inline") process analytical technology is another principal area of interest. NIRS is already commonly used in manufacturing environments and processing plants for the online analysis of a range of food products, principally for the determination of proximate quality parameters, such as moisture/dry matter content, soluble solids and protein [170,171]. This real-time information can then be fed back into the manufacturing system, allowing various processing parameters to be adjusted accordingly in view of maintaining the optimal quality of the product. With emerging interest in bioactive compounds in functional food products, online NIRS could potentially be extended to the quality assurance of the presence of these compounds in addition to existing analytes already being monitored.
Finally, it is worth noting the importance of confirming the accuracy and reproducibility of infrared spectroscopy techniques using sufficiently large sample sizes and test sets which are independent of the calibration sets. Given that only a small fraction of the studies reviewed here used a fully independent test set for model validation purposes, it is likely that the reported accuracy is in many instances over-optimistic and not representative of the true accuracy which could be expected if applying the model for routine quality assurance purposes.

Conclusions
The technique of IR spectroscopy has enjoyed considerable success in the food analysis industry over the past few decades. In recent years, an increasing number of studies have explored the use of this technology for the analysis of bioactive compounds in food products, such as polyphenols, anthocyanins or carotenoids. While much reported work is still in the proof-of-concept or method development stage, IR spectroscopy appears to show promise for the relative assessment (if not absolute quantification) of these bioactive analytes. In particular, the ease of sample preparation and reasonable accuracy of MIR-ATR (comparable to NIRS in many studies) would appear to make this technology suitable for a wide range of applications in the food industry.