Critical Review on the Utilization of Handheld and Portable Raman Spectrometry in Meat Science

Traditional methods for determining meat quality parameters are rather time-consuming and destructive, whereas spectroscopic methods offer fast and non-invasive measurements. This review critically examines the application of handheld and portable Raman devices in the meat sector. Some published articles on this topic tend to convey the impression that these devices can be applied in this field of research without restriction. Furthermore, results are often interpreted over-optimistically without being underpinned by adequate test set validation. On the other hand, deviations in the reference methods for meat quality assessment and the inhomogeneity of the meat matrix pose a challenge to Raman spectroscopy and multivariate models. Nonetheless, handheld and portable Raman devices show considerable potential for some applications in the meat sector.


Introduction
The meat industry and meat science have put forth numerous analytical methods for meat quality assessment in order to ensure edibility, to comply with control authority regulations, and to offer consumers high-quality products that satisfy their needs. Improvements of old and established technologies, technological advances [1,2] and consumers' constant demand for safe products of guaranteed quality have all contributed to this development. Apart from that, meat scandals making headline news around the world (e.g., the horse meat scandal in Europe in 2013, the export of rotten beef from Brazil in 2017 or reports on hygiene breaches in meat plants in the UK in 2018) raised awareness of meat quality and added to the growing interest and faster progress in meat quality assurance [3–7].
Among both consumers and producers, the taste experience and palatability of meat are considered the most important characteristics of meat and meat products. The palatability of meat is influenced by flavor, juiciness and tenderness, and according to Ref. [8], each of these three traits was highly correlated with overall acceptability in a study with untrained consumer panels. Still, the evaluation of these meat quality parameters by human test persons is affected by subjective perception [9]. More objective tenderness assessment methods include, for example, the Warner-Bratzler shear force (WBSF) test [10] and particle size analysis based on laser diffraction [11]. However, the popular and widely applied WBSF test in particular shows discrepancies when compared to panel ratings, most likely due to the measurement setup and sample orientation [10,12]. A compact summary of instrumental methods for tenderness assessment can be found in Ref. [10].
[…] are caused. Although Raman spectroscopy focuses on vibrational energy level transitions, transitions between electronic and rotational energy levels also occur. Raman spectra can provide structural as well as qualitative information about a substance [45].
When it comes to the analysis of spectral data, multivariate methods are often used to calibrate spectra against reference data. This calibration is performed with a calibration (or training) set, which must contain all the variability expected in future samples. The resulting modeling error, the root mean square error of calibration (RMSEC), is often highly over-optimistic with regard to prediction ability. A closer estimate of a model's performance in predicting unknown samples is the root mean square error of cross validation (RMSECV). This error results from a cross validation procedure in which every sample is left out of the calibration once and subsequently predicted with the calibration model built without it. This approach is referred to as full cross validation or leave-one-out cross validation (LOO-CV). Another cross validation technique is known as segmented (or k-fold) cross validation, in which the calibration samples are grouped into a certain number of segments. Each segment is then left out of the calibration process once and predicted with the calibration model built from the remaining segments. Although the RMSECV represents a more realistic estimate of a model's prediction ability than the RMSEC, the samples used in the cross validation process still originate from the calibration set. Thus, the cross validated error cannot be considered a reliable estimate of the future prediction performance of a multivariate model on new and unknown samples. The only way to obtain an estimate of the expected error for the prediction of new and unknown samples is to use a set of samples with known reference values that have never been involved in the calibration process. Such a set of samples is called a test set or independent validation set, and the resulting error is referred to as the root mean square error of prediction (RMSEP).
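To make the three error types concrete, the following Python sketch computes RMSEC, RMSECV (leave-one-out and segmented, with contiguous segments) and RMSEP. It is a minimal illustration under stated assumptions: an ordinary least squares model with intercept stands in for PLS-R, the data are synthetic, and all function names are hypothetical rather than taken from any of the reviewed studies.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with intercept; a stand-in for PLS-R (assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def rmsecv(X, y, n_segments):
    """Segmented CV with contiguous segments; n_segments == len(y) gives LOO-CV."""
    idx = np.arange(len(y))
    y_cv = np.empty(len(y))
    for segment in np.array_split(idx, n_segments):
        train = np.setdiff1d(idx, segment)              # all samples outside this segment
        coef = fit_linear(X[train], y[train])           # model built without the segment
        y_cv[segment] = predict_linear(coef, X[segment])
    return rmse(y, y_cv)

# Hypothetical synthetic data: 60 calibration samples, 30 test samples never used in calibration
rng = np.random.default_rng(0)
X_all = rng.normal(size=(90, 5))
y_all = X_all @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.2, size=90)
X_cal, y_cal, X_test, y_test = X_all[:60], y_all[:60], X_all[60:], y_all[60:]

coef = fit_linear(X_cal, y_cal)
rmsec = rmse(y_cal, predict_linear(coef, X_cal))
rmsecv_loo = rmsecv(X_cal, y_cal, n_segments=len(y_cal))
rmsecv_5seg = rmsecv(X_cal, y_cal, n_segments=5)
rmsep = rmse(y_test, predict_linear(coef, X_test))
print(f"RMSEC={rmsec:.3f} RMSECV(LOO)={rmsecv_loo:.3f} "
      f"RMSECV(5 seg.)={rmsecv_5seg:.3f} RMSEP={rmsep:.3f}")
```

Because these synthetic calibration and test samples come from the same distribution by construction, RMSECV and RMSEP will usually be close here. The gap the text warns about appears when test samples differ systematically from the calibration samples, as new meat samples from another procurement typically do.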
Test set validation is essential for estimating a model's future performance in predicting completely new and unknown samples and for avoiding both overfitting and underfitting. However, the literature differs on the required size of the validation set: some sources suggest that the validation set should contain a similar number of samples as the calibration set, while others recommend keeping approximately one third of the data out of the calibration for use as test samples [46,47]. Because meat is a very inhomogeneous sample matrix subject to a high level of biological variability due to various influences such as breed, feed, conditions of livestock farming and age (to name just a few), a proper test set validation is mandatory for reliable statements on future samples. In our view, the ideal procedure for developing spectroscopic applications in meat science would be to procure samples for establishing a calibration model at one time and then to procure new samples at another time for testing the prediction ability (performance) of the developed calibration model. Validation approaches other than the one just described, especially cross validation, can be expected to lead to biased, less trustworthy or even useless and impractical multivariate models, owing to the pronounced inhomogeneity and biological variability of meat samples described above. Studies without an explicit independent validation set should clearly point this out and draw conclusions carefully.
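The time-separated procedure recommended above amounts to splitting samples by procurement date rather than at random. A minimal Python sketch follows; the record layout, field names and dates are hypothetical and only illustrate the split logic.

```python
from datetime import date

def split_by_procurement_date(samples, cutoff):
    """Calibration: samples procured before `cutoff`; test: samples procured on or after it."""
    calibration = [s for s in samples if s["procured"] < cutoff]
    test = [s for s in samples if s["procured"] >= cutoff]
    return calibration, test

# Hypothetical sample records; only IDs and procurement dates are needed for the split
samples = [
    {"id": "S01", "procured": date(2018, 3, 5)},
    {"id": "S02", "procured": date(2018, 3, 5)},
    {"id": "S03", "procured": date(2018, 3, 12)},
    {"id": "S04", "procured": date(2018, 3, 12)},
    {"id": "S05", "procured": date(2018, 6, 4)},
    {"id": "S06", "procured": date(2018, 6, 4)},
]
cal, test = split_by_procurement_date(samples, cutoff=date(2018, 6, 1))
test_fraction = len(test) / len(samples)
print([s["id"] for s in cal], [s["id"] for s in test], round(test_fraction, 2))
```

Unlike a random split, no test sample shares a procurement date with any calibration sample, so batch-specific effects cannot leak into the calibration.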
In the following sections, peer-reviewed literature published between 2010 and 2018 on the utilization of handheld and portable Raman spectrometers in meat science is critically reviewed. The advantages of handheld Raman devices compared to benchtop Raman spectrometers include the ability to perform in-field measurements (abattoir, retail, etc.), a robust design and usually simple handling. However, spectra from handheld devices are often less reproducible, less accurate and more affected by noise, which can be a significant drawback, especially in food science with its complex matrices. For a comprehensive review on quality assessment of meat and fish with benchtop Raman devices, the interested reader is referred to Ref. [44].

Prediction of Eating Quality Traits with Raman Spectroscopy
The sensory attributes of lamb and beef have been investigated with traditional methods, and attempts have been made to correlate these results with Raman spectra from handheld devices [Schmidt et al. (2013) [48], Fowler et al. (2014a, 2014b, 2015b, 2018) [49–52], Bauer et al. (2016) [53]]. One of the most frequently studied traits in these works is tenderness, measured as shear force. All attempts to correlate Raman spectra with shear force measurements can be considered insufficient for routine application, as the best prediction error (root mean square error of prediction, RMSEP) obtained with partial least squares regression (PLS-R) models was about 20% of the calibration range (normalized RMSEP, NRMSEP; see Equation (1)), with coefficients of determination of R²_VAL = 0.23 and 0.33 for two models built on different data sets [Bauer et al. (2016) [53]]. These low R²_VAL values from Bauer et al. (2016) [53] illustrate the poor predictive power of PLS-R for shear force from Raman spectra.
Unfortunately, Bauer et al. (2016) [53] are the only ones who utilized an independent validation set (i.e., test set), although such a set is mandatory for testing the performance of a PLS-R model [46], and despite the fact that the other publications had enough samples to test their PLS-R models for shear force with at least a few validation samples. Schmidt et al. (2013) [48] reported cross validation errors (RMSECV) between 26% and 31% for the prediction of shear force of 140 lamb samples from two different sites, but they did not state how this percentage error was calculated. Regardless of how it was calculated, the authors' statement that the Raman data can be correlated with shear force measurements must be treated with great caution. Furthermore, they only reported R²_CAL values, which is insufficient information for a PLS-R model when neither R²_CV nor R²_VAL is given. In contrast, Fowler et al. (2014a, 2014b, 2015b, 2018) [49–52] concluded from their studies that there was little or no ability to predict the shear force of lamb or beef from Raman spectra recorded with a handheld device. Figure 1, from Fowler et al. (2014a) [49], clearly shows the poor correlation between reference and predicted shear force values. Table 1 gives a brief overview of the references that attempted to predict shear force and other target figures of various meat types using handheld Raman devices. A more promising approach seems to be the classification of meat samples into tender and tough samples, as described by Bauer et al. (2016) [53]. The authors set five different shear force thresholds and investigated the ability of partial least squares discriminant analysis (PLS-DA) to differentiate between tender and tough beef samples based on Raman spectra.
The classification accuracy of the PLS-DA varied across the thresholds between 59% and 80% correctly classified samples in the validation set (N = 75). Although the accuracy would need to be improved, this approach may help producers rapidly identify the toughest samples. However, it would have been interesting to see how other classification techniques perform, e.g., linear discriminant analysis (LDA) or support vector machine classification (SVM-C). The performance of SVM-C would have been of particular interest, since this method can account for non-linear effects [54].
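The threshold-based classification described above can be sketched in Python as follows: reference shear force values are turned into tender/tough labels at a chosen threshold, continuous PLS-DA outputs are turned into classes, and the percentage of correctly classified validation samples is computed. The 0.5 decision cut on a 0/1 dummy variable is a common PLS-DA convention but an assumption here, as are all numbers and names; Bauer et al. (2016) [53] may have used a different decision rule, and one model would be built per threshold.

```python
import numpy as np

def tough_labels(shear_force, threshold):
    """Reference class per sample: 1 = tough (shear force above threshold), 0 = tender."""
    return (np.asarray(shear_force) > threshold).astype(int)

def plsda_classes(y_hat, cut=0.5):
    """Convert continuous PLS-DA predictions of a 0/1 dummy variable into classes."""
    return (np.asarray(y_hat) >= cut).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Hypothetical validation data for a single threshold
sf_val = np.array([32.0, 45.5, 51.0, 38.2, 60.3])   # reference shear force (N)
y_hat_val = np.array([0.1, 0.7, 0.4, 0.2, 0.9])     # PLS-DA predictions for "tough"
y_ref = tough_labels(sf_val, threshold=42.0)         # -> [0, 1, 1, 0, 1]
y_pred = plsda_classes(y_hat_val)                    # -> [0, 1, 0, 0, 1]
print(accuracy(y_ref, y_pred))                       # 0.8
```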
In the study of Fowler et al. (2018) [52] an untrained sensory panel was used to determine the sensory traits tenderness and juiciness of 45 beef loins in order to investigate potential correlations with Raman spectra. An error of RMSECV = 10.52 scores in a range of 19.8-82.6 scores is reported for predicting tenderness from Raman spectra measured with a handheld device and analyzed using PLS-R.
According to Equation (1), this corresponds to an NRMSECV of approximately 17%, which might be acceptable; however, this error will most probably increase considerably when an independent test set is used for validation. Furthermore, there is a noticeable drop in correlation from calibration to cross validation, from ρ_CAL = 0.60 to ρ_CV = 0.47. The prediction of juiciness scores determined by the same sensory panel from Raman spectra yields an error of NRMSECV = 21% with PLS-R. Consequently, and in contrast to the authors' conclusion, it must be concluded that Raman spectroscopy does not predict juiciness scores sufficiently well. Although the authors consistently use the term RMSEP, they actually calculated an RMSECV, which can lead to misinterpretation of the reported errors (see Section 2.5, 3rd paragraph in Fowler et al. (2018) [52]).
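Equation (1) itself is not reproduced in this section. The percentages quoted throughout are consistent with normalizing an RMSE by the range of the reference values, NRMSE = 100% · RMSE / (y_max − y_min), so the Python sketch below assumes that definition and reproduces three values quoted in this review.

```python
def nrmse_percent(rmse, y_min, y_max):
    """RMSE normalized to the range of the reference values, in percent (assumed form of Eq. (1))."""
    return 100.0 * rmse / (y_max - y_min)

# Values quoted in this review
tenderness = nrmse_percent(10.52, 19.8, 82.6)   # Fowler et al. (2018) [52], RMSECV, tenderness scores
ph_ratio = nrmse_percent(0.30, 5.4, 6.6)        # peak ratio approach in [55], RMSEC, estimated pH range
ph_beef = nrmse_percent(0.87, 5.5, 6.1)         # Fowler et al. (2018) [52], RMSECV, pH at 72 h
print(round(tenderness, 1), round(ph_ratio), round(ph_beef))  # 16.8 25 145
```

Because the denominator is the calibration range, the same RMSE looks better for a wide range than for a narrow one; NRMSE values are therefore only comparable between studies with similar reference ranges.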

Prediction of pH Values with Raman Spectroscopy
Since pH is of particular importance for meat quality evaluation [63], several studies have tried to correlate pH measurements with Raman spectra obtained from handheld devices suitable for mobile use (e.g., [55,56,58]). In study [55], the pH values of 10 pork samples were recorded with a pH electrode, and Raman spectra were recorded with a handheld device, over a period from 30 min to 10 h post mortem. Subsequently, three different approaches were applied to predict the pH of pork from Raman spectra: the peak intensity ratio of two Raman signals originating from phosphate groups at 980 and 1080 cm⁻¹ (for more detailed information, please refer to [55]), multiple linear regression (MLR) and PLS-R. Because no information on the pH range from 30 min to 10 h post mortem was given, the range was estimated here from Figure 5 in [55] in order to assess the order of magnitude of the error. The pH calibration range estimated in this way is approximately pH 6.6–5.4. The peak ratio approach is reported to have a root mean square error of calibration (RMSEC) of 0.30 pH units with R²_CAL = 0.71. According to Equation (1), this yields an NRMSEC of 25%, which is rather high considering that this is a calibration error and that no independent validation samples were used. The MLR and PLS-R approaches both show cross validation errors of RMSECV = 0.22 pH units, with R²_CV values of 0.70 (MLR) and 0.87 (PLS-R). With regard to the calibration range, this error corresponds to roughly NRMSECV = 18%. These values, especially those of the PLS-R, look somewhat promising, but it should be kept in mind that the sample set was quite small and that no validation set was used to test the performance of the cross validated models with independent samples. In the study of Scheier et al. (2014) [56], a larger sample set of 96 pigs was used for pH measurements at 45 min (pH45) and 24 h (pH24, or ultimate pH, pHu) post mortem with a puncture electrode.
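The peak intensity ratio approach can be sketched as a band ratio followed by a univariate linear calibration. The Python sketch below is an assumption-laden simplification: it takes single-point intensities at the sampled wavenumbers closest to 980 and 1080 cm⁻¹, without baseline correction, whereas [55] may define the bands or the preprocessing differently. The function names, the synthetic spectrum and the calibration values are hypothetical, and the sign of the slope is not meant to reflect real data.

```python
import numpy as np

def intensity_at(wavenumbers, spectrum, target):
    """Intensity at the sampled wavenumber closest to `target` (cm^-1)."""
    i = int(np.argmin(np.abs(np.asarray(wavenumbers) - target)))
    return float(spectrum[i])

def band_ratio(wavenumbers, spectrum, band_a=980.0, band_b=1080.0):
    """Ratio of the intensities at two bands (here the phosphate bands at 980 and 1080 cm^-1)."""
    return intensity_at(wavenumbers, spectrum, band_a) / intensity_at(wavenumbers, spectrum, band_b)

def fit_ratio_calibration(ratios, ph_values):
    """Univariate linear calibration: pH = slope * ratio + intercept."""
    slope, intercept = np.polyfit(ratios, ph_values, deg=1)
    return float(slope), float(intercept)

def gaussian(wn, center, height, width=5.0):
    return height * np.exp(-((wn - center) ** 2) / (2 * width ** 2))

# Synthetic spectrum with two Gaussian bands of hypothetical heights 2.0 and 1.0
wn = np.arange(900.0, 1201.0, 1.0)
spectrum = gaussian(wn, 980.0, 2.0) + gaussian(wn, 1080.0, 1.0)
r = band_ratio(wn, spectrum)
print(round(r, 3))  # 2.0

# Hypothetical calibration pairs (ratio, pH)
slope, intercept = fit_ratio_calibration([1.0, 2.0, 3.0], [6.0, 5.8, 5.6])
```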
In the study of Scheier et al. (2014) [56], Raman spectra were recorded between 60 and 120 min post mortem with a handheld device developed for the conditions prevailing in abattoirs. PLS-R with cross validation (random blocks method) was used to create a multivariate model for the prediction of pH in pork. Figure 2 shows the pH reference values plotted against the pH values predicted by the cross validated PLS-R models for pH45 (Figure 2a) and pH24 (Figure 2b). Despite the set of 96 pork samples, no samples were set aside to validate the performance of the PLS-R models. The figures of merit for the calibration look quite promising for both pH45 and pH24, with RMSEC = 0.11 and 0.06 pH units and R²_CAL = 0.82 and 0.85, respectively. The normalized calibration error (NRMSEC; see Equation (1)) is below 10% for both pH measurement times.
As expected, the figures of merit are somewhat worse for the cross validation, with RMSECV = 0.17 and 0.09 pH units and R²_CV = 0.65 and 0.68 for pH45 and pH24, respectively. The R² values reported by Scheier et al. (2014) [56] can be assessed according to Ref. [64]: the coefficients of determination from the cross validation can thus be considered suitable for screening and "approximate" calibrations. Nevertheless, Scheier et al. (2014) [56] reported promising cross validation errors, with NRMSECV values between approximately 10% and 12%, although they used up to 8 latent variables (LVs) in their PLS-R models. This number of LVs is certainly justifiable with regard to the complexity of the meat matrix and the target value, but a comment on the model loadings would still have been interesting.
[…] = 0.75; however, after cross validation, R² dropped to R²_CV = 0.55. This means that only 55% of the variance in the pH reference measurements can be accounted for by the variance in the Raman spectra, which seems rather low if valid conclusions are to be drawn. In the case of pH24, the coefficient of determination after cross validation is even lower, with R²_CV = 0.31. The reported cross validation error for pH35 is RMSECV = 0.09 pH units in a range of 6.09–6.94 pH units, resulting in a normalized error of NRMSECV = 10% (see Equation (1)). The PLS-R model for pH24 yielded an RMSECV of 0.05 pH units in a range of 5.30–5.65 pH units, and thus an NRMSECV of approximately 14%. These cross validation errors again look promising, but they need to be verified with independent validation samples. Considering that 151 meat samples from 48 slaughter batches were available, it is not clear why no samples were kept out of the calibration and cross validation procedure in order to test the performance of the PLS-R models.
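Scheier et al. (2014) [56] used PLS-R with random-blocks cross validation and up to 8 LVs. The Python sketch below shows how an RMSECV curve over the number of LVs might be produced. It assumes a textbook NIPALS PLS1 algorithm, which is not necessarily the implementation used in [56], and synthetic data; all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)          # weight vector
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = (yr @ t) / tt                  # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # b = W (P'W)^-1 q
    return x_mean, y_mean, b

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

def rmsecv_by_lv(X, y, max_lv, n_blocks, seed=0):
    """RMSECV for 1..max_lv LVs; samples are randomly assigned to CV blocks."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    errors = []
    for n_lv in range(1, max_lv + 1):
        y_cv = np.empty(len(y))
        for block in blocks:
            train = np.setdiff1d(np.arange(len(y)), block)
            y_cv[block] = pls1_predict(pls1_fit(X[train], y[train], n_lv), X[block])
        errors.append(float(np.sqrt(np.mean((y - y_cv) ** 2))))
    return errors

# Hypothetical data: 96 "spectra" with 20 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(96, 20))
y = X[:, :3] @ np.array([0.3, -0.2, 0.1]) + rng.normal(scale=0.05, size=96)
errors = rmsecv_by_lv(X, y, max_lv=8, n_blocks=8)
best_lv = int(np.argmin(errors)) + 1
print([round(e, 3) for e in errors], best_lv)
```

Note that choosing the number of LVs from this curve uses the cross validation results for model selection, which is one more reason why RMSECV tends to be optimistic compared with an RMSEP from an independent test set.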
As long as no independent validation set has been predicted with acceptable RMSEP values, the feasibility in principle of predicting the pH of meat with handheld Raman spectrometers should not be concluded. Fowler et al. (2015b) [51] conducted two separate experiments in order to predict (among other meat quality traits) the pH of lamb 24 h post mortem (pH24) and 5 days post mortem (ultimate pH, pHu) from Raman spectra obtained with a handheld device. Each of the two experiments consisted of 80 randomly selected lamb samples, and the measurements were distributed over four consecutive days for experiment 1 and two consecutive days for experiment 2. PLS-R with k-fold cross validation was used to investigate potential correlations of Raman spectra with pH reference values. Although pH was measured with a pH electrode at 24 h and 5 d post mortem, the results for predicting pHu (5 d post mortem) from Raman spectra collected at 5 d post mortem were not reported for experiment 1. Instead, Raman spectra recorded at 24 h post mortem were used to predict both pH24 and pHu. As a result, coefficients of determination of R²_CV = 0.48 and 0.59 were obtained for the prediction of pH24 and pHu, respectively, from Raman spectra recorded at 24 h post mortem. According to Ref. [64], these R² values can be considered acceptable for rough screening. The reported cross validation errors are RMSECV = 0.12 and 0.07 pH units for the prediction of pH24 and pHu, respectively. Normalizing these errors to the calibration range using Equation (1) yields NRMSECV = 17% for pH24 and NRMSECV = 10% for pHu. The errors look quite promising, but considering that they are cross validated errors and that, despite the availability of 80 meat samples, no validation set was used to evaluate the true performance of the PLS-R models, the reported errors must be treated with caution.
This conclusion is further supported by the fact that, as reported by the authors, no reasonable correlation between pH reference values and Raman spectra was obtained in experiment 2. Another study by Fowler et al. (2018) [52] with 45 beef loins showed a very weak correlation between measured and predicted pH at 72 h post mortem (ρ_CAL = 0.12 and ρ_CV = 0.09). The reported cross validation error from a PLS-R analysis is RMSECV = 0.87 pH units, which is even larger than the pH range used for calibration (pH 5.5–6.1) and yields an NRMSECV of 145%. Nache et al. (2016) [58] used 96 pork samples to study the feasibility of predicting pH45 and pH24 from Raman spectra recorded with a handheld device. They also investigated cross-predictions, i.e., the prediction of pH24 from Raman data recorded at 45 min post mortem and vice versa. The authors employed the metaheuristic ant colony optimization (ACO) [65] to select the best spectral regions, in combination with SIMPLS regression [66]. Furthermore, Nache et al. […] [56] used the same pork muscle (m. semimembranosus) and the same number of samples. In their work, they reported a range of pH 5.41–6.8 for pH45 and 5.28–6.13 for pH24. Considering this range, the estimated percentage error for pH45 is approximately 13% (see Equation (1)) for both the cross validation approach (NRMSECV) and the validation set approach (NRMSEP). Owing to different pre-treatments, the error for pH24 in the cross validation approach is estimated to lie between NRMSECV = 12% and 14%. The validation set approach for pH24 yields an NRMSEP of 16%. It is important to emphasize that Nache et al. (2016) [58] tested the performance of their multivariate calibration model with a separate independent validation set; thus, the reported errors for the prediction of pH45 and pH24 from Raman spectra may be assessed as realistic with regard to the complexity of the sample matrix. Unfortunately, coefficients of determination were reported only for the cross validation approach and not for the validation set approach. Nevertheless, R² values between 0.85 and 0.90 for the cross validation approach show the potential of ACO for selecting the most useful spectral regions. The reported figures of merit for the cross-predictions of both pH values were similar to those of the regular predictions.
[…] To this end, samples standardized in size were stored at 4 °C over a period of 48 h, and drip loss was determined from the difference between initial and final weight. The drip loss ranged from 0.7% to 9.2%. The prediction of drip loss with Raman spectra measured at 60–120 min post mortem yielded an error of RMSEC = 0.6% with R²_CAL = 0.9 and RMSECV = 1.0% with R²_CV = 0.73. It is quite difficult to assess the true predictive power of a PLS-R model when only calibration and cross validation data are available. On the one hand, the RMSEC and R²_CAL look very promising for the prediction of L*, but on the other hand, the error more than doubled and the coefficient of determination dropped markedly after cross validation. An increase of the percentage error from NRMSEC = 5% to NRMSECV ≥ 13% (Equation (1)) for L* makes it difficult to estimate a test set validated error. As a consequence, no final conclusions on the expected error in a potential implementation in routine analysis can be derived. Furthermore, an R²_CV of 0.64 for L* can be considered applicable for rough screening [64], which is by no means sufficient for analysis in routine operations in the food sector.
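The drip loss reference described above (weight difference of size-standardized samples after 48 h at 4 °C) can be expressed as a percentage of the initial weight. The Python sketch below assumes that normalization, which is consistent with drip loss being reported in percent but is not spelled out in the text; the weights are hypothetical.

```python
def drip_loss_percent(initial_weight_g, final_weight_g):
    """Weight lost during storage, in percent of the initial weight (assumed definition)."""
    if initial_weight_g <= 0:
        raise ValueError("initial weight must be positive")
    return 100.0 * (initial_weight_g - final_weight_g) / initial_weight_g

print(drip_loss_percent(100.0, 95.0))  # 5.0

# Hypothetical (initial, final) weights in g after 48 h of storage
weights = [(80.0, 79.2), (82.5, 75.1)]
losses = [round(drip_loss_percent(w0, w1), 2) for w0, w1 in weights]
```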
The cross validation results for the prediction of drip loss can be categorized as good enough for screening and some other "approximate" calibrations [64], but no reliable statement on the true predictive power of Raman spectroscopy for drip loss can be made without independent validation samples. […] [57] reported an NRMSECV of more than 17% (see Equation (1)) for L*, which is rather high considering that this error originates from cross validation. However, owing to the very poor coefficient of determination (R² ≤ 0.1), the authors did not consider this PLS-R model to have any predictive power. For drip loss, the authors reported errors of RMSEC = 0.4% and RMSECV = 0.6% with R²_CAL = 0.83 and R²_CV = 0.52. These errors look quite low, but taking the range of the reference measurements into consideration, the NRMSECV is somewhat higher than 14% (see Equation (1)). Additionally, the significant drop in the coefficient of determination from calibration to cross validation has to be noted. Accordingly, higher NRMSEP values and lower correlations (R²_VAL) are to be expected when an independent validation set is used to test the performance of the drip loss model. […] [51], the failure of this work to predict L* and drip loss from Raman spectra demonstrates that any results concerning these two parameters should be viewed with caution, especially if they are not test set validated.

Meat Spoilage Identification with Raman Spectroscopy
Schmidt et al. (2010) [59] presented a prototype handheld Raman sensor for mobile use in meat quality analysis. After testing the wavelength and intensity stability of this prototype, pork samples were investigated with regard to time-dependent changes in their Raman spectra. For this purpose, pork samples (number not specified by the authors) were stored at 5 °C in plastic packaging and measured on days 2, 3, 6–17, 20 and 21 post mortem (three weeks), both through the packaging and without it. Microbiological reference analysis was performed and expressed in terms of colony-forming units (cfu). Data analysis was conducted using principal component analysis (PCA). The results for both approaches (packaged and unpackaged) are not straightforward, as the authors reported a deterioration of the signal-to-background ratio with progressive storage time, which they attributed to laser-induced fluorescence (LIF) from porphyrins. This decrease in the signal-to-background ratio is mainly responsible for the separation of the time-dependent Raman spectra along principal component 1 (PC 1) of the PCA. By interpreting the corresponding PC loadings, Schmidt et al. (2010) [59] assigned PC 2 and PC 3 to changes in protein structure occurring during storage and to aromatic amino acids. Furthermore, a cluster formation along PC 1 of days 2–6 (unspoiled) and days 7–9 (incipient spoilage) can be seen, which is in accordance with the reference analysis, where the threshold of 10⁶ cfu/cm² is reached by days 5–6. Besides this, the authors claimed that PCA was able to indicate the onset of the steady state of the sigmoid bacterial growth curve, which the reference analysis placed at days 9–10. As a consequence, Schmidt et al. (2010) [59] reported that the steady state can be separated from the exponential growth phase of the bacteria (up to day 9 to 10) with PCA using PC 1 and PC 2.
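The effect described above, where a storage-time-dependent background rather than specific Raman bands drives PC 1, can be illustrated with a small PCA computed via singular value decomposition. The Python sketch below uses entirely synthetic spectra: a constant narrow band plus a broad background that grows with storage day, standing in for fluorescence. It does not reproduce the data of [59], and the function name is hypothetical.

```python
import numpy as np

def pca(spectra, n_components):
    """PCA of mean-centred spectra (rows = spectra, columns = wavenumbers) via SVD."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on PC 1, PC 2, ...
    loadings = Vt[:n_components].T                    # weight of each wavenumber in each PC
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance per PC
    return mean, scores, loadings, explained[:n_components]

# Synthetic storage series (hypothetical): days 2..21
days = np.arange(2, 22)
wn = np.linspace(600.0, 1800.0, 400)
band = np.exp(-((wn - 1003.0) ** 2) / (2 * 4.0 ** 2))           # constant narrow band
background = np.exp(-((wn - 1400.0) ** 2) / (2 * 400.0 ** 2))   # broad, fluorescence-like
rng = np.random.default_rng(5)
spectra = (band + 0.05 * days[:, None] * background
           + rng.normal(scale=0.001, size=(len(days), wn.size)))

mean, scores, loadings, explained = pca(spectra, n_components=2)
print(np.round(explained, 4), round(float(np.corrcoef(scores[:, 0], days)[0, 1]), 3))
```

In this toy case PC 1 tracks storage time almost perfectly although no band changes at all, which is why a clustering along PC 1 alone says little about spoilage chemistry.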
The interpretation of the PCA plot presented by Schmidt et al. (2010) [59] is conclusive, although it should be noted that no validation took place and that no classification in the strict sense (e.g., LDA, SVM-C, projection of unknown samples onto a PCA model) based on the PCA interpretation was conducted. Notably, the authors state that they cannot conclude that the bacteria caused the differences in the Raman spectra leading to the clustering in the PCA; instead, they claimed to detect the effects of increasing bacterial growth over time. Additionally, Schmidt et al. (2010) [59] performed a PLS-R using the bacterial surface concentration on each day and the corresponding Raman spectra. Unfortunately, the authors only reported a coefficient of determination of R² = 0.969 with 5 LVs and did not provide detailed information on how the PLS-R was performed. Thus, no reasonable statement can be made regarding the feasibility of predicting bacterial concentration from Raman spectra using PLS-R.
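Projecting unknown samples onto a PCA model, one of the classification options named above, can be sketched as follows: the PCA model is built from calibration spectra only, new spectra are projected with its mean and loadings, and each new sample is assigned to the nearest class centroid in score space. Nearest-centroid assignment is a deliberately simple stand-in chosen for illustration; LDA or SVM-C would be applied to the scores analogously. Data, class labels and function names are hypothetical.

```python
import numpy as np

def fit_pca_model(train_spectra, n_components):
    """PCA model (mean and loadings) built from calibration spectra only."""
    mean = train_spectra.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_spectra - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def project(model, spectra):
    """Scores of (new) spectra on an existing PCA model."""
    mean, loadings = model
    return (spectra - mean) @ loadings

def nearest_centroid(train_scores, train_labels, new_scores):
    """Assign each new sample to the class whose mean calibration score is closest."""
    classes = np.unique(train_labels)
    centroids = np.array([train_scores[train_labels == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(new_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

rng = np.random.default_rng(3)
pattern = rng.normal(size=50)                 # hypothetical spectral difference between classes

def simulate(n, shift):
    return shift * pattern + rng.normal(scale=0.1, size=(n, 50))

cal = np.vstack([simulate(10, 0.0), simulate(10, 1.0)])
cal_labels = np.array(["fresh"] * 10 + ["spoiled"] * 10)
test = np.vstack([simulate(3, 0.0), simulate(3, 1.0)])   # never used to build the model
model = fit_pca_model(cal, n_components=2)
pred = nearest_centroid(project(model, cal), cal_labels, project(model, test))
print(pred)
```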
In the study of Sowoidnich et al. (2012) [60], a portable Raman spectrometer was used for the fast and non-invasive identification of meat spoilage. The sample set consisted of three porcine m. longissimus dorsi (LD) and two porcine m. semimembranosus (SM), with each LD cut into 14 slices and each SM into 16 slices. All samples were stored at 5 °C for up to three weeks post mortem. Raman measurements as well as microbiological reference analyses (total viable counts, TVC) were conducted daily from day 2 to day 21 post mortem. The spectroscopic measurements were analyzed using PCA, and an attempt was made to find patterns that could subsequently be explained by the results of the reference analyses. The authors presented a PCA scores plot of PC 1 and PC 3 for each investigated muscle (Figure 3a,b). Sowoidnich et al. (2012) [60] stated that the PCA scores and the clustering of samples therein reflected characteristic stages of the sigmoid bacterial growth curve, i.e., clusters of fresh samples, samples with incipient spoilage and microbially spoiled samples. By examining the PCA loadings of PC 1, the authors identified mainly vibrations of aromatic amino acids and protein conformation-sensitive vibrations as being responsible for the separation along PC 1 in both muscles. According to the authors, PC 3 was dominated by signal shifts of tyrosine and phenylalanine bands as well as certain CN and CH vibrations. The interpretation of the separation along PC 1 is conclusive and comprehensible, but that of PC 3 is not quite complete. The authors' conclusion that PC 3 reflects changes in protein structure at an early stage might be valid for the fresh samples (days 2–7), but the question remains what the samples arranged along PC 3, which are assigned to different degrees of spoilage, have in common (see Figure 3a,b). However, it is important to mention that the interpretation of PC 3 is not at all straightforward.
In addition to the experiment described above, the authors studied the effect of meat sterilization using six LD muscles. Three of these muscles were sterilized using a 5% sodium hypochlorite solution, while the other three remained untreated. Each sample was cut into 13 slices, sterile packed and stored under the same conditions as in the previous experiment. Reference analyses and Raman measurements were again conducted daily over a period of 2 weeks post mortem. The authors reported different patterns in the PCA scores plot for sterilized and untreated samples. As in the previous experiment, the Raman spectra of the untreated samples revealed a separation into fresh samples, samples with incipient spoilage and microbially spoiled samples along PC 1 and PC 2, in agreement with the reference analyses. However, Sowoidnich et al. (2012) [60] reported that such a pattern was not observed in the PCA scores of the sterilized samples, whose bacterial surface coverage remained below the detection limit; yet there was a storage-time-dependent clustering trend. The authors stated that this separation was not caused by bacteria but by structural changes that take place in the meat during storage. Since neither validation samples nor classification methods (see the previous paragraph on Sowoidnich et al. (2012) [60]) were used to verify the obtained results, the authors' final conclusion that their study demonstrated fast detection of spoiled meat using a portable Raman system is not supported. Rather, the presented results only indicate a potential discrimination between fresh and microbially spoiled meat, which requires further in-depth investigation.

Other Applications of Raman Spectroscopy in Meat Science
Boar Taint: The study of Liu et al. (2016) [61] dealt with the detection and classification of boar taint using a handheld Raman spectrometer. Boar taint is an unpleasant odor that occurs with varying intensity when the fat or meat of non-castrated male pigs is prepared. This undesired feature is caused by the accumulation of androstenone and skatole in the fat tissue of boars [67]. Currently, tainted meat is commonly detected by trained assessors, but chromatographic [68] and spectroscopic methods [69,70] have also been described in the literature. The sample set of Liu et al. (2016) [61] consisted of fat tissue from 46 boars, including skin and the inner (IL) and outer layer (OL) of subcutaneous fat. The reference analyses were conducted with gas chromatography (GC) and a trained sensory panel, and in neither case were IL and OL analyzed separately. For detailed information about the reference analyses, please refer to Liu et al. (2016) [61]. Based on these results, the samples were classified into two categories, samples with high boar taint (labeled H; 21 samples) and samples with low boar taint (labeled L; 25 samples), which served as reference values for the multivariate classification. The Raman spectra were recorded separately on the IL and OL in order to account for any side-dependent effects. Partial least squares discriminant analysis (PLS-DA) was chosen as the multivariate classification technique. The authors created five PLS-DA models with distinct input variables based on the spectra obtained on the IL and OL of the samples: only IL spectra, only OL spectra, IL + OL spectra, average spectra of IL and OL, and ratio spectra of IL/OL. Liu et al. (2016) [61] reported classification accuracies of 45.5–72.1% of samples correctly classified as either H (high taint) or L (low taint). The highest accuracy was achieved using only IL Raman spectra, whereas using only OL Raman spectra yielded the lowest accuracy.
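The five input variants used by Liu et al. (2016) [61] can be assembled from paired IL and OL spectra as in the following Python sketch. Two points are assumptions rather than details from the study: "IL + OL" is interpreted as concatenation along the wavenumber axis (a plain sum would only rescale the average), and the IL/OL ratio presumes strictly positive intensities. The data and the function name are hypothetical.

```python
import numpy as np

def plsda_input_variants(il, ol):
    """Five input matrices from inner-layer (IL) and outer-layer (OL) spectra.

    il, ol: arrays of shape (n_samples, n_wavenumbers), rows aligned by sample.
    """
    if il.shape != ol.shape:
        raise ValueError("IL and OL spectra must have the same shape")
    return {
        "IL": il,
        "OL": ol,
        "IL+OL": np.hstack([il, ol]),       # assumed: concatenated along the wavenumber axis
        "mean(IL,OL)": (il + ol) / 2.0,
        "IL/OL": il / ol,                   # assumes strictly positive intensities
    }

rng = np.random.default_rng(4)
il = rng.uniform(1.0, 2.0, size=(46, 300))   # hypothetical: 46 boars, 300 wavenumbers
ol = rng.uniform(1.0, 2.0, size=(46, 300))
for name, X in plsda_input_variants(il, ol).items():
    print(name, X.shape)
```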
Furthermore, the authors reported an improvement to more than 80% correctly classified samples using only IL Raman spectra by selecting important wavenumbers according to the model's regression coefficients. Although these results look good, it should be kept in mind that the reported classification accuracies were obtained from cross validation. Independent validation samples are therefore mandatory before any conclusion can be drawn about the model's true performance in discriminating between samples with high and low boar taint [46]. Nevertheless, the cross-validated classification accuracies seem promising, and further investigations should be conducted. The results might be further improved by classification techniques that account for non-linear effects, such as SVM-C [54]. In addition, the authors tried to discriminate between four classes: high androstenone/high skatole, high androstenone/low skatole, low androstenone/high skatole and low androstenone/low skatole. However, the authors concluded that the accuracy of this PLS-DA classification model was not satisfactory.
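Why cross-validated accuracies obtained after wavenumber selection call for independent validation can be illustrated with a small NumPy simulation on pure noise, where no real class information exists. The sketch uses a one-component PLS-DA (classes coded 0/1, threshold 0.5). For a single component the regression coefficients are proportional to the covariance vector between centered spectra and centered labels, so ranking wavenumbers by absolute coefficient reduces to ranking them by that covariance. The sample sizes (21 H, 25 L) mirror the study, whereas the 1000 noise variables, the 10 retained wavenumbers and all function names are arbitrary assumptions. Whether Liu et al. (2016) [61] performed the selection inside or outside the cross validation is not stated here; the simulation is a generic illustration, not a reanalysis.

```python
import numpy as np

def fit_pls1(X, y):
    """One-component PLS1 regression; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)          # PLS weight vector
    t = Xc @ w                      # scores
    q = (t @ yc) / (t @ t)          # y-loading
    return x_mean, y_mean, w * q    # regression coefficients b = w * q

def predict_class(model, X):
    x_mean, y_mean, b = model
    return ((X - x_mean) @ b + y_mean > 0.5).astype(int)

def top_k(X, y, k):
    """Indices of the k variables with the largest |coefficient|, i.e. |Xc^T yc|."""
    score = np.abs((X - X.mean(axis=0)).T @ (y - y.mean()))
    return np.argsort(score)[-k:]

def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy with wavenumber selection inside or outside the loop."""
    n = len(y)
    # Selection on ALL samples lets the left-out sample influence the variable choice
    fixed = None if select_inside else top_k(X, y, k)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        cols = top_k(X[train], y[train], k) if select_inside else fixed
        model = fit_pls1(X[train][:, cols], y[train])
        hits += int(predict_class(model, X[i:i + 1, cols])[0] == y[i])
    return hits / n

rng = np.random.default_rng(42)
X = rng.normal(size=(46, 1000))            # "spectra" without any class signal
y = np.array([1] * 21 + [0] * 25)          # 21 high-taint, 25 low-taint labels
naive = loo_accuracy(X, y, k=10, select_inside=False)
honest = loo_accuracy(X, y, k=10, select_inside=True)
print(f"selection outside CV: {naive:.2f}, selection inside CV: {honest:.2f}")
```

On noise, selection outside the cross validation typically produces an accuracy well above chance, whereas selection repeated inside each fold stays near chance level. An independent test set avoids this kind of optimism altogether.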
Intramuscular Fat: Fowler et al. (2015a) [62] investigated the predictability of intramuscular fat (IMF) content and major fatty acid (FA) groups using Raman spectra obtained with a handheld device. The sample set comprised 80 lamb carcases, which were randomly selected and measured on four consecutive days. Reference analysis for IMF was performed by Soxhlet extraction using an adapted AOAC method [19], and major FA groups were determined by extraction and derivatization according to Ponnampalam et al. [71]. The authors employed PLS-R to construct multivariate prediction models for IMF and the major FA groups. Considering the range of each determined quantity, all NRMSECV values (see Equation (1)) are around 20%. However, the reported coefficients of determination of the cross-validated PLS-R models, $R^2_{\mathrm{CV}}$ = 0.01-0.21, are poor and show practically no correlation between the reference values and the values predicted from the Raman spectra. Given these figures of merit, the authors' conclusion that Raman spectroscopy is a promising tool for the prediction of certain major FA groups is difficult to justify. Furthermore, the approach to Raman spectra selection for the prediction of IMF needs to be discussed. The authors selected spectra for the chemometric data analysis in two stages: in the first stage, spectra containing any lipid signals were separated from those containing only meat signals, and in the second stage, spectra containing only lipid signals were separated from spectra that did not contain exclusively lipid signals. This approach is reasonable for the prediction of FA groups from Raman spectra, since the interest is focussed on lipid tissue only.
It is questionable, however, to use only lipid spectra for the prediction of the IMF percentage in meat, because spectra of pure lipid tissue carry no information about the proportion of lipid relative to the rest of the meat sample.
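Why an NRMSECV of about 20% combined with $R^2_{\mathrm{CV}}$ close to zero indicates little predictive ability can be checked numerically. The sketch below assumes that Equation (1) divides RMSECV by the range (maximum minus minimum) of the reference values, and it defines $R^2$ as $1 - SS_{res}/SS_{tot}$. It evaluates a hypothetical "null model" that ignores the spectra entirely and predicts each left-out sample by the mean of the remaining reference values. The 80 normally distributed values are synthetic placeholders, not data from Fowler et al. (2015a) [62].

```python
import numpy as np

def rmse(y_ref, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_ref) - np.asarray(y_pred)) ** 2)))

def nrmse(y_ref, y_pred):
    """RMSE normalized by the range of the reference values (assumed form of Equation (1))."""
    y_ref = np.asarray(y_ref)
    return rmse(y_ref, y_pred) / float(y_ref.max() - y_ref.min())

def r2(y_ref, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    y_ref, y_pred = np.asarray(y_ref), np.asarray(y_pred)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def loo_mean_predictions(y_ref):
    """Null model: predict each sample by the mean of all other samples."""
    y_ref = np.asarray(y_ref, dtype=float)
    n = len(y_ref)
    return (y_ref.sum() - y_ref) / (n - 1)

rng = np.random.default_rng(1)
y_ref = rng.normal(loc=4.0, scale=1.0, size=80)   # hypothetical IMF values in %
y_null = loo_mean_predictions(y_ref)
print(f"null model: NRMSECV = {nrmse(y_ref, y_null):.1%}, R2_CV = {r2(y_ref, y_null):.3f}")
```

For roughly 80 normally distributed values the range spans about five standard deviations, so even this null model reaches an NRMSECV in the region of 20%, while its $R^2_{\mathrm{CV}}$ is slightly negative. An NRMSECV near 20% is therefore not, on its own, evidence that the spectra carry predictive information.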

Conclusions
Despite the well-known advantages of Raman spectroscopy as a fast and non-invasive technique that requires little or no sample pre-treatment, and despite the availability of portable handheld devices, the dependency on reference analysis is a considerable drawback. In meat science, this becomes clear in the case of shear force measurements or the determination of tenderness by a sensory panel. Such meat quality traits in particular suffer from modest reproducibility, large variations and subjective perceptions. To a large extent, this is due to the very complex and inhomogeneous meat matrix, but partly also to the reference methods themselves. It must therefore be concluded that the current reference methods need to be improved or new methods for reference analysis need to be established. Furthermore, due to the small diameter of the laser spot (approximately 50 µm according to Ref. [59]), only a very small part of the sample is irradiated. Given the inhomogeneity of the meat matrix, this impedes the acquisition of representative Raman spectra.
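How small the irradiated area is can be quantified with a short calculation. The sketch assumes a circular spot with the diameter of about 50 µm given in Ref. [59] and compares it with a hypothetical 1 cm² measurement region. It then uses the binomial standard error to show how uncertain the fat fraction seen by n randomly placed spots would be, assuming (hypothetically) that 15% of the surface is fat and that each spot hits either fat or lean tissue independently. The coverage value and the function names are illustrative assumptions.

```python
import math

def spot_area_um2(diameter_um: float = 50.0) -> float:
    """Area of a circular laser spot in square micrometres."""
    return math.pi * (diameter_um / 2.0) ** 2

def area_fraction(diameter_um: float, region_cm2: float) -> float:
    """Fraction of a region covered by one spot (1 cm^2 = 1e8 um^2)."""
    return spot_area_um2(diameter_um) / (region_cm2 * 1e8)

def fat_fraction_se(p_fat: float, n_spots: int) -> float:
    """Binomial standard error of the fat fraction estimated from n random spots,
    assuming each spot hits either fat or lean tissue independently."""
    return math.sqrt(p_fat * (1.0 - p_fat) / n_spots)

print(f"spot area: {spot_area_um2():.0f} um^2")
print(f"fraction of 1 cm^2: {area_fraction(50.0, 1.0):.1e}")
for n in (1, 10, 100):
    print(f"{n:>3} spots: fat fraction 0.15 +/- {fat_fraction_se(0.15, n):.3f}")
```

Under these assumptions, one spot covers only about 2e-5 of a 1 cm² region, a single spot gives a standard error of about 0.36 for a true fat coverage of 0.15, and even 100 spots only reduce it to about 0.036.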
Unfortunately, the majority of the articles reviewed herein did not use independent validation samples to test their multivariate calibration models. Consequently, some authors drew conclusions that should not be drawn from data that have not been test-set validated. In order to assess a spectroscopic method's true performance, reproducibility and robustness, it is therefore essential to use samples that have never been used in any calibration model and to collect an appropriate number of samples over a long period of time. Handheld Raman spectrometers may have a certain potential for meat sector-relevant applications, but without the evaluation of the calibration models with independent validation samples, no well-founded statement about their true applicability can be made. This holds all the more in view of the inhomogeneity and complexity of the meat matrix.
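The requirement that validation samples must never have entered any calibration model, and that samples should be collected over a long period, can be enforced by splitting samples by collection date instead of at random. The following Python sketch assumes that each sample carries a hypothetical identifier and collection date: all samples collected before a cut-off date form the calibration set, and all later samples form the independent validation set. The sample IDs and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Sample:
    sample_id: str      # hypothetical identifier, e.g. carcass number
    collected: date     # date of spectral and reference measurement

def time_split(samples, cutoff: date):
    """Calibration: collected before cutoff. Validation: collected on or after cutoff."""
    calibration = [s for s in samples if s.collected < cutoff]
    validation = [s for s in samples if s.collected >= cutoff]
    # Guard: the same sample (ID), e.g. re-measured on a later day, must not be in both sets
    shared = {s.sample_id for s in calibration} & {s.sample_id for s in validation}
    if shared:
        raise ValueError(f"samples used in calibration and validation: {sorted(shared)}")
    return calibration, validation

samples = [
    Sample("A01", date(2019, 3, 4)),
    Sample("A02", date(2019, 3, 4)),
    Sample("B01", date(2019, 6, 10)),
    Sample("C01", date(2019, 11, 18)),
    Sample("C02", date(2019, 11, 18)),
]
cal, val = time_split(samples, cutoff=date(2019, 9, 1))
print([s.sample_id for s in cal], [s.sample_id for s in val])
```

A time-ordered split also exposes drifts (season, feeding, instrument ageing) that a random split would hide, which is one reason for collecting samples over a long period.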
In the light of the results reviewed herein and depending on the meat producer's needs, it may be worth considering whether to stick with the prediction of individual parameters (e.g., pH, L* value, drip loss) using handheld Raman spectrometers, or to switch to an overall categorization of meat samples into classes of desired and undesired meat quality. In this way, the focus would be on the interaction of all individual meat quality parameters, which might lead to more robust and reliable results. First, the meat samples would be classified into desired and undesired meat quality according to the reference analyses. Subsequently, classification techniques like LDA, SIMCA or SVM-C could be used to create classification models, followed by an evaluation using independent validation samples. This would certainly not satisfy all requirements in the meat sector, but might be a valuable support in certain cases. The application of various multivariate approaches to data produced by non-invasive analytical techniques in food science is reviewed in Ref. [72].
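The proposed two-step workflow (label samples from the reference analyses, then train a classifier and test it on independent samples) is sketched below in Python with NumPy. The quality limits in `quality_class` are hypothetical placeholders that a meat producer would have to define, not recommended values. A two-class Fisher LDA stands in for the LDA, SIMCA or SVM-C models named above, and two synthetic features stand in for spectral variables; LDA on full Raman spectra would first require a dimension reduction such as PCA.

```python
import numpy as np

# Hypothetical placeholder limits for "desired" quality (illustration only)
PH_RANGE = (5.4, 5.8)
MAX_DRIP_LOSS = 5.0          # %
L_STAR_RANGE = (44.0, 53.0)

def quality_class(ph: float, drip_loss: float, l_star: float) -> int:
    """1 = desired quality (all reference values within limits), 0 = undesired."""
    ok = (PH_RANGE[0] <= ph <= PH_RANGE[1]
          and drip_loss <= MAX_DRIP_LOSS
          and L_STAR_RANGE[0] <= l_star <= L_STAR_RANGE[1])
    return int(ok)

def fit_lda(X, y):
    """Two-class Fisher LDA with pooled covariance and equal priors."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    n0, n1 = np.sum(y == 0), np.sum(y == 1)
    pooled = ((n0 - 1) * np.cov(X[y == 0], rowvar=False)
              + (n1 - 1) * np.cov(X[y == 1], rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, m1 - m0)      # discriminant direction
    threshold = w @ (m0 + m1) / 2.0           # midpoint between projected class means
    return w, threshold

def predict(model, X):
    w, threshold = model
    return (X @ w > threshold).astype(int)

rng = np.random.default_rng(7)

def synthetic(n):
    """Synthetic two-feature data standing in for spectral scores."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 3.0, 0.0)
    return X, y

X_cal, y_cal = synthetic(60)   # calibration set
X_val, y_val = synthetic(40)   # independent validation set, never used for fitting
model = fit_lda(X_cal, y_cal)
acc = float(np.mean(predict(model, X_val) == y_val))
print(f"validation accuracy: {acc:.2f}")
```

In a real application, `quality_class` would be applied to the reference analyses of each sample to obtain `y`, and the validation set would consist of samples collected independently of the calibration set.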
Funding: This research was funded by the Interreg V-A initiative of the European Union (project "QualiMeat" AB116).

Acknowledgments:
The authors want to thank the whole team of the project "QualiMeat" for the good cooperation and support.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: