Ongoing Multivariate Chemometric Approaches in Bioactive Compounds and Functional Properties of Foods—A Systematic Review

Abstract: In this review, papers published in the field of chemometrics were selected to gather information and conduct a systematic review in food science and technology, specifically in the domain of bioactive compounds and the functional properties of foods. More than 50 papers covering different food samples, experimental techniques and chemometric techniques were selected and presented, focusing on the chemometric methods used and their outcomes. This study offers an overview of the current publications related to this subject matter. The application of the multivariate chemometrics approach to the study of bioactive compounds and the functional properties of foods is expected to expand in the coming years, since it is a fast-growing and highly competitive research area.


Introduction
The expression "chemometrics" first appeared in the 1970s and was proposed by the Swedish professor Svante Wold [1]. He defined chemometrics as "the art of extracting chemically relevant information from data produced in chemical experiments" [2]. Chemometrics is a chemical discipline that uses statistical and mathematical methods to perform objective data evaluation and extract meaningful information. The information being extracted from chemical data sets contains different related and unrelated data.
Since the food science and technology field is strongly linked with chemistry, analytical chemistry and analytical technologies, which in turn lead to the application of chemometrics, this study is relevant and interesting to researchers. Nowadays, scientists all over the world deal with massive amounts of experimental data from the different measurements and devices they employ. As a result, scientists in different scientific branches have started to use mathematical and statistical procedures. The chemometric approach is an ongoing trend in various research fields such as medicine, pharmacy, agronomy, agriculture, biotechnology and biology, since it can contribute to a better understanding of the data through their interpretation, presentation and visualization. The high attractiveness of chemometrics is reflected in research studies that summarize the existing collaboration patterns in the field of chemometrics [3] and in bibliometric studies covering the use of chemometrics in food science and technology [4]. Authors have reported that, in food science and technology, the chemometric tools of choice are principal component analysis, partial least squares and discriminant analysis [4].
In this review, the Web of Science and ScienceDirect databases were used to collect bibliometric data. The main keywords used in the search strategy were "chemometrics + food". Food science and technology was selected as the subject category, and the search covered research articles and review articles published from 2015 to 2024, together with some earlier studies covering chemometric tools that are used less often.
Additionally, a rising number of articles by authors who are not pure chemometricians but who use chemometric tools is being observed. Since the bioactive compounds and functional properties of foods are a fast-growing and highly competitive research area, we expect that the application of the multivariate chemometrics approach will draw more attention in the coming years.

The Advantages of Multivariate Chemometric Approaches
The benefits of a multivariate chemometrics approach to the study of the bioactive compounds and functional properties of foods are numerous. Regression, classification and non-parametric methods have all found use in the domain of food science and technology. Through the use of multivariate chemometric approaches, performance attributes such as accuracy, precision, robustness and reproducibility can be improved. The bioactive compounds and functional properties of foods can be estimated using data obtained from cheap and fast analytical methods (such as spectroscopy) rather than relying on expensive and time-consuming analytical methods (such as high-performance liquid chromatography, HPLC). Using chemometrics, the correlations among the variables of interest can be utilized. Different chemometric approaches provide a wide angle for the observation and interpretation of experimentally observed data. Different chemometric methods can also be used to find correlations between, and to predict, the bioactive compound contents of different foods and experimental data on various functional food properties. Along with tables and equations, different graphical representations of experimentally observed data can be used to better convey the most important aspects of research. If the current trend in research and publishing continues, there is plenty of space for the involvement of chemometrics in the interpretation and presentation of the results from studies on food science and technology. In this paper, only a small part of the various chemometrics approaches is presented, focusing on regression (linear and non-linear modeling) and classification techniques.

Regression: Linear Modeling in Foods
In this section, the use of multiple linear regression (MLR), partial least squares regression (PLSR) and linear discriminant analysis (LDA) in food science and technology will be reviewed. In the domain of linear modeling, MLR stands out as one of the most widely used techniques. Researchers use MLR to correlate and predict the bioactive compound contents of different foods and various functional food properties. MLR quantifies the relationship between more than one independent variable and one dependent variable. The most important criterion that has to be respected is the absence of multicollinearity, which is checked via the variance inflation factor (VIF) [5]. Together with MLR, PLSR and LDA are extensively used in research papers for the analysis and presentation of experimental data. PLSR is used for the construction of predictive models when many highly correlated variables are present. This type of regression is related to regression techniques such as MLR and principal component regression (PCR). LDA aims to find the linear discriminant function (LDF) that takes into account the original variables. The LDF takes into consideration the original measurements for each object, reducing the data to one dimension that reveals the differences between the groups [6]. An overview of the different types of foods and the linear modeling applied to them is given in Table 1.
Tea quality was estimated based on different experimentally gathered information from leaves and soil [7]. MLR was applied to predict the main components of the tea's quality. Five optimal elements from tender and mature leaves were taken into account in order to construct the quality parameter estimation models. For the prediction of the soluble sugars, MLR performed well, giving R² values from 0.5400 to 0.8400 [7].
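The multicollinearity check mentioned earlier in this section can be illustrated with a short, dependency-free sketch. It computes VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The function names and the small data set are hypothetical and are not taken from any of the reviewed studies; the third predictor is deliberately constructed to be nearly the sum of the first two.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared_of_fit(rows, target):
    """R^2 of an ordinary least-squares fit (with intercept) of target on rows."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (D^T D) beta = D^T y
    dtd = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    dty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    beta = solve_linear_system(dtd, dty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(x):
    """Return one VIF per predictor column of x."""
    vifs = []
    for j in range(len(x[0])):
        others = [[v for k, v in enumerate(row) if k != j] for row in x]
        target = [row[j] for row in x]
        vifs.append(1.0 / (1.0 - r_squared_of_fit(others, target)))
    return vifs


# Hypothetical predictors: column 3 is close to column 1 + column 2.
collinear = [
    [1.0, 5.0, 6.1], [2.0, 3.0, 4.9], [3.0, 8.0, 11.2], [4.0, 1.0, 5.0],
    [5.0, 7.0, 11.9], [6.0, 2.0, 8.1], [7.0, 6.0, 13.0], [8.0, 4.0, 12.1],
]
vifs_collinear = variance_inflation_factors(collinear)
# Dropping the redundant third column removes the collinearity.
vifs_reduced = variance_inflation_factors([row[:2] for row in collinear])
print(vifs_collinear, vifs_reduced)
```

A common rule of thumb, stated here as an assumption rather than as the criterion used in [5], is to treat VIF values above about 5 to 10 as a sign that a predictor should be removed or that PLSR should be preferred over MLR.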
MLR was used in studies that dealt with soft-bodied raw ewe's milk cheese samples from six different dairy industries in two different seasons [8]. Low-frequency ultrasound was used for the quality control of the analyzed samples. Stepwise regression modeling was used to explore the capacity of the ultrasonic parameters to predict the studied variables. The authors used MLR to predict the microbial, physicochemical, textural and sensory parameters of the observed cheeses [8].
Near-infrared spectroscopy (NIR) was coupled with chemometrics to assess the authentication and traceability of intact lemon fruits [9]. A total of 119 lemon samples from two production years were collected from Italy and analyzed. MLR was used to quantify the relationship between the NIR spectra and lemon quality properties such as peel chromaticity, thickness of pericarp, peel water, juice yield, soluble solid content and titratable acidity. The observed R² values ranged from 0.159 to 0.985, for both years and when the data were merged. The authors outlined that many significant correlations with lemon properties were observed, which suggests that NIR is applicable to predicting the quality of lemons and lemon juices [9].
Gray relational analysis (GRA), correlation analysis (CA) and MLR were used to evaluate the relationships between component contents and the mechanical properties of maize kernels [10]. The interpretation of the mechanical properties was carried out using a scanning electron microscope (SEM). Ten maize varieties were collected and underwent mechanical and component content testing. MLR was used to combine the effects of moisture, protein and starch contents on the obtained mechanical properties. All of the R² values were above 0.7 and the standard errors were less than the standard deviations. The results were validated, and they indicated that the moisture and starch contents had negative correlations with most of the mechanical properties, while the protein content had positive correlations with most of the mechanical properties. The authors proposed optimal MLR models for the prediction of hardness, rupture force, rupture energy, apparent elastic modulus and viscoelastic parameters [10].
PLSR and MLR were used to predict the age of 53 samples of commercial bottled dry red wine according to experimentally observed color parameters and pigments [11]. MLR models were constructed based on calibration sets comprising 35 wine samples, and no more than four variables were included in each model. Both the MLR (with three variables) and the PLSR (with six variables) models presented were suitable for the accurate prediction of the age of the wine samples. The constructed models possessed high coefficients of determination and sufficiently low root mean square errors of cross-validation. The prediction results from the presented MLR model were as accurate as those obtained using the PLSR model [11].
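The two statistics quoted for these models, the coefficient of determination and the root mean square error of cross-validation (RMSECV), can be sketched as follows. This is a minimal illustration that assumes a single predictor and leave-one-out cross-validation; the cross-validation scheme actually used in [11] is not stated here, and the color index and age values are invented for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmsecv_leave_one_out(xs, ys):
    """Refit the model n times, each time predicting the left-out sample."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return rmse(ys, predictions)


# Hypothetical calibration data: a color index and the wine age in years.
color_index = [0.2, 0.5, 0.9, 1.1, 1.6, 2.0, 2.4, 3.1]
age_years = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 10.0]

intercept, slope = fit_line(color_index, age_years)
fitted = [intercept + slope * x for x in color_index]
r2 = r_squared(age_years, fitted)
rmsec = rmse(age_years, fitted)  # error on the calibration samples
rmsecv = rmsecv_leave_one_out(color_index, age_years)
print(r2, rmsec, rmsecv)
```

For a least-squares model the cross-validated error cannot be smaller than the calibration error, so a large gap between the two is a simple warning sign of overfitting.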
Since the adulteration of black pepper and cumin samples is common, there is a need for the development of a method to detect adulteration. MLR and PLSR modeling were used together with near-infrared spectroscopy for the rapid quantitative detection of black pepper and cumin adulterations [12]. The presented MLR and PLSR models possessed a high predictive capacity for the different types of single or complex starch adulterants, with good results for the statistical parameters: correlation coefficients higher than 0.9000 and root mean square errors that ranged from 2.2 to 7.0. The models were tested in practice using samples of commercially available powdered spices [12].
Twenty-one adulterated sesame oil samples and four pure oil samples (sesame, canola, corn and sunflower) were used in a study that used Fourier transform infrared spectroscopy (FTIR) data to develop a method for the detection of adulteration [13]. All of the processing data collected were compared using PLSR. The authors employed the nonlinear iterative partial least squares algorithm (NIPALS) and kernel PLS, which gave the same results up to four significant digits. Different preprocessing techniques were used: orthogonal signal correction (OSC), standard normal variate transformation (SNV) and extended multiplicative scatter correction (EMSC). The results covering sesame oil adulterated with canola, corn and sunflower oil indicated that all of the R² values for the calibration set were above 0.983 (except when using sunflower oil with three different types of preprocessing). The authors reported that about half of the preprocessing methods did not improve the RMSE of the PLSR model for the prediction of the level of corn oil adulteration [13].
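Of the preprocessing techniques listed above, SNV is the simplest to write down: each spectrum is centered on its own mean and divided by its own standard deviation. The sketch below is illustrative only; the absorbance values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since conventions differ between software packages.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


# Hypothetical absorbances at five wavenumbers.
spectrum_a = [0.10, 0.30, 0.50, 0.20, 0.40]
# Same sample with a multiplicative (x2) and an additive (+0.5) scatter effect.
spectrum_b = [2.0 * v + 0.5 for v in spectrum_a]

snv_a = snv(spectrum_a)
snv_b = snv(spectrum_b)
print(snv_a)
print(snv_b)
```

After SNV the two spectra coincide, which is the reason the transformation is used to suppress scatter-related baseline and scaling differences before regression.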
The authors reported the performance of 11 strains of non-Saccharomyces yeasts that produced polyphenol-enriched and fragrant kiwi wines [14]. In the 14 kiwi wines, a total of 130 volatiles were detected. However, it was concluded that some yeasts produce a higher concentration of volatile compounds than others. PLSR was applied to 15 aroma notes in order to expose the complex relationship between aroma characteristics and the overall acceptability of the examined kiwi wines. The accepted PLSR models had calibration R² values higher than 0.9500 and validation R² values higher than 0.8000. The results indicated that all the aroma descriptors used were closely related to the scores [14].
PLSR modeling was successfully applied for the prediction of intramuscular fat in lamb M. longissimus lumborum [15]. The intramuscular fat content of lamb meat is the most important factor in consumer acceptability. Hyperspectral imaging was used for in-line measurements of intramuscular fat in fresh meat, and those data were used for PLSR modeling. Since fifteen trials consisting of eight independent flocks across 5 years were investigated, two models were developed: one comprising data from the first year of the trials and a progressive model (comprising data in chronological order). When the experimental conditions were consistent, the models performed similarly regarding statistical parameters, but under diversified imaging conditions, the progressive model was able to account for this variability, resulting in better statistical performance [15].
The chemometric approach and Raman spectroscopy were used for the classification of vegetable oil samples [16]. The Raman spectra of 108 vegetable oil samples were recorded and used for PLSR modeling. The Raman spectra of 72 calibration samples were modeled against reference values obtained from high-performance liquid chromatography, and a PLSR model was established for the determination of the alpha-tocopherol content. The data obtained using both methods were highly correlated (R² > 0.9500). The proposed method could be used to distinguish between pure vegetable oil samples and adulterated ones [16].
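A PLSR calibration of this kind, with spectral intensities as predictors and a single reference value as the response, can be sketched with the NIPALS algorithm for one response variable (PLS1). The code below is a minimal, dependency-free illustration and not the implementation used in any of the cited studies; the function names and the tiny noise-free data set are hypothetical.

```python
import math


def pls1_fit(x, y, n_components):
    """Fit a PLS1 model with the NIPALS deflation scheme."""
    n, n_vars = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    xr = [[row[j] - x_mean[j] for j in range(n_vars)] for row in x]  # centered X
    yr = [v - y_mean for v in y]                                     # centered y
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(xr[i][j] * yr[i] for i in range(n)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(xr[i][j] * w[j] for j in range(n_vars)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(xr[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        xr = [[xr[i][j] - t[i] * p_load[j] for j in range(n_vars)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        coefs.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "weights": weights, "loadings": loadings, "coefs": coefs}


def pls1_predict(model, sample):
    """Predict y for one sample by repeating the deflation steps."""
    xr = [v - m for v, m in zip(sample, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["weights"], model["loadings"], model["coefs"]):
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += q * t
        xr = [a - t * b for a, b in zip(xr, p_load)]
    return y_hat


# Hypothetical intensities at two wavenumbers and a reference content
# generated without noise as y = 1 + 2*x1 + 3*x2.
spectra = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]]
reference = [1.0 + 2.0 * a + 3.0 * b for a, b in spectra]

model = pls1_fit(spectra, reference, n_components=2)
prediction = pls1_predict(model, [7.0, 2.0])
print(prediction)
```

With as many latent variables as there are independent predictors, PLS1 reproduces the ordinary least-squares solution; in practice far fewer components than wavenumbers are kept, and their number is chosen by cross-validation.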
PLSR was the chemometric method of choice in a research paper that dealt with monitoring the oxidation process of nut oil through Raman technology combined with PLSR and random forest PLSR (RF-PLSR) modeling [17]. Samples were from hazelnuts, cashew nuts, almonds, macadamia nuts, sunflower seeds, watermelon seeds, red pine seeds and peanuts. The peroxide index represents one of the most important characteristics of nut oils since they easily oxidize during storage. This study proposed a novel method for the determination of the peroxide index of nut oils based on PLSR and RF-PLSR. A total of 36 wavenumbers were selected and used for the PLSR modeling. The R² values for the calibration and prediction sets for the PLSR and RF-PLSR models were 0.9552, 0.8672, 0.8048 and 0.7927. The root-mean-square errors of calibration and prediction were 0.067, 0.1100, 0.1514 and 0.1547, respectively [17].
The amino acid profiles obtained using HPLC were interpreted using chemometric analysis in order to detect fruit juice adulterations [18]. The authors applied chemometric methods to prove and confirm the authenticity of blood orange juice. The question was whether PLSR could be used to quantify the amount of blood orange juice in the juice samples. PLSR and PLS-DA were conducted, accounting for five latent variables, which resulted in statistically valid models (R² higher than 0.9510, root mean square error of calibration 9.6979 and root mean square error of cross-validation 13.1149). The authors suggested that PLSR could be a suitable approach to quantifying the juice content in the case of fruit juice [18].
Another study [19] covering the oxidation of nut oils and PLSR was conducted, and a model with a slightly lower R² value than the one in ref. [17] (Wang et al., 2021) was reported. Nut oils were extracted from hazelnut, cashew, almond, macadamia nut, sunflower seed, watermelon seed, pine seed and peanut samples. Fourier transform infrared spectroscopy was used for the experimental data collection, and based on these data, a PLSR model was established. After including the unknown sample, the prediction coefficients of determination were all above 0.9000. All statistical tests indicated the good predictive ability of the model. This indicates that this approach could achieve the rapid detection of oil oxidation indexes [19].
Data comprising the Raman spectra of intact tomatoes with various carotenoid concentrations were used to develop PLSR and PLS-DA models [20]. It was found that the accuracy of the PLSR model was affected by the exposure time (0.7 and 10 s), while the exposure time did not have any impact on the PLS-DA model. When Raman spectra were acquired with an exposure time of 10 s, the accuracy of the PLSR model was good (R² = 0.87), but it decreased with decreasing exposure time (R² = 0.69 at 0.7 s). The authors concluded that Raman spectroscopy combined with PLS-DA is very helpful for the analysis of carotenoids in fruits and vegetables [20].
PCA combined with LDA and PCA combined with a support vector machine (SVM) were used to determine the geographical origin of coconuts from coastal plantations in Indonesia [21]. The examination of coconut endosperms from 13 districts was conducted using a portable sensing device for near-infrared spectroscopy (PSD-NIRS). LDA performed on raw data and with a single preprocessing procedure did not reach a sufficient level of accuracy. The authors then introduced PCA and PCA-LDA, which showed the maximum accuracy (100%) for the data. The combination of PCA and LDA can accurately predict the sample group [21].
Six samples of high-value Italian chickpeas (Cicer arietinum L.) were characterized, and the content of different elements was determined using inductively coupled plasma optical emission spectrometry (ICP-OES) [22]. The elements determined were as follows: Ca, K, P, Mg, Mo, Cu, Fe, Mn, Zn and Sr. The results were evaluated using ANOVA, LDA and soft independent modelling of class analogies (SIMCA). ANOVA pointed out the significance of the detected elements. The result of LDA modeling, both in calibration and validation, revealed that the proposed LDA model correctly assigned all of the chickpea samples to their geographical origin [22].
The linear discriminant analysis of the nutritional and physicochemical composition of 50 potato genotypes was carried out in [23]. The studied potato samples were from 24 different countries of origin and had four different flesh colors (purple, red, marble and yellow), as well as different cultivation types. The authors carried out ANOVA employing Tukey's HSD or Tamhane's T2 test in order to classify the statistical differences between the potato samples. Further, LDA was used to identify the variables that best characterized each potato cultivation type or flesh color. The purpose of the LDA was to describe the relationship between a dependent variable (cultivation type or flesh color) and the data set of independent variables (all determined parameters). The stepwise method was used for the selection of the significant independent variables. The first LDA had a classification performance with 100% accuracy for the original grouped cases, while for the cross-validated grouped ones, the accuracy was 99.20%. The second LDA's performance was 100% for both the original grouped cases and the cross-validated ones [23].
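The linear discriminant function described at the start of this section can be illustrated for the simplest case of two classes and two measured variables. The sketch uses Fisher's criterion, w = Sw⁻¹(m_B - m_A), and places the decision threshold halfway between the projected class means, which assumes equal class sizes and a common covariance structure. All names and measurements are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of two-element samples."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter_matrix(rows, mean):
    """2x2 within-class scatter matrix of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(class_a, class_b):
    """Return the discriminant direction w and the midpoint threshold."""
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, m_a), scatter_matrix(class_b, m_b)
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    proj_a = w[0] * m_a[0] + w[1] * m_a[1]
    proj_b = w[0] * m_b[0] + w[1] * m_b[1]
    return w, 0.5 * (proj_a + proj_b)


def classify(w, threshold, sample):
    """Project the sample onto w; class B lies above the threshold."""
    score = w[0] * sample[0] + w[1] * sample[1]
    return "B" if score > threshold else "A"


# Hypothetical contents of two elements for samples from two origins.
origin_a = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
origin_b = [[4.0, 5.0], [4.5, 5.5], [5.0, 4.8], [4.2, 5.2]]

w, threshold = fisher_lda(origin_a, origin_b)
print(classify(w, threshold, [1.3, 2.1]), classify(w, threshold, [4.6, 5.1]))
```

The single score w·x is the one-dimensional projection that separates the groups; with more than two classes or variables the same idea is applied through an eigenvalue problem, which is left out of this sketch.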

Regression: Non-Linear Modeling in Foods
An artificial intelligence technique that is widely used in food science and technology research publications is the artificial neural network (ANN), which is used to model non-linear correlations. An ANN is a mathematical model that imitates the way the brain processes and stores information [24]. The construction of an ANN includes a learning or training phase. Every ANN is composed of mutually connected elements called artificial neurons. These connections, or artificial synapses, are called weights, and they are modified during data processing in order to obtain an output layer. The structure of an ANN model is determined by the number of layers and the number of nodes per layer. Several layers of neurons participate in ANN construction: input, hidden and output layers. The outcome of ANN modeling does not include a parametric equation; rather, the network is described by its statistical parameters [25].
Some researchers dealt with shiitake mushrooms, from which they extracted a β-glucan called lentinan using natural deep eutectic solvents (NDES) [26]. Since the empirical and trial-and-error methods used for NDES selection are time-consuming, the researchers employed the conductor-like screening model for realistic solvation (COSMO-RS). The extraction conditions were optimized, and the effects of their interactions on lentinan content were analyzed using an artificial neural network coupled with a genetic algorithm (ANN-GA). For the analysis of lentinan extraction, a two-layer feed-forward network with sigmoid hidden neurons and linear output neurons was employed. The input layer consisted of three neurons, the hidden layer of ten and the output layer of one neuron. The ANN model was trained with the Levenberg-Marquardt algorithm, while the fitness function was used to find the optimal value within the limits of the extraction conditions. The authors reported that the combination of COSMO-RS and ANN-GA can be used for solvent screening and the optimization of the extraction process [26].
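The architecture described above, three inputs, ten sigmoid hidden neurons and one linear output neuron, can be written as a short forward pass. This sketch only shows how such a network maps inputs to an output; it uses random, untrained weights and does not implement Levenberg-Marquardt training or the genetic algorithm. The input values stand for three scaled extraction conditions and are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Create a network with small random weights and zero biases."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                  for _ in range(n_out)],
        "b_out": [0.0] * n_out,
    }


def forward(net, inputs):
    """Sigmoid hidden layer followed by a linear output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    return [sum(w * h for w, h in zip(ws, hidden)) + b
            for ws, b in zip(net["w_out"], net["b_out"])]


def count_parameters(net):
    """Number of adjustable weights and biases."""
    n_weights = sum(len(ws) for ws in net["w_hidden"] + net["w_out"])
    return n_weights + len(net["b_hidden"]) + len(net["b_out"])


net = init_network(n_in=3, n_hidden=10, n_out=1)
# Three extraction conditions scaled to the range 0..1 (hypothetical values).
output = forward(net, [0.2, 0.7, 0.5])
print(output, count_parameters(net))
```

Even this small 3-10-1 network has 51 adjustable parameters, which illustrates why the result of ANN modeling is reported through statistical parameters rather than as a readable equation.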
The eating and cooking quality of rice, as well as the texture properties of cooked rice, were predicted using ANN [27]. The authors developed models using stepwise MLR, principal component analysis plus MLR, PLSR, k-nearest neighbor, random forest and gradient boosted decision tree with satisfactory statistical parameters. After introducing ANN, the R² values were improved, ranging from 0.675 to 0.979, while the RMSE values ranged from 0.574 to 1.32. When using the textural properties, the ANN model had an R² of 0.921 and an RMSE of 1.06, while combining them with rice components and/or pasting characteristics led to R² values higher than 0.96 and RMSE values lower than 0.75. The authors concluded that rice textural properties are more suitable for ANN model formation [27].
Some researchers compared three statistical approaches, PLS-DA, classification and regression trees (CART) and ANN, for the authentication of 82 red wine samples based on their anthocyanin profile [28]. Two non-linear, layered, feed-forward networks were generated, multilayer perceptron (MLP) and radial basis function (RBF) neural networks, in order to obtain a statistically valid ANN. The variables that stood out in the ANN model for the authentication of monovarietal wines were malvidin-3-acetylglucoside, petunidin-3-glucoside, malvidin-3-glucoside, peonidin-3-coumaroylglucoside and delphinidin-3-glucoside. The authors suggested that 6 out of 500 MLP artificial neural networks had high test, validation and training set accuracy. The proposed networks were generated using the automatic network designer (AND) and the Broyden-Fletcher-Goldfarb-Shanno algorithm [28].
The quality of gamma-irradiated smoked bacon during storage was predicted using a back propagation artificial neural network (BP-ANN) [29]. For the construction of the ANN, the following data were used as input variables: physical and chemical indicators, irradiation dose and storage time. The total number of colonies and sensory scores were used as the output variables. The hidden layer consisted of 13 neurons, and the transfer functions for the input-hidden layer and the output-hidden layer were ReLU and sigmoid, respectively. The effects of different neuron counts and numbers of epochs were also considered. The presented results indicate that the proposed model based on physical and chemical indicators, irradiation dose and storage time shows great promise for the prediction of the quality of smoked bacon [29].
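Back propagation itself can be sketched in a few dozen lines. The example below trains a network with 13 ReLU hidden neurons and one sigmoid output neuron by full-batch gradient descent on the mean squared error. It is a simplified illustration, not the model from [29]: the two inputs (scaled irradiation dose and storage time), the single scaled quality score, the learning rate and the number of epochs are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return hidden pre-activations, hidden activations and the output."""
    w1, b1, w2, b2 = params
    z1 = [sum(w * v for w, v in zip(ws, x)) + b for ws, b in zip(w1, b1)]
    h = [z if z > 0 else 0.0 for z in z1]  # ReLU
    out = sigmoid(sum(w * v for w, v in zip(w2, h)) + b2[0])
    return z1, h, out


def mse(params, xs, ys):
    return sum((forward(params, x)[2] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, n_hidden=13, epochs=300, lr=0.3, seed=0):
    """Full-batch gradient descent with back-propagated gradients."""
    rng = random.Random(seed)
    n_in, n = len(xs[0]), len(xs)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = [0.0]
    params = (w1, b1, w2, b2)
    history = [mse(params, xs, ys)]
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gw2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, y in zip(xs, ys):
            z1, h, out = forward(params, x)
            # Error signal at the output: d(MSE)/d(output pre-activation).
            delta_out = 2.0 * (out - y) * out * (1.0 - out) / n
            gb2 += delta_out
            for j in range(n_hidden):
                gw2[j] += delta_out * h[j]
                if z1[j] > 0:  # the ReLU derivative is 1 only for active neurons
                    delta_h = delta_out * w2[j]
                    gb1[j] += delta_h
                    for i in range(n_in):
                        gw1[j][i] += delta_h * x[i]
        # Update all weights only after the gradients have been accumulated.
        for j in range(n_hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i]
        b2[0] -= lr * gb2
        history.append(mse(params, xs, ys))
    return params, history


# Hypothetical scaled inputs (dose, storage time) and scaled quality score.
inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.25, 0.75]]
targets = [0.8, 0.3, 0.9, 0.4, 0.6, 0.45]

params, history = train(inputs, targets)
print(history[0], history[-1])
```

The recorded training error per epoch is the kind of curve that is inspected when the effects of different neuron counts and numbers of epochs are compared.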
A group of researchers investigated the ultrasound-assisted extraction of phytochemicals from green coconut shells, which was optimized using integrated ANN [30]. The aim was to maximize antioxidant and antimicrobial activity while modeling the extraction process. The Tansig transfer function was used together with the feed-forward backpropagation method, and two input and fifty-six output values were used. Modeling resulted in a few statistically valid networks that were evaluated by means of a low mean square error and a high coefficient of determination. The best ANN for the ultrasound-assisted extraction process was the one with three neurons in the input layer, four neurons in the hidden layer and two neurons in the output layer (3-4-2) [30].
A commercial coffee plantation was used to carry out the experiment, and ANN modeling was conducted covering a few morphological variables and a few vegetation indexes collected in the upper, middle and lower thirds of the coffee plant [31]. The MLP and radial basis function (RBF) networks were applied for the prediction of morphological variables and were evaluated in terms of accuracy (RMSE) and precision (R²). For plant height, the MLP used three and the RBF used five input variables, while for plant diameter, both models used three input variables. The presented results indicate that, using MLP, it is possible to estimate coffee tree volume with reasonable accuracy [31].
Visible and near-infrared hyperspectral images were paired with unidimensional deep learning convolutional neural networks (CNNs) for the identification of anthracnose in olives [32]. The experimental data set covered a total of 250 olives without external defects. The authors built CNN models and selected the ResNet101 architecture as the most suitable and statistically acceptable. A two-step training process was carried out: in the first step, the weights of the previously trained ResNet101 were locked and the weights of the newly added layers were updated for 20 epochs. In the second step, all the weights were updated. The authors reported that the proposed method was successfully tested and could be used for the control of olive anthracnose [32].
Research on the nitrogen content in cucumber (Cucumis sativus L.) plant leaves used hyperspectral imaging data with a neural network [33]. Two artificial intelligence approaches were used: artificial neural networks with particle swarm optimization (ANN-PSO) and CNN. A prediction model was developed for each of the three categories: 30%, 60% and 90% nitrogen excesses. The results showed that the regression coefficients for the test set ranged from 0.9370 to 0.9650 for ANN-PSO and from 0.9650 to 0.9850 for CNN. The authors reported that the presented models have an exceptional ability to predict the nitrogen content in cucumber plants from hyperspectral images of the leaves. In this study, the authors also conducted PLSR analysis, and the results showed slightly better statistical parameters and performance than ANN-PSO and the CNN [33].
A study on the modeling of moisture content evolution during the convective drying of quince from Greece was carried out [34]. The first group of trials covered single hidden layer neural networks consisting of 10-100 neurons and different transfer functions, together with 500 epochs. The second group of trials comprised ANNs containing two hidden layers with different combinations of the number of artificial neurons and the transfer functions in each layer. The top 15 models according to their statistical parameters, with R² values higher than 0.9910 and RMSE values around 0.13, were presented. The authors reported good agreement between the experimental and estimated values and confirmed that ANNs are able to perform predictions for newly obtained experimental data with a reasonable error [34].
Domestic garlic samples (Allium sativum L.) produced in Spain, Croatia and China were purchased together with one sample from a local producer in Slovenia, and ultrasound-assisted extraction of polyphenols was performed [35]. The experimental method was optimized using an ANN approach. The statistical parameters of the proposed ANN were as follows: root mean square error for the training (0.0209), validation (3.6819) and test (1.8341) sets. The good predictive ability of the ANN was evaluated by the correlation coefficient between the experimentally obtained total phenolic content and the total flavonoid content, and the values for the training, validation and test sets were 0.9998, 0.9733 and 0.9821, respectively [35].
In research on dragon fruits of the variety Hylocereus undatus that were used for microwave vacuum drying (MVD) experiments, the authors used ANN modeling to model the drying process [36]. The feed-forward back-propagation approach was optimized with GA. The ANN model was developed in two phases, feed-forward and back propagation, and the efficiency of the proposed ANN was assessed through the mean squared error value as well as the relative deviation values between the experimentally observed and predicted data. The proposed model predicted that the vacuum had the most significant influence on the total phenolic content, microwave power and citric acid concentration. The data predicted by the ANN-GA model were in close agreement with the experimental data, with a low relative deviation that ranged from 1.557 to 2.936%. The ANN-GA model could be used in practice for modeling the microwave vacuum drying process for dragon fruit [36].
An efficient crop yield prediction was achieved using the machine learning algorithm ANN [37]. In this study, MLR was also performed, and a hybrid MLR-ANN model, together with a conventional ANN, was proposed. A feed-forward phase with a back propagation training algorithm was used for the construction of the models. Both models, ANN and MLR-ANN, were compared from the perspective of their statistical validity and predictive power. The computational time for both the hybrid MLR-ANN and the conventional ANN was calculated. The RMSE and R values for the conventional and hybrid models were as follows: 0.098 and 0.051 for the error and 0.9200 and 0.9900 for the correlation coefficient. The results indicate that the hybrid MLR-ANN has a better accuracy than the conventional ANN for the same data set [37].

Classification Techniques in Foods
One of the most exploited classification techniques in scientific publications, as well as in food science and technology papers, is principal component analysis (PCA). This method reduces the dimensionality of multivariate data sets. It reduces and simplifies the original variables to linear combinations known as principal components (PCs). PCs are characterized by loadings and scores. Scores are the new coordinates of the projected objects, and loadings reflect the direction with respect to the original variables [38]. An overview of the food types and classification techniques used in the selected papers is presented in Table 2.
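The relationship between loadings, scores and the percentage of variance carried by each PC can be shown with a small sketch. It extracts the principal components of the covariance matrix by power iteration with deflation, which is only one of several ways to compute a PCA and is chosen here because it needs no external libraries. The function names and the three-variable data set are hypothetical.

```python
import math


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    size = len(matrix)
    # Uneven start vector, to avoid starting orthogonal to the eigenvector.
    v = [1.0 / (j + 1) for j in range(size)]
    for _ in range(iterations):
        mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(a * a for a in mv))
        if norm < 1e-15:
            break
        v = [a / norm for a in mv]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(data, n_components):
    """Return scores, loadings and explained variance ratios."""
    n, size = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(size)]
    centered = [[row[j] - means[j] for j in range(size)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(size)]
           for i in range(size)]
    total_variance = sum(cov[i][i] for i in range(size))
    loadings, ratios = [], []
    for _ in range(n_components):
        eigenvalue, vector = leading_eigenpair(cov)
        loadings.append(vector)  # direction with respect to the original variables
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the variance already explained by this PC.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j]
                for j in range(size)] for i in range(size)]
    # Scores: coordinates of each centered sample on the PCs.
    scores = [[sum(r[j] * v[j] for j in range(size)) for v in loadings]
              for r in centered]
    return scores, loadings, ratios


# Hypothetical data: variable 2 is roughly twice variable 1, variable 3 is unrelated.
samples = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 1.5],
           [5.0, 10.1, 0.7], [6.0, 12.0, 1.2], [7.0, 13.8, 0.9]]

scores, loadings, ratios = pca(samples, n_components=3)
print([round(100 * r, 2) for r in ratios])
```

When all components are extracted, the explained variance ratios add up to 100%, so a data set with only two variables is always fully described by PC1 and PC2.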
Gas chromatography mass spectrometry (GC-MS), (FTIR) and ultraviolet visible-near infrared spectroscopy (UV-Vis-NIR) data were used to determine quality of katsuobushi based on the number of smoking treatments.Katsuobushi is smoked and dried skipjack tuna, and is a traditional Japanese food additive with a specific flavor and taste [39].A total of forty-six metabolites were identified and five of them were selected as key compounds.GC-MS, FTIR and NIR spectral data were used for PCA analysis and the results were presented through heatmaps, biplots and VIP scores.Free amino acids and nucleotide-related compounds PCA analysis resulted in the first two PCs describing 92.6% of variance, while katsoubushi samples were distinguished into five groups.After looking at the PCA of the FTIR spectra and the PCA of the NIR spectra, the katsoubushi samples were again divided into five groups based on the number of smoking treatments (zero, three, six, nine and twelve rounds of smoking) which led to compositional changes in the katsoubushi.Regarding the FTIR spectra data, PC1 contributed 89.4%, while PC2 contributed 3.9%.When looking at the NIR spectra, the total variance described by first two PC was 99.1%.In this case, the non-smoked samples were well separated from the smoked groups along the PC1, which indicates the significant difference in the metabolic profiles of non-smoked and smoked samples [39].
The total mercury level distribution in fish and fish products was evaluated and its relationship to fish type, weight, protein and lipids content was observed using a multivariate approach [40]. The influence of lipids and protein content on Hg accumulation in the fish tissues and the impact of Hg concentration and fish consumption on the estimated weekly intake (EWI) were evaluated using PCA. PCA covering Hg distribution in fish samples and moisture ratio resulted in a plot with five clusters. PC1 covered 77.72% and PC2 22.28% of the total variance, so the PCA resulted in a model in which two PCs explained 100% of the variance. When the total Hg distribution in fresh fish samples and the lipids, protein and moisture content were taken into account, PC1 covered 53.62% and PC2 36.86% of the total variance. When the total Hg distribution in whole fresh fish samples, fish average weight, lipids, protein and moisture content was observed, the total variance covered by the first two PCs was 84.4%. The PCA results revealed that: (a) Hg contamination levels are determined by protein-lipids content; (b) a high lipids content gave lower Hg levels; (c) high Hg levels in fish with a high lipids content corresponded to the polluted environment; (d) EWI was correlated to Hg concentration, except in the case of a low Hg concentration [40].
The fatty acid profiles, pH and color changes of cow milk probiotic yoghurt (CPY) and goat milk probiotic yoghurt (GPY) were studied using gas chromatography (GC) and the chemometric pattern recognition method PCA [41]. Alterations to the fatty acid profiles of CPY and GPY were presented via a scores plot where PC1 accounted for 88% and PC2 for 4.2% of the total variation. The authors reported that two well-separated clusters can be noticed and that the relative abundance of fatty acids in the two clusters was different. Since the CPY and GPY clusters were on opposite sides of the axis, a negative correlation is indicated for the majority of the fatty acids [41].
Pan-fried chicken meat patties were studied with respect to the effects of different levels of allspice seed extract (ASE) and Perilla frutescens seed extract (PSE) [42]. The researchers were interested in the impact of ASE and PSE on the formation and migration of heterocyclic amines (HCAs). The chicken meat patties were divided into three treatment groups plus a control group. PCA was performed in order to reveal the differences in the HCA profiles. The PCA resulted in a scores plot with PC1 covering 28.48% and PC2 covering 54.65% of the variance. The PCA results revealed that most of the single and mixed phenolics displayed strong inhibitory effects on HCA formation, but the mitigating effect of a few mixed phenolics on HCA formation was weak [42].
PCA was employed in a study investigating commercial fruit beers regarding their polyphenolic and amino acid profiles [43]. The data set comprised twenty-six fruit beers and three control beers without fruit. On the loadings plot, the PCA revealed five different groups for polyphenols, pigments and amino acids related to their chemical structure for both phenolic profile and amino acid profile analysis. For the phenolic profile, the eigenvalues for PC1 and PC2 were 2.50 and 1.17, respectively, while PC1 covered 50.1% and PC2 corresponded to 23.5% of the variation in all data. Beer samples were separated into five groups and their amino acid profiles were subjected to PCA. In this case, the eigenvalues were PC1 = 2.4 and PC2 = 1.5, while PC1 accounted for 48.0% and PC2 for 30.3% of the total data variability [43].
Hawthorn (Crataegus azarolus L.) fruit from Turkey was the subject of a study that investigated the effect of maturity stage on fruit quality characteristics, sensory attributes and volatile compounds [44]. Solid-phase micro-extraction (SPME) and gas chromatography-mass spectrometry (GC-MS) were conducted and experimental data sets were collected. For the PCA, the dominant volatile organic compounds in fruit from different maturity stages were used as variables. According to the eigenvalues, PC1, PC2 and PC3 were chosen, covering 78% of the total variance. A scores plot (PC1 50.5% and PC2 17.5%) resulted in three well-separated clusters that grouped the hawthorn fruit according to their maturity stage: (a) immature, (b) mature and (c) over-mature fruit. The authors reported that mature and over-mature fruit were preferred by panelists and had the highest level of esters responsible for flavor [44].
The fruit and leaf diversity of mangoes (Mangifera indica L.) was investigated [45]. Fifty-eight mango genotypes from India, including twenty selections, seventeen hybrids and twenty-one landraces (local genotypes), were analyzed. The experimental analysis covered a total of 70 pheno-biochemical characteristics based on leaf morphology and fruits. From the PCA, it can be concluded that several pomological traits of economic significance showed extreme variability. On the presented PCA graphs, the genotypes were clustered depending on their biochemical characteristics and phenotypic similarity [45].
Since pacu fish are mainly grown in Argentina as an important source of food and revenue, their growth within rice fields and the effect of this on the fish metabolome were researched [46]. Farmed (bred by the integrated rice and fish farming system and obtained in a local market) and control (raised in a tank) fish muscle samples were investigated using two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC-TOFMS). The total ion current and diagnostic ions for sugars were used as the input for the PCA, which gave an overview of the separation of farmed and control pacu fish samples. PCA resulted in a scores plot presenting PC1 (50.95% of total variance) and PC2 (26.82% of total variance) and showing two clusters: one with the control and the other with the farmed samples. The scores plot revealed that control and farmed samples are best separated along PC2. The authors concluded that PCA gave a meaningful overview of the effects of farming pacu fish within rice fields [46].
Intact beef, venison and lamb meat types were investigated using Raman spectroscopy in order to establish fast and reliable techniques for intact meat discrimination [47]. The meat samples were from the New Zealand red meat sector and a total of 90 samples were used: beef (Bos taurus), lamb (Ovis aries) and venison (Cervus elaphus scoticus, hippelaphus and pannonensis). PCA resulted in a scores plot in which PC1 accounted for 33% and PC2 for 26% of the total variability in the meat data. Although minor overlaps occurred, the PCA scores plot showed a good separation of the beef, venison and lamb meat samples. The authors outlined that Raman spectroscopy could be paired with a chemometrics approach, which would result in fast and reliable techniques for intact meat differentiation [47].
Quinoas were analyzed using gas chromatography-ion mobility spectrometry (GC-IMS) and PCA was applied to obtain characteristic volatile profiles [48]. The quinoas used were white, red and black varieties from China, and a total of 28 characteristic volatile compounds were found and quantified. The total contribution of the first two PCs was 85.31% (PC1 68.73% and PC2 16.58%), which was adequate to explain the similarities between samples. The scores plot resulted in three well-separated clusters, each of them containing one quinoa variety: white, red and black. Since differently colored samples were placed in different quadrants, far away from each other, it can be concluded that there are significant differences in the volatile profiles of different quinoa varieties. The authors concluded that the combination of the visual plots provided by GC-IMS and the PCA results was of sufficient quality for the characterization of the aroma profiles of different quinoa varieties [48].
A research group investigated the volatile flavor compounds found in different production stages of fermented soybean whey tofu from China using headspace-gas chromatography-ion mobility spectrometry (HS-GC-IMS) in combination with PCA [49]. Across all production stages, 24 representative flavor compounds were experimentally obtained. The PCA scores plot revealed that samples from the six stages of fermented soybean whey tofu production occupied separate regions, and clear-cut differences between the groups were noticeable. The results of this study indicate that the flavor fingerprints of the samples from different stages of fermented soybean whey tofu production can be established using HS-GC-IMS and PCA through the detection of the volatile compounds [49].
A common adulteration in the food industry is that of extra virgin olive oil, since it has greater value than other edible oils [50]. Scientists developed a rapid method for the detection of extra virgin olive oil adulteration by means of ultra-high-performance liquid chromatography with charged aerosol detection (UHPLC-CAD) profiling of triacylglycerols coupled with the chemometric pattern recognition technique PCA. Twenty-five fresh single-variety extra virgin olive oil samples from California, together with eleven grapeseed oils, three soybean oils, seven canola oils, four high-oleic safflower oils and five high-oleic sunflower oils, were purchased and analyzed. Experimentally obtained data from the triacylglycerols analysis were used as input data for the PCA. The triacylglycerol profiles were determined for olive oil samples and for the five common olive oil adulterants. Several combined PCA scores plots were obtained using different input data combinations. All graphs indicate that combining triacylglycerol profiles with PCA can successfully differentiate extra virgin olive oil from high-oleic sunflower oil at adulteration levels greater than 10%. The authors reported that UHPLC-CAD coupled with PCA needs minimal sample preparation and allows the rapid determination of extra virgin olive oil authenticity [50].
After PCA, the second most widely used classification technique is HCA. HCA divides a group of objects into classes and sorts similar objects into the same class (cluster). In this type of analysis, objects that are close together in the variable space are sought. The result of HCA is a tree diagram (dendrogram), in which the horizontal axis shows the dissimilarity between the clusters as a distance. An overview of the food types and classification techniques in the selected papers is presented in Table 2.
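The agglomerative procedure behind a dendrogram can be sketched in a few lines of Python. The example below is a hedged illustration that assumes single linkage and Euclidean distance (one of the combinations reported in the studies that follow); the points, the cut distance and the function names are hypothetical, and the brute-force search is written for readability rather than efficiency.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two objects in the variable space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    i.e. the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

def cut_dendrogram(n_objects, merges, cut_distance):
    """Clusters obtained by cutting the dendrogram at a chosen distance."""
    groups = [{i} for i in range(n_objects)]
    for left, right, dist in merges:
        if dist > cut_distance:
            break  # single-linkage merge distances never decrease
        merged = set(left) | set(right)
        groups = [g for g in groups if not g <= merged] + [merged]
    return sorted(sorted(g) for g in groups)

# Hypothetical objects described by two variables each.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (5.0, 5.1), (5.2, 4.8), (9.0, 1.0)]

merges = single_linkage(points)
clusters = cut_dendrogram(len(points), merges, cut_distance=1.0)
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```

Choosing a different cut distance changes the number of clusters, which is why the studies below report the linkage rule, the distance measure and, in some cases, the cut distance used.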
Chestnut honey samples from Turkey were distinguished based on their phenolic compositions and biological activities using HCA [51]. A total of 16 phenolic compounds and organic acids were detected by HPLC-DAD. The antioxidant activity was evaluated using ABTS•+, β-carotene-linoleic acid, CUPRAC, DPPH• and metal chelating assays, while antimicrobial activity was tested against Gram-positive and Gram-negative bacteria and Candida species. Additionally, anti-inflammatory activity was evaluated against COX-1 and COX-2, while enzyme-inhibitory activity was assessed on AChE, BChE, urease and tyrosinase. The collected data were used for the classification of 41 chestnut honey samples from different locations in Turkey. A dendrogram based on single linkage and Euclidean distance resulted in two well-separated clusters: chestnut honeys produced in Turkey and Bursa. The habitats of samples labeled BO, BI, BK1 and BK2 are adjacent to inland lakes (İznik and Ulubat) and their habitats are similar to each other [51].
The discrimination of rice varieties using colorimetric sensor arrays together with gas chromatography techniques was also researched [52]. A total of nine rice varieties from Pakistan were analyzed and the experimentally obtained data were used for HCA in order to reveal and visualize the differences between rice samples from various geographical origins. For the similarity exploration, the Euclidean distance method was used. For the colorimetric analysis data, the cut distance was 8.4 and rice cultivars were clustered into six groups, while group three was out of clusters. Rice samples from the same cultivar and geographical origin were placed into the same or adjacent clusters. The presented results indicate that the HCA method can be used to correctly differentiate rice varieties from different geographical origins. Data obtained in the GC-MS analysis were also evaluated using HCA, which resulted in the rice cultivars from different geographical origins being separated into three clusters. This clustering showed that variability in the concentration of aroma compounds had a leading role when discriminating rice varieties [52].
The stress caused by cadmium, lead and aluminum exposure in basil (Ocimum basilicum L.) cultivated in Brazil was assessed using a multivariate analysis approach [53]. Caffeic and rosmarinic acid were determined by high-performance liquid chromatography with a diode array detector (HPLC-DAD), while total phenolics and total flavonoids were determined by spectrophotometry. Plants were exposed to four different concentration levels of metals: cadmium (0.2, 0.6, 1.2 and 1.8 mmol L⁻¹), lead and aluminum (0.04, 0.08, 0.12 and 0.16 mmol L⁻¹). Ward's method and Euclidean distances were used in the HCA. The resulting grouping revealed that the studied metals had different influences on the secondary metabolism of O. basilicum [53].
Another group of researchers investigated the chemical characterization of pine nuts from Brazil using exploratory data analysis [54]. The mineral composition (Ca, Cu, Fe, K, Mg, Mn, P and Zn), centesimal composition (moisture, ash, lipids, protein and carbohydrate) and amount of lead (Pb) were determined using inductively coupled plasma optical emission spectrometry (ICP OES) and graphite furnace atomic absorption spectrometry (GF AAS). The results gained with HCA confirmed the results from the PCA. The dendrogram was generated using Ward's method and Euclidean distances and resulted in two distinct groups according to the mineral composition with a similarity index of 15 [54].
Spirulina (Spirulina platensis) and its commercial products are very interesting to the food industry since spirulina is rich in protein and has other nutritional value [55]. Two-trace two-dimensional (2T2D) correlation infrared spectral analysis was conducted and a chemometric analysis of the experimentally obtained data was carried out. The S. platensis strain from India was grown and taken as a control sample, while commercial samples of Spirulina food products and food supplements were purchased. HCA was performed based on Ward's method and Euclidean distances. The dendrogram resulted in two clusters with a subcluster in which it can be seen that the control samples are distinguished from the others. The authors concluded that the results of the HCA are in accordance with the results obtained by PCA [55].
Another study aimed to identify which sweet cherry (Prunus avium L.) cultivars are the most widespread in Italy [56]. The authors characterized 35 sweet cherry cultivars and one sour cherry cultivar through the analysis of different pomological and nutraceutical traits. In addition, they wanted to identify cultivars whose antioxidant activity and total anthocyanins content were closest to the values of Ferrovia, a widespread cultivar in Italy. Two HCA analyses were conducted with the paired group algorithm, taking into account the following: (a) titratable acidity, soluble solid content, soluble solid content/titratable acidity ratio and pH; and (b) total phenolic content, antioxidant activity and total anthocyanins content. The first clustering resulted in eight groups, while the second one resulted in five groups separated in the dendrogram. In the first dendrogram the sour cherry cultivar was out of the clusters, while in the second, one sweet cherry sample was out of the clusters. The authors concluded that clustering highlighted a wide diversity in sweet cherry genotypes in Italy [56].

Non-Parametric Methods in Foods
The sum of ranking differences (SRD) method is a non-parametric method used when objects are ranked against a defined reference ranking (ideal ranking or gold standard), and it was introduced by Héberger and Kollár-Hunek [57,58]. The reference ranking can be a known standard or, when none is available, the row-wise mean, median, minimum or maximum of the data. The results are shown in the form of a graph that represents the distribution of the samples in relation to the chosen reference ranking. The closer the SRD value is to zero, the closer the variable is to the reference ranking and the better it is considered. The validation is carried out through the comparison of ranks by random numbers (CRRN) procedure and by the seven-fold cross-validation procedure [57,58]. Table 3 shows studies that recently applied the SRD method.
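The core SRD calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the validated implementation of [57,58]: the row-wise mean is assumed as the reference, tied values receive average ranks, the SRD is normalized by its maximum possible value for the given number of objects, and the data and function names are hypothetical. The CRRN and cross-validation steps are not included.

```python
from statistics import mean

def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def srd(columns, reference=mean):
    """Sum of ranking differences for each column (variable or method).

    columns   -- dict mapping a column name to its values for the same objects
    reference -- function that condenses one row into the reference value
                 (mean assumed here; median, min or max could be passed instead)
    Returns a dict: name -> (raw SRD, SRD as a percentage of the maximum).
    """
    names = list(columns)
    n = len(columns[names[0]])
    ref_values = [reference([columns[name][i] for name in names]) for i in range(n)]
    ref_ranks = average_ranks(ref_values)
    max_srd = (n * n) // 2  # n^2/2 for even n, (n^2 - 1)/2 for odd n
    result = {}
    for name in names:
        ranks = average_ranks(columns[name])
        raw = sum(abs(r - q) for r, q in zip(ranks, ref_ranks))
        result[name] = (raw, 100.0 * raw / max_srd)
    return result

# Hypothetical data: three methods (columns) evaluated on five objects (rows).
table = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.3, 2.9, 4.2, 4.8],
    "C": [5.0, 4.0, 3.0, 2.0, 1.0],
}

srd_values = srd(table)
for name, (raw, percent) in srd_values.items():
    print(f"{name}: SRD = {raw:.1f} ({percent:.1f}% of maximum)")
```

In this toy table, columns A and B order the objects exactly as the reference does and obtain an SRD of zero, whereas column C reverses the order and reaches the maximum SRD.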
The PLS-DA model was upgraded with the SRD algorithm for tea grade identification using electronic tongue data [59]. Tea grade identification plays a crucial role in tea pricing and sales. The tea grades were distinguished and identified using PCA and the PLS-DA-SRD model. The performances of the established PLS-DA and PLS-DA-SRD models were compared, and a significant improvement in accuracy and sensitivity was shown when SRD was coupled with PLS-DA. The authors concluded that the PLS-DA-SRD approach successfully identified tea sample grade [59].
Ranking and multicriteria decision making were employed in the optimization of raspberry convective drying processes [60]. A comparative experiment was conducted to find suitable process parameters for convective drying that may be considered as an alternative to freeze-drying. SRD was applied to reveal the differences and similarities between the applied drying methods. Multiple validation steps including different resampling methods and leave-multiple-out cross-validations were used. The results of the SRD analysis indicate that convective drying of fresh raspberries turned out to be more similar to freeze-dried raspberries than convective drying of frozen ones [60].
Since there is an ongoing trend for the human consumption of insects, the authors of [61] conducted research that aimed to propose which insect species is the most suitable for human consumption. Previously published results were used, and a comprehensive picture of the nutritional profile of insects was presented through case studies using the sum of ranking differences. The case studies dealt with the proximate nutritional profile of the insects and traditional protein sources (beef, pork, chicken, egg, salmon and milk) in terms of mineral content, amino acid profiles, vitamin content and origin. The main difficulties that the authors faced included the quality of the original data, missing data, and the fact that studies from different parts of the world gave significantly different nutrient results for the same insect species. Their results suggest that the superiority of insects as a protein source cannot be stated in every case, but the general view in favor of insects is promising [61].
SRD was used in a study that dealt with beer microfiltration with a static turbulence promoter [62]. The main challenges during beer microfiltration are fouling and quality maintenance. Ten different membrane filtration experiments were ranked using SRD based on their analytical properties, hydrodynamic parameters and separation characteristics. SRD was used to determine the best performing membrane filtration method, with the row-wise minimum value taken as the reference ranking [62].
The combination of leave-one-out (LOO) cross-validation of SRD values and the detection of significant differences by the post hoc Wilcoxon matched pairs test was used for the evaluation of various tomato landraces and one commercial variety [63]. The study aimed to find a combination of methodologies that can be validated as being suitable for this type of study and these samples. The authors collected 11 varieties of red and orange tomato samples and characterized them by phytonutrient composition. One sample was a commercial variety and the rest were tomato landraces. The SRD analysis resulted in the formation of three groups: (a) the two samples closest to the reference landrace that had the highest phytonutrient content values; (b) seven samples following the first group; and (c) the remaining two samples (one of them was the commercial variety). The authors reported that the investigated commercial variety had a lower phytonutrient content than the landraces [63].
The SRD method was applied to evaluate the performance of eight different Ocimum basilicum L. varieties' gene bank accessions [64]. Using the varieties' characteristics, the gene bank accessions were compared with the SRD method. LOO cross-validation was performed to characterize the uncertainty of the SRD values, and the Wilcoxon matched pairs test and sign test were used for the pairwise comparisons. The results indicated that one variety (M.Grünes) was evaluated as the best performing of the selected gene bank-stored accessions. The authors pointed out that basil species selection based on multicriteria and correct statistical tests was being published for the first time [64].
One of the most underutilized chemometric methods is the generalized pair correlation method (GPCM). This method is mostly used in studies regarding biologically active compounds and in the analytical chemistry domain [65]. The method was first introduced by Rajkó and Héberger as the pair correlation method (PCM), which can discriminate between two variables [66]. The PCM was then generalized (GPCM) so that it can be performed for up to several hundred features [67,68]. Very few studies related to food science and technology topics have included GPCM, although this method has great potential. In Table 3, some research papers that employ GPCM are presented.
Researchers produced and evaluated buckwheat pasta enriched with silkworm powder and used just-about-right (JAR) data evaluation [69]. A part of their study was a GPCM analysis that was conducted on the basis of consumer sensory analysis results (overall liking, color, odor, texture, graininess and flavor attributes). The sensory acceptance of poppy seed-flavored white chocolates was also evaluated using GPCM [70]. The GPCM analysis took into account parameters regarding color, texture, taste and overall liking. JAR attributes (color, texture, meltiness, particle size, global taste intensity, poppy seed flavor and chocolate flavor) were ranked using conditional probability ordering and the conditional Fisher's exact test. Flavored mineral water samples with mango-passion fruit aroma were examined regarding different JAR attributes (color intensity, odor intensity, fruit flavor, carbonation, sweet taste, sour taste, bitter taste and aftertaste intensity) [71]. All GPCM methods (simple, difference and significance ordering) and tests (McNemar's, Chi-square, conditional Fisher's and Williams' t-test) were applied in order to rank the JAR attributes.

Concluding Remarks
As Svante Wold concluded, the future of chemometrics is bright [2]. This paper presents a brief systematic review of ongoing multivariate chemometric approaches in the study of bioactive compounds and functional properties of foods. The available literature shows a wide spectrum of regression, classification and non-parametric chemometric methods used for the presentation and interpretation of experimentally observed data. The ongoing trend in recent research indicates that chemometrics will be progressively used in the domain of food science and technology, since its benefits are repeatedly proven and since this research area is highly competitive and fast growing.

Author Contributions:
Conceptualization, M.K.B. and S.K.; methodology, M.K.B.; investigation, M.K.B. and S.K.; writing-original draft preparation, M.K.B., S.K. and S.P.-K.; writing-review and editing, M.K.B., S.K. and S.P.-K.; visualization, M.K.B. and S.K.; supervision, S.P.-K.; project administration, S.P.-K. All authors have read and agreed to the published version of the manuscript.

Funding:
The present research is financed in the framework of the project of the Provincial Secretariat for Higher Education and Scientific Research of AP Vojvodina (Project: Molecular engineering and chemometric tools: Towards safer and greener future, No. 142-451-3457/2023-01/01) and the project of the Ministry of Science, Technological Development and Innovation (Project No. 451-03-65/2024-03/200134).

Table 1. Food and modeling type overview.

Table 2. Food and chemometric classification techniques overview.

Table 3. Food and non-parametric methods overview.