Near-Infrared Spectroscopy Coupled with Chemometrics and Artificial Neural Network Modeling for Prediction of Emulsion Droplet Diameters

There is increased interest in the food industry in emulsions as delivery systems that preserve sensitive biocompounds, with the aim of improving their bioavailability, solubility, and stability; maintaining their texture; and controlling their release. Emulsification in continuously operated microscale devices enables the production of emulsions with controllable droplet sizes and reduces the amount of emulsifier and time required, while near-infrared (NIR) spectroscopy, as a nondestructive, noninvasive, fast, and efficient technique, is a promising tool for emulsion investigation. The aim of this work was to predict the average Feret droplet diameter of oil-in-water and oil-in-aqueous mint extract emulsions prepared in a continuously operated microfluidic device with different emulsifiers (PEG 1500, PEG 6000, and PEG 20,000) based on the combination of NIR spectra with chemometrics (principal component analysis (PCA) and partial least-squares (PLS) regression) and artificial neural network (ANN) modeling. PCA score plots for average preprocessed NIR spectra show the specific grouping of the samples into three groups according to the emulsifier used, while the PCA analysis of the emulsion samples with different emulsifiers showed the specific grouping of the samples based on the amount of emulsifier used. The developed PLS models had higher R² values for oil-in-water emulsions, ranging from 0.6863 to 0.9692 for calibration, 0.5617 to 0.8740 for validation, and 0.4618 to 0.8692 for prediction, than for oil-in-aqueous mint extract emulsions, for which R² values were in the range of 0.8109–0.8934 for calibration, 0.5017–0.6620 for validation, and 0.5587–0.7234 for prediction.
Better results were obtained for the developed nonlinear ANN models, which showed R² values in the range of 0.9428–0.9917 for training, 0.8515–0.9294 for testing, and 0.7377–0.8533 for the validation of oil-in-water emulsions, while for oil-in-aqueous mint extract emulsions R² values were higher, in the range of 0.9516–0.9996 for training, 0.9311–0.9994 for testing, and 0.8113–0.9995 for validation.


Introduction
The consumption of medicinal plants is inversely related to the occurrence of diseases such as several types of cancer, cardiovascular, cerebrovascular, and neurodegenerative diseases. The high presence of antioxidants in plants, in the form of bioactive compounds, represents an important basis for the health-protecting effects connected with their consumption [1]. Due to their beneficial effects, these bioactive compounds extracted from medicinal plants have been successfully incorporated into food systems [2].
In order to increase the use of extracts with compounds isolated from medicinal plants in food matrices, several technologies should be considered: nanoemulsions, nanocapsules, vapors, and edible films [2]. There is increased interest in the food industry in emulsions as delivery systems to maintain the completeness of sensitive biocompounds with the aim of [...] products [33]. NIR spectroscopy is based on the absorption of NIR radiation in the wavelength range from 780 nm to 2500 nm [34]. The absorbance of light is mainly caused by overtones and the combination vibrations of some hydrogen-based functional groups such as O-H, C-H, C-O, and N-H [24]. Due to the large number of recorded spectra, it is imperative to analyze the acquired spectral data. In order to extract important information and to identify significant patterns in NIR spectra, various mathematical and statistical methods (principal component analysis (PCA), partial least-squares regression (PLSR), canonical correlation analysis (CCA), and principal component regression (PCR)) can be used [10,23]. The big advantage of these statistical methods is that they allow the spectral data to be explored for both qualitative and quantitative applications [35]. However, due to the complex nature of food, nonlinear techniques, compared to multivariate methods, have proved to be a good solution. Artificial neural networks (ANNs) coupled with NIR spectroscopy have been identified as an excellent tool for the prediction of emulsion droplet diameter [10], the prediction of the physical and chemical properties of plant extracts [36], and honey adulteration detection and quantification [37]. In all three of these studies, the developed ANN models described the experimental data with high accuracy (the coefficients of determination were greater than 0.8).
The aim of this work was to predict the average Feret droplet diameters of oil-in-water and oil-in-aqueous mint extract emulsions based on the combination of NIR spectra with PLS regression and ANN modeling. Emulsification was performed in a microfluidic system including a static teardrop micromixer with the addition of three emulsifiers (PEG 1500, PEG 6000, and PEG 20,000) at three different concentrations (2%, 4%, and 6%). To the best of our knowledge this is the first application of NIR spectra and PLS and ANN modeling for the analysis of oil-in-aqueous mint extract emulsions prepared using a microfluidic device, motivated by the fact that emulsion technology is generally applied for the encapsulation of bioactives in aqueous solutions, which can either be used directly in the liquid state or can be dried to form powders [38]. Different spectral preprocessing methods (first-order Savitzky-Golay derivative (SG1), standard normal variate (SNV), multiplicative scatter corrections (MSC), first-order Savitzky-Golay derivative followed by standard normal variate (SG1+SNV), and first-order Savitzky-Golay derivative followed by multiplicative scatter corrections (SG1+MSC)) were applied in order to determine the predictive ability of the models used.

Materials
Edible sunflower oil (Zvijezda plus d.o.o., Zagreb, Croatia) was purchased from a local supermarket. Polyethylene glycols with average molecular weights of 1500 g/mol (PEG 1500) and 6000 g/mol (PEG 6000) were purchased from Acros Organics (Geel, Belgium), while the 20,000 g/mol polyethylene glycol (PEG 20,000) was obtained from Sigma-Aldrich (Taufkirchen, Germany). Dried mint leaves (Mentha piperita L.) were purchased from Suban, Croatia. Plant materials were collected during the flowering season of 2019 in the north-western part of Croatia, dried naturally, and stored in ambient conditions before use.

Mint Extract Preparation
First, 1 g of dry plant material was placed in a 200 mL glass with 50 mL of deionized water, and solid-liquid extraction was performed using an Ika HBR4 digital oil bath (IKA-Werk GmbH & Co., KG, Staufen, Germany) at 80 °C and 250 rpm for 30 min. After the extraction, samples were filtered through a 100% cellulose paper filter (LLG Labware, Meckenheim, Germany) with 5–13 µm pores and stored at 4 °C until analyzed. The dry matter content of the aqueous mint extract was 0.85%.

Emulsification in a Microfluidic System
Glass microchips with laser-engraved microchannels were placed in stainless-steel holders, which provided leak-free connections (Micronit Microfluidics B.V., Enschede, The Netherlands). Experiments were performed in microchannels with the following dimensions: width × height × length = 250 µm × 150 µm × 55.3 mm. The microchannels were equipped with static teardrop micromixers. Emulsion droplets were generated in a borosilicate glass microfluidic device. Two syringe pumps (NE-1000 Syringe Pump, New Era Pump Systems, New York, NY, USA) with high-pressure stainless-steel syringes (8 mL, Harvard Apparatus) were used for solution delivery. Two phases, oil and aqueous mint extract, were introduced separately into microchannels through a fused silica connection (375 µm o.d., 150 µm i.d., Micronit Microfluidics B.V., Enschede, The Netherlands). The emulsification experiments were performed according to the design of experiments previously described by Grgić et al. [39] (Table 1).

Average Feret Diameter
The prepared samples of oil-in-aqueous mint extract emulsions were photographed using a microscope equipped with a camera (BTC type LCD-35, Bresser, Germany) at 4× magnification. The average Feret diameter of the droplets was measured using the software tool ImageJ (v.1.8.0. National Institutes of Health, Bethesda, MD, USA). The Feret diameter was defined as the perpendicular distance between two tangents located on opposite sides of a particle [39]. The average Feret diameter of oil-in-water emulsions prepared in the same microfluidic system according to the same experimental design (Table 1) was previously published in an article by Grgić et al. [39].
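The tangent-based definition above can be illustrated with a short sketch. It is not the ImageJ implementation: the function names, the 1° angular step, and the example outlines are assumptions chosen for illustration only. The Feret (caliper) width in a given direction is taken as the spread of the outline points projected onto that direction.

```python
import math

def feret_diameter(points, angle_deg):
    """Distance between two parallel tangents enclosing the outline,
    measured along the direction given by angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project every outline point onto the measuring direction.
    projections = [x * ux + y * uy for x, y in points]
    return max(projections) - min(projections)

def feret_range(points, step_deg=1):
    """Minimum and maximum caliper width over directions 0..179 degrees."""
    widths = [feret_diameter(points, a) for a in range(0, 180, step_deg)]
    return min(widths), max(widths)

# Hypothetical droplet outline: a circle of radius 50 µm sampled every degree.
circle = [(50 * math.cos(math.radians(t)), 50 * math.sin(math.radians(t)))
          for t in range(360)]

# Hypothetical non-spherical outline: corners of a 4 x 3 rectangle.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

circle_min, circle_max = feret_range(circle)
rect_min, rect_max = feret_range(rectangle)
```

For a spherical droplet the caliper width is the same in every direction, so a single value per droplet is sufficient; for the rectangle the width varies between the short side and the diagonal.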

Near-Infrared Spectra of Emulsions
The near-infrared (NIR) spectra of all oil-in-water and oil-in-aqueous mint extract emulsions were recorded in the wavelength range from 904 nm to 1699 nm using an NIR spectrometer (NIR-128-1.7-USB/6.25/50 µm Control Development Inc., South Bend, IN, USA). The NIR spectra were recorded in disposable plastic cuvettes with a liquid sample measurement setup. The NIR spectra were recorded in three parallel runs.

NIR Spectra Processing and Modeling
The effects of preprocessing methods of NIR spectra on sample grouping were analyzed using the Unscrambler X software (Version 10.1. CAMO AS, Oslo, Norway).
For the prediction of the average droplet sizes of the oil-in-water emulsions and oil-in-aqueous mint extract emulsions with PEG 1500, PEG 6000, or PEG 20,000 as emulsifiers, PLS regression models were developed. The PLS model input data were the triplicates of the spectra of each sample. Each PLS model used seven latent variables (factors) and a random cross-validation method based on splitting the input data into 20 segments. No normalization method was used before the data transformation. The applicability of the PLS models developed using raw and preprocessed NIR spectra was estimated based on: (i) the coefficients of determination for calibration (R²cal) and cross-validation (R²cval), (ii) the root-mean-square error for calibration (RMSEC) and cross-validation (RMSECV), (iii) the average value of the difference between the predicted and observed values (bias), and (iv) the ratio of the predicted deviation (RPD) and the range error ratio (RER) [10,37].
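As a minimal sketch of how such figures of merit can be computed from observed and predicted diameters, the function below uses commonly quoted definitions: RPD as the standard deviation of the observed values divided by the RMSE, and RER as the range of the observed values divided by the RMSE. These definitions, the function name, and the example numbers are assumptions for illustration; the exact formulas used by the Unscrambler software may differ.

```python
import math

def regression_metrics(observed, predicted):
    """R2, RMSE, bias, RPD, and RER for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))                 # sample standard deviation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "bias": sum(residuals) / n,                      # mean(predicted - observed)
        "rpd": sd_obs / rmse,
        "rer": (max(observed) - min(observed)) / rmse,
    }

# Hypothetical Feret diameters in µm (not data from this study).
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 141.0, 159.0]
metrics = regression_metrics(observed, predicted)
```

Applying the same function to calibration, cross-validation, and prediction sets gives the corresponding RMSEC, RMSECV, and RMSEP values.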
The average droplet sizes of the oil-in-water emulsions and oil-in-aqueous mint extract emulsions with PEG 1500, PEG 6000, or PEG 20,000 as an emulsifier were also predicted from the NIR spectra using artificial neural network (ANN) modeling in Statistica v.13.0 software (Tibco Software Inc., Tulsa, OK, USA). The multilayer perceptron (MLP) network models consisted of an input layer, a hidden layer, and an output layer (Figure 1). The five neurons in the input layer represented the first five factors (principal components) from the PCA analysis, which accounted for more than 99.99% of the data variability. A PCA based on the raw spectra and a PCA based on the selected preprocessing method were used separately. The hidden activation function and the output activation function were randomly chosen from the following set of alternatives: identity, logistic, hyperbolic tangent, and exponential. The number of neurons in the hidden layer was chosen randomly between 3 and 11. The data matrix for ANN modeling comprised 51 rows representing emulsion samples prepared using an individual emulsifier and 6 columns referring to five PCA coordinates (factors) and the measured average droplet sizes. During model construction, data were randomly divided into 70% for network training, 15% for network testing, and 15% for model validation. For each emulsifier, 1000 networks were generated. Model training was carried out using a back error propagation algorithm and the sum-of-squares error function implemented in the Statistica v.13.0 automated neural networks module.
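The data handling and network shape described above can be sketched with NumPy. This is only an illustration of a random 70/15/15 split and of a single forward pass through a 5-h-1 perceptron; it does not reproduce the Statistica automated neural network search, no training is performed, and the placeholder data, the seed, the rounding of the subset sizes, and the form of the exponential activation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)               # arbitrary seed, for repeatability

n_samples, n_inputs = 51, 5
X = rng.normal(size=(n_samples, n_inputs))   # placeholder for five PCA scores
y = rng.uniform(50, 170, size=n_samples)     # placeholder diameters in µm

# Random 70/15/15 split; how the subset sizes are rounded is an assumption.
order = rng.permutation(n_samples)
n_train = round(0.70 * n_samples)
n_test = round(0.15 * n_samples)
train_idx = order[:n_train]
test_idx = order[n_train:n_train + n_test]
valid_idx = order[n_train + n_test:]

# Candidate activation functions named in the text.
ACTIVATIONS = {
    "identity": lambda z: z,
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "exponential": lambda z: np.exp(-z),     # assumed form of "exponential"
}

def mlp_forward(X, W1, b1, W2, b2, hidden_act, output_act):
    """One forward pass through an input-hidden-output perceptron."""
    hidden = ACTIVATIONS[hidden_act](X @ W1 + b1)
    return ACTIVATIONS[output_act](hidden @ W2 + b2).ravel()

# Draw a hidden-layer size between 3 and 11 (upper bound of integers is exclusive).
n_hidden = int(rng.integers(3, 12))
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))   # untrained weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

pred_train = mlp_forward(X[train_idx], W1, b1, W2, b2, "tanh", "identity")
```

In a search such as the one described, this draw of architecture and activation functions would be repeated many times, each candidate trained on the training subset and compared on the test and validation subsets.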
The proposed ANN model performance was estimated based on the R² and root-mean-square error (RMSE) values for the training, testing, and validation.

Results and Discussion
This research examined the applicability of a microfluidic device for generating oil-in-aqueous mint extract emulsions utilizing emulsifiers such as PEG 1500, PEG 6000, and PEG 20,000. The prepared emulsions were observed under a microscope at the microfluidic device outlet. Photos were taken and used for the average Feret diameter measurements. The gathered photos of the prepared emulsions and measured average Feret diameters are presented in Figure 2. It can be observed that spherical oil droplets of a dispersed phase were generated in the continuous aqueous mint extract (Figure 2(a1-a3)). As previously described by Shah et al. [40], in contrast to bulk emulsification techniques, an emulsion is carefully manufactured one drop at a time in a microfluidic device, and therefore a monodisperse emulsion is the end product of this procedure. The ability of a microfluidic device to create emulsion droplets of controlled size [41,42] depends on a number of factors, including the flow rates, the fluid viscosity, the emulsifier, and the geometry of the microfluidic channels [10,34]. All of the mentioned factors have to be taken into consideration when optimizing the emulsification process [39].
For oil-in-aqueous mint extract emulsions, the smallest droplets were generated with the emulsifier PEG 6000 (Figure 2(a2,b2)), with an average Feret diameter in the range from 52.36 ± 16.50 µm to 149.99 ± 39.15 µm, followed by the emulsions with PEG 1500 (Figure 2(a1,b1)) and with PEG 20,000 (Figure 2(a3,b3)). For PEG 1500 emulsions, the average droplet Feret diameter was in the range from 112.26 ± 16.24 µm to 163.77 ± 16.93 µm, while for the PEG 20,000 emulsions the average droplet Feret diameter was in the range from 109.81 ± 1.74 µm to 169.99 ± 1.06 µm. When comparing the measured data with the results presented by Grgić et al. [39] for oil-in-water emulsions produced with the same emulsifiers and using the same process conditions, the droplet size followed approximately the same trend. For PEG 1500 and PEG 20,000 oil-in-aqueous mint extract emulsions, the droplets were larger than the oil-in-water emulsion droplets for all experiments. For PEG 6000 oil-in-aqueous mint extract emulsions, the droplets were larger than the oil-in-water emulsion droplets for experiments with 30% oil phase.

NIR Spectra of Oil-in-Water and Oil-in-Aqueous Mint Extract Emulsions: Preprocessing and PCA Analysis
The NIR spectra of the oil-in-water and oil-in-aqueous mint extract emulsions prepared with different emulsifiers (PEG 1500, PEG 6000, and PEG 20,000) were recorded continuously in the wavelength range from 904 to 1699 nm. The average spectra of individual samples, grouped according to the continuous phase, are given in Figure 3(a1,a2), while the SNV preprocessed spectra of the oil-in-water emulsions and the MSC preprocessed spectra of oil-in-mint aqueous extract emulsions are given in Figure 3(b1,b2). Despite variations in spectral absorbance, the majority of the spectra obtained from the emulsion samples followed a similar pattern. Additive effects (spectral shifts) characteristic of the different droplet sizes present in the samples can be seen. A similar observation was made in a study by Bampi et al. [28], where a discrepancy in the spectral baseline was observed that could be attributed to different light scattering due to the different water droplet sizes in the emulsions. The largest differences in the spectral peaks of both sample types are seen for the wavelength range from 1300 to 1699 nm, which is specific for the superposition of the O-H bonds. Moreover, the differences in this part of the spectrum can be easily correlated with the water present in the samples [36].
According to Borges et al. [29], NIR spectroscopy allows the efficient determination of the average diameter and water content of oil-in-water emulsions and offers great potential for the online qualitative analysis of biodiesel during storage. However, according to Bi et al. [43] and Grisanti et al. [44], a practical issue for the implementation of NIR technology is that numerous interferences frequently alter the spectra during signal acquisition. The sample thickness, measurement geometry, or physical characteristics of the samples can influence the light path length, and therefore preprocessing is an essential step in NIR spectral modeling [45]. As described by Feng et al. [46], spectral preprocessing is used to eliminate systematic noise and highlight the changes between the samples. In this work, the efficiency of the first-order Savitzky-Golay derivative (SG1), standard normal variate (SNV), multiplicative scatter correction (MSC), first-order Savitzky-Golay derivative followed by standard normal variate (SG1+SNV), and first-order Savitzky-Golay derivative followed by multiplicative scatter correction (SG1+MSC) was analyzed. The results showed that SNV and MSC were the most efficient methods for the analysis of the NIR spectra of oil-in-water and oil-in-aqueous mint extract emulsions prepared in a continuously operated microfluidic device. The SNV and MSC preprocessed spectra are shown in Figure 3(b1,b2). SNV subtracts the mean value of the complete spectrum, which eliminates a constant offset component, and then scales each spectrum by dividing the result by the standard deviation of the full spectrum [47]. MSC, on the other hand, reduces spectral deviations by using the linear least-squares approach to fit a linear model between a reference spectrum and each of the other spectra in the dataset.
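The two corrections can be written compactly with NumPy. The sketch below is an illustration under stated assumptions rather than the Unscrambler implementation: spectra are stored as rows, the sample standard deviation (ddof=1) is used for SNV, the mean spectrum serves as the MSC reference when none is supplied, and the synthetic spectra are invented for the example.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) by
    its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a
    reference spectrum and remove the fitted offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)     # mean spectrum as reference
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit of row = slope * reference + intercept.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic spectra: one shape with different additive and multiplicative effects.
base = np.linspace(0.1, 1.0, 50) ** 2
spectra = np.vstack([base, 2.0 * base + 0.3, 0.5 * base - 0.1])

snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
```

Because the three synthetic spectra differ only by an offset and a scale factor, both corrections collapse them onto a single curve, which is the behavior expected when baseline differences arise from scattering rather than composition.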
The dataset's average spectrum, known as the reference spectrum, is frequently selected [48]. However, without the application of chemometric methods that can extract significant data from the spectra, few conclusions can be drawn from either the raw or preprocessed spectral data. For this reason, a PCA analysis was applied to the raw spectral data and preprocessed data as one of the most widely used chemometric methods to detect the differences between samples [10]. The results for the oil-in-water emulsions are shown in Figure 4, and the results for the oil-in-aqueous mint extract emulsions are shown in Figure 5. Based on the PCA results, SNV was selected as the optimal preprocessing method for individual oil-in-water emulsion NIR spectra and MSC was selected as the optimal preprocessing method for individual oil-in-aqueous mint extract emulsion NIR spectra. The selected preprocessing methods resulted in the best discrimination of the sample and the highest explanation of the data in the first two principal components [49]. PCA score plots for average NIR preprocessed spectra (Figures 4a and 5a) show the specific grouping of the samples in three groups according to the emulsifier used (PEG 1500, PEG 6000, and PEG 20,000). As expected, the grouping was more evident for the oil-in-water emulsions. Mint extracts, as the continuous phase for the second type of emulsion, scarcely influenced the sample grouping. The chart of the principal components (x-axis) and the percentage of explained variance (y-axis) shows its inflection point at the third PC, which is an indication of the most important PCs to investigate the observed system. As presented for both types of emulsions, the first three principal components (PCs) explained most of the sample variability. For the oil-in-water emulsions, the first three PCs explained 75.92% of the data variability, while for the oil-in-aqueous mint extract emulsions the first three PCs explained 87.06% of the data variability. 
The compressed variance difference in PC1-PC3 for the oil-in-water emulsions and oil-in-aqueous mint extract emulsions could be explained by the color difference of the continuous phases used and the larger effect of color in PC1-PC3 of the oil-in-aqueous mint extract emulsions. The detection of significant variables (variables with large variances) and correlations between variables [50] was made by the analysis of the loading spectra (Figures 4b and 5b). Even though both positive and negative contributions are displayed in the loading plots (Figures 4b and 5b), the spectral shape of the pure PC1 loading vector displays the majority of the distinctive absorption peaks observed in Figures 2a and 3a. Moreover, the intensity maximum of the pure PC1 spectrum is shifted toward the actual spectrum by the positive and negative contributions from PC2 and PC3, as previously presented by Zhang et al. [51]. For the oil-in-water emulsions, the maximum loading peaks were noticed at 941 nm (C−H bond, 3rd overtone) and 1631 nm (C−H, 1st overtone), while for the oil-in-aqueous mint extract emulsions the maximum loading peaks were noticed at 960 nm (C−H bond, 3rd overtone), 1437 nm (O−H, 1st overtone), and 1631 nm (C−H, 1st overtone). Furthermore, a PCA analysis was applied for the analysis of the emulsion samples according to the emulsifier used. As presented in Figures 4c,e,g and 5c,e,g, the specific grouping of the samples based on the amount of the emulsifier used can be noticed. It can be seen that sample grouping was more pronounced for the oil-in-water emulsions. Furthermore, the emulsions with the highest amount of emulsifier (6%) were specifically grouped, while there was some overlap of the samples with 2% and 4% emulsifier. That was especially evident for the oil-in-aqueous mint extract emulsions (Figure 5c,e,g). 
Moreover, the first three factors explained 75.31% (PEG 1500), 89.11% (PEG 6000), and 89.11% (PEG 20,000) of the variability between the oil-in-water emulsion samples and 83.85% (PEG 1500), 89.24% (PEG 6000), and 96.23% (PEG 20,000) of the variability between the oil-in-aqueous mint extract emulsion samples. The obtained results indicate that those samples were also separated based on the mixing rate, oil phase, and particle size. The loading plot shows that for oil-in-water emulsions the maximum loading peaks were noticed as follows: (i) for PEG 1500 at 941 nm (C−H bond, 3rd overtone) and 1631 nm (C−H, 1st overtone) (Figure 4d) [...] (Figure 5h). The high percentages of explained variance in both cases indicate that NIR coupled with chemometrics can be successfully used for the rapid and nondestructive discrimination of emulsion samples. These results are consistent with studies by Borges et al. [29], Bampi et al. [28], and Dinache et al. [52], who also successfully used NIR spectroscopy coupled with chemometrics to analyze emulsions.
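The score, loading, and explained-variance quantities discussed in this section can be obtained from a singular value decomposition of the mean-centered spectra. The following NumPy sketch is an illustration only: the function name and the synthetic 51 × 128 data matrix are assumptions, and it is not the Unscrambler routine used in this work.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """PCA of row-wise spectra via SVD of the mean-centered data matrix."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                 # one loading spectrum per row
    return scores, loadings, explained[:n_components]

# Synthetic data: 51 samples, 128 wavelengths, three underlying factors plus noise.
rng = np.random.default_rng(1)                   # arbitrary seed
latent = rng.normal(size=(51, 3)) * np.array([5.0, 2.0, 1.0])
basis = rng.normal(size=(3, 128))
demo = latent @ basis + 0.01 * rng.normal(size=(51, 128))

scores, loadings, explained = pca_scores(demo, n_components=3)
cumulative = np.cumsum(explained)                # cumulative explained variance
```

Plotting the first two columns of `scores` gives a score plot of the kind used to inspect sample grouping, while the rows of `loadings` plotted against wavelength show which spectral regions drive each component.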

PLS Modeling of the Average Feret Diameters of Emulsions
The aim of this work was to analyze the applicability of NIR spectroscopy for distinguishing the emulsion droplet sizes, expressed as the average Feret diameter at the microfluidic device outflow. For that purpose, partial least-squares (PLS) modeling was applied (Table 2). The performances of the different preprocessing methods were tested, including the first-order Savitzky-Golay derivative, standard normal variate, multiplicative scatter corrections, first-order Savitzky-Golay derivative in combination with standard normal variate, and first-order Savitzky-Golay derivative in combination with multiplicative scatter corrections. The applicability of the developed PLS models was evaluated using R²cal, R²cval, RMSEC, RMSECV, bias, RPD, and RER.
Table 2. Parameters of partial least-squares (PLS) models for the prediction of the average droplet size, expressed as the average Feret diameter of oil-in-water emulsions and oil-in-aqueous mint extract (green shading) emulsions prepared in a microfluidic device at different flow rates using different emulsifiers based on different NIR spectra pretreatments. Preprocessing methods: raw spectra (No), first-order Savitzky-Golay derivative (SG1), standard normal variate (SNV), multiplicative scatter corrections (MSC), first-order Savitzky-Golay derivative followed by standard normal variate (SG1+SNV), and first-order Savitzky-Golay derivative followed by multiplicative scatter corrections (SG1+MSC). Model applicability: the coefficients of determination for calibration (R²cal) and cross-validation (R²cval), the root-mean-square error for calibration (RMSEC) and cross-validation (RMSECV), the average value of the difference between the predicted and observed values (bias), the ratio of predicted deviation (RPD), and the range error ratio (RER). Seven factors were included in each model.
As for the PCA analysis, the SNV preprocessing method was shown to be the most efficient for the PLS modeling using the NIR spectra of the oil-in-water emulsions, whereas the MSC was the most efficient for the PLS modeling using the NIR spectra of the oil-in-aqueous mint extract emulsions. From the results summarized in Table 2 and Figure 6, it can be seen that the PLS model for oil-in-water emulsions using SNV preprocessed NIR spectra showed strong correlations (R² > 0.7) [53] for calibration (R²cal), cross-validation (R²cval), and prediction (R²pred) with the addition of PEG 20,000 (R²cal of 0.9692, R²cval of 0.8740, and R²pred of 0.8692). For PEG 1500, the PLS model correlation was moderate (R²cal of 0.6863, R²cval of 0.5617, and R²pred of 0.4618). For PEG 6000, a strong correlation was achieved for calibration (R²cal of 0.9601), while for cross-validation and prediction the correlations were moderate (R²cval of 0.6204 and R²pred of 0.6254). It can also be noticed that there were variations in RMSEC, RMSECV, and RMSEP for a number of PLS model factors.
The results indicate that PLS models for oil-in-aqueous mint extract emulsions showed lower accuracy in comparison to those for oil-in-water emulsions. The highest R²pred of 0.7234 was obtained for emulsions prepared with PEG 1500, followed by PEG 20,000, with an R²pred of 0.7062, and PEG 6000, with an R²pred of 0.5587. These low values can be attributed to the indirect measurement of the average Feret diameter [54]. The quality of the developed PLS models was also evaluated based on the residual predictive deviation (RPD) index and the range-to-error ratio (RER). Ideal and robust models need to possess higher R²pred coefficients and RPD indexes [55]. Based on the obtained RPD values, only the model for the oil-in-water emulsion with PEG 20,000, with RPD = 7.4581, can be used for process control (RPD > 6.5) [56]. The other developed PLS models, except the model for oil-in-aqueous mint extract emulsions with PEG 6000, can be used for screening (1.5 < RPD < 2.5) [57]. The bias, which is the discrepancy between the means of the true values and the estimated values, also known as the error of means, was strongly affected by the measurement error and the number of predictor variables [58]. Comparing the developed PLS models, it could be noticed that lower bias values were obtained for the oil-in-aqueous mint extract emulsions.
The bias, which is the discrepancy between the means of the true values and the estimated values (also known as the error of means), is strongly affected by the measurement error and the number of predictor variables [58]. Comparing the developed PLS models, lower bias values were obtained for the oil-in-aqueous mint extract emulsions.
The efficient use of NIR spectroscopy coupled with PLS modeling was presented by Mishra et al. [59] for the at-line and in-line monitoring of droplet size in mayonnaise. The authors developed PLS models that achieved prediction errors for droplets in the range of 0.38 to 0.68 µm. Moreover, Amsaraj et al. [60] combined Fourier-transform infrared spectroscopy with partial least-squares discriminant analysis to detect milk adulteration and achieved 100% accurate classification. Bampi et al. [28] proposed PLS models with a 9.53% mean error for the external validation of the average droplet size of water-biodiesel emulsions, while Jurinjak Tušek et al. [10] applied PLS modeling to predict the average Feret diameter of oil-in-water emulsions with two different emulsifiers (Tween 20 and PEG 2000).

ANN Modeling of the Average Feret Diameters of Emulsions
A multilayer feed-forward neural network, or multilayer perceptron (MLP), was fitted using the training dataset of the average droplet Feret diameter of each individual produced emulsion. The model inputs were the first five PCs, which together accounted for 99% of the data variability and were obtained after the preprocessing that ensured efficient sample grouping. For the oil-in-water emulsions, five PCs after SNV preprocessing were used, and for the oil-in-aqueous mint extract emulsions, five PCs after MSC preprocessing were used, as in the case of PLS modeling. The performances of the selected ANNs are given in Table 3 and in Figure 6. Based on the obtained R² and RMSE values for training, testing, and validation, ANN modeling ensured better agreement between the experimental data and the model-predicted data than PLS modeling. This can be explained by the nature of the models: PLS models are based on linear regression, while ANN models include highly nonlinear expressions.
Table 3. Characteristics of ANN networks selected for the prediction of the average droplet sizes of oil-in-water and oil-in-aqueous mint extract (green shading) emulsions prepared in a microfluidic device at different flow rates using different emulsifiers based on different NIR spectra pretreatments.
The optimal ANN architecture was selected based on the number of neurons in the hidden layer (fewer neurons in the hidden layer means a simpler and more stable network). For the oil-in-water emulsions, the highest validation R² was obtained for the emulsion with PEG 20000 (Figure 7(a3)). For the prediction of the average droplet Feret diameters in the oil-in-water emulsion with PEG 20000, MLP 5-5-1 was selected. The selected ANN was characterized by five neurons in the input layer, five neurons in the hidden layer, and one neuron in the output layer. Both the hidden activation function and the output activation function were exponential functions.
MLP 5-5-1 achieved a training R² of 0.9917, a training RMSE of 0.0002, a test R² of 0.9294, a test RMSE of 0.00184, a validation R² of 0.8533, and a validation RMSE of 0.0027. The ANN models used for the description of the average droplet Feret diameter of oil-in-aqueous mint extract emulsions showed a significant improvement in validation R² in comparison to the developed PLS models. For oil-in-aqueous mint extract emulsions, the highest validation R² was obtained for emulsions with PEG 1500 (Figure 7(b1)) using MLP 5-8-1, with five neurons in the input layer, eight neurons in the hidden layer, and one neuron in the output layer. The selected ANN used a logistic function as the hidden activation function and an identity function as the output activation function. This ANN achieved a training R² of 0.9996, a training RMSE of 0.0004, a test R² of 0.9994, a test RMSE of 0.0006, a validation R² of 0.9995, and a validation RMSE of 0.0005.
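To make the MLP 5-5-1 and MLP 5-8-1 notation concrete, the following sketch shows the forward pass of a single-hidden-layer perceptron that maps five PC scores to one predicted diameter. It is only an illustration: the weights are random placeholders rather than the fitted values, the function and key names are hypothetical, and the exponential activation is assumed to be exp(z) (some packages define it as exp(−z)).

```python
import math
import random

# Activation functions named in the text (exact definitions are assumptions).
ACTIVATIONS = {
    "exponential": math.exp,
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "identity": lambda z: z,
}


def init_mlp(n_in, n_hidden, seed=0):
    """Create placeholder parameters for an MLP n_in-n_hidden-1 (not fitted values)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def mlp_predict(params, pcs, hidden_act, output_act):
    """Forward pass: PC scores -> hidden layer -> single output neuron."""
    f_hidden, f_out = ACTIVATIONS[hidden_act], ACTIVATIONS[output_act]
    hidden = [
        f_hidden(sum(w * x for w, x in zip(row, pcs)) + b)
        for row, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    z = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return f_out(z)


# MLP 5-5-1 with exponential hidden and output activations (oil-in-water, PEG 20000)
net_551 = init_mlp(n_in=5, n_hidden=5)
# MLP 5-8-1 with logistic hidden and identity output (mint extract, PEG 1500)
net_581 = init_mlp(n_in=5, n_hidden=8)
```

A prediction for one sample would then be, for example, `mlp_predict(net_581, pc_scores, "logistic", "identity")`, where `pc_scores` is a hypothetical list of that sample's first five PC scores.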
The effectiveness of ANN modeling versus PLS modeling was assessed using R² and RMSE, and based on those criteria, the developed ANN models were more effective for the prediction of the average Feret diameter of both oil-in-water and oil-in-aqueous mint extract emulsions. However, the proposed models are applicable only within the range of the variables used for training and therefore cannot be applied more widely. As previously described, the superior performance of the ANN models is attributed to their nonlinear mapping capability, which is a feature lacking in the PLS models [61,62], and it is in agreement with previously presented results where ANN modeling was also applied for the prediction of emulsion droplet sizes using NIR spectra [10,28]. Furthermore, ANN modeling was shown to be more efficient than PLS modeling for the rapid detection of the microbial spoilage of beef fillets based on Fourier-transform infrared spectra [63], for the prediction of soil organic carbon using UV-VIS-NIR spectra [64], for the quantitative analysis of quartz in the presence of mineral interferences using FTIR spectra [62], for nondestructive grape texture prediction using NIR spectra [49], and for the prediction of bioactive component contents in olive leaf extracts using NIR spectra [36].

Conclusions
This is the first application of NIR spectra combined with PLS and ANN modeling for the analysis of oil-in-aqueous mint extract emulsions prepared using a microfluidic device. Based on the presented results, it can be concluded that, for all emulsifiers used, oil-in-aqueous mint extract emulsion droplets were larger than those generated in oil-in-water emulsions. The results also showed that NIR spectroscopy coupled with chemometrics can be used for the distinctive qualitative and quantitative grouping of the samples according to the emulsifier used for their production. The ANN models showed a higher ability to predict the average droplet sizes than the PLS models, especially in the case of PEG 1500 oil-in-aqueous mint extract emulsions, where R² values of 0.9996, 0.9994, and 0.9995 were obtained for training, testing, and validation, respectively. This was attributed to the ANN's nonlinear mapping capability, which is a feature lacking in the examined PLS models.