Application of Fourier Transform Infrared Spectroscopy and Multivariate Analysis Methods for the Non-Destructive Evaluation of Phenolics Compounds in Moringa Powder

This study performed non-destructive measurements of phenolic compounds in moringa powder using Fourier transform infrared (FT-IR) spectroscopy within a spectral range of 3500–700 cm⁻¹. Three major phenolic compounds, namely kaempferol, benzoic acid, and rutin, were measured in five different varieties of moringa powder, and the measurements were validated against the high-performance liquid chromatography (HPLC) method. The prediction performance of three regression methods, i.e., partial least squares regression (PLSR), principal component regression (PCR), and a net analyte signal (NAS)-based methodology called hybrid linear analysis (HLA/GO), was compared to identify the best prediction model. The PLSR method gave the best performance for the prediction of phenolic compounds in moringa powder: it attained a coefficient of determination (R²p) of 0.997 and a root mean square error of prediction (RMSEP) of 0.035 mg/g, which is better than the values obtained with the other two regression models. Based on the results, it can be concluded that FT-IR spectroscopy in conjunction with a suitable regression analysis method could be an effective analytical tool for the non-destructive prediction of phenolic compounds in moringa


Introduction
Moringa, a plant belonging to the family Moringaceae, is commonly cultivated in India, Africa, Indonesia, the Philippines, and various other parts of the world. The genus consists of 13 species from tropical and subtropical regions, of which Moringa oleifera (MO) is the most widely cultivated [1]. It is consumed as a local food and in the form of medicines, and it also has several industrial applications [2,3]. Because of its wide consumption and its reported effectiveness against several conditions, such as cancer, diabetes, and insomnia, it is often referred to as a miracle tree [4]. Various studies have demonstrated beneficial effects of Moringa oleifera in humans [5]. Moringa oleifera is a rich source of bioactive chemical compounds and therefore provides several important benefits, as reported in previous literature [6,7]. The widely consumed moringa leaves are rich in vitamins, carotenoids, polyphenols, phenolic acids, flavonoids, alkaloids, isothiocyanates, tannins, and saponins and therefore have various pharmacological properties [8].
Phenolic compounds are plant secondary metabolites produced through the shikimate, pentose phosphate, and phenylpropanoid pathways. Their general chemical structure is an aromatic benzene ring bearing one or more hydroxyl groups [9]. Depending on the number of phenol moieties in the molecule, they are classified as simple phenols or polyphenols. Phenolic compounds have several characteristic properties that are crucial for plants. For example, they act as a defense system by protecting plants from harmful UV radiation, insects, and fungi [10,11]. Furthermore, previous research has shown the potential of phenolic compounds as antitumor, antimicrobial, anti-adhesive, and anti-inflammatory agents [12].
Currently, phenolic compounds in moringa powder are determined by destructive chemical methods, including mass spectrometry (MS) [13], high-performance liquid chromatography (HPLC) [14], and gas chromatography [15]. These wet chemistry methods are efficient and precise and can detect the substances under examination at trace levels, i.e., ppm or ppb levels, but they have certain drawbacks: they are often time-consuming and destructive, require complicated experimental procedures, and produce chemical waste, which limits real-time measurement. Thus, there is a pressing need for a rapid, non-destructive detection technique that overcomes these limitations for the measurement of phenolic compounds in moringa powder samples.
Spectroscopic techniques are promising tools for examining the structure of chemical systems by measuring the interactions between light and matter. Different spectroscopic techniques, such as infrared spectroscopy and fluorescence spectroscopy, operate in different spectral ranges and are sensitive to structural and physicochemical properties. These techniques examine the energy of the radiation absorbed or emitted by molecules. For example, infrared spectroscopy measures the symmetric and asymmetric vibrational transitions excited by the absorption of IR radiation. Fluorescence spectroscopy, in contrast, measures electronic transitions and the relaxation of electrons to the ground state through the emission of radiation in the ultraviolet (UV) or visible range.
Raman and near-infrared (NIR) spectroscopy are popular vibrational spectroscopic techniques that have shown strong potential for the qualitative and quantitative measurement of different kinds of organic and inorganic compounds. Many researchers have used these techniques in combination with multivariate analysis methods, e.g., for the quality determination of Grignard reagent [16] and the prediction of anthocyanin in soybean [17]. Although these techniques offer various analytical applications, they suffer from certain drawbacks, namely fluorescence interference and the overtone and combination bands generated during spectral acquisition. Fourier transform infrared (FT-IR) spectroscopy, another vibrational spectroscopic technique, addresses these limitations by measuring the fundamental vibrations of molecules in a non-destructive manner. It provides information on the structural and physicochemical properties of molecules by exploiting the interaction between light and matter, and it is used in the analysis of gases, liquids, and solids. Several studies have applied FT-IR spectroscopy to the quality evaluation of oils mixed with benzene [18] and to the determination of adulteration in wines [19]. Furthermore, some studies have used moringa samples to evaluate quality parameters such as mineral, protein, and moisture contents [20] and to assess the sorption potential of Moringa oleifera seeds by elucidating the functional groups possibly responsible for the uptake of Ag⁺ [21].
Although FT-IR spectroscopy has numerous applications, as described above, multivariate analysis methods are essential to extract meaningful information from the acquired spectral data. The present study therefore investigated the potential of FT-IR spectroscopy for the non-destructive prediction of the total phenolic content of three significant phenolic compounds in moringa powder, namely kaempferol, benzoic acid, and rutin. Three regression analysis methods, namely PLSR, PCR, and HLA/GO, were compared to obtain the best prediction model for the assessment of phenolic compounds in moringa powder samples.

Sample Preparation
In the present study, five different varieties of moringa powder were used, purchased from Coupang, an online shop in South Korea. The moringa samples have distinct geographical origins: the USA, Korea, India, and Africa. The complete details regarding variety, origin, and manufacturing company are provided in Table 1. Mixed samples at different concentrations (..., 3.30, and 3.80 mg/g) were prepared by mixing the varieties with one another. The samples were transferred to separate snap-cap vials and mixed with a high-speed vortex mixer (Vortex-Genie 2, Scientific Industries, Inc., Bohemia, NY, USA) for 40 s to obtain a uniform mixture.

FT-IR Spectral Measurements
A Nicolet 6700 FT-IR spectrometer (Thermo Scientific Co., Waltham, MA, USA) was used for the spectral measurement of the moringa powder samples. The system was operated in attenuated total reflectance (ATR) sampling mode and equipped with a deuterated triglycine sulfate (DTGS) detector and a potassium bromide (KBr) beam splitter, all controlled by OMNIC software. For each sample, spectra were acquired over the spectral range from 3500 to 800 cm⁻¹. To reduce the background noise generated by the instrument, 32 scans were collected per sample at 4 cm⁻¹ intervals. Spectral acquisition followed the procedure described in [19]. Ten replicates were measured for each of the 14 different concentrations; thus, a total of 140 samples were measured with the FT-IR spectrometer.

Extraction and Analysis of Phenolic Compounds Using HPLC
Extraction and analysis of phenolic compounds were carried out according to the protocol described in [17]. To 0.1 g of each fine moringa powder sample, 3 mL of aqueous MeOH solution (80%) was added; the mixture was vortexed for 1 min and then sonicated for 1 h at 37 °C. After that, the mixture was centrifuged at 10,000 rpm for 15 min at 4 °C. The clear supernatants were passed through a 0.45 µm PTFE syringe filter (Millipore, Bedford, MA, USA) into amber glass vials (Thermo Fisher Scientific, Waltham, MA, USA). The HPLC instrument, mobile phase, gradient program, and the identification and quantification of the phenolic compounds followed the protocol described in [22].

Data Preprocessing and Multivariate Analysis
Spectral noise, scattering effects, and differences in particle size that arise during spectral acquisition directly affect the model's prediction performance. Therefore, spectral pretreatment must be applied to the raw spectral data to enhance data quality. In this research, the raw data were pretreated using several preprocessing methods, including multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky-Golay (SG) filtering. MSC and SNV are the most commonly used preprocessing methods for removing background, slope variation, and scattering effects from spectral data [23]. Savitzky-Golay filters (first and second derivatives) were used to eliminate undesired effects, such as noise and baseline drift, from the FT-IR spectroscopic data [24,25]. The pretreated spectra were analyzed using multivariate analysis methods, namely partial least squares regression (PLSR), principal component regression (PCR), and HLA/GO, in MATLAB (Version 7, The MathWorks, Natick, MA, USA).
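The pretreatment steps named above can be illustrated with a minimal Python sketch. It is not the MATLAB code used in the study: the function names, the window settings, and the synthetic spectra are assumptions chosen only to show what SNV, MSC, and a Savitzky-Golay derivative do to a set of spectra stored row-wise.

```python
import math
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected

def savgol_derivative(y, half_width=5, polyorder=2, deriv=1, delta=1.0):
    """Savitzky-Golay derivative of a 1-D signal (interior points only)."""
    offsets = np.arange(-half_width, half_width + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the local polynomial coefficient.
    weights = np.linalg.pinv(design)[deriv] * math.factorial(deriv) / delta ** deriv
    return np.convolve(np.asarray(y, dtype=float), weights[::-1], mode="valid")

# Synthetic demo data (NOT measured spectra): one band with random gain and offset.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(3500, 800, 200)
band = np.exp(-((wavenumbers - 1650) / 60.0) ** 2)
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(-0.1, 0.1, size=(5, 1))
raw = gains * band + offsets

snv_spectra = snv(raw)
msc_spectra = msc(raw)
first_derivative = savgol_derivative(raw[0], delta=wavenumbers[1] - wavenumbers[0])
line_slope = savgol_derivative(3.0 * np.arange(50.0) + 2.0)
```

In this toy case the five spectra differ only by gain and offset, so both SNV and MSC collapse them onto a single curve. Real spectra would retain the concentration-related differences.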

PLSR, PCR, and HLA/GO Model
The PLSR, PCR, and HLA/GO models were constructed to predict phenolic compounds in moringa powder samples. PLSR is one of the most widely used methods for prediction analysis; it derives a linear relationship between the dependent and independent variables by decomposing both into scores and loadings. The general equations for the PLSR model are written in the following way:
$$X = TP^{T} + E$$
$$Y = UQ^{T} + F$$
where X and Y are the independent and dependent variables, T and U are the score matrices, $P^{T}$ and $Q^{T}$ are the loading matrices of X and Y, respectively, and E and F are the corresponding error matrices.
PCR is a combination of principal component analysis (PCA) and multivariate linear regression (MLR) and is commonly applied in multivariate analysis to deal with highly correlated predictor variables. In the first step, PCA is performed, which reduces the number of variables through dimension reduction. In the second step, the optimum number of principal components obtained through PCA is used in the MLR model to perform PCR [26].
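The PLSR and PCR procedures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's MATLAB implementation: `pls1_fit` uses the single-response NIPALS variant of PLS, `pcr_fit` regresses on the leading principal-component scores, and the two-band synthetic data, the number of components, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; return (beta, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # working residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                # weight vector
        t = Xr @ w                            # X scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

def pcr_fit(X, y, n_components):
    """PCA on centred X, then least squares on the leading scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T                   # loadings (channels x components)
    gamma, *_ = np.linalg.lstsq(Xc @ V, y - y_mean, rcond=None)
    beta = V @ gamma
    return beta, y_mean - x_mean @ beta

# Synthetic demo data (NOT measured spectra): analyte band plus a varying matrix band.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(3500, 800, 150)
analyte_band = np.exp(-((wavenumbers - 1650) / 100.0) ** 2)
matrix_band = np.exp(-((wavenumbers - 2950) / 150.0) ** 2)
conc = rng.uniform(1.0, 3.8, size=60)
matrix_level = rng.uniform(0.5, 1.5, size=60)
X = np.outer(conc, analyte_band) + np.outer(matrix_level, matrix_band)
X += rng.normal(scale=1e-3, size=X.shape)
X_cal, y_cal, X_test, y_test = X[:40], conc[:40], X[40:], conc[40:]

pls_beta, pls_b0 = pls1_fit(X_cal, y_cal, n_components=2)
pcr_beta, pcr_b0 = pcr_fit(X_cal, y_cal, n_components=2)
pls_pred = X_test @ pls_beta + pls_b0
pcr_pred = X_test @ pcr_beta + pcr_b0
```

On this two-source toy problem both models recover the concentrations almost exactly with two components. With real spectra, the number of components would normally be chosen by cross-validation.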
Furthermore, the NAS-based hybrid linear analysis (HLA) algorithm was also used to predict phenolic compounds in moringa powder. A detailed explanation of the HLA algorithm is presented in the original article [27]. NAS is the portion of the signal that is directly linked to the concentration of the analyte of interest [28]. In this study, the NAS vector of each sample was calculated based on the method explained by Goicoechea and Olivieri [27] and Marsili [29].
Figure 1 shows the raw spectra of the moringa powder samples. The original FT-IR spectra contain several overlapping peaks as well as noise generated by the instrument during spectral acquisition and therefore cannot directly provide meaningful information on the phenolic compounds present in the measured samples. Spectral preprocessing thus plays a valuable role in enhancing spectral quality and yields more informative peaks related to the critical chemical compounds present in the scanned samples.
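The idea behind the net analyte signal can be shown with a generic projection sketch. It is not the HLA/GO algorithm of [27], whose details are given in that article. It only assumes that the interferent spectrum is known, removes that direction from each spectrum, and uses the norm of what remains as the analyte-specific signal. All names and the synthetic data are hypothetical.

```python
import numpy as np

def nas_projector(interferents, n_factors):
    """Projector onto the space orthogonal to the leading interferent directions."""
    _, _, vt = np.linalg.svd(np.asarray(interferents, dtype=float), full_matrices=False)
    basis = vt[:n_factors].T                  # channels x factors, orthonormal columns
    return np.eye(basis.shape[0]) - basis @ basis.T

# Synthetic demo data (NOT measured spectra).
rng = np.random.default_rng(1)
channels = np.linspace(3500, 800, 120)
analyte = np.exp(-((channels - 1650) / 80.0) ** 2)
interferent = np.exp(-((channels - 2950) / 120.0) ** 2)
conc = np.linspace(1.0, 3.8, 14)              # placeholder concentration levels
background = rng.uniform(0.5, 1.5, size=conc.size)
spectra = np.outer(conc, analyte) + np.outer(background, interferent)

projector = nas_projector(interferent[None, :], n_factors=1)
nas_vectors = spectra @ projector             # projector is symmetric
nas_norms = np.linalg.norm(nas_vectors, axis=1)

# Univariate calibration through the origin: NAS norm versus concentration.
slope = (nas_norms @ conc) / (conc @ conc)
predicted = nas_norms / slope
```

Because the interferent contribution is projected out exactly in this noise-free example, the NAS norm is proportional to the analyte concentration regardless of the background level.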

Spectral Interpretation
In this study, high-quality spectral peaks, free of unwanted noise and overlapping peaks, were observed after applying standard normal variate (SNV) preprocessing. The FT-IR spectral range is divided into two important regions: the functional group region and the fingerprint region. The functional group region usually runs from 4000 to 1450 cm⁻¹, while the fingerprint region ranges from 1450 to 500 cm⁻¹. Both regions display spectral properties that are crucial for the identification of unknown compounds. The functional group region is typically associated with the stretching vibrations of functional groups present in compounds and therefore contains relatively few peaks. The fingerprint region, on the other hand, is considered the most important because each compound produces its own unique peak pattern there, and it contains more peaks. Because of the lack of information above 3500 cm⁻¹ and below 800 cm⁻¹, the spectra were plotted between 3500 and 800 cm⁻¹. The peaks observed around 3500–2500 cm⁻¹ and 1700–1600 cm⁻¹ were related to the O-H (hydroxyl) and C=O (carbonyl) stretching vibrational bands (i and ii). All 14 concentrations, which were subdivided into three concentration ranges, were clearly separated from each other within these ranges, as shown in Figure 2i,ii, and the bands were obtained within a similar range for all three phenolic compounds identified through HPLC analysis; their chemical structures are presented in Figure 3.
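As a small illustration of the region boundaries quoted above, the following Python sketch labels a wavenumber as belonging to the functional group or fingerprint region. The function name is hypothetical, and assigning the shared 1450 cm⁻¹ boundary to the functional group region is an arbitrary assumption.

```python
def spectral_region(wavenumber_cm1):
    """Label a mid-IR wavenumber using the 4000-1450 / 1450-500 cm^-1 split."""
    if 1450 <= wavenumber_cm1 <= 4000:
        return "functional group"
    if 500 <= wavenumber_cm1 < 1450:
        return "fingerprint"
    return "outside the regions discussed"

# Band positions taken from the text of this paper.
labels = {w: spectral_region(w) for w in (3427, 2954, 1600, 944)}
```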
Owing to the phenolic ring (C₆H₅O-H) in kaempferol, a change in the intensity of the spectral lines was observed around 3427 and 3317 cm⁻¹, which is attributed to phenolic O-H stretching vibrations. Additionally, the spectral signatures located around 2954 and 2850 cm⁻¹ were both caused by C-H stretching. Both sets of vibrations are associated with kaempferol [30], one of the phenolic compounds identified in our samples. This assignment is supported by Sharma's previous report [31], which identified similar spectral regions for kaempferol in Pedalium murex using FT-IR spectroscopy. Additionally, Sedef [32] observed spectral signatures for kaempferol in chitosan samples using FT-IR spectroscopy that are identical to those described in this study. Furthermore, the spectral peaks acquired for benzoic acid and rutin are located at 1505 cm⁻¹ (C=C aromatic stretching), 1600 cm⁻¹ (benzene ring skeleton), and 1505 cm⁻¹ (C=C stretching), respectively [33,34]. All the corresponding vibrations are marked with arrows in the extended spectral regions iii, iv, and v in Figure 2 and are further explained in Table 2.

HPLC Reference Analysis
The reference values obtained through HPLC analysis for phenolic compounds in moringa powder are given in Table 3. Three major phenolic compounds were identified in the five varieties of moringa powder samples. Rutin was the most abundant and was present in a higher concentration range than the other two phenolic compounds detected. The other phenolic compounds, i.e., kaempferol and benzoic acid, were also present in a measurable concentration range in the moringa samples. Thus, the prediction models for individual phenolics were developed using these three compounds.

PLSR, PCR, and HLA/GO Models for Phenolic Compounds Prediction
Because raw spectral data are not suitable for direct analysis, different preprocessing steps were implemented prior to the prediction of the three phenolic compounds in moringa powder. Model performance was evaluated by constructing calibration and prediction datasets. Only 140 samples, covering 14 different concentrations, were measured experimentally with the FT-IR spectrometer; this comparatively small number could cause underfitting problems during the construction of the multivariate analysis model. To address this issue, 1400 mixed samples were generated artificially using the Dirichlet distribution. A detailed description of the algorithm and methodology is given in [35]. Figure 4a,b shows the general concept of the Dirichlet distribution as applied in our study. After the 1400 artificially mixed samples were prepared, the PLSR model was developed. A total of 840 samples were used for the calibration set, while the remaining 560 samples were used for the prediction set, as presented in Table 4. Statistical parameters, namely the coefficient of determination (R²) and the root mean square error (RMSE), were used to evaluate model performance. During the construction of the PLSR model, different preprocessing steps were employed. Compared with the other preprocessing methods, SNV performed better and afforded a higher R² value of 0.996 with a minimum error (RMSEC) of 0.038 mg/g for the calibration dataset. For the prediction set, the R² and RMSEP values were 0.997 and 0.035 mg/g, respectively. Figure 5a,b shows the actual and predicted values for the moringa powder samples, indicating good agreement between actual and predicted concentrations.
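One plausible way to generate Dirichlet-weighted mixtures and then split them 840/560 is sketched below. The exact procedure used in the study is described in [35] and may differ. Here each synthetic sample is assumed to be a convex combination of a few randomly chosen measured spectra, with the same weights applied to the reference concentrations. The function name, `n_parents`, `alpha`, and the placeholder data are all hypothetical.

```python
import numpy as np

def dirichlet_mix(spectra, concentrations, n_new, n_parents=3, alpha=1.0, seed=0):
    """Create n_new synthetic samples as Dirichlet-weighted mixtures of parent samples."""
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    concentrations = np.asarray(concentrations, dtype=float)
    mixed_x = np.empty((n_new, spectra.shape[1]))
    mixed_y = np.empty(n_new)
    for i in range(n_new):
        parents = rng.choice(spectra.shape[0], size=n_parents, replace=False)
        weights = rng.dirichlet(np.full(n_parents, alpha))   # non-negative, sums to 1
        mixed_x[i] = weights @ spectra[parents]
        mixed_y[i] = weights @ concentrations[parents]
    return mixed_x, mixed_y

# Placeholder data standing in for the 140 measured spectra (NOT real measurements).
rng = np.random.default_rng(42)
measured_conc = np.repeat(np.linspace(1.0, 3.8, 14), 10)     # hypothetical levels
measured_spectra = rng.random((140, 50))

mixed_x, mixed_y = dirichlet_mix(measured_spectra, measured_conc, n_new=1400)

# Random 840/560 split into calibration and prediction sets.
order = rng.permutation(1400)
cal_idx, pred_idx = order[:840], order[840:]
x_cal, y_cal = mixed_x[cal_idx], mixed_y[cal_idx]
x_pred, y_pred = mixed_x[pred_idx], mixed_y[pred_idx]
```

Because the weights are non-negative and sum to one, every synthetic concentration lies within the range of the measured concentrations, so the augmentation interpolates rather than extrapolates.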
To compare the prediction performance of the PLSR model, two other regression models, namely PCR and the NAS-based HLA/GO, were developed for the prediction of phenolic compounds. For both models, the same 1400 samples were used, of which 840 were chosen for the calibration set and 560 for the prediction set. Figure 6a,b shows the relationship between actual and predicted concentrations of the phenolic compounds in moringa powder for the calibration and prediction sets. In the FT-IR region, the PCR model achieved an R²p value of 0.963 with an RMSEP of 0.134 mg/g using maximum normalization preprocessing. Similarly, the HLA/GO model achieved an R²p value of 0.971 with an RMSEP of 0.120 mg/g using mean normalization preprocessing. The comparative performance of the PLSR, PCR, and HLA/GO models is summarized in Table 5, which shows that the PCR and HLA/GO models gave lower R²p and higher RMSEP values than the developed PLSR model. Thus, it can be suggested that the PLSR model has strong potential for the prediction analysis of moringa powder samples.
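The two performance statistics used for this comparison can be written out explicitly. The sketch below assumes R² is computed as the coefficient of determination, 1 − SS_res/SS_tot (the paper does not state the exact formula), and RMSE as the square root of the mean squared error. The small actual/predicted lists are made-up numbers for illustration, while the `reported` dictionary holds the R²p and RMSEP values quoted in the text.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error, in the units of the reference values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up illustration values (mg/g), not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
demo_r2 = r_squared(actual, predicted)
demo_rmse = rmse(actual, predicted)

# (R2p, RMSEP in mg/g) as reported in the text for the prediction set.
reported = {"PLSR": (0.997, 0.035), "PCR": (0.963, 0.134), "HLA/GO": (0.971, 0.120)}
best_model = min(reported, key=lambda name: reported[name][1])
```

Ranking by the lowest RMSEP and by the highest R²p gives the same order for the three reported models.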

Beta Coefficients for the Developed PLSR Model
The beta coefficients shown in Figure 7 were plotted to support the developed PLSR model. The beta-coefficient plot exhibits the significant spectral differences between the various groups of samples. In multivariate analysis, this plot plays an important role in locating the wavenumbers that are directly related to the chemical features of the compounds. The spectral regions at 2540–3427 cm⁻¹ and 1613–1685 cm⁻¹ are due to the O-H and C=O stretching vibrations of phenolic compounds. Additional peaks were observed around 944, 2954, 1600, 1422, and 1505 cm⁻¹, in spectral regions similar to those shown in Figure 2a and in the extended spectral regions (i, ii, iii, iv, and v). These peaks highlight the regions sensitive to the three phenolic compounds identified, and therefore the beta coefficients obtained from the PLSR method support the prediction of phenolic compounds in moringa powder samples. The results obtained with the PLSR model therefore suggest that FT-IR spectroscopy, combined with PLS regression and SNV preprocessing, could be a rapid alternative analytical tool for the non-destructive measurement of phenolic compounds in moringa powder samples.
Johnson [36] measured the antioxidant and phenolic contents of powdered plant matrices using FT-IR spectroscopy and obtained a lower prediction accuracy of 0.96 using PLSR. Because plants contain several kinds of phenolic compounds, that study cannot provide sufficient information on the major phenolic compounds in the powder matrices measured. Zhang [37] applied near-infrared reflectance spectroscopy to the evaluation of chemical components, including crude protein, crude fiber, and crude fat, in moringa leaves. All of these are macro components, and near-infrared spectroscopic data are often affected by overtone and combination bands, which reduces their effectiveness for real-time use. Additionally, Okechukwu [38] measured phytochemical constituents in Moringa oleifera leaves by combining FT-IR spectroscopy with gas chromatography-mass spectrometry. Although that study used a more advanced reference analysis method alongside FT-IR spectroscopy and identified phenolic compounds, it did not explain in depth which phenolic compounds were identified or their total concentrations in moringa powder. Makita [39] evaluated the flavonoid content of Moringa oleifera using UHPLC-qTOF-MS fingerprinting. Even though this approach is highly precise for measuring flavonoid content, it is time-consuming and destructive. Our study addresses these limitations by providing more detailed information on the types and amounts of phenolic compounds present in moringa powder samples, with an R² value of 0.99 for prediction and the lowest RMSEP of 0.035 mg/g. Hence, the results indicate that, when combined with suitable multivariate regression analysis methods, FT-IR spectroscopy could replace destructive chemical approaches such as HPLC and mass spectrometry for the prediction of phenolic compounds in moringa powder samples.
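The beta-coefficient inspection described at the start of this section amounts to finding the wavenumbers where the regression coefficients have the largest magnitude. A minimal sketch of that step is given below. The coefficient vector is synthetic, with two hypothetical bands placed at 1650 and 2954 cm⁻¹, and the function name and the simple local-maximum rule are assumptions rather than the procedure used for Figure 7.

```python
import numpy as np

def top_beta_bands(wavenumbers, beta, n_bands=5):
    """Return (wavenumber, coefficient) at the strongest local maxima of |beta|."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    beta = np.asarray(beta, dtype=float)
    magnitude = np.abs(beta)
    interior = np.arange(1, magnitude.size - 1)
    is_peak = (magnitude[interior] > magnitude[interior - 1]) & (
        magnitude[interior] >= magnitude[interior + 1]
    )
    peaks = interior[is_peak]
    ranked = peaks[np.argsort(magnitude[peaks])[::-1]][:n_bands]
    return [(float(wavenumbers[i]), float(beta[i])) for i in ranked]

# Synthetic coefficient vector (NOT the coefficients of the study's model).
wavenumbers = np.arange(3500, 799, -2.0)
beta = np.exp(-((wavenumbers - 1650) / 30.0) ** 2) - 0.6 * np.exp(
    -((wavenumbers - 2954) / 25.0) ** 2
)
bands = top_beta_bands(wavenumbers, beta)
```

Ranking by absolute value keeps strongly negative coefficients in view, since they are as informative about the model as strongly positive ones.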

Validation of the Presence of Phenolic Compounds in the Mixed Powder Samples Using HPLC
To confirm the presence of phenolic compounds in the mixed moringa powder samples at the 14 different concentrations measured with FT-IR spectroscopy, HPLC was performed a second time. The extraction and analysis of phenolic compounds followed the protocol explained in Section 2.3. The concentrations of the phenolic compounds are given in Table 6.
Based on the results of the second HPLC analysis, the rutin content was highest at all mixed-variety concentrations, followed by benzoic acid and kaempferol, which is a trend similar to that of the first HPLC analysis. The second HPLC analysis therefore supports the presence of the same phenolic compounds identified earlier in this study.

Conclusions
In this study, FT-IR spectroscopy was investigated for the non-destructive evaluation of three significant phenolic compounds, namely kaempferol, benzoic acid, and rutin, in five different varieties of moringa powder, and the results were validated with the HPLC method. The prediction analysis was performed using three regression methods, i.e., PLSR, PCR, and the NAS-based HLA/GO method. FT-IR spectroscopy in combination with the PLSR model and SNV preprocessing gave an R²p value of 0.997 and an RMSEP of 0.035 mg/g, which is better than the performance of the PCR and HLA/GO models, which gave R²p values of 0.963 and 0.971 and RMSEP values of 0.134 mg/g and 0.120 mg/g, respectively. To confirm the presence of phenolic compounds in the mixed moringa powder samples, HPLC analysis was performed a second time; it showed a similar trend to the first HPLC analysis, i.e., rutin was present in the highest concentration range of the three phenolic compounds. The results therefore indicate that FT-IR spectroscopy combined with suitable regression analysis methods could be an alternative analytical tool for the rapid measurement of phenolic compounds in moringa powder and could replace destructive chemical methods. This research will be continued with different plant powder samples to check the validity of the constructed model for identifying phytochemicals in real-time applications.