1. Introduction
Edible oils are food substances obtained from plants and animal sources. They are usually liquids at room temperature and consist mainly of triglycerides, which are esters formed from the condensation reaction between glycerol and saturated, monounsaturated, and polyunsaturated fatty acids. Tropical edible oils such as palm oil and coconut oil are solids at room temperature because they contain high amounts of short-chain triglycerides and saturated fatty acids [1,2]. The amount and type of fatty acids in edible oils are influenced by the specific variety of the edible oil [3,4,5]. For example, safflower oil has more polyunsaturated fatty acids than palm oil, coconut oil, soybean oil, peanut oil, canola oil, or flaxseed oil.
Edible oils are also the major components in the feedstock used to produce renewable fuels in a variety of industries, including transportation and agriculture. Renewable fuels [6,7] produced from edible oils have emerged as an alternative to oil, natural gas, coal, and other fossil fuels. Examples of edible oils that are used as feedstock for producing renewable fuels include soybean oil, peanut oil, canola oil, rapeseed oil, palm oil, cotton seed oil, coconut oil, safflower oil, and flaxseed oil [8,9,10,11,12]. Currently, 95% of renewable fuels are produced from edible oils that are available on a large scale from agriculture [13]. Approximately 7% of the plant-based edible oils harvested annually are used to produce biodiesel [14]. Most research on renewable fuels is focused on producing biodiesel from plant-based edible oils [15,16,17].
Burgeoning interest in renewable fuels can be attributed to the rapid depletion of fossil fuels caused by the increasing global energy demand and to the environmental advantages of renewable fuels, specifically reduced emissions of greenhouse gases. Other advantages of renewables such as biodiesel are their higher combustion efficiency and lower sulfur and aromatic content [18,19]. Biodiesel is also safer, as its flash point is 423 K compared to 350 K for diesel [20]. Biodiesel also has a higher cetane number than diesel [21]. The cetane number is a measure of the readiness of a fuel to auto-ignite when injected into an engine.
Edible oils used as feedstock have been analyzed for a variety of chemical and physical properties, including volatile matter, moisture, fixed carbon, ash, and inorganic and organic elemental content. The four properties of feedstock that are crucial for the conversion of edible oils to renewable fuels are viscosity, density, iodine value, and total acid number (TAN). The feedstock used to produce renewable fuels should be neither too dense nor too viscous, as thick material requires more work to move through a pipeline. Furthermore, the viscosity and density of edible oils can influence the performance of renewable fuel in fuel injection systems: higher viscosity and density lead to poorer atomization and inefficient combustion, resulting in a build-up of deposits on the combustion chamber wall. The iodine value indicates the amount of olefin in the feedstock and, therefore, the amount of hydrogen consumed during hydrotreating. Finally, the TAN of the feedstock is also an important parameter, as even a small increase in TAN above a critical threshold value can lead to corrosion of the catalyst in the refining process.
In this study, the viscosity, density, and iodine value of the feedstocks did not create problems in the refining of renewable fuels. To obviate the effects of higher TAN values, it was necessary to blend several feedstocks prior to refining. As part of a broader effort to standardize the feedstock used in the refining of renewable fuels, the present work focuses on the development of a secondary reference method to determine TAN by coupling mid-infrared (IR) spectroscopy with partial least squares (PLS) regression. Compared to the current method for TAN, which is based on an acid-base (potentiometric) titration [22,23], the proposed method is faster, less expensive, easier to use, and can be performed on site; this represents a paradigm shift in problem solving. Previously, PLS regression has been used in conjunction with the near-infrared region for quantitative analysis. The prediction of protein content in wheat, which replaced the time-consuming and hazardous Kjeldahl method [24], and the determination of the octane number of gasoline [25] are two examples of a similar paradigm shift in solving important analysis problems.
2. Materials and Methods
Fourier-transform infrared (FTIR) absorbance spectra (4000 cm⁻¹ to 400 cm⁻¹) of 45 samples of feedstock used to produce renewable fuels were collected at 4 cm⁻¹ resolution with 64 scans each and Happ-Genzel apodization using an iS50 Thermo-Nicolet FTIR spectrometer equipped with a diamond attenuated total reflection (ATR) accessory and a DTGS detector. The feedstocks used to produce renewables were purchased on the commodities market, and limited information was provided about the edible oil type, composition, or processing history. Each feedstock sample was placed on the diamond ATR crystal with a disposable pipette, and the mid-IR spectrum was measured. A representative IR spectrum of a feedstock sample is shown in Figure 1. The most intense absorption bands in the FTIR spectrum are observed at 2925 cm⁻¹ (asymmetric C-H stretching of -CH2-), 2854 cm⁻¹ (symmetric C-H stretching of -CH2-), 1746 cm⁻¹ (C=O stretching of ester), and 1163 cm⁻¹ (C-O stretching and -CH2- bending). The spectral region between 2200 and 2000 cm⁻¹ was excluded from the analysis. Our previous studies on edible oils have shown that absorbance in this region is due solely to absorption by the diamond ATR crystal used to collect the mid-IR spectra [26].
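The exclusion of the diamond absorption region can be illustrated with a short sketch. This is not the software used in the study; the function name `exclude_region`, the inclusive treatment of the band edges, and the five-point toy spectrum are assumptions made only for illustration.

```python
def exclude_region(wavenumbers, absorbances, low=2000.0, high=2200.0):
    """Drop points whose wavenumber (cm^-1) lies in [low, high].

    The band edges are treated as inclusive, which is an assumption.
    """
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances)
            if not (low <= w <= high)]
    return [w for w, _ in kept], [a for _, a in kept]


# Hypothetical five-point spectrum; two points fall in the diamond band.
wn = [2925.0, 2200.0, 2100.0, 1746.0, 1163.0]
ab = [0.80, 0.05, 0.04, 0.95, 0.60]
wn_kept, ab_kept = exclude_region(wn, ab)
print(wn_kept)  # [2925.0, 1746.0, 1163.0]
```

The same kind of filter, with the condition inverted, could be used to keep only a region of interest rather than to remove one.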
Digitally blended spectral data were generated as part of this study to augment the training set of 45 feedstock samples. Digital blending was performed by combining unprocessed FTIR spectra of real samples to obtain spectra that are representative of samples with a prescribed TAN value (see Figure 2). For example, to obtain a digitally blended spectrum representing a sample with a TAN value of 8.75, the IR spectrum of a sample with a TAN value of 8.2 is averaged with the IR spectrum of a sample with a TAN value of 9.3. Gaussian distributed noise is added to the IR spectrum of each digital blend to homogenize the data. For each spectrum, noise is only added to the regions that contain IR bands. For a training set of digitally blended IR spectra, the largest absorbance value is identified at each wavenumber, and one thousandth of this value is multiplied by Gaussian distributed random noise that has a mean of zero and a standard deviation of one. If the largest absorbance value is less than or equal to zero, noise is not added to the digitally blended spectrum at that wavenumber.
Figure 3 compares a digitally blended IR spectrum (TAN is 8.75) to a measured IR spectrum (TAN is 8.71) for the region 1800–1600 cm⁻¹. PLS calibrations for the TAN were developed using only this spectral region as it contains the carbonyl stretch of the carboxylic acid group of fatty acids, which is the source of acidity in the feedstock. PLS was selected because it is considered the gold standard for linear multivariate calibrations.
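The digital blending and noise-addition steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `blend` and `add_noise`, the four-point spectra, and the fixed random seed are all hypothetical, and the TAN of the blend is taken as the mean of the parent TAN values, consistent with the linear additive assumption.

```python
import random


def blend(spectrum_a, spectrum_b):
    """Average two spectra point by point (equal-weight digital blend)."""
    return [(a + b) / 2.0 for a, b in zip(spectrum_a, spectrum_b)]


def add_noise(blends, scale=1e-3, rng=None):
    """Add scaled Gaussian noise (mean 0, standard deviation 1) to each blend.

    At each wavenumber, the noise is multiplied by `scale` times the largest
    absorbance found at that wavenumber across all blends. No noise is added
    where that largest absorbance is less than or equal to zero.
    """
    rng = rng or random.Random()
    n_points = len(blends[0])
    col_max = [max(s[j] for s in blends) for j in range(n_points)]
    noisy = []
    for s in blends:
        noisy.append([
            s[j] + (scale * col_max[j] * rng.gauss(0.0, 1.0)
                    if col_max[j] > 0 else 0.0)
            for j in range(n_points)
        ])
    return noisy


# Hypothetical four-point spectra standing in for the 1800-1600 cm^-1 region.
s_low = [0.00, 0.10, 0.40, 0.10]    # parent sample with TAN 8.2
s_high = [0.00, 0.12, 0.50, 0.12]   # parent sample with TAN 9.3
blended = blend(s_low, s_high)
blended_tan = (8.2 + 9.3) / 2.0     # 8.75 under the linear additive assumption
noisy = add_noise([blended], rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the sketch reproducible; in practice the noise would be drawn independently for every blend in the training set.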
PLS calibrations [27] for TAN using experimental and/or digitally blended data from the FTIR spectra of feedstocks were developed using UNSCRAMBLER 11 (Camo Analytics). For each calibration, the spectra were preprocessed using orthogonal signal correction (OSC) [28] followed by mean centering to improve both the quality and performance of the model. The number of latent variables for each PLS model was determined using cross validation [29]. Several figures of merit [30] were computed for each PLS regression model, including root mean square error of calibration (RMSEC), standard error of calibration (SEC), bias, root mean square error of cross validation (RMSECV), and standard error of cross validation (SECV).
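As a rough guide to how such figures of merit are obtained, the sketch below computes bias, root mean square error, and a bias-corrected standard error from actual and predicted values. The exact definitions used in the study are those of the cited reference; the forms here (residual defined as predicted minus actual, division by n for the root mean square error and by n − 1 for the standard error) are commonly used conventions and are an assumption, as are the function name and the toy numbers.

```python
import math


def figures_of_merit(actual, predicted):
    """Return bias, root mean square error, and bias-corrected standard error.

    Applied to fitted values this gives RMSEC and SEC; applied to
    cross-validated predictions it gives RMSECV and SECV.
    """
    n = len(actual)
    residuals = [p - a for a, p in zip(actual, predicted)]
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    se = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    return {"bias": bias, "rmse": rmse, "se": se}


# Hypothetical TAN values, for illustration only.
merit = figures_of_merit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

When the bias is close to zero, the root mean square error and the standard error differ only through the n versus n − 1 divisor.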
Mid-IR spectra are often preprocessed to remove systematic noise such as baseline variation and multiplicative scatter effects using first and second derivatives or multiplicative scatter correction [31]. However, these methods may also remove information from the spectra about the response variable. Better results for the PLS calibration of the mid-IR spectra of the feedstock were obtained when OSC, which removes features from the data unrelated to TAN, was employed. When these features are removed before the spectrum is analyzed by PLS, the performance of the calibration model is less impacted by changes in the chemical composition of the background sample matrix, which was another reason for preprocessing the spectral data with OSC prior to PLS.
3. Results and Discussion
Figure 4 summarizes the results of a PLS calibration model developed from the mid-IR spectra of the 45 feedstock samples using a single latent variable. Figures of merit for this PLS calibration (RMSEC, SEC, RMSECV, SECV, R², and bias) are summarized in Table 1. Both the fitted and cross-validated estimates of TAN exhibited low bias. For cross validation (i.e., jackknifing), the data set was divided into 45 training set/prediction set pairs. Each training set consisted of 44 samples, and the corresponding prediction set consisted of the one remaining sample. Each sample was in the prediction set only once. PLS calibration models developed from the 44 samples in the training set were used to predict the TAN for the sample in the corresponding prediction set. The cross-validation results are summarized in Table 1 for the entire sample cohort and Table 2 for each sample. The correlation with TAN is good, and the differences between fitted and cross-validated predictions for R², root mean square error, and standard error do not indicate overfitting by PLS.
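The leave-one-out scheme described above can be sketched with a one-latent-variable PLS model written in plain Python. This is an illustrative sketch only: the study used commercial software with OSC preprocessing, which is omitted here, and the function names and the five-sample toy data set are assumptions.

```python
def pls1_fit(X, y):
    """Fit a one-latent-variable PLS1 model on mean-centered data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores, then the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def pls1_predict(model, x):
    """Predict the response for one spectrum from a fitted model."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


def leave_one_out(X, y):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(pls1_predict(pls1_fit(X_train, y_train), X[i]))
    return preds


# Hypothetical data: two-point "spectra" that scale linearly with the response.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [1.0, 4.0, 7.0, 10.0, 13.0]
cv_predictions = leave_one_out(X, y)
```

With 45 samples, the loop in `leave_one_out` produces the 45 training set/prediction set pairs described in the text, and the resulting predictions are the inputs to RMSECV and SECV.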
To strengthen the calibration, 103 digitally blended IR spectra were generated using the IR spectra of the 45 feedstock samples. In some cases, the feedstock samples used to generate blended IR spectra were selected to fill in regions of the calibration (see Figure 4) where there were only a few samples (e.g., TAN values between two and four, five and eight, and fourteen and eighteen). In other cases, the spectra of the feedstock samples used to generate digitally blended spectra were selected to reproduce samples that lie in regions of the calibration that are well represented (see Figure 4, e.g., a TAN between zero and two). By using this set of digitally blended spectra for calibration, a PLS regression model for TAN can be developed that spans a wide range of TAN values and is well represented in all regions of the calibration.
As a first step towards developing a digital training set, the PLS calibration developed from the 45 feedstock samples (see Figure 4, Table 1, and Table 2) was used to predict the TAN values of the digitally blended data. Table 3 summarizes the results of the PLS calibration for predicting the TAN values of the digitally blended spectral data. For 29 of the 103 digitally blended spectra, the deviation (i.e., the difference between the TAN value predicted by the PLS calibration and the value expected for the digitally blended data) met or exceeded a user-determined critical threshold of ±1 TAN unit, which is considered significant. An examination of the samples used to generate these 29 digitally blended spectra (see Table 4) revealed that spectra generated from samples whose sample identification (SID) numbers were 45, 53, 57, 63, and 79 (see Table 2) are problematic in terms of their TAN predictions. The provenance of these five samples is unique compared to the other forty samples in the cohort, as these five samples consisted of edible oils blended with used cooking oil that was contaminated with spices and/or alcohol, depending upon the part of the world where they were purchased. Furthermore, digital adducts of these five samples do not appear to follow a linear additive model based on their poor PLS fits. As our approach for digital blending assumes that all data follow a linear additive model [32], the 29 digital adducts of these five samples were deemed unsuitable for inclusion in the calibration set.
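The screening step described above, flagging blends whose deviation reaches the threshold and then tallying which parent samples recur among them, might be sketched as follows. The function names, the sample labels, and all numerical values are hypothetical and are not taken from Tables 3 or 4.

```python
from collections import Counter


def flag_blends(expected, predicted, threshold=1.0):
    """Return indices of blends with |predicted - expected| >= threshold."""
    return [i for i, (e, p) in enumerate(zip(expected, predicted))
            if abs(p - e) >= threshold]


def count_parents(parents, flagged):
    """Count how often each parent sample appears among the flagged blends."""
    counts = Counter()
    for i in flagged:
        counts.update(parents[i])
    return counts


# Hypothetical blends: expected TAN, PLS-predicted TAN, and parent labels.
expected = [8.75, 3.0, 15.0]
predicted = [8.9, 4.2, 13.7]
parents = [("S1", "S2"), ("S3", "S2"), ("S3", "S4")]

flagged = flag_blends(expected, predicted)
parent_counts = count_parents(parents, flagged)
```

A parent sample that appears in a large share of the flagged blends is a candidate for the kind of closer inspection that identified the five problematic samples.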
Figure 5 summarizes the results of a one-component PLS calibration model developed from the remaining 74 digitally blended spectra. The other 29 digitally blended spectra discussed in the previous paragraph were not included in this model because of their poor TAN fit. Figures of merit for this PLS calibration are summarized in Table 5. Using the digitally blended data, the slope and R² of the calibration line (for both fitted and cross-validated values) are effectively one, the root mean square error (for both fitted and cross-validated values) has been reduced by 50%, and the bias associated with cross validation has been reduced by one order of magnitude (compare Table 1 with Table 5).
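For readers who wish to reproduce the slope and R² of a calibration line from a table of actual and predicted TAN values, a minimal sketch is given below. It assumes that the calibration line is the ordinary least-squares line of predicted versus actual values and that R² is the squared correlation coefficient; the function name and the toy values are hypothetical.

```python
def calibration_line(actual, predicted):
    """Least-squares slope, intercept, and R^2 of predicted vs. actual."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ap = sum((a - mean_a) * (p - mean_p)
               for a, p in zip(actual, predicted))
    slope = s_ap / s_aa
    intercept = mean_p - slope * mean_a
    r_squared = (s_ap * s_ap) / (s_aa * s_pp)
    return slope, intercept, r_squared


# Hypothetical values: a perfect calibration gives slope 1 and R^2 of 1.
slope, intercept, r_squared = calibration_line(
    [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

A slope and R² that are both close to one indicate that the predictions track the reference values without systematic compression or expansion of the TAN scale.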
The PLS calibration developed from the 74 digitally blended spectra was used to predict the TAN values of the 45 feedstock samples. Figure 6 shows a plot of the predicted versus actual values. Table 6 summarizes the results of the PLS calibration for predicting the 45 TAN values of the original spectral data. The R² for the predicted TAN values of the 45 feedstock samples using the PLS model developed from digital data was 0.9625 (see Figure 6), which is larger than the R² (0.957) for the fitted values computed from the PLS model for TAN that was developed using these same 45 feedstock samples as a training set (see Table 1). Clearly, the PLS calibration developed from digitally blended spectral data can provide reasonable predictions of TAN for actual feedstock samples.
Figure 7 shows the results of the PLS calibration for the data cohort of 74 digitally blended spectra and 44 experimental spectra. Sample 53 (an experimental mid-IR spectrum) was deleted from the original data cohort of 74 digitally blended spectra and 45 experimental spectra because it was flagged as an outlier by PLS. The 118 spectra cover almost the entire range of the calibration. Table 7 summarizes the figures of merit for this calibration. The root mean square error of calibration for the TAN is 0.7324, compared to a root mean square error of calibration of 1.13 for the PLS model developed from the experimental data (see Table 1). Clearly, there is a benefit in combining digital data with experimental data for determining the TAN from mid-IR spectra, which demonstrates the advantages of using digital data to enhance PLS multivariate calibrations. Although there was no independent test set to validate this model, an independent test set was not necessary as the feedstock used to produce the renewables for the pilot plant also served as standards for the PLS calibration of the TAN. In practice, different feedstock samples were combined to produce renewable fuels, and the sample calibration set for the PLS model includes samples from the same lots of the raw materials used to produce these fuels. By combining digitally blended spectral data with experimental spectral data, the PLS calibration model obtained was superior to the calibration model obtained using only experimental data (compare Table 7 to Table 1).