Gasoline is one of the most widely used light petroleum products and is a complex mixture of hydrocarbons that usually contain four to thirteen carbon atoms [1]. However, the rapid development of the transportation industry has depleted non-renewable oil resources and aggravated atmospheric pollution [2]. To address these problems, an environmentally friendly and economically viable alternative fuel is needed. Methanol gasoline is an excellent alternative engine fuel, with advantages such as a high octane number, good knock resistance, and clean burning [3]. Because of these qualities and its economic suitability, methanol gasoline can play an important role in easing energy shortages and environmental pollution, and it has been widely used in transportation, including in motor vehicles and internal combustion engines [4]. According to the literature, the proportion of methanol in methanol gasoline varies from 0% to 80% [5], since methanol gasoline can be prepared by directly mixing gasoline and methanol. This ease of blending also gives unscrupulous traders an opportunity to sell high-methanol gasoline as low-methanol gasoline for illegal profit. Therefore, an accurate and rapid method is needed to qualitatively and quantitatively analyze the methanol content of methanol gasoline.
There are numerous methods for determining the methanol content of gasoline, the most common being gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS) [6]. Although these methods have high sensitivity and accuracy, they are time-consuming and complex, require toxic and hazardous reagents, and cannot meet the needs of online monitoring. The development and application of infrared (IR) spectroscopy provides a new opportunity for the rapid detection of methanol content in methanol gasoline [6]. As a rapid and non-destructive detection technology, IR spectroscopy has been applied in many fields, such as chemistry, agriculture, food quality, and the environment. Moreover, several studies have used near-infrared (NIR) spectra to analyze gasoline products and obtained good qualitative and quantitative results [6]. However, most of these studies use NIR spectroscopy for classification [10], and only a few focus on the methanol content of gasoline [12]. Further work is therefore needed to test the feasibility of IR spectroscopy for the qualitative and quantitative analysis of methanol gasoline.
IR spectra carry characteristic molecular information, because specific absorption bands correspond to particular functional groups of a molecule. The feasibility of IR spectroscopy for rapidly detecting methanol in methanol gasoline was therefore investigated. In this study, the dataset was first explored using unsupervised principal component analysis (PCA). Supervised qualitative and quantitative models, including partial least squares discriminant analysis (PLS-DA), partial least squares regression (PLSR), and least squares support vector machine (LS-SVM), were then built to detect the methanol percentage in methanol gasoline. Because the regression models involve a large number of variables, two classic variable selection algorithms, uninformative variable elimination (UVE) and competitive adaptive reweighted sampling (CARS), were also applied to select the optimal variables. Finally, the performance of all regression models was systematically compared to identify the best prediction model for the methanol percentage in methanol gasoline.
To assess the feasibility of IR spectroscopy for detecting the methanol percentage in methanol gasoline, attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy was combined with several multivariate calibration models and variable selection methods. The main objective is thus to detect methanol qualitatively and quantitatively from IR spectra. Specifically, the work comprises the following sub-tasks: (1) collecting and analyzing the spectral data of gasoline and methanol gasoline; (2) qualitatively classifying gasoline and methanol gasoline; (3) selecting the optimal variables for the regression models; (4) establishing quantitative detection models for methanol based on the full set of variables and on the optimal variables; and (5) identifying the best detection model by comparing the performance of all models.
2. Materials and Methods
2.1. Sample Preparation
In this study, 95# gasoline was chosen as the research object. Different brands of gasoline were purchased from gasoline stations in Wenzhou (Wenzhou City, Zhejiang Province, China). Analytical-grade methanol (Product No. M116122, purity > 99.9%) was purchased from Aladdin Reagent and used without further treatment. To obtain methanol gasoline, the 95# gasoline samples were mixed with methanol according to a series of volume ratios. A total of 16 gasoline samples from six brands were collected. The volume ratios of methanol to gasoline ranged from 0% to 30%.
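The blending arithmetic can be sketched as follows. This is only an illustration: it assumes the stated ratio is the methanol share of the total blend volume and ignores any volume change on mixing. The function name `blend_volumes`, the 100 mL batch size, and the 5% step are hypothetical and do not come from the study.

```python
def blend_volumes(total_ml, methanol_percent):
    """Return (methanol_ml, gasoline_ml) for a target methanol volume percentage.

    Assumption: the percentage refers to the total blend volume.
    """
    if not 0 <= methanol_percent <= 100:
        raise ValueError("methanol_percent must lie in [0, 100]")
    methanol_ml = total_ml * methanol_percent / 100
    return methanol_ml, total_ml - methanol_ml


# Hypothetical series covering the 0% to 30% range in 5% steps.
for percent in range(0, 31, 5):
    methanol_ml, gasoline_ml = blend_volumes(100.0, percent)
    print(f"{percent:2d}% -> {methanol_ml:5.1f} mL methanol + {gasoline_ml:5.1f} mL gasoline")
```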
2.2. Collection of ATR-FTIR Spectra
Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra were collected over the range of 4000 cm−1 to 600 cm−1 at a resolution of 4 cm−1 using a VERTEX 70 spectrometer (Bruker Optics Inc., Ettlingen, Germany) equipped with an attenuated total reflection (ATR) accessory (Pike Technologies, Germany). The spectral signals were then digitized at 2 cm−1 intervals in the Fourier transform. Each measurement consisted of 16 repeated scans, and the displayed curve was the averaged spectrum, obtained using OMINIC software (Version 6.5, Bruker, Inc.). Notably, each sample was refreshed in the cuvette (10 mm) and measured six times, so that six spectra per sample were collected for further modeling analysis.
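As a quick check on the size of the resulting data, the sketch below builds the wavenumber axis implied by these settings and averages repeated scans point by point. It assumes the 2 cm−1 grid covers the whole 4000 to 600 cm−1 range with both ends included; the function names are illustrative and are not part of the instrument software.

```python
def wavenumber_axis(start=4000.0, stop=600.0, step=2.0):
    """Descending wavenumber grid in cm^-1, both end points included."""
    n_points = int(round((start - stop) / step)) + 1
    return [start - i * step for i in range(n_points)]


def mean_spectrum(scans):
    """Point-by-point average of equally long spectra (one list per scan)."""
    n_scans = len(scans)
    return [sum(values) / n_scans for values in zip(*scans)]


axis = wavenumber_axis()
print(len(axis), axis[0], axis[-1])  # number of spectral variables and both ends
```

Under these assumptions each spectrum has (4000 − 600) / 2 + 1 = 1701 spectral variables, which is why variable selection becomes relevant for the regression models.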
2.3. Multivariate Data Analysis
2.3.1. Principal Component Analysis
Before establishing the multivariate calibration models, principal component analysis (PCA), a dimension-reduction method, was used to explore the structure of the dataset. The main idea of PCA is to transform a set of possibly correlated variables into a set of linearly uncorrelated variables, called principal components, by means of an orthogonal transformation. This converts high-dimensional data into low-dimensional data, which facilitates analysis and visualization [13]. PCA has been widely used in environmental [14] and food [16] analysis for the similarity clustering of samples and the dimension reduction of spectral data.
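The idea can be illustrated with a minimal sketch that extracts only the first principal component by power iteration on mean-centered data. This is not the routine used in the study (the calculations there were run in MATLAB); the function names and the toy data are assumptions made for illustration.

```python
import math


def center_columns(X):
    """Subtract the column mean from every column of X (list of rows)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_principal_component(X, iters=500, tol=1e-12):
    """Return (loading, scores, explained_fraction) for the first PC."""
    Xc = center_columns(X)
    # Start from the centered row with the largest norm.
    w = max(Xc, key=lambda r: sum(v * v for v in r))
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("all rows are identical; there is no variance to explain")
    w = [v / norm for v in w]
    for _ in range(iters):
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in Xc]
        # X^T t: weight each row by its score and sum over the samples.
        w_new = [sum(t * row[j] for t, row in zip(scores, Xc)) for j in range(len(w))]
        norm = math.sqrt(sum(v * v for v in w_new))
        w_new = [v / norm for v in w_new]
        change = max(abs(a - b) for a, b in zip(w, w_new))
        w = w_new
        if change < tol:
            break
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in Xc]
    total = sum(v * v for row in Xc for v in row)
    explained = sum(t * t for t in scores) / total
    return w, scores, explained


# Toy data lying on a straight line: one component explains all the variance.
loading, scores, explained = first_principal_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

Plotting the scores of the first two or three components against each other is the usual way to look for clustering of samples; further components can be obtained by deflating the data and repeating the iteration.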
2.3.2. Classification for Adulteration Category
For classification, partial least squares discriminant analysis (PLS-DA) was employed. Like partial least squares regression (PLSR), PLS-DA extracts from the original data several latent variables (LVs) that have maximum covariance with the dependent variable. Using the optimal number of LVs, the PLS-DA model was established to predict the response of each sample. Finally, each sample was assigned to a category based on a threshold determined by Bayesian theory. PLS-DA is a classic classification algorithm and is described extensively in the literature [19].
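One common way to obtain such a threshold is sketched below. It assumes that the predicted responses of each class follow a normal distribution and that the class priors equal the class proportions in the calibration set; the text does not specify either point, so both are assumptions, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_gauss(y, mu, sigma):
    """Log-density of a normal distribution."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def bayes_threshold(pred_class0, pred_class1, iters=100):
    """Response value at which both classes have equal posterior probability.

    Assumes the mean predicted response of class 0 is below that of class 1.
    """
    m0, s0 = mean(pred_class0), stdev(pred_class0)
    m1, s1 = mean(pred_class1), stdev(pred_class1)
    n0, n1 = len(pred_class0), len(pred_class1)
    prior0, prior1 = n0 / (n0 + n1), n1 / (n0 + n1)

    def log_odds(y):
        return (math.log(prior1) + log_gauss(y, m1, s1)) - (
            math.log(prior0) + log_gauss(y, m0, s0)
        )

    lo, hi = m0, m1
    if not (log_odds(lo) < 0 < log_odds(hi)):
        raise ValueError("no sign change between the class means")
    for _ in range(iters):  # bisection on the log posterior odds
        mid = 0.5 * (lo + hi)
        if log_odds(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def classify(response, threshold):
    """Assign class 1 when the predicted response reaches the threshold."""
    return 1 if response >= threshold else 0
```

With equally spread, equally sized classes coded 0 and 1, the threshold falls midway at 0.5; unequal spreads or class sizes shift it toward one of the classes.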
2.3.3. Regression for Adulteration Content
Two regression models, partial least squares regression (PLSR) and least squares support vector machine (LS-SVM), were employed to establish quantitative models and compare their predictive capacity. PLSR, a classic linear regression model, projects the independent variables onto a set of orthogonal factors called latent variables (LVs). The quantitative relationship between the dependent and independent variables is then built using a small number of LVs [21].
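A minimal single-response PLSR (PLS1) sketch based on the NIPALS procedure is given below to make the projection onto latent variables concrete. It is an illustration under simple assumptions (mean centering only, no scaling, no cross-validation to choose the number of LVs), not the implementation used in the study; the function names are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered X
    f = [v - y_mean for v in y]                              # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y to explain
        w = [v / norm for v in w]
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[e - ti * lj for e, lj in zip(row, load)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one sample x by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ei * wj for ei, wj in zip(e, w))
        y_hat += q * t
        e = [ei - t * lj for ei, lj in zip(e, load)]
    return y_hat


# Toy data with an exactly linear response y = 2*x1 + 3*x2.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [3.0, 2.0, 7.0, 6.0]
model = pls1_fit(X, y, n_components=2)
```

In practice the number of LVs is usually chosen by cross-validation, for example by minimizing the cross-validated prediction error.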
For comparison, the nonlinear regression model LS-SVM was also considered. The principle of SVM is to map the original dataset from a low-dimensional space into a high-dimensional space through nonlinear functions and to construct a hyperplane there. In this way, a problem that is not linearly separable in the original space is transformed into a constrained quadratic programming problem, whose global optimal solution is obtained using the Lagrange multiplier method. Notably, the nonlinear radial basis function (RBF) kernel was used in this study [23].
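In the least squares variant, the quadratic program is replaced by a set of linear equations in the bias b and the Lagrange multipliers α. The sketch below solves that system for regression with an RBF kernel. The kernel parametrization exp(−d²/(2σ²)), the parameter names `gamma` and `sigma2`, and the plain Gaussian elimination solver are assumptions for illustration; toolboxes may use different conventions.

```python
import math


def rbf_kernel(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / (2 * sigma2)) (assumed parametrization)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    return solution[0], solution[1:]  # bias, alpha


def lssvm_predict(X_train, bias, alpha, sigma2, x):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return bias + sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X_train))
```

The regularization parameter and the kernel width both have to be tuned, typically by a grid search with cross-validation.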
2.3.4. Variables Selection for Significant Information
To simplify the calculation and improve the performance of the regression models, two commonly used variable selection methods, CARS [24] and UVE [25], were used to select a small number of optimal variables. CARS is based on the “survival of the fittest” principle of Darwin’s theory of evolution. Each spectral variable is regarded as an individual, and individuals with low weights are removed: spectral variables with small absolute regression coefficients in the PLSR model are eliminated, and the remaining variables are used to construct a new model with cross-validation. In each run, the root mean squared error of cross-validation (RMSECV) of the model is recorded to compare performance. The variable subset of the model with the lowest RMSECV is chosen as the optimal subset of spectral variables [26].
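The elimination schedule can be sketched as follows. In the usual description of CARS, the fraction of variables retained in run i follows an exponentially decreasing function r_i = a·exp(−k·i), chosen so that all p variables are kept in the first run and two remain in the last. The sketch covers only this schedule and the ranking by absolute coefficient; the Monte Carlo sampling of calibration samples and the adaptive reweighted sampling step of the full algorithm are omitted, and the function names are hypothetical.

```python
import math


def cars_retention_ratios(n_variables, n_runs):
    """Fraction of variables retained in runs 1..n_runs (n_runs >= 2)."""
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_variables / 2.0) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


def keep_largest(coefficients, ratio):
    """Indices of the variables with the largest absolute coefficients."""
    n_keep = max(1, round(ratio * len(coefficients)))
    ranked = sorted(range(len(coefficients)),
                    key=lambda j: abs(coefficients[j]), reverse=True)
    return sorted(ranked[:n_keep])


# Example: 1701 spectral variables (an assumed count) over 50 runs.
ratios = cars_retention_ratios(1701, 50)
print(round(ratios[0] * 1701), round(ratios[-1] * 1701))  # variables kept first / last
```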
UVE is a classic variable selection method developed on the basis of PLSR. In UVE, an artificial random-variable (noise) matrix is appended to the spectral matrix, and the maximum stability value of these noise variables is taken as the stability threshold. UVE then retains the spectral variables whose stability values exceed this threshold [27]. More detailed information about CARS and UVE can be found in the literature [28].
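The selection rule can be sketched as below, assuming the usual definition of stability as the mean of a variable's PLSR coefficient divided by its standard deviation over resampled (for example leave-one-out) models. The coefficient vectors are taken as given input, and the function name, the `cutoff_scale` parameter, and the toy numbers are hypothetical.

```python
from statistics import mean, stdev


def uve_select(coef_runs, n_real, cutoff_scale=1.0):
    """Select informative variables by comparing stabilities with noise.

    coef_runs: one coefficient vector per resampled PLSR model; the first
               n_real entries belong to spectral variables, the rest to the
               appended random (noise) variables.
    """
    # A coefficient that never varies would give a zero standard deviation;
    # this sketch does not handle that case.
    stability = [mean(col) / stdev(col) for col in zip(*coef_runs)]
    threshold = cutoff_scale * max(abs(c) for c in stability[n_real:])
    kept = [j for j in range(n_real) if abs(stability[j]) > threshold]
    return kept, stability, threshold


# Toy example: two spectral variables followed by one noise variable.
runs = [[1.0, 0.1, 0.2], [1.1, -0.1, 0.1], [0.9, 0.2, 0.3]]
kept, stability, threshold = uve_select(runs, n_real=2)
```

In this toy example only the first variable is retained, because its coefficient is large and consistent across runs while the second one fluctuates around zero more than the noise variable does.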
2.3.5. Evaluation of the Model’s Performance
Once the multivariate calibration models had been built, several evaluation indices were used to assess their performance. For the regression models, these were the root mean square errors of calibration (RMSEC) and prediction (RMSEP), the coefficients of determination of calibration (Rc²) and prediction (Rp²), the residual predictive deviation (RPD), and the absolute difference between RMSEC and RMSEP (ABS). Specifically, an excellent regression model usually has high values of Rc² and RPD and low values of RMSEC, RMSEP, and ABS [29].
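These indices can be computed as in the sketch below. Definitions vary slightly between authors: here the coefficient of determination is taken as 1 − SSres/SStot, and RPD as the sample standard deviation of the reference values divided by the RMSE. Both choices are assumptions, since the text does not give the formulas, and the function names are illustrative.

```python
import math
from statistics import stdev


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC or RMSEP, depending on the data set)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values over RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


def abs_gap(rmsec, rmsep):
    """ABS: absolute difference between calibration and prediction errors."""
    return abs(rmsec - rmsep)


# Toy prediction set (methanol percentages; values are made up).
reference = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 9.0, 21.0, 29.0]
print(rmse(reference, predicted), r_squared(reference, predicted), rpd(reference, predicted))
```

A small ABS indicates that the calibration and prediction errors are similar, which suggests the model is not overfitted to the calibration set.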
For the PLS-DA classification model, accuracy was used to evaluate the performance of the classifier. Additionally, a sample distribution map was used to display the classification results more intuitively. All calculations in this study were performed in the MATLAB 2015b environment (The MathWorks, Natick, MA, USA).
This study investigated the feasibility of using ATR-FTIR spectroscopy to detect methanol in methanol gasoline both qualitatively and quantitatively. The results demonstrate the accuracy and efficiency of IR spectra for both kinds of analysis. For the qualitative analysis, PLS-DA reached accuracies of 100% and 96.88% for the calibration and prediction sets, respectively. For the quantitative analysis, PLSR and LS-SVM models, combined with the UVE and CARS variable selection methods, were used to establish prediction models, and UVE-PLSR gave the best prediction result, with RPD and ABS values of 6.420 and 0.847, respectively.
As a preliminary exploration of using IR spectra to analyze methanol in gasoline qualitatively and quantitatively, the results show that the methanol content can be detected rapidly and non-destructively. This research indicates that IR spectroscopy is a promising analytical method for the gasoline industry, and more effort should be made to speed up its industrial application.