This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

A novel method combining wavelet packet transform (WPT), uninformative variable elimination by partial least squares (UVE-PLS) and simulated annealing (SA) is presented for extracting the best variance information among different varieties of lubricants. A total of 180 samples (60 for each variety) were characterized by visible and short-wave near-infrared (Vis-SWNIR) spectroscopy; 90 samples (30 for each variety) were randomly selected for the calibration set, and the remaining 90 samples (30 for each variety) were used for the validation set. The spectral data were split into different frequency bands by WPT. SA was employed to look for the best variance band (BVB) among the different varieties of lubricants. To further improve prediction precision, the BVB was processed by UVE-PLS, and the optimal cutoff threshold of UVE was found by SA. Finally, five variables were mined and set as inputs for a least squares-support vector machine (LS-SVM) to build the recognition model. An optimal model with a correlation coefficient (

Automobile lubricating oil is an efficient anti-friction agent, mainly used to reduce the friction between the surfaces of moving parts. At present, there are many different varieties of lubricating oil, and the variety greatly affects market price and quality. Consumption of lubricating oil has increased recently and, as a result, instances of fraud have increased as well. To make enormous profits, some illegal factories mix varieties of lubricating oil of different quality, and such behavior infringes on the rights and interests of consumers and legitimate manufacturers. Therefore, there is a need to develop an accurate and rapid method to discriminate between varieties and qualities of lubricants, which may also be utilized for the detection of adulteration.

Recently, some researchers have devoted much attention to the study of lubricants. Zhao

Wavelet packet transform (WPT), an extension of the wavelet transform (WT) [

Uninformative variable elimination by PLS (UVE-PLS) is a method for variable selection [

The simulated annealing (SA) algorithm, a simulation of the annealing process used for metals, was put forward by Kirkpatrick in 1983 [

In this work, WPT with SA was used to search for the BVB; irrelevant variables in the BVB were then eliminated by UVE-PLS, with SA employed to search for the cutoff threshold of UVE-PLS. Finally, the mined variables were used as the input set for an LS-SVM to build a lubricant recognition model.

A total of 180 lubricant samples were used as the whole data set. A calibration set of 90 samples was selected randomly and used to determine the optimal parameters. The remaining 90 samples formed the validation set used to evaluate the performance of the discrimination model. The 180 samples were purchased in the local market and comprised the following three varieties: Changcheng (Cc), Huaxiayyangguang (Hxyg) and Caltex (Ca). All samples were stored in the lab at a constant temperature of 25 ± 1 °C to equalize the temperature.
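The random split described above can be sketched as follows. This is a minimal illustration, assuming the selection was stratified so that 30 samples of each variety land in each set, as the sample counts suggest. The function name `stratified_split` and the fixed seed are hypothetical and not taken from the original work.

```python
import random

def stratified_split(labels, n_cal_per_class, seed=0):
    """Randomly pick n_cal_per_class calibration samples per variety.

    Returns two sorted lists of sample indices: (calibration, validation).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    calibration, validation = [], []
    for indices in by_class.values():
        chosen = set(rng.sample(indices, n_cal_per_class))
        calibration += [i for i in indices if i in chosen]
        validation += [i for i in indices if i not in chosen]
    return sorted(calibration), sorted(validation)

# 180 samples: 60 each of Cc, Hxyg and Ca, as in the text.
labels = ["Cc"] * 60 + ["Hxyg"] * 60 + ["Ca"] * 60
cal_idx, val_idx = stratified_split(labels, n_cal_per_class=30)
```

Splitting within each variety keeps the class balance identical in both sets, which a plain random draw of 90 out of 180 samples would not guarantee.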

For each sample, three reflectance spectra were scanned by a handheld FieldSpec Pro FR (325–1,075 nm)/A110070 (Analytical Spectral Devices Inc., Boulder, CO, USA). The light source consists of a Lowell pro-lam interior light source assemble/128930 with a Lowell pro-lam 14.5 V Bulb/128690 tungsten halogen bulb that could be used both in the visible and near infrared regions. The field-of-view of the spectroradiometer is 10°. The spectroradiometer was placed at a height of approximately 250 mm and at a 45° angle away from the center of the sample. The light source was placed at a height of approximately 150 mm and at a 45° angle away from the sample. The spectrum of each sample was the average of 30 successive scans with 1.5 nm intervals. Three spectra were collected for each sample, and the average spectrum of these three measurements was used in the later analysis. All spectral data were stored in a computer and processed using the RS3 software for Windows (Analytical Spectral Devices) designed with a Graphical User Interface.

WPT is a derivative of WT. In the fast wavelet transform (WT) [
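To make the idea of a full wavelet packet tree concrete, the following is a minimal sketch that uses the Haar filter pair. The Haar wavelet is an assumption chosen for brevity; the wavelet used in the original work is not recoverable from this text. The names `haar_step` and `wavelet_packet` are hypothetical. Nodes are indexed as (level, node) in natural filter-bank order, not frequency order.

```python
import math

def haar_step(signal):
    """One Haar filtering step: returns the (approximation, detail) halves."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_packet(signal, max_level):
    """Full wavelet packet tree as {(level, node): coefficients}.

    Unlike the plain wavelet transform, both the approximation and the
    detail branch are split again at every level.
    """
    tree = {(0, 0): list(signal)}
    for level in range(max_level):
        for node in range(2 ** level):
            approx, detail = haar_step(tree[(level, node)])
            tree[(level + 1, 2 * node)] = approx
            tree[(level + 1, 2 * node + 1)] = detail
    return tree

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # toy data, length 2**3
tree = wavelet_packet(signal, max_level=3)
```

Each level `j` holds `2**j` nodes, so a candidate band is identified by a pair such as (6,3). The sketch assumes the signal length is divisible by `2**max_level`; a trailing odd sample would be dropped.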

In linear least squares models, the predictions

In UVE-PLS method, a PLS regression coefficient matrix _{1}_{n}_{i}_{i}
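As an illustration of the variable stability used in UVE, the sketch below computes, for each variable, the mean of its regression coefficient divided by its standard deviation over a set of resampled fits, and keeps the variables whose absolute stability exceeds a cutoff. The coefficient values are hypothetical stand-ins for leave-one-out PLS fits, which are not computed here. The function names are assumptions made for this example only.

```python
from statistics import mean, stdev

def uve_stability(coef_runs):
    """Stability of each variable: mean / standard deviation of its
    regression coefficient over the resampling runs (rows of coef_runs)."""
    n_vars = len(coef_runs[0])
    columns = [[run[j] for run in coef_runs] for j in range(n_vars)]
    return [mean(col) / stdev(col) for col in columns]

def select_variables(stability, cutoff):
    """Keep the indices whose absolute stability exceeds the cutoff."""
    return [j for j, c in enumerate(stability) if abs(c) > cutoff]

# Hypothetical coefficients from four resampled PLS fits, three variables.
coef_runs = [
    [0.90, 0.10, -0.50],
    [1.10, -0.20, -0.52],
    [1.00, 0.15, -0.48],
    [1.00, -0.05, -0.50],
]
stability = uve_stability(coef_runs)
kept = select_variables(stability, cutoff=5.0)
```

A variable whose coefficient changes sign from fit to fit has a stability near zero and is dropped, whereas a variable with a consistently large coefficient, positive or negative, is retained.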

In simulated annealing, a problem starts with an initial solution, and this solution can be easily changed. But as the temperature _{i}

The main processes of the SA algorithm for optimal parameters are explained as follows:

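Step 1: Initialize parameters: set the initial temperature $T_I$ and the stopping temperature $T_s$, choose an initial solution $x_1$, and take it as the current best solution $x^{best}$, with $F^{best} = F(x_1)$.

Step 2: Another solution $x_2$ is generated in the neighborhood of $x^{best}$, and $F(x_2)$ is evaluated.

If $F(x_2) < F^{best}$, then $F^{best} \leftarrow F(x_2)$ and $x^{best} \leftarrow x_2$

If $F(x_2) > F^{best}$, the worse solution $x_2$ may still be accepted according to the state accepting function at the current temperature:

If it is accepted, $x^{best} \leftarrow x_2$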

Else change = false

Step 3: Check whether the stopping criteria are satisfied. If satisfied, end the SA algorithm; otherwise, change annealing temperature:

Typical spectra of three varieties of lubricants are shown in

Before SA was employed to search for the BVB, the search range of SA had to be defined. The range comprises the lower and upper bounds of the decomposition level and the maximum node number in each level. The lower and upper bounds of the WPT decomposition level should be determined first. The upper bound, namely, for a signal with a length of ^{l}

After the different bands were obtained by WPT decomposition, SA was employed to seek the optimal band. Before SA was used, some of its parameters had to be preset:

Selection of initial temperature _{I}_{s}_{I}_{s}

State generating function and state accepting function: Student’s

Annealing schedule: An exponential annealing schedule was used, which updates the current temperature based on the initial temperature and the current annealing parameter

Algorithm termination criteria: In general, there are two stopping rules: one is that the number of temperature transitions satisfies the temperature termination rules, and the other one is that the neighbor solution was not improved after a certain period [
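The parameter choices listed above can be tied together in a short sketch of an SA loop. This is an illustration under stated assumptions, not the implementation used in the study: the Metropolis acceptance rule, the decay factor `alpha = 0.95`, the toy fitness function and all names are hypothetical. The sketch also tracks the current and the best solution separately, a common variant.

```python
import math
import random

def simulated_annealing(fitness, neighbor, x1, t_init, t_stop, alpha=0.95, seed=0):
    """Minimize fitness(x) with a basic simulated annealing loop."""
    rng = random.Random(seed)
    x_cur, f_cur = x1, fitness(x1)
    x_best, f_best = x_cur, f_cur
    temperature = t_init
    while temperature > t_stop:  # temperature-based stopping rule
        x2 = neighbor(x_cur, rng)
        f2 = fitness(x2)
        # Always accept an improvement; accept a worse solution with the
        # Metropolis probability exp(-(f2 - f_cur) / T) (an assumed rule).
        if f2 < f_cur or rng.random() < math.exp(-(f2 - f_cur) / temperature):
            x_cur, f_cur = x2, f2
        if f_cur < f_best:
            x_best, f_best = x_cur, f_cur
        temperature *= alpha  # exponential annealing schedule
    return x_best, f_best

# Toy problem: find the integer in [0, 63] that minimizes (x - 37)^2.
def toy_fitness(x):
    return (x - 37) ** 2

def toy_neighbor(x, rng):
    return min(63, max(0, x + rng.choice([-3, -2, -1, 1, 2, 3])))

best_x, best_f = simulated_annealing(
    toy_fitness, toy_neighbor, x1=0, t_init=100.0, t_stop=1e-3
)
```

In the setting of this paper, the solution `x` would encode a candidate WPT node or a UVE cutoff, and `fitness` would be the prediction error of the calibration model built from it.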

The performance of SA was evaluated through a fitness function, also known as objective function. The function value was the criterion for guiding SA to the global optimum. The prediction ability of the calibration model was evaluated with parameters of correlation coefficient (^{0} is the mean of
^{p}
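The two evaluation parameters can be computed as in the sketch below. It assumes the correlation coefficient is the Pearson coefficient between reference and predicted values, and that the root mean square error divides by the number of samples; the exact definitions used in the study are not recoverable from this text. The example values are hypothetical.

```python
import math

def correlation_coefficient(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((y - mean_ref) * (p - mean_pred) for y, p in zip(reference, predicted))
    var_ref = sum((y - mean_ref) ** 2 for y in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_ref * var_pred)

def rmse(reference, predicted):
    """Root mean square error (RMSEC on calibration data, RMSEP on validation data)."""
    squared = [(y - p) ** 2 for y, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical dummy-coded reference values and model predictions.
reference = [1, 1, 2, 2, 3, 3]
predicted = [1.1, 0.9, 2.2, 1.8, 3.1, 2.9]
r = correlation_coefficient(reference, predicted)
error = rmse(reference, predicted)
```

A fitness function for SA could then be built from `rmse` on the calibration predictions, so that a lower value indicates a better candidate band or cutoff.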

After the parameters of WPT and SA were determined, SA was employed to search for the BVB. The optimal BVB and the best function value are shown in

Support vector machine (SVM) is a state-of-the-art statistical learning method proposed by Vapnik [^{−1}, 2) and the parameter ^{2}) within the region of (10^{−2}, 10^{4}) were set.

Finally, the BVB of 18 variables was extracted, and these variables constituted a new set that could be used as inputs for LS-SVM (BVB-LS-SVM) to build the recognition model. In the application of BVB-LS-SVM, each variety of lubricant in the calibration set was assigned a dummy variable as a reference value (1 for Cc, 2 for Hxyg, and 3 for Ca). The 90 samples in the validation set were then predicted by the LS-SVM model. Good performance was achieved, and the prediction results for
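The dummy-variable coding can be illustrated with a small sketch that computes the absolute deviation between predicted and reference values and maps a continuous prediction back to a variety. The acceptance threshold of 0.5 is an assumption made for this example; the criterion used in the study is not stated in this text. The names and prediction values are hypothetical.

```python
VARIETY_CODES = {"Cc": 1, "Hxyg": 2, "Ca": 3}  # dummy variables from the text

def absolute_deviations(reference_codes, predictions):
    """Absolute deviation between each prediction and its reference code."""
    return [abs(p - y) for y, p in zip(reference_codes, predictions)]

def decode(prediction, threshold=0.5):
    """Return the variety whose code lies within `threshold` of the
    prediction, or None if the prediction matches no variety."""
    for name, code in VARIETY_CODES.items():
        if abs(prediction - code) < threshold:
            return name
    return None

# Hypothetical model outputs for one sample of each variety.
reference_codes = [1, 2, 3]
predictions = [1.2, 2.6, 3.7]
deviations = absolute_deviations(reference_codes, predictions)
decoded = [decode(p) for p in predictions]
```

With these example values, the second sample is assigned to the wrong variety and the third to none, which shows why the absolute deviation per sample is a useful diagnostic alongside the overall error.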

To further improve prediction accuracy, uninformative variable elimination by PLS (UVE-PLS) was employed to remove the uninformative variables of the BVB. In the UVE-PLS method, estimating the cutoff is a very important issue. In previous research, artificial random variables were added to the data set to calculate the cutoff [

The retained five variables constituted a new set of variables for LS-SVM (BVB-RV-LS-SVM). The performance of BVB-RV-LS-SVM was evaluated by the same 90 unknown samples mentioned above, and

In this section, the method proposed in this work was compared with the conventional methods:

Method I: LS-SVM model with certain LVs as the inputs [

Method II: Partial least squares regression (PLSR) [

Method III: LS-SVM with optimal wavelengths. SA was employed to search for the optimal wavelength in the full spectrum; this method is equivalent to a genetic algorithm (GA) searching for the optimal wavelength from the full spectrum.

Method IV: LS-SVM with certain wavelengths. Certain wavelengths were selected from the full spectrum by UVE-PLS, and the optimal cutoff threshold of UVE-PLS was searched for by SA.

Method V: The method proposed in this work (BVB-RV-LS-SVM).

Compared with all the models above, the model presented in this work achieved satisfactory prediction results (seen in

The variety discrimination of lubricants was successfully performed by Vis-SWNIR spectroscopy with a hybrid method combining WPT, UVE-PLS, SA and LS-SVM. Using WPT, UVE-PLS and SA, the raw spectral data set was greatly compressed, and only five variables were mined. These variables were then used as the input set for LS-SVM to build a recognition model, and good performance was achieved. The overall results indicate that the method combining WPT, UVE-PLS and SA is a powerful way of compressing the spectral data set and selecting diagnostic information.

This study was supported by the 863 National High-Tech Research and Development Plan (2011AA100705) and the Zhejiang Provincial Science & Technology Innovation Team Project (2009R50001).

Full WPT binary tree.

Vis-SWNIR spectra of three varieties of lubricant.

Result of optimal node by SA. (

Characteristics of coefficients of node (6,3).

The absolute deviation value of prediction results of validation set (sample index 1–30 for Cc lubricant, 31–60 for Hxyg lubricant, and 61–90 for Ca lubricant). (

Result of optimal cutoff by SA. (

Stability distribution of each variable; the two red dotted lines indicate the lower and upper cutoff.

The absolute deviation value of prediction results of BVB-RV-LS-SVM.

The discrimination results of calibration and validation sets by different calibration models.

| | Method I | Method II | Method III | Method IV | Method V |
|---|---|---|---|---|---|
| PCs/LVs/Sw/Sv ^{a} | 6 | 6 | 389 | 151 | 5 |
| Calibration Set | | | | | |
| Correlation coefficient ^{b} | 0.9845 | 0.9539 | 0.9864 | 0.9844 | 0.9943 |
| RMSEC ^{c} | 0.1433 | 0.2540 | 0.1351 | 0.1472 | 0.0878 |
| Validation Set | | | | | |
| Correlation coefficient ^{b} | 0.9511 | 0.9256 | 0.9844 | 0.9950 | 0.9950 |
| RMSEP ^{c} | 0.2718 | 0.3194 | 0.1472 | 0.0829 | 0.0827 |

^{a} PCs/LVs/Sw/Sv: Principal components/latent variables/selected wavelengths/selected variables.

^{c} RMSEC/RMSEP: root mean square error of calibration or prediction.