Rapid Quantitation of Coal Proximate Analysis by Using Laser-Induced Breakdown Spectroscopy

Proximate analysis of coal is of great significance for ensuring the safe and economic operation of coal-fired and biomass-fired power generation units. Laser-induced breakdown spectroscopy (LIBS) assisted by chemometric methods can predict coal proximate analysis rapidly, overcoming the shortcomings of the traditional method. In this paper, three quantitative models were proposed to predict the proximate analysis of coal: principal component regression (PCR), artificial neural networks (ANNs), and principal component analysis coupled with ANN (PCA-ANN). Three model evaluation indicators, namely the coefficient of determination (R²), root-mean-square error of cross-validation (RMSECV), and mean square error (MSE), were applied to measure the accuracy and stability of the models. The most accurate and stable prediction of coal proximate analysis was achieved by PCR, with average R², RMSECV, and MSE values of 0.9944, 0.39%, and 0.21, respectively. Although the R² values of ANN and PCA-ANN were greater than 0.9, their higher RMSECV and MSE values indicated that they were inferior to PCR. Compared with the other two models, PCR not only achieved accurate prediction but also shortened the modeling time.


Introduction
In the coming decades, coal is expected to remain an important energy source and to account for a large proportion of energy consumption worldwide; for instance, more than half of the electric power in China is supplied by coal-fired power plants [1]. Coal-fired burners face the urgent problem of unstable flame combustion, which is detrimental to ignition, combustion efficiency, and pollutant control, and can lead to flame extinction. If the physical and chemical properties of coal could be determined in time, combustion control and optimization could advance considerably. Proximate analysis provides key parameters that can preliminarily distinguish coal types and assess coal quality, contributing to knowledge of the changes in fuel characteristics in boilers [2]. Traditional chemical measurement is time-consuming and labor-intensive: characterizing coal requires elaborate sample preparation, and multi-component information cannot be obtained simultaneously, which is unfavorable for real-time boiler combustion optimization.
Many rapid sample analysis techniques exist, such as prompt gamma neutron activation analysis (PGNAA) [3], X-ray fluorescence (XRF) [4], and inductively coupled plasma-optical emission spectroscopy (ICP-OES) [5]. However, the neutron source of PGNAA poses potential radiation hazards; XRF cannot analyze C, H, and other low-atomic-number elements; and ICP-OES has several weaknesses, for example, high analysis costs caused by large argon consumption and matrix effects that are difficult to avoid or mitigate. Given these limitations in practical applications, an efficient and convenient method for assessing coal quality is needed. Laser-induced breakdown spectroscopy (LIBS) is an atomic emission analysis approach that is

Experimental Setup
The coal samples were analyzed using a mobile LIBS system with a coaxial structure, as shown in Figure 1. A Q-switched Nd:YAG laser with a wavelength of 1064 nm, a repetition rate of 1-20 Hz, and a pulse width of 7 ns served as the laser source. The linearly polarized pulses were optimized to a repetition rate of 1 Hz and an energy of 10 mJ/pulse (energy fluctuation less than 1%). The LIBS signal was collected by two plano-convex lenses with a 50 mm focal length and delivered to a 50 µm single-channel spectrometer (AvaSpec-ULS4096-EVO, Avantes), which records the plasma emission over 200-1100 nm with a resolution of 0.05-0.24 nm. To obtain a high signal-to-noise ratio (SNR), the delay time and integration time were set to 1.2 µs and 30 µs, respectively. The coal sample was placed on a 2D stage that moved continuously along a serpentine (S-shaped) path relative to the laser to improve the reproducibility of the measurement. High-purity air (21% O2 and 79% N2) at a flow rate of 1 L/min was used to purge the plasma region, which increased the intensity of the emission lines and decreased the limit of detection [23].
The intensities of the spectral emission lines differ between ablation points. Each coal tablet was therefore divided into 16 parts (4 × 4), and each part contained 16 ablation points (4 × 4) with a spacing of 500 µm. The spectra from the 256 ablation points were averaged to represent the spectrum of each coal sample.
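The sampling pattern and shot averaging described above can be sketched as follows. This is a minimal illustration, not the authors' acquisition code: it assumes a hypothetical 2.5 mm pitch between the origins of adjacent parts (the text gives only the 500 µm point spacing) and serpentine ordering at both grid levels. The constant and function names are hypothetical, and the RSD helper uses the sample standard deviation (ddof=1).

```python
import numpy as np

POINT_SPACING_UM = 500.0  # spacing between ablation points within a part (from the text)
PART_PITCH_UM = 2500.0    # hypothetical spacing between the origins of adjacent parts


def serpentine_grid(n_rows, n_cols, pitch):
    """(x, y) positions visited row by row, reversing direction on every other row."""
    coords = []
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        for c in cols:
            coords.append((c * pitch, r * pitch))
    return np.array(coords, dtype=float)


def ablation_positions():
    """256 positions: a 4 x 4 grid of parts, each containing a 4 x 4 grid of points."""
    parts = serpentine_grid(4, 4, PART_PITCH_UM)
    points = serpentine_grid(4, 4, POINT_SPACING_UM)
    return np.array([p + q for p in parts for q in points])


def average_spectrum(spectra):
    """Average per-shot spectra (n_shots x n_wavelengths) into one sample spectrum."""
    return np.asarray(spectra, dtype=float).mean(axis=0)


def relative_std(spectra, index):
    """RSD of one wavelength channel across shots (sample standard deviation / mean)."""
    column = np.asarray(spectra, dtype=float)[:, index]
    return column.std(ddof=1) / column.mean()


positions = ablation_positions()
print(positions.shape)  # (256, 2)
```

The RSD helper gives a quick repeatability check for a single emission line across the 256 shots before they are averaged.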

Spectral Pretreatment
Averaging the 256 spectra enhances the repeatability of the LIBS signal and improves the accuracy of quantitative analysis. For example, the relative standard deviation (RSD) of the Fe 461.88 nm emission (one of the characteristic lines selected in Section 3.1) across the 256 spectra was 0.09. However, the baseline of the average spectrum was prone to drift, and removing it significantly increases the SNR. The baseline was removed by outlier elimination and the first-order derivative [24], which can enhance implicit peaks and separate overlapping peaks [25].
$$I = I_{\text{raw}} - I_{\text{baseline}}$$
where $I$ is the intensity of the corrected spectrum, $I_{\text{raw}}$ is the intensity of the raw spectrum, and $I_{\text{baseline}}$ is the intensity of the estimated baseline.
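The baseline correction I = I_raw - I_baseline, followed by a first-order derivative, can be sketched as below. The baseline estimator is an assumption: the exact outlier-elimination procedure of [24] is not described, so this sketch fits a low-order polynomial and repeatedly refits it after rejecting points that lie well above it, treating emission peaks as outliers. The polynomial degree, the threshold k, and the function names are hypothetical. The derivative uses numpy.gradient.

```python
import numpy as np


def estimate_baseline(wavelength, intensity, degree=3, n_iter=20, k=2.0):
    """Polynomial baseline with iterative rejection of points lying above the fit.

    Points more than k residual standard deviations above the current fit are
    treated as emission peaks (outliers) and excluded from the next fit.
    """
    x = np.asarray(wavelength, dtype=float)
    y = np.asarray(intensity, dtype=float)
    # Rescale x to [-1, 1] so that the polynomial fit is well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    mask = np.ones(y.shape, dtype=bool)
    coeffs = np.polyfit(xs, y, degree)
    for _ in range(n_iter):
        resid = y - np.polyval(coeffs, xs)
        sigma = resid[mask].std()
        new_mask = resid < k * sigma
        # Stop when the set of baseline points is stable or too small to fit.
        if new_mask.sum() <= degree or np.array_equal(new_mask, mask):
            break
        mask = new_mask
        coeffs = np.polyfit(xs[mask], y[mask], degree)
    return np.polyval(coeffs, xs)


def correct_spectrum(wavelength, raw):
    """Corrected spectrum I = I_raw - I_baseline."""
    return np.asarray(raw, dtype=float) - estimate_baseline(wavelength, raw)


def first_derivative(wavelength, intensity):
    """dI/d(lambda) by central differences, which sharpens weak and overlapping peaks."""
    return np.gradient(np.asarray(intensity, dtype=float),
                       np.asarray(wavelength, dtype=float))
```

Rejecting only points above the fit keeps the baseline anchored to the spectral background rather than being pulled upward by emission lines.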

PCR
PCA linearly transforms the independent variables into a few principal components that reflect most of the overall information, thereby avoiding collinearity between variables. PCA can not only distinguish and remove redundant spectral emission lines, but also extract the most relevant information [26]. Regressing the dependent variable on the principal components by multiple linear regression (MLR) yields principal component regression (PCR).
Herein, the independent variables are the intensities of the selected characteristic emission lines (Section 3.1), and the dependent variable is the proximate analysis. The cumulative explained variance of the selected principal components was set to 90% or 95% [27]. Using the component matrix generated by PCA, the principal components were expressed in terms of the original variables for linear regression. Finally, MLR was used to map the selected principal components to the coal proximate analysis. Compared with MLR, PCR reduces the number of input variables and improves the stability and goodness of fit of the regression. The number of principal components is the key to the accuracy and stability of the PCR model.
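The PCR procedure described above can be sketched in NumPy. The sketch runs PCA on mean-centred intensities, keeps the fewest components that reach a cumulative explained-variance threshold, applies MLR to their scores, and maps the coefficients back to the original emission-line variables. fit_pcr and predict_pcr are hypothetical names. Omitting autoscaling of the intensities is an assumption, since the paper does not state its scaling.

```python
import numpy as np


def fit_pcr(X, y, var_threshold=0.90):
    """PCA on centred X, keep the fewest components reaching var_threshold, then MLR."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal axes (loadings), ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    k = min(k, len(s))                 # guard against rounding when the threshold is 1.0
    P = Vt[:k].T                       # loadings, shape (n_features, k)
    T = Xc @ P                         # scores, shape (n_samples, k)
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)  # MLR on the scores
    beta = P @ gamma                   # coefficients expressed in the original variables
    intercept = y_mean - x_mean @ beta
    return beta, intercept, k


def predict_pcr(model, X):
    beta, intercept, _ = model
    return np.asarray(X, dtype=float) @ beta + intercept
```

Expressing the model through beta, rather than through the scores, lets a new spectrum be predicted directly from its line intensities.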

ANNs
Inspired by biology, ANNs use mathematical and physical models to abstract the neural network of the human brain from the perspective of information processing. An ANN usually consists of an input layer, a hidden layer, and an output layer, whose basic units are called neurons. In the hidden layer, the inputs are weighted, summed, and passed through an activation function to compute the output. Meanwhile, the loss function is evaluated and used to update the weights. ANNs offer strong nonlinear input-output mapping, self-adaptability, and learning ability. However, ANNs learn slowly and may converge to local minima, which can make their predictions inaccurate. Moreover, the contribution of each input variable to the output is difficult to interpret [28]. Although traditional ANNs can perform quantitative analysis [29], a large number of input variables can make training more difficult and reduce prediction precision [30].
The ANN model, trained with a backpropagation algorithm, consists of sigmoid hidden layer neurons and linear output neurons. More hidden layer neurons can improve prediction accuracy, but an overly complex network may fail to converge. The trade-off between network complexity and training efficiency therefore deserves attention. The number of hidden layer neurons is roughly determined by the following empirical equation [31]:
where N(h) is the number of hidden layer neurons, N(i) is the number of input neurons, and N(o) is the number of output neurons.
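The network described above, with a sigmoid hidden layer, a linear output, and backpropagation, can be sketched in NumPy as follows. This is a minimal full-batch gradient-descent version with assumed settings (learning rate, number of epochs, weight initialization). It is not the training toolbox used in this work, and train_ann and predict_ann are hypothetical names. The default of 11 hidden neurons mirrors the optimal value reported for the ANN model in Section 3.3.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ann(X, y, n_hidden=11, lr=0.05, epochs=2000, seed=0):
    """One sigmoid hidden layer + linear output, trained by backpropagation on MSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(len(X), -1)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted sum -> sigmoid (hidden) -> weighted sum (linear output).
        H = sigmoid(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - Y
        losses.append(float(np.mean(err**2)))
        # Backward pass: gradients of the mean squared error w.r.t. each parameter.
        g_out = 2.0 * err / err.size
        gW2, gb2 = H.T @ g_out, g_out.sum(axis=0)
        g_hidden = (g_out @ W2.T) * H * (1.0 - H)  # sigmoid'(z) = s(z) * (1 - s(z))
        gW1, gb1 = X.T @ g_hidden, g_hidden.sum(axis=0)
        # Gradient-descent weight update.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def predict_ann(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2
```

Practical toolboxes typically add faster optimizers and early stopping on a validation set. Plain gradient descent keeps the backpropagation steps explicit.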

PCA-ANN
The selected principal components with large variances were first used in place of the original variables to eliminate information redundancy in the raw data. Coupling PCA with the ANN reduces the number of input variables and improves training efficiency. Compared with the ANN model, the PCA-ANN model has a simpler network structure and computational process.
The principal component scores calculated from the component matrix served as the independent variables, and the coal proximate analysis as the dependent variable, for training the ANN model. When the principal component scores of a coal sample are fed into the trained ANN model, it outputs the predicted proximate analysis. Feature extraction of LIBS spectra by PCA-ANN has been described in previous research [32].
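In PCA-ANN, a new coal spectrum must be projected onto the same component matrix, using the same training mean, before it is fed to the trained network. A minimal NumPy sketch of this projection follows. fit_pca and pca_scores are hypothetical helper names. Computing PCA by singular value decomposition of mean-centred data without scaling is an assumption that may differ from the preprocessing used in the paper.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the training mean and the component (loading) matrix, (n_features, k)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def pca_scores(X, mean, components):
    """Scores used as ANN inputs in place of the raw emission-line intensities."""
    return (np.asarray(X, dtype=float) - mean) @ components
```

For example, `pca_scores(x_new, mean, components)` for a single new spectrum of shape (1, n_features) returns a (1, k) array ready to feed to the trained network.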

Model Evaluation Indicators
Leave-one-out cross-validation (LOO-CV) was adopted to verify the generalization ability of the three models. Three indicators were used to evaluate the accuracy and stability of the models. R² represents the goodness of fit of a model (a value of one indicates a perfect fit). RMSECV describes the deviation between the predicted and real values (a value of zero indicates a perfect fit). MSE squares the deviations, so large errors are weighted more heavily, and it was used to compare the stability of different models (a value of zero indicates a perfect fit).
where $y_i$, $\bar{y}$, and $\hat{y}_i$ are the real, mean, and predicted values from the training set, respectively, and $n$ is the number of samples in the training set.
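LOO-CV and the three indicators can be sketched with the standard textbook definitions of R², root-mean-square error, and MSE. This assumes those definitions match the ones used in the paper. The exact formulas, including the percentage form of RMSECV, are not reproduced here. The ordinary least-squares pair fit_ols/predict_ols is only a hypothetical stand-in for PCR, ANN, or PCA-ANN, and any fit/predict pair can be passed in.

```python
import numpy as np


def loo_predictions(X, y, fit, predict):
    """Leave-one-out CV: refit with each sample held out and predict that sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep])
        y_hat[i] = predict(model, X[i:i + 1])[0]
    return y_hat


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def fit_ols(X, y):
    """Hypothetical stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

With 16 coal samples, each LOO-CV run performs 16 fits, and every sample is predicted by a model that never saw it.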

Selection of Characteristic Emission Lines
Five indices of proximate analysis, namely moisture, ash, volatiles, fixed carbon, and heating value, were used to assess coal quality in this work. The organic emission lines (e.g., C, H, and O) associated with these five indices tended to be buried in noise, so a database based on them alone was difficult to establish [15]. The molecular bands of CN and C2 were chosen to compensate for the reduction in atomic C emission, because collisions of C with O and N in the air enhance the formation of molecular species such as CN, C2, and CO [33]. Several mineral elements correlated with volatiles and ash were also considered in order to cover as many elements in coal as possible. The database therefore comprised 63 emission lines, including those of C, H, O, N, Li, Na, Mg, Al, Si, Ca, Ti, Fe, C2, and CN [34], as shown in Table 2. Owing to the addition of 90 wt.% KBr, the K emission line at 766.49 nm was the most intense line over the whole wavelength range, as shown in Figure 2. The K emission lines were therefore eliminated to prevent this high concentration from affecting the prediction accuracy. Baseline removal not only retained the original shape of the spectral peaks but also improved the SNR, which facilitated the selection of characteristic emission lines. The basic considerations for selecting the emission lines were as follows [35]:
• High intensity, to obtain a high SNR;
• High probability of excitation, to ensure the repeatability of the experiment;
• Interference-free spectrum, to exclude the influence of spectral line overlap.

PCR
Several uncorrelated principal components were obtained by PCA dimensionality reduction. According to Figure 3a, four principal components accounted for 92.68% of the cumulative explained variance. Even though this exceeded 90%, the number of principal components was also evaluated with the model evaluation indicators, such as R² and RMSECV, as shown in Figure 3b. With 14 principal components, the RMSECV value was the smallest, only 0.53%, and the cumulative explained variance met the 90% convention [27]. The 14 principal components of each coal sample were then determined and mapped to the coal proximate analysis by MLR.
The correlation between the input emission lines and the first two principal components is shown in Figure 4. Principal component 1 was positively correlated with the intensities of the C, O, Li, Na, Mg, Ca, Fe, C2, and CN emission lines. Principal component 2, which is orthogonal to principal component 1, was negatively correlated with some emission lines, such as O, Na, Mg, Ca, and CN. The loading trends of the C, Si, Li, Fe, and C2 emission lines were generally similar, whereas those of the H, O, Na, Mg, Ca, and CN emission lines were the opposite.
Although each principal component has no clear physical meaning, the loadings of the first two principal components help to identify which emission lines are important, which guides the selection of emission lines [36].
A clustering phenomenon can be observed in Figure 5: the scores of the first two principal components grouped the coal samples by coal type, most clearly for bituminous coal. Principal component scores calculated by PCA are widely used in sample classification because similar samples tend to have close scores [37]. PCA could thus be applied to roughly classify a large number of coal samples into different types according to their scores, and separate predictive models could then be trained for each group to predict the coal proximate analysis more accurately.
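The choice of the number of principal components by minimum RMSECV can be sketched as a loop over candidate component counts, each evaluated by leave-one-out cross-validation. This NumPy sketch is an assumption about the procedure, not the authors' code. The function names are hypothetical, and RMSECV is returned in the units of y rather than in percent.

```python
import numpy as np


def pcr_fit_k(X, y, k):
    """PCR with exactly k principal components; returns (beta, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    gamma, *_ = np.linalg.lstsq(Xc @ P, y - y_mean, rcond=None)
    beta = P @ gamma
    return beta, y_mean - x_mean @ beta


def loo_rmsecv(X, y, k):
    """RMSECV of a k-component PCR model under leave-one-out cross-validation."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, b0 = pcr_fit_k(X[keep], y[keep], k)
        errors[i] = y[i] - (X[i] @ beta + b0)
    return float(np.sqrt(np.mean(errors**2)))


def select_n_components(X, y, k_max):
    """Return the k with the smallest RMSECV and the RMSECV of every candidate.

    With n samples, each LOO training set has rank at most n - 2 after centring,
    so k_max should not exceed n - 2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {k: loo_rmsecv(X, y, k) for k in range(1, k_max + 1)}
    best = min(scores, key=scores.get)
    return best, scores
```

Plotting `scores` against k gives an RMSECV curve of the kind used to pick the component count.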

ANNs
ANNs map multiple inputs to multiple outputs and are therefore suitable for solving nonlinear problems with complex internal mechanisms. However, overfitting, in which the ANN model performs well in training but poorly in prediction, may occur. A direct way to avoid overfitting is to increase the number of spectra for each coal sample [38]. All 256 spectra of each coal sample were therefore used to cover more variation and enlarge the training set.
The intensities of the 63 emission lines selected in Table 2 were used as the independent variables, and the proximate analysis as the dependent variable, to train the model. In this work, 12 coal samples were selected as the training set, two as the validation set, and the remaining two as the prediction set, corresponding to 75%, 12.5%, and 12.5% of the samples, respectively. Based on the minimum RMSECV and the maximum R², the optimal number of hidden layer neurons was 11, as shown in Figure 6. With 13 hidden layer neurons, the ANN model was overfitted, which indicates that ANNs may require regularization for enhanced stability [38].
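Because all 256 spectra of each coal sample are used, assigning whole samples rather than individual spectra to each subset keeps spectra of the same sample from appearing in both the training and prediction data. The following is a minimal sketch of such a 12/2/2 split. The random assignment, the seed, and the function name split_by_sample are assumptions, since the text does not state how the samples were chosen.

```python
import numpy as np


def split_by_sample(sample_ids, n_train=12, n_val=2, seed=0):
    """Assign whole coal samples (all of their spectra) to train/validation/prediction."""
    sample_ids = np.asarray(sample_ids)
    order = np.random.default_rng(seed).permutation(np.unique(sample_ids))
    train_s = order[:n_train]
    val_s = order[n_train:n_train + n_val]
    pred_s = order[n_train + n_val:]
    # Boolean masks over spectra; each sample's spectra land in exactly one subset.
    return (np.isin(sample_ids, train_s),
            np.isin(sample_ids, val_s),
            np.isin(sample_ids, pred_s))


# 16 coal samples x 256 spectra each.
sample_ids = np.repeat(np.arange(16), 256)
train, val, pred = split_by_sample(sample_ids)
```

The masks can index the spectra matrix and target vector directly, for example `X[train]` and `y[train]`.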

PCA-ANN
Because the PCA-ANN model retains the nonlinear regression of the ANN, the overfitting problem described in Section 3.3 also occurred when the number of hidden layer neurons was nine, as shown in Figure 7. Using a sufficient number of principal components can improve classification accuracy [39], which shows the importance of selecting the principal component inputs for the ANN. According to Figure 3a, almost 100% of the variance was explained by the first 15 principal components, which were fed into the ANN in descending order of eigenvalue (EV-PCA-ANN). Based on the maximum R² and minimum RMSECV, the optimal number of hidden layer neurons was five, at which the PCA-ANN model performed best.
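The EV ordering used for the PCA-ANN inputs amounts to sorting the eigenvectors of the covariance matrix by decreasing eigenvalue and keeping the first 15. The following NumPy sketch works under that assumption, and ev_ordered_components is a hypothetical name. numpy.linalg.eigh returns eigenvalues in ascending order, so the order is reversed explicitly.

```python
import numpy as np


def ev_ordered_components(X, n_components=15):
    """Eigen-decompose the covariance matrix; order components by eigenvalue, largest first."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_explained = np.cumsum(eigvals) / eigvals.sum()
    scores = Xc @ eigvecs[:, :n_components]      # ANN inputs, largest-eigenvalue component first
    return scores, eigvals, cum_explained
```

The variance of each score column equals the corresponding eigenvalue, so this ordering ranks inputs purely by variance and not by their relevance to the proximate analysis.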

Comparison and Analysis
Because the composition of coal is complex and diverse, three chemometric methods were proposed to predict the coal proximate analysis after spectral pretreatment (see Table 3). PCR is a linear algorithm, and ANN is a nonlinear algorithm. PCA-ANN first performs linear dimensionality reduction (PCA) and then maps the principal component scores to the coal proximate analysis through a nonlinear transformation (ANN). The optimal parameters of each algorithm were determined by cross-validation. Figure 8 compares the predicted and real values of coal proximate analysis obtained by the three chemometric methods.
The three models differed only slightly in predicting moisture. Among them, the PCR model performed best, with R², RMSECV, and MSE values of 0.9904, 0.2458%, and 0.0565, respectively. For ash, the prediction of PCR was the most accurate and stable. The R² values between the predicted and real ash contents calculated by PCR, ANN, and PCA-ANN were 0.9986, 0.9770, and 0.9179, respectively, and the corresponding RMSECV values obtained by LOO-CV were 0.2455%, 1.0060%, and 1.8955%, respectively. The MSE values of PCR and ANN were 0.0564 and 0.9496, whereas that of PCA-ANN was 11.6299, indicating that the PCA-ANN model was less stable than the other two. For fixed carbon, PCR was again the best prediction model: the R² and RMSECV values of the three models were close, but the MSE value of PCR was far lower than those of the other two, indicating that PCR was more robust. The PCR model also performed best for the quantification of volatiles.
Although the R² values of all models were greater than 0.99 for volatiles, the robustness of the models still needed to be considered. The RMSECV values of PCR, ANN, and PCA-ANN were 0.6857%, 0.8973%, and 1.6452%, respectively, and the MSE value of PCA-ANN reached 24.5579, far above those of PCR and ANN, which shows that PCA-ANN was not suitable for predicting volatiles. For the heating value, PCR gave the highest R² and the lowest RMSECV and MSE, confirming it as the best prediction model.
Modeling time was considered an important index of modeling efficiency. For a given algorithm, the modeling time was similar across different parameter settings, consistent with Zhang's work [15]. Between algorithms, however, the modeling time varied greatly; for instance, the modeling times of PCR, ANN, and PCA-ANN for predicting moisture were 10, 89, and 28 s, respectively. PCR generally had the shortest modeling time, PCA-ANN ranked second, and ANN was the least efficient. The cross-validation results and modeling times are listed in Table 3.
Therefore, PCR should be considered first when predicting coal proximate analysis under similar experimental conditions. After dimensionality reduction and linear regression, PCR achieved good accuracy and stability with the shortest training time. The intensities of the selected emission lines were strongly linearly correlated with the coal proximate analysis. First, the principal components were extracted directly from the 63 emission lines, which were free of self-absorption and spectral overlap, making a linear correlation possible. Second, with the addition of the KBr binder, the generation and dissipation of shock waves during the ablation of coal were at a similar level [40], which suppressed the matrix effect of coal. Meanwhile, laser-induced pyrolysis of volatiles may have been strongly inhibited, providing a favorable environment for forming plasmas with stable emission. Thus, a strong correspondence existed between the component concentrations and the spectral intensities in coal.
Because of the large number of input variables, training the ANN was time-consuming, and its prediction accuracy and stability were slightly lower than those of PCR. Some of the selected emission lines may have been irrelevant, wasting computing resources and degrading the results. Although the relationship between the component concentrations and the elemental concentrations in coal is nonlinear, the selected emission lines weakened some nonlinear factors to an extent. Another plausible explanation is that the temporal and spatial uniformity of the plasma provided by the KBr improved the linear relation, making the ANN less applicable than PCR.
In contrast, PCA-ANN was not suitable for predicting coal proximate analysis. PCA-ANN lacked accuracy and stability because the principal components generated by PCA, which served as the ANN inputs, were unrelated to the proximate analysis: PCA is unsupervised and ranks components by variance rather than by relevance to the target. This is consistent with the results of Drezga et al. [41]. Principal components obtained by PCA are usually fed to the ANN in descending order of eigenvalue (EV-PCA-ANN), as in the present work. Previous work [32] showed that selecting the principal components with a genetic algorithm (GA) before training the ANN (PCA-GA-ANN) performed better than EV-PCA-ANN.
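PCA ranks components by variance, not by relevance to the target. The following toy example illustrates how this can leave a high-variance component that carries almost no information about the target, while a low-variance component carries nearly all of it. It uses hypothetical synthetic data, not data from this study, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
nuisance = rng.normal(0.0, 10.0, n)  # high-variance variable unrelated to the target
signal = rng.normal(0.0, 1.0, n)     # low-variance variable that drives the target
X = np.column_stack([nuisance, signal])
y = 2.0 * signal

# PCA by SVD of the centred data; columns of `scores` are ordered by variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

corr_pc1 = abs(np.corrcoef(scores[:, 0], y)[0, 1])
corr_pc2 = abs(np.corrcoef(scores[:, 1], y)[0, 1])
print(f"|corr(PC1, y)| = {corr_pc1:.2f}, |corr(PC2, y)| = {corr_pc2:.2f}")
```

Feeding inputs strictly by eigenvalue would give priority to PC1 here. A supervised selection step, such as the GA selection mentioned above, can instead favor components that correlate with the target.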

Conclusions
In this work, the spectra of 16 coal samples with added KBr binder were measured with a LIBS experimental setup. Three chemometric methods, PCR, ANN, and PCA-ANN, representing linear, nonlinear, and combined linear-nonlinear methods, respectively, were applied to predict the coal proximate analysis. Three indicators expressing the accuracy and stability of the models were used to determine the optimal parameters and evaluate model performance. With the highest R² and the lowest RMSECV and MSE values, PCR was the most effective approach to quantitative proximate analysis. Although the R² values of ANN and PCA-ANN were both above 0.9, their RMSECV and MSE values were much higher than those of PCR. The training time of ANN was the longest, and its prediction performance fell short of its training performance. PCA reduced the number of ANN input variables, but the nonlinear fitting of unrelated variables reduced the robustness of PCA-ANN.
With its advantages of rapid multi-element measurement, simple sample preparation, and remote measurement, LIBS could provide in-situ and online measurement and analysis for coal-fired boilers in the future. The three chemometric methods could be used to predict coal proximate analysis, providing an expedient way to obtain the proximate analysis of coal burned in power plant boilers.