Identification of Coal Geographical Origin Using Near Infrared Sensor Based on Broad Learning

Geographical origin, an important indicator of the chemical composition and quality grading, is one essential factor that should be taken into account in evaluating coal quality. However, traditional coal origin identification methods based on chemistry experiments are not only time consuming and labour intensive, but also costly. Near-Infrared (NIR) spectroscopy is an effective and efficient way to measure the chemical compositions of samples and has demonstrated excellent performance in various fields of quantitative and qualitative research. In this study, we employ NIR spectroscopy to identify coal origin. Considering the fact that the NIR spectra of coal samples always contain a large amount of redundant information and the number of samples is small, the broad learning algorithm is utilized here as the modelling system to classify the coal geographical origin. In addition, the particle swarm optimization algorithm is introduced to improve the structure of the Broad Learning (BL) model. We compare the improved model with the other five multivariate classification methods on a dataset with 243 coal samples collected from five countries. The experimental results indicate that the improved BL model can achieve the highest overall accuracy of 97.05%. The results obtained in this study suggest that the NIR technique combined with machine learning methods has significant potential for further development of coal geographical origin identification systems.


Introduction
Coal, as one of the most abundant primary fossil fuels, plays a critical role in meeting global energy needs and will remain important to humankind for many years to come [1]. Evaluating coal quality is therefore essential for the rational use of coal. Many factors determine the quality of coal, and one important factor is its geographical origin [2,3], which reflects a combination of conditions including climate, hydrology and minerals. The traditional methods for identifying coal geographical origin rely on chemistry experiments, which are complex and time consuming [4]. Other rapid detection methods, such as gamma-ray-based methods [5] and microwave heating-based methods [6], focus on only a single property of coal (e.g., fixed carbon or moisture). Given these limitations, rapid, non-destructive and cost-effective analytical techniques for coal geographical origin identification would be highly desirable [7,8]. In recent years, vibrational spectroscopy techniques combined with machine learning algorithms have proven to be powerful tools for the analysis of fuel samples [9]. For instance, Wang et al. used Near-Infrared (NIR) spectroscopy with improved PLS regression for the rapid analysis of six coal properties [10]. Yang et al. identified coal and carbonaceous shale based on visible and NIR spectroscopy [11].
NIR spectroscopy is an effective and efficient technique for measuring the chemical composition of samples and has been widely used in the agricultural, petrochemical and pharmaceutical industries over the past few decades [12–15]. NIR has also been used to identify geographical origins. For example, Lin et al. used NIR spectroscopy and SPA-LDA to classify the geographical origin and quality of tea [16]. Tony et al. classified the geographical origin of honey samples by NIR spectroscopy [17]. Gang-Feng Li et al. identified the adulterations and geographical origins of Chinese herbs by NIR spectroscopy and chemometrics [18]. Many traditional learning methods, including Support Vector Machine (SVM) [19], Back Propagation Neural Network (BPNN) [20] and Random Forest (RF) [21], have been combined with NIR spectra to discriminate the geographical origin and quality of food and herbs [22]. However, compared with the NIR spectra of these organic materials, the NIR spectra of coal samples contain more redundant information and noise. Moreover, the data set is small, and the samples may be inhomogeneous. For these reasons, the traditional classification methods cannot meet the requirements of accurate and effective identification [23–26].
In this study, we employ a novel classification algorithm, Broad Learning (BL), which was recently proposed by Chen et al. [24]. BL has shown strong performance on several classification problems. Unlike deep learning-based algorithms [27], there is no layer-to-layer coupling in the network structure, which gives BL a simple structure and a low computational cost. BL can therefore be applied to complex classification problems with fewer parameters to compute. Moreover, the incremental learning scheme of BL can efficiently rebuild the model in a broad, expansive way, so that no retraining from scratch is needed when the network has to be expanded. In order to further improve the performance of the BL model, we employ Particle Swarm Optimization (PSO) to optimize the structure of the BL model. Being powerful and easy to implement, the PSO algorithm has been applied to a wide variety of optimization problems [28]. Finally, we compare the proposed method with state-of-the-art methods for coal geographical origin identification. The results demonstrate that the proposed PSO-BL-based strategy achieves the best performance, with an accuracy of 97.05%, which supports the proposed method and underlines the importance of the optimized parameters.
In this study, we propose a novel method to identify coal geographical origin. The main contributions of this paper are three-fold:
1. Inspired by the use of NIR in the agricultural, petrochemical and pharmaceutical industries, we employ NIR to identify coal geographical origin, which is unprecedented. This method is fast and non-destructive.
2. Considering the noisy NIR spectra and the limited number of samples, we employ BL as the modelling algorithm for its simple structure, robustness to noise and excellent performance in previous studies. Compared with the traditional methods in the literature, this study improves the classification precision.
3. The performance of BL is largely dependent on the network structure. In order to obtain the optimal parameters for the BL model, the PSO algorithm is utilized. The proposed method is able to classify the coal geographical origin accurately.
The rest of this paper is organized as follows. The experimental preparations, including the sources of data, data processing and outlier rejection, are presented in Section 1. Section 2 explains the theory of the BL model and presents the details of our proposed PSO-BL model. Comparison experiments between the PSO-BL model and the other five traditional discrimination methods are reported in Section 3. Section 4 concludes the paper.

Experimental Data
We collected 243 coal samples from five countries: 47 samples from Australia, 36 samples from Russia, 36 samples from Canada, 83 samples from Indonesia and 41 samples from China. Before spectra recording, all samples were prepared according to the "Method for Preparation of Coal Sample" (GB474-2008) [29] by using a KERP hammer crusher, an SF-05 automatic sample splitter and a 0.2 mm standard sieve in the National Laboratory of the Import and Export Quarantine Inspection Bureau. Prior to the measurement of each sample's spectrum, we measured the background spectrum, i.e., the response of the spectrometer when there was no sample in place. We then used the background spectrum to eliminate signals due to the spectrometer and its environment, so that the final spectrum was due solely to the sample. The prepared coal samples were scanned by an Antaris II FT-NIR Spectrometer (Thermo Electron Co., Waltham, MA, USA) in the range of 10,001.00–3999.64 cm⁻¹, yielding 1557 wavelength points in total. Each sample was scanned 64 times at a resolution of 4 cm⁻¹, and the spectrum of each sample was the average of the 64 scans. The near-infrared spectra of the coal samples are shown in Figure 1. As can be seen from Figure 1, the NIR spectra consist of broad, weak and extensively overlapping bands, which may impede further analysis. In order to identify coal geographical origins, it is therefore necessary to employ powerful machine learning algorithms that are robust to noise.

Outlier Rejection
In practice, experimental data often contain outliers, i.e., samples that differ markedly from the majority. During spectral data collection, outliers may arise from environmental influences or incorrect operation. Commonly used machine learning methods are sensitive to such outliers, and the results may be adversely affected by them [30]. Therefore, it is important to detect and reject these outliers. In this paper, the outliers are detected according to the Euclidean distance (Ed) between each sample and the centre of its class. First, we calculate the average spectrum of the samples from each country, denoted as the central vector:
$$\bar{X} = \frac{1}{\#s} \sum_{i=1}^{\#s} x_i \tag{1}$$
where $x_i$ is the spectrum of the $i$-th sample and $\#s$ is the number of samples in each country. Then, we calculate the average of the Euclidean distances between the spectrum of each coal sample and the central vector:
$$dis = \frac{1}{\#s} \sum_{i=1}^{\#s} \lVert x_i - \bar{X} \rVert_2 \tag{2}$$
To detect the outliers, the threshold is set as 3 × dis. A sample is regarded as an outlier when the Euclidean distance between its spectrum and $\bar{X}$ is larger than the threshold. The results of the outlier rejection are shown in Figure 2. The samples with index 12 in Australia, 2 and 3 in Russia, 12 and 21 in Canada and 73 in Indonesia are eliminated from the data set. We also employed the Hotelling T-squared statistic [31] to remove the outliers, and similar results were achieved.
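The distance-based rejection rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `reject_outliers`, its return values and the synthetic demonstration data are assumptions introduced here, and the rule is applied to the spectra of one country at a time.

```python
import numpy as np

def reject_outliers(spectra, factor=3.0):
    """Flag spectra lying farther than factor * (mean distance) from the class centre.

    spectra: array of shape (n_samples, n_wavelengths) holding ONE country's spectra.
    Returns (keep_mask, distances, threshold).
    """
    spectra = np.asarray(spectra, dtype=float)
    center = spectra.mean(axis=0)                        # central vector (average spectrum)
    distances = np.linalg.norm(spectra - center, axis=1)  # Euclidean distance of each sample
    threshold = factor * distances.mean()                # 3 x dis in the text
    keep_mask = distances <= threshold
    return keep_mask, distances, threshold

# Synthetic demonstration (not the coal data): 30 similar spectra plus one shifted spectrum.
rng = np.random.default_rng(0)
demo = rng.normal(0.0, 0.01, size=(31, 200))
demo[30] += 1.0                                          # artificial outlier
keep, dist, thr = reject_outliers(demo)
print("rejected indices:", np.flatnonzero(~keep))
```

Because the mean distance itself includes the outliers, a single extreme spectrum raises the threshold; with several extreme spectra in one class the rule becomes less sensitive, which may be one reason to cross-check against a second criterion such as Hotelling's T-squared.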

BL Algorithm
BL is based on the Random Vector Functional-link Neural Network (RVFLNN) previously proposed in [32]. Instead of relying on gradient-descent-based learning algorithms [33], RVFLNN obtains its generalization capability as a function approximator by computing the pseudoinverse to find the desired connection weights. However, RVFLNN does not cope well with the large data volumes of the modern era. Chen et al. proposed a novel strategy, namely broad learning, which is able to cope with new incoming data [24]. As shown in Figure 3, a BL network consists of four parts: input, output, feature nodes and enhancement nodes. In the BL model, the input data are first mapped into a series of random feature nodes, similar to the feature extraction process in traditional machine learning. The mapped random feature nodes are then transformed into enhancement nodes by a nonlinear transformation. Finally, the output label is connected to the combination of mapped feature nodes and enhancement nodes, and the connection weight W is the parameter to be learned, which can be calculated by ridge regression [34].
Given a dataset (X, Y) with N samples, where X are the input data and Y is the output label, the algorithm proceeds as follows. We first transform the input data X into the mapped random features $Z_i$, $i = 1, \ldots, n$, where n is the number of mapped features. The $i$-th mapped feature node is:
$$Z_i = \phi_i(X W_{e_i} + b_{e_i}), \quad i = 1, \ldots, n \tag{3}$$
where $W_{e_i}$ and $b_{e_i}$ are the random weights and bias and $\phi_i(\cdot)$ is the mapping function. Then, we denote $Z^* = (Z_1, \ldots, Z_n)$ as the outputs of the n feature nodes. In order to obtain sparse and compact features in $Z^*$, we fine-tune the initialized weights $W_{e_i}$ by using a sparse autoencoder. After the mapped feature nodes are obtained, $Z^*$ is fed into the enhancement nodes $H_j$, $j = 1, \ldots, m$, where m is the number of enhancement nodes. The $j$-th enhancement node is calculated as:
$$H_j = \delta_j(Z^* W_{h_j} + \beta_{h_j}), \quad j = 1, \ldots, m \tag{4}$$
where $W_{h_j}$ and $\beta_{h_j}$ are the random weights and bias and $\delta_j(\cdot)$ is the activation function, chosen here as $\tanh(\cdot)$. We define the outputs of the enhancement nodes as $H^* = (H_1, \ldots, H_m)$. Hence, the BL model can be described as:
$$Y = [Z^* | H^*] W \tag{5}$$
where $[Z^* | H^*]$ collects all feature and enhancement nodes and W is the desired connection weight matrix between these nodes and the output label. By taking $A = [Z^* | H^*]$, Formula (5) can be written as $Y = AW$. BL calculates W through the ridge regression approximation [34], i.e., by solving the following minimization problem:
$$\arg\min_{W} \; \lVert AW - Y \rVert_2^2 + \lambda \lVert W \rVert_2^2 \tag{6}$$
where λ is a positive constant that prevents over-fitting. We take the partial derivative of Formula (6) with respect to W and set it to zero, which gives:
$$W = (\lambda I + A^{T} A)^{-1} A^{T} Y \tag{7}$$
This yields the desired connection weights.
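The pipeline described above (random feature nodes, tanh enhancement nodes, ridge-regression output weights) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the mapping function is taken to be linear, all enhancement nodes are generated as one block, the sparse-autoencoder fine-tuning step is omitted, and the names `fit_bl`, `predict_bl`, `bl_features`, the default sizes and the synthetic three-class data are all hypothetical.

```python
import numpy as np

def bl_features(model, X):
    """Build A = [Z* | H*] from the stored random weights."""
    # Feature nodes: one linear random map per window (assumed form of phi_i).
    Z = np.hstack([X @ W + b for W, b in zip(model["We"], model["be"])])
    # Enhancement nodes: tanh of a random map of the feature nodes.
    H = np.tanh(Z @ model["Wh"] + model["bh"])
    return np.hstack([Z, H])

def fit_bl(X, Y, n_windows=5, nodes_per_window=8, n_enhance=40, lam=1e-3, seed=0):
    """Draw random weights, then solve the ridge problem for the output weights W."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    model = {
        "We": [rng.standard_normal((d, nodes_per_window)) for _ in range(n_windows)],
        "be": [rng.standard_normal(nodes_per_window) for _ in range(n_windows)],
    }
    n_feat = n_windows * nodes_per_window
    model["Wh"] = rng.standard_normal((n_feat, n_enhance))
    model["bh"] = rng.standard_normal(n_enhance)
    A = bl_features(model, X)
    # Closed-form ridge solution: W = (lam * I + A^T A)^{-1} A^T Y
    model["W"] = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
    return model

def predict_bl(model, X):
    """Return class scores; the predicted class is the argmax of each row."""
    return bl_features(model, X) @ model["W"]

# Synthetic three-class demonstration (not the coal spectra).
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)
centers = 4.0 * np.eye(3, 10)
X_demo = centers[labels] + rng.normal(0.0, 0.3, size=(90, 10))
Y_demo = np.eye(3)[labels]                      # one-hot output labels
model = fit_bl(X_demo, Y_demo)
train_acc = (predict_bl(model, X_demo).argmax(axis=1) == labels).mean()
print(f"training accuracy on synthetic data: {train_acc:.3f}")
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice made for this sketch; both correspond to the same closed-form ridge solution.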

Proposed PSO-BL Model
The performance of the BL model is strongly affected by the parameters that define its structure. Hence, in order to improve the performance of the BL model, we employ the PSO algorithm to search for the optimal parameters. PSO is a classic global optimization strategy that is based on the flocking behaviour and social co-operation of birds [35]. In this experiment, we denote $N_1$ as the number of windows for feature mapping, $N_2$ as the number of nodes in each window and $N_3$ as the number of enhancement nodes. During the BL model construction, we find that the classification performance is largely dependent on the selection of these three parameters, $N_1$, $N_2$ and $N_3$. The PSO algorithm is applied here to obtain the optimal values of these three parameters. The fitness function of the optimization process is defined as follows:
$$fitness = \frac{1}{10} \sum_{k=1}^{10} bl_k[n_1, n_2, n_3] \tag{8}$$
where $bl_k[n_1, n_2, n_3]$ is the test accuracy of the $k$-th BL model when the parameters equal $n_1$, $n_2$, $n_3$.
In order to reduce the random error, we run the BL model 10 times and return the mean accuracy as the fitness value. Figure 4 shows the optimization process of the proposed BL with PSO. During the PSO optimization process, we initialize a swarm of 50 particles. The dimension d of each particle is set to three, the number of parameters to optimize. We denote $p^d_{best,h}(t)$ as the best solution that particle h has obtained up to generation t, and $p^d_{gbest}(t)$ as the best solution of all particles. Similarly, $V^d_h(t)$ and $X^d_h(t)$ denote the velocity and position of particle h at generation t. Note that the parameters to be optimized are the numbers of mapped features and enhancement nodes, which must be positive integers; the velocity is therefore also restricted to integers. In each generation, we update the velocity by the following formula:
$$V^d_h(t+1) = \mathrm{round}\big(w V^d_h(t) + c_1 r_1 (p^d_{best,h}(t) - X^d_h(t)) + c_2 r_2 (p^d_{gbest}(t) - X^d_h(t))\big) \tag{9}$$
where $r_1$ and $r_2$ are random numbers in [0, 1] and $c_1$, $c_2$ are two positive acceleration constants, both set to 1.4962, a value commonly selected in the PSO algorithm. round(·) rounds its argument to the nearest integer. w is the dynamic inertia weight, which is expressed as:
$$w = w_{max} - (w_{max} - w_{min}) \frac{t}{M} \tag{10}$$
where $w_{max} = 0.9$, $w_{min} = 0.25$, t is the current generation and M is the maximum number of generations, which equals 40. The position is updated as follows:
$$X^d_h(t+1) = X^d_h(t) + V^d_h(t+1) \tag{11}$$
During the optimization process, we heuristically restrict the velocity $V^d_h$ for d = 1, 2, 3 to the ranges [−10, 10], [−10, 10] and [−1000, 1000], respectively, and the corresponding position $X^d_h$ to the ranges [1, 50], [1, 50] and [10, 10,000]. In each generation, the velocity and position of every particle are updated according to Formulas (9) and (11), and the inertia weight is updated according to Formula (10); after the final generation, the algorithm returns $p^d_{gbest}(t)$ and the corresponding fitness value.
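The integer-valued PSO search described above can be sketched in plain Python. Several details are assumptions made for illustration rather than facts from the paper: the inertia weight is taken to decrease linearly from 0.9 to 0.25 over the generations, velocities and positions are clamped to the stated ranges ([−10, 10], [−10, 10], [−1000, 1000] and [1, 50], [1, 50], [10, 10,000]), and `toy_fitness` is a hypothetical stand-in for the mean accuracy of 10 BL runs, which would be far more expensive to evaluate.

```python
import random

POS_BOUNDS = [(1, 50), (1, 50), (10, 10000)]        # N1, N2, N3
VEL_BOUNDS = [(-10, 10), (-10, 10), (-1000, 1000)]

def inertia(t, max_gen, w_max=0.9, w_min=0.25):
    """Linearly decreasing inertia weight (assumed form)."""
    return w_max - (w_max - w_min) * t / max_gen

def clamp(value, low, high):
    return max(low, min(high, value))

def integer_pso(fitness, pos_bounds, vel_bounds, n_particles=50, max_gen=40,
                c1=1.4962, c2=1.4962, seed=0):
    """Maximise `fitness` over integer positions; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    dims = len(pos_bounds)
    X = [[rng.randint(lo, hi) for lo, hi in pos_bounds] for _ in range(n_particles)]
    V = [[rng.randint(lo, hi) for lo, hi in vel_bounds] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for t in range(max_gen):
        w = inertia(t, max_gen)
        for h in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Rounded velocity update, then clamping to the allowed range.
                v = round(w * V[h][d]
                          + c1 * r1 * (pbest[h][d] - X[h][d])
                          + c2 * r2 * (gbest[d] - X[h][d]))
                V[h][d] = clamp(v, *vel_bounds[d])
                X[h][d] = clamp(X[h][d] + V[h][d], *pos_bounds[d])
            f = fitness(X[h])
            if f > pbest_fit[h]:
                pbest[h], pbest_fit[h] = X[h][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = X[h][:], f
    return gbest, gbest_fit

def toy_fitness(p):
    """Hypothetical smooth objective with its maximum at (12, 30, 4000)."""
    n1, n2, n3 = p
    return -((n1 - 12) ** 2 + (n2 - 30) ** 2 + ((n3 - 4000) / 100) ** 2)

best, best_fit = integer_pso(toy_fitness, POS_BOUNDS, VEL_BOUNDS)
print("best parameters:", best, "fitness:", best_fit)
```

In a real run, `fitness` would train the BL model with the candidate (N1, N2, N3) and return the averaged accuracy, so each generation costs 50 particles times 10 BL fits.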

Model Construction
This section presents the model construction and the design of the experiment. After rejecting six outliers, we had 237 samples remaining. To build the PSO-BL model, we first divided the 237 samples into a training data set, a validation data set and a test data set. To improve the stability of the BL model and use the data more efficiently, 10-fold cross-validation was employed. The total data set was split into 10 parts. Eight parts were used to train the BL model, and one part was used as the validation data to return the fitness value of the given parameters. The remaining part was used as the test data set to evaluate the performance of the PSO-BL model. The program was run 10 times, so that each sample was used once in the hold-out test set. The modelling process of the PSO-BL model is shown in Figure 5. To improve the predictive ability of the BL system, we employed the PSO algorithm to optimize the structure of the BL network. The fitness values of the initial population are shown in Figure 6, where the swarm size is 50. It can be seen from Figure 6 that the fitness values of the initial population were mostly in the range of [0.84, 0.96], which shows that the performance was largely dependent on the values of the parameters. Because 10-fold cross-validation was employed, we repeated the experiment 10 times, and the global optimal fitness values of these 10 runs are shown in Figure 7.
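The 8/1/1 rotation described above can be sketched as follows. The choice of which fold serves as the validation set in each round (here, the fold after the test fold) and the plain random shuffling without stratification by country are assumptions of this sketch; the function name `rotating_splits` is hypothetical.

```python
import random

def rotating_splits(n_samples, n_folds=10, seed=0):
    """Yield (train, validation, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]   # 10 near-equal parts
    for k in range(n_folds):
        val_k = (k + 1) % n_folds                            # assumed choice of validation fold
        test = folds[k]
        val = folds[val_k]
        train = [i for j, fold in enumerate(folds) if j not in (k, val_k) for i in fold]
        yield train, val, test

splits = list(rotating_splits(237))
for round_no, (train, val, test) in enumerate(splits, start=1):
    print(round_no, len(train), len(val), len(test))
```

With 237 samples, seven folds hold 24 samples and three hold 23, so the training set in each round contains roughly 190 spectra.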
Table 1 summarizes the cross-validation process and shows the performance on the 10 different test sets. We also calculated the accuracy for each region (i.e., each of the five countries). As shown in Table 2, the PSO-BL model misclassified seven samples in total and achieved the best overall accuracy of 97.05%. We also evaluated the performance of the proposed PSO-BL algorithm without outlier rejection, and the corresponding accuracy was 91.88%, significantly lower than that of the scheme with outlier removal. Furthermore, we compared the proposed PSO-BL method with six other machine learning methods for identifying the geographical origin of coal samples: SVM, K-Nearest Neighbour (K-NN) [36], Radial Basis Function Neural Network (RBFNN) [37], the BPNN algorithm, RF and the BL model. Principal components analysis was employed to reduce the dimension and hence to enhance the computational efficiency of the SVM model. Moreover, the PSO algorithm was also used to obtain the two optimal parameters of the SVM model, namely c and gamma of the RBF kernel SVM.
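As a quick consistency check of the reported figures, the sample counts, the six rejected outliers and the seven misclassified samples can be combined as below. The per-country outlier counts are read off the indices listed in the outlier rejection section; the variable names are illustrative.

```python
# Sample counts per country and number of rejected outliers, as stated in the text.
samples = {"Australia": 47, "Russia": 36, "Canada": 36, "Indonesia": 83, "China": 41}
outliers = {"Australia": 1, "Russia": 2, "Canada": 2, "Indonesia": 1, "China": 0}

n_total = sum(samples.values())                 # 243 collected samples
n_kept = n_total - sum(outliers.values())       # 237 samples after outlier rejection
n_misclassified = 7
accuracy = 100.0 * (n_kept - n_misclassified) / n_kept

print(f"{n_total} collected, {n_kept} kept, accuracy = {accuracy:.2f}%")
# prints: 243 collected, 237 kept, accuracy = 97.05%
```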

Experiment Results
The classification results of all the above-mentioned machine learning methods are shown in Table 2. BL, SVM, RBFNN and BPNN performed better than the other traditional classification methods, and the proposed PSO-BL achieved the highest accuracy of 97.05%. The comparison results demonstrate the effectiveness of the proposed PSO-BL algorithm and underline the need for optimized structural parameters.
In addition, as can be seen from Table 2, the models performed better on the samples from Indonesia and China. We suspect that there are two potential reasons. First, the sample set was imbalanced, resulting in better classification for the larger sample sets. Second, the coal samples from Australia, Russia and Canada were mostly lignite, whereas the coal samples from China and Indonesia were mostly gas coal. This suggests that NIR spectroscopy may classify the geographical origin of gas coal more reliably.

Conclusions
In this study, a novel method based on NIR spectra was proposed for fast and non-destructive identification of coal geographical origin. We employed the BL algorithm as the modelling method because of its simple structure, low computational cost and excellent performance. In order to further improve the predictive ability of the model, the PSO algorithm was utilized to optimize the structure of the BL model. In addition, we compared the proposed PSO-BL model with six other multivariate classification methods: SVM, K-NN, RBFNN, BPNN, RF and BL. The experimental results indicated that the proposed PSO-BL model was superior to the previous approaches, achieving the best classification accuracy of 97.05% with 10-fold cross-validation. In general, the proposed PSO-BL-based method can identify the geographical origin of coal effectively and efficiently. The established protocol is of great importance for evaluating coal quality.

Figure 1. Spectra of the coal samples from the five countries.

Figure 2. The results of outlier rejection. (a) Outlier rejection of samples from Australia; (b) outlier rejection of samples from Russia; (c) outlier rejection of samples from Canada; (d) outlier rejection of samples from Indonesia; (e) outlier rejection of samples from China.

Figure 4. Process of the parameter optimization of Broad Learning (BL) with PSO.

Figure 5. Process of the PSO-BL model construction.

Figure 6. The fitness value of the initial swarm.

Figure 7. The global optimal fitness value over the validation dataset in the process of PSO optimization. Here, these 10 subfigures correspond to 10 different validation sets in the cross-validation.
The major steps of the proposed PSO-BL algorithm are summarized in Algorithm 1.

Table 1. Results of the PSO-BL model on the test set.

Table 2. Classification results of different methods.