Rapid Analysis of Composition of Coal Gangue Based on Deep Learning and Thermal Infrared Spectroscopy

Abstract: Coal gangue is the main solid waste in coal mining areas, and its annual emissions account for about 10% of coal production. Composition information is the basis for the reasonable utilization of coal gangue, because it determines which application scenario is appropriate. Reasonable utilization can not only alleviate environmental problems in mining areas but also produce significant economic and social benefits. Traditional coal gangue analysis relies mainly on chemical analysis techniques, which are slow and expensive. Many researchers have applied machine learning techniques to the spectral data of coal gangue, primarily random forests (RFs), extreme learning machines (ELMs), and two-hidden-layer extreme learning machines (TELMs). However, these techniques rely heavily on preprocessing of the spectral data. To address these drawbacks, this research proposes a rapid analysis approach for coal gangue based on thermal infrared spectroscopy and deep learning. The proposed deep learning model, named SR-TELM, extracts spectral features using a convolutional neural network (CNN) with a spatial attention mechanism and residual connections and predicts content with TELM as the regressor, which effectively overcomes the dependence on preprocessing. The effectiveness and speed of SR-TELM in coal gangue analysis were verified by comparison with several other models. The experimental findings show that, for the prediction of moisture, ash, volatile matter, and fixed carbon content, respectively, the SR-TELM model attained an R² of 0.947, 0.972, 0.967, and 0.981 and an RMSE of 0.274, 4.040, 1.567, and 2.557, with a test time of just 0.03 s. It offers a method for the analysis of coal gangue that is low cost, highly effective, and highly reliable.


Introduction
Coal is a significant source of energy and industrial raw materials. Among fossil energy sources, it has the most varieties and the longest history of development and use [1]. However, a large amount of coal gangue is produced in the process of coal production and washing. According to incomplete statistics, the amount of gangue piled in China has exceeded 6 billion tons and is increasing at a rate of 500 to 800 million tons every year [2,3]. Long-term accumulation of gangue not only occupies a large amount of land but also causes serious environmental problems [4–6]. In addition, gangue piles are prone to spontaneous combustion or even explosion, which poses a direct threat to the safety of personnel and property in mining areas [7,8]. The harmful gases generated, such as CO and SO2, not only cause air pollution but may also result in casualties [9–11]. Figure 1 shows site photos of gangue piles in a coal mine area in Liaoning Province (China), where (a) is a gangue pile in spontaneous combustion and (b) is one without spontaneous combustion. The accumulation of such a large amount of gangue is both a serious challenge and a great opportunity for China. The resource utilization of gangue is key to green mine construction, and rapid determination of the composition of gangue is a necessary prerequisite for it [12–14].
At present, the commonly used method in coal mines is to take typical samples from a gangue pile, test and analyze their composition, and then evaluate ways to utilize the whole gangue pile [15]. However, such a method is slow and costly despite its high accuracy. In recent years, infrared spectroscopy has been widely used in classification and composition inversion because it is efficient, non-destructive, and non-polluting [16–18]. Numerous studies in the coal industry have demonstrated that spectroscopy combined with machine learning algorithms can accurately determine the components of coal. Yan et al. [19] used kernel-based ELM to evaluate laser-induced breakdown spectroscopy (LIBS) data and determined the carbon and sulfur contents with an RMSE of 0.3762% and an R² of 0.994. Liu et al. [20] successfully predicted moisture, volatile matter, and calorific value using random forest (RF), extreme learning machine (ELM), and ELM optimized by particle swarm. Song et al. [21] proposed an analytical method combining LIBS and a synergic regression algorithm to predict calorific value, sulfur content, and volatile fraction content with errors of 0.299, 0.077, and 0.590, respectively. However, preprocessing of the spectra is a crucial component of these methods: if the wrong preprocessing technique is employed, it not only hinders improvement of the model performance but also makes the prediction accuracy of the subsequently built models unreliable.
Deep learning has been used heavily in food, healthcare, and biology [22–24]. In spectroscopy, deep learning enables the creation of end-to-end analytical models that do not rely on preprocessing [25,26]. Le et al. [27] integrated a convolutional neural network (CNN) with an ELM optimized by an artificial bee colony method to predict moisture, ash, volatile matter, fixed carbon, low-level calorific value, and sulfur content in coal. Assadzadeh et al. [28] built a deep learning model to predict grain protein, moisture, and type from NIR spectroscopy; experimental results showed that its prediction error was much lower than that of linear techniques such as partial least squares (PLS). Shin et al. [29] proposed a quick nondestructive detection model for synthetic brilliant blue pigments in cream using near-infrared spectroscopy and deep learning, achieving an R² of 0.9638, an RMSEP of 0.0157, and an RPD of 4.4022.
In this paper, a novel model combining a CNN and a two-hidden-layer extreme learning machine (TELM) is proposed. Using the CNN as a feature extractor effectively copes with the high dimensionality, strong correlation, and noise interference of spectral data; using TELM as the regressor allows the contents of moisture, ash, volatile matter, and fixed carbon to be predicted quickly and accurately, which provides a basis for the rational utilization of coal gangue.

Data Collection
In this paper, the Tiefa coal mining area (Liaoning Province, China) was selected as the study area, and the five-point sampling method was used to sample the gangue pile in use. A total of 50 gangue samples were collected, each a block of about 10 cm × 10 cm × 5 cm. Photos of the samples are shown in Figure 2. A Turbo FT portable infrared spectral radiometer manufactured by Designs & Prototypes (D&P), United States, was used to measure the thermal infrared spectra of the ground objects. The spectral detection range was 3 to 15 µm, and the spectral resolution was 4 cm⁻¹. The thermal infrared experimental equipment and the test site are shown in Figure 3. A multi-channel temperature tester was used to measure the surface temperature of the samples more accurately. The instrument has 16 temperature measurement channels, enabling it to measure and record 16 temperature readings at the same time, which meets the requirements of outdoor experiments. Other instruments used in this experiment included a hygrometer, a thermocouple thermometer, and tape measures.

Data Description
Within the whole thermal infrared range, the 8–14 µm region, which has the highest signal-to-noise ratio, was selected as the spectral range for this study, and 256 bands were used. In addition, the contents of moisture, ash, volatile matter, and fixed carbon were measured for all fifty samples. Table 1 describes the properties of the coal gangue samples used in this paper.

Spectral Dimension Conversion
Spectral data collected with a spectrometer are usually represented in one-dimensional form. One-dimensional spectral information mainly provides peak features in different bands, and its analysis space is limited if the one-dimensional data are used directly. After the spectrum is transformed from one dimension to two dimensions, a CNN can be used to obtain key features in the spectral data while suppressing irrelevant features and noise effects. In this experiment, the 256-dimensional spectrum of each sample was arranged into a 16 × 16 matrix along an 'S'-shaped path to obtain two-dimensional spectral data. The process of spectral dimension transformation is shown in Figure 4, and different colors in the two-dimensional spectrum represent different reflectivity.
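The 'S'-shaped arrangement can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that 'S'-shaped means a serpentine fill in which rows alternate between left-to-right and right-to-left, so that neighboring bands stay adjacent in the grid. The function name `to_serpentine_grid` is hypothetical.

```python
def to_serpentine_grid(spectrum, side=16):
    """Fold a 1-D spectrum of side*side bands into a side x side grid.

    Assumption: even rows run left to right and odd rows run right to
    left, so the path through the grid traces an 'S' shape.
    """
    values = list(spectrum)
    if len(values) != side * side:
        raise ValueError(f"expected {side * side} bands, got {len(values)}")
    grid = []
    for r in range(side):
        row = values[r * side:(r + 1) * side]
        grid.append(row[::-1] if r % 2 else row)  # reverse every second row
    return grid


# Band indices 0..255 stand in for a measured 256-band spectrum.
grid = to_serpentine_grid(range(256))
```

With this layout the last band of one row sits directly above the first band of the next row, which is the property a convolution kernel can exploit.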

Spatial Attention Mechanism [30]
In an image, not all regions are equally important, and only the task-related regions need attention. The spatial attention mechanism enables the network to find the regions of higher importance. As in the image domain, the importance of different bands in the thermal infrared spectrum varies, and the attention mechanism can be used to give more weight to important bands while suppressing noise and irrelevant band information in the spectrum. The spatial attention mechanism is shown in Figure 5; its module consists of maximum pooling, average pooling, a convolutional layer, and a Sigmoid activation function. The input features are processed by maximum pooling and average pooling to highlight important features, which helps the network find important features efficiently and learn which features should be retained and emphasized and which should be suppressed.
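The module described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the pooling is taken across the channel axis (as in common spatial attention designs), the kernel size of 3 and the random kernel weights are placeholders, and the helper names `conv2d_same` and `spatial_attention` are hypothetical. In a trained network the kernel would be learned.

```python
import numpy as np


def conv2d_same(stack, kernel):
    """Cross-correlate a (C, H, W) stack with a (C, k, k) kernel using zero
    padding, summing over channels to give a single (H, W) map."""
    _, h, w = stack.shape
    k = kernel.shape[-1]
    pad = k // 2  # assumes an odd kernel size
    padded = np.pad(stack, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention(features, kernel):
    """features: (C, H, W). Returns the reweighted features and the map."""
    max_map = features.max(axis=0)    # maximum pooling over channels
    avg_map = features.mean(axis=0)   # average pooling over channels
    stacked = np.stack([max_map, avg_map])        # (2, H, W)
    logits = conv2d_same(stacked, kernel)         # convolutional layer
    attention = 1.0 / (1.0 + np.exp(-logits))     # Sigmoid, values in (0, 1)
    return features * attention, attention


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))          # placeholder feature maps
kernel = rng.normal(scale=0.1, size=(2, 3, 3))   # placeholder, untrained weights
weighted, attention = spatial_attention(features, kernel)
```

Each position of the 16 × 16 grid receives one weight between 0 and 1, shared by all channels, so positions with low weight are suppressed in every feature map.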

Residual Connection
Networks are made deeper in order to learn more intricate features. However, simply adding convolutional layers does not always reduce the training error. Once the network reaches a certain depth, the learning capacity of the shallow layers may decline as the depth increases further, which restricts the learning capacity of the deep network. He et al. [31,32] developed a network with residual connections, which adds an identity mapping branch to the original network so that the input can be passed directly to a later layer. To prevent the loss of shallow features as the model's depth grows and to enable the deep network to acquire rich multi-scale features, the two sets of features are finally added element by element to accomplish multi-scale feature fusion. The input-output relationship of the residual connection is given by Equation (1).
$$y = f\big(F(x) + \omega x\big) \tag{1}$$
where $F(x)$ stands for the mapping of the residual unit, $\omega$ for the dimension-matching linear mapping, $x$ for the input, $y$ for the output, and $f(\cdot)$ for the activation function.
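A small numerical sketch of a residual unit may help. It assumes that Equation (1) takes the common form y = f(F(x) + ωx), with F implemented here as two dense layers and f as ReLU; the weight matrices are random placeholders and the function name `residual_unit` is hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_unit(x, w1, w2, omega):
    """Compute y = f(F(x) + omega x), with F(x) = w2 @ relu(w1 @ x), f = ReLU.

    omega is the dimension-matching linear map on the shortcut branch.
    """
    fx = w2 @ relu(w1 @ x)   # residual branch F(x)
    shortcut = omega @ x     # shortcut branch, projected to the output size
    return relu(fx + shortcut)


rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1 = rng.normal(size=(6, 4))
w2 = rng.normal(size=(3, 6))
omega = rng.normal(size=(3, 4))

y = residual_unit(x, w1, w2, omega)
# If the residual branch contributes nothing, the input still reaches the
# output through the shortcut, which is why shallow features are not lost.
y_shortcut_only = residual_unit(x, w1, np.zeros_like(w2), omega)
```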

TELM [33-35]
Assume that the input of ELM is $X$ and its output is $T$. Let $W_F = [B, W]$ and $X_F = [1, X]^T$, and let the activation function be $g(\cdot)$. In this case, the output of the hidden layer is:
$$H = g(W_F X_F) \tag{2}$$
The weight between the hidden layer and the output layer is:
$$\beta = H^{+} T^{T} \tag{3}$$
where $H^{+}$ is the Moore-Penrose generalized inverse matrix of $H$, and $T^{T}$ is the transpose of $T$.
The TELM algorithm adds a second hidden layer to the ELM algorithm, and the output of the second hidden layer is:
$$H_1 = g(W_1 H + B_1) \tag{4}$$
In this case, the expected output is $H_1^{*} = T\beta^{+}$, and $H_1^{*} = H_1$.
According to the weight $W_1$ and threshold $B_1$ of the second hidden layer, Equation (5) can be used to update its predicted output $H_2$:
$$H_2 = g(W_1 H + B_1) \tag{5}$$
The weight between the second hidden layer and the output layer is:
$$\beta_{new} = H_2^{+} T^{T} \tag{6}$$
Finally, the TELM output is:
$$f(x) = H_2 \beta_{new} \tag{7}$$
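The training procedure can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation. It assumes samples are stored as rows (so the products are transposed relative to the column form in the text), a sigmoid activation, and that the second-layer weight and threshold are solved from the expected output through the inverse activation and a pseudoinverse, as in the common TELM formulation; the text does not spell out that step. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p, eps=1e-6):
    """Inverse of the sigmoid; inputs are clipped into (0, 1) first."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def with_bias(a):
    """Prepend a column of ones so the threshold is folded into the weights."""
    return np.hstack([np.ones((a.shape[0], 1)), a])


def telm_fit(X, T, n_hidden=35, seed=0):
    """X: (N, d) inputs, T: (N, m) targets. No gradient descent is used."""
    rng = np.random.default_rng(seed)
    W_F = rng.uniform(-1.0, 1.0, size=(X.shape[1] + 1, n_hidden))  # random [B, W]
    H = sigmoid(with_bias(X) @ W_F)                 # first hidden layer output
    beta = np.linalg.pinv(H) @ T                    # ELM output weights
    H1_expected = T @ np.linalg.pinv(beta)          # expected second-layer output
    W1_B1 = np.linalg.pinv(with_bias(H)) @ logit(H1_expected)  # assumed solve step
    H2 = sigmoid(with_bias(H) @ W1_B1)              # updated second-layer output
    beta_new = np.linalg.pinv(H2) @ T               # final output weights
    return W_F, W1_B1, beta_new


def telm_predict(X, params):
    W_F, W1_B1, beta_new = params
    H = sigmoid(with_bias(X) @ W_F)
    H2 = sigmoid(with_bias(H) @ W1_B1)
    return H2 @ beta_new


# Synthetic stand-in data: 38 training and 12 test rows, 7 features, 4 targets.
rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(38, 7)), rng.normal(size=(12, 7))
T_train = rng.normal(size=(38, 4))
params = telm_fit(X_train, T_train)
pred = telm_predict(X_test, params)
```

Every parameter is obtained either by random initialization or by a pseudoinverse solve, which is the reason the regressor trains quickly.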

Proposed Methods
In this study, complex, deep spectral characteristics are extracted using a combination of the spatial attention mechanism and residual connections. As the depth of the model increases, the multi-scale fusion of spectral information is accomplished while the loss of shallow features is prevented, so that the deep network can acquire rich multi-scale features.
In a CNN, the fully connected layer is typically the topmost layer of the network: the input spectral data are convolved and pooled to produce deep spectral features, which are then sent to the fully connected layer to realize the prediction function. However, a large network model leads to a large number of parameters in the fully connected layer. During CNN training, the fully connected layer parameters, like the convolutional and pooling layer parameters, must obtain gradients through backpropagation and be updated by gradient descent to minimize the loss. When the fully connected layer has too many parameters, training slows down and consumes more computing resources. In this study, the fully connected layer is therefore replaced by the TELM algorithm as the CNN's regressor. The TELM parameters do not require gradient descent or backpropagation, which reduces training time and computational resources.
The structure of the SR-TELM model is shown in Figure 6. Note that in the SR-TELM model, a normalization layer and a ReLU layer follow each convolutional layer. Figure 7 depicts the structure of the spatial attention module of the SR-TELM model. The input of the SR-TELM model is a series of 1 × 16 × 16 two-dimensional spectra, where 1 denotes the feature depth and 16 denotes the size of the spectrum.
The model is divided into a backbone path and a branch path, with a total of 6 convolutional layers. The specific parameters are shown in Tables 2 and 3, where BN stands for batch normalization. In the backbone path, the shallow layers extract low-level features, and the deep layers learn advanced spectral features. The input spectral features are processed layer by layer by convolution, and the level of the extracted spectral features gradually increases. During this process, the embedded spatial attention module focuses the network's learning on key areas while avoiding the loss of important information. The branch path contains one convolutional layer, whose main function is to transfer the original spectral features directly to the end of the network and adjust their size and depth so that they can be summed element by element with the output features of the backbone path. Adding the features of the backbone path and the branch path realizes the multi-scale fusion of spectral features while avoiding the loss of important features as the network depth increases, enabling the regressor to combine richer spectral features to achieve accurate prediction. In the Flatten layer, the feature map is expanded into one dimension element by element, and the resulting one-dimensional feature vector is then input into TELM, whose regression performance is used to analyze the coal gangue characteristics.
Table 2. CNN backbone path structure parameters.

Experimental Result
The separation of the training and test sets has a substantial influence on the experimental findings. In this work, a total of 50 samples were used as the data set, which was randomly divided into a training set of 38 samples and a test set of 12 samples (a ratio of about 3:1). To obtain a reliable and stable estimate of model performance, ten experiments with randomly divided test sets were performed for each model in this article, and the final RMSE and R² were calculated by averaging the results of the ten tests. Figure 8 shows the RMSEs of each SR-TELM test for moisture, ash, volatiles, and fixed carbon. The RMSEs of moisture and volatile matter are smaller than those of ash and fixed carbon, which is caused by the smaller average contents of moisture and volatile matter. For ash and fixed carbon, several test results have large errors, which makes their final average RMSE larger.
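The evaluation protocol can be sketched as below. This is a minimal illustration that assumes a fresh random 38/12 split in each of the ten runs; the mean-value baseline and the synthetic targets are placeholders for a real model and real measurements, and all function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def random_split(n_samples, n_test, rng):
    """Return (train indices, test indices) from one random shuffle."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


def repeated_evaluation(targets, fit_predict, n_test=12, n_runs=10, seed=0):
    """Average RMSE and R^2 over n_runs random train/test splits."""
    rng = random.Random(seed)
    rmses, r2s = [], []
    for _ in range(n_runs):
        train_idx, test_idx = random_split(len(targets), n_test, rng)
        y_test = [targets[i] for i in test_idx]
        y_pred = fit_predict(train_idx, test_idx)
        rmses.append(rmse(y_test, y_pred))
        r2s.append(r2_score(y_test, y_pred))
    return sum(rmses) / n_runs, sum(r2s) / n_runs


# Placeholder data and model: 50 synthetic targets, predict the training mean.
targets = [float(i) for i in range(50)]


def mean_baseline(train_idx, test_idx):
    mean_value = sum(targets[i] for i in train_idx) / len(train_idx)
    return [mean_value] * len(test_idx)


mean_rmse, mean_r2 = repeated_evaluation(targets, mean_baseline)
```

Because RMSE is expressed in the units of the target, components with larger average content tend to show larger RMSE, whereas R² is unitless and comparable across components.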

Discussion
The experiments in this study were executed on a Windows 11 system; the network model was built with Python and Keras, and the figures were plotted with MATLAB. To validate the performance of the SR-TELM model proposed in this study, we selected several models for comparison: ELM, TELM, random forest (RF), and PCA_TELM, a TELM model optimized by principal component analysis (PCA). Table 4 displays the specific experimental results. ELM, TELM, and RF are frequently employed in regression problems. ELM and TELM, which are discussed in Section 2.4.3, share the benefits of computational simplicity and speed. The RF model consists of a collection of random regression trees, each of which has been trained on a data set, and the output of the random forest regressor is obtained by combining the predictions of every regression tree using Monte Carlo methods [36].
PCA_TELM is an enhanced model built on TELM and PCA. The original input spectrum has 256 dimensions, and TELM is poorly suited to such high-dimensional input. We therefore apply PCA to reduce the dimensionality of the input spectrum from 256 to 7, at which point the cumulative contribution of the feature variance is 99.97%. Feeding TELM with lower-dimensional data allows the performance of the model to be exploited more fully. Taking the determination of fixed carbon as an example, an RMSE of 5.181 and an R² of 0.911 were achieved when the raw spectral data were entered into TELM. When the spectral data after PCA dimensionality reduction were entered into TELM, the RMSE was reduced to 4.527 and R² rose to 0.927.
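Choosing the number of principal components from a cumulative variance threshold can be sketched as follows. This is an illustrative NumPy version that assumes mean-centered data and an SVD-based PCA; the synthetic low-rank matrix stands in for the 50 × 256 spectra, and the function name `pca_by_variance` is hypothetical.

```python
import numpy as np


def pca_by_variance(X, threshold=0.9997):
    """Keep the fewest components whose cumulative explained variance
    reaches the threshold. X has shape (n_samples, n_bands)."""
    mean = X.mean(axis=0)
    centered = X - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance share per component
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(s))                           # guard against rounding at 1.0
    components = vt[:k]
    scores = centered @ components.T             # reduced representation
    return scores, components, cumulative[:k]


# Synthetic stand-in: 50 samples, 256 bands, driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
loadings = rng.normal(size=(3, 256))
X = latent @ loadings + 1e-3 * rng.normal(size=(50, 256))

scores, components, cumulative = pca_by_variance(X)
```

The resulting `scores` matrix is what a downstream regressor such as TELM would receive in place of the raw 256-band spectra.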
The precise parameters of the comparison test were as follows. ELM, TELM, and PCA_TELM all had the number of hidden layer nodes set to 35. PCA_TELM retained seven principal components for content prediction, with a cumulative variance contribution rate of 99.97%. The number of decision trees in RF was set to 10. For the CNN, the number of iterations was set to 200 and the batch size to 8. Comparing SR-TELM with PCA_TELM and TELM demonstrates how well the CNN extracts features, while comparing it with ELM, TELM, and RF shows that SR-TELM is superior to the usual regression approaches in spectrum processing. Figure 9 depicts the CNN training process, with different colors corresponding to different coal gangue features. The prediction error was quite high at the start of training, but the model's performance improved as the number of iterations increased. The learning process also demonstrates that the CNN, which is built from a spatial attention mechanism and residual connections, is able to learn the characteristics of the spectral data.
Table 4 lists the prediction results of the various models. SR-TELM obtains the lowest RMSE and the highest R² in the coal gangue analysis tasks, clearly demonstrating its superior performance. Table 5 shows the time costs of the models on the coal gangue analysis tasks, each applied to 50 coal samples. Although training SR-TELM required 32.447 s, its test time was merely 0.03 s. In practical applications the model rarely needs to be retrained, so the test time is the more meaningful reference, and SR-TELM reaches high prediction accuracy at little time cost. By contrast, chemical analysis procedures need more time and money, with some costing over USD 200,000 once the cost of laboratory equipment is taken into account.
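For readers who want to reproduce the two metrics used in this comparison, the sketch below computes RMSE and R² in plain Python. It assumes the usual definitions (root of the mean squared error, and one minus the ratio of the residual to the total sum of squares); the paper does not spell out its formulas here, and the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted contents."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Hypothetical measured vs. predicted contents (%), for illustration only.
    measured = [10.0, 20.0, 30.0, 40.0]
    predicted = [11.0, 19.0, 31.0, 39.0]
    print("RMSE:", rmse(measured, predicted))
    print("R2:", r2_score(measured, predicted))
```

A lower RMSE and an R² closer to 1 both indicate better agreement between predicted and measured contents, which is how the models are ranked.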

Conclusions
In this work, we propose a method based on thermal infrared spectroscopy and deep learning for the determination of moisture, ash, volatile matter, and fixed carbon content in coal gangue. First, we convert the collected gangue spectra from one-dimensional to two-dimensional form and use the converted spectra as the input of the analysis model. Second, we propose a deep learning model named SR-TELM, which consists of a CNN and TELM. Third, we introduce a spatial attention mechanism and residual connections in the CNN, which focus the model on important spectral features and suppress irrelevant noise interference. The experimental results demonstrate that SR-TELM can accurately and rapidly analyze the content of the components in gangue: it achieved an R² of 0.947, 0.972, 0.967, and 0.981 and an RMSE of 0.274, 4.040, 1.567, and 2.557 in the determination of moisture, ash, volatile matter, and fixed carbon, respectively, with a testing time of only 0.03 s. Finally, comparative experiments against several models demonstrate the rapidity and effectiveness of SR-TELM in gangue analysis.
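The first and third steps summarized above can be illustrated with a small NumPy sketch. Several details are assumptions rather than facts from the paper: the 32 × 32 layout for a 1024-point spectrum, the row-major reshape, the CBAM-style form of the spatial attention (channel-wise mean and max pooling followed by one convolution and a sigmoid), and the placement of the residual connection around the attention output. The kernel here is random, whereas in a trained CNN it would be learned.

```python
import numpy as np

def spectrum_to_image(spectrum, side=32):
    """Reshape a 1-D spectrum of side*side points into a 2-D array (assumed layout)."""
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.size != side * side:
        raise ValueError(f"expected {side * side} points, got {spectrum.size}")
    return spectrum.reshape(side, side)

def spatial_attention(features, kernel):
    """CBAM-style spatial attention (an assumed design).

    features: (C, H, W) feature maps; kernel: (2, k, k) with odd k.
    Returns (weighted features, attention map of shape (H, W))."""
    # Pool across channels to get two (H, W) descriptor maps.
    pooled = np.stack([features.mean(axis=0), features.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = features.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Cross-correlation of the two pooled maps with the kernel.
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    weights = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    return features * weights, weights       # broadcast over channels

def residual_attention(features, kernel):
    """Add the attention-weighted features back onto the input (residual connection)."""
    attended, _ = spatial_attention(features, kernel)
    return features + attended

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = spectrum_to_image(rng.normal(size=1024))
    feature_maps = np.stack([image, image ** 2])   # two toy channels
    kernel = 0.1 * rng.normal(size=(2, 7, 7))      # stands in for learned weights
    out = residual_attention(feature_maps, kernel)
    print("input:", feature_maps.shape, "output:", out.shape)
```

The attention map scales each spatial position by a value between 0 and 1, and the residual path keeps the original signal available even where the attention weight is small.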

Model Extension
At present, the main methods of coal gangue recognition are density recognition, hardness recognition, ray recognition, and image recognition.
The density identification method uses a liquid medium to stratify and identify the material and has good processing capacity, but the medium is difficult to recover, which raises production costs. Photoelectric technology can identify coal gangue without a drying treatment; after the mass and thickness of the coal gangue are obtained, its density must be calculated accurately to improve identification accuracy. The hardness identification method has low energy loss and operating cost, but it places high demands on the magnitude of the crushing force and has seen little theoretical research or application. The ray identification method is fast and efficient, but its results are affected by the particle size of the measured object, and the radiation intensity is high, so special protective measures are required. Image recognition is highly automated and intelligent, is efficient, and has simple equipment requirements, but it is easily influenced by the working environment. The rapid coal gangue analysis method proposed in this paper combines spectroscopy and deep learning: a CNN built from a spatial attention mechanism and residual connections extracts spectral features, and TELM performs the regression. To extend the model to the identification of other rocks, TELM can first be applied to the regression of the moisture, ash, volatile matter, and fixed carbon contents of the various rocks in the chosen spectral interval.
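The regression stage mentioned above can be illustrated with a basic single-hidden-layer ELM. This is a deliberate simplification: TELM adds a second hidden layer, which this sketch does not reproduce, and the class name, sigmoid activation, and uniform weight initialization are assumptions. Only the 35 hidden nodes are taken from the comparison settings reported earlier.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer extreme learning machine (simplified stand-in for TELM)."""

    def __init__(self, n_hidden=35, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

if __name__ == "__main__":
    # Hypothetical data: 60 samples with 7 extracted features each.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(60, 7))
    content = features @ rng.normal(size=7)
    model = ELMRegressor().fit(features, content)
    print("first predictions:", model.predict(features[:3]))
```

Because only the output weights are solved for, in closed form, fitting requires no iterative training, which is consistent with the short time costs reported for ELM-type regressors.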

Application
As an inevitable by-product of coal mining and processing, coal gangue can be reused as a building material, special fuel, or chemical raw material. Untreated coal gangue, however, causes serious harm to the natural ecological environment, including air pollution, geological disasters, and radioactive pollution. SR-TELM, based on thermal infrared spectroscopy, can quickly and accurately predict the content of moisture, ash, volatile matter, and fixed carbon in coal gangue. It can therefore provide technical and methodological support for the effective extraction and identification of preparation information in a typical mining ecosystem such as a coal gangue mountain, and a scientific basis for the rapid and accurate evaluation of the preparation and restoration effect of a coal gangue mountain. In addition, this paper is basic research whose results can be applied to tasks such as accurate gangue sorting by mine robots and the quantitative utilization of gangue.

Prospect
At present, most researchers select and determine the identification characteristics and standards of coal gangue experimentally, without discussing them in depth. For example, the identification threshold of the coal gangue image recognition method lacks a unified standard. To analyze and understand the identification characteristics of coal gangue in the actual working environment, further research on theory, algorithms, and technology is still needed.
(1) Research on the definition of coal gangue identification characteristics.
There is no recognized, clear distinction or definition of the identification characteristics of coal and gangue; that is, there is no unified standard. At present, these characteristics are mostly defined statistically from tests: the numerical distribution range of one or more characteristics of coal and of gangue is obtained, and the difference between the two is taken as the identification standard. As a result, the characteristics of coal gangue are not analyzed and understood comprehensively. To improve the accuracy of coal gangue identification, the differences between the characteristics of coal and gangue need further study.
(2) Research on the identification methods of coal gangue that meet the requirements of coal mine green development.
Most coal resources in our country are concentrated in the central and western regions, where water resources are scarce. Identifying coal gangue with liquid media and then separating it increases the pressure on the water supply and wastes water. With the implementation of the plan for the clean and efficient utilization of coal, identification methods that meet the requirements of green coal mine development need to be researched. Applying coal gangue identification in underground mines can promote the development of dry separation technology for coal gangue, avoid the accumulation of gangue at the surface, and help to build a clean and low-carbon coal development system.
(3) Research on a fast and efficient image recognition method for coal gangue.
Image recognition of coal gangue is a current research hotspot. In practical applications, environmental changes, gangue types, equipment performance, and other factors affect the image recognition results, so the generalization and usability of an image recognition algorithm must be considered. Because image data of coal gangue are scarce and its characteristics cannot yet be comprehensively and accurately analyzed and understood, building a large-scale image database of coal gangue is an important step. On this basis, feature analysis and visualization can clarify how deep CNNs represent coal gangue image features, and deep CNNs suited to coal gangue recognition can be studied, which is conducive to fast and efficient image recognition.
(4) Research on new efficient identification methods for coal gangue.
Researchers have applied photoelectric technology, ray recognition, image recognition, deep learning, and other technologies to coal gangue recognition, but various problems remain in practical application. New efficient identification methods need to be studied, integrating and innovating on the basis of existing methods and testing them in practice, so as to improve the identification efficiency of coal gangue and the level of intelligence of coal gangue separation [37].