Article

A Hybrid Coal Flow-Centric Predictive Model for Mining–Transportation Coordination Based on an LSTM–Transformer

1
School of Mechanical Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
2
Shaanxi Key Laboratory of Mine Electromechanical Equipment Intelligent Monitoring, Xi’an 710054, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(4), 634; https://doi.org/10.3390/math14040634
Submission received: 28 November 2025 / Revised: 28 January 2026 / Accepted: 9 February 2026 / Published: 11 February 2026
(This article belongs to the Topic Industrial Big Data and Artificial Intelligence)

Abstract

This paper addresses coordination failures in fully mechanized mining equipment under complex operating conditions, which can lead to operational abnormalities and safety hazards. We systematically analyze the dynamic coordination relationships within the equipment system across three dimensions: temporal, spatial, and geometric. Centered on the coal flow, we establish a comprehensive “mining–transportation” coordination mathematical model covering the entire production process, from the coal cut by the shearer to the coal transported out by the conveyor. Building upon this foundation, a deep learning prediction method integrating long short-term memory (LSTM) and Transformer architectures is proposed to construct an intelligent prediction model for the shearer traction speed. This model captures temporal features and long-term dependencies within equipment operation data, enabling the prediction of critical operational parameters for fully mechanized mining systems and strengthening the early identification and warning of equipment coordination failure states. Experimental results based on the operational data of fully mechanized mining systems show that, in traction speed prediction, the LSTM–Transformer model achieved a mean square error (MSE) of 0.041, a mean absolute error (MAE) of 0.122, and a coefficient of determination ($R^2$) of 0.996, demonstrating its prediction accuracy and stability. This article provides a theoretical basis and technical support for judging the operating status of coal mine working faces and for the early warning of accident risks, which is of significance for promoting the intelligent construction of coal mines.

1. Introduction

Coal, as China’s core energy source and key industrial raw material, plays a vital role in ensuring energy security and supporting sustained economic development [1]. Under the dual carbon strategy, coal’s share of primary energy consumption is gradually declining, yet its dominant position remains difficult to alter in the near term [2]. Nevertheless, coal mining inherently carries significant safety risks, including roof collapses, gas outbursts, and rock bursts. Should the mining system lose coordination, consequences range from minor disruptions in mining operations and equipment damage to severe accidents like working face collapses and gas explosions, resulting in significant casualties and substantial economic losses. To enhance coal mining efficiency and ensure operational safety, scholars have been exploring intelligent mechanized mining technologies since the last century [3,4]. The intelligent operation of a fully mechanized mining face is fundamentally dependent on the coordinated interaction among three core equipment units: the shearer, the hydraulic supports, and the conveyor. Among these, the synergy between the shearer and the conveyor exerts a direct influence on coal flow dynamics. The shearer’s speed variation depends not only on its own operating conditions, but also on constraints such as the conveyor’s load status and fluctuations in the coal flow rate. Accurately predicting the operating status of key equipment in fully mechanized mining plays a crucial role in maintaining the stability of the production capacity in fully mechanized mining faces and achieving the collaborative optimization of working faces. This will provide an important theoretical basis and technical support for the intelligent upgrade of the coal industry.
This paper focuses on the strong temporal dependence and strong coupling of the “mining–transportation” coordination process in the fully mechanized mining system. It proposes a hybrid LSTM–Transformer prediction model tailored to the dynamic coupling properties of such systems. Taking the coal flow as the focal point, the model integrates the collaborative relationships among the pieces of equipment within the fully mechanized mining system to predict the traction speed of the shearer. This establishes an effective operational status early warning mechanism for the working face. The specific research approach is illustrated in Figure 1.
The principal innovations of this article are as follows:
  • This paper breaks through the limitations of traditional single-device and static analysis. A dynamic coupling model of the shearer and conveyor is systematically constructed from three dimensions (time, space, and geometry) to reflect the collaborative constraint relationships among underground pieces of equipment;
  • This paper takes the coal flow as the link and deeply integrates equipment status prediction with the production process flow. The predictive model not only reflects the operational rules of the equipment itself, but also embodies the overall collaborative logic of the production system, enhancing the interpretability of the model;
  • This paper combines the local temporal feature extraction capability of LSTM with the global dependency capture mechanism of Transformers. This effectively addresses the issues of long-term dependencies and short-term fluctuations present in the operational data of fully mechanized mining equipment, improving the predictive model’s adaptability to complex working conditions.

2. Related Work

Scholars have studied the state perception and prediction of fully mechanized mining systems from several perspectives. Some have constructed equipment health assessment models based on multi-sensor data fusion, achieving real-time perception and anomaly detection of the operating status of key equipment such as shearers and hydraulic supports. Others have applied deep learning methods to the time series prediction of key parameters such as traction speed, cutting current, and support pressure, and have explored the collaborative control of equipment.
Polverino et al. [5] systematically analyzed the application of machinery condition monitoring in fault diagnosis within the Industry 4.0 framework, along with predictive methodologies and machine learning algorithms. Chen et al. [6] introduced a rough radial basis function neural network to predict the load torque of scraper conveyors, providing a basis for optimizing system energy consumption. Lu et al. [7] analyzed the variation patterns of coal–rock interfaces using extreme learning machines and neighborhood rough sets to predict the height adjustment parameters of shearer drums. He et al. [8] employed rough sets and gated recurrent units for dynamic load prediction in scraper conveyors, effectively overcoming gradient vanishing and explosion issues in stacked GRU networks. Zhang et al. [9] explored the mapping relationship between motor load and current, proposing a BP neural network combined with wavelet transform for the current prediction in the conveyor. Concurrently, some scholars optimized collaborative control strategies for fully mechanized mining systems. Li et al. [10] integrated parameters including longwall equipment position, pillar pressure, and travel distance. By combining an attention mechanism with a Transformer architecture, they established a hydraulic support pump station fluid supply prediction model, enabling intelligent fluid supply control for supports. Zhang et al. [11] constructed a cutting state recognition system by establishing a coupled relationship model between shearer current and speed, achieving condition-aware predictive control and intelligent decision-making. Zheng et al. [12] developed a predictive control model for the shearer speed and drum rotation speed by modeling the shearer’s kinematics. Zhao et al. [13] constructed a cooperative control model for the conveyor based on rough set theory, achieving coordinated speed regulation in the fully mechanized mining system through speed prediction.
In the field of equipment status prediction for large and complex systems such as tunnel boring machines (TBMs) and power systems, long short-term memory networks and their hybrid models have demonstrated significant advantages. Early research explored the application of LSTM to single-dimensional time series prediction and achieved remarkable results. Hu et al. [14] achieved bearing load analysis and dynamic stress prediction based on LSTM, providing an effective means for bearing fault diagnosis and life prediction. Fu et al. [15] proposed an LSTM-driven spatiotemporal prediction method, achieving real-time and accurate prediction of TBM operating parameters; Mo et al. [16] combined LSTM with inverse probability optimization to construct a hybrid model for TBM cutter wear prediction and operating condition classification.
To address the multi-source heterogeneous data of complex equipment, scholars have gradually turned their attention to the research of hybrid models. By integrating LSTM with architectures such as CNN and Transformer, a transformation from single modeling to multi-feature fusion and multi-scale analysis has been achieved. Li et al. [17] employed a CNN–LSTM architecture to model the TBM cutterhead rotational speed and tunneling rate, providing decision support for parameter setting under complex geological conditions; Man et al. [18] introduced gray relational analysis to screen key indicators and constructed a CNN–LSTM model for joint prediction of TBM excavation parameters and rock burst levels; Sun et al. [19] integrated CNN, LSTM, and attention mechanisms to propose a time step parameterized CNN–LSTM–AM model, significantly enhancing the accuracy of wind turbine power forecasting. With the wide application of the Transformer architecture in time series modeling, scholars have combined it with different modules, and hybrid models have emerged continuously. Shi et al. [20] enhanced the model’s capability to identify complex geological risks like rock bursts and water inrushes by combining CNNs with Transformers. Diaa Salman et al. [21] constructed a hybrid model combining wavelet transform, LSTM, and Transformers to predict and classify power grid faults, enabling dynamic adaptation to variable operating conditions. Zhang et al. [22] realized the real-time and accurate prediction of the tunneling rate of shield tunneling machines through the integration of bidirectional LSTM and improved Transformers.
The LSTM–Transformer hybrid model has demonstrated significant performance advantages in handling complex multivariable time series data and has been applied successfully in fields such as wind power prediction and industrial process monitoring, but its application to predicting the operating status and parameters of fully mechanized mining systems remains unexplored. Most current research on the prediction of fully mechanized mining systems focuses on the independent analysis of individual pieces of equipment such as the shearer, the hydraulic support, and the scraper conveyor, or models only local operation links. Systematic research on multi-link coupling has not yet taken shape. Existing methods often overlook the dynamic collaborative relationships among the devices on the working face. In particular, links such as the traction of the shearer, the follow-up of the support, and the coordination of the transportation system have not been integrated into a unified whole, which makes it difficult for the prediction model to fully reflect the real operational logic of the fully mechanized mining system.
In this paper, the LSTM–Transformer hybrid model is introduced into the collaborative prediction of the fully mechanized mining system. Centering on the coal flow, it integrates the operational status of the fully mechanized mining system to predict the collaborative performance of the entire system. This application fills a gap in utilizing the hybrid model for fully mechanized mining prediction, advancing the field from single-device prediction to multi-device collaborative forecasting.

3. Comprehensive Fully Mechanized Mining System Supporting Coordination and Model Construction

3.1. Supporting Equipment for Fully Mechanized Mining Systems

The efficient and stable operation of the fully mechanized mining face relies on the coordinated operation of three core pieces of equipment: the shearer, the hydraulic support, and the conveyor. As the central piece of mining equipment, the shearer draws its walking power from the traction section, while the drum in the cutting section cuts the coal. Hydraulic supports are crucial for roof support. They support the roof and advance by means of a hydraulic system composed of components such as columns and jacks, ensuring the safety of the roof and the working space on the working face. The conveyor undertakes the task of transporting coal. The scraper chain circulates in the middle trough, carrying away the coal cut by the shearer. The chain speed needs to be precisely matched with the traction speed of the shearer to ensure the continuous and smooth flow of coal.
The pieces of equipment in the fully mechanized mining system do not operate independently but are closely intertwined in structure and function, forming a highly coordinated organic whole. The efficiency of their coordinated operation directly determines the production capacity and safety level of the working face. In terms of spatial layout, the shearer travels along the conveyor track. The hydraulic support straddles the conveyor, with its base connected to the conveyor. Its top beam directly supports the roof, forming an integrated supporting and moving unit under the “equipment–surrounding rock” interaction [23]. In terms of the production process flow, the equipment forms an operation cycle of “mining–transportation–support”, ensuring the dynamic matching of coal flow and equipment status. Due to changes in geological conditions such as uneven coal seam thickness, intercalated gangue, and faults, as well as periodic roof pressure, the fully mechanized mining system exhibits significant time-varying characteristics in both the temporal and spatial dimensions. As the working face advances, the system’s collaborative state continuously evolves along the advancement direction [24,25], forming a collaborative feedback mechanism. Geological conditions directly affect the cutting of the shearer, driving the dynamic adjustment of the traction speed. The operating state of the shearer changes accordingly, which in turn alters the follow-up rhythm of the hydraulic support and the advancement efficiency of the working face. These influences eventually propagate to the conveyor, manifesting in the coal flow load and operational stability [26].
The fully mechanized mining system operates in complex and variable underground environments. Factors such as the uncertainty of coal seam conditions, dynamic changes in roof pressure, and the wear and aging of equipment continuously disturb the coordination state among devices. Any operational anomaly in a single piece of equipment can rapidly propagate throughout the entire system. Therefore, predicting and analyzing the coordination status of fully mechanized mining equipment enables the early identification of potential misalignment trends and provides advanced information for proactive control and intelligent scheduling, offering technical support for the safe and efficient operation of the working face [27].

3.2. Development of a Mathematical Model for “Mining–Transportation” Coordination

The coordinated operation of the shearer and conveyor system directly determines the generation and transportation of the coal flow. During actual production, an improperly set shearer traction speed may cause a concentrated influx of coal, leading to conveyor overload or blockage. Therefore, this paper focuses on modeling and analyzing the dynamic equilibrium of the coal flow [28]. The “mining–transportation” collaborative relationship lies in the dynamic balance between the coal dropping rate and the coal transportation rate in the spatiotemporal dimension. Any imbalance will cause fluctuations in the operating parameters of the equipment. This paper establishes a traction dynamics model for the shearer and a load power model for the conveyor, theoretically deriving the coupling relationships among key equipment parameters during the coordination process. This lays the mathematical foundation for subsequent data-driven equipment condition prediction models.
Based on the production process characteristics of fully mechanized mining faces, the actual operating power of conveyors can be decomposed into no-load power and load power [29], with the mathematical expression given below:
$$P = P_0 + P_1(Q)$$
In the equation, $P$ represents the real-time total power, $P_0$ denotes the no-load power, and $P_1(Q)$ indicates the power generated by the real-time coal load $Q$. The load power is mainly used to overcome the friction force and the lifting force for coal transportation. This paper focuses solely on equipment operating in horizontal environments, so only the friction term is retained:
$$P_1(Q) = F \times v_g$$
$$F = Q_g \times g \times \mu$$
In the equation, $v_g$ represents the conveyor transport speed, $g$ denotes gravitational acceleration, and $\mu$ is the coefficient of friction.
Combining the above formulas gives:
$$P = P_0 + (Q_g \times g \times \mu) \times v_g$$
The coal output of the shearer is:
$$Q_c = v_c \times H \times B \times \gamma$$
In the equation, $Q_c$ represents the coal discharge capacity of the shearer, $v_c$ is the traction speed of the shearer, $H$ is the mining height, $B$ denotes the cutting depth of the shearer (m), and $\gamma$ is the bulk density of coal (t/m$^3$).
To ensure the continuity of production, the coal cut by the shearer must be promptly transported away by the conveyor. The transportation capacity of the conveyor should be greater than or equal to the coal dropping capacity of the shearer. At the balance point, where the conveyor load $Q_g$ equals $Q_c$, the following equation holds:
$$\frac{P - P_0}{v_g \times g \times \mu} = v_c \times H \times B \times \gamma$$
$$P = (v_c \times H \times B \times \gamma \times v_g \times g \times \mu) + P_0$$
During actual underground mining, the operating state of the shearer fluctuates significantly with changing geological conditions. The above theoretical formula expresses the intrinsic correlation between the conveyor power, the layout parameters of the working face, and the speeds of the shearer and the conveyor. Field measurements indicate, however, that changes in the operational status of the equipment itself and external environmental disturbances also significantly affect the conveyor’s power consumption. Therefore, the actual operating power of the conveyor cannot be determined merely from the working face layout and the speeds of the shearer and the conveyor. Even so, these parameters effectively indicate the dynamic variation trend of conveyor power, providing an important basis for the collaborative prediction of fully mechanized mining systems.
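The balance relation derived above can be checked numerically. The following minimal Python sketch evaluates the theoretical conveyor power for a given shearer speed and inverts the same relation to recover the speed from a power value. It assumes consistent units throughout and treats the conveyor load as equal to the shearer output; the function names and the sample values are hypothetical and are not taken from the field data.

```python
def conveyor_power(v_c, H, B, gamma, v_g, mu, P0, g=9.81):
    """Theoretical conveyor power at the mining-transportation balance point.

    Implements P = P0 + (v_c * H * B * gamma) * g * mu * v_g,
    i.e. the conveyor load is assumed equal to the shearer output.
    """
    Q_c = v_c * H * B * gamma      # coal output of the shearer
    F = Q_c * g * mu               # friction force (horizontal transport only)
    return P0 + F * v_g            # no-load power plus load power


def shearer_speed_from_power(P, H, B, gamma, v_g, mu, P0, g=9.81):
    """Invert the same balance relation to obtain the shearer speed v_c."""
    return (P - P0) / (v_g * g * mu * H * B * gamma)


# Hypothetical working-face parameters, for illustration only.
params = dict(H=3.0, B=0.8, gamma=1.35, v_g=1.2, mu=0.3, P0=50.0)
P_example = conveyor_power(v_c=0.1, **params)
v_recovered = shearer_speed_from_power(P_example, **params)
```

Since measured power also depends on equipment condition and environmental disturbances, a calculation of this kind indicates a trend only and does not replace the data-driven model.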
The operating state of the shearer traction system is jointly determined by the internal dynamics and the external load conditions [30]. For the traction motor of the shearer, the shaft torque output is directly proportional to the motor current.
$$T = K_t \times I$$
$T$ represents the motor torque, $K_t$ denotes the motor torque constant (Nm/A), determined by the motor’s design specifications, and $I$ indicates the current of the shearer traction motor.
$$F_c = \left(\frac{K_t \times i \times \eta}{R}\right) \times I$$
$F_c$ is the traction force generated by the shearer motor, $i$ is the total reduction ratio of the transmission system, $\eta$ is the mechanical efficiency of the transmission system, and $R$ is the pitch circle radius of the drive wheel. Under ideal conditions, when the shearer operates at a constant speed, the traction force $F_c$ generated by its motor equals the total resistance to be overcome during operation. The motor current directly reflects the resistance encountered, indicating the equipment’s load status. In actual operation, due to changes in geological conditions, the shearer often operates at variable speeds, and the following dynamic formula applies:
$$m \frac{dv}{dt} = F_c - F_{load} = \left(\frac{K_t \times i \times \eta}{R}\right) \times I - F_{load}$$
where $m$ is the equivalent mass and $F_{load}$ is the total load resistance. This equation indicates that the external load $F_{load}$ determines the trend of traction speed variation, providing a theoretical basis for predicting speed dynamics based on current signals.
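One way to see how this equation links current signals to speed is to integrate it numerically. The sketch below applies a forward-Euler step to the traction dynamics equation, with the traction force computed from the motor current. The integration scheme, the step size, the function names, and all numerical values are illustrative assumptions; the source does not specify any of them.

```python
def traction_force(I, K_t, i_ratio, eta, R):
    """Traction force F_c = (K_t * i * eta / R) * I."""
    return (K_t * i_ratio * eta / R) * I


def simulate_speed(currents, loads, m, K_t, i_ratio, eta, R, v0=0.0, dt=0.1):
    """Forward-Euler integration of m * dv/dt = F_c - F_load.

    currents and loads are equally long sequences sampled every dt seconds.
    Returns the traction speed after each step.
    """
    v = v0
    speeds = []
    for I, F_load in zip(currents, loads):
        accel = (traction_force(I, K_t, i_ratio, eta, R) - F_load) / m
        v = v + accel * dt
        speeds.append(v)
    return speeds


# Hypothetical drive parameters: constant current, load below the traction force.
example = simulate_speed(
    currents=[100.0] * 5, loads=[24000.0] * 5,
    m=60000.0, K_t=2.0, i_ratio=50.0, eta=0.9, R=0.3, v0=0.1, dt=0.1,
)
```

When the traction force equals the load resistance the speed stays constant, which matches the ideal constant-speed condition described above; any mismatch accelerates or decelerates the shearer.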
Considering the system’s thermodynamic processes, the change in motor temperature $T_a$ follows the equation below:
$$C_{th} \frac{dT_a}{dt} = I^2 R_w - \frac{T_a - T_m}{R_{th}}$$
Here, $R_w$ represents the winding resistance, $C_{th}$ and $R_{th}$ denote the thermal capacitance and thermal resistance, respectively, and $T_m$ is the upper temperature limit. When the temperature exceeds the safety threshold, the control system feeds this back by adjusting the traction speed. From the perspectives of heat transfer and thermodynamics, the external load demand of the coal flow in a fully mechanized mining system and the traction speed of the shearer jointly determine the current output of the traction motor. The current of the traction motor generates the electromagnetic torque required for driving, and at the same time, Joule heat is produced by the resistance of the windings, causing the motor temperature to rise. The increase in motor temperature in turn changes the resistance characteristics of the windings, restricting the current output of the motor. Ultimately, the actual traction speed is governed by the automatic speed regulation strategy of the electronic control system, forming a dynamic closed-loop feedback among speed, current, and temperature.
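The thermal balance above can likewise be stepped forward in time. The following sketch integrates the temperature equation with a forward-Euler scheme for a given current sequence, using the symbols exactly as they appear in the equation. The integration scheme and the numerical values are assumptions chosen for illustration and do not come from the measured data.

```python
def simulate_temperature(currents, R_w, C_th, R_th, T_m, T0, dt=1.0):
    """Forward-Euler integration of C_th * dT_a/dt = I^2 * R_w - (T_a - T_m) / R_th.

    currents is a sequence of motor currents sampled every dt seconds.
    Returns the motor temperature after each step.
    """
    T_a = T0
    temps = []
    for I in currents:
        joule_heat = I ** 2 * R_w              # heating by winding resistance
        heat_loss = (T_a - T_m) / R_th         # heat flow through thermal resistance
        T_a = T_a + (joule_heat - heat_loss) / C_th * dt
        temps.append(T_a)
    return temps


# Hypothetical motor parameters under a constant current.
history = simulate_temperature(
    currents=[100.0] * 2000, R_w=0.02, C_th=500.0, R_th=0.05,
    T_m=40.0, T0=40.0, dt=1.0,
)
```

Under a constant current the equation settles where the two right-hand terms cancel, that is, at $T_a = T_m + I^2 R_w R_{th}$, so a higher sustained current produces a higher steady temperature.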
The parameters such as conveyor power, shearer traction speed, current, and temperature together constitute a multi-dimensional and strongly coupled time series dataset reflecting the operating status of the fully mechanized mining system. These parameter characteristics fully reflect the dynamic coupling and temporal coordination features of fully mechanized mining equipment during the transportation process. Through in-depth analysis of the state parameters of fully mechanized mining equipment, the nonlinear dynamic mapping relationship and lag effect among speed, current, temperature, and power can be effectively captured. Based on this, a data-driven state perception model for the fully mechanized mining system is established, ultimately achieving accurate prediction of the traction speed of the coal shearer.

4. LSTM–Transformer Model Construction

4.1. LSTM

LSTM, proposed in 1997, is an improved recurrent neural network (RNN). It effectively mitigates the vanishing gradient problem of traditional RNNs and can better learn long-term dependencies [31]. The LSTM network is a chain formed by connecting several identical units in chronological order. Short-term memory is transmitted between adjacent units through hidden states, while long-term memory is carried from front to back through a single information chain, the cell state, spanning the entire network. Each basic LSTM unit contains three gates (the input gate, the output gate, and the forget gate) and a cell state [32].
The forget gate determines which information should be forgotten from the cell state, enabling appropriate updates during information flow. It receives $h_{t-1}$ and $x_t$ as inputs and processes them through a sigmoid layer to obtain the forget gate parameters $f_t$. The calculation formula is as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
Here, $W_f$ is the weight matrix of the forget gate, $[h_{t-1}, x_t]$ is the concatenation of the hidden state from the previous time step and the input vector of the current time step, $b_f$ is the bias term, and $\sigma$ is the sigmoid function.
The input gate determines which new information will be stored in the cell state. It involves two quantities: the input gate parameter $i_t$, obtained through a sigmoid layer, and the candidate memory cell state $\tilde{C}_t$, implemented via the tanh layer. Multiplying $i_t$ by $\tilde{C}_t$ yields the information to be added; multiplying the forget gate $f_t$ by the old cell state $C_{t-1}$ performs information forgetting. Combining these two operations produces the updated state $C_t$.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$$
In the above equations, $W_i$ and $W_C$ represent the weight matrices for the input gate and the candidate memory cell state, respectively, while $i_t$ controls how much new information enters the cell state.
The output gate determines which information from the cell state will be output to the hidden state at the next time step. The output information is primarily determined by the cell state $C_t$ and must be filtered through the output gate. First, the cell state $C_t$ is squashed to the range (−1, 1) via the tanh layer. Then, the output gate parameters $O_t$ are obtained from a sigmoid layer. Finally, the squashed cell state is multiplied by $O_t$ to yield the final filtered result. The calculation formula is as follows:
$$O_t = \sigma(W_O \cdot [h_{t-1}, x_t] + b_O)$$
$$h_t = O_t \times \tanh(C_t)$$
In the formula, $h_t$ represents the hidden state computed from the updated cell state, which is passed on to the next unit.
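The gate equations above can be traced in a few lines of plain Python. The sketch below performs a single LSTM step on list-based vectors. It is a didactic sketch rather than the implementation used in this paper, and the dictionary keys for the parameters (W_f, W_i, W_C, W_O and the matching biases) are naming assumptions made here for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state C_t."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]        # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]        # input gate
    C_tilde = [math.tanh(a) for a in affine(params["W_C"], params["b_C"], z)]  # candidate state
    O_t = [sigmoid(a) for a in affine(params["W_O"], params["b_O"], z)]        # output gate
    # Forget part of the old cell state, then add the gated candidate.
    C_t = [f * c + i * ct for f, c, i, ct in zip(f_t, C_prev, i_t, C_tilde)]
    # Filter the squashed cell state through the output gate.
    h_t = [o * math.tanh(c) for o, c in zip(O_t, C_t)]
    return h_t, C_t
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector, so a unit with hidden size 2 and input size 1 uses 2 × 3 matrices.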

4.2. Transformer

The Transformer algorithm was originally designed to address the issue of long-range dependencies in natural language processing tasks. It breaks through the performance bottlenecks of recurrent neural networks and convolutional neural networks [33]. The Transformer first retains the sequence positional information by introducing position encoding, and then combines it with the attention mechanism and the feedforward network to complete the encoding and decoding processes. Ultimately, the prediction results are generated based on the weight parameters and position vectors. The Transformer consists of two core parts, the encoder and the decoder, which together complete the mapping from the input sequence to the output sequence. The encoder performs feature extraction on the input sequence and builds a context-based representation. The decoder gradually generates the target sequence based on the output of the encoder and the previously generated outputs [34].
The input data are mapped into a continuous vector by the embedding layer, and sequential features are injected into the input through position encoding. Position encoding is generated using sine and cosine functions of different frequencies, and the formula is as follows.
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{100000^{2i/d_{model}}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{100000^{2i/d_{model}}}\right)$$
Here, $pos$ denotes the sequence position, $i$ represents the dimension index, and $d_{model}$ is the embedding vector dimension. Even dimensions use the sine function and odd dimensions the cosine function. Through position encoding, the model can not only distinguish the information at different time steps, but also capture global and local features at multiple frequencies.
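The sine and cosine position encoding described above can be generated as follows. This is a minimal sketch using only the standard library; the function name is hypothetical, and the frequency base is left as an explicit argument because the formula above is printed with 100,000 whereas the original Transformer formulation uses 10,000.

```python
import math


def positional_encoding(seq_len, d_model, base):
    """Sinusoidal position encoding as a seq_len x d_model list of lists.

    Even dimension 2i holds sin(pos / base**(2i / d_model)) and the
    following odd dimension 2i + 1 holds the cosine of the same angle.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for k in range(0, d_model, 2):          # k plays the role of 2i
            angle = pos / (base ** (k / d_model))
            pe[pos][k] = math.sin(angle)
            if k + 1 < d_model:                 # guard for odd d_model
                pe[pos][k + 1] = math.cos(angle)
    return pe


# Encoding for a hypothetical window of 8 time steps and 4 feature dimensions.
pe_example = positional_encoding(seq_len=8, d_model=4, base=10000.0)
```

Low dimension indices vary quickly with position and high indices vary slowly, which is what lets the encoding represent position at several frequencies at once.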
The encoder is composed of several stacked layers, each of which includes a multi-head self-attention mechanism and a feedforward neural network. It is used to model the dependencies within sequences and enhance the ability of nonlinear modeling. In the self-attention mechanism, each input vector is mapped to three types of vectors: query, key, and value through linear transformation. The fundamental calculation formula is shown below.
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
Among these, $Q$, $K$, and $V$ represent the query, key, and value matrices, respectively, while $\sqrt{d_k}$ is the scaling factor, with $d_k$ the dimension of the key vectors. Scaling keeps the dot products from growing too large, which would push the softmax into regions with vanishing gradients. The multi-head attention mechanism computes multiple independent attention heads in parallel. Each attention head learns sequence dependencies in a different subspace through its own linear mappings. The basic calculation formula is shown below.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_h)\,W^{O}$$
$$head_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
$h$ denotes the number of attention heads, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ represent the weight matrices of the $i$-th attention head, and $W^{O}$ is the output projection matrix applied to the concatenated heads. After the attention layer, each encoder layer is connected to a feedforward neural network consisting of two fully connected layers: the first expands the dimension, and the second reduces it back to the original dimension. Each sub-layer is also equipped with a residual connection and layer normalization, which not only avoids vanishing gradients, but also enhances training stability.
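The attention formula above can be written out directly. The following sketch computes scaled dot-product attention for a single head on list-based matrices; the learned projection matrices of each head are left out, so Q, K, and V are assumed to be already projected. The function names are illustrative.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors. Returns the output rows and the
    attention weights (one weight row per query).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, column by column.
        outputs.append([sum(w_j * v[c] for w_j, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, weights
```

A multi-head layer would run this function once per head on differently projected inputs, concatenate the per-head outputs, and apply the output projection.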
The structure of the decoder is similar to that of the encoder, but each of its layers contains three sub-layers: the self-attention layer, the cross-attention layer, and the feedforward neural network layer [35]. The self-attention layer models the target sequence and introduces a mask that hides the information after the current position, so that the next output is predicted only from the sequence generated so far. The feedforward neural network has the same internal structure as in the encoder. Cross-attention is similar to self-attention; the difference is that its query comes from the decoder, while the key and value come from the encoder output. The fundamental calculation formula is shown below.
$$\mathrm{CrossAttention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
Through this mechanism, the decoder dynamically integrates source sequence information with the generated target information during the prediction process.
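The masking used in the decoder's self-attention can be illustrated with a short sketch. As a simplifying assumption, the inputs are used directly as queries, keys, and values (the learned projection matrices are omitted), and position t is only allowed to attend to positions up to and including t. The function names are hypothetical.

```python
import math


def masked_softmax(scores, allowed):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(X):
    """Self-attention with Q = K = V = X and a causal (look-back only) mask."""
    d_k = len(X[0])
    out = []
    for t, q in enumerate(X):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in X]
        allowed = [j <= t for j in range(len(X))]   # hide positions after t
        w = masked_softmax(scores, allowed)
        out.append([sum(w_j * x[c] for w_j, x in zip(w, X)) for c in range(d_k)])
    return out
```

Because later positions receive zero weight, changing a future time step leaves the outputs at earlier steps unchanged. Cross-attention uses the same computation without the causal mask, with the queries taken from the decoder and the keys and values taken from the encoder output.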

4.3. LSTM–Transformer

In the prediction of the coordinated operation of fully mechanized mining system equipment, there are multilevel coupling relationships among equipment such as the shearer, the hydraulic support, and the conveyor. The traction speed of the shearer is closely related to factors such as the current and temperature of the traction motor. The power of the conveyor and the temperature of the motor are directly affected by the changes in coal flow and load.
These parameters simultaneously encompass both short-term high-frequency fluctuations and long-term trend changes during their evolution. The aforementioned complexity makes it challenging for traditional single-structure models to adequately characterize the intricate spatiotemporal coupling relationships within fully mechanized mining systems. Therefore, this paper proposes an LSTM–Transformer hybrid architecture for predicting the coordinated operational state parameters of multiple devices within fully mechanized mining faces.
This paper uses the gating mechanism of LSTM to model the dynamic changes of the actual working conditions in the underground environment. The forget gate is used to actively forget abnormal operating states. When the coal shearer transitions from high-load cutting to no-load operation, this mechanism weakens the previous high-load working memory, enabling the model state to adapt to the new working conditions. The input gate assesses the characteristics of the new working conditions. For instance, if the traction speed of the coal shearer abnormally increases, this mechanism will encode it as a potential change in the system status. The memory unit can accumulate long-term changes such as the performance degradation of fully mechanized mining equipment by modeling the inherent operating rules of the fully mechanized mining system. Ultimately, the output gate generates a hidden state output that reflects the assessment results of the fully mechanized mining system’s operating conditions based on the current comprehensive internal state. Although the LSTM layer can effectively capture the local timing dynamics of the equipment itself, its sequential processing mechanism has certain limitations when modeling complex collaborative relationships across processes and devices. This paper combines LSTM with Transformer to construct a local–global fusion prediction model.
In the local modeling stage, LSTM is used to replace the fixed positional encoding mechanism in the Transformer. The hidden states are obtained through the LSTM [36,37], which adaptively models the local dynamics and short-term dependencies in the input sequence. This captures subtle changes among the traction speed of the coal mining machine and the temperature and current of its motor, strengthens the model’s perception of temporal continuity, lets the model learn position information dynamically at the input stage, and avoids the deficiencies of static encoding. In the global modeling stage, the output sequence of the LSTM is input into the Transformer encoder, and the multi-head self-attention mechanism is used to capture the long-term dependencies among the operational features of the devices over a larger range. The Transformer structure can attend to the correlations between different time steps in a sequence, achieving the integration and reconstruction of global features. The feedforward neural network layer further enhances the nonlinear expression ability of the model, enabling it to adapt to complex multivariable relationships.
The core algorithm process can be summarized in the following steps:
Step 1. Perform preprocessing operations such as outlier handling, missing value imputation, and normalization on the original dataset. Use the sliding window method to construct the model input: the input sequence has a time window of length $T$, each sample has a shape of $(T, N)$, where $N$ is the number of features, and the output is the traction speed of the coal mining machine at a future time point;
Step 2. The constructed input sequence is then fed into the LSTM layer. The LSTM, through its internal gating mechanism, extracts the local dynamic patterns and short-term dependencies within the sequence, generating a hidden state sequence that contains temporal information. This sequence not only captures the short-term evolution patterns of the device parameters, but also serves as a dynamic position encoding. It replaces the traditional static position encoding in the Transformer, enabling the model to more flexibly perceive the sequential information of the input;
Step 3. Perform layer normalization on the hidden state sequence output by the LSTM layer. This helps stabilize the feature distribution and accelerates the convergence speed of the model. It provides uniformly scaled and well-distributed input for subsequent modules, thereby enhancing the stability of model training;
Step 4. The normalized sequence is input into the Transformer encoder. The encoder, through its core multi-head self-attention mechanism, computes the correlation weights between all time step features in the sequence in parallel, thereby effectively capturing the global dependencies and collaborative patterns across long time periods. Subsequently, its feedforward neural network sublayers perform nonlinear transformations on the features, further enhancing the model’s representational ability and adaptability to complexity;
Step 5. The high-order feature sequence output by the Transformer encoder is globally averaged along the time dimension to aggregate into a comprehensive feature vector. This vector is then input into a fully connected layer, which maps it to the final predicted value of the coal mining machine’s traction speed.
Through the local–global fusion modeling method, this model can not only retain the advantages of LSTM in temporal continuous modeling, but also leverage the capabilities of Transformer in global feature capture and efficient operation, significantly enhancing the predictive ability for the collaborative operation status of fully mechanized mining equipment. The specific structure is shown in Figure 2.
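Read as a composition of maps, Steps 2 to 5 can be summarized compactly. This is a notational sketch only: the symbols $X$, $H$, $Z$, $E$, $\bar{e}$, $w$, $b$, and the hidden width $d$ are introduced here for illustration, and any additional projection or activation used in the implementation is not shown.

$$
\begin{aligned}
X &\in \mathbb{R}^{T \times N} && \text{(input window, Step 1)} \\
H &= \mathrm{LSTM}(X) \in \mathbb{R}^{T \times d} && \text{(hidden state sequence, Step 2)} \\
Z &= \mathrm{LayerNorm}(H) && \text{(Step 3)} \\
E &= \mathrm{TransformerEncoder}(Z) \in \mathbb{R}^{T \times d} && \text{(Step 4)} \\
\bar{e} &= \frac{1}{T} \sum_{t=1}^{T} E_t, \qquad \hat{y} = w^{\top} \bar{e} + b && \text{(pooling and output, Step 5)}
\end{aligned}
$$

No positional encoding term is added to $Z$, because the order information is already carried by the recurrent computation that produces $H$.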

4.4. Hyperparameter Configuration and Automated Optimization Strategies

The predictive performance of deep learning models is highly dependent on the setting of hyperparameters, such as the number of network layers, the dimension of hidden layers, the learning rate, and the Dropout rate. To overcome the subjectivity and limitations of manual parameter selection and to ensure that the model generalizes well on the operational data of fully mechanized mining equipment, this paper designs an automated hyperparameter search framework and determines the optimal model configuration through systematic experiments. Based on the research experience in related fields, the mechanism of action and setting principles of each hyperparameter are analyzed. The learning rate needs to be adjusted according to the complexity of the model and the characteristics of the dataset. Batch size affects the variance of gradient estimation and memory efficiency. The Dropout rate is used to control the complexity of the model and prevent overfitting. On this basis, the reasonable value range of each hyperparameter was initially defined.
Considering the parameter scale and the fact that most hyperparameters of the model constructed in this paper take discrete values, grid search is selected for systematic optimization. Compared with random search or gradient-based optimization, grid search exhaustively traverses a medium-sized discrete parameter space, which guarantees that the best configuration within the defined grid is found and avoids the uncertainty introduced by random sampling.
To suppress overfitting and save computing resources, an early stopping strategy is introduced during training. When the validation loss does not decrease for 10 consecutive epochs, training is terminated automatically and the model weights with the best validation performance are retained.
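The stopping rule can be illustrated with a short sketch that scans a list of per-epoch validation losses. The function name and the example loss values are hypothetical; in a real training loop the model weights would be checkpointed at the point marked in the comment.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Apply the early stopping rule to a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are 0-indexed.
    Returns (best_epoch, best_loss, stopped_epoch).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    stopped_epoch = len(val_losses) - 1  # default: ran to the last epoch
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # A real loop would save the model weights here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stopped_epoch = epoch
                break
    return best_epoch, best_loss, stopped_epoch


# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 15
best_epoch, best_loss, stopped_epoch = simulate_early_stopping(losses, patience=10)
```

In this example the best loss occurs at epoch 2, and training stops at epoch 12, after ten epochs without improvement.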
Each combination of hyperparameters is independently trained multiple times to reduce the influence of randomness. During the training process, the training loss, validation loss, and test performance indicators are recorded. Ultimately, the model configuration with the highest coefficient of determination (R2) and the lowest mean square error (MSE) in the validation set is selected as the final optimal model. The specific search range of hyperparameters is shown in Table 1 as follows.
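A minimal sketch of this selection procedure is given below. The candidate values in `search_space` are hypothetical and do not reproduce Table 1, and `toy_objective` is a placeholder standing in for one complete training run; it is not a trained model and its values are not results. Only the validation MSE is ranked, because on a fixed validation set $R^2 = 1 - \mathrm{MSE}/\mathrm{Var}(y)$, so the lowest MSE and the highest $R^2$ select the same configuration.

```python
import itertools
import statistics


def grid_search(search_space, evaluate, n_repeats=3):
    """Exhaustively evaluate every combination in `search_space`.

    search_space: dict mapping hyperparameter name -> list of candidate values
    evaluate:     callable(config, seed) -> validation MSE of one training run
    n_repeats:    independent runs per configuration, averaged to damp randomness
    Returns (best_config, best_mean_mse, n_configs_tried).
    """
    names = list(search_space)
    best_config, best_mse, tried = None, float("inf"), 0
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        runs = [evaluate(config, seed) for seed in range(n_repeats)]
        mean_mse = statistics.mean(runs)
        tried += 1
        if mean_mse < best_mse:
            best_config, best_mse = config, mean_mse
    return best_config, best_mse, tried


# Hypothetical candidate values (assumption, not the paper's Table 1).
search_space = {
    "hidden_size": [64, 128, 256],
    "lstm_layers": [1, 2, 3],
    "num_heads": [2, 4, 8],
}


def toy_objective(config, seed):
    """Placeholder for 'train the model and return validation MSE'."""
    return (
        (config["hidden_size"] - 128) ** 2 / 1e4
        + abs(config["lstm_layers"] - 2)
        + abs(config["num_heads"] - 4)
        + 0.001 * seed  # stands in for run-to-run randomness
    )


best_config, best_mse, tried = grid_search(search_space, toy_objective)
```

The cost grows as the product of the candidate counts (27 configurations times 3 runs here), which is why exhaustive traversal is practical only for a medium-sized discrete space.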

5. Case Study Validation Analysis

5.1. Data Preparation and Preprocessing

This paper conducts an empirical study based on the actual operational data of a coal mine working face. The length of the working face is 351.8 m. The thickness of the coal seam is 5.27 to 5.89 m. The average thickness of the coal seam is 5.63 m. The inclination angle is 0 to 3°, with a general inclination angle of 1°. The designed minimum mining height is 3.2 m, the maximum mining height is 5.9 m, and the average mining height is 5.6 m. The working face is under half-shift maintenance with two half-shifts in production. In practice, maintenance tasks are typically scheduled between 10:00 and 14:00 daily, though timing may vary based on specific underground operations, generally maintaining a duration of approximately 4 h. The mining machine selected is the SL1000 double drum shearer produced by Eickhoff GmbH (Bochum, Germany). The hydraulic support chosen is the ZY18000/29.5/63D hydraulic support produced by Linzhou Heavy Machinery Group Co., Ltd. (Linzhou, China). The working face conveyors consist of the DSJ160/3×500 belt conveyor produced by Northwest Coal Mining Machinery Co., Ltd. (Shizuishan, China) and the SGZ1250/4500 scraper conveyor produced by China Coal Zhangjiakou Coal Mining Machinery Co., Ltd. (Zhangjiakou, China).
Building upon the previously established mathematical model of the “mining–transportation” coordination mechanism, this paper uses the shearer’s traction speed, traction motor temperature and current, as well as the conveyor motor temperature and power as input variables, which together describe the short-term fluctuations and long-term trend changes during the operation of the equipment. The predicted value of the traction speed of the shearer is taken as the output to reflect the dynamic coordination level of the mining and transportation system, providing data support for subsequent working condition adjustment and collaborative control. The specific parameters are shown in Table 2. This study collected 109,301 data points from the working face between 15 October 2024 and 30 December 2024, forming the original dataset. These data features are updated every minute to support real-time monitoring of the equipment’s operational status, as shown in Table 3.
The harsh conditions in coal mine shafts often result in sensor data being accompanied by strong noise, abnormal fluctuations, and non-stationary characteristics. Direct input into the model would severely compromise prediction accuracy. For this purpose, this paper processes the original data. The specific steps are as follows.
1. Outlier handling and interpolation of missing values: Because of the complex underground environment and unstable signals, the data collected by sensors often contain missing values, which must be filled to ensure data integrity. Common approaches are deletion, fixed-value replacement, and interpolation. Deletion is simple to implement, but it changes the data structure and reduces the number of samples. Fixed-value replacement is also easy to apply, but it distorts the standard deviation of the data and thus the amount of information it carries. This paper adopts the linear interpolation method, and the formula is as follows.
$$x_t = x_{t-1} + \left(x_{t+1} - x_{t-1}\right) \times \frac{t - (t-1)}{(t+1) - (t-1)}$$
In the formula, $x_t$ is the estimated value at the time $t$ to be interpolated; $x_{t-1}$ and $x_{t+1}$ are the values of the first valid data points before and after the missing segment, respectively.
For the extreme outliers that occur in the data, this paper adopts a quantile-based truncation strategy. Taking the traction speed of the coal shearer as an example, the 99th percentile of this indicator is calculated and used as the upper threshold, and values exceeding the threshold are truncated to it. The formula is as follows.
$$x_{\mathrm{cleaned}} = \min\left(x_{\mathrm{raw}},\, Q_{0.99}\right)$$
In the formula, $x_{\mathrm{raw}}$ represents the original data; $Q_{0.99}$ is the 99th percentile threshold of this dataset; $x_{\mathrm{cleaned}}$ is the processed data value;
2. Data noise reduction: The operational data of the fully mechanized mining system have strong temporal and dynamic characteristics. When sensors collect key parameters such as the traction speed and motor current of the coal mining machine, they are not only affected by the noise caused by electromagnetic interference and mechanical vibration underground, but also may experience short-term fluctuations due to sudden changes in the underground rock strata and periodic incoming pressure. Directly using the original data for modeling will cause the model to overly focus on noise rather than the real trend, affecting its stability and prediction accuracy. In this paper, Kalman filtering is adopted to preprocess the data, enhancing the smoothness of the data and the stability of the model. The basic form of Kalman filtering is as follows.
$$x_k = A x_{k-1} + B u_k + w_k$$
$$z_k = H x_k + v_k$$
In the formula, $x_k$ represents the true state vector of the system at time $k$; $u_k$ is the control input; $z_k$ represents the corresponding observed quantity. $A$, $B$, and $H$ are, respectively, the state transition matrix, the control matrix, and the observation matrix. $w_k$ and $v_k$ are, respectively, the process noise and the observation noise, which are usually assumed to follow zero-mean Gaussian distributions. The processed data are smoother and less noisy, with fluctuation patterns consistent with the actual operational behavior of the equipment. This facilitates the efficient extraction of local temporal features by the LSTM module while enhancing the robustness and prediction accuracy of the Transformer module during global modeling;
3. Data normalization and sequence reconstruction: After Kalman filtering processing, time series data that can more accurately reflect the real operating parameters of each piece of equipment in the fully mechanized mining system are obtained. However, these data have different physical meanings and vary greatly in dimensions and numerical ranges. To eliminate the influence of dimensional differences among various parameter features and accelerate the convergence of neural networks, in this paper, Min-Max normalization is adopted to map all features to the interval [0, 1]. The calculation formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
In the formula, $x$ represents the original value of a certain feature; $x_{\min}$ is the minimum value of this feature on the training dataset; $x_{\max}$ represents the maximum value of this feature on the training dataset; $x'$ is the normalized value.
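The three preprocessing steps above can be sketched in plain Python for a single feature. Several details are assumptions made for illustration and are not stated in the paper: the function names, the nearest-value fill for gaps at the series edges, the linear-interpolation convention for the percentile, and a scalar random-walk Kalman model ($A = H = 1$, no control input) with hypothetical noise variances.

```python
import math


def interpolate_missing(series):
    """Fill None gaps linearly between the nearest valid neighbours."""
    values = list(series)
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series has no valid points")
    for i in range(valid[0]):                      # leading gap (assumption)
        values[i] = values[valid[0]]
    for i in range(valid[-1] + 1, len(values)):    # trailing gap (assumption)
        values[i] = values[valid[-1]]
    for left, right in zip(valid, valid[1:]):
        for t in range(left + 1, right):
            frac = (t - left) / (right - left)
            values[t] = values[left] + (values[right] - values[left]) * frac
    return values


def percentile(sorted_values, q):
    """q-th percentile (0..100), interpolating linearly between ranks."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def clip_upper(series, q=99.0):
    """x_cleaned = min(x_raw, Q_q): truncate values above the q-th percentile."""
    threshold = percentile(sorted(series), q)
    return [min(v, threshold) for v in series]


def kalman_smooth(series, process_var=1e-3, measurement_var=1e-1):
    """Scalar Kalman filter with A = H = 1 and B u_k = 0 (assumed model)."""
    estimate, error = series[0], 1.0
    filtered = [estimate]
    for z in series[1:]:
        error += process_var                       # predict error covariance
        gain = error / (error + measurement_var)   # Kalman gain
        estimate += gain * (z - estimate)          # correct with the innovation
        error *= 1.0 - gain                        # update error covariance
        filtered.append(estimate)
    return filtered


def min_max_scale(series, x_min, x_max):
    """Map values to [0, 1] using statistics taken from the training data."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("feature is constant; cannot normalize")
    return [(v - x_min) / span for v in series]


# Toy series with two gaps and one extreme value (illustrative numbers only).
raw = [1.0, None, 3.0, 4.0, None, None, 7.0, 100.0]
filled = interpolate_missing(raw)
clipped = clip_upper(filled, q=99.0)
smoothed = kalman_smooth(clipped)
scaled = min_max_scale(smoothed, min(smoothed), max(smoothed))
```

In practice `x_min` and `x_max` would be computed on the training portion only and reused for the validation data, matching the definition of the normalization formula.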
The comprehensive mining operation in underground coal mines is a highly structured and repetitive process. Its essence is to take the mining cycle as the basic unit, and each complete cycle includes process links such as cutting, frame moving, support, and transportation. The accurate prediction of the traction speed of the coal shearer is the prerequisite for achieving continuous and balanced coal flow and the collaborative optimization of equipment. The traction speed of the coal shearer is not an independent variable but a dynamic response of the entire system’s state. To model coal mining with periodic changes, this paper constructs an input sequence with the theoretical duration of a single coal mining cycle as the time window. It not only retains the short-term coupling characteristics between processes within the cycle, such as the change in cutting resistance after frame moving, but can also model the long-term state evolution laws across cycles, such as changes in geological conditions and equipment aging. The sliding window method is used to divide the preprocessed time series data. The input of each sample is a multi-variable sequence of length T , which includes N features such as the traction speed of the coal mining machine and the power of the conveyor. The output is the traction speed value of the coal mining machine at the future time point.
The input of this model is the historical multivariate sequence of key features, including the traction speed of the coal mining machine, the currents of the left and right traction motors of the coal mining machine, the temperatures of the left and right traction motors of the coal mining machine, the stator temperature of the conveyor motor, and the power of the conveyor motor. The length of the time window for the input sequence is set to 80 time steps, corresponding to 80 min of actual production data of the comprehensive mining equipment operation. The prediction target of the model is the traction speed of the coal mining machine at the next moment, using a single-step prediction strategy. In terms of data division, the complete dataset is divided into a training set and a validation set in an 8:2 ratio. All divisions strictly follow the time sequence to avoid information leakage.
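The sample construction and the chronological split described above can be sketched as follows. The function names and the synthetic `rows` data are illustrative assumptions; the window length of 80, the seven input features, the single-step target, and the 8:2 ratio follow the text.

```python
def make_windows(rows, window, target_index=0):
    """Build (input, target) pairs with a sliding window of stride 1.

    rows: time-ordered list of feature vectors, each of length N
    Each input is `window` consecutive rows, shape (window, N); the target is
    feature `target_index` of the row immediately after the window.
    """
    inputs, targets = [], []
    for start in range(len(rows) - window):
        inputs.append(rows[start:start + window])
        targets.append(rows[start + window][target_index])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling so that all validation samples come later in time."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Synthetic stand-in for 200 minutes of 7-feature data; feature 0 plays the
# role of the traction speed (values are placeholders, not measurements).
rows = [[t + 0.1 * j for j in range(7)] for t in range(200)]
inputs, targets = make_windows(rows, window=80, target_index=0)
train, val = chronological_split(inputs, targets, train_fraction=0.8)
```

With 200 rows and a window of 80, this yields 120 samples, of which the first 96 form the training set and the last 24 the validation set.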
Figure 3 shows the complete process, from the collection of raw data from the fully mechanized mining face to the final prediction output. Firstly, the raw data are preprocessed, including missing value interpolation, outlier handling, Kalman filtering for noise reduction, and normalization. The processed data are then constructed as an input sequence with a time window T = 80, and divided into a training set of 80% and a validation set of 20% in chronological order. The LSTM–Transformer model is used for training, and the optimal hyperparameters are determined through grid search. The final model consists of an LSTM layer, a normalization layer, a Transformer encoder, a global pooling layer, and a fully connected layer, which outputs the predicted value of the traction speed at the next time step, and the final result is obtained through inverse normalization.

5.2. Algorithm Validation

Based on the hyperparameter optimization described in Section 4.4, this paper conducts systematic training and evaluation in the search space. The experimental results show that when the LSTM hidden layer dimension is 128 and the number of layers is 2, combined with a single-layer Transformer structure and 4 attention heads, the model achieves the best performance on the validation set.
Based on the above optimal hyperparameter configuration, the time series features of the input data are extracted through the LSTM layer, and the output is input into the Transformer encoder after linear mapping and layer normalization. The Transformer module internally adopts the GELU activation function, while the fully connected layer of the output layer uses the ReLU activation function. It can further enhance the nonlinear expression ability of the model.
To comprehensively and quantitatively evaluate the predictive performance of the model, this paper selects the following three indicators: the mean absolute error (MAE), the mean square error (MSE), and the coefficient of determination (R2) [38,39].
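$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
In the formulas, $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, and $n$ is the number of samples.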
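The three indicators can be computed directly from their definitions, as in the following sketch. The function names and the four sample values are illustrative only and are not taken from the experimental data.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

For these values the MAE is 0.15, the MSE is 0.025, and $R^2$ is 0.98, up to floating-point rounding.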
To verify the predictive performance of the LSTM–Transformer model for the traction speed of the coal shearer in the fully mechanized mining system, this paper compares it with several mainstream models: the single models GRU and MLR, the hybrid CNN–LSTM model, and the standalone LSTM and Transformer models. The assessment results are shown in Table 4. The LSTM–Transformer hybrid model performs the best on all evaluation metrics. It has approximately 995,000 parameters and a model weight file of 3.8 MB. Although the structure of LSTM–Transformer is relatively complex, its single-round training time is close to that of CNN–LSTM, which is also a hybrid model, and it converges faster than the single models. The LSTM–Transformer takes 1573 s to complete training, achieving a good balance between prediction accuracy and training efficiency.
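As a rough consistency check on the reported model size, assume that each parameter is stored as a 32-bit float (4 bytes); the storage format is an assumption and is not stated in the paper.

$$995{,}000 \times 4\ \mathrm{B} = 3{,}980{,}000\ \mathrm{B} \approx \frac{3{,}980{,}000}{1{,}048{,}576}\ \mathrm{MiB} \approx 3.80\ \mathrm{MiB}$$

Under that assumption the parameter count is in line with the reported weight file of about 3.8 MB.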
The loss variation curves of the training set and the validation set are shown in Figure 4a,b. As the number of iterations increases, the loss decreases steadily and eventually converges. Figure 4 shows that the LSTM–Transformer hybrid model proposed in this paper outperforms the single LSTM and Transformer models in both convergence speed and final loss value: its training loss is the smallest, and its validation loss curve is smooth, with the smallest fluctuation amplitude. The training process was stable and showed no sign of overfitting, which indicates that the model has good generalization ability.
To visually compare the performance of the models, Figure 5 shows the comparison of the prediction effects of the models. By analyzing the degree of fit between the predicted values and the true values, and combining the three evaluation indicators of the training set and the test set, the performance of the model is analyzed. Figure 5 shows that the LSTM–Transformer hybrid model performs the best in both prediction accuracy and goodness of fit.
In summary, the LSTM–Transformer model constructed in this paper fully leverages the strengths of LSTM in modeling temporal dependencies while utilizing the Transformer to capture long-term global features. This enables the precise characterization of the dynamic coordination between the mining and transportation systems. Accurate predictions of the shearer’s speed enable further estimation of coal flow rates, allowing for the dynamic adjustment of conveyor chain speeds to achieve early-stage control of mining–transportation coordination. Simultaneously, comparing predicted shearer speeds with actual measurements facilitates timely alerts upon detecting anomalies, thereby mitigating the amplification and propagation of irregularities within the fully mechanized mining system.
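The comparison between predicted and measured speeds mentioned above could, for example, take the form of a simple residual check. This is a hypothetical sketch: the function name, the fixed threshold, and the sample values are assumptions, and the paper does not specify an alerting rule.

```python
def flag_anomalies(actual, predicted, threshold):
    """Return the time indices where |actual - predicted| exceeds `threshold`.

    `threshold` is expressed in the same unit as the traction speed and would
    need to be tuned on historical residuals (hypothetical parameter).
    """
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if abs(a - p) > threshold
    ]


# Illustrative values only: one measurement departs sharply from the forecast.
actual = [5.0, 5.1, 5.2, 8.0, 5.1]
predicted = [5.0, 5.0, 5.1, 5.2, 5.2]
alerts = flag_anomalies(actual, predicted, threshold=1.0)
```

Here only index 3 is flagged. A deployed rule would likely also require the deviation to persist for several consecutive samples before raising an alert.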

6. Conclusions

(1) This paper analyzes the coordination among the pieces of equipment of the fully mechanized mining system from three dimensions: time, space, and geometry. Centered on the coal flow, a dynamic mathematical model linking the entire process from coal extraction to transportation is established. It lays a theoretical foundation for system analysis;
(2) This paper proposes a coal mining machine traction speed prediction model based on the LSTM–Transformer hybrid architecture. This model effectively integrates the temporal feature extraction capability of LSTM with the global dependency modeling advantage of Transformer, achieving the high-precision prediction of equipment operation status under complex working conditions;
(3) Through comparative analysis of various neural networks, the proposed model demonstrates significantly superior performance metrics compared to GRU, MLR, and standalone LSTM or Transformer models. Maintaining high prediction accuracy while achieving faster training convergence, this model offers a more scientific and effective approach for forecasting critical operational parameters in fully mechanized mining systems.
The manifestation of coordination imbalances in fully mechanized mining systems has a lag, and potential safety hazards are often exposed only after faults occur. This paper provides key algorithmic support for achieving system-level pre-warning and active regulation through the precise prediction of the traction speed of the coal shearer. The current constructed model is limited by the differences in geological structures, equipment configurations, and process parameters among different mines. Its generalization ability across working faces and engineering adaptability still need to be further verified. Subsequent research will collect measured data from multiple working faces and working conditions and carry out more representative training and evaluation to enhance the model’s adaptability in complex mining environments. In addition, the current work mainly focuses on the prediction of equipment operation status and has not yet formed a closed loop with the real-time control logic of the production system. Future research will focus on prediction and control, further optimizing the model structure and expanding to aspects such as equipment failure early warning and health status assessment, thereby promoting the transformation of comprehensive mining safety management from passive response to active prevention. In the future, explainability analysis methods such as SHAP will be introduced to explain the model’s prediction results, enhancing the transparency and credibility of model decisions, and providing continuous and reliable technical support for the intelligent construction of coal mines and the improvement of intrinsic safety levels.

Author Contributions

Conceptualization, Y.W. and G.L.; investigation, Y.W.; data curation, J.Z. and R.Z.; writing—original draft, G.L.; writing—review and editing, Y.W., L.H. and X.C.; supervision, L.H. and X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Nos. 52074210, 52204174, and 52274158).

Data Availability Statement

The datasets presented in this article are not readily available because they contain confidential geological and coal information controlled by the relevant energy companies. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, F.; Xu, H. Analysis on the development path of ecological environment protection and resources comprehensive utilization in coal industry during the 14th Five-Year Plan period. China Coal 2021, 47, 73–82.
  2. Chen, F.; Yu, H.; Bian, Z.; Yin, D. How to handle the crisis of coal industry in China under the vision of carbon neutrality. J. China Coal Soc. 2021, 46, 1808–1820.
  3. Wang, G.; Liu, F.; Meng, X.; Fan, J.; Wu, Q.; Ren, H.; Pang, Y.; Xu, Y.; Zhao, G.; Zhang, D. Research and practice on intelligent coal mine construction (primary stage). Coal Sci. Technol. 2019, 47, 1–36.
  4. Wang, G.; Ren, H.; Zhao, G.; Zhang, D.; Wen, Z.; Meng, L.; Gong, S. Research and practice of intelligent coal mine technology systems in China. Int. J. Coal Sci. Technol. 2022, 9, 24.
  5. Polverino, L.; Abbate, R.; Manco, P.; Perfetto, D.; Caputo, F.; Macchiaroli, R.; Caterino, M. Machine learning for prognostics and health management of industrial mechanical systems and equipment: A systematic literature review. Int. J. Eng. Bus. Manag. 2023, 15, 18479790231186848.
  6. Chen, D.; Zheng, Z.; Huang, T.; Zhang, G. Coordinated optimal control of the speed of shearer and scraper conveyor based on their energy consumption models. J. China Coal Soc. 2022, 47, 2483–2498.
  7. Lu, Z.; Guo, W.; Zhang, C.; Zhao, S.; Wang, Y.; Zhang, W.; Yang, M.; Li, S. A Novel Intelligent Decision-Making Method of Shearer Drum Height Regulating Based on Neighborhood Rough Reduction and Selective Ensemble Learning. IEEE Access 2020, 9, 46545–46559.
  8. He, H.; Lu, Z.; Zhang, C.; Wang, Y.; Guo, W.; Zhao, S. A data-driven method for dynamic load forecasting of scraper conveyer based on rough set and multilayered self-normalizing gated recurrent network. Energy Rep. 2021, 7, 1352–1362.
  9. Zhang, D.; Qin, J.; Wu, W.; Zhu, Y.; Guo, W. Research on scraper conveyor load prediction method based on wavelet transform and BP neural network. Sci. Rep. 2025, 15, 15367.
  10. Li, R.; Wei, W.; Lai, Y.; Wang, D.; Liu, B.; Wang, T. Long-distance intelligent liquid supply for coal mining faces based on liquid demand prediction. J. Phys. Conf. Ser. 2023, 2561, 012018.
  11. Zhang, G.; Shen, G.; Tang, Y.; Li, X. State recognition based adaptive cutting method for roadheader. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2025, 239, 2030–2048.
  12. Zheng, Z.; Chen, D.; Huang, T.; Zhang, G. Coordinated speed control strategy for minimizing energy consumption of a shearer in fully mechanized mining. Energies 2021, 14, 1224.
  13. Zhao, S.; Zhao, J.; Lu, Z.; He, H.; Zhang, C.; Miao, Y.; Xing, Z. Data-driven cooperative control model of shearer-scraper conveyor based on rough set theory. Front. Energy Res. 2022, 10, 811648.
  14. Hu, H.; Luo, H.; Deng, X. Health monitoring of automotive suspensions: A lstm network approach. Shock Vib. 2021, 2021, 6626024.
  15. Fu, X.; Zhang, L. Spatio-temporal feature fusion for real-time prediction of TBM operating parameters: A deep learning approach. Autom. Constr. 2021, 132, 103937.
  16. Mo, D.; Bai, L.; Huang, W.; Wu, N.; Lu, L. TBM disc cutter wear prediction using stratal slicing and IPSO-LSTM in mixed weathered granite stratum. Tunn. Undergr. Space Technol. 2024, 148, 105745.
  17. Li, L.; Liu, Z.; Zhou, H.; Zhang, J.; Shen, W.; Shao, J. Prediction of TBM cutterhead speed and penetration rate for high-efficiency excavation of hard rock tunnel using CNN-LSTM model with construction big data. Arab. J. Geosci. 2022, 15, 280.
  18. Man, K.; Wu, L.; Liu, X.; Song, Z.; Li, K. Prediction of TBM tunneling parameters and rockburst grade based on CNN-LSTM model. Coal Sci. Technol. 2023, 52, 21–37.
  19. Sun, Y.; Zhou, Q.; Sun, L.; Sun, L.; Kang, J.; Li, H. CNN–LSTM–AM: A power prediction model for offshore wind turbines. Ocean Eng. 2024, 301, 117598.
  20. Shi, J.; Wang, S.; Qu, P.; Shao, J. Time series prediction model using LSTM-Transformer neural network for mine water inflow. Sci. Rep. 2024, 14, 18284.
  21. Salman, D.; Direkoglu, C.; Altanneh, N.; Ahmed, A. Hybrid Wavelet-LSTM-Transformer Model for Fault Forecasting in Power Grids. SSRG Int. J. Electr. Electron. Eng. 2024, 11, 314–326.
  22. Zhang, M.; Ji, A.; Zhou, C.; Ding, Y.; Wang, L. Real-time prediction of TBM penetration rates using a transformer-based ensemble deep learning model. Autom. Constr. 2024, 168, 105793.
  23. Wang, H.; Wang, F.; Li, H.; Zhang, X.; He, W.; Chen, T.; Zhang, Y.; Shi, Y.; Du, W. Virtual-real mapping for the scraper conveyor S-shaped bending in a fully mechanized mining face production system based on digital twin. Int. J. Comput. Integr. Manuf. 2025, 38, 1823–1843.
  24. Li, H.-J.; Fu, X.; Qin, Y.-F.; Jia, S.-F. Application of deep learning classification model for regional evaluation of roof pressure support evolution effects over time in coal mining face. Heliyon 2024, 10, e31824.
  25. Jia, S.; Fu, X.; Wang, R.; Wang, H.; Wang, P. Dynamic evaluation of support quality of hydraulic support in space-time region. J. Mine Autom. 2022, 48, 26–33.
  26. Han, H.; Wang, G.; Xu, Y.; Zhang, J.; Lei, S.; Li, Y. Adaptive intelligent coupling control of hydraulic support and working face system for 6–10 m super high mining in thick coal seams. Coal Sci. Technol. 2024, 52, 276–288.
  27. He, L.; Pan, R.; Wang, Y.; Gao, J.; Xu, T.; Zhang, N.; Wu, Y.; Zhang, X. A case study of accident analysis and prevention for coal mining transportation system based on FTA-BN-PHA in the context of smart mining process. Mathematics 2024, 12, 1109.
  28. Ma, H.; Mao, Q.; Xue, X.; Wang, C.; Wang, P.; Nie, Z.; Duan, Y.; SiMa, J.; Chai, J.; Chen, Y. On the academic ideology of “Transport is traffic”. J. China Coal Soc. 2025, 50, 3658–3667.
  29. Chen, S.; Wang, S.; Ge, S.; Wang, Z.; Ma, G. Study on the spatiotemporal distribution of coal flow in the scraper conveyor of fully mechanized mining face. J. Mine Autom. 2024, 50, 98–107.
  30. Li, F.; Wang, Z.; Si, L.; Wei, D.; Tan, C.; An, X. Traction resistance analysis and cutting state recognition of shearer based on numerical simulation. Measurement 2024, 237, 115261.
  31. Zhao, Y.; Guo, Y.; Wang, X. Hybrid LSTM–Transformer Architecture with Multi-Scale Feature Fusion for High-Accuracy Gold Futures Price Forecasting. Mathematics 2025, 13, 1551.
  32. Wang, W.; Liu, Z.; Dai, F.; Quan, H. Dynamic Equivalence of Active Distribution Network: Multiscale and Multimodal Fusion Deep Learning Method with Automatic Parameter Tuning. Mathematics 2025, 13, 3213.
  33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
  34. Shen, Z.; Han, W.; Hu, Y.; Zhu, Y.; Han, J. An Interpretable Hybrid Deep Learning Model for Molten Iron Temperature Prediction at the Iron-Steel Interface Based on Bi-LSTM and Transformer. Mathematics 2025, 13, 975.
  35. Zhang, K.; Yin, C.; Yao, W.; Feng, G.; Liu, C.; Cheng, C.; Zhang, L. A Working Conditions Warning Method for Sucker Rod Wells Based on Temporal Sequence Prediction. Mathematics 2024, 12, 2253.
  36. Feng, Z.; Zhang, J.; Jiang, H.; Yao, X.; Qian, Y.; Zhang, H. Energy consumption prediction strategy for electric vehicle based on LSTM-transformer framework. Energy 2024, 302, 131780.
  37. Cao, K.; Zhang, T.; Huang, J. Advanced hybrid LSTM-transformer architecture for real-time multi-task prediction in engineering systems. Sci. Rep. 2024, 14, 4890.
  38. Al-Ali, E.M.; Hajji, Y.; Said, Y.; Hleili, M.; Alanzi, A.M.; Laatar, A.H.; Atri, M. Solar energy production forecasting based on a hybrid CNN-LSTM-transformer model. Mathematics 2023, 11, 676.
  39. Wu, Y.; Sang, W.; Cao, X.; He, L. Research on the Parameter Prediction Model for Fully Mechanized Mining Equipment Selection Based on RF-WOA-XGBoost. Appl. Sci. 2025, 15, 732.
Figure 1. Research technology route.
Figure 2. LSTM–Transformer structure.
Figure 3. Workflow for predicting the shearer traction speed with the LSTM–Transformer model.
Figure 4. Comparison of loss change curves of LSTM–Transformer and other neural networks.
Figure 5. Comparison of the predicted values of each algorithm with the true values.
Table 1. Hyperparameter setting range.

| Hyperparameter | Candidate values |
|---|---|
| LSTM hidden layer dimension | 64, 128 |
| LSTM hidden layer depth | 4 |
| Transformer hidden layer dimension | 64, 128 |
| Transformer layer depth | 1, 2 |
| Batch size | 32, 64 |
| Number of attention heads | 2, 4 |
| Learning rate | 0.0005, 0.001 |
| Dropout rate | 0.2, 0.3 |
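As a rough illustration of how large the search space in Table 1 is, the sketch below enumerates every combination of the listed candidate values. The dictionary keys are hypothetical names chosen for this example, and the source does not state whether the authors used an exhaustive grid search or another tuning strategy. The divisibility filter assumes that the "Transformer hidden layer dimension" is the attention model dimension, which is also an assumption.

```python
from itertools import product

# Candidate values copied from Table 1; the key names are illustrative only.
SEARCH_SPACE = {
    "lstm_hidden_dim": [64, 128],
    "lstm_depth": [4],
    "transformer_hidden_dim": [64, 128],
    "transformer_depth": [1, 2],
    "batch_size": [32, 64],
    "num_heads": [2, 4],
    "learning_rate": [0.0005, 0.001],
    "dropout": [0.2, 0.3],
}


def enumerate_configs(space):
    """Yield one dict per combination of candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(SEARCH_SPACE))

# Multi-head attention splits the model dimension evenly across heads, so
# (under the assumption above) a configuration is usable only when the
# dimension is divisible by the head count.
valid_configs = [
    c for c in configs
    if c["transformer_hidden_dim"] % c["num_heads"] == 0
]

print(len(configs), len(valid_configs))  # prints: 128 128
```

With these ranges an exhaustive search would need 128 training runs, and every combination passes the head-count check.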
Table 2. Operating parameters and symbols of a fully mechanized mining system.

| Parameter name | Symbol | Unit |
|---|---|---|
| Shearer traction speed | $v_c$ | m/min |
| Shearer left/right traction motor temperature | $T_{cl}$, $T_{cr}$ | °C |
| Shearer left/right traction motor current | $I_{cl}$, $I_{cr}$ | A |
| Conveyor motor stator temperature | $T_a$, $T_b$, $T_c$ | °C |
| Conveyor motor power | $P$ | kW |
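Table 3. Example of operating parameter data for fully mechanized mining system.

| Event time | $v_c$ (m/min) | $T_{cl}$ (°C) | $T_{cr}$ (°C) | $I_{cl}$ (A) | $I_{cr}$ (A) | $T_a$ (°C) | $T_b$ (°C) | $T_c$ (°C) | $P$ (kW) |
|---|---|---|---|---|---|---|---|---|---|
| 2024/10/16 4:32 | 3.76 | 48 | 17 | 43 | 17 | 81 | 91 | 87 | 252 |
| 2024/10/16 4:33 | 8.67 | 48 | 17 | 43 | 17 | 78 | 89 | 91 | 249 |
| 2024/10/16 4:34 | 1.99 | 48 | 17 | 43 | 17 | 78 | 93 | 83 | 249 |
| 2024/10/16 4:35 | 8.37 | 50 | 33 | 43 | 17 | 78 | 93 | 83 | 249 |
| 2024/10/16 4:36 | 8.7 | 50 | 33 | 43 | 17 | 79 | 92 | 91 | 259 |
| … | … | … | … | … | … | … | … | … | … |
| 2024/10/26 0:12 | 7.53 | 41 | 19 | 38 | 19 | 78 | 86 | 88 | 198 |
| 2024/10/26 0:13 | 7.1 | 41 | 19 | 38 | 19 | 74 | 86 | 85 | 168 |
| 2024/10/26 0:14 | 6.05 | 45 | 17 | 43 | 17 | 67 | 86 | 87 | 188 |
| 2024/10/26 0:15 | 6.78 | 45 | 17 | 43 | 17 | 67 | 86 | 87 | 188 |
| 2024/10/26 0:16 | 7.09 | 46 | 17 | 43 | 17 | 78 | 88 | 87 | 211 |
| 2024/10/26 0:17 | 8.59 | 46 | 17 | 43 | 17 | 76 | 84 | 83 | 222 |
| … | … | … | … | … | … | … | … | … | … |
| 2024/12/30 22:40 | 8.58 | 47 | 17 | 43 | 17 | 78 | 88 | 86 | 162 |
| 2024/12/30 22:41 | 8.7 | 47 | 17 | 43 | 17 | 78 | 88 | 86 | 162 |
| 2024/12/30 22:42 | 8.65 | 47 | 17 | 43 | 17 | 74 | 90 | 91 | 194 |
| 2024/12/30 22:43 | 9.2 | 47 | 31 | 43 | 17 | 74 | 90 | 91 | 194 |
| 2024/12/30 22:44 | 8.74 | 48 | 17 | 43 | 33 | 74 | 90 | 91 | 194 |
| 2024/12/30 22:45 | 8.39 | 48 | 17 | 45 | 17 | 74 | 85 | 91 | 210 |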
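The records in Table 3 are sampled once per minute, but the excerpt also contains gaps between recording periods. One plausible way to turn such a log into supervised samples for a sequence model is a sliding window that skips any window spanning a gap. The sketch below is a minimal illustration under stated assumptions: the window length of 3, the univariate input (traction speed only), and the function name `make_windows` are all hypothetical and not taken from the source. The traction-speed values are read from the first rows of Table 3.

```python
from datetime import datetime, timedelta

# (event time, traction speed in m/min), read from the first rows of Table 3.
RAW = [
    ("2024/10/16 4:32", 3.76),
    ("2024/10/16 4:33", 8.67),
    ("2024/10/16 4:34", 1.99),
    ("2024/10/16 4:35", 8.37),
    ("2024/10/16 4:36", 8.7),
    ("2024/10/26 0:12", 7.53),  # new recording period after a gap
    ("2024/10/26 0:13", 7.1),
]

rows = [(datetime.strptime(t, "%Y/%m/%d %H:%M"), v) for t, v in RAW]


def make_windows(rows, window=3, step=timedelta(minutes=1)):
    """Build (inputs, target) pairs from contiguous one-minute runs.

    Each sample uses `window` consecutive readings as inputs and the
    reading one step later as the target. Windows that cross a time
    gap are discarded. The window length is an illustrative assumption.
    """
    samples = []
    for end in range(window, len(rows)):
        chunk = rows[end - window:end + 1]
        contiguous = all(
            later[0] - earlier[0] == step
            for earlier, later in zip(chunk, chunk[1:])
        )
        if contiguous:
            inputs = [value for _, value in chunk[:-1]]
            target = chunk[-1][1]
            samples.append((inputs, target))
    return samples


samples = make_windows(rows)
for inputs, target in samples:
    print(inputs, "->", target)
```

Only the two windows that lie entirely within the 2024/10/16 run are kept; windows that would mix readings from 2024/10/16 and 2024/10/26 are dropped.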
Table 4. Comparison of prediction results between different algorithms and LSTM–Transformer.

| Model | MSE | MAE | $R^2$ |
|---|---|---|---|
| LSTM–Transformer | 0.041495 | 0.121947 | 0.996029 |
| LSTM | 0.076799 | 0.231481 | 0.992651 |
| Transformer | 0.124296 | 0.277841 | 0.987980 |
| GRU | 0.072053 | 0.22195 | 0.993105 |
| MLR | 0.055107 | 0.154152 | 0.994727 |
| CNN–LSTM | 0.053738 | 0.158107 | 0.994858 |
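For readers who want to reproduce the three metrics reported in Table 4 on their own predictions, the sketch below implements the standard definitions of MSE, MAE, and the coefficient of determination in plain Python. The sample values are made up purely to exercise the functions and are not data from the paper.

```python
def mse(y_true, y_pred):
    """Mean square error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the paper's dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(round(mse(y_true, y_pred), 6))
print(round(mae(y_true, y_pred), 6))
print(round(r2(y_true, y_pred), 6))
```

Lower MSE and MAE and an $R^2$ closer to 1 indicate a better fit, which is the sense in which the LSTM–Transformer row leads the comparison.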
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wu, Y.; Li, G.; He, L.; Zhao, J.; Zhang, R.; Cao, X. A Hybrid Coal Flow-Centric Predictive Model for Mining–Transportation Coordination Based on an LSTM–Transformer. Mathematics 2026, 14, 634. https://doi.org/10.3390/math14040634