1. Introduction
Global Navigation Satellite Systems (GNSS) play a pivotal role in modern science and communications, finding extensive application in various high-precision positioning, navigation, and timing (PNT) tasks. Following the global commissioning of the BeiDou-3 Navigation Satellite System (BDS-3), the positioning accuracy and spatiotemporal reference stability it provides have become an indispensable component of the global positioning landscape. Satellite Clock Bias (SCB) is a critical factor affecting navigation system positioning accuracy and time synchronization [1]. SCB represents the timing error caused by the inherent instabilities of satellite atomic clocks; its impact on accuracy becomes particularly pronounced over long-duration measurements. Consequently, high-precision prediction of BDS-3 satellite clock bias is essential for improving both positioning accuracy and the overall reliability of the system.
Traditional methods for predicting satellite clock bias have primarily been based on physical models (such as the quadratic polynomial model, QP) [2,3] and statistical approaches (such as the gray system model GM(1,1) [4,5], Kalman filtering [6,7], and the autoregressive integrated moving average model ARIMA [8,9]). However, these methods exhibit significant limitations when dealing with the nonlinear characteristics of BDS-3 satellite clocks, multi-source noise interference, and long-term dependency modeling. For example, physical models rely on prior assumptions about clock physical properties and struggle to adapt to dynamic changes in complex space environments; statistical methods may perform well for short-term predictions but are inefficient for very long sequences and cannot effectively capture nonlinear association patterns [10].
In recent years, with the rise of machine learning, particularly deep learning, an increasing number of studies have adopted data-driven approaches to predict satellite clock bias. The Long Short-Term Memory neural network (LSTM) has demonstrated powerful nonlinear modeling capabilities for time series prediction [11], and LSTM-based models have been widely applied to time series data processing because they are well-suited to capturing long-term dependencies. LSTM can effectively handle the nonlinear and time-varying characteristics present in satellite clock bias data and has therefore achieved promising results in this domain. Nonetheless, when processing very long time series, LSTM still faces issues such as vanishing gradients and information loss, which can lead to suboptimal performance in certain complex scenarios [12].
To address these issues, recent research has gradually begun to introduce hybrid models that combine multiple deep-learning frameworks or integrate other methods to improve prediction accuracy and long-term stability. For example, Li et al. [13] proposed a hybrid model based on LSTM and a self-attention mechanism for GNSS clock bias prediction, achieving favorable predictive performance. By incorporating the self-attention mechanism, the model can better capture global information and improve clock-bias prediction accuracy. Zhao et al. [14] proposed a hybrid model based on a multivariate convolutional neural network (CNN) and a long short-term memory network (LSTM). The CNN is responsible for extracting local spatial features of multi-satellite clock biases (such as inter-satellite correlations), while the LSTM captures long-term temporal dependencies, thus balancing spatial feature extraction and temporal modeling. Experiments show that the CNN–LSTM model outperforms traditional methods in short-, medium-, and long-term prediction. Huang et al. [15] proposed a supervised-learning-based LSTM algorithm for predicting navigation satellite clock bias. A supervised learning mechanism was introduced to guide network training with labeled data (e.g., historical true clock-bias values), enhancing the model’s ability to capture nonlinear features and thereby improving prediction accuracy. Tan et al. [16] proposed a short-term satellite clock-bias prediction method based on complementary ensemble empirical mode decomposition (CEEMD) and a quadratic polynomial model. This method uses CEEMD to decompose the satellite clock-bias time series and extract components at different frequency bands, then fits and predicts each component using a quadratic polynomial model. Experimental results indicate that this method achieves high accuracy and stability in short-term satellite clock-bias prediction.
Despite the promising results achieved in previous studies, current models still struggle with suboptimal prediction accuracy and long-term stability. Specifically, under the complex operational conditions of the BDS-3 satellite system, both traditional physical models and standalone LSTM networks exhibit limited efficacy in clock-bias forecasting. To address these challenges, this paper proposes a novel hybrid architecture, the Mamba-LSTM model, which integrates Mamba [17,18] with an LSTM network for high-precision BDS-3 satellite clock bias prediction. This approach aims to overcome the limitations of existing methods by fusing adaptive sequence modeling with deep-learning-based temporal feature extraction. By employing dynamic selection mechanisms, Mamba can adaptively accommodate varying data characteristics. Furthermore, when combined with the LSTM network, the proposed hybrid model captures long-term dependencies while maintaining strong adaptability, with the aim of delivering higher prediction accuracy and robustness for complex time-series data.
The prediction task addressed in this paper is defined as follows. The model takes as input the historical SCB time series after preprocessing, which comprises first-order differencing, gross error detection and correction using the MAD method, and Min-Max normalization. A sliding window strategy is employed to perform epoch-by-epoch prediction, with two main prediction horizons: 12 h and 24 h. The output of the model consists of the predicted satellite clock bias values for each future epoch. All experiments are conducted on high-precision IGS final clock products, and performance is evaluated using the root mean square error (RMSE) with respect to the true clock bias values.
Specifically, the main contributions of this paper are as follows:
- (1) A novel Mamba-LSTM hybrid model is proposed, which combines the adaptive modeling capability of Mamba with the nonlinear feature extraction ability of LSTM, fully exploring the latent features within the data to improve SCB prediction accuracy.
- (2) Experiments conducted on the BDS-3 satellite clock bias dataset demonstrate the superior performance of the proposed model in clock-bias prediction.
- (3) Extensive experiments show that the Mamba-LSTM model has strong potential to enhance both the accuracy and stability of BDS-3 satellite clock-bias prediction, providing a new perspective for future research in satellite clock-bias forecasting.
The structure of this paper is arranged as follows: Section 2 introduces the theoretical foundations of the Mamba-LSTM model and its implementation; Section 3 provides a detailed description of the acquisition and preprocessing of the BDS-3 satellite clock bias data; Section 4 presents the prediction results based on the Mamba-LSTM model and their comparison with conventional methods; and finally, Section 5 summarizes the research findings and discusses directions for future work.
2. Principles of the Model
2.1. Fundamental Principles of the Mamba Model
Mamba is a novel sequence modeling architecture, as shown in Figure 1, with its core being the structured state space model (SSM). Unlike Transformer models that rely on self-attention mechanisms with quadratic complexity, Mamba implements an SSM to achieve linear time complexity, making it more efficient for processing long sequences. The basis of Mamba is to map a one-dimensional continuous input signal $x(t)$ through a hidden state $h(t)$ to an output $y(t)$. This process is described by the following linear ordinary differential equation (ODE):
$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) \tag{1}$$
Here, $A$, $B$, and $C$ are learnable parameter matrices.
To deploy this continuous-time system on modern computing hardware, it must be discretized. Mamba adopts the zero-order hold rule and introduces a learnable time-scale parameter $\Delta$, converting the continuous parameters $A$ and $B$ into discrete parameters $\bar{A}$ and $\bar{B}$. The discretized state-space system can be expressed as follows:
$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t \tag{2}$$
The core innovation of Mamba lies in introducing a selection mechanism. In traditional SSMs, the parameter matrices are fixed and unchanging. However, in Mamba, key parameters (such as $B$, $C$, and $\Delta$) are input-dependent. This means the model can dynamically adjust its parameters based on the current input $x_t$, enabling it to selectively focus on important information in the sequence and filter out irrelevant interference. This selectivity allows Mamba to more effectively compress and process sequence data, demonstrating outstanding performance on various long-sequence modeling tasks.
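The discretization and recurrence described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the configuration used in this paper: the state matrix is taken to be diagonal, so the standard zero-order-hold formulas $\bar{A} = \exp(\Delta A)$ and $\bar{B} = (\Delta A)^{-1}(\exp(\Delta A) - I)\,\Delta B$ reduce to element-wise operations; the function names `zoh_discretize` and `ssm_scan` and all parameter values are hypothetical; and the input-dependent selection of $B$, $C$, and $\Delta$ is left out for brevity.

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization for a diagonal state matrix.

    a: (N,) diagonal of A (entries must be nonzero), b: (N,) input vector B,
    delta: positive step size. Returns (a_bar, b_bar), each of shape (N,).
    """
    a_bar = np.exp(delta * a)
    # (delta*A)^-1 (exp(delta*A) - I) delta*B, written element-wise for diagonal A
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_scan(x, a, b, c, delta):
    """Run h_t = a_bar * h_{t-1} + b_bar * x_t and y_t = c . h_t over a 1-D input."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a_bar * h + b_bar * x_t
        y[t] = c @ h
    return y

# Toy parameters (illustrative only): a stable two-state system driven by a constant input
a = np.array([-1.0, -0.5])
b = np.array([1.0, 1.0])
c = np.array([0.5, 0.5])
y = ssm_scan(np.ones(5), a, b, c, delta=0.1)
```

Because the recurrence touches each sample once, the cost grows linearly with the sequence length, which is the property contrasted with self-attention above.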
2.2. Fundamental Principles of the LSTM Model
Hochreiter et al. [19] first proposed the LSTM model, which has unique advantages in time series data modeling. LSTM uses a cell to store the long-term state of time series data and consists of three gates: the input gate, forget gate, and output gate. Information is selectively passed at each gate. Figure 2 shows the structure of the LSTM network. The input gate determines how much of the model input will be saved to the cell state, and is implemented through Equation (3).
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad \tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \tag{3}$$
The current input $x_t$ and the previous hidden state $h_{t-1}$ serve as the inputs of the input gate; they are multiplied by the weight matrix, and the update information is then determined through the activation function.
The forget gate determines how much of the previous cell state will be forgotten; the remaining part is saved to the current cell. The related mathematical expressions are
$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{4}$$
The forget gate obtains input information from the current input $x_t$ and the previous hidden state $h_{t-1}$, and outputs a value between 0 and 1. A value of 1 means retaining all information; a value of 0 means discarding all information.
The output gate determines what content to output from the current cell state. The related mathematical expressions are
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(c_t) \tag{5}$$
First, the sigmoid layer determines which part of the cell state needs to be output. Next, the cell state is fed into the tanh layer, which outputs a value between −1 and 1. Finally, this value is multiplied by the output of the sigmoid layer.
In the above equations, $W$ is the weight coefficient matrix, $b$ is the bias vector, and $\sigma$ and tanh are the sigmoid and hyperbolic tangent activation functions, respectively. Additionally, $i_t$, $f_t$, $c_t$, and $o_t$ represent the input gate, forget gate, cell state, and output gate, respectively, and $\odot$ denotes element-wise multiplication.
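The gate computations described in this subsection can be illustrated by a single LSTM time step written in NumPy. This is a sketch in the standard LSTM formulation rather than the implementation used in this paper; the function name `lstm_step`, the stacked weight layout, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenation [h_prev, x_t] to four stacked pre-activations
    (input gate, forget gate, candidate state, output gate); b is the stacked bias.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i_t = sigmoid(z[:hidden])                     # input gate
    f_t = sigmoid(z[hidden:2 * hidden])           # forget gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell state
    o_t = sigmoid(z[3 * hidden:])                 # output gate
    c_t = f_t * c_prev + i_t * c_tilde            # element-wise cell update
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

# Toy usage: hidden size 4, scalar input, small random weights (illustrative values)
rng = np.random.default_rng(0)
hidden, n_in = 4, 1
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + n_in))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step; this makes a convenient sanity check.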
2.3. Construction of the Mamba-LSTM Model
This paper proposes a hybrid Mamba-LSTM model for the high-precision prediction of BDS-3 SCB. As an adaptive sequence modeling algorithm, Mamba effectively extracts key features across various data dimensions by dynamically adjusting its strategy based on inherent data characteristics. Meanwhile, the LSTM network excels at capturing long-term temporal dependencies. In this integrated framework, Mamba first processes the time-series data to adaptively filter and extract crucial SCB information. Subsequently, these refined features are fed into the LSTM network to model the deep dynamic characteristics and long-term dependencies of the SCB sequence. Consequently, when handling complex SCB signals, the Mamba-LSTM model demonstrates enhanced adaptability and feature extraction capabilities, ultimately yielding significantly more accurate and robust prediction results. The Mamba-LSTM model proposed in this paper aims to synergize the efficient adaptive sequence processing capabilities of Mamba with the powerful nonlinear, long-term dependency modeling of the LSTM network, achieving high-precision prediction of BDS-3 SCB. SCB time series are inherently characterized by complex nonlinearity, time-varying behaviors, and multi-source noise. While an independent LSTM network can effectively capture long-term dependencies, it often struggles when processing ultra-long sequences. Conversely, although Mamba is highly efficient, its nature as a structured state space model (SSM) generally necessitates complementary architectures to fully capture intricate dynamic features. To address these respective limitations, this study integrates Mamba with LSTM, thereby significantly enhancing the hybrid model’s global context awareness and overall predictive performance.
The Mamba-LSTM model we propose adopts an architecture where feature extraction and temporal modeling are connected in series, as shown in Figure 3. First, the preprocessed SCB time series $X = \{x_1, x_2, \ldots, x_T\}$ is input into the Mamba layer. Mamba utilizes its efficient linear complexity and input-dependent selection mechanism to scan long sequence data:
$$H_M = \mathrm{Mamba}(X;\, \theta_M)$$
where $\theta_M$ denotes a learnable parameter. At this stage, Mamba acts as a powerful adaptive feature extractor, which can selectively retain key information patterns based on the dynamic characteristics of the clock bias data, while filtering out redundant or noisy information, thereby generating a sequence representation rich in key temporal features. Next, the feature sequence extracted by the Mamba layer is used as input and passed to the subsequent LSTM network:
$$H_L = \mathrm{LSTM}(H_M)$$
As described in Section 2.2, LSTM is very adept at capturing and modeling nonlinear dynamics and long-term dependencies in data through its unique gating mechanisms (input gate, forget gate, output gate). Finally, the hidden state of the LSTM network passes through a fully connected layer (Linear layer) to output the final SCB prediction value:
$$\hat{y} = \mathrm{Linear}(H_L)$$
Thus, the complete Mamba-LSTM hybrid model is compactly expressed as follows:
$$\hat{y} = \mathrm{Linear}\left(\mathrm{LSTM}\left(\mathrm{Mamba}(X;\, \theta_M)\right)\right)$$
In this way, the Mamba-LSTM model fully leverages the advantages of both architectures: Mamba is responsible for efficiently and adaptively purifying and compressing long sequence features, while LSTM performs deeper nonlinear and long-term dependency modeling on this basis. The specific parameters of the Mamba-LSTM model are shown in Table 1. This design aims to overcome the limitations of single models, enabling the combined model to predict complex SCB sequences more accurately and robustly.
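To make the series connection concrete, the following NumPy sketch chains a toy selective state-space scan, a single-layer LSTM, and a linear output. It is only a schematic of the data flow under strong simplifying assumptions: the selective scan uses a diagonal state matrix with $B$, $C$, and the step size computed as simple functions of the current input, which stands in for (and is much simpler than) a real Mamba block; all names (`selective_ssm`, `lstm_last_hidden`, `mamba_lstm_forward`), sizes, and random weights are hypothetical and untrained.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_ssm(x, a, w_b, w_c, w_delta):
    """Toy selective scan over a 1-D sequence with a diagonal state matrix."""
    h = np.zeros_like(a)
    out = np.empty(len(x))
    for t, x_t in enumerate(x):
        delta = softplus(w_delta * x_t)     # input-dependent step size (> 0)
        b_t, c_t = w_b * x_t, w_c * x_t     # input-dependent B and C
        a_bar = np.exp(delta * a)           # zero-order-hold discretization
        b_bar = (a_bar - 1.0) / a * b_t
        h = a_bar * h + b_bar * x_t
        out[t] = c_t @ h
    return out

def lstm_last_hidden(seq, W, b, hidden):
    """Run a single-layer LSTM over a 1-D feature sequence; return the last hidden state."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for s_t in seq:
        z = W @ np.concatenate([h, [s_t]]) + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in (0, 1, 3))
        g = np.tanh(z[2 * hidden:3 * hidden])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def mamba_lstm_forward(x, p):
    """Series connection: selective SSM features -> LSTM -> linear output."""
    feats = selective_ssm(x, p["a"], p["w_b"], p["w_c"], p["w_delta"])
    h_last = lstm_last_hidden(feats, p["W"], p["b"], p["hidden"])
    return float(p["w_out"] @ h_last + p["b_out"])

rng = np.random.default_rng(0)
hidden, state = 8, 4
params = {
    "a": -np.arange(1.0, state + 1.0),
    "w_b": rng.normal(size=state), "w_c": rng.normal(size=state), "w_delta": 1.0,
    "W": rng.normal(scale=0.1, size=(4 * hidden, hidden + 1)), "b": np.zeros(4 * hidden),
    "hidden": hidden, "w_out": rng.normal(size=hidden), "b_out": 0.0,
}
window = rng.uniform(size=60)   # one normalized input window (60 epochs, illustrative)
y_hat = mamba_lstm_forward(window, params)
```

In practice the three stages would be trained jointly by backpropagation in a deep-learning framework; the sketch only shows how the output of one stage becomes the input of the next.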
3. Data Processing and Evaluation Methods
3.1. Data Preprocessing
The input data for this study comprises the IGS final precise clock products at a 30 s interval. These products are fundamentally estimates derived from the self-consistent adjustment model of the global GNSS network, subject to various influencing factors such as orbit modeling errors, ionospheric and tropospheric delays, and receiver noise. Although the ionosphere-free linear combination utilized by the IGS significantly mitigates first-order ionospheric and plasmaspheric effects, residual higher-order terms can still introduce centimeter-level biases. The preprocessing of the input data primarily involves the following three steps:
First-order differencing: SCB time series inherently exhibit non-stationarity. To enhance sequence smoothness and facilitate the extraction of complex nonlinear features, we apply first-order differencing to the original SCB data. This operation improves data stationarity, which consequently reduces model complexity and enhances overall prediction accuracy [20]. The sequence used for modeling after differencing is $\Delta x_i = x_{i+1} - x_i$.
Gross error detection and repair: Severe gross errors can affect the accuracy of clock bias prediction. The Median Absolute Deviation (MAD) method [21] is used to detect and remove gross errors. The MAD is calculated as follows:
$$\mathrm{MAD} = \mathrm{median}\left(\left|x_i - m\right|\right)$$
where $m$ is the median of the sequence and the threshold $n$ is set to 3. If $\left|x_i - m\right| > n \cdot \mathrm{MAD}$, the data point is marked as a gross error. For the removed gross errors, cubic spline interpolation can be used to fill them in.
Data normalization: Data normalization can be applied to avoid the impact of different dimensions of feature quantities and target values on prediction performance, accelerate gradient descent during network training, and improve the convenience of model processing. This paper employs Min-Max normalization,
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
which maps the data to the interval [0, 1].
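The three preprocessing steps above can be sketched as a small NumPy pipeline. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the MAD is used in its plain form (median of absolute deviations from the median, with no consistency scale factor), and linear interpolation (`np.interp`) is swapped in for the cubic spline interpolation mentioned in the text to keep the example dependency-free.

```python
import numpy as np

def first_difference(scb):
    """First-order difference d_i = x_{i+1} - x_i of a clock-bias series."""
    return np.diff(np.asarray(scb, dtype=float))

def mad_repair(x, n=3.0):
    """Flag points with |x_i - median| > n * MAD and refill them.

    Assumption: linear interpolation stands in for cubic spline interpolation.
    """
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    bad = np.abs(x - m) > n * mad
    idx = np.arange(len(x))
    repaired = x.copy()
    repaired[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return repaired, bad

def min_max_scale(x):
    """Map x to [0, 1]; also return (x_min, x_max) for later denormalization."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

# Toy clock-bias series (arbitrary units) whose fourth increment is a gross error
increments = [0.10, 0.11, 0.09, 5.00, 0.10, 0.12, 0.10]
scb = np.concatenate([[0.0], np.cumsum(increments)])
diffs = first_difference(scb)
clean, flagged = mad_repair(diffs)
scaled, (x_min, x_max) = min_max_scale(clean)
```

Keeping `x_min` and `x_max` from the normalization step matters because the same two values are needed later to map the network output back to physical units.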
3.2. Network Model Training and Prediction
To ensure the reliability and generalization capability of the model, all datasets in this study are univariate time series consisting solely of SCB values; therefore, no class imbalance issue exists. Accordingly, a time-series-specific data splitting strategy is adopted, as follows:
Training set: the complete data from the first day (2880 epochs) is used for model parameter optimization;
Validation set: the last 20% of the training data are selected in chronological order for hyperparameter tuning;
Test set: the completely independent data from the following day (2880 epochs) are used for final performance evaluation.
Model Structure Design and Parameter Settings: The LSTM model designed in this paper consists of an input layer, hidden layers, and an output layer. The number of neurons in the input layer equals the number of input data points. The hidden layers consist of 2 LSTM layers with 32 hidden nodes each, and each LSTM layer is followed by a dropout layer. A dropout rate of 0.2 is applied during training to prevent overfitting. Figure 4 shows the specific LSTM model framework design. Table 2 describes the specific parameter settings of the LSTM model. This paper adopts a sliding window approach for sample generation and prediction. The window size is 60, and the slide step is 1, as shown in Figure 5.
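The sliding-window sample generation can be sketched as follows, using the window size of 60 and step of 1 stated above. The function name `make_windows` is hypothetical, and the sketch assumes a one-step-ahead target (each window is paired with the value that immediately follows it), which is one plausible reading of epoch-by-epoch prediction rather than a confirmed detail of the authors' setup.

```python
import numpy as np

def make_windows(series, window=60, step=1):
    """Build (inputs, targets): each input holds `window` consecutive values
    and the target is the value that immediately follows the window."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window, step)
    X = np.stack([series[s:s + window] for s in starts])
    y = np.array([series[s + window] for s in starts])
    return X, y

# Toy usage on a ramp of 100 values
X, y = make_windows(np.arange(100.0))
```

With one day of 30 s data (2880 epochs) and a window of 60, this scheme yields 2820 input/target pairs before any differencing shortens the series.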
3.3. Data Post-Processing
After completing the LSTM model prediction, the predicted values are obtained and then subjected to denormalization and inverse first-order differencing to obtain the final predicted SCB sequence. The experimental results and analysis will be detailed in the next section.
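The two post-processing steps, denormalization followed by inverse first-order differencing, can be sketched as below. The function names are hypothetical, and the sketch assumes that the Min-Max bounds saved during preprocessing and the last known SCB value before the prediction interval are available.

```python
import numpy as np

def denormalize(scaled, x_min, x_max):
    """Invert Min-Max scaling using the bounds saved during preprocessing."""
    return np.asarray(scaled, dtype=float) * (x_max - x_min) + x_min

def restore_scb(pred_scaled, x_min, x_max, last_scb):
    """Undo the scaling, then accumulate the predicted differences
    starting from the last known clock-bias value."""
    diffs = denormalize(pred_scaled, x_min, x_max)
    return last_scb + np.cumsum(diffs)

# Toy usage with illustrative numbers
pred = restore_scb([0.0, 0.5, 1.0], x_min=-1.0, x_max=1.0, last_scb=10.0)
```

Because the differences are accumulated, an error in an early predicted difference carries over to every later epoch, which is one reason prediction error tends to grow with the horizon.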
3.4. Evaluation Methodology
This paper uses the post-processed precise clock offset products provided by IGS as the data source to ensure the quality and reliability of the experimental data. These products are widely used globally because of their high precision, which supports the credibility of the model evaluation. The experimental design is as follows: data from the previous day (20 July 2025) are used for training, and the trained model is used to predict the SCB data for the next day (21 July 2025). The data time interval is 30 s, covering a total of 5760 epochs.
To analyze and evaluate the model’s predictive performance in depth, we compared the actual SCB data provided by IGS with the model’s predicted values. In this evaluation, we use the root mean square error (RMSE) as the metric to assess prediction accuracy:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2}$$
where $\hat{x}_i$ represents the clock bias of the $i$-th epoch predicted by the model, $x_i$ represents the actual clock bias of the $i$-th epoch provided by IGS, and $n$ represents the total number of predicted epochs.
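The RMSE metric defined above translates directly into code. The function name `rmse` and the sample values are illustrative only.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and reference clock biases."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Toy usage: only the last of three epochs is off, by 2 units
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Since the errors are squared before averaging, a few large deviations weigh more heavily than many small ones, so RMSE is sensitive to occasional prediction outliers.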
5. Conclusions and Future Direction
Focusing on the BDS-3 satellite clock bias (SCB) time series, this study proposes a novel Mamba-LSTM hybrid model. By integrating the distinct strengths of both the Mamba architecture and the LSTM network, our approach enables high-precision SCB prediction. Extensive comparative experiments and analyses demonstrate that the proposed Mamba-LSTM method significantly enhances both the accuracy and stability of clock bias forecasting. In summary, our method exhibits the following key advantages:
- (1) Compared with single models (such as the Mamba and LSTM methods), the Mamba-LSTM combined model shows significant improvements in the stability and accuracy of satellite clock bias prediction.
- (2) The prediction errors of traditional neural network methods (such as BP and CNN methods) increase rapidly as the prediction time extends, whereas the Mamba-LSTM method has a significant advantage in controlling the accumulation of prediction errors over time, making it more suitable for medium- and long-term predictions.
The Mamba model and LSTM network are effective tools for processing time series data. For typical time series data processing problems, we combined the advantages of both, achieved effective application in satellite clock bias prediction, and obtained good results. This work makes a beneficial attempt at in-depth research on satellite clock bias prediction problems and provides new ideas for further research in this field.
The method proposed in this paper still has some aspects that can be further studied and improved:
- (1) Fusion of the proposed method with other methods can be explored to further improve its prediction performance.
- (2) The model’s computational complexity is slightly higher than that of a single LSTM. Optimization methods for hyperparameter selection and training need further study to enhance the computational efficiency of this method.
- (3) Space weather phenomena (such as solar activity cycles and magnetic storms) can significantly alter the thermal environment of the ionosphere and satellites. During intense magnetic storms, a decline in Precision Orbit Determination (POD) accuracy and unaccounted-for higher-order ionospheric delays often manifest as high-frequency noise or sudden anomalies in IGS apparent clock deviation estimates. The robustness of the proposed model under such extreme space weather disturbances has not yet been fully verified; future work could systematically investigate the impact of strong magnetic storms on the model.