Article

A Solar Array Temperature Multivariate Trend Forecasting Method Based on the CA-PatchTST Model

1 Key Laboratory of Electronic Information Countermeasure and Simulation Technology of Ministry of Education, Xidian University, Xi’an 710071, China
2 School of Aerospace Science and Technology, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(23), 7199; https://doi.org/10.3390/s25237199
Submission received: 26 September 2025 / Revised: 5 November 2025 / Accepted: 22 November 2025 / Published: 25 November 2025
(This article belongs to the Section Electronic Sensors)

Abstract

System reliability, which is essential for the normal operation of satellites in orbit, is decisively governed by the performance of the solar array, making accurate temperature forecasting of the solar array imperative for predictive maintenance and autonomous power-system management. Forecasting relies on temperature telemetry data, which provide comprehensive thermal information. The task remains challenging: temperature sequences are high-dimensional and long-horizon with inherent cross-variable coupling, and their dynamics exhibit nonlinear and non-stationary behavior owing to orbital transitions and varying operational modes. In this context, multi-step forecasting is essential, as it better characterizes the long-term dynamics of temperature and provides forward-looking trends beyond the capability of single-step forecasting. To tackle these issues, we propose a solar array temperature multivariate trend forecasting method based on the Cross-Attention Patch Time Series Transformer (CA-PatchTST). Specifically, we decompose the temperature variables into trend and residual components using a moving average filter to suppress noise and highlight the dominant component. The PatchTST backbone then extracts local features and long-term dependencies of the trend and residual components separately through patching encoders and a channel-independent mechanism. A cross-attention mechanism is designed to capture the correlations between temperature variables of different devices in the solar array. Extensive experiments on a real solar array temperature dataset demonstrate that CA-PatchTST surpasses mainstream baselines in root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), with ablation studies further confirming the complementary roles of sequence decomposition and cross-attention.

1. Introduction

In recent years, communication satellite constellation technology has developed rapidly. Leveraging its capabilities for large-scale deployment and collaborative operation, such constellations provide crucial support for achieving global coverage and low-latency communication services, offering significant strategic and commercial value. However, satellites operate in harsh space environments over extended periods, where they are continuously exposed to challenges such as space radiation, debris impacts, and extreme temperature fluctuations [1,2]. Furthermore, the growing complexity of their systems significantly elevates the risk of on-orbit failures [3]. Statistical analyses reveal that power system failures account for the largest share of satellite malfunctions, with anomalies related to the solar array comprising about 42% of all power system failures [4]. Excessive heating or frequent alternating thermal cycles can accelerate material aging and performance degradation of the solar array, which in turn affects the stability of the power supply and threatens the overall mission reliability of the satellite [5]. As the primary source of onboard power, the solar array is thus a decisive factor in ensuring the normal operation and successful execution of satellite functions.
Given this critical role, accurate trend forecasting of multivariate telemetry parameters characterizing the thermal state of solar array has become an indispensable technology for enabling the dynamic health management of satellite constellations, enhancing autonomous operational capabilities, and extending the effective service life of spacecraft [6]. High-precision forecasting enables early detection of incipient performance degradation, supports predictive maintenance, and allows for proactive operational adjustments that can prevent cascading failures [7].
However, achieving such predictive accuracy is far from trivial. Satellite telemetry data are characterized by high dimensionality, long temporal sequences, and complex coupling relationships among parameters [8]. Moreover, the temperature variation patterns of solar array are influenced by a combination of factors such as fluctuating space environments, the periodic orbital motion of satellites, and changes in the operating modes of onboard equipment. This interplay introduces strong nonlinearity and non-stationarity into the temperature data, making straightforward modeling approaches inadequate for capturing the underlying patterns [9,10]. Current methods commonly assume reliable priors or stationarity and often degrade under orbital-phase transitions. Moreover, data-driven approaches struggle with long-horizon robustness and cross-variable coupling amid noise, missingness, and phase-dependent shifts.
In addition, the temperature series are periodic and intricately coupled across devices, so single-step forecasting is too myopic for health management and preventive maintenance [11]. By contrast, multi-step forecasting is expected to reveal long-term trends and phase-dependent behaviors, yet existing approaches often degrade over extended horizons due to error accumulation and exposure bias in recursive decoding, distributional shifts across orbital day–night phases, and under-modeled cross-variable coupling among structural groups. Methods built on stationarity assumptions or pointwise attention further struggle with non-stationary, multi-scale dynamics. These limitations manifest as lag across structurally coupled sensors and cumulative drift over long horizons, undermining the reliability required for on-orbit health management.
To achieve high-precision multi-step forecasts, this paper proposes a method that learns intrinsic patterns of long-term solar array temperature data and captures cross-variable dynamic correlations. In response to the above issues, we propose a method of solar array temperature multivariate trend forecasting based on CA-PatchTST, which mitigates long-horizon non-stationarity, models dynamic cross-variable coupling, and captures long-range temporal dependencies. The main contributions of this paper are outlined as follows:
  • A moving-average sequence decomposition separates the temperature series into trend and residual components, suppressing noise and mitigating non-stationarity before encoding;
  • A patch-based, channel-independent PatchTST encoder enhances local pattern extraction, lowers computational cost, and preserves long-range temporal patterns;
  • Cross-attention across devices captures inter-variable correlations and enables complementary information exchange among multivariate telemetry channels;
  • Extensive experiments on real-world GOCE satellite temperature telemetry data validate CA-PatchTST across multiple forecast horizons. The results show consistent and significant improvements over state-of-the-art baselines, with superior performance in RMSE, MAE, and MAPE, underscoring the model’s accuracy, robustness, and practical utility.
The remainder of this paper is organized as follows. Section 2 provides a systematic review and analysis of existing time-series forecasting methods, highlighting their strengths and limitations in handling satellite temperature data and laying the theoretical foundation for the proposed model. Section 3 introduces the solar array temperature telemetry data and the proposed CA-PatchTST model for multivariate trend forecasting. Section 4 describes the dataset, parameter settings, and experimental analysis. Section 5 summarizes the paper, presents the main conclusions, and offers insights for future research.

2. Related Work

2.1. Traditional Time-Series Trend Forecasting Methods

Time-series trend forecasting is critical for infrastructure management, enabling proactive maintenance and operational optimization. In satellite systems, specifically, predicting the condition of subsystems like solar array is vital for preventing performance degradation [12], yet the high dimensionality, strong inter-variable coupling, and long-range dependencies in telemetry data make the selection of appropriate methods uniquely challenging [13]. Existing trend forecasting techniques for time-series data are typically grouped into three categories: physical modeling approaches, statistical models, and data-driven models.
Physical modeling methods construct mathematical representations grounded in domain-specific knowledge and are best suited to systems whose governing laws are clearly defined and tractable. Haupt et al. [14] proposed a probabilistic forecasting framework that integrates physics-based modeling with machine learning techniques; in the context of renewable energy generation prediction, they effectively enhanced the robustness and generalization capability of predictions by combining physical models with algorithms such as random forests. Mackey and Kulikov [15] proposed a spacecraft telemetry forecasting method that leverages physics-based simulation as the primary predictor and enhances its accuracy by modeling residuals with autoregressive techniques and applying data-driven transformations, enabling more reliable real-time forecasting under uncertain system behavior. However, the highly complex, nonlinear, and strongly coupled thermal dynamics of satellite solar arrays, influenced by orbital and environmental interactions, make it extremely difficult to derive accurate physics-based models, limiting their applicability.
Statistical model-based time-series trend forecasting analyzes the probabilistic and statistical characteristics of systems, offering relatively high interpretability. The Autoregressive Moving Average (ARMA) model is a classic and well-established approach for time-series analysis [16], modeling stationary data through autoregressive and moving average components. Zhang et al. [17] incorporated Bayesian statistical principles into the robust fitting process of the ARMA model, thereby enhancing its stability and accuracy in satellite clock bias prediction. In addition, Long et al. [18] adopted the Prophet model, which decomposes time-series into trend, seasonality, and holiday effects, for medium-term and long-term electricity load forecasting. This approach improves modeling flexibility while maintaining good interpretability. However, these methods rely on linearity and stationarity assumptions, which prevent them from effectively capturing the pronounced nonlinear, non-stationary, and time-varying dependencies of solar array temperatures under variable operational conditions, resulting in reduced predictive accuracy.

2.2. Time-Series Trend Forecasting Methods Based on Deep Learning

In recent years, driven by advances in deep learning, data-driven trend forecasting methods have demonstrated significant advantages. The Gated Recurrent Unit (GRU) employs update and reset gates to control information flow, enabling efficient modeling of sequential data and the capture of long-range temporal dependencies. Cai et al. [19] employed a Residual Convolutional Neural Network–Simple Recurrent Unit (Res-CNN-SRU) model for industrial Internet intrusion detection on a gas pipeline dataset; by combining the strong local feature extraction capability of the CNN with the fast recurrent modeling of the SRU, the approach achieved high detection accuracy, low false-alarm rates, and significantly reduced training time compared to other RNN-based methods. SegRNN [20] integrates segmentation into recurrent neural networks to capture long-term dependencies in time-series. Wang et al. [21] integrated adaptive shrinkage processing with a Temporal Convolutional Network (AS-TCN) for rolling-bearing RUL prediction, using multichannel vibration signals and outperforming strong baselines on standard industrial benchmarks. Zeng et al. proposed DLinear [22], a decomposition-based linear framework for time-series forecasting, which separates trend and seasonal components to achieve efficient and effective long-term forecasting.

2.3. Transformer

The Transformer architecture has emerged as a powerful framework for time-series modeling, primarily due to its self-attention mechanism, which captures dependencies across arbitrary positions in a sequence. Unlike traditional recurrent models that struggle with long-range interactions and suffer from exploding gradients, Transformer directly models global dependencies through parallelizable attention operations [23]. As a result, Transformer is especially effective for forecasting tasks that involve high-dimensional variables and extended temporal dependencies, including applications in industrial monitoring. Moreover, the flexibility of its attention mechanism enables dynamic weighting of relevant temporal features, providing strong adaptability to non-stationary and multi-scale patterns commonly present in complex real-world time-series [24].
Building on this foundation, a range of Transformer variants have been proposed to further enhance performance in specific domains. For instance, Cuéllar et al. [25] applied an explainable anomaly detection method to spacecraft telemetry data, where the model leverages transformer-based attention modules combined with feature attribution to uncover both global and local temporal dependencies. Tested on multiple mission datasets, the approach improved anomaly detection performance, revealing better interpretability and earlier fault detection compared to baseline methods. Similarly, Yang et al. [26] applied Informer to motor-bearing vibration forecasting, where the ProbSparse self-attention mechanism enables efficient modeling of long sequences with reduced computational cost. Evaluated on multiple bearing datasets, Informer achieved state-of-the-art results and demonstrated strong potential for predictive maintenance tasks.
Attention mechanisms are increasingly employed in deep learning to selectively emphasize relevant input features and capture long-range dependencies and complex interactions. They have been widely utilized across domains such as natural language processing, computer vision, and time-series forecasting. The Squeeze-and-Excitation (SE) block provides lightweight channel-wise recalibration within each variable, strengthening salient signals and suppressing noise. Qin et al. [27] integrated an SE-based channel-attention module into a CNN–GRU hybrid for short-term distribution-network load forecasting, achieving higher accuracy and more robust peak–valley tracking on real data than baseline models. The cross-attention mechanism further extends the capability of attention by enabling information exchange across different variables, which is particularly advantageous for modeling inter-variable dependencies in multivariate time-series. Zhang et al. [28] proposed Crossformer, a Transformer-based model with cross-dimension attention that effectively captures both intra-variable temporal dependencies and inter-variable correlations, achieving superior performance on multivariate forecasting tasks.
However, the aforementioned methods still face challenges in long-sequence, high-dimensional trend forecasting tasks for critical satellite components such as solar array. Statistical models are limited by linear assumptions and struggle to characterize complex abrupt changes and non-stationary features. Recurrent neural network-based forecasting models suffer from low computational efficiency and issues such as gradient explosion and vanishing gradients when applied to long sequences [29]. Although the Transformer can parallelize the modeling of global dependencies, its pointwise attention calculation neglects local features within continuous segments and incurs quadratic computational complexity [30]. Furthermore, most existing architectures focus on univariate modeling approaches and lack the capability to capture dynamic coupling relationships among multivariate features.

3. Methodology

Figure 1 presents the proposed multivariate trend forecasting method for solar array temperatures based on the CA-PatchTST model. It begins by characterizing solar array temperature telemetry data, which exhibit clear trends, orbital periodicity, and random fluctuations driven by factors such as orbital cycles, solar radiation, and equipment operational states. To address these mixed patterns, moving average decomposition is applied to separate the original sequences into trend and residual components, effectively isolating low-frequency trends from high-frequency fluctuations and thereby enhancing modeling stability. The PatchTST model is employed to extract local features and long-term dependencies from both components through patch-based encoding and channel-independent mechanisms. A cross-attention mechanism is introduced, where the target device sequence serves as the Query and sequences from other devices act as Key and Value, thereby capturing dynamic coupling relationships among temperature variables across different devices. Finally, the predictions from the trend and residual branches are fused via weighted summation to produce multi-step temperature forecasts. By effectively integrating data characterization, sequence decomposition, local feature extraction, and cross-variable interaction, the proposed method significantly enhances the accuracy and robustness of long-term temperature forecasting.

3.1. Solar Array Temperature Telemetry Data

Solar array temperature telemetry data are key state parameters reflecting the operational status of satellite systems. Collected in real time by high-precision onboard sensors and transmitted via ground data links, the temperature series of the solar array exhibit pronounced periodic fluctuations due to the repeated transition between sunlight and Earth shadow during each 90 to 100 min orbit at an altitude of 500 to 1000 km [31]. This paper focuses on the solar array system of the GOCE satellite, which incorporates 16 temperature parameters at different locations: solar wing body temperature (2 parameters: THT10000/THT10001), BMSP structure temperature (8 parameters: THT10002-THT10019), and interface bracket temperature (6 parameters: THT10008-THT10024).
As illustrated in Figure 2, the temperature curves of different GOCE solar array components clearly demonstrate this orbital periodicity. Beyond orbital effects, the data are further influenced by solar activity cycles, seasonal illumination changes, and satellite attitude maneuvers, leading to strong non-stationarity, multi-scale variations, and nonlinear behaviors. To quantify the spatiotemporal correlations observed among measurements from different sensors—which reflect both the intrinsic thermal conductivity of the array structure and the overall in-orbit thermal environment—we employ the Pearson correlation coefficient (PCC). This metric measures the linear dependence between pairs of temperature parameters [32]. For two temperature time-series  X  and  Y , each containing  n  samples, the Pearson coefficient  r  is defined as:
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, $$
where  X ¯  and  Y ¯  are the mean values of  X  and  Y , respectively. The value of  r  ranges from −1 to 1, with values near 1 indicating strong positive linear correlation, values near −1 indicating strong negative linear correlation, and values near 0 indicating little to no linear relationship. This method is chosen for its interpretability and widespread use in quantifying linear associations in multivariate telemetry data, thereby providing a rigorous foundation for assessing thermal coupling relationships and supporting trend forecasting among interdependent sensor readings.
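As a concrete illustration, the Pearson coefficient above can be computed directly. A minimal NumPy sketch, with synthetic sinusoidal channels standing in for real telemetry (the values and sampling interval are illustrative, not from the GOCE dataset):

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Two synthetic "temperature" channels sharing an orbital-period sinusoid
t = np.arange(0, 600, 10.0)                 # minutes, 10-min sampling
ch1 = 20 + 5 * np.sin(2 * np.pi * t / 90)   # ~90-min orbital cycle
ch2 = 18 + 4 * np.sin(2 * np.pi * t / 90)   # same phase, different scale
r = pearson_r(ch1, ch2)                     # near 1: strong positive linear correlation
```

Since the two channels are exact linear transforms of each other, the coefficient equals 1 up to floating-point error, matching `np.corrcoef`.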
We first organize the multivariate solar array temperature sequences by device. Let $X_t \in \mathbb{R}^N$ denote the $N$-channel telemetry at time $t$, whose channels are partitioned into three disjoint device groups:
$$ I = I_{\text{BMSP}} \cup I_{\text{Bracket}} \cup I_{\text{Body}}, \qquad I_a \cap I_b = \varnothing \ \ \forall\, a \neq b, $$
where $I$ is the full index set of temperature channels; $I_{\text{BMSP}}$, $I_{\text{Bracket}}$, and $I_{\text{Body}}$ are the subsets for the BMSP, Bracket, and Body components, respectively; and $I_a$ denotes any one of these with $a \in \{\text{BMSP}, \text{Bracket}, \text{Body}\}$. These three subsets form a partition of $I$: their union equals the whole set and they are pairwise disjoint, so each channel belongs to exactly one component.
For each target device  v { BMSP , Bracket , Body } , we build two multivariate subsequences: a main component formed by channels of the target device and a cross component formed by channels of the remaining devices.
$$ X_t^{(v,\text{main})} = X_t[I_v], \qquad X_t^{(v,\text{cross})} = X_t[I \setminus I_v], $$
where $I_v$ is the index subset for device $v$; $X_t^{(v,\text{main})} = X_t[I_v]$ selects the target-device channels, whereas $X_t^{(v,\text{cross})} = X_t[I \setminus I_v]$ selects the complementary channels.
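The partition and the main/cross selection can be sketched as follows. The index layout below is a hypothetical assignment of the 16 channels to the three groups, chosen only to match the group sizes stated above, not the mission's actual channel mapping:

```python
import numpy as np

# Hypothetical channel layout: 16 channels split into three device groups
IDX = {
    "BMSP":    list(range(0, 8)),    # 8 BMSP structure channels
    "Bracket": list(range(8, 14)),   # 6 interface-bracket channels
    "Body":    list(range(14, 16)),  # 2 solar-wing body channels
}

def split_main_cross(X: np.ndarray, target: str):
    """Select target-device channels (main) and the complement (cross)."""
    main_idx = IDX[target]
    cross_idx = [i for g, idx in IDX.items() if g != target for i in idx]
    return X[:, main_idx], X[:, cross_idx]

X = np.random.randn(100, 16)              # (time, channels) telemetry matrix
main, cross = split_main_cross(X, "Bracket")
# main holds the 6 Bracket channels; cross the remaining 10 channels
```

Because the three index sets are pairwise disjoint and cover all 16 channels, every channel lands in exactly one of the two selections for any target group.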
To effectively capture these complex dynamics, the multivariate temperature sequence $X_t$ is decomposed into a long-term trend component $\tau_t$, which embodies orbital cycles and slowly varying environmental effects, and a residual component $\varepsilon_t$, which accounts for short-term fluctuations, noise, and irregular variations:
$$ X_t = \tau_t + \varepsilon_t, $$
This decomposition disentangles mixed patterns, mitigates the impact of non-stationarity, and enables more accurate modeling by allowing predictive models to simultaneously learn global evolution and local perturbations of solar array temperature dynamics.

3.2. CA-PatchTST

This section presents the proposed CA-PatchTST model, a novel framework designed for multivariate temperature forecasting of solar array. The model integrates a PatchTST backbone for temporal feature extraction and a cross-attention mechanism that facilitates information interaction among variables. The overall architecture comprises four key components: temperature series decomposition, PatchTST-based encoding, cross-attention-driven feature fusion, and multi-step forecast generation.

3.2.1. Temperature Series Decomposition

The temperature parameters of the solar array are simultaneously affected by multiple factors such as orbital period, solar radiation, and equipment operating states, exhibiting a composite characteristic profile of distinct trend, periodicity, and random fluctuations. Therefore, decomposition of the temperature sequence serves as an effective means to disentangle these mixed patterns. Through decomposition operations, slow-changing overall trends can be effectively separated from short-term fluctuations. This helps to reduce the interference of noise and non-stationarity on model training and improve prediction stability; in parallel, it enables the model to capture long-term evolution patterns and local detail changes in temperature sequences, thereby more accurately reflecting the actual temperature behavior of solar array in complex spatial environments.
The telemetry time-series is decomposed into trend and residual components through moving average filtering [33], effectively isolating low-frequency trends from high-frequency fluctuations. Given a length-$t$ multivariate temperature time-series $X_t = (X_t^1, \ldots, X_t^N)$ with $X_t^d = (x_1^d, x_2^d, \ldots, x_t^d)$, where $d$ denotes the $d$-th of the $N$ temperature parameters, the trend component $\tau_t^d$ is calculated by applying a moving average operation with a fixed-size kernel $k$, which extracts the low-frequency smoothed part of the sequence and captures its long-term trend:
$$ \tau_i^d = \frac{1}{k} \sum_{j=-(k-1)/2}^{(k-1)/2} x_{i+j}^d, \qquad i = 1, 2, \ldots, t, $$
where $j$ is the summation index traversing the symmetric window around time step $i$, and $x_{i+j}^d$ denotes the value of the $d$-th variable at offset $j$ within the window.
The residual component  ε t d  is obtained by subtracting the trend from the original sequence, thereby highlighting short-term fluctuations and disturbances not captured by the trend.
$$ \varepsilon_t^d = X_t^d - \tau_t^d, $$
The fixed-size kernel  k  in the moving average filter was selected based on the sampling rate and the physical periodicity of the GOCE satellite. Since the temperature telemetry is sampled every 10 min and the satellite completes one orbit in approximately 90 min, one orbital cycle corresponds to about nine sampling intervals. Therefore,  k = 9  was adopted so that the filter window matches one complete thermal–orbital cycle. This choice allows the moving average operation to capture the slowly varying orbital trend while suppressing high-frequency fluctuations caused by short-term environmental disturbances. A window of this scale provides a balanced decomposition, ensuring that the trend component reflects the long-term orbital behavior, whereas the residual component retains finer intra-orbital variations.
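The decomposition above can be sketched with a centered 9-point moving average. How the series boundaries are handled is not specified in the text, so the edge-replication padding below is an assumption; by construction, trend plus residual reconstructs the original series exactly:

```python
import numpy as np

def decompose(x, k=9):
    """Split a series into trend (centered k-point moving average) and residual."""
    assert k % 2 == 1, "use an odd window so the average is centered"
    half = (k - 1) // 2
    padded = np.pad(x, half, mode="edge")   # replicate endpoints (assumed policy)
    kernel = np.ones(k) / k
    trend = np.convolve(padded, kernel, mode="valid")  # same length as x
    residual = x - trend                    # high-frequency remainder
    return trend, residual

# Synthetic telemetry: 10-min sampling, ~90-min orbital sinusoid plus noise
t = np.arange(0, 3000, 10.0)
x = 20 + 5 * np.sin(2 * np.pi * t / 90) + 0.3 * np.random.randn(t.size)
trend, residual = decompose(x, k=9)         # 9 samples * 10 min ≈ one orbit
```

With `k = 9` the window spans roughly one orbital cycle at the stated 10-min sampling rate, so the trend tracks the slow orbital envelope while the residual keeps intra-orbital fluctuations.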

3.2.2. PatchTST

In the task of solar array temperature forecasting, traditional Transformer-based time-series models can capture long-term dependencies, but they struggle to extract temperature variation patterns within local contiguous segments and are particularly sensitive to noise in solar array temperature telemetry data, which undermines predictive robustness. In addition, such models suffer from high computational complexity [34], especially for long-term forecasting problems, and their scalability deteriorates when applied to long-duration, multi-year satellite temperature series.
To address these issues, we tailor the PatchTST model, as shown in Figure 3, which introduces a patch segmentation strategy inspired by image processing, dividing the temperature sequence of the solar array into fixed-length local patches and using Transformer encoders for feature extraction within each patch. This method not only reduces the computational complexity of the attention mechanism and enhances the ability to capture local temperature fluctuation patterns but also maintains the ability to express global trends through feature combinations between patches. This design is particularly suitable for the periodic and local abrupt changes in temperature data of solar array, effectively improving the predictive performance and robustness of the model for long sequences.
Compared to the quadratic computational complexity $O(L^2 d)$ inherent in the standard Transformer when processing long sequences, PatchTST effectively reduces the computational burden through its patching strategy. Given an input sequence length $L$, a patch length $P$, and a patch stride $S$, the number of patches is $M = \lfloor (L - P)/S \rfloor + 1$. Since the self-attention mechanism operates on this sequence of patches, its computational complexity is reduced from $O(L^2 d)$ to $O(M^2 d)$.
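A quick arithmetic check of the patch-count formula, with illustrative lookback, patch, and stride values:

```python
def num_patches(L: int, P: int, S: int) -> int:
    """Number of sliding patches of length P with stride S over a length-L series."""
    return (L - P) // S + 1

# A 512-step lookback with 16-step patches at stride 8 yields 63 patches,
# shrinking the attention token count from 512 to 63.
M = num_patches(512, 16, 8)
```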
The model independently encodes and predicts the trend component and the residual component. For a multivariate time-series input $X \in \mathbb{R}^{B \times N \times L}$, where $B$ is the batch size, $N$ the number of variables, and $L$ the sequence length, PatchTST first divides each variable's time-series into overlapping patches using a sliding window. Each patch has a length of $P$ and a stride of $S$, resulting in $M$ segments after segmentation:
$$ X = \mathrm{unfold}(X) \in \mathbb{R}^{B \times N \times P \times M}, \qquad M = \left\lfloor \frac{L - P}{S} \right\rfloor + 1, $$
Each patch is linearly projected into a $d$-dimensional latent space, followed by the addition of a learnable positional encoding:
$$ Z = X W_p + E_{\text{pos}}, \qquad Z \in \mathbb{R}^{B \times N \times M \times d}, $$
where  W p  denotes the linear projection weight matrix,  E pos  represents the learnable positional encoding, and  Z  is the resulting latent representation. The dimension  d  indicates the embedding size, which defines the hidden representation dimension of each patch after projection.
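The unfold-and-project step can be sketched in NumPy as follows. All shapes are illustrative, the weights are random stand-ins for learned parameters, and the patches are arranged as (B, N, M, P) so the linear projection applies along the last axis:

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, L = 4, 16, 96        # batch, variables, lookback length (illustrative)
P, S, d = 16, 8, 32        # patch length, stride, embedding size
M = (L - P) // S + 1       # number of patches per variable

X = rng.standard_normal((B, N, L))

# Unfold each series into overlapping patches: (B, N, M, P)
starts = np.arange(M) * S
patches = np.stack([X[..., s:s + P] for s in starts], axis=2)

# Linear projection plus a learnable positional encoding: (B, N, M, d)
W_p = rng.standard_normal((P, d)) * 0.02    # stand-in for a trained weight
E_pos = rng.standard_normal((M, d)) * 0.02  # stand-in for learned positions
Z = patches @ W_p + E_pos                   # broadcast over batch and variables
```

Each of the M patch embeddings becomes one attention token, which is the source of the complexity reduction discussed above.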
PatchTST adopts a channel-independent modeling approach. Unlike traditional multivariate attention mechanisms, this method independently models each channel during the encoding stage. Each variable has its own separate patch sequence  X c  and generates its corresponding latent representation  Z c , which helps to avoid interference across variables.
$$ Z_c = X_c W_p + E_{\text{pos}}, \qquad Z_c \in \mathbb{R}^{B \times M \times d}, $$
PatchTST utilizes a standard Transformer encoder to process the sequence of patches. Each encoding layer consists of multi-head self-attention [35] and a feed-forward network. For the $l$-th layer, the computation is as follows:
$$ \tilde{Z}^{(l)} = \mathrm{LayerNorm}\big( Z^{(l-1)} + \mathrm{MultiHeadAttn}(Z^{(l-1)}) \big), $$
$$ Z^{(l)} = \mathrm{LayerNorm}\big( \tilde{Z}^{(l)} + \mathrm{FFN}(\tilde{Z}^{(l)}) \big), $$

3.2.3. Cross-Attention Mechanism

Although the PatchTST model can effectively extract time-series features of each individual variable, its modeling ability is limited when dealing with multi-source variables and auxiliary sequences. To fully capture the coupling relationships within multivariate temperature sequences, we introduce a cross-attention mechanism on the basis of the device grouping described in Section 3.1, enhancing feature complementarity across sequences.
For a target device group $V^{(t)}$, we define its multivariate time-series as the main component, denoted $Z_{\text{main}} \in \mathbb{R}^{B \times |V^{(t)}| \times M \times d}$, which serves as the primary sequence for prediction. The multivariate time-series from all other device groups, $V^{(c)}$, are concatenated to form the cross component, denoted $Z_{\text{cross}} \in \mathbb{R}^{B \times |V^{(c)}| \times M \times d}$, which provides contextual information. $|V^{(t)}|$ and $|V^{(c)}|$ denote the numbers of temperature channels in the target group and the combined cross groups, respectively. This design explicitly models the target device's response to the thermal state of the entire solar array system.
The PatchTST encoder first processes the main and cross components independently, yielding their respective patch-based feature representations  H main    and  H cross . In the cross-attention layer, the Query ( Q ) is derived exclusively from the main component’s representation  H main   , compelling the model to focus on and refine the prediction of the target device. Conversely, the Key ( K ) and Value ( V ) are projected from the cross component’s representation  H cross , which encapsulates the thermal dynamics of the auxiliary devices.
The cross-attention layer consists of multi-head cross-attention and a feed-forward network, as shown in Figure 4. The main sequence undergoes a linear transformation to generate the Query, while the cross sequences undergo linear transformations to generate the Key and Value; $W_q$, $W_k$, and $W_v$ are learnable projection matrices:
$$ Q = H_{\text{main}} W_q, \qquad K = H_{\text{cross}} W_k, \qquad V = H_{\text{cross}} W_v, $$
The cross-attention mechanism computes the relevance between Query and Key, and uses the resulting attention distribution to adaptively aggregate features from Value.
$$ \mathrm{CrossAttention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V, $$
By computing attention weights between Query (target device) and Key (auxiliary devices), the model can adaptively capture coupling relationships among temperature variables across different devices. This mechanism enables the model to integrate cross-device contextual information, thereby achieving a more comprehensive perception of the system state and improving the accuracy and robustness of overall temperature trend forecasting and local dynamics of the solar array.
The attention output is fused with the Query input Q through a residual connection and layer normalization, followed by a feed-forward network; the result Z_out is the final output of the cross-attention layer:
Z_1 = LayerNorm(Q + MultiHeadAttn(Q, K, V)),
Z_2 = LayerNorm(Z_1 + FFN(Z_1)),
Z_out = CrossAttentionLayer(Z_main, Z_cross),
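The cross-attention layer above can be sketched in PyTorch. Note that `nn.MultiheadAttention` supplies the learnable projections W_q, W_k, and W_v internally; the dimension names, head count, and FFN width below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CrossAttentionLayer(nn.Module):
    """Minimal sketch of the cross-attention layer: Queries come from the
    target (main) group, Keys/Values from the auxiliary (cross) groups."""
    def __init__(self, d_model: int, n_heads: int = 4, d_ff: int = 128):
        super().__init__()
        # W_q, W_k, W_v live inside nn.MultiheadAttention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, h_main: torch.Tensor, h_cross: torch.Tensor) -> torch.Tensor:
        # Q from the main representation, K/V from the cross representation
        attn_out, _ = self.attn(h_main, h_cross, h_cross)
        z1 = self.norm1(h_main + attn_out)   # residual + LayerNorm (Z_1)
        z2 = self.norm2(z1 + self.ffn(z1))   # FFN sub-layer (Z_2 = Z_out)
        return z2
```

The output keeps the shape of the main-branch input, so it can feed the downstream prediction head unchanged.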

3.2.4. Output Fusion

The merge module converts branch features into horizon-wise forecasts and fuses the trend and residual components. The hidden states from the trend and residual branches, denoted Z_trend, Z_res ∈ ℝ^(B×N×M×d), are first flattened along the last two dimensions per variable to obtain H_trend, H_res ∈ ℝ^(B×N×(M·d)). Each variable c is then independently projected to the forecast horizon H via a dedicated linear layer.
Ŷ_c^trend = H_c^trend W_c^trend + b_c^trend,  Ŷ_c^res = H_c^res W_c^res + b_c^res,
The final multi-step forecast is obtained through an element-wise summation of the trend and residual components. This additive fusion combines the long-term evolutionary patterns captured by the trend component with the short-term fluctuations modeled by the residual component, yielding a comprehensive forecast.
Ŷ = Ŷ^trend + Ŷ^res ∈ ℝ^(B×N×H),
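The output-fusion stage can be sketched as follows: flatten each variable's patch features, project them to the horizon H with a per-variable linear head, and sum the trend and residual forecasts. The class name and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the merge module: per-variable linear heads for the trend
    and residual branches, fused by element-wise summation."""
    def __init__(self, n_vars: int, n_patches: int, d_model: int, horizon: int):
        super().__init__()
        # one dedicated linear head per variable, per branch
        self.trend_heads = nn.ModuleList(
            [nn.Linear(n_patches * d_model, horizon) for _ in range(n_vars)])
        self.res_heads = nn.ModuleList(
            [nn.Linear(n_patches * d_model, horizon) for _ in range(n_vars)])

    def forward(self, z_trend: torch.Tensor, z_res: torch.Tensor) -> torch.Tensor:
        # z_* : (B, N, M, d) -> flatten the last two dims per variable
        B, N, M, d = z_trend.shape
        h_trend = z_trend.reshape(B, N, M * d)
        h_res = z_res.reshape(B, N, M * d)
        y_trend = torch.stack(
            [head(h_trend[:, c]) for c, head in enumerate(self.trend_heads)], dim=1)
        y_res = torch.stack(
            [head(h_res[:, c]) for c, head in enumerate(self.res_heads)], dim=1)
        return y_trend + y_res   # additive fusion: (B, N, H)
```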
We used the mean squared error (MSE) as the loss function to train our forecasting model due to its stable gradient behavior and suitability for regression on continuous telemetry. The MSE quantifies the average squared discrepancy between predictions and ground truth across all variables, time steps, and batch samples [36], providing a uniform optimization objective for both the trend and residual forecasting branches. The loss is computed as:
L_MSE = (1 / (B·N·H)) Σ_{b=1}^{B} Σ_{n=1}^{N} Σ_{t=1}^{H} (y_{b,n,t} − ŷ_{b,n,t})²,
where y_{b,n,t} denotes the ground-truth value and ŷ_{b,n,t} the predicted value. This formulation ensures balanced gradient propagation, promotes smooth training dynamics, and facilitates the simultaneous optimization of both components in our architecture.
Back-propagation benefits from the additive form of the final forecast. With ŷ = τ̂ + ε̂, the loss gradient decomposes cleanly so that each branch receives gradient signals consistent with the overall forecast.
∂L_MSE/∂ŷ = ∂L_MSE/∂τ̂ = ∂L_MSE/∂ε̂,
This symmetry yields balanced updates for the trend and residual heads and drives consistent learning through subsequent layers. Model parameters  θ  are optimized with the Adam optimizer.
θ ← θ − η ∇_θ L_MSE,
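The gradient symmetry described above can be verified numerically with autograd; the tensor sizes here are arbitrary illustrations.

```python
import torch

# Numerical check: with y_hat = tau + eps, the MSE gradient w.r.t. each
# branch equals the gradient w.r.t. the fused forecast y_hat.
torch.manual_seed(0)
y = torch.randn(4, 3, 10)                            # ground truth (B, N, H)
tau = torch.randn(4, 3, 10, requires_grad=True)      # trend forecast
eps = torch.randn(4, 3, 10, requires_grad=True)      # residual forecast

y_hat = tau + eps
loss = ((y - y_hat) ** 2).mean()                     # MSE over B, N, H
loss.backward()

# both branches receive identical gradient signals ...
assert torch.allclose(tau.grad, eps.grad)
# ... matching the analytic gradient 2*(y_hat - y)/(B*N*H)
analytic = 2 * (y_hat - y).detach() / y.numel()
assert torch.allclose(tau.grad, analytic)
```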

3.2.5. Model Architecture

The overall architecture of the proposed CA-PatchTST model is summarized in Table 1, which outlines the hierarchical structure, key operations, and the corresponding input/output shapes of each component. The model is designed as a dual-branch pipeline that processes multivariate temperature sequences through four major stages: decomposition, patch-based encoding, cross-attention fusion, and multi-step forecast output.
The multivariate input is decomposed by a moving-average filter into trend and residual components, each processed by an independent PatchTST branch that patchifies channels, applies linear projection with positional encodings, and uses channel-independent Transformer encoders. A cross-attention module fuses information across structural device groups (target as Query, auxiliaries as Key/Value), with its output refined by residual connections, LayerNorm, and an FFN. Prediction heads map both branches to multi-step forecasts, which are combined element-wise to yield the final temperatures. The model is trained end-to-end with MSE loss; the additive fusion ensures balanced optimization, while the design remains scalable and interpretable. Table 1 provides detailed component descriptions and tensor shapes for reproducibility.

3.2.6. Implementation Procedure of CA-PatchTST

To provide a clear and actionable overview of the proposed methodology, Algorithm 1 presents the complete operational workflow of the CA-PatchTST model in pseudocode. It outlines the step-by-step computational process, from the initial organization of input sequences and their decomposition into trend and residual components, through the core stages of patch-based encoding, cross-attention fusion, and channel-independent transformation, to the final generation of multi-step forecasts. This procedural blueprint is designed to facilitate a straightforward and accurate implementation of the model.
Algorithm 1. CA-PatchTST for Solar Array Temperature Trend Forecasting
1:  Input: Multivariate temperature sequence X ∈ ℝ^(B×N×L),
        device partition {G_BMSP, G_Bracket, G_Body},
        patch length P, stride S, forecast horizon H
2: Organize sequences by device groups: X^(main) = X[:, G_target, :], X^(cross) = X[:, G ∖ G_target, :]
3: for branch b ∈ {main, cross} do:
4:       Apply moving-average filtering: X_trend^(b) = MA(X^(b))
5:       Compute residual component: X_res^(b) = X^(b) − X_trend^(b)
6: end for
7: for component c ∈ {trend, res} do:
8:       for branch b ∈ {main, cross} do:
9:               Patching: P_c^(b) = Unfold(X_c^(b), kernel = P, stride = S) ∈ ℝ^(B×N_b×M×P)
10:                Linear Projection & Position Encoding: Z_c^(b) = P_c^(b) W_p + E_pos ∈ ℝ^(B×N_b×M×D)
11:                Channel-independent TST Encoding: Z_c^(b) = TSTEncoder(Z_c^(b))
12:       end for
13: end for
14: for component c ∈ {trend, res} do:
15:       Extract representations: Z_main = Z_c^(main), Z_cross = Z_c^(cross)
16:       Multi-head Cross-Attention:
                       Q = Z_main W_Q, K = Z_cross W_K, V = Z_cross W_V, Attention = Softmax(Q Kᵀ / √d_k) V
17:       Residual Connection & Layer Normalization: Z = LayerNorm(Attention + Z_main)
18:       Feed-Forward Network: Z_c^out = LayerNorm(FFN(Z) + Z)
19: end for
20: for component c ∈ {trend, res} do:
21:       Flatten and project to forecast horizon: Ŷ_c = Flatten(Z_c^out) W_o ∈ ℝ^(B×N_target×H)
22: end for
23: Additive Fusion: Ŷ = Ŷ_trend + Ŷ_res
24: Compute MSE loss: L = (1 / (B·N_target·H)) ‖Y − Ŷ‖²
25: Update parameters: θ ← θ − η ∇_θ L
26: Output: Multi-step forecasts Ŷ ∈ ℝ^(B×N_target×H)
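The decomposition step of Algorithm 1 (moving-average filtering and residual computation) can be sketched as follows. The kernel size of 25 is an assumed illustration; the paper does not pin the filter width here, and replicate padding is one common way to keep the trend the same length as the input.

```python
import torch
import torch.nn.functional as F

def decompose(x: torch.Tensor, kernel: int = 25):
    """Moving-average trend/residual decomposition (sketch).
    x: (B, N, L) multivariate temperature sequence."""
    # replicate-pad both ends so the trend has the same length as x
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    x_pad = F.pad(x, (pad_left, pad_right), mode="replicate")
    trend = F.avg_pool1d(x_pad, kernel_size=kernel, stride=1)
    residual = x - trend          # X_res = X - X_trend, as in step 5
    return trend, residual
```

By construction, trend + residual reproduces the input exactly, so no information is lost before the two branches are encoded.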

4. Experiments

4.1. GOCE Satellite Temperature Dataset

The GOCE satellite is a scientific mission satellite of the European Space Agency (ESA), operating in a low Earth orbit at approximately 263 km with an orbital period of 90 min [37]. The dataset covers multivariate temperature telemetry of the GOCE satellite's solar array from March 2009 to June 2012, using ESA's recommended 10 min resampled data, which has undergone prior calibration and statistical processing by ESA to address missing values and irregular sampling rates.
The dataset exhibits pronounced orbital periodicity: in every 90 min orbit, the satellite experiences about 60 min of sunlight and 30 min of Earth's shadow, causing periodic temperature fluctuations [38], as Figure 5 shows. The +Z plane mainly faces the Sun and shows the largest temperature variation amplitude, whereas the −Z plane faces away from the Sun with relatively stable temperatures. Sixteen temperature sensors at different spatial positions provide multidimensional spatial temperature distribution and thermal-gradient information for the solar array. This multivariate temperature dataset combines strong periodic patterns, multi-level structural information, and extreme thermal-environment characteristics, providing a clear pattern basis for local feature extraction.
The Pearson correlation coefficient is employed to quantify the linear correlations among temperature parameters across the solar array’s structural components, revealing a clear hierarchy of thermal interdependencies. Analysis demonstrates exceptionally strong intra-group cohesion, particularly within the Interface Bracket group where sensors such as THT10008, THT10022 and THT10024 exhibit near-unity correlation coefficients of 1.00, indicating virtually identical thermal profiles due to spatial proximity and shared thermal mass. The matrix further reveals distinct inter-group coupling patterns, with bracket sensor THT10002 showing a strong correlation of 0.73 with specific BMSP sensors. The consistent weak positive correlations between Interface Bracket and Wing Body sensors, exemplified by THT10000 at 0.25, demonstrate additional thermal connectivity. This correlation matrix provides a crucial physical prior that not only demonstrates the array’s operation as a complex interconnected thermal system but also establishes a quantitative benchmark for validating whether our CA-PatchTST model’s learned attention patterns accurately reflect these measurable physical relationships [39].
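The inter-sensor Pearson analysis can be reproduced with `numpy.corrcoef`. The synthetic data below only mimic the qualitative structure described above (two tightly coupled sensors sharing an orbital cycle, one weakly coupled sensor); the sensor names are borrowed for illustration and the values are not GOCE telemetry.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 2000)        # several orbit-like cycles
base = np.sin(t)                           # shared periodic thermal signal

sensors = {
    "THT10008": base + 0.01 * rng.standard_normal(t.size),  # bracket group
    "THT10022": base + 0.01 * rng.standard_normal(t.size),  # bracket group
    "THT10000": 0.3 * base + rng.standard_normal(t.size),   # weakly coupled
}
data = np.stack(list(sensors.values()))    # (n_sensors, n_samples)
pcc = np.corrcoef(data)                    # Pearson correlation matrix

# near-unity correlation within a tightly coupled group,
# weak positive correlation across groups
assert pcc[0, 1] > 0.99
assert 0.05 < pcc[0, 2] < 0.4
```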

4.2. Experiment Settings

To validate the effectiveness of the proposed method for solar-array temperature forecasting, we used the GOCE satellite temperature dataset with a 7:2:1 split for training, validation, and testing. The lookback window was set to 144 time steps, which equals 24 h at 10 min sampling. This duration spans a complete daily cycle, enabling the model to learn both intra-orbit illumination/eclipse transitions and diurnal thermal variation. Forecast horizons were set to 144, 432, and 720 steps, corresponding to 24, 72, and 120 h. These three horizons provide a balanced evaluation of short-, medium-, and long-range prediction scenarios relevant to operational planning. The model configuration is summarized in Table 2, where a learning rate of 0.0001 was used to ensure stable gradient updates and a dropout rate of 0.05 was adopted to mitigate overfitting.
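The chronological split and sliding-window construction described above can be sketched as follows. The series length T = 5000 and the random data are placeholders; the lookback of 144 steps matches the stated 24 h at 10 min sampling.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 144, horizon: int = 144):
    """Build (input, target) pairs with a stride-1 sliding window.
    series: (T, N) multivariate telemetry."""
    X, Y = [], []
    for start in range(series.shape[0] - lookback - horizon + 1):
        X.append(series[start:start + lookback])
        Y.append(series[start + lookback:start + lookback + horizon])
    return np.stack(X), np.stack(Y)

# chronological 7:2:1 split BEFORE windowing, to avoid leakage
T, N = 5000, 16
series = np.random.default_rng(0).standard_normal((T, N))
n_train, n_val = int(0.7 * T), int(0.2 * T)
train = series[:n_train]
val = series[n_train:n_train + n_val]
test = series[n_train + n_val:]

X_train, Y_train = make_windows(train)     # 24 h lookback, 24 h horizon
```

Splitting chronologically before windowing keeps test windows strictly later in time than any training data.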
All experiments were conducted on a workstation with an NVIDIA RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA), an Intel Core i7-14650HX CPU (Intel Corporation, Santa Clara, CA, USA), and 32 GB of RAM, using Python 3.8, PyTorch 1.12, and CUDA 11.6. The hardware and software stack was kept fixed across all runs to maintain consistent comparisons and support reproducibility.

4.3. Evaluation Metrics

To evaluate the forecasting accuracy of the model, RMSE, MAE, and MAPE were used as performance metrics. Here, y_i denotes the ground truth, ŷ_i denotes the predicted value, and n is the number of samples. These evaluation metrics characterize the deviation between the predicted and actual values from different perspectives.
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ),
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,
MAPE = (1/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i| × 100%,
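The three metrics can be implemented directly from the formulas above. The small `eps` guard in MAPE is an added safeguard against near-zero ground truth, not part of the formula in the text.

```python
import numpy as np

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    # root mean square error
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    # mean absolute error
    return float(np.mean(np.abs(y - y_hat)))

def mape(y: np.ndarray, y_hat: np.ndarray, eps: float = 1e-8) -> float:
    # mean absolute percentage error, in percent
    return float(np.mean(np.abs((y - y_hat) / (y + eps))) * 100.0)
```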

4.4. Comparison with Other Methods

4.4.1. Attention Visualization

To verify whether the cross-attention mechanism within the trained CA-PatchTST model learns physically meaningful relationships, we compared its learned attention patterns against the static statistical correlations inherent in the data. The objective was to examine whether the model's dynamic, learned dependencies align with the inherent structural couplings suggested by the Pearson correlation coefficient matrix.
The analysis was performed using the final, pre-trained CA-PatchTST model in inference mode, with its weights frozen. The model was configured with a lookback window of 144 time steps (24 h), a forecast horizon of 432 time steps (72 h), and a patch length of 16. To ensure the evaluation reflects the model’s generalization capabilities, the attention weights were computed using 64 representative batches drawn from the test set. This approach provided a snapshot of the model’s learned dependencies when processing previously unseen data.
To analyze inter-group dynamics, we extracted raw attention weights from the cross-attention layers by iteratively using each structural group (Interface Bracket, BMSP Structure, Wing Body) as the Query and the other two as Key and Value, yielding three attention maps. We averaged the attention weights across samples, heads, and temporal patches, resulting in a matrix of mean attention scores for each Query–Key pair. This matrix was column-wise z-score normalized; positive values (red) indicate higher-than-average attention, while negative values (blue) indicate lower focus, reflecting relative predictive importance across parameters.
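The averaging and column-wise z-score normalization described above can be sketched as follows. The raw attention shape (samples, heads, patches, n_query, n_key) is an illustrative assumption about how the extracted weights are organized.

```python
import numpy as np

def column_zscore(attn: np.ndarray) -> np.ndarray:
    """Column-wise z-score normalization of a mean attention matrix
    (per Key column, across Query rows)."""
    mean = attn.mean(axis=0, keepdims=True)
    std = attn.std(axis=0, keepdims=True) + 1e-8   # guard against zero std
    return (attn - mean) / std

# average raw attention over samples, heads, and temporal patches first
attn_raw = np.random.default_rng(0).random((64, 4, 9, 3, 5))  # placeholder
mean_attn = attn_raw.mean(axis=(0, 1, 2))                     # (n_query, n_key)
z = column_zscore(mean_attn)   # positive = above-average attention
```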
The resulting attention heatmaps exhibit strong and consistent agreement with the structural relationships captured by the PCC matrix in Figure 6, providing numerical validation that the model's learned attention aligns with measurable physical couplings. When the BMSP Structure serves as the Query (Figure 7a), the PCC matrix shows strong correlations with the Wing Body, such as values reaching 0.73 between certain sensors, and moderate yet notable correlations with the Interface Bracket, exemplified by coefficients around 0.25. This pattern is clearly reflected in the attention heatmap through consistently high positive z-scores from the BMSP parameters toward both of the other groups. Similarly, with the Wing Body as the Query (Figure 7b), the PCC matrix confirms strong ties to the BMSP Structure, with correlations up to 0.73, but weak associations with the Interface Bracket, some near zero or slightly negative, such as −0.01. The cross-attention mechanism accurately mirrors this distinction with predominantly positive attention toward the BMSP Structure and significantly weaker or negative z-scores toward the Bracket group. When the Interface Bracket is used as the Query (Figure 7c), the PCC indicates moderate correlation with the BMSP Structure, for instance 0.25, and the weakest correlation with the Wing Body, around 0.10.
The consistent alignment between the learned attention patterns of the model and the static Pearson correlations across all three structural groups demonstrates that the cross-attention mechanism captures physically interpretable relationships from the time-series data. This result confirms the capacity of the model to represent the underlying structural dynamics of the system in a manner consistent with empirical correlation patterns.

4.4.2. Comparison with Other Forecasting Methods

This paper systematically evaluates the performance of six models: the proposed CA-PatchTST, the classic linear model DLinear (2022) [22], the RNN-based SegRNN (2023) [20], the Transformer-based Informer (2021) [26], and two recently proposed Transformer variants, TimesNet (2023) [40] and iTransformer (2024) [41], across near-term, medium-term, and long-term forecast horizons of 24, 72, and 120 h for multivariate solar array temperature forecasting. As clearly demonstrated in Figure 8c, the proposed CA-PatchTST model consistently outperforms all baseline methods, achieving notably low MAPE values of 9.26% at 24 h, 13.68% at 72 h, and 24.67% at 120 h, significantly surpassing alternative models across all horizons.
This performance advantage stems from the integrated design of the model: the cross-attention mechanism effectively captures inter-variable correlations among structural groups, while patch-based encoding and trend–residual decomposition jointly enhance local feature extraction and noise suppression. The model maintains robust accuracy over extended horizons, mitigating error accumulation and temporal drift, with the performance gap particularly widening against sequence-sensitive baselines such as SegRNN, as evidenced by the comparison of 24.67% versus 47.22% MAPE at 120 h. Moreover, compared with recent Transformer variants such as TimesNet, iTransformer, and Informer, CA-PatchTST consistently achieves lower error and higher stability across all horizons, demonstrating superior capability in modeling non-stationary and long-range dependencies in multivariate satellite telemetry data. These results are consistent with the ablation studies in Section 4.5, confirming the necessity of both cross-attention and decomposition modules in sustaining prediction stability. Figure 8 visually illustrates the superior temporal generalization capability and structural effectiveness of CA-PatchTST, underscoring its suitability for reliable long-horizon satellite temperature forecasting and predictive maintenance applications.
To visually evaluate the superior performance of the proposed CA-PatchTST model in long-sequence forecasting tasks, we designed a rigorous comparative experiment. All models were evaluated under identical settings, using an input sequence of 144 time steps (24 h) to forecast the subsequent 432 time steps (72 h) in a single forward pass. Figure 9 shows the forecasting results of these models for the key sensor THT10002 on the BMSP structure during the first week of January 2012, which comprises two subplots: the upper panel compares the predicted and ground truth curves, while the lower panel displays the corresponding error distributions.
From the upper subplot, the satellite telemetry temperature series displays clear periodic fluctuations with sharp peaks and troughs, reflecting its complex, nonlinear and dynamic characteristics. Among all compared models, CA-PatchTST demonstrates the highest consistency with the true values. Its predicted curve aligns almost perfectly with the ground truth, successfully capturing both long-term periodic trends and transient variations. In contrast, models such as DLinear, SegRNN, Informer, TimesNet and iTransformer exhibit visible deviations, especially during rapid temperature transitions, indicating weaker adaptability to high-frequency and nonlinear dynamics inherent in satellite telemetry data.
The error distributions in the lower subplot further confirm the superiority of CA-PatchTST. Its error curve exhibits the smallest fluctuation range, remaining consistently close to zero, which reflects high stability and reliability throughout the forecast horizon. The error curves of the baseline models, however, show considerably larger and more irregular fluctuations, far exceeding those of CA-PatchTST. Notably, these baseline models produce abnormal error spikes around critical peaks and troughs, which would be unacceptable in high-precision applications such as spacecraft thermal management.
This comparative experiment provides strong evidence that the proposed CA-PatchTST model possesses significant advantages over mainstream models in handling satellite telemetry data with complex dynamics. Its high forecasting accuracy, stability, and capability to capture critical variations make it a highly suitable and promising approach for thermal analysis and forecasting in spacecraft applications.
To quantitatively assess the computational efficiency and deployment feasibility of the proposed model for resource-constrained onboard satellite systems, we conducted a comparative analysis of key efficiency metrics against several state-of-the-art baselines. The evaluation was performed under a standardized experimental setup to ensure a fair comparison: a batch size of 32, an input sequence length of 144, and a prediction horizon of 432 were applied uniformly across all models. All experiments were executed on the same hardware platform with a fixed random seed to eliminate performance variability. We report three critical metrics for each model: the number of parameters, the average inference time per batch, and the peak GPU memory consumption during inference.
As demonstrated in Table 3, the models exhibit distinct efficiency profiles corresponding to their architectural families. Among Transformer-based models, our proposed CA-PatchTST demonstrates superior efficiency, achieving the most compact architecture (4.9 M parameters), the fastest inference speed (10.5 ms), and the lowest memory consumption (2.4 GB) within its architectural class. This efficiency is attributed to the patch-based segmentation strategy and channel-independent modeling, which effectively reduces computational redundancy while preserving representational capacity. Linear models, exemplified by DLinear, achieve even higher computational efficiency due to their structural simplicity; while DLinear infers slightly faster, CA-PatchTST maintains a markedly better forecasting accuracy, justifying its marginal computational overhead. In contrast, RNN-based models like SegRNN exhibit the lowest efficiency across all metrics, which aligns with the known challenges of recurrent architectures in processing long sequences. The results indicate that CA-PatchTST strikes a favorable trade-off, delivering significantly superior forecasting accuracy over simpler linear models with only a marginal computational overhead, while simultaneously overcoming the efficiency limitations of RNN-based approaches. This makes it a practical and effective solution for deployment in resource-constrained satellite systems where both prediction performance and operational efficiency are critical.

4.5. Ablation Experiment

4.5.1. Component Ablation Experiment

To rigorously quantify the individual contributions of the core components within the proposed CA-PatchTST framework, a comprehensive set of ablation studies was carried out. As a critical methodology in deep learning research, ablation studies systematically remove specific elements from the full model to isolate and measure their impact on performance. This approach verifies whether each module functions as intended and clarifies their complementary roles.
As detailed in Table 4, we specifically ablated the cross-attention mechanism and the sequence decomposition module to assess their individual contributions to forecasting accuracy. The results show that the complete CA-PatchTST model achieves the best performance across all forecast horizons in terms of RMSE, MAE, and MAPE, confirming its superior accuracy and robustness. Through systematic evaluation of four model variants, we further dissect the individual and combined effects of the cross-attention and decomposition components.
The CA-PatchTST model, incorporating both cross-attention and sequence decomposition modules, consistently achieves the lowest errors across all forecast horizons, particularly exemplified at the 144-step (24 h) horizon with values of RMSE of 1.538, MAE of 0.885, and MAPE of 9.26%, underscoring the synergistic effect of integrated components. Removing the cross-attention module leads to noticeable performance degradation in short-term predictions, with RMSE rising to 1.710 and MAPE increasing to 16.35% at 144 steps, highlighting its essential role in capturing inter-variable dependencies and refining cross-device interactions. Conversely, disabling the decomposition module significantly impairs medium- and long-term forecasting performance, as indicated by the increase in RMSE to 2.199 and MAPE to 19.54% at 432 steps (72 h), confirming the importance of moving-average decomposition in isolating trend–residual patterns and stabilizing long-horizon forecasts. The baseline model without both components yields the poorest performance across all horizons, especially over longer sequences where MAPE reaches 28.13% at 720 steps, affirming that neither mechanism alone suffices to model the complex, non-stationary dynamics inherent in solar array temperature data.
The consistent superiority of the full model across all horizons confirms the complementary roles of the two components: cross-attention effectively models short-term, cross-sensor thermal couplings, while the decomposition module captures underlying trend-periodicity structures essential for medium- and long-term forecasting.

4.5.2. Backbone-Attention Ablation Experiment

To comprehensively evaluate the impact of different encoder architectures and attention mechanisms on model performance, we conducted a systematic ablation study comparing four representative encoder backbones (PatchTST, TCN, SRU, and iTransformer), each paired with either cross-attention or squeeze-and-excitation (SE) attention mechanisms. All models were evaluated under a forecast horizon of 432 steps (72 h) using the GOCE satellite temperature dataset, with consistent input settings including a lookback window of 144 time steps, patch length of 16, and stride of 8 where applicable. Performance was measured using RMSE, MAE, and MAPE to ensure a comprehensive assessment of forecasting accuracy.
As clearly demonstrated in Table 5, the combination of the PatchTST encoder with cross-attention significantly outperforms all other encoder–attention combinations across every metric, achieving an RMSE of 2.079, MAE of 1.231, and MAPE of 13.68%. This superior performance can be attributed to the synergistic effects of PatchTST’s patch-based segmentation and channel-independent encoding, which effectively capture both local temporal patterns and long-range dependencies, coupled with the cross-attention mechanism’s capacity to model dynamic inter-variable correlations across different structural groups of the solar array. In contrast, the same PatchTST encoder augmented with SE attention, which performs only channel-wise recalibration without explicit cross-variable interaction, produces substantially worse results, with an RMSE of 2.884 and MAPE of 22.76%, underscoring the critical importance of modeling inter-sensor dependencies in multivariate forecasting.
Other encoder architectures, including TCN, SRU, and iTransformer, consistently underperform the PatchTST-based models regardless of the attention mechanism used. For instance, the TCN encoder paired with cross-attention attains an RMSE of 3.441 and MAPE of 27.65%, while the SRU-based model reaches an RMSE of 3.385 and MAPE of 18.69%. The iTransformer model with cross-attention performs slightly better among non-PatchTST encoders, yet still falls short with an RMSE of 3.265 and MAPE of 20.56%. These results suggest that while cross-attention generally enhances each backbone’s ability to leverage cross-variable information, the architectural advantages of PatchTST, such as its patching strategy, reduced computational complexity, and improved local feature extraction, are essential for achieving state-of-the-art performance in long-horizon, multi-sensor temperature forecasting.
The consistent superiority of the CA-augmented PatchTST model affirms its efficacy in capturing both complex temporal dependencies and physically meaningful interactions among solar array temperature variables, as further corroborated by the attention alignment analysis in Section 4.4. This ablation study not only confirms the rationale underlying the proposed CA-PatchTST framework but also highlights the limitations of conventional SE-style attention and other encoder architectures in handling high-dimensional, non-stationary satellite telemetry data.

5. Conclusions

This paper proposes CA-PatchTST, a multivariate forecasting framework for solar array temperature, and validates its superior performance on the GOCE satellite telemetry dataset, where empirical results demonstrate consistent improvements in MAE, RMSE, and MAPE across multiple horizons, confirming the model’s accuracy and robustness for long-horizon on-orbit forecasting. The methodological contribution lies in the coherent integration of complementary modules into a unified pipeline: the moving-average decomposition, matched to the satellite’s orbital cycle, suppresses high-frequency fluctuations and stabilizes long-horizon learning; the patch-based, channel-independent encoder enhances local feature extraction and global temporal representation while improving computational efficiency; and the cross-attention mechanism captures inter-device correlations by fusing auxiliary thermal streams, with learned attention patterns aligning with physical couplings revealed by the Pearson correlation matrix. Together, these components form a compact and interpretable framework whose primary application value lies in providing critical data support for autonomous satellite operations and informed power system decision-making.
In future work, we will advance our prototype toward operational deployment by focusing on model lightweighting and online adaptation. For lightweighting, we will implement pruning, quantization, and knowledge distillation to optimize the efficiency–accuracy trade-off. For online adaptation, we will develop continual learning with experience replay and anomaly-aware updates to maintain performance under environmental shifts. These co-designed improvements will enhance robustness and enable long-term autonomous satellite power management.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W. and X.S.; software, Y.W.; validation, Y.W.; formal analysis, Y.W.; investigation, Y.W., and X.S.; data curation, Y.W. and Z.Z.; resources, Y.W., X.S., Z.Z. and F.Z.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W. and X.S.; visualization, Y.W.; supervision, X.S.; funding acquisition, X.S. and F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant numbers 62401429, 62401445, and 62425113, in part by the Postdoctoral Fellowship Program of CPSF under grant numbers GZC20241332, GZC20232048, and GZC20251207, in part by the China Postdoctoral Science Foundation under grant numbers 2024M761178 and 2025M771739, and in part by the Fundamental Research Funds for the Central Universities under grant number ZYTS25144.

Data Availability Statement

This study uses ESA Earth Observation data from the GOCE mission. Access and use of ESA EO data are subject to ESA’s Terms and Conditions for the Utilisation of ESA’s Earth Observation Data. Data may be accessed upon registration (free datasets) or via specific requests for restrained datasets. (“Data provided by the European Space Agency (ESA).” © ESA (2009–2012)).

Acknowledgments

Data provided by the European Space Agency (ESA). We thank ESA for access to the GOCE Earth Observation dataset. A copy of this publication will be provided to ESA at eopi@esa.int.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CA-PatchTST	Cross-Attention Patch Time-Series Transformer
PCC	Pearson correlation coefficient
CNN	Convolutional Neural Network
FFN	Feed-Forward Network
GOCE	Gravity field and steady-state Ocean Circulation Explorer
SE	Squeeze-and-Excitation
SRU	Simple Recurrent Unit
TCN	Temporal Convolutional Network

Figure 1. The Overall Framework of the Proposed CA-PatchTST.
Figure 2. GOCE Satellite Solar Array Temperatures Trend over Time.
Figure 3. The Flowchart of the PatchTST Model.
Figure 4. The Flowchart of Cross-attention Layer.
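The cross-attention layer of Figure 4 takes queries from the target-device patch embeddings and keys/values from the auxiliary-device embeddings. The following is a minimal single-head NumPy sketch of that idea, not the authors' implementation; the shapes (17 patches, 64-dim embeddings) and random inputs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 17, 64                          # patches per channel, embedding dim (illustrative)
main = rng.standard_normal((M, D))     # target-device patch embeddings -> queries
aux = rng.standard_normal((M, D))      # auxiliary-device embeddings -> keys/values

scores = main @ aux.T / np.sqrt(D)     # [M, M] scaled dot-product similarities
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ aux                    # [M, D] auxiliary info fused into target patches
assert out.shape == (M, D)
```

In the full model this would use separate learned Q/K/V projections and multiple heads, followed by the dropout, residual shortcut, LayerNorm, and FFN sub-layers listed in Table 1.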
Figure 5. Time-Series of Temperatures for the GOCE Satellite Solar Array.
Figure 6. Pearson Correlation Matrix of Solar Array Temperatures.
Figure 7. Cross-Attention Analysis by Structural Grouping.
Figure 8. Performance Comparison of Different Models.
Figure 9. Comparative Forecasting Performance of THT10002 Parameter across Different Models.
Table 1. Model Structure *.

| Block | Layer | Operation | Input Shape | Output Shape |
|---|---|---|---|---|
| Decomposition | Input | Main sequence, cross sequence | [B, N, seq_len] | [B, N_tat, seq_len]; [B, N_aux, seq_len] |
| | Moving-average decomposition | Split each sequence into trend and residual components via MA filter | [B, N_tat, seq_len]; [B, N_aux, seq_len] | [B, N_tat, seq_len] × 2; [B, N_aux, seq_len] × 2 (trend/residual) |
| PatchTST (single branch) | Patching | ReplicationPad1d; unfold and permute | [B, N_sub, seq_len] | [B, N_sub, M, P] |
| | Patch projection + positional encoding | Linear projection; dropout; add learnable positional encodings | [B, N_sub, M, P] | [B, N_sub, M, D] |
| | TST encoder × N (channel-independent per variable) | Multi-head self-attention; dropout; residual shortcut; LayerNorm; FFN: Linear (D → 2D) → GELU → Dropout → Linear (2D → D); residual shortcut; LayerNorm | [B, N_sub, M, D] | [B, N_sub, M, D] |
| Cross-attention (single branch) | Pre-CA | Q from main, K/V from cross | [B, N_tat, M, D]; [B, N_aux, M, D] | Q: [B, N_tat, M, D]; K/V: [B, N_aux, M, D] |
| | Cross-attention block × M | Multi-head cross-attention: reshape to multi-heads → attention softmax → dropout → Attn·V → concat heads; dropout; residual shortcut; LayerNorm; FFN: Linear (D → 4D) → GELU → Dropout → Linear (4D → D); residual shortcut; LayerNorm | [B, N_tat, M, D]; [B, N_aux, M, D] | [B, N_tat, M, D] |
| Output and fusion | Prediction head (per branch) | Permute; flatten; linear projection; dropout | [B, N_tat, M, D] | [B, N_tat, pred_len] |
| | Fusion (trend + residual) | Element-wise sum of branch outputs to obtain the normalized prediction | [B, N_tat, pred_len] | [B, N_tat, pred_len] |

* The meanings of the symbols in the table are as follows. B: batch size; N: number of input channels; seq_len: input sequence length; N_tat: number of target-device channels; N_aux: number of auxiliary-device channels; N_sub: channel count within a single branch (N_sub ∈ {N_tat, N_aux}); P: patch length; M: number of patches; D: embedding dimension; pred_len: forecasting horizon.
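The decomposition block at the top of Table 1 splits each channel into trend and residual via a moving-average filter. A minimal NumPy sketch of this step (not the authors' code; the kernel size of 25 and edge-replication padding, which mirrors `torch.nn.ReplicationPad1d`, are assumptions):

```python
import numpy as np

def decompose(x, kernel=25):
    """Split a 1-D series into a moving-average trend and a residual."""
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    # Edge replication keeps the filtered series the same length as the input.
    padded = np.concatenate([np.repeat(x[:1], pad_left), x, np.repeat(x[-1:], pad_right)])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    residual = x - trend
    return trend, residual

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 144)) + 0.1 * rng.standard_normal(144)
trend, residual = decompose(x)
assert np.allclose(trend + residual, x)  # the split is lossless by construction
```

Because the residual is defined as whatever the filter leaves behind, summing the two branch predictions at the fusion stage simply reverses this split.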
Table 2. Model Configuration.

| Model Parameter | Value |
|---|---|
| batch_size | 32 |
| epoch | 30 |
| learning_rate | 0.0001 |
| dropout | 0.05 |
| seq_len | 144 |
| patch_len | 16 |
| patch_stride | 8 |
| Encoder_layer_num | 2 |
| Linear_projection_size | 64 |
| Att_head_num | 4 |
| CA_layer_num | 2 |
| FFN_hidden_size | 128 |
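The patching parameters in Table 2 determine the number of patches M fed to the encoder. A quick sanity check of that arithmetic (without the end-padding that ReplicationPad1d would add, which typically contributes one extra patch):

```python
import numpy as np

seq_len, patch_len, stride = 144, 16, 8   # values from Table 2
num_patches = (seq_len - patch_len) // stride + 1  # 17 without end-padding

# Equivalent sliding-window view of a single channel:
x = np.arange(seq_len, dtype=float)
patches = np.lib.stride_tricks.sliding_window_view(x, patch_len)[::stride]
assert patches.shape == (num_patches, patch_len)
```

With a stride of half the patch length, adjacent patches overlap by 8 samples, which preserves local context across patch boundaries.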
Table 3. Computational Efficiency Comparison of Different Models.

| Models | Params (M) | Inference Time (ms) | Peak Memory (GB) |
|---|---|---|---|
| CA-PatchTST | 4.9 | 10.5 | 2.4 |
| DLinear | 3.2 | 7.8 | 1.6 |
| TimesNet | 7.6 | 16.8 | 4.3 |
| iTransformer | 7.4 | 14.0 | 3.9 |
| Informer | 5.8 | 13.3 | 3.3 |
| SegRNN | 8.7 | 18.2 | 4.8 |
Table 4. Component Ablation Results Comparison *.

| CA | Decomposition | Forecasting Length | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| ✓ | ✓ | 144 (24 h) | 1.538 | 0.885 | 9.26% |
| ✓ | ✓ | 432 (72 h) | 2.079 | 1.231 | 13.68% |
| ✓ | ✓ | 720 (120 h) | 2.877 | 1.667 | 24.67% |
| × | ✓ | 144 (24 h) | 1.710 | 1.040 | 16.35% |
| × | ✓ | 432 (72 h) | 2.455 | 1.467 | 22.72% |
| × | ✓ | 720 (120 h) | 3.003 | 1.764 | 27.61% |
| ✓ | × | 144 (24 h) | 1.545 | 0.892 | 12.52% |
| ✓ | × | 432 (72 h) | 2.199 | 1.321 | 19.54% |
| ✓ | × | 720 (120 h) | 2.969 | 1.729 | 28.05% |
| × | × | 144 (24 h) | 1.686 | 1.013 | 14.39% |
| × | × | 432 (72 h) | 2.220 | 1.350 | 16.68% |
| × | × | 720 (120 h) | 2.951 | 1.760 | 28.13% |

* The meanings of the symbols in the table are as follows. ✓: the component was used. ×: the component was removed.
Table 5. Backbone-Attention Ablation Results.

| Encoder Structure | Attention Mechanism | RMSE | MAE | MAPE |
|---|---|---|---|---|
| PatchTST | CA | 2.079 | 1.231 | 13.68% |
| PatchTST | SE | 2.884 | 1.961 | 22.76% |
| TCN | CA | 3.441 | 1.819 | 27.65% |
| TCN | SE | 3.999 | 2.015 | 29.34% |
| SRU | CA | 3.385 | 1.803 | 18.69% |
| SRU | SE | 4.066 | 2.431 | 37.04% |
| iTransformer | CA | 3.265 | 1.596 | 20.56% |
| iTransformer | SE | 3.990 | 1.900 | 34.27% |
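The RMSE, MAE, and MAPE figures reported in Tables 4 and 5 follow the standard definitions. A minimal NumPy sketch of these metrics (the sample arrays are made-up illustrative values, not data from the paper):

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)  # percent; assumes y != 0

y = np.array([10.0, 12.0, 8.0, 11.0])      # hypothetical true temperatures
y_hat = np.array([9.0, 13.0, 8.5, 10.0])   # hypothetical forecasts
assert abs(mae(y, y_hat) - 0.875) < 1e-12  # (1 + 1 + 0.5 + 1) / 4
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free but undefined when a true value is zero, which is why all three are reported together.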
Wang, Y.; Shi, X.; Zhang, Z.; Zhou, F. A Solar Array Temperature Multivariate Trend Forecasting Method Based on the CA-PatchTST Model. Sensors 2025, 25, 7199. https://doi.org/10.3390/s25237199