Article

A Consistency-Aware Hybrid Static–Dynamic Multivariate Network for Forecasting Industrial Key Performance Indicators

College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(7), 163; https://doi.org/10.3390/bdcc9070163
Submission received: 16 May 2025 / Revised: 14 June 2025 / Accepted: 17 June 2025 / Published: 20 June 2025

Abstract

The accurate forecasting of key performance indicators (KPIs) is essential for enhancing the reliability and operational efficiency of engineering systems under increasingly complex security challenges. However, existing approaches often neglect the heterogeneous nature of multivariate time series data, particularly the consistency of measurements and the influence of external factors, which limits their effectiveness in real-world scenarios. In this work, a Consistency-aware Hybrid Static–Dynamic Multivariate forecasting Network (CHSDM-Net) is proposed, which first applies a consistency-aware, optimization-driven segmentation to ensure high internal consistency within each segment across multiple variables. Secondly, a hybrid forecasting model integrating a Static Representation Module and a Dynamic Temporal Disentanglement and Attention Module for static and dynamic data fusion is proposed. For the dynamic data, the trend and periodic components are disentangled and fed into Trend-wise Attention and Periodic-aware Attention blocks, respectively. Extensive experiments on both synthetic and real-world radar detection datasets demonstrated that CHSDM-Net achieved significant improvements compared with existing methods. Comprehensive ablation and sensitivity analyses further validated the effectiveness and robustness of each component. The proposed method offers a practical and generalizable solution for intelligent KPI forecasting and decision support in industrial engineering applications.

1. Introduction

The accurate forecasting of key performance indicators (KPIs) is critical for informed decision-making and proactive management. KPI data are typically collected as time series, capturing the temporal evolution of multiple interrelated variables [1]. Such sequential data are widely utilized across domains including finance [2,3], energy [4,5], medicine [6,7], environmental monitoring [8,9], and industrial processes [10,11], where they reflect the evolving states of crucial variables. However, these time series often exhibit pronounced nonlinearity and nonstationarity, which pose significant challenges for accurate modeling and prediction. This is particularly evident in the industrial domain, where the complexity and highly dynamic nature of KPI time series demand robust and effective forecasting approaches.
To address these challenges, data-driven models have become the predominant and effective approach to time series forecasting in complex and dynamic scenarios. These methods are generally categorized into univariate and multivariate forecasting. Univariate approaches exclusively rely on the historical values of a single time series to predict its future trajectory, emphasizing the intrinsic temporal patterns [12,13]. In contrast, multivariate time series forecasting (MTSF) approaches incorporate additional external or related variables, capturing cross-variable dependencies and the influence of multiple factors on the target indicator [14,15]. This distinction is particularly relevant in industrial applications, where KPIs are often shaped by the complex interactions of numerous interacting variables rather than by their own historical trends alone.
However, the inherent complexity of industrial processes leads to evolving relationships among variables and changes in their statistical properties over time, necessitating the segmentation of time series into internally consistent intervals to accurately capture shifting statistical regimes [16]. In fact, such data are rarely generated under homogeneous conditions, because factors like equipment wear and environmental fluctuations often cause shifts in the underlying data distribution. This nonstationarity not only complicates forecasting and performance evaluation but also obscures the intrinsic dynamics of the data. Most existing segmentation methods, including model-based approaches [17,18] and data-driven techniques [19,20,21], primarily focus on maximizing inter-segment differences such as shifts in mean, variance, or likelihood within individual variables [22]. These methods typically assess segmentation quality based on the statistical properties of the same variable before and after a change point, without considering the similarity or consistency among different variables within the same segment. Consequently, when applied to multiple related variables in industrial settings, such approaches can lead to misaligned segments that fail to capture the true operational regimes shared across variables. Consistent segmentation not only improves the interpretability of multivariate models but also enhances forecasting robustness by preserving the true relationships among variables within each segment.
Beyond effective segmentation, the accurate forecasting of industrial indicators requires a comprehensive understanding of both the static and dynamic factors influencing system behavior [23]. Static variables, such as equipment specifications or ambient environmental settings, remain constant or change only slowly over time, whereas dynamic variables, including sensor readings and process outputs, fluctuate in real time. While dynamic factors are typically emphasized in MTSF, static factors can also significantly affect system behavior and forecasting accuracy. However, most existing methods tend to overlook static factors, limiting their ability to fully capture the mechanisms driving system performance, particularly in complex industrial environments where both types of variables interact.
Nevertheless, integrating static and dynamic data poses significant challenges due to their heterogeneous structures and temporal characteristics. Static variables are time-invariant, while dynamic variables are sequential, making direct fusion non-trivial. Moreover, complex interactions between static and dynamic features are easily neglected. In this study, features from static and dynamic data are extracted separately and subsequently concatenated, with a nonlinear predictor employed to capture their joint effects. This strategy enables a more effective integration of heterogeneous data sources and improves forecasting accuracy in complex industrial environments.
The demand for accurate prediction and evaluation has intensified the need to analyze relationships among multivariate data. Forecasting models have progressed from statistical methodologies to machine learning approaches, which are adept at capturing complex and nonlinear interactions in forecasting tasks. More recently, deep learning has further advanced multivariate time series modeling by providing enhanced flexibility and representational capacity [24]. Among these techniques, the transformer model [25] has achieved notable success in MTSF due to its ability to automatically learn dependencies among variables, making it a strong candidate for sequential forecasting tasks. Recent advances in transformer-based forecasting methods mainly include disentangled models [26,27,28,29], which separately model temporal components, and unified models [30,31,32,33], which capture global dependencies through attention mechanisms. Both have shown strong performance in multivariate time series forecasting.
A representative industrial application is the evaluation of radar detection performance indicators [34], which is crucial in both military and civilian scenarios. This evaluation is typically carried out through check-flight tests, in which a target is mounted on an aircraft and flown repeatedly. In these trials, the radar system's measurements of detection range, azimuth, and elevation angle are sampled uniformly, as illustrated in Figure 1. The range difference (dR), defined as the deviation between the measured and true ranges, directly reflects radar detection accuracy and forms a time series, with pulses emitted at fixed intervals. As the aircraft traverses different flight zones or environmental conditions, the resulting data often exhibit nonstationarity, necessitating segmentation to ensure internal consistency. Moreover, dR is affected by both dynamic variables, such as radar cross-section, and static factors, like temperature, underscoring the need for modeling approaches that incorporate both variable types.
To advance hybrid multivariate forecasting for industrial data such as radar detection data, this work focuses on improving correlation modeling with deep learning by addressing three primary challenges. First, most existing segmentation methods overlook the need for consistency across multiple variables, often resulting in misaligned segments that fail to reflect the true operational regimes. This reduces the interpretability and reliability of multivariate forecasting. Second, most deep learning methods primarily emphasize dynamic variables, often neglecting static factors such as environmental conditions that are crucial in industrial applications, and they face difficulties in integrating static and dynamic data owing to the heterogeneous structures of the two data types. Third, while disentangled approaches are effective at isolating trend and periodic components, they struggle with complex relationships and long-range dependencies. Unified methods, conversely, lack the capacity to learn distinct temporal features.
To address these research gaps, this study proposes the Consistency-aware Hybrid Static–Dynamic Multivariate Forecasting Network (CHSDM-Net) for industrial time series forecasting, with radar detection serving as a representative application. The approach begins with consistency-aware dynamic segmentation across multiple variables, formulating the segmentation as an optimization problem. Subsequently, a hybrid network combining a Static Representation Module and Dynamic Temporal Disentanglement and Attention Module is employed as the forecasting model to capture the relationships between influencing factors and target indicators. The main contributions of this study are as follows:
  • Consistency-Aware Dynamic Segmentation: A novel optimization-based segmentation method is proposed to adaptively partition time series data while explicitly maintaining consistency across multiple related variables, minimizing redundancy while preserving essential information.
  • Hybrid Static–Dynamic Representation Network: The forecasting module integrates a dual-stream architecture to effectively extract complex features from both static and dynamic variables, and it fuses their representations through feature concatenation and nonlinear modeling.
  • Hierarchical Attention Module: The model fuses disentangled and unified approaches to capture both independent temporal patterns and cross-factor dependencies, thereby improving feature representation in complex multivariate scenarios.
The remainder of this paper is organized as follows: Section 2 reviews the literature related to multivariate time series forecasting. Section 3 introduces the problem formulation and relevant preliminaries. Section 4 details the proposed methodology, including the overall framework and its components. Section 5 presents experimental results validating the model and evaluating algorithmic efficiency. Section 6 discusses the limitations of the proposed method and outlines directions for future work. Finally, Section 7 concludes the paper.

2. Literature Review

Multivariate time series forecasting aims to predict future values based on historical data and relevant influencing factors, where the core objective is to uncover temporal dependencies, trends, and periodic patterns to improve predictive accuracy.
Traditional statistical forecasting methods represent the earliest approaches in time series prediction, employing mathematical models to capture the underlying statistical properties of sequences. Representative models include the Auto-Regressive (AR), Moving Average (MA), Auto-Regressive Moving Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models [35,36]. Among these, the Seasonal Auto-Regressive Integrated Moving Average with eXogenous regressors (SARIMAX) model further extends ARIMA by incorporating both seasonal effects and external variables [37,38], thereby enhancing its capability to model complex real-world time series. Owing to their simplicity and interpretability, these models have established a solid theoretical foundation and remain widely used in practice.
With the advancements in data-driven techniques, machine learning-based forecasting methods have emerged, offering greater flexibility and adaptability to complex time series data. Unlike traditional statistical models, machine learning approaches do not require explicit assumptions about data distribution; instead, they learn patterns and extract features directly from historical data. Common methods include Support Vector Regression (SVR) [39], Decision Tree (DT) [40], and Random Forest (RF) [41], which are particularly effective for handling large-scale datasets and capturing intricate nonlinear relationships.
In recent years, the rapid development of deep learning has significantly advanced time series forecasting, particularly for tasks involving long-term dependencies, nonlinear patterns, and large-scale datasets. Deep learning approaches based on neural networks—such as Convolutional Neural Networks (CNNs) [42], Recurrent Neural Networks (RNNs) [43], and Transformer networks—have become widely adopted for modeling and predicting various types of time series data. CNNs excel at extracting local features and capturing high-level semantic information, making them suitable for data with local patterns or short-term dependencies [44]. RNNs are designed to model sequential data and can effectively capture long-term temporal dependencies and complex structures [45].
More recently, the transformer model has emerged as a breakthrough by leveraging attention mechanisms to model dependencies across all positions in a sequence. Its ability to handle long sequences, multi-scale information, and inherent parallelism makes it a highly promising choice for time series forecasting. In MTSF tasks, the trend and periodic components often exhibit different characteristics across variables. Specifically, trend components typically convey more local dependencies, where the values at one point in time are influenced by preceding values, whereas periodic components display more global dependencies [46]. Accordingly, transformer-based multivariate forecasting methods can be categorized into disentangled and unified models.
Disentangled methods decompose the input data into distinct features, which are then individually extracted and used for predictions. Some approaches perform disentanglement before feature extraction, while others integrate both processes. For example, Woo et al. [26] used contrastive learning to disentangle seasonal and trend representations, improving robustness to distribution shifts. Ye et al. [46] combined time and frequency domain analyses with adaptive weighting to balance trend and seasonal features. Yu et al. [27] applied learnable decomposition to handle inter-series dependencies and intra-series variations, effectively capturing trend and seasonal features independently. In contrast, Autoformer [28] integrated decomposition blocks within the transformer architecture to separate trend and seasonal components progressively, enabling refined predictions through built-in decomposition. FEDformer [29] advanced this approach by adding frequency-domain operations with Fourier and Wavelet transforms to capture both global properties and fine structures in time series data.
Other studies fall under unified approaches, mainly using attention mechanisms for forecasting. Liu et al. [30] introduced a pyramidal attention mechanism to capture multi-resolution temporal dependencies efficiently while lowering computational costs. Shabani et al. [31] iteratively refined time-series forecasts across multiple scales, enhancing scale awareness in the model. Challu et al. [32] utilized hierarchical interpolation and multirate data sampling to improve long-horizon forecasting performance. Chen et al. [33] employed adaptive multi-scale modeling with dual attention mechanisms to capture local details and global correlations, enhancing prediction accuracy by integrating temporal resolution and distance.
In summary, disentangled approaches enhance interpretability by modeling component-specific patterns but depend on accurate decomposition and may insufficiently capture inter-component interactions. On the other hand, unified models excel at learning global dependencies yet often overlook the distinct characteristics of trends and seasonality. Addressing these limitations, this paper introduces a novel hierarchical attention module that effectively integrates the advantages of both disentangled and unified models.

3. Problem Statement

This section provides a thorough description of the problem along with data details. The specific problem is as follows:
The forecasting problem is defined within a data space consisting of N datasets that combine static and dynamic factors. After preprocessing, outliers are removed, resulting in datasets of varying lengths; each dataset is then aligned to a standardized length T. For the i-th dataset, $Y^i \in \mathbb{R}^{1 \times T}$, where $i = 1, \dots, N$, and the aggregated structure is $Y \in \mathbb{R}^{N \times 1 \times T}$.
Each dataset contains P static and Q dynamic influencing factors. Static data remain constant within each segment and are padded or repeated to match the length T. The complete feature matrix is represented as $X = (X_s, X_d) \in \mathbb{R}^{N \times (P+Q) \times T}$. Before making predictions, Y is divided into K segments of unequal length, and the consistency of the data within each segment is calculated.
Given a look-back window M, the objective is to predict the target variable for the subsequent L time steps within each segment. The input to the forecasting model is represented as $X^F = (X_s^F, X_d^F) \in \mathbb{R}^{(P+Q) \times M}$, and the output is the predicted values for the next L steps, denoted as $\hat{Y}^F \in \mathbb{R}^{1 \times L}$. The forecasting process is governed by a mapping function $g(\cdot)$ that models the relationship between the input data and the target indicator, formally expressed as $\hat{Y}^F = g(X^F)$.
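To make these shapes concrete, the following minimal NumPy sketch traces the dimensions of one forecasting sample; the numeric values of N, P, Q, T, M, and L are illustrative placeholders, not the settings used later in the experiments.

```python
import numpy as np

# Illustrative dimensions (hypothetical values, for shape-checking only).
N, P, Q, T = 5, 4, 2, 1000   # datasets, static factors, dynamic factors, length
M, L = 96, 24                # look-back window and forecast horizon

Y = np.zeros((N, 1, T))      # target indicator for all datasets
X = np.zeros((N, P + Q, T))  # static + dynamic influencing factors

# One forecasting sample drawn from dataset i at time t0:
i, t0 = 0, 500
X_F = X[i, :, t0 - M:t0]         # model input, shape (P + Q, M)
Y_F_hat = np.zeros((1, L))       # model output g(X_F), shape (1, L)
print(X_F.shape, Y_F_hat.shape)  # (6, 96) (1, 24)
```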

4. Methodology

This section presents CHSDM-Net, a hybrid static–dynamic modeling framework for industrial time series forecasting with heterogeneous and multi-source data. The proposed framework applies consistency-aware segmentation to ensure aligned and meaningful partitioning across multiple variables, and it employs a dual-branch neural network to extract static and dynamic features independently. Dynamic variables are disentangled into trend and periodic components, and specialized attention mechanisms are introduced to capture both local and global dependencies. The fused representations are then used for accurate target prediction. The architecture and each component are detailed in the following subsections.

4.1. Consistency-Aware Dynamic Segmentation

To ensure that each segment is sampled from a consistent underlying distribution, a consistency-aware segmentation strategy is introduced for industrial time series. Specifically, the aim is to maximize consistency across N datasets, quantified using Grey Relational Analysis (GRA) [47,48]. To achieve this, the segmentation task is reformulated as an optimization problem, where each dataset Y i is divided into K aligned segments, as illustrated in Figure 2. This alignment ensures that each segment contains corresponding subsequences from all datasets, enabling a meaningful assessment of consistency within each segment. By solving the optimization problem, we obtain globally consistent and interpretable segmentation results.
The segmentation objective is to maximize the overall consistency score S, calculated as the average GRA score across all dataset pairs. The objective function is as follows:
$$\max S = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} G(Y^i, Y^j)$$
$$G(Y^i, Y^j) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{GRA}(Y_k^i, Y_k^j)$$
$$\mathrm{GRA}(Y_k^i, Y_k^j) = \frac{1}{T_k} \sum_{t=1}^{T_k} \frac{\Delta_{\min} + \alpha \Delta_{\max}}{\Delta_t + \alpha \Delta_{\max}}$$
$$\Delta_t = |Y_{k,t}^i - Y_{k,t}^j|$$
where $G(Y^i, Y^j)$ is the consistency between datasets $Y^i$ and $Y^j$, and $Y_k^i, Y_k^j \in \mathbb{R}^{1 \times T_k}$ represent the k-th segment of the respective datasets; $T_k$ is the length of the k-th segment; $\mathrm{GRA}(Y_k^i, Y_k^j)$ is the GRA score computed over segment k. In addition, $\Delta_t$ is the absolute difference between $Y_k^i$ and $Y_k^j$ at time point t; $\Delta_{\min}$ and $\Delta_{\max}$ are the minimum and maximum of all $\Delta_t$ values across time points within the segment; $\alpha$ is the distinguishing coefficient, typically set to $\alpha = 0.5$, which controls the relative weighting of $\Delta_{\min}$ and $\Delta_{\max}$.
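For reference, the consistency objective can be transcribed directly into code. The following is a minimal NumPy sketch of the GRA score and the overall objective S; the function names and the small numerical stabilizer in the denominator are our own additions, not part of the original formulation.

```python
import numpy as np

def gra_score(y_i: np.ndarray, y_j: np.ndarray, alpha: float = 0.5) -> float:
    """GRA score between two aligned segments, per the equation above."""
    delta = np.abs(y_i - y_j)                    # Delta_t at every time point
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + alpha * d_max) / (delta + alpha * d_max + 1e-12)
    return float(coeff.mean())

def consistency_score(datasets, boundaries) -> float:
    """Objective S: average pairwise GRA over all segments and dataset pairs.

    `boundaries` holds the K-1 interior cut points of the shared segmentation.
    """
    T = len(datasets[0])
    edges = [0] + list(boundaries) + [T]
    segments = list(zip(edges[:-1], edges[1:]))
    n, scores = len(datasets), []
    for i in range(n):
        for j in range(i + 1, n):
            scores.append(np.mean([gra_score(datasets[i][s:e], datasets[j][s:e])
                                   for s, e in segments]))
    return float(np.mean(scores))
```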
Figure 3 illustrates the concept of consistency. In panel (a), two consistent series exhibit similar trends and a concentrated distribution of absolute differences, resulting in a high GRA score (0.743). In contrast, panel (b) shows inconsistent series with divergent patterns and a dispersed distribution of absolute differences, yielding a lower GRA score (0.599). The left plots display the time series, while the right plots show the corresponding box plots of absolute differences.
Two key constraints are imposed to ensure a valid segmentation. Each segment must satisfy a length restriction, with $T_k$ lying within a specified range: segments that are too short may capture noise instead of meaningful patterns, resulting in unreliable analysis, whereas excessively long segments may overlook local variations, obscuring short-term trends and fluctuations.
$$T_{\min} \le T_k \le T_{\max}, \quad k = 1, 2, \dots, K$$
where $T_{\min}$ and $T_{\max}$ denote the minimum and maximum allowable lengths of each segment, respectively.
Furthermore, the lengths of all segments must sum to the total length T of the dataset, ensuring that the entire series is fully partitioned without gaps or overlaps:
$$\sum_{k=1}^{K} T_k = T$$
A variety of optimization algorithms have been developed for segmentation in industrial time series analysis. Among them, genetic algorithms (GAs) are mature and widely adopted due to their effectiveness in exploring complex solution spaces [49]. In this study, a genetic algorithm is employed to search for the optimal segmentation points that maximize GRA-based consistency. The GA simulates natural selection and includes the key steps outlined in Algorithm 1. The GA proceeds as follows:
Algorithm 1 Genetic Algorithm for Dynamic Segmentation
1:  Initialize: Generate initial population P_0
2:  Evaluate: Calculate fitness for each individual in P_0
3:  Set: Best score S_best = 0
4:  for g = 1 to G do
5:      Select: Apply roulette-wheel selection to choose parents from P_t
6:      Crossover: Perform crossover and mutation to generate offspring Q_t
7:      Evaluate: Assess fitness for each individual in Q_t
8:      Update: Replace P_t with Q_t
9:      Record: S = max(fitness(P_t))
10:     if S > S_best then
11:         S_best = S
12:         B_seg = segment_indices(P_t)
13:     end if
14: end for
15: Return: Best score S_best and best segment indices B_seg
  • Initialization: Generate an initial population, where each individual encodes a candidate segmentation scheme.
  • Selection: Individuals are selected for reproduction via a roulette-wheel strategy, with fitness evaluated using the GRA-based consistency score.
  • Crossover: Selected individuals are paired and crossed to exchange segmentation points, producing new offspring.
  • Mutation: With a defined probability, individuals undergo mutation, modifying segmentation points to explore new solutions.
  • Evaluation: The fitness of all individuals is assessed, and the population is iteratively refined over successive generations.
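Putting these steps together, a compact illustrative version of the search loop is sketched below. It reuses the `consistency_score` helper from the earlier sketch, omits crossover for brevity (offspring come from selection plus mutation only), and uses placeholder population and generation settings rather than the paper's tuned values.

```python
import random

def random_boundaries(T, k, t_min, t_max):
    """Rejection-sample K-1 interior cut points satisfying T_min <= T_k <= T_max."""
    while True:
        cuts = sorted(random.sample(range(t_min, T - t_min + 1), k - 1))
        lengths = [b - a for a, b in zip([0] + cuts, cuts + [T])]
        if all(t_min <= s <= t_max for s in lengths):
            return cuts

def mutate(cuts, T, t_min, t_max, step=50):
    """Shift one cut point by up to +/- step, keeping the length constraints."""
    cand = cuts[:]
    cand[random.randrange(len(cand))] += random.randint(-step, step)
    cand.sort()
    lengths = [b - a for a, b in zip([0] + cand, cand + [T])]
    return cand if all(t_min <= s <= t_max for s in lengths) else cuts

def ga_segment(datasets, k, t_min, t_max, pop_size=30, generations=100):
    """GA search for cut points maximizing the GRA consistency score S."""
    T = len(datasets[0])
    population = [random_boundaries(T, k, t_min, t_max) for _ in range(pop_size)]
    best_cuts, best_s = None, -1.0
    for _ in range(generations):
        fitness = [consistency_score(datasets, ind) for ind in population]
        top = max(range(pop_size), key=lambda m: fitness[m])
        if fitness[top] > best_s:                       # track the best individual
            best_s, best_cuts = fitness[top], population[top][:]
        # Roulette-wheel selection (GRA scores are positive), then mutation.
        parents = random.choices(population, weights=fitness, k=pop_size)
        population = [mutate(p, T, t_min, t_max) for p in parents]
    return best_s, best_cuts
```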
Following segmentation, the consistency between dataset pairs is evaluated. Datasets with a pairwise GRA score greater than or equal to 0.65 are deemed consistent and presumed to originate from similar underlying processes. All consistent datasets are then averaged at each time point as follows:
$$\bar{Y}_{\mathrm{cons}} = \frac{1}{n} \sum_{i=1}^{n} Y^i, \qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X^i$$
where $n \le N$ is the number of consistent datasets, $\bar{Y}_{\mathrm{cons}} \in \mathbb{R}^{1 \times T}$ is the average of the consistent datasets, and $\bar{X} = (\bar{X}_s, \bar{X}_d) \in \mathbb{R}^{(P+Q) \times T}$ represents the averaged contributory factors.

4.2. Hybrid Multivariate Forecasting Model

This section introduces the hybrid multivariate forecasting model depicted in Figure 4, which jointly extracts features from static and dynamic variables. These features are subsequently integrated to forecast the target variable $\bar{Y}_{\mathrm{cons}}$.
(1) Static Representation Module
In industrial KPI time series, each instance is often associated with a set of static attributes that remain unchanged throughout the observation period. These static factors, while invariant within a single sequence, can differ significantly across different sequences and have a profound impact on the baseline level and long-term behavior of the KPI. To address this, the Static Representation Module is introduced to explicitly encode instance-specific, time-invariant information. This enables the model to capture global contextual features and enhances generalization across diverse operational scenarios.
The extraction of static features utilizes local receptive fields and weight sharing to efficiently model intrinsic patterns and spatial dependencies among static variables. This approach distills informative representations from static inputs, thereby supporting more accurate downstream predictions. Given static data $\bar{X}_s \in \mathbb{R}^{P \times T}$, where P denotes the number of static variables observed over T time steps, a rolling window approach [50] is adopted. At each step, a look-back window of M time points $X_s^F \in \mathbb{R}^{P \times M}$ is used as input to predict the subsequent L steps.
A convolution filter $W_c \in \mathbb{R}^{k \times l}$ is applied to extract local features, where $k \times l$ is the size of the kernel. The convolution operation computes a weighted sum within each local region, followed by a nonlinear activation function:
$$Z(i,j) = \sigma\left( \sum_{p=0}^{k-1} \sum_{q=0}^{l-1} W_c(p,q) \cdot X_s^F(i+p, j+q) + b_c \right)$$
where $Z(i,j)$ is the post-convolution activation, $b_c$ is the bias term, and $\sigma(\cdot)$ is the activation function.
Following the convolution, a max-pooling operation is applied to the activation values $Z(i,j)$ to reduce the dimensionality of the feature map:
$$\hat{X}_s^F(i,j) = \max_{p,q} Z(i+p, j+q)$$
This operation selects the maximum value within each local region, effectively downsampling the feature map while retaining the most salient information.
The resulting features $\hat{X}_s^F$ encapsulate the time-invariant characteristics of the static variables, enhancing prediction accuracy. They are subsequently integrated with the dynamic representations in the downstream forecasting tasks.
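As a concrete reference, a minimal PyTorch sketch of such a convolution-plus-max-pooling encoder is given below; the kernel size, channel count, pooling window, and linear head are illustrative choices of our own rather than the paper's reported hyperparameters.

```python
import torch
import torch.nn as nn

class StaticRepresentation(nn.Module):
    """Conv + max-pooling encoder for the static look-back window X_s^F (P x M)."""
    def __init__(self, p_vars: int, lookback: int, horizon: int, channels: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=(1, 2))   # downsample along time only
        self.head = nn.Linear(channels * p_vars * (lookback // 2), p_vars * horizon)
        self.p, self.L = p_vars, horizon

    def forward(self, x):                         # x: (batch, P, M)
        z = self.act(self.conv(x.unsqueeze(1)))   # (batch, C, P, M)
        z = self.pool(z)                          # (batch, C, P, M // 2)
        out = self.head(z.flatten(1))             # (batch, P * L)
        return out.view(-1, self.p, self.L)       # static features (batch, P, L)

# Shape check with illustrative sizes: P = 4 static factors, M = 96, L = 24.
feats = StaticRepresentation(4, 96, 24)(torch.randn(8, 4, 96))
print(feats.shape)  # torch.Size([8, 4, 24])
```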
(2) Dynamic Temporal Disentanglement and Attention Module
Beyond static influences, industrial KPI time series exhibit rich and complex temporal dynamics, including underlying trends, periodic fluctuations from operational cycles, and abrupt changes triggered by external events or anomalies. The direct modeling of the raw series may obscure these distinct temporal patterns, making it difficult to effectively capture both long-term evolutions and short-term fluctuations. Given that trend and periodicity often coexist and interact in such data, the Dynamic Temporal Disentanglement and Attention Module is designed to first decompose the observed time series into trend and periodic components, and then apply attention mechanisms to selectively emphasize informative temporal patterns. This explicit disentanglement aligns feature extraction with the intrinsic structure of industrial time series, reduces mutual interference between temporal patterns, and provides a solid foundation for targeted modeling, thereby enhancing both predictive accuracy and interpretability.
For dynamic data $\bar{X}_d \in \mathbb{R}^{Q \times T}$, a look-back window $X_d^F \in \mathbb{R}^{Q \times M}$ is first extracted and normalized using Reversible Instance Normalization (RevIN) [51], yielding $\dot{X}_d^F \in \mathbb{R}^{Q \times M}$.
To disentangle temporal patterns, the normalized data are decomposed into trend ($X_{\mathrm{tr}}$) and periodic ($X_{\mathrm{pe}}$) components via the Fast Fourier Transform (FFT). By transforming the time series into the frequency domain, low-frequency components corresponding to long-term trends and high-frequency components representing periodic patterns can be effectively separated, enabling the model to process these features independently and, thus, enhance predictive performance:
$$X(f) = \mathcal{F}\{\dot{X}_d^F\}$$
where $X(f)$ denotes the frequency spectrum of the input time series $\dot{X}_d^F$, f denotes the frequency, and $\mathcal{F}$ denotes the Fast Fourier Transform.
To extract the low-frequency trend and high-frequency periodic components simultaneously, low-pass and high-pass filters are applied, followed by an inverse Fourier Transform to return the data to the time domain:
$$X_{\mathrm{tr}} = \mathcal{F}^{-1}\{ X(f) \cdot \mathbb{1}(|f| < \lambda) \}$$
$$X_{\mathrm{pe}} = \mathcal{F}^{-1}\{ X(f) \cdot \mathbb{1}(|f| \ge \lambda) \}$$
where $\lambda$ is the frequency threshold and $\mathbb{1}(\cdot)$ is the indicator function selecting the appropriate frequency components; $X_{\mathrm{tr}}, X_{\mathrm{pe}} \in \mathbb{R}^{Q \times M}$ denote the trend and periodic components, which sum to reconstruct the original input signal $\dot{X}_d^F$.
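This decomposition can be sketched in a few lines of PyTorch. Here the threshold λ is expressed as a frequency-bin index of the real FFT, an illustrative convention of our own; the final assertion checks that the two components reconstruct the input.

```python
import torch

def fft_disentangle(x: torch.Tensor, cutoff: int):
    """Split a series (..., M) into low-frequency trend and high-frequency
    periodic components via a hard cutoff in the rFFT spectrum.

    `cutoff` is the frequency-bin index playing the role of lambda.
    """
    spec = torch.fft.rfft(x, dim=-1)              # X(f) = F{x}
    low, high = torch.zeros_like(spec), spec.clone()
    low[..., :cutoff] = spec[..., :cutoff]        # indicator 1(|f| < lambda)
    high[..., :cutoff] = 0                        # indicator 1(|f| >= lambda)
    x_tr = torch.fft.irfft(low, n=x.shape[-1], dim=-1)    # trend component
    x_pe = torch.fft.irfft(high, n=x.shape[-1], dim=-1)   # periodic component
    return x_tr, x_pe

x = torch.randn(2, 96)                            # Q = 2 dynamic variables, M = 96
x_tr, x_pe = fft_disentangle(x, cutoff=5)
assert torch.allclose(x_tr + x_pe, x, atol=1e-5)  # components reconstruct input
```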
After disentanglement, a hierarchical attention mechanism is employed to capture dependencies both within and across the separated trends and periodic components. This mechanism consists of two attention modules: a Trend-wise Attention block and a Periodic-aware Attention block, each designed to model distinct temporal characteristics [52].
The Trend-wise Attention block operates on the trend component $X_{\mathrm{tr}}$, focusing on modeling long-term temporal dependencies within each dynamic variable. For each dynamic factor q, the trend features are embedded into a new feature space $X_{\mathrm{tr}}^q \in \mathbb{R}^{1 \times L \times d}$, where d is the embedding dimension. These embeddings are then linearly projected to obtain the query $Q_{\mathrm{tr}}$, key $K_{\mathrm{tr}}$, and value $V_{\mathrm{tr}}$ representations:
$$Q_{\mathrm{tr}}^q = W_Q \cdot X_{\mathrm{tr}}^q, \quad K_{\mathrm{tr}}^q = W_K \cdot X_{\mathrm{tr}}^q, \quad V_{\mathrm{tr}}^q = W_V \cdot X_{\mathrm{tr}}^q$$
where $W_Q, W_K, W_V \in \mathbb{R}^{d \times d}$ are learnable projection weights, and $Q_{\mathrm{tr}}^q, K_{\mathrm{tr}}^q, V_{\mathrm{tr}}^q \in \mathbb{R}^{L \times d}$ are the resulting query, key, and value embeddings, respectively.
The Trend-wise Attention block computes the attention values using the following attention mechanism:
$$\mathrm{Attn}_{\mathrm{tr}}^q = \mathrm{Softmax}\left( \frac{Q_{\mathrm{tr}}^q (K_{\mathrm{tr}}^q)^{\top}}{\sqrt{d}} \right) V_{\mathrm{tr}}^q \in \mathbb{R}^{L \times d}$$
where $\mathrm{Softmax}(\cdot)$ normalizes the attention weights before the weighted sum.
The resulting feature vectors are then concatenated across all dynamic factors:
$$\mathrm{Attn}_{\mathrm{tr}} = \mathrm{Concat}(\mathrm{Attn}_{\mathrm{tr}}^1, \dots, \mathrm{Attn}_{\mathrm{tr}}^Q) \in \mathbb{R}^{Q \times L \times d}$$
The Periodic-aware Attention block focuses on the periodic component $X_{\mathrm{pe}}$, capturing global periodic dependencies across different dynamic variables. Analogous to the trend branch, the periodic features are projected into an embedding space and transformed into query $Q_{\mathrm{pe}}$, key $K_{\mathrm{pe}}$, and value $V_{\mathrm{pe}}$ representations:
$$Q_{\mathrm{pe}} = W_Q \cdot X_{\mathrm{pe}}, \quad K_{\mathrm{pe}} = W_K \cdot X_{\mathrm{pe}}, \quad V_{\mathrm{pe}} = W_V \cdot X_{\mathrm{pe}}$$
where $W_Q, W_K, W_V \in \mathbb{R}^{d' \times d'}$ are the learnable projection weights for the periodic component, transforming the input periodic features into query, key, and value representations with $Q_{\mathrm{pe}}, K_{\mathrm{pe}}, V_{\mathrm{pe}} \in \mathbb{R}^{Q \times d'}$ and $d' = L \times d$.
The periodic attention is computed as follows:
$$\mathrm{Attn}_{\mathrm{pe}} = \mathrm{Softmax}\left( \frac{Q_{\mathrm{pe}} K_{\mathrm{pe}}^{\top}}{\sqrt{d'}} \right) V_{\mathrm{pe}} \in \mathbb{R}^{Q \times d'}$$
Subsequently, the output features from the Trend-wise and Periodic-aware Attention branches are fused and subjected to inverse normalization:
$$\hat{X}_d^F = \mathrm{InverseNorm}(\mathrm{Attn}_{\mathrm{tr}} + \mathrm{Attn}_{\mathrm{pe}})$$
The fused dynamic feature representation $\hat{X}_d^F \in \mathbb{R}^{Q \times L}$ effectively captures both local and global temporal dependencies, allowing the model to account for local trends and global periodic fluctuations in the data. The inverse normalization step returns the fused features to their original scale, thereby supporting accurate downstream predictions.
Finally, the extracted static features $\hat{X}_s^F$ and dynamic features $\hat{X}_d^F$ are concatenated to form a unified feature vector:
$$\hat{X}^F = \mathrm{Concat}(\hat{X}_s^F, \hat{X}_d^F) \in \mathbb{R}^{(P+Q) \times L}$$
This unified representation comprehensively encodes both static and temporal characteristics, facilitating robust modeling of complex time series patterns.
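A condensed PyTorch sketch of the two attention branches and their fusion is shown below. The trend branch attends over time positions independently for each variable, while the periodic branch attends across the Q variables with $d' = L \times d$. The embedding sizes, module names, and the omission of RevIN and inverse normalization are simplifications of our own, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def attention(q, k, v):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

class TrendPeriodicAttention(nn.Module):
    """Trend-wise attention within each variable; periodic-aware attention across variables."""
    def __init__(self, length: int, d: int = 16):
        super().__init__()
        self.embed_tr = nn.Linear(1, d)          # per-time-step trend embedding
        self.qkv_tr = nn.Linear(d, 3 * d)        # W_Q, W_K, W_V for the trend branch
        self.out_tr = nn.Linear(d, 1)
        d_pe = length * d                        # d' = L x d for the periodic branch
        self.embed_pe = nn.Linear(length, d_pe)  # per-variable periodic embedding
        self.qkv_pe = nn.Linear(d_pe, 3 * d_pe)  # W_Q, W_K, W_V for the periodic branch
        self.out_pe = nn.Linear(d_pe, length)

    def forward(self, x_tr, x_pe):               # both: (batch, Q, length)
        # Trend-wise: attention over time positions, independently per variable.
        h = self.embed_tr(x_tr.unsqueeze(-1))    # (batch, Q, length, d)
        q, k, v = self.qkv_tr(h).chunk(3, dim=-1)
        a_tr = self.out_tr(attention(q, k, v)).squeeze(-1)  # (batch, Q, length)
        # Periodic-aware: attention across the Q dynamic variables.
        g = self.embed_pe(x_pe)                  # (batch, Q, d')
        q, k, v = self.qkv_pe(g).chunk(3, dim=-1)
        a_pe = self.out_pe(attention(q, k, v))   # (batch, Q, length)
        return a_tr + a_pe                       # fused dynamic features

fused = TrendPeriodicAttention(length=24)(torch.randn(8, 2, 24), torch.randn(8, 2, 24))
print(fused.shape)  # torch.Size([8, 2, 24])
```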
(3) Predictor
Inspired by TiDE [53], the proposed method integrates static features, dynamic features, and look-back windows of the target variable to enhance forecasting performance. Prior work [54] has shown that linear mapping outperforms many complex networks across various scenarios. Accordingly, a linear projection is adopted to transform the extracted features into a lower-dimensional space:
$$\hat{Z} = W_{\mathrm{proj}} \cdot \hat{X}^F + b_{\mathrm{proj}} \in \mathbb{R}^{1 \times L}$$
where $W_{\mathrm{proj}} \in \mathbb{R}^{1 \times (P+Q)}$ and $b_{\mathrm{proj}} \in \mathbb{R}^{1 \times L}$ are the learned projection weights and biases, respectively. This step enables the model to map the combined features into a space where static context and temporal dependencies are better utilized for forecasting.
In parallel, the true observations from the look-back window $Y_{\mathrm{lookback}} \in \mathbb{R}^{1 \times M}$ are projected through a residual connection incorporating a linear-mapping layer, thereby preserving critical historical information of the target variable:
$$\hat{Y}_{\mathrm{residual}} = \mathrm{Residual}(Y_{\mathrm{lookback}}) \in \mathbb{R}^{1 \times L}$$
Finally, the projected features $\hat{Z}$ and the residual-mapped look-back window $\hat{Y}_{\mathrm{residual}}$ are added to form the final prediction:
$$\hat{Y}^F = \hat{Z} + \hat{Y}_{\mathrm{residual}} \in \mathbb{R}^{1 \times L}$$
The model is trained by minimizing the mean squared error (MSE) loss, which penalizes larger prediction errors:
$$\mathrm{MSE} = \frac{1}{L} \sum_{i=1}^{L} (y_i - \hat{y}_i)^2$$
where L denotes the number of predicted points, y i denotes the true values, and y ^ i denotes the model predictions.
The entire prediction network is optimized using the Adam optimizer to minimize the MSE loss.
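A minimal sketch of this prediction head is given below, assuming the feature and window shapes defined above; the module and variable names are our own, and the structure follows the linear-projection-plus-residual description rather than the exact TiDE implementation.

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Linear projection of fused features plus residual mapping of the look-back window."""
    def __init__(self, n_features: int, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(n_features, 1)          # W_proj: mixes the P+Q channels
        self.residual = nn.Linear(lookback, horizon)  # maps Y_lookback (M) to L steps

    def forward(self, fused, y_lookback):
        # fused: (batch, P+Q, L); y_lookback: (batch, M)
        z = self.proj(fused.transpose(1, 2)).squeeze(-1)  # Z_hat: (batch, L)
        return z + self.residual(y_lookback)              # Y_hat^F = Z_hat + residual

pred = Predictor(n_features=6, lookback=96, horizon=24)
y_hat = pred(torch.randn(8, 6, 24), torch.randn(8, 96))
print(y_hat.shape)  # torch.Size([8, 24])
```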

4.3. Running Flow of Proposed Method

The overall workflow of the proposed CHSDM-Net for industrial time series forecasting is depicted in Figure 5. The methodology comprises three main stages: data acquisition and processing, model construction and training, and model evaluation and result analysis.
  • Data Acquisition and Processing: Taking radar data as an example, multivariate time series are collected, including six influencing factors along with dR data. The raw datasets are then preprocessed and aligned to ensure consistency across all variables, followed by a consistency-aware segmentation process aimed at maximizing the consistency score. This leads to the partitioning of data into training, validation, and test sets for subsequent modeling.
  • Hybrid Static–Dynamic Network Construction: The segmented data are processed by CHSDM-Net, which consists of parallel branches for static and dynamic feature extraction. Static factors are encoded through a representation module, while dynamic factors undergo instance normalization, temporal disentanglement, and hierarchical attention mechanisms to capture local and global dependencies. The resulting features are fused and passed to the prediction module for forecasting the target indicator.
  • Model Evaluation and Analysis: The performance of the proposed model is comprehensively assessed through segmentation analysis, comparative experiments, ablation studies, and sensitivity analysis, thereby validating its effectiveness and robustness.

5. Experiments and Results

In this section, two representative examples spanning a total of six datasets are presented to demonstrate the accuracy and efficiency of the proposed algorithm. The first is a numerical case, which utilizes a single dataset to quantitatively assess the forecasting performance of the model. The second is an application-oriented example involving five distinct datasets, designed to further illustrate the robustness of the proposed approach across various real-world scenarios; the approach can be extended to other radar systems and complex industrial problems.

5.1. Experimental Setup

5.1.1. Dataset Description

1. Numerical case
A mathematical example involving three datasets is constructed to theoretically verify the proposed model. The static data are generated using four distinct factors, each remaining constant throughout the time series while varying across different instances:
  • $X_{s1}^i$: Randomly assigned a value between 1 and 5.
  • $X_{s2}^i$: Randomly selected from the range of 15 to 30.
  • $X_{s3}^i$: Randomly chosen from 1 to 4.
  • $X_{s4}^i$: Randomly chosen from 1 to 4.
The dynamic data are generated synthetically by combining trend $X_{d1,\mathrm{tr}}^i \in \mathbb{R}^{1 \times T}$, periodic $X_{d1,\mathrm{pe}}^i \in \mathbb{R}^{1 \times T}$, and random noise $X_{d1,\mathrm{rn}}^i \in \mathbb{R}^{1 \times T}$ components. The equation used to construct $X_{d1}^i \in \mathbb{R}^{1 \times T}$ is as follows:
$$X_{d1}^i = X_{d1,\mathrm{tr}}^i + X_{d1,\mathrm{pe}}^i + X_{d1,\mathrm{rn}}^i$$
The trend component evolves through three distinct phases:
$$X_{d1,\mathrm{tr}}^i = \begin{cases} a_1^i + b_1^i \cdot (1 - t_1^2) + \epsilon_1^i, & t_1 \le 4156 \\ a_2^i \cdot t_2 + b_2^i \cdot \sin(t_2) + \epsilon_2^i, & 4156 < t_2 \le 6385 \\ a_3^i \cdot \exp(t_3 / 4) + \epsilon_3^i, & t_3 > 6385 \end{cases}$$
where $t_1$, $t_2$, and $t_3$ represent different time intervals, and the trend evolves smoothly across these phases. The parameters $a_e^i$ ($e = 1, 2, 3$) and $b_f^i$ ($f = 1, 2$) are constants of the i-th dataset that control the shape of the trend, and $\epsilon_1^i, \epsilon_2^i, \epsilon_3^i$ are random noise terms of the i-th dataset drawn from standard normal distributions.
The periodic component $X_{d1,\mathrm{pe}}^i$ and random noise component $X_{d1,\mathrm{rn}}^i$ are defined as follows:
$$X_{d1,\mathrm{pe}}^i = a_4^i \cdot \sin\left( \frac{2\pi \cdot t}{b_3^i} \right)$$
$$X_{d1,\mathrm{rn}}^i = a_5^i \cdot z + b_4^i$$
where $a_4^i$ and $b_3^i$ are constants that influence the time series, $a_5^i$ scales the magnitude of the random noise, z represents noise drawn from a standard normal distribution $\mathcal{N}(0, 1)$, and $b_4^i$ is a constant used to shift the noise.
Another time series $X_{d2}^i \in \mathbb{R}^{1 \times T}$ is similarly constructed by combining trend $X_{d2,\mathrm{tr}}^i \in \mathbb{R}^{1 \times T}$, periodicity $X_{d2,\mathrm{pe}}^i \in \mathbb{R}^{1 \times T}$, and random noise $X_{d2,\mathrm{rn}}^i \in \mathbb{R}^{1 \times T}$:
$$X_{d2}^i = X_{d2,\mathrm{tr}}^i + X_{d2,\mathrm{pe}}^i + X_{d2,\mathrm{rn}}^i$$
The three components are calculated as follows:
$$X_{d2,\mathrm{tr}}^i = a_6^i \cdot \left( a_7^i \cdot |X_{d1}^i| + a_8^i \cdot \log(|X_{d1}^i| + b_5^i) \right) + (1 - a_6^i) \cdot \left( a_9^i \cdot t + a_{10}^i \cdot \log(t + b_6^i) \right) + b_7^i$$
$$X_{d2,\mathrm{pe}}^i = a_{11}^i \cdot \cos\left( \frac{\pi \cdot t}{b_8^i} \right)$$
$$X_{d2,\mathrm{rn}}^i = n_1 + a_{12}^i \cdot n_2$$
$$P(c,d) = \exp\left( -\frac{|c - d|^2}{2 r^2} \right)$$
where t is the time index for the periodicity and trend components; $a_e^i$ (for $e = 6, \dots, 12$) and $b_f^i$ (for $f = 5, \dots, 8$) are constants of the i-th dataset that control and influence the components; $n_1$ is independent Gaussian noise $\mathcal{N}(0, 3)$ with mean 0 and variance 3; $n_2$ is Gaussian process noise $\mathcal{GP}(0, P)$, where 0 is the mean and P is the covariance function; $P(c,d)$ is the covariance function (kernel), and r is the length scale controlling the smoothness of the process.
To generate the first dataset $Y^1 \in \mathbb{R}^{1 \times T}$, the time series is partitioned into four consecutive segments, with each segment assigned a distinct set of parameters. Within each segment, the output is computed using the following formulation:
$$Y^1 = w_0 \cdot X_{d1}^1 \cdot X_{s1}^1 + w_1 \cdot \left( 35 - X_{s2}^1 + (X_{s3}^1)^2 \right) + w_2 \cdot \frac{X_{d2}^1 \cdot X_{s4}^1}{20} + n_3$$
where $w_0$, $w_1$, $w_2$ are the weights used in each segment and $n_3$ is a Gaussian noise term drawn from a normal distribution. The piecewise assignment of weights enables the output series to exhibit varying dynamics across different temporal intervals, thereby increasing the complexity and realism of the synthetic data.
The second and third datasets are generated by linearly scaling the first; scaling is applied exclusively to the output $Y^1$. The second dataset $Y^2 \in \mathbb{R}^{1 \times T}$ and the third dataset $Y^3 \in \mathbb{R}^{1 \times T}$ are generated as
$$Y^2 = a_{13} \cdot Y^1 + b_{13} + n_4$$
$$Y^3 = a_{14} \cdot Y^1 + b(t) + n_5$$
where the constants $a_{13}$ and $a_{14}$ are scaling factors applied to $Y^1$, $b_{13}$ is a constant offset added to $Y^2$, and $b(t)$ is a time-dependent offset applied to $Y^3$, introducing a dynamic shift; $n_4, n_5$ are Gaussian noise terms, $\mathcal{N}(0, c_1)$ and $\mathcal{N}(0, c_2)$, respectively, with mean 0 and variances $c_1$ and $c_2$.
Due to the synthetic nature of the data, preprocessing requirements are minimal. All variables are generated within controlled ranges to ensure consistency and validity. Furthermore, as all three datasets share identical lengths T = 9913 , there is no need for additional alignment or normalization procedures. This uniformity facilitates direct comparison and model evaluation across datasets.
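For illustration, a schematic of how one such series is composed (trend plus periodic plus noise, mirroring the equations above) is sketched below; all coefficients are placeholders of our own, and the phase formulas are smoothed analogues rather than the paper's generating parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 9913
t = np.arange(T, dtype=float)

# Piecewise trend with three phases (placeholder coefficients throughout).
trend = np.empty(T)
t1, t2, t3 = t[:4156], t[4156:6385], t[6385:]
trend[:4156] = 1.0 + 0.5 * (1.0 - (t1 / 4156.0) ** 2)
trend[4156:6385] = 2e-4 * t2 + 0.3 * np.sin(t2 / 200.0)
trend[6385:] = 0.8 * np.exp((t3 - 6385.0) / 8000.0)
trend += 0.05 * rng.standard_normal(T)            # phase noise terms eps

periodic = 0.6 * np.sin(2.0 * np.pi * t / 150.0)  # a4 * sin(2*pi*t / b3)
noise = 0.2 * rng.standard_normal(T) + 0.1        # a5 * z + b4

x_d1 = trend + periodic + noise                   # X_d1 = trend + periodic + noise
print(x_d1.shape)                                 # (9913,)
```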
2. Illustrative case
This part presents an illustrative example based on radar measurement data, where the objective is to predict the range difference Y using both static and dynamic influencing factors. In practical flight test scenarios, radar systems are typically evaluated through multiple flight missions under varying environmental and operational conditions. These tests generate diverse datasets that are well-suited for validating the effectiveness and robustness of the proposed forecasting model.
For each dataset, the static factors are denoted as $X_s^i \in \mathbb{R}^{P \times T}$, where $P = 4$. These static factors remain constant over time and can be either qualitative or quantitative. The dynamic factors are represented as $X_d^i \in \mathbb{R}^{Q \times T}$, where $Q = 2$ corresponds to radar cross-section (RCS) and signal-to-noise ratio (SNR), both varying over time. Specifically, for the i-th dataset,
  • $X_{s1}^i, X_{s2}^i, X_{s3}^i, X_{s4}^i$ represent filtering, temperature, wind level, and flight path data, respectively;
  • $X_{d1}^i$ represents RCS over the time period T;
  • $X_{d2}^i$ represents SNR over the same time period.
In this illustrative case, five datasets are constructed and labeled as Radar 1 through Radar 5, each corresponding to a specific flight mission or experimental scenario. These datasets comprehensively reflect various radar operating environments and provide a robust basis for evaluating the predictive performance of the proposed method.
To further elucidate the characteristics of the data, Table 1 summarizes the sample sizes collected from the five radars across four datasets, as well as the remaining sample sizes after preprocessing and alignment procedures. The preprocessing steps include data cleaning, outlier removal, and temporal alignment to ensure data quality and consistency across all datasets.
As shown in Table 1, the preprocessing procedures result in a reduction of sample sizes, primarily due to the removal of incomplete or anomalous records. The final aligned datasets ensure high data integrity and comparability, forming a reliable basis for subsequent model training and evaluation.
Figure 6 presents boxplots of the range differences (dR) for the four datasets collected from Radar 2. The distributions are generally centered around zero, indicating that most range measurements show only minor deviations from the expected values. Across all datasets, the median and interquartile ranges remain consistent, highlighting the stability of Radar 2 under varying conditions. Outliers are present at both ends of the distributions, reflecting occasional measurement deviations likely due to environmental or operational factors.

5.1.2. Evaluation Metrics

To quantitatively evaluate the forecasting performance of all methods, four widely used metrics are adopted: mean absolute error (MAE), root mean squared error (RMSE), the coefficient of determination ( R 2 ), and Mean Absolute Scaled Error (MASE).
MAE measures the average magnitude of absolute errors between predicted values and the ground truth, providing an intuitive assessment of prediction accuracy. Its calculation is given by
$$\mathrm{MAE} = \frac{1}{L} \sum_{i=1}^{L} |y_i - \hat{y}_i|$$
where L denotes the total number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
RMSE evaluates the square root of the average squared differences between predictions and actual values; it is more sensitive to large errors, which helps highlight significant deviations. The formula for RMSE is
$$\mathrm{RMSE} = \sqrt{ \frac{1}{L} \sum_{i=1}^{L} (y_i - \hat{y}_i)^2 }$$
The coefficient of determination ($R^2$) reflects the proportion of variance in the ground truth values that can be explained by the model's predictions, with values closer to 1 indicating better predictive performance. It is calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{L} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{L} (y_i - \bar{y})^2}$$
where $\bar{y}$ is the mean of the true values.
To further enhance the robustness and comparability of the evaluation, the MASE is also employed. It is a scale-independent metric that normalizes the prediction error relative to the in-sample mean absolute error of a naive one-step forecast, thus enabling fair comparison across different datasets and models. A lower MASE value indicates better predictive accuracy. The calculation is as follows:
$$\mathrm{MASE} = \frac{ \frac{1}{L} \sum_{i=1}^{L} |y_i - \hat{y}_i| }{ \frac{1}{L-1} \sum_{i=2}^{L} |y_i - y_{i-1}| }$$
where the denominator represents the mean absolute error of the one-step naive forecast. Together, these metrics provide a comprehensive and robust evaluation of model accuracy and reliability.
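For reference, all four metrics can be computed directly from their definitions; the short NumPy sketch below is a transcription of the formulas above, with the naive MASE denominator taken over the same evaluation series.

```python
import numpy as np

def forecast_metrics(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """MAE, RMSE, R^2, and MASE for 1-D arrays of truths and predictions."""
    err = y - y_hat
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    naive_mae = np.mean(np.abs(np.diff(y)))   # one-step naive forecast error
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MASE": mae / naive_mae}

print(forecast_metrics(np.array([1.0, 2.0, 3.0, 4.0]),
                       np.array([1.1, 1.9, 3.2, 3.8])))
```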

5.1.3. Baselines

To comprehensively evaluate the effectiveness of the proposed method, a set of state-of-the-art time series forecasting models is selected as baselines. The following nine methods are included in the comparison:
  • Autoformer [28]: Autoformer introduces a decomposition architecture combined with an auto-correlation mechanism, which enables the model to effectively capture long-term dependencies and periodic patterns in time series data.
  • FEDformer [29]: By incorporating frequency-enhanced decomposition and seasonal-trend separation, FEDformer enhances both the accuracy and efficiency of long-term forecasting with transformer-based models.
  • Pyraformer [30]: Leveraging a pyramidal attention structure, Pyraformer is designed to efficiently model long-range dependencies while significantly reducing computational complexity.
  • N-HiTS [32]: The N-HiTS model employs neural hierarchical interpolation, effectively utilizing multi-resolution representations to improve forecasting performance on complex and diverse time series.
  • PatchTST [55]: PatchTST segments time series into patches to serve as input tokens and applies channel-independent transformers, resulting in efficient and accurate long-term multivariate forecasting.
  • DLinear [56]: As a lightweight linear model, DLinear decomposes time series into trend and seasonal components, offering both strong predictive performance and computational efficiency.
  • Pathformer [33]: Pathformer stands out by introducing a multi-scale transformer architecture with adaptive pathway selection, which allows the model to capture both local and global temporal dependencies.
  • TiDE [53]: TiDE adopts a dense encoder structure and incorporates advanced feature extraction and aggregation strategies, thereby achieving robust results in long-term time series forecasting tasks.
  • SARIMAX [37]: An extension of ARIMA that incorporates seasonal effects and exogenous variables, providing an interpretable linear baseline for time series with complex seasonality and external influences.

5.1.4. Implementation Details

To ensure reproducibility, the random seed is fixed at 42 throughout all experiments. The Adam optimizer is employed for model training. During training, the best model is selected based on performance on the validation set using cross-validation. The L1 loss function is adopted as the training objective. Grid search is utilized to select the optimal hyperparameters, with the search ranges for all hyperparameters summarized in Table 2. All experiments are implemented using the PyTorch framework (version 1.11.0).

5.2. Overall Performance

5.2.1. Segmentation Results

Dynamic segmentation is applied to the dataset to maximize internal consistency within each segment. The number of segments is determined automatically during optimization, rather than being predefined.
For the numeric dataset, the constraints on the minimum and maximum segment lengths are set to $T_{\min} = 2000$ and $T_{\max} = 2800$. The original and optimized segmentation results are presented in Table 3. The optimized segmentation closely matches the original results, indicating the effectiveness of the approach. However, Segment 3 remains inconsistent, with a score below 0.65 even after optimization, highlighting some lingering variability.
Table 4 compares the GRA scores for the fourth segment before and after GA optimization. The minimal differences between the optimized and original segment boundaries highlight the flexibility of the dynamic segmentation process while still adhering to the set constraints. These results also reveal that the third dataset is less consistent with the others, as indicated by its lower GRA score. Figure 7 further illustrates this inconsistency, showing that Y 3 fluctuates more than Y 1 and Y 2 in the fourth segment.
For the radar datasets, the segmentation is constrained by setting the minimum segment length $T_{\min} = 300$ and the maximum segment length $T_{\max} = 1200$. These constraints are chosen to ensure that each segment captures sufficient data for meaningful analysis while avoiding overly long segments that might obscure local variations in the data. As illustrated in Figure 8, the segmentation results and consistency scores for the five radar datasets are presented. The detailed results show that inconsistent segments exhibit noticeable discrepancies in the data from certain flights compared to others.
For those flights identified as consistent, the data are aggregated, and the mean value at each time point within the segment is calculated. This process yields an average trajectory that represents the typical behavior of the data in the consistent segments, providing a clearer understanding of the common patterns while minimizing the influence of outlier flights.

5.2.2. Main Results

Table 5 provides a comprehensive performance comparison between CHSDM-Net and several state-of-the-art time series forecasting models across multiple datasets. The evaluation metrics include MAE, RMSE, R 2 , and MASE, where lower MAE, RMSE, and MASE values, as well as higher R 2 , indicate better predictive performance.
The results demonstrate that CHSDM-Net consistently achieves the lowest MAE, RMSE, and MASE values on most datasets, indicating superior prediction accuracy. For instance, on the Numeric dataset, CHSDM-Net achieves an MAE of 0.0568, an RMSE of 0.1155, and an MASE of 0.0125, outperforming all baseline models. Similar trends are observed on challenging radar datasets, where CHSDM-Net maintains top performance, reflecting strong robustness and generalization.
In terms of R 2 , CHSDM-Net frequently ranks first or second, further confirming its effectiveness in capturing underlying data patterns. While some competing models, such as N-HiTS and PatchTST, occasionally yield comparable results on specific metrics, CHSDM-Net generally provides the most balanced and reliable performance. Notably, the traditional linear model SARIMAX performs significantly worse than deep learning-based methods across all metrics, particularly on real-world datasets, underscoring the advantages of advanced models in handling complex temporal dynamics. Overall, these findings highlight the effectiveness of CHSDM-Net for time series forecasting, making it a robust choice for practical applications.
To further illustrate the effectiveness of CHSDM-Net, Figure 9 and Figure 10 visualize the absolute prediction errors on a subset of the test set for the Numeric and Radar1 datasets, respectively, comparing CHSDM-Net with several representative baseline models (FEDformer, Pyraformer, PatchTST, and DLinear).
As illustrated in Figure 9, CHSDM-Net consistently achieves the lowest absolute errors throughout the test sequence on the Numeric dataset. Competing models such as FEDformer, Pyraformer, and PatchTST display higher error curves, while DLinear shows significant fluctuations and frequent error spikes. These results highlight the strong stability and robustness of CHSDM-Net, particularly its ability to maintain low error levels even under challenging or volatile conditions.
A similar trend is observed in Figure 10 for the Radar1 dataset. Here, the advantage of CHSDM-Net is even more evident: while models like FEDformer, PatchTST, and DLinear exhibit frequent and pronounced error peaks, CHSDM-Net maintains consistently lower and more stable error curves. This indicates CHSDM-Net’s superior capability in modeling complex temporal dynamics inherent in radar data, resulting in more reliable and accurate forecasts.
In summary, the visualized error comparisons confirm that CHSDM-Net not only reduces average prediction errors but also delivers greater stability across diverse time series datasets. These findings reinforce the quantitative results and underscore the practical value of CHSDM-Net for real-world forecasting tasks.

5.3. Ablation Study

To thoroughly assess the contribution of each component in CHSDM-Net, we conducted a series of ablation experiments using six distinct variants, each denoted by a concise label for clarity:
  • w/o Seg: The model is trained and evaluated on the dataset without performing the optimized segmentation process.
  • w/o Cons: Instead of using only consistent segmented datasets, all available data—including inconsistent samples—are merged for model training and evaluation.
  • w/o Sta: The Static Representation Module is removed to assess its contribution to overall performance.
  • w/o Dyn: The Dynamic Temporal Disentanglement and Attention Module is omitted, and only the static branch is retained.
  • w/o Tr: The Trend-wise Attention block within the dynamic module is removed, while other components remain unchanged.
  • w/o Pe: The Periodic-aware Attention block within the dynamic module is removed, while other components remain unchanged.
Figure 11 summarizes the ablation results across various datasets. The full CHSDM-Net consistently outperforms all ablated variants in every dataset and metric, highlighting the synergistic effect of its modules. Notably, omitting the segmentation strategy (w/o Seg) leads to a significant increase in MAE, RMSE, and MASE, underscoring the critical role of effective data partitioning in reducing distributional shifts. Similarly, removing the consistency filtering (w/o Cons) substantially degrades performance, emphasizing the importance of high-quality, consistent training data for robust model learning.
Both the static and dynamic branches are shown to be indispensable: eliminating either (w/o Sta, w/o Dyn) results in marked drops in accuracy, indicating that static and dynamic representations provide complementary information crucial for modeling complex temporal patterns. Furthermore, the trend-wise and periodic attention mechanisms within the dynamic branch are vital; their removal (w/o Tr, w/o Pe) leads to noticeable declines in predictive accuracy, demonstrating their effectiveness in capturing long-term and cyclical structures.
In summary, the ablation results validate the necessity and effectiveness of each component within CHSDM-Net. The superior performance of the complete model demonstrates that our architecture is well-suited to the challenges of diverse time series datasets, providing strong evidence for the soundness of our methodological choices. Extensive ablation studies further highlight that explicitly modeling static and dynamic components is crucial for robust and accurate predictions. In particular, the Dynamic Temporal Disentanglement and Attention Module consistently reduces prediction errors and enhances scale-invariant accuracy across all datasets, as shown by notable increases in MAE, RMSE, and MASE when this module or its subcomponents are removed. These findings not only confirm the theoretical rationale but also demonstrate the practical effectiveness and robustness of our modular approach for a wide range of real-world time series tasks.

5.4. Comparative Analysis of Segmentation and Decomposition Methods

To rigorously assess the advantages of the optimization-driven segmentation approach over existing statistical and clustering-based methods, as well as classical decomposition techniques, comprehensive experiments were conducted on the numeric dataset, which contains four ground-truth segments.
1. Segmentation Methods Comparison
The Pruned Exact Linear Time (PELT) algorithm [17] is a widely used statistical segmentation method based on optimizing a cost function with a penalty on the number of change points. However, PELT is sensitive to the choice of penalty parameter, and in the present experiment, it failed to detect any change points, resulting in under-segmentation. DTW-Kmeans [19], a clustering-based approach that combines Dynamic Time Warping (DTW) distance with K-means clustering, tends to over-segment the series due to its sensitivity to the initial cluster number setting and a lack of guidance from data consistency; in this experiment, it detected far more segments than the ground truth.
Both PELT and DTW-Kmeans yielded segmentation results that deviated significantly from the true number of segments in the numeric dataset. PELT returned only a single segment for the entire series, which constitutes severe under-segmentation, while DTW-Kmeans detected 11 segments, substantially exceeding the actual number of four and thus over-segmenting. These outcomes highlight the limitations of both methods in terms of consistency with the data structure and interpretability. In comparison, the proposed consistency-aware segmentation method accurately identified the change points, yielding four segments that precisely matched the ground truth, as shown in Table 3.
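Both baselines are available in off-the-shelf Python packages, and the sketch below shows one way to run them on a toy series; the penalty, window size, and cluster count are illustrative choices rather than the settings used in these experiments.

import numpy as np
import ruptures as rpt
from tslearn.clustering import TimeSeriesKMeans

# Toy series with three true change points (four segments).
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(m, 1.0, 500) for m in (0, 3, 0, 3)])

# PELT: the detected change points depend strongly on the penalty `pen`.
pelt_bkps = rpt.Pelt(model="rbf").fit(signal.reshape(-1, 1)).predict(pen=10)

# DTW-Kmeans: cluster fixed-length windows, then read boundaries off
# label changes (one way to turn clustering into segmentation).
win = 50
windows = signal[: len(signal) // win * win].reshape(-1, win, 1)
labels = TimeSeriesKMeans(n_clusters=4, metric="dtw", random_state=0).fit_predict(windows)
dtw_bkps = [win * (i + 1) for i in range(len(labels) - 1) if labels[i] != labels[i + 1]]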
2. Decomposition Methods Comparison
To further validate the effectiveness of the proposed approach, the impact of different decomposition methods was assessed indirectly by comparing the forecasting accuracy of downstream models. Since the quality of a decomposition cannot be measured directly, the decoupling module in the forecasting pipeline was replaced with alternative methods, and their effects were evaluated through the final prediction performance. Specifically, Seasonal-Trend decomposition using Loess (STL) is a classical statistical technique that separates a time series into seasonal, trend, and residual components via locally weighted regression [57], while Moving Average is a widely used method that extracts the trend by smoothing the series with a fixed-size window [58]. Both STL and Moving Average were compared against the FFT-based decoupling adopted in CHSDM-Net on the numeric dataset. Table 6 presents the quantitative results in terms of MAE, RMSE, R², and MASE.
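To make the three alternatives concrete, the sketch below applies each to a toy series; the FFT split shown (lowest-frequency bins as trend, remainder as the periodic part) is one simple reading of frequency-based decoupling, not the exact decoupling module of CHSDM-Net.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

t = np.arange(1000)
y = pd.Series(0.01 * t + np.sin(2 * np.pi * t / 50) + np.random.normal(0, 0.1, 1000))

stl_result = STL(y, period=50).fit()                 # exposes .trend, .seasonal, .resid
ma_trend = y.rolling(window=50, center=True).mean()  # moving-average trend estimate

spec = np.fft.rfft(y.to_numpy())
spec[3:] = 0                                         # keep only the lowest-frequency bins
fft_trend = np.fft.irfft(spec, n=len(y))             # smooth trend from low frequencies
fft_periodic = y.to_numpy() - fft_trend              # remainder: periodic part plus noise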
The results indicate that CHSDM-Net, utilizing FFT for decoupling, achieved the best forecasting performance across all metrics. In contrast, STL and Moving Average resulted in higher MAE and RMSE values and lower R² values, indicating inferior predictive accuracy. These findings demonstrate that FFT-based decomposition is more effective in capturing underlying patterns relevant for downstream forecasting tasks.
As a result, the optimization-driven segmentation and FFT-based decoupling not only produce segmentation results consistent with the intrinsic structure of the data but also significantly improve forecasting accuracy compared with classical statistical and clustering-based segmentation methods and classical decomposition techniques, confirming the superiority of the proposed method in both interpretability and predictive performance.

5.5. Sensitivity Analysis

To evaluate the robustness and stability of the proposed model, sensitivity analyses were conducted on two key hyperparameters, the number of training epochs and the input sequence length, using the numerical dataset. The performance was assessed using three metrics: MAE, RMSE, and R².
1. Epoch Sensitivity Analysis
Figure 12 presents the model performance with respect to the number of training epochs. As shown in the figure, both MAE and RMSE decrease rapidly during the initial training phase and stabilize after approximately 20 epochs, indicating effective convergence of the model. The R² score increases sharply at first and then plateaus, reaching a value close to 1, which suggests that the model quickly captures the underlying data patterns and maintains stable predictive performance with further training. These results demonstrate that the model is not sensitive to the number of epochs beyond a certain threshold, and early stopping can be safely applied to avoid unnecessary computation.
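A minimal early-stopping sketch consistent with this observation is given below; the validation curve is simulated so that the snippet is self-contained, and the patience value is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(1)
def val_mae_after_epoch(epoch):          # simulated validation curve that flattens near epoch 20
    return 0.05 + 0.5 * np.exp(-epoch / 5) + rng.normal(0, 0.002)

best, wait, patience = float("inf"), 0, 5
for epoch in range(100):
    mae = val_mae_after_epoch(epoch)
    if mae < best - 1e-4:                # a meaningful improvement resets the counter
        best, wait = mae, 0
    else:
        wait += 1
        if wait >= patience:             # stop once the curve has plateaued
            print(f"Early stop at epoch {epoch}, best MAE {best:.4f}")
            break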
2. Sequence Length Sensitivity Analysis
Figure 13 shows the effect of varying the input sequence length on model performance. As the sequence length increases from 10 to 40, both MAE and RMSE exhibit a decreasing trend, while the R² score remains consistently high (close to 1). This indicates that longer input sequences enable the model to better capture temporal dependencies, resulting in improved prediction accuracy. However, the performance gains become marginal as the sequence length exceeds 30, suggesting diminishing returns for very long input sequences. Overall, the model demonstrates stable and robust performance across a wide range of sequence lengths.
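The sweep itself is straightforward to reproduce; the toy sketch below varies the input length for a simple least-squares autoregressive stand-in (not CHSDM-Net) and reports the test MAE, mirroring the protocol behind Figure 13.

import numpy as np

def sweep_sequence_length(series, lengths=(10, 20, 30, 40)):
    """Test MAE of a linear AR stand-in for each input length."""
    split = int(0.8 * len(series))
    scores = {}
    for L in lengths:
        # Build (window -> next value) training pairs.
        X = np.stack([series[i:i + L] for i in range(len(series) - L)])
        y = series[L:]
        Xtr, ytr, Xte, yte = X[:split - L], y[:split - L], X[split - L:], y[split - L:]
        w, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)  # least-squares AR weights
        scores[L] = float(np.mean(np.abs(Xte @ w - yte)))
    return scores

rng = np.random.default_rng(0)
series = np.sin(np.arange(2000) * 0.1) + rng.normal(0, 0.05, 2000)
print(sweep_sequence_length(series))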
In summary, the sensitivity analysis results confirm that the proposed model is robust to changes in both the number of training epochs and the input sequence length, providing reliable performance across different settings.

6. Discussion

The proposed modeling strategy is tailored to industrial KPI forecasting tasks, where time series data typically exhibit clear temporal dependencies and regular seasonal patterns. The segmentation and attention mechanisms are designed to leverage these characteristics, resulting in improved predictive accuracy and interpretability, as validated by extensive experiments. This approach is well-suited for offline scenarios that prioritize forecasting accuracy, with reasonable computational requirements. However, its effectiveness is closely linked to the presence of such temporal structures in the data, and the model’s generalizability to domains with different or less regular patterns remains to be fully explored.
While adversarial risks and real-time deployment were not the primary focus of this study, the modular design of the model provides a foundation for future enhancements in robustness and adaptability. Future work will investigate the application of this method to more diverse datasets, explore transfer learning and domain adaptation techniques to improve generalization, and develop strategies to address adversarial scenarios and resource-constrained environments.

7. Conclusions

This study demonstrates that CHSDM-Net effectively overcomes key limitations of existing KPI forecasting methods, including inadequate adaptation to local data characteristics and limited robustness in complex operational environments. The proposed approach begins with consistency-aware dynamic segmentation, formulated as an optimization problem, to capture local temporal heterogeneity across multiple variables. This is followed by a hybrid deep learning architecture that integrates a Static Representation Module with a Dynamic Temporal Disentanglement and Attention Module to extract and model both static and dynamic dependencies among influencing factors and target indicators. Experimental results on multiple real-world industrial datasets show that CHSDM-Net achieves higher forecasting accuracy than conventional baselines and state-of-the-art models. Future work will focus on further improving the resilience of CHSDM-Net by incorporating adversarial defense mechanisms and advanced feature fusion strategies, aiming to extend its applicability to a broader range of industrial scenarios.

Author Contributions

Conceptualization, J.L.; methodology, J.L.; validation, J.L., B.L. and L.Z.; formal analysis, J.L.; investigation, J.L., B.L. and L.Z.; resources, M.W.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, X.J. and B.L.; visualization, M.W.; supervision, X.J.; project administration, X.J.; funding acquisition, X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (NSFC) [Grant No. 72271238].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data of the numeric example were generated using synthetic test functions and are available upon reasonable request. However, the authors do not have permission to share data for the illustrative example.

Acknowledgments

The authors thank the anonymous reviewers for their insightful and constructive recommendations, which helped improve the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Odufuwa, O.Y.; Tartibu, L.K.; Kusakana, K. Artificial neural network modelling for predicting efficiency and emissions in mini-diesel engines: Key performance indicators and environmental impact analysis. Fuel 2025, 387, 134294.
  2. Frohmann, M.; Karner, M.; Khudoyan, S.; Wagner, R.; Schedl, M. Predicting the Price of Bitcoin Using Sentiment-Enriched Time Series Forecasting. Big Data Cogn. Comput. 2023, 7, 137.
  3. Dioubi, F.; Hundera, N.W.; Xu, H.; Zhu, X. Enhancing stock market predictions via hybrid external trend and internal components analysis and long short term memory model. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102252.
  4. Alharthi, M.; Mahmood, A. Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting. Big Data Cogn. Comput. 2024, 8, 48.
  5. AlSharabi, K.; Bin Salamah, Y.; Aljalal, M.; Abdurraqeeb, A.M.; Alturki, F.A. Long-Term Forecasting of Solar Irradiation in Riyadh, Saudi Arabia, Using Machine Learning Techniques. Big Data Cogn. Comput. 2025, 9, 21.
  6. Elwahsh, H.; Tawfeek, M.A.; Abd El-Aziz, A.A.; Mahmood, M.A.; Alsabaan, M.; El-shafeiy, E. A new approach for cancer prediction based on deep neural learning. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101565.
  7. Xie, H.; Wei, L.; Ruan, G.; Zhang, H.; Shi, J.; Lin, S.; Liu, C.; Liu, X.; Zheng, X.; Chen, Y.; et al. Performance of anthropometry-based and bio-electrical impedance-based muscle-mass indicators in the Global Leadership Initiative on Malnutrition criteria for predicting prognosis in patients with cancer. Clin. Nutr. 2024, 43, 1791–1799.
  8. Su, C.; Peng, X.; Yang, D.; Lu, R.; Huang, H.; Zhong, W. A Transferable Ensemble Additive Network for Interpretable Prediction of Key Performance Indicators. IEEE Trans. Instrum. Meas. 2024, 73, 2532214.
  9. Azam, M.A.; Siddiqui, M.A.; Ali, H. Development of performance indicator for metal-organic frameworks in atmospheric water harvesting. Sep. Purif. Technol. 2025, 355, 129660.
  10. Zhang, P.; Cao, L.; Dong, F.; Gao, Z.; Zou, Y.; Wang, K.; Zhang, Y.; Sun, P. A Study of Hybrid Predictions Based on the Synthesized Health Indicator for Marine Systems and Their Equipment Failure. Appl. Sci. 2022, 12, 3329.
  11. Han, H.; Li, H.; Wu, X.; Yang, H.; Zhao, D. Cascaded LSTM-Based State Prediction of Equipment in Wastewater Treatment Process. IEEE Trans. Instrum. Meas. 2024, 73, 3541112.
  12. Kim, D.; Baek, J.-G. Bagging ensemble-based novel data generation method for univariate time series forecasting. Expert Syst. Appl. 2022, 203, 117366.
  13. Sun, L.; Ji, Y.; Li, Q.; Yang, T. A process knowledge-based hybrid method for univariate time series prediction with uncertain inputs in process industry. Adv. Eng. Inform. 2024, 60, 102438.
  14. Balderas, L.; Lastra, M.; Benítez, J.M. An Efficient Green AI Approach to Time Series Forecasting Based on Deep Learning. Big Data Cogn. Comput. 2024, 8, 120.
  15. Liu, Z.; Feng, Y.; Liu, H.; Tang, R.; Yang, B.; Zhang, D.; Jia, W.; Tan, J. TVC Former: A transformer-based long-term multivariate time series forecasting method using time-variable coupling correlation graph. Knowl.-Based Syst. 2025, 314, 113147.
  16. Bao, X.; Zheng, Y.; Zhong, J.; Chen, L. SIMTSeg: A self-supervised multivariate time series segmentation method with periodic subspace projection and reverse diffusion for industrial process. Adv. Eng. Inform. 2024, 62, 102859.
  17. Wu, H.; Jing, S.; Zhang, R.; Zhang, F.; Jiang, C. Phase unwrapping error identification and suppression method in ϕ-OTDR systems based on PELT-VMD-ARIMA. Opt. Express 2024, 32, 29344–29361.
  18. Cheng, X.; Huang, B.; Zong, J. Device-Free Human Activity Recognition Based on GMM-HMM Using Channel State Information. IEEE Access 2021, 9, 76592–76601.
  19. Machado, A.P.F.; Munaro, C.J.; Ciarelli, P.M. Enhancing one-class classifiers performance in multivariate time series through dynamic clustering: A case study on hydraulic system fault detection. Expert Syst. Appl. 2025, 286, 128088.
  20. Wang, L.; Shen, P. Memetic segmentation based on variable lag aware for multivariate time series. Inf. Sci. 2024, 657, 120003.
  21. Heo, T.; Manuel, L. Greedy copula segmentation of multivariate non-stationary time series for climate change adaptation. Prog. Disaster Sci. 2022, 14, 100221.
  22. Huang, J.; Ren, L.; Ji, Z.; Yan, K. Single-channel EEG automatic sleep staging based on transition optimized HMM. Multimed. Tools Appl. 2022, 30, 43063–43081.
  23. Guo, S.; Zheng, S.; Li, J.; Zhou, Q.; Xu, H. A lightweight social cognitive risk potential field model for path planning with dedicated dynamic and static traffic factors. IET Intell. Transp. Syst. 2025, 19, e12595.
  24. Shi, X.; Hao, K.; Chen, L.; Wei, B.; Liu, X. Multivariate time series prediction of complex systems based on graph neural networks with location embedding graph structure learning. Adv. Eng. Inform. 2022, 54, 101810.
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  26. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. CoST: Contrastive learning of disentangled seasonal-trend representations for time series forecasting. arXiv 2022, arXiv:2202.01575.
  27. Yu, G.; Zou, J.; Hu, X.; Aviles-Rivero, A.I.; Qin, J.; Wang, S. Revitalizing multivariate time series forecasting: Learnable decomposition with inter-series dependencies and intra-series variations modeling. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; pp. 57818–57841.
  28. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; pp. 22419–22430.
  29. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
  30. Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
  31. Shabani, A.; Abdi, A.; Meng, L.; Sylvain, T. Scaleformer: Iterative multi-scale refining transformers for time series forecasting. arXiv 2022, arXiv:2206.04038.
  32. Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Garza Ramirez, F.; Canseco, M.M.; Dubrawski, A. N-HiTS: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 6989–6997.
  33. Chen, P.; Zhang, Y.; Cheng, Y.; Shu, Y.; Wang, Y.; Wen, Q.; Yang, B.; Guo, C. Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting. arXiv 2024, arXiv:2402.05956.
  34. Xu, M.; Yang, F.; Fang, Y.; Li, F.; Yan, R. Research on time series-based pipeline ground penetrating radar calibration angle prediction algorithm. Sensors 2024, 24, 379.
  35. William, W.S.W. Multivariate Time Series Analysis and Applications; Wiley-Blackwell: Hoboken, NJ, USA, 2019.
  36. Yuan, J.; Li, D. Epidemiological and clinical characteristics of influenza patients in respiratory department under the prediction of autoregressive integrated moving average model. Results Phys. 2021, 24, 104070.
  37. Jang, G.; Seo, J.; Lee, H. Analyzing the impact of COVID-19 on seasonal infectious disease outbreak detection using hybrid SARIMAX-LSTM model. J. Infect. Public Health 2025, 18, 102772.
  38. Mulla, S.; Pande, C.B.; Singh, S.K. Times series forecasting of monthly rainfall using seasonal auto regressive integrated moving average with exogenous variables (SARIMAX) model. Water Resour. Manag. 2024, 38, 1825–1846.
  39. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
  40. Van Ryzin, J. Classification and Regression Trees (Book). J. Am. Stat. Assoc. 1986, 81, 253.
  41. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  42. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
  43. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558.
  44. Nigam, S. Forecasting time series using convolutional neural network with multiplicative neuron. Appl. Soft Comput. 2025, 174, 112921.
  45. Salazar, C.; Banerjee, A.G. A distance correlation-based approach to characterize the effectiveness of recurrent neural networks for time series forecasting. Neurocomputing 2025, 629, 129641.
  46. Ye, H.; Chen, J.; Gong, S.; Jiang, F.; Zhang, T.; Chen, J.; Gao, X. ATFNet: Adaptive time-frequency ensembled network for long-term time series forecasting. arXiv 2024, arXiv:2404.05192.
  47. Kheir, N.A.; Holmes, W.M. On validating simulation models of missile systems. Simulation 1978, 30, 117–128.
  48. Montgomery, D.C.; Conard, R.G. Comparison of simulation and flight-test data for missile systems. Simulation 1980, 34, 63–72.
  49. Abdel-Magid, Y.L.; Abido, M.A. Optimal multiobjective design of robust power system stabilizers using genetic algorithms. IEEE Trans. Power Syst. 2003, 18, 1125–1132.
  50. Qiu, X.; Hu, J.; Zhou, L.; Wu, X.; Du, J.; Zhang, B.; Guo, C.; Zhou, A.; Jensen, C.S.; Sheng, Z.; et al. TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods. Proc. Very Large Data Bases 2024, 17, 2363–2377.
  51. Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.-H.; Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
  52. Luo, Y.; Lyu, Z.; Huang, X. TFDNet: Time-Frequency Enhanced Decomposed Network for Long-term Time Series Forecasting. arXiv 2023, arXiv:2308.13386.
  53. Das, A.; Kong, W.; Leach, A.; Mathur, S.; Sen, R.; Yu, R. Long-term forecasting with TiDE: Time-series dense encoder. arXiv 2023, arXiv:2304.08424.
  54. Li, Z.; Qi, S.; Li, Y.; Xu, Z. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv 2023, arXiv:2305.10721.
  55. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
  56. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? Proc. AAAI Conf. Artif. Intell. 2023, 37, 11121–11128.
  57. Wang, M.; Meng, Y.; Sun, L.; Zhang, T. Decomposition combining averaging seasonal-trend with singular spectrum analysis and a marine predator algorithm embedding Adam for time series forecasting with strong volatility. Expert Syst. Appl. 2025, 274, 126864.
  58. Simon, J.; Moll, J.; Krozer, V. Trend decomposition for temperature compensation in a radar-based structural health monitoring system of wind turbine blades. Sensors 2024, 24, 800.
Figure 1. The flowchart of a radar check-flight test.
Figure 2. Illustration of dynamic segmentation. Each dataset Y_i is divided into K aligned segments, where each patch contains corresponding subsequences from all datasets.
Figure 3. The comparison of consistent and inconsistent datasets.
Figure 4. The structure of the hybrid static–dynamic multivariate forecast model. The Static Representation Module extracts features from static data, while the Dynamic Temporal Disentanglement and Attention Module normalizes and processes dynamic data. Features from both modules are concatenated and passed to the predictor for generating forecasts.
Figure 5. The workflow of the proposed model.
Figure 6. Boxplots of range differences for Radar 2 across four datasets.
Figure 7. The line chart of part of the dependent-variable data in the fourth segment of the numeric dataset.
Figure 8. The Count and Score dual-axis plots for all five radar datasets. Each subplot corresponds to a different dataset, with the bar representing the count per segment and the line indicating the score. (a) Radar 1. (b) Radar 2. (c) Radar 3. (d) Radar 4. (e) Radar 5.
Figure 9. Comparison of absolute prediction errors for different methods on the Numeric dataset. CHSDM-Net demonstrates consistently lower error across the test set compared to baseline methods.
Figure 10. Comparison of absolute prediction errors for different methods on the Radar1 dataset. CHSDM-Net achieves lower and more stable errors, highlighting its robustness on complex time series.
Figure 11. Ablation study results for different variants.
Figure 12. Epoch sensitivity analysis on the numerical dataset.
Figure 13. Sequence length sensitivity analysis on the numerical dataset.
Table 1. Statistics of raw and preprocessed data samples for each radar dataset.

| Dataset | Radar1 | Radar2 | Radar3 | Radar4 | Radar5 | Radar6 |
| dataset1 | 13,775 | 14,371 | 14,948 | 15,373 | 15,580 | 14,608 |
| dataset2 | 14,901 | 14,681 | 10,472 | 14,379 | 14,241 | 14,384 |
| dataset3 | 14,779 | 14,929 | 12,594 | 14,191 | 14,840 | 14,784 |
| dataset4 | 14,297 | 15,665 | 13,687 | 14,312 | 14,415 | 14,823 |
| After preprocessing | 9163 | 9380 | 7890 | 8818 | 10,028 | 10,308 |

Source: Radar measurement experiments.
Table 2. Hyperparameter search ranges.

| Hyperparameter | Range |
| Learning rate | [1 × 10⁻⁴, 1 × 10⁻³] |
| Dropout rate | [0, 0.1] |
| Batch size | [4, 8, 16, 32, 64] |
| Sequence length | [20, 30, 40] |
| Prediction length | [20, 30, 40] |
Table 3. Segmentation results of numeric dataset.

| Segment | Count (Original) | Score (Original) | Count (Optimized) | Score (Optimized) |
| 1 | 2609 | 0.661 | 2580 | 0.661 |
| 2 | 2537 | 0.681 | 2563 | 0.674 |
| 3 | 2689 | 0.690 | 2748 | 0.691 |
| 4 | 2078 | 0.646 | 2022 | 0.646 |
Table 4. GRA Scores for inconsistent segment of numeric dataset.

| Method | Dataset 1 vs. Dataset 2 | Dataset 1 vs. Dataset 3 | Dataset 2 vs. Dataset 3 |
| Original | 0.7096 | 0.6437 | 0.5857 |
| GA Optimized | 0.7099 | 0.6437 | 0.5857 |
Table 5. Comparison with state-of-the-art methods on all datasets. For each metric, the best result is highlighted in bold, and the second-best is underlined. The arrows indicate the desired direction of performance: ↓ denotes lower values are better (for MAE, RMSE, MASE), and ↑ denotes higher values are better (for R²).

| Dataset | Metric | CHSDM-Net | Autoformer | FEDformer | Pyraformer | Pathformer | PatchTST | N-HiTS | TiDE | DLinear | SARIMAX |
| Numeric | MAE↓ | 0.0568 | 0.1209 | 0.0857 | 0.1682 | 0.0967 | 0.1066 | 0.0857 | 0.0768 | 0.2278 | 0.5031 |
| | RMSE↓ | 0.1155 | 0.5013 | 0.3568 | 0.2849 | 0.3618 | 0.3611 | 0.3675 | 0.3535 | 0.3125 | 0.6771 |
| | R²↑ | 0.9987 | 0.9801 | 0.9909 | 0.9923 | 0.9906 | 0.9914 | 0.9912 | 0.9909 | 0.9908 | 0.9524 |
| | MASE↓ | 0.0125 | 0.0318 | 0.0192 | 0.0388 | 0.0205 | 0.0241 | 0.0183 | 0.0149 | 0.0587 | 0.2315 |
| Radar1 | MAE↓ | 0.0643 | 0.0986 | 0.1399 | 0.0868 | 0.1234 | 0.1337 | 0.0670 | 0.0826 | 0.0689 | 1.4368 |
| | RMSE↓ | 0.1274 | 0.1540 | 0.1944 | 0.1450 | 0.1954 | 0.1928 | 0.1292 | 0.1368 | 0.1417 | 1.7694 |
| | R²↑ | 0.9906 | 0.9865 | 0.9784 | 0.9877 | 0.9775 | 0.9788 | 0.9907 | 0.9893 | 0.9879 | −0.3071 |
| | MASE↓ | 0.3277 | 0.4462 | 0.6521 | 0.3720 | 0.6613 | 0.5742 | 0.3260 | 0.3933 | 0.2995 | 3.2291 |
| Radar2 | MAE↓ | 0.0533 | 0.0877 | 0.1254 | 0.0784 | 0.1108 | 0.1149 | 0.0530 | 0.0704 | 0.0660 | 0.9565 |
| | RMSE↓ | 0.0998 | 0.1304 | 0.1673 | 0.1167 | 0.1541 | 0.1589 | 0.0926 | 0.1086 | 0.1108 | 1.1904 |
| | R²↑ | 0.9786 | 0.9648 | 0.9485 | 0.9697 | 0.9457 | 0.9510 | 0.9825 | 0.9756 | 0.9748 | −0.5423 |
| | MASE↓ | 0.1343 | 0.2463 | 0.3533 | 0.2269 | 0.2887 | 0.3541 | 0.1372 | 0.1858 | 0.1939 | 2.9407 |
| Radar3 | MAE↓ | 0.1470 | 0.2152 | 0.2926 | 0.1949 | 0.3028 | 0.2913 | 0.1709 | 0.1769 | 0.1920 | 2.1623 |
| | RMSE↓ | 0.2361 | 0.3021 | 0.3886 | 0.2747 | 0.4023 | 0.3865 | 0.2514 | 0.2563 | 0.2846 | 2.6083 |
| | R²↑ | 0.9836 | 0.9750 | 0.9575 | 0.9809 | 0.9547 | 0.9557 | 0.9836 | 0.9828 | 0.9750 | −0.3206 |
| | MASE↓ | 0.0979 | 0.1688 | 0.2266 | 0.1249 | 0.2142 | 0.2102 | 0.1273 | 0.1257 | 0.1447 | 3.5609 |
| Radar4 | MAE↓ | 0.0415 | 0.0521 | 0.0877 | 0.0500 | 0.0653 | 0.0867 | 0.0464 | 0.0462 | 0.0618 | 1.1584 |
| | RMSE↓ | 0.0936 | 0.1008 | 0.1306 | 0.1015 | 0.1235 | 0.1430 | 0.0948 | 0.0963 | 0.1198 | 1.4514 |
| | R²↑ | 0.9893 | 0.9871 | 0.9798 | 0.9871 | 0.9794 | 0.9736 | 0.9895 | 0.9889 | 0.9826 | −0.4982 |
| | MASE↓ | 0.1985 | 0.2541 | 0.4198 | 0.2530 | 0.3006 | 0.3536 | 0.2229 | 0.2100 | 0.2575 | 3.2294 |
| Radar5 | MAE↓ | 0.0714 | 0.0978 | 0.1106 | 0.0715 | 0.0980 | 0.1101 | 0.0630 | 0.1866 | 0.0718 | 0.8926 |
| | RMSE↓ | 0.1089 | 0.1533 | 0.1493 | 0.1102 | 0.1390 | 0.1524 | 0.1021 | 0.2417 | 0.1208 | 1.1174 |
| | R²↑ | 0.9804 | 0.9864 | 0.9648 | 0.9778 | 0.9667 | 0.9576 | 0.9804 | 0.9114 | 0.9767 | −0.2622 |
| | MASE↓ | 0.1814 | 0.4260 | 0.2872 | 0.1825 | 0.2554 | 0.3085 | 0.1630 | 0.6197 | 0.1906 | 2.7657 |
Table 6. Forecasting performance on the numeric dataset using different decomposition methods.

| Method | MAE | RMSE | R² | MASE |
| CHSDM-Net | 0.0568 | 0.1155 | 0.9987 | 0.0923 |
| STL | 0.5754 | 0.7343 | 0.9468 | 0.2437 |
| Moving Average | 0.5806 | 0.7567 | 0.9496 | 0.2573 |
