Air Battlefield Time Series Data Augmentation Model Based on a Lightweight Denoising Diffusion Probabilistic Model

Cao, Bo; Xing, Qinghua; Li, Longyue; Shi, Junjie; Lin, Weijie

doi:10.3390/ai6080192


by Bo Cao ¹, Qinghua Xing ², Longyue Li ²,*, Junjie Shi ³,* and Weijie Lin

¹ Graduate School, Air Force Engineering University, Xi’an 710051, China

² Air Defense and Antimissile School, Air Force Engineering University, Xi’an 710051, China

³ Aviation Maintenance NCO School, Air Force Engineering University, Xinyang 464000, China

* Authors to whom correspondence should be addressed.

AI 2025, 6(8), 192; https://doi.org/10.3390/ai6080192

Submission received: 21 July 2025 / Revised: 7 August 2025 / Accepted: 17 August 2025 / Published: 18 August 2025

(This article belongs to the Topic Theoretical Foundations and Applications of Deep Learning Techniques)


Abstract

The uncertainty and adversarial nature of war pose significant challenges to the collection and storage of air battlefield time series data. To address the insufficient training of intelligent models caused by the scarcity of air battlefield situation data, this paper designs an air battlefield time series data augmentation model based on a lightweight denoising diffusion probabilistic model (LDMKD-DA). Because denoising diffusion probabilistic models (DDPMs) perform well on images, this paper transforms 1D time series data into image data. Univariate time series data, such as the high-resolution range profile (HRRP) dataset, are transformed by Gramian angular fields and Markov transition fields. Multivariate time series data, such as the air target intention dataset, are transformed by matrix expansion. The data augmentation model is then constructed based on the DDPM. Considering the need for miniaturization and intelligence in future combat platforms, depthwise separable convolution is introduced to lighten the DDPM, and an improved knowledge distillation method is introduced to accelerate the sampling process. The experimental results show that LDMKD-DA generates high-quality synthetic data similar to real data while significantly reducing FLOPs and parameter count, and that it has significant advantages in both univariate and multivariate time series data augmentation.

Keywords:

denoising diffusion probabilistic model; knowledge distillation; depthwise separable convolution; situation awareness; time series data augmentation

1. Introduction

With rapid breakthroughs in high technology such as artificial intelligence, and its wide application in the military, traditional air combat modes, weapons and equipment, and combat theory are undergoing great changes. The cooperation between unmanned and manned aircraft, as well as the widespread application of stealth technology, has made data collection and storage in the air battlefield very difficult [1]. Large datasets are a prerequisite for high performance, generalization, and accuracy in AI-based battlefield situational awareness applications. Battlefield situation awareness under small-sample conditions has therefore attracted extensive attention from researchers. One mainstream method currently used to solve this problem is data augmentation, which generates additional data based on the characteristics and distribution of real data. This paper reviews the literature on, and conducts research into, time series data from the air battlefield.

To address the insufficient training of intelligent models caused by the scarcity of air battlefield situational data, this paper designs an air battlefield time series data augmentation model based on a lightweight denoising diffusion probabilistic model, named LDMKD-DA. LDM denotes a lightweight denoising diffusion probabilistic model, KD denotes knowledge distillation, and DA denotes data augmentation. The main contributions are as follows:

(1): Data transformation. We convert multivariate time series data into 3-channel images via matrix expansion, and univariate data via Gramian angular fields and Markov transition fields.
(2): Considering the need for miniaturization and intelligence in future combat platforms, depthwise separable convolution is introduced to lighten the denoising diffusion probabilistic model (DDPM). Thus, the number of parameters of the model and the amount of computation are reduced.
(3): This paper designs an improved knowledge distillation method with multiple teacher models to accelerate the sampling process.
(4): To validate the practicality and reliability of the generated data, this paper conducts relevant experiments. The generated data improve the performance of intention recognition and target recognition models.

The remainder of the paper is organized as follows. Section 2 mainly reviews the related work. Section 3 introduces some basic theories used in this paper. Section 4 details the LDMKD-DA model. Section 5 presents the validation experiments carried out in this paper. Section 6 summarizes the work of this paper and looks ahead to the next steps.

2. Related Work

The main methods for time series data augmentation are rule-based methods [2,3,4], simulation model-based methods [5,6], traditional machine learning-based methods [7,8,9] and deep learning-based methods [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], as shown in Table 1.

2.1. Data Augmentation Methods Based on Rules

Rule-based data augmentation methods mainly use various rules or constraints to define the attributes and structure of the data to be generated. These rules can be based both on the attributes of the data themselves and on the relationships between the data, to ensure that the generated data meet the requirements. Nicolas et al. proposed a flexible, easy-to-use, and extensible database generation framework that introduces the data generation language (DGL) to form a data tuple generation flow with iterators as the basic unit [2]. Houkjær et al. proposed a general relational data generation tool (SRDG), in which the data generation algorithm is implemented based on a graph model and some simple data features can be specified by users [3]. Kang et al. constructed a time series generator (GRATIS) using a mixed autoregressive model (MAR). GRATIS provides a diverse set of parameters, which allows it to efficiently generate time series data with controllable characteristics [4].

2.2. Data Augmentation Methods Based on Simulation Models

Simulation model-based data augmentation methods mainly model real scenarios or systems and use computer simulation technology to generate time series data. Battlefield data, for example, can be generated through a combat simulation system. Bokde et al. designed two reconstruction methods for generating synthetic wind speed time series. One is the rankwise reconstruction method, and the other is the stepwise reconstruction method [5]. Koltuk et al. proposed a model-based approach for creating synthetic workload trajectories for cloud data centers. The synthetic trajectories have temporal characteristics and cumulative distributions similar to actual trajectories [6].

2.3. Data Augmentation Methods Based on Traditional Machine Learning

Traditional machine learning-based methods mainly train models such as support vector machines and linear regression on historical time series data and then use them to generate new time series data. Arlitt et al. presented an IoT big data scenario benchmarking toolkit named IoTAbench. Its generator is constructed based on Markov chains and generates time series data with important statistical properties similar to those of real time series [7]. Shamshad et al. designed a transition matrix method based on Markov chains, which uses the first-order and second-order transition probability matrices of a Markov chain to generate wind speed time series [8]. Li et al. designed a method to generate medium-term and long-term correlated time series of multiple wind farms based on a hybrid Gaussian mixture model–hidden Markov model (GMM-HMM). The model establishes a Gaussian mixture probability distribution mapping between state variables and multidimensional wind power output vectors and generates multi-wind-farm time series data by Monte Carlo sampling [9].

2.4. Data Augmentation Methods Based on Deep Learning

Deep learning-based data augmentation methods mainly utilize deep learning models to generate time series data, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models (DMs).

The time series data augmentation methods based on GANs mainly include C-RNN-GAN [10], RCGAN [11], SeqGAN [12], T-CGAN [13], TimeGAN [14], COT-GAN [15], etc. Among them, TimeGAN is specifically designed for time series and uses a three-part structure consisting of an autoencoder network, an adversarial network, and a supervised network to learn static features and temporal dynamics simultaneously. SigCWGAN builds on the conditional WGAN and uses path signatures as features of the time series, which better preserves the dynamic characteristics of time series and improves training stability and conditional generation ability. To address the difficulty of generating high-quality electronic health records (EHRs), Lu et al. proposed a multi-label time series generative adversarial network (MTGAN) for generating EHRs, in which the generator uses GRUs with smooth conditional matrices, and the discriminator uses the Wasserstein distance to provide scores [16]. To address the scarcity of multimodal fault labeling data for bearings, Wang et al. proposed a generative model, namely SACGAN [17]. Shi et al. proposed a solar power generation time series data generation method using a time series generative adversarial network (TimeGAN), which comprises an autoencoder network, an adversarial network, and a supervised network, and uses recurrent neural networks to construct each of these parts [18].

The time series data augmentation methods based on VAEs mainly include VRNN [19], SRNN [20], DSAE [21], etc. In addition, HVAE introduces a hierarchical latent variable structure on top of the standard VAE, capturing multi-scale features of the data through multiple layers of latent variables to enhance modeling capabilities for complex distributions. VQ-VAE models the latent space as a finite codebook of discrete latent vectors (rather than continuous distributions), improving the consistency and interpretability of the generated data. To address the data scarcity problem in industrial control systems (ICS), Jeon et al. proposed a variational recurrent autoencoder based on an attention mechanism to generate time series ICS data [22]. Zheng et al. designed a time series data generation method based on few-shot learning, which pretrains the autoencoder with a small quantity of data to generate sufficient latent-space data, and then obtains a large number of generated samples through a decoder recovery mechanism [23].

Diffusion modeling is a data generation method proposed in recent years that has shown clear advantages in image generation. Yi et al. proposed a time series diffusion method (TSDM) for vibration signal generation based on the basic principle of diffusion models. TSDM contains an attention module, Resblock, and a U-Net structure improved by TimeEmbedding, and experimental results show that TSDM can effectively improve the quality of vibration signal generation [24]. Adib et al. embedded ECG time series data into a two-dimensional space and used a two-dimensional DDPM to generate ECG signals [25].

3. Related Theory

3.1. Depthwise Separable Convolution

Traditional convolutional neural networks apply kernels to all input channels, extracting features via matrix multiplication and accumulation. This approach, however, leads to excessive parameters and redundant features. To resolve this, researchers developed depthwise separable convolution (DSC), which reduces computational complexity and parameter count—critical for mobile and embedded systems. DSC integrates two key processes, namely depthwise convolution (DW) and pointwise convolution (PW), as shown in Figure 1a,b.

In the DW stage, each input channel is convolved with a separate convolution kernel, which outputs the feature maps corresponding to the different channels and extracts the spatial features of the data. The DW stage changes only the size of the feature map, not its number of channels. Assuming that the input is $X \in \mathbb{R}^{H_{i} \times W_{i} \times C_{i}}$ and the size of the DW convolution kernel is $k \times k$, the output feature map $Y \in \mathbb{R}^{H_{m} \times W_{m} \times C_{i}}$ can be expressed by Equation (1), and the computation ${FLOPs}_{DW}$ is shown in Equation (2), as follows:

Y = {Conv}_{k \times k} (X)

(1)

{FLOPs}_{DW} = 2 \times k \times k \times W_{m} \times H_{m} \times C_{i}

(2)

In the PW stage, a 1 × 1 convolution kernel is applied to the feature map $Y$ to extract the information shared between its channels. PW does not change the spatial size of the feature map, only its number of channels. The output feature map $Z \in \mathbb{R}^{H_{m} \times W_{m} \times C_{out}}$ obtained by applying PW to $Y$ can be expressed by Equation (3), and its computation ${FLOPs}_{PW}$ is shown in Equation (4), as follows:

Z = {Conv}_{1 \times 1} (Y)

(3)

{FLOPs}_{PW} = 2 \times C_{i} \times H_{m} \times W_{m} \times C_{out}

(4)

For a standard convolution with a kernel size of $k \times k$, the computation is shown in Equation (5). The ratio of the computational cost of DSC to that of standard convolution, $({FLOPs}_{DW} + {FLOPs}_{PW}) / {FLOPs}_{Conv}$, is $(1 / C_{out}) + (1 / k^{2})$. DSC is therefore less computationally intensive and easier to implement and deploy on different platforms. For the same number of parameters, DSC allows a deeper network. Equation (5) is as follows:

{FLOPs}_{Conv} = 2 \times k \times k \times C_{i} \times H_{m} \times W_{m} \times C_{out}

(5)
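As a quick numerical check of Equations (2), (4), and (5), the following minimal Python sketch evaluates the three FLOP counts and their ratio. The layer sizes in the example (a 3 × 3 kernel, a 32 × 32 output map, 64 input channels, and 128 output channels) are illustrative assumptions and are not taken from the paper.

```python
def flops_dw(k, h_m, w_m, c_i):
    """Equation (2): FLOPs of the depthwise stage."""
    return 2 * k * k * w_m * h_m * c_i


def flops_pw(h_m, w_m, c_i, c_out):
    """Equation (4): FLOPs of the pointwise (1 x 1) stage."""
    return 2 * c_i * h_m * w_m * c_out


def flops_conv(k, h_m, w_m, c_i, c_out):
    """Equation (5): FLOPs of a standard k x k convolution."""
    return 2 * k * k * c_i * h_m * w_m * c_out


def dsc_ratio(k, h_m, w_m, c_i, c_out):
    """Cost of DSC (depthwise + pointwise) relative to standard convolution."""
    dsc = flops_dw(k, h_m, w_m, c_i) + flops_pw(h_m, w_m, c_i, c_out)
    return dsc / flops_conv(k, h_m, w_m, c_i, c_out)


# Hypothetical layer sizes, chosen only for illustration.
ratio = dsc_ratio(k=3, h_m=32, w_m=32, c_i=64, c_out=128)
print(f"DSC / standard FLOPs: {ratio:.4f}")
```

For this hypothetical layer the ratio equals 1/128 + 1/9, roughly 0.119, which agrees with the closed-form expression $(1 / C_{out}) + (1 / k^{2})$.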

In this paper, the convolution operations of the U-Net in the denoising diffusion probabilistic model are replaced by DSC to make the model lightweight, which adapts it to the needs of miniaturized combat platforms.

3.2. Denoising Diffusion Probabilistic Model

DDPMs operate through two core processes, namely a forward diffusion process that progressively corrupts raw data into noise, and a reverse diffusion process that reconstructs meaningful data from noise.

In the forward process, starting from the original data distribution $x_{0} \sim q (x_{0})$, Gaussian noise is incrementally added over $T$ steps, transforming $q (x_{0})$ into a latent variable distribution $q (x_{T})$ that approximates a standard Gaussian distribution. The joint distribution of the $T$ steps is defined as shown in the following Equation (6):

q (x_{1 : T} | x_{0}) = \prod_{t = 1}^{T} q (x_{t} | x_{t - 1})

(6)

where $q (x_{t} | x_{t - 1}) = \mathcal{N} (x_{t}; \sqrt{α_{t}} x_{t - 1}, (1 - α_{t}) I)$ and $\mathcal{N}$ denotes the Gaussian distribution. For simplicity, let $α_{t} = 1 - β_{t}$ and ${\bar{α}}_{t} = \prod_{n = 1}^{t} α_{n}$; then, Equations (7)–(9) are as follows:

q (x_{t} | x_{t - 1}) = \mathcal{N} (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I)

(7)

q (x_{t} | x_{0}) = \int q (x_{1 : t} | x_{0}) d x_{1 : (t - 1)} = \mathcal{N} (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)

(8)

x_{t} = \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ε

(9)

where $ε \sim \mathcal{N} (0, I)$.

As $t$ increases, noise gradually dominates the data, while the original data component fades. Once ${\bar{α}}_{T} \approx 0$, $x_{T}$ approximately follows a standard Gaussian distribution, indicating the completion of forward diffusion.
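The closed form in Equation (9) means that $x_{t}$ can be drawn directly from $x_{0}$ without simulating every intermediate step. The sketch below illustrates this with NumPy. The linear schedule for $β_{t}$, its end points, and T = 1000 are assumptions borrowed from common DDPM practice; the paper does not state its noise schedule in this section, and the input array is random placeholder data.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Assumed noise schedule: beta_t increases linearly over T steps."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) with Equation (9); t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bar[t - 1]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps


T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # bar{alpha}_t = prod_{n<=t} alpha_n

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 12, 12))  # placeholder 3-channel image
x_T, _ = q_sample(x0, T, alpha_bar, rng)

print("alpha_bar at t = T:", alpha_bar[-1])
```

Under this assumed schedule, the cumulative product at t = T is far below 0.001, so $x_{T}$ retains almost nothing of $x_{0}$.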

The reverse process aims to recover the original data distribution from noise. It is achieved by fitting a neural network to predict the noise added at step $t$. Here, $θ$ represents the network parameters. Setting the prior $p (x_{T})$ to a standard normal distribution, the joint probability distribution of the reverse diffusion process is as in the following Equation (10):

p_{θ} (x_{0 : T}) := p (x_{T}) \prod_{t = 1}^{T} p_{θ} (x_{t - 1} | x_{t})

(10)

where $p_{θ} (x_{t - 1} | x_{t}) := \mathcal{N} (x_{t - 1}; μ_{θ} (x_{t}, t), σ_{θ} {(x_{t}, t)}^{2} I)$.

By decomposing the noise term, the mean of the reverse step can be approximated as follows:

μ_{θ} (x_{t}, t) = \frac{1}{\sqrt{α_{t}}} (x_{t} - \frac{1 - α_{t}}{\sqrt{1 - {\bar{α}}_{t}}} ε_{θ} (x_{t}, t))

(11)

Setting the variance to the constant ${\tilde{β}}_{t}$ associated with $β_{t}$, i.e., $σ_{t}^{2} = {\tilde{β}}_{t}$, the iterative update for the reverse process is as follows in Equation (12):

x_{t - 1} = \frac{1}{\sqrt{α_{t}}} (x_{t} - \frac{1 - α_{t}}{\sqrt{1 - {\bar{α}}_{t}}} ε_{θ} (x_{t}, t)) + σ_{t} z, \quad z \sim \mathcal{N} (0, I)

(12)

where $ε_{θ}$ denotes a neural network whose output has the same dimensions as its input.

In inference, we start with $x_{T} \sim \mathcal{N} (0, I)$ sampled from a standard Gaussian distribution, then iteratively apply the reverse process using the noise predictor. The iterative process ends when $x_{0}$ is obtained.

3.3. Knowledge Distillation

In future informatized and intelligent warfare, combat units are evolving toward greater miniaturization and intelligence to enhance mobility and stealth. However, this trend limits their ability to provide the massive computational resources required for model training. A DDPM typically requires many computational steps to produce a new, high-quality sample, which makes sampling efficiency a critical bottleneck. Addressing the time cost is therefore a key challenge when applying DDPMs to air battlefield data generation. Existing approaches to accelerating DDPM sampling mainly include compression of the noise prediction model and optimization of the SDE discretization method, among others. Of these, knowledge distillation has emerged as a high-performance model compression technique and is widely adopted in diffusion models.

This paper introduces a knowledge distillation framework to accelerate the sampling process of DDPMs. Here, teacher models are deployed in cloud environments, leveraging the cloud’s robust computational resources and high processing speeds to generate high-quality samples. Student models are employed directly on individual combat units, tailored to their constrained computational capabilities.

4. Methodology

The LDMKD-DA model proposed in this paper is shown in Figure 2. The model mainly consists of a data encoding module, a data augmentation module, and a data decoding module. The data encoding module preprocesses the input data. For multivariate time series data, three different matrices are generated by matrix expansion and scaling and are combined to create a 3-channel 2D image similar to an RGB image. For univariate time series data, three different matrices are generated from Gramian angular fields and Markov transition fields and are combined to create an image in the same way. The data augmentation module is mainly constructed based on DSC and DDPM. The data decoding module converts the generated image data, through inverse mapping, into sample data consistent with the original data structure.

4.1. Data Encoding Module

Analyzed from a multimodal perspective, the air battlefield situational data collected by various types of sensors and platforms in combat can be roughly divided into text data, structured data, image data, and audio and video data. This paper focuses on preprocessing time series data in structured data to make them meet the input requirements of DDPM.

4.1.1. Multivariate Time Series Data Preprocessing

This paper introduces the preprocessing of multivariate time series data with air target intention data as an example. The intention space varies markedly with battlefield settings, combat patterns, and operational goals, and thus must be defined for the particular combat scenario. This paper focuses on air targets in air defense operations and determines that the intention space of enemy air targets is {attack, penetrate, interference, feint, surveillance, reconnaissance, and retreat}.

After the target intention space is defined, the feature information required by the model can be derived from the correlations between target attribute features and intentions. From the perspective of enemy combat missions, enemy aircraft exhibit distinct characteristic information when executing different operational tasks. This paper constructs a feature set for air defense target intention recognition, encompassing altitude, speed, acceleration, heading angle, azimuth angle, distance, radar 1D range image, radar cross-section, air-to-air radar status, ground-to-ground radar status, jamming status, and enemy identification response. Among these, the first eight are numerical features, and the remaining four are non-numerical [26]. To meet the input requirements of DDPM, this paper visualizes the intention data and converts them into image data. The processing mainly includes three steps.

(1): Normalization Processing

For the eight numerical features, the min–max normalization method is used to map them to the interval [0, 1], which is calculated as in the following Equation (13):

x^{'} = \frac{x - \min}{\max - \min}

(13)

where x represents the value of a numerical feature, x′ represents the normalized result of the numerical feature, and min and max correspond to the minimum and maximum values of the feature in the set, respectively.

The four non-numerical features in the feature set are all categorical data. To facilitate the learning of the neural network, these four features need to be mapped numerically to the interval [0, 1], which is calculated as in the following Equation (14):

y = \frac{i - 1}{j}

(14)

where $j$ represents the size of the classification space and $y$ is the value of the $i$-th category of a non-numerical feature after it is mapped to the interval [0, 1].

(2): 2D Matrix Embedding

Given an intention data matrix $M_{0} \in \mathbb{R}^{12 \times 6}$ and a scaling factor $k$, multiply the matrix by $k$ and by $1 / k$ to obtain the scaled-up matrix $M_{1} \in \mathbb{R}^{12 \times 6}$ and the scaled-down matrix $M_{2} \in \mathbb{R}^{12 \times 6}$, respectively. The scaling factor $k$ is determined based on the dimensionality of the original data. For the air target intention (ATI) dataset, $k$ is set to 2 based on several experiments.

(3): Visualization Processing

Combine $M_{0}$, $M_{1}$, and $M_{2}$ to create a 3-channel 2D image file and visualize it with the Matplotlib 3.7.2 library. The visualized intention data for the seven categories is shown in Figure 3.
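The three preprocessing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the channels are stacked along the last axis in the order $M_{0}$, $M_{1}$, $M_{2}$, and the input is a random 12 × 6 matrix standing in for one already-normalized intention sample.

```python
import numpy as np


def min_max_normalize(x, x_min, x_max):
    """Equation (13): map a numerical feature to [0, 1]."""
    return (x - x_min) / (x_max - x_min)


def encode_category(i, j):
    """Equation (14): map the i-th category (1-indexed) of j to [0, 1)."""
    return (i - 1) / j


def to_three_channel(m0, k=2.0):
    """Stack M0, k * M0 and M0 / k into an H x W x 3 array."""
    m1 = m0 * k   # scaled-up matrix M1
    m2 = m0 / k   # scaled-down matrix M2
    return np.stack([m0, m1, m2], axis=-1)


rng = np.random.default_rng(0)
m0 = rng.uniform(0.0, 1.0, size=(12, 6))  # placeholder normalized sample
image = to_three_channel(m0, k=2.0)
print(image.shape)  # (12, 6, 3)
```

Note that the scaled-up channel can exceed 1 when $k > 1$; how such values are rescaled or clipped before display is not specified in this section, so the sketch leaves them unchanged.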

4.1.2. Univariate Time Series Data Preprocessing

In this paper, radar high-resolution range profile (HRRP) data are used as an example to introduce the preprocessing of univariate time series data. Wang et al. designed an embedding framework that maps univariate time series data from one dimension to two [27]. This paper follows that method to convert univariate time series data into image data that satisfy the input requirements of DDPM. The preprocessing mainly includes four steps.

(1): Polar Coordinate Transformation

The radar HRRP data can be represented by the vector $X = \{x_{1}, x_{2}, \dots, x_{N}\}$. $X$ is first normalized and scaled to obtain the vector $\tilde{X} = \{{\tilde{x}}_{1}, {\tilde{x}}_{2}, \dots, {\tilde{x}}_{N}\}$, so that each time step value of $\tilde{X}$ lies in the interval [−1, 1]. Each value can thus be expressed as the cosine of an angle $φ_{i} \in [0, π]$, as follows:

{\tilde{x}}_{i} = \cos (φ_{i})

(15)

The data are mapped to the polar coordinate system by Equation (16), as follows:

\begin{cases} φ_{i} = \arccos ({\tilde{x}}_{i}), & - 1 \leq {\tilde{x}}_{i} \leq 1 \\ r_{i} = \frac{i}{N}, & i \in [1, N] \end{cases}

(16)

When $φ \in [0, π]$, $\cos (φ)$ is a monotone function. Therefore, the correspondence is unique for both the forward and the reverse mapping, which enables the transformation from generated image data back to one-dimensional time series data.

(2): Gramian Angular Fields (GASF/GADF)

After mapping the time series data to polar coordinates, the Gramian summation angular field (GASF) and Gramian difference angular field (GADF) are defined as shown in the following Equations (17) and (18), respectively:

\mathrm{GASF} = \begin{bmatrix} \cos (φ_{1} + φ_{1}) & \dots & \cos (φ_{1} + φ_{N}) \\ \vdots & \ddots & \vdots \\ \cos (φ_{N} + φ_{1}) & \dots & \cos (φ_{N} + φ_{N}) \end{bmatrix}

(17)

\mathrm{GADF} = \begin{bmatrix} \sin (φ_{1} - φ_{1}) & \dots & \sin (φ_{1} - φ_{N}) \\ \vdots & \ddots & \vdots \\ \sin (φ_{N} - φ_{1}) & \dots & \sin (φ_{N} - φ_{N}) \end{bmatrix}

(18)

(3): Markov Transition Fields (MTFs)

GASF and GADF only capture static information and do not include dynamic information. Therefore, an MTF is introduced to reflect the dynamic information of the time series data. The MTF captures dynamic information by setting up $Q$ quantization partitions and assigning each $x_{i}$ to its corresponding partition $q_{i} \in \{1, \dots, Q\}$. For any pair $x_{i}$ and $x_{j}$ with partitions $q_{i}$ and $q_{j}$, the element $M_{i j}$ of the MTF denotes the probability of a transition from $q_{i}$ to $q_{j}$. The main diagonal element $M_{i i}$ denotes the self-transition probability of the partition that contains time step $i$. Equation (19) is as follows:

\mathrm{MTF} = \begin{bmatrix} M_{11} & \dots & M_{1 N} \\ \vdots & \ddots & \vdots \\ M_{N 1} & \dots & M_{N N} \end{bmatrix}

(19)

The number of quantization partitions Q is set to 10 for the HRRP dataset, determined by the entropy of the time series. We use equal-frequency quantization to divide the normalized HRRP values into Q intervals, ensuring that each interval contains approximately the same number of samples. This balances the granularity of dynamic information capture and computational efficiency.

(4): Visualization Processing

We combine the GASF, GADF, and MTF to create a 3-channel 2D image file and visualize it with the Matplotlib library. The visualized HRRP data are shown in Figure 4.

To illustrate the conversion of univariate data, take the series [0.2, 0.5, 0.6] as an example. Min–max normalization to [0, 1] gives [0, 0.75, 1], and the polar coordinate transformation $φ_{i} = \arccos ({\tilde{x}}_{i})$ then gives [1.57, 0.72, 0]. The GASF, GADF, and MTF are as follows:

\mathrm{GASF} = \begin{bmatrix} - 1 & - 0.6614 & 0 \\ - 0.6614 & 0.125 & 0.75 \\ 0 & 0.75 & 1 \end{bmatrix}

(20)

\mathrm{GADF} = \begin{bmatrix} 0 & 0.75 & 1 \\ - 0.75 & 0 & 0.6614 \\ - 1 & - 0.6614 & 0 \end{bmatrix}

(21)

\mathrm{MTF} = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}

(22)
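The matrices in Equations (20)–(22) can be reproduced with a short NumPy sketch. It rests on several assumptions that the text does not state explicitly: the Gramian fields use the definitions of Wang et al. [27], namely cos(φ_i + φ_j) for the GASF and sin(φ_i − φ_j) for the GADF; the toy series is min–max normalized to [0, 1], which is what yields the angles [1.57, 0.72, 0]; and the MTF is built from row-normalized first-order transition counts with Q = 2 equal-frequency partitions, a value chosen here only because it matches Equation (22). The function names are hypothetical.

```python
import numpy as np


def polar_angles(x):
    """Min-max normalize to [0, 1], then take phi = arccos(x_norm)."""
    x = np.asarray(x, dtype=float)
    x_norm = (x - x.min()) / (x.max() - x.min())
    return np.arccos(np.clip(x_norm, 0.0, 1.0))


def gasf(phi):
    """Gramian summation angular field: cos(phi_i + phi_j)."""
    return np.cos(phi[:, None] + phi[None, :])


def gadf(phi):
    """Gramian difference angular field: sin(phi_i - phi_j)."""
    return np.sin(phi[:, None] - phi[None, :])


def mtf(x, n_bins):
    """Markov transition field with equal-frequency partitions."""
    x = np.asarray(x, dtype=float)
    # Interior quantiles serve as partition edges.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)  # partition index of each time step
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        counts[a, b] += 1.0    # transition from partition a to partition b
    row_sums = counts.sum(axis=1, keepdims=True)
    w = np.divide(counts, row_sums, out=np.zeros_like(counts),
                  where=row_sums > 0)
    # M_ij is the transition probability from partition q_i to partition q_j.
    return w[q[:, None], q[None, :]]


series = [0.2, 0.5, 0.6]
phi = polar_angles(series)
image = np.stack([gasf(phi), gadf(phi), mtf(series, n_bins=2)], axis=-1)
print(np.round(image[..., 0], 4))  # GASF channel
```

Stacking the three fields along the last axis gives the N × N × 3 array that is treated as a 3-channel image; the channel order used here is also an assumption.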

4.2. Data Augmentation Module

The lightweight denoising diffusion probabilistic model (LDDPM) is shown in Figure 5. The LDDPM mainly consists of a forward process and a reverse process.

(1): Forward process. Gaussian noise is gradually added to the original data until it approximates the standard Gaussian distribution.
(2): Reverse process. A noise prediction model (a U-Net neural network) is fitted and used to denoise iteratively until a sample from the data distribution is obtained. The inputs of the U-Net are the latent variable $x_{t}$ at time $t$, the feature $x_{0}$ extracted from the original data, and the time $t$. By training the U-Net to predict the noise at time $t$ in the reverse process, $μ_{θ} (x_{t}, t)$ and $σ_{θ} (x_{t}, t)$ are obtained, and the next latent variable $x_{t - 1}$ is sampled. The generated sample is obtained through repeated iterations.

Figure 5. The lightweight denoising diffusion probabilistic model.

Because the large parameter count and computational cost of the DDPM make it difficult to adapt to the future needs of miniaturized intelligent combat platforms, this paper introduces DSC to lighten the U-Net. The structure of the U-Net is shown in Figure 6 and mainly consists of an encoder, a decoder, and a residual connection between the encoder and the decoder. First, the original image data are transformed into a feature map by DSC. The data are then fed through the encoder, the middle steps, and the decoder in turn. Finally, the predicted noise is reconstructed by applying a depthwise separable convolution to the output of the last decoder layer, and it is used to recover the latent variable $x_{t - 1}$ at the next time step.

Given the input of the residual block $x_{i} \in \mathbb{R}^{H \times W \times C}$, $F (\cdot)$ denotes a nonlinear mapping containing GroupNorm, the SiLU activation function, and a 3 × 3 DSC, while $\mathrm{Attention} (\cdot)$ denotes a nonlinear mapping containing the attention mechanism. The output of the residual block can be expressed as shown in the following Equation (23):

x_{i + 1} = \mathrm{Res} (x_{i}) = \mathrm{Attention} (F (F (x_{i} \oplus t_{e}) + x_{i}))

(23)

where $x_{i + 1}$ denotes the output of the residual block and $t_{e}$ denotes the result of encoding the time $t$ by sinusoidal position encoding.
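For readers unfamiliar with sinusoidal position encoding, the sketch below shows one common way to turn a scalar time step into the vector $t_{e}$. The embedding dimension, the base period of 10000, and the layout (sine terms followed by cosine terms) follow the usual Transformer convention and are assumptions here; the paper does not give these details, nor how $t_{e}$ is projected before it is combined with $x_{i}$.

```python
import numpy as np


def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Encode a scalar time step t as a vector of length dim (dim even).

    The first half holds sin(t * f_k), the second half cos(t * f_k),
    with frequencies f_k decaying geometrically from 1 to 1 / max_period.
    """
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


t_e = sinusoidal_embedding(t=10, dim=16)
print(t_e.shape)  # (16,)
```

Because every time step maps to a distinct, bounded vector, a single noise prediction network can be shared across all $T$ steps.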

Each encoder contains three residual blocks, and all encoders except the last are followed by a downsample module. The downsample module consists of a 3 × 3 DSC. The output $E_{i}$ of the $i$th encoder can be expressed as shown in Equation (24), where $\mathrm{DSC} (\cdot)$ denotes the depthwise separable convolution.

E_{i} = \mathrm{DSC} (E_{i - 1}) = \mathrm{DSC} (\mathrm{Res} (\mathrm{Res} (\mathrm{Res} (x_{i - 1}))))

(24)

The middle step contains two residual blocks and the attention mechanism, and its output can be expressed as shown in Equation (25).

M = \mathrm{Res} (\mathrm{Attention} (\mathrm{Res} (\cdot)))

(25)

Each decoder consists of three residual blocks, and all decoders except the last are followed by an upsample module. The upsample module doubles the feature map size by transposed convolution. The output $D_{i}$ of the $i$th decoder can be expressed as shown in Equation (26), where $\mathrm{TransposeConv} (\cdot)$ denotes the transposed convolution.

D_{i} = \mathrm{TransposeConv} (D_{i + 1}, E_{i})

(26)

4.3. Training and Sampling Process

Algorithm 1 provides the specific training process. First, the model parameters are initialized and the input features are extracted. The network is then optimized iteratively by minimizing the difference between the predicted noise and the true noise. Specifically, each iteration samples data and a time step, computes the latent variable, and updates the model parameters via gradient descent.

Algorithm 1 Training process

1: repeat
2:   $x_{0} \sim D$
3:   $t \sim \mathrm{Uniform} (\{1, \dots, T\})$
4:   $ε \sim \mathcal{N} (0, I)$
5:   Take a gradient descent step on $\nabla_{θ} \| ε - ε_{θ} (\sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ε, t) \|^{2}$
6: until converged
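The quantity differentiated in line 5 of Algorithm 1 can be computed as in the NumPy sketch below, which covers lines 2 to 5 for one mini-batch but stops short of the gradient step, since that requires an automatic differentiation framework. The noise predictor `eps_model` is a hypothetical placeholder for the U-Net, the schedule is an assumed linear one, and the loss is averaged over elements rather than summed, which only rescales the squared norm.

```python
import numpy as np


def training_loss(x0_batch, alpha_bar, eps_model, rng):
    """Noise-prediction loss of Algorithm 1 for one mini-batch."""
    batch = x0_batch.shape[0]
    T = len(alpha_bar)
    t = rng.integers(1, T + 1, size=batch)        # line 3: t ~ Uniform{1..T}
    eps = rng.standard_normal(x0_batch.shape)     # line 4: eps ~ N(0, I)
    # Broadcast bar{alpha}_t over the non-batch axes.
    a_bar = alpha_bar[t - 1].reshape((batch,) + (1,) * (x0_batch.ndim - 1))
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))  # line 5 objective


def zero_predictor(x_t, t):
    """Placeholder for the U-Net: always predicts zero noise."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
x0_batch = rng.uniform(-1.0, 1.0, size=(64, 3, 8, 8))        # placeholder data
loss = training_loss(x0_batch, alpha_bar, zero_predictor, rng)
print(f"loss with an untrained (zero) predictor: {loss:.3f}")
```

A predictor that always outputs zero yields a loss close to 1, the variance of the injected noise, which is a useful baseline when monitoring training.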

Algorithm 2 provides the specific sampling process. Sampling starts from random noise and reconstructs valid samples through T-step iterative denoising. The process adjusts noise injections based on the current time step, gradually refining the latent variable until the final sample is generated.

Algorithm 2 Sampling process

1: $x_{T} \sim \mathcal{N} (0, I)$
2: for $t = T, \dots, 1$ do
3:   $z \sim \mathcal{N} (0, I)$ if $t > 1$, else $z = 0$
4:   $x_{t - 1} = \frac{1}{\sqrt{α_{t}}} (x_{t} - \frac{1 - α_{t}}{\sqrt{1 - {\bar{α}}_{t}}} ε_{θ} (x_{t}, t)) + σ_{t} z$
5: end for
6: return $x_{0}$
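Algorithm 2 translates almost line by line into NumPy, as the following sketch shows. Two points are assumptions rather than statements from the paper: the noise predictor `eps_model` is a hypothetical stand-in for the trained U-Net, and $σ_{t}$ is taken as the square root of the usual DDPM posterior variance ${\tilde{β}}_{t} = \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} β_{t}$, a formula that this section does not spell out.

```python
import numpy as np


def ddpm_sample(shape, betas, eps_model, rng):
    """Ancestral sampling following Algorithm 2 (t runs from T down to 1)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    T = len(betas)

    x = rng.standard_normal(shape)                       # line 1
    for t in range(T, 0, -1):                            # line 2
        # line 3: no noise is added at the final step
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        a_t, ab_t = alphas[t - 1], alpha_bar[t - 1]
        ab_prev = alpha_bar[t - 2] if t > 1 else 1.0
        # Assumed choice: sigma_t^2 equals the posterior variance tilde{beta}_t.
        sigma_t = np.sqrt((1.0 - ab_prev) / (1.0 - ab_t) * betas[t - 1])
        mean = (x - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_model(x, t)) / np.sqrt(a_t)
        x = mean + sigma_t * z                           # line 4
    return x                                             # line 6


def zero_predictor(x_t, t):
    """Placeholder for the trained U-Net noise predictor."""
    return np.zeros_like(x_t)


betas = np.linspace(1e-4, 0.02, 50)  # short assumed schedule for the demo
sample = ddpm_sample((3, 8, 8), betas, zero_predictor, np.random.default_rng(0))
print(sample.shape)  # (3, 8, 8)
```

The loop calls the noise predictor once per step, so the cost of sampling grows linearly with $T$.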

4.4. Inference Acceleration Process

To address the inefficiency of the DDPM’s reverse process, this section introduces an optimized knowledge distillation framework to accelerate sampling. The specific design details are shown in Figure 7.

First, we design teacher models. Because air battlefield time series of different categories have distinct feature distributions, a single teacher model struggles to capture all category-specific characteristics simultaneously. Therefore, we design $n$ teacher models by category, with each teacher focusing on learning the features of a specific type of data so that the distillation is more targeted. The teacher models differ in their inputs. Taking intention data as an example, seven teacher models ($N_{Te}^{1}, \dots, N_{Te}^{7}$) are designed based on the seven operational intentions of air targets.

Second, we design student models. Data from all categories are fed into the teacher models simultaneously, and the optimal teacher model $N_{Te}^{best}$ is selected through multiple experiments. The student model $N_S$ then uses a more compact U-Net in place of the original U-Net architecture. Specifically, the input channel count of the student's U-Net remains unchanged, while the channel counts of the convolutional layers in all other positions are halved. A 1 × 1 convolutional layer is added to resolve the feature dimension mismatches caused by the reduced network scale.
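The channel halving and the 1 × 1 convolution can be sketched as follows. The teacher channel widths are hypothetical (the paper does not list them), and the sketch assumes that the 1 × 1 convolution projects student features to the teacher's channel count so that the two feature maps can be compared; the paper does not state the direction of the projection.

```python
def student_channels(teacher_channels):
    """Halve every convolutional channel count (the input channel count is kept separately)."""
    return [c // 2 for c in teacher_channels]


def conv1x1(feature_map, weight, bias):
    """Apply a 1x1 convolution to a feature map stored as [channel][row][col].

    weight has shape [out_channels][in_channels]; bias has length out_channels.
    """
    in_ch = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for w_row, b in zip(weight, bias):
        plane = [[b + sum(w_row[c] * feature_map[c][i][j] for c in range(in_ch))
                  for j in range(width)] for i in range(height)]
        out.append(plane)
    return out


# Hypothetical teacher widths; the paper does not list them.
teacher = [64, 128, 256]
student = student_channels(teacher)

# Toy example: project a 2-channel student feature map to 4 channels.
feat = [[[1.0, 2.0]], [[3.0, 4.0]]]  # shape [2][1][2]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
bias = [0.0, 0.0, 0.0, 0.0]
projected = conv1x1(feat, weight, bias)
```

A 1 × 1 convolution mixes channels independently at every spatial position, so it changes the channel count without altering the spatial size of the feature map.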

Third, we design the loss function. The loss function integrates three terms. The first is the cross-entropy $L_{label}$ between the student model's predictions and the ground-truth labels; the second is the average cross-entropy $L_{TeS}$ between the outputs of the teacher models and the student model; and the third is the mean squared error $L_{inter}$ between the intermediate feature maps of the teacher and student models, which preserves hierarchical feature consistency. Mathematically, the total loss is expressed as follows:

$$L_{total} = \alpha L_{label} + \beta L_{TeS} + \gamma L_{inter} \tag{27}$$

$$L_{label} = H(y_i, N_S(x_i)) \tag{28}$$

$$L_{TeS} = \frac{1}{n} \sum_{k=1}^{n} H\left(N_{Te}^{k,\tau}(x_i),\ N_S^{\tau}(x_i)\right) \tag{29}$$

$$L_{inter} = \frac{1}{n} \sum_{k=1}^{n} \mathrm{MSE}\left(F_{Te}^{k}(x_i),\ F_S(x_i)\right) \tag{30}$$

where $N_{Te}^{k,\tau}$ and $N_S^{\tau}$ represent the soft outputs of the $k$-th teacher model and the student model, and $\tau > 1$ is the temperature parameter that implements the softening operation. $F_{Te}^{k}(\cdot)$ stands for the output feature map of an intermediate layer in the $k$-th teacher model, and $F_S(\cdot)$ denotes that of the corresponding intermediate layer in the student model. $\mathrm{MSE}(\cdot)$ calculates the mean squared error between feature maps. $\alpha$, $\beta$, and $\gamma$ are the weight coefficients assigned to the three loss components, with their sum equal to 1. $H(\cdot, \cdot)$ denotes the cross-entropy.
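The three-term loss of Equations (27) to (30) can be sketched in plain Python as below. All numeric values (logits, feature vectors, the weights 0.5/0.3/0.2, and the temperature 2.0) are hypothetical and serve only to make the example runnable; feature maps are flattened to lists for brevity.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-softened softmax; tau > 1 flattens the distribution."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two flattened feature maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(label, student_logits, teacher_logits, student_feat, teacher_feats,
               alpha, beta, gamma, tau):
    n = len(teacher_logits)
    l_label = cross_entropy(label, softmax(student_logits))            # Eq. (28)
    soft_student = softmax(student_logits, tau)
    l_tes = sum(cross_entropy(softmax(t, tau), soft_student)
                for t in teacher_logits) / n                           # Eq. (29)
    l_inter = sum(mse(f, student_feat) for f in teacher_feats) / n     # Eq. (30)
    return alpha * l_label + beta * l_tes + gamma * l_inter            # Eq. (27)


# Hypothetical toy inputs: one sample, three classes, two teachers.
alpha, beta, gamma = 0.5, 0.3, 0.2  # assumed weights that sum to 1
loss = total_loss(
    label=[0.0, 1.0, 0.0],
    student_logits=[0.1, 2.0, -1.0],
    teacher_logits=[[0.0, 3.0, -2.0], [0.5, 2.5, -1.5]],
    student_feat=[0.2, 0.4, 0.6],
    teacher_feats=[[0.1, 0.5, 0.7], [0.3, 0.3, 0.5]],
    alpha=alpha, beta=beta, gamma=gamma, tau=2.0,
)
```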

5. Experiments and Discussions

To validate the generative performance of the proposed LDMKD-DA model, experiments are conducted on the air target intention dataset and the radar HRRP dataset. The experiments revolve around the following four questions:

(1) Can the LDMKD-DA model effectively extract features from the original samples?
(2) Can the LDMKD-DA model generate high-quality and diverse samples?
(3) Do the generated samples have the same feature distribution as the original samples?
(4) Can the generated samples be used to train specific models?

5.1. Experimental Setup

5.1.1. Experimental Datasets

1. Air Target Intention Dataset (ATI Dataset)

In this paper, the ATI dataset is used to validate the generation of multivariate time series data. Detailed information about the ATI dataset is shown in Table 2.

2. High-Resolution Range Profile Dataset (HRRP Dataset)

A high-resolution range profile (HRRP) is a one-dimensional feature signal that describes the distribution of target scattering intensity. In this paper, HRRP sequences for three types of targets are generated by the simulation system. The radar center frequency is set to 10 GHz, the signal bandwidth is set to 600 MHz, and the initial distance between the target and the radar is 300 km. The initial simulation data contain 900 samples, with 300 samples for each type of target. The HRRP dataset is used to validate the generation of univariate time series data.

5.1.2. Evaluation Indicators

In this paper, maximum mean discrepancy (MMD), number of parameters (Params), amount of computation (FLOPs), accuracy, recall, precision, and F1 score are used as evaluation indicators. MMD, Params, and FLOPs are mainly used to evaluate the data augmentation model, while accuracy, recall, precision, and F1 score are mainly used to evaluate the classification model. Among these indicators, MMD compares two data distributions through their mean embeddings in a reproducing kernel Hilbert space (RKHS); the others are common evaluation metrics for deep learning models and are not discussed in detail here.
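For readers unfamiliar with MMD, the following sketch computes a biased estimate of the squared MMD with an RBF kernel (the kernel listed in Table 3). The bandwidth value, the use of the biased estimator, and the toy samples are assumptions; the paper does not specify which estimator or bandwidth it uses, nor whether the reported values are squared.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between the samples xs and ys."""
    def mean_kernel(a, b):
        return sum(rbf_kernel(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical toy samples: `close` resembles `real`, `far` does not.
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
close = [[0.1, 0.0], [0.9, 0.1], [0.0, 1.1]]
far = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

A generated set that resembles the real data yields a smaller value than one that does not, which is why smaller MMD values indicate better generation quality.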

5.1.3. Hyperparameter Setting

The hyperparameters of the LDMKD-DA are shown in Table 3. The experiments were implemented in Python 3.8 with the PyTorch 1.12.1 deep learning framework and accelerated by an NVIDIA GeForce RTX2080 GPU (Xc HYBRID, 11 GB GDDR6 VRAM, driver version: 572.47) (EVGA, Taiwan, China) with CUDA 12.2.

5.2. Multivariate Time Series Data Experiments

To verify the effectiveness of the LDMKD-DA model for the generation of multivariate time series data, this paper takes the ATI dataset as an example. We compare the FLOPs, Params, and MMD values of LDMKD-DA and DDPM, and the MMD values of the GAN-based and VAE-based generative models. The generated data are visualized with the t-SNE algorithm. The evaluation metrics for the different models are shown in Table 4 and Table 5; smaller MMD values are better.

Table 4 gives the FLOPs, Params, and MMD values for LDMKD-DA and DDPM. Compared with DDPM, the FLOPs of LDMKD-DA decrease by 81.2% and the Params decrease by 78.5%, while the MMD value increases by only 14.6%. This shows that, after introducing depthwise separable convolution and knowledge distillation, LDMKD-DA significantly reduces the memory and energy consumption of the model (as reflected in Params and FLOPs) without a significant decline in the quality of the generated multivariate time series data. In addition, comparing Table 4 and Table 5 shows that the MMD values of both LDMKD-DA and DDPM are lower than those of the VAE-based generative models (VAE, HVAE, and VQ-VAE) and the GAN-based generative models (GAN, TimeGAN, and SigCWGAN), which suggests that the ATI data generated by LDMKD-DA and DDPM better match the distribution of the original data.
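The percentages quoted above follow directly from the Table 4 entries. The short sketch below recomputes them as signed relative changes with respect to DDPM; the helper name `relative_change` is ours, not from the paper.

```python
def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Values copied from Table 4 (ATI dataset).
ldmkd = {"flops_g": 1.368, "params_m": 16.382, "mmd": 0.0838}
ddpm = {"flops_g": 7.278, "params_m": 76.065, "mmd": 0.0731}

changes = {k: round(relative_change(ldmkd[k], ddpm[k]), 1) for k in ldmkd}
print(changes)
```

The negative entries for FLOPs and Params correspond to the reported reductions of 81.2% and 78.5%, and the positive MMD entry corresponds to the reported 14.6% increase.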

Figure 8 shows the t-SNE plots of the above eight methods on the ATI dataset, with different colors indicating different intention classes: 0 represents retreat, 1 penetrate, 2 attack, 3 interference, 4 surveillance, 5 reconnaissance, and 6 feint attack. In the experiments, the t-SNE plots of the ATI data generated by all eight methods separate the intention categories well. However, this alone does not show that the generated ATI data are distributed similarly to the original ATI data. Therefore, we mixed the generated ATI data with the original ATI data in a 1:1 ratio for visualization. Figure 8a shows the t-SNE plot of the original data, while Figure 8b–i shows the t-SNE plots of the mixed data from the different models.

Figure 8 shows that, when mixed with the original data, the data generated by LDMKD-DA and DDPM follow the original class distributions more closely than the data from the VAE-based and GAN-based generative models. This indicates that LDMKD-DA and DDPM can better learn the features and distribution of the original ATI data, generating high-quality, multi-class, multivariate time series data. In addition, the t-SNE plots show that the attack and feint intention clusters lie close together and that the surveillance and reconnaissance intentions overlap. These patterns reflect inherent similarities in their temporal features and suggest that these categories may be difficult to distinguish.

5.3. Univariate Time Series Data Experiments

To verify the effectiveness of the LDMKD-DA model for the generation of univariate time series data, this paper takes the HRRP dataset as an example. We compare the FLOPs, Params, and MMD values of LDMKD-DA and DDPM, and the MMD values of the GAN-based and VAE-based generative models. The generated data are visualized with the t-SNE algorithm. The evaluation metrics for the different models are shown in Table 6 and Table 7.

Table 6 gives the FLOPs, Params, and MMD values for LDMKD-DA and DDPM. Compared with DDPM, the FLOPs of LDMKD-DA decrease by 49.9% and the Params decrease by 39.3%, while the MMD value increases by only 1.63%. This shows that the proposed LDMKD-DA model can generate high-quality univariate time series data while reducing the number of parameters and the computational complexity. In addition, comparing Table 6 and Table 7 shows that the MMD values of both LDMKD-DA and DDPM are lower than those of the VAE-based generative models (VAE, HVAE, and VQ-VAE) and the GAN-based generative models (GAN, TimeGAN, and SigCWGAN), which suggests that the HRRP data generated by LDMKD-DA and DDPM better match the distribution of the original data.

Figure 9 shows the t-SNE plots of the above eight methods on the HRRP dataset, with different colors indicating different targets. In the experiments, the t-SNE plots of the HRRP data generated by all eight methods separate the target categories well. However, this alone does not show that the generated HRRP data have a similar distribution to the original HRRP data. Therefore, we mixed the generated HRRP data with the original HRRP data in a 1:1 ratio for visualization. Figure 9a shows the t-SNE plot of the original data, while Figure 9b–i shows the t-SNE plots of the mixed data from the different models.

Figure 9 shows that, when mixed with the original data, the data generated by LDMKD-DA and DDPM follow the original class distributions more closely than the data from the VAE-based and GAN-based generative models. This indicates that LDMKD-DA and DDPM can better learn the features and distribution of the original HRRP data, generating high-quality, multi-class, univariate time series data. The mixed-data t-SNE plots for the VAE-based and GAN-based generative models clearly show three independent point sets outside the distribution of the original HRRP data, indicating that the generated data differ significantly from the original HRRP data. In the mixed-data t-SNE plots for LDMKD-DA and DDPM, by contrast, the point sets of the three categories are not clearly separated; this is mainly inherited from the original HRRP data, as shown in Figure 9a.

5.4. Application Experiment Analysis

We hope that the time series data generated by the LDMKD-DA model can be applied to general air combat data processing models to obtain more accurate identification results. Therefore, we used the GTLIR model [28] and the BiGRU–Attention model [29] for validation. We focused on whether the data generated by the LDMKD-DA model can be used to expand the real data and, in turn, to train a classification model that achieves similar or better classification accuracy. The data generated by LDMKD-DA, DDPM, the GAN-based augmentation models, and the VAE-based augmentation models were tested. Experiments were carried out for different ratios of generated samples, where the ratio of generated samples is defined as the ratio of the original data to the sum of the generated and original data.
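Because the ratio of generated samples is defined as original / (original + generated), a smaller ratio means a larger share of generated data. The sketch below makes this concrete for a hypothetical training set of 600 original samples; the sample count and the helper name `generated_count` are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction


def generated_count(n_original, ratio):
    """Number of generated samples so that original / (original + generated) == ratio."""
    total = Fraction(n_original) / ratio
    if total.denominator != 1:
        raise ValueError("ratio does not yield a whole number of samples")
    return int(total) - n_original


# Hypothetical training set with 600 original samples.
ratios = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
counts = {str(r): generated_count(600, r) for r in ratios}
print(counts)
```

At a ratio of 1/2 the generated and original data are equal in size, whereas at 1/6 the generated data are five times as large as the original data.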

5.4.1. ATI Dataset Application Experiment Analysis

The GTLIR model is an intention recognition model that fuses dilated causal convolution and a graph attention mechanism. It mines the temporal relationships in intention data through dilated causal convolution and the spatial relationships through the graph attention mechanism. The recognition accuracy based on the GTLIR model is shown in Figure 10, and the precision, recall, and F1 score values are shown in Appendix A.

Figure 10 shows that the accuracy of the intention recognition model gradually increases as the proportion of generated data in the training set increases. When the ratio of generated samples is 1/6, the majority of the training set is generated ATI data, and the accuracy of the intention recognition model exceeds 99%. When the ratio of generated samples is 1/2, half of the training set is original ATI data and half is generated ATI data, and the accuracy of the model stays above 96%. The recognition accuracy obtained with LDMKD-DA-generated data is only 0.36% lower than that obtained with DDPM-generated data, further indicating that the LDMKD-DA model can generate high-quality, multi-class, multivariate time series data while significantly reducing FLOPs and Params. In addition, the recognition accuracies obtained with both LDMKD-DA- and DDPM-generated data are higher than those of the VAE-based and GAN-based generative models. The precision, recall, and F1 score values in Appendix A support this result.

5.4.2. HRRP Dataset Application Experiment Analysis

The BiGRU–Attention model is an HRRP target recognition model that incorporates BiGRU and an attention mechanism. The attention mechanism can automatically focus on discriminative target regions, and combining it with BiGRU can mitigate the time-shift sensitivity of HRRP. The classification accuracy based on the BiGRU–Attention model is shown in Figure 11, and the precision, recall, and F1 score values are shown in Appendix B.

Figure 11 shows that the accuracy of target classification gradually increases as the proportion of generated data in the training set increases. When the ratio of generated samples is 1/6, the majority of the training set is generated HRRP data, and the accuracy of the target classification model exceeds 93%. When the ratio of generated samples is 1/2, half of the training set is original HRRP data and half is generated HRRP data, and the accuracy of the model stays above 88%. The classification accuracy obtained with LDMKD-DA-generated data is only 0.94% lower than that obtained with DDPM-generated data, further indicating that the LDMKD-DA model can generate high-quality, multi-class, univariate time series data while significantly reducing FLOPs and Params. In addition, the classification accuracies obtained with both LDMKD-DA- and DDPM-generated data are higher than those of the VAE-based and GAN-based generative models. The precision, recall, and F1 score values in Appendix B support this result.

6. Conclusions

To address the insufficient training of intelligent models caused by the scarcity of air battlefield situation data, as well as the long sampling time of DDPMs, this paper proposes an air battlefield time series data augmentation method based on a lightweight diffusion model, namely LDMKD-DA. Considering the advantages of DDPMs in processing image data, multivariate and univariate air battlefield time series data are preprocessed and converted into image data. The air battlefield situation data augmentation model is then constructed based on a DDPM. Considering the need for miniaturization and intelligence of future combat platforms, DSC is introduced to lighten the DDPM and reduce the number of parameters and the amount of computation of the model. Meanwhile, an enhanced knowledge distillation method is introduced to accelerate the sampling process of the diffusion model; it develops category-specific teacher models based on input data classes and refines the loss function used in the co-training of teacher and student models. Comparative analysis of MMD values between generated and original data shows that the proposed algorithm achieves significant reductions in FLOPs and parameters with only a limited increase in MMD values, demonstrating the capability of LDMKD-DA to generate high-quality time series data. The generated data were fed into the intention recognition model and the target classification model to verify their practicality and reliability. The experimental results show that the data generated by the proposed algorithm can be used to train existing time series classification models with good results. In the future, we plan to study the effect of noisy prediction tables on time series data as well as possible flaws and incompleteness of the dataset, in order to further improve the application of the LDMKD-DA.

Author Contributions

Writing – original draft, methodology, validation, B.C.; supervision, writing – review and editing, Q.X.; resources, writing – review and editing, L.L.; supervision, methodology, writing – review and editing, J.S.; formal analysis, writing – review and editing, visualization, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72071209.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LDMKD-DA	Air battlefield time series data augmentation model based on lightweight denoising diffusion probabilistic model
DDPM	Denoising diffusion probabilistic model
GAN	Generative adversarial network
VAE	Variational autoencoder
DSC	Depthwise separable convolution
DW	Depthwise convolution
PW	Pointwise convolution
GASF	Gramian angular summation field
GADF	Gramian angular difference field
MTF	Markov transition field
MMD	Maximum mean discrepancy
Params	Number of parameters
FLOPs	Floating-point operations (amount of computation)

Appendix A. Precision, Recall, and F1 Score Values of the ATI Dataset Application Experiment

Table A1. The performance metrics of the ATI dataset application experiment using 1/2 ratio of generated samples.

Model	Precision (%)	Recall (%)	F1 Score (%)
LDMKD-DA	96.84	96.60	96.70
DDPM	97.07	97.06	97.05
VAE	95.29	95.03	95.13
HVAE	96.06	95.69	95.86
VQ-VAE	95.73	95.46	95.55
GAN	96.75	96.63	96.64
TimeGAN	96.54	96.31	96.37
SigCWGAN	96.55	96.37	96.44

Table A2. The performance metrics of the ATI dataset application experiment using 1/6 ratio of generated samples.

Model	Precision (%)	Recall (%)	F1 Score (%)
LDMKD-DA	99.22	99.22	99.22
DDPM	99.45	99.45	99.45
VAE	97.34	97.84	97.57
HVAE	98.32	97.67	97.97
VQ-VAE	98.31	97.97	98.14
GAN	98.45	97.70	98.04
TimeGAN	98.86	98.42	98.62
SigCWGAN	98.81	98.57	98.68

Table A3. Trend summary table when the ratio of generated samples is 1/3, 1/4, and 1/5 for the ATI dataset application experiment.

Model	Precision Range (%)	Recall Range (%)	F1 Score Range (%)
LDMKD-DA	97.32–98.39	97.21–98.42	97.26–98.40
DDPM	97.46–98.61	97.36–98.64	97.40–98.62
VAE	96.78–97.72	96.62–97.55	96.64–97.63
HVAE	96.85–98.03	96.52–97.84	96.63–97.91
VQ-VAE	97.38–97.95	97.38–97.95	97.38–97.95
GAN	96.94–98.23	96.59–97.98	96.74–98.08
TimeGAN	97.38–97.99	97.06–97.79	97.17–97.88
SigCWGAN	96.22–98.26	95.01–98.01	95.31–98.11

Appendix B. Precision, Recall, and F1 Score Values of the HRRP Dataset Application Experiment

Table A4. The performance metrics of the HRRP dataset application experiment using 1/2 ratio of generated samples.

Model	Precision (%)	Recall (%)	F1 Score (%)
LDMKD-DA	88.84	88.28	88.46
DDPM	89.10	89.16	89.13
VAE	87.03	87.05	86.93
HVAE	87.94	87.43	87.50
VQ-VAE	87.85	87.17	87.18
GAN	88.49	87.32	87.39
TimeGAN	88.37	88.03	88.16
SigCWGAN	88.38	88.07	88.13

Table A5. The performance metrics of the HRRP dataset application experiment using 1/6 ratio of generated samples.

Model	Precision (%)	Recall (%)	F1 Score (%)
LDMKD-DA	93.05	93.09	93.06
DDPM	93.84	93.81	93.81
VAE	90.93	90.41	90.28
HVAE	92.32	92.06	92.06
VQ-VAE	92.85	92.63	92.57
GAN	93.32	92.84	92.93
TimeGAN	92.11	92.02	92.02
SigCWGAN	92.40	91.77	91.82

Table A6. Trend summary table when the ratio of generated samples is 1/3, 1/4, and 1/5 for the HRRP dataset application experiment.

Model	Precision Range (%)	Recall Range (%)	F1 Score Range (%)
LDMKD-DA	89.17–92.00	88.90–92.01	88.99–92.00
DDPM	91.36–92.89	91.29–92.91	91.29–92.89
VAE	87.93–90.63	87.29–90.56	87.16–90.57
HVAE	88.48–91.29	88.41–91.29	88.31–91.25
VQ-VAE	88.01–91.82	87.95–91.12	87.97–91.20
GAN	89.32–92.22	88.20–91.68	87.97–91.73
TimeGAN	89.21–91.52	88.38–91.21	88.56–91.25
SigCWGAN	88.78–92.19	88.76–91.92	88.66–91.85

References

1. Fusano, A.; Sato, H.; Namatame, A. Multi-Agent Based Combat Simulation from OODA and Network Perspective. In Proceedings of the 2011 UkSim 13th International Conference on Computer Modelling and Simulation, Cambridge, UK, 30 March–1 April 2011; pp. 249–254.
2. Bruno, N.; Chaudhuri, S. Flexible database generators. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August–2 September 2005; VLDB Endowment: Los Angeles, CA, USA; pp. 1097–1107.
3. Houkjær, K.; Torp, K.; Wind, R. Simple and realistic data generation. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Republic of Korea, 12–15 September 2006; VLDB Endowment: Los Angeles, CA, USA; pp. 1243–1246.
4. Kang, Y.; Hyndman, R.J.; Li, F. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Stat. Anal. Data Min. 2020, 13, 354–376.
5. Bokde, N.D.; Feijóo, A.; Al-Ansari, N.; Yaseen, Z.M. A Comparison Between Reconstruction Methods for Generation of Synthetic Time Series Applied to Wind Speed Simulation. IEEE Access 2019, 7, 135386–135398.
6. Koltuk, F.; Schmidt, E.G. A Novel Method for the Synthetic Generation of Non-I.I.D Workloads for Cloud Data Centers. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; pp. 1–6.
7. Arlitt, M.; Marwah, M.; Bellala, G.; Shah, A.; Healey, J.; Vandiver, B. IoTAbench: An Internet of Things Analytics Benchmark. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, New York, NY, USA, 28 January–4 February 2015; Association for Computing Machinery: New York, NY, USA; pp. 133–144.
8. Shamshad, A.; Bawadi, M.A.; Wan Hussin, W.M.A.; Majid, T.A.; Sanusi, S.A.M. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005, 30, 693–708.
9. Li, Y.; Hu, B.; Niu, T.; Gao, S.; Yan, J.; Xie, K.; Ren, Z. GMM-HMM-Based Medium- and Long-Term Multi-Wind Farm Correlated Power Output Time Series Generation Method. IEEE Access 2021, 9, 90255–90267.
10. Mogren, O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv 2016.
11. Zhang, X.; Wu, T.; Zhang, Y. Attack Detection and Data Restoration of Remote Estimation Systems Based on D-RCGAN. In Proceedings of the Asian Control Conference, Dalian, China, 5–8 July 2024; pp. 1167–1173.
12. Jiang, X.; Li, D.; Zhang, H.; Zhou, Y.; Liu, J.; Xiang, X. Weighted Strategy Optimization Approach for Discrete Sequence Generation. In Proceedings of the 2024 4th International Symposium on Artificial Intelligence and Intelligent Manufacturing (AIIM), Chengdu, China, 20–22 December 2024; pp. 843–846.
13. Ramponi, G.; Protopapas, P.; Brambilla, M.; Janssen, R. T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular Sampling. arXiv 2019.
14. Vijaya, K. Evaluating Financial Risk in the Transition from EONIA to ESTER: A TimeGAN Approach with Enhanced VaR Estimations. Int. J. Innov. Sci. Mod. Eng. 2024, 12, 1–9.
15. Xu, T.; Wenliang, L.K.; Munn, M.; Acciaio, B. COT-GAN: Generating Sequential Data via Causal Optimal Transport. arXiv 2020.
16. Lu, C.; Reddy, C.K.; Wang, P.; Nie, D.; Ning, Y. Multi-Label Clinical Time-Series Generation via Conditional GAN. IEEE Trans. Knowl. Data Eng. 2024, 36, 1728–1740.
17. Wang, H.; Zhu, H.; Li, H. Multi-Mode Data Generation and Fault Diagnosis of Bearings Based on STFT-SACGAN. Electronics 2023, 12, 1910.
18. Shi, H.; Xu, Y.; Ding, B.; Zhou, J.; Zhang, P. Long-Term Solar Power Time-Series Data Generation Method Based on Generative Adversarial Networks and Sunrise–Sunset Time Correction. Sustainability 2023, 15, 14920.
19. Haque, T.; Syed, M.A.B.; Jeong, B.; Bai, X.; Mohan, S.; Paul, S.; Ahmed, I.; Das, S. Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling. arXiv 2025.
20. Fraccaro, M.; Sønderby, S.K.; Paquet, U.; Winther, O. Sequential Neural Models with Stochastic Layers. arXiv 2016.
21. Li, Y.; Mandt, S. Disentangled Sequential Autoencoder. arXiv 2018.
22. Jeon, S.; Seo, J.T. A Synthetic Time-Series Generation Using a Variational Recurrent Autoencoder with an Attention Mechanism in an Industrial Control System. Sensors 2024, 24, 128.
23. Zheng, Y.; Zhang, Z.; Cui, R. Few-Shot Learning for Time Series Data Generation Based on Distribution Calibration. In Proceedings of the Web Information Systems and Applications; Xing, C., Fu, X., Zhang, Y., Zhang, G., Borjigin, C., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 198–206.
24. Yi, H.; Hou, L.; Jin, Y.; Saeed, N.A.; Kandil, A.; Duan, H. Time series diffusion method: A denoising diffusion probabilistic model for vibration signal generation. Mech. Syst. Signal Process. 2024, 216, 111481.
25. Adib, E.; Fernandez, A.S.; Afghah, F.; Prevost, J.J. Synthetic ECG Signal Generation Using Probabilistic Diffusion Models. IEEE Access 2023, 11, 75818–75828.
26. Wang, S.; Wang, G.; Fu, Q.; Song, Y.; Liu, J.; He, S. STABC-IR: An air target intention recognition method based on bidirectional gated recurrent unit and conditional random field with space-time attention mechanism. Chin. J. Aeronaut. 2023, 36, 316–334.
27. Sun, W.; Lu, G.; Zhao, Z.; Guo, T.; Qin, Z.; Han, Y. Regional Time-Series Coding Network and Multi-View Image Generation Network for Short-Time Gait Recognition. Entropy 2023, 25, 837.
28. Cao, B.; Xing, Q.; Li, L.; Xing, H.; Song, Z. KGTLIR: An Air Target Intention Recognition Model Based on Knowledge Graph and Deep Learning. Comput. Mater. Contin. 2024, 80, 1251–1275.
29. Xu, B.; Chen, B.; Wan, J.; Liu, H.; Jin, L. Target-Aware Recurrent Attentional Network for Radar HRRP Target Recognition. Signal Process. 2019, 155, 268–280.

Figure 1. Depthwise separable convolution.

Figure 2. The framework of the LDMKD-DA model.

Figure 3. Visualized intention data.

Figure 4. Visualized radar HRRP data.

Figure 6. U-Net structure after introducing DSC.

Figure 7. Improved knowledge distillation method.

Figure 8. t-SNE plot of the ATI dataset.

Figure 9. t-SNE plot of the HRRP dataset.

Figure 10. Intention recognition results for different ratios of generated samples. (a) Comparison results with VAE-based generative models. (b) Comparison results with GAN-based generative models.

Figure 11. Target classification results for different ratios of generated samples. (a) Comparison results with VAE-based generative models. (b) Comparison results with GAN-based generative models.

Table 1. Related works about time series data augmentation.

Method	References	The Need for Real Data	The Need for Domain Specific Knowledge	Advantages and Disadvantages
Rule-based methods	[2,3,4]	×	×	Advantages: They do not need to rely on large amounts of historical data or training models. Disadvantages: The generated data may be too simple and unrealistic.
Simulation model-based methods	[5,6]	√	√	Advantages: They can simulate the behavior of complex systems, which in turn generates more realistic data. Disadvantages: They need a lot of domain knowledge and are computationally intensive.
Traditional machine learning-based methods	[7,8,9]	√	×	Advantages: They take into account the effect of historical data. Disadvantages: The models are too simple and require parameterization.
Deep learning-based methods	GAN [10,11,12,13,14,15,16,17,18]	√	×	Advantages: They show superiority in terms of the quality of samples generated. Disadvantages: The training process is unstable, suffers from mode collapse and vanishing gradients, and lacks rigorous mathematical derivation.
Deep learning-based methods	VAE [19,20,21,22,23]	√	×	Advantages: They provide rigorous mathematical derivations. Disadvantages: They struggle to generate high-quality samples.
Deep learning-based methods	Diffusion model [24,25]	√	×	Advantages: They provide a rigorous mathematical derivation process and are able to generate high-quality samples. Disadvantages: The sampling time of the model is too long, and the number of model parameters is too large.

Table 2. Detailed information about the ATI dataset.

Attribute	Details
Source	Air Combat Maneuvering Generator, ACMG
Covered scenarios	Plain air defense combat, mountain air defense combat, etc.
Time distribution	6 sample periods
Total samples	3520
Feature types	8 numerical features and 4 non-numerical features
Sample distribution	Attack: 600; retreat: 600; penetrate: 600; reconnaissance: 600; interference: 280; surveillance: 600; feint: 240

Table 3. Hyperparameters of the LDMKD-DA.

Hyperparameter	Value
Kernel size of DSC	3 × 3
Optimizer	Adam
Learning rate	0.0001
Batch size	64
Number of diffusion steps	1000
Kernel function of MMD	RBF

Table 4. Experimental results based on the diffusion model for the ATI dataset.

Model	FLOPs	Params	MMD
LDMKD-DA	1.368 G	16.382 M	0.0838
DDPM (LDMKD-DA w/o DSC and KD)	7.278 G	76.065 M	0.0731
LDMKD-DA w/o DSC	7.278 G	76.065 M	0.0792
LDMKD-DA w/o KD	1.368 G	16.382 M	0.0845

Table 5. Experimental results based on the GAN and VAE for the ATI dataset.

Model	VAE	HVAE	VQ-VAE	GAN	TimeGAN	SigCWGAN
MMD	0.6851	0.2278	0.1553	0.2860	0.2788	0.2212

Table 6. Experimental results based on the diffusion model for the HRRP dataset.

Model	FLOPs	Params	MMD
LDMKD-DA	314.556 M	2.779 M	0.1124
DDPM (LDMKD-DA w/o DSC and KD)	627.540 M	4.575 M	0.1106
LDMKD-DA w/o DSC	627.540 M	4.575 M	0.1115
LDMKD-DA w/o KD	314.556 M	2.779 M	0.1136

Table 7. Experimental results based on the GAN and VAE for the HRRP dataset.

Model	VAE	HVAE	VQ-VAE	GAN	TimeGAN	SigCWGAN
MMD	0.5562	0.5603	0.5341	0.1832	0.1741	0.1544

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, B.; Xing, Q.; Li, L.; Shi, J.; Lin, W. Air Battlefield Time Series Data Augmentation Model Based on a Lightweight Denoising Diffusion Probabilistic Model. AI 2025, 6, 192. https://doi.org/10.3390/ai6080192
