2. State of the Art
In smart manufacturing, CPSs combine shop-floor assets, IIoT sensors, and edge–cloud computing to support automated QC and process optimization [8,9]. These systems increasingly adopt service-oriented and event-driven architectures that integrate AI-based analytics and support responsive, data-driven decision-making [10,11]. Reference architecture models for digital manufacturing, such as RAMI 4.0, provide the groundwork for orchestrating interoperable workflows across enterprise and shop-floor levels [12].
Operationally, smart manufacturing relies on edge–cloud cooperation so that latency-critical control and QC tasks can be handled at the edge, while scalable analytics and reporting are performed in the cloud [13]. In this context, Nain et al. outline patterns for distributed edge-based deployment of AI systems and for integrating their outcomes into the MIS to improve responsiveness and reliability in dynamic production settings [14].
Digital twins have emerged as a key enabler in this architecture, providing synchronized virtual representations of machines and processes. Embedded into production workflows, digital twins can improve synchronization of process parameters and quality-related indicators, creating a more robust link between operating conditions and product properties [15,16]. By fusing physics-based models with data-driven learners, digital twins support proactive QC by forecasting deviations in process variables and triggering early corrective actions [17]. Edge–cloud digital-twin architectures place sensing, inference, and feedback control near machines, while coordinating model retraining and analytics in the cloud to minimize latency and scale oversight of quality-related KPIs [18].
At the same time, AI-based QC and digital-twin pipelines must be hardened against adversarial perturbations and data poisoning that can mislead classifiers and control policies operating on edge IIoT streams [19]. Deep neural networks are known to be vulnerable to small, carefully crafted perturbations that induce misclassifications [20], and empirical studies on time-series data have shown that such perturbations can degrade fault detection, underscoring the need for robust defenses integrated into the manufacturing data pipeline [21]. Digital-twin-driven security frameworks propose the use of physics and AI residuals to detect inconsistencies between expected and measured behavior, enabling online attack detection and mitigation within IIoT infrastructures [18]. However, such frameworks typically focus on detection and alerting without addressing in-band purification of corrupted or malicious data streams [19,22]. In addition, they are often validated on generic CPS benchmarks rather than within production-grade QC infrastructures, and they do not exploit twin states to guide generative reconstruction of inputs [23,24]. In parallel, adversarial purification methods are largely developed outside manufacturing contexts, leaving a gap between robust purification techniques and digital-twin-aware deployment in smart factories [25].
Modern approaches to adversarial purification are typically diffusion-based [26], treating the defense as a generative denoising step that maps a perturbed input back to the clean data manifold before classification [27]. Diffusion-based purification methods exploit a forward noising process and a learned reverse denoising trajectory to remove perturbations while approximately preserving the statistical characteristics of the data [28]. Compared to adversarial training, such purification operates as a model-agnostic preprocessing pipeline that can improve robustness without modifying or retraining downstream AI models [29].
In image-based domains, guided pixel-space diffusion defenses have been proposed that reverse diffusion with auxiliary signals to better preserve label-discriminative content during purification [30]. In a similar manner, Du et al. used task-conditioned variants that adapt the denoising path to downstream objectives, such as position-sensitive conditioning for crowd counting in vision-based AI pipelines, to resist localized perturbations [31]. Beyond vision, diffusion-based purification has been tailored for automatic modulation classification in communications, improving robustness without altering the recognition network [32,33].
Outside of vision-based attacks, diffusion-based adversarial purification has been extended to sequential modalities. As presented by Zhu et al., purification of time-series signals acts as model-agnostic pre-processing, since corrupted signals can negatively impact downstream AI inference [34]. In this context, diffusion models for time-series data streams have demonstrated high-fidelity reconstruction and noise suppression while preserving temporal dependencies, underscoring their suitability as purification frontends for sequence AI classifiers [35,36,37]. In RF pipelines, diffusion has also been applied directly to raw data streams for data augmentation and restoration, validating that diffusion-based models naturally operate in time-series spaces relevant to defense-side pre-processing [38]. However, diffusion-based purification remains largely unexplored in the manufacturing domain, with studies focusing more on the healthcare and medical sector, creating a significant gap in smart manufacturing research [39].
In addition, across existing diffusion-based purification studies, guidance is typically task- or label-oriented, while the notion of process feasibility is rarely enforced during sampling [40]. In contrast, manufacturing digital twins already provide synchronized state estimates and constraint residuals that encode the physical aspects of a process [41]. Conditioning latent diffusion on these twin signals can offer a direct way to reduce post-purification semantic drift and maintain the validity of reconstructions [42].
Despite their promise, generative purification methods present several limitations. Pre-trained generators may project inputs onto the training manifold and inadvertently alter class-relevant details, leading to post-purification semantic drift and misclassification [43]. Standard diffusion sampling relies on iterative denoising, which can introduce non-trivial latency and energy overheads at the edge, motivating the use of one-step or few-steps guidance as a mitigation strategy [44]. Generative purification remains susceptible to adaptive attacks that differentiate through the defense pipeline [45], and a persistent trade-off between adversarial robustness and accuracy on clean data has been reported, alongside generalization gaps across datasets and threat models [46,47].
To emphasize the limitations of existing work on diffusion-based purification and robustness, including diffusion models applied to industrial time-series denoising and digital-twin- and physics-based security monitoring, Table 1 contrasts DTCDP against recent studies. The comparison highlights that prior works typically use diffusion or digital twins in isolation, whereas DTCDP uses digital-twin physics residuals as an explicit guidance signal that shapes the reverse denoising path during purification.
To address the aforementioned observations while remaining compatible with existing QC models and production monitoring workflows, this work proposes a DTCDP framework, which combines latent diffusion-based denoising with guidance derived from a lightweight digital twin of the manufacturing process, as detailed in the next section.
3. Methodology
3.1. The Architecture of the Digital-Twin-Conditioned Diffusion Purification Framework
This work introduces the DTCDP framework (Figure 1), which combines a latent diffusion purifier with physics-aware guidance provided by a lightweight digital twin. The twin exposes process states and constraint residuals that are translated into soft penalties shaping the denoising trajectory, so that purified signals remain on a physically feasible manifold while adversarial perturbations are removed. DTCDP is designed for execution at the network edge and can integrate with the MIS to expose auditable events, latency metrics, and QC-related KPIs.
Latent diffusion contributes a data-manifold prior that removes high-dimensional adversarial noise, but on its own, it can introduce semantic drift by projecting inputs toward statistically plausible, yet physically infeasible trajectories. Digital-twin residual guidance complements this by acting as a feasibility prior, continuously steering denoising toward reconstructions that remain consistent with process physics and operating constraints.
The framework comprises three main layers:
- The data integration and digital-twin layer acquires production signals, coupled with their timestamps and metadata. It also provides online estimation of process and system states and derives the physics bounds and residuals that quantify violations relevant to the QC process of the line.
- The diffusion-based purification layer is composed of four modules:
  - A latent representation module (encoder/decoder) that maps inputs to a compact latent space where purification is performed.
  - A diffusion purifier, implemented as a noise predictor with a deterministic, one-step or few-steps sampler to meet real-time constraints.
  - A twin-guided conditioner that injects a soft physics penalty between denoising steps to keep reconstructions within operational manifolds.
  - A closed-loop purification and validation pipeline that iteratively purifies and evaluates signals against the digital twin before releasing them to the downstream AI components.
- The integration layer, detailed in Section 3.4, forwards purified signals and audit metadata to the downstream AI components and the MIS.
DTCDP extends standard latent diffusion purification by injecting digital-twin-derived physics residuals into the sampling dynamics and by using one-step or few-steps sampling tailored to edge latency constraints.
3.2. Data Integration and Digital-Twin Layer
The data integration and digital-twin layer unifies heterogeneous sensor streams into event packets that couple each observation with a synchronized state estimate from the digital twin and associated physics residuals. Its role is to (i) align multi-rate data with consistent timing and metadata, (ii) estimate latent operational states in real time, and (iii) derive compact, physically meaningful residuals that guide the diffusion-based purification layer.
Let x_t denote a raw sensor measurement at time t, and let m_t be the associated metadata vector, including information such as sensor ID, sampling rate, and calibration tags. For each timestamp t, the layer builds an event e_t based on Equation (1). In Equation (1), ŝ_t is a state estimate obtained from the digital twin and synchronized to t. The twin provides state estimates ŝ_{t_k} at times t_k, which may not coincide with measurement timestamps. For t_k ≤ t ≤ t_{k+1}, temporal alignment is obtained by linear interpolation as presented in Equation (2).
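A plausible rendering of Equations (1) and (2), assuming the event simply stacks the measurement, metadata, and synchronized state (symbol names follow the definitions above and are otherwise assumptions), is:

```latex
% Assumed forms of Equations (1)-(2): event construction and linear interpolation.
\begin{align}
  e_t &= \left(x_t,\; m_t,\; \hat{s}_t\right), \tag{1}\\
  \hat{s}_t &= \hat{s}_{t_k}
    + \frac{t - t_k}{t_{k+1} - t_k}\left(\hat{s}_{t_{k+1}} - \hat{s}_{t_k}\right),
    \qquad t_k \le t \le t_{k+1}. \tag{2}
\end{align}
```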
The digital twin follows a discrete-time state-space formulation, presented in Equation (3), where s_k is the latent state of operating conditions, u_k denotes exogenous inputs (e.g., operator adjustments), y_k is the vector of measurable quantities expected under state s_k, f and h describe the process and measurement mappings, and w_k and v_k represent process and measurement disturbances. The twin is assumed to be identified and calibrated prior to deployment, and provides filtered state estimates ŝ_k and expected measurements ŷ_k during operation.
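A generic rendering of this state-space model (a sketch of Equation (3), with symbols as defined above) is:

```latex
% Assumed form of Equation (3): generic discrete-time state-space model of the twin.
\begin{equation}
\begin{aligned}
  s_{k+1} &= f\left(s_k, u_k\right) + w_k,\\
  y_k     &= h\left(s_k\right) + v_k.
\end{aligned}
\tag{3}
\end{equation}
```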
To supply physics information suitable for purification, the twin specifies a set of constraints, as seen in Equation (4). These constraints encode physically feasible behavior, such as bounds on process variables, geometric or temporal consistency, conservation relationships, or domain-specific policies. For each event e_t, non-negative residuals r_{j,t} are computed using Equation (5). Also, for each event, the layer exposes the physics-residual map R_t based on Equation (6). Then, a scalar physics penalty ρ_t is defined based on Equation (7), where r_t is the stacked residual vector and A is a non-negative weight matrix, for example, reflecting state uncertainty or constraint importance. Equations (5)–(7) convert constraint satisfaction into a piecewise-differentiable objective: the residuals quantify constraint violations and ρ_t aggregates them into a scalar penalty used for guidance in Equation (14), providing a differentiable constraint proxy that limits drift during purification by penalizing violations of bounds, rate limits, and consistency relations encoded by the twin. By utilizing Equations (1)–(6), the layer emits a tuple of the type seen in Equation (8).
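A plausible reading of Equations (5)–(7), assuming hinge-style residuals and a weighted quadratic penalty (the exact published forms may differ), is:

```latex
% Assumed forms of Equations (5)-(7): hinge residuals, residual map, weighted penalty.
\begin{align}
  r_{j,t} &= \max\!\left(0,\; g_j\!\left(x_t, \hat{s}_t\right)\right),
             \qquad j = 1, \dots, L, \tag{5}\\
  R_t     &= \left\{\, r_{j,t} \,\right\}_{j=1}^{L}, \tag{6}\\
  \rho_t  &= r_t^{\top} A\, r_t,
             \qquad r_t = \left[r_{1,t}, \dots, r_{L,t}\right]^{\top}. \tag{7}
\end{align}
```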
This tuple provides the normalized data stream, the synchronized state estimate, residuals that encode physical feasibility, and an auditable context that can be provided to legacy and supervisory systems falling under the umbrella of the MIS.
Next, the tuple is consumed by the diffusion purification layer, where the residual map and scalar penalty steer denoising towards physically plausible reconstructions.
3.3. Diffusion-Based Purification Layer
The diffusion-based purification layer performs generative denoising in a compact latent space, adjusted by physics residuals from the digital twin. It is modality-agnostic and operates as a preprocessing defense before sensor streams are ingested by downstream AI components for in-line QC.
Let x be an input observation (e.g., a multivariate time-series window) and let E and D be an encoder and a decoder, respectively. The encoder maps x to a latent representation z_0 (Equation (9)). The decoder reconstructs an observation from a latent z. Diffusion and purification are carried out in latent space to reduce dimensionality and sampling costs, while decoding is used to evaluate physics residuals and obtain the final purified output.
During training, clean latents z_0 are progressively noised according to a standard diffusion process. For a timestep t, the forward process can be written based on Equation (10), with α_t = 1 − β_t and ᾱ_t the cumulative product of the α coefficients, for a variance schedule β_t. Equation (10) defines the forward latent noising process q(z_t | z_0), which produces progressively more corrupted latents as t increases and yields a Gaussian reference distribution for large t.
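A standard form of this forward process, consistent with the coefficients defined above (a sketch of Equation (10)), is:

```latex
% Assumed form of Equation (10): standard DDPM-style forward noising in latent space.
\begin{equation}
  q\!\left(z_t \mid z_0\right)
  = \mathcal{N}\!\left(z_t;\; \sqrt{\bar{\alpha}_t}\, z_0,\; \left(1 - \bar{\alpha}_t\right) I\right),
  \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \left(1 - \beta_s\right). \tag{10}
\end{equation}
```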
A time-conditioned noise-prediction network ε_θ(z_t, t) is trained to approximate the noise added at each step by minimizing Equation (11), where
- ε_θ(z_t, t): the neural network that predicts the noise at step t,
- θ: the trainable weights of the neural network,
- the expectation in Equation (11) is taken over timesteps, data latents, and Gaussian noise.
Minimizing Equation (11) trains ε_θ to estimate the injected noise at each timestep, which provides the denoising direction used during reverse-time sampling.
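A standard ε-prediction objective consistent with this description (a sketch of Equation (11)) is:

```latex
% Assumed form of Equation (11): epsilon-prediction training objective.
\begin{equation}
  \mathcal{L}(\theta)
  = \mathbb{E}_{t,\, z_0,\, \epsilon \sim \mathcal{N}(0, I)}
    \left[\, \left\lVert \epsilon - \epsilon_\theta\!\left(z_t, t\right) \right\rVert_2^2 \,\right],
  \qquad z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon. \tag{11}
\end{equation}
```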
At purification time, a noisy latent z_t is mapped back toward a clean latent using a deterministic sampler. A per-step estimate of the clean latent, ẑ_0, is obtained using Equation (12), and the less noisy latent z_{t−1} is then computed according to Equation (13), where
- ẑ_0: the per-step estimate of the clean latent,
- ᾱ_{t−1}: the cumulative coefficient at t − 1,
- ε_θ(z_t, t): the neural network that predicts the noise at step t.
Equations (12) and (13) implement the reverse update by forming a clean-latent estimate ẑ_0 and propagating to a less noisy latent z_{t−1} using ε_θ. The number of steps K controls the accuracy-to-latency trade-off at inference, where K denotes the number of reverse updates executed during purification.
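These updates correspond to a DDIM-style deterministic sampler; a sketch consistent with the definitions above is:

```latex
% Assumed forms of Equations (12)-(13): DDIM-style deterministic reverse update.
\begin{align}
  \hat{z}_0 &= \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\;\epsilon_\theta\!\left(z_t, t\right)}
                    {\sqrt{\bar{\alpha}_t}}, \tag{12}\\
  z_{t-1}   &= \sqrt{\bar{\alpha}_{t-1}}\;\hat{z}_0
             + \sqrt{1 - \bar{\alpha}_{t-1}}\;\epsilon_\theta\!\left(z_t, t\right). \tag{13}
\end{align}
```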
To ensure that the purification process remains within physically feasible regions defined by the digital twin, a soft guidance step is applied between sampler updates. Let x̂_t denote the decoded observation at step t, and let ρ(x̂_t) be the physics penalty from Section 3.2 computed using the residuals R_t. Guidance is implemented as presented in Equation (14), where
- λ: a guidance strength chosen to balance latency and constraint adherence,
- the gradient of the penalty is computed by backpropagating through the decoder D.
Equation (14) adds physics guidance by modifying the reverse update with a gradient term derived from ρ, steering reconstructions toward constraint satisfaction; λ sets the strength of this feasibility-to-fidelity trade-off.
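A standard rendering of this guided update, consistent with the description above (a sketch; the exact published form of Equation (14) may differ), is:

```latex
% Assumed form of Equation (14): gradient-based physics guidance between sampler steps.
\begin{equation}
  z_{t-1} \;\leftarrow\; z_{t-1}
  \;-\; \lambda\, \nabla_{z_{t-1}}\, \rho\!\left(D\!\left(z_{t-1}\right)\right). \tag{14}
\end{equation}
```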
The closed-loop purification module orchestrates denoising and guidance. For each event e_t, it
- encodes the observation into an initial latent,
- performs the deterministic denoising updates using Equations (12) and (13),
- applies physics guidance based on Equation (14),
- decodes the resulting latent into a purified observation,
- recomputes residuals to verify constraint satisfaction.
The guidance term presented in Equation (14) shapes the reverse denoising trajectory itself rather than performing a post-hoc filtering, projecting each latent update toward the twin-feasible set. As a result, short-lived but physically plausible transients are less likely to be over-smoothed, while adversarial components that increase physics residuals are actively suppressed.
Depending on latency requirements, the module can operate in a one-step mode or a few-steps mode. The layer returns the purified observation, its latent, and audit parameters, including the used timesteps, the number of sampler steps K, residual statistics, and the guidance strength λ, which are forwarded to the integration layer for MIS traceability.
3.4. Integration with Downstream AI-Based Components and MIS
The integration layer serves as middleware between purification and enterprise oversight. After DTCDP has produced purified data streams and associated audit metadata, downstream AI components already present in the manufacturing environment (e.g., defect classifiers, anomaly detectors) consume the purified signals and return decision packages. The internal design of these AI components is out of scope, and DTCDP treats them as black-box consumers.
Decision packages, together with purification metadata, are ingested into the MIS for traceability and quality-related KPI calculation. To minimize data exposure, raw signals ingested into the framework remain on edge nodes and are released to higher-level systems only on escalation or for offline investigation. All decisions are accompanied by versioned provenance and physics-guidance diagnostics, including residual statistics, guidance strength, and sampler steps. This enables reproducibility, change control, and systematic comparisons of alternative purifier and threshold configurations within existing MIS procedures. In this way, robustness improvements remain decoupled from any specific AI model while being appropriate for enterprise reporting, audit, and risk management workflows.
4. Implementation
The DTCDP framework was implemented as an edge-ready pipeline for in-line QC. The prototype runs on a Windows 10 workstation with an Intel Core i9 CPU, 32 GB RAM, and an NVIDIA RTX 2070 GPU. The data flow follows the logical architecture of Section 3.1 and is summarized in the UML sequence diagram shown in Figure 2. At the integration layer, an Apache Kafka messaging bus receives raw sensor and production data from the shop floor. A Node-RED instance subscribes to the relevant topics, time-synchronizes streams using a UTC time server, and forwards events to the digital twin via RESTful HTTP POST requests implemented in FastAPI on Python (version 3.9.12). The twin is deployed as a set of Python microservices and returns a physics residual map in JSON format (Table 2), including timestamp, state, residuals, weights, the scalar penalty ρ, and metadata that are later used to guide purification.
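As an illustration of this interface, a minimal FastAPI endpoint returning a residual map with the fields listed above could look as follows; the endpoint path, field names, channel names, and constraint values are assumptions rather than the deployed service:

```python
# Minimal sketch (not the production service): a FastAPI endpoint that accepts a
# synchronized event and returns a physics residual map shaped like Table 2.
from typing import Dict, List
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Event(BaseModel):
    timestamp: float           # UTC-synchronized event time
    signals: Dict[str, float]  # latest per-channel measurements
    state: List[float]         # twin state estimate aligned to the timestamp

BOUNDS = {"T_exit": (1150.0, 1280.0), "P_water": (80.0, 140.0)}   # example limits
WEIGHTS = {"T_exit": 1.0, "P_water": 0.7}                          # example weights

@app.post("/twin/residuals")
def residuals(event: Event) -> dict:
    res, w = {}, {}
    for name, value in event.signals.items():
        lo, hi = BOUNDS.get(name, (-np.inf, np.inf))
        # Hinge residual: positive only when the measurement violates its bounds.
        res[name] = float(max(0.0, lo - value) + max(0.0, value - hi))
        w[name] = WEIGHTS.get(name, 1.0)
    rho = float(sum(w[k] * res[k] ** 2 for k in res))  # scalar penalty (Eq. (7) style)
    return {
        "timestamp": event.timestamp,
        "state": event.state,
        "residuals": res,
        "weights": w,
        "rho": rho,
        "metadata": {"twin_version": "example"},
    }
```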
The diffusion-based purification layer is implemented as a temporal convolutional autoencoder in TensorFlow, exposed via a FastAPI HTTP interface. Its architecture is shown in Figure 3. The encoder maps each normalized data window to a compact latent representation, while the decoder reconstructs perturbed signals so that physics residuals can be evaluated on the decoded data. During inference, a Python microservice applies a fast one-step guidance update following the guidance equation in Section 3.3 to obtain a purified observation in near real time.
In addition, Table 3 includes a pseudocode block that summarizes the inference-time physics guidance used in DTCDP. After encoding the normalized window into a latent vector, the service performs a deterministic denoising update and then applies a single gradient step that minimizes the digital-twin physics penalty presented in Equation (14) by backpropagating through the decoder. As seen in Table 3, the physics-guidance step uses a single gradient update with guidance strength λ. The guidance strength is set to 0.05, selected on the validation split from {0.01, 0.02, 0.05, 0.10} as the best robustness trade-off without over-constraining reconstructions. In long-term edge deployments, λ, along with the residual weights in the matrix A, may require adaptive re-tuning as operating regimes and twin fidelity change.
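The following TensorFlow sketch illustrates the kind of one-step guidance update summarized in Table 3; the function and model names, as well as the handling of the fixed intermediate timestep, are illustrative assumptions:

```python
# Illustrative sketch of the inference-time physics guidance described in Table 3
# (encoder/decoder/eps_model and the twin_penalty callable are assumed names).
import tensorflow as tf

LAMBDA_GUIDE = 0.05  # guidance strength selected on the validation split

def purify_one_step(x, encoder, decoder, eps_model, twin_penalty, t_star, alpha_bar):
    """One deterministic denoising update followed by a single physics-guidance step."""
    z = encoder(x, training=False)                        # encode window into latent
    a_t = alpha_bar[t_star]                               # cumulative coefficient at t*
    t_vec = tf.fill(tf.shape(z)[:1], t_star)              # per-sample timestep index
    eps = eps_model([z, t_vec], training=False)
    z0_hat = (z - tf.sqrt(1.0 - a_t) * eps) / tf.sqrt(a_t)  # clean-latent estimate (Eq. 12)

    z_guided = tf.Variable(z0_hat)
    with tf.GradientTape() as tape:
        x_hat = decoder(z_guided, training=False)         # decode to evaluate residuals
        rho = twin_penalty(x_hat)                         # scalar physics penalty (Eq. 7)
    grad = tape.gradient(rho, z_guided)
    z_guided.assign_sub(LAMBDA_GUIDE * grad)              # single guidance update (Eq. 14)

    return decoder(z_guided, training=False)              # purified observation
```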
The temporal convolutional autoencoder uses an encoder with three Conv1D blocks (64/128/128 filters, kernel sizes 5/5/3, stride 2, and ReLU), producing a latent tensor of approximately 75 × 128, and a mirrored decoder with up-sampling convolutions to reconstruct the data window. The model is trained with a Mean Squared Error (MSE) reconstruction loss using Adam with a learning rate of 1 × 10−3, batch size 256, up to 100 epochs with early stopping (patience set to 10), and light regularization with dropout set to 0.1.
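A minimal Keras sketch matching this description (layer details beyond those stated above are assumptions) is:

```python
# Minimal sketch of the temporal convolutional autoencoder (600x9 windows,
# three strided Conv1D encoder blocks, mirrored up-sampling decoder).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_autoencoder(window_len=600, channels=9):
    inp = layers.Input(shape=(window_len, channels))
    x = layers.Conv1D(64, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv1D(128, 5, strides=2, padding="same", activation="relu")(x)
    latent = layers.Conv1D(128, 3, strides=2, padding="same", activation="relu")(x)  # ~75 x 128

    x = layers.UpSampling1D(2)(latent)
    x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(2)(x)
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(2)(x)
    x = layers.Dropout(0.1)(x)
    out = layers.Conv1D(channels, 5, padding="same")(x)   # reconstructed window

    model = Model(inp, out, name="temporal_conv_autoencoder")
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model

autoencoder = build_autoencoder()
```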
The diffusion model is trained in the autoencoder latent space using ε-prediction with an MSE objective over the diffusion timesteps. The noise predictor is a lightweight 1D U-Net/TCN operating on the 75 × 128 latent tensor, with sinusoidal time embeddings of 128 dimensions, following a three-level 1D U-Net with channel widths [128, 256, 256]. Each level contains two residual temporal blocks with kernel size 3, dilations {1, 2, 4, 8} cycled across blocks, and GroupNorm and SiLU activations. Stride-2 down-sampling and nearest-neighbor up-sampling are used, with 1D convolutions on the skip path.
The latent diffusion process uses T = 200 noise levels with a linear variance schedule. The noise predictor is trained with AdamW (learning rate 1 × 10−4), batch size 128, gradient clipping at 1.0, and a decay of 0.999, for up to 250 epochs with early stopping on validation loss with a patience of 20.
All services run on an edge node connected to sensor devices over low-latency industrial Ethernet. After purification, the edge node publishes compact quality-control records on an Apache Kafka topic, including identifiers, defect type, classifier confidence, a purified/not-purified flag, and end-to-end latency. These records are consumed by the MIS to compute shift-level indicators such as first-pass yield, scrap mass, false reject/accept rates, and latency service-level objectives. Each decision is also logged with a small provenance vector, including attack flags, twin configuration hash, guidance parameters, and model version, to support governance and audit. Low confidence or latency violations trigger abstention and escalation to a manual review, while raw sensor streams remain on the edge node and are exported only upon escalation. In this way, both digital-twin inference and purification stay close to the line, and higher-level systems receive only summarized results and metadata.
5. Use Case
The DTCDP framework was applied to a hot-forming use case for steel parts. Steel bars are heated in an induction furnace, descaled with pressurized water and then transferred to a forming station. The use case aims to test the capacity of physics-guided diffusion purification to improve the robustness of time-series-based quality estimation under adversarial attacks and realistic data disturbances.
The DTCDP was deployed in the real-world environment of the use case, which already possesses a wide range of sensors collecting process-related data that support the pre-existing QC pipeline. Edge sensors such as pyrometers stream data to the DTCDP framework through the Apache Kafka bus. The main signals from the heating and descaling processes monitored by DTCDP are summarized in Table 4.
A dataset was collected over four weeks of operation of the hot-forming process. Each steel bar is represented by a multivariate time window of 6 s centered at the furnace exit, resampled at 100 Hz, resulting in sequences of 600 timesteps with the signals in Table 4. Quality labels of the product are derived from MIS historical data, with the classes being OK and NOK, where NOK signifies the presence of a defect. With a production cycle time of approximately 45 s and an average daily production of 1500 steel bars, a dataset containing over 40,000 per-product data signals was assembled. Across the four weeks, the class distribution is imbalanced, with OK comprising approximately 91% of the data and NOK approximately 9%. This class imbalance motivated the use of class weighting in the downstream LSTM classifier. The dataset was split at the batch level into training (65%), validation (15%), and test (20%) subsets to avoid data leakage. This batch-level split serves as a cross-batch evaluation, as the test set contains unseen production batches and the associated operating-condition variability, rather than random windows from the same batches. The collected windows span both nominal and boundary operating regimes, covering the full constraint envelope used by the twin. In the test split, approximately 10% of the included signals fall within the outer 5% of at least one channel's admissible range, representing near-boundary operating conditions.
In the hot-forming use case, the digital twin, which pre-exists this study, is instantiated as a low-order discrete-time state-space model following Equation (3), whose state captures the dominant thermal and electro-hydraulic dynamics of induction heating and descaling. The twin state comprises the effective bar thermal state near the furnace exit, an effective water-system load proxy (pressure- and flow-driven), the effective electrical power state, and a slowly varying disturbance term due to material emissivity and induction coil coupling drift. Exogenous inputs correspond to line speed, water pressure, and descaling flow. The model uses a 0.1 s update rate and is linearly interpolated to the aggregated 100 Hz sensor grid when forming event tuples.
Currently, the twin dynamics are modeled with a linear discrete-time update s_{k+1} = F s_k + G u_k, where F captures dominant thermal inertia and slow drift, and G captures the direct influence of line speed, water pressure, and descaling flow on the latent process state. The matrix F is denoted in Equation (15), while the matrix G is denoted in Equation (16).
In addition, the measurement mapping y_k = H s_k produces the twin-expected quantities used by the energy residuals. The matrix H is presented in Equation (17). The measurement disturbance follows a zero-mean Gaussian with covariance R, with the R matrix presented in Equation (18). Lastly, the process disturbance, which is also used to parametrize the twin transition, follows a zero-mean Gaussian with covariance Q, with the Q matrix described in Equation (19).
For each timestep in the 6 s window, the physics residual map is built from (i) per-channel amplitude constraints, (ii) per-channel rate constraints, and (iii) two global consistency constraints, yielding L = 3C + 2 residuals for C = 9 channels. Specifically, the constraint set includes per-channel amplitude bounds and rate limits for all signals in Table 4, plus an energy consistency term over the window and a twin attraction term. The diagonal weighting matrix A prioritizes thermally critical constraints (temperature and energy) over auxiliary hydraulics, while keeping all constraints active during guidance. The constraint sets are presented in Table 5.
The two global constraints include the energy consistency, with a tolerance of 100 kJ per 6 s window (approximately 5% of the nominal energy at 280 kW), and the twin attraction, with a tolerance of 1.0 in normalized units (approximately 1σ average deviation). In terms of weights, the diagonal entries of the matrix A were set piecewise as (i) 1.0 for all amplitude residuals, (ii) 0.7 for all rate residuals, (iii) 2.0 for all energy residuals, and (iv) 1.5 for all attraction residuals.
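To illustrate how such a residual map and weighted penalty could be assembled per window, a simplified sketch follows; the channel handling, the energy computation, and the helper names are assumptions, while the tolerances and weights mirror the values above:

```python
# Illustrative residual-map computation for one 600x9 window (bounds/rate limits are
# those of Table 5 in practice; the energy proxy and shapes here are examples only).
import numpy as np

def window_penalty(x, bounds, rate_limits, energy_measured, energy_expected,
                   twin_expected, tau_energy=100.0, tau_attr=1.0,
                   w_amp=1.0, w_rate=0.7, w_energy=2.0, w_attr=1.5):
    """x: (T, C) window; bounds: (C, 2); rate_limits: (C,); returns residuals, penalty."""
    residuals, weights = [], []

    # (i) Amplitude residuals: worst violation of per-channel admissible bounds.
    lo, hi = bounds[:, 0], bounds[:, 1]
    amp = np.maximum(0.0, lo - x).max(axis=0) + np.maximum(0.0, x - hi).max(axis=0)
    residuals.append(amp); weights.append(np.full_like(amp, w_amp))

    # (ii) Rate residuals: violation of per-channel rate limits on first differences.
    rate = np.maximum(0.0, np.abs(np.diff(x, axis=0)).max(axis=0) - rate_limits)
    residuals.append(rate); weights.append(np.full_like(rate, w_rate))

    # (iii) Global energy-consistency residual over the window.
    energy_gap = max(0.0, abs(energy_measured - energy_expected) - tau_energy)
    residuals.append(np.array([energy_gap])); weights.append(np.array([w_energy]))

    # (iv) Global twin-attraction residual (average deviation from twin-expected traces).
    attr = max(0.0, float(np.mean(np.abs(x - twin_expected))) - tau_attr)
    residuals.append(np.array([attr])); weights.append(np.array([w_attr]))

    r = np.concatenate(residuals)
    a = np.concatenate(weights)
    rho = float(np.sum(a * r ** 2))   # scalar penalty in the spirit of Equation (7)
    return r, rho
```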
It should be pointed out that the matrices F, G, H and the noise covariances Q, R were identified on the training split using prediction-error minimization on windows flagged as normal operation, with a batch-level split to avoid leakage, after z-score normalization per channel. Constraint bounds were initialized from the [0.5, 99.5] percentiles of the training data per channel and expanded by a safety margin of approximately 3%, while rate limits were set to the 99th percentile of the corresponding per-channel rate of change under normal operation. Lastly, the energy and attraction tolerances were set from the 95th percentile of their respective residual distributions.
In terms of preprocessing, before window extraction, raw streams are time-aligned and resampled to 100 Hz. Short missing segments (below 0.2 s) are filled by linear interpolation per channel. To suppress acquisition glitches, impulsive spikes are detected using a median filter and replaced with the local median, while any remaining extreme values are clipped to the admissible per-signal bounds. Lastly, all channels are z-score normalized using training-set statistics, with the same parameters applied to the validation and test splits.
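A simplified sketch of this per-channel preprocessing, with illustrative helper names and an assumed spike-detection threshold, could look as follows:

```python
# Illustrative preprocessing for one raw channel: 100 Hz resampling, short-gap
# interpolation (<0.2 s), median-filter despiking, clipping, train-statistics z-scoring.
import numpy as np
import pandas as pd
from scipy.signal import medfilt

def preprocess_channel(t, v, lo, hi, train_mean, train_std,
                       fs=100.0, max_gap_s=0.2, spike_kernel=5, spike_thresh=4.0):
    # Resample onto a uniform 100 Hz grid, interpolating only short gaps.
    s = pd.Series(v, index=pd.to_datetime(t, unit="s"))
    grid = s.resample(f"{int(1000 / fs)}ms").mean()
    grid = grid.interpolate(limit=int(max_gap_s * fs), limit_direction="both")
    x = grid.to_numpy(dtype=float)

    # Replace impulsive spikes with the local median (robust MAD-based threshold).
    med = medfilt(x, kernel_size=spike_kernel)
    mad = np.median(np.abs(x - med)) + 1e-9
    spikes = np.abs(x - med) > spike_thresh * mad
    x[spikes] = med[spikes]

    # Clip remaining extreme values to the admissible per-signal bounds.
    x = np.clip(x, lo, hi)

    # z-score normalize with training-set statistics.
    return (x - train_mean) / (train_std + 1e-9)
```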
The autoencoder used in the diffusion layer was trained on the training set and validated on the validation set. Its reconstruction performance is reported in Table 6.
These metrics indicate that the autoencoder captures the relevant heating and descaling dynamics without severe overfitting, providing a suitable latent representation for subsequent diffusion-based purification.
The downstream QC module is an LSTM-based classifier that consumes the per-bar time-series signals and outputs one of the two quality labels. The model consumes the normalized 600 × 9 windows and outputs OK/NOK probabilities via a two-layer LSTM with an MLP head. The classifier is trained with binary cross-entropy using Adam (learning rate 1 × 10−3), batch size 256, up to 50 epochs with a patience of 7, and class weighting to account for the NOK imbalance. This model pre-existed this study: the LSTM QC model was retained to remain fully compatible with the plant's deployed QC pipeline and edge latency constraints, and to isolate the impact of DTCDP as a drop-in defense. More complex sequence models were not adopted, as they would require redesigning the legacy QC stack and would potentially increase inference cost, without being necessary for the study's scope.
To probe robustness, white-box ℓ∞ attacks, created using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), are generated on the test signals with perturbation budgets corresponding to 1%, 2%, and 3% of the physical range per data signal, constrained to remain within actuator limits so that perturbed traces are visually plausible. A test-time evasion threat model is adopted, where an adversary with white-box knowledge of the QC classifier tampers with the multivariate sensor streams at the edge. Perturbations are modeled as additive changes on the 6 s per-bar window and are bounded in the ℓ∞ norm; thus, the maximum change to any single sample in each channel is limited to 1–3% of its physical range. This captures small yet worst-case sensor manipulations that remain within actuator limits while being optimized to misguide the QC model.
White-box, untargeted test-time evasion attacks (FGSM/PGD) are evaluated against the downstream QC classifier, with perturbations bounded in ℓ∞ and scaled per channel for ε ∈ {1%, 2%, 3%} of the physical range. For PGD, each iteration applies projection onto the ℓ∞ ball around the original window and clipping to the admissible bounds.
FGSM is applied as a single-step ℓ∞ attack with ε set to 1–3% of the physical range of each channel as presented in Table 4, using the classifier loss gradient and clipping the perturbed signal to the admissible bounds as seen in Table 5. PGD uses projected iterative updates with 20 iterations, a fixed step size, and a random start uniformly sampled within the ℓ∞ ball.
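A minimal sketch of such a PGD loop, assuming a TensorFlow classifier and illustrative helper names (the step size and loss are assumptions; the text above specifies 20 iterations, a random start, ℓ∞ projection, and clipping), is:

```python
# Illustrative ℓ∞ PGD attack on the QC classifier.
import tensorflow as tf

def pgd_linf(model, x, y, eps, step, lo, hi, iters=20):
    """x: (B, 600, 9) windows; eps/lo/hi broadcastable per channel; y: labels."""
    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
    # Random start uniformly sampled within the ℓ∞ ball.
    x_adv = x + eps * tf.random.uniform(tf.shape(x), -1.0, 1.0)
    x_adv = tf.clip_by_value(x_adv, lo, hi)

    for _ in range(iters):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + step * tf.sign(grad)                 # ascent step on the loss
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)    # project onto the ℓ∞ ball
        x_adv = tf.clip_by_value(x_adv, lo, hi)              # respect admissible bounds
    return x_adv
```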
In addition, non-adversarial disturbances are injected, including small calibration shifts due to sensor drift, missing-sample dropouts reconstructed by interpolation, and isolated spikes of up to 50% of the signal range, approximating common sensor failures and glitches observed in edge data streams.
Four purification strategies are tested to highlight the advantages of the proposed DTCDP framework: (1) no defense, where the LSTM operates directly on normalized signals; (2) simplified preprocessing, consisting of signal-wise low-pass filtering and clipping; (3) unconditioned diffusion, where the latent diffusion purifier is used without physics guidance (λ = 0); and (4) the DTCDP framework, where the latent diffusion is guided by the residuals produced by the existing hot-forming digital twin.
For the two diffusion-based strategies (unconditioned diffusion and DTCDP), inference-time sampling was run in a one-step mode (K = 1) using a deterministic reverse update from a fixed intermediate timestep, followed by the physics-guidance update. A few-steps setting (K = 10) was also tested during tuning; however, due to edge latency constraints imposed by the use case owner and the need for real-time rather than near-real-time inference, the evaluation was performed using the one-step mode.
The evaluation of the proposed framework is based on accuracy-related metrics of the QC module. The evaluation focuses on clean classification accuracy, robust accuracy under attack, first-pass yield (FPY) on clean and under-attack data, the false reject rate under attack, and the 95th percentile end-to-end latency per produced product. The evaluation results can be seen in Table 7.
As seen in Table 7, on clean data, all strategies achieve comparable performance, with DTCDP maintaining classification accuracy and FPY within one percentage point of the no-defense baseline. It also introduces a latency overhead of approximately 193 ms, which is well below the 45 s task time of the hot-forming process. Under adversarial perturbations, the unprotected baseline exhibits a drop in robust accuracy and FPY, as well as a corresponding increase in false rejects, particularly around edge cases where the classification probability of defective products is borderline. Lastly, simplified pre-processing recovers part of this loss but remains vulnerable to attacks that preserve low-frequency trends while altering transient behavior.
In the hot-forming process, the most damaging corruptions are (i) impulsive spikes on the process energy and (ii) slow, structured drifts that preserve low-frequency trends while violating rate limits. DTCDP suppresses these modes because the guidance backpropagates a weighted physics-residual penalty during denoising, preferentially damping changes that increase consistency residuals while keeping physically plausible transients intact. This effect is more pronounced for iterative PGD than for single-step FGSM, since multi-step attacks can accumulate small but systematic constraint violations that are counteracted by per-step feasibility projection. Lastly, the guidance strength λ controls how aggressively the sampler is pulled toward the twin-feasible manifold; a larger strength improves constraint adherence but can over-constrain sharp yet valid transitions. Hence, a strength of 0.05 was selected as the best compromise between robustness and clean accuracy.
To illustrate robustness trends, a robustness curve has been computed that shows the downstream QC accuracy under attack as a function of the perturbation budget (ε = 1%, 2%, and 3% of the per-signal physical range). Figure 4 presents the robustness curves, demonstrating that DTCDP degrades less as ε increases.
To characterize error modes under attack, the row-normalized confusion matrices for OK vs. NOK at the default decision threshold of 0.5 are reported in Figure 5. The matrices highlight that DTCDP reduces false rejects while also improving NOK recall compared to the no-defense baseline.
To quantify uncertainty, all reported rates in Table 7 have been computed on the held-out test split and are accompanied by 95% confidence intervals estimated via non-parametric bootstrap (1000 resamples at the bar level). Pairwise method comparisons were evaluated on the same test windows using a McNemar test on paired predictions for accuracy, robust accuracy, and FPY-derived rates, with Holm correction for multiple comparisons. ROC-AUC differences were assessed using a paired AUC test on the classifier probabilities. Threshold-swept ROC and precision-recall (PR) curves are computed from the downstream classifier probabilities for the same attack settings used in evaluation (PGD/FGSM attacks). The ROC curves are illustrated in Figure 6, with the dashed line representing the random-guess baseline, while the PR curves are illustrated in Figure 7.
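For reference, a minimal sketch of the bar-level bootstrap procedure described above (function and variable names are illustrative) is:

```python
# Illustrative non-parametric bootstrap for a per-bar rate (e.g., robust accuracy),
# matching the 1000-resample, bar-level procedure described above.
import numpy as np

def bootstrap_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """correct: binary array (one entry per bar); returns point estimate and 95% CI."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    n = len(correct)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)        # resample bars with replacement
        stats[b] = correct[idx].mean()
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return correct.mean(), (lo, hi)

# Example: accuracy, (ci_low, ci_high) = bootstrap_ci(per_bar_correct_flags)
```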
As seen from Figure 6 and Figure 7, the ROC/PR curves confirm that DTCDP improves separability under attack across a wide range of thresholds. In particular, the physics-guided pipeline maintains higher true-positive rates at comparable false-positive rates, consistent with the reduced false-reject metrics under attack.
Increasing K yields diminishing returns in robust accuracy, while latency scales approximately linearly, which justifies the selection of the one-step mode (K = 1). The comparative results of changes in steps, latency, and robustness are reported in Figure 8.
Both diffusion-based approaches improve robustness, with the physics-guided DTCDP yielding the highest robust accuracy and FPY under attack. From an MIS perspective, this translates into fewer bars unnecessarily routed to reheating or manual inspection and more stable quality indicators, without sacrificing responsiveness or requiring changes to the downstream QC logic.
Furthermore, to assess dependence on digital-twin accuracy, the DTCDP evaluation was repeated while perturbing the twin outputs and state alignment. The results are presented in Table 8. As seen in Table 8, robustness degrades as twin fidelity decreases; however, the DTCDP framework remains above the unprotected baseline. Lastly, the fluctuations in the p95 latency are statistically insignificant.
To address deployability under different compute resources and transient burst loads, the end-to-end inference latency was computed for each purification strategy with GPU acceleration enabled and in CPU-only mode. Load was emulated by replaying windows with an increasing number of concurrent requests to the purification service, representing short backlogs or multi-stream operation, while reporting median (p50) and tail (p95) latency. The median results are reported in Figure 9, while Figure 10 reports the tail-latency results.
As seen from Figure 9 and Figure 10, latency increases with load, and the effect is most pronounced in the diffusion-based pipelines, where denoising and guidance substantially increase computing time. GPU acceleration consistently reduces tail latency, keeping DTCDP within a sub-second threshold even under moderate burst loads, whereas CPU-only execution shows a steeper degradation as concurrency increases. Lastly, in terms of memory consumption, peak memory with GPU acceleration was 1.2 GB at a concurrency of 1, rising to 1.6 GB at a concurrency of 8, while in CPU-only execution, RAM usage was 0.9 GB at a concurrency of 1, rising to 1.4 GB at a concurrency of 8.
In the context of per-window runtime and throughput, each product corresponds to a 6 s window (600 samples at 100 Hz). The sample-level runtime can be approximated as the end-to-end latency divided by 600, while throughput is computed as concurrency divided by the median latency. For DTCDP in one-step mode, this corresponds to approximately 6 windows per second on the GPU at a concurrency of 1, i.e., roughly 0.3 ms per sample.
As DTCDP is deployed on the edge, the critical loop of purification and QC inference remains local, while only compact QC records or metadata are published to the MIS; therefore, degraded uplink connectivity primarily affects reporting rather than inference. To probe resilience to weak network conditions within the edge stack, a small-scale experiment was carried out in which synthetic delay and packet loss were injected on the HTTP path between Node-RED and the twin service (1% packet loss and 100 ms delay), with a 150 ms timeout and reuse of the last valid twin state. In this scenario, DTCDP maintained stable outputs with a slight tail-latency increase of approximately 79 ms in one-step mode at a concurrency of 1.
Following up on this small-scale experiment, a similar one was carried out to assess scalability to larger sensing configurations. The signals were expanded from 9 to 18 and 27 channels, preserving the sampling rate and window length and keeping the same latent architecture. Runtime scaled approximately linearly with channel count, with p95 latency rising from approximately 193 ms for 9 channels to 240 ms for 18 channels and 339 ms for 27 channels on GPU, pointing to feasibility for moderately expanded sensor suites, although further experimentation is needed.
Lastly, as generative purification can be bypassed by gradient-based adaptive attacks, a lightweight Backward Pass Differentiable Approximation (BPDA)-style evaluation of DTCDP in one-step mode was performed, where gradients are backpropagated through the purification stage using a straight-through approximation, with expectation over transformation (EOT) set to five samples and the same PGD settings. The resulting robustness is lower than in the non-adaptive PGD evaluation, with a drop in robust accuracy of approximately 4.5%.
6. Conclusions
This work introduced a DTCDP framework that combines latent diffusion with physics-aware guidance to mitigate adversarial perturbations in manufacturing QC. The approach couples a data integration and digital-twin layer, which derives synchronized states and physics residuals from shop floor time-series data, with a latent diffusion purifier and a downstream AI-based classifier deployed at the edge and integrated with MIS workflows.
A pilot application in a hot-forming station for steel parts was used to test the framework in a real-world production environment. Time-series signals were fed to a temporal autoencoder that provided a compact latent representation for the diffusion process. Given the low reconstruction errors of the autoencoder, the latent space preserves the data dynamics relevant for subsequent purification. On top of this representation, DTCDP improved the robustness of an LSTM-based quality classifier under attacks and disturbances, raising robust accuracy from 61% for the no-defense baseline to 81.5%.
The pilot also highlights several practical considerations. In real-world deployments, edge nodes may operate under tight power, thermal, and memory budgets and may not include a dedicated GPU, which can increase tail latency under burst loads. Consequently, the DTCDP should be configured in a one-step mode to maintain latency SLOs on CPU-only hardware.
In addition, the effectiveness of physics guidance depends on the fidelity and maintenance of the digital twin and on how well the constraint set captures feasible operating regimes, suggesting the need for mechanisms to detect and compensate for model drift in both the twin and the autoencoder. Digital-twin mismatch can bias feasibility-guided sampling toward twin-consistent yet physically incorrect reconstructions, particularly under regime changes or unmodeled dynamics. Performance is also sensitive to the selection of guidance parameters, such as the guidance strength λ and the residual weights, which govern the trade-off between constraint satisfaction and reconstruction fidelity. In particular, overly strong guidance may over-regularize valid transients, whereas weak guidance may leave violations unresolved. A practical extension is adaptive or data-driven tuning of these parameters under varying operating conditions, for example, via uncertainty-aware scaling based on twin state-estimation confidence, closed-loop adjustment to keep the physics penalty within a target range learned from nominal windows, or periodic re-calibration using trusted data to mitigate long-horizon drift in edge deployments. Furthermore, the current evaluation focuses on white-box attacks and a specific hot-forming process. This study has performed only a limited quantification of robustness under adaptive attackers that differentiate through purification, and does not evaluate long-term distribution shifts. Therefore, future work will extend the analysis to a wider range of adaptive attackers, alternative threat models, and sensor failures, and will investigate robustness across different production lines through application to other manufacturing systems such as EV battery assembly.
Lastly, DTCDP is modality-agnostic by design, which opens further extensions to multi-modal data streams combining time-series and image data, as well as coupling purification with online adversarial training of downstream models. Longer-term studies could also quantify the impact of physics-guided diffusion purification on MIS-level indicators such as scrap-related costs, stability of quality metrics, and effectiveness of escalation procedures in smart manufacturing environments. Future work will also focus on deployment hardening, including model compression and quantization, and automated monitoring of constraint drift and guidance-weight stability to support reliable long-term operation.