Hydrodynamic-Knowledge Fusion Paradigms for Soft Sensing of Spatial Sediment Distribution in Horizontal-Flow Sedimentation Tanks

Meng, Xiangxiang; Kang, Yunkai; Shang, Wei; Wu, Wenhong

doi:10.3390/app16104581


¹ School of Mechanical Engineering, Hubei University of Technology, Wuhan 430068, China

² College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

³ School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(10), 4581; https://doi.org/10.3390/app16104581

Submission received: 12 March 2026 / Revised: 30 April 2026 / Accepted: 1 May 2026 / Published: 7 May 2026

(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence, 2nd Edition)


Abstract

To address the difficulty of directly sensing in-tank sedimentation states during sludge discharge in horizontal-flow sedimentation tanks (HSTs), this study proposes a soft-sensing framework for bottom-sludge thickness in drinking water treatment plants. This framework is designed to overcome the limited capacity of effluent-turbidity-based indicators for fine-grained discharge control and the impracticality of applying computational fluid dynamics (CFD) to real-time state estimation. The framework integrates Supervisory Control and Data Acquisition (SCADA) operational data, ultrasonic sludge–water interface measurements, and CFD-derived hydraulic priors. To incorporate hydrodynamic knowledge of sediment-particle transport, three fusion paradigms are developed: parameter transfer, representation fusion, and knowledge distillation, injecting physical priors into the parameter space, latent representation space, and supervision-constraint space, respectively. Performance is evaluated using pointwise accuracy (PA), curvature consistency error (CCE), and mass-conservation error (MCE). Experiments on a real-world HST dataset show that, across the six predictors examined, the three paradigms reduced PA, CCE, and MCE by 30.7%, 16.0%, and 56.3% on average relative to the same predictors trained without prior fusion. Under the in-distribution setting, the Attention predictor combined with parameter transfer attained the lowest PA (0.026) and the lowest MCE (1.052) among the eighteen paradigm–predictor combinations evaluated. Under the out-of-distribution setting with extended sedimentation duration, knowledge distillation attained the lowest values on all three metrics across zero-shot, 4-shot, and 6-shot adaptation; in the zero-shot setting, its PA, CCE, and MCE were 33.3%, 50.9%, and 33.8% lower than those of the second-best paradigm. 
These results demonstrate, within the experimental scope of this study, a methodological foundation for state-informed sludge-discharge scheduling in HSTs.

Keywords: sedimentation tank soft sensing; hydrodynamic model; knowledge fusion; transfer learning; deep learning

1. Introduction

The transition toward smart water treatment plants (WTPs) aligns with low-carbon environmental policies and addresses the urgent need to enhance plant management while reducing energy consumption. Among all treatment units, the sedimentation tank plays a central role in solid–liquid separation in drinking water treatment plants, and its operational efficiency directly influences both overall plant energy use and treated water quality. Extensive studies have been conducted to improve sedimentation efficiency, leading to substantial gains in settling performance [1,2,3]. However, these advances have also introduced a new challenge: how to remove the settled sludge effectively under conditions of high sedimentation rates and strong process dynamics.

At present, the mainstream industrial practice is to employ sludge removal bridge cranes that travel across the entire tank at a constant speed while performing rapid suction [4]. Although this conservative strategy ensures the safety and reliability of the water treatment process, its operating cycle is largely determined by operator experience and lacks support from effective data analysis and quantitative modeling. As a result, considerable resources are wasted. In China alone, the water consumed during sludge discharge can account for up to 4.5% of a plant’s total treated water volume [5], placing substantial pressure on downstream treatment processes. Moreover, the discharged sludge-laden water often contains heavy metal concentrations far exceeding allowable standards [6], resulting in serious losses of both water resources and energy.

To improve the overall efficiency of sludge discharge, research in both academia and industry has mainly focused on adjusting chemical dosing based on effluent turbidity. This strategy aims to stabilize the removal of settled solids within a single discharge cycle while reducing reagent consumption, water waste, and pollution risk. It is feasible in engineering practice because variables such as turbidity and dosing rate can be continuously acquired through the Supervisory Control and Data Acquisition (SCADA) system and can directly support closed-loop dosing control. Consequently, related studies have gradually developed into a relatively systematic technical framework [7,8].

With the rapid development of deep learning and artificial intelligence, an increasing number of neural network architectures have been applied to prediction tasks in real-world water treatment processes [9,10]. For example, Tochio et al. developed an artificial neural network (ANN) model and calibrated it using full-scale WTP data, effectively reducing human–machine interaction in chemical dosing control [11]. Kim et al. considered the effects of extreme weather conditions and human operations, and integrated a convolutional neural network for feature extraction with a gated recurrent unit for time-series analysis. Their model achieved effective prediction of coagulant dosage under dynamically changing conditions and demonstrated that deep learning models can respond to sedimentation states in dynamic environments [12]. Zhao et al. treated effluent-turbidity optimization as a time-series forecasting task and proposed an integrated CNN–BiLSTM model, employing smoothing strategies and temporal redundancy analysis to address the complexity of raw-water data, and validated the model on real datasets from Xi’an and Shanghai [13]. Liu et al. focused on floc-volume dynamics and designed a multilayer convolutional neural network to extract turbidity-variation features from real-time sedimentation-tank monitoring; by introducing turbidity-corrected data, they mitigated the lag caused by hydraulic retention time and improved generalization [14]. To address the long-standing shortage of training data, Zhou et al. proposed a Transformer–TCN integrated architecture incorporating the time-window effect, achieving accurate turbidity prediction under limited short-term data through floc-morphology representation and self-learned lag representation [15]. In all of these studies, however, the modeling target is the water leaving the tank, not the sedimentation state inside it.

This shared design choice exposes a fundamental limitation when the operational question concerns sludge discharge rather than coagulant dosing. Effluent turbidity integrates multiple in-tank processes at the outlet and is inherently insensitive to the spatial location of sludge accumulation. The same outlet turbidity may arise from severe front-section accumulation, residual rear-section buildup, or a combination of both, yet these scenarios require different sludge-discharge ranges and cycles. Frameworks built around end-point variables therefore cannot tell the sludge-discharge gantry where to travel, how far to travel, or when to start; they can only confirm, after the fact, whether the resulting effluent quality remains acceptable. As a consequence, gantry operation in current practice continues to rely on fixed full-length cycles and operator experience, even at plants that have already deployed advanced turbidity-prediction models. What is missing is not a better end-point predictor, but a state-perception layer that observes the bottom of the tank itself.

To reveal the hydraulically driven mechanism of sludge layer thickness inside sedimentation tanks and to approximate the real process as closely as possible, researchers have widely adopted computational fluid dynamics (CFD) [16,17] to numerically simulate key hydraulic behaviors, such as the three-dimensional flow field structure and the distribution of suspended particles in sedimentation tanks [18]. On this basis, the coupling between particle transport and settling has been further analyzed to explain the causes of local sludge accumulation and to guide the optimization of inlet and outlet structures [19]. For example, Ma et al. performed sedimentation tank simulations in ANSYS Fluent (version 2022 R2) using the Random k–ε turbulence model together with a discrete phase model, and effectively revealed the relationship between changes in particle diameter and settling distance [20]. Bruno et al. used large eddy simulation to capture the major vortex structures of three-dimensional turbulence inside the settler. In the numerical solution, they discretized the governing equations with the finite volume method on a structured grid, and then introduced a scalar tracer to simulate the diffusion and transport of substances in water. This enabled them to analyze the motion and state of settling particles from a hydrodynamic perspective [21].

These studies consistently demonstrate that CFD models can effectively characterize the settling patterns of bottom sludge. However, substantial barriers still limit their engineering application. First, high-resolution three-dimensional simulations are computationally intensive and highly sensitive to boundary conditions, inlet turbulence, and particle parameters. As a result, parameter identification and uncertainty propagation are difficult to incorporate into an operational closed loop. Second, as reported by Zhao et al. and Soleimani et al. [22], sedimentation is a complex nonlinear process jointly governed by flow dynamics and operating conditions. Existing CFD-based methods cannot directly establish an intuitive mapping between tank operating conditions and sludge states. Therefore, they are used mainly for offline diagnosis and design optimization rather than real-time state estimation during operation.

At present, neither deep learning models nor numerical simulation methods can fully overcome the bottlenecks of online engineering deployment. These limitations arise mainly from two aspects: engineering implementation and scientific representation.

From the engineering perspective, the primary challenge lies in the lack of observation and annotation. Bottom sludge thickness in sedimentation tanks has long lacked an online ground-truth measurement that is continuously available, maintainable, and strictly aligned with SCADA data. As a result, the workflow for sample construction, model training, and performance validation based on in-tank state supervision remains incomplete, which limits the feasibility, verifiability, and reproducibility of supervised learning for process-state modeling.

From the scientific perspective, the primary challenge lies in the unclear mechanism governing sludge spatial distribution. The sedimentation process is affected by multiple coupled factors and nonlinear disturbances, including influent fluctuations, dosing strategies, sludge discharge events, and changes in hydraulic structure [23,24]. Because bottom sludge distribution is strongly coupled with flow structure, particle transport, and floc changes, purely data-driven models often fail to maintain physical consistency under varying and disturbed conditions, thereby weakening their generalization ability and engineering reliability.

To address these challenges, this study focuses on the process state inside the sedimentation tank as the modeling target. By integrating multi-source operational data from the SCADA system with interface measurements obtained from an ultrasonic sludge–water interface meter [25], and based on two field experiments, each spanning three months, we define a soft-sensing task for the spatial sedimentation state under operating conditions. This task enables online estimation and prediction of the true in-tank sedimentation state from observable SCADA variables. Furthermore, because sedimentation tank states are influenced by hydraulic action and particle transport, purely data-driven models are prone to physical inconsistency under operating-condition switching, sparse observations, and out-of-distribution scenarios. To address this issue, CFD simulation results are introduced as hydrodynamic priors for settling particles. CFD characterizes the internal flow-field structure of the sedimentation tank and therefore provides soft constraints on spatial distribution and physical plausibility for data-driven models. In addition, this study explores an engineering-oriented mechanism for injecting hydraulic prior knowledge and systematically compares three knowledge-fusion methods for incorporating hydrodynamic priors of settling particles, thereby providing a new approach for exploiting implicit physical information in data-driven sedimentation-state estimation. The main contributions are as follows.

First, we reframe the modeling target from end-point water quality to the in-tank sedimentation state. Existing data-driven studies in WTPs [11,12,13,14,15] model effluent turbidity or coagulant dosage and cannot resolve where bottom sludge has accumulated, which is the variable that actually governs sludge-discharge scheduling.

Second, we construct a closed-loop data foundation for HST soft sensing, combining synchronized SCADA records, ultrasonic sludge-interface profiles, and CFD-simulated hydrodynamic priors collected at a full-scale WTP.

Third, we propose and systematically compare three paradigms for injecting CFD-derived priors into data-driven predictors—parameter transfer, representation fusion, and knowledge distillation—operating, respectively, in the parameter, latent, and supervision spaces. Unlike single-mechanism hybrid CFD–DL approaches, this decomposition isolates where in the learning pipeline the physical prior is most effective, and clarifies the trade-off between in-distribution accuracy and out-of-distribution robustness under operating-cycle shift.

2. Methods

As shown in Figure 1, this study proposes a soft-sensing framework for spatial sludge states in sedimentation tanks by integrating CFD-based physical priors. The framework consists of three stages: data collection, fusion-paradigm construction, and paradigm evaluation. First, CFD simulation data, SCADA operational data, and ultrasonic sludge–water interface measurements are integrated to establish the data foundation for sedimentation-state modeling. Second, to incorporate hydrodynamic priors of settling particles into the prediction model, three fusion paradigms are developed, operating in the parameter space, latent representation space, and supervision-constraint space, respectively. Finally, all predictors are evaluated under both in-distribution (ID) and out-of-distribution (OOD) settings to characterize prediction accuracy, profile-shape consistency, and behavior under cross-condition shift. The three stages are described in detail below.

2.1. Data Collection

2.1.1. Data Preprocessing

This study focuses on the horizontal-flow sedimentation system and sludge discharge unit of a water treatment plant in central China. The plant uses water from the Middle Route of the South-to-North Water Diversion Project and has a designed capacity of 100,000 m³/d. Its treatment process includes conventional folded-plate flocculation, horizontal-flow sedimentation, V-type filtration, activated carbon filtration, and chlorination. The plant has two horizontal-flow sedimentation tanks. The raw water turbidity ranges from 2 to 15 NTU. Based on historical operation data, the average influent turbidity is about 5 NTU, and a typical suspended solids concentration of 15 mg/L was used as the CFD initial condition [26]. For sludge discharge, the sedimentation tank uses a siphon sludge removal bridge (Zhengzhou Water Group, Zhengzhou, China) equipped with 12 suction pipes, with 3 pipes in each channel. Each pipe has a diameter of 80 mm and a straight trumpet-shaped suction head. Under normal conditions, sludge is discharged once every 24 h. The bridge moves at a designed speed of 0.92 m/min, and sludge is removed by the combined action of scrapers and suction heads. Each discharge cycle lasts about 1.5 h. The data used in this study cover two periods: from 4 August 2024 to 31 December 2024, and from 1 August 2025 to 4 November 2025. SCADA data were collected from the plant, together with SLDL2560-YGFR (Tianjin, China) ultrasonic measurements of sludge thickness in the horizontal-flow sedimentation tank with a precision of 0.001 m. To verify the reliability of the ultrasonic readings under the actual operating environment of the study tank, three rounds of manual cross-validation were conducted across the two field campaigns. 
In each round, ten representative measurement points were selected along the longitudinal axis of the sedimentation tank and the sludge thickness at each point was independently measured by lowering a graduated reference probe and recording the manually read depth. The relative deviation between the ultrasonic and the manually measured values was approximately 2% at each verified point across the three rounds, indicating that the ultrasonic readings are consistent with direct manual measurement within the operating range of the study tank. To avoid interference from sludge discharge, the ultrasonic interface meter was mounted on a bracket 2 m away from the siphon inlet. The sludge removal bridge operated in a forward-and-return mode, and only forward-pass measurements were recorded to avoid disturbances caused by sludge discharge. The collected results were cross-validated manually and against CFD simulation results. The SCADA data mainly include the variables listed in Table 1.

It should be noted that the acquisition of reference sludge-interface data was subject to two practical constraints. First, unlike conventional SCADA or image data, the ultrasonic interface meter had to move synchronously with the sludge discharge gantry to obtain the full spatial profile across 100 sampling points. As a result, data collection was tightly coupled to sludge discharge operation and could not be conducted independently. Second, OOD data collection required extending the sludge discharge interval from the conventional 24 h to 48–140 h, allowing bottom sludge to accumulate for a longer period and thereby increasing operational risk. Each OOD experiment therefore required special approval from management. Consequently, collecting only 13 OOD samples took approximately 3 months.

2.1.2. Numerical Model Selection

The CFD model used in this study is intended to provide a structurally reliable hydrodynamic prior, not a high-fidelity digital twin of every operating excursion observed at the plant. To keep the prior tractable and physically interpretable, all features other than sedimentation duration, influent turbidity, and flow velocity were fixed at their annual mean values during simulation, and the geometry was idealized to the design specifications of the study tank. This deliberate simplification reduces noise in the prior so that downstream models can extract a clean transport pattern, but it also means that the simulated profiles are smoother and more regular than the in-tank dynamics encountered under day-to-day operation. Therefore, the CFD output is not treated as ground truth. Instead, it serves as a hydrodynamic reference manifold that guides the soft-sensing models through the three knowledge-injection paradigms described in Section 2.2. Under this design intent, the multiphase framework, the turbulence closure, and the threshold used to extract the sludge–water interface require justification.

The volume of fluid (VOF) framework was adopted because it treats the suspended-particle volume fraction as a single transported scalar, which is sufficient for locating the longitudinal sludge-layer thickness. Eulerian–Eulerian and Mixture alternatives would resolve interpenetrating dispersed flow in greater detail, but require closure relations—interfacial momentum, turbulent dispersion, and granular stress—whose calibration data are not available for the influent regime of the study tank; using them would add uncertainty to the prior rather than reduce it.

Within this framework, the standard k–ε closure was used under the RANS formulation. More elaborate closures (RNG or Realizable k–ε, second-moment closures) would refine the prediction of near-baffle recirculation, but the longitudinal sludge profile is governed primarily by bulk transport rather than by near-wall recirculation, so the additional cost is not justified for the prior-construction purpose of this study. The applicability of the chosen closure depends on whether the simulated flow regime is physically consistent with the actual operating regime of the tank. This consistency is verified through the a-priori hydraulic analysis below. The geometric parameters used in the simulation are listed in Table 2.

To compare the actual hydraulic condition of the sedimentation tank with the numerical simulation, the hydraulic state was first estimated theoretically. The Reynolds number describes the ratio of inertial force to viscous force, and the Froude number describes the ratio of inertial force to gravitational force. They are defined in Equations (1) and (2):

R_{e} = \frac{v R}{γ}

(1)

F_{r} = \frac{v^{2}}{g R}

(2)

where v is the characteristic horizontal flow velocity in the tank (m/s), R is the hydraulic radius (m), γ is the kinematic viscosity of water, and g is the gravitational acceleration. Because a guide wall is installed in the front and middle sections of each single tank, the flow cross-sectional area is ω = 3.5 × 3.6 = 12.6 m², the wetted perimeter is x = 3.5 + 2 × 3.6 = 10.7 m, giving a hydraulic radius R = 1.17 m. Substituting into Equations (1) and (2) gives R_{e} = 29,016 and F_{r} = 5 × 10⁻⁵.

Because R_{e} ≫ 1 and F_{r} ≪ 1, the flow in the tank is turbulent, gravity-dominated, and characterized by a low Froude number. This is consistent with the operating condition of the sedimentation tank and indicates that the simulated flow regime falls within the validated range of the standard k–ε model under the VOF framework, providing a reliable hydraulic basis for the subsequent solution of the particle-concentration field.
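The a-priori hydraulic estimate can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only. The characteristic velocity (0.0248 m/s) and the kinematic viscosity (1.0 × 10⁻⁶ m²/s) are assumptions chosen here because they reproduce the reported Reynolds number with R = 1.17 m; neither value is stated in the text, and all function names are hypothetical. The 3.5 m and 3.6 m dimensions are interpreted as channel width and water depth, following the wetted-perimeter expression.

```python
def hydraulic_radius(width_m, depth_m):
    """Hydraulic radius of an open rectangular channel: area / wetted perimeter."""
    area = width_m * depth_m
    wetted_perimeter = width_m + 2.0 * depth_m  # bottom plus two side walls
    return area / wetted_perimeter


def reynolds_number(v, radius, kinematic_viscosity):
    """Equation (1): ratio of inertial to viscous forces."""
    return v * radius / kinematic_viscosity


def froude_number(v, radius, g=9.81):
    """Equation (2) as written in the text: v^2 / (g R)."""
    return v ** 2 / (g * radius)


# Geometry from the text (width and depth roles are an interpretation).
R_geometric = hydraulic_radius(3.5, 3.6)

# Values below are ASSUMPTIONS of this sketch, not reported by the authors.
R_REPORTED = 1.17       # m, hydraulic radius as quoted in the text
V_ASSUMED = 0.0248      # m/s, back-calculated characteristic velocity
NU_ASSUMED = 1.0e-6     # m^2/s, typical kinematic viscosity of water

Re = reynolds_number(V_ASSUMED, R_REPORTED, NU_ASSUMED)
Fr = froude_number(V_ASSUMED, R_REPORTED)

# Regime check used in the text: Re >> 1 and Fr << 1.
is_turbulent_low_froude = Re > 4000 and Fr < 1e-2
```

Under these assumed inputs, the Reynolds and Froude numbers fall in the same regime as the values quoted above; with a different velocity or viscosity the numbers change, but the qualitative conclusion holds as long as Re remains far above 1 and Fr far below 1.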

2.1.3. Particle Transport State Modeling Based on Computational Fluid Dynamics

In this study, a fluid dynamics model was used to provide prior knowledge of particle transport under hydraulic effects. After three-dimensional meshing of the sedimentation tank, ANSYS Fluent [27,28] was used with the VOF model for multiphase flow and the standard k–ε model for turbulence. The flow regime was characterized by the Reynolds and Froude numbers defined in Equations (1) and (2).

In the transient CFD simulation, to track the transport and deposition of suspended particles in the sedimentation tank, the particle volume fraction is denoted by α_{ss}(x, t). Under the VOF framework, it satisfies the volume-fraction conservation transport equation, as given in Equation (3):

\frac{\partial α_{ss}(x, t)}{\partial t} + \nabla \cdot (α_{ss} u) = 0

(3)

where u is the velocity vector solved from the RANS equations.

After discretization by the finite volume method over a grid cell V_{i}, the numerical result can be written as the time-marching update of the cell volume fraction, as shown in Equation (4):

α_{i}^{n+1} = α_{i}^{n} - \frac{\Delta t}{V_{i}} \sum_{f \in \partial V_{i}} [α_{f}^{n} (u_{f}^{n} \cdot n_{f}) A_{f}]

(4)

where α_{i}^{n} is the particle-phase volume fraction in cell i at time step n, Δt is the time-step size, f denotes a face of the cell, A_{f} is the face area, n_{f} is the outward unit normal vector, and u_{f}^{n} and α_{f}^{n} are the velocity and volume fraction on that face, respectively.
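The explicit update in Equation (4) can be illustrated on a one-dimensional uniform grid. The sketch below is a simplified stand-in and not the solver configuration used in ANSYS Fluent: it assumes first-order upwind face values, closed (zero-flux) end walls, and a prescribed face velocity, none of which are specified in the text. The function name `fv_step` and the numerical inputs are hypothetical.

```python
def fv_step(alpha, u_face, dx, dt, face_area=1.0):
    """One explicit finite-volume step of Equation (4) on a 1D uniform grid.

    alpha  : cell volume fractions, length N
    u_face : face velocities, length N + 1 (positive means +x direction)
    Face values use first-order upwinding (an assumption of this sketch).
    The two boundary faces are treated as closed walls (zero flux).
    """
    n = len(alpha)
    if len(u_face) != n + 1:
        raise ValueError("u_face must have len(alpha) + 1 entries")
    cell_volume = dx * face_area

    # Flux through each face in the +x direction: alpha_f * u_f * A_f.
    flux = [0.0] * (n + 1)
    for f in range(1, n):  # interior faces only
        u = u_face[f]
        alpha_f = alpha[f - 1] if u >= 0.0 else alpha[f]  # upwind cell value
        flux[f] = alpha_f * u * face_area

    # Sum over faces with outward normals: right face minus left face.
    return [alpha[i] - dt / cell_volume * (flux[i + 1] - flux[i]) for i in range(n)]


# Illustrative inputs: particles start in the first cell and drift in +x.
alpha_old = [0.02, 0.0, 0.0, 0.0]
u_face = [0.0, 0.01, 0.01, 0.01, 0.0]
alpha_new = fv_step(alpha_old, u_face, dx=1.0, dt=10.0)
```

Because every interior flux leaves one cell and enters its neighbor, the total particle volume is unchanged by the step, which is the discrete counterpart of the conservation form of Equation (3).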

To further convert the numerical results into sludge thickness, a threshold α_{th} = 0.015 is introduced. The justification for this choice, including the empirical comparison against alternative threshold values under five representative operating conditions, is provided in Section 3.2. The binary function is defined as in Equation (5):

I(x, t) = H(α_{ss}(x, t) - α_{th}) = \begin{cases} 1, & α_{ss}(x, t) \geq α_{th} \\ 0, & α_{ss}(x, t) < α_{th} \end{cases}

(5)

where H(·) is the Heaviside function [29].

Since the tank bottom is completely horizontal, let the bottom elevation be the constant z_{b}. Then, at any horizontal location (x, y), the sludge thickness h_{b}(x, y, t) is defined as the difference between the highest deposited elevation satisfying the threshold condition and the tank bottom, as shown in Equation (6):

h_{b}(x, y, t) = \max_{z \in [z_{b}, z_{s}]} \{ z - z_{b} \mid α_{ss}(x, y, z, t) \geq α_{th} \}

(6)

where z_{s} is the free-surface elevation.

To determine the sludge–water interface more accurately on the discrete grid, for each fixed (x, y), two adjacent cell centers z_{k} and z_{k+1} are identified along the vertical grid line such that the volume fraction crosses the threshold α_{th}, namely (α_{k} - α_{th})(α_{k+1} - α_{th}) ≤ 0. Linear interpolation is then used to compute the interface elevation z_{τ}(x, y, t), as given in Equation (7):

z_{τ}(x, y, t) = z_{k} + \frac{α_{th} - α_{k}}{α_{k+1} - α_{k}} (z_{k+1} - z_{k})

(7)

where α_{k} = α_{ss}(x, y, z_{k}, t) and α_{k+1} = α_{ss}(x, y, z_{k+1}, t).

Accordingly, the sludge thickness can be equivalently expressed by Equation (8). Under fixed hydraulic conditions, it depends only on the simulated sedimentation time, so the spatial distribution of settled particles in the tank can be obtained for any given time:

h_{b} (x, y, t) = z_{τ} (x, y, t) - z_{b}

(8)
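The thresholding and interpolation steps that convert a simulated volume-fraction column into a sludge thickness can be sketched as follows. This is a minimal illustration for a single vertical grid line, assuming cell-center elevations sorted from bottom to top; the function name, the sample column, and the handling of columns without a threshold crossing are assumptions of this sketch rather than details given in the text.

```python
def sludge_thickness(z, alpha, alpha_th=0.015, z_b=0.0):
    """Sludge thickness on one vertical grid line.

    z     : cell-center elevations, sorted from bottom to top
    alpha : particle volume fraction at each elevation
    Returns the interpolated interface elevation minus the bottom elevation.
    """
    if len(z) != len(alpha) or len(z) < 2:
        raise ValueError("z and alpha must have equal length of at least 2")

    # Edge case (assumed here): the whole column up to the top cell is sludge.
    if alpha[-1] >= alpha_th:
        return z[-1] - z_b

    # Scan downward so that the highest threshold crossing is found first.
    for k in range(len(z) - 2, -1, -1):
        a_k, a_k1 = alpha[k], alpha[k + 1]
        crosses = (a_k - alpha_th) * (a_k1 - alpha_th) <= 0.0
        if crosses and a_k != a_k1:
            # Linear interpolation between the two cell centers.
            z_tau = z[k] + (alpha_th - a_k) / (a_k1 - a_k) * (z[k + 1] - z[k])
            return z_tau - z_b

    # Edge case (assumed here): no cell reaches the threshold, so no sludge layer.
    return 0.0


# Illustrative column: dense near the bottom, clear water near the top.
z_centers = [0.05, 0.15, 0.25, 0.35]
alpha_col = [0.040, 0.025, 0.005, 0.000]
h_b = sludge_thickness(z_centers, alpha_col)
```

In this example the threshold is crossed between the cells at 0.15 m and 0.25 m, and the interpolated interface lies midway between them because 0.015 is the midpoint of 0.025 and 0.005. Repeating the call for each of the 100 longitudinal sampling points would yield one thickness profile per simulated time.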

2.2. Paradigm Design

2.2.1. Problem Definition

The overall workflow of this task is illustrated in Figure 2. Let the CFD simulation dataset be denoted as D_{cfd} = {(t_{i}, h_{i}^{c})}_{i=1}^{N_{c}}, where t_{i} is the simulation duration, and h_{i}^{c} ∈ ℝ^{100×1} is the corresponding spatial profile of bottom sludge thickness. The real observation dataset is denoted as D_{real} = {(s_{j}, h_{j}^{R})}_{j=1}^{N_{R}}, where s_{j} ∈ ℝ^{d} is the SCADA operating condition feature vector, and h_{j}^{R} ∈ ℝ^{100×1} is the bottom sludge thickness profile measured by the ultrasonic interface meter. The two datasets differ mainly in their inputs and represented processes. D_{cfd} uses simulation duration t as input and characterizes the temporal evolution of bottom sludge under constant hydraulic conditions. In contrast, D_{real} uses multi-source operating features s as input and reflects the spatial distribution of bottom sludge under real operating conditions.

On this basis, the soft-sensing task of spatial sedimentation state under operating conditions can be defined as Equation (9):

F(P_{cfd}, D_{real}) \to \hat{h} = [\hat{h}_{1}, \hat{h}_{2}, \dots, \hat{h}_{100}]^{T} \in \mathbb{R}^{100 \times 1}

(9)

where P_{cfd} denotes the hydrodynamic physical prior learned from D_{cfd}, and ĥ_{k} is the predicted bottom sludge thickness at the k-th sampling point, with k = 1, 2, …, 100.

The essence of this task is to establish an effective mapping from the observable operating features s to the bottom sludge spatial profile h, under the constraint of the physical prior P_{cfd}. To improve the physical consistency and generalization ability of the mapping function under non-stationary conditions and out-of-distribution scenarios, the prior knowledge of hydraulic particle transport contained in P_{cfd} was injected into predictive models in different ways. Accordingly, three physical knowledge fusion paradigms were proposed:

Paradigm I: Physics-Prior Pretraining & Transfer

Paradigm II: Representation Fusion

Paradigm III: Knowledge Distillation

These three paradigms operate in the parameter space, latent representation space, and supervised constraint space, respectively. Their detailed designs are presented in Section 2.2.2, Section 2.2.3 and Section 2.2.4.

While each of these injection mechanisms shares conceptual ancestry with existing physics-informed learning approaches—parameter transfer with hybrid CFD–DL, representation fusion with multi-modal feature engineering, and knowledge distillation with physics-residual constraints—these prior approaches typically inject physical knowledge at a single fixed level in the learning pipeline. To our knowledge, the present work is the first to systematically decompose physical-prior injection across the parameter, latent, and supervision levels and to compare them under a unified evaluation protocol.

2.2.2. Paradigm I: Physics-Prior Pretraining & Transfer

Inspired by cross-disciplinary transfer learning [30], this paradigm adopts a two-stage sequential optimization strategy. First, the model is pretrained on the CFD dataset, so that its parameters implicitly encode the distribution patterns of sediment particles under hydrodynamic effects in the weight space. The pretrained weights are then used to initialize fine-tuning on real operational data. Since the fine-tuning process starts from a parameter neighborhood informed by the underlying hydraulic transport equations [31], it can be viewed as a biased update within a feasible solution space constrained by physical priors. This helps mitigate the loss of generalization caused by limited real data and operating-condition disturbances. The overall process is defined in Equations (10) and (11):

θ_{cfd}^{*} = \arg\min_{θ} \mathbb{E}_{(t, h) \sim D_{cfd}} [l(f_{θ}(t), h)]

(10)

θ_{real}^{*} = \arg\min_{θ} \mathbb{E}_{(s, h) \sim D_{real}} [l(f_{θ}(s), h)] \quad \text{s.t.} \quad θ_{0} = θ_{cfd}^{*}

(11)

where θ_{0} denotes the initialization parameters for the fine-tuning stage.

It should be noted that, in this paradigm, the physical prior acts as an implicit initialization bias. Its effect depends on the domain consistency between the pretraining and real-world tasks, as well as on hyperparameters such as the fine-tuning learning rate.
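The two-stage optimization of Paradigm I can be sketched with a deliberately small model. The example below is a toy illustration of the mechanics only, not the predictors or data used in this study: it assumes a scalar linear model f_θ(x) = w·x + b, full-batch gradient descent, synthetic stand-in data, and a scalar input in both stages so that the same parameters can be reused. All names, learning rates, and numbers are hypothetical.

```python
def mse(theta, data):
    """Mean squared error of f_theta(x) = w * x + b over (x, y) pairs."""
    w, b = theta
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(data, theta0, lr, epochs):
    """Full-batch gradient descent on the MSE, starting from theta0."""
    w, b = theta0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Stage 1, Equation (10): pretrain on plentiful synthetic "CFD" pairs.
cfd_data = [(i / 10.0, 0.30 * (i / 10.0) + 0.05) for i in range(11)]
theta_cfd = fit(cfd_data, theta0=(0.0, 0.0), lr=0.5, epochs=500)

# Stage 2, Equation (11): fine-tune on a few synthetic "real" pairs,
# with the initialization theta_0 set to the pretrained parameters.
real_data = [(0.2, 0.12), (0.5, 0.22), (0.9, 0.34)]
theta_real = fit(real_data, theta0=theta_cfd, lr=0.01, epochs=20)

# Baseline without prior: same fine-tuning budget from a zero initialization.
theta_scratch = fit(real_data, theta0=(0.0, 0.0), lr=0.01, epochs=20)

loss_with_prior = mse(theta_real, real_data)
loss_without_prior = mse(theta_scratch, real_data)
```

In this toy setting the pretrained initialization already lies close to the real-data optimum, so a short fine-tuning budget suffices. Whether the same holds for the sludge-profile predictors depends on the domain consistency and fine-tuning learning rate noted above.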

2.2.3. Paradigm II: Representation Fusion

Unlike Paradigm I, this paradigm does not use pretrained parameters directly. Instead, it builds two encoding branches. The physical information related to sediment particle transport is treated as an additional feature input. The prior physical knowledge is embedded in the latent space, rather than imposing a strong direct constraint on the output. Therefore, this paradigm has higher engineering flexibility.

Paradigm II first encodes the CFD data, as defined in Equation (12):

$$P_{\mathrm{cfd}} = E_{\mathrm{phy}}(t) \tag{12}$$

where $E_{\mathrm{phy}}$ denotes the physical-information encoder used with the CFD data.

Following Equation (8), the remaining CFD prior, after excluding grid-position information, is directly related to sedimentation time, so $E_{\mathrm{phy}}$ is implemented as a sinusoidal time-step encoder that captures the temporal transport pattern implied by the prior.
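A sinusoidal time-step encoder of this kind can be sketched as follows. The frequency ladder is an assumption: it follows the common Transformer-style positional encoding with a base period of 10000, which the text does not specify. The function name `sinusoidal_time_encoding` and the example time values are hypothetical.

```python
import numpy as np

def sinusoidal_time_encoding(t, dim=128, max_period=10000.0):
    """Map sedimentation times t (shape [B]) to sinusoidal features (shape [B, dim])."""
    if dim % 2 != 0:
        raise ValueError("dim must be even so that sine and cosine halves match")
    t = np.asarray(t, dtype=np.float64).reshape(-1, 1)
    half = dim // 2
    # Geometric ladder of frequencies from 1 down to roughly 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = t * freqs                                   # [B, half]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Hypothetical sedimentation times, e.g. minutes since the last discharge.
p_cfd = sinusoidal_time_encoding([0.0, 20.0, 40.0])
print(p_cfd.shape)  # (3, 128)
```

Because each frequency contributes one sine and one cosine term, every encoded time step has the same norm, so the encoder conveys elapsed time through phase rather than magnitude.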

The operating-condition features are subsequently extracted, as defined in Equation (13):

$$Z_{\mathrm{opc}} = E_{\mathrm{opc}}(s, t) \tag{13}$$

where $E_{\mathrm{opc}}$ denotes the operating-condition encoder, which is used together with the SCADA data, and $Z_{\mathrm{opc}}$ is the operating-condition representation extracted by this encoder.

After the two representations are obtained, the fusion and prediction process is defined by Equation (14):

$$\hat{y} = P\left(\Phi\left(P_{\mathrm{cfd}}, Z_{\mathrm{opc}}\right)\right) \tag{14}$$

where $\Phi(\cdot)$ denotes the fusion operator that combines the two types of representations in the latent space, and $P(\cdot)$ denotes the prediction head that outputs the sedimentation state prediction.

The optimization objective is given by Equation (15):

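$$\min_{\theta_{\mathrm{phy}}, \theta_{\mathrm{opc}}, \theta_{P}} \mathbb{E}_{(s, h) \sim D_{\mathrm{real}}} \left[ \mathcal{L}\left(P\left(\Phi\left(P_{\mathrm{cfd}}, Z_{\mathrm{opc}}\right)\right), h\right) \right] \tag{15}$$

with $\theta_{\mathrm{phy}}$, $\theta_{\mathrm{opc}}$, and $\theta_{P}$ denoting the parameters of the two encoders and the prediction head, respectively.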

The fusion operator $\Phi(\cdot)$ is the central component of Paradigm II, since it determines whether the CFD-derived and SCADA-derived latent spaces can be aligned in a way that supports prediction. In this study, $\Phi$ is implemented as a gated cross-attention module: the operating-condition representation $Z_{\mathrm{opc}}$ serves as the query, while the CFD-derived representation $P_{\mathrm{cfd}}$ serves as the key and value. The attention output is then combined with $Z_{\mathrm{opc}}$ through a learnable gate, so that the contribution of the physical prior is modulated by the current operating condition rather than imposed uniformly. The outputs of both encoders are projected to a shared embedding dimension $d = 128$ before fusion, ensuring that $P_{\mathrm{cfd}}$ and $Z_{\mathrm{opc}}$ are dimensionally compatible. The two encoders and the fusion module are trained jointly under the objective in Equation (15), so that the alignment between $P_{\mathrm{cfd}}$ and $Z_{\mathrm{opc}}$ emerges from the supervision signal of the real-domain reconstruction task rather than being enforced by an explicit alignment loss. This data-driven alignment is consistent with the engineering setting of this study, in which the CFD prior is informative but not exact; an explicit alignment loss would over-constrain the latent space toward a CFD-defined manifold that does not perfectly match the real plant dynamics.
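The gated cross-attention fusion described above can be sketched in NumPy as a single-head module. This is a sketch under stated assumptions rather than the module used in this study: the single head, the convex form of the gate, the random weights in `params`, and the token counts are all hypothetical, and only the roles of query, key, and value and the shared dimension of 128 come from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(z_opc, p_cfd, params):
    """Fuse z_opc [n_q, d] (queries) with p_cfd [n_k, d] (keys and values)."""
    d = z_opc.shape[-1]
    q, k, v = z_opc @ params["Wq"], p_cfd @ params["Wk"], p_cfd @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(d)) @ v                  # [n_q, d]
    gate_in = np.concatenate([z_opc, attn], axis=-1)          # [n_q, 2d]
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ params["Wg"] + params["bg"])))
    # Assumed gate form: convex mix of prior-informed and original features.
    return gate * attn + (1.0 - gate) * z_opc

rng = np.random.default_rng(0)
d = 128
params = {
    "Wq": rng.normal(size=(d, d)) / np.sqrt(d),
    "Wk": rng.normal(size=(d, d)) / np.sqrt(d),
    "Wv": rng.normal(size=(d, d)) / np.sqrt(d),
    "Wg": rng.normal(size=(2 * d, d)) / np.sqrt(2 * d),
    "bg": np.zeros(d),
}
z_opc = rng.normal(size=(4, d))    # 4 hypothetical operating-condition tokens
p_cfd = rng.normal(size=(10, d))   # 10 hypothetical CFD time-step tokens
fused = gated_cross_attention(z_opc, p_cfd, params)
print(fused.shape)  # (4, 128)
```

With this gate form, a strongly negative gate bias closes the gate and the module returns the operating-condition features essentially unchanged, which is one way a model can down-weight a prior that does not match the current operating condition.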

2.2.4. Paradigm III: Knowledge Distillation

This paradigm is inspired by physics-informed neural networks (PINNs). PINNs were originally proposed by Raissi, Perdikaris, and Karniadakis as a deep-learning framework for solving forward and inverse problems involving nonlinear partial differential equations [32,33,34], and have since been extended across a broad range of physics-informed machine-learning applications [34]. Their application to hydraulic-prediction problems analogous to the present study has been demonstrated, for instance, in open-channel water-transfer modelling under sparse monitoring data [32]. PINN-based methods usually improve interpretability by explicitly introducing physical constraints, such as equation residuals, into backpropagation. However, under small-sample, noisy, and multiphysics conditions, such methods often suffer from optimization difficulty and unstable convergence. In addition, their engineering application usually requires strict control of boundary conditions, parameter calibration, and numerical stability.

To address these issues, this study adopts a knowledge distillation strategy. CFD-derived hydraulic principles are first learned by a teacher model and then transferred to the real-data model through backpropagation. In this way, physical constraints are introduced into the training objective in an implicit form, which helps avoid the convergence problems of explicit constraint methods.

In this paradigm, a teacher model $f_{T}$ is first trained on the CFD dataset so that it can learn the manifold structure induced by physical laws. The training process is defined in Equation (16):

$$\theta_{T}^{*} = \arg\min_{\theta_{T}} \mathbb{E}_{(t, h) \sim D_{\mathrm{cfd}}} \left[ \ell\left(f_{T, \theta_{T}}(t), P_{\mathrm{cfd}}\right) \right] \tag{16}$$

After that, a student model with the same architecture is trained on the real-world data. Its optimization objective is defined in Equation (17):

$$\theta_{S}^{*} = \arg\min_{\theta_{S}} \mathbb{E}_{(s, h) \sim D_{\mathrm{real}}} \left[ \mathcal{L}_{\mathrm{sup}}\left(f_{S, \theta_{S}}(s), h\right) + \lambda \mathcal{L}_{\mathrm{align}}\left(g_{S}(s), g_{T}(s)\right) \right] \tag{17}$$

where $g_{S}(\cdot)$ and $g_{T}(\cdot)$ denote the outputs of the student (target) model and the teacher (prior) model, respectively. In $\mathcal{L}_{\mathrm{align}}$, the mean squared error is used as the distillation loss for prediction alignment, and $\lambda$ is a coefficient that controls the strength of the implicit constraint imposed by the teacher model.

On this basis, we introduce the physical manifold characterized by CFD as a structural prior. The optimization is therefore not an unconstrained search over the entire feasible space, but is biased toward solutions whose representations can be projected onto this physical manifold or remain sufficiently close to its neighborhood. In this way, physical consistency is incorporated into the learning objective as an implicit regularization term. Even without explicitly introducing equation-residual constraints, this strategy can effectively suppress nonphysical solutions and improve model generalization across different operating conditions.
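The way the alignment term biases the student toward the teacher can be made concrete with a small NumPy sketch of the objective in Equation (17). Several choices here are assumptions: the supervised term is taken to be the mean squared error (the text specifies MSE only for the alignment term), the value 0.5 for the weighting coefficient is hypothetical, and the profile values are invented for illustration.

```python
import numpy as np

def distillation_objective(student_pred, teacher_pred, target, lam=0.5):
    """L_sup + lam * L_align of Equation (17), with MSE assumed for both terms."""
    l_sup = np.mean((student_pred - target) ** 2)
    l_align = np.mean((student_pred - teacher_pred) ** 2)
    return float(l_sup + lam * l_align)

def unconstrained_minimizer(teacher_pred, target, lam=0.5):
    """Pointwise minimizer of the objective if the outputs were free variables."""
    # Setting d/dy [(y - h)^2 + lam * (y - g_T)^2] = 0 gives this weighted mean.
    return (target + lam * teacher_pred) / (1.0 + lam)

target = np.array([1.00, 0.80, 0.60])    # hypothetical measured profile h
teacher = np.array([1.20, 0.90, 0.50])   # hypothetical teacher output g_T(s)
best = unconstrained_minimizer(teacher, target)

print(best)
print(distillation_objective(best, teacher, target))
```

Under these assumptions the optimum is a weighted mean of the measurement and the teacher output, so a larger weighting coefficient pulls the student closer to the CFD-derived prediction, while a coefficient of zero recovers purely supervised training.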

2.3. Paradigm Evaluation Metrics

To comprehensively evaluate sludge–water interface prediction, we assessed model performance from three complementary perspectives: pointwise accuracy, shape consistency, and mass conservation.

2.3.1. Pointwise Accuracy (PA)

Compared with predicting only effluent turbidity or a few sampled points, profile-level output more directly reflects the in-tank sedimentation state and provides greater value for estimating sludge discharge timing and volume. We therefore use the mean absolute prediction error over all observation points as an evaluation metric to measure how closely the prediction matches the true state. Although it is termed an accuracy, PA is an error measure, so a lower value indicates better performance. The metric is defined in Equation (18):

$$\mathrm{PA} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_{i} - y_{i} \right| \tag{18}$$

where $\hat{y}_{i}$ denotes the model prediction at the $i$-th observation point, $y_{i}$ denotes the value measured by the sludge–water interface meter at the $i$-th point, and $N$ is the total number of measurement points. In this study, $N = 100$.
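Equation (18) is the mean absolute error over the profile, which can be computed as in the sketch below. The linear 100-point profile and the uniform 0.02 offset are hypothetical values chosen only to make the result easy to check by hand.

```python
import numpy as np

def pointwise_accuracy(y_pred, y_true):
    """Equation (18): mean absolute error over the N observation points."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

y_true = np.linspace(1.0, 0.5, 100)   # hypothetical N = 100 interface profile
y_pred = y_true + 0.02                # prediction with a uniform 0.02 offset
print(round(pointwise_accuracy(y_pred, y_true), 6))  # 0.02
```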

2.3.2. Curvature Consistency Error (CCE)

Pointwise error alone may hide an important issue: a model can fit well on average while still producing local nonphysical oscillations. Such shape distortions may mislead operation decisions and reduce the usefulness of the model in closed-loop control. Since one key aim of this study is to use CFD data as an implicit physical constraint, an additional metric is needed to directly evaluate whether the predicted curve shape is physically reasonable. CCE measures the consistency between the predicted and true profiles in terms of shape variation. A lower value indicates that the model not only fits point values well but also better reproduces the structural shape of the interface curve. It is computed as follows in Equations (19)–(21):

$$\Delta^{2} y_{i} = y_{i+1} - 2 y_{i} + y_{i-1} \tag{19}$$

$$\Delta^{2} \hat{y}_{i} = \hat{y}_{i+1} - 2 \hat{y}_{i} + \hat{y}_{i-1} \tag{20}$$

$$\mathrm{CCE} = \frac{1}{N-2} \sum_{i=2}^{N-1} \left| \Delta^{2} \hat{y}_{i} - \Delta^{2} y_{i} \right| \tag{21}$$

where $\Delta^{2} y_{i}$ is the second-order difference of the true sludge profile and $\Delta^{2} \hat{y}_{i}$ is that of the predicted profile. The sum runs over the $N - 2$ interior points $i = 2, \dots, N - 1$, for which both neighbors $i - 1$ and $i + 1$ exist.
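The point that two predictions with the same pointwise error can differ sharply in shape can be checked numerically. The sketch below computes CCE with `numpy.diff`; the linear reference profile, the 0.02 error amplitude, and the names `offset_pred` and `zigzag_pred` are hypothetical.

```python
import numpy as np

def curvature_consistency_error(y_pred, y_true):
    """Equations (19)-(21): mean absolute gap between second-order differences."""
    d2_pred = np.diff(np.asarray(y_pred, dtype=float), n=2)   # N - 2 interior values
    d2_true = np.diff(np.asarray(y_true, dtype=float), n=2)
    return float(np.mean(np.abs(d2_pred - d2_true)))

y_true = np.linspace(1.0, 0.5, 100)
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
offset_pred = y_true + 0.02              # smooth, uniformly shifted
zigzag_pred = y_true + 0.02 * signs      # same absolute error, alternating sign

print(round(curvature_consistency_error(offset_pred, y_true), 6))  # 0.0
print(round(curvature_consistency_error(zigzag_pred, y_true), 6))  # 0.08
```

Both predictions have a mean absolute error of 0.02, yet only the oscillating one is penalized, since an alternating error of amplitude 0.02 produces a second-order difference of magnitude 0.08 at every interior point.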

2.3.3. Mass-Conservation Error (MCE)

The cost and risk of sludge discharge depend not only on local points but also on the overall interface level. If the model systematically overestimates the profile, it may lead to excessive discharge, causing water waste and increasing the load on sludge thickening systems. In contrast, systematic underestimation may result in sludge accumulation and deterioration of effluent quality. We therefore introduce the area error of the profile curve as a mass-conservation metric to capture the model’s systematic deviation at the global level. A lower value indicates better prediction accuracy. It is defined in Equation (22):

$$\mathrm{MCE} = \Delta x \left| \sum_{i=1}^{N} \left( \hat{y}_{i} - y_{i} \right) \right| \tag{22}$$

where $\Delta x$ is the spacing between sampling points. In this study, the points are evenly spaced and $\Delta x = 1$, so the area under the interface curve is approximated by discrete summation.

Unlike PA, this metric allows positive and negative errors to offset each other, and thus better reflects the overall bias of the prediction.
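The offsetting behavior can be seen in a short sketch of Equation (22). The reference profile and the two error patterns are hypothetical; unit spacing between sampling points follows the text.

```python
import numpy as np

def mass_conservation_error(y_pred, y_true, dx=1.0):
    """Equation (22): absolute value of the signed area between the profiles."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(dx * abs(diff.sum()))

y_true = np.linspace(1.0, 0.5, 100)
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
biased_pred = y_true + 0.02              # systematic overestimate
zigzag_pred = y_true + 0.02 * signs      # errors alternate in sign

print(round(mass_conservation_error(biased_pred, y_true), 6))  # 2.0
print(round(mass_conservation_error(zigzag_pred, y_true), 6))  # 0.0
```

The systematically biased prediction accumulates an area error of 2.0 over 100 points, whereas the alternating errors cancel, so MCE isolates global bias and complements the shape-oriented CCE.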

3. Experiments and Discussion

3.1. Experimental Benchmark Setup

The data collection process is described in Section 2.1. We collected 247 paired SCADA and ultrasonic sludge–water interface samples at a 24 h interval as in-distribution (ID) data, and split them into training, validation, and test sets at a ratio of 7:2:1. In addition, 13 OOD samples were collected by varying the sedimentation duration to evaluate model generalization. To avoid information leakage, all data were split chronologically, which is more consistent with online deployment. Missing values and outliers were processed, and continuous features were standardized using statistics from the training set only. To construct transferable physical priors, we further generated CFD data. Based on the average operating condition of the sedimentation tank in 2024 and prior experience [32], seven representative operating conditions were designed, covering different hydraulic loads, suspended solids concentrations, and sedimentation durations. CFD results were sampled every 20 min, yielding 40,222 samples, which were split into CFD training and test sets at a ratio of 8:2. Except for sedimentation duration, influent turbidity, and flow velocity, all other features were fixed at their annual mean values. The simulation parameters are shown in Table 3.
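The chronological split and training-set-only standardization described above can be sketched as follows. The rounding convention (truncating the training and validation counts and assigning the remainder to the test set), the number of SCADA features, and the random placeholder data are assumptions; only the sample count of 247, the 7:2:1 ratio, and the 100-point profiles come from the text.

```python
import numpy as np

def chronological_split(X, y, ratios=(0.7, 0.2, 0.1)):
    """Split time-ordered samples into train/val/test without shuffling.

    The test set receives whatever remains after the first two shares.
    """
    n = len(X)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    bounds = [(0, n_train), (n_train, n_train + n_val), (n_train + n_val, n)]
    return [(X[a:b], y[a:b]) for a, b in bounds]

def standardize(train, *others, eps=1e-8):
    """Scale every array with the mean and std of the training array only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0) + eps
    return [(arr - mu) / sigma for arr in (train, *others)]

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(247, 8))   # 8 hypothetical SCADA features
y = rng.normal(size=(247, 100))                     # 100-point interface profiles
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = chronological_split(X, y)
X_tr, X_va, X_te = standardize(X_tr, X_va, X_te)
print(len(X_tr), len(X_va), len(X_te))
```

Computing the scaling statistics on the training portion alone keeps information from later samples out of the earlier ones, which is the leakage the chronological split is meant to prevent.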

Conditions A and B represent lower flow velocity and lower influent turbidity, respectively, whereas Condition E represents an extreme suspended-solids concentration. The other conditions were designed to examine the effect of sedimentation time on the in-tank state.

The encoder and predictor were first pretrained on the CFD data, and the best weights on the CFD test set were used to initialize training on real-world data. All deep learning models were trained on an NVIDIA GeForce RTX 4090 GPU (Santa Clara, CA, USA) using Python 3.10 and PyTorch (version 2.1). The batch size was 16, the optimizer was AdamW, and the learning rate was 0.0001. Traditional machine learning baselines were implemented with scikit-learn and tuned by grid search on the same training and validation splits. For fairness, each experiment was repeated 10 times with different random seeds, and average results were reported. Unless otherwise noted, model selection and early stopping were both based on validation performance.

3.2. Comparative Experiments with Baseline Models

The aim of this section is to compare traditional machine-learning methods with deep temporal models without introducing physical priors or additional constraints. Accordingly, all models were trained on the real-world training set and evaluated on the test set. We selected several widely used and effective machine-learning models in urban water systems, as well as a range of advanced deep learning predictors, including the physics-informed neural network (PINN) with explicit physical constraints and the diffusion model incorporating flow-related characteristics. For the PINN, both mass-consistency loss and partial differential equation loss were applied. The experimental results are presented in Table 4.

The baseline results show that conventional machine-learning models, including XGBoost, LightGBM, and Random Forest, cannot balance all three metrics well. For example, XGBoost and LightGBM yield relatively high PA values of 0.196 and 0.225, respectively. Although Random Forest reduces PA to 0.116, its CCE rises to 0.154, indicating unstable local shape prediction. This is likely because traditional ML models rely mainly on static or limited lag features [42,43,44] and cannot fully capture the delayed accumulation and nonlinear coupling of the sedimentation process.

The trade-off between mass conservation (MCE) and shape consistency (CCE) is visualized in Figure 2. Models closer to the lower-left corner of the plot achieve lower values on both metrics; the models nearest this corner are predominantly the deep temporal models (LSTM, GRU, and Attention), with Attention achieving the lowest MCE (3.085) and LSTM and GRU achieving the lowest CCE (both 0.032). Conventional ML models cluster in the upper-left or upper-right regions, indicating that they sacrifice one metric for the other. PINN occupies an isolated position in the upper region of the plot (CCE 0.151), and Diffusion lies in the right-most region (MCE 5.428), indicating that neither generative modeling nor explicit physical-residual constraints achieved a balanced trade-off in this baseline setting. Overall, no model in Figure 2 reaches the lower-left corner, where CCE and MCE are jointly low; the visual gap between the deep-temporal cluster and that corner indicates the room for improvement that the three knowledge-fusion paradigms in the next section are designed to fill.

The CFD output itself, evaluated as a candidate prior, exhibits absolute errors more than an order of magnitude higher than the best data-driven baselines (PA ≈ 0.80–0.85 at $\alpha_{th} = 0.015$), confirming that it cannot serve as a stand-alone predictor and must be used as a structural prior rather than as ground truth. At the same time, its profile errors are remarkably stable across the five simulated operating conditions (PA varies by less than 6% and MCE by less than 5% at $\alpha_{th} = 0.015$), indicating that the CFD output captures a hydrodynamic structure largely invariant to operating-condition perturbations and is therefore appropriate as a transferable prior.

Among the three tested thresholds, $\alpha_{th} = 0.015$ yields the lowest profile-error metrics across all five conditions, while $\alpha_{th} = 0.005$ systematically overestimates and $\alpha_{th} = 0.05$ systematically underestimates the interface elevation.

3.3. Validation Experiments on the Effectiveness of Paradigm Design

This section evaluates whether the three hydraulic particle-transport prior fusion paradigms proposed in this study can systematically improve profile prediction performance, and examines their effects on the three metrics. The three paradigms are described in Section 2.2.2, Section 2.2.3 and Section 2.2.4. To reduce model-specific bias, we compared all predictors under the three paradigms, and the results are shown in Table 5.

When averaged across the six predictors, the three paradigms produced different improvement profiles relative to the unfused baselines. Paradigm I reduced PA by 25.1%, CCE by 8.7%, and MCE by 47.6% on average; it was the only paradigm that improved all three metrics simultaneously when averaged across predictors. Paradigm III reduced PA by 36.9% and MCE by 50.0% on average—the largest gains in those two metrics—but CCE increased by 7.4% on average relative to the unfused baselines, indicating that distillation-based supervision in our task did not preserve fine-grained curvature structure as well as parameter-level injection. Paradigm II reduced PA by 4.0% and MCE by 15.2% on average, while CCE increased by 5.7%; this confirms that latent-level fusion alone produced smaller and less consistent gains in our experiments than the other two routes. Counted by predictor-level wins, Paradigm I produced the lowest PA in three of the six predictors and the lowest MCE in four; Paradigm III produced the lowest PA in three predictors and the lowest MCE in two. Among the eighteen paradigm–predictor combinations, the Attention predictor under Paradigm I produced the lowest PA (0.026, tied with MLP under Paradigm III) and the lowest MCE (1.052); the lowest CCE in the same comparison was produced instead by GRU under Paradigm I (0.032), with Attention under Paradigm I producing the third-lowest CCE (0.034). MLP also showed substantial improvement under Paradigm I, indicating that the benefit of CFD-informed initialization in our setting was not restricted to sequence-based predictors. Among individual large gains, GRU under Paradigm III reached PA 0.033, CCE 0.033, and MCE 1.364, while PINN improved from PA 0.086 (unfused) to PA 0.028 under Paradigm III. 
The Diffusion model showed limited improvement under all three paradigms, suggesting that, in the small-sample sedimentation setting examined here, generative-style modeling did not benefit from CFD prior injection to the same extent as discriminative architectures.

The visualization in Figure 3 is consistent with the quantitative results above. Under the ID setting, Paradigm I produced profiles whose global shape—including the rapid decay near the inlet, the plateau region, and the rising tail near the outlet—visually tracked the observed profile more closely than Paradigms II and III in most samples. The Attention predictor under Paradigm I produced the visually closest match in our sample selection, without large systematic deviation in the rising tail; this is consistent with its low PA and MCE in Table 5. Paradigm II produced visually stable profiles in some samples but exhibited residual fluctuations and slight drift in the plateau region in other samples, consistent with the higher MCE of Paradigm II across most predictors in Table 5; this issue is examined further in Section 3.4. Paradigm III produced visually smoother profiles in the plateau region with fewer high-frequency excursions, consistent with the lower CCE values reported for GRU and PINN under Paradigm III in Table 5, although the cross-predictor average CCE under Paradigm III was higher than the unfused baselines as noted above. Across predictors, LSTM and GRU exhibited local smoothness but showed cumulative systematic drift in the plateau region under long-step prediction. MLP produced occasional local bulges or overall over-estimation, consistent with its lack of explicit temporal modeling. PINN exhibited spikes and high-frequency fluctuations across multiple seeds under our small-sample, noisy real-data setting, indicating that, in this regime, the explicit physical-residual constraints did not consistently translate into smooth profile reconstruction; this is consistent with prior reports of optimization difficulty for PINNs under small-data and high-noise conditions [26].

Computational differences among the three paradigms mainly arise during training, whereas their inference costs are comparable. Because training time, GPU memory usage, and inference latency are jointly affected by the fusion paradigm and the encoder complexity, the reported computational costs were averaged across encoder backbones rather than measured with a single predictor, which gives a fairer paradigm-level comparison. On a single NVIDIA RTX 4090 GPU, the average per-seed training time across encoder backbones was approximately 38 min for Paradigm I, 45 min for Paradigm II, and 62 min for Paradigm III. The corresponding peak training-stage GPU memory was 2.8 GB, 3.4 GB, and 5.1 GB, respectively. The larger memory footprint of Paradigm III is mainly due to teacher–student distillation. During inference, Paradigms I and III retain only the deployed predictor, while Paradigm II additionally retains the physical-prior encoder and fusion module. The average inference-stage GPU memory was 1.2 GB for Paradigms I and III and 1.5 GB for Paradigm II, and the per-sample latency was 18 ms, 23 ms, and 18 ms for Paradigms I, II, and III, respectively. These results indicate that all three paradigms satisfy the latency requirement for online operation, with Paradigm I having the lowest average training cost and Paradigms I and III having comparable inference efficiency.

3.4. Ablation of Physical Prior Mechanisms

This section conducts ablation studies on two key components of physical prior injection. First, we examine whether the CFD-based pretraining transfer in Paradigm I requires end-to-end fine-tuning for effective adaptation in the real domain. Second, we compare different fusion strategies in Paradigm II and analyze their effects on pointwise accuracy, shape consistency, and mass conservation. These experiments isolate the contribution of each fusion mechanism. The results are shown in Table 6 and Table 7. The best-performing Attention model was used in all experiments.

The ablation results for Paradigm I show that, compared with training from scratch, pretraining without predictor fine-tuning did not improve performance and instead produced higher errors. This pattern is consistent with the interpretation that the CFD-derived initialization provides a useful starting point in parameter space, but that domain bias between the simulated and real distributions still requires correction through gradient updates on real data. The result therefore characterizes the operating regime under which Paradigm I is effective in our experiments rather than asserting a general advantage of pretraining-only initialization.

As shown in Table 7, the ablation results for different fusion strategies in Paradigm II indicate that direct concatenation performs worst on all three metrics. This suggests that treating physical priors as ordinary input features can damage the separability of latent representations, induce local nonphysical oscillations, and increase global bias, as reflected by the much higher MCE. In contrast, gating achieves the most balanced performance, followed by attention fusion. This indicates that selectively injecting physical priors through learnable weighting mechanisms can significantly improve stability and physical consistency. A possible reason is that, although the particle concentration distribution in CFD data follows physical laws, the representations extracted from CFD may still conflict with those learned from real SCADA data. Therefore, the information from different representations must be adaptively reorganized to reduce such conflicts.

3.5. Evaluation Experiments on Generalization and Out-of-Distribution Extrapolation Performance

This section evaluates model behavior under one form of OOD shift, namely a systematic change in the sludge-discharge cycle. The OOD scenario considered here is therefore a structural shift in operating policy rather than a perturbation in input features or a change in source-water properties. Since bottom sludge accumulation is strongly coupled with sedimentation duration, this distribution shift directly changes both the overall level and the spatial distribution of the sludge–water interface profile. To characterize task difficulty under discharge-cycle shift and examine model sensitivity to output structure, we first report the direct prediction performance of the different predictors on the OOD data. Specifically, the best parameters obtained for each predictor in the ID setting were directly tested on the OOD data. The results are shown in Figure 4 and Table 8.

Compared with the ID results, the traditional machine-learning models XGBoost, LightGBM, and Random Forest show MCE values of 18.813, 16.910, and 15.068, respectively, with substantially increased CCE. This indicates that, when the discharge cycle changes and the output distribution shifts as a whole, static regression models cannot adequately represent process dynamics or constrain profile shape and total mass, resulting in limited robustness. The results also show clear performance differences across predictors under discharge-cycle shift, together with strong error amplification. For example, MLP and Diffusion yield PA values of 1.317 and 2.981, respectively, while their MCE values rise to 112.519 and 27.538. This suggests that some models are prone to systematic mismatch in the overall profile level under target distribution shift. In contrast, among the deep predictors, Attention achieves the lowest PA and MCE, while GRU and LSTM obtain lower CCE values, indicating that temporal memory structures contribute to shape stability, although they still fail to control global mass error effectively. The radar plot shows that inter-model dispersion in the OOD setting is larger than in the ID setting. Traditional machine-learning models occupy a relatively compact region with lower mass error but moderate shape consistency. MLP and Diffusion extend further from the radar centroid across multiple metrics, consistent with their elevated PA and MCE in Table 8, whereas Attention, GRU, and LSTM occupy a more compact region, consistent with their lower OOD error metrics in the same table. Under direct OOD evaluation (i.e., without target-domain fine-tuning), the lowest observed PA across all baselines was 0.193 and the lowest observed MCE was 15.068, approximately 7-fold and 14-fold higher than the corresponding lowest values under the ID setting (Table 4 and Table 5). This motivates the few-shot adaptation experiments described next.

We further introduced a small number of labeled OOD samples for fine-tuning, and evaluated two adaptation settings: 4-shot and 6-shot. That is, 4 or 6 OOD samples were used for light fine-tuning, and the remaining samples were used for testing. The results are shown in Table 9.
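The sample budget implied by these settings can be sketched as below. How the adaptation samples were selected is not stated here, so taking the first k samples in chronological order is an assumption, as is evaluating the zero-shot case on all 13 samples; the helper `k_shot_split` and the integer stand-ins are hypothetical.

```python
def k_shot_split(samples, k):
    """Use the first k OOD samples for fine-tuning and the rest for testing."""
    if not 0 <= k <= len(samples):
        raise ValueError("k must lie between 0 and the number of samples")
    return samples[:k], samples[k:]

ood_ids = list(range(13))            # stand-ins for the 13 OOD profiles
for k in (0, 4, 6):
    adapt, test = k_shot_split(ood_ids, k)
    print(k, len(adapt), len(test))
```

Under these assumptions the 4-shot and 6-shot settings leave 9 and 7 profiles for testing, respectively, so the few-shot metrics rest on small test sets.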

The results show clear differences among the three paradigms under discharge-cycle shift. In the zero-shot setting, Paradigm III produced the lowest values on all three metrics: relative to the second-best paradigm in this setting (Paradigm I), its PA, CCE, and MCE were 33.3%, 50.9%, and 33.8% lower, respectively. This is consistent with the interpretation that, even without target-domain supervision, the distillation route biases the predictor toward the CFD-derived profile region, which mitigates both systematic plateau-level shifts and high-frequency local excursions under cycle change. Under few-shot fine-tuning, the three paradigms differed more strongly in shape consistency and mass conservation than in pointwise accuracy. In the 4-shot setting, Paradigm II produced the lowest PA (0.061) by a small margin over Paradigm III (0.062), while Paradigm III retained the lowest CCE (0.029, 34.1% lower than Paradigm II) and the lowest MCE (3.678, 19.8% lower than Paradigm II). When the number of fine-tuning samples increased to six (6-shot), Paradigm III produced the lowest values on all three metrics: relative to the second-best paradigm in this setting (Paradigm II), its PA, CCE, and MCE were 7.6%, 29.3%, and 38.1% lower, respectively. By contrast, Paradigm I exhibited an increase in CCE from 0.124 (4-shot) to 0.142 (6-shot), suggesting that under structural distribution shift caused by discharge-cycle changes, a transfer mechanism based only on initialization bias may be more sensitive to target-domain sample distribution than the distillation route. To further illustrate the differences among paradigms, predictions on the OOD data are visualized in Figure 5.

Figure 5 shows that, under OOD conditions dominated by changes in sedimentation duration, the three physical-prior fusion paradigms affect sludge–water interface prediction in distinct ways. Compared with the ID setting, OOD samples show more pronounced systematic shifts in the overall level of the plateau region, together with stronger and more irregular local disturbances. This is consistent with the physical mechanism that sludge accumulation increases with sedimentation time, and it also indicates that changes in the discharge cycle make both global mass control and local shape fitting more difficult.

The main advantage of Paradigm I does not lie in local smoothness: in several samples, obvious high-frequency oscillations and local fluctuations still appear in the plateau region. However, compared with the other paradigms, Paradigm I better preserves the overall contour of key structures, including the sharp drop near the inlet, the position of the plateau region, and the rise near the outlet. As a result, its predicted curves are generally closer to the true profiles in terms of global level and main trend. Its strength should therefore be understood as better control of global structure and absolute bias, rather than suppression of local oscillations.

Paradigm II jointly models CFD-derived physical information as latent conditional features together with real operating-condition representations. Under OOD conditions, its performance depends more strongly on the quality of cross-domain representation alignment. When the physical representation matches the target operating-state distribution well, the model can maintain a reasonable trend to some extent. However, in some samples, the later part of the plateau region still shows persistent overestimation, underestimation, or local drift, indicating that feature enhancement alone is insufficient to provide stable output constraints in out-of-distribution settings.

By contrast, the advantage of Paradigm III is most evident at the shape level. Its predicted curves are more continuous and smoother overall, with fewer high-frequency fluctuations and spike-like anomalies in the plateau region. This suggests that, under teacher-model guidance, the solution space of the student model is constrained to a neighborhood closer to the feasible manifold characterized by CFD, which is more beneficial for preserving profile consistency and physical plausibility.

Overall, under OOD conditions, the key challenge is not the sharp decline near the inlet, but the systematic drift of the plateau region and local nonphysical oscillations. The former leads to cumulative errors in total sludge discharge estimation, while the latter increases the risk of misjudging local sludge accumulation or short-circuit flow. In this respect, Paradigm III shows the most stable shape consistency under zero-shot and few-shot adaptation.

4. Conclusions

To address the limited physical reliability of data-driven methods and the difficulty of using pure CFD methods for real-time state estimation, this study developed a soft-sensing framework for sedimentation-state prediction by integrating SCADA operational data, measured sludge–water interface data, and CFD-based hydraulic priors. The framework enables effective prediction of the true in-tank sedimentation state from observable operating variables. On this basis, we further proposed and systematically compared three paradigms for physical-knowledge integration, namely parameter transfer, representation fusion, and knowledge distillation. These paradigms inject the hydraulic structure and particle-transport priors encoded in CFD into the parameter space, latent representation space, and supervision-constraint space, respectively.

Averaged across the six predictors, Paradigm I reduced PA, CCE, and MCE by 25.1%, 8.7%, and 47.6% relative to the unfused baselines, and was the only paradigm that improved all three metrics simultaneously. Paradigm III produced the largest average reductions in PA (36.9%) and MCE (50.0%) but increased CCE by 7.4%, indicating that distillation-based supervision did not preserve curvature structure as effectively as parameter-level injection. Paradigm II produced the smallest gains, with an average CCE increase of 5.7%. Under the in-distribution condition (24 h discharge cycle), Attention with Paradigm I produced the lowest PA (0.026, tied with MLP under Paradigm III) and the lowest MCE (1.052) among the eighteen paradigm–predictor combinations; the lowest CCE was produced instead by GRU under Paradigm I (0.032), with Attention under Paradigm I third (0.034). Within the same architecture, Attention under Paradigm I reduced PA, CCE, and MCE by 50.0%, 10.5%, and 65.9% relative to unfused Attention; this quantifies the benefit of CFD-informed initialization and is not a cross-model ranking. Across combinations, the corresponding margins over the second-best of the eighteen combinations were 7.1% in PA (vs. PINN under Paradigm III, 0.028) and 2.9% in MCE (vs. MLP under Paradigm III, 1.083). Under the out-of-distribution condition, Paradigm III produced the lowest CCE and MCE in all three shot configurations, while Paradigm II produced the lowest PA in the 4-shot setting only; relative to the second-best paradigm, Paradigm III reduced PA, CCE, and MCE by 33.3%, 50.9%, and 33.8% in the zero-shot setting, and by 7.6%, 29.3%, and 38.1% in the 6-shot setting. The two routes therefore exhibit a complementary trade-off in our experimental scope: parameter transfer benefits ID prediction through CFD-informed initialization, while distillation benefits OOD prediction by biasing the predictor toward the CFD-derived region, at the cost of weaker ID curvature consistency.
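The relative reductions quoted above follow from the reported metric values through the usual percentage-reduction formula, as the short check below illustrates. It uses only values stated in this article (the unfused Attention MCE of 3.085 from the baseline comparison, and the PA and MCE values of the best and second-best combinations); the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

# MCE: unfused Attention (3.085) vs. Attention under Paradigm I (1.052).
print(round(relative_reduction(3.085, 1.052), 1))  # 65.9
# PA: PINN under Paradigm III (0.028) vs. Attention under Paradigm I (0.026).
print(round(relative_reduction(0.028, 0.026), 1))  # 7.1
# MCE: MLP under Paradigm III (1.083) vs. Attention under Paradigm I (1.052).
print(round(relative_reduction(1.083, 1.052), 1))  # 2.9
```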

The main contribution of this study lies not only in improving sedimentation-state prediction accuracy but also in proposing and validating an engineering-oriented soft-sensing framework for hydraulic knowledge integration that is reusable across models and provides complementary advantages under different operating conditions. This framework offers a unified and verifiable methodological basis for sludge discharge optimization, online state perception in smart water treatment plants, and digital-twin-driven fine-grained operation.

The paradigm orderings reported above hold only within the ID and OOD evaluations of this study, and four boundaries of the present work should be acknowledged. First, the framework was validated at a single 100,000 m³/d plant with one tank geometry and one influent-water regime over approximately 240 days of campaign data and a subsequent two-month deployment. Cross-plant generalization to tanks with different aspect ratios, coagulation chemistries, and source-water characteristics therefore remains to be demonstrated and is the most immediate direction for follow-up work. Second, the OOD evaluation was constructed from thirteen reference profiles collected under extended sludge-discharge cycles, with virtual interpolation used only for supplementary response analysis. A larger OOD set would strengthen the robustness claim for Paradigm III, but expanding it in full-scale operation is constrained by safety review, since each profile requires deliberately prolonging the discharge interval; lightweight reference-data acquisition that does not depend on extended cycles is therefore a key methodological extension for future work. Third, the CFD priors were generated under simplified conditions, with most operating features held at annual-mean values and only sedimentation duration, influent turbidity, and flow velocity varied across the five simulated regimes. Although the simulated flow regime was verified to be physically consistent with that of the study tank through an a priori Reynolds–Froude analysis, a systematic sensitivity study of how variations in turbulence-model parameters, inlet conditions, and grid resolution propagate into prediction performance was beyond the scope of this validation campaign; it is identified here as a natural extension, particularly when porting the framework to plants with markedly different geometries. Fourth, in Paradigm II the alignment between the CFD-derived and SCADA-derived latent representations is learned implicitly from the supervision signal rather than enforced by an explicit alignment loss. While this avoids over-constraining the latent space toward a CFD-defined manifold, a principled comparison of supervision-driven and constraint-driven alignment under varying degrees of distribution shift remains an open methodological question. Taken together, these boundaries delineate the validated regime of the proposed framework rather than indicate fundamental obstacles, and each suggests a concrete and tractable next step.

Author Contributions

Conceptualization, Y.K., W.W. and W.S.; methodology, X.M., Y.K., W.W. and W.S.; formal analysis, X.M., W.W. and W.S.; investigation, X.M. and Y.K.; resources, Y.K., W.W. and W.S.; data curation, X.M. and Y.K.; writing—original draft preparation, X.M. and Y.K.; writing—review and editing, Y.K., W.W. and W.S.; visualization, X.M., W.W. and W.S.; supervision, W.W., Y.K. and W.S.; project administration, Y.K., W.W. and W.S.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by the Yellow River Conservancy Commission and the China Foundation for the Protection of the Yellow River (Grant No. CYRF2025-ZZ022).

Data Availability Statement

The data can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Architectural settings of the deep-learning predictors.

Model	Architecture	Hidden/Depth	Output Dim
MLP	3-layer feedforward, ReLU activation, dropout 0.1	[256, 128, 100]	100
LSTM	2-layer unidirectional, dropout 0.1 between layers	hidden 128	100
GRU	2-layer unidirectional, dropout 0.1 between layers	hidden 128	100
Attention	Transformer encoder, 4 heads, 2 layers, FFN dim 256, dropout 0.1	d_model = 128	100
Diffusion	DDPM-1D conditional denoising network, U-Net backbone	4 down/up blocks, base channels 64	100
PINN	MLP backbone + dual constraint losses (mass-consistency + 1-D sedimentation PDE residual)	[256, 256, 100]	100

Table A2. Hyperparameter grids and selected values of the tree-based machine-learning baselines.

Model	Hyperparameter Grid	Selected Values
XGBoost (xgboost 1.7)	max_depth ∈ {3, 5, 7}; n_estimators ∈ {100, 300, 500}; learning_rate ∈ {0.01, 0.05, 0.1}	max_depth = 5; n_estimators = 300; learning_rate = 0.05
LightGBM (lightgbm 4.0)	num_leaves ∈ {15, 31, 63}; n_estimators ∈ {100, 300, 500}; learning_rate ∈ {0.01, 0.05, 0.1}	num_leaves = 31; n_estimators = 300; learning_rate = 0.05
Random Forest (sklearn)	n_estimators ∈ {100, 300, 500}; max_depth ∈ {None, 10, 20}; min_samples_split ∈ {2, 5}	n_estimators = 300; max_depth = 20; min_samples_split = 2
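To make the size of each search space in Table A2 concrete, the sketch below enumerates the listed grids with the standard library only. It assumes an exhaustive grid search, which the table does not state explicitly, and the names `GRIDS`, `SELECTED`, and `enumerate_grid` are hypothetical; no training or library-specific API is involved.

```python
from itertools import product

# Search grids copied from Table A2.
GRIDS = {
    "XGBoost": {
        "max_depth": [3, 5, 7],
        "n_estimators": [100, 300, 500],
        "learning_rate": [0.01, 0.05, 0.1],
    },
    "LightGBM": {
        "num_leaves": [15, 31, 63],
        "n_estimators": [100, 300, 500],
        "learning_rate": [0.01, 0.05, 0.1],
    },
    "Random Forest": {
        "n_estimators": [100, 300, 500],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5],
    },
}

# Selected values copied from Table A2.
SELECTED = {
    "XGBoost": {"max_depth": 5, "n_estimators": 300, "learning_rate": 0.05},
    "LightGBM": {"num_leaves": 31, "n_estimators": 300, "learning_rate": 0.05},
    "Random Forest": {"n_estimators": 300, "max_depth": 20, "min_samples_split": 2},
}


def enumerate_grid(grid):
    """Return every parameter combination of a grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


if __name__ == "__main__":
    for model, grid in GRIDS.items():
        candidates = enumerate_grid(grid)
        # Sanity check: the selected configuration must be one of the candidates.
        assert SELECTED[model] in candidates
        print(model, len(candidates), "candidates; selected:", SELECTED[model])
```

Under this assumption the XGBoost and LightGBM grids each contain 27 candidates and the Random Forest grid contains 18.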

Table A3. Implementation settings of the three knowledge-fusion paradigms.

Setting	Value
Paradigm I—pretraining epochs (CFD)	100
Paradigm I—fine-tuning epochs (real data)	up to 200, early stop
Paradigm I—pretraining learning rate	1 × 10⁻⁴
Paradigm I—fine-tuning learning rate	1 × 10⁻⁴
Paradigm II—fusion module	gated cross-attention, shared embedding dim 128
Paradigm II—sinusoidal time-step encoder dim	128
Paradigm III—distillation loss	MSE on output profiles
Paradigm III—distillation weight $λ$	0.5 (selected from {0.1, 0.3, 0.5, 0.7, 1.0})
All paradigms—data splits and seed protocol	identical to baselines (7:2:1 chronological, 10 seeds)
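The Paradigm III settings above (MSE on output profiles, weight $λ$ = 0.5) can be illustrated with a small sketch. The additive form used here, a data-fit MSE plus $λ$ times an MSE toward a CFD-derived teacher profile, is an assumption for illustration; the table specifies only the loss type and the weight. The function names and the toy four-point profiles are hypothetical (the predictors in Table A1 output 100 points).

```python
def mse(a, b):
    """Mean squared error between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_objective(pred, observed, teacher, lam=0.5):
    """Assumed form: MSE to the observed profile plus lam * MSE to the CFD teacher."""
    return mse(pred, observed) + lam * mse(pred, teacher)


if __name__ == "__main__":
    # Toy sludge-thickness profiles (arbitrary units), for illustration only.
    pred = [0.50, 0.30, 0.20, 0.25]
    observed = [0.52, 0.31, 0.18, 0.24]
    teacher = [0.48, 0.33, 0.21, 0.27]
    print(distillation_objective(pred, observed, teacher))
```

With $λ$ = 0 this reduces to ordinary supervised training on the measured profiles, so the weight controls how strongly the predictor is pulled toward the CFD-derived profile.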

References

1. Soleimani, R.; Hassan, R.; Baghban, A. Towards cleaner water by leveraging AI for optimizing coagulation processes in microplastic removal. J. Environ. Chem. Eng. 2025, 14, 120725.
2. Zhang, T.; Wei, W.; Hong, Y.; Wang, J.; Zhang, J.; Pei, Y.; Li, Q. CFD method for the effect of baffle locations and baffle lengths on the hydraulic characteristics of a horizontal sedimentation tank. Desalination Water Treat. 2024, 317, 100187.
3. Epoyan, S.; Sukhorukov, G.; Haiduchok, O.; Volkov, V. The method and research of a horizontal settler with improved design. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2023; Volume 2490, p. 060017.
4. Chen, W.Y.; Zhang, X.H.; Liu, J.Q.; Fu, Z.Y.; Huang, P.P.; Zhang, W.P. Prediction of spatial and temporal distribution of sludge level in horizontal sedimentation tanks based on hybrid model. Water Purif. Technol. 2025, 44, 69–77.
5. Nie, L.W. Investigation on Waste Water in Drinking Waterworks and One Typical Case Study; Harbin Institute of Technology: Harbin, China, 2013.
6. Şenol, H.; Çakır, İ.T.; Bianco, F.; Görgün, E. Improved methane production from ultrasonically-pretreated secondary sedimentation tank sludge and new model proposal: Time series (ARIMA). Bioresour. Technol. 2024, 391, 129866.
7. Mendoza, E.; Andramuño, J.; Núñez, J.; Benítez, I. Deliberative architecture for smart sensors in the filtering operation of a water purification plant. J. Phys. Conf. Ser. 2021, 1730, 012088.
8. Safonyk, A.; Bomba, A.; Tarhonii, I. Modeling and automation of the electrocoagulation process in water treatment. In Proceedings of the Conference on Computer Science and Information Technologies, Lviv, Ukraine, 11–14 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 451–463.
9. Al Aani, S.; Bonny, T.; Hasan, S.W.; Hilal, N. Can machine language and artificial intelligence revolutionize process automation for water treatment and desalination? Desalination 2019, 458, 84–96.
10. Deepa, A.; Mastan, A.; Buddolla, V. Advances in imaging techniques for real-time microbial visualization in wastewater treatment reactors: Challenges, applications, and process optimization. TrAC Trends Anal. Chem. 2025, 188, 118227.
11. Tochio, E.L.L.; do Nascimento, B.C.; Lautenschlager, S.R. Coagulant dosage prediction in the water treatment process. Water Supply 2023, 23, 3515–3531.
12. Kim, J.; Hua, C.; Lin, S.; Kang, S.; Kang, J.H.; Park, M.H. Deep learning-based coagulant dosage prediction for extreme events leveraging large-scale data. J. Water Process Eng. 2024, 66, 105934.
13. Zhao, W.; Liu, Y.; Shuang, R.; Lu, W.; Li, S.; Zhao, C.; Dou, C.; Shu, H. Regulation of coagulant dosage in water treatment based on explainable integrated time-series deep learning models. Process Saf. Environ. Prot. 2025, 201, 107613.
14. Liu, H.; Chen, Y.; Pan, X.; Zhang, J.; Huang, J.; Lichtfouse, E.; Zhou, G.; Ge, H. Image recognition enhances efficient monitoring of the coagulation-settling in drinking water treatment plants. J. Clean. Prod. 2024, 482, 144251.
15. Zhou, Z.; Wang, B.; Huang, Z.; Wu, X.; Yang, W.; Guo, G.; Qiu, S.; Yang, J.; Zhou, A. Floc image-driven deep learning enhanced by temporal windows and transformers for carbon emission reduction in drinking water treatment plants. Water Res. 2025, 289, 124868.
16. Hadi, G.A.; Kriš, J. A CFD methodology for the design of rectangular sedimentation tanks in potable water treatment plants. J. Water Supply Res. Technol. AQUA 2009, 58, 212–220.
17. Gao, H.; Stenstrom, M.K. Development and applications in computational fluid dynamics modeling for secondary settling tanks over the last three decades: A review. Water Environ. Res. 2020, 92, 796–820.
18. Raeesh, M.; Devi, T.T.; Hirom, K. Recent developments on application of different turbulence and multiphase models in sedimentation tank modeling—A review. Water Air Soil Pollut. 2023, 234, 5.
19. Hirom, K.; Devi, T.T. Application of computational fluid dynamics in sedimentation tank design and its recent developments: A review. Water Air Soil Pollut. 2022, 233, 22.
20. Ma, W.; Zhang, C.; Pan, C.; Lu, M.; Li, J.; Ma, J.; Ma, Y.; Qin, K. CFD-based flow simulation and optimization of horizontal tube sedimentation tanks (HTST). Process Saf. Environ. Prot. 2025, 199, 107287.
21. Bruno, P.; Bruno, F.; Di Bella, G.; Napoli, E.; De Marchis, M. CFD study on the effect of the baffles geometry in sedimentation efficiency in wastewater treatments through Large Eddy Simulations. J. Environ. Manag. 2025, 373, 123536.
22. Zhao, W.; Liu, Y.; Wang, L.; Li, S.; Sun, L.; Shuang, R. A decision support system for coagulant dosage using explainable NSGA-II optimized stacking framework. J. Water Process Eng. 2026, 83, 109672.
23. Zhang, W.; Zou, Z.; Sui, J. Numerical simulation of a horizontal sedimentation tank considering sludge recirculation. J. Environ. Sci. 2010, 22, 1534–1538.
24. Yu, P.F.; Wang, A.; Sun, H.W.; Wang, D.; Ma, X.G.; Wang, Y.J.; Hu, B.M.; Han, L.Y.; Fu, Y.B.; Jiang, D.L.; et al. The influence of water flow disturbance form on the granulation of algae sludge: Granulation property, treatment effect and algae biological structure. J. Water Process Eng. 2025, 78, 108691.
25. Locatelli, F.; Laurent, J.; François, P.; Bekkour, K. In situ monitoring of activated sludge batch settling using an ultrasonic device. In Proceedings of the 11th IWA Conference on Instrumentation Control and Automation, Narbonne, France, 18–20 September 2013.
26. Beshay, P.F.; Goh, M.H.; Ang, E.Y.; Kang, C.W.; Ng, T.Y.; Wang, P.C. A review of the applicability and limitations of current single droplet dynamics modelling implemented in Ansys-fluent. Int. Commun. Heat Mass Transf. 2025, 166, 109134.
27. Abdelrazik, A.S.; Sharafeldin, M.A.; Elwardany, M.; Masoud, A.M.; Allam, A.N.; Shboul, B.; Eissa, A.O.; Aliyu, M. Computational modeling of high-concentration solar systems using ANSYS-Fluent: Verified models, implemented methods, and existing challenges. Renew. Sustain. Energy Rev. 2026, 226, 116305.
28. Chen, H.C. CFD simulation of directional short-crested waves on jack-up structure. Int. J. Offshore Polar Eng. 2013, 23, 38–45.
29. Legua, M.P.; Morales, I.; Ruiz, L.S. The heaviside function and Laplace transforms. In Proceedings of the 10th WSEAS International Conference on Applied Mathematics, Dallas, TX, USA, 1–3 November 2006.
30. Xiong, Z.; Wang, Z.; Huang, F.; Qiu, M.; Fang, S.; Yang, L.; Zhou, X.; Liu, S.; Zhang, P.; Zhang, W. Multi-to-uni modal knowledge transfer pre-training for molecular representation learning. Nat. Commun. 2026, 17, 3797.
31. Ougahi, J.H.; Rowan, J.S. Investigating deep learning knowledge transfer in streamflow prediction from global to local catchment. Water Resour. Res. 2026, 62, e2025WR041194.
32. Wu, H.; Ji, J. S3-Fire: Enhanced generalization in spatiotemporal fire prediction with selective state and spectral spaces. Expert Syst. Appl. 2025, 287, 128083.
33. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
34. Wang, D.; Guo, H.; Sun, Y.; Liang, H.; Li, A.; Guo, Y. Prediction of oil–water two-phase flow patterns based on Bayesian optimisation of the XGBoost algorithm. Processes 2024, 12, 1660.
35. Zhou, S.; Song, C.; Zhang, J.; Chang, W.; Hou, W.; Yang, L. A hybrid prediction framework for water quality with integrated W-ARIMA-GRU and LightGBM methods. Water 2022, 14, 1322.
36. Budak, İ. Prediction of water quality’s pH value using Random Forest and LightGBM algorithms. MEMBA Su Bilim. Derg. 2025, 11, 42–49.
37. Li, Z.; Mu, T.; Li, X.; Li, P.; Feng, J.; Xu, H.; Liu, C.; Qian, S. Physics-informed neural network for hydraulic prediction in open-channel water transfer projects with sparse monitoring data. Water Res. 2025, 287, 124507.
38. Wu, W.; Kang, Y. Ensemble empirical mode decomposition Granger causality test dynamic graph attention transformer network: Integrating transformer and graph neural network models for multi-sensor cross-temporal granularity water demand forecasting. Appl. Sci. 2024, 14, 3428.
39. Rishi, J.; Mothish, G.V.S.; Subramani, D. Conditional diffusion model with nonlinear data transformation for time series forecasting. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
40. Li, H.; Liu, C.; Guo, X.; Sun, H.; Li, X.; Jiang, H.; Miao, S. Applying machine learning approach to design operational control strategies for a wastewater treatment plant in typical scenarios. Water 2025, 17, 310.
41. Gyparakis, S.; Trichakis, I.; Daras, T.; Diamadopoulos, E. Artificial neural networks (ANNs) and multiple linear regression (MLR) analysis modelling for predicting chemical dosages of a water treatment plant (WTP) of drinking water. Water 2025, 17, 227.
42. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
43. Wen, K.; Guo, L.; Xia, Z.; Cheng, S.; Chen, J. A hybrid simulation method integrating CFD and deep learning for gas–liquid bubbly flow. Chem. Eng. J. 2024, 495, 153515.
44. Wu, W.; Zhang, J.; Chen, B.; Xu, Y.; Xu, H. Wisdom in the details: An instance-level decoupled explainable framework for survival-analysis-based risk assessment in water distribution networks. J. Water Process Eng. 2026, 82, 109475.

Figure 1. Framework.

Figure 2. Trade-off between curvature consistency error (CCE, x-axis) and mass-conservation error (MCE, y-axis) for nine baseline predictors evaluated without prior fusion. Each marker represents one predictor; the lower-left corner corresponds to simultaneously low CCE and low MCE. No baseline predictor reaches this region, motivating the knowledge-fusion paradigms developed in Section 2.2.

Figure 3. Predicted versus observed bottom-sludge profiles on the ID test set, comparing the three knowledge-fusion paradigms across six predictors. Rows correspond to predictors (LSTM, GRU, MLP, Attention, Diffusion, PINN); columns correspond to paradigms (Paradigm I, Paradigm II, Paradigm III). The dashed black line shows the observed profile, and colored lines show paradigm-specific predictions. Three regions of each profile are referred to in the text: the rapid decay near the inlet, the plateau region, and the rising tail near the outlet.

Figure 4. Direct OOD prediction performance of nine predictors trained only on ID data, visualized as a radar plot of PA, CCE, and MCE. Each axis is normalized so that values closer to the centre indicate better performance; each polygon corresponds to one predictor.

Figure 5. Predicted versus observed bottom-sludge profiles on the OOD test set under three target-domain support levels. Rows correspond to support levels (0-shot, 4-shot, 6-shot); columns correspond to paradigms (Paradigm I, Paradigm II, Paradigm III). The dashed black line shows the observed profile, and colored lines show paradigm-specific predictions.

Table 1. SCADA data variables.

Feature Name	Value Range	Mean	Unit	Description
Sedimentation tank pH	7.69–8.54	8.12	—	Acidity of the water
Raw water COD	1.13–15.49	7.69	mg/L	Level of oxidizable organic pollution
Raw water dissolved oxygen	0.69–11.90	6.32	mg/L	Oxygen content in the water
Raw water turbidity	1.61–30.12	5.52	NTU	Degree of water cloudiness caused by suspended particles; strongly affected by environmental fluctuations
Instantaneous flow rate	1912.5–4639.18	2254	m³/h	Instantaneous influent flow rate of the plant, reflecting the daily operating water volume
Sodium hypochlorite dosage	16.00–600.12	200.40	L	Mainly used for disinfection to ensure microbial safety
Algaecide dosage	0.10–2.38	1.42	kg	Daily algaecide consumption, used to suppress and remove algae and some algal metabolites in raw water
PAC dosage	0.04–166.62	143.28	L	Daily dosage of polyaluminum chloride (PAC), which promotes the rapid formation of settleable flocs from fine suspended solids
Coagulant dosage	0.12–85.75	28.67	kg	Daily consumption of coagulant aid, usually used with PAC to enlarge floc size, improve floc stability, and enhance fine floc formation
Sedimentation duration	24	24	h	Time interval between two sludge discharge operations; the default interval is 24 h

Table 2. Parameters of the horizontal-flow sedimentation tank.

Items	Value
Dimensions of one sedimentation tank	96.5 × 14.75 × 4.0 m
Number of corridors in one tank	4
Dimensions of one corridor	96.5 × 3.5 × 4.0 m
Water depth of the sedimentation tank	4 m (freeboard: 0.4 m)
Thickness of the perforated distribution wall	0.3 m
Horizontal spacing of square holes	0.285 m
Vertical spacing of square holes	0.3 m

Table 3. Fluent Simulation Parameters.

Conditions	Hydraulic Load (m³/h)	Suspended Solids Concentration (mg/L)	Simulation Time (h)
A	1127	45	48
B	2254	15	48
C	2254	45	24
D	2254	45	240
E	2254	90	48

Table 4. Predictor performance without knowledge fusion. The implementation settings of each model are given in Appendix A (Tables A1–A3).

Model Class	Predictor	PA ↓	CCE ↓	MCE ↓	Reference
ML	XGBoost	0.196 ± 0.017	0.055 ± 0.006	8.813 ± 0.704	Wang et al. [34] (2024)
	LightGBM	0.225 ± 0.019	0.047 ± 0.005	8.910 ± 0.726	Zhou et al. [35] (2022)
	Random Forest	0.116 ± 0.011	0.154 ± 0.013	7.068 ± 0.581	Budak [36] (2025)
	PINN	0.086 ± 0.008	0.151 ± 0.012	6.322 ± 0.497	Li et al. [37] (2025)
	Attention	0.052 ± 0.004	0.038 ± 0.004	3.085 ± 0.248	Wu et al. [38] (2024)
	Diffusion	0.097 ± 0.009	0.057 ± 0.006	5.428 ± 0.433	Rishi et al. [39] (2025)
	LSTM	0.054 ± 0.005	0.032 ± 0.003	4.276 ± 0.331	Li et al. [40] (2025)
	GRU	0.054 ± 0.005	0.032 ± 0.003	4.259 ± 0.337	Li et al. [40] (2025)
	MLP	0.052 ± 0.004	0.067 ± 0.006	4.169 ± 0.342	Gyparakis et al. [41] (2025)
CFD Conditions	A	0.830	0.134	90.486	$α_{th}$ = 0.015
		0.901	0.153	120.341	$α_{th}$ = 0.005
		1.345	0.198	140.423	$α_{th}$ = 0.05
	B	0.817	0.139	88.972	$α_{th}$ = 0.015
		0.926	0.149	123.108	$α_{th}$ = 0.005
		1.312	0.204	137.856	$α_{th}$ = 0.05
	C	0.845	0.131	92.214	$α_{th}$ = 0.015
		0.887	0.158	118.765	$α_{th}$ = 0.005
		1.371	0.191	143.507	$α_{th}$ = 0.05
	D	0.804	0.136	89.643	$α_{th}$ = 0.015
		0.915	0.151	121.884	$α_{th}$ = 0.005
		1.329	0.201	138.992	$α_{th}$ = 0.05
	E	0.852	0.128	91.337	$α_{th}$ = 0.015
		0.874	0.156	119.428	$α_{th}$ = 0.005
		1.358	0.195	141.964	$α_{th}$ = 0.05

Table 5. Comparison of predictors under different paradigms.

Predictor	Paradigm	PA ↓	CCE ↓	MCE ↓	References
LSTM	Paradigm I	0.041 ± 0.004	0.055 ± 0.006	2.499 ± 0.213	Li et al. [40]
	Paradigm II	0.054 ± 0.006	0.043 ± 0.005	4.467 ± 0.351
	Paradigm III	0.041 ± 0.003	0.083 ± 0.007	2.731 ± 0.184
GRU	Paradigm I	0.047 ± 0.005	0.032 ± 0.004	3.340 ± 0.272	Li et al. [40]
	Paradigm II	0.053 ± 0.007	0.033 ± 0.004	4.511 ± 0.395
	Paradigm III	0.033 ± 0.004	0.033 ± 0.003	1.364 ± 0.121
MLP	Paradigm I	0.030 ± 0.003	0.044 ± 0.005	1.333 ± 0.117	Gyparakis et al. [41]
	Paradigm II	0.050 ± 0.005	0.043 ± 0.004	4.383 ± 0.364
	Paradigm III	0.026 ± 0.002	0.040 ± 0.003	1.083 ± 0.096
Attention	Paradigm I	0.026 ± 0.002	0.034 ± 0.003	1.052 ± 0.088	Wu et al. [38]
	Paradigm II	0.045 ± 0.004	0.039 ± 0.004	1.822 ± 0.143
	Paradigm III	0.042 ± 0.004	0.044 ± 0.005	1.803 ± 0.132
Diffusion	Paradigm I	0.073 ± 0.008	0.042 ± 0.004	4.918 ± 0.401	Rishi et al. [39]
	Paradigm II	0.080 ± 0.009	0.077 ± 0.008	5.514 ± 0.462
	Paradigm III	0.076 ± 0.007	0.046 ± 0.005	5.211 ± 0.387
PINN	Paradigm I	0.089 ± 0.010	0.071 ± 0.007	1.320 ± 0.115	Li et al. [37]
	Paradigm II	0.097 ± 0.011	0.143 ± 0.012	2.049 ± 0.176
	Paradigm III	0.028 ± 0.003	0.039 ± 0.004	1.489 ± 0.127

Table 6. Paradigm I: Ablation of pretraining transfer strategies.

Fine-Tuning Setting	PA ↓	CCE ↓	MCE ↓
Frozen feature extractor	0.063 ± 0.006	0.116 ± 0.010	3.550 ± 0.284
Pretrained, frozen predictor	0.116 ± 0.009	0.066 ± 0.006	4.171 ± 0.337
Pretrained, fine-tuned predictor	0.026 ± 0.003	0.034 ± 0.004	1.052 ± 0.093

Table 7. Paradigm II: Ablation of feature fusion strategies.

Fusion Method	PA ↓	CCE ↓	MCE ↓
Gating	0.045 ± 0.004	0.039 ± 0.004	1.822 ± 0.146
Attention fusion	0.040 ± 0.003	0.054 ± 0.005	2.963 ± 0.231
CONCAT	0.084 ± 0.008	0.165 ± 0.013	4.653 ± 0.382

Table 8. Direct prediction performance on OOD data.

Predictor Type	PA ↓	CCE ↓	MCE ↓
MLP	1.317 ± 0.102	2.033 ± 0.164	112.519 ± 8.437
LSTM	0.308 ± 0.026	0.017 ± 0.003	30.775 ± 2.146
GRU	0.284 ± 0.021	0.017 ± 0.002	28.433 ± 1.984
Attention	0.193 ± 0.018	0.034 ± 0.004	17.733 ± 1.326
Diffusion	2.981 ± 0.241	6.884 ± 0.527	27.538 ± 2.103
PINN	0.392 ± 0.031	0.716 ± 0.058	27.868 ± 2.014
Random Forest	0.192 ± 0.042	0.352 ± 0.019	15.068 ± 1.998
XGBoost	0.236 ± 0.019	0.155 ± 0.013	18.813 ± 1.447
LightGBM	0.245 ± 0.020	0.446 ± 0.036	16.910 ± 1.284

Table 9. Comparative Experiments on Paradigm Effectiveness under OOD Few-Shot Support.

Support Level	Paradigm	PA ↓	CCE ↓	MCE ↓
0-shot	Paradigm I	0.126 ± 0.011	0.055 ± 0.006	5.051 ± 0.402
	Paradigm II	0.166 ± 0.014	0.218 ± 0.018	9.778 ± 0.764
	Paradigm III	0.084 ± 0.007	0.027 ± 0.003	3.345 ± 0.271
4-shot	Paradigm I	0.098 ± 0.009	0.124 ± 0.011	6.382 ± 0.497
	Paradigm II	0.061 ± 0.006	0.044 ± 0.004	4.585 ± 0.366
	Paradigm III	0.062 ± 0.006	0.029 ± 0.003	3.678 ± 0.295
6-shot	Paradigm I	0.095 ± 0.008	0.142 ± 0.012	5.941 ± 0.468
	Paradigm II	0.066 ± 0.006	0.041 ± 0.004	5.162 ± 0.407
	Paradigm III	0.061 ± 0.005	0.029 ± 0.003	3.193 ± 0.256
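The relative OOD reductions attributed to Paradigm III can be checked directly against the means in Table 9. The sketch below is illustrative and uses hypothetical names (`OOD`, `reduction_vs_best_other`); it assumes that "second-best" means the better of the two other paradigms on each metric, evaluated on the reported means without regard to the standard deviations.

```python
# Mean (PA, CCE, MCE) values copied from Table 9.
OOD = {
    "0-shot": {
        "I": (0.126, 0.055, 5.051),
        "II": (0.166, 0.218, 9.778),
        "III": (0.084, 0.027, 3.345),
    },
    "4-shot": {
        "I": (0.098, 0.124, 6.382),
        "II": (0.061, 0.044, 4.585),
        "III": (0.062, 0.029, 3.678),
    },
    "6-shot": {
        "I": (0.095, 0.142, 5.941),
        "II": (0.066, 0.041, 5.162),
        "III": (0.061, 0.029, 3.193),
    },
}

METRICS = ("PA", "CCE", "MCE")


def reduction_vs_best_other(setting, target="III"):
    """Percent reduction of `target` relative to the better of the other paradigms."""
    rows = OOD[setting]
    out = {}
    for i, metric in enumerate(METRICS):
        best_other = min(v[i] for k, v in rows.items() if k != target)
        out[metric] = 100.0 * (best_other - rows[target][i]) / best_other
    return out


if __name__ == "__main__":
    for setting in OOD:
        summary = reduction_vs_best_other(setting)
        print(setting, {m: round(v, 1) for m, v in summary.items()})
```

A negative PA value in the 4-shot row reflects the one case where Paradigm II has a marginally lower mean PA (0.061 vs. 0.062) than Paradigm III.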

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Meng, X.; Kang, Y.; Shang, W.; Wu, W. Hydrodynamic-Knowledge Fusion Paradigms for Soft Sensing of Spatial Sediment Distribution in Horizontal-Flow Sedimentation Tanks. Appl. Sci. 2026, 16, 4581. https://doi.org/10.3390/app16104581

