1. Introduction
Wastewater treatment plants (WWTPs) play an essential role in protecting public health, preserving aquatic ecosystems and managing water resources efficiently. Modern WWTPs employ complex mechanical, chemical and biological processes designed to remove pollutants, nutrients and suspended solids before the treated effluent is discharged into natural water bodies [1]. Among these processes, the activated sludge system in the aerobic section of a WWTP is considered a global standard due to its efficiency and operational flexibility. However, this type of biological treatment performs well only under continuous monitoring and accurate control of key variables, such as the dissolved oxygen (DO) concentration, nutrient concentrations, temperature, pH and other system-specific characteristics [2].
The DO sensor is a crucial part of the activated sludge system, so monitoring it is an important task in itself; the greatest challenge is the early detection of anomalies that hinder the efficient performance of the entire WWTP [3]. Faults in the DO sensor, even seemingly minor deviations, can cause defective aeration control actions that result in higher energy demand, reduced nitrification effectiveness or improper discharge. Fault detection has therefore become an essential component of WWTP supervision [4].
Over the past decade, numerous studies on Machine Learning (ML) and Deep Learning (DL) approaches have opened new possibilities for building fault detection algorithms and for extracting hidden patterns from wastewater process data [5]. Classical neural network architectures such as the Feedforward Neural Network (FF-NN), Convolutional Neural Network (C-NN), Long Short-Term Memory Neural Network (LSTM-NN) and Radial Basis Function Neural Network (RBF-NN) each have distinct strengths: FF-NNs can model nonlinear mappings between the input data and the predicted output [6]; C-NNs can identify features such as sharp transitions, peaks, repetitive patterns, localized distortions or other shape-based fault signatures [7]; LSTM-NNs are particularly effective for temporal sequence modeling and have been used to detect collective sensor faults in WWTP data [8]; and RBF-NNs excel at approximating nonlinear functions using localized radial basis functions and can learn complex patterns that linear models cannot capture [9].
However, despite their individual strengths, each of these neural network architectures also exhibits certain limitations when employed on its own for anomaly detection in a WWTP. FF-NNs lack an intrinsic mechanism for modeling temporal dynamics and are sensitive to noise in sensor data, which can hinder the detection of gradually developing faults; C-NNs struggle with long-range temporal dependencies and may overlook global nonlinear behaviors that are characteristic of complex process signals; LSTM-NNs are computationally expensive and prone to overfitting in noisy environments; and RBF-NNs cannot model temporal evolution and depend heavily on the placement of radial centers, which can limit performance on varying data [10,11,12].
To address these shortcomings, this study proposes three hybrid approaches: a hybrid between an FF-NN and an RBF network (FF-NN + RBF), a hybrid between a C-NN and an RBF network (C-NN + RBF) and a hybrid between an LSTM-NN and an RBF network (LSTM-NN + RBF). The hybrids combine complementary strengths: the primary networks extract global, temporal or spatial representations, while the RBF component refines the decision boundaries through localized nonlinear modeling. As a result, the hybrid algorithms offer enhanced adaptability to the nonlinear behavior of DO sensor data and are better suited for fault detection than the individual artificial neural networks (A-NNs) used on their own [5,13,14,15,16].
Numerous scientific articles summarize the use of A-NNs and hybrid models for anomaly detection in WWTPs, highlighting the evolution from classical feedforward models to recurrent and convolutional architectures, as well as the growing interest in hybrid algorithms that aim to exploit complementary strengths. Several articles address the use of A-NNs for fault detection [17,18,19,20,21]; their authors compare C-NN versus FF-NN [17], three classical ML approaches with two Autoencoder-type algorithms [18], LSTM-NN versus FF-NN [19], C-NN versus LSTM-NN [20], and a classical RBF-NN versus a custom one [21], in order to find the method that performs best at detecting anomalies injected into the DO sensor of Benchmark Simulation Model No. 2 (BSM2). A notable paper is [7], which proposes a C-NN approach for diagnosing faults in wastewater treatment plants by applying a proxy-nearest neighborhood component analysis loss, which improves the model’s ability to distinguish between different types of anomalies and obtains an average classification accuracy of 85.4%. Another important paper is [22], which presents an LSTM-NN used for detecting faults in different sensors of a wastewater treatment plant, focusing on the oxidation and nitrification processes, with the network obtaining a fault detection recall rate of over 92%.
Many scientific articles also study hybrid approaches. One such article is the review [5], which surveys the use of A-NNs for predicting water quality variables in WWTPs over a long period of time and emphasizes that hybrid architectures often improve prediction accuracy over single-network models. Another relevant paper is [13], which proposes a hybrid soft-sensor that combines a simplified activated sludge process model, which captures the biochemical dynamics of a WWTP, with a variable-structure RBF neural network that compensates for the prediction errors of the activated sludge process model under various operating conditions. Study [14] deals with machine fault detection by employing a hybrid attention-based model that combines a convolutional neural network with a long short-term memory network (CNN-LSTM), and concludes that the hybrid model yields superior results compared to traditional reference models. Paper [15] introduces a hybrid DL algorithm that integrates GCN, C-NN, GRU and Attention Mechanism, whereas article [16] proposes a hybrid VAE–LSTM model with a combined loss function that integrates reconstruction and prediction errors; both hybrid methods address anomaly detection in WWTPs and demonstrate that hybrid approaches outperform traditional ones.
Therefore, in this study, we designed and analyzed three hybrid neural network architectures, namely FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF, for anomaly detection of the DO sensor in WWTPs. The objective was to identify the most effective hybrid model for constructing a reliable and robust fault detection algorithm suitable for real-time monitoring applications. Each hybrid architecture integrates a DL model as a feature extractor with an RBF classifier as the decision-making component.
Although hybrid architectures combining deep learning models with RBF-based classifiers have been explored in various pattern recognition and industrial diagnosis domains, their application in WWTP fault detection has typically been limited to isolated architectural configurations. The methodological novelty of the present study does not reside solely in the combination of DL methods with an integrated RBF classifier, but in the systematic design and evaluation of a hybrid framework applied consistently across three fundamentally different A-NNs. This approach enables a controlled investigation of how global nonlinear modeling, convolutional feature extraction and recurrent temporal learning interact with localized RBF decision boundaries under various anomaly conditions (in this case, the anomaly conditions are created by injecting mechanical faults in the DO sensor of BSM2).
To address limitations identified in existing fault detection approaches for WWTPs, this study is guided by the following research questions: Can hybrid neural network architectures that combine DL feature extractors with an RBF classifier improve the detection of mechanical faults in the DO sensor of a WWTP? How does the performance of different hybrid architectures compare across various anomaly scenarios, including both single-fault and multi-fault conditions? Which hybrid architecture provides the most robust and stable performance across fault scenarios when evaluated using a comprehensive set of classification metrics?
The selection of the proposed hybrid architectures was motivated by the need to evaluate how different A-NNs interact with an RBF classifier in the context of mechanical faults that evolve over time (such as the bias, drift, precision degradation, spike and saturation faults), as well as the stuck fault, which freezes the DO sensor at a certain value. The FF-NN + RBF hybrid serves as a baseline architecture, representing a conventional feedforward approach capable of modeling global nonlinear relationships, but lacking explicit temporal memory mechanisms. The C-NN + RBF hybrid assesses the effectiveness of convolutional feature extraction in capturing local patterns and short-term correlations within sensor signals. The LSTM-NN + RBF hybrid is included due to its proven capability to model long-range temporal dependencies, which are particularly relevant for DO sensors, where mechanical faults often develop progressively over time.
By applying all three hybrid architectures within the same preprocessing pipeline, fault-injection protocol, as well as both single-fault and multi-fault simulation scenarios, this study provides a structured and fair comparison of their suitability for detecting mechanical anomalies. The models are evaluated using an extensive set of performance metrics, including accuracy, precision, recall, F1-score, balanced accuracy, Cohen’s Kappa, Matthews Correlation Coefficient (MCC), ROC-AUC, and PR-AUC. This comprehensive and scenario-based evaluation framework, combining hybrid modeling and such an extensive anomaly diversity, has not been jointly addressed in previous WWTP fault detection studies and constitutes a key contribution of the present work.
The paper is organized as follows:
Section 2 presents the materials and methods used, such as the proposed framework the study is based on, the major contributions of our work, descriptions of the considered anomaly scenarios, the design of the three hybrid approaches, as well as the comprehensive set of classification metrics employed for comparing the three hybrid networks.
Section 3 presents the results of the simulations, consisting of the values of the classification metrics obtained for the three approaches, as well as graphical representations supporting them.
Section 4 discusses the aforementioned results.
Section 5 marks the conclusions of the paper.
2. Materials and Methods
This study proposes three hybrid neural network architectures, namely FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF, used for detecting mechanical anomalies injected in the DO sensor of BSM2. The considered anomaly scenarios include mechanical faults, such as bias, drift, spike, stuck, precision degradation and saturation, either injected on their own or in combination, and the three hybrid methods are tested on all datasets. We selected these types of faults because they are frequently encountered in real WWTPs and injected them in the DO sensor due to how important this sensor is for the plant’s effective operation. Early detection of such faults maintains process efficiency, optimizes resource usage, complies with established regulations and protects the equipment of a WWTP, thus highlighting the importance of a fault detection algorithm with high performance.
The proposed framework of this paper is presented in Figure 1.
The key steps of our research are:
The injection of mechanical faults (bias, drift, spike, stuck, precision degradation (PD) and saturation) in the DO sensor of BSM2;
The extraction of the 10 anomaly scenarios that consist of either a few instances of the same fault, or of all the six types of anomalies injected in different order;
Data preprocessing and splitting, which prepares the datasets for training and testing;
Design and implementation of the three hybrid neural network architectures (FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF);
Simulations for the three hybrid methods on all 10 anomaly scenarios;
Evaluation of the three hybrid approaches with the set of classification metrics (accuracy, precision, recall, F1-score, balanced accuracy, Cohen’s Kappa, MCC, ROC-AUC and PR-AUC);
Comparison of the obtained values in order to determine the best hybrid method that becomes the anomaly detection algorithm.
To investigate the performance of the proposed hybrid neural networks, the study begins with the injection of mechanical faults in the DO sensor of BSM2. The mechanical anomalies considered are: bias fault, drift fault, spike fault, stuck fault, precision degradation fault and saturation fault. These faults are often studied and follow models and equations widely used in other papers, such as [18,23,24,25,26], which highlights their relevance in the fault detection literature.
The bias fault occurs when the sensor deviates from its normal function by a constant offset. When injected in the DO sensor, the bias fault causes the oxygen concentration to be higher or lower than the normal value, regardless of the dynamic behavior of the system. To simulate this fault, we have added a positive constant value to the sensor output, for a certain period of time, which results in a positive, higher shift in the entire signal. The following equation accurately describes the injected bias fault:
$$s_f(t) = s(t) + b, \quad t \in T_f, \qquad s(t) = s_0(t) + \eta(t),$$
where $T_f$ is the time interval when the fault is injected, $s(t)$ represents the output of the sensor at time $t$, $\eta(t)$ is the noise, $s_0(t)$ is the expected output of the sensor without the presence of faults, $b$ is the added constant offset, and thus $s_f(t)$ is the output of the sensor with injected faults.
The drift fault represents a gradual, time-dependent deviation from the normal sensor output, unlike the bias anomaly, which exhibits a constant deviation. When injected in the DO sensor, the drift fault causes the oxygen concentration to increase or decrease progressively over time from the normal value, and sometimes, this deviation may remain undetected until it reaches a critical threshold. To simulate this fault, we added a bias that increases or decreases over time, for a certain interval, as described in the mathematical equation below:
$$s_f(t) = s(t) + d(t), \quad t \in T_f,$$
where $d(t)$ is the time-dependent deviation.
The spike fault is characterized by sudden, high-amplitude deviations that happen for a short period of time. When injected in the DO sensor, the spike fault causes large amplitude peaks in the oxygen concentration that often appear isolated and do not persist over time. To simulate this fault, we introduced impulse-like disturbances in the form of multiplying the normal sensor output with a constant value, on limited periods of time, as seen in the following equation:
$$s_f(t) = k \, s(t), \quad t \in T_f,$$
where $k$ is the constant value for the impulse-like disturbance.
The stuck fault occurs when the sensor becomes fixed at a certain value, regardless of the actual process dynamics. When injected in the DO sensor, the stuck fault causes the oxygen concentration level to freeze at a certain value, instead of normally fluctuating, thus creating a flat signal that is highly misleading for process control. To simulate this fault, we force the sensor to read a constant value for a certain period of time, as observed in the mathematical equation below:
$$s_f(t) = c, \quad t \in T_f,$$
where $c$ is the constant value shown by the sensor reading.
The precision degradation fault refers to the loss of sensor accuracy due to increased noise around the normal value. When injected in the DO sensor, the PD fault causes the exhibition of larger, random fluctuations of the oxygen levels, and it is simulated by adding a noise with zero mean and high variance, and it is mathematically defined as
$$s_f(t) = s(t) + \nu(t), \quad t \in T_f,$$
where $\nu(t)$ indicates the amplified noise added to the sensor output.
The saturation fault occurs when the sensor reaches its operational limits (upper limit, lower limit or both) and cannot report values outside a predefined range. When injected in the DO sensor, the saturation fault causes limited readings for the oxygen concentration levels, with the result being a flatline at the upper limit, lower limit or both. To simulate the saturation fault, the sensor output was clipped between a maximum and minimum allowed limit, as seen below:
$$s_f(t) = \min\bigl(\max\bigl(s(t),\, s_{\min}\bigr),\, s_{\max}\bigr), \quad t \in T_f,$$
where $s_{\min}$ is the lower allowed limit and $s_{\max}$ is the upper allowed limit.
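The six fault models described above can be illustrated with a minimal Python sketch. This is not the Simulink implementation used in the study: the function name `inject_fault`, all default parameter values and the linear form of the drift are assumptions chosen only for illustration.

```python
import random


def inject_fault(y, kind, start, stop, *, offset=1.0, slope=0.01, gain=3.0,
                 stuck_value=None, noise_std=0.5, lower=0.5, upper=3.0, seed=0):
    """Return (faulty_signal, labels) with one fault applied on indices [start, stop).

    All default magnitudes are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    faulty = list(y)
    for t in range(start, stop):
        if kind == "bias":            # constant offset added to the reading
            faulty[t] = y[t] + offset
        elif kind == "drift":         # deviation grows linearly (assumed form)
            faulty[t] = y[t] + slope * (t - start)
        elif kind == "spike":         # reading multiplied by a constant
            faulty[t] = gain * y[t]
        elif kind == "stuck":         # reading frozen at a constant value
            faulty[t] = y[start] if stuck_value is None else stuck_value
        elif kind == "pd":            # zero-mean noise with large variance
            faulty[t] = y[t] + rng.gauss(0.0, noise_std)
        elif kind == "saturation":    # reading clipped to [lower, upper]
            faulty[t] = min(max(y[t], lower), upper)
        else:
            raise ValueError(f"unknown fault type: {kind}")
    # Label 1 marks faulty samples, 0 marks normal samples.
    labels = [1 if start <= t < stop else 0 for t in range(len(y))]
    return faulty, labels


signal = [float(i) for i in range(10)]
biased, labels = inject_fault(signal, "bias", 3, 6, offset=0.5)
```

Each call returns both the corrupted signal and a 0/1 label vector, mirroring the normal/faulty labeling used for the datasets.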
The BSM2 benchmark model is described in [27], and its MATLAB Simulink implementation (version R2022b, MathWorks, Natick, MA, USA) is publicly available on GitHub [28]. The BSM2 benchmark model provides a standardized, well-validated simulation environment for studying wastewater treatment processes. Using the Simulink scheme, the mechanical anomalies described above were injected directly into the DO sensor block within the aeration tank subsystem, which ensured that the extracted datasets contain both normal and faulty data.
We implemented 10 anomaly scenarios. The data were obtained from simulations performed using the BSM2, in which mechanical faults were artificially injected into the DO sensor within a MATLAB Simulink environment. The first six scenarios each contain a single type of fault, with several instances of the same anomaly but with different added values and injection periods, whereas the last four contain all six considered faults, injected in different orders and over different time intervals. The fault scenarios, with the added values, starting day and duration in hours, are listed in Table 1.
The graphical representation of each fault scenario in Table 1 is shown in Figure 2 and Figure 3.
The next step of our research was to prepare the data for simulations. The raw datasets used in this study consist of three columns:
time, which contains the simulation timestamps;
DO_sensor, the values of the sensor readings at each timestamp;
label, which indicates the operating conditions of the DO sensor, namely “0” for normal data and “1” for faulty data.
For all three hybrid networks, we used only the second and third columns: the DO_sensor column was extracted as a one-dimensional vector $x$, and the label column as another vector $y$. The values from the DO_sensor column were standardized using StandardScaler, a preprocessing tool from the scikit-learn library [29] that applies Z-score normalization by subtracting the mean and dividing by the standard deviation of the training data. This was performed to ensure stable numerical behavior during training, and was computed using the following equation [30,31]:
$$z_i = \frac{x_i - \mu}{\sigma},$$
where $x_i$ indicates the original DO sensor value at time index $i$, $\mu$ represents the mean of all values in the training dataset, $\sigma$ is the standard deviation computed over the training dataset and the resulting $z_i$ is the normalized value of the sensor measurement. The time series was then segmented into non-overlapping windows of fixed length equal to 50, and the number of complete windows is given by
$$N_w = \left\lfloor \frac{N}{50} \right\rfloor,$$
where $N$ is the number of samples in the time series and $\lfloor \cdot \rfloor$ indicates the integer part, rounding down, of the number. We reshaped the data into a three-dimensional array $X$ of shape $(N_w, 50, 1)$ and a corresponding label array $Y$ so that, for each window, the assigned label is that of the last time step in the window, which reflects the presence or absence of faults at the end of the segment.
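The standardization and windowing steps can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the study's code: the helper names `standardize` and `make_windows` are hypothetical, and the population standard deviation is used because that is what scikit-learn's StandardScaler computes.

```python
import statistics


def standardize(values, train_values):
    """Z-score values using the mean and population std of the training data."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)  # ddof=0, as in StandardScaler
    return [(v - mu) / sigma for v in values]


def make_windows(values, labels, length=50):
    """Cut the series into non-overlapping windows; label = last time step."""
    n_windows = len(values) // length        # incomplete tail is discarded
    windows = [values[i * length:(i + 1) * length] for i in range(n_windows)]
    window_labels = [labels[(i + 1) * length - 1] for i in range(n_windows)]
    return windows, window_labels


# Toy series: 120 samples, the last 21 of which are faulty.
series = [float(i) for i in range(120)]
flags = [0] * 99 + [1] * 21
windows, window_labels = make_windows(standardize(series, series), flags)
```

With 120 samples and a window length of 50, two complete windows are produced and the remaining 20 samples are dropped; the second window is labeled faulty because its last time step is faulty.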
The next step is partitioning the dataset into training and testing subsets. For the resulting dataset of windows and labels, we used a split of 75% for training and 25% for testing, with stratification on the labels to preserve the class distribution, implemented with train_test_split from scikit-learn [32]. During model training, an additional validation subset of 20% of the training data was created internally; as a result, approximately 60% of the data were used for training, 15% for validation and 25% for final testing [33]. The same preprocessing and splitting procedure was applied to all three hybrid networks to ensure a fair and comprehensive comparison.
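The resulting 60/15/25 proportions can be checked with a small stratified-split sketch. The function `stratified_split` is a hypothetical stand-in for scikit-learn's train_test_split, and taking the validation subset from the end of the training indices is an assumption modeled on how a Keras-style validation split behaves.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=42):
    """Split indices so that each class keeps the same share in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy label vector: 80 normal windows and 20 faulty windows.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

# Assumed: the last 20% of the training portion is held out for validation.
n_val = round(len(train_idx) * 0.2)
fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]
```

For 100 windows this gives 60 for fitting, 15 for validation and 25 for testing, that is, 0.75 × 0.8 = 0.60 and 0.75 × 0.2 = 0.15 of the full dataset.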
After preprocessing and splitting, the data were reshaped according to the requirements of each architecture used in the hybrid networks, as follows:
FF-NN + RBF received each window as a flattened vector of length 50, which is consistent with traditional feedforward modeling practices that operate on fixed-length feature vectors [34];
C-NN + RBF processed each window as a one-dimensional sequence of shape (50, 1), that is, 50 time steps with a single feature channel, which allows the convolutional filters to detect abrupt changes, localized peaks or distortions characteristic of the faults injected in the DO sensor [35];
LSTM-NN + RBF operated on three-dimensional input of shape (number of windows, 50, 1), which preserves the sequential nature required for capturing temporal dependencies in the sensor trajectories [36].
In all three models, an intermediate feature extractor (FF-NN, C-NN or LSTM-NN accordingly) was first trained in an unsupervised manner in order to produce latent feature vectors for the training windows, which were used to initialize the RBF layer, a technique used in other papers as well [37]. Specifically, the feature extractor was applied to the training input values to obtain an array of vectors. Then, the KMeans clustering algorithm [38] with 20 clusters was used to determine the initial RBF centers in the feature space, with the learned cluster centers being stored as a trainable Keras variable and used by the custom RBF layer in the final hybrid model.
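The center-initialization idea can be illustrated with a minimal version of Lloyd's k-means algorithm. This sketch is not scikit-learn's KMeans: the function `kmeans` is hypothetical, and it uses a simple deterministic initialization (evenly spaced data points) instead of the k-means++ seeding that scikit-learn applies by default.

```python
def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns k cluster centers as tuples."""
    # Assumed initialization: k evenly spaced points from the data.
    centers = [tuple(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        # New center = mean of assigned points; an empty cluster keeps its center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


# Toy "latent features": two well-separated groups in a 2-D feature space.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rbf_centers = kmeans(features, k=2)
```

In the hybrid models, the same procedure would run on the latent vectors produced by the feature extractor with 20 clusters, and the resulting centers would serve as the starting values of the RBF layer.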
Each input sequence of length 50 is assigned a single label corresponding to the last time step of the sequence. This approach simplifies sequence labeling and enables the model to learn temporal dependencies leading up to faults. However, it may introduce a slight bias: the early samples in each time window may not yet contain fault information, but the sequence is labeled as faulty if the last step contains a fault. This could lead to marginally inflated performance metrics during early fault detection. We note that this effect is limited by the relatively short window length and the persistence of faults in the considered scenarios, but it remains an important consideration when interpreting the results.
The next step in our research was building the hybrid neural networks. All three models combine DL feature extractors with an RBF component. This combination is possible because the strong nonlinear approximation capabilities for which the RBF network is known complement the specific characteristics of each DL method.
The FF-NN + RBF hybrid network is thus composed of a DL component in the form of a Feedforward Neural Network that receives an input vector of length equal to 50. The network starts with an input layer of size 50, followed by a dense layer with 64 neurons and ReLU activation function regularized with an L2 penalty of 0.001 to reduce overfitting, and a batch normalization layer that is applied in order to stabilize the activations and accelerate training. These are followed by a second dense layer with 32 neurons and ReLU activation function, also with L2 regularization. The output of this second dense layer forms a 32-dimensional latent feature vector that summarizes the global nonlinear characteristics of the input window, which are then used to initialize and feed the RBF component. The RBF layer is composed of 20 Gaussian radial basis neurons, with the initial centers being obtained by applying the KMeans clustering algorithm to the FF-NN feature vectors. In the hybrid model, the RBF outputs are passed through a dense layer with 64 neurons and ReLU activation function, also with L2 regularization, followed by a batch normalization, a dropout layer with a rate of 0.3, and another dense layer with 32 neurons and ReLU activation function. The output layer is a single neuron with sigmoid activation function, which provides the probability of the window being anomalous. The architecture of the FF-NN + RBF hybrid network can be observed in
Figure 4a.
The C-NN + RBF hybrid network processes each input window as a one-dimensional sequence of 50 time steps with a single feature channel. The DL component is thus a one-dimensional Convolutional Neural Network that starts with an input layer of size 50, followed by a first convolutional layer with 32 filters, kernel size 3 and ReLU activation function, and a max-pooling layer with pool size 2, which reduces the sequence length and focuses on the most important activations. The second convolutional layer, which uses 64 filters with kernel size 3 and ReLU activation function, is followed by another max-pooling layer with pool size 2, and the resulting feature maps are flattened into a single feature vector that is used as input to the RBF component. Again, 20 RBF neurons are defined, with their centers obtained by applying the KMeans clustering algorithm to the C-NN feature vectors. As in the FF-NN + RBF hybrid network, the RBF outputs are passed through a dense layer with 64 neurons and ReLU activation function, also with L2 regularization, followed by batch normalization, a dropout layer with a rate of 0.3, and another dense layer with 32 neurons and ReLU activation function. The output layer is a single neuron with sigmoid activation function, which provides the probability of the window being anomalous. The architecture of the C-NN + RBF hybrid network can be observed in Figure 4b.
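The size of the flattened C-NN feature vector follows from the layer settings above. The short calculation below is a sketch under assumptions that the text does not state: a stride of 1 for the convolutions, a pooling stride equal to the pool size, and either "valid" (no padding) or "same" padding.

```python
def conv1d_length(n, kernel, padding="valid"):
    """Output length of a stride-1 1-D convolution."""
    return n - kernel + 1 if padding == "valid" else n


def pool1d_length(n, pool):
    """Output length of max pooling with stride equal to the pool size."""
    return n // pool


def cnn_feature_dim(steps=50, padding="valid"):
    """Length of the flattened vector after conv-pool-conv-pool."""
    n = conv1d_length(steps, 3, padding)  # first convolution, 32 filters
    n = pool1d_length(n, 2)               # first max pooling
    n = conv1d_length(n, 3, padding)      # second convolution, 64 filters
    n = pool1d_length(n, 2)               # second max pooling
    return n * 64                         # flatten: steps x 64 channels


valid_dim = cnn_feature_dim(padding="valid")  # 50 -> 48 -> 24 -> 22 -> 11
same_dim = cnn_feature_dim(padding="same")    # 50 -> 50 -> 25 -> 25 -> 12
```

Under the "valid" assumption the RBF component would receive a 704-dimensional vector (11 × 64), and under "same" padding a 768-dimensional one (12 × 64), in contrast to the 32-dimensional vectors of the other two extractors.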
The LSTM-NN + RBF hybrid network is designed to capture the temporal evolution of the DO sensor signal within each window, and the DL component consists of two stacked LSTM layers. The first layer has 64 units and is configured with return_sequences=True, so that it produces a full sequence of hidden states and allows the second layer, which has 32 units, to process the entire temporal context. The second LSTM layer returns the final hidden state and yields a 32-dimensional temporal embedding that encodes the most relevant dynamic information in the window, and this becomes the input of the RBF component. As in the other hybrid models, the RBF layer consists of 20 neurons, with their centers obtained by applying the KMeans clustering algorithm to the LSTM-NN feature vectors, so that the radial centers are well positioned in the temporal feature space. Again, the RBF outputs are passed through a dense layer with 64 neurons and ReLU activation function, also with L2 regularization, followed by batch normalization, a dropout layer with a rate of 0.3, and another dense layer with 32 neurons and ReLU activation function. The output layer is a single neuron with sigmoid activation function, which provides the probability of the window being anomalous. The architecture of the LSTM-NN + RBF hybrid network can be observed in Figure 4c.
The design of the hybrid network architectures was guided by the need to combine the strengths of DL feature extraction with the localized approximation capabilities of RBF layers. In the FF-NN + RBF model, fully connected layers extract global nonlinear features from the flattened sequences, which are then transformed by an RBF layer with centers initialized through k-means clustering on the extracted features. The C-NN + RBF model leverages convolutional layers to extract spatially local patterns from sequences, while the RBF layer captures complex feature interactions for fault detection. Similarly, the LSTM-NN + RBF model uses LSTM layers to extract temporal dependencies from the sequences, with the RBF layer providing nonlinear mapping of the learned representations. Across all architectures, the RBF layer serves to enhance the capabilities of the hybrid models by combining feature-driven clustering with subsequent dense layers for classification. The number of RBF units, placement of the RBF layer after feature extraction and use of dense layers with dropout and batch normalization serve to maximize performance while maintaining generalization.
The hybrid models leverage a synergistic interaction between the deep feature extractor and the RBF layer. The DL methods, namely FF-NN, C-NN and LSTM-NN, first map the input sequences into a high-level feature space that encodes relevant patterns for fault detection. The RBF layer then performs a localized nonlinear mapping of these features, with each neuron responding strongly to inputs near its cluster center. This mechanism allows the model to emphasize prototypical feature patterns, effectively enhancing class separability. The subsequent dense layers integrate these RBF responses to produce the final classification. Thus, the RBF layer complements the DL feature extractor by combining global feature representation with localized pattern sensitivity, improving both robustness and accuracy. This interaction is illustrated in Figure 5 below.
Although not the primary focus of this study, the RBF layer offers a degree of structural transparency, as each radial basis neuron corresponds to a center in the learned feature space. Model predictions are influenced by the similarity between input features and these learned prototypes. Future work may explore visualization of feature clusters to further analyze interpretability.
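The localized response of Gaussian radial basis neurons can be made concrete with a short sketch. The exact parameterization of the custom RBF layer is not given in the text, so the form exp(-gamma * squared distance) and the single shared width parameter `gamma` are assumptions.

```python
import math


def rbf_layer(feature, centers, gamma=1.0):
    """Gaussian RBF activations: one value in (0, 1] per center."""
    return [
        math.exp(-gamma * sum((f - c) ** 2 for f, c in zip(feature, center)))
        for center in centers
    ]


# Two hypothetical prototypes in a 2-D feature space.
centers = [(0.0, 0.0), (3.0, 4.0)]
activations = rbf_layer((0.0, 0.0), centers)

# The most active neuron identifies the closest learned prototype.
closest = max(range(len(centers)), key=lambda j: activations[j])
```

A feature vector that coincides with a center gives an activation of exactly 1, while the activation decays toward 0 as the distance grows, which is why inspecting the most active neurons indicates which prototype a window resembles.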
The three hybrid networks were trained using a consistent set of hyperparameters in order to ensure comparability across architectures. The main hyperparameters used during training are summarized in Table 2 and Table 3. Table 2 includes the sequence length, the number of RBF neurons, loss function, optimizer, batch size, validation split, as well as the hyperparameters specific to early stopping and the learning rate. Table 3 lists the final learning rate obtained after training, as well as the number of epochs effectively trained for each hybrid model on each fault scenario considered.
The hyperparameters for all three hybrid models were selected to ensure stable and comparable training across the considered fault scenarios. The sequence length of 50 was chosen to capture sufficient temporal context from the sensor signals while balancing information content with computational efficiency. The number of RBF units was set to 20 based on preliminary experiments, which showed that this configuration helps reduce overfitting. The models were trained using the Adam optimizer with an initial learning rate of 0.0005, which ensured efficient convergence across the different hybrid architectures. Early stopping with a patience of 15 epochs and learning-rate reduction on plateau (factor 0.5, patience 5 and minimum learning rate 1 × 10⁻⁵) were applied to adaptively terminate training and adjust the learning rate based on validation loss, preventing overfitting while allowing sufficient training. The final learning rates and numbers of effective training epochs for each hybrid network across all fault scenarios reflect the dynamic adjustment of the learning process and confirm the robustness of these hyperparameter choices.
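How early stopping and learning-rate reduction interact can be traced with a simplified simulation. This sketch only mimics the general logic of such callbacks using the stated values (initial rate 0.0005, factor 0.5, patience 5, minimum 1 × 10⁻⁵, early-stopping patience 15); the function `simulate_schedule` is hypothetical and ignores details such as minimum-improvement thresholds and cooldown periods.

```python
def simulate_schedule(val_losses, lr=5e-4, factor=0.5, lr_patience=5,
                      min_lr=1e-5, stop_patience=15):
    """Return (epochs_trained, final_lr) for a given validation-loss history."""
    best = float("inf")
    since_best = 0    # epochs since the last improvement (early stopping)
    since_reduce = 0  # non-improving epochs since the last rate change
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_reduce = loss, 0, 0
        else:
            since_best += 1
            since_reduce += 1
            if since_reduce >= lr_patience:
                lr = max(lr * factor, min_lr)  # halve, but never below min_lr
                since_reduce = 0
        if since_best >= stop_patience:
            return epoch, lr
    return len(val_losses), lr


# Validation loss improves for three epochs, then stays on a plateau.
history = [1.0, 0.9, 0.8] + [0.85] * 30
epochs_trained, final_lr = simulate_schedule(history)
```

For this toy history, training stops at epoch 18 (15 epochs after the best loss at epoch 3), by which point the learning rate has been halved three times to 6.25 × 10⁻⁵. This is the kind of scenario-dependent final rate and epoch count that a per-scenario summary of training would record.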
To finalize this section, it is to be noted that generative Artificial Intelligence tools (such as ChatGPT (version GPT-5, OpenAI, San Francisco, CA, USA), Quillbot Paraphrasing Tool (version 2025, QuillBot Inc., Chicago, IL, USA) or Grammarly (version 2025, Grammarly Inc., San Francisco, CA, USA)) were used solely to assist in drafting, rephrasing and refining the textual context of this manuscript. These tools were used to improve clarity throughout the paper, as well as summarize some of the related works cited in the paper. No generative Artificial Intelligence tools were employed for manipulating or generating the scientific results obtained and presented in this paper.
3. Results
This section presents the results obtained from evaluating the three hybrid networks (FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF). Recall that the three hybrid models were designed to detect mechanical faults injected into the DO sensor of the BSM2 benchmark model. The purpose is to identify which of the three is the most effective for anomaly detection in WWTPs, with a view to real-time use in operating plants.
3.1. Classification Metrics
All three models were assessed using a comprehensive set of classification metrics, namely accuracy (A), precision (P), recall (R), F1-score (F1-S), balanced accuracy (BA), Cohen’s Kappa (CK), Matthews Correlation Coefficient (MCC), and the areas under the Receiver Operating Characteristic curve (ROC-AUC) and the Precision–Recall curve (PR-AUC). These metrics are defined from the entries of the confusion matrix, which organizes prediction outcomes into four distinct categories [39]:
True Positive (TP), when the model correctly detects a fault;
False Negative (FN), when the model fails to detect an anomaly;
False Positive (FP), when the model signals a fault although no fault is present;
True Negative (TN), when the model correctly recognizes normal sensor operation.
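As a minimal sketch of how these four counts are obtained, the function below tallies them from two equally long label sequences. The encoding (1 for a fault, 0 for normal operation) and the function name `confusion_counts` are assumptions made for this illustration, and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, with 1 = fault, 0 = normal."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1      # fault correctly detected
        elif actual == 1 and predicted == 0:
            fn += 1      # fault missed
        elif actual == 0 and predicted == 1:
            fp += 1      # false alarm
        else:
            tn += 1      # normal operation correctly recognized
    return tp, fn, fp, tn


# Hypothetical labels, for illustration only.
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
```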
Accuracy measures the proportion of correct classifications out of all observations. In the context of DO sensor monitoring, it gives a general indication of how reliably the hybrid model identifies both normal and anomalous data. The accuracy is computed using the equation below [40]:
$$A = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision evaluates the reliability of fault alarms. A model with high precision generates few false alarms, meaning that when it reports a fault, the report is usually correct. This is highly important in WWTPs, because too many false alarms may result in unnecessary maintenance activities and interventions. The precision is computed using the equation below [41]:
$$P = \frac{TP}{TP + FP}$$
Recall quantifies the ability of the hybrid models to detect actual faults. It is critical in the context of WWTPs because false negatives correspond to undetected anomalies, in which case the aeration controller may operate on corrupted readings, leading to further problems in the plant. The recall is computed using the equation below [42]:
$$R = \frac{TP}{TP + FN}$$
F1-score provides a balanced view of precision and recall, being especially meaningful for anomaly detection, because both undetected faults (FN) and false alarms (FP) may have negative consequences for the operation of a WWTP. The F1-score is computed using the equation below [43]:
$$F1\text{-}S = \frac{2 \cdot P \cdot R}{P + R}$$
Balanced accuracy provides an unbiased assessment when class distributions are unequal, giving equal weight to the classifier’s ability to recognize faulty and normal operating conditions. It is very useful in the case of the BSM2 benchmark model, which produces more normal data values than abnormal ones. The balanced accuracy is computed using the equation below [44]:
$$BA = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$
Cohen’s Kappa measures the agreement between predicted and actual states, adjusted for chance, and is computed using the equation below [45]:
$$CK = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the observed accuracy and $p_e$ is the expected agreement. The higher the value of this coefficient, the higher the probability that the hybrid model is capturing underlying fault patterns, rather than relying on class imbalance.
The Matthews Correlation Coefficient considers all four values in the confusion matrix and remains reliable even when fault instances are rare. In the context of fault detection in WWTPs, this coefficient reflects the model’s sensitivity to anomalies, as well as its resistance to generating false alarms. The Matthews Correlation Coefficient is computed using the equation below [46]:
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
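The threshold-based metrics described so far (accuracy, precision, recall, F1-score, balanced accuracy, Cohen’s Kappa and MCC) can be gathered into one small helper that works directly on the four confusion-matrix counts. This is a sketch using the standard textbook definitions; the function name `classification_metrics`, the convention of returning 0 when a denominator is zero, and the example counts are assumptions made for illustration and are not results from this study.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    n = tp + fn + fp + tn
    accuracy = ratio(tp + tn, n)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)            # true positive rate
    specificity = ratio(tn, tn + fp)       # true negative rate
    f1 = ratio(2 * precision * recall, precision + recall)
    balanced_accuracy = 0.5 * (recall + specificity)
    # Expected chance agreement for Cohen's Kappa.
    p_e = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - p_e, 1.0 - p_e)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"A": accuracy, "P": precision, "R": recall, "F1-S": f1,
            "BA": balanced_accuracy, "CK": kappa, "MCC": mcc}


# Hypothetical counts, for illustration only.
for name, value in classification_metrics(tp=40, fn=10, fp=5, tn=45).items():
    print(f"{name}: {value:.4f}")
```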
The Receiver Operating Characteristic curve plots the TP rate against the FP rate across all classification thresholds. The area under this curve, namely ROC-AUC, thus measures the model’s ability to differentiate between faulty and normal sensor readings, with a high ROC-AUC value indicating that the classifier ranks true faults higher than normal data samples [47].
The Precision–Recall curve focuses on the positive (fault) class by plotting precision versus recall. In fault detection, PR-AUC reflects the model’s ability to detect critical anomalies without producing too many false alarms, with a high value of PR-AUC indicating that the classifier sustains high precision even as recall increases [48].
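Both ranking metrics can be computed from the classifier’s continuous fault scores without fixing a threshold. The sketch below uses two standard estimators: ROC-AUC as the fraction of (fault, normal) pairs in which the fault receives the higher score, with ties counted as one half, and average precision as a common step-wise estimate of the area under the Precision–Recall curve. The function names and the sample scores are hypothetical, and the estimator used to produce the values in this paper is not specified here.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a fault outranks a normal sample."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise PR-AUC estimate: mean precision at the rank of each fault."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank     # precision at this recall level
    return total / n_pos


# Hypothetical labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be preferable for long recordings.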
3.2. Simulation Results
The three hybrid networks, namely FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF, were simulated on all the fault scenarios presented in Table 1, and for all three of them, we collected the confusion matrices, as well as the values for the classification metrics presented in the previous subsection. Table 4 presents the confusion matrices for the three hybrid models, and Table 5 contains the values for the chosen set of classification metrics.
To provide an overall quantitative comparison independent of individual fault scenarios, the average values of all performance metrics were computed across the ten considered scenarios for each hybrid model. This analysis allows a global assessment of the three hybrid networks considered in this paper. As reported in Table 6 and as visually observed in Figure 16, the LSTM-NN + RBF hybrid achieves the highest average performance for most evaluation metrics, particularly in terms of accuracy, F1-score, MCC, Cohen’s Kappa and balanced accuracy. These results confirm the superior and more consistent fault detection capability of the proposed LSTM-NN + RBF hybrid architecture when compared to the other two hybrid models, FF-NN + RBF and C-NN + RBF.
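The averaging step amounts to an unweighted mean of each metric over the scenarios, so every scenario contributes equally regardless of how many samples it contains. A minimal sketch is given below; the function name `average_metrics` and the two placeholder rows are hypothetical and do not reproduce values from Table 5 or Table 6.

```python
from statistics import mean


def average_metrics(per_scenario):
    """Unweighted mean of each metric over a list of per-scenario dicts."""
    if not per_scenario:
        raise ValueError("at least one scenario is required")
    names = per_scenario[0].keys()
    return {name: mean(row[name] for row in per_scenario) for name in names}


# Placeholder values for two scenarios (not taken from the paper).
rows = [
    {"F1-S": 0.90, "MCC": 0.80},
    {"F1-S": 0.70, "MCC": 0.60},
]
print(average_metrics(rows))
```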
3.3. Training and Validation Performance
Figures 17–46 present the evolution of the training and validation accuracy, loss, ROC-AUC and PR-AUC over successive epochs. Monitoring accuracy and loss during training provides valuable insight into how each model learns the patterns that distinguish normal from faulty sensor behavior, whereas the ROC-AUC and PR-AUC curves reveal how stable the hybrid networks are.
3.4. Analysis of Performance
The comparative results reveal clear performance differences among the three hybrid architectures depending on the fault characteristics.
The Bias scenario represents a static shift in the DO sensor, with minimal temporal evolution. Under this condition, all three hybrid models achieve near-perfect performance (100% accuracy for FF-NN + RBF and LSTM-NN + RBF, and 99.66% for C-NN + RBF), indicating that temporal modeling is not essential when discriminative information is primarily amplitude-based. The RBF layer successfully separates the feature representations in all cases.
The Drift scenario involves gradual temporal evolution. Here, the FF-NN + RBF model shows a substantial recall drop (61.02%) and F1-score of 75.79%, whereas C-NN + RBF and LSTM-NN + RBF achieve F1-scores above 94%. This confirms that architectures incorporating temporal memory, such as the LSTM-NN, or local temporal filters, such as the C-NN, are better suited for progressive faults. The LSTM-NN + RBF achieves the highest balanced accuracy (95.76%) and MCC (94.66%), reflecting superior minority-class detection under evolving patterns.
The Spike scenario corresponds to abrupt and short-duration deviations. In this case, LSTM-NN + RBF achieves perfect classification, while C-NN + RBF also performs strongly (F1 = 90.48%). The FF-NN + RBF exhibits lower precision (79.17%), suggesting higher false positives. The convolutional filters of the C-NN component effectively capture localized transient behavior, while the LSTM-NN benefits from short-term sequential sensitivity.
The PD scenario introduces more complex dynamics and partial separability. All models experience performance degradation, but temporal models remain superior. LSTM-NN + RBF achieves the highest ROC-AUC (97.88%) and PR-AUC (96.69%), indicating better ranking ability under moderate imbalance.
The Saturation scenario represents the most challenging condition, with severe class imbalance and overlapping distributions. Here, the limitations of the FF-NN + RBF become evident: MCC drops to 30.19% and F1-score to 56.36%, despite an accuracy of 67.24%. This discrepancy highlights the misleading nature of accuracy under imbalance. In contrast, LSTM-NN + RBF improves MCC to 62.03% and balanced accuracy to 80.29%, demonstrating improved minority-class discrimination. The RBF layer alone is insufficient when the extracted features lack temporal richness, explaining the low values for the FF-NN + RBF hybrid.
In combined fault scenarios (All faults scenario 1–4), performance differences become more pronounced. In All faults scenario 1, FF-NN + RBF records an F1-score of 75%, while both C-NN + RBF and LSTM-NN + RBF exceed 93%, indicating that simultaneous fault detection requires richer temporal representation. The most severe degradation appears in All faults scenario 4. FF-NN + RBF collapses to an F1-score value of 42.62% and MCC of 36.62%, while LSTM-NN + RBF maintains an F1-score of 89.86% and MCC of 88.51%. This substantial gap demonstrates that architectures without temporal memory struggle under high class overlap and imbalance. The internal state mechanism of the LSTM-NN component enables better discrimination of minority classes across longer contexts.
Therefore, across imbalance-heavy scenarios, such as the Saturation scenario and All faults scenario 1–4, accuracy remains relatively high compared to MCC and balanced accuracy, particularly for FF-NN + RBF. For example, in Saturation, FF-NN + RBF achieves 67.24% accuracy but only 30.19% MCC, indicating poor true class correlation. This confirms that MCC and balanced accuracy provide a more reliable assessment under imbalance conditions. Temporal hybrids consistently achieve higher MCC and balanced accuracy values, demonstrating greater robustness to skewed class distributions. The improvement is particularly visible in LSTM-NN + RBF, which maintains stronger recall without excessive precision loss.
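The gap between accuracy and the imbalance-aware metrics can be reproduced with a small worked example. The counts below are hypothetical (1000 samples, of which 50 are faults, and a classifier that labels almost everything as normal) and are not taken from the confusion matrices of this study; they only illustrate why a high accuracy can coexist with a low MCC and a balanced accuracy close to 0.5.

```python
import math

# Hypothetical, strongly imbalanced confusion matrix (not from the paper).
tp, fn, fp, tn = 5, 45, 5, 945

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)              # only 10% of the faults are found
specificity = tn / (tn + fp)
balanced_accuracy = 0.5 * (recall + specificity)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy          = {accuracy:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
print(f"MCC               = {mcc:.4f}")
```

In this example the accuracy is dominated by the 945 correctly classified normal samples, while balanced accuracy and MCC both expose the 45 missed faults.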
The RBF layer enhances nonlinear separability in the learned feature space for all architectures. However, its effectiveness depends on the quality of the extracted features. When temporal dynamics are essential (Drift, PD, multi-fault scenarios), the LSTM-NN feature extractor provides more structured representations, allowing the RBF layer to construct smoother and more discriminative decision boundaries. When features lack temporal richness, as is the case for the FF-NN + RBF hybrid in complex scenarios, the RBF mapping cannot fully compensate for insufficient representation.
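To make the dependence on feature quality concrete, the sketch below evaluates a layer of Gaussian radial basis units, each responding according to the distance between the incoming feature vector and its center. The Gaussian form, the shared width parameter `gamma`, the function name `rbf_layer` and the two-dimensional example are assumptions chosen for illustration; the exact layer definition used in the hybrids is the one given in the methodology section.

```python
import math


def rbf_layer(features, centers, gamma=1.0):
    """Gaussian RBF activations exp(-gamma * ||x - c||^2), one per center."""
    activations = []
    for center in centers:
        sq_dist = sum((x - c) ** 2 for x, c in zip(features, center))
        activations.append(math.exp(-gamma * sq_dist))
    return activations


# Hypothetical 2-D feature vectors and two centers (illustration only).
centers = [[0.0, 0.0], [3.0, 4.0]]
print(rbf_layer([0.0, 0.0], centers))   # on the first center
print(rbf_layer([1.5, 2.0], centers))   # halfway between both centers
```

A feature vector that lies on a center activates that unit fully and the distant unit almost not at all, whereas a vector equidistant from both centers activates them equally. If the feature extractor maps faulty and normal windows to nearby points, no placement of centers can separate them, which is the limitation noted for the FF-NN + RBF hybrid.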
The results indicate that:
FF-NN + RBF is adequate for static or easily separable faults;
C-NN + RBF provides strong performance for localized or moderately dynamic faults;
LSTM-NN + RBF offers the most consistent and robust performance across dynamic, imbalanced, and multi-fault conditions.
Thus, performance differences are directly linked to the interaction between fault temporal characteristics, class imbalance severity, and the architectural capacity for sequential modeling.
4. Discussion
The results presented in the previous section show that the three hybrid algorithms, namely FF-NN + RBF, C-NN + RBF and LSTM-NN + RBF, can detect mechanical faults with high accuracy. However, the values obtained for the classification metrics considered in this paper, and presented in Section 3.2, show that the LSTM-NN + RBF model outperforms the other two hybrid models across nearly all metrics. Moreover, the curves of training and validation accuracy, loss, ROC-AUC and PR-AUC over successive epochs indicate that the LSTM-NN + RBF hybrid network is the most stable. In contrast to the LSTM-NN + RBF and C-NN + RBF hybrids, the FF-NN + RBF model almost always recorded the lowest performance across nearly all evaluation metrics for the considered anomaly scenarios. This underperformance can be attributed to the limited capacity of the FF-NN component when dealing with sequential sensor data, where temporal evolution carries essential diagnostic information. The simulation results show that the C-NN and LSTM-NN components are better equipped to handle such tasks.
To evaluate the contribution of the RBF layer, we note that our previous studies have systematically analyzed the corresponding pure FF-NN, C-NN and LSTM-NN architectures under similar experimental conditions (scenarios comprising mechanical faults injected in the DO sensor of BSM2) [17,19,20,21]. These studies demonstrated that while the standalone DL models achieve competitive performance in single-fault scenarios, their ability to discriminate complex anomaly scenarios is limited. In contrast, the proposed hybrid architectures, which integrate DL feature extraction with the RBF classifier, consistently improve performance metrics such as F1-score, Matthews Correlation Coefficient and PR-AUC in multi-fault scenarios. This indicates that the RBF component provides robustness in challenging anomaly detection cases that the standalone models do not achieve. Therefore, although the pure DL models perform adequately in simple scenarios, the hybrid approach offers clear advantages for real-world, complex fault detection tasks.
The superior performance of the LSTM-NN + RBF model emphasizes the idea that algorithms that are capable of modeling long-range temporal dependencies are better suited for anomaly detection in the context of WWTPs, especially when it comes to sensors such as the DO sensor, where faults develop progressively over time. Thus, the LSTM-NN component provides a more informative representation for the RBF classifier, as opposed to other neural network architectures that are based solely on spatial or global nonlinear features. Similar observations have been widely reported in the literature, where LSTM-based models consistently outperform feedforward and convolutional architectures in anomaly detection tasks due to their gated memory structure and ability to retain long-term contextual information [36,49,50]. The improved performance of the C-NN + RBF hybrid relative to the FF-NN + RBF model further highlights the importance of structured feature extraction in time-series analysis. While convolutional neural networks are effective in capturing local temporal patterns and short-term correlations within sensor signals, they remain limited in modeling long-term dependencies. Consequently, their performance in scenarios characterized by progressive fault evolution remains inferior to that of the LSTM-based hybrid, as also reported in previous industrial monitoring studies [50,51].
In addition to the benefits of temporal modeling, the integration of a radial basis function (RBF) classifier plays a crucial role in enhancing classification performance. RBF networks are well known for their ability to construct localized nonlinear decision boundaries, which improves class separability in complex and potentially overlapping feature spaces [51,52]. This property is particularly advantageous in imbalanced fault detection problems, where minority fault classes are often difficult to discriminate. Previous studies have demonstrated that hybrid architectures combining DL feature extractors with RBF- or kernel-based classifiers can significantly improve robustness and generalization in industrial fault diagnosis applications [52,53]. The results obtained in this study further support these findings, showing that the combination of deep temporal feature extraction and localized nonlinear classification yields superior fault detection performance.
Unlike most existing studies on fault detection in WWTPs, which primarily rely on standalone ML or DL models, the proposed work introduces and systematically evaluates hybrid architectures that combine deep feature extractors with a radial basis function (RBF) classifier. Previous works have demonstrated the effectiveness of conventional ML or DL approaches individually; however, they often suffer from limited separability in the learned feature space or require large volumes of training data to achieve robust performance. In contrast, the proposed hybrid models exploit the feature-extraction capabilities of the FF-NN, C-NN and LSTM-NN architectures while leveraging the localized nonlinear decision boundaries provided by the RBF layer, leading to improved fault discrimination.
Furthermore, this study differs from existing literature by performing a comprehensive and scenario-based evaluation on multiple mechanically induced fault types, including both single-fault and multi-fault conditions, and by assessing performance using an extensive set of classification metrics under both balanced and imbalanced scenarios. This combination of hybrid modeling, fault diversity and rigorous metric-based evaluation has not been jointly addressed in previous studies and represents a key contribution of the present work.
From a computational perspective, the three proposed hybrid architectures exhibit different levels of complexity. The FF-NN + RBF model has the lowest computational cost due to its simple feedforward structure and limited number of trainable parameters, resulting in fast training and inference. The C-NN + RBF hybrid presents a moderate computational cost, as convolutional operations increase training time; however, the use of shared weights and parallel computation makes this architecture efficient during inference. The LSTM-NN + RBF hybrid is the most computationally demanding, owing to the sequential processing and gated memory mechanisms of the LSTM-NN component, which require increased training time and memory resources. Nevertheless, the RBF classifier itself introduces negligible additional computational overhead compared to the deep feature extraction stage.
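One ingredient of this comparison, the number of trainable parameters per layer, follows from standard closed-form expressions for fully connected, one-dimensional convolutional and LSTM layers. The sketch below evaluates them; the function names and all layer sizes are hypothetical, so the resulting numbers say nothing about the specific models in this paper. Parameter count is also only part of the cost: an LSTM layer must process the time steps of each window one after another, which the formulas do not capture.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return n_in * n_out + n_out


def conv1d_params(kernel_size, channels_in, channels_out):
    """1-D convolution: weights are shared across all time steps."""
    return kernel_size * channels_in * channels_out + channels_out


def lstm_params(n_in, n_hidden):
    """LSTM layer: four gates, each with input, recurrent and bias weights."""
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


# Hypothetical layer sizes, for illustration only.
print(dense_params(50, 32))        # flattened 50-sample window -> 32 units
print(conv1d_params(3, 1, 32))     # kernel of 3, 1 input channel, 32 filters
print(lstm_params(1, 32))          # 1 feature per time step, 32 hidden units
```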
The computational cost of the proposed hybrids can be balanced against application requirements. Model complexity can be reduced by limiting the number of hidden units, convolutional filters, or LSTM cells, as well as by optimizing the input sequence length. Furthermore, since training can be performed offline, real-time deployment is primarily affected by inference cost, which remains acceptable for all three models. In practice, the FF-NN + RBF hybrid may be preferred for low-resource environments, the C-NN + RBF model for real-time or near real-time monitoring, and the LSTM-NN + RBF hybrid for accuracy-critical fault detection tasks where higher computational cost is justified by superior detection performance.
The proposed hybrid models can be integrated into wastewater treatment plant monitoring systems to provide timely detection of mechanical anomalies in the DO sensor. In practice, the model would continuously process sensor readings and, upon detecting a fault, trigger alerts to operators via the plant’s SCADA system. These alerts can initiate corrective actions, such as adjusting aeration rates or scheduling maintenance interventions for the affected sensor, thereby minimizing process disruption and ensuring reliable plant operation. As noted above, the choice among the three hybrids depends on the available computational resources and on operational priorities. This implementation framework outlines how the proposed approach could be applied beyond simulation-based studies and provides a pathway toward intelligent and automated fault management in WWTPs.
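A minimal sketch of such a monitoring loop is shown below. It keeps the most recent 50 readings (the sequence length used for training), passes each full window to a classifier and invokes a callback when a fault is reported. Everything apart from the window length is hypothetical: `classify` stands in for a trained hybrid model, `on_fault` stands in for whatever interface the plant’s SCADA system exposes, and the readings are synthetic.

```python
from collections import deque


def monitor(readings, classify, on_fault, window=50):
    """Slide a fixed-length window over a stream and report detected faults.

    classify(list_of_values) -> bool is a stand-in for the trained model;
    on_fault(index, value) is a stand-in for the SCADA alert interface.
    Returns the number of alerts raised.
    """
    buffer = deque(maxlen=window)    # oldest reading is dropped automatically
    alerts = 0
    for index, value in enumerate(readings):
        buffer.append(value)
        if len(buffer) < window:
            continue                 # not enough context yet
        if classify(list(buffer)):
            on_fault(index, value)
            alerts += 1
    return alerts


# Synthetic DO readings (mg/L) with an abrupt jump at the end; the threshold
# rule below is a placeholder for the model, not a real detector.
stream = [2.0] * 60 + [9.0] * 3
raised = []
count = monitor(stream,
                classify=lambda w: w[-1] > 8.0,
                on_fault=lambda i, v: raised.append(i))
print(count, raised)
```

Because a decision is only produced once the buffer is full, the first 49 readings after start-up are never classified, which is a design consequence of using fixed-length input sequences.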
Despite the promising results, several limitations of this study should be acknowledged. First, the experimental analysis is based on simulated data generated using the Benchmark Simulation Model No. 2 (BSM2). Although BSM2 is a widely accepted and realistic benchmark, it cannot fully capture all uncertainties, disturbances, and operational variabilities encountered in real wastewater treatment plants. Second, the study focuses exclusively on mechanical faults affecting the DO sensor, and the applicability of the proposed hybrid models to other sensor types or fault categories has not yet been validated. Finally, while the LSTM-NN + RBF hybrid achieves the best detection performance, its higher computational complexity compared to simpler architectures may pose challenges for real-time deployment without appropriate optimization or hardware acceleration.
Nevertheless, this study also highlights several opportunities for future research. Firstly, we plan to extend the fault scenarios by introducing other types of anomalies, such as biological, chemical or hydraulic faults, by injecting them into other types of sensors, such as the pH sensor or the turbidity sensor, and by observing the resulting behavior of BSM2. Secondly, we plan to extend the analysis of the three hybrid models to datasets obtained from real WWTPs, which would include other challenges, such as sensor noise, maintenance artifacts or environmental variability. Finally, we aim to study other possibilities of creating hybrid models, which could include various ML methods combined with DL ones.