1. Introduction
Modern process industries are increasingly required to sustain continuous production under stringent and variable conditions. As system structures become more integrated and working environments more complex, faults are more likely to arise during production. If such events remain undetected, they may compromise production stability, degrade product quality, and increase safety risks. Therefore, accurate fault detection is of great significance for identifying abnormal states at an early stage and ensuring the safe and reliable operation of industrial systems [1].
During the past decades, various fault detection methods have been developed for industrial process monitoring. Traditional model-based methods usually rely on accurate mathematical descriptions or mechanistic knowledge of the monitored system [2]. Although these methods have clear physical meanings and strong interpretability, their application is often limited when dealing with complex industrial processes, since it is difficult to establish accurate mechanistic models for systems with strong nonlinearity, time-varying behavior, and multivariable coupling. In contrast, data-driven fault detection methods do not require explicit mechanistic models and can directly extract useful information from historical process data [3]. Classical data-driven methods mainly include principal component analysis (PCA) [4], partial least squares [5], independent component analysis [6], canonical variate analysis [7], support vector machine, Gaussian mixture model, and k-nearest neighbor methods [8,9]. These methods have been widely applied in process monitoring and have achieved satisfactory performance in many industrial scenarios. Nevertheless, most of them are based on shallow feature representations or statistical assumptions, and their ability to describe complex nonlinear and dynamic characteristics is still limited. When the monitored process contains strong temporal dependence and hidden nonlinear relationships, traditional methods may fail to extract discriminative fault features effectively.
Deep learning has provided an effective modeling framework for fault detection in industrial processes [10]. Unlike conventional models, deep learning approaches can automatically extract hierarchical representations from process data, thereby improving the characterization of complex nonlinear relationships and dynamic behaviors. For example, a cascaded monitoring network named MoniNet was proposed to simultaneously capture temporal dynamic correlations and local spatial correlations, enabling effective anomaly detection in real industrial processes [11]. Recurrent and convolutional architectures were systematically evaluated for early fault detection in the Tennessee Eastman process, showing that deep learning models can improve detection performance while reducing the dependence on manual feature engineering [12]. Bayesian recurrent neural networks were used for chemical process fault detection, enabling nonlinear dynamic modeling while providing uncertainty information for monitoring decisions [13]. Deep recurrent neural networks have also been incorporated into residual control charts for autocorrelated process monitoring and verified using papermaking process data [14]. In addition, recurrent neural networks have been used for sensor fault detection and isolation in nonlinear systems [15]. Compared with conventional recurrent neural networks, long short-term memory (LSTM) networks can better preserve long-term dependencies through gating mechanisms, making them suitable for fault detection in dynamic processes. By combining an LSTM-based attention model with the sequential probability ratio test, early fault warning can be achieved by evaluating the statistical deviation of prediction residuals [16].
In practical industrial scenarios, normal operating data are typically abundant, whereas fault samples remain scarce [17]. Moreover, fault conditions are often diverse and difficult to collect exhaustively, while accurate data labeling requires substantial labor and time. Consequently, modeling approaches that rely heavily on labeled fault data often struggle to meet the practical requirements of industrial process monitoring. In contrast, unsupervised fault detection methods characterize regular process behavior using normal operating data and detect anomalies by measuring the deviation of new observations from the learned reference. Therefore, they are more suitable for industrial applications with limited labeled fault samples. In this context, the autoencoder (AE) has been widely applied to industrial process fault detection because of its clear structure, relatively stable training process, and ability to learn nonlinear feature representations from normal operating data [18,19].
A typical autoencoder consists of an encoder and a decoder. The encoder maps input samples into a low-dimensional latent space, while the decoder reconstructs the original inputs from the latent representations. When trained only on normal operating data, an AE can capture the main features and distribution characteristics of regular process behavior and accurately reconstruct samples within this range. Faulty samples that deviate from this reference usually produce reconstructed outputs that differ significantly from the original inputs. The resulting reconstruction errors are used to construct anomaly scores for fault detection [20]. Various AE-based models have been developed to improve fault detection performance in industrial processes. Deep AE-based feature learning has shown its ability to extract representative process features for process pattern recognition [21]. To capture coexisting linear and nonlinear characteristics, PCA was combined with a stacked autoencoder to enhance fault detection in complex industrial processes [22]. In addition, a sparse autoencoder combined with adaptive slow feature analysis has been applied to fault detection in time-varying processes [23]. For wastewater treatment applications, a stacked denoising autoencoder was employed for sensor validation in real plants and achieved fault detection rates of up to 98% [24]. Moreover, a multistage variational autoencoder (VAE) was designed for wastewater treatment process monitoring by combining stage division with probabilistic latent modeling [25].
However, AE-based fault detection methods generally assume that faulty samples cannot be well reconstructed by a model trained only with normal operating data. This assumption does not always hold in practice. Previous studies have reported that deep autoencoders may generalize well to samples outside the normal distribution and reconstruct some faulty samples with small errors, especially when the fault magnitude is weak or the faulty pattern is close to the normal operating distribution [10,26]. As a result, faulty samples may be incorrectly identified as normal, leading to missed detections. This phenomenon is commonly associated with the overgeneralization problem of AE models [27].
Memory-augmented autoencoders provide a promising strategy for alleviating this problem. Instead of directly using latent features for reconstruction, memory-augmented AE models store representative normal prototypes in an external memory bank. During reconstruction, the latent representation of the current sample is used as a query to retrieve the most relevant memory items, and the retrieved normal prototypes are then used to guide the decoding process. In this way, the model tends to reconstruct samples according to stored normal patterns, thereby limiting its ability to recover faulty samples and increasing the reconstruction discrepancy between normal and faulty conditions [27,28].
Beyond reconstruction constraints, effective fault detection in industrial processes requires explicit modeling of temporal evolution. Since industrial process data usually exhibit temporal dependence and dynamic correlations, insufficient dynamic modeling may reduce the sensitivity to weak or slowly evolving faults [29,30]. Another practical challenge is that industrial processes often operate under variable conditions caused by load fluctuations, setpoint changes, and process adjustments [31,32]. Such variations may shift the statistical characteristics of normal samples and make the boundary of normal operating patterns more difficult to describe accurately [33].
Motivated by these considerations, this paper proposes MI-CVAE, a memory-enhanced and prediction-assisted conditional variational autoencoder for unsupervised fault detection in industrial processes. In the proposed framework, local statistical information is incorporated into the VAE as an auxiliary condition input to better characterize normal operating behavior under varying process conditions. A memory module is embedded in the latent space to store representative normal prototypes and constrain the reconstruction process, thereby alleviating the overgeneralization problem of AE models. To further capture process dynamics, an Informer prediction branch is introduced to learn the temporal evolution of process variables. Reconstruction and prediction errors are then jointly used to construct monitoring statistics for fault detection. The main contributions of this paper are summarized as follows.
(1) A memory-enhanced conditional VAE framework is proposed for unsupervised industrial fault detection. Local statistical information is used to characterize variations in normal operating states, while memory prototypes constrain reconstruction and suppress the excessive recovery of abnormal samples.
(2) An Informer prediction branch is introduced into the reconstruction model to jointly use reconstruction and prediction errors for fault detection. The reconstruction branch measures deviations from normal patterns, while the prediction branch captures abnormal dynamic evolution, thereby improving the detection of weak and dynamic faults.
(3) The effectiveness of the proposed MI-CVAE method is validated on the Benchmark Simulation Model No. 1 (BSM1) wastewater treatment benchmark and a real papermaking process dataset. Experimental results demonstrate that MI-CVAE outperforms the comparison methods while maintaining a low false alarm rate.
2. Dataset Description
2.1. Case 1: BSM1
BSM1 is a standardized simulation platform widely used in the field of wastewater treatment [34], and its system configuration is shown in Figure 1. This platform simulates a typical activated sludge treatment process, consisting of five biological reactors and one secondary clarifier. Internal and external recirculation streams are also incorporated to enable the effective removal of nitrogen and carbon pollutants. To construct process monitoring data, BSM1 was simulated under dry-weather conditions. The influent profile covered 14 consecutive days with a sampling interval of 15 min, yielding a total of 1345 data points. Considering their relevance to effluent quality and operational regulation, 15 key process variables were selected for analysis, including influent flow rate, dissolved oxygen concentration, suspended solids, and various nitrogen-containing component concentrations. Detailed information on these variables is provided in Table 1.
Eight typical fault conditions were constructed on the BSM1 simulation platform. Faults 1–4 are process faults, which were introduced by changing biochemical reaction parameters, settling performance parameters, or actuator output signals, causing the system dynamics to deviate from normal operation. Faults 5–8 are sensor faults, mainly involving abnormal variations in control setpoints or measurement signals, such as bias, drift, and complete failure. These faults are used to assess the model’s detection performance for both process disturbances and measurement abnormalities.
For the process faults, Faults 1 and 2 simulate reduced microbial activity by decreasing the maximum specific growth rates of autotrophic and heterotrophic microorganisms, respectively. Fault 3 represents deterioration of settling performance by reducing the settling velocity in the secondary clarifier. Fault 4 is introduced by increasing the nitrate actuator output signal, resulting in abnormal changes in internal recirculation and nitrogen-related variables. These faults reflect typical abnormalities in biochemical reactions, settling separation, and operational regulation.
Sensor faults are used to simulate measurement abnormalities in monitoring and control loops. Fault 5 corresponds to a shift in the dissolved oxygen controller setpoint, Faults 6 and 7 represent fixed bias and linear drift of the dissolved oxygen sensor, respectively, and Fault 8 denotes complete sensor failure. Since the BSM1 system involves feedback control, sensor faults may not only affect state observation but also propagate to related process variables through the control loop.
The detailed settings and parameter descriptions of the eight faults are summarized in Table 2. To illustrate the dynamic influence of process disturbances, Figure 2 shows the temporal responses of all variables under Fault 1 and compares them with those under normal operating conditions.
2.2. Case 2: Papermaking Process Monitoring Dataset
To validate the applicability and robustness of the proposed fault detection model in practical industrial processes, production data collected from a papermaking enterprise from January to December 2024 were used in this study. The dataset covers four key sections of the papermaking process: the approach flow, wire, press, and drying sections. These sections are sequentially connected and exhibit strong coupling and dynamic transmission among process variables, making them representative for process monitoring. The raw field data were first screened to exclude invalid records associated with production shutdown, operating condition switching, and abnormal missing values. Consequently, 442 valid samples were retained for each process section. The approach flow, wire, press, and drying sections include 21, 10, 27, and 13 process variables, respectively.
In practical industrial processes, severe faults generally occur with low frequency, and field monitoring data often lack sufficient and accurate fault annotations. Therefore, representative abnormal patterns of process variables are commonly constructed based on normal operating data in industrial process monitoring studies to evaluate the identification capability of fault detection methods under different dynamic disturbance conditions. Referring to commonly observed abnormal evolution patterns of variables in industrial process monitoring, four types of faults were constructed in this study, including drift, cycle, scale up, and scale down faults. In combination with the actual operating characteristics of the papermaking process, disturbances were introduced into selected key variables at specified time points to simulate different variation trends that may occur under abnormal operating conditions.
The detailed fault settings, including the fault number, corresponding process section, fault type, and affected variables, are summarized in Table 3. All faults were introduced at the 293rd sample. The model was trained using data collected under normal operating conditions, while the fault data were used for testing and performance evaluation.
Figure 3 compares the variable trajectories under normal and fault-injection conditions for Fault 1, showing the characteristic changes in process variables after fault occurrence.
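As a rough illustration of how the four synthetic fault types could be injected into a normal variable trajectory, the sketch below applies a drift, cycle, scale-up, or scale-down disturbance from a chosen sample onward. The function name `inject_fault`, the `magnitude` and `period` parameters, and the specific disturbance shapes (linear ramp, sinusoid, multiplicative scaling) are assumptions made for illustration; the actual settings used in this study are those listed in Table 3.

```python
import math

def inject_fault(values, kind, start, magnitude=0.5, period=20):
    """Return a copy of `values` with a synthetic fault applied from index `start` onward.

    All disturbance shapes here are illustrative assumptions, not the paper's settings.
    """
    out = list(values)
    ramp_length = max(1, len(out) - start - 1)
    for t in range(start, len(out)):
        k = t - start                                   # steps since fault onset
        if kind == "drift":
            out[t] += magnitude * k / ramp_length       # linear ramp up to `magnitude`
        elif kind == "cycle":
            out[t] += magnitude * math.sin(2.0 * math.pi * k / period)
        elif kind == "scale_up":
            out[t] *= 1.0 + magnitude
        elif kind == "scale_down":
            out[t] *= 1.0 - magnitude
        else:
            raise ValueError(f"unknown fault kind: {kind}")
    return out

normal = [1.0] * 442                                    # placeholder for one normal variable
faulty = inject_fault(normal, "scale_up", start=292)    # 293rd sample has zero-based index 292
drifted = inject_fault(normal, "drift", start=292)
```

Samples before the onset index are left untouched, so the same trajectory provides both the normal period used to count false alarms and the faulty period used to count detections.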
3. Materials and Methods
3.1. Data Preprocessing
The original process data are first standardized, and time-series samples are then constructed using a sliding window, with the next observation used as the prediction target.
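For the $i$-th input window, the condition vector is constructed by concatenating the mean and standard deviation vectors of all variables within the window:
$$\mathbf{c}_i = \left[\bar{\mathbf{x}}_i;\ \mathbf{s}_i\right]$$
where $\bar{\mathbf{x}}_i \in \mathbb{R}^{m}$ and $\mathbf{s}_i \in \mathbb{R}^{m}$ represent the mean and standard deviation vectors of the variables in the window, respectively, and $m$ is the number of variables. Hence, $\mathbf{c}_i \in \mathbb{R}^{2m}$.
For the $p$-th variable, they are calculated as:
$$\bar{x}_i^{(p)} = \frac{1}{L}\sum_{t=1}^{L} x_{i,t}^{(p)}, \qquad s_i^{(p)} = \sqrt{\frac{1}{L}\sum_{t=1}^{L}\left(x_{i,t}^{(p)} - \bar{x}_i^{(p)}\right)^{2}}$$
where $L$ denotes the window length, and $x_{i,t}^{(p)}$ denotes the value of the $p$-th variable at the $t$-th step within the $i$-th window. In this way, the condition vector retains the local level and fluctuation information of each window, thereby providing auxiliary constraints for latent distribution learning.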
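The preprocessing described above can be sketched in a few lines of plain Python. The function name `make_windows` and the toy series are hypothetical, the input is assumed to be already standardized, and the standard deviation is computed in its population form (dividing by the window length), which is an assumption.

```python
import math

def make_windows(series, L):
    """Build sliding windows, next-step targets, and condition vectors.

    series : list of rows, each row holding the (standardized) variables at one time step
    L      : window length
    """
    windows, targets, conditions = [], [], []
    for start in range(len(series) - L):
        window = series[start:start + L]        # L consecutive observations
        target = series[start + L]              # next observation is the prediction target
        n_vars = len(window[0])
        means = [sum(row[p] for row in window) / L for p in range(n_vars)]
        stds = [math.sqrt(sum((row[p] - means[p]) ** 2 for row in window) / L)
                for p in range(n_vars)]
        windows.append(window)
        targets.append(target)
        conditions.append(means + stds)         # condition vector of length 2 * n_vars
    return windows, targets, conditions

# Toy series with 2 variables and 5 time steps
series = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
windows, targets, conditions = make_windows(series, L=3)
```

For the first window, the first variable takes the values 0, 1, 2, so its condition entries are a mean of 1 and a standard deviation of sqrt(2/3); the second variable is constant, so its standard deviation entry is 0.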
3.2. Conditional Variational Autoencoder
In this work, conditional information is introduced into the VAE framework [35] to construct a CVAE, whose structure is shown in Figure 4. Compared with the conventional VAE, the CVAE incorporates a condition vector derived from window-based statistical features into both the encoder and decoder. This enables the latent distribution to be learned under local process constraints, thereby improving the representation of normal operating modes.
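For an input sample $\mathbf{x}_i$ and its corresponding condition vector $\mathbf{c}_i$, the encoder maps them into the latent distribution:
$$q_{\phi}\left(\mathbf{z}_i \mid \mathbf{x}_i, \mathbf{c}_i\right) = \mathcal{N}\left(\mathbf{z}_i;\ \boldsymbol{\mu}_i,\ \mathrm{diag}\left(\boldsymbol{\sigma}_i^{2}\right)\right) \tag{4}$$
In Equation (4), $\mathbf{z}_i$ is the latent variable, $\boldsymbol{\mu}_i$ and $\boldsymbol{\sigma}_i^{2}$ represent the mean and variance of the latent variable distribution, and $\phi$ denotes the encoder parameters. The reparameterization trick is then used to sample the latent variable:
$$\mathbf{z}_i = \boldsymbol{\mu}_i + \boldsymbol{\sigma}_i \odot \boldsymbol{\epsilon}_i, \qquad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
During decoding, $\mathbf{z}_i$ is concatenated with $\mathbf{c}_i$ and fed into the decoder to reconstruct the input sample:
$$\hat{\mathbf{x}}_i = f_{\theta}\left(\mathbf{z}_i, \mathbf{c}_i\right)$$
where $\theta$ denotes the decoder parameters. By introducing $\mathbf{c}_i$ into both the encoder and decoder, the CVAE can learn latent representations related to the current local process state.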
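The CVAE training objective consists of a reconstruction loss and a Kullback–Leibler (KL) divergence term:
$$\mathcal{L}_{\mathrm{CVAE}} = \mathcal{L}_{\mathrm{rec}} + \beta\,\mathcal{L}_{\mathrm{KL}} \tag{7}$$
The parameter $\beta$ is the weight factor of the KL divergence term. The reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ is defined as:
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{N L m}\sum_{i=1}^{N}\sum_{t=1}^{L}\sum_{p=1}^{m}\left(x_{i,t}^{(p)} - \hat{x}_{i,t}^{(p)}\right)^{2}$$
Here, $L$ is the window length, $m$ is the number of variables, and $N$ is the number of samples constructed from sliding windows. The KL divergence term is given by:
$$\mathcal{L}_{\mathrm{KL}} = -\frac{1}{2N}\sum_{i=1}^{N}\sum_{j=1}^{d_z}\left(1 + \ln\sigma_{i,j}^{2} - \mu_{i,j}^{2} - \sigma_{i,j}^{2}\right)$$
The symbol $d_z$ denotes the dimensionality of the latent variable. By jointly optimizing these two terms, the model learns a smooth latent representation while retaining the ability to reconstruct normal samples under local process constraints.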
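To make the two terms of the objective concrete, the following sketch evaluates them for a single window. It assumes a mean-squared reconstruction error averaged over the window and the usual closed-form KL divergence between a diagonal Gaussian and a standard normal prior; the function name `cvae_loss` and the toy numbers are hypothetical.

```python
import math

def cvae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (total, rec, kl) for one window.

    x, x_hat    : L x m nested lists (input window and its reconstruction)
    mu, log_var : latent mean and log-variance, each of length d_z
    beta        : weight of the KL term
    """
    L, m = len(x), len(x[0])
    rec = sum((x[t][p] - x_hat[t][p]) ** 2
              for t in range(L) for p in range(m)) / (L * m)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    kl = -0.5 * sum(1.0 + lv - mu_j ** 2 - math.exp(lv)
                    for mu_j, lv in zip(mu, log_var))
    return rec + beta * kl, rec, kl

total, rec, kl = cvae_loss(
    x=[[1.0, 2.0], [3.0, 4.0]],
    x_hat=[[1.0, 2.0], [3.0, 2.0]],
    mu=[0.0, 0.0],
    log_var=[0.0, 0.0],
    beta=0.5,
)
```

With a latent distribution that already equals the prior (zero mean, unit variance) the KL term vanishes, so the total loss reduces to the reconstruction error; shifting the latent mean away from zero makes the KL term positive.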
3.3. Memory Module
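The memory module stores representative normal patterns in the latent space and retrieves the most relevant memory information for the current sample [27]. The memory matrix is defined as:
$$\mathbf{M} = \left[\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_K\right]^{\top} \in \mathbb{R}^{K \times d_z}$$
where $K$ denotes the number of memory units, and $\mathbf{m}_k \in \mathbb{R}^{d_z}$ represents the latent prototype vector of the $k$-th memory unit. For the $i$-th input sample, the encoder first outputs the parameters of the latent variable distribution, $\boldsymbol{\mu}_i$ and $\boldsymbol{\sigma}_i^{2}$. Here, $\boldsymbol{\mu}_i$ serves as the query vector for the memory module, which is used to compute the similarity between the input and each memory unit. To eliminate the influence of the vector norm difference on the matching results, both the query vector and the memory vectors are $\ell_2$-normalized, and the cosine similarity between them is computed as:
$$\mathrm{sim}\left(\boldsymbol{\mu}_i, \mathbf{m}_k\right) = \frac{\boldsymbol{\mu}_i^{\top}\mathbf{m}_k}{\left\|\boldsymbol{\mu}_i\right\|_2 \left\|\mathbf{m}_k\right\|_2}$$
This similarity measures the directional consistency between $\boldsymbol{\mu}_i$ and each memory prototype. Subsequently, a temperature coefficient $\tau$ is introduced to scale the similarity, and a softmax function is applied to obtain the attention weight for each memory unit:
$$w_{i,k} = \frac{\exp\left(\mathrm{sim}\left(\boldsymbol{\mu}_i, \mathbf{m}_k\right)/\tau\right)}{\sum_{j=1}^{K}\exp\left(\mathrm{sim}\left(\boldsymbol{\mu}_i, \mathbf{m}_j\right)/\tau\right)}$$
The parameter $\tau$ adjusts the sharpness of the attention distribution. When $\tau$ is small, the model focuses more on a few memory units with high similarity. When $\tau$ is large, the attention distribution becomes smoother. After obtaining the attention weights, a weighted sum over the memory units is computed to obtain the memory read vector:
$$\mathbf{r}_i = \sum_{k=1}^{K} w_{i,k}\,\mathbf{m}_k$$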
This vector represents the information most relevant to the current input in terms of normal operating patterns. It is fused with the latent variable $\mathbf{z}_i$ obtained from the reparameterization trick using a fusion coefficient $\gamma$ to form the final enhanced latent representation:
$$\tilde{\mathbf{z}}_i = (1 - \gamma)\,\mathbf{z}_i + \gamma\,\mathbf{r}_i$$
Here, $\gamma$ is the fusion coefficient and $\mathbf{r}_i$ is the memory read vector. The fused latent vector $\tilde{\mathbf{z}}_i$ preserves both the characteristics of the current input sample and the memory-enhanced information, serving as input for subsequent decoder reconstruction.
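The memory read described in this subsection can be sketched as follows. The function name `memory_read`, the default values of the temperature and fusion coefficient, and the toy memory bank are hypothetical. Two modeling choices are assumptions rather than statements from the text: the read vector is a weighted sum of the unnormalized prototypes, and the fusion is a convex combination of the sampled latent variable and the read vector.

```python
import math

def memory_read(mu, z, memory, tau=0.1, gamma=0.5):
    """Return (weights, read, fused) for one sample.

    mu     : latent mean, used as the query
    z      : latent variable sampled with the reparameterization trick
    memory : list of K prototype vectors, each of the same length as mu
    """
    def l2_normalize(v):
        norm = math.sqrt(sum(c * c for c in v)) or 1.0   # guard against zero vectors
        return [c / norm for c in v]

    q = l2_normalize(mu)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(m_k))) for m_k in memory]
    # Temperature-scaled softmax, shifted by the maximum for numerical stability
    top = max(sims)
    exps = [math.exp((s - top) / tau) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of prototypes (assumption: unnormalized prototypes are read out)
    read = [sum(w * m_k[j] for w, m_k in zip(weights, memory)) for j in range(len(mu))]
    # Convex-combination fusion (assumption)
    fused = [(1.0 - gamma) * z_j + gamma * r_j for z_j, r_j in zip(z, read)]
    return weights, read, fused

memory = [[1.0, 0.0], [0.0, 1.0]]
weights, read, fused = memory_read(mu=[2.0, 0.1], z=[1.8, 0.3], memory=memory)
smooth_weights, _, _ = memory_read(mu=[2.0, 0.1], z=[1.8, 0.3], memory=memory, tau=100.0)
```

With a small temperature the query, which points almost along the first prototype, puts nearly all of its weight on that prototype; with a very large temperature the two weights become nearly equal, which is the smoothing effect described in the text.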
3.4. Informer-Based Prediction Module
The Informer-based prediction module is used to model the temporal evolution immediately following the input window. By introducing a prediction branch, the model complements the reconstruction branch in capturing dynamic dependencies and improves its sensitivity to abnormal temporal variations.
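Let the input window sequence be:
$$\mathbf{X}_i = \left[\mathbf{x}_{i,1}, \mathbf{x}_{i,2}, \ldots, \mathbf{x}_{i,L}\right]^{\top} \in \mathbb{R}^{L \times m}$$
where $L$ is the window length and $m$ is the number of variables. The Informer module takes the window sequence $\mathbf{X}_i$ as input and outputs the prediction $\hat{\mathbf{y}}_i$ at the next time step. First, a linear mapping projects the original input into a high-dimensional feature space, and positional encoding is added to retain temporal order information, yielding the initial embedding representation:
$$\mathbf{H}^{(0)} = \mathbf{X}_i \mathbf{W}_e + \mathbf{P}$$
with $\mathbf{W}_e \in \mathbb{R}^{m \times d_{\mathrm{model}}}$ and $\mathbf{P} \in \mathbb{R}^{L \times d_{\mathrm{model}}}$ denoting the input projection matrix and positional encoding matrix, respectively. $\mathbf{H}^{(0)}$ represents the initial feature embedding of the input sequence, and $d_{\mathrm{model}}$ is the embedding dimension.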
During the encoding phase, Informer employs probabilistic sparse (ProbSparse) self-attention to model the long-range temporal dependencies in the input sequence [36]. Figure 5 shows the structure of ProbSparse self-attention. Instead of computing attention for all queries, ProbSparse self-attention selects the top-$u$ queries with the highest sparsity scores for attention calculation. This strategy reduces computational complexity while preserving the dominant dependency relationships in the sequence.
For the $l$-th encoder layer, the input features $\mathbf{H}^{(l-1)}$ are linearly projected to obtain the query, key, and value matrices:
$$\mathbf{Q} = \mathbf{H}^{(l-1)}\mathbf{W}_Q, \qquad \mathbf{K} = \mathbf{H}^{(l-1)}\mathbf{W}_K, \qquad \mathbf{V} = \mathbf{H}^{(l-1)}\mathbf{W}_V$$
The matrices $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ correspond to the query, key, and value projections, respectively. For each query vector $\mathbf{q}_i$, Informer calculates a sparsity measure using its dot product with all key vectors to evaluate its contribution to overall attention:
$$M\left(\mathbf{q}_i, \mathbf{K}\right) = \ln\sum_{j=1}^{L_K}\exp\left(\frac{\mathbf{q}_i\mathbf{k}_j^{\top}}{\sqrt{d_k}}\right) - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{\mathbf{q}_i\mathbf{k}_j^{\top}}{\sqrt{d_k}}$$
Here, $d_k$ is the key dimension and $L_K$ is the sequence length. This metric reflects the sparsity of attention for query $\mathbf{q}_i$; queries that have stronger correlations with a few keys receive higher sparsity scores and are retained for subsequent attention calculation.
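The query selection step can be illustrated with a small sketch. It scores each query with the log-sum-exp minus mean form of the sparsity measure defined in the Informer paper [36], evaluated against all keys; practical Informer implementations use a max-minus-mean approximation on a sampled subset of keys, which is not reproduced here. The function names and the toy matrices are hypothetical.

```python
import math

def sparsity_scores(Q, K):
    """One sparsity score per query: log-sum-exp minus mean of scaled dot products."""
    d_k = len(K[0])
    scores = []
    for q in Q:
        dots = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        top = max(dots)
        # Shift by the maximum so the exponentials cannot overflow
        lse = top + math.log(sum(math.exp(v - top) for v in dots))
        scores.append(lse - sum(dots) / len(dots))
    return scores

def top_u_queries(Q, K, u):
    """Indices of the u queries with the highest sparsity scores."""
    scores = sparsity_scores(Q, K)
    return sorted(range(len(Q)), key=lambda i: scores[i], reverse=True)[:u]

K = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0], [0.0, -4.0]]
Q = [[0.0, 0.0],   # attends uniformly to every key, so its score is low
     [3.0, 0.0]]   # strongly aligned with a single key, so its score is high
scores = sparsity_scores(Q, K)
selected = top_u_queries(Q, K, u=1)
```

The all-zero query produces a uniform attention distribution and receives the smallest possible score for four keys, ln 4, while the query aligned with one key is the one retained.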
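Based on the sparsity measure, the most important $u$ queries are selected from all queries to form the sparse query set $\bar{\mathbf{Q}}$, and the ProbSparse attention is then computed accordingly:
$$\mathrm{Attn}\left(\mathbf{Q}, \mathbf{K}, \mathbf{V}\right) = \mathrm{softmax}\left(\frac{\bar{\mathbf{Q}}\mathbf{K}^{\top}}{\sqrt{d_k}}\right)\mathbf{V}$$
where $\mathbf{K}$ and $\mathbf{V}$ are the key and value matrices and $d_k$ is the key dimension.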
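In the multi-head mechanism, the features are projected into multiple subspaces, and the outputs of all heads are concatenated and linearly transformed:
$$\mathrm{MultiHead}\left(\mathbf{Q}, \mathbf{K}, \mathbf{V}\right) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)\mathbf{W}_O$$
where the output of the $r$-th attention head is:
$$\mathrm{head}_r = \mathrm{Attn}\left(\mathbf{Q}_r, \mathbf{K}_r, \mathbf{V}_r\right)$$
with $\mathbf{Q}_r$, $\mathbf{K}_r$, and $\mathbf{V}_r$ denoting the query, key, and value matrices of the $r$-th subspace, $h$ the number of heads, and $\mathbf{W}_O$ the output projection matrix.
After ProbSparse attention, the features are fed into the feedforward network with residual connections and layer normalization for stable training. Thus, the output of the $l$-th layer of the encoder can be expressed as:
$$\tilde{\mathbf{H}}^{(l)} = \mathrm{LayerNorm}\left(\mathbf{H}^{(l-1)} + \mathrm{MultiHead}\left(\mathbf{Q}, \mathbf{K}, \mathbf{V}\right)\right), \qquad \mathbf{H}^{(l)} = \mathrm{LayerNorm}\left(\tilde{\mathbf{H}}^{(l)} + \mathrm{FFN}\left(\tilde{\mathbf{H}}^{(l)}\right)\right)$$
Here, $\mathrm{FFN}(\cdot)$ denotes the position-wise feed-forward network.
To improve sequence modeling efficiency, Informer incorporates a distilling mechanism between encoder layers. Specifically, after partial encoding, a 1D convolution, activation, and pooling compress the temporal dimension, reducing redundant information while highlighting dominant dynamic features:
$$\mathbf{H}^{(l)}_{\mathrm{distill}} = \mathrm{Pool}\left(\mathrm{Act}\left(\mathrm{Conv1d}\left(\mathbf{H}^{(l)}\right)\right)\right)$$
After multi-layer ProbSparse attention and distilling, the encoder outputs high-level temporal features, which are projected to the prediction layer to obtain the multivariate forecast for the next time step:
$$\hat{\mathbf{y}}_i = g\left(\mathbf{h}_i\right)$$
where $\mathbf{h}_i$ represents the aggregated window-level temporal features from the encoder, $g(\cdot)$ is the prediction projection function, and $\hat{\mathbf{y}}_i \in \mathbb{R}^{m}$ is the predicted multivariate value at the next time step.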
3.5. Joint Loss Function
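To jointly optimize reconstruction, prediction, latent distribution regularization, and memory representation learning, a joint loss function is adopted. The overall loss function is defined as:
$$\mathcal{L} = \lambda_{\mathrm{rec}}\mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{pred}}\mathcal{L}_{\mathrm{pred}} + \beta\,\mathcal{L}_{\mathrm{KL}} + \lambda_{\mathrm{pull}}\mathcal{L}_{\mathrm{pull}} + \lambda_{\mathrm{ent}}\mathcal{L}_{\mathrm{ent}} + \lambda_{\mathrm{decay}}\mathcal{L}_{\mathrm{decay}} \tag{28}$$
In Equation (28), $\mathcal{L}_{\mathrm{rec}}$, $\mathcal{L}_{\mathrm{pred}}$, and $\mathcal{L}_{\mathrm{KL}}$ denote the reconstruction loss, prediction loss, and KL divergence loss, respectively. The terms $\mathcal{L}_{\mathrm{pull}}$, $\mathcal{L}_{\mathrm{ent}}$, and $\mathcal{L}_{\mathrm{decay}}$ are introduced to constrain memory representation learning. The coefficient $\beta$ is the same KL divergence weight as defined in Equation (7). The parameters $\lambda_{\mathrm{rec}}$, $\lambda_{\mathrm{pred}}$, $\lambda_{\mathrm{pull}}$, $\lambda_{\mathrm{ent}}$, and $\lambda_{\mathrm{decay}}$ are the weights assigned to the reconstruction loss, prediction loss, pull loss, entropy regularization term, and memory weight decay term, respectively.
The reconstruction loss and KL divergence loss have been introduced in Section 3.2 and are not elaborated here. The prediction loss measures the deviation between the predicted and actual next-step states:
$$\mathcal{L}_{\mathrm{pred}} = \frac{1}{N H m}\sum_{i=1}^{N}\sum_{h=1}^{H}\sum_{p=1}^{m}\left(y_{i,h}^{(p)} - \hat{y}_{i,h}^{(p)}\right)^{2}$$
where $N$ is the number of samples, $m$ is the number of variables, and $H$ is the prediction horizon (for this model $H = 1$, since single-step prediction is used). Additionally, the memory module introduces corresponding constraints. The pull loss reduces the discrepancy between the mean latent variable $\boldsymbol{\mu}_i$ and the memory read vector $\mathbf{r}_i$:
$$\mathcal{L}_{\mathrm{pull}} = \frac{1}{N}\sum_{i=1}^{N}\left\|\boldsymbol{\mu}_i - \mathbf{r}_i\right\|_2^{2}$$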
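The entropy regularization term is used to control the sharpness of memory attention allocation, formulated as:
$$\mathcal{L}_{\mathrm{ent}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} w_{i,k}\ln w_{i,k}$$
where $w_{i,k}$ is the attention weight of the $k$-th memory unit for the $i$-th sample and $K$ is the number of memory units. The memory weight decay term is expressed as:
$$\mathcal{L}_{\mathrm{decay}} = \sum_{k=1}^{K}\left\|\mathbf{m}_k\right\|_2^{2}$$
where $\mathbf{m}_k$ is the prototype vector of the $k$-th memory unit.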
By optimizing these loss terms together, the model learns normal reconstruction patterns, temporal prediction relationships, and memory-enhanced latent representations, providing a basis for subsequent monitoring statistic construction and fault detection.
4. Process Monitoring Framework Based on MI-CVAE
4.1. The Proposed MI-CVAE Model
The MI-CVAE model consists of a conditional variational autoencoder, a memory module, and an Informer-based prediction branch, forming a dual-branch monitoring framework that combines reconstruction and prediction. As shown in Figure 6, the model takes the time-series window sample $\mathbf{x}_i$ and its condition vector $\mathbf{c}_i$ as inputs. The condition vector is constructed from the mean and standard deviation of each variable within the window to provide local operating-condition information.
In the reconstruction branch, the encoder maps $\mathbf{x}_i$ and $\mathbf{c}_i$ into a latent distribution, and the latent representation is obtained through the reparameterization trick. The memory module retrieves representative normal patterns from the memory bank using the latent mean as the query, and then fuses the retrieved memory information with the original latent variable for reconstruction. This memory-enhanced mechanism constrains the reconstruction process with normal operating patterns, thereby reducing the overgeneralization of the autoencoder to abnormal samples and improving the sensitivity of reconstruction errors to faults. In the prediction branch, the Informer module learns temporal dependencies from the input window and predicts the process state at the next time step. This branch complements the reconstruction branch by capturing the normal temporal evolution of process variables. When faults occur, the deviation from the learned evolution pattern leads to increased prediction errors.
Overall, MI-CVAE integrates normal-pattern reconstruction and temporal prediction within a unified framework. The reconstruction branch captures latent distribution characteristics, while the prediction branch models dynamic evolution patterns, providing a more comprehensive feature basis for fault detection.
Figure 7 illustrates the implementation procedure of the MI-CVAE-based fault detection model. The detailed steps are as follows.
Step 1: The raw process data are standardized, and the input samples and one-step-ahead prediction targets are constructed using a sliding-window strategy. Meanwhile, the mean and standard deviation of each variable within the window are extracted to form the condition vector.
Step 2: The MI-CVAE model is trained using samples collected under normal operating conditions. The model is jointly optimized by the reconstruction loss, prediction loss, KL divergence loss, and constraint terms associated with the memory module.
Step 3: The reconstruction and prediction errors are calculated using the normal samples in the training set. The fused squared prediction error (SPE) and squared Mahalanobis distance (MD$^2$) statistics are then constructed, and the corresponding control limits are determined by kernel density estimation.
Step 4: The test samples are fed into the trained MI-CVAE model to calculate the corresponding SPE and MD$^2$ statistics. These statistics are compared with the control limits to discriminate between normal and abnormal operating states.
Step 5: Based on the detection results in the normal and faulty periods, the fault detection rate and false alarm rate are calculated to evaluate the fault detection performance of the model.
4.2. Monitoring Statistics
Monitoring statistics are used to quantify the deviation of process samples from normal operating conditions. In this work, SPE and MD$^2$ monitoring statistics are constructed from both the reconstruction and prediction branches.
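For an arbitrary input window sample, the reconstruction error and prediction error are defined as:
$$\mathbf{e}_{\mathrm{rec}} = \mathbf{x} - \hat{\mathbf{x}}, \qquad \mathbf{e}_{\mathrm{pred}} = \mathbf{y} - \hat{\mathbf{y}}$$
where $\mathbf{x}$ denotes the input window sample, and $\hat{\mathbf{x}}$ is the reconstructed output of the model. $\mathbf{y}$ denotes the true prediction target, and $\hat{\mathbf{y}}$ is the output of the prediction branch. $\mathbf{e}_{\mathrm{rec}}$ and $\mathbf{e}_{\mathrm{pred}}$ represent the reconstruction error and prediction error, respectively.
According to the definition of SPE, the SPE statistics corresponding to the reconstruction and prediction branches are expressed as:
$$\mathrm{SPE}_{\mathrm{rec}} = \left\|\mathbf{e}_{\mathrm{rec}}\right\|_F^{2}, \qquad \mathrm{SPE}_{\mathrm{pred}} = \left\|\mathbf{e}_{\mathrm{pred}}\right\|_2^{2}$$
To avoid scale differences between the two branches, the SPE statistics are standardized using the mean and standard deviation calculated from the training set:
$$\widetilde{\mathrm{SPE}}_{\mathrm{rec}} = \frac{\mathrm{SPE}_{\mathrm{rec}} - \mu_{\mathrm{SPE,rec}}}{\sigma_{\mathrm{SPE,rec}}}, \qquad \widetilde{\mathrm{SPE}}_{\mathrm{pred}} = \frac{\mathrm{SPE}_{\mathrm{pred}} - \mu_{\mathrm{SPE,pred}}}{\sigma_{\mathrm{SPE,pred}}}$$
where $\mu_{\mathrm{SPE,rec}}$ and $\sigma_{\mathrm{SPE,rec}}$ denote the mean and standard deviation of the SPE statistic of the reconstruction branch in the training set, respectively. Similarly, $\mu_{\mathrm{SPE,pred}}$ and $\sigma_{\mathrm{SPE,pred}}$ denote the mean and standard deviation of the SPE statistic of the prediction branch. $\widetilde{\mathrm{SPE}}_{\mathrm{rec}}$ and $\widetilde{\mathrm{SPE}}_{\mathrm{pred}}$ are the standardized SPE statistics of the reconstruction and prediction branches, respectively.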
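A minimal sketch of the branch-wise SPE computation and standardization, followed by a weighted fusion of the two standardized values with weights that sum to one, is given below. The helper names, the equal default weights, the population form of the standard deviation, and the toy training values are all assumptions made for illustration.

```python
import statistics

def spe(actual, estimate):
    """Squared prediction error between two equally long flat lists."""
    return sum((a - b) ** 2 for a, b in zip(actual, estimate))

def standardize(value, train_values):
    """Z-score a statistic with the mean and standard deviation of its training values."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)    # population form; an assumption
    return (value - mean) / std

def fuse(z_rec, z_pred, w_rec=0.5):
    """Weighted fusion of the two branches; the two weights sum to one."""
    return w_rec * z_rec + (1.0 - w_rec) * z_pred

train_spe_rec = [1.0, 2.0, 3.0]        # placeholder training SPE values, reconstruction branch
train_spe_pred = [10.0, 20.0, 30.0]    # placeholder training SPE values, prediction branch

spe_rec = spe([1.0, 2.0], [0.0, 0.0])
spe_pred = spe([3.0], [-2.0])
fused_spe = fuse(standardize(spe_rec, train_spe_rec),
                 standardize(spe_pred, train_spe_pred))
```

Standardizing first matters here because the raw prediction-branch values are an order of magnitude larger than the reconstruction-branch values; without it, the fused statistic would be dominated by one branch regardless of the chosen weights.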
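The standardized SPE statistics from the two branches, $\widetilde{\mathrm{SPE}}_{\mathrm{rec}}$ and $\widetilde{\mathrm{SPE}}_{\mathrm{pred}}$, are then fused in a weighted manner to obtain the final SPE monitoring statistic:
$$\mathrm{SPE} = \omega_{\mathrm{rec}}\,\widetilde{\mathrm{SPE}}_{\mathrm{rec}} + \omega_{\mathrm{pred}}\,\widetilde{\mathrm{SPE}}_{\mathrm{pred}}$$
The coefficients $\omega_{\mathrm{rec}}$ and $\omega_{\mathrm{pred}}$ are assigned to the reconstruction and prediction branches, respectively, and satisfy:
$$\omega_{\mathrm{rec}} + \omega_{\mathrm{pred}} = 1$$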
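To further consider the covariance structure among residual variables, MD$^2$ is introduced as a complementary monitoring statistic. For the reconstruction and prediction branches, the instantaneous MD$^2$ statistics are defined as:
$$\mathrm{MD}^{2}_{\mathrm{rec},t} = \left(\mathbf{e}^{\mathrm{rec}}_{t} - \bar{\mathbf{e}}_{\mathrm{rec}}\right)^{\top}\boldsymbol{\Sigma}_{\mathrm{rec}}^{-1}\left(\mathbf{e}^{\mathrm{rec}}_{t} - \bar{\mathbf{e}}_{\mathrm{rec}}\right), \qquad \mathrm{MD}^{2}_{\mathrm{pred},t} = \left(\mathbf{e}^{\mathrm{pred}}_{t} - \bar{\mathbf{e}}_{\mathrm{pred}}\right)^{\top}\boldsymbol{\Sigma}_{\mathrm{pred}}^{-1}\left(\mathbf{e}^{\mathrm{pred}}_{t} - \bar{\mathbf{e}}_{\mathrm{pred}}\right)$$
where $\mathbf{e}^{\mathrm{rec}}_{t}$ denotes the reconstruction residual at the $t$-th time step within the input window, and $\mathbf{e}^{\mathrm{pred}}_{t}$ denotes the prediction residual at the $t$-th prediction step. $\bar{\mathbf{e}}_{\mathrm{rec}}$ and $\bar{\mathbf{e}}_{\mathrm{pred}}$ are the mean vectors of the corresponding residuals in the training set, while $\boldsymbol{\Sigma}_{\mathrm{rec}}$ and $\boldsymbol{\Sigma}_{\mathrm{pred}}$ are the corresponding residual covariance matrices.
Since local anomalies may be diluted by averaging over the window, the maximum instantaneous MD$^2$ is used as the window-level statistic:
$$\mathrm{MD}^{2}_{\mathrm{rec}} = \max_{1 \le t \le L}\ \mathrm{MD}^{2}_{\mathrm{rec},t}$$
The MD$^2$ statistic of the prediction branch is defined as:
$$\mathrm{MD}^{2}_{\mathrm{pred}} = \max_{1 \le t \le H}\ \mathrm{MD}^{2}_{\mathrm{pred},t}$$
where $L$ denotes the length of the input window, and $H$ denotes the prediction horizon. When the prediction horizon is equal to 1, the MD$^2$ statistic of the prediction branch is calculated from the residual of a single prediction step.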
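The window-level MD$^2$ computation can be sketched without external libraries as shown below. The residual mean and covariance are estimated from training residuals, each time step of a window is scored, and the maximum is kept. The helper names and toy residuals are hypothetical, the sample covariance (dividing by n - 1) is an assumption, and the small linear solver stands in for a matrix inverse; a near-singular covariance would need regularization that is not shown.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, mean):
    """Sample covariance matrix (divides by n - 1; an assumption)."""
    n, d = len(rows), len(mean)
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def md2(residual, mean, cov):
    """Squared Mahalanobis distance of one residual vector."""
    centered = [e - m for e, m in zip(residual, mean)]
    return sum(c * s for c, s in zip(centered, solve(cov, centered)))

def window_md2(window_residuals, mean, cov):
    """Maximum instantaneous MD^2 over the time steps of one window."""
    return max(md2(e_t, mean, cov) for e_t in window_residuals)

train_residuals = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
mu = mean_vector(train_residuals)
cov = covariance(train_residuals, mu)
score = window_md2([[0.1, 0.1], [2.0, 0.0], [0.0, 1.0]], mu, cov)
```

In this toy window only the second time step deviates strongly, and the maximum picks it up directly, whereas an average over the three steps would report roughly a third of that value.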
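Similar to SPE, the branch-specific MD$^2$ statistics are standardized using the corresponding training statistics:
$$\widetilde{\mathrm{MD}}^{2}_{\mathrm{rec}} = \frac{\mathrm{MD}^{2}_{\mathrm{rec}} - \mu_{\mathrm{MD,rec}}}{\sigma_{\mathrm{MD,rec}}}, \qquad \widetilde{\mathrm{MD}}^{2}_{\mathrm{pred}} = \frac{\mathrm{MD}^{2}_{\mathrm{pred}} - \mu_{\mathrm{MD,pred}}}{\sigma_{\mathrm{MD,pred}}}$$
Here, $\mu_{\mathrm{MD,rec}}$ and $\sigma_{\mathrm{MD,rec}}$ denote the mean and standard deviation of the MD$^2$ statistic of the reconstruction branch in the training set, respectively. $\mu_{\mathrm{MD,pred}}$ and $\sigma_{\mathrm{MD,pred}}$ denote the mean and standard deviation of the MD$^2$ statistic of the prediction branch, respectively. $\widetilde{\mathrm{MD}}^{2}_{\mathrm{rec}}$ and $\widetilde{\mathrm{MD}}^{2}_{\mathrm{pred}}$ are the standardized MD$^2$ statistics of the two branches.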
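The final MD$^2$ monitoring statistic is obtained as:
$$\mathrm{MD}^{2} = \omega_{\mathrm{rec}}\,\widetilde{\mathrm{MD}}^{2}_{\mathrm{rec}} + \omega_{\mathrm{pred}}\,\widetilde{\mathrm{MD}}^{2}_{\mathrm{pred}}$$
where $\widetilde{\mathrm{MD}}^{2}_{\mathrm{rec}}$ and $\widetilde{\mathrm{MD}}^{2}_{\mathrm{pred}}$ are the standardized MD$^2$ statistics of the reconstruction and prediction branches, and $\omega_{\mathrm{rec}}$ and $\omega_{\mathrm{pred}}$ are the branch weights.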
After obtaining the monitoring statistics from the training set, kernel density estimation (KDE) is employed to model the probability distribution of the training statistics in a nonparametric manner, thereby avoiding prior assumptions about their distribution forms. Let $\{J_1, J_2, \ldots, J_n\}$ denote a set of monitoring statistic values calculated from the training samples. The corresponding probability density function can be estimated as:
$$\hat{f}(J) = \frac{1}{n b}\sum_{i=1}^{n} K\left(\frac{J - J_i}{b}\right) \tag{48}$$
In Equation (48), $n$ is the number of training samples, $b$ is the bandwidth, and $K(\cdot)$ represents the kernel function. A Gaussian kernel is used in this work. Given a confidence level $\alpha$, the control limit $\delta$ is determined by:
$$\int_{-\infty}^{\delta}\hat{f}(J)\,\mathrm{d}J = \alpha$$
Here, $J$ denotes the monitoring statistic to be modeled, which can be either the fused SPE or MD$^2$, and $\delta$ represents the corresponding control limit under the confidence level $\alpha$. The confidence level of the KDE-based control limit was set to $\alpha = 0.99$, following the common practice in process monitoring studies where a 99% confidence limit is used to determine monitoring thresholds [37,38]. This setting corresponds to an approximate nominal false alarm probability of 1% under normal operating conditions, thereby helping to suppress unnecessary false alarms while preserving sufficient sensitivity to faults.
During testing, each sample is fed into the trained MI-CVAE model, and its fused SPE and MD$^2$ statistics are compared with the control limits. A sample is identified as faulty if either $\mathrm{SPE} > \delta_{\mathrm{SPE}}$ or $\mathrm{MD}^{2} > \delta_{\mathrm{MD}^2}$; otherwise, it is regarded as normal.
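The control-limit computation and the decision rule can be sketched as follows. With a Gaussian kernel, the cumulative distribution of the KDE is simply the average of normal CDFs centred on the training values, so the limit can be found by bisection. The function names and placeholder training statistics are hypothetical, and the normal-reference rule of thumb used for the bandwidth is an assumption, since the bandwidth selection rule is not stated in the text.

```python
import math
import statistics

def kde_cdf(x, samples, bandwidth):
    """CDF of a Gaussian-kernel KDE: the average of normal CDFs centred on the samples."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (bandwidth * math.sqrt(2.0))))
               for s in samples) / len(samples)

def control_limit(samples, alpha=0.99, bandwidth=None):
    """Control limit delta with KDE CDF(delta) = alpha, located by bisection."""
    if bandwidth is None:
        # Normal-reference rule of thumb (assumption); requires non-constant samples
        bandwidth = 1.06 * statistics.pstdev(samples) * len(samples) ** (-0.2)
    lo, hi = min(samples), max(samples) + 10.0 * bandwidth
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, bandwidth) < alpha:
            lo = mid
        else:
            hi = mid
    return hi

def is_faulty(spe, md2, spe_limit, md2_limit):
    """A sample is flagged when either statistic exceeds its control limit."""
    return spe > spe_limit or md2 > md2_limit

train_stat = [0.1 * i for i in range(100)]     # placeholder training statistic values
delta = control_limit(train_stat, alpha=0.99)
```

The same routine would be run once on the fused SPE values and once on the fused MD$^2$ values of the training set, giving the two limits used by `is_faulty`.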
4.3. Evaluation Metrics
The fault detection rate (FDR) and false alarm rate (FAR) were adopted as evaluation metrics. Based on the confusion matrix, TP denotes the number of fault samples correctly identified as abnormal, FN denotes the number of fault samples incorrectly classified as normal, FP denotes the number of normal samples incorrectly identified as abnormal, and TN denotes the number of normal samples correctly classified as normal. The two metrics are defined as follows:
$$\mathrm{FDR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FAR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$
A higher FDR indicates stronger capability in identifying fault samples, whereas a lower FAR indicates fewer false alarms under normal operating conditions. Therefore, a desirable fault detection method should achieve a high FDR while maintaining a low FAR.
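As a small worked example of the two metrics, the sketch below counts the confusion-matrix entries from per-sample alarm flags and ground-truth labels. The function name `detection_rates` and the toy sequences are hypothetical.

```python
def detection_rates(alarms, labels):
    """Return (FDR, FAR) from boolean sequences where True means abnormal."""
    tp = sum(a and y for a, y in zip(alarms, labels))
    fn = sum((not a) and y for a, y in zip(alarms, labels))
    fp = sum(a and (not y) for a, y in zip(alarms, labels))
    tn = sum((not a) and (not y) for a, y in zip(alarms, labels))
    fdr = tp / (tp + fn) if (tp + fn) else float("nan")
    far = fp / (fp + tn) if (fp + tn) else float("nan")
    return fdr, far

labels = [False] * 4 + [True] * 4     # fault introduced at the fifth sample
alarms = [False, False, True, False, False, True, True, True]
fdr, far = detection_rates(alarms, labels)
```

Here three of the four faulty samples raise an alarm and one of the four normal samples does, giving an FDR of 0.75 and a FAR of 0.25.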