Article

Corrosion State Monitoring Based on Multi-Granularity Synergistic Learning of Acoustic Emission and Electrochemical Noise Signals

Rui Wang, Guangbin Shan, Feng Qiu, Linqi Zhu, Kang Wang, Xianglong Meng, Ruiqin Li, Kai Song and Xu Chen

1 School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
2 State Key Laboratory of Chemical Safety, SINOPEC Research Institute of Safety Engineering Co., Ltd., Qingdao 266071, China
3 Shanxi Beifang XingAn Chemical Industry Co., Ltd., Taiyuan 030021, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Processes 2024, 12(12), 2935; https://doi.org/10.3390/pr12122935
Submission received: 9 November 2024 / Revised: 17 December 2024 / Accepted: 20 December 2024 / Published: 22 December 2024

Abstract

Corrosion monitoring is crucial for ensuring the structural integrity of equipment. Acoustic emission (AE) and electrochemical noise (EN) have been proven to be highly effective for the detection of corrosion. Because the two techniques are complementary, previous studies have demonstrated that combining both signals can facilitate research on corrosion monitoring. However, current machine learning models have not yet been able to effectively integrate these two signal modalities. Therefore, a new deep learning framework, CorroNet, is designed to synergistically integrate AE and EN signals at the algorithmic level for the first time. CorroNet leverages multimodal learning, enhances accuracy, and automates the monitoring process. During training, paired AE-EN data and unpaired EN data are used, with AE signals serving as anchors that help the model align EN signals belonging to the same corrosion stage. A new feature alignment loss function and a probability distribution consistency loss function are designed to facilitate more effective feature learning and improve classification performance. Experimental results demonstrate that CorroNet achieves superior accuracy in corrosion stage classification compared to other state-of-the-art models, with an overall accuracy of 97.01%. Importantly, CorroNet requires only EN signals during the testing phase, making it suitable for stable and continuous monitoring applications. This framework offers a promising solution for real-time corrosion detection and structural health monitoring.

1. Introduction

The corrosion of equipment can result in facility damage, production interruptions, environmental pollution, and even explosions. It is one of the primary factors causing structural integrity degradation in equipment. Therefore, developing effective automated corrosion monitoring methods is crucial to ensure process safety, equipment reliability, and environmental protection [1,2].
Acoustic emission (AE) and electrochemical noise (EN) are two techniques proven to be effective for corrosion monitoring. AE involves detecting high-frequency elastic waves generated by the rapid release of energy within a material [3,4,5]. Such signals primarily reflect physical phenomena occurring during crack propagation, pitting initiation, hydrogen embrittlement, and other microscopic damage behaviors. EN signals, on the other hand, primarily reflect the electrochemical reactions during corrosion, with random or semi-random fluctuations in current and potential caused by uneven reactions, revealing changes in the corrosion process mechanisms [6,7,8,9]. Because EN signals are sensitive to the early stages of corrosion, they have been applied in research on the corrosion behavior of metallic materials [10], the corrosion inhibition of steel reinforcement in concrete [11], stress corrosion cracking [12], and several other corrosion areas.
AE and EN signals for pitting corrosion are chaotic, non-linear, and complex, requiring advanced non-linear methods for processing. Deep learning algorithms effectively address this need. For example, Homborg et al. interpreted EN time–frequency spectra as an image classification problem. They explored the application of convolutional neural networks (CNNs) for the deep learning-based image classification of electrochemical noise time–frequency transient information [13]. Rubio et al. utilized a deep autoencoder model to analyze local electrochemical noise for PEM fuel cell fault diagnosis [14]. Guo et al. applied the InceptionTime model for damage classification in carbon fiber-reinforced composites using AE time-series data [15]. Liu et al. proposed a transfer learning method based on convolutional neural networks (CNN-TL) for pipeline leak detection under various working conditions [16]. Qiu et al. established a CNN-based hydrogen defect recognition (HDR) model from AE signals [17].
As a passive method, however, AE relies on damage events, lacks early-warning capability, and struggles with noise interference and signal interpretation, whereas EN signals can directly represent the chemical reactions occurring during corrosion. However, the characteristics of EN signals are complex, and their features across different corrosion stages are often indistinct, which makes stage-specific feature analysis challenging. Due to the complementary nature of AE and EN signals in corrosion monitoring, their combined use offers significant potential for advancing corrosion research. In fact, the integrated analysis of these signals has traditionally been conducted manually [18,19,20,21,22], but such extensive manual analyses are time-consuming and labor-intensive.
Recently developed multimodal techniques provide a way to integrate different types of data at the feature and/or algorithmic levels to enhance model performance [23,24,25,26,27]. Inspired by these state-of-the-art (SOTA) methods, in this paper, a new network, called CorroNet, is proposed to synergistically integrate AE and EN signals at the algorithmic level for the first time. In CorroNet, AE and EN are treated as two distinct modalities with different granularities. Each modality reflects different aspects of the corrosion process while sharing semantically related features of the corrosion stages. By employing multimodal deep learning algorithms to jointly process these two modalities, more accurate corrosion state monitoring may be achieved. The contributions of this paper are summarized as follows:
(1)
A new deep learning framework, named CorroNet, for corrosion monitoring is proposed. It achieves multi-granularity synergistic learning between AE and EN signals at the model learning level;
(2)
A feature alignment method for corrosion events and a probability distribution alignment loss function are designed for CorroNet to significantly improve the performance in the corrosion monitoring task.
This paper is organized as follows: Section 2 introduces the proposed CorroNet model, the experimental setup, and the corrosion dataset. Section 3 presents an analysis of the model’s corrosion monitoring results. Section 4 provides the conclusions.

2. Methods

The goal of the acoustic-electrochemical synergistic corrosion monitoring is to achieve synergistic learning between the two modalities of acoustic emission (AE) and electrochemical noise (EN) data and to learn an accurate mapping function $F(x)$ from the training set $X_{train}$ in order to minimize the prediction error on the test set $X_{test}$.
As shown in Figure 1, CorroNet leverages both AE and EN signals to enhance corrosion monitoring. AE, as a passive method, captures signals only during active damage events, resulting in irregular and discontinuous data. In contrast, EN provides continuous-time data with uniform sampling, making it suitable for stable monitoring. CorroNet integrates AE and EN signals during training to leverage both modalities. During testing, it uses only EN signals for consistent and reliable online monitoring.
The training set consists of paired AE-EN data and additional unpaired EN samples. AE signals are selected based on significant corrosion stage features (as shown in Figure 1a), serving as anchors during the learning process. These anchors assist in aligning EN signals of the same category, enhancing the model’s ability to recognize corrosion stages and improving the classification accuracy of the EN data (as shown in Figure 1b).
As illustrated in Figure 1c, AE signals support synergistic learning with EN during training, enriching CorroNet’s understanding of corrosion mechanisms. Once fully trained, CorroNet can effectively combine AE’s intermittent insights with EN’s stable monitoring capability. Therefore, it can accurately predict corrosion stages using only EN signals. The subsequent experimental results (Section 3) provide validation and support for this.

2.1. Overview of CorroNet

The Transformer encoder, known for strong sequential modeling, is used as CorroNet’s backbone to extract features from EN and AE time-series data. CorroNet has two branches to process AE and EN signals separately and eventually performs synergistic learning. Different modules are connected after the Transformer encoders in each branch to accomplish various functions. The detailed architecture of CorroNet is shown in Figure 2:
(1)
Most importantly, to achieve multi-granularity synergistic learning, a feature alignment loss $L_{InfoNCE}$ and a probability distribution consistency loss $L_{KL}$ are designed. After the Transformer encoders $f_A$ and $f_E$ in the two branches, projection heads $g_A$ and $g_E^a$ (represented in orange and blue, respectively) are introduced to align the representations $p_A$ and $p_E$ of the two modalities in feature space. Additionally, supervised contrastive learning is applied to the EN data using a second projection head $g_E^b$ (green) after the EN Transformer encoder, with the loss function $L_{SC}$;
(2)
Secondly, since AE signals serve as anchor points to optimize the feature mapping of EN signals, a classification head $q_A$ is connected after the Transformer encoder $f_A$. It is used to initially train the model for accurate AE classification, providing more precise anchor points in the feature space for the subsequent optimization of the EN signal mapping. Similarly, a classification head $q_E$ is connected after the EN branch's Transformer encoder $f_E$;
(3)
Additionally, due to the multi-granularity feature learning, the complex multi-level feature alignment and loss constraints make precise training and pre-training even more essential. Therefore, a specific training strategy is proposed for CorroNet, dividing the training process into three stages. The details can be found in Section 2.4.
Each component of CorroNet is discussed in detail in the following sections.

2.2. The Transformer Encoder

AE and EN signals are both time-series data characterized by temporal dependencies. Previous studies have demonstrated that Transformer models have achieved remarkable success in temporal modeling tasks [28,29,30,31,32]. As illustrated in Figure 2, the Transformer encoder is introduced as the backbone of CorroNet to leverage the attention mechanism for efficient feature extraction from AE and EN signals; the two encoders are represented as $f_A$ and $f_E$, respectively. Specifically, the Transformer encoder consists of three main components: the embedding block, multi-head attention, and a multi-layer perceptron (MLP). The embedding block transforms the raw time-series data into multiple tokens; multi-head attention captures temporal dependencies; and the MLP performs non-linear transformations and feature fusion to enhance expressiveness. The details of the Transformer encoder are given in Appendix A.1.

2.3. The AE-EN Alignment Framework

The core component of CorroNet is the multi-granularity synergistic learning between AE and EN signals. It includes three main parts: feature alignment learning, probability distribution alignment learning, and supervised contrastive learning. Each of these components addresses a different aspect of the model's learning process, and their integration enhances the overall corrosion monitoring capability of the model. Feature alignment captures mutual information between AE and EN signals; distribution alignment minimizes modality mismatches; and supervised contrastive learning uses labels to enhance corrosion stage feature extraction from EN signals. The following sections provide a detailed explanation of each module within CorroNet.
The specific structure of the multi-granularity synergistic learning consists of the Transformer encoders $f_A$ and $f_E$ and the projection heads $g_A$, $g_E^a$, and $g_E^b$. The Transformer encoder $f_A$ is responsible for feature extraction from AE signals and is followed by the projection head $g_A$, which comprises an MLP to further map the extracted features. The features $z_A$ extracted by $f_A$ serve as class anchors for guiding the feature learning of EN signals. The other branch extracts features from EN signals using the Transformer encoder $f_E$. Two parallel projection heads, $g_E^a$ and $g_E^b$, map these features into different subspaces. Specifically, $g_E^a$ works together with $g_A$ to achieve a joint mapping for the AE and EN modalities. The mapping produced by $g_E^b$ is used for supervised contrastive learning to further enhance the model's learning of EN signals.
In the synergistic learning modules, all samples from $X_{train}$ are utilized, including the paired AE-EN dataset $X_{AE\text{-}EN} = \{(x_E^i, x_A^i, y_A^i)\}_{i=1}^{N_1}$ and the additional EN training data $X_{EN} = \{(x^i, y^i)\}_{i=1}^{N_2}$. As illustrated in Figure 2, AE samples $x_A$ are fed into the Transformer encoder $f_A$ to generate the features $z_A$. The features $z_A$ are then passed through the projection head $g_A$, resulting in the representations $p_A$. The paired EN samples $x_E$ are fed into the Transformer encoder $f_E$, which outputs the features $z_E$. Subsequently, the features $z_E$ are further mapped by the projection head $g_E^a$ to generate the representations $p_E$. As shown in Figure 2, the model uses feature alignment learning $L_{InfoNCE}$ and distribution alignment learning $L_{KL}$ to treat the AE representations $p_A$ as class anchors. This aligns the EN representations $p_E$ with $p_A$, training $f_E$ to learn corrosion stage semantics.
The feature alignment loss $L_{InfoNCE}$ is based on the InfoNCE loss [33], which enhances the similarity between paired representations $(p_A, p_E)$ while pushing apart unpaired samples, as formulated in Equation (1):

$$L_{InfoNCE} = -\sum_{i=1}^{N_1} \log \frac{\exp(p_E^i \cdot p_A^i / \tau)}{\exp(p_E^i \cdot p_A^i / \tau) + \sum_{j \neq i}^{N_1} \exp(p_E^i \cdot p_A^j / \tau)}, \quad (1)$$

where $\tau$ is the temperature parameter [34], used to control the smoothness during training.
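For readers implementing this loss, the following is a minimal PyTorch sketch of Equation (1); the function name and the L2 normalization of the representations before the dot products are our assumptions, not details taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(p_E: torch.Tensor, p_A: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Feature alignment loss of Equation (1). Rows of p_E and p_A with the
    same index are paired AE-EN representations; every other AE anchor in the
    batch serves as a negative. Both inputs have shape (N1, d)."""
    p_E = F.normalize(p_E, dim=1)   # assumed L2 normalization (not stated in the paper)
    p_A = F.normalize(p_A, dim=1)
    logits = p_E @ p_A.t() / tau    # (N1, N1) matrix of scaled similarities
    targets = torch.arange(p_E.size(0), device=p_E.device)
    # Row-wise cross-entropy reproduces -log(exp(pos) / sum(exp(all)));
    # it averages over the batch, whereas Equation (1) sums (a constant factor).
    return F.cross_entropy(logits, targets)

# Example: 8 paired 64-dimensional representations.
loss = info_nce_loss(torch.randn(8, 64), torch.randn(8, 64))
```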
Distribution alignment learning $L_{KL}$ is designed based on the Kullback–Leibler divergence; $L_{KL}$ minimizes the distribution mismatch between the AE and EN modalities. Specifically, it first computes the individual distribution probability scores $s_{AE}$ and $s_{EN}$ for the AE and EN samples. Then, an exponential kernel is applied to fit the distribution probability scores of the two modalities to compute their similarity. Taking AE as an example, the distribution probability score $s_{AE}^{i,j}$ for the $(i, j)$-th pair of AE samples is calculated using Equation (2):

$$s_{AE}^{i,j} = \frac{\exp(p_A^i \cdot p_A^j / \tau)}{\sum_{k=1}^{N_1} \exp(p_A^i \cdot p_A^k / \tau)} \quad (2)$$

The probability score of EN is obtained in the same way and is denoted as $s_{EN}^{i,j}$. The probability distribution similarity alignment loss $L_{KL}$ is given by Equation (3):

$$L_{KL} = \sum_{i=1}^{N_1} \sum_{j=1}^{N_1} s_{EN}^{i,j} \cdot \log \frac{s_{EN}^{i,j}}{s_{AE}^{i,j}}. \quad (3)$$
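Equations (2) and (3) can be sketched compactly in the same PyTorch style; the row-wise softmax realizes the exponential-kernel normalization, and the small numerical clamp is an implementation detail we add for stability, not part of the paper's formulation.

```python
import torch
import torch.nn.functional as F

def distribution_alignment_loss(p_A: torch.Tensor, p_E: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Distribution alignment loss of Equations (2)-(3). A row-wise softmax over
    pairwise similarities yields the per-modality probability scores s_AE and
    s_EN; the loss is the KL divergence KL(s_EN || s_AE)."""
    s_A = F.softmax(p_A @ p_A.t() / tau, dim=1)   # Equation (2) for every (i, j)
    s_E = F.softmax(p_E @ p_E.t() / tau, dim=1)   # same computation for EN
    # Equation (3): sum_ij s_EN * log(s_EN / s_AE); clamps guard against log(0).
    return (s_E * (s_E.clamp_min(1e-8).log() - s_A.clamp_min(1e-8).log())).sum()

loss = distribution_alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```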
Unpaired EN samples $x$ are processed by $f_E$, producing features $z$, which are mapped by $g_E^b$ into representations $p$. The model is further trained using supervised contrastive learning to enhance its ability to learn discriminative features for the different stages of EN signals. Supervised contrastive learning $L_{SC}$ encourages the model to pull together the feature representations of samples belonging to the same class, while pushing apart the feature representations of samples from different classes, as described in Equation (4):

$$L_{SC} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{j \in P(i)} \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}, \quad (4)$$

where $I$ denotes the set of indices for all samples in a batch, and $P(i)$ represents the set of positive samples that share the same label as sample $i$. $A(i)$ is the set of all samples in the training batch except the $i$-th sample itself. $\tau$ is the temperature parameter that controls the smoothness of contrastive learning. In summary, the training loss function for multi-granularity synergistic learning consists of three components: $L_{InfoNCE}$, $L_{KL}$, and $L_{SC}$. The detailed optimization process for the parameters of each module in the model is provided in Section 2.4.
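Equation (4) can likewise be sketched in a few lines of PyTorch; masking out the diagonal implements the exclusion of sample $i$ from $A(i)$, and the feature normalization is again our assumption rather than a stated detail of the paper.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss of Equation (4). Samples that share a
    corrosion stage label form P(i); all other batch samples form A(i)."""
    z = F.normalize(z, dim=1)                        # assumed normalization
    sim = z @ z.t() / tau                            # (N, N) similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # i itself is excluded from A(i)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # -1/|P(i)| times the summed log-probability over positives, batch-averaged
    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    loss = -pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp_min(1)
    return loss.mean()

loss = supcon_loss(torch.randn(8, 64), torch.tensor([0, 0, 1, 1, 2, 2, 0, 1]))
```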
Additionally, since AE signals serve as anchors to assist EN signal learning, it is essential to first obtain a feature extractor capable of extracting effective corrosion stage discriminative features (denoted as $z_A$) from AE signals. Therefore, before the AE-EN synergistic learning, the Transformer encoder $f_A$ is pre-trained with a supervised classification task. After encoding the AE signals with $f_A$, a classification head $q_A$ composed of an MLP is added to classify the corrosion stages of AE signals, outputting the predicted stage label $\hat{y}_A = q_A(f_A(x_A))$. The AE portion $\{(x_A^i, y_A^i)\}_{i=1}^{N_1}$ of $X_{AE\text{-}EN}$ is used as the training data. The AE signals $x_A$ are input into the classifier, and the predicted outputs $\hat{y}_A$ are compared with the true labels $y_A$ to compute the cross-entropy loss $L_{CN}^A$, as shown in Equation (5):

$$L_{CN}^A = -\sum_{i=1}^{N_1} y_A^i \log \hat{y}_A^i, \quad (5)$$

where $i$ represents the sample index, and $N_1$ is the number of AE samples.
After the multi-granularity AE-EN synergistic training, the Transformer encoder $f_E$ gains the ability to effectively extract corrosion-related semantic information from EN signals. Finally, $f_E$ is fine-tuned to perform the final corrosion stage classification task. As shown in Figure 2, a classification head $q_E$, composed of an MLP, is attached to $f_E$. During the fine-tuning stage, only EN data are used to train the model. The samples $x$ from $X_{EN}$ are fed into $f_E$ and $q_E$, which output the predicted labels $\hat{y}$. Similar to Equation (5), the cross-entropy loss $L_{CN}^E$ is computed between $\hat{y}$ and the true labels $y$ and used to fine-tune the model, further improving its classification performance.

$$L_{CN}^E = -\sum_{i=1}^{N_2} y^i \log \hat{y}^i. \quad (6)$$

2.4. Model Training and Optimization

This section describes the training strategy proposed for CorroNet. The proposed CorroNet model undergoes a three-stage training process to maximize its ability to extract semantic corrosion stage features from EN signals, thereby improving its performance in corrosion stage monitoring. Stage 1 pre-trains the Transformer encoder of the AE branch to extract highly discriminative representations of corrosion stages from AE signals, providing anchor points for the feature mapping of EN signals. Stage 2 focuses on achieving synergistic learning between AE and EN signals to enhance the model's ability to recognize features of different corrosion stages through multi-granularity synergistic learning. Stage 3 further utilizes corrosion label information to enable the model to fully learn the corrosion stage features from EN signals. The parameters of the CorroNet model are optimized using gradient backpropagation, with the specific process detailed in Equations (7)–(16). The parameters of the components $f_A$, $f_E$, $q_A$, $q_E$, $g_A$, $g_E^a$, and $g_E^b$ are denoted as $W_{f_A}$, $W_{f_E}$, $W_{q_A}$, $W_{q_E}$, $W_{g_A}$, $W_{g_E^a}$, and $W_{g_E^b}$, respectively.
(1)
Stage 1 Parameter Optimization:
AE signals serve as anchors to optimize the feature mapping of EN signals, so the AE branch's Transformer encoder $f_A$ is trained first to extract corrosion stage features. Therefore, only the parameters of $f_A$ and $q_A$ are updated in Stage 1, while the other modules are frozen:

$$W_{f_A} \leftarrow W_{f_A} - \eta_1 \frac{\partial L_{CN}^A}{\partial W_{f_A}} \quad (7)$$

$$W_{q_A} \leftarrow W_{q_A} - \eta_1 \frac{\partial L_{CN}^A}{\partial W_{q_A}} \quad (8)$$

where $\eta_1$ represents the learning rate for Stage 1.
(2)
Stage 2 Parameter Optimization:
In Stage 2, multi-granularity synergistic learning between AE and EN, as well as supervised contrastive learning for EN signals, is performed. Thus, the parameters of $f_A$, $g_A$, $f_E$, $g_E^a$, and $g_E^b$ are updated, while the other modules remain frozen:

$$W_{f_A} \leftarrow W_{f_A} - \eta_2 \left[ \lambda \frac{\partial L_{KL}}{\partial W_{f_A}} + (1 - \lambda) \frac{\partial L_{InfoNCE}}{\partial W_{f_A}} \right] \quad (9)$$

$$W_{g_A} \leftarrow W_{g_A} - \eta_2 \left[ \lambda \frac{\partial L_{KL}}{\partial W_{g_A}} + (1 - \lambda) \frac{\partial L_{InfoNCE}}{\partial W_{g_A}} \right] \quad (10)$$

$$W_{f_E} \leftarrow W_{f_E} - \eta_2 \left[ \lambda \frac{\partial L_{KL}}{\partial W_{f_E}} + (1 - \lambda) \frac{\partial L_{InfoNCE}}{\partial W_{f_E}} \right] \quad (11)$$

$$W_{g_E^a} \leftarrow W_{g_E^a} - \eta_2 \left[ \lambda \frac{\partial L_{KL}}{\partial W_{g_E^a}} + (1 - \lambda) \frac{\partial L_{InfoNCE}}{\partial W_{g_E^a}} \right] \quad (12)$$

$$W_{f_E} \leftarrow W_{f_E} - \eta_2 \frac{\partial L_{SC}}{\partial W_{f_E}} \quad (13)$$

$$W_{g_E^b} \leftarrow W_{g_E^b} - \eta_2 \frac{\partial L_{SC}}{\partial W_{g_E^b}} \quad (14)$$

where $\eta_2$ represents the learning rate for Stage 2, and $\lambda$ is the hyperparameter that balances the loss components.
(3)
Stage 3 Parameter Optimization:
In the final stage, fine-tuning is conducted to achieve the final corrosion stage classification. In this stage, only the parameters of $f_E$ and $q_E$ are updated, and all other modules are frozen:

$$W_{f_E} \leftarrow W_{f_E} - \eta_3 \frac{\partial L_{CN}^E}{\partial W_{f_E}}, \quad (15)$$

$$W_{q_E} \leftarrow W_{q_E} - \eta_3 \frac{\partial L_{CN}^E}{\partial W_{q_E}}, \quad (16)$$

where $\eta_3$ represents the learning rate for Stage 3.
Based on the above formulas and descriptions, the pseudocode for the training and inference process can be found in Algorithm 1.
Algorithm 1: Training and Inference Procedure of CorroNet
Input: Training set $X_{train}$, consisting of the paired AE-EN dataset $X_{AE\text{-}EN} = \{(x_E^i, x_A^i, y_A^i)\}_{i=1}^{N_1}$ and the unpaired EN dataset $X_{EN} = \{(x^i, y^i)\}_{i=1}^{N_2}$. Testing set $X_{test} = \{(x_t^i, y_t^i)\}_{i=1}^{N_3}$, consisting of EN data.
Output: The predicted corrosion stage label $\hat{y}_t$ for each sample $x_t$ in the testing set.
Randomly initialize the network weights $W_{f_A}$, $W_{f_E}$, $W_{q_A}$, $W_{q_E}$, $W_{g_A}$, $W_{g_E^a}$, and $W_{g_E^b}$. Set the number of iterations for each stage to $n_1$, $n_2$, and $n_3$.
Training:
  for $i = 1$ to $n_1$ do:  # Stage 1 training process
      Freeze $W_{f_E}$, $W_{q_E}$, $W_{g_A}$, $W_{g_E^a}$, and $W_{g_E^b}$.
      Obtain the AE data and labels $\{x_A, y_A\}$ from $X_{AE\text{-}EN}$.
      Calculate $L_{CN}^A$ using Equation (5).
      Update $W_{f_A}$ and $W_{q_A}$ using Equations (7) and (8).
  end for
  for $i = 1$ to $n_2$ do:  # Stage 2 training process
      for each batch of paired AE-EN data $(x_E, x_A)$ from $Dataloader(X_{AE\text{-}EN})$ do:
          Freeze $W_{q_A}$, $W_{q_E}$, and $W_{g_E^b}$.
          Calculate $L_{InfoNCE}$ and $L_{KL}$ using Equations (1) and (3).
          Update $W_{f_A}$, $W_{g_A}$, $W_{f_E}$, and $W_{g_E^a}$ using Equations (9)–(12).
      end for
      for each batch of EN data $x$ from $Dataloader(X_{EN})$ do:
          Freeze $W_{f_A}$, $W_{q_A}$, $W_{q_E}$, $W_{g_A}$, and $W_{g_E^a}$.
          Calculate $L_{SC}$ using Equation (4).
          Update $W_{f_E}$ and $W_{g_E^b}$ using Equations (13) and (14).
      end for
  end for
  for $i = 1$ to $n_3$ do:  # Stage 3 training process
      for each batch of EN data $x$ from $Dataloader(X_{EN})$ do:
          Freeze $W_{f_A}$, $W_{q_A}$, $W_{g_A}$, $W_{g_E^a}$, and $W_{g_E^b}$.
          Calculate $L_{CN}^E$ using Equation (6).
          Update $W_{f_E}$ and $W_{q_E}$ using Equations (15) and (16).
      end for
  end for
Inference:
  Obtain a test sample $x_t$ from $X_{test}$.
  Calculate the predicted label $\hat{y}_t = q_E(f_E(x_t))$.
  Output the predicted corrosion stage for the sample.
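To make the freeze/update pattern of Algorithm 1 concrete, here is a minimal PyTorch sketch of the three-stage parameter freezing. The nn.Linear modules are placeholders standing in for the real Transformer encoders, projection heads, and classification heads; the variable names simply mirror the paper's notation and are not taken from a released implementation.

```python
import torch
import torch.nn as nn

# Stand-ins for CorroNet's modules (real encoders are Transformers, and the
# projection/classification heads are MLPs).
d = 64
f_A, f_E = nn.Linear(d, d), nn.Linear(d, d)                           # encoders
g_A, g_Ea, g_Eb = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)   # projection heads
q_A, q_E = nn.Linear(d, 3), nn.Linear(d, 3)                           # heads for 3 stages

def set_trainable(modules, trainable):
    """Freeze (trainable=False) or unfreeze every parameter of the modules."""
    for m in modules:
        for p in m.parameters():
            p.requires_grad = trainable

# Stage 1: only f_A and q_A learn from L_CN^A (Equations (7)-(8)).
set_trainable([f_A, q_A], True)
set_trainable([f_E, g_A, g_Ea, g_Eb, q_E], False)

# Stage 2: f_A, g_A, f_E, g_Ea learn from L_InfoNCE/L_KL, and f_E, g_Eb from
# L_SC (Equations (9)-(14)); both classification heads stay frozen.
set_trainable([f_A, g_A, f_E, g_Ea, g_Eb], True)
set_trainable([q_A, q_E], False)

# Stage 3: only f_E and q_E are fine-tuned with L_CN^E (Equations (15)-(16)).
set_trainable([f_E, q_E], True)
set_trainable([f_A, g_A, g_Ea, g_Eb, q_A], False)
```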

2.5. Evaluation Metrics

To evaluate the effectiveness of the model in corrosion monitoring, Accuracy and F1 score are chosen as evaluation metrics. Accuracy measures the proportion of correctly classified samples out of the total number of samples. It provides a straightforward measure of the model's overall performance by indicating the percentage of correct predictions. The F1 score is the harmonic mean of precision and recall, balancing the trade-off between these two metrics. Precision measures the proportion of true positives out of all predicted positives, and recall measures the proportion of true positives out of all actual positives. Specifically, in addition to calculating the Accuracy (Acc.) and F1 score for each class individually, the Average Accuracy (Ave. Acc.) and the Macro-F1 score on the test set were also calculated.
For each class $c$, Accuracy and F1 score are calculated using Equations (17)–(20):

$$Acc_c = \frac{TP_c + TN_c}{TP_c + TN_c + FP_c + FN_c} \quad (17)$$

$$F1_c = \frac{2 \times Precision_c \times Recall_c}{Precision_c + Recall_c} \quad (18)$$

$$Precision_c = \frac{TP_c}{TP_c + FP_c} \quad (19)$$

$$Recall_c = \frac{TP_c}{TP_c + FN_c} \quad (20)$$

where $TP_c$, $TN_c$, $FP_c$, and $FN_c$ denote the true positives, true negatives, false positives, and false negatives for class $c$, respectively. The Average Accuracy (Ave. Acc.) across all classes is the mean of the per-class Accuracies, and the Macro-F1 score is the unweighted average of the per-class F1 scores.
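As a worked example, the per-class metrics of Equations (17)–(20) can be computed from a confusion matrix with scikit-learn. This sketch assumes the one-vs-rest Accuracy definition given in Equation (17); the labels used are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def per_class_accuracy(y_true, y_pred, num_classes=3):
    """One-vs-rest Accuracy per class, Equation (17): (TP+TN)/(TP+TN+FP+FN)."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
    total = cm.sum()
    accs = []
    for c in range(num_classes):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp   # predicted c but actually another class
        fn = cm[c, :].sum() - tp   # actually c but predicted another class
        tn = total - tp - fp - fn
        accs.append((tp + tn) / total)
    return np.array(accs)

# Illustrative labels only (0 = C1, 1 = C2, 2 = C3).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2]
print(per_class_accuracy(y_true, y_pred))          # per-class Acc., Equation (17)
print(f1_score(y_true, y_pred, average=None))      # per-class F1, Equation (18)
print(f1_score(y_true, y_pred, average="macro"))   # Macro-F1
```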

2.6. Experimental Setup and Data Description

Pitting corrosion damage is a major issue affecting material strength as it can lead to catastrophic failure of metallic systems and structures, which is often difficult to predict. Therefore, this paper uses pitting corrosion as an example to evaluate the performance of CorroNet and other comparative methods. To obtain AE and EN data for model training and testing, pitting corrosion experiments were conducted on 304 stainless steels. The experimental setup is shown in Figure 3, with Figure 3a depicting a schematic diagram and Figure 3b showing the physical experimental setup. The experiments were conducted at room temperature, using 3.5% NaCl (Sinopharm Chemical Reagent Co. Ltd., Shanghai, China) as the corrosive medium. The number of 304 stainless steel specimens (Shen Xin Ke Ji Co. Ltd., Tengzhou, Shandong, China) used in this study was 20. The working surface of the 304 stainless steel specimens (25 × 50 × 3 mm) was ground with 2000-grit silicon carbide paper, cleaned with ethanol, and then air-dried. A circular area of 1 cm2 was exposed for corrosion, with the rest sealed using D-31 epoxy/polyurethane primer. Except for the NaCl solution, all other materials were supplied by Shen Xin Ke Ji Co. Ltd. (Tengzhou, Shandong, China). Sampling details are in Appendix A.2.
The EN and AE data were labeled into three categories according to the corrosion states: no corrosion, metastable corrosion, and stable corrosion, denoted as C1, C2, and C3, respectively. In the no corrosion stage, the metal remains in a healthy state with no obvious signs of corrosion. In the metastable corrosion stage, corrosion has initiated but remains unstable, where corrosion reactions may be localized or intermittent. In the stable corrosion stage, the corrosion process enters a stable phase, with a steady rate and intensified damage [22,35].
Similar to previous studies [36], the synergistic AE-EN data pairs were generated by capturing both signals occurring simultaneously at the same time step. Each AE event comprises 1024 sampling points, and EN signals within 5 min before and after the event were paired as synergistic data. Given the small sample size of AE data, the goal was to achieve effective learning with a limited dataset. As a result, approximately 90 AE-EN aligned data segments were sampled.
In addition, we conducted multiple corrosion experiments with the same experimental setup as described above and collected additional unpaired EN data to expand the dataset. Each experiment lasted approximately 100 h in total; metastable corrosion began after about 10 h and stable corrosion after about 70 h. At the end of each experiment, the long-sequence data collected throughout the entire experiment were segmented. Each segment consists of 3600 sampling points, corresponding to a 1 h duration. In total, the EN dataset comprised 324 samples, with 108 samples for each corrosion stage. To ensure a fair comparison between models and consistency with previous studies, the EN data were randomly split into a training set and a test set in an 8:2 ratio. Using a sliding window approach (window length: 600, stride: 30), the original 3600-time-step segments were further divided into smaller segments. The details of the EN dataset are shown in Table 1.
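The sliding-window segmentation is straightforward to reproduce. A short NumPy sketch (with an illustrative random signal standing in for real EN data) confirms that each 3600-point segment yields (3600 − 600)/30 + 1 = 101 windows, consistent with the counts in Table 1.

```python
import numpy as np

def sliding_window(signal, window=600, stride=30):
    """Split a 1-D EN segment into overlapping windows of the given length/stride."""
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window] for i in range(n)])

segment = np.random.randn(3600)   # stands in for one 1 h EN segment (1 Hz sampling)
windows = sliding_window(segment)
print(windows.shape)              # (101, 600): 101 windows per original segment
# 86 training samples per class x 101 windows = 8686, matching Table 1.
```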
The proposed CorroNet method was implemented in PyTorch 12.1, built using Anaconda (Anaconda Inc., Austin, TX, USA); the code editor used was Visual Studio Code (Microsoft Corporation, Redmond, WA, USA). The Transformer encoders $f_A$ and $f_E$ were both configured with two layers ($L = 2$), and the multi-head attention mechanism included two heads. The optimizer was AdamW with a learning rate of 0.003, a temperature parameter $\tau$ of 0.07, and a weight $\lambda$ of 0.5. All tasks used the same preprocessing method, and all models were trained and tested on the same workstation (Windows, Intel Xeon Gold 6248R, NVIDIA GeForce RTX 4090, and 256 GB RAM).

3. Results and Discussion

3.1. Comparison of Corrosion Monitoring Performance

To evaluate the corrosion monitoring performance of CorroNet, it was compared against commonly used deep learning models, i.e., CNN and LSTM. The dataset described in Section 2.6 was used to train and test all models. CNN is a representative model widely utilized in corrosion analysis [16,17,37]. LSTM is a deep learning model extensively employed for analyzing and modeling time-series data [38]. Previous studies have demonstrated their effectiveness in corrosion classification tasks.
In prior research, CNN and LSTM have predominantly been applied to single-modal signal corrosion classification tasks. For a fair comparison, experiments using CNN and LSTM within the proposed CorroNet multimodal framework were also conducted. Specifically, the Transformer backbone in CorroNet was replaced with CNN and LSTM, respectively. This created two variants: Corro-CNN and Corro-LSTM. This setup was intended to demonstrate the robustness and effectiveness of the proposed acoustic-electrochemical alignment framework in improving corrosion classification performance across different backbones. The comparison results are presented in Table 2.
As shown in Table 2, CorroNet achieved the highest Average Accuracy (Ave. Acc.) of 0.9701 and Macro-F1 score of 0.9701 among all models. For specific classes, C1 samples were the most challenging to classify for all models. This may be attributed to the less distinctive features and higher noise interference in the "no corrosion" stage. Notably, CorroNet achieved an Accuracy of 0.9203 and an F1 score of 0.9549 for C1, which are 11.28% higher in Accuracy and 4.8% higher in F1 score than the CNN, and 8.94% higher in Accuracy and 4.26% higher in F1 score than the LSTM. Additionally, CorroNet maintained remarkable Accuracies and F1 scores for C2 and C3 with minimal variance, while CNN and LSTM exhibited comparatively larger performance drops. This superior performance is attributed to CorroNet's Transformer backbone, which enables effective feature extraction, and to its synergistic learning framework, which aligns AE and EN signals and enables more precise corrosion stage classification than the single-modal CNN and LSTM models.
The comparison between the regular models (CNN and LSTM) and their variants (Corro-CNN and Corro-LSTM) also demonstrated the effectiveness of synergistic learning. The regular CNN achieved the lowest performance with an Ave. Acc. of 0.9310 and a Macro-F1 of 0.9307. This suggested that CNN may have encountered difficulties when handling time-series data with complex temporal dependencies. LSTM performed slightly better than CNN (Ave. Acc. 0.9389, Macro-F1 0.9385). This result was expected, as LSTMs were generally more capable of handling time-series information compared to CNNs. However, LSTM still fell short compared to CorroNet, particularly in C1 (Accuracy 0.8807), where it struggled to capture the necessary features for accurate classification.
Integrating CNN into the CorroNet framework (Corro-CNN) significantly improved its performance, with a 2.38% increase in Ave. Acc. and a 2.4% increase in Macro-F1 score. The enhanced performance, especially in C1 (Accuracy +2.43%) and C2 (Accuracy +4.91%), indicated that the CorroNet framework helped CNN handle complex temporal tasks. Similarly, Corro-LSTM outperformed the LSTM, with a 2.13% increase in Ave. Acc. and a 2.16% increase in Macro-F1 score. The most significant improvement was in C1 (Accuracy +9.54%), where Corro-LSTM achieved the highest Accuracy (0.9761) among all the models. These results suggested that the CorroNet framework enhanced the ability to capture and align features from dual-modal data to further improve the overall performance.
To further analyze each model's corrosion monitoring performance and ability to learn corrosion-related semantic features, Figure 4 presents the confusion matrices of the predicted results on the test set. The horizontal axis represents the predicted labels, and the vertical axis represents the true labels.
The analysis of the confusion matrices for all five models revealed a general pattern: samples from adjacent corrosion stages exhibited more similar features, resulting in a relatively higher misclassification rate. The no corrosion stage (C1) and the metastable corrosion stage (C2) were confused by most of the models. For instance, in the CNN model (Figure 4a), 13.5% of C1 samples were misclassified as C2, while in the LSTM model (Figure 4c), 8.33% of C1 samples were misclassified as C2. Similarly, 6.21% of C2 samples were misclassified as C3 in the CNN model, and 4.55% in the LSTM model. This indicated that most models exhibited weaker recognition capabilities for early-stage corrosion data (C1 and C2). Despite these challenges, CorroNet (Figure 4e) maintained strong corrosion monitoring performance, with only 0.72% of C2 samples misclassified as C1.
Both Corro-CNN and Corro-LSTM significantly reduced misclassification compared to the regular CNN and LSTM models. Corro-CNN (Figure 4b) reduced the misclassification rate of C1 as C2 by 1.8% and the misclassification rate of C2 as C3 by 5.08%. Corro-LSTM (Figure 4d) reduced the misclassification rate of C1 as C2 by 7.65%. The improvement in the performance of CNN and LSTM after incorporating the CorroNet framework further demonstrated the effectiveness of the synergistic learning between AE and EN data. In contrast to the early stages, all models performed exceptionally well on the stable corrosion stage (C3), nearly achieving 100% Accuracy. This indicated that C3 features are more distinct and can be classified with minimal error.
In addition to the 8:2 split ratio, we analyzed how different data split ratios affected the model’s robustness in corrosion detection. The results and analysis are in Appendix A.3.

3.2. Ablation Experiments

To evaluate the key components in CorroNet, ablation experiments were conducted. CorroNet consists of the following critical components: (1) Using AE signals as anchors to enhance the learning of corrosion-related semantic features in EN signals. This is achieved through the multi-granularity synergistic learning of AE and EN, including feature alignment learning and probability distribution consistency learning; (2) introducing supervised contrastive learning to capture class-specific features of EN signals; (3) pre-training AE on the corrosion stage classification task to provide precise anchor points for optimizing feature mapping during training.
Therefore, the following seven ablation experiments were conducted:
(1) w/o FAL: the feature alignment learning (FAL) loss function $L_{InfoNCE}$ was removed; (2) w/o DAL: the probability distribution alignment learning (DAL) loss function $L_{KL}$ was removed; (3) w/o SCL: the supervised contrastive learning (SCL) loss $L_{SC}$ for EN signals was removed; (4) w/o AE pre-training: the pre-training of the AE feature extractor in Stage 1 was removed, and the subsequent learning was conducted directly; (5) w/o AE: the AE signals were entirely excluded from the learning model, meaning that only the supervised contrastive learning and the fine-tuning of EN signals were performed; (6) w/o pre-train: both pre-training stages (Stages 1 and 2) were excluded, and only the classification fine-tuning of EN signals was performed; (7) w/o EN: the EN signals were removed during model training, and only AE signals were used for model training and prediction. The full model is represented by CorroNet. The results are shown in Table 3.
w/o FAL (without feature alignment learning): Removing the feature alignment loss $L_{InfoNCE}$ caused a 2.61% drop in Ave. Acc., with C1 Accuracy decreasing by 4.09%. This showed that feature alignment learning is essential for learning the corrosion features across stages.
w/o DAL (without distribution alignment learning): Distribution alignment learning $L_{KL}$ was particularly important for C1 classification, as its absence caused a noticeable 4.81% decline in C1 Accuracy and a 2.34% drop in F1 score.
w/o SCL (without supervised contrastive learning): Removing supervised contrastive learning resulted in a 7.24% drop in C1 Accuracy and a 3.97% decrease in F1 score. This showed that SCL is essential for learning class-specific features, especially for early-stage corrosion (C1).
w/o AE pre-training (without AE pre-training): Removing the pre-training of AE signals reduced Ave. Acc. by 1.05% and the Macro-F1 score by 1.17%. Additionally, C1 Accuracy dropped significantly, by 3.33%, highlighting the importance of AE pre-training for distinguishing early corrosion stages (C1).
w/o AE (without AE signals): Removing AE signals resulted in significant performance degradation, especially in C1 (Accuracy dropped by 5.67% and F1 score by 2.81%) and C2 (Accuracy dropped by 4.46% and F1 score by 4.97%). Additionally, Ave. Acc. decreased by 3.28%, and the Macro-F1 score dropped by 3.3%. These results highlight the importance of using AE signals as anchors to improve corrosion feature learning.
w/o pre-train (without pre-training in Stages 1 and 2): Removing both pre-training stages led to the worst performance, with Ave. Acc. decreasing by 4.06% and Macro-F1 by 4.14%. C1 Accuracy significantly dropped by 10.26%, with the F1 score decreasing by 6.77%. The F1 score for C2 also decreased by 4.96%, demonstrating that pre-training is critical for learning robust features. The sharp decline in C1 and C2 performance further underscored the importance of synergistic learning of AE and EN signals and the supervised contrastive learning for effective corrosion feature extraction.
w/o EN (without EN signals): Removing EN signals resulted in a significant performance drop, with the Ave. Acc. decreasing by 13.67% and the Macro-F1 score dropping by 13.9%. This highlights the importance of combining EN and AE signals for model learning.
Comparing the seven ablation experiments, it was found that removing EN signals caused the largest performance drop, which confirms their crucial role. Excluding AE signals or pre-training also led to considerable performance degradation, highlighting the importance of AE-EN multimodality and the pre-training stages. Overall, the ablation experiments strongly supported the design choices made in CorroNet, as each component contributed significantly to the model’s ability to accurately monitor corrosion stages, especially in the early stages where feature differences are more subtle.

4. Conclusions

A novel deep learning framework, named CorroNet, is proposed in this paper, which provides a new approach to automated corrosion monitoring. The proposed CorroNet has the potential to detect corrosion processes on the surface of stainless steels in chloride solutions. The main conclusions are summarized as follows:
(1)
CorroNet pioneers the integration of multi-scale learning from both AE and EN signals, effectively capturing the intricate characteristics of corrosion evolution at different stages. By combining the complementary strengths of these two signals, CorroNet offers a comprehensive approach to corrosion monitoring;
(2)
To further improve the model’s performance, CorroNet incorporates an adaptive learning mechanism that dynamically adjusts to the complexities of corrosion data. This mechanism not only optimizes feature extraction but also allows for more reliable detection of early-stage corrosion, which is often difficult to identify with traditional monitoring methods;
(3)
A comprehensive metal corrosion monitoring task was conducted to validate CorroNet’s real-world applicability. The results highlight its superior performance in predicting various corrosion stages. Ablation studies further underscore the significance of the key model components in achieving this performance.
Despite the promising capabilities of CorroNet, there are opportunities for further improvement. The current research centers on pitting corrosion, one of the most prevalent corrosion forms. However, distinct types of corrosion have different mechanisms that may require tailored approaches; future work should therefore explore diverse corrosion types and extend CorroNet's capabilities to handle multiple corrosion scenarios. Additionally, the development of a user-friendly graphical user interface (GUI) is planned to facilitate better model interaction and accessibility for users in industrial settings.

Author Contributions

Conceptualization, R.W. and K.S.; methodology, R.W. and G.S.; software, R.W.; validation, K.W., X.M. and R.L.; resources, K.S.; data curation, G.S., F.Q. and L.Z.; writing—original draft preparation, R.W. and G.S.; writing—review and editing, K.S.; supervision, X.C.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2022YFC3004502), Ministry of Science and Technology of the People’s Republic of China.

Data Availability Statement

The code is available upon request.

Conflicts of Interest

Authors Guangbin Shan, Feng Qiu and Linqi Zhu were employed by State Key Laboratory of Chemical Safety, SINOPEC Research Institute of Safety Engineering Co. Ltd. Authors Kang Wang, Xianglong Meng and Ruiqin Li were employed by Shanxi Beifang XingAn Chemical Industry Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

AE: Acoustic emission
EN: Electrochemical noise
CNN: Convolutional neural network
LSTM: Long short-term memory network
PEM: Proton exchange membrane
CNN-TL: Transfer learning method based on convolutional neural networks
HDR: Hydrogen defect recognition
SOTA: State of the art
MLP: Multi-layer perceptron
InfoNCE: Information noise-contrastive estimation
KL: Kullback–Leibler divergence
CN: Cross-entropy
SCL: Supervised contrastive learning
FAL: Feature alignment learning
DAL: Distribution alignment learning
Ave. Acc.: Average Accuracy

Appendix A

Appendix A.1. The Transformer Encoder

This section provides the computational details of the Transformer encoder, as shown in Figure A1. The raw time-series input is represented as $x \in \mathbb{R}^{H \times W}$, where $H$ denotes the sequence length of the original sample matrix, and $W$ denotes the number of features in the original sample. In the embedding module, $x$ is first segmented into temporal slices and mapped into multiple one-dimensional token vectors $\{x_1, x_2, \dots, x_p \in \mathbb{R}^d\}$ through a linear projection. Consequently, the shape of $x$ is transformed into $x \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the projected vectors, and $n$ represents the number of tokens. A learnable position encoding $E_{pos} \in \mathbb{R}^{n \times d}$ is then added to the token sequence $x$ to retain positional information. The final output of the embedding module is denoted as $Z_0 \in \mathbb{R}^{n \times d}$.
Next, $Z_0$ is processed through the multi-head attention computation. The attention mechanism extracts global features by computing the weighted sum of the values $V$, where the weights are derived from the interaction between the queries $Q$ and keys $K$ across the tokens. Multi-head attention allows the model to capture temporal features at multiple scales simultaneously. The computation of the multi-head attention mechanism is described by Equations (A1)–(A3):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V \quad (A1)$$

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \quad (A2)$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h) W^O \quad (A3)$$

The dot-product (multiplicative) attention is computed according to Equation (A1). Here, $Q$, $K$, and $V$ are obtained by applying normalization to the output sequence $Z_0$ from the embedding block, followed by multiplication with the transformation matrices $W^q$, $W^k$, and $W^v$, respectively. These are referred to as the query matrix $Q \in \mathbb{R}^{n \times d_k}$, the key matrix $K \in \mathbb{R}^{n \times d_k}$, and the value matrix $V \in \mathbb{R}^{n \times d_v}$. The result of the $i$-th attention head is denoted as $\mathrm{head}_i$, and all the heads are concatenated. The number of attention heads is denoted by $h$, and a linear transformation matrix $W^O$ is applied to obtain the output $Z_1 \in \mathbb{R}^{n \times d}$. Finally, $Z_1$ is fed into the MLP module, which employs residual connections. To fully extract the features from the time-series data, the above attention computation and MLP module are repeated across multiple layers, with the model depth denoted by $L$. Therefore, the computation of the Transformer encoder module is summarized by Equations (A4) and (A5):

$$Z_l' = \mathrm{MultiHead}(\mathrm{LN}(Z_{l-1})) + Z_{l-1}, \quad l = 1 \dots L \quad (A4)$$

$$Z_l = \mathrm{MLP}(\mathrm{LN}(Z_l')) + Z_l', \quad l = 1 \dots L \quad (A5)$$
The output matrix $Z_L$ from the final layer is subjected to average pooling to obtain the final feature vector, denoted as $z \in \mathbb{R}^d$.
Figure A1. Detailed structure of the Transformer encoder.
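For illustration, the encoder of Figure A1 can be sketched with PyTorch's built-in Transformer layers. The layer and head counts follow Section 2.6 ($L = 2$, two heads), while the patch length and model width here are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

class TSTransformerEncoder(nn.Module):
    """Sketch of the encoder in Figure A1: patch embedding, learnable position
    encoding E_pos, L pre-norm Transformer layers (Equations (A4)-(A5)), and
    average pooling to a d-dimensional feature vector z."""
    def __init__(self, seq_len=600, patch=60, d=64, heads=2, layers=2):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, d)                           # tokens x_1..x_n
        self.pos = nn.Parameter(torch.zeros(seq_len // patch, d))  # E_pos
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                           norm_first=True, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):                                # x: (batch, seq_len)
        tokens = x.unfold(1, self.patch, self.patch)     # (batch, n, patch) slices
        z = self.embed(tokens) + self.pos                # Z_0 with position encoding
        z = self.encoder(z)                              # Z_L after L layers
        return z.mean(dim=1)                             # average pooling -> z

enc = TSTransformerEncoder()
print(enc(torch.randn(8, 600)).shape)                    # torch.Size([8, 64])
```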

Appendix A.2. Details of Data Collection

As shown in Figure 3, during the collection of AE signals, one side of the test steel sheet was in contact with the solution, while an AE sensor was placed on the other side. A PCI-2 acoustic emission detection system from Physical Acoustics Corporation (PAC) was used, with a 40 dB gain, and a WD-type broadband AE sensor was positioned above the specimen. The sampling frequency of the AE signal was set to 1024 Hz. The EN data were collected using a Gamry Reference 600 electrochemical workstation in ZRA mode, with a saturated calomel electrode (SCE) as the reference electrode, and the 304 steel specimen was used as the working electrode. The sampling frequency of the EN signal was 1 Hz. The corrosion process of the 304 steel specimen in 3.5% NaCl solution was simultaneously monitored using a combination of AE and EN methods. In addition to the experiments combining AE and EN, to expand the dataset and provide test data for evaluating the model’s performance, additional corrosion experiments were conducted where only EN signals were collected. The experimental parameters for these tests were identical to those described above.

Appendix A.3. Detailed Results for Different Data Split Ratios

In addition to the aforementioned 8:2 split ratio for EN signals, we further considered the impact of different data split ratios on the model’s performance to evaluate its robustness in corrosion detection under varying data partition scenarios. The results are shown in Table A1. Experimental results demonstrated that the proposed model maintained stable and robust corrosion monitoring capabilities despite changes in the data split ratio, further confirming the robustness of the proposed model.
Table A1. Results for different data split ratios.

Split Ratio   Ave. Acc.   Macro-F1   C1 Acc.   C1 F1    C2 Acc.   C2 F1    C3 Acc.   C3 F1
5:5           0.9606      0.9605     0.8971    0.9425   0.9899    0.9480   0.9949    0.9910
6:4           0.9679      0.9654     0.9095    0.9473   0.9899    0.9661   0.9849    0.9830
7:3           0.9686      0.9645     0.9273    0.9477   0.9699    0.9639   0.9849    0.9820
8:2           0.9701      0.9701     0.9203    0.9549   0.9928    0.9600   0.9973    0.9953
9:1           0.9781      0.9513     0.9028    0.8930   0.9625    0.9677   0.9949    0.9932

References

  1. Khalaf, A.H.; Xiao, Y.; Xu, N.; Wu, B.; Li, H.; Lin, B.; Nie, Z.; Tang, J. Emerging AI technologies for corrosion monitoring in oil and gas industry: A comprehensive review. Eng. Fail. Anal. 2024, 155, 107735. [Google Scholar] [CrossRef]
  2. Shao, X.; Cai, B.; Ahmed, S.; Zhou, X.; Hu, Z.; Sui, Z.; Liu, X. Towards proactive corrosion management: A predictive modeling approach in pipeline industrial applications. Process Saf. Environ. Prot. 2024, 190, 1471–1480. [Google Scholar] [CrossRef]
  3. Wu, X.; Han, E.-H. Acoustic emission during pitting corrosion of 304 stainless steel. Corros. Sci. 2011, 53, 1537–1546. [Google Scholar]
  4. Wu, K.; Byeon, J.-W. Morphological estimation of pitting corrosion on vertically positioned 304 stainless steel using acoustic-emission duration parameter. Corros. Sci. 2019, 148, 331–337. [Google Scholar] [CrossRef]
  5. Wu, K.; Jung, W.-S.; Byeon, J.-W. In-situ monitoring of pitting corrosion on vertically positioned 304 stainless steel by analyzing acoustic-emission energy parameter. Corros. Sci. 2016, 105, 8–16. [Google Scholar] [CrossRef]
  6. Cottis, R. Simulation of electrochemical noise due to metastable pitting. J. Corr. Sci. Eng. 2000, 3. Available online: https://www.jcse.org/viewPaper/ID/42/QIgtNHePZr4acRFOy9UMYZd36DXHAj (accessed on 2 November 2020).
  7. Homborg, A.; Tinga, T.; van Westing, E.; Zhang, X.; Ferrari, G.; de Wit, J.; Mol, J. A critical appraisal of the interpretation of electrochemical noise for corrosion studies. Corrosion 2014, 70, 971–987. [Google Scholar] [CrossRef] [PubMed]
  8. Denissen, P.J.; Homborg, A.M.; Garcia, S.J. Interpreting electrochemical noise and monitoring local corrosion by means of highly resolved spatiotemporal real-time optics. J. Electrochem. Soc. 2019, 166, C3275. [Google Scholar] [CrossRef]
  9. Homborg, A.; Olgiati, M.; Denissen, P.; Garcia, S. An integral non-intrusive electrochemical and in-situ optical technique for the study of the effectiveness of corrosion inhibition. Electrochim. Acta 2022, 403, 139619. [Google Scholar] [CrossRef]
  10. Xia, D.-H.; Ji, Y.; Zhang, R.; Mao, Y.; Behnamian, Y.; Hu, W.; Birbilis, N. On the localized corrosion of AA5083 in a simulated dynamic seawater/air interface—Part 1: Corrosion initiation mechanism. Corros. Sci. 2023, 213, 110985. [Google Scholar] [CrossRef]
  11. Recio-Hernández, J.A.; Landa, A.E.; Miguel, G.F.S.; Mejía-Sánchez, E.; Huerta, F.L.; Cruz, R.O.; Martínez, R.G. Electrochemical noise study of the passivation of AISI 1018 carbon steel as reinforcements embedded in ternary concretes during the setting process. ECS Trans. 2023, 110, 159. [Google Scholar] [CrossRef]
  12. Recio-Hernández, J.A.; Landa, A.E.; Miguel, G.F.S.; Mejía-Sánchez, E.; Huerta, F.L.; Cruz, R.O.; Martínez, R.G. Electrochemical noise of SCC inhibition of a supermartensitic stainless steel in sour solution. ECS Trans. 2023, 110, 29. [Google Scholar] [CrossRef]
  13. Homborg, A.; Mol, A.; Tinga, T. Corrosion classification through deep learning of electrochemical noise time-frequency transient information. Eng. Appl. Artif. Intell. 2024, 133, 108044. [Google Scholar] [CrossRef]
  14. Rubio, M.; Sanchez, D.; Gazdzicki, P.; Friedrich, K.; Urquia, A. Failure mode diagnosis in proton exchange membrane fuel cells using local electrochemical noise. J. Power Sources 2022, 541, 231582. [Google Scholar] [CrossRef]
  15. Guo, F.; Li, W.; Jiang, P.; Chen, F.; Liu, Y. Deep learning approach for damage classification based on acoustic emission data in composite materials. Materials 2022, 15, 4270. [Google Scholar] [CrossRef]
  16. Liu, P.; Xu, C.; Xie, J.; Fu, M.; Chen, Y.; Liu, Z.; Zhang, Z. A CNN-based transfer learning method for leakage detection of pipeline under multiple working conditions with AE signals. Process Saf. Environ. Prot. 2023, 170, 1161–1172. [Google Scholar] [CrossRef]
  17. Qiu, F.; Shen, Z.; Bai, Y.; Shan, G.; Qu, D.; Chen, W. Hydrogen defect acoustic emission recognition by deep learning neural network. Int. J. Hydrog. Energy 2024, 54, 878–893. [Google Scholar] [CrossRef]
  18. Zhang, Z.; Zhao, Z.; Bai, P.; Li, X.; Liu, B.; Tan, J.; Wu, X. In-situ monitoring of pitting corrosion of AZ31 magnesium alloy by combining electrochemical noise and acoustic emission techniques. J. Alloys Compd. 2021, 878, 160334. [Google Scholar] [CrossRef]
  19. Kawasaki, Y.; Fukui, S.; Fukuyama, T. Phenomenological process of rebar corrosion in reinforced concrete evaluated by acoustic emission and electrochemical noise. Constr. Build. Mater. 2022, 352, 128829. [Google Scholar] [CrossRef]
  20. Kovac, J.; Alaux, C.; Marrow, T.J.; Govekar, E.; Legat, A. Correlations of electrochemical noise, acoustic emission and complementary monitoring techniques during intergranular stress-corrosion cracking of austenitic stainless steel. Corros. Sci. 2010, 52, 2015–2025. [Google Scholar] [CrossRef]
  21. Calabrese, L.; Bonaccorsi, L.; Galeano, M.; Proverbio, E.; Di Pietro, D.; Cappuccini, F. Identification of damage evolution during SCC on 17-4 PH stainless steel by combining electrochemical noise and acoustic emission techniques. Corros. Sci. 2015, 98, 573–584. [Google Scholar] [CrossRef]
  22. Kietov, V.; Mandel, M.; Krüger, L. Combination of electrochemical noise and acoustic emission for analysis of the pitting corrosion behavior of an austenitic stainless cast steel. Adv. Eng. Mater. 2019, 21, 1800682. [Google Scholar] [CrossRef]
  23. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
  24. Jia, C.; Yang, Y.; Xia, Y.; Chen, Y.-T.; Parekh, Z.; Pham, H.; Le, Q.V.; Sung, Y.-H.; Li, Z.; Duerig, T. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
  25. Yang, Y.; Pan, L.; Liu, L. Event camera data pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023. [Google Scholar]
  26. Van Den Oord, A.; Vinyals, O. Neural discrete representation learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  27. Girdhar, R.; El-Nouby, A.; Liu, Z.; Singh, M.; Alwala, K.V.; Joulin, A.; Misra, I. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  29. Zhou, K.; Tong, Y.; Li, X.; Wei, X.; Huang, H.; Song, K.; Chen, X. Exploring global attention mechanism on fault detection and diagnosis for complex engineering processes. Process Saf. Environ. Prot. 2023, 170, 660–669. [Google Scholar] [CrossRef]
  30. Huang, H.; Wang, R.; Zhou, K.; Ning, L.; Song, K. CausalViT: Domain generalization for chemical engineering process fault detection and diagnosis. Process Saf. Environ. Prot. 2023, 176, 155–165. [Google Scholar] [CrossRef]
  31. Zhou, K.; Wang, R.; Tong, Y.; Wei, X.; Song, K.; Chen, X. Domain generalization of chemical process fault diagnosis by maximizing domain feature distribution alignment. Process Saf. Environ. Prot. 2024, 185, 817–830. [Google Scholar] [CrossRef]
  32. Bi, X.; Wu, D.; Xie, D.; Ye, H.; Zhao, J. Large-scale chemical process causal discovery from big data with transformer-based deep learning. Process Saf. Environ. Prot. 2023, 173, 163–177. [Google Scholar] [CrossRef]
  33. van den Oord, A.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
  34. Chen, X.; Xie, S.; He, K. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  35. Hou, Y.; Aldrich, C.; Lepkova, K.; Machuca, L.; Kinsella, B. Analysis of electrochemical noise data by use of recurrence quantification analysis and machine learning methods. Electrochim. Acta 2017, 256, 337–347. [Google Scholar] [CrossRef]
  36. Calabrese, L.; Galeano, M.; Proverbio, E.; Di Pietro, D.; Donato, A. Topological neural network of combined AE and EN signals for assessment of SCC damage. Nondestruct. Test. Eval. 2019, 35, 98–119. [Google Scholar] [CrossRef]
  37. Jiang, S.; Zavala, V.M. Convolutional neural nets in chemical engineering: Foundations, computations, and applications. AIChE J. 2021, 67, e17282. [Google Scholar] [CrossRef]
  38. Pham, T.D. Time-frequency time-space LSTM for robust classification of physiological signals. Sci. Rep. 2021, 11, 6936. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Explanation of AE and EN synergistic learning: (a) an example of AE signals with significant corrosion features; (b) a comparison between the synergistic learning model using AE and EN signals and existing models that use only EN signals; (c) an overview of the proposed synergistic learning-based corrosion monitoring method.
Figure 2. The structure of CorroNet.
Figure 3. Corrosion experimental setup; (a) is a schematic and (b) is the physical setup.
Figure 4. The confusion matrices of different models; (a–e) are the confusion matrices of the prediction results obtained by CNN, Corro-CNN, LSTM, Corro-LSTM, and CorroNet, respectively.
Table 1. Composition of the electrochemical noise dataset.

        Training Set                                  Testing Set
Class   Original Count   After Sliding Window         Original Count   After Sliding Window
C1      86               8686                         22               2222
C2      86               8686                         22               2222
C3      86               8686                         22               2222
Table 2. Results for corrosion monitoring. (The bolded values in the table indicate the optimal values.)

Model        Ave. Acc.   Macro-F1   C1 Acc.   C1 F1    C2 Acc.   C2 F1    C3 Acc.   C3 F1
CNN          0.9310      0.9307     0.8573    0.9221   0.9356    0.9037   1.0000    0.9663
Corro-CNN    0.9548      0.9547     0.8816    0.9351   0.9847    0.9362   0.9982    0.9928
LSTM         0.9389      0.9385     0.8807    0.9275   0.9361    0.9271   1.0000    0.9609
Corro-LSTM   0.9602      0.9601     0.9761    0.9445   0.9046    0.9466   1.0000    0.9893
CorroNet     0.9701      0.9701     0.9203    0.9549   0.9928    0.9600   0.9973    0.9953
Table 3. Results for ablation experiments. (The bolded values in the table indicate the optimal values.)

Model                 Ave. Acc.   Macro-F1   C1 Acc.   C1 F1    C2 Acc.   C2 F1    C3 Acc.   C3 F1
w/o FAL               0.9440      0.9441     0.8794    0.9320   0.9667    0.9225   0.9860    0.9777
w/o DAL               0.9554      0.9552     0.8722    0.9315   0.9977    0.9398   0.9964    0.9944
w/o SCL               0.9470      0.9465     0.8479    0.9152   0.9946    0.9315   0.9986    0.9928
w/o AE pre-training   0.9586      0.9584     0.8870    0.9346   0.9887    0.9429   1.0000    0.9978
w/o AE                0.9373      0.9371     0.8636    0.9268   0.9482    0.9103   1.0000    0.9741
w/o pre-train         0.9295      0.9287     0.8177    0.8872   0.9743    0.9104   0.9964    0.9884
w/o EN                0.8334      0.8311     0.7083    0.7907   0.9167    0.8627   0.8750    0.8400
CorroNet              0.9701      0.9701     0.9203    0.9549   0.9928    0.9600   0.9973    0.9953
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
