
Previous address: Materials Performance and Non-Destructive Evaluation (NDT), Department of Metallurgy and Materials, Katholieke Universiteit Leuven, Kasteelpark Arenberg 44, B-3001 Heverlee, Belgium.

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The damage caused by corrosion in chemical process installations can lead to unexpected plant shutdowns and the leakage of potentially toxic chemicals into the environment. When a material corrodes, structural changes occur that release energy in the form of acoustic waves. This acoustic activity can in turn be used for corrosion monitoring, and even for predicting the type of corrosion. Here we apply wavelet packet decomposition to extract features from acoustic emission signals. We then use the extracted wavelet packet coefficients to distinguish between the most important types of corrosion processes in the chemical process industry: uniform corrosion, pitting and stress corrosion cracking. The local discriminant basis selection algorithm can be considered a standard for selecting the most discriminative wavelet coefficients. However, it does not take the statistical dependencies between wavelet coefficients into account. We show that ignoring these dependencies lowers the accuracy of predicting the corrosion type. We therefore compare several mutual information filters that do take these dependencies into account, in order to arrive at a more accurate prediction.

A large part—25 to 40%—of the costs related to corrosion can be saved by the use of appropriate corrosion monitoring and control systems [

Regular practice in the chemical process industry consists of periodic inspections of the plant, e.g., every 3 months, every 6 months or every year [

The most frequent corrosion processes in the chemical process industry are: uniform corrosion (or general corrosion), pitting and stress corrosion cracking (SCC) [

There are at least two important reasons why researchers and industrial experts should be able to distinguish between different types of corrosion. Firstly, pitting and SCC are more harmful types of corrosion than uniform corrosion. Uniform corrosion reduces the thickness of the material relatively uniformly, so it takes a long time before holes are formed in the material. Pitting, on the other hand, causes pits and SCC causes cracks, which can grow much faster and puncture the material. This may lead to unexpected leaks in chemical plants. Therefore, the occurrence of acoustic emission (AE) events associated with pitting or SCC should bring forward the inspection of the installation.

Secondly, the discrimination between different corrosion processes should be performed before any quantitative analysis that correlates acoustic emission activity with the corrosion rate. In Seah

Although future successes in corrosion prevention will still depend on selecting and developing more corrosion resistant materials, it is expected that the main progress in corrosion prevention will be achieved with better information-processing strategies and the development of more efficient monitoring tools that support corrosion control programs [

Features to characterize the acoustic emission activity have often been obtained in the time-amplitude domain [

A challenge that arises after the extraction of wavelet coefficients with a Wavelet Packet Transform is the selection of a basis that is optimal in some sense, or the selection of a few coefficients for signal compression or pattern recognition purposes [

In the research reported in this article, we contribute to the selection of the most informative basis functions, from a library of wavelet packets, to distinguish between different types of corrosion, using information-theoretic criteria. We use the mutual information [

This section describes the experimental set-up to obtain the acoustic emission signals. A U-shaped steel sample is shown in

The probe is designed such that the corrosion process occurring in the probe is representative for that in the plant [

The damage that occurs on the probe is captured by means of piezoelectric sensors attached to the corroding probe. In order to guarantee a good acoustical transfer from the probe to the sensor, a ‘high vacuum’ grease (DOW Corning^{®}) is applied between the sensor and the probe. The sensors used here are broadband sensors (B1025, Digital Wave Corporation) [

Two types of steel that belong to the most often used construction materials in the chemical process industry [

The different mechanisms that lead to the emission of acoustic events have been treated extensively in [

The basic approach for constructing features is to compute a number of general statistical parameters from the time series, such as the median, the mean, the standard deviation and higher-order moments. However, by restricting oneself to a limited number of parameters in advance, important information may be lost because of the implicit assumptions behind these parameters: e.g., the mean and standard deviation are only sufficient to characterize signals consisting of independent and identically distributed (i.i.d.) Gaussian noise.
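As an illustration of this limitation, the following sketch (our own Python/NumPy illustration, not code from the paper) computes moment-based features for a Gaussian white-noise signal and for the same samples sorted in time. The two signals differ completely in temporal structure, yet moment-based features cannot tell them apart:

```python
import numpy as np

def moment_features(x):
    """Mean, standard deviation, skewness and excess kurtosis of a time series."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    return np.array([mu, sd, (z ** 3).mean(), (z ** 4).mean() - 3.0])

rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)   # i.i.d. Gaussian noise
ramp = np.sort(noise)               # same samples, entirely different time structure

# The moment-based features are identical for both signals.
same = np.allclose(moment_features(noise), moment_features(ramp))
```

Because the features are invariant under any permutation of the samples, all temporal (and hence spectral) structure is lost, which motivates the wavelet-based features used below.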

A more thorough approach is to extract the wavelet coefficients from a wavelet packet decomposition (WPD) [

The reader acquainted with wavelet packet decompositions may skip this section, which introduces the background to feature extraction from wavelet packet decompositions. This background is needed in order to understand the feature selection procedures in Sections 3.2 and 4. We will use the terms template and basis function interchangeably; strictly speaking, a template is the more general term, because it does not need to be part of a basis.

We represent a single time series by means of a sequence of observations x(t): x(0), x(1), ..., x(N−1), where ‘t’ refers to the time index and ‘N’ is the number of samples. The time series x(t) can be considered as being sampled from an N-dimensional distribution defined over an N-dimensional variable X(t): X(0), X(1), ..., X(N−1). We write this N-dimensional variable in shorthand notation as X_{0:N−1}.

Features are computed from a wavelet packet decomposition by computing the inner product between the templates and the time series (using a continuous representation, for the ease of notation):

γ_{i,j,k} = ⟨x, ψ_{i,k}^{j}⟩ = ∫ x(t) 2^{−i/2} ψ^{j}(2^{−i}t − k) dt

A feature, in this case a wavelet coefficient, in the wavelet packet decomposition is specified by the scale index ‘i’, the frequency index ‘j’ and the time index ‘k’. The coefficient γ_{i,j,k} can be considered as quantifying the similarity, by means of the inner product, between the time series x(t) and the wavelet packet function ψ^{j}, dilated to scale 2^{i} and translated in time over 2^{i}k.

It is the parameter ‘j’ that determines the shape of the template. If we choose the 12-tap Coiflet filter [, we obtain the templates ψ^{0}(t), ψ^{1}(t), ψ^{2}(t), ..., ψ^{15}(t).
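To make the inner-product view concrete, here is a minimal sketch of one decomposition level using the 2-tap Haar filter instead of the paper's 12-tap Coiflet (purely for brevity; the mechanics are identical):

```python
import numpy as np

# Haar analysis templates (standing in for the 12-tap Coiflet of the paper):
# psi0 is the low-pass (averaging) template, psi1 the high-pass (difference) one.
psi0 = np.array([1.0, 1.0]) / np.sqrt(2)
psi1 = np.array([1.0, -1.0]) / np.sqrt(2)

x = np.array([4.0, 2.0, 5.0, 7.0])

def coeff(x, template, k):
    """Wavelet coefficient as the inner product of x with the template
    translated over 2^1 * k samples (one decomposition level, i = 1)."""
    shifted = np.zeros_like(x)
    shifted[2 * k: 2 * k + len(template)] = template
    return float(x @ shifted)

low  = [coeff(x, psi0, k) for k in range(2)]   # coefficients of subspace W_1^0
high = [coeff(x, psi1, k) for k in range(2)]   # coefficients of subspace W_1^1
```

The same coefficients are produced in practice by filtering and downsampling, but writing them as explicit inner products shows directly how each coefficient measures similarity to a translated template.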

The subspaces W_{1}^{1}, W_{2}^{1}, W_{3}^{1}, W_{4}^{1} and W_{4}^{0} are shaded in grey.

The first four subspaces are spanned by the wavelet templates ψ^{1} at scales 1 to 4, and the remaining subspace W_{4}^{0} is spanned by the scaling functions at scale 4.

Retaining any binary subtree, in which every node has either 0 or 2 children, yields an orthonormal basis of L^{2}(ℝ).

Such a tree is called an admissible tree. If the leaves of this tree are denoted by {(i_{l}, j_{l})}_{1 ≤ l ≤ L}, then

W_{0}^{0} = W_{i_{1}}^{j_{1}} ⊕ W_{i_{2}}^{j_{2}} ⊕ ... ⊕ W_{i_{L}}^{j_{L}}

This means that the space W_{0}^{0} is decomposed into a direct sum of the mutually orthogonal subspaces W_{i_{l}}^{j_{l}} at the leaves of the admissible tree.

It should be noted that a full wavelet packet decomposition yields too many features. In cases where one can assume that the exact time location ‘k’ of the template is of no importance, one can, e.g., consider an average or the energy of wavelet coefficients over time for each possible combination of the scale index ‘i’ and the frequency index ‘j’. This will lead to fewer features to be selected from. Here, we will consider the full complexity of the problem, when the exact time location of the template can be of importance, and consider all coefficients from a full wavelet packet decomposition as selectable.

A full wavelet packet decomposition leads to N × (log_{2}N + 1) features. This can be seen as follows. At scale ‘i’ there are 2^{i} subspaces; therefore the frequency index ‘j’ at a certain scale ‘i’ is an integer from [0, 2^{i} − 1], indicating the position of the subspace at scale ‘i’.

Since the templates at scale ‘i’ are translated over steps of 2^{i}, at scale ‘i’ = 0 we obtain N coefficients γ_{0,0,0}, ..., γ_{0,0,N−1}. At the next scale, ‘i’ = 1, we obtain N/2 coefficients in each subspace: γ_{1,0,0}, ..., γ_{1,0,N/2−1} and γ_{1,1,0}, ..., γ_{1,1,N/2−1}.

At the highest frequency resolution, ‘i’ = log_{2}N, and we obtain the coefficients γ_{log_{2}N,0,0}, ..., γ_{log_{2}N,N−1,0}. Hence, at each scale there are ‘N’ coefficients, and in total there are log_{2}N + 1 different scale levels. This leads overall to N × (log_{2}N + 1) different coefficients to select from. The random variables associated with the coefficients γ_{i,j,k} are denoted by Γ_{i,j,k}.
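The coefficient count can be checked with a small sketch of a full wavelet packet decomposition (again using the Haar filter instead of the Coiflet, for brevity; our own illustration):

```python
import numpy as np

def haar_wpd(x):
    """Full wavelet packet decomposition with the Haar filter.
    Returns a list of scale levels; level i holds 2**i subbands,
    each of length N / 2**i, so every level carries N coefficients."""
    x = np.asarray(x, dtype=float)
    levels = [[x]]                      # scale i = 0: the signal itself
    while len(levels[-1][0]) > 1:
        nxt = []
        for band in levels[-1]:
            a, b = band[0::2], band[1::2]
            nxt.append((a + b) / np.sqrt(2))   # low-pass child subspace
            nxt.append((a - b) / np.sqrt(2))   # high-pass child subspace
        levels.append(nxt)
    return levels
```

For N = 8 this yields log_{2}8 + 1 = 4 levels of 8 coefficients each, i.e., 32 coefficients, in line with the N × (log_{2}N + 1) count; each level is an orthonormal transform, so signal energy is preserved at every scale.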

In this section, we consider the selection of the most discriminative basis functions ψ_{i}^{j}(t − 2^{i}k) from the wavelet packet library, given a training set of signals with known class labels. The local discriminant basis (LDB) selection algorithm proceeds in the following steps:

Step 0: Expand each training signal into a time-frequency dictionary D: this involves the computation of all coefficients γ_{i,j,k}.

Step 1: Estimate the class conditional probability density functions p^{y}(γ_{i,j,k}) of each wavelet coefficient γ_{i,j,k}, for every class ‘y’.

Step 2: For each wavelet coefficient variable Γ_{i,j,k}, compute the discriminant power D(Γ_{i,j,k}) between the class conditional densities (0 ≤ i ≤ log_{2}N). Many discriminant measures can be used in practice. We use the symmetric relative entropy between the class conditional densities of Γ_{i,j,k}. The relative entropy between two densities p and q is

D(p, q) = ∫ p(x) log (p(x)/q(x)) dx

Because this discriminant measure is not symmetric, a symmetric version is obtained as D_{sym}(p, q) = D(p, q) + D(q, p).

When more than two classes are considered, the discriminant power of Γ_{i,j,k} is obtained by summing the pairwise discriminant measures over all pairs of classes.
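The discriminant measure of Step 2 can be sketched as follows for histogram estimates of the class conditional densities (our own sketch; the pairwise summation in `discriminant_power` is our reading of the multi-class extension):

```python
import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """D(p||q) for two histogram estimates of class conditional densities.
    A small eps avoids log(0) for empty histogram bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_relative_entropy(p, q):
    """Symmetrized discriminant measure: D(p||q) + D(q||p)."""
    return relative_entropy(p, q) + relative_entropy(q, p)

def discriminant_power(hists):
    """Multi-class discriminant power: sum over all pairs of class histograms."""
    total = 0.0
    for a in range(len(hists)):
        for b in range(a + 1, len(hists)):
            total += symmetric_relative_entropy(hists[a], hists[b])
    return total
```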

Step 3: Evaluate the discriminant power of each basis as the sum of the discriminant powers of its coefficients.

Hence, one searches for the indices (i, j, k) that together constitute the basis with the highest total discriminant power.

Step 4: Select the ‘m’ basis functions ψ_{i}^{j}(t − 2^{i}k) with the highest discriminant power D(Γ_{i,j,k}) from the basis obtained in Step 3.

Step 5: Construct classifiers using the ‘m’ selected coefficients γ_{i,j,k} as features.

In Step 3, the algorithm searches for a basis B that maximizes the total discriminant power Σ_{(i,j,k)∈B} D(Γ_{i,j,k}).

The additive property of the discriminant powers of coefficients in a basis leads to a very rapid search for the basis with the highest discriminant power. It is easily seen that an optimal basis can be found in O(
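Because discriminant powers are additive, the search reduces to a single bottom-up recursion over the decomposition tree: each parent subspace is compared once with the best choice for its two children. A sketch (the `power` values in the usage example are made up for illustration; `power[(i, j)]` stands for the summed discriminant power of all coefficients in subspace W_{i}^{j}):

```python
def best_basis(power, i=0, j=0, max_depth=3):
    """Bottom-up best-basis search over the admissible trees.
    `power` maps a subspace (i, j) to the total discriminant power of its
    coefficients. Returns (best total power, subspaces forming the best basis)."""
    if i == max_depth:                        # leaf of the full decomposition
        return power[(i, j)], [(i, j)]
    p_left, b_left = best_basis(power, i + 1, 2 * j, max_depth)
    p_right, b_right = best_basis(power, i + 1, 2 * j + 1, max_depth)
    if power[(i, j)] >= p_left + p_right:     # parent beats its best children
        return power[(i, j)], [(i, j)]
    return p_left + p_right, b_left + b_right
```

For example, with `power = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 0.5, (2, 0): 0.6, (2, 1): 0.6, (2, 2): 0.3, (2, 3): 0.1}` and `max_depth=2`, the search keeps the admissible tree with leaves W_{1}^{0} and W_{1}^{1}, since their combined power 2.5 exceeds that of the root.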

The feature selection procedures based on the mutual information are called filter approaches, due to the fact that the classifier used in the prediction is not involved in the selection of the features [

We perform a sequential forward search (SFS) over all wavelet coefficients using a mutual information criterion. In the SFS, we start with the empty feature set S = {Ø} as the selected coefficients so far and the whole dictionary D = {Γ_{i,j,k}}, with 0 ≤ i ≤ log_{2}N, 0 ≤ j ≤ 2^{i} – 1 and 0 ≤ k ≤ N/(2^{i}) – 1, as the available feature set. In each iteration of the SFS, the variable Γ’_{i,j,k}, which achieves the highest value of the mutual information criterion, taking into account the previously selected features, is selected. S is updated in each iteration as: S = S ∪ Γ’_{i,j,k} and the dictionary is updated as D = D\{Γ’_{i,j,k}}. Three different mutual information criteria were compared for the SFS filter: a density-based method (Section 4.1), a distance-based method (Section 4.2) and a relevance-redundancy method (Section 4.3).
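The SFS loop itself is independent of the particular mutual information criterion; a minimal sketch (our own, with the criterion passed in as a function; the toy criterion in the test is purely illustrative):

```python
def sequential_forward_search(dictionary, criterion, n_select):
    """Greedy SFS: repeatedly move the feature that maximizes
    criterion(candidate, selected_so_far) from the dictionary D to S.
    `criterion` stands in for any of the three MI criteria of Section 4."""
    D = set(dictionary)
    S = []
    while D and len(S) < n_select:
        best = max(D, key=lambda f: criterion(f, S))
        S.append(best)
        D.remove(best)
    return S
```

In the paper's setting the dictionary elements would be the coefficient variables Γ_{i,j,k}, and `criterion` one of the estimators described in Sections 4.1 to 4.3.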

The estimation of the mutual information by means of a Parzen window density estimator was proposed in [

The functional H(.) is the entropy [, I_{k} is the set of indices of the data points which belong to class “k”, x_{j} is the feature vector of the j’th training data point and #C is the total number of classes. The covariance matrix of the Parzen window is scaled by a factor that depends on log_{2}(n), where “n” is the sample size of the training set. This estimator is referred to as “MI Parzen”.

Instead of estimating the probability density functions, the mutual information between a discrete class variable and a feature vector can also be estimated directly from distances between data points.

In this estimator, “n_{c}” is the number of training data points in class “c”, ɛ_{c}(i,k) is twice the distance from the i’th data point in class “c” to its k’th nearest neighbor in class “c” in the training set, “d” is the dimensionality of the data points and “c_{d}” is the volume of the d-dimensional unit ball. We used the Euclidean distance between data points; in this case “c_{d}” = π^{d/2}/Γ(1 + d/2), with Γ(.) the gamma function.

The unconditional entropy Ĥ(F) is estimated in the same way, with “n_{c}” replaced by the total number of training points “n” and ɛ_{c}(i,k) replaced by ɛ(i,k), the corresponding distance computed over the whole training set. The class conditional entropies are weighted by the class priors n_{c}/n. In the experiments, the number “k” of nearest neighbors was set equal to 6. This estimator is referred to as “MI knn”.
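A Kozachenko–Leonenko-style sketch of such a distance-based entropy estimate is given below. This is our simplified reading (plain k-th neighbor distances and the unit-ball volume), not necessarily the exact estimator of the cited reference; class conditional entropies would be obtained by applying it per class and weighting by n_{c}/n:

```python
import numpy as np
from math import gamma, log, pi

def digamma_int(m):
    # Digamma at a positive integer: psi(m) = -gamma_E + sum_{i=1}^{m-1} 1/i.
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, m))

def knn_entropy(X, k=6):
    """Nearest-neighbor differential-entropy estimate (in nats).
    X: (n, d) array of data points; k: number of neighbors (6, as in the paper)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Brute-force Euclidean distance matrix; fine for moderate n.
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    r_k = np.sort(dists, axis=1)[:, k]        # distance to the k-th neighbor
    c_d = pi ** (d / 2) / gamma(1 + d / 2)    # volume of the d-dim unit ball
    return digamma_int(n) - digamma_int(k) + log(c_d) + d * np.mean(np.log(r_k))
```

On a uniform sample over [0, 1) (true differential entropy 0 nats) the estimate is close to zero for moderate sample sizes.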

Relevance-redundancy approaches select features that are highly relevant with respect to the class variable, but penalize a feature if it is redundant with respect to previously selected features. These approaches often use mutual information to estimate both the relevance and the redundancy. Suppose that F_{i} is a candidate feature to be selected and that S is the set of already selected features; a relevance-redundancy criterion based on the normalized mutual information [ requires the estimation of MI(F_{i};C) and MI(F_{i};F_{s}). Note that the normalization divides MI(F_{i};F_{s}) by min{H(F_{i}), H(F_{s})}. Because MI(F_{i};F_{s}) is always smaller than or equal to the minimum of the entropies H(F_{i}) and H(F_{s}), this ratio lies between 0 and 1. The term MI(F_{i};C) quantifies the relevance of feature F_{i} with respect to the target variable ‘C’; it will be large when F_{i} is highly relevant. The redundancy term quantifies the dependence of F_{i} on the already selected features F_{s} ∈ S. When F_{i} and F_{s} are strongly dependent, or correlated in a stricter sense, the relevance MI(F_{i};C) will be penalized. This allows features that are less relevant, but have a very low redundancy with the already selected features, to be included.

In the computation of the normalized mutual information, the features were first discretized into 3 states [: values of F_{i} < μ(F_{i}) − σ(F_{i})/2 were set to state 0, values with μ(F_{i}) − σ(F_{i})/2 ≤ F_{i} ≤ μ(F_{i}) + σ(F_{i})/2 were set to state 1 and values of F_{i} > μ(F_{i}) + σ(F_{i})/2 were set to state 2. Note that μ(F_{i}) and σ(F_{i}) are, respectively, the mean and standard deviation of F_{i}. The mutual information was then computed from the contingency tables of the discretized features.
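The discretization and the contingency-table mutual information can be sketched as follows (our own sketch; natural logarithms, so the MI is in nats):

```python
import numpy as np

def discretize3(f):
    """Map a feature to 3 states using thresholds at mean +/- std/2."""
    f = np.asarray(f, dtype=float)
    lo, hi = f.mean() - f.std() / 2, f.mean() + f.std() / 2
    return np.where(f < lo, 0, np.where(f > hi, 2, 1))

def mi_discrete(a, b):
    """Mutual information (nats) from the contingency table of two
    discrete arrays: sum over cells of p(a,b) * log(p(a,b) / (p(a) p(b)))."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for va in np.unique(a):
        pa = np.mean(a == va)
        for vb in np.unique(b):
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log(pab / (pa * np.mean(b == vb)))
    return mi
```

Normalizing `mi_discrete(a, b)` by the smaller of the two marginal entropies gives the redundancy term of the criterion above.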

We tested four different popular classifiers to predict the different corrosion types:

k-nearest neighbor (knn): the Euclidean distance is used with “k” set to 3, see Section 4.5.4 in [

decision tree J48 (WEKA’s implementation of C4.5) from WEKA package 3.4.1 [

Gaussian Mixture Model (GMM): the number of Gaussians per class is taken equal to 1 in the experiments and hence each class is modeled as a multivariate Gaussian distribution (see, e.g., McLachlan and Peel [

naïve Bayes classifier (NB) from WEKA package 3.4.1 [

In the validation of the different algorithms, we performed a 10-fold cross-validation [
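A minimal sketch of such a k-fold cross-validation loop (our own; the classifier is passed in as a function, so any of the four classifiers above fits this interface):

```python
import numpy as np

def kfold_indices(n, n_folds=10, seed=0):
    """Split n sample indices into n_folds disjoint test folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return [idx[f::n_folds] for f in range(n_folds)]

def cross_validate(X, y, fit_predict, n_folds=10):
    """Average test accuracy over the folds; fit_predict(Xtr, ytr, Xte)
    returns predicted labels for Xte."""
    folds = kfold_indices(len(y), n_folds)
    accs = []
    for f in range(n_folds):
        te = folds[f]
        tr = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        accs.append(np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te]))
    return float(np.mean(accs))
```

Note that in the actual experiments the feature selection itself must also be repeated inside each fold, on the training portion only, to keep the validation fair.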

We stopped the feature selection after 50 features had been selected, as can be observed from

Note the slower increase in accuracy for the LDB algorithm compared to the mutual information approaches. The LDB algorithm selected the subspace W_{0}^{0} as the most discriminative basis. Although the coefficients in this subspace provide discriminative information between SCC (largest values), pitting (intermediate values) and uniform corrosion + absence of corrosion (these two classes have the smallest values), the LDB algorithm was misled by the high dependencies that are present in subspace W_{0}^{0}. Indeed, in the scatter plot of

Comparison of

The classification accuracies do not reveal the structure of the errors made in the identification of the corrosion types. Therefore, we computed the confusion matrix. We concentrate on the highest accuracy we could achieve: this is obtained in

The columns in the confusion matrix shown in

Finally, we note that the approach presented in this paper is generally applicable to acoustic events originating from different steel types. However, the resistance of steel towards a particular type of corrosion is largely influenced by its alloying elements: chromium, manganese, molybdenum, nickel and nitrogen [

We have used the acoustic emission technique, a non-destructive testing technique, to identify the types of corrosion that occur most often in the chemical process industry. As stated in the introduction, the main progress in corrosion prevention is expected to come from better information-processing strategies and the development of more efficient monitoring tools that support corrosion control programs [

The authors are grateful to N. Saito, University of California, Davis, USA, for providing the local discriminant basis selection algorithm and to M. Winkelmans for providing data from corrosion experiments. We are also grateful to M. Wevers for offering the opportunity to work on the problem of corrosion identification using the acoustic emission technique. GVD is supported by the CREA Financing (CREA/07/027) program of the K.U.Leuven. MMVH is supported by research grants received from the Excellence Financing program (EF 2005), the Belgian Fund for Scientific Research—Flanders (G.0588.09), the Interuniversity Attraction Poles Programme—Belgian Science Policy (IUAP P6/054), the Flemish Regional Ministry of Education (Belgium) (GOA 10/019), and the European Commission (IST-2007-217077). This work used the HPC (high-performance computing infrastructure) of the K.U.Leuven.

Processing stages for making predictions of the corrosion type. A steel probe (2) is inserted in a bypass (1) of the chemical process plant and is therefore exposed to the same environmental conditions as the installation. Acoustic events are captured by means of a broadband sensor (3). Subsequently AE signals are amplified and filtered (4). In order to obtain a fair validation of the system, the acquired signals are split into a training (5) and testing set (6). Features are extracted from the training signals by means of a Wavelet Packet Decomposition (7). A classifier (8) is trained based on the selected wavelet coefficients of the training set. Testing signals are projected onto the selected basis functions. Subsequently, the wavelet coefficients of the testing signals are used to test the overall performance of the system.

Example signals of different corrosion types. The example of the absence of corrosion in CaCl_{2} 40 weight% at 85 °C environment. The example of uniform corrosion in H_{3}PO_{4} 10 weight% at environment temperature. The signals of pitting in brackish water + FeCl_{3} 1 weight% at 45 °C environment. The examples of stress corrosion cracking in CaCl_{2} 40 weight% at 85 °C environment and in Ca(NO_{3})_{2} 60 weight% at 105 °C environment.

Templates (wavelet packets) corresponding to the 12-tap Coiflet filter.

Library of wavelet packet functions. Different subspaces are represented by W_{i}^{j}. Index ‘i’ is the scale index, index ‘j’ is the frequency index. The depth ‘I’ of this tree is equal to 4. Every subtree within this tree, where each node has either 0 or 2 children, is called an admissible tree. Two admissible trees are emphasized, one shaded in grey and one marked with diagonals.

Evolution of the accuracy of the k-nearest neighbor classifier (k = 3) as a function of the number of wavelet coefficients selected with the LDB algorithm and the mutual information filter algorithms. The horizontal line indicates the accuracy when all 1,024 samples are used (no FSS).

Evolution of the accuracy of the decision tree J48 classifier as a function of the number of wavelet coefficients selected with the LDB algorithm and the mutual information filter algorithms. The horizontal line indicates the accuracy when all 1,024 samples are used.

Evolution of the accuracy of the Gaussian mixture model as a function of the number of wavelet coefficients selected with the LDB algorithm and the mutual information filter algorithms. The horizontal line indicates the accuracy when the 1,024 samples were sub-sampled with a factor 15 to avoid numerical problems in the estimation of the parameters of the model. This subsampling was performed by taking the first time sample and then every 15th sample.

Evolution of the accuracy of naïve Bayes classifier as a function of the number of wavelet coefficients selected with the LDB algorithm and the mutual information filter algorithms. The horizontal line indicates the accuracy when all 1,024 samples are used.

Scatter plots of the first 3 coefficients that were selected most often by the local discriminant basis algorithm (LDB) as a triplet in the 10 training sets of the 10 fold cross-validation. These are the coefficients γ_{0,0,77}, γ_{0,0,78} and γ_{0,0,79} in subspace W_{0}^{0}. These scatter plots illustrate that the first three selected coefficients are highly redundant.

The steel types, the corrosive medium and the number of different experiments considered. The data was obtained from [

| Corrosion type | Steel type | Corrosive medium | Experiments (signals) | Total experiments (signals) |
| Absence of corrosion | 1.0038 | NaOH 20 weight% + NaCl 3 weight%, 80 °C | 1 (99) | 4 (197) |
| | 1.4541 | CaCl_{2} 40 weight%, 85 °C | 3 (98) | |
| Uniform corrosion | 1.0038 | H_{3}PO_{4} 10 weight%, T_{environment} | 6 (194) | 6 (194) |
| Pitting | 1.4541 | brackish water + FeCl_{3} 1 weight%, 45 °C | 9 (214) | 9 (214) |
| Stress corrosion cracking | 1.0038 | Ca(NO_{3})_{2} 60 weight%, 105 °C | 1 (147) | 10 (205) |
| | 1.4541 | CaCl_{2} 40 weight%, 85 °C | 9 (58) | |

Confusion matrix for the naïve Bayes classifier using 27 wavelet coefficients. The numbers are obtained using all 10 test folds from the 10 fold cross-validation.

| 35 | 8 | 6 | |
| 52 | 0 | 0 | |
| 1 | 0 | 1 | |
| 0 | 0 | 7 | |