Health Indicators Construction and Remaining Useful Life Estimation for Concrete Structures Using Deep Neural Networks

Abstract: Remaining useful life (RUL) prognosis is one of the most important techniques in concrete structure health management. This technique evaluates the strength of a concrete structure by determining the advent of failure, which helps to reduce maintenance costs and extend structure life. Degradation information that reflects structure health is a principal factor in achieving better prognosis performance. Traditional data-driven RUL prognosis has two drawbacks: features are extracted manually, and a threshold must be defined to mark the specimen's breakdown. To overcome these limitations, this paper presents an SAE-DNN structure capable of automatic health indicator (HI) construction from raw signals. HI curves constructed by the SAE-DNN have much better fitness metrics than HI curves constructed from statistical parameters such as RMS, kurtosis, and skewness. In the next stage, HI curves constructed from training degradation data are used to train a long short-term memory recurrent neural network (LSTM-RNN). The LSTM-RNN is utilized as an RUL predictor because its special gates allow it to learn long-term dependencies even when the training data is limited. Model construction, verification, and comparison are performed on experimental reinforced concrete (RC) beam data. Experimental results indicate that the LSTM-RNN generally estimates the RULs of concrete beams more accurately than the GRU-RNN and the simple RNN, with an average prediction error (in cycles) less than half that of the simple RNN.


Introduction
The last decades have witnessed explosive growth in the construction industry to meet the unceasing demand for civilian, industrial, and defense purposes. Regardless of the purpose, one of the most significant criteria for structural materials is durability. Even though concrete is one of the most popular construction materials, its brittleness makes it vulnerable to deterioration through crack progression. The cracking phenomenon is crucial to structural safety and economics, and it has therefore been frequently studied [1][2][3][4]. Fault diagnosis and classification (FDC) algorithms have been widely used to detect and isolate incipient faults in many applications [5][6][7]; beyond these, fault prognosis methods have received increasing attention in recent years [8,9]. With prognosis, imminent failures can be reported, and maintenance can be performed accordingly.
An essential metric for prognostic methods is the remaining useful life (RUL), which is also called lead time or prognostic distance. It is defined as the duration from the current time, when a specimen is being inspected, until its expected useful life expires [9,10]. The prediction of RUL can be performed by two main approaches: data-driven and model-based. Model-based methods represent damage progression by analytical models [11], wherein the researchers need to capture the nature of both the failure and the system. In contrast, data-driven methods estimate the RUL from available data and do not require knowledge of the exact failure mechanisms. Therefore, they have been favored in recent studies [12].
The data-driven approach to prognosis can be divided into three stages: data acquisition, construction of health indicators (HIs), and prognostics. To monitor and highlight fracture occurrences, acoustic emission (AE) is commonly used for data acquisition because it can detect both external and internal damage in a structure [13][14][15]. Therefore, AE is utilized here to study the deterioration of a concrete structure during a loading process until its final failure. The second stage, HI construction, can strongly affect the accuracy of the RUL estimate [9]. Most data-driven methods form HIs from statistical parameters, among which kurtosis and root mean square (RMS) are used most often. Williams et al. [16] proposed the direct use of these two parameters for bearing vibration signal quantification, while Antoni [17] suggested spectral kurtosis in bearing health monitoring, which is considered one of the most intriguing and effective HI construction methods. In this approach, the kurtosis is calculated as the fourth-order central moment divided by the squared second-order central moment. In addition to these two parameters, the smoothness index [18] is also important. This parameter is the ratio of the geometric mean to the arithmetic mean of the modulus of the wavelet coefficients.
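The three statistical parameters described above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names are hypothetical, the smoothness index is computed on whatever coefficient array is passed in (the wavelet transform itself is not shown), and nonzero coefficient moduli are assumed so that the geometric mean is well defined.

```python
import numpy as np

def rms(x):
    """Root mean square of a signal segment."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def kurtosis(x):
    """Fourth-order central moment over the squared second-order central moment."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return float(m4 / m2 ** 2)

def smoothness_index(coeffs):
    """Geometric mean over arithmetic mean of the coefficient moduli.

    Assumes every modulus is nonzero (log of zero is undefined).
    """
    mod = np.abs(np.asarray(coeffs, dtype=float))
    geometric_mean = np.exp(np.mean(np.log(mod)))
    return float(geometric_mean / mod.mean())
```

Because the geometric mean never exceeds the arithmetic mean, the smoothness index lies in (0, 1], reaching 1 only when all moduli are equal.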
In recent years, synthesized HIs have attracted increasing attention [19]. They are often formed by data fusion techniques that combine multi-dimensional features such as RMS, kurtosis, and variance into one-dimensional HIs [9,10,20]. Even though synthesized HIs have achieved strong results, there are still disadvantages to be overcome. The first major problem is the uneven contribution of inputs to HI construction, which is the consequence of different features having different ranges. The second major problem is the difficulty in determining the failure threshold, which depends on the specific specimen at hand.
Deep learning techniques have recently been successfully applied in many areas, including natural language processing, computer vision, and robotics [21,22]. They have displayed a promising ability to extract features automatically during training. End-to-end models that map raw input directly to the output can be constructed without manual feature extraction to establish an intermediate feature space. In order to solve the problems of uneven inputs and difficulty in determining the failure threshold described earlier, this paper proposes a deep neural network (DNN) for HI formation from raw input. In this approach, a stacked autoencoder (SAE) is first utilized for pretraining to prevent the DNN from being trapped in a local optimum caused by random initialization. The DNN's layers are pretrained successively in a layer-wise, unsupervised learning strategy. After a regression layer is added, the whole network is fine-tuned in a supervised manner with the labeled data. AE hits are detected in each signal segment with the constant false-alarm rate (CFAR) algorithm [4]; the number of AE hits can represent the damage growth in a concrete structure [4], and these values are also used as input data labels.
In this study, a recurrent neural network (RNN) is used for RUL prediction. In comparison to a feedforward neural network, the data 'context' and data clustering for long-term prediction in an RNN are presented by competitive learning rules in input preprocessing [22]. To cope with the conventional RNN's poor learning of long-term dependencies, long short-term memory (LSTM) is implemented in this research. LSTM is able to remember information for long periods of time via the introduced gates [22,23]. Redundant information is disposed of by the forget gate; the key information is stored in the internal state after being chosen by the input gate; and the output information is then determined by the output gate. This architecture allows long-term storage, efficient updating of the key information, and avoidance of vanishing gradients.
The LSTM-RNN has been widely and successfully applied for machine health monitoring [24], gas concentration prediction [25], marine temperature prediction [26], and wind power short-term prediction [27]. In this study, concrete degradation data with hundreds of segmented signals over the deterioration progression in concrete beams is considered long-term time-series data. With the availability of offline training data, an LSTM-RNN can be constructed to learn the long-term dependencies of the concrete degradation process. Then, with only a limited amount of online data, precise RUL prognosis on the specimen can be achieved with the trained LSTM-RNN.
This paper presents the following contributions:
1. A health indicator (HI) constructor based on an SAE-DNN is developed. No manually extracted features are used in the construction of the SAE-DNN-based HI constructor. The SAE-DNN automatically extracts representative features during the training process.
2. The constant false-alarm rate (CFAR) algorithm is used to capture the AE hits in every degradation cycle; the number of AE hits in each cycle is considered the label to train the DNN.
3. The LSTM-RNN is investigated to learn the long-term dependencies of HI curves constructed in an offline process and is then used to predict the RUL of a concrete structure in an online process.
This paper is arranged as follows: Section 2 introduces the experimental setup with data descriptions and illustrations. Section 3 describes the SAE-DNN-based HI constructor thoroughly, along with the LSTM-RNN-oriented RUL prognosis. Section 4 gives the experimental results and analysis. Finally, Section 5 presents our conclusions.

Experimental Specimens and Data Acquisition System
All of the reinforced concrete (RC) beam specimens are identical in terms of specifications: 240 × 15 × 30 cm in length × width × height; compressive strength 24 MPa; and five reinforcing bars of 16-mm diameter (Figure 1). The maximum tensile strength for each specimen is over 455 MPa. To test our proposed approach, a run-to-failure dataset is constructed from four-point bending tests on the RC beams (Figure 2). Each four-point bending test is performed as follows: an RC beam is loaded equally with two concentrated loads. Supports are placed 2000 mm apart, center-to-center. The displacement rate applied by the actuator through the I-section of an 80-cm steel beam can be either 1 mm/s or 2 mm/s. This load is continuously applied on each RC beam until the beam is completely crushed. The vertical displacement is measured by a linear variable differential transformer (LVDT) located at the middle of the bottom surface (Figure 2). The data acquisition system is an eight-channel PCI-based AE system (PCI-8). This device is capable of simultaneous data acquisition, waveform processing, and data transfer up to 132 Mb/s in the 1-400 kHz range. Its channels are connected to eight low-frequency AE R3I-AST sensors mounted on the specimen's surface. These sensors have outstanding sensitivity and require no extra preamplifier on the long cable drive. In order to ensure that the inherent characteristics of the concrete are omitted, the sampling rate of 5 MHz is set.

Separation of Destructive Processes
A conventional RUL prediction is based on the specimen's deterioration progress. The progression of damage occurs as follows: the specimen shows high and stable performance during a long starting phase, which then begins to deteriorate gradually during the second period. In the final stage, the degradation intensifies and, as a consequence, the specimen's performance plummets abruptly and drastically until its final failure.
In comparison, fracture monitoring separates crack growth into four stages, as shown in Figure 3. The degradation process measured by sensor 1 of specimen 1 is shown in Figure 3 in terms of load and number of AE hits in 1-second signal cycles.

SAE-DNN-Based HI Constructor
The concept of multiple hidden layers has been available since the early years of deep learning. This approach was initially disappointing because it performed even worse than shallow networks: conventional back-propagation with random initialization trained poorly and often got stuck in poor local solutions. This difficulty was overcome in 2006 with unsupervised layer-wise pretraining, which was proposed by Hinton et al. [28] to deal with the existing limitations of DNN optimization. Currently, more sophisticated and abstract features with hierarchical structures can be learned because pretraining offers layer-by-layer extraction of high-level features from lower-level ones.

An excellent choice for an efficacious layer-wise unsupervised learning algorithm is the stacked autoencoder (SAE) [29]. This structure consists of multiple autoencoders (AEs), each of which is a single-hidden-layer unsupervised neural network whose input and output layers are set identically. The output layer is intended to reconstruct the original input with high accuracy. A simplified AE structure is illustrated in Figure 4. The input of the SAE, $x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ with dimension $n$, is projected by the encoder to the hidden layer $h = [h_1, h_2, \ldots, h_m]^T \in \mathbb{R}^m$ by the following mapping function $f$:
$$h = f(x) = s_f(Wx + b), \tag{1}$$
in which $m$ stands for the dimension of the hidden variable vector; $W$ is the $m \times n$ weight matrix; $b \in \mathbb{R}^m$ is the bias vector; and the nonlinear activation function $s_f$ can be the sigmoid function, the tanh function, or the rectified linear unit function.
The hidden representation $h$ is then mapped to the output layer $\tilde{x} \in \mathbb{R}^n$ by the mapping function $\tilde{f}$ in the decoder:
$$\tilde{x} = \tilde{f}(h) = s_{\tilde{f}}(\tilde{W}h + \tilde{b}), \tag{2}$$
with $\tilde{W}$ as an $n \times m$ weight matrix; $\tilde{b} \in \mathbb{R}^n$ as the bias vector for the output layer; and the activation function $s_{\tilde{f}}$ as either the sigmoid function or others. Overall, $\theta = \{W, \tilde{W}, b, \tilde{b}\}$ is the AE parameter set and $g_\theta(x) = \tilde{f}(f(x)) \approx x$ is the function to be learned. As aforementioned, an AE aims to get the reconstructed output $\tilde{x}$ as close to the initial input $x$ as possible. This is done by forcing constraints on the network, for example, a limit on the number of hidden units. Assume that the training input is $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$, where the total number of training samples is denoted as $N$. Initially, each $x^{(i)}$ is projected to a hidden representation $h^{(i)}$ and then mapped to the reconstructed data $\tilde{x}^{(i)}$. The reconstruction loss is the mean squared reconstruction error, which is minimized:
$$J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\|\tilde{x}^{(i)} - x^{(i)}\right\|^2. \tag{3}$$
Gradient descent (GD) is used for AE parameter updating. After the completion of training, the weights and biases of this AE are saved. The multi-AE-layered structure of an SAE is presented in Figure 5. The layer-by-layer training of an SAE is done as follows: initially, the original input data is mapped to the first hidden feature layer by the first AE; the output of the first AE is then considered the input for the second AE, and the process continues. By the end, every layer in the SAE is pretrained. To prevent the AE from just learning the identity of the input and to make the learnt features more robust, noise can be added to the input data for training. The AE is forced to reconstruct the original input from a corrupted version of it. This method is called a stacked denoising autoencoder (SDA) and it is chosen for our study.
Following the unsupervised pretraining, the output layer is added to the top of the SAE and the weights and biases are fine-tuned. Each hidden layer's weights are initialized by the pretrained weights $\{W_k, b_k\}_{k=1,2,\ldots,K}$. The output layer's parameters $\{W_0, b_0\}$ can be set by random initialization. Afterwards, back-propagation is utilized to achieve improved weights $\{W_k, b_k\}_{k=1,2,\ldots,K}$ by fine-tuning the entire network. The prediction error is minimized as follows:
$$J = \frac{1}{N}\sum_{j=1}^{N}\left(y_j - \hat{y}_j\right)^2, \tag{4}$$
with $y_j$ and $\hat{y}_j$ being the $j$th data sample's label and predicted output, respectively. The SAE training procedure is displayed in Figure 5.
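The greedy layer-wise pretraining described above can be sketched in plain NumPy. This is an illustrative toy under stated assumptions, not the configuration used in this study: sigmoid activations in both encoder and decoder, masking noise, full-batch gradient descent on the mean squared reconstruction error, and hypothetical function names, learning rate, and layer sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_ae(X, n_hidden, mask_frac=0.1, lr=0.5, epochs=100, seed=0):
    """Train one denoising AE with full-batch gradient descent.

    X has shape (N, n). Returns the encoder weights W (m x n), the encoder
    bias b (m,), and the reconstruction loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W = rng.normal(0.0, 0.1, (n_hidden, n_in))      # encoder weights
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_in, n_hidden))  # decoder weights
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        # Corrupt the input by zeroing a random fraction of its entries.
        keep = rng.random(X.shape) >= mask_frac
        X_noisy = X * keep
        H = sigmoid(X_noisy @ W.T + b)              # hidden representation
        X_rec = sigmoid(H @ W_dec.T + b_dec)        # reconstruction
        err = X_rec - X                             # target is the clean input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Back-propagate the mean squared reconstruction error.
        d_out = (2.0 / n_samples) * err * X_rec * (1.0 - X_rec)
        d_hid = (d_out @ W_dec) * H * (1.0 - H)
        W_dec -= lr * (d_out.T @ H)
        b_dec -= lr * d_out.sum(axis=0)
        W -= lr * (d_hid.T @ X_noisy)
        b -= lr * d_hid.sum(axis=0)
    return W, b, losses

def pretrain_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each AE is trained on the previous code."""
    params, layer_input = [], X
    for n_hidden in layer_sizes:
        W, b, _ = train_denoising_ae(layer_input, n_hidden, **kwargs)
        params.append((W, b))
        layer_input = sigmoid(layer_input @ W.T + b)  # clean encoding feeds the next AE
    return params
```

The returned list of (W, b) pairs plays the role of the pretrained hidden-layer parameters that initialize the DNN before supervised fine-tuning.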

Impulse Detection Using the Constant False Alarm Rate (CFAR) Algorithm
In this study, impulse detection is performed with the CFAR algorithm. CFAR is a popular data-based target-detecting technique that excels in environments where background noise and interference are varying, especially for radar systems. A threshold based on power is determined so that when a signal segment exceeds it, this segment can be considered a "hit" on the target.
Most CFAR schemes utilize a power threshold calculated from the noise floor of the cell blocks surrounding the cell under test (CUT). Several cells adjacent to the CUT are neglected to protect the training cells from signal leakage, which can negatively influence the noise estimation. Figure 6 describes a simplified CFAR scheme.
In this study, the cell-averaging CFAR, with acknowledged stability and robustness [30], is chosen for implementation. The detection threshold $T_i$ of the $i$th CUT is first computed as follows:
$$T_i = \alpha P_i, \tag{5}$$
with $\alpha$ as the threshold factor and $P_i$ being the estimated noise power, calculated as:
$$P_i = \frac{1}{N}\left(\sum_{k=i-G/2-N/2}^{i-G/2-1} x_k + \sum_{k=i+G/2+1}^{i+G/2+N/2} x_k\right), \tag{6}$$
where $x_k$ is the power of cell $k$, and $N$ and G are the number of training cells and guard cells, respectively. Generally, the numbers of leading and lagging cells are set as equal, so each block holds $N/2$ training cells and $G/2$ guard cells. The threshold factor $\alpha$ is used to control the number of detected targets:
$$\alpha = N\left(P_{fa}^{-1/N} - 1\right), \tag{7}$$
with $P_{fa}$ being the desired false alarm rate. This parameter should be chosen with caution, since it is a trade-off between the number of detected targets and the number of detected false targets.
This study deals with signals in one-second segments, each of which is then divided into 2000 cells for analysis. Both leading and lagging blocks contain 30 cells, of which 20 are used for training and the rest are guard cells. Equations (5)-(7) are used to calculate the adaptive threshold of each CUT. A target is detected when the power of the CUT exceeds its threshold. The concept and results of this algorithm are shown in Figure 7.
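A minimal cell-averaging CFAR detector following the description above might look like the sketch below. The block sizes (20 training and 10 guard cells on each side of the CUT) come from the text; the function name, the false alarm rate of 1e-3, and the choice to leave cells near the segment edges untested are assumptions made for illustration only.

```python
import numpy as np

def ca_cfar(power, n_train=20, n_guard=10, p_fa=1e-3):
    """Cell-averaging CFAR over a 1-D array of cell powers.

    n_train and n_guard are counted per side of the cell under test (CUT).
    Returns a boolean hit mask and the adaptive threshold of each tested cell
    (NaN for edge cells that lack a full window, which are left untested).
    """
    power = np.asarray(power, dtype=float)
    n_total = 2 * n_train                                # training cells on both sides
    alpha = n_total * (p_fa ** (-1.0 / n_total) - 1.0)   # threshold factor
    half = n_train + n_guard
    hits = np.zeros(power.size, dtype=bool)
    thresholds = np.full(power.size, np.nan)
    for i in range(half, power.size - half):
        leading = power[i - half : i - n_guard]          # training cells before the CUT
        lagging = power[i + n_guard + 1 : i + half + 1]  # training cells after the CUT
        noise = (leading.sum() + lagging.sum()) / n_total
        thresholds[i] = alpha * noise
        hits[i] = power[i] > thresholds[i]
    return hits, thresholds
```

Counting the True entries of the hit mask for a one-second segment of 2000 cells would give the AE hit count of that degradation cycle; a lower `p_fa` raises the threshold factor and therefore reports fewer hits.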

Appl. Sci. 2021, 11, 4113

LSTM-RNN-Oriented RUL Prediction
LSTM has been adopted as a popular solution for temporal sequence and long-range dependency modeling, having been applied in numerous studies on language modeling, speech recognition, and online handwriting recognition, among others. The reason for the preference of LSTM over the simple RNN is its ability to resolve the vanishing/exploding gradients inherent in RNN training. Figure 8 presents LSTM in its basic structure, with the number of input layers, recurrent layers, and output layers all being 1. The innovation of LSTM is that it introduces different types of gates, such as the input gate $i^{(t)}$, forget gate $f^{(t)}$, output gate $o^{(t)}$, and input modulation gate $g^{(t)}$, and other components like hidden units $h^{(t)}$ and memory cells $c^{(t)}$.
Information disposal is determined and performed by the forget gate $f^{(t)}$. This gate applies a logistic activation function to the inputs $x^{(t)}$ and $h^{(t-1)}$, and its output is then provided to an element-wise multiplication operation. The gate is closed if the output is 0, and open if it is 1. The forget gate's calculation is:
$$f^{(t)} = \sigma\left(W_{xf}x^{(t)} + W_{hf}h^{(t-1)} + b_f\right). \tag{8}$$
Afterwards, new information is evaluated to decide whether it can be stored in the internal state. Initially, the "input modulation gate" $g^{(t)}$, which acts as a tanh layer, forms a candidate state vector. Then, the input gate $i^{(t)}$ determines which parts of $g^{(t)}$ are to be added to the long-term state $c^{(t)}$. The two outputs are computed as follows:
$$g^{(t)} = \tanh\left(W_{xg}x^{(t)} + W_{hg}h^{(t-1)} + b_g\right), \tag{9}$$
$$i^{(t)} = \sigma\left(W_{xi}x^{(t)} + W_{hi}h^{(t-1)} + b_i\right). \tag{10}$$
Deriving from Equations (8)-(10), the previous internal state $c^{(t-1)}$ can be used to achieve the current state $c^{(t)}$:
$$c^{(t)} = f^{(t)} \otimes c^{(t-1)} + i^{(t)} \otimes g^{(t)}. \tag{11}$$
Eventually, the long-term state is evaluated by the output gate $o^{(t)}$ to determine which parts of it can be read and output at this time to both $h^{(t)}$ and $y^{(t)}$. After the internal state $c^{(t)}$ is put through a tanh layer (to push the values to be between −1 and 1), it is multiplied by the output of the sigmoid gate to acquire the remaining state values. This is calculated as follows:
$$o^{(t)} = \sigma\left(W_{xo}x^{(t)} + W_{ho}h^{(t-1)} + b_o\right), \tag{12}$$
$$y^{(t)} = h^{(t)} = o^{(t)} \otimes \tanh\left(c^{(t)}\right), \tag{13}$$
with $W$ and $b$ as the layer weights and biases, respectively.
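The gate computations of this section can be written as a single forward step. The sketch below is illustrative only: the parameter dictionary, the key names such as `W_xf`, and the random initialization are hypothetical conventions, and no training is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Random weights and zero biases for the four gates f, g, i, and o."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "fgio":
        params[f"W_x{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        params[f"W_h{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and memory cell."""
    f = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])  # forget gate
    g = np.tanh(p["W_xg"] @ x_t + p["W_hg"] @ h_prev + p["b_g"])  # candidate state
    i = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])  # input gate
    c = f * c_prev + i * g                                        # new long-term state
    o = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])  # output gate
    h = o * np.tanh(c)                                            # new hidden state / output
    return h, c
```

For an HI curve, `x_t` would be a one-element vector holding the HI value of the current degradation cycle, and the step would be applied cycle by cycle while carrying `h` and `c` forward.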

Experimental Validation

Dataset Description
Figure 9 displays the four-point bending test from which the dataset used here was collected. Our proposed method is validated on AE data acquired by AE sensors at the sampling frequency of 5 MHz. The duration of each degradation cycle is set at 1 s. For each four-point bending test, a total of eight sensors are positioned on the RC beam to collect data in the run-to-failure process. Twenty-four run-to-failure process signals are acquired during three four-point bending tests to validate the proposed approach. Details about the dataset are listed in Table 1. The data is divided into two equal parts to form the training set (signals from sensors 1-4) and the test set (signals from sensors 5-8). Figure 10 shows the degradation process of three test run-to-failure sensor signals (i.e., sensor 5), one from each of the concrete beams, in terms of RMS, kurtosis values, and AE hits.
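As a quick check of the figures above, the split can be enumerated directly. The beam and sensor numbering below is an assumed labeling for illustration; Table 1 remains the authoritative description of the dataset.

```python
SAMPLING_RATE_HZ = 5_000_000   # 5 MHz, from the text
CYCLE_DURATION_S = 1           # one degradation cycle lasts 1 s

beams = (1, 2, 3)              # three four-point bending tests
sensors = range(1, 9)          # eight AE sensors per beam

# One run-to-failure signal per (beam, sensor) pair.
signals = [(beam, sensor) for beam in beams for sensor in sensors]
train_set = [(beam, sensor) for beam, sensor in signals if sensor <= 4]
test_set = [(beam, sensor) for beam, sensor in signals if sensor >= 5]

samples_per_cycle = SAMPLING_RATE_HZ * CYCLE_DURATION_S
```

This gives 24 signals in total, 12 for training and 12 for testing, with 5,000,000 samples in every one-second cycle.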

The Efficacy of an SAE-DNN-Based HI Constructor
This stage starts with training the SAE-DNN-based HI constructor. Initially, the fast Fourier transform (FFT) is utilized to determine the spectrum of each signal segment. Since each segment is $5 \times 10^{6}$ samples in length, the FFT yields $2.5 \times 10^{6}$ spectral data points, which are too many to feed into the SAE. Therefore, the number of inputs is reduced by splitting the spectrum of the AE signals into a suitable number of frequency bands and then computing the root mean square (RMS) of each band, which approximates the band's energy.
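The band-wise reduction described above can be sketched as follows. This is a toy illustration under stated assumptions: the bands are taken to be of equal width (the text does not specify the band layout), and the naive DFT helper `one_sided_magnitude` is a stand-in for a real FFT routine that is only practical for very short signals. With $2.5 \times 10^{6}$ spectral points and 2000 SAE inputs, equal-width bands would each span 1250 bins.

```python
import cmath
import math

def one_sided_magnitude(signal):
    """Naive O(n^2) DFT magnitude for bins 0 .. n/2 - 1 (toy sizes only)."""
    n = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(signal)))
        for k in range(n // 2)
    ]

def band_rms(spectrum, n_bands):
    """Split the spectrum into n_bands equal-width bands and return each band's RMS."""
    width = len(spectrum) // n_bands  # any trailing remainder bins are dropped
    return [
        math.sqrt(sum(v * v for v in spectrum[b * width:(b + 1) * width]) / width)
        for b in range(n_bands)
    ]

# Toy example: a 64-sample sine with 8 cycles, so the energy sits in bin 8.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
features = band_rms(one_sided_magnitude(signal), n_bands=4)
```

In this toy case the 32 spectral bins are grouped into four bands of eight bins, and only the second band (bins 8 to 15) carries noticeable energy.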
Following the data preprocessing, the SAE model is constructed and trained. Signal spectrum vectors of size 2000 are fed to the encoder and processed by three size-diminishing dense layers (1000, 200, and 10 units, with Xavier initialization and the ELU activation function). The encoder's output (size 10) is then fed to the decoder and processed by three size-increasing dense layers (200, 1000, and 2000 units). Dropout layers with a rate of 0.1 are added before the dense layers to improve the SAE's regularization. Afterwards, Adam optimization is utilized for SAE training with unlabeled signal spectra as both the inputs and targets. Different fractions of inputs masked to zero (from 0.1 to 0.5) are tested, with the best performance obtained at 0.1.
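One possible reading of the "fraction of masked zero" is masking noise, in which a randomly chosen share of each input vector is set to zero before it enters the encoder while the clean vector remains the reconstruction target. The sketch below illustrates that reading only; the function name `mask_inputs`, the seeded generator, and the all-ones input vector are hypothetical.

```python
import random

def mask_inputs(vector, fraction, rng):
    """Set a randomly chosen `fraction` of the entries to zero (masking noise)."""
    n_masked = round(fraction * len(vector))
    masked_idx = set(rng.sample(range(len(vector)), n_masked))
    return [0.0 if i in masked_idx else v for i, v in enumerate(vector)]

rng = random.Random(0)          # seeded only to make the sketch repeatable
clean = [1.0] * 2000            # stand-in for one 2000-point spectrum vector
noisy = mask_inputs(clean, fraction=0.1, rng=rng)
# During training, `noisy` would be the input and `clean` the target.
```

At a fraction of 0.1, 200 of the 2000 entries are zeroed in each corrupted vector.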
Then the encoder's layers are reused in the DNN model as the hidden layers. After the addition of a logistic regression layer, this DNN model is fine-tuned in a supervised way. The DNN output layer has a size of 1, and its corresponding label is the normalized number (in the range [0, 1]) of AE hits detected in each degradation cycle. Because the DNN outputs, which are the HI values, are designed to lie in the range [0, 1], the sigmoid function is chosen as the activation function of the output layer. During the training process, the reused layers are frozen so that they retain the ability, learned in the SAE model, to extract high-level features from the low-level input features. In addition, early stopping and checkpoint techniques are applied while training the SAE and the DNN to obtain the best parameters of both structures.
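To make the label construction concrete, the sketch below scales per-cycle AE hit counts to [0, 1] and shows the sigmoid that bounds the output neuron. Min-max scaling is an assumption here (the text only states that the hit counts are normalized to [0, 1]), and the hit counts are hypothetical.

```python
import math

def min_max_normalize(hits):
    """Scale AE hit counts per cycle to [0, 1] (min-max scaling is an assumption)."""
    lo, hi = min(hits), max(hits)
    if hi == lo:
        return [0.0 for _ in hits]  # flat input: no range to scale
    return [(h - lo) / (hi - lo) for h in hits]

def sigmoid(z):
    """Output activation: maps any real pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

hits_per_cycle = [0, 3, 5, 12, 40]           # hypothetical counts
labels = min_max_normalize(hits_per_cycle)   # supervised targets for the DNN
```

Because the sigmoid only approaches 1 asymptotically, a network trained on such labels will rarely output exactly 1.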
Following the completion of the SAE-DNN-based HI constructor's training, the run-to-failure data is harnessed for the concrete beam's HI construction. Half of the signals are utilized here for the evaluation of the HI constructor's performance. Figure 11 shows the HIs of the three tests from sensor 5. In contrast to HIs constructed from conventional features such as RMS and kurtosis, which have larger scales, the proposed method's HIs range between 0 and 1, with 1 being the failure condition. Consequently, no specimen-specific threshold needs to be defined when such HIs are employed. The RUL is obtained once the HI exceeds a fixed FT.

Two other metrics, monotonicity and trendability [24], are also used in this study for HI fitness validation. Monotonicity characterizes the underlying increasing or decreasing trend:

$$M = \left| \frac{\text{no. of } d/dx > 0}{n-1} - \frac{\text{no. of } d/dx < 0}{n-1} \right|$$

with $n$ being the number of observations for a specific feature. Monotonicity $M$ is calculated from the absolute difference between the numbers of a feature's "positive" and "negative" derivatives. It ranges from 0 to 1, with 0 indicating a non-monotonic feature and 1 a highly monotonic feature.
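The monotonicity metric can be sketched in a few lines, approximating the derivative by first differences between consecutive observations and assuming the usual normalization by $n - 1$. The function name and the test sequences are illustrative.

```python
def monotonicity(values):
    """M = |#positive diffs - #negative diffs| / (n - 1), which lies in [0, 1]."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    return abs(pos - neg) / (len(values) - 1)

strictly_increasing = monotonicity([0, 1, 2, 3])   # every difference is positive
oscillating = monotonicity([0, 1, 0, 1, 0])        # positives and negatives cancel
```

A strictly increasing curve scores 1, while a curve that alternates up and down scores 0.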
The second metric, trendability, is related to an extracted feature's functional form and its correlation with time. In other words, it shows how an asset's state varies with time. For example, a constant function presents zero correlation with time, while a high correlation can be found with a linear function. Similarly, non-linearity also causes a variation in correlation. This metric is computed as follows:

$$R = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

with $R$ being the correlation coefficient between $x$ and $y$, which are the feature and time index, respectively, in this study. The correlation can be absent, negative, positive, or perfect; thus, $R$ ranges from −1 to 1. Figure 12 shows a conceptual demonstration of the curve fits.
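Trendability, as the correlation coefficient between a feature and its time index, can be sketched as below. The time index is assumed to be 0, 1, 2, ..., and returning 0 for a constant feature is a convention chosen here to match the statement that a constant function presents zero correlation with time.

```python
import math

def trendability(feature):
    """Pearson correlation R between the feature values and their time index."""
    n = len(feature)
    t = range(n)  # assumed time index 0, 1, ..., n - 1
    sx, sy = sum(feature), sum(t)
    sxy = sum(x * y for x, y in zip(feature, t))
    sxx = sum(x * x for x in feature)
    syy = sum(y * y for y in t)
    var_x = n * sxx - sx ** 2
    var_y = n * syy - sy ** 2
    if var_x <= 0 or var_y <= 0:
        return 0.0  # constant feature: treated as uncorrelated with time
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)
```

A linearly increasing feature gives R = 1 and a linearly decreasing one gives R = −1.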
To highlight the improvement of the proposed method's HIs over HIs based on RMS or kurtosis, a fitness analysis is performed on the testing dataset. In this analysis, the two aforementioned metrics are measured for each type of HI. Table 2 shows the summarized results, in which the proposed method presents a significant boost in performance in comparison with the two HIs based on RMS and kurtosis. This implies that our method is more suitable for describing the concrete beam degradation process.

The Efficacy of the LSTM-RNN-Oriented RUL Prediction
The LSTM-RNN-based RUL predictor is then trained with the previously constructed HI curves, which are segmented into series of time-steps. The data can be considered a univariate time series because it is a sequence of one value per time-step. The RUL predictor utilizes past HI values to predict future ones until a predefined threshold is met. In order to maximize the number of time series available to train the model, a shifting window of size 50 is used in the segmentation process. This procedure is shown in Figure 13.

At this step, the LSTM predictor is already capable of forecasting the next time-step value. However, with an alteration in the procedure, it can also predict more than one future state: each prediction is appended to the inputs (as if the predicted value had actually occurred) to predict the following values, and so on until the end. Our model is trained to forecast at every time-step instead of just the final time-step. With this technique, the loss contains a term for every time-step output rather than just the output at the last time-step. This stabilizes and speeds up training, as more error gradients are able to flow through the model [22].
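The segmentation and the recursive forecasting described above can be sketched as follows. The window stride of one time-step, the one-step-shifted targets, and the callable `predict_next` (a stand-in for the trained LSTM-RNN) are assumptions made for illustration.

```python
def make_windows(hi_curve, window=50):
    """Return (inputs, targets) pairs; each target is its input shifted by one step."""
    pairs = []
    for start in range(len(hi_curve) - window):
        inputs = hi_curve[start:start + window]
        targets = hi_curve[start + 1:start + window + 1]  # one target per time-step
        pairs.append((inputs, targets))
    return pairs

def forecast(predict_next, history, n_steps, window=50):
    """Recursive multi-step forecast: feed each prediction back as an input."""
    series = list(history)
    for _ in range(n_steps):
        series.append(predict_next(series[-window:]))
    return series[len(history):]

# Toy HI curve of 101 points rising linearly from 0 to 1.
curve = [i / 100 for i in range(101)]
pairs = make_windows(curve, window=50)
```

Shifting the window by a single step yields 51 training pairs from a 101-point curve, which is how the windowing enlarges a small training set. Giving every input position its own target corresponds to training the model to forecast at every time-step.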
The applied LSTM-RNN is constructed with one input layer whose size equals the number of time-steps, followed by two hidden LSTM layers of size 20. A linear activation function is utilized in the dense output layer of one neuron. Early stopping and checkpoint techniques are also used during training to obtain a better LSTM-RNN model.
The developed method is applied to degradation data collected from three concrete beams A, B, and C. Generally, RUL estimations toward the end of a degradation process are more important than earlier ones because this is usually when maintenance decisions are made. Therefore, two major time stamps, cycle 350 (minor crack initialization) and cycle 450 (major crack initialization), are chosen for the average prediction error calculation. The RUL prediction error $e$ is defined as the difference between the actual cycles $LC_i$ and the estimated cycles $\widehat{LC}_i$. In this study, the DNN model is designed so that its outputs, which are the HI values, are in the range [0, 1]; therefore, the sigmoid function is chosen as the activation function of the output layer. In this case, the fixed HI threshold would ideally be set to 1. However, experimental results have shown that the output of the DNN rarely reaches the value of 1. Hence, this study sets the fixed HI threshold to 0.95 to mark the specimen's breakdown.
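The way a fixed HI threshold turns a forecast into a failure cycle, and then into a prediction error, can be sketched as below. The callable `predict_next` stands in for the trained predictor, the `max_steps` guard is a hypothetical safety limit, and taking the absolute value of the error is an assumption.

```python
def estimate_failure_cycle(predict_next, history, threshold=0.95,
                           window=50, max_steps=10_000):
    """Roll the predictor forward until the HI reaches the threshold.

    Returns the 1-based cycle index of the first predicted HI >= threshold,
    or None if the threshold is not reached within max_steps.
    """
    series = list(history)
    for _ in range(max_steps):
        series.append(predict_next(series[-window:]))
        if series[-1] >= threshold:
            return len(series)
    return None

def rul_error(actual_cycles, estimated_cycles):
    """Prediction error in cycles (the absolute value is an assumption)."""
    return abs(actual_cycles - estimated_cycles)
```

The estimated RUL at the prediction time is then the estimated failure cycle minus the current cycle, for example a failure cycle estimated from the HI history available at cycle 350.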
Figure 14 shows specimen A's RUL predictions calculated at cycle 350 and at cycle 450. The detailed statistics of specimen A's RUL prediction error can be found in Table 3, along with those of the other two specimens; a small prediction error with a small standard deviation is preferred. As can be seen in Table 3, the proposed method's prediction error at cycle 350 is 32 cycles, which is lower than the errors of 36 and 81 cycles obtained by the gated recurrent unit (GRU) RNN [31] and the simple RNN, respectively. This indicates that our method is more effective at capturing long-term dependencies than the other two approaches. At cycle 450, the prediction errors of the LSTM-RNN, the GRU-RNN, and the simple RNN are 21, 32, and 61 cycles, respectively. In both cases, the simple RNN presents the largest error, which is consistent with its inability to effectively store and learn long-term dependencies without special gates.

The proposed method's results on specimen B are shown in Figure 15. The prediction errors of our proposed LSTM-RNN, the GRU-RNN, and the simple RNN at cycle 350 are 41, 43, and 95 cycles, respectively. At cycle 450, they are 34, 37, and 89 cycles.

In Figure 16, specimen C's prediction results from the proposed method are plotted; the values can again be checked in Table 3. At cycle 350, the prediction errors of our proposed method and the GRU-RNN are 36 and 39 cycles, respectively; at cycle 450, they are 24 cycles with the LSTM-RNN and 32 cycles with the GRU-RNN. For the simple RNN, the errors of 88 and 68 cycles in the two predictions again point to its failure to learn long-term dependencies.
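As a quick arithmetic check, the per-specimen errors quoted above can be averaged per model and per time stamp. The sketch only restates the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Prediction errors (in cycles) reported in the text for specimens A, B, C.
errors = {
    350: {"LSTM-RNN": [32, 41, 36], "GRU-RNN": [36, 43, 39], "simple RNN": [81, 95, 88]},
    450: {"LSTM-RNN": [21, 34, 24], "GRU-RNN": [32, 37, 32], "simple RNN": [61, 89, 68]},
}

def mean(values):
    return sum(values) / len(values)

# Average error per model at each prediction time.
averages = {
    cycle: {model: mean(vals) for model, vals in by_model.items()}
    for cycle, by_model in errors.items()
}

# Ratio of the LSTM-RNN average error to the simple RNN average error.
ratios = {
    cycle: avg["LSTM-RNN"] / avg["simple RNN"] for cycle, avg in averages.items()
}
```

With these numbers, the LSTM-RNN averages roughly 36.3 cycles against 88.0 for the simple RNN at cycle 350, and roughly 26.3 against 72.7 at cycle 450, so its average error is below half that of the simple RNN at both time stamps.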

Conclusions
Constructing reliable HI curves and learning the long-term dependencies of degradation data are important but challenging tasks for accurate remaining useful life (RUL) estimation of concrete structures. In this study, we proposed an SAE-DNN model that automatically constructs HI curves from raw degradation signals. The constructed HI curves have better fitness metrics than HI curves based on statistical parameters. More specifically, these HI curves have average monotonicity and trendability metrics of 0.67 and 0.68, respectively, which are much higher than those of HI curves based on statistical parameters such as RMS, kurtosis, or skewness. Moreover, the curves' HI values are in the range [0, 1]; therefore, no threshold definition is needed when such HIs are employed.
The HI curves constructed from the training degradation data are then used to train the LSTM-RNN for RUL prediction. The study validates the prediction performance of the LSTM-RNN by estimating the RUL and calculating the average prediction error on the experimental test concrete beams at two times: cycle 350 (minor crack initialization) and cycle 450 (major crack initialization). Experimental results on concrete beams A, B, and C indicate that the LSTM-RNN generally estimates the RULs of concrete beams more accurately than the GRU-RNN and the simple RNN. The average prediction errors of the LSTM-RNN on concrete beams A, B, and C are 32, 41, and 36 cycles, respectively, at cycle 350, and 21, 34, and 24 cycles, respectively, at cycle 450. These error values are lower than those of the GRU-RNN and much lower than those of the simple RNN. In other words, the proposed method outperformed a GRU-RNN and a simple RNN in predicting the RUL of concrete structures.
Overfitting is an important issue that needs to be handled carefully when training deep neural networks, especially when the training data is limited, as in this study. It degrades the performance of deep neural networks and makes it unstable. There are currently many

1. A health indicator (HI) constructor based on an SAE-DNN is developed. No manually extracted features are used in the construction of the SAE-DNN-based HI constructor. The SAE-DNN automatically extracts representative features during the training process.

Figure 1. Schematic illustration of the longitudinal section and cross section of the RC beam.

Figure 2. Schematic illustration of the four-point bending test for the RC beam.

Stage 1: The RC specimen deteriorates from its normal condition to a damaged state. Micro-cracks start at the end of this stage.
Stage 2: Hairline cracks appear on the surface, which soon develop into macro-cracks.
Stage 3: Main cracks form. Distributed flexure appears along with shear cracks, which soon lead to steel yielding.
Stage 4: The steel yielding intensifies and shear cracks ultimately culminate in concrete crushing.

Figure 3. Damage progression in an RC beam during a bending test.

The input of the SAE is assumed to be a vector, and the nonlinear activation function $f_s$ can be the sigmoid function, the tanh function, or the rectified linear unit (ReLU) function.

Figure 5. Training procedure of an SAE.

neglected to protect the training cells from signal leaking, which can negatively influence the noise estimation. Figure 6 describes a simplified CFAR scheme.

Figure 6. Diagram of a CFAR detection scheme.

Figure 7. The application of CFAR detection for impulse detection. (a) Power of cells versus adaptive threshold, (b) AE impulses automatically picked during a 1-second length.

Figure 9. Pictogram of the four-point bending test.

Figure 10. Illustration of the degradation process of concrete beams using RMS values, kurtosis values, and AE hits.

Figure 11. HIs of test datasets constructed by the proposed model for concrete beams A, B, and C.

Figure 13. Time series construction from training degradation data.

Table 1. Specifics of the experimental dataset.

Table 2. Metric comparison of different types of HIs.