Deep Learning-Based Acoustic Emission Scheme for Nondestructive Localization of Cracks in Train Rails under a Load

This research proposes a nondestructive single-sensor acoustic emission (AE) scheme for the detection and localization of cracks in steel rail under loads. In the operation, AE signals were captured by the AE sensor and converted into digital signal data by AE data acquisition module. The digital data were denoised to remove ambient and wheel/rail contact noises, and the denoised data were processed and classified to localize cracks in the steel rail using a deep learning algorithmic model. The AE signals of pencil lead break at the head, web, and foot of steel rail were used to train and test the algorithmic model. In training and testing the algorithm, the AE signals were divided into two groupings (150 and 300 AE signals) and the classification accuracy compared. The deep learning-based AE scheme was also implemented onsite to detect cracks in the steel rail. The total accuracy (average F1 score) under the first and second groupings were 86.6% and 96.6%, and that of the onsite experiment was 77.33%. The novelty of this research lies in the use of a single AE sensor and AE signal-based deep learning algorithm to efficiently detect and localize cracks in the steel rail, unlike existing AE crack-localization technology that relies on two or more sensors and human interpretation.


Introduction
Rail transport plays an important role in transferring a large number of passengers and freight between destinations. The significance of the railway thus necessitates regular structural heath monitoring (SHM) to identify welding discontinuity defects in the rail track and maintain the health of steel rail. Scheduled SHM prevents the incidence of rail accidents and improves passenger comfort.
Nondestructive acoustic emission (AE) testing is a passive technique to detect elastic energy spontaneously released by material with developing cracks [1][2][3][4]. The AE technique is also deployed to characterize the crack behavior under a load. AE is a transient elastic wave in the frequency range of 95 kHz-300 kHz. The advantages of a passive AE technique over conventional active nondestructive testing (NDT) methods include online monitoring, high sensitivity, early and rapid defect detection [5][6][7][8].
The AE signal from cracks in the steel rail is caused by fatigue in wheel/rail contact. If improperly attended to, the rail crack induced by dynamic wheel/rail contact and interaction could result in disastrous failures [9][10][11][12][13]. The defects in steel rail are predominantly fatigue cracks [14][15][16].
The conventional AE technology (without machine learning) has been applied to inspection and monitoring of building or structural integrity, pressure vessel, pipeline, aircraft components, storage tanks, tanks on tank ship and in material research [17][18][19][20]. In structural integrity, AE testing was used to inspect and monitor emerging fatigue cracks Figure 1 illustrates the diagram of the proposed nondestructive single-sensor AE scheme for crack detection in steel rail. Under the proposed scheme, AE signals were captured by the AE sensor and converted into digital signal data by AE data acquisition module. The digital signal data were pre-processed to remove noises using a total variation denoising algorithm. The denoised digital signal data were processed and classified to localize cracks in the steel rail. In the figure, P1, P2, and P3 denote rail head, rail web, and rail foot, respectively.

Acoustic Emission Sensor
Acoustic emission (AE) is the transient elastic waves in solids that occur when the internal structure of a material undergoes deformation due to plastic deformation, crack propagation, erosion, corrosion impact, leakage, and fatigue. This research focused on the AE signals caused by fatigue cracks in the steel rail under a load.
The AE sensor was Vallen model VS150-RIC with an 80 kHz-500 kHz bandwidth and −40 °C to +85 °C operating temperatures. Given the natural frequency of cracks at 150 kHz, the AE sensor was used to capture crack signals (elastic waves) in the steel rail. The AE sensor was of piezoelectric type to convert elastic waves in the steel rail into electrical AE signal. To mitigate the triboelectric effect, the AE sensor of the proposed scheme was integrated electronics piezoelectric (IEPE) sensor, as opposed to the non-IEPE sensor which is more prone to the triboelectric effect. To further mitigate the triboelectric effect, the bandpass filter frequency range of the proposed deep learning-based AE sensor scheme was set at 95 kHz-300 kHz.

AE Data Acquisition Module
The AE signals captured by the AE sensor were converted into digital signal data by an AE data acquisition module (Vallen AMSY-6) with 16-bit analog-to-digital conversion (ADC). The sampling rate of ADC conversion was 20 MHz. A computer was used for storage of digital signal data.

Pre-processing of AE Signal Data
Total variation denoising (TVD) algorithm was used to remove noise in the AE signal of cracks in a steel rail. In practical use, TVD is effective in removing ambient noise and rolling noise. Figure 2a shows the AE signal of cracks in steel rail with ambient and wheel/rail noise, and Figure 2b illustrates the denoised AE signal using TVD.

Acoustic Emission Sensor
Acoustic emission (AE) is the transient elastic waves in solids that occur when the internal structure of a material undergoes deformation due to plastic deformation, crack propagation, erosion, corrosion impact, leakage, and fatigue. This research focused on the AE signals caused by fatigue cracks in the steel rail under a load.
The AE sensor was Vallen model VS150-RIC with an 80 kHz-500 kHz bandwidth and −40 • C to +85 • C operating temperatures. Given the natural frequency of cracks at 150 kHz, the AE sensor was used to capture crack signals (elastic waves) in the steel rail. The AE sensor was of piezoelectric type to convert elastic waves in the steel rail into electrical AE signal. To mitigate the triboelectric effect, the AE sensor of the proposed scheme was integrated electronics piezoelectric (IEPE) sensor, as opposed to the non-IEPE sensor which is more prone to the triboelectric effect. To further mitigate the triboelectric effect, the bandpass filter frequency range of the proposed deep learning-based AE sensor scheme was set at 95 kHz-300 kHz.

AE Data Acquisition Module
The AE signals captured by the AE sensor were converted into digital signal data by an AE data acquisition module (Vallen AMSY-6) with 16-bit analog-to-digital conversion (ADC). The sampling rate of ADC conversion was 20 MHz. A computer was used for storage of digital signal data.

Pre-processing of AE Signal Data
Total variation denoising (TVD) algorithm was used to remove noise in the AE signal of cracks in a steel rail. In practical use, TVD is effective in removing ambient noise and rolling noise. Figure 2a shows the AE signal of cracks in steel rail with ambient and wheel/rail noise, and Figure 2b illustrates the denoised AE signal using TVD.
The AE signal with ambient noise and rolling noise is mathematically expressed in Equation (1).
where y is noisy AE signal, x is unnoisy AE signal, and w is additive white Gaussian noise.  The AE signal with ambient noise and rolling noise is mathematically expressed in Equation (1).

= +
(1) where y is noisy AE signal, x is unnoisy AE signal, and w is additive white Gaussian noise.
Given that x is a piecewise constant signal with sparse representation in first-order derivative, the estimated AE signal ( ) can be solved by minimizing the cost function (i.e., the first term on the right-hand side; F(x)), as expressed in Equation (2).
where λ is the regularization parameter and D is the first-order difference operator, as expressed in Equation (3).
The estimated AE signal ( ) (Equation (2)) could also be efficiently solved by majorization-minimization (MM) algorithm [32,33]. Rather than direct minimization of the cost function (F(x)), MM algorithm solves a sequence of optimization problems (Gk(x), where k = 0, 1, 2,…), based on the premise that solving for Gk(x) is less demanding than solving for F(x). MM generates a sequence of xk, each derived by minimizing Gk−1(x); and MM algorithm iterates until arriving at minimum F(x), as shown in Figure 3.
In the MM algorithm, the functions Gk(x) should be convex functions and must be specified such that it closely approximates the cost function (F(x)). In addition, MM algorithm requires that individual Gk(x) function be a majorizer of F(x) (i.e., Gk(x) ≥ F(x), ∀ ) and that the Gk(x) function is tangent to F(x) function at xk (i.e., Gk(xk) = F(xk)). In this research, the parabolic functions Gk(x) were used to approximate the minimum F(x), which is the optimal denoised AE signal ( ) as shown in the Algorithm 1. Given that x is a piecewise constant signal with sparse representation in first-order derivative, the estimated AE signal (x) can be solved by minimizing the cost function (i.e., the first term on the right-hand side; F(x)), as expressed in Equation (2).
where λ is the regularization parameter and D is the first-order difference operator, as expressed in Equation (3).
The estimated AE signal (x) (Equation (2)) could also be efficiently solved by majorizationminimization (MM) algorithm [32,33]. Rather than direct minimization of the cost function (F(x)), MM algorithm solves a sequence of optimization problems (G k (x), where k = 0, 1, 2, . . . ), based on the premise that solving for G k (x) is less demanding than solving for F(x). MM generates a sequence of x k , each derived by minimizing G k−1 (x); and MM algorithm iterates until arriving at minimum F(x), as shown in Figure 3.
In the MM algorithm, the functions G k (x) should be convex functions and must be specified such that it closely approximates the cost function (F(x)). In addition, MM algorithm requires that individual G k (x) function be a majorizer of F(x) (i.e., G k (x) ≥ F(x), ∀x) and that the G k (x) function is tangent to F(x) function at x k (i.e., G k (x k ) = F(x k )). In this research, the parabolic functions G k (x) were used to approximate the minimum F(x), which is the optimal denoised AE signal (x) as shown in the Algorithm 1.
(2) Choose G k (x) such that G k (x) ≥ F(x) for all x and G k (x k ) = F(x k ) (3) Set x k+1 as the minimizer of G k (x), as shown in Equation (4).
(4) Set k = k + 1 and repeat step (2) and iterate until arriving at the minimum F(x).     Figure 4 illustrates the typical deep learning algorithm for classification, consisting of the input (feature) layer, hidden layers through to N hidden layer, and output (target) layer. In this research, the activation function between hidden layers was hyperbolic tangent function (tanh(z)) since the function is ideal for the transient nature of AE signal and is capable of suppressing noises and interferences. The softmax activation function (softmax(z)) was used for classification based on probability. The aim of deep learning algorithm for classification is to localize cracks in the rail head, rail web, and rail foot of steel rail.
(4) Set k = k+1 and repeat step (2) and iterate until arriving at the minimum F(x). Figure 4 illustrates the typical deep learning algorithm for classification, consisting of the input (feature) layer, hidden layers through to N hidden layer, and output (target) layer. In this research, the activation function between hidden layers was hyperbolic tangent function (tanh(z)) since the function is ideal for the transient nature of AE signal and is capable of suppressing noises and interferences. The softmax activation function (softmax(z)) was used for classification based on probability. The aim of deep learning algorithm for classification is to localize cracks in the rail head, rail web, and rail foot of steel rail. Prior to algorithm training and validation, input (feature) data were scaled using standardization to normalize, as shown in Equation (5).

Processing and Classification
where X is the feature (input data), Mean X is the mean value of feature, and SD is standard deviation.
In training the algorithmic model, the initial weight (W) and bias (B) (i.e., , , , , , , , ), were random. In feedforward, hyperbolic tangent function (tanh(z)) was the activation function between hidden layers, as shown in Equation (6) where tanh(z) = [-1,1]. The output layer used softmax(z) as the activation function for classification based on probability, as shown in Equation (7) and the linear combination, as shown in Equation (8).
Partial derivative of the hyperbolic tangent function (tanh(z)) was used in back propagation, as shown in Equation (9). Prior to algorithm training and validation, input (feature) data were scaled using standardization to normalize, as shown in Equation (5).
where x is the feature (input data), Mean x is the mean value of feature, and SD is standard deviation.
In training the algorithmic model, the initial weight (W) and bias (B) (i.e., W 1 , B 1, W 2 , B 2 , W 3 , B 3 , W 4 , B 4 ), were random. In feedforward, hyperbolic tangent function (tanh(z)) was the activation function between hidden layers, as shown in Equation (6) where tanh(z) = [−1,1]. The output layer used softmax(z) as the activation function for classification based on probability, as shown in Equation (7) and the linear combination, as shown in Equation (8).
Partial derivative of the hyperbolic tangent function (tanh(z)) was used in back propagation, as shown in Equation (9). The cross-entropy loss function for multiclass was used to fine-tune the optimization of the algorithmic model, as shown in Equation (10).
where Y n is true output value andŶ n is predicted output value.

Experimental Steel Rail
The experimental steel rail was of UIC 54 type, which is the steel rail commonly used in countries in the Southeast Asian region. The elemental composition of UIC 54 steel rail was analyzed by handheld laser-induced breakdown spectroscopy (LIBS) analyzer (SciAps, model Z-300). Table 1 tabulates the elemental composition of UIC 54 steel rail.  Figure 5a shows the AE data acquisition module with 16-bit ADC and a sampling rate of 20 MHz (Vallen AMSY-6). Figure 5b illustrates the AE sensor (S/N 10453) with 80 kHz-500 kHz bandwidth and −40 • C to +85 • C operating temperatures (Vallen model VS150-RIC). The MAG4R magnetic holder was used to attach the AE sensor to the steel rail (on the field side of the rail head).

AE Data Sensor and Acquisition Module
The cross-entropy loss function for multiclass was used to fine-tune the optimization of the algorithmic model, as shown in Equation (10).
where is true output value and is predicted output value.

Experimental Steel Rail
The experimental steel rail was of UIC 54 type, which is the steel rail commonly used in countries in the Southeast Asian region. The elemental composition of UIC 54 steel rail was analyzed by handheld laser-induced breakdown spectroscopy (LIBS) analyzer (Sci-Aps, model Z-300). Table 1 tabulates the elemental composition of UIC 54 steel rail.  Figure 5a shows the AE data acquisition module with 16-bit ADC and a sampling rate of 20 MHz (Vallen AMSY-6). Figure 5b illustrates the AE sensor (S/N 10453) with 80 kHz-500 kHz bandwidth and −40 °C to +85 °C operating temperatures (Vallen model VS150-RIC). The MAG4R magnetic holder was used to attach the AE sensor to the steel rail (on the field side of the rail head).

AE Data Sensor and Acquisition Module
The Hsu-Nielsen source was used as the artificial source of AE signal of crack, consisting of 2H pencil lead (0.5 mm), guide tube, and mechanical pencil. The pencil lead was broken against the head, web, and foot of steel rail at 30°, in accordance with the American Society for Testing and Materials (ASTM E976) standard.

AE Sensor Amplitude Testing
The AE sensor amplitude was analyzed prior to capturing the AE signals of cracks in the steel rail to determine the sensor sensitivity and sensor coupling. The AE sensor amplitude testing was carried out at the rail head by breaking 2H pencil lead (0.5 mm) against the steel rail. The testing was performed five consecutive times and results averaged. The Hsu-Nielsen source was used as the artificial source of AE signal of crack, consisting of 2H pencil lead (0.5 mm), guide tube, and mechanical pencil. The pencil lead was broken against the head, web, and foot of steel rail at 30 • , in accordance with the American Society for Testing and Materials (ASTM E976) standard.

AE Sensor Amplitude Testing
The AE sensor amplitude was analyzed prior to capturing the AE signals of cracks in the steel rail to determine the sensor sensitivity and sensor coupling. The AE sensor amplitude testing was carried out at the rail head by breaking 2H pencil lead (0.5 mm) against the steel rail. The testing was performed five consecutive times and results averaged. Figure 6 illustrates the peak amplitude (dB) of the AE sensor relative to time (s), and the results are tabulated in Table 2. In Figure 6, the maximum peak amplitude of the  Figure 6 illustrates the peak amplitude (dB) of the AE sensor relative to time (s), and the results are tabulated in Table 2. In Figure 6, the maximum peak amplitude of the experimental AE sensor was greater than 80 dB (>80 dB), indicating that the AE sensor could be deployed to capture the AE signals of cracks in the steel rail.

Datasets for Training and Testing the Deep Learning Algorithmic Model
The digital signal data of PLB at the head, web, and foot of steel rail were datasets for training and testing the proposed deep learning algorithmic model. PLB were carried out 150 times each at the head, web, and foot of steel rail; and the AE signals were captured, totaling 450 AE signals (Figure 7).
The un-denoised AE signals were subsequently divided into two groupings (150 and 300 AE signals) to investigate the effect of number of input data on the classification accuracy of the deep learning algorithmic model. Under the first grouping (150 AE signals), the input data were divided into a training dataset (80% of the input data) and a testing dataset (20%). Under the second grouping (300 AE signals), 80% of the input data were training dataset and the rest (20%) were testing dataset. Figure 8a-c shows the un-denoised AE signals (prior to pre-processing) of PLB at the head, web, and foot of steel rail. The AE signals of PLB were captured by the AE sensor and converted by the AE acquisition module into un-denoised AE digital signal data prior to pre-processing by total variation denoising (TVD) algorithm to remove the ambient noise.

Datasets for Training and Testing the Deep Learning Algorithmic Model
The digital signal data of PLB at the head, web, and foot of steel rail were datasets for training and testing the proposed deep learning algorithmic model. PLB were carried out 150 times each at the head, web, and foot of steel rail; and the AE signals were captured, totaling 450 AE signals (Figure 7).  The un-denoised AE signals were subsequently divided into two groupings (150 and 300 AE signals) to investigate the effect of number of input data on the classification accuracy of the deep learning algorithmic model. Under the first grouping (150 AE signals), the input data were divided into a training dataset (80% of the input data) and a testing dataset (20%). Under the second grouping (300 AE signals), 80% of the input data were training dataset and the rest (20%) were testing dataset. Figure 8a-c shows the un-denoised AE signals (prior to pre-processing) of PLB at the head, web, and foot of steel rail. The AE signals of PLB were captured by the AE sensor and converted by the AE acquisition module into un-denoised AE digital signal data prior to pre-processing by total variation denoising (TVD) algorithm to remove the ambient noise.  Figure 9 illustrates the conversion procedure of denoised AE signals (after pre-processing) of PLB into feature datasets (input data) for training and testing the deep learning algorithmic model. The training and testing datasets were transferred to a spreadsheet. The feature datasets were divided into two groupings (150 and 300 feature datasets) to investigate the effect of number of input data on the classification accuracy of the deep learning algorithmic model.
Under the first grouping (150 feature datasets), 50 datasets each belonged to PLB at the head, web, and foot of steel rail. The feature datasets were divided into training dataset (80%) and testing dataset (20%). Under the second grouping (300 feature datasets), 100 datasets each belonged to PLB at the head, web, and foot of steel rail. Likewise, 80% of the  Figure 9 illustrates the conversion procedure of denoised AE signals (after preprocessing) of PLB into feature datasets (input data) for training and testing the deep learning algorithmic model. The training and testing datasets were transferred to a spreadsheet. The feature datasets were divided into two groupings (150 and 300 feature datasets) to investigate the effect of number of input data on the classification accuracy of the deep learning algorithmic model.
Under the first grouping (150 feature datasets), 50 datasets each belonged to PLB at the head, web, and foot of steel rail. The feature datasets were divided into training dataset (80%) and testing dataset (20%). Under the second grouping (300 feature datasets), 100 datasets each belonged to PLB at the head, web, and foot of steel rail. Likewise, 80% of the input data were training dataset and the rest (20%) were testing dataset. Given the sampling rate of 20 MHz of the AE acquisition module, one feature dataset (i.e., one denoised AE signal) contained 300,000 data points.
In this research, one AE signal contains 300,000 data points, and the time to capture one data point is 0.05 µs, given f s = 20 MHz. As a result, the time required to capture one AE signal (i.e., 300,000 data points) is 0.015 s (= 300,000 × 0.05 × 10 −6 ). input data were training dataset and the rest (20%) were testing dataset. Given the sampling rate of 20 MHz of the AE acquisition module, one feature dataset (i.e., one denoised AE signal) contained 300,000 data points.
In this research, one AE signal contains 300,000 data points, and the time to capture one data point is 0.05 µs, given fs = 20 MHz. As a result, the time required to capture one AE signal (i.e., 300,000 data points) is 0.015 s (= 300,000 × 0.05 × 10 −6 ). In Figure 10, one row of yellow grid cells represents one AE signal (300,000 data points), and the total number of rows (i.e., N of Dataset) represents 450 AE signals (PLB digital signal data = 450 signals) at the rail head, web, and foot. In the columns "Target", the rail head, rail web, and rail foot are represented by red, green, and blue colors, respectively.
In training the deep learning algorithm, a specific target (either rail head, web, or foot) was assigned to each row of yellow grid cells (i.e., each AE signal) using one-hot encoding, where 1 denotes a 100% probability and 0 a zero probability. The training was carried out separately for the first grouping (80% of the input data = 120 signals) and second grouping (80% of the input data = 240 signals) of AE signals.
In testing the deep learning algorithm, a given AE signal was applied to the trained deep-learning algorithm and the algorithm classified the crack location based on probability, where Y1, Y2, and Y3 denote the head, web, and foot of steel rail, respectively. In testing the algorithm, there were 30 signals (20% of the input data) for the first grouping, consisting of 10 signals each for rail head, web, and foot. Meanwhile, there were 60 signals (20% of the input data) for the second grouping, consisting of 20 signals each for rail head, web, and foot.
The following example is to show the testing process of the algorithm, assuming that the deep learning-generated probability of Y1, Y2, and Y3 of the first row of yellow grid cells (AE signal#01) are 0.1 (10%), 0.4 (40%), and 0.5 (50%), the proposed deep learning algorithm would select Y3 (rail foot) as the location of the crack, based on the highest probability value. In Figure 10, one row of yellow grid cells represents one AE signal (300,000 data points), and the total number of rows (i.e., N of Dataset) represents 450 AE signals (PLB digital signal data = 450 signals) at the rail head, web, and foot. In the columns "Target", the rail head, rail web, and rail foot are represented by red, green, and blue colors, respectively.  Figure 11 illustrates the proposed deep learning algorithm for classification. The AE data points were input of the feature layer, and one AE signal contained 300,000 data points, as represented by X1, X2, X3…X300,000. In this research, there were 450 AE signals of PLB under two groupings. The proposed algorithmic model consisted of three hidden layers and one output layer. The first, second, and third hidden layers had 5, 4, and 3 nodes, respectively. The activation function between hidden layers was hyperbolic tangent function (tanh(z)), which transforms linear to nonlinear function. The output layer consisted of Y1 (crack in the rail head), Y2 (crack in the rail web), and Y3 (crack in the rail foot). The softmax activation function (softmax(z)) was used for classification based on probability. The aim of the proposed deep learning algorithm for classification is to localize cracks in the rail head, rail web, and rail foot of steel rail. In training the deep learning algorithm, a specific target (either rail head, web, or foot) was assigned to each row of yellow grid cells (i.e., each AE signal) using one-hot encoding, where 1 denotes a 100% probability and 0 a zero probability. The training was carried out separately for the first grouping (80% of the input data = 120 signals) and second grouping (80% of the input data = 240 signals) of AE signals.

Proposed Deep Learning Algorithm for Classification
In testing the deep learning algorithm, a given AE signal was applied to the trained deep-learning algorithm and the algorithm classified the crack location based on probability, where Y1, Y2, and Y3 denote the head, web, and foot of steel rail, respectively. In testing the algorithm, there were 30 signals (20% of the input data) for the first grouping, consisting of 10 signals each for rail head, web, and foot. Meanwhile, there were 60 signals (20% of the input data) for the second grouping, consisting of 20 signals each for rail head, web, and foot. The following example is to show the testing process of the algorithm, assuming that the deep learning-generated probability of Y1, Y2, and Y3 of the first row of yellow grid cells (AE signal#01) are 0.1 (10%), 0.4 (40%), and 0.5 (50%), the proposed deep learning algorithm would select Y3 (rail foot) as the location of the crack, based on the highest probability value. Figure 11 illustrates the proposed deep learning algorithm for classification. The AE data points were input of the feature layer, and one AE signal contained 300,000 data points, as represented by X1, X2, X3 . . . X300,000. In this research, there were 450 AE signals of PLB under two groupings.  Figure 11 illustrates the proposed deep learning algorithm for classification. The AE data points were input of the feature layer, and one AE signal contained 300,000 data points, as represented by X1, X2, X3…X300,000. In this research, there were 450 AE signals of PLB under two groupings. The proposed algorithmic model consisted of three hidden layers and one output layer. The first, second, and third hidden layers had 5, 4, and 3 nodes, respectively. The activation function between hidden layers was hyperbolic tangent function (tanh(z)), which transforms linear to nonlinear function. The output layer consisted of Y1 (crack in the rail head), Y2 (crack in the rail web), and Y3 (crack in the rail foot). The softmax activation function (softmax(z)) was used for classification based on probability. The aim of the proposed deep learning algorithm for classification is to localize cracks in the rail head, rail web, and rail foot of steel rail.  Figure 12 illustrates the process of training and testing the proposed deep learning algorithmic model and the model validation. The AE datasets (feature and target datasets) were divided into a training dataset (80%) and a testing dataset (20%). The proposed algorithmic model consisted of three hidden layers and one output layer. The first, second, and third hidden layers had 5, 4, and 3 nodes, respectively. The activation function between hidden layers was hyperbolic tangent function (tanh(z)), which transforms linear to nonlinear function. The output layer consisted of Y1 (crack in the rail head), Y2 (crack in the rail web), and Y3 (crack in the rail foot). The softmax activation function (softmax(z)) was used for classification based on probability. The aim of the proposed deep learning algorithm for classification is to localize cracks in the rail head, rail web, and rail foot of steel rail. Figure 12 illustrates the process of training and testing the proposed deep learning algorithmic model and the model validation. The AE datasets (feature and target datasets) were divided into a training dataset (80%) and a testing dataset (20%).

Proposed Deep Learning Algorithm for Classification
The training dataset consisted of X_Train (normalized feature/input dataset) and Y_Train (target/output dataset), while the testing dataset comprised X_Test (normalized feature dataset) and Y_Test (target dataset). The initial weight (W) and bias (B) in the hidden layers (W 1 , B 1, W 2 , B 2 , W 3 , B 3 , W 4 , B 4 ) were random, given the learning rate (α) of 0.1 and epoch of 1000. W and B were fine-tuned by gradient descent iterative optimization algorithm.
In the training and testing processes, L1-norm regularization was utilized to avoid overfitting due to excessive data points (300,000 data points per AE signal) and divergence between the cross-entropy loss of training and testing datasets.
The classification results of the optimized algorithmic model (Y_Predict) were compared against the training (Y_Train) and testing target (Y_Test) datasets to evaluate the classification performance of the proposed deep learning algorithm.
The training dataset consisted of X_Train (normalized feature/input dataset) and Y_Train (target/output dataset), while the testing dataset comprised X_Test (normalized feature dataset) and Y_Test (target dataset). The initial weight (W) and bias (B) in the hidden layers ( , , , , , , , ) were random, given the learning rate (α) of 0.1 and epoch of 1000. W and B were fine-tuned by gradient descent iterative optimization algorithm.
In the training and testing processes, L1-norm regularization was utilized to avoid overfitting due to excessive data points (300,000 data points per AE signal) and divergence between the cross-entropy loss of training and testing datasets.
The classification results of the optimized algorithmic model (Y_Predict) were compared against the training (Y_Train) and testing target (Y_Test) datasets to evaluate the classification performance of the proposed deep learning algorithm.

Evaluation of the Deep Learning Algorithm
The classification performance of the proposed deep learning algorithmic model was evaluated by comparing the classification results using the algorithmic model against the training and testing target datasets. The performance metrics were the receiver operating characteristic (ROC) curve, confusion matrix, F1 score, and total accuracy (average F1 score).

Evaluation of the Deep Learning Algorithm
The classification performance of the proposed deep learning algorithmic model was evaluated by comparing the classification results using the algorithmic model against the training and testing target datasets. The performance metrics were the receiver operating characteristic (ROC) curve, confusion matrix, F1 score, and total accuracy (average F1 score).

Receiver Operating Characteristic (ROC) Curve
A receiver operating characteristic (ROC) curve shows the classification performance of an algorithmic model whereby the area under the ROC curve is used as a performance indicator of the model. The ROC curve plots two parameters: true positive rate (y-axis) and false positive rate (x-axis). The true positive rate (TPR) and false positive rate (FPR) can be calculated by Equations (11) and (12), respectively.
True positive rate (TPR) = TP TP + FN (11) where TP is the number of true positives classified by the algorithmic model and FN is the number of false negatives classified by the algorithmic model. where FP is the number of false positives classified by the algorithmic model and TN is the number of true negatives classified by the algorithmic model.

Confusion Matrix, F1 Score and Total Accuracy (Average F1 Score)
The confusion matrix is a table layout that enables visualization of the classification performance of an algorithmic model. F1 score is a value that indicates the classification performance of the algorithmic model using Precision (Equation (13)) and Recall. The Recall equation in the F1 Score calculation is identical to Equation (11) (true positive rate). As a result, equation (11) can be used in the Recall calculation. F1 Score, Individual-class Accuracy and Total accuracy (the average of F1 scores) can be calculated by Equations (14)-(16), respectively.

Onsite Advanced Phased Array Ultrasonic Testing (PAUT)
Prior to the implementation of the proposed AE scheme for crack detection in steel rail, advanced PAUT with 5 MHz 64-channel linear array probe (model GEKKO; M2M, France) was applied to welding joints of the steel rail to determine and localize cracks. The PAUT detection was carried out on the welding joints due to their susceptibility to subsurface crack. The proposed AE scheme was subsequently applied to the same welding joints with defects, and the detection results of the AE scheme were validated against those of PAUT to evaluate the classification performance of the deep learning algorithmic model. Figure 13a,b show the advanced PAUT on a welding joint of the steel rail and the defect in the welding joint.

Steel Rail Temperatures
Given the operating temperature range of −40 to 85 °C of the experimental AE sensor, the temperatures of steel rail around mid-day were measured by using a thermal imaging camera (FLIR, USA.) prior to the installation of the AE sensor. The steel rail temperatures were 56.7-61.1 °C, as shown in Figure 14. The steel rail temperatures were within the operating temperature range of the AE sensor.

Steel Rail Temperatures
Given the operating temperature range of −40 to 85 • C of the experimental AE sensor, the temperatures of steel rail around mid-day were measured by using a thermal imaging camera (FLIR, USA.) prior to the installation of the AE sensor. The steel rail temperatures were 56.7-61.1 • C, as shown in Figure 14. The steel rail temperatures were within the operating temperature range of the AE sensor.
(a) (b) Figure 13. Onsite experimental setup: (a) advanced phased array ultrasonic testing (PAUT) on welding joint of the steel rail; (b) PAUT detection with defect in the welding joint.

Steel Rail Temperatures
Given the operating temperature range of −40 to 85 °C of the experimental AE sensor, the temperatures of steel rail around mid-day were measured by using a thermal imaging camera (FLIR, USA.) prior to the installation of the AE sensor. The steel rail temperatures were 56.7-61.1 °C, as shown in Figure 14. The steel rail temperatures were within the operating temperature range of the AE sensor.  Figure 15a illustrates the installation of the AE sensor on the field side of the head of steel rail to detect crack. Figure 15b shows the onsite experiment of the AE scheme with the steel rail under a load. The AE signals of the steel rail under a load were fatigue cracks induced by wheel/rail contact. The AE signals were captured by the sensor and converted into digital signal data by the data acquisition module. The digital signal data were preprocessed to denoise and processed by the deep learning algorithmic model to localize cracks in the steel rail.  Figure 15a illustrates the installation of the AE sensor on the field side of the head of steel rail to detect crack. Figure 15b shows the onsite experiment of the AE scheme with the steel rail under a load. The AE signals of the steel rail under a load were fatigue cracks induced by wheel/rail contact. The AE signals were captured by the sensor and converted into digital signal data by the data acquisition module. The digital signal data were pre-processed to denoise and processed by the deep learning algorithmic model to localize cracks in the steel rail.  Table 5. Results and Discussion.

Receiver Operating Characteristic (ROC) Curve
The effects of different number of feature datasets (input data) on the classification accuracy of the deep learning algorithmic model were determined by dividing the datasets into two groupings (150 and 300 AE signals). Figure 16a,b respectively illustrate the ROC curves under the first and second groupings, where classes 0, 1, and 2 denote the head, web, and foot of steel rail.
Under the first grouping, the accuracy performance of the deep learning algorithmic model for class 0 (crack from the rail head), class 1 (rail web), and class 2 (rail foot) were 100% (ROC curve area = 1.0), 95% (0.95), and 82% (0.82). Under the second grouping, the accuracy performance for class 0, class 1, and class 2 were 97% (0.97), 99% (0.99), and 96% (0.96), respectively. The results indicated that the classification accuracy of the deep learning algorithmic model increased with increase in the datasets.

Receiver Operating Characteristic (ROC) Curve
The effects of different number of feature datasets (input data) on the classification accuracy of the deep learning algorithmic model were determined by dividing the datasets into two groupings (150 and 300 AE signals). Figure 16a,b respectively illustrate the ROC curves under the first and second groupings, where classes 0, 1, and 2 denote the head, web, and foot of steel rail.
Under the first grouping, the accuracy performance of the deep learning algorithmic model for class 0 (crack from the rail head), class 1 (rail web), and class 2 (rail foot) were 100% (ROC curve area = 1.0), 95% (0.95), and 82% (0.82). Under the second grouping, the accuracy performance for class 0, class 1, and class 2 were 97% (0.97), 99% (0.99), and 96% (0.96), respectively. The results indicated that the classification accuracy of the deep learning algorithmic model increased with increase in the datasets.

Receiver Operating Characteristic (ROC) Curve
The effects of different number of feature datasets (input data) on the classification accuracy of the deep learning algorithmic model were determined by dividing the datasets into two groupings (150 and 300 AE signals). Figure 16a,b respectively illustrate the ROC curves under the first and second groupings, where classes 0, 1, and 2 denote the head, web, and foot of steel rail.
Under the first grouping, the accuracy performance of the deep learning algorithmic model for class 0 (crack from the rail head), class 1 (rail web), and class 2 (rail foot) were 100% (ROC curve area = 1.0), 95% (0.95), and 82% (0.82). Under the second grouping, the accuracy performance for class 0, class 1, and class 2 were 97% (0.97), 99% (0.99), and 96% (0.96), respectively. The results indicated that the classification accuracy of the deep learning algorithmic model increased with increase in the datasets.  Figure 17a,b show the confusion matrix of the deep learning algorithmic model for classification under first (150 datasets) and second (300 datasets) groupings of AE signal datasets. In Figure 17a (the first grouping), given 10 crack signals from the rail head (actual), the algorithmic model accurately classified 10 signals (predicted) as from the rail head. For the rail web, out of 10 crack signals (actual), the algorithmic model accurately classified 9 signals (predicted) as from the rail web and erroneously one signal as from the  Figure 17a,b show the confusion matrix of the deep learning algorithmic model for classification under first (150 datasets) and second (300 datasets) groupings of AE signal datasets. In Figure 17a (the first grouping), given 10 crack signals from the rail head (actual), the algorithmic model accurately classified 10 signals (predicted) as from the rail head. For the rail web, out of 10 crack signals (actual), the algorithmic model accurately classified 9 signals (predicted) as from the rail web and erroneously one signal as from the rail foot. For the rail foot, out of 10 crack signals (actual), the model accurately classified 7 signals (predicted) as from the rail foot and erroneously three signals as from the rail head. rail foot. For the rail foot, out of 10 crack signals (actual), the model accurately classified 7 signals (predicted) as from the rail foot and erroneously three signals as from the rail head.

Confusion Matrix of the Deep Learning Algorithmic Model
In Figure 17b (the second grouping), given 20 crack signals from rail head (actual), the algorithmic model accurately classified 19 signals (predicted) as from the rail head and erroneously one signal as from the rail foot. For the rail web, out of 20 crack signals (actual), the algorithmic model accurately classified 20 signals (predicted) as from the rail web. For the rail foot, out of 20 crack signals (actual), the model accurately classified 19 signals (predicted) as from the rail foot and erroneously one signal as from the rail web.  Table 3 compares the F1 scores and total accuracy of the first (150 AE signals) and second (300 AE signals) groupings. Precision and Recall were calculated from the confusion matrix ( Figure 17). Under the first grouping, the F1 scores for the head, web, and foot of steel rail were 90%, 90%, and 85%, respectively, with a total accuracy of 88.33%. Under the In Figure 17b (the second grouping), given 20 crack signals from rail head (actual), the algorithmic model accurately classified 19 signals (predicted) as from the rail head and erroneously one signal as from the rail foot. For the rail web, out of 20 crack signals (actual), the algorithmic model accurately classified 20 signals (predicted) as from the rail web. For the rail foot, out of 20 crack signals (actual), the model accurately classified 19 signals (predicted) as from the rail foot and erroneously one signal as from the rail web. Table 3 compares the F1 scores and total accuracy of the first (150 AE signals) and second (300 AE signals) groupings. Precision and Recall were calculated from the confusion matrix ( Figure 17). Under the first grouping, the F1 scores for the head, web, and foot of steel rail were 90%, 90%, and 85%, respectively, with a total accuracy of 88.33%. Under the second grouping, the corresponding F1 scores were 95%, 95%, and 92.5%, with a total accuracy of 94.5%. The F1 scores and total accuracy improved as the number of AE signals increased from 150 to 300 signals.  Figure 18 shows the AE signals of cracks in the welding joint of the steel rail under a load using the nondestructive single-sensor AE scheme. Prior to the implementation of the proposed AE sensor scheme under a load, the advanced PAUT was first used to localize cracks in the steel rail, and the crack locations were used as reference (i.e., actual results). The crack localization by PAUT was carried out at welding joints of the steel rail since the welding joints are highly susceptible to crack. In the crack location, the PAUT probe was placed on top of the steel rail above the welding joint (offline monitoring), in accordance with ASTM E2700 standard ( Figure 13).

F1 Scores of the Deep Learning Algorithmic Model
The proposed AE sensor scheme was then implemented under a load (online monitoring) at the same welding joints previously analyzed by the PAUT. The proposed deep learning algorithm localized the crack in steel rail for each AE signal (from steel rail under a load) based on the highest deep learning-generated probability value. The crack localization results by the proposed deep learning-based AE sensor scheme were compared against the reference (actual results by the PAUT). In this research, the F1 scores (classification accuracy) of the proposed AE sensor scheme for onsite localization of cracks in the rail head, web, and foot were 78%, 80%, and 74%, compared with the PAUT results. The total accuracy (average F1 score) of the proposed AE sensor scheme was 77.33%. learning algorithm localized the crack in steel rail for each AE signal (from steel rail under a load) based on the highest deep learning-generated probability value. The crack localization results by the proposed deep learning-based AE sensor scheme were compared against the reference (actual results by the PAUT). In this research, the F1 scores (classification accuracy) of the proposed AE sensor scheme for onsite localization of cracks in the rail head, web, and foot were 78%, 80%, and 74%, compared with the PAUT results. The total accuracy (average F1 score) of the proposed AE sensor scheme was 77.33%. Figure 18. AE signals of cracks in the welding joint of the steel rail under a load using the singlesensor AE scheme.

Conclusions
This research proposed the deep learning-based single-sensor AE scheme for localization of cracks in the head, web, and foot of steel rail under a load. The AE signals captured by the AE sensor were converted into digital signal data and denoised using a TVD algorithm. The denoised signal data were used to localize the cracks by using a deep learning algorithm. The proposed AE sensor scheme was also implemented onsite to detect cracks in the head, web, and foot of steel rail. The experimental results showed that the classification accuracy (F1 scores) of the AE sensor scheme for onsite localization of cracks in the rail head, web, and foot were 78%, 80%, and 74%, compared with the PAUT results. The total accuracy (average F1 score) of the AE sensor scheme was 77.33%. To improve the total accuracy (average F1 score), subsequent research should increase the number of AE signal data in training the deep learning algorithm, in addition to experimenting with alternative pre-processing algorithms with a higher denoising capability.

Conclusions
This research proposed the deep learning-based single-sensor AE scheme for localization of cracks in the head, web, and foot of steel rail under a load. The AE signals captured by the AE sensor were converted into digital signal data and denoised using a TVD algorithm. The denoised signal data were used to localize the cracks by using a deep learning algorithm. The proposed AE sensor scheme was also implemented onsite to detect cracks in the head, web, and foot of steel rail. The experimental results showed that the classification accuracy (F1 scores) of the AE sensor scheme for onsite localization of cracks in the rail head, web, and foot were 78%, 80%, and 74%, compared with the PAUT results. The total accuracy (average F1 score) of the AE sensor scheme was 77.33%. To improve the total accuracy (average F1 score), subsequent research should increase the number of AE signal data in training the deep learning algorithm, in addition to experimenting with alternative pre-processing algorithms with a higher denoising capability.
Author Contributions: Conceptualization, W.S.; methodology, W.S. and P.P.; validation, W.S. and P.P.; formal analysis, W.S. and P.P.; investigation, W.S. and P.P. writing-original draft preparation, W.S.; writing-review and editing, W.S. and P.P.; funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.