An Intelligence Method for Recognizing Multiple Defects in Rail

Ultrasonic guided waves are sensitive to many different types of defects and have been studied for defect recognition in rail. However, most fault recognition algorithms rely on experience or professional knowledge to extract features from the time domain, frequency domain, or time-frequency domain. This paper proposes a new method for identifying multiple types of rail defects. The segment principal components analysis (S-PCA) is developed to extract features from signals collected by sensors located at different positions. A Support Vector Machine (SVM) model then identifies the defects from the extracted features. Simulations and experiments on rails with different kinds of defects, namely crack, corrosion, and transverse crack under the shelling, are combined to verify the effectiveness of the proposed defect identification technique. Guided wave detection signals are acquired through nine excitation-reception channels. The results show that the defect classification accuracy rates are 96.29% and 96.15% when multiple signals are combined, either by single-point excitation with multi-point reception or by multi-point excitation with single-point reception.


Introduction
The structural health of rails, as infrastructure, has attracted much attention in the fields of engineering and NDT. Because of the influence of the manufacturing process, the operating conditions, and the geographic environment, rails are prone to various defects. In operation, rolling contact is the main cause of rail surface cracks. In manufacturing, inclusions in the railhead can lead to an area-shaped section within the rail, which in turn leads to the formation of transverse cracks under the shelling. Moreover, natural conditions such as air pollution, rainfall, and temperature change favor the development of corrosion. In in-service rail, defects grow faster as they become larger, and the growth rate differs between defect types [1][2][3]. Therefore, further research on determining the type and size of rail defects is necessary to ensure proper and effective maintenance and replacement. Owing to their multiple modes and low attenuation, ultrasonic guided waves can be used for nondestructive testing of multiple types of defects over long ranges of rail [4][5][6][7][8]. For instance, the vertical vibration mode is sensitive to cracks at the bottom of the rail [9]; the SH mode can detect a transverse defect in rail [8]; and the flexural mode can measure axial stress to monitor rail breakage [10]. Evans et al. [11] used ultrasonic guided waves to detect defects in rail level crossings. Lee et al. [12] presented a hybrid analytical-FEM technique based on the dispersion characteristics of the guided wave to design a sensor that can excite specific modes and frequencies for identifying transverse cracking under the shelling. Xing et al. [13] constructed a mathematical model composed of a modal vibration factor

Figure 1. Flow chart of the segmented principal components analysis (S-PCA) for rail defect identification. The area enclosed by the dashed line represents data acquisition; the area enclosed by the dotted line represents feature extraction by S-PCA; and the remaining area represents classification. Si is the ith sub-signal of the original signal, and W is the matrix of correlation coefficients obtained via PCA with the threshold θP.

Signal Dividing
The guided wave detection signal is a typical nonstationary time-series signal. Defects may cause modal conversion and dispersion of the guided wave signal, and the most intuitive effect is a change in the wave shape. Different defect information may be retained in local zones of the guided wave detection signal. Therefore, we extract features from each local zone to achieve feature extraction for the whole detection signal. Dividing the signal is an effective way to convert the feature extraction of one high-dimensional signal into the feature extraction of P low-dimensional signals.

L is a parameter in signal segmentation and is determined by both Lmin and N. Specifically, L = N × Lmin, in which Lmin is the smallest defect size detectable by the selected guided wave and N is a constant that needs to be determined experimentally. The choice of N or L is discussed in the following section. Tc, as shown in Equation (1), is the time interval between two adjacent dividing points. VP is the guided wave group velocity. Figure 2 is a schematic diagram of signal dividing, in which the guided wave detection signal of a crack defect is divided by Tc (12.5 μs) or L (5 mm).

Figure 2. Diagram of the signal divided by Tc (12.5 μs) or L (5 mm), in which the black boxes represent the dividing points. Tc is the time interval between two adjacent dividing points. The vertical axis is the signal amplitude, and the horizontal axis is the sampling time of the signal.
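The dividing step can be illustrated with a short sketch. It assumes that the number of data points per segment equals Tc multiplied by the sampling frequency, and it uses the 100 MHz sampling frequency and 2 ms record length reported later for the experiments and simulations. The function names and the toy signal are illustrative only and are not taken from the original work.

```python
def segment_length(tc_seconds, fs_hz):
    """Number of samples in one segment for a dividing interval Tc (assumed Tc * fs)."""
    return round(tc_seconds * fs_hz)


def divide_signal(signal, points_per_segment):
    """Split a 1-D sequence into consecutive segments of equal length.

    Trailing samples that do not fill a whole segment are dropped.
    """
    n_segments = len(signal) // points_per_segment
    return [
        signal[i * points_per_segment:(i + 1) * points_per_segment]
        for i in range(n_segments)
    ]


fs_hz = 100e6                      # sampling frequency reported for the experiment
points = segment_length(5e-6, fs_hz)   # Tc = 5 microseconds
toy_signal = list(range(200_000))  # stand-in for a 2 ms record at 100 MHz
segments = divide_signal(toy_signal, points)
print(points, len(segments))
```

Each element of `segments` would then be treated as one low-dimensional sub-signal Si.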

Principal Component Analysis
PCA is a common method for reducing data dimensions. Its core idea is to use low-dimensional data, obtained through a mapping, to reflect the valuable information contained in high-dimensional data. Gottumukka et al. [25] used a modular PCA method to improve face recognition technology. Senneville B et al. [26] applied PCA to the motion estimation of abdominal organs. Mazzeo et al. [27] combined the wavelet transform with PCA for the preprocessing of bolt images.
The role of PCA in rail defect recognition is twofold: removing redundant information and extracting features. The specific implementation steps are as follows:
• Construct a sample set X of the rail damage characteristics. Matrix A is composed of guided wave detection signals of different defect types, or of the same defect type with different levels of damage. As shown in Equation (2), m is the number of detection signals and n is the number of features in each sample; a_m represents the mth detected signal sample with n data points, and a_m1 represents the first data point in the mth sample. The sample set X is obtained by centering A, as expressed in Equation (3), in which Ā represents the mean of each column of A.
• Construct the damage covariance matrix C. The correlation coefficient in guided wave detection signals can be used to characterize different defects, and a covariance matrix is an efficient tool for characterizing this correlation. The covariance matrix C is described by Equation (4).
• Calculate the eigenvalues and eigenvectors of the damage covariance matrix C. The eigenvalues and corresponding eigenvectors of C are solved by matrix decomposition. The eigenvalues are then ranked in descending order, λ1 ≥ λ2 ≥ ... ≥ λn, with the corresponding eigenvectors in the same order, x1, x2, ..., xn. Each eigenvector represents a principal component.
• Determine the number of principal components K. The information percentage of a principal component is an important reference for determining the number of principal components. It is the ratio of the corresponding eigenvalue to the sum of all eigenvalues, as shown in Equation (5). Set a threshold θ (0 < θ < 1) and accumulate the information percentages of the principal components in sorted order. The first K principal components are selected, where K is the smallest number for which the cumulative information percentage is greater than or equal to θ.
• Extract features. The K principal components extracted from the covariance matrix C are gathered to form a local weight matrix W. The feature extraction is shown in Equation (7), where X' is the extracted feature set.
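The steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the covariance is scaled by 1/(m − 1), the projection is taken as X' = XW, and the function name and toy data are hypothetical.

```python
import numpy as np


def pca_features(A, theta):
    """Return (features, W, K) for a sample matrix A (m samples x n features)."""
    A = np.asarray(A, dtype=float)
    X = A - A.mean(axis=0)                     # center each column of A
    C = X.T @ X / (X.shape[0] - 1)             # covariance matrix (assumed 1/(m-1) scaling)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # indices for descending eigenvalues
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off values
    share = np.cumsum(eigvals) / eigvals.sum() # cumulative information percentage
    # Smallest K whose cumulative percentage is greater than or equal to theta.
    K = min(int(np.searchsorted(share, theta)) + 1, eigvals.size)
    W = eigvecs[:, order[:K]]                  # weight matrix of the K leading components
    return X @ W, W, K


# Toy data: 50 samples that vary almost entirely along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
A = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(50, 3))
features, W, K = pca_features(A, theta=0.9)
print(K, features.shape)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns real eigenvalues and orthonormal eigenvectors.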

Build S-PCA
Figure 3 is a schematic diagram of the S-PCA. Feature extraction with S-PCA takes two steps. The first step is dividing: according to the time interval Tc, the detection signal S is divided into P segments of equal length, where Si represents the ith signal segment (1 ≤ i ≤ P). The second step is feature extraction, which has two sub-steps. The first sub-step extracts features from each segment by PCA with a threshold θP (0 < θP < 1) to form a feature set F. The second sub-step obtains the final features from F by PCA with a threshold θ (0 < θ < 1).
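The two-step procedure can be sketched as below. The sketch assumes a standard eigen-decomposition PCA with 1/(m − 1) covariance scaling, rows as samples, and random toy data; all names are illustrative and not from the original work.

```python
import numpy as np


def pca_reduce(A, threshold):
    """Project rows of A onto the leading principal components whose cumulative
    eigenvalue share first reaches `threshold` (0 < threshold < 1)."""
    A = np.asarray(A, dtype=float)
    X = A - A.mean(axis=0)                     # center each column
    C = X.T @ X / (X.shape[0] - 1)             # covariance (assumed 1/(m-1) scaling)
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    share = np.cumsum(eigvals) / eigvals.sum()
    K = min(int(np.searchsorted(share, threshold)) + 1, eigvals.size)
    return X @ eigvecs[:, order[:K]]


def s_pca(signals, points_per_segment, theta_p, theta):
    """Segmented PCA: per-segment PCA with theta_p, then PCA on the combined set F."""
    signals = np.asarray(signals, dtype=float)
    P = signals.shape[1] // points_per_segment  # number of whole segments
    segment_features = []
    for i in range(P):
        start = i * points_per_segment
        segment = signals[:, start:start + points_per_segment]  # sub-signal Si for all samples
        segment_features.append(pca_reduce(segment, theta_p))
    F = np.hstack(segment_features)             # feature set F
    return pca_reduce(F, theta)                 # final features


rng = np.random.default_rng(0)
toy_signals = rng.normal(size=(30, 40))         # 30 toy signals, 40 points each
final_features = s_pca(toy_signals, points_per_segment=10, theta_p=0.5, theta=0.99)
print(final_features.shape)
```

For simplicity the sketch fits and transforms the same data. In a train/test workflow the projection matrices would normally be estimated on the training set and reused on the testing set.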

Figure 3. Schematic diagram of the S-PCA. S is the original signal; F is a feature set that combines features from each segment. Si represents the ith subsection; Tc is the time interval used for dividing S; θP is a threshold used for extracting features from each segment; θ is a threshold used for extracting features from F. The grey chip represents a feature.

SVM Classification Model
SVM performs classification by searching for the optimal hyperplane, which is determined by a certain number of support vectors. In its basic form it solves linearly separable problems. For nonlinearly separable problems, a kernel function maps the sample data into a high-dimensional space in which the problem becomes linearly separable and can be solved by linear classification. The Gaussian radial basis function (RBF) has excellent nonlinearity and continuity, so it is often used as the kernel function of SVM. Equation (8) gives the expression of the RBF, in which g is the kernel size of the RBF, and X and Xi are feature vectors with the same dimension.
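As an illustration of the kernel, the sketch below evaluates an RBF between two feature vectors. The exact form of Equation (8) is not reproduced in this text, so the commonly used parameterization K(X, Xi) = exp(−g‖X − Xi‖²) is assumed here; the function name is hypothetical.

```python
import math


def rbf_kernel(x, xi, g):
    """Gaussian RBF kernel, assuming the form exp(-g * ||x - xi||^2)."""
    if len(x) != len(xi):
        raise ValueError("feature vectors must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-g * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], g=0.5)   # identical vectors
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], g=0.5)    # vectors 5 units apart
print(same, far)
```

Under this form the kernel value is 1 for identical vectors and decays toward 0 as the vectors move apart, with a larger g giving a faster decay.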

Classification Model Evaluation
Evaluating the performance of the model provides a basis for the credibility of its classification results. The precision rate and the recall rate are commonly used for this purpose. The precision rate is the proportion of samples predicted as positive that are truly positive, as shown in Equation (9). The recall rate is the proportion of all positive samples that are correctly classified as positive, as expressed in Equation (10). This article also introduces the F1-score, which balances the precision rate and the recall rate to make the model evaluation more accurate, as shown in Equation (11). In these equations, True Positive (TP) is the number of samples whose true and predicted labels are both positive. False Positive (FP) is the number of samples whose true labels are negative but whose predicted labels are positive. True Negative (TN) is the number of samples whose true and predicted labels are both negative. False Negative (FN) is the number of samples whose true labels are positive but whose predicted labels are negative.
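The three measures can be computed from the counts as in the following sketch, which uses the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 × precision × recall / (precision + recall). The function name and the example counts are illustrative, and returning 0 for an empty denominator is a convention chosen here.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0 by convention."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one defect class treated as the positive class.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)
```

For a multi-class problem such as the 17 labels used here, each class can be treated in turn as the positive class and the remaining classes as negative.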

Experimental Setup
Seven degrees of crack, five degrees of transverse crack under the shelling, and four degrees of corrosion within the railhead are considered. The detection signals of these defects are acquired by combining experiments and simulations.

The experimental system is shown in Figure 4. The experimental object is a 60 kg/m rail with a length of 250 mm. Piezoelectric ceramic sheets (PZT-5H, diameter 14 mm, thickness 1 mm, center frequency 200 kHz) are arranged symmetrically on the railhead and marked in sequence as 1, 2, 3, A, B, C. The distance between the center of each piezoelectric sheet and the rail end is 7 mm. The vertical distance between the rail tread and the center of piezoelectric sheets 1, 3, A, or C is 23.57 mm. The piezoelectric sheets at positions 1, 2, and 3 are used to excite guided wave signals, and those at positions A, B, and C are used to receive detection signals. Each exciting sensor is paired with each receiving sensor to form a signal acquisition channel, giving 9 channels: 1TA, 1TB, 1TC, 2TA, 2TB, 2TC, 3TA, 3TB, and 3TC. Figure 5 is a connection diagram of the detection equipment. The arbitrary function generator (Tektronix AFG3021B) generates a modulated 5-cycle sinusoidal excitation signal at 200 kHz, which is loaded alternately on piezoelectric sheets 1, 2, or 3. The guided wave signal then propagates in the rail, and the detection signals are collected by piezoelectric sheets A, B, or C. Both the excitation signal and the received signal are input into the oscilloscope (Tektronix DPO4054) at a specific sampling frequency.
As shown in Figure 4, the location of the crack is indicated by the red square. The crack is artificially cut in the fillet radius on one side of the railhead, 125.5 mm from the rail end. The crack depth is increased in steps of 1 mm to represent varying degrees of crack damage. In this study, 8 crack conditions with different damage levels (0 mm, 1 mm, 2 mm, 3 mm, 4 mm, 5 mm, 6 mm, and 7 mm) are considered, and the corresponding sample labels are 1, 2, 3, 4, 5, 6, 7, and 8, respectively.
Uncertain physical factors, such as the tightness of the coupling between the piezoelectric sheet and the rail surface, are inevitable in the experiment. Hence, the experiment is repeated 10 times under each excitation-reception mode for each degree of crack to reduce the effect of coupling tightness. The detection signal is sampled at 100 MHz, and 10 samples are taken for each crack degree. This yields 80 samples in each signal acquisition mode and a total of 720 samples from the 9 signal acquisition channels. Table 1 shows the types of cracks and the number of samples taken through the experiments of 1TA.
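The channel naming and the experimental sample counts described above can be checked with a few lines of code. The variable names are illustrative; the numbers are those stated in the text.

```python
from itertools import product

exciters = ["1", "2", "3"]    # sheets used for excitation
receivers = ["A", "B", "C"]   # sheets used for reception

# Every exciter is paired with every receiver, e.g. "1TA".
channels = [f"{e}T{r}" for e, r in product(exciters, receivers)]

crack_depths_mm = list(range(8))                           # 0 mm to 7 mm in 1 mm steps
crack_labels = {depth: depth + 1 for depth in crack_depths_mm}  # labels 1 to 8
repeats = 10                                               # repetitions per crack degree

samples_per_channel = len(crack_depths_mm) * repeats
total_samples = samples_per_channel * len(channels)
print(channels, samples_per_channel, total_samples)
```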

Numerical Simulation
The finite element simulation software ABAQUS is used to build a model consistent with the experimental object. Table 2 shows the material parameters of the rail model, in which ρ is the density, E is the elastic modulus, and υ is Poisson's ratio. The rail model is meshed with C3D8 elements, and the mesh size is 1.5 mm. The arrangement of the piezoelectric sheets on the railhead is consistent with the experiment, as shown in Figure 6. The total analysis time of the model is 2 ms, and the time step is 0.01 μs.
Figure 7a shows the model of rail defects, such as crack and corrosion, built in ABAQUS. The position of the black box corresponds to the location of the defect. The defect parameters are fine-tuned to make the numerical simulation more consistent with the experiment and to collect more detection signals. For cracks, the depth of the defect is adjusted in steps of 0.001 mm. Taking a 1 mm sample as an example, cracks with depths of 0.999 mm, 0.998 mm, 1.001 mm, 1.002 mm, etc., are all classed as 1 mm cracks. In this way, 10 samples are added under each excitation-reception mode for each damage category, giving a total of 80 added samples for the 8 damage categories. In practice, the depth of a crack defect is always irregular, so the defects are labeled with integers by rounding. For instance, a crack with a depth of less than 1.5 mm carries a 1 mm tag, and a crack with a depth of 1.5 mm or more carries a 2 mm tag. If the difference between the marked value and the defect depth is within 0.2 mm, the defect depth is represented by the marked value. For example, crack depths of 1.2 mm, 1.1 mm, 0.9 mm, and 0.8 mm are all classified as 1 mm cracks.
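The rounding rule for crack depths can be sketched as follows. The sketch assumes round-half-up to the nearest millimetre, which matches the stated rule that 1.5 mm or more carries a 2 mm tag, and it maps the tag to the sample label using the correspondence given for the experiments (0 mm is label 1, 7 mm is label 8). The function names are illustrative.

```python
import math


def depth_tag_mm(depth_mm):
    """Integer depth tag in mm, rounding halves upward (1.5 mm becomes 2 mm)."""
    return math.floor(depth_mm + 0.5)


def crack_label(depth_mm):
    """Sample label for a crack depth: the 0 mm tag maps to label 1, 7 mm to label 8."""
    return depth_tag_mm(depth_mm) + 1


for depth in (0.8, 0.999, 1.2, 1.49, 1.5):
    print(depth, depth_tag_mm(depth), crack_label(depth))
```

`math.floor(x + 0.5)` is used instead of the built-in `round`, because `round` sends exact halves to the nearest even integer (for example, `round(2.5)` gives 2), which would not match the rule described above.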
In this study, this type of sample expansion is applied only to crack depths of 1 mm and 7 mm, and the total number of expanded samples is 58, as shown in Table 3.
For corrosion, four defect models with different damage levels, 6 mm², 12 mm², 25 mm², and 30 mm², are constructed in ABAQUS, and the corresponding samples are labeled 9, 10, 11, and 12, respectively. For each type of corrosion defect, the corrosion size is modified in steps of 0.004 mm² to produce more samples. In total, 80 samples are obtained per excitation-reception channel.
The transverse crack under the shelling is also an area defect. A rectangle is used to approximate the shape of the internal transverse crack. Figure 7b shows a rail model with a 6 mm × 7 mm transverse crack under the shelling built in ABAQUS, where 6 mm is the width of the defect and 7 mm is its length. The defect is located at a position of 125.5 mm along the rail length and at a depth of 12.5 mm from the rail tread. Five models of transverse cracks under the shelling with different damage levels, 2 mm × 3 mm, 3 mm × 4 mm, 4 mm × 5 mm, 5 mm × 6 mm, and 6 mm × 7 mm, are constructed through numerical simulation, and the corresponding sample labels are 13, 14, 15, 16, and 17. Under each excitation-reception channel, 21 samples are collected for each size of transverse crack under the shelling by adjusting the length or width in steps of 0.001 mm.
Table 4 shows the total number of samples obtained by experiments and simulations under channel 1TA and the sample labels corresponding to the defects. All samples obtained by 1TA are then assembled into a dataset of samples and sample labels. The dataset is divided into two parts: 80% of the dataset is randomly taken as the training set, and the remaining 20% as the testing set. The classification model is trained on the training set and tested on the testing set.
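A random 80%/20% split of this kind can be sketched with the standard library. The function name, the fixed seed, and the toy data are assumptions made for illustration; the original work does not state how the random split was implemented.

```python
import random


def split_dataset(samples, labels, train_fraction=0.8, seed=0):
    """Randomly split paired samples and labels into training and testing sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    n_train = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


toy_samples = [[float(i)] for i in range(100)]   # hypothetical one-feature samples
toy_labels = [i % 17 + 1 for i in range(100)]    # hypothetical labels 1 to 17
(train_x, train_y), (test_x, test_y) = split_dataset(toy_samples, toy_labels)
print(len(train_x), len(test_x))
```

A purely random split does not guarantee that every label appears in both sets; a stratified split would be one way to enforce that if needed.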
The data obtained from the remaining 8 signal acquisition channels are processed in the same way as 1TA for subsequent rail defect identification.

Table 4. Samples and labels under channel 1TA (only the rows for labels 12 to 17 are recoverable here).
Defect type | Label | Size | Number of samples
Corrosion | 12 | 30 mm² | 20
Internal nuclear defect | 13 | 2 mm × 3 mm | 21
Internal nuclear defect | 14 | 3 mm × 4 mm | 21
Internal nuclear defect | 15 | 4 mm × 5 mm | 21
Internal nuclear defect | 16 | 5 mm × 6 mm | 21
Internal nuclear defect | 17 | 6 mm × 7 mm | 21

Figure 8 shows the waveforms of the crack detection signals obtained over the time range 0-0.002 s by 2TA in the experiment, for crack depths of 2 mm, 4 mm, 6 mm, and 7 mm. Based on guided wave theory, the guided wave group velocity is about 3845 m/s in the experiment, and the velocity in the simulation is 3850 m/s, close to the experimental result. In addition, the experimental signal waveforms are similar to the simulated signal waveforms under different degrees of defect damage. Thus, the simulated signals are used to expand the defect samples.
Figures 8-10 show the signal waveforms of crack defects, transverse cracks under the shelling, and corrosion defects at different damage levels, respectively. The waveforms of detection signals caused by defects of different damage degrees at the same position are generally similar. According to elastic wave theory, factors such as defects cause the corresponding wave packets to change in amplitude and to overlap. However, it is difficult for existing methods to distinguish specific defects from the time-domain waveform. In this paper, the amplitude at each time point of the detection signal is regarded as a feature value of the sample. Statistical analysis and machine learning algorithms are used to find the relationship between the defect sample data and the corresponding defect types and to eliminate redundant features.

Feature Extraction
Feature extraction is a significant part of structural health monitoring (SHM). Redundant features are removed from the detection signal through feature extraction, and several important features are retained to achieve accurate and rapid identification of defects.
Tc is an important parameter for feature extraction by S-PCA. The selection of Tc depends on the parameter L and the group velocity VP of the guided wave signal. In this study, the guided wave group velocity VP is 3850 m/s. The values of L considered are shown in Table 5, in which KP is the number of data points in each segment. The influence of different Tc on defect identification is discussed in Section 6.
PCA is a means of globally reducing data dimensions. Selecting an adequate number of principal components is critical when the PCA algorithm filters features. When extracting features from detection signals with S-PCA, θP is the threshold for extracting principal components from the sub-signal segments using PCA, and θ is the threshold for extracting the principal components from the feature set F. The influences of these two parameters on rail defect identification are each discussed in Section 6.

Model Parameter Adjustment
The number of support vectors in the SVM classification model is an essential factor influencing the model performance [28]. For a nonlinear SVM, the number of support vectors is changed by adjusting the penalty coefficient C and the kernel radius g. For each dataset there are optimal values of C and g that achieve accurate classification of the samples. A grid search is used to find suitable values of C and g automatically within the ranges [−3, 20] and [−25, 0], respectively, for each of the nine datasets.
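A grid search over C and g can be sketched as follows. The text does not say how the ranges are to be interpreted, so the sketch assumes that [−3, 20] and [−25, 0] are base-2 exponents with a step of 1, giving C = 2^i and g = 2^j. The `toy_score` function is a hypothetical stand-in for the classification accuracy that would be obtained by training and validating an SVM with the given C and g.

```python
import math
from itertools import product


def grid_search(score, c_exponents, g_exponents):
    """Return (best_score, best_C, best_g) over a grid of base-2 exponents.

    `score` is any callable taking (C, g) and returning a value to maximize.
    """
    best = None
    for i, j in product(c_exponents, g_exponents):
        c, g = 2.0 ** i, 2.0 ** j        # assumed interpretation of the ranges
        value = score(c, g)
        if best is None or value > best[0]:
            best = (value, c, g)
    return best


def toy_score(c, g):
    """Hypothetical score surface peaking at C = 2**5 and g = 2**-10."""
    return -((math.log2(c) - 5) ** 2 + (math.log2(g) + 10) ** 2)


best_score, best_c, best_g = grid_search(toy_score, range(-3, 21), range(-25, 1))
print(best_c, best_g)
```

In practice `score` would wrap the training and validation of the classifier, typically with cross-validation on the training set so that the testing set is not used for parameter selection.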

Influence of Tc on the Defect Recognition
The guided wave detection signals are divided into P sub-signal segments with a time interval Tc. To select a suitable Tc, both θP and θ are set to 99%. Table 6 shows how the accuracy of the SVM varies when Tc is equal to 0.5 μs, 2.5 μs, 5 μs, 7.5 μs, 10 μs, and 12.5 μs. In Table 6, the red number represents the maximum classification accuracy attained by the classifier for each dataset under the selected parameters. When Tc is 5 μs, seven datasets, namely 1TA, 1TB, 1TC, 2TA, 2TB, 3TA, and 3TB, reach their maximum classification accuracy. When Tc is 7.5 μs, both 2TC and 3TC reach their maximum classification accuracy. However, when Tc is 5 μs, the classification results of 2TC and 3TC differ little from their optimal results and remain at a relatively high level. Thus, most of the nine datasets achieve their best classification when Tc is 5 μs, so 5 μs is selected as the optimal value of Tc for the subsequent analysis.

Influence of θP on the Rail Defect Identification
θP controls the number of principal components extracted from each sub-signal. Different values of θP extract different numbers of principal components from the sub-signal segments and thus affect the results of the classification model. Initially, Tc is set to 5 μs and θ to 99%. Table 7 shows the variation of the classification results in the nine datasets as θP takes the values 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 99% in turn.
As shown in Table 7, when θP equals 50%, each of 1TB, 1TC, 2TA, 2TB, 2TC, 3TA, and 3TC reaches its maximum classification accuracy. For 1TB, the classification accuracy at θP equal to 60% is the same as the optimal result. However, 5 features are used to classify defects when θP equals 50%, whereas 7 are used when θP equals 60%. Thus 50% is taken for 1TB to reduce the computational cost. When θP equals 60%, 1TA and 3TB reach their best classification results of 91.95% and 94.25%, respectively. 1TC and 2TC already attain their maximum classification accuracy when θP equals 10%, and the accuracy remains at this optimal value when θP is 20%, 30%, 40%, and 50%. However, when θP equals 10%, 20%, 30%, or 40%, the 13th category cannot be identified in 3TB, as shown in Table 8, because the value of θP is too low to extract enough principal components. When θP equals 50%, all categories can be recognized, and when θP equals 60%, the classification accuracy is 94.25%, which is the best result for 3TB. According to this analysis, most of the nine datasets maintain a relatively high classification accuracy when θP equals 50%. Thus 50% is selected as the optimal value of θP for the subsequent research.

Influence of θ on the Rail Defect Identification
θ is an important parameter for extracting the final features from F. Table 9 shows how the classification results of the nine datasets vary as θ takes the values 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 99%, with Tc fixed at 5 µs and θ P at 50%. According to Table 9, the classification accuracy rates of the nine datasets show an increasing trend as θ increases. Therefore, 99% is chosen as the fixed value of θ for the follow-up research. In summary, 5 µs, 50%, and 99% are selected as the fixed values of Tc, θ P, and θ, respectively.
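
The one-parameter-at-a-time selection used for Tc, θ P, and θ can be sketched as a sequential sweep. The code below is a simplified illustration under stated assumptions: `evaluate` is a hypothetical callable returning a single score (for example, a mean accuracy), and the best value is taken as the maximum of that score, whereas the selection in the text is based on how many of the nine datasets reach their optimum. The function `toy_score` is purely illustrative and is not related to any measured accuracy.

```python
def sequential_sweep(evaluate, grids, start):
    """Tune one parameter at a time while holding the others fixed.

    evaluate: callable taking the parameters as keywords and returning a score.
    grids:    dict mapping parameter name -> candidate values, in tuning order.
    start:    dict with an initial value for every parameter.
    """
    params = dict(start)
    for name, candidates in grids.items():
        best_value, best_score = None, float("-inf")
        for value in candidates:
            score = evaluate(**{**params, name: value})
            if score > best_score:  # strict '>' keeps the earliest candidate on ties
                best_value, best_score = value, score
        params[name] = best_value   # fix this parameter before tuning the next
    return params

def toy_score(tc_us, theta_p, theta):
    """Illustrative stand-in only; not derived from experimental data."""
    return -abs(tc_us - 5) - abs(theta_p - 0.5) + theta

levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
grids = {"tc_us": [0.5, 2.5, 5, 7.5, 10, 12.5], "theta_p": levels, "theta": levels}
start = {"tc_us": 5, "theta_p": 0.99, "theta": 0.99}
best = sequential_sweep(toy_score, grids, start)
print(best)
```

Because each parameter is tuned with the others held fixed, the result may depend on the tuning order and the starting values, which is why the initial settings are stated explicitly at each step.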

Classification Result of Single-Channel Acquisition Signal
The signals collected by the different signal acquisition channels are turned into corresponding datasets. Once each dataset is normalized, the S-PCA model with the selected parameters is used to extract features. The dataset is split into a training set and a testing set. The reported classification accuracy is the average of 10 classification results, and S denotes the standard deviation of these 10 results, which is used to evaluate the stability of the classification. The results of the nine datasets are presented in Table 10.  Table 10 shows that, for all nine signal acquisition methods, the accuracy rates of the SVM classification model exceed 76%, which is a moderate score. In 1TA, 1TB, 2TA, 2TB, 2TC, and 3TB, the SVM classification accuracy rates are close to or above 90%, which is a good score. Both 2TA and 2TB are classified very well because the sensor at position 2 can be attached smoothly to the track surface, making the defect information in the collected detection signal more obvious. For 3TB, the sensor is relatively close to the defect, which makes the differences between the defect signals more prominent and is conducive to identifying the defect. Both 1TA and 1TB have higher classification accuracy because the signal reflected by the defect can be received by the A and B sensors, which is conducive to recognizing the defect. In addition, the S-PCA is an effective feature extraction method that captures the local differences in the detection signal, which helps the classifier to make the correct distinction. By contrast, for 1TC and 3TA the distance between the exciting sensor and the receiving sensor is relatively long, so their classification results, 76.15% and 76.53%, respectively, are lower than those of the signals collected by the other acquisition methods.
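
The evaluation procedure (repeated splitting, then the mean and standard deviation S of 10 accuracies) can be sketched as below. Several details are assumptions not given in this section: the test fraction of 0.3, the random seed, the use of the sample standard deviation, and the function names. The classifier is passed in as a callable; a nearest-centroid rule is used here only as a lightweight stand-in, and an SVM (for example, a wrapper around scikit-learn's `SVC`) would take its place in practice.

```python
import numpy as np

def repeated_holdout(X, y, fit_predict, n_runs=10, test_fraction=0.3, seed=0):
    """Mean and sample standard deviation of accuracy over random splits."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(y))))
    accuracies = []
    for _ in range(n_runs):
        order = rng.permutation(len(y))
        test_idx, train_idx = order[:n_test], order[n_test:]
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(float(np.mean(y_pred == y[test_idx])))
    return float(np.mean(accuracies)), float(np.std(accuracies, ddof=1))

def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier (not an SVM): assign the label of the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(10.0, 0.1, (20, 3))])
y = np.repeat([0, 1], 20)
mean_acc, s = repeated_holdout(X, y, nearest_centroid)
print(mean_acc, s)
```

A small S indicates that the accuracy changes little between splits, which is the sense in which S measures the stability of the classification.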

Classification Result of Multi-Channel Signal Combination
The above research shows that a signal collected by a single channel can identify different types and degrees of defect damage on the railhead. Based on theoretical analysis, however, a single-channel acquisition signal can only reflect part of the characteristics of the defect. Thus, the classification accuracy rates of 1TC and 3TA are lower. To explore the influence of multi-channel signals on rail defect recognition, two types of combination are analyzed: single-point excitation with multi-point reception, and multi-point excitation with single-point reception. The types of combinations and the corresponding classification results are given in Tables 11 and 12.  Table 11 shows the classification results for the combinations of single-point excitation and multi-point reception signals. As the number of combined signals increases, the classification accuracy rate shows an increasing trend, as seen for 1TAC, 1TBC, and 1TABC; 2TAB, 2TAC, 2TBC, and 2TABC; and 3TAB, 3TAC, 3TBC, and 3TABC. Moreover, compared with the classification of signals collected by a single channel, the standard deviation S of the classification is slightly reduced after multiple signals are combined. These results show that combining signals can effectively enhance the accuracy of the classification. Since the propagation path of each signal is different, an effective combination reflects the health status of the detected object more widely. The same trend can also be found in Table 12.
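
One simple way to form a combined dataset such as 1TAC from single-channel data is to concatenate the per-channel feature vectors of each sample. The sketch below assumes this column-wise concatenation; this section does not specify how the signals are combined, so the approach, the function name, and the placeholder arrays are all hypothetical.

```python
import numpy as np

def combine_channels(channel_features, names):
    """Concatenate per-channel feature matrices column-wise.

    channel_features: dict mapping a channel name (e.g. "1TA") to an array of
                      shape (n_samples, n_features_for_that_channel).
    names:            channels to combine, in order.
    All channels must describe the same samples in the same row order.
    """
    mats = [np.asarray(channel_features[n], dtype=float) for n in names]
    if len({m.shape[0] for m in mats}) != 1:
        raise ValueError("all channels must contain the same number of samples")
    return np.hstack(mats)

# Placeholder feature matrices for six samples (values are not real data).
features = {
    "1TA": np.zeros((6, 4)),
    "1TB": np.ones((6, 3)),
    "1TC": np.full((6, 5), 2.0),
}
combined_1tac = combine_channels(features, ["1TA", "1TC"])
print(combined_1tac.shape)  # (6, 9)
```

With this layout each row still corresponds to one defect sample, so the combined matrix can be passed to the same classifier as a single-channel dataset.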

Conclusions
To identify multiple defects in rail, experiments and numerical simulations are used to study the ultrasonic guided wave detection signals of crack defects, transverse cracks under shelling, and corrosion defects. By changing and combining the sensor positions, a sample library of defect detection signals is obtained with nine different signal excitation-reception methods. The S-PCA algorithm is proposed to extract the features of the signal and to eliminate the dependency on professional knowledge. The extracted features are then input into the SVM classifier to identify the type and extent of defects. The following conclusions are drawn:

• Extracting features from the segments of the detection signal by PCA can effectively eliminate redundant information in the signal while retaining adequate information, which improves the accuracy of quantitative and qualitative identification of rail defects.

• The detection signals collected from different excitation-reception positions describe the health of different parts of the rail. Combining detection signals obtained through single-point excitation and multi-point reception, or multi-point excitation and single-point reception, describes the health status of the detected object more comprehensively, which improves the accuracy of defect recognition.

• The S-PCA algorithm is an efficient method of extracting features based on statistical theory. It does not rely heavily on professional knowledge of guided wave detection, which reduces the difficulty of rail defect identification. Furthermore, the method could be more easily implemented in practical engineering in the future.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.