Systematic Literature Review on Data-Driven Models for Predictive Maintenance of Railway Track: Implications in Geotechnical Engineering

Abstract: Conventional planning of maintenance and renewal work for railway track is based on heuristics and simple scheduling. The railway industry is now collecting a large amount of data with the fast-paced development of sensor technologies. These data sets carry information about the conditions of various components in railway track. Since just before the beginning of the 21st century, data-driven models have been used in the predictive maintenance of railway track. This study presents a systematic literature review of data-driven models applied in the predictive maintenance of railway track. A taxonomy to classify the existing literature based on types of models and types of applications is provided. It is found that deep learning, unsupervised, and ensemble methods are the new trends in predictive maintenance of railway track. Rail geometry irregularity, rail head defects, and missing rail component detection were the three most commonly considered issues in the application of data-driven models. Prediction of rail breaks has received increasing attention in the last four years. Among these data-driven model applications, the collected data types are the most critical factors affecting the selection of suitable models. Finally, this study discusses upcoming challenges in the predictive maintenance of railway track.


Introduction
Railway track is one of the most critical parts of a railway system. Track-caused accidents have consistently constituted 30-40% of total accidents over the past decade in America [1]. With high traffic levels, huge axle loads, and varying environmental conditions, even small flaws in railway track may develop into severe damage [2,3]. Therefore, to avoid disruption to the rail network, railway tracks need to be maintained regularly and monitored for unusual degradation.
Globally, the railway industry spends a large amount of money on maintenance and renewal projects. The annual maintenance expenditure of British railway infrastructure was more than £1 billion in 2015; almost two-thirds of the Network Rail organization's employees were engaged in maintenance work [4]. In the United States of America, over half of the railway maintenance costs are related to track [5].
To reduce the maintenance costs, it is crucial to find suitable maintenance strategies. In the literature, maintenance strategies include corrective, preventive, condition-based, and predictive maintenance [6][7][8][9].

• Corrective maintenance happens only when the track needs to be repaired. It is performed after a fault has occurred, resulting in the need for a backup transport service to be organized and dispatched as soon as possible. These factors lead to remarkably high costs and fees from the incurred service interruptions.
• Preventive maintenance happens periodically with a planned schedule before track failures, which reduces the useful life of track components due to early replacement. Unnecessary maintenance actions may be taken, leading to additional cost.
• Condition-based maintenance aims to optimize maintenance strategies based on the estimation of the track status. Recent advancements in smart sensors enable railway engineers to estimate the real-time track conditions. Components are then repaired or replaced only when conditions exceed some thresholds.
• Predictive maintenance is a predictive framework to estimate the time when a fault is likely to occur and to adopt maintenance interventions accordingly. It is a proactive process that requires the development of a predictive model. The maintenance can be carried out whenever it is convenient for the railway asset managers before the predicted failure time.
Predictive maintenance strategy is the most desirable because it reduces track failure rates and minimizes maintenance costs by extending the life of track components and allowing operators to plan maintenance operations ahead of time [9].
Optimum maintenance strategy in the railway industry depends on information collected from existing monitoring facilities. Railway track monitoring methods include walking patrol [10], mechanized track patrol [11], and wayside detectors [12]. In a walking patrol, patrollers look for signs of track defects, particularly where immediate action is required. The mechanized track patrol includes specific inspection cars and in-service vehicles. Wayside detectors are fixed sensors along the track. These inspection methods aim to detect obstructions, broken rails, track geometry defects, signs of earthworks or drainage failure, and security of track signage.
An abundance of data is now available in the railway industry. The characteristics of these data include large volume, multiple sources, a strong imbalance towards normal behavior, and high noise. These are described in Table 1.

Table 1. Characteristics of railway measurement data.

Characteristic: Large-volume
Description: Collecting huge amounts of data from two aspects: time domain (real-time, nearly real-time or streamlines) and space domain (thousands of miles).
Example: More than 60 million records of track geometry measurement data were collected in a U.S. Class I rail network for 100 miles from 2012 to 2017 [11].

Characteristic: Multi-source
Description: Multi-source refers to the various measurement methods from which data can be generated in multiple types.
Example: Predict the remaining useful life of a railcar by fusing data from wheel impact load detectors, machine vision systems, and optical geometry detectors [12].

Characteristic: Highly-imbalanced
Description: The rail defects are highly skewed in the collected data sets. The majority of observations belong to the normal states, while only a small portion are related to defects.
Example: Pre-process the imbalanced rail image data by sampling techniques and semi-supervised techniques before rail defect detection [13].

Characteristic: High noise
Description: Noise comes from two aspects during the data collection: the inherent environmental uncertainty along the track (soil type, climate condition, track profile, and materials) and the precision of sensors.
Example: Use of derivatives and smoothing can reduce the noise present in the raw measurements and thus improve the quality of the signal [12].

Geosciences 2020, 10, 425
The models used to evaluate the track conditions for diagnostic and prognostic purposes can be grouped into mechanical models and data-driven models [7]. Mechanical models are based on mechanistic knowledge of component behavior and rely on simplifying assumptions of track components. Data-driven approaches, which do not have such dependencies, have been increasingly applied in predictive maintenance of railway track. Analyzing the railway measurement data sets with data-driven approaches has recently been an area of focus, both within academia and industry.
Data-driven methods discover viable feature sets and decision criteria from observed data. These methods include statistical models and machine learning models [14]. The primary difference between these two types lies in the main goal of the analysis. Statistical models make inferences about the relationships between variables, whilst machine learning models focus on making the most accurate predictions possible. Both types can handle high dimensional and multivariate data, and extract hidden relationships between the track status and measurement data. Overall, data-driven methods help railway engineers to understand the status of the railway track better and make corresponding maintenance decisions. However, the performance of the data-driven methods depends on the appropriate choice of data pre-processing and analysis models.
There are several reviews in the literature on the application and challenges in predictive maintenance of railway track. However, most of these studies focus on a specific aspect of railway track. For example, in Soleimanmeigouni [15], a survey of track geometry degradation and maintenance models was conducted. A survey of track degradation prediction models based on mechanical models, statistical models and artificial intelligence models was provided in Reference [14]. Sol-Sánchez [16] conducted a literature review focusing on the effectiveness of the major conventional techniques and materials for track design and maintenance, as well as innovative solutions being developed to reduce track degradation. Other survey articles on the application of data analytics in a specific aspect of railway track can be found in the literature [17][18][19][20]. To the best of the authors' knowledge, the literature in this field lacks a holistic survey covering data-driven solutions across railway track defect detection, prediction, and maintenance decision-making.
This study provides a taxonomy to classify the existing literature based on types of models and types of applications. The emphasis is on the selection of appropriate feature extraction methods and data-driven models for different data sets, track defects, and maintenance strategies. This provides a thorough overview of which approaches are being developed in this field and the performance of the current state-of-the-art techniques.
Our fundamental motivation is to answer the following research questions:
1. What are the measurement methods and data sets used in railway track engineering?
2. How are data-driven models employed in the predictive maintenance of railway track?
3. How should one choose suitable methods for different data types, track defects, and maintenance strategies?
The remaining part of the paper is organized as follows: Section 2 develops a statistical analysis aiming to provide a complete picture of current research interests and publication trends for predictive maintenance of railway track. Section 3 covers the track measurement methods and various classical and advanced data-driven algorithms applied in this field. Then, Section 4 specifically studies data-driven models applied with different measurement data types, track defects, and maintenance strategies. Section 5 provides future challenges and suggestions for the development of predictive maintenance methods for railway track. Finally, Section 6 concludes this paper.

Systematic Literature Review
Systematic literature review is commonly used to summarize and interpret the relevant parts of research [21,22]. From a methodological point of view, a systematic literature review is a secondary study. Related publications are labeled in order to critically evaluate the available approaches and to provide a basis for statistical analysis. The implementation of this systematic literature review was based on the methodology proposed in Reference [21].

Material Collection
The mainstream literature databases were searched for this analysis, including ScienceDirect, Scopus, and IEEE Xplore. Due to a lack of standard terminology in this field, several keywords were used in the search to ensure all relevant papers were captured. The keywords used in this paper were the combination of the following words ("predictive maintenance" OR "condition-based maintenance") AND ("railway" OR "rail" OR "track") AND ("data-driven" OR "big data" OR "artificial intelligence" OR "machine learning" OR "statistical"). Note that this survey only considered English papers in journals, conferences, and dissertations. Articles with purely mathematical or physical methods were not included. With the stated search parameters, a total of 218 publications were identified from 1999 to 2019 in this review.
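The keyword combination described above can be expressed as a small boolean filter. The following is a minimal sketch rather than the authors' actual search procedure: the function names `build_query` and `matches_query` and the sample titles in the test are hypothetical, and real database search engines apply their own tokenization and field rules.

```python
# Keyword groups taken from the search string quoted above.
KEYWORD_GROUPS = [
    ["predictive maintenance", "condition-based maintenance"],
    ["railway", "rail", "track"],
    ["data-driven", "big data", "artificial intelligence",
     "machine learning", "statistical"],
]


def build_query(groups):
    """Join the groups as (a OR b) AND (c OR d) ... in generic boolean syntax."""
    clauses = []
    for group in groups:
        clauses.append("(" + " OR ".join(f'"{term}"' for term in group) + ")")
    return " AND ".join(clauses)


def matches_query(text, groups=KEYWORD_GROUPS):
    """Return True if every group has at least one term occurring in text.

    Uses simple case-insensitive substring matching, which is an
    assumption made for illustration only.
    """
    lowered = text.lower()
    return all(any(term in lowered for term in group) for group in groups)
```

A record is retained only if it satisfies all three groups at once, which mirrors the AND between the parenthesized OR lists.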

Literature Statistical Analysis
This study provides overall statistics about the current state of data-driven models applied in predictive maintenance of railway track. Figure 1 shows the number of articles published between 1999 and 2019 in this field with a quadratic trend line. Research on data-driven models for predictive maintenance of railway track started just before the beginning of the 21st century. The publication rate increased slightly with the increasingly widespread use of computers and new measurement technologies. Before 2013, fewer than ten papers were published each year. After 2013, there has been considerable growth in the field's publication rate. Specifically, the average number of papers increased from 3.1 per year during 1999-2012 to 25 per year in 2013-2019. In the last three years in particular, the publication rate has jumped to an average of approximately 40 per year.
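As a quick consistency check, the two reported averages can be multiplied by the number of years in each period and compared with the 218 publications identified. This is a back-of-the-envelope sketch; the variable names are illustrative, and the small mismatch is assumed to come from rounding of the reported averages.

```python
# Reported averages (papers per year) and the inclusive year ranges they cover.
early_years = 2012 - 1999 + 1   # 14 years, 1999-2012
late_years = 2019 - 2013 + 1    # 7 years, 2013-2019

early_total = 3.1 * early_years  # about 43 papers
late_total = 25 * late_years     # 175 papers

estimated_total = early_total + late_total
print(round(estimated_total, 1))  # close to the 218 papers identified
```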
To give readers an intuitive understanding of the topics from the selected publications, a novel text analysis method [23] was used. The titles and abstracts of all 218 papers were extracted to pick up the word frequency lists and draw the word frequency distribution plots. The larger the word, the more frequently it appears in the collected publications. Figure 2 shows the 65 most frequently used words among the titles and abstracts.
It is worth mentioning that, other than the keywords used in searching for these publications, the most frequent words "geometry", "inspection", "defects", "prediction", and "degradation" indicate the most popular topics among these studies. The details of these topics are covered in the following sections.
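The word-frequency step behind such a plot can be sketched in a few lines. This is only an illustration of counting words over titles and abstracts, not the text analysis method of Reference [23]; the stop-word list and the sample titles are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not the one used in [23]).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def word_frequencies(documents, top_n=65):
    """Count lower-cased words across all documents, ignoring stop words."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical titles standing in for the 218 collected titles and abstracts.
sample_titles = [
    "Prediction of track geometry degradation",
    "Track geometry inspection with machine learning",
    "Detection of rail surface defects",
]
top_words = word_frequencies(sample_titles, top_n=3)
```

In a word cloud, each count would then be mapped to a font size, so that the most frequent words are drawn largest.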

Results of the Systematic Literature Review
This section summarizes the methods and models used in the identified publications to answer the fundamental research questions 1 and 2. Firstly, data acquisition and track measurement methods in railway track engineering are discussed. Then, an overview of the publication distribution among the data-driven methods is presented. Finally, various classical and advanced data-driven algorithms applied in predictive maintenance of railway track are discussed.

Data Acquisition in Railway Track Engineering
Data acquisition is the first step in the application of data-driven models. The railway industry uses a set of measurement methods to collect relevant data. These measurements are carried out at different frequencies based on analysis and experience, targeting specific aspects of the track conditions. The commonly used measurement methods for railway track are summarized in Table 2.
As shown in Table 2, railway measurement methods include walking patrols, mechanized track patrols, and wayside detectors. Walking patrols are used in areas where rail vehicles run at low speed or are not allowed to operate. In this method, patrollers look for signs of track defects, particularly where immediate action is required [24].
The mechanized track patrols include specific inspection cars and in-service vehicles. To avoid dangerous and inefficient walking inspections along the railway track, camera-based measurement has been widely adopted by the railway industry. This method is useful for measuring surface defects and identifying missing track components. However, visual inspection cannot find defects within rails. Internal microscopic defects can be identified with ultrasonic testing [25]. Ultrasonic inspection cars can be used to detect rail breakages and internal cracks [26]. In this method, the ultrasonic energy beam generated by the piezoelectric element is first transmitted into the rail, and the reflected or scattered energy of the transmitted beam is then detected by a sensor. The amplitude and time information of the received signal is used to identify defects. The main advantages of this approach are the possibility of extremely high testing speed and the inherent sensitivity to the critical transverse-type defects in rail [27]. However, ultrasonic inspection is not sufficient to detect cracks at an early stage, when they are too small to penetrate deep enough into the material for ultrasonic detection. Eddy current measurements and magnetic flux leakage are more suitable for detecting such early defects in rails [25].
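The role of time information in ultrasonic testing can be illustrated with the standard pulse-echo relation, in which the reflector depth is half the round-trip distance travelled by the beam. This is a minimal sketch, assuming a normal-incidence longitudinal wave and a nominal sound velocity in steel of about 5900 m/s; the function name and the example echo time are hypothetical and are not taken from the reviewed studies.

```python
STEEL_LONGITUDINAL_VELOCITY = 5900.0  # m/s, nominal value assumed for rail steel


def reflector_depth_mm(echo_time_us, velocity=STEEL_LONGITUDINAL_VELOCITY):
    """Depth of a reflector from the round-trip echo time.

    depth = velocity * time / 2, because the pulse travels to the
    reflector and back before it is received.
    """
    echo_time_s = echo_time_us * 1e-6
    depth_m = velocity * echo_time_s / 2.0
    return depth_m * 1000.0


# Hypothetical echo received 20 microseconds after the pulse was sent.
depth = reflector_depth_mm(20.0)
```

An echo arriving earlier than the one from the rail foot would then point to a reflector, such as an internal crack, at the computed depth.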
Measurements by specific inspection cars are usually carried out at a low frequency and rarely on busy routes. This means that these data sets are usually neither continuous nor sampled at a small enough time interval between readings [28]. Rather than using dedicated inspection cars, the measurement platform can be mounted in standard trains with daily service [29]. The continuous measurement from every train provides a stream of data that can be used in the data-driven analysis. These measurements can be used to monitor the dynamic train response (e.g., vertical axle acceleration, spring nest displacement, bogie bounce), track geometry data (e.g., track twist, vertical rail profile, track curvature), and train driving parameters (e.g., brake cylinder pressure, in-train forces). These monitoring results can provide regular and rapid insight into track behavior over time [28].
Wayside detectors are fixed sensors along the track that collect information of interest. These wayside detectors employ a variety of sensing technologies to measure force, heat, sound, and geometry, among other values [12]. For example, the versatility of optical fiber Bragg grating sensors is utilized in monitoring high voltage overhead lines, rail corrugations, and wheel-rail interactions. The advantages of fiber sensors are immunity to electromagnetic interference, multiplexing capabilities, long reach, lightweight, and high signal fidelity [30]. Wayside detectors can also be used in environmental and meteorological monitoring, such as temperature sensors and weather recorders. These sensors monitor the environment in which the tracks are placed.

Publication Distribution among the Data-Driven Methods
Before presenting the publication distribution, it is important to introduce some basic concepts about the data-driven models. These models include statistical and machine learning models. For statistical methods, the purpose is to estimate a small number of parameters from a large collection of samples. The basic assumption is that the data fit a specific hypothesis, like the Weibull distribution [31]. This is in contrast to machine learning, where a usually large set of model parameters is estimated from huge amounts of samples. The machine learning method can extract more information from the data without a priori knowledge [32]. The statistical models are better suited to inference about the relationships between parameters, while the goal of machine learning is in making the most accurate predictions, whether regression or classification. In this way, machine learning methods can contribute to railway maintenance decision-making in a more direct and robust way.
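To make the "small number of parameters" point concrete, the two-parameter Weibull model describes the survival probability of a component with just a shape and a scale parameter. The following is a minimal sketch with hypothetical parameter values; they are not fitted to any data set from the reviewed papers.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t: exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate: (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: shape > 1 means the failure rate grows with age.
shape, scale = 2.0, 10.0
survival_at_scale = weibull_reliability(10.0, shape, scale)  # equals exp(-1)
```

Once the two parameters have been estimated from failure records, every quantity of interest follows from them, which is what makes such models easy to interpret but dependent on the distributional assumption.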
A summary of the distribution of publications where data-driven models were applied in the railway track predictive maintenance field is provided in Figure 3. Note that the statistics here only include publications that give detailed descriptions of the input data, model construction, and output results. In this way, a total of 109 publications were identified.
Figure 3 reveals a preference for the application of data-driven methods where each color represents a kind of data-driven method. It is worth mentioning that most papers use machine learning (74%) instead of statistical models (26%). The classical machine learning models dominate in machine learning. The most employed classical machine learning algorithm was the support vector machine (SVM) (33%), followed by artificial neural networks (ANN) (26%) and tree-based models (21%). The advanced machine learning approaches of deep learning models, unsupervised learning models, and ensemble models account for a combined 22% of applications.

Publication Trend Analysis
To date, researchers have shown great interest in classical machine learning models, especially during the last three years. The number of statistical model papers has also slowly increased, although its proportion of all publications has slightly decreased. The advanced machine learning models (unsupervised, ensemble, and deep learning models) have started to be used in the last five years. This may have occurred due to the ability of these advanced models to exploit and cope with the characteristics of modern railway measurement data: large volume, multiple sources, high imbalance, and high noise (see Table 1 for details). These models will be discussed in the following sections.

Classical Data-Driven Models in Railway Predictive Maintenance
Classical data-driven models consist of statistical models and classical machine learning models, as shown in Figure 3. Table 3 discusses the advantages and disadvantages of classical data-driven models and how they are employed in the predictive maintenance of railway track. More details about these models can be found in References [14,18,33].

Deep Learning Models
A new trend in machine learning is the use of neural networks with ever greater numbers of layers, known as deep learning algorithms. These methods rarely require pre-processing of data, as they can learn the representation directly. They have been applied in many complicated applications, such as image, audio, video, and natural language processing, sentiment analysis, and landslide prediction [43]. Deep learning methods have also been shown to be advantageous in supporting decision-making for railway track engineering. Typical deep learning models applied in this field include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) models, which are described next.
Convolutional neural network (CNN): CNNs have revolutionized machine learning applications in the computer vision field, including track defect detection. CNN models can reach human-level ability (often used as a proxy for the Bayes error rate) in image recognition tasks [44]. A CNN is a specific type of deep neural network with special convolutional layers based on the convolution operation [44,45]. The convolution operation is able to extract features from general two-dimensional images in a reliable manner. CNNs have been successfully applied to recognize track surface faults and identify missing components. Faghih-Roohi [46] proposed a CNN solution for detecting rail surface defects, where the CNN was used to skip the elaborate image feature extraction procedures required by other machine vision methods. The accuracy of the rail defect classification was almost 92%. Zauner [47] used a modified CNN structure called a fully convolutional network to realize image semantic segmentation for an autonomous rail tamping assistance system. The idea behind a fully convolutional network is to extend a conventional CNN by replacing the fully connected layer with a convolution layer and adding a convolution transpose layer. The semantic segmentation enables the computer to recognize objects in images at the pixel level. In this way, a new turnout-tamping assistance system was proposed, which can support and relieve the operator in complex tamping areas.
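The convolution operation that these layers rely on can be written out directly. The following is a minimal pure-Python sketch, not the architecture of any cited study: the function name, the image patch, and the kernel are hypothetical, and a trained CNN would learn its kernel values from data instead of having them set by hand.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the kernel is not flipped, so this
    is strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grey-scale patch with a vertical edge between columns 1 and 2.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Simple horizontal-difference kernel that responds to vertical edges.
edge_kernel = [[-1, 1]]
feature_map = conv2d_valid(patch, edge_kernel)
```

The resulting feature map is non-zero only where the intensity changes, which is the kind of local feature a convolutional layer extracts before later layers combine such responses into defect classes.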
Recurrent neural network (RNN): RNNs are widely utilized to analyze time series or sequence data obtained from track measurements, such as track geometry data. The most important feature of the RNN is its ability to handle long time series through an internal memory that is carried from one step to the next, rather than relying on fully connected layers alone. The RNN unit takes the current and previous input data into consideration at the same time, enabling the method to perform better at predicting future trends given some historical sequence of data [43,48]. Heidarysafa [49] employed an RNN to discover accident causes from the narrative field in the Federal Railroad Administration (USA) reports. The term-frequency and Word2Vec methods (where each word is mapped to a vector) were adopted to convert the accident text reports into sequence vectors. The RNN was then used to classify the accident cause and to identify important inconsistencies based on these sequence vectors. Lopes [50] used an RNN to predict rail and geometry defects based on the integrated defect and inspection data. The prediction from the RNN was used in a discounted Markov decision process model to determine optimal inspection and maintenance scheduling strategies.
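The way a recurrent unit combines the current input with its memory of earlier inputs can be sketched with a single-unit Elman-style recurrence. This is an illustrative sketch only: the weights are untrained, hand-picked values, the input sequence is hypothetical, and the models in the cited studies use many units and learned parameters.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit Elman RNN over a sequence of scalars.

    h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    """
    hidden = h0
    states = []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states


# Hypothetical sequence of readings: one impulse followed by zeros.
readings = [1.0, 0.0, 0.0]
states = rnn_forward(readings, w_in=1.0, w_rec=0.5, bias=0.0)
```

Although the second and third inputs are zero, the hidden state remains positive and decays gradually, which shows how the first reading continues to influence later steps.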
Long short-term memory (LSTM): The training error for RNNs may accumulate to an unacceptable level when the period is too long [51]. To bypass this problem, LSTM structures were proposed by Hochreiter [52]. The LSTM units are composed of input, output, and forget gates. This specific structure allows information to be remembered over a longer period. Ma [53] used an LSTM model to predict vehicle-body vibration. A CNN was used to extract features, and the LSTM was utilized to find the inherent pattern in the track geometry time series. A comparison with the results of the LSTM alone showed that the CNN combined with the LSTM improved performance. The predicted vehicle-body acceleration could act as a new track quality index [53].
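The interplay of the three gates can be sketched with the commonly used LSTM cell equations for a single unit with scalar input and state. This is a minimal sketch under stated assumptions: the parameter values are untrained and chosen by hand purely to show the forget gate preserving the cell state, and the dictionary layout of `params` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell with scalar input and state.

    params maps each of "input", "forget", "output", "candidate" to a
    tuple (w_x, w_h, bias).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i_gate = sigmoid(pre("input"))      # how much new information to write
    f_gate = sigmoid(pre("forget"))     # how much of the old cell state to keep
    o_gate = sigmoid(pre("output"))     # how much of the cell state to expose
    candidate = math.tanh(pre("candidate"))

    c_new = f_gate * c_prev + i_gate * candidate
    h_new = o_gate * math.tanh(c_new)
    return h_new, c_new


# Hypothetical, untrained parameters: a large forget-gate bias keeps the
# cell state almost unchanged, and a very negative input-gate bias blocks
# new writes.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.3, -0.2, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell state stays very close to its initial value across all three steps, illustrating how the gating structure lets information persist over longer periods than in a plain RNN.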
Unfortunately, the capabilities of advanced deep learning models often come at a significant cost in terms of time-consuming training when the data set is huge. In addition, deep learning models are "black box" methods [54], making it difficult to explain their predictions or communicate their trustworthiness to railway engineers.

Unsupervised Learning Models
Unsupervised learning aims to find patterns automatically from unlabeled data. Clustering methods and dimensionality reduction techniques are the most widely used unsupervised methods in railway track engineering.
Clustering gives insights into the data distributions and is normally used in data processing. Schalk [55] used the clustering method to find the worst affected areas (hotspots) in railway track based on rolling contact fatigue (RCF) damage. The K-means method is commonly used to determine clusters. The data are divided into k partitions based on the distance between the samples [56]. When new data arrive, the model can adjust the cluster centers automatically. Li [57] used the K-means clustering method to obtain the possible normal state features of the track grids. Similar health features were classified into the same cluster based on their track grid health index. The results showed better accuracy than conventional health evaluation methods. The downsides of K-means clustering include the difficulty of determining the number of clusters, non-repeatable classification results unless the random seed remains constant, and the sensitivity of the results to the scale of the input data [58].
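The assignment and update steps of K-means, together with the role of the random seed, can be sketched as follows. This is a plain implementation of Lloyd's algorithm for illustration, not the procedure used in the cited studies; the function name and the one-dimensional health-index values are hypothetical.

```python
import random


def kmeans(points, k, seed=0, iterations=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)          # fixed seed gives repeatable results
    centers = rng.sample(points, k)    # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for idx, p in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers
            ]
            labels[idx] = distances.index(min(distances))
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical one-dimensional health-index values forming two obvious groups.
data = [(0.1,), (0.2,), (0.15,), (0.9,), (0.95,), (1.0,)]
labels, centers = kmeans(data, k=2, seed=0)
```

Because the distances are computed on raw values, a feature measured on a much larger scale than the others would dominate the assignment step, which is why inputs are usually standardized before clustering.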
Deep autoencoders are now being used as a dimensionality reduction technique. These networks are a type of neural network that learns to copy its input to its output, with the middle layers containing fewer "neurons" than those at the input and output layers. Thus, the encoder maps the input into a smaller dimension mid-parameter, and the decoder maps this mid-parameter to a reconstruction of the original input. This method is widely used in feature extraction. Li [57] used deep autoencoder networks to reduce the dimension of multiple track grid condition indices (functional performance, structural fitness/integrity, safety and aesthetics), constructing new lower-dimensional condition indices. These indices were then used to evaluate the overall health of the track grids. The dimensionality-reduction performance of the deep autoencoder is better than that of the principal component analysis (PCA) method [57].

Ensemble Models
To improve on the performance of individual machine learning models, combining two or more models to build ensemble models is an approach used by many researchers [59][60][61]. Ensemble learning creates an advanced model by combining the strengths of a set of base models. This can reduce the bias of the final predictions or classifications since the results are less dependent on a particular model [59]. Aggregating and stacking are the two major methods for combining the base models. Aggregation is the combination of the individual results based on a majority voting scheme, while stacking uses a meta-learner concept to determine which classifiers are reliable and which are not [59]. The meta-model uses the predicted result of each base model as input to get the final results. Cárdenas-Gallo [60] proposed an aggregating ensemble model to forecast the degradation of track geometry. This model included three aspects: deterioration, regression, and classification. The results showed that the ensemble method improves the predictive accuracy. Lasisi Ahmed Nii [61] proposed a stacking ensemble model to predict the annual track fatigue defects. This study showed that classical Weibull analysis underestimated annual fatigue defects by at least 25% throughout rail life. The stacking ensemble model can compensate for this shortfall by aggregating the probability predictions of diverse learners.
If the researchers are only interested in the best classification or prediction accuracy, an ensemble model is usually better than any single one. However, this method will increase storage and computation cost. Moreover, with the involvement of multiple models, the interpretability of the model will decrease.
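As an illustration of aggregation by majority voting, the following minimal sketch combines the outputs of three base classifiers. The labels, the three prediction lists, and the function names are hypothetical; a stacking ensemble would replace the vote with a meta-learner trained on the base-model outputs, which is not shown here.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical outputs of three base classifiers for five track segments.
model_a = ["defect", "ok", "ok", "defect", "ok"]
model_b = ["defect", "defect", "ok", "ok", "ok"]
model_c = ["ok", "defect", "ok", "defect", "ok"]
truth = ["defect", "defect", "ok", "defect", "ok"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble, accuracy(ensemble, truth))
```

In this constructed case each base model makes one error on a different segment, so the vote corrects all of them; real base models often share errors, and the gain is then smaller. An odd number of binary classifiers is used so that ties cannot occur.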

Data-Driven Model Application in Predictive Maintenance of Railway Track
Discussing the disadvantages and advantages of each algorithm is beyond the scope of this paper because the performance of the data-driven methods depends on an appropriate choice of the data with different characteristics and the problems of interest. No single algorithm is always better than other algorithms over all datasets and all application scenarios. This section provides an overview for the answer to the third fundamental research question: how should one choose suitable methods for different data types, track defects, and maintenance strategies? Through this review, suggestions for model selection are provided. In practice, researchers can try suitable methods based on related research and experimentation. New technologies, such as the deep learning, ensemble, and unsupervised learning models are recommended since they have proved to be feasible in many other research fields.

Models for Different Measurement Methods
Different measurement methods can be grouped based on the outcome data types. For example, for the track geometry recording car, the outcome data type is time series associated with position information, while, for the camera-based method, the data may be collections of images or videos. The choice of algorithm is strongly influenced by the input data types. The commonly used data types and applications are summarized in Table 4.
Table 4 (fragment). ... [11,86,87]; Text record: Accident/maintenance records [49,81,88]
It can be seen in Table 4 that the commonly used data types in railway track engineering include text, value, images (video can be captured as images), and time series. The following section aims to provide suggestions on selecting appropriate data-driven models.

Time Series-Based Measurement Data Sets
The main track measurement methods being utilized around the world are track geometry recording cars and in-service vehicles. The output data type for these measurements is time series.
Signal processing techniques are commonly used to extract time series features in both the time and frequency domains. These methods create features with clear physical meanings or interpretations. Using the generated features can support the development of more concise and accurate classifiers. In the time domain, instead of using the raw time-series data, statistical information can be extracted to reduce the data dimension. Li [89] used several statistical features, including maximum values, quantiles, means, and standard deviations, to represent the wayside detector values in the time domain. In the frequency domain, unusually high or low frequencies may indicate anomalies [90]. The Fourier transform is usually used to generate the frequency spectrum from time series signals. Lederman [91] used the Fourier transform to extract frequency domain features from vibration signals collected from in-service vehicles to detect track degradation. The results showed that signal energy was useful for detecting track failures. It should be noted that the Fourier transform cannot be used directly to process non-stationary time series. An advanced approach to overcome this shortfall is a combined time-frequency analysis [92]. Wavelet transforms have also proven to be useful in feature extraction and anomaly detection [73]. Jiang [40] utilized the wavelet packet transform to decompose signals of track surface defects into different frequency bands. The results showed that the wavelet packet transform can enhance the extraction of relevant information from the original signal.
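The time-domain and frequency-domain feature extraction described above can be sketched as follows. The signals are synthetic, the sampling rate and frequency band are arbitrary assumptions, and the function names are hypothetical. A direct discrete Fourier transform is used to keep the example dependency-free; it stands in for the fast Fourier transform that would be used on real data.

```python
import cmath
import math
import statistics

def time_features(x):
    """Summary statistics that replace the raw series in the time domain."""
    return {"max": max(x), "mean": statistics.mean(x), "std": statistics.pstdev(x)}

def dft_power(x):
    """One-sided power spectrum from a direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(power, fs, n):
    """Frequency (Hz) of the strongest non-DC spectral line."""
    k = max(range(1, len(power)), key=power.__getitem__)
    return k * fs / n

def band_energy(power, fs, n, f_lo, f_hi):
    """Signal energy between f_lo (inclusive) and f_hi (exclusive), in Hz."""
    return sum(p for k, p in enumerate(power) if f_lo <= k * fs / n < f_hi)

# Synthetic vibration signals sampled at an assumed 100 Hz for 2 s.
fs, n = 100, 200
healthy = [math.sin(2 * math.pi * 5 * t / fs) for t in range(n)]
degraded = [h + 0.5 * math.sin(2 * math.pi * 30 * t / fs) for t, h in enumerate(healthy)]

power_h, power_d = dft_power(healthy), dft_power(degraded)
energy_h = band_energy(power_h, fs, n, 20, 40)
energy_d = band_energy(power_d, fs, n, 20, 40)
print(time_features(degraded), dominant_frequency(power_d, fs, n), energy_h, energy_d)
```

The band energy separates the two synthetic signals even though their dominant frequency is identical, which mirrors the use of signal energy as a degradation indicator. Both signals are stationary by construction; for non-stationary records a time-frequency method would be needed, as noted above.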
Another important method to extract main features in railway measurement time series is using a simplified index, such as the track quality index (TQI) [93]. These indices combine the track geometry measurements, such as the longitudinal level, alignment, gauge, cant, and twist, and can be used as the primary indicator supporting decision-making in the railway industry. Sadeghi [34] distinguished TQIs into two parts: track geometry index (TGI) (a function of rail geometry parameters, such as profile, gauge, and twist) and traffic index (a function of dynamic effects, speeds and loads). Globally, a set of TQIs has been developed to evaluate track geometry measurements. Berawi [94] evaluated the geometrical track quality based on the J Synthetic Coefficient [95], the Indian TGI [96], and also the approach presented in the European Standard EN 13848-5 [97]. The results showed that using the comprehensive index was better than when using any one particular index. PCA was also widely used to capture the significant variations in the time series. PCA compresses the data set by generating a set of new variables that have no linear correlation [89]. Lasisi [98] pointed out that the evaluation indices from PCA were better for predicting defects and revealing salient characteristics in track geometry data than TGI.
The determination of maintenance activities on a track requires the establishment of a good condition indicator value. For this, time series classification can be used to classify the status of the tracks. Nadarajah [99] utilized a number of classification algorithms, such as SVM, linear regression, tree bagger, and KNN, to categorize the responses over 50-m sections of track into four distinct classes and then determine the maintenance requirement based on these results. Sun [100] used a CNN to present an end-to-end time series classifier for the detection of rail joints using acceleration data. Time series were treated as one-dimensional images, in which the useful features were extracted from the time domain. An important concept for classification is the "distance" between two given time series, that is, a metric defining how similar the two series are. The Euclidean norm is often used to calculate the distance using the corresponding values directly. Dynamic time warping [101] is an algorithm for measuring the similarity between two sequences by using the optimal match. The sequences are warped in a nonlinear fashion to match each other. Dynamic time warping is robust to time shifts and can align time series with different phases. Tan [42] used the data between two tamping dates to form a time series. The KNN was then employed to classify the time series based on dynamic time warping. The results showed that the dynamic time warping based KNN performed better in predicting tamping effectiveness than the decision tree and naïve Bayes methods.
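The difference between the Euclidean distance and dynamic time warping, and the use of the latter inside a KNN classifier, can be sketched as below. The series, class labels, and function names are hypothetical and do not reproduce the data or implementation of Tan [42]; the code is the textbook dynamic-programming form of dynamic time warping with an absolute-difference cost.

```python
import math

def euclidean(a, b):
    """Point-by-point distance; requires equal length and aligned phases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def knn_dtw(query, train, k=1):
    """Label a series by majority vote among its k DTW-nearest neighbours."""
    nearest = sorted(train, key=lambda item: dtw_distance(query, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Hypothetical condition series between two tamping dates, with made-up labels.
train = [
    ([0, 0, 1, 2, 1, 0, 0], "effective"),
    ([2, 2, 2, 3, 3, 3, 3], "ineffective"),
]
query = [0, 1, 2, 1, 0, 0, 0]  # same shape as the first series, shifted in time

print(euclidean(query, train[0][0]), dtw_distance(query, train[0][0]))
print(knn_dtw(query, train))
```

For the shifted pair the Euclidean distance is 2, whereas the warped distance is 0, which is the robustness to time shifts referred to above.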
As for time series forecasting, if the measurement data sets consist of univariate data, the autoregressive integrated moving average (ARIMA) model could be considered. Narezo [102] utilized the ARIMA model to predict the evolution of incipient switch failures. The ARIMA model assumes a linear correlation structure among the time series, and, therefore, no nonlinear patterns can be captured. The approximation of complex real-world problems by linear models is not always satisfactory [103]. Deep learning methods are recommended to deal with multiple time series forecasting problems. These methods are robust to noise and can even learn in the presence of missing values, accept multivariate inputs, and perform multi-step forecasts. Specific deep learning methods, such as the RNN and LSTM, are quite suitable for long-term time-series forecasting. However, these methods have yet to be fully exploited in the time series forecasting of railway track engineering.
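To make the linear structure of the ARIMA family concrete, the sketch below fits the simplest non-trivial case, ARIMA(1,1,0): the series is differenced once and a first-order autoregression is fitted to the differences by ordinary least squares. The data are synthetic and the function names are hypothetical. A least-squares fit is a simplification of the maximum-likelihood estimation used by statistical packages, and no moving-average term is included.

```python
def fit_arima_110(y):
    """Least-squares fit of d[t] = c + phi * d[t-1], where d is the first difference of y."""
    d = [b - a for a, b in zip(y, y[1:])]   # differencing: the "I" in ARIMA
    x, z = d[:-1], d[1:]                    # lagged and current differences
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    phi = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx if sxx else 0.0
    c = mz - phi * mx
    return c, phi

def forecast(y, c, phi, steps):
    """Iterate the fitted recursion and integrate the differences back."""
    last, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff
        last += diff
        out.append(last)
    return out

# Synthetic indicator whose increments halve at every step.
y = [0.0, 8.0, 12.0, 14.0, 15.0, 15.5]
c, phi = fit_arima_110(y)
print(c, phi, forecast(y, c, phi, steps=2))
```

Because each forecast is a linear function of past differences, a series whose increments follow a nonlinear rule cannot be reproduced by this model, which is the limitation stated above.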

Image-Based Measurement Data Sets
Machine vision inspection is more objective and consistent compared to manual vision measurement. Data-driven methods can process large amounts of image data in a short time [104]. Li [105] proposed a real-time automatic machine vision inspection system. This system inspected along the track at 16 km/h to detect defects or missing components, such as tie plates, ties, and anchors, using multiple cameras to collect images and videos. The location of the inspection car was determined by the global positioning system and a distance measurement instrument. By using a global optimization method, the authors achieved high accuracy in finding the missing track components. In machine vision analysis, the first step is to extract important features from the images using filters or signal processing methods. The second step is to use a classifier, such as an ANN, to analyze the features. In the literature, gradient-based features, such as the histogram of oriented gradients [104], scale-invariant feature transforms [76], and Gabor filters [80], were commonly used for railway image analysis.
In traditional computer vision models, the difficult task of hand-crafting features is needed before classification. However, hand-crafted features may not result in good performance due to the high noise level in the rail images [13]. Deep neural networks have been driving significant advancements in real-world computer vision applications over the last decade [106]. These networks can learn features of interest directly and automatically from the data. For instance, Jamshidi [107] combined track inspection images and crack growth data from ultrasonic inspection to classify squat defects using a CNN architecture. Faghih-Roohi [46] trained three different CNN architectures on the track inspection image data set. The results showed that the deeper architecture performed better in the multi-class classification of squat defects. Zhuang [108] used the linear iterative crack aggregation method to obtain the boundary of cracks. Then, a cascading classifier ensemble integrating three single cascading classifiers with a majority voting scheme was proposed to detect the presence of cracks in the track image. The result was compared to Otsu's method, the geometrical approach, fully convolutional networks, and Unet. The results showed that the proposed ensemble framework was the most effective of these methods in the detection of rail surface cracks.

Discrete Value-Based Measurement Data Sets
Discrete value data are not collected with a sensor at a fixed frequency. Most of the time, these records are collected after an accident has occurred. They may take the form of a log file of the warnings or failure messages recorded by an automatic monitoring system or logged by railway track engineers. The information may include the specific temperature reading, track age, break type, or tonnage data at a particular location of the track. Discrete value data can help engineers to identify the reasons for the failure. Model selection for discrete value data is recommended based on the size of the data set. If the data set is small, Bayesian methods are more suitable [86,109]. SVMs and neural networks tend to perform much better when dealing with multiple dimensions and continuous features [84]. If the data set includes noisy features, the KNN method should not be applied [87] because this method is sensitive to irrelevant features. If interpretability is important in the application, tree-based methods [110] and the Bayesian method would be more suitable than a neural network or an SVM. Lopes Gerum [50] used a random forest and an RNN to predict rail defects of different severity levels based on discrete value data. These data contained (a) the time in days since the last inspection, (b) the gross load endured by the tracks since the last inspection, (c) the month, (d) the season, and (e) the number of minor and major defects found in the previous inspection. Then, a discounted Markov decision process model utilized these predictions to determine optimal inspection and maintenance scheduling strategies.

Text-Based Measurement Data Sets
Text data analysis is a topic growing in popularity in the predictive maintenance of railway track. The purpose is to parse (usually human-generated) textual data to discern patterns or provide summary statistics. Heidarysafa [49] used an RNN to discover accident causes from railway accident reports. The accident report texts were embedded into sequence vectors using the Word2Vec and GloVe methods. The RNN was then used to find the primary causes and significant inconsistencies in accident reporting. Soleimani [111] employed a text mining tool to analyze highway-rail crossing crashes from the narrative descriptions in the crash reports. Word importance was explored by considering term frequency and inverse document frequency indices. Random forest and logistic regression methods were applied using these indices. The results showed that the type of a train-vehicle crash could be predicted using the text information with an accuracy of 86%. Narrative descriptions are widely used for reporting in the railway industry, so further work in this direction is likely to yield useful results.
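The term frequency and inverse document frequency indices mentioned above can be computed as in the following minimal sketch. The three narratives are invented for illustration, the function name is hypothetical, and the weighting shown (relative term frequency multiplied by the logarithm of the inverse document frequency, without smoothing) is only one of several common variants; it is not claimed to be the one used in Reference [111].

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Invented crash narratives (not taken from any real report).
reports = [
    "train struck vehicle at crossing",
    "vehicle struck train at crossing",
    "train derailed at broken rail",
]
weights = tfidf(reports)
print(sorted(weights[2].items(), key=lambda kv: -kv[1]))
```

Terms that occur in every narrative ("train", "at") receive zero weight, while terms specific to one narrative ("derailed", "broken", "rail") receive the highest weights; vectors of such weights are what a classifier such as a random forest would take as input.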

Models for Different Railway Track Defects
Railway track defects mainly include geometry irregularities and structural defects (such as rail head defects, rail breaks, missing rail components, and substructure failures). Predicting the geometric degradation and structural defects, either implicitly or explicitly, is necessary for the development of maintenance strategies. Track geometry measurement mainly includes the longitudinal level, alignment, gauge, cant, and twist [15]. These measurements are mostly used to represent the quality of the track and to predict the track degradation process. Rail geometry irregularities include wide gauge, excessive warp/twist, and horizontal and vertical rail deformities [98]. When the track geometry degrades to an unacceptable level, severe consequences, such as derailment, may occur.
Structural defects, such as rail head defects and rail breaks, are major threats to the safe operation of a railway system. Rail structural defects occur due to wear (primarily in curves), fatigue (in the form of surface and sub-surface initiated cracks), and plastic flow (in the form of corrugation in rails) [112]. These failures usually start with a small initial crack but propagate quickly due to the significant shear and normal stresses on the rail caused by the rolling-sliding contact loading. As the cracks develop, the cracking area may lead to spalling of material from the rail surface [113]. Isolated cracks may develop to the bottom of the rail and may cause a rail break. Figure 5 presents the publication counts by track defect type for all years of the identified literature, compared with the counts for the most recent four years. The application of data-driven models to rail geometry irregularity defects was the most popular. Rail geometry irregularity, rail head defects, and missing rail components were the three most studied defect types in the all-years count. More than half of the papers on rail geometry irregularity were published in the last four years, and almost all of the papers on rail head defects were published in the last four years. Interest in the detection of missing rail components has dwindled in recent years, while rail geometry irregularity, rail head defects, and rail breaks received increasing attention in the last four years. Applications of data-driven models to the prediction of substructure failure were the fewest. This may be because measurement data for the substructure are difficult to obtain. Some representative studies categorized by railway defect type are summarized in Table 5.
Table 5 (fragment). Representative studies by defect type: measurement method; data; model [reference].
...; Gabor-filtered images; Multiple signal classification [80]
Rail break: In-service vehicles; Axle box acceleration; Continuous wavelet transform [119]
Rail break: Eddy current sensor; Eddy current signals; Bayesian network [70]
Substructure failure (sleeper and ballast): Fiber Bragg grating; Track slab deformation; Variational heteroscedastic Gaussian process [72]
Substructure failure (sleeper and ballast): Camera; Stiffness of the ballast; Bayesian method [37]

Models for Maintenance Strategy
The methods for optimizing maintenance strategies include: implementation of track recovery and degradation models based on statistical methods; determination of maintenance strategies based on remaining useful life prediction; and prediction of maintenance requirements directly using a machine learning algorithm.
Statistical methods attempt to solve this problem by implementing recovery and degradation models to predict long-term behavior of the track. Arasteh Khouy [120] presented a track geometry degradation model based on exponential regression and discussed possible reasons for the distribution of failures along the track. The tamping effectiveness was considered by the actual longitudinal level degradation rates between two consecutive maintenance interventions. The track geometry inspection interval was optimized by minimizing the total ballast maintenance costs per unit traffic load. Mercier [121] used a bivariate Gamma process to model the longitudinal and transversal levelling indicators. These indicators were then used to predict the optimal time for interventions. The results showed that using the combined deterioration indices enabled maintenance schedules to be determined that ensured the railway track remained of good quality with a high probability.
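An exponential degradation model of the kind mentioned above can be fitted by linear regression on the logarithm of the condition indicator, and the fitted curve can then be used to estimate when an intervention limit will be reached. The sketch below assumes the form s(t) = a exp(b t), with t measured in days; the data are synthetic, the limit is arbitrary, and the function names are hypothetical. Whether time or accumulated load is the appropriate independent variable depends on the study and is an assumption here.

```python
import math

def fit_exponential(t, s):
    """Fit s = a * exp(b * t) by least squares on log(s)."""
    logs = [math.log(v) for v in s]
    mt, ml = sum(t) / len(t), sum(logs) / len(logs)
    b = (sum((ti - mt) * (li - ml) for ti, li in zip(t, logs))
         / sum((ti - mt) ** 2 for ti in t))
    a = math.exp(ml - b * mt)
    return a, b

def time_to_limit(a, b, limit):
    """Time at which the fitted curve reaches the intervention limit."""
    return math.log(limit / a) / b

# Synthetic standard deviation of longitudinal level (mm) since the last tamping.
days = [0, 30, 60, 90, 120]
sigma = [1.0 * math.exp(0.01 * d) for d in days]

a, b = fit_exponential(days, sigma)
print(a, b, time_to_limit(a, b, limit=2.0))
```

With these synthetic values the fit recovers a = 1.0 mm and b = 0.01 per day, so the assumed limit of 2.0 mm is reached after ln(2)/0.01, or roughly 69 days. On measured data the residual scatter would also have to be modelled before such a date is used for planning.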
Remaining useful life (RUL) prediction has received considerable attention in predictive maintenance. RUL in this domain is defined as the period of time from the present to the end of the useful life for track components. Recently, RUL prediction was based on data-driven models, such as Reference [122], where regression analysis was used to estimate and predict the RUL for various usage profiles of railway tracks. The discrete data (service failure data, signal data, ballast history, grinding history, remedial action history, and traffic data, as well as curve and grade data) and time series data (measurement data from mechanized track patrols) are both used for predicting RUL [12,112]. Li [12] proposed six popular statistical and machine learning regression models: random forest, quantile regression forest, decision tree, KNN, support vector regression, and principal component regression to predict the mid-term (60-180 days) RUL of railcar components. Accurate mid-term prediction of RUL allows railway managers to plan predictive maintenance with sufficient time. Here, the random forest and quantile regression forest had similar prediction accuracy and outperformed the other models used.
From an application perspective, another way for optimizing the maintenance strategy is to predict the maintenance requirement directly. The maintenance requirements are labeled as the learning targets for historical measurement data, and the model outputs are the specific maintenance types. Allah Bukhsh [88] used tree-based methods to predict the railway switches' maintenance needs. The input parameters included detected problem, switch component, problem reason and cause, functional location of a switch, track type, technical details, age, and term frequency-inverse document frequency. Specific maintenance activity types and trigger status could also be predicted with the tree-based method. The random forest made predictions with the highest accuracy among the tree-based models. Lopes Gerum [50] presented a new framework to integrate the prediction with inspection and maintenance scheduling activities. The defects were initially predicted based on risk-averse and hybrid prediction methods. The inspection and maintenance scheduling strategies were then optimized with the discounted Markov decision process model. This framework is effective for defect prediction and formulating long-term maintenance scheduling strategy considering real-time track conditions.
The scheduling of predictive maintenance should be involved in the process of optimizing the maintenance strategy. Large-scale activities, such as grinding, turnout maintenance, tamping, and other geometry maintenance measures, have particular requirements in terms of cost, track possession time, demanded quality, the machinery involved and scheduling challenges. Maintenance tasks that occur with a high frequency or have long maintenance window requirements have the most significant effect on track availability and network capacity [123]. In order to minimize these negative effects, scholars tried to maximize the availability of track by optimizing the possession time. This includes lean optimization, maintenance window optimization, subtask optimization, maintenance interval optimization, and better planning. Famurewa [123] provided an analysis of track geometry maintenance to reduce the required possession time. The condition of the track geometry was determined by a simulation approach. The support intervention decisions and the track possession time optimization were solved by a schedule optimization method. The results showed that optimizing the maintenance shift length and cycle length are opportunities to reduce the extent of track possession required for the maintenance of the track geometry. Consilvio [124] used a mixed integer linear programming method to consider the space-distributed aspect of railway infrastructure. The optimal scheduling can consider the best path and the activities assignment for each maintenance team. The conventional maintenance methods for optimizing the maintenance strategies are offline models that cover the long-term horizon but neglect operational disturbances. In order to consider the real time information and adapt the day-to-day planning, Consilvio [125] proposed a rolling-horizon approach for risk-based maintenance planning in the rail sector. 
A mixed-integer linear programming framework was utilized to optimize the maintenance strategy based on risk minimization. The proposed framework was able to react to execution delays or priority changes for the maintenance tasks.
All in all, the optimal maintenance strategy is to build an efficient automated decision-making system. Automated decision-making supports continuous railway service over a specified time period based on automated track defect detection and prediction. The role of maintenance experts is to help the data analyst shift the maintenance strategy from corrective and planned maintenance to predictive maintenance. The data analyst turns the maintenance expert's manual judgment into an automated decision process based on data-driven models. However, the predictive maintenance framework cannot be created from data alone; maintenance experts' knowledge is also vital for the model's structure, as some variables representing the underlying state of the system may not be present in the data, such as cost, track possession time, demanded quality, the machinery involved, and scheduling challenges. The goal of the infrastructure manager is to ensure that railway service is provided while minimizing the total negative impacts incurred through normal use and the execution of inspections and interventions [126]. The automatically generated decisions should provide the infrastructure manager with adequate information about when interventions should be executed and which types are required.

Future Challenges and Suggestions
Current data-driven methods applied to the predictive maintenance of railway track suffer from some shortcomings. Recommendations for future research include:

• Pay more attention to the advanced machine learning methods. The advanced methods, such as the deep learning, ensemble, and unsupervised learning methods, are better able to utilize and handle the large-volume, multi-source, highly imbalanced, and noisy data sets of modern railway measurement. These methods have proven to be immensely useful in other fields, yet are rarely used in railway predictive maintenance.

• Make use of the text-based data in the railway industry. Narrative descriptions are widely used in the railway industry, but there are still only a small number of applications in the predictive maintenance of railway track. Text mining techniques can tackle the problems of text representation, classification, clustering, information extraction, or the search for and modeling of hidden patterns [127]. In this way, the recorded narrative descriptions can be utilized as a valuable source of information to combine with other data types.

• Develop automatic data labeling methods. The performance of the data-driven models depends on high-quality labeled samples. Although large volumes of data are collected from sensors in the railway industry, most of the data need to be labeled manually. Data-driven algorithms, such as unsupervised learning models, can contribute by labeling the data automatically [13]. In addition, as mentioned in Table 1, one of the important characteristics of railway measurement data is that they are highly imbalanced. High-quality automatic data labeling algorithms help to identify more faulty samples, which alleviates the extremely imbalanced distribution in railway defect data.

• Enhance the interpretability of the models. As mentioned in Section 3, data-driven methods, such as deep learning models, are "black box" methods [54]. It is hard to justify the basis of a classification or prediction to end users. The research community has devoted much attention to improving the interpretability of these machine learning methods [128]. More details about the relevant methods can be found in Reference [129].

• Consider cost information in model performance evaluation. To evaluate the performance of models for track defect detection or prediction, the defect detection accuracy is commonly used, which measures the proportion of track statuses correctly identified. In general, there are two common errors in track status prediction. One is false alarm prediction, and the other is false safe prediction. False alarm prediction means that an actual safe condition is falsely predicted as a problem. False safe prediction means that an actual problem is falsely identified as a safe condition. From the engineer's perspective, a high rate of false alarm predictions usually leads to ineffective and unnecessary decision-making, while false safe predictions can cause huge losses through railway service suspensions, putting the maintenance organization in reactive mode. Thus, railway managers care more about the percentage of false safe predictions than about the prediction accuracy. A scientific evaluation system should take cost information into account, considering the huge and asymmetric cost of false safe predictions in railway engineering [130]. Further work is expected to take the various costs (false safe, early replacement, false alerts) into account and return the expected gain in dollars as an evaluation metric instead of considering only the accuracy of the prediction.
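The difference between accuracy and a cost-based evaluation can be shown with a small worked example. The labels, the two sets of predictions, and the unit costs below are invented for illustration (1 denotes a defective segment and 0 a safe one); only the asymmetry between the two error types is taken from the discussion above.

```python
def evaluate(actual, predicted, cost_false_safe, cost_false_alarm):
    """Return (accuracy, total misclassification cost) for binary track status."""
    pairs = list(zip(actual, predicted))
    false_safe = sum(a == 1 and p == 0 for a, p in pairs)   # missed defects
    false_alarm = sum(a == 0 and p == 1 for a, p in pairs)  # unnecessary alerts
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    cost = false_safe * cost_false_safe + false_alarm * cost_false_alarm
    return accuracy, cost

# Ten hypothetical segments, two of which are actually defective.
actual  = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one defect
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # raises two false alarms

# Assumed unit costs: a missed defect is taken to be 20 times as costly as a false alarm.
result_a = evaluate(actual, model_a, cost_false_safe=100, cost_false_alarm=5)
result_b = evaluate(actual, model_b, cost_false_safe=100, cost_false_alarm=5)
print(result_a, result_b)
```

Under these assumed costs, the first model has the higher accuracy (0.9 against 0.8) but the higher cost (100 against 10), so a ranking by accuracy and a ranking by cost disagree.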

Conclusions
This paper presented a systematic literature review covering the main publications on data-driven methods in the predictive maintenance of railway track. Based on the literature review, data-driven models have been shown to be able to avoid unnecessary replacement of track components, save costs, and improve the safety, availability, and efficiency of railway service.
Among the data-driven methods, the machine learning models are becoming more and more popular in this field. The deep learning, unsupervised learning, and ensemble methods are attracting growing attention. Statistical models for track predictive maintenance will probably not disappear in the near term, mainly due to their ability to provide informative inferences on the relationships between the parameters and the track degradation processes. Among the applications of data-driven models, rail geometry irregularity, rail head defect, and missing rail component detection were the top three issues addressed in the literature. Rail break prediction has also been receiving increasing attention in