Use of Deep Learning to Study Modeling Deterioration of Pavements: A Case Study in Iowa

This paper describes the process and outcome of deterioration modeling for three different pavement types (asphalt, concrete, and composite) in the state of Iowa. Pavement condition data is collected by the Iowa Department of Transportation (DOT) and stored in a Pavement Management Information System (PMIS). In the state of Iowa, the overall pavement condition is quantified using the Pavement Condition Index (PCI), which is a weighted average of indices representing different types of distress, roughness, and deflection. Deterioration models of PCI as a function of time were developed for the different pavement types using two modeling approaches. The first approach is the long short-term memory (LSTM) network, a type of recurrent neural network. The second approach, used by the Iowa DOT, develops an individual regression model for each section of the different pavement types. The two approaches are compared to assess the accuracy of each model. The results show that the LSTM model achieved a higher prediction accuracy over time for all three pavement types.


Introduction
Public agencies use pavement management systems (PMSs) to make objective decisions and conduct activities for maintaining pavements in acceptable conditions at minimal cost [1]. Since the early 1970s, departments of transportation (DOTs) and other transportation agencies have been implementing and establishing PMSs to match their needs, achieving significant savings and improvement in network conditions [2]. The Arizona DOT, for example, saved $14 million and $101 million during the first year and the first four years of PMS implementation, respectively [3]. The Colorado Department of Transportation (CDOT) uses PMS to efficiently spend its $740 million annual budget for maintaining and preserving more than 9100 center-line miles (about 23,000 total lane miles) [4]. It appears that there is potential for all such expenses to be more effective if PMS improvements can be developed and implemented.
A major component of any PMS is evaluation and modeling of pavement conditions at the network level. Recently, most states have begun to use automated pavement-condition surveying tools that generate images from remote sensors to collect distress information and report individual distresses through an overall condition index [5]. The concept of the Pavement Condition Index (PCI) was developed by the U.S. Army Corps of Engineers in 1970 based on different types of distresses and severity levels [6]. Since then, most DOTs and related agencies have been using the PCI to evaluate pavement conditions. The PCI provides important information to pavement engineers by describing
• Classification: Supervised learning in neural networks can be used to deal with unknown inputs. Neural network models have been used to investigate the classification of pavement distresses from digital images [23]. Another research study by [24] reported using a neural network to detect pavement cracks.
• Performance Prediction: Neural networks have been used in various studies as powerful and versatile computational tools for both determining the performance of existing pavement systems and predicting future conditions. The Pavement Distress Index (PDI), based on surface thickness, pavement age, and traffic level, was predicted using an NN model that outperformed other multiple-linear regressions [25]. A back-propagation neural network model was developed by [26] for predicting IRI based on pavement distress.
• Optimization and maintenance strategies: Neural networks have been used as computational tools to determine which maintenance and rehabilitation actions should be performed on deteriorated pavement sections, using a hybrid NN and genetic algorithm method developed for optimizing the maintenance strategy of flexible pavements [27].
• Distress Prediction: Neural networks can help pavement engineers predict future distresses, and a multi-layer perceptron back-propagation NN with one hidden layer has been used to predict future roughness distress in flexible pavements [28].
NNs could be a powerful alternative to traditional techniques that are always limited by normality, linearity, and collinearity assumptions. Two major advantages of using NNs are their ability to model large amounts of complex, nonlinear data and to detect all possible interactions between predictor variables.
It should be mentioned that, because pavement deterioration happens over time, it is important to include the dependency of performance measures on historical data (time) in a prediction model. Accurate time-series prediction is also critical for abnormality detection, resource allocation, and financial planning [29]. Predicting data time-dependency is challenging because such prediction depends on external factors like weather and traffic load [30]. Time-series analysis works better with highly correlated measurements over time, because explanatory variables may fail to explain the correlation mechanisms. On the other hand, in regression analysis, the explanatory variables should sufficiently explain the trend, resulting in independent fitting residuals.
A deep-learning method designed for sequential data is the recurrent neural network (RNN), which has recently received additional attention from researchers, primarily because of its capability in learning sequences [31][32][33]. RNNs have been widely applied to time-dependent datasets in prediction problems such as speech, pattern, economic, and traffic prediction [34][35][36]. Because RNNs are designed to use historical data in time-series analysis, including a regression model that relies on both explanatory variables and historical data of the response variable improves model accuracy. These networks are designated as recurrent because future forecasting depends on both current and previous stages. Several RNN algorithms, such as the long short-term memory (LSTM) network, have been developed over the past two decades. LSTM was introduced to support modeling and forecasting of long-term data series. The network was developed to overcome the vanishing gradient problem, in which algorithms tend to accumulate errors when a long string of observations is added as predictor variables, increasing prediction variability and the associated total error. According to the literature, another RNN variant, the gated recurrent unit (GRU), also addresses the vanishing gradient problem, but the LSTM has outperformed the GRU in many respects [37].
In this research, the LSTM method was used for time-dependent prediction of the pavement condition index. This network is suitable for pavement applications because the data is presented as time series with both low observation frequency and high levels of variability. The goal of this study was to develop a new, robust deterioration model suitable for long-term forecasting whose performance can be objectively evaluated. The LSTM network used historical pavement condition records from the Iowa DOT PMIS covering the period between 1998 and 2018. This time-series algorithm, a deep-learning approach based on LSTM networks, was used to predict future conditions of the three different pavement types in the state of Iowa. The Keras software package, a high-level neural network API written in Python with a focus on enabling fast experimentation, was used to generate the LSTM model. This package runs on top of the open-source TensorFlow deep-learning library. The performance and results of the new algorithm are compared to the current method used by the Iowa DOT for deterioration modeling. Figure 1 shows the steps of the proposed method, with the individual steps described in detail in the following subsections.
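As an illustration of what time-dependent prediction implies for data preparation, the sketch below turns one section's PCI history into input/target pairs for a sequence model. The paper does not describe its windowing scheme, so the function name `make_windows`, the window length, and the sample values are all assumptions made for this example.

```python
def make_windows(series, n_steps):
    """Split a PCI history into (input window, next value) pairs.

    Hypothetical helper: the window length n_steps is an assumption,
    not a value reported in the study.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_steps):
        inputs.append(series[start:start + n_steps])  # n_steps past readings
        targets.append(series[start + n_steps])       # the reading that follows
    return inputs, targets


# Illustrative PCI readings for a single section (made-up numbers).
history = [90, 88, 85, 83, 80]
X, y = make_windows(history, n_steps=3)
```

Each input window would then be fed to the recurrent model as one sequence, with the following reading as the value to predict.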

Data Collection
To develop and implement the new framework, historical records of pavement condition data were acquired from the Iowa DOT (PMIS). These data were collected for Iowa's interstate and primary network since 1997, the year in which the Iowa DOT began collecting automated pavement distress data [38]. The data used in this study were acquired between 1998 and 2018, and include information regarding highway system classification, construction and reconstruction dates, unique section identifiers, traffic levels, automated pavement distress data, and pavement ride quality. The pavement types included in the study were asphalt concrete (AC), Portland cement concrete (PCC), and composite (COM) pavements.
The pavement distress information collected includes rutting and cracking data, such as transverse cracking, longitudinal cracking, alligator cracking, wheel-path cracking, and patching, with low-, medium-, and high-severity levels assigned to cracking data for all pavement types. For AC and COM pavements, rutting was reported as the average rut depth in both wheel paths, and for PCC pavements, faulting was estimated using the acquired longitudinal profile. The international roughness index (IRI) was also used to characterize ride quality for all pavement types. Pavement condition data is collected in two-year cycles in which half the network is surveyed every other year. The Iowa DOT spends about $1 million annually on collecting pavement condition data [8].
In many cases, minor maintenance and rehabilitation records were not available, so the maintenance impact on pavement condition over time was not modelled in this study. Moreover, segments with PCI values increasing over time were discarded from the analysis because they might be associated with unrecorded maintenance activities. A 10-point PCI increase was arbitrarily considered to be a normal fluctuation due to measurement errors or seasonal impacts. Figure 2 shows the number of different sections for each pavement type, with the descriptive statistics for each pavement type given in Table
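The screening rule above can be sketched as follows. This is one reading of the text, not the authors' code: it assumes the check is applied between consecutive surveys of a section, and that a rise of up to 10 PCI points is tolerated while a larger rise causes the section to be discarded. The function names and sample data are hypothetical.

```python
def has_unexplained_increase(pci_series, tolerance=10.0):
    """True if PCI rises by more than `tolerance` between consecutive surveys."""
    return any(
        later - earlier > tolerance
        for earlier, later in zip(pci_series, pci_series[1:])
    )


def filter_sections(histories, tolerance=10.0):
    """Keep only sections whose PCI history shows no large increase."""
    return {
        section_id: series
        for section_id, series in histories.items()
        if not has_unexplained_increase(series, tolerance)
    }


# Made-up histories: section "B" jumps by 25 points, suggesting
# unrecorded maintenance, so it is dropped.
histories = {
    "A": [92, 90, 91, 86, 80],
    "B": [70, 65, 90, 88, 85],
}
kept = filter_sections(histories)
```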

Preprocessing
After collecting and arranging the data based on pavement type, condition indices were estimated using the reported condition data. Pavement condition can be summarized using four scaled indices with values ranging from 0 to 100, with 0 corresponding to the worst condition and 100 to the best condition. These indices can then be used to calculate the overall PCI on the same scale, resulting in a global index for comparing different pavement types. In this study, the indices were calculated based on definitions provided in a previous study for the Iowa DOT [8] and included:
• Riding index;
• Rutting index (AC and COM only);
• Cracking index;
• Faulting index (PCC only).
In AC and COM pavements, four different cracking sub-indices were used to calculate the cracking index: transverse, longitudinal, alligator, and longitudinal-wheel-path cracking. Only two sub-indices, transverse and longitudinal cracking, were used to characterize PCC pavements. Three severity levels were used by the Iowa DOT in evaluating pavement distresses, with coefficient values of 1, 1.5, and 2 applied to low-, medium-, and high-severity quantities, respectively, so that all severity levels were converted into an aggregated low-severity equivalent. A maximum value (threshold) corresponds to a deduction of 100 points, that is, a cracking sub-index of 0. A threshold was determined for each crack type within each pavement type, and all threshold values were extracted from a previous Iowa DOT study [8]. Tables 2 and 3 present the threshold values for the cracking sub-indices and the weights for calculating the cracking index by pavement type, based on the coefficient values provided by Iowa DOT experts.
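A minimal sketch of the severity aggregation and threshold deduction described above is given here. The severity coefficients (1, 1.5, 2) come from the text; the linear deduction between zero and the threshold, the function name, and the threshold used in the example are assumptions, since the actual thresholds are listed in the tables rather than in the text.

```python
# Severity coefficients stated in the text.
SEVERITY_COEFF = {"low": 1.0, "medium": 1.5, "high": 2.0}


def cracking_sub_index(extent_by_severity, threshold):
    """Return a 0-100 sub-index for one crack type.

    extent_by_severity maps 'low'/'medium'/'high' to a measured quantity.
    The deduction is assumed to grow linearly up to 100 at `threshold`.
    """
    # Convert all severities into an equivalent low-severity quantity.
    equivalent_low = sum(
        SEVERITY_COEFF[severity] * extent
        for severity, extent in extent_by_severity.items()
    )
    deduction = min(equivalent_low / threshold, 1.0) * 100.0
    return 100.0 - deduction


# Hypothetical threshold of 90 units; 10 + 15 + 20 = 45 equivalent units.
example = cracking_sub_index({"low": 10, "medium": 10, "high": 10}, threshold=90)
```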
The cracking index values for all three pavement types, based on the coefficient values provided by Iowa DOT experts, were as follows:
The International Roughness Index (IRI) is the most commonly used ride-quality index. The riding index used in this study was based on the IRI acquired by the Iowa DOT and expressed on a scale of 100. IRI values below 0.5 m/km were taken as a perfect 100, while values above 4.0 m/km were taken as 0 on the index scale. Values between 0.5 and 4.0 m/km were calculated using linear interpolation [8].
Rutting is defined as the permanent total deformation or consolidation accumulated in the wheel path of an asphalt pavement surface. The rutting index in this study used rut depths available in the PMIS database. Based on previous research, a threshold value of 12 mm corresponded to 0 on the rutting index scale of 100, and values below 12 mm were applied as corresponding deductions [8].
Faulting is defined as the difference in slab elevation across a joint or crack caused by differential vertical displacement between the two sides. Similar to the rutting index for AC pavements, the faulting index is expressed on a scale of 100, with a faulting value equal to or greater than 12 mm set to 0 and a faulting value of zero set to 100 on the index scale [8].
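The riding, rutting, and faulting scales described above can be sketched with a single interpolation helper. The end points (0.5 and 4.0 m/km for IRI, 12 mm for rutting and faulting) are taken from the text; treating the rutting and faulting deductions as linear between 0 and 12 mm is an assumption, and the function names are hypothetical.

```python
def linear_index(value, best, worst):
    """Map a measurement to 0-100: `best` or lower gives 100, `worst` or higher gives 0."""
    if value <= best:
        return 100.0
    if value >= worst:
        return 0.0
    return 100.0 * (worst - value) / (worst - best)


def riding_index(iri_m_per_km):
    # 0.5 m/km or smoother scores 100; 4.0 m/km or rougher scores 0.
    return linear_index(iri_m_per_km, best=0.5, worst=4.0)


def rutting_index(rut_depth_mm):
    # 12 mm or deeper scores 0 (linear deduction below that is assumed).
    return linear_index(rut_depth_mm, best=0.0, worst=12.0)


def faulting_index(faulting_mm):
    # 0 mm scores 100; 12 mm or more scores 0.
    return linear_index(faulting_mm, best=0.0, worst=12.0)
```

For instance, an IRI of 2.25 m/km sits halfway between the two limits and therefore maps to a riding index of 50.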
After calculating all cracking, riding, rutting, and faulting indices for AC, COM, and PCC pavements, a weighted average formula was used to calculate the PCI values. The current formulae for calculating the PCI for AC, COM, and PCC pavements are as follows [8]:
Based on PCI values, the Iowa DOT classifies pavement condition for the interstate highway system as good, with a PCI value between 76 and 100; fair, with a PCI value between 51 and 75; and poor, with a PCI value between 0 and 50. Based on these classifications, approximately 91% of the interstate highway system and 79% of the non-interstate highway system in the state of Iowa were categorized as good-condition pavement as of the end of 2017 [39].
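The weighted-average and classification steps can be sketched as below. The class boundaries follow the interstate categories given in the text. The weights in the example are placeholders chosen only for illustration and are not the Iowa DOT weights, and the handling of non-integer PCI values between class boundaries is likewise an assumption.

```python
def weighted_pci(indices, weights):
    """Weighted average of 0-100 condition indices."""
    total_weight = sum(weights.values())
    return sum(weights[name] * indices[name] for name in weights) / total_weight


def classify_pci(pci):
    """Good (76-100), fair (51-75), poor (0-50) for interstate pavements."""
    if pci > 75:
        return "good"
    if pci > 50:
        return "fair"
    return "poor"


# Placeholder weights for an AC section (NOT the values from [8]).
example_weights = {"riding": 0.4, "cracking": 0.4, "rutting": 0.2}
example_indices = {"riding": 80.0, "cracking": 60.0, "rutting": 100.0}
pci_value = weighted_pci(example_indices, example_weights)
condition = classify_pci(pci_value)
```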

Developing LSTM Model
To predict the future condition of individual pavement sections, a modified RNN algorithm called LSTM was used in this research. While conventional feed-forward neural networks treat all observations as independent, RNN models consider the effects of previous observations and therefore account for the correlation between consecutive observations. It is worth mentioning that traditional RNNs work properly only with short-term dependencies, even though information from previous stages is mandatory for making an accurate prediction with an RNN. In fact, an RNN fails when too many inputs from historical observations are used. Observations added as predictor variables will increase variability in the predictions and the total error, a phenomenon referred to as the vanishing gradient effect.
Generally, in feed-forward neural networks, the multiplication of errors from previous layers, the learning rate, and the input for a layer define the weight update for the following layer. As a result of repeated multiplication by the small values of activation-function derivatives (Sigmoid, Tanh, ReLU), the gradient approaches zero, increasing training complexity and causing information loss within the training layers. To overcome this limitation, LSTM was proposed as a modified version of traditional RNNs that retains the effectiveness of RNN methods. The information in LSTMs flows through a cell-state mechanism in which LSTMs can selectively either forget or remember information based on its impact on model performance [40]. Figure 3 is a schematic of the repeating module in an RNN that goes through three major steps.
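The shrinking effect of repeated derivative multiplications can be illustrated numerically. The sketch below is a simplified demonstration, not the paper's analysis: it multiplies only Sigmoid derivatives (which never exceed 0.25) and ignores the weight terms that also enter a real back-propagated gradient. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0


def gradient_factor(pre_activations):
    """Product of Sigmoid derivatives along a chain of layers or time steps."""
    factor = 1.0
    for z in pre_activations:
        factor *= sigmoid_derivative(z)
    return factor


# Even in the best case (z = 0 everywhere) the factor decays as 0.25 ** n.
five_steps = gradient_factor([0.0] * 5)
twenty_steps = gradient_factor([0.0] * 20)
```

With 20 steps the factor is already below 1e-11, which is why long histories contribute almost nothing to the weight updates of a plain RNN.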
In the first step, the LSTM passes the output from the previous time step (t − 1) to the forget gate, where it is classified using the sigmoidal function shown in Equation (1) either as significant information passed to the next step in the training or as insignificant information dropped from the training model [40]:
$$F_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{1}$$
where $F_t$ represents the forget gate, $\sigma$ is the Sigmoid function, $W_f$ represents the weight for the forget gate neurons, $h_{t-1}$ is the output of the previous LSTM block at time (t − 1), $X_t$ represents the input at the current time step, and $b_f$ represents the biases for the forget gate.
In the second step, the LSTM decides what new information should be stored in the cell state by identifying the values requiring updating with the Sigmoid function and creating a vector of new candidate values with the Tanh function that could be added to the next state. These two functions are shown in Equations (2) and (3) [40]:
$$I_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{2}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \tag{3}$$
where $I_t$ represents the input gate, $W_i$ and $W_c$ represent the weights and $b_i$ and $b_c$ the biases for the respective gate neurons, $X_t$ represents the input at the current time step, and $\tilde{C}_t$ represents the candidate for the cell state at time step (t). By combining information from the previous cell and the input gate from the current time step, the information for the later step is updated. Equation (4) represents how information filtered by the forget gate layer is combined with new information from the current time step:
$$C_t = F_t * C_{t-1} + I_t * \tilde{C}_t \tag{4}$$
Other Sigmoid and Tanh functions help the LSTM cell decide what information should be taken as output. Equations (5) and (6) represent the Sigmoid and Tanh functions in the last step [40]:
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \tag{5}$$
$$h_t = O_t * \tanh(C_t) \tag{6}$$
where $C_t$ is the cell state (memory) at time step (t), $O_t$ represents the output gate with weights $W_o$ and biases $b_o$, and $h_t$ represents the output of the LSTM block at time step (t).
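The three steps of the LSTM block can be traced with a scalar toy cell. This is a sketch of the standard LSTM update, assuming a single hidden unit and a single input feature; the parameter layout (one `(w_h, w_x, b)` triple per gate) and the function names are choices made for this example, and the study itself relied on the Keras implementation rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and a scalar hidden state.

    params maps each gate name ('f', 'i', 'c', 'o') to (w_h, w_x, b).
    """
    def gate(name, activation):
        w_h, w_x, b = params[name]
        return activation(w_h * h_prev + w_x * x_t + b)

    f_t = gate("f", sigmoid)            # forget gate
    i_t = gate("i", sigmoid)            # input gate
    c_candidate = gate("c", math.tanh)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_candidate  # updated cell state
    o_t = gate("o", sigmoid)            # output gate
    h_t = o_t * math.tanh(c_t)          # block output
    return h_t, c_t


# With all-zero parameters every Sigmoid gate equals 0.5 and the
# candidate equals 0, so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```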

Training
For the learning process in the LSTM algorithm, the datasets corresponding to PCC and COM pavements were divided into training (70%) and validation (30%) sets. Because there were fewer records for AC pavements than for the two other pavement types, the AC database was divided into training (80%) and validation (20%) sets. The training dataset was used for developing the model and conducting the learning process, while the validation dataset was used for checking the accuracy of the model.
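The per-type split ratios can be expressed as a small helper. The fractions come from the text; splitting at the section level with a seeded random shuffle is an assumption, since the paper does not state how records were assigned to the two sets, and the names below are hypothetical.

```python
import random

# Training fractions stated in the text for each pavement type.
TRAIN_FRACTION = {"AC": 0.8, "PCC": 0.7, "COM": 0.7}


def split_sections(section_ids, pavement_type, seed=0):
    """Shuffle section identifiers and split them into training and validation lists."""
    ids = list(section_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(len(ids) * TRAIN_FRACTION[pavement_type])
    return ids[:n_train], ids[n_train:]


# 100 made-up section identifiers per type.
ac_train, ac_val = split_sections(range(100), "AC")
pcc_train, pcc_val = split_sections(range(100), "PCC")
```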

Validation
Model validation is performed to confirm that the output of the statistical model is acceptable with respect to the collected (actual) data. To evaluate any machine-learning model, it is necessary to test the model with data not used in the training set. In this study, a train/test split (hold-out) approach was used as the validation technique for checking the effectiveness of the machine-learning model. After performing model training on the training dataset (70% of the database, or 80% for AC pavements), the validation dataset was used as a test sample to validate model performance.

Comparison
The LSTM model performance was compared with the sigmoidal and exponential functions used by Iowa DOT to fit deterioration models for individual sections. The accuracy of each model with respect to riding, cracking, and rutting in AC and COM pavement types, and riding, cracking, and faulting in PCC pavement types was compared for both models.

Results and Discussion
In the following sections, the application of each modeling approach to the databases of the three different pavement types is described, and the results are presented and discussed. The overall results from both models are presented in Table 4, with the actual value of each index compared with the predicted value of the same index from the LSTM and Iowa DOT regression models.
The Iowa DOT has an individual regression model for each section, with specific factors for predicting the future condition of the pavement based on age. While sigmoidal transformation functions were applied to the cracking, rutting, and faulting indices, an exponential function was used to fit the riding index. Based on the actual and predicted values of each index, the PCI value was calculated for each pavement type. Figures 4-6 present the comparisons between the actual PCI value and the predicted PCI value for each pavement type in the DOT and LSTM models.
It should be noted that the evaluations of the regression models are restricted to the residuals between the fitted functions and the actual readings, although the LSTM evaluation was based on its ability to predict full performance curves not included during the training stage. For validating the prediction results of the individual regression models and comparing the results of the current Iowa DOT method with LSTM models, 50 AC, 80 PCC, and 80 COM sections were tested. The results were compared with the actual value of each index.
The comparison included models developed for AC, COM, and PCC pavements. R-square and the standard error of estimate (SEE) were used to evaluate the accuracy of the models. The R-square and SEE functions are shown in Equations (7) and (8), where $Y_i$ is the actual value, $\hat{Y}_i$ is the predicted value, $\bar{Y}$ is the average of the actual values, and N represents the number of observations. The results for AC pavements show that the LSTM model obtained a higher prediction accuracy than the individual DOT regression models. The R-square values in the LSTM models were 0.61 for the riding index, 0.19 for the rutting index, 0.35 for the cracking index, and 0.61 for the PCI, while the values for the DOT models were 0.55, −5.11, 0.15, and 0.31, respectively. It is worth mentioning that R-square is defined as the proportion of variance explained by the fit; if the fit is actually worse than just fitting a horizontal line, then R-square is negative. Additionally, the SEE results for both models indicate that the LSTM model obtained a lower standard error of estimate than the DOT models. The SEE values in the LSTM models were 18
Figures 7-9 also reflect the effect of age on the prediction residuals for each model over both the short- and long-term duration. These results show that the errors widen and fluctuate more significantly after the first five years of pavement age for all three pavement types.
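The two accuracy measures can be sketched from their verbal definitions. The R-square form below (one minus the ratio of residual to total sum of squares) matches the remark that a fit worse than a horizontal line gives a negative value. For the SEE, dividing by N is an assumption made here, as other conventions divide by N minus the number of fitted parameters. Function names and sample values are hypothetical.

```python
import math


def r_square(actual, predicted):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def standard_error_of_estimate(actual, predicted):
    """Root of the mean squared residual (denominator N is assumed)."""
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(ss_res / len(actual))


# Made-up index values for three observations.
actual = [60.0, 70.0, 80.0]
r2_mean_only = r_square(actual, [70.0, 70.0, 70.0])  # horizontal line at the mean
r2_reversed = r_square(actual, [80.0, 70.0, 60.0])   # worse than the mean
```

A negative value such as the −5.11 reported for the DOT rutting model therefore means the predictions had a larger squared error than a constant prediction at the observed mean.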
Residuals can generally be either positive or negative; however, a consistent difference between the predicted and observed values to one side of the prediction model is referred to as bias, and the variability of these residuals about their mean value is referred to as variance. Bias can be formally defined as the expected value of the model residuals, as shown in Equation (9):
$$\text{Bias} = E[\varepsilon] = E[\hat{y} - y] \tag{9}$$
where $\hat{y}$ is the predicted value, $y$ is the observed value, and $\varepsilon = \hat{y} - y$ is the model residual.
As can be seen in Figures 7-9, the DOT regression models show a consistently higher bias, as the average line deviates from the zero value. To check whether the bias of the DOT regression model was significantly higher or lower than the LSTM model bias, a hypothesis test was performed on the average absolute residual values of the regression and LSTM models. To account for the possibly unequal residual variance between the models, the Kolmogorov-Smirnov test, a non-parametric test that allows for testing with unequal variances, was performed. Results showed that the regression model had a significantly higher bias with a negative value, meaning that the regression model will consistently overestimate the index values and result in less conservative predictions. Even though the variance of the residuals increased in the LSTM over time, the mean of the residuals in the LSTM model was still less than that of the regression models. The solid black line and dotted blue line in the figures show how the mean errors changed over time. Table 5 presents the mean of the PCI residuals for the DOT and LSTM models.
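The bias and variance of a set of residuals can be computed directly from the definition above, with the residual taken as predicted minus observed. The sketch uses Python's standard `statistics` module, and the sample values are made up for illustration. It does not reproduce the hypothesis test reported in the study.

```python
from statistics import mean, pvariance


def residuals(predicted, observed):
    """Model residuals, defined as predicted minus observed."""
    return [p - o for p, o in zip(predicted, observed)]


def bias(predicted, observed):
    """Sample estimate of the expected residual."""
    return mean(residuals(predicted, observed))


def residual_variance(predicted, observed):
    """Population variance of the residuals about their mean."""
    return pvariance(residuals(predicted, observed))


# Made-up PCI values: residuals are 2, -2, and 5.
predicted = [72.0, 68.0, 75.0]
observed = [70.0, 70.0, 70.0]
example_bias = bias(predicted, observed)
example_variance = residual_variance(predicted, observed)
```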

Conclusions
The deterioration models of the historical pavement condition data for the state of Iowa were developed using an LSTM approach. The proposed model and current method in Iowa DOT were compared to investigate the model accuracy.

• The comparison between the developed model and the individual regression models used by the Iowa DOT for the three different pavement types indicates that the prediction accuracy of the LSTM model is higher than that of the individual regression models.
• The LSTM achieved a higher PCI prediction accuracy than the individual regression models for all three pavement types.
• A hypothesis analysis of the mean was conducted for the PCI residuals in both techniques, and the results exhibited less bias for the LSTM than for the individual regression models.
• Each of these two methods has its own advantages and disadvantages. The equations of the individual regression models require an annual update, and each section will exhibit a new year-by-year behavior, making the prediction process more complex. The LSTM, by contrast, is a single, more consistent model that applies to all sections through one training process. However, the LSTM approach was sensitive to the data fluctuation resulting from unrecorded maintenance activities.
• While the evaluation of the regression models was restricted to residuals between the fitted functions and the actual readings, the evaluation of the LSTM was based on its ability to predict full performance curves not included during the training stage.