Smart Structural Health Monitoring of Flexible Pavements Using Machine Learning Methods

Construction of roads of any class, whether freeways, highways, major roads, or minor roads, must be accompanied by constant monitoring and evaluation of service delivery. Engineers generally assess pavements in terms of smoothness, surface condition, structural condition, and surface safety. Pavement assessment is often conducted using qualitative indices such as the international roughness index (IRI), pavement condition index (PCI), structural condition index (SCI), and skid resistance value (SRV), which are used for smoothness, surface condition, structural condition, and surface safety assessment, respectively. In this paper, the Tehran-Qom Freeway in Iran is selected as the case study, and its smoothness and pavement surface conditions are assessed. At 2-km intervals, a 100-meter sample unit is selected in the slow-speed lane (118 sample units in total). In each sample unit, the PCI is calculated after a visual inspection of the pavement and the recording of distresses, and the average IRI is then computed. The purpose of this study is to provide a method for estimating PCI based on IRI. The proposed theory was developed using the Random Forest (RF) and Random Forest optimized by Genetic Algorithm (RF-GA) methods, which were validated using the correlation coefficient (CC), scattered index (SI), and Willmott’s index of agreement (WI) criteria. The proposed method aims to reduce costs, save time, and eliminate safety risks.


Introduction
Pavement is the foundation of modern mobility. Today, pavement is seen as a complex structure that is influenced by a variety of environmental and loading conditions. Pavement maintenance has therefore become of utmost importance for saving resources and energy [1]. In fact, a pavement network requires constant maintenance and repair because of the gradual deterioration induced by factors such as aging and fatigue under traffic flow [2]. Maintaining the desired service level of a pavement network requires constant evaluation by engineers. The common pavement assessment methods are summarized below [3, 4]:
• Assessment of pavement roughness (determining ups and downs)
• Assessment of surface conditions (determining surface distresses)
• Assessment of the pavement structural condition (determining the load capacity)
• Assessment of pavement safety condition
In this paper, we focus on the roughness and surface assessment of pavement. The surface roughness of pavement is a primary concern for driving quality [5,6]. Because pavement roughness escalates the vertical stresses acting on the pavement and exacerbates pavement fatigue, roughness can be seen as a factor contributing to the aggravation of pavement distresses. Besides, pavement roughness reveals deformations in the pavement surface, which influence road drainage and driving safety [7]. On the other hand, any pavement distress worsens the pavement roughness [8]. It can therefore be argued that there is a two-way causal relationship between pavement distresses and pavement roughness, and that they affect each other directly [9,10].
Transportation agencies and departments worldwide generally employ indices to evaluate pavement. Two common indicators that are extensively used for assessing the roughness and surface conditions of pavement are the international roughness index (IRI) and the pavement condition index (PCI), respectively. IRI is a numerical index provided by the World Bank, which is derived by dividing the pavement roughness by the longitudinal distance [11,12]. IRI is a characterization of longitudinal road roughness, which is generally estimated by the road surface profiler (RSP) device [13,14]. PCI is a numerical index that rates the pavement from zero (worst condition) to 100 (perfect condition). Each pavement is initially assigned a score of 100, from which points are deducted based on the type, severity, and extent of its distresses [15,16].
Today, new methods have been developed to determine PCI, including image-based approaches and laser scanning. Unfortunately, in some parts of the world, including Iran, new equipment and technologies are not available to engineers because of existing limitations. Engineers in these regions must use the traditional method (assessment of pavement surface distresses by a pavement inspector) to determine PCI. Pavement engineers are therefore always concerned about financing and about finding expert inspectors. Moreover, the process of visually evaluating distresses, recording them, performing calculations, and ultimately determining PCI is laborious and time-consuming. The last and most important concern of pavement engineers is safety during the surface inspection of pavement, when inspectors are at risk of collision with road traffic. Drivers and passengers may also face safety issues as a result of the traffic disruption.
This paper is the final part of a study conducted to evaluate the relationship between IRI and PCI. In phase one (the experimental phase), the IRI and PCI of the case study were calculated. The second phase uses artificial intelligence methods to develop a model relating the IRI and PCI determined in the previous phase. To develop the proposed theory, the results of the roughness and surface condition assessment of the Tehran-Qom Freeway in Iran were used. On this freeway, a 100-meter sample unit was selected from the slow-speed lane at 2-km intervals. For each sample unit, IRI was calculated by RSP and PCI was determined from the inspection of surface distresses. The IRI and PCI values were then analyzed with the Random Forest (RF) and Random Forest optimized by Genetic Algorithm (RF-GA) methods. To validate the results, the correlation coefficient (CC), scattered index (SI), and Willmott's index of agreement (WI) criteria were used.
The proposed theory is intended to largely eliminate the concerns of pavement engineers about the traditional process of determining PCI. In this theory, PCI is computed from IRI. The drawbacks of the traditional method of computing PCI, namely that it is time-consuming, costly, tedious, variable, labor-intensive, and prone to human error, would thereby be removed. Given the speed of RSP, the proposed theory enables the roughness and surface condition of the pavement to be assessed in a short time, which naturally curtails the time required for pavement assessment. Another strength of the proposed theory is that it resolves the safety issues associated with the inspection of pavement surface conditions.

Pavement qualitative indices
Pavement distresses are caused by traffic (loading) and environmental factors while a road is in service. Even in an experimental pavement section that is not subject to traffic, distresses appear after a while as a result of environmental factors [17]. Pavement distress is therefore a critical issue that engineers should consider during the maintenance period. For the effective maintenance of pavement, engineers often use surface quality indicators. Each qualitative index, defined based on certain distresses, indicates the quality of the pavement at the time of inspection. Suitable indices therefore allow pavement engineers to plan, prioritize, and allocate road maintenance budgets [15]. Sections 2.1.1 and 2.1.2 introduce two important indices for pavement maintenance, PCI and IRI.

Pavement Condition Index (PCI)
PCI, presented by the US Army Corps of Engineers in the late 1970s, is one of the most recognized qualitative indices of pavement [18]. The PCI is a numerical index that evaluates pavement conditions based on pavement surface distresses and reflects structural integrity and surface operational conditions. The value of this index varies from zero to 100, with zero representing the worst conditions and 100 representing perfect conditions [15,19]. ASTM evaluates pavement condition based on PCI, as displayed in Table 1. For each pavement sample unit, the maximum PCI value (100) is initially allocated, and then, based on the pavement surface condition, the maximum corrected deduct value (CDV) is deducted from 100 (see Eq. 1) [19]:

$$\mathrm{PCI} = 100 - \mathrm{CDV}_{\max} \qquad (1)$$

where CDVmax is the maximum CDV based on the type, severity, and extent of distresses. For each type of distress, the deduct values (DVs) are read from specific curves. Then, the number of DVs is reduced to the maximum number allowed, and the number of DVs greater than 2 (q) is determined. Finally, CDV is calculated using q, TDV (the sum of DVs), and the related curves. Readers can refer to [19] for more details.
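The last step of the procedure can be illustrated with a minimal sketch. It assumes that the candidate CDVs have already been read from the ASTM correction curves, which are not reproduced here; the function names and the numbers are hypothetical and serve only to show how Eq. 1 and the count q are applied.

```python
def count_q(deduct_values, threshold=2.0):
    """Number of individual deduct values greater than the threshold (q)."""
    return sum(1 for dv in deduct_values if dv > threshold)


def pci_from_cdv(cdv_candidates):
    """Return PCI = 100 - CDVmax for one sample unit (Eq. 1)."""
    if not cdv_candidates:
        return 100.0  # no recorded distress: the unit keeps the full score
    cdv_max = max(cdv_candidates)
    # PCI is defined on a 0-100 scale, so clamp out-of-range input.
    return max(0.0, min(100.0, 100.0 - cdv_max))


# Hypothetical values for illustration only, not taken from the case study.
deduct_values = [18.0, 9.5, 1.5]    # DVs read from the distress curves
cdv_candidates = [24.0, 27.3, 18.0]  # CDVs obtained for successive q values

q = count_q(deduct_values)
pci = pci_from_cdv(cdv_candidates)
print(f"q = {q}, PCI = {pci:.1f}")
```

The curve lookups themselves (DV from distress density, CDV from TDV and q) are deliberately left out, since they depend on the charts in [19].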

International Roughness Index (IRI)
Effective maintenance of the pavement network to ensure the safety of transportation requires awareness of the roughness of the pavement surface [20]. In pavement management systems (PMSs), surface roughness data are used for planning at the project level and the network level. At the project level, the data are used for quality control during the pavement construction phase and for determining areas with excessive roughness. At the network level, roughness data are used to determine the permissible roughness limits, to divide the pavement network into uniform sections, and to prioritize maintenance operations [15,21]. All types of pavement have roughness. Even newly constructed pavements that have not been exposed to traffic load may have some roughness because of poor construction quality. Generally, pavement roughness increases as a result of traffic loading and environmental conditions [15,22].
The most common index of longitudinal roughness of pavements and the driving quality is the International Roughness Index (IRI), which has been introduced and used for nearly three decades. This indicator was proposed by the World Bank in 1982, based on the International Road Roughness Experiment (IRRE) [23,24]. IRI is a quantitative index of pavement smoothness, which in addition to driving quality, determines the pavement performance [20,23,25]. IRI is based on the simulation of a Quarter-Car System (QCS) driving at a speed of 80 km/h [15,20]. Figure 1 shows the quarter-car system.
Figure 1. Quarter-car system in the IRI calculation process [26]
IRI is the average rectified slope, which is generally expressed in terms of mm/m or m/km [20,27]:

$$\mathrm{IRI} = \frac{\text{accumulated suspension vertical motion}}{\text{distance traveled during the test}} \qquad (2)$$

According to FHWA, in-service pavements can be evaluated based on the IRI value, as shown in Table 2.
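As a minimal sketch of the ratio in Eq. 2, the function below divides an accumulated suspension motion by the distance traveled. The quarter-car simulation that produces the accumulated motion is not shown, and the function name and input values are hypothetical.

```python
def iri_from_motion(accumulated_motion_mm, distance_m):
    """IRI as accumulated suspension motion divided by distance (Eq. 2).

    With motion in mm and distance in m the result is in mm/m,
    which is numerically equal to m/km.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return accumulated_motion_mm / distance_m


# Hypothetical input: 150 mm of accumulated motion over a 100 m sample unit.
iri = iri_from_motion(150.0, 100.0)
print(f"IRI = {iri:.2f} m/km")
```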

Related studies
In this section, a review of research on the assessment of smoothness and surface conditions of pavement is presented.
In 1998, Sharaf and Fathy conducted one of the oldest studies on IRI and PCI in the North Atlantic [29]. In 2002, a relation was offered to estimate IRI based on pavement distresses in the Bay Area, California [30]: where IRI is the international roughness index (m/km) and PCI is pavement condition index. Lin et al. analyzed IRI and pavement distresses in 2003. They used a three-layer neural network for their analysis, the results of which are as follows [9]:
• Severe potholes, digging/patching, and rutting have a strong correlation with IRI,
• Man-holes, stripping, and corrugation have a medium correlation with IRI,
• Cracking, alligator cracking, bleeding, and road level have a low correlation with IRI.
Park et al. undertook a study to investigate the relationship between surface distress and roughness in asphalt pavements. They presented a power regression model for the association between IRI and PCI [10]: where K1 and K2 are regression coefficients.
In 2008, the Korean Institute of Construction Technology presented NHPCI after investigating its national highways [31]: where NHPCI is National Highway Pavement Condition Index, XIRI is international roughness index (m/km), XRD is rut depth (mm), and XCR is Crack ratio (%). Shah et al. proposed the Overall Pavement Condition Index (OPCI) after inspecting 10 pavement sections of approximately 30 km in urban roads of Noida city. OPCI was developed based on four main parameters of "distress, roughness, structural capacity, and skid resistance" [32]: where OPCI is the overall pavement condition index, PCIDistress is pavement condition index related to distress, PCIStructure is pavement condition index related to structural capacity, PCISkid is pavement condition index related to skid resistance, and PCIRoughness is pavement condition index related to roughness.
$$\mathrm{PCI}_{\mathrm{Roughness}} = 1.227 \times \mathrm{IRI}^{2} - 17.73 \times \mathrm{IRI} + 100$$

The Korean Expressway Corporation Research Institute introduced the highway pavement condition index (HPCI) based on evaluations made by a team of experts [31]: where HPCI is the highway pavement condition index, SD is surface distress (m²), RD is rut depth (mm), and IRI is international roughness index (m/km). Arhin et al., drawing on data derived from the District Department of Transportation, which was collected from 2009 to 2012, presented a model of calculating PCI using IRI in dense urban areas. This relation is as follows [33]: where A and K are constant coefficients, and ε is model error.
The coefficients of the above model are calculated based on the road functional classification and the type of pavement, as shown in Tables 3 and 4.

Case Study
The pavement sections investigated in this paper were selected from both directions of the Tehran-Qom Freeway. The total length of the freeway is 236 km, which comprises two 118-kilometer routes from Tehran to Qom and vice versa. This freeway has 3 lanes in each direction with a width of 3.65 meters per lane. It is further divided into two branches, Tehran-Qom and Qom-Tehran. The sample units under study, which are 100 meters in length and 3.65 meters in width, were selected at 2-km intervals from the slow-speed lane. After field inspection of the sample units, the surface distress conditions were determined, and the results are presented in Tables 5 and 6. For each sample unit, PCI was calculated using the method presented in the ASTM standard. In the next step of the pavement assessment, the sample units were surveyed by a two-laser road surface profiler (RSP) device, and the IRI was calculated per unit. The IRI of each unit is the average IRI over that unit.
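The sampling scheme can be checked with a short sketch. It assumes that the first sample unit of each direction starts at km 0, which the text does not state; the constant and function names are illustrative only.

```python
UNIT_LENGTH_M = 100     # length of one sample unit
LANE_WIDTH_M = 3.65     # width of the slow-speed lane
SPACING_KM = 2          # distance between the starts of consecutive units
ROUTE_LENGTH_KM = 118   # length of each direction
DIRECTIONS = ("Tehran-Qom", "Qom-Tehran")


def sample_units(first_station_km=0):
    """List the sample units of both directions.

    The position of the first unit (km 0) is an assumption of this sketch.
    """
    units = []
    for direction in DIRECTIONS:
        for start_km in range(first_station_km, ROUTE_LENGTH_KM, SPACING_KM):
            start_m = start_km * 1000
            units.append({
                "direction": direction,
                "start_m": start_m,
                "end_m": start_m + UNIT_LENGTH_M,
            })
    return units


units = sample_units()
unit_area_m2 = UNIT_LENGTH_M * LANE_WIDTH_M
print(len(units), "sample units of", unit_area_m2, "square meters each")
```

Under this assumption, each 118-km direction contributes 59 units, which gives the 118 sample units reported for the study.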

Random Forest (RF)
As a Machine Learning (ML) algorithm, RF uses many Classification and Regression Trees (CARTs) for the prediction process [63,71-73]. This algorithm was proposed by Breiman in 2001 and belongs to the family of ensemble learning methods [73,74]. The main idea of ensemble learning is to construct and combine multiple base learners to obtain better generalization ability [75]. CART is a type of decision tree that is used as the base learner in RF [74]. In the training process, each CART is built from a random subset of the original dataset [54,76]. In CART construction, tree splitting continues until the minimum node impurity is reached. The node impurity is the total squared error between the actual and predicted values [77,78]. Figure 2 shows the concept of RF regression.
Figure 2. The random forest concept [72].
As a strength, RF is able to determine the relative importance of the inputs for output estimation. Another strength of RF is that it overcomes the overfitting of regression trees [54,79]. Two parameters are set when an RF is built:
• Number of regression trees (ntree)
• Number of different predictors in each node (mtry)
Eventually, the RF produces an average prediction (from all CARTs) as the output [54,63,79].
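The ideas above (trees grown on random resamples, splits chosen to minimize the squared error, and an averaged output) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the software used in the study: the data are synthetic, bootstrap resampling stands in for the random subset, and because there is a single predictor (IRI), mtry is necessarily 1 and only ntree is exposed.

```python
import random
from statistics import mean


def sse(values):
    """Node impurity: total squared error of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, min_leaf=2, max_depth=4, depth=0):
    """Grow a regression tree on (x, y) pairs with one predictor x."""
    ys = [y for _, y in rows]
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(ys)  # leaf: predict the mean of the node
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, threshold)
    if best is None:
        return mean(ys)
    threshold = best[1]
    left_rows = [r for r in rows if r[0] <= threshold]
    right_rows = [r for r in rows if r[0] > threshold]
    return (threshold,
            build_tree(left_rows, min_leaf, max_depth, depth + 1),
            build_tree(right_rows, min_leaf, max_depth, depth + 1))


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(rows, ntree=50, seed=0):
    """Grow ntree trees, each on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(rows) for _ in rows]) for _ in range(ntree)]


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in forest)


# Synthetic (IRI, PCI) pairs for illustration only, not the case-study data.
rows = [(i / 10, 100.0 - i) for i in range(40)]
forest = fit_forest(rows, ntree=50)
print(round(predict_forest(forest, 2.0), 1))
```

In library implementations the same two parameters appear under other names; in scikit-learn's `RandomForestRegressor`, for instance, they correspond to `n_estimators` and `max_features`.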

Random Forest optimized by Genetic Algorithm
In this study, RF is optimized by the GA. Developed by Holland (1992) and Goldberg (1989) [84], GAs are part of the family of evolutionary algorithms [51,85]. These algorithms are inspired by biological evolution mechanisms such as selection, crossover, and mutation [51,84-87]. The typical form of GA includes three steps [84-86]:
1. Initial population generation
2. Computation of fitness
3. Construction of the new generation
In step 1, GA produces a population, or generation, that includes a series of decision parameters. Then, GA examines the fitness value (target function) of each individual of the initial population (step 2). In step 3, GA generates the new generation through the selection, crossover, and mutation processes. Steps 2 and 3 are then repeated until the maximum number of generations or the desired accuracy is reached [85,86,88,89]. The accuracy of the methods was evaluated using three performance criteria: the correlation coefficient (CC), scattered index (SI), and Willmott's index of agreement (WI), as expressed in Eqs. (10) to (12) [54].
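The three steps can be sketched as a small search over integer parameters. This is a generic illustration, not the configuration used in the study: `toy_error` is a hypothetical stand-in for the validation error of an RF, the parameter bounds are invented, and elitism (copying the best individual unchanged) is a design choice of this sketch.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Minimize `fitness` over integer parameters (needs >= 2 parameters)."""
    rng = random.Random(seed)

    # Step 1: initial population of random parameter vectors.
    population = [[rng.randint(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: fitness of every individual (lower is better here).
        ranked = sorted(population, key=fitness)
        history.append(fitness(ranked[0]))

        # Step 3: new generation by selection, crossover, and mutation.
        parents = ranked[: pop_size // 2]   # selection: keep the better half
        children = [ranked[0][:]]           # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, len(bounds) - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:    # mutation
                    child[i] = rng.randint(lo, hi)
            children.append(child)
        population = children

    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_error(individual):
    """Hypothetical stand-in for an RF validation error."""
    ntree, depth = individual
    return (ntree - 120) ** 2 / 100 + (depth - 6) ** 2


bounds = [(10, 300), (1, 20)]   # assumed ranges for ntree and tree depth
best, history = genetic_search(toy_error, bounds)
print(best, history[0], history[-1])
```

Replacing `toy_error` with a function that trains an RF for the given parameters and returns its error on held-out data would turn the sketch into a hyperparameter search of the kind described above.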

Validation of the modeling
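Eqs. (10) to (12) are not reproduced in this text. As an assumption, the forms commonly used for these three criteria, written with the symbols defined in this section, are:

```latex
\begin{align}
\mathrm{CC} &= \frac{\sum_{i=1}^{n} PCI_{O_i}\,PCI_{P_i} - \frac{1}{n}\sum_{i=1}^{n} PCI_{O_i}\sum_{i=1}^{n} PCI_{P_i}}
{\sqrt{\left(\sum_{i=1}^{n} PCI_{O_i}^{2} - \frac{1}{n}\Bigl(\sum_{i=1}^{n} PCI_{O_i}\Bigr)^{2}\right)
\left(\sum_{i=1}^{n} PCI_{P_i}^{2} - \frac{1}{n}\Bigl(\sum_{i=1}^{n} PCI_{P_i}\Bigr)^{2}\right)}} \tag{10} \\
\mathrm{SI} &= \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(PCI_{P_i} - PCI_{O_i}\right)^{2}}}{\overline{PCI_{O}}} \tag{11} \\
\mathrm{WI} &= 1 - \frac{\sum_{i=1}^{n}\left(PCI_{O_i} - PCI_{P_i}\right)^{2}}
{\sum_{i=1}^{n}\left(\left|PCI_{P_i} - \overline{PCI_{O}}\right| + \left|PCI_{O_i} - \overline{PCI_{O}}\right|\right)^{2}} \tag{12}
\end{align}
```

The assignment of the numbers (10) to (12) follows the order in which the criteria are named in the text and is likewise an assumption.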
In these criteria, $PCI_{O_i}$, $PCI_{P_i}$, $\overline{PCI_{O}}$, and n are the ith observed PCI, the ith PCI predicted by the model, the mean of the observed PCI values, and the number of observed PCI values, respectively. The CC shows the correlation between the observed and predicted values. CC values range from -1 to +1, where +1 and -1 indicate complete correlation, and positive and negative values mean direct and inverse correlation, respectively. SI depicts the model error, and WI is an index between 0 and 1. Assuming a linear distribution of the error, the closer the absolute value of CC and the value of WI are to 1, and the closer the SI value is to zero, the greater the model accuracy [54].
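The three criteria can be computed directly from paired observed and predicted values. The sketch below assumes the commonly used definitions of CC, SI (root-mean-square error divided by the observed mean), and WI; the function names and the sample values are hypothetical.

```python
from math import sqrt


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    so, sp = sum(observed), sum(predicted)
    sop = sum(o * p for o, p in zip(observed, predicted))
    soo = sum(o * o for o in observed)
    spp = sum(p * p for p in predicted)
    numerator = sop - so * sp / n
    # Raises ZeroDivisionError if either series is constant.
    return numerator / sqrt((soo - so * so / n) * (spp - sp * sp / n))


def scatter_index(observed, predicted):
    """Root-mean-square error divided by the mean of the observations."""
    n = len(observed)
    rmse = sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def willmott_index(observed, predicted):
    """Willmott's index of agreement (1 means perfect agreement)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    denominator = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, p in zip(observed, predicted))
    return 1 - numerator / denominator


# Hypothetical PCI values for illustration only.
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 69.0, 83.0, 88.0]
print(correlation_coefficient(observed, predicted),
      scatter_index(observed, predicted),
      willmott_index(observed, predicted))
```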

Results and Discussion
The analysis methods, RF and RF-GA, were presented in Section 2.4, and this section presents their results. The statistical characteristics of the IRI and PCI indices are presented in Table 7. The statistics were extracted with IBM SPSS 23 software (version 2015, International Business Machines Corporation (IBM), Armonk, New York, U.S.). The mean IRI is 1.843; therefore, the average condition of all sections is fair (see Table 2). The mean PCI (72.746) indicates that the average condition of all sections is satisfactory (see Table 1). The maximum and minimum values help to understand the range of section conditions. Standard deviation is a measure of dispersion that shows how far, on average, the data lie from the mean. For a dataset, a standard deviation close to zero means that the data scattering is low, whereas a high standard deviation means that the data scattering is high [90,91].
Before the correlation coefficient is calculated, the normality of the data must be determined. Therefore, the skewness, the kurtosis, and the significance level of the Kolmogorov-Smirnov test were calculated. Kurtosis and skewness are measures of the "tailedness" and of the asymmetry of the probability distribution of a real-valued random variable, respectively. A negative (positive) skew value indicates that the tail on the left (right) side of the distribution is longer than the tail on the other side and that the bulk of the values lie to the right (left) of the mean. The skewness of a normal distribution is zero, but zero skewness alone does not prove that the data are normally distributed [54,90,92]. Kurtosis is also described as a measure of the peakedness of the distribution. The kurtosis of a normal distribution is three; relative to this reference (excess kurtosis), negative and positive values refer to flat-topped and high-peaked distribution curves, respectively [90]. In the literature, when the skewness and kurtosis values are between -2 and 2, the data distribution is considered normal. As a non-parametric test, the Kolmogorov-Smirnov test helps to determine whether the data are normal: the data are normal when the significance level (sig.) is above 0.05 [91]. Table 7 indicates that IRI is closer to a normal distribution than PCI. However, both IRI and PCI are non-normal because their sig. values are less than 0.05.
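The skewness and kurtosis screening can be sketched as follows. The sketch uses the population (moment) formulas and reports excess kurtosis, so a normal distribution scores zero on both measures; statistical packages such as SPSS apply small-sample corrections, so their values differ slightly. The Kolmogorov-Smirnov test is not included, and the function names and sample data are hypothetical.

```python
def central_moment(data, order):
    """Population central moment of the given order."""
    m = sum(data) / len(data)
    return sum((x - m) ** order for x in data) / len(data)


def skewness(data):
    """Third standardized moment (zero for a symmetric distribution)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


def within_rule_of_thumb(data, limit=2.0):
    """Screening rule from the text: both measures between -2 and 2."""
    return abs(skewness(data)) <= limit and abs(excess_kurtosis(data)) <= limit


# Hypothetical sample for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(skewness(sample), excess_kurtosis(sample), within_rule_of_thumb(sample))
```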
The choice of the correlation test depends on the normality of the data: if even one of the variables is non-normal, the Spearman test is used [91]. The correlation between IRI and PCI is 15%, which is a weak value.
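For reference, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal implementation under that definition; the function names and the sample values are hypothetical and are not the case-study data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (IRI, PCI) values for illustration only.
iri_values = [1.2, 1.5, 1.9, 2.4]
pci_values = [90.0, 78.0, 80.0, 61.0]
print(round(spearman(iri_values, pci_values), 3))
```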
RF has seven main parameters: the number of trees (A), maximal depth (B), confidence (C), minimal leaf size (D), minimal size for split (E), number of prepruning alternatives (F), and subset ratio (G). The quality of RF performance depends on the values of these parameters. Table 8 lists the values of these parameters for the RF as well as the values optimized by the GA. Table 9 shows the three performance criteria introduced in this paper. Based on the description in Section 2.5 and the values in Table 9, it can be concluded that GA can improve the RF results. Because of the poor values of CC, SI, and WI, however, the theory assumed by the authors does not prove practical. The authors had assumed that, after the experimental phase of the study, a high-quality model for predicting PCI from IRI could be created.
There is no standard method for splitting data into training and testing sets. For example, Nabipour et al. [54] utilized 70% of their data for training, whereas Mohammadzadeh et al. [55], Shamshirband et al. [93], and Samadarianfard et al. [94] applied 70%, 67%, and 80% of the total data to develop their models. In this study, the dataset includes 118 pavement segments, of which approximately 70% (i.e., 83 segments) are used for training and the remaining 35 segments are used for testing.
Figure 3 depicts the PCI predicted by the RF and RF-GA methods as well as the PCI calculated in the experimental phase of the study. Figure 4 shows the PCI calculated in the experimental phase against the PCI predicted by the models used in this paper. A model delivers acceptable prediction performance with high accuracy when its fit line approaches y = x, for which the intercept is evidently zero [54]. Figure 4 is plotted for the test dataset. Although GA improves RF, Figure 4 shows that the slopes of the trend lines of both models are very low and that the accuracy of both models is low.
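Two computations from this discussion can be sketched briefly: the 70/30 split of the 118 segments, and the least-squares trend line used to compare predicted and observed PCI with the ideal line y = x. Random shuffling before the split is an assumption of this sketch, since the text does not say how segments were assigned; the function names are illustrative.

```python
import random


def split_units(unit_ids, train_fraction=0.7, seed=0):
    """Shuffle the unit ids and split them into training and testing sets."""
    ids = list(unit_ids)
    random.Random(seed).shuffle(ids)   # random assignment is an assumption
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


train_ids, test_ids = split_units(range(118))
print(len(train_ids), len(test_ids))

# A perfect model reproduces the observations, so its trend line is y = x.
slope, intercept = fit_line([60.0, 70.0, 80.0], [60.0, 70.0, 80.0])
print(slope, intercept)
```

A slope far below one, as reported for both models, means that the predictions vary much less than the observed PCI values.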

Conclusion
This paper is the final report of a research study in the field of pavement engineering. Initially, the authors assumed the main theory: for flexible pavements, surface conditions can be predicted from roughness conditions. For easy implementation of the theory, the authors used two standard indices representing the surface and roughness conditions of pavements: PCI for surface conditions and IRI for roughness conditions. This work was conducted in two phases: phase 1, the experimental phase, and phase 2, the application of artificial intelligence to analyze the results of phase 1. In the experimental phase, the Tehran-Qom Freeway was selected as the case study. The authors adopted 118 sample units of this freeway and calculated their IRI and PCI indices. The IRI of the units was computed by a two-laser RSP, and the PCI of the units was then determined after inspection of the units, recording of distresses, and the related calculations. In phase 2, AI methods were used to provide an applied model consistent with the theory of the study. Among the methods used, the best results were obtained from the RF and RF-GA methods. The authors used three criteria, CC, SI, and WI, to validate the modeling results. For the RF method, the values of these three criteria were -0.177, 0.296, and 0.281, respectively, whereas for the RF-GA method the values were -0.031, 0.238, and 0.297.
New inspection methods such as laser scanning and image-based techniques have been developed; however, in many parts of the world, especially Iran, scientific, political, and financial constraints do not allow engineers access to these types of equipment. Regions that lack the new equipment therefore have to use the traditional methods for pavement evaluation. The theory proposed in this study is intended to allow pavement engineers to minimize these constraints. In the proposed theory, two important pavement indices, IRI and PCI, are calculated simultaneously with the help of only RSP and artificial intelligence methods. As a result, the challenges of the traditional PCI calculation process (time-consuming, very costly, tedious, variable, labor-intensive, and prone to human error) would be eliminated.
Unfortunately, the proposed theory was not fully successful in the second phase: the RF and RF-GA methods do not show acceptable accuracy in predicting PCI. Given the benefits of the proposed theory, the authors suggest that it merits the attention of other researchers in future studies. The first suggestion of the authors is to study more freeways and highways, especially ones selected from different climates, so that the final model is comprehensive. Also, if other nondestructive (ND) pavement assessment equipment (GPR, RWD, TSD, etc.) is available, it is recommended to use its data in the modeling. The use of ND equipment, which interferes less with the flow of traffic, will eliminate the challenges of traditional methods.