Comparative Analysis of Machine Learning Models for Prediction of Remaining Service Life of Flexible Pavement

Abstract: Prediction of the remaining service life (RSL) of pavement is a challenging task for road maintenance and transportation engineering. The RSL estimates the time until a major repair or reconstruction becomes essential. The conventional approach to predicting RSL relies on non-destructive tests. These tests, in addition to being costly, interfere with traffic flow and compromise operational safety. In this paper, surface distresses of pavement are used to estimate the RSL to address the aforementioned challenges. To implement the proposed theory, 105 flexible pavement segments are considered. For each pavement segment, the type, severity, and extent of surface damage and the pavement condition index (PCI) were determined. The pavement RSL was then estimated using non-destructive tests, including the falling weight deflectometer (FWD) and ground-penetrating radar (GPR). After completing the dataset, modeling was conducted to predict RSL using three techniques: support vector regression (SVR), support vector regression optimized by the fruit fly optimization algorithm (SVR-FOA), and gene expression programming (GEP). All three techniques estimated the RSL of the pavement using the PCI as input. The correlation coefficient (CC), Nash–Sutcliffe efficiency (NSE), scattered index (SI), and Willmott's index of agreement (WI) criteria were used to examine the performance of the three techniques adopted in this study. In the end, it was found that GEP, with values of 0.874, 0.598, 0.601, and 0.807 for CC, SI, NSE, and WI, respectively, had the highest accuracy in predicting the RSL of pavement.


Introduction
Pavement performance prediction models are an essential part of pavement management systems (PMSs) [1][2][3][4]. In addition to estimating the future condition of the pavement, these models help pavement engineers in the following [5,6]: (1) determining the optimal time for maintenance, rehabilitation, and reconstruction (MR&R) activities, (2) suggesting the most economical MR&R strategies, (3) estimating the required budget for MR&R activities, and (4) anticipating the results of various strategies. Until the late 1950s, little attention was paid to pavement performance, and pavements were often categorized simply as satisfactory or unsatisfactory [7]. With the development of computational methods and the introduction of machine learning (ML) methods, many researchers have studied and developed pavement performance prediction models. These models differ in two respects: first, the index adopted as the criterion for pavement performance, and second, the approach selected for modeling.
In general, pavement management activities are split into two categories [18]: network-level management and project-level management. Project-level management specifies the roads that need to be repaired, the repair process, and the repair timetable. Therefore, predicting the future condition of pavement is essential for project management [18]. Forecasting future pavement conditions requires ongoing pavement assessment and inspection, which in turn improves the quality of maintenance operations [19,20]. On the other hand, network-level management focuses on determining the budget required to preserve the pavement network at the standard level. Hence, it is indispensable to determine the RSL of pavement at this level of management [18]. Various factors such as traffic, characteristics of pavement materials, subgrade properties, climatic conditions, and maintenance quality influence the deterioration of road pavement. As a result, pavement service life is surrounded by uncertainties that complicate the prediction of RSL [21].
The previous paragraph established the necessity of determining RSL. Besides RSL, another common indicator of pavement performance is PCI, which represents the general condition of the pavement surface and ranges from zero for a practically unusable pavement to 100 for a flawless pavement (according to ASTM D-6433-07) [22]. PCI is determined based on pavement inspection results in terms of the type, severity, and extent of distresses [23,24]. After a thorough inspection of pavement surface distresses by an experienced inspector, PCI is determined according to the related standard [22]. PCI is an index adopted in the project-level pavement management process. Hence, this index is specified before entering network-level pavement management. Conversely, the RSL of pavement, which is one of the pillars of network-level pavement management, is determined by applying the falling weight deflectometer (FWD) non-destructive test. In this test, a traffic lane is first blocked. Then an impulsive load is applied to the pavement to induce pavement surface deflections. By analyzing these deflections, the RSL is calculated. In addition to the characteristics of the pavement layer materials (Young's modulus and Poisson's ratio) required for back-analysis in stress and strain calculations, a key requirement in calculating the RSL with this method is knowing the thickness of the pavement layers for the analysis of deflections. The thickness of the pavement layers is determined using the ground-penetrating radar (GPR) non-destructive test. As a result, two non-destructive tests are required to determine the RSL of pavement. Traffic interference during FWD and GPR tests should also not be forgotten.
In light of the above, the current method of determining RSL is not only costly but also compromises the safety of users due to traffic interference during testing. Given the limited budgets of transportation agencies, determining the RSL of pavement to manage a pavement network is one of their ongoing concerns [20]. Considering the necessity of RSL determination for PMSs as well as the problems in the current procedure, a new pavement performance model is proposed in this paper. In this study, three methods, namely SVR, SVR-FOA (support vector regression optimized by the fruit fly optimization algorithm), and GEP, have been applied to predict the RSL of road pavements. Given the importance and widespread use of PCI in PMSs, RSL modeling was based on PCI.
The main objective of this study was to save time, reduce costs, and increase safety in the RSL determination process. Because PCI is already calculated during project-level activities, using it for a network-level operation such as RSL estimation allows activities to overlap and saves time. The methods presented in this paper, by excluding non-destructive tests from the process of determining the RSL of pavement, drastically reduce costs. Moreover, by eliminating non-destructive tests, traffic interference during testing and the potential safety hazards to road users are also eliminated.
The data required for developing the innovative SVR-FOA method proposed in this paper were collected from the Shahrood-Damghan highway in the Semnan province, Iran. The pavement type of this highway is flexible. On this highway, a 100-m long pavement segment was taken from the beginning of each kilometer, and after assessing the surface distresses of each segment, its PCI was calculated. In the next step, FWD and GPR tests were applied to the selected segments to determine the RSL. Finally, using the three methods (SVR, SVR-FOA, and GEP), the RSL modeling based on PCI was implemented.
This paper is organized as follows: Section 2 offers a review of studies on the prediction of RSL. Section 3 introduces the method employed in this study. This section comprises four subsections titled PCI, RSL, machine learning techniques, and case study. In Section 4, the results of the analyses are presented and discussed. Finally, the conclusions are drawn in Section 5.

Literature Review
PMS is an Assessment Management System (AMS) used by road network administrators to maintain the entire network at the desired level. Predicting pavement performance is a key factor in PMSs [25]. Models for predicting the RSL of pavement fall into the category of pavement performance prediction models. In this section, studies carried out by other researchers on predicting RSL are reviewed. Table 1 lists the results of studies that have sought to estimate RSL to date.

Table 1. (Columns: Category, Model Inputs, Equation, Author.)

In Table 1, the models for determining the RSL of pavement are divided into three categories based on the model inputs:

• First category: Models that predict the RSL based on the response (stress and strain) of pavement to the applied loads.
• Second category: Models that predict the RSL based on pavement quality indices.
• Third category: Models that predict the RSL based on the results of pavement non-destructive tests.

Among the above categories, models that predict the remaining service life of pavement based on qualitative indices appear to be more appropriate. This is because such models neither call for the analysis of pavement behavior and response, as in the first category, nor require non-destructive tests, as in the third category. Instead, they estimate the RSL of the pavement by assessing the pavement and calculating a qualitative index in the simplest possible way. In light of the above points, in this paper, PCI has been adopted as a qualitative index for predicting RSL. Setyawan et al. conducted a similar study on the East Line of South Sumatera, the results of which are displayed in Table 1. They evaluated the condition of road performance and damage and determined PCI. Then they calculated the RSL of the pavement using the deflection data acquired from falling weight deflectometer measurements. Finally, the relationship between these two values was examined. The research involved five sections of the route with different damage conditions. After calculating the PCI and RSL values, the relationship between these two parameters was determined with the help of Microsoft Excel software. The model presented in their study has two drawbacks: (1) the number of data points is too small to build a reliable model, and (2) instead of simple regression with Microsoft Excel, more advanced methods such as ML could be used to achieve better results. In contrast, the methods proposed in this paper are based on data gathered from 105 pavement segments and use machine learning methods.
In general, PCI offers a valid index accepted by all transportation agencies around the world, and it is widely used in their evaluations. Compared to the current method of estimating the RSL of pavements (using the two non-destructive FWD and GPR tests), the proposed method provides a far simpler, safer, and less costly way of estimating RSL.

Pavement Condition Index (PCI)
Extensively used for roads, parking lots, and airports, PCI is recognized as a standard practice by many organizations around the world, including the Federal Aviation Administration, the American Public Works Association, and the U.S. Air Force [34]. PCI is a numerical index that expresses the rate of pavement surface distresses. PCI reflects structural integrity and surface operational condition but is not able to measure structural capacity [22].
The first step in determining the PCI of each pavement segment is to determine the type, extent, and severity of surface distresses. The second step is to determine the deduct values (DVs) with the help of the specific curves for each type of distress. Next, reduce the number of DVs to the maximum number allowed ($m_i$) [22,34]:

$$m_i = 1 + \frac{9}{98}\,(100 - HDV_i) \le 10$$

where $HDV_i$ = greatest individual deduct value.
Next, specify the number of DVs greater than 2 (q). In the next step, the corrected deduct value (CDV) is read from a special curve with the help of TDV (the sum of DVs) and q [34]. Then, among the DVs that are larger than 2, decrease the smallest one to 2. At this point, repeat the process of calculating q, computing CDV, and reducing a DV until q is equal to 1. Once the maximum CDV is known, PCI can be calculated [22,34]:

$$PCI = 100 - \max(CDV)$$

Readers can refer to [22] and [34] for more details and the related curves. The classification of a pavement segment based on PCI follows Table 2.
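The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `cdv_lookup` is a hypothetical stand-in for the correction curves of [22] (which must be read from the standard itself), and the handling of the fractional part of the allowed number of deducts follows a common reading of the standard rather than anything stated in this paper.

```python
from typing import Callable, List


def max_allowed_deducts(hdv: float) -> float:
    """Maximum allowable number of deduct values: 1 + (9/98)(100 - HDV), capped at 10."""
    return min(10.0, 1.0 + (9.0 / 98.0) * (100.0 - hdv))


def pci_from_deducts(dvs: List[float], cdv_lookup: Callable[[float, int], float]) -> float:
    """Return PCI = 100 - max CDV for one segment.

    cdv_lookup(tdv, q) is a HYPOTHETICAL function standing in for the
    corrected-deduct-value curves; it is not part of the source paper.
    """
    dvs = sorted(dvs, reverse=True)
    if not dvs:
        return 100.0  # no distress recorded
    m = max_allowed_deducts(dvs[0])
    whole = int(m)
    kept = dvs[:whole]
    # Assumption: the next deduct value counts only in proportion to the
    # fractional part of m.
    if len(dvs) > whole:
        kept.append(dvs[whole] * (m - whole))
    cdvs = []
    while True:
        q = sum(1 for dv in kept if dv > 2.0)
        cdvs.append(cdv_lookup(sum(kept), max(q, 1)))
        if q <= 1:
            break
        # Reduce the smallest deduct value that is still above 2 down to 2.
        idx = min((i for i, dv in enumerate(kept) if dv > 2.0), key=lambda i: kept[i])
        kept[idx] = 2.0
    return 100.0 - max(cdvs)
```

A call such as `pci_from_deducts([20, 10, 1], my_curve)` would evaluate the curve twice (q = 2, then q = 1) and subtract the larger CDV from 100; `my_curve` is a placeholder for a digitized version of the real correction curves.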

Remaining Service Life (RSL)
The RSL of pavement under operation is a key factor in implementing PMSs. It is because learning about the future conditions of the pavement network is essential for decision making, life cycle cost analysis, planning, and budget allocation [14,35]. In general, the definitions of RSL by different agencies and departments of transportation can be split into two general categories [36]:

• The remaining time to reach a level of distress when the pavement needs to be rehabilitated or reconstructed. For example, the Minnesota Department of Transportation (MnDOT) defines the RSL as the time until the next major rehabilitation.
• The time until pavement conditions reach a specific condition index limit. For example, the Michigan Department of Transportation (MDOT) defines the RSL based on the Michigan Ride Quality Index, assuming an RSL of zero when the said index is 50.

The RSL of pavement segments in this paper has been determined using the heavy falling weight deflectometer (HWD). The HWD is an FWD whose application is not restricted to roads; it can also be used for airport pavement assessment. Figure 1 shows the HWD device employed in this study. The HWD applies a stress similar to that of the standard axle load (8.2 t) to the pavement surface over 10 to 35 ms. A number of geophones are placed on the pavement surface at specified distances from the loading center. The task of the geophones is to record the pavement deflections induced by the load applied with the HWD device. The standard axle load is simulated in the HWD by a series of weight drops on a loading plate placed on the pavement surface [14]. Table 3 gives the details of the HWD test undertaken in this study. The deflections recorded by the geophones are transferred to the central computer in the HWD. On this computer, the analysis of pavement surface deflections is performed using ELMOD6 software. ELMOD uses Miner's Law to calculate critical stresses and strains. Miner's Law is used to combine the damage caused during each season and by each load. With the help of backcalculation analysis, ELMOD provides important outputs including RSL, overlay thickness, and elastic modulus [38].
A prerequisite for estimating RSL in accordance with the process described in this section is knowing the thickness of the pavement layers. To determine the thickness of the pavement layers, GPR non-destructive testing was carried out for all pavement segments. GPR is capable of calculating pavement thickness as a continuous profile by transmitting electromagnetic waves through a transmit antenna and receiving the reflected signals [39]. The GPR receiver takes the reflected waves and shows them as a plot of amplitude against time [40]. The layer thickness ($h_i$) of the pavement is calculated by the following equation [41]:

$$h_i = \frac{c\,\Delta t_i}{2\sqrt{\varepsilon_{r,i}}}$$

where $\Delta t_i$ = time between amplitudes $A_i$ and $A_{i+1}$, c = electromagnetic wave speed through the vacuum, and $\varepsilon_{r,i}$ = the relative dielectric constant of the layer. Table 4 shows the complete details of the GPR experiment carried out in this study.
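As a worked illustration of the layer thickness relation, the following minimal Python sketch converts a two-way travel time and a relative dielectric constant into a thickness. The function name, the choice of nanoseconds as the time unit, and the example values are assumptions made here for illustration; they are not taken from the GPR survey of this study.

```python
import math

# Speed of light in vacuum, expressed in metres per nanosecond.
C_VACUUM_M_PER_NS = 0.299792458


def layer_thickness_m(delta_t_ns: float, rel_permittivity: float) -> float:
    """Thickness h = c * dt / (2 * sqrt(eps_r)) for a two-way travel time dt in ns."""
    if delta_t_ns < 0 or rel_permittivity <= 0:
        raise ValueError("travel time must be >= 0 and permittivity must be > 0")
    return C_VACUUM_M_PER_NS * delta_t_ns / (2.0 * math.sqrt(rel_permittivity))


# Illustrative values only: a 2 ns two-way travel time in a layer with eps_r = 4.
example_thickness = layer_thickness_m(2.0, 4.0)
```

The factor of 2 in the denominator accounts for the wave travelling down through the layer and back up to the receiver.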

Machine Learning Techniques
In this paper, the main objective is to present a new approach for predicting the RSL of flexible pavements based on PCI using machine learning techniques. The data set consisted of the RSL and PCI of all segments under study, which were analyzed using the GEP, SVR, and SVR-FOA methods. GEP is an evolutionary algorithm that investigates the relationship between input and output variables by developing computer programs [44]. Although it differs from both the genetic algorithm (GA) and genetic programming (GP), GEP combines elements of both and was introduced by Ferreira in 2001 [45]. SVR is a supervised machine learning technique employed to solve regression problems. SVR is especially popular due to its good performance in handling nonlinear problems [46]. The success of SVR in problem-solving depends on the proper selection of its basic parameters. SVR's basic parameters include C, ε, and the kernel function parameters [14]. Improper values of the basic parameters in SVR can lead to under-fitting or over-fitting. Thus, their optimum values must be selected while training the SVR model [47]. To this end, different optimization algorithms have been developed and may be used to select suitable values of the SVR parameters. FOA is a swarm intelligence algorithm introduced by Pan in 2012, which uses the food searching strategy of a fruit fly to find the optimal values for the SVR basic parameters [48]. These methods are introduced in the following subsections.

Gene Expression Programming (GEP)
GEP is a development of GP that solves a problem by creating expression trees (ETs). In fact, GEP is an evolutionary algorithm for creating computer programs. The resulting computer programs have sophisticated tree structures that learn and adapt, much like a living organism, by changing their size, shape, and composition. Like living organisms, GEP programs are encoded as simple fixed-length linear chromosomes. Hence, GEP is a genotype-phenotype system that employs a simple genome to store and transmit genetic information and adopts a complex phenotype to explore and adapt to the environment. The genome consists of a chromosome, or fixed-length string, that combines one or more genes of the same size. Each gene in a chromosome encodes a sub-expression tree (Sub-ET). In GEP, all Sub-ETs are linked through the root with connection functions. The connection functions in GEP include division, multiplication, subtraction, and addition [49]. Figure 2 shows an instance of the genotype-phenotype structure in GEP. These genes, despite their fixed length, code for ETs of varying size and shape. This implies that the size of the coding region varies from one gene to another to allow for progressive adaptation and evolution. Each gene has a coding region called the open reading frame (ORF), which, after being expressed as an expression tree, provides a solution to the problem [51]. Figure 3 demonstrates the coding region (ORF), the non-coding region, and the expression tree for a gene. Like other evolutionary approaches, GEP begins by randomly generating the chromosomes of the initial population. In the first population, each chromosome is assessed based on the fitness function and receives a fitness value. Various fitness functions have been used in GEP, including root relative squared error (RRSE), relative square error (RSE), root mean square error (RMSE), and mean square error (MSE) [44]. Fitter chromosomes are more likely to be selected for the next generation. After being selected, chromosomes are modified by genetic operators (including transposition, inversion, mutation, recombination, and gene crossover) and then reproduced. This process continues until a suitable solution is found or the maximum number of generations is reached [44,53].
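To make the distinction between the coding region (ORF) and the non-coding region of a gene concrete, the following Python sketch decodes a single fixed-length gene into an expression tree and evaluates it. It assumes the breadth-first (Karva) gene notation commonly used in GEP and a function set limited to the four arithmetic operators; the gene string and terminal names are hypothetical and are not taken from the model built in this study.

```python
import operator

# Assumed function set: four binary operators. Any other symbol is a terminal.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def orf_length(gene: str) -> int:
    """Count the leading symbols that form the coding region (ORF) of the gene."""
    open_slots, used = 1, 0
    while open_slots > 0:
        arity = 2 if gene[used] in FUNCTIONS else 0
        open_slots += arity - 1  # one slot is filled, 'arity' new slots open
        used += 1
    return used


def evaluate_gene(gene: str, terminals: dict) -> float:
    """Decode the ORF breadth-first into an expression tree and evaluate it."""
    orf = gene[:orf_length(gene)]
    children, next_child = [], 1
    for symbol in orf:
        arity = 2 if symbol in FUNCTIONS else 0
        children.append(list(range(next_child, next_child + arity)))
        next_child += arity

    def value(node: int) -> float:
        symbol = orf[node]
        if symbol in FUNCTIONS:
            left, right = children[node]
            return FUNCTIONS[symbol](value(left), value(right))
        return terminals[symbol]

    return value(0)


# Hypothetical 7-symbol gene: the first 5 symbols are the ORF, the last 2 are non-coding.
result = evaluate_gene("+*abcab", {"a": 1.0, "b": 2.0, "c": 3.0})  # (b * c) + a
```

Because only the ORF is expressed, a mutation in the trailing non-coding symbols leaves the tree unchanged, whereas a mutation inside the ORF can change both the size and the shape of the tree while the gene length stays fixed.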

Support Vector Regression (SVR)
The SVR is a support vector machine (SVM) applied to regression problems. SVR has been widely used for civil engineering problems [54,55]. Supervised learning techniques such as SVR rely on structural risk minimization (SRM), whereas conventional neural networks use empirical risk minimization (ERM). ERM minimizes the error on the training samples, whereas SRM minimizes an upper bound on the generalization error. As a result, SVM is capable of overcoming the deficiencies of conventional neural networks [14,46]. The main idea in SVR is to map nonlinear information into a higher dimensional space and then solve a linear regression problem in the new space [46,56,57]. In the new space, a simple linear kernel function is adopted to solve the problem. However, in complex problems, a simple linear kernel function will be inadequate. The kernel function k(x, z) for all $x, z \in X$ is defined as follows [46]:

$$k(x, z) = \langle \phi(x), \phi(z) \rangle$$

Each kernel function must have two features [46]:

1. Symmetry: $k(x, z) = \langle \phi(x), \phi(z) \rangle = \langle \phi(z), \phi(x) \rangle = k(z, x)$.
2. Compliance with the Cauchy-Schwarz criterion: $k(x, z)^2 = \langle \phi(x), \phi(z) \rangle^2 \le \|\phi(x)\|^2 \, \|\phi(z)\|^2$.
These two conditions guarantee that the new space can be defined by the kernel function. The best-known kernel functions are the polynomial, radial basis function, linear, and sigmoid kernels [57]. Given the above explanation, it is clear that SVR requires an appropriate function to explain the nonlinear relationship between input ($x_i$) and output ($y_i$) [57]:

$$f(x_i) = w^{T}\phi(x_i) + b$$

where $\phi(x_i)$ = transformation function, w = weight, and b = bias. w and b are obtained by minimizing the following function, which is known as the regularized risk function [57]:

$$R(C) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} L_\varepsilon\big(y_i, f(x_i)\big)$$

where $\frac{1}{2}\|w\|^2$ = regularization term, C = penalty coefficient, and $L_\varepsilon(y_i, f(x_i))$ = ε-insensitive loss function:

$$L_\varepsilon\big(y_i, f(x_i)\big) = \begin{cases} 0, & |y_i - f(x_i)| \le \varepsilon \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise} \end{cases}$$

where ε = permitted error threshold.
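To solve the optimization problem, two slack variables $\xi_i$ and $\xi_i^*$ are defined [57]:

$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

subject to

$$y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \qquad w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0.$$

We now need to define a Lagrange function based on the objective function and boundary conditions [57]:

$$\max_{\alpha,\,\alpha^*} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*)$$

subject to $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$.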
Therefore, the regression function can be written as follows [57]:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

where $k(x_i, x)$ = kernel function. Figure 4 summarizes the overall structure of the SVR. In general, SVR performance depends on its parameters. The SVR parameters are [59,60]:

• ε: this parameter controls the width of the ε-insensitive zone used to fit the training data. The value of ε can affect the number of support vectors used to build the regression function. For larger ε, the estimates are 'flatter' and fewer support vectors are chosen.
• C: this parameter specifies the trade-off between the complexity of the model and the degree to which deviations larger than ε are tolerated in the optimization formulation.
• γ: this parameter determines the relation between error minimization and smoothness of the estimated function.
These parameters are chosen by the user based on prior knowledge of SVR, so this method is not suitable for non-professional users. Various algorithms have been developed to optimize the values of the SVR parameters. In the following subsection, the optimization algorithm used in this paper is introduced.
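The role of ε and γ can be illustrated with a minimal Python sketch of the ε-insensitive loss and of the kernel form of the regression function for a single scalar input such as PCI. The radial basis function is written here as exp(−γ(x − z)²), which is a common convention and an assumption on our part; the support points, dual coefficients, and bias in the example are made-up numbers, not the fitted model of this study.

```python
import math
from typing import Sequence


def eps_insensitive_loss(y: float, f: float, eps: float) -> float:
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y - f) - eps)


def rbf_kernel(x: float, z: float, gamma: float) -> float:
    """Assumed RBF form: k(x, z) = exp(-gamma * (x - z)^2) for scalar inputs."""
    return math.exp(-gamma * (x - z) ** 2)


def svr_predict(x: float, support_x: Sequence[float], dual_coef: Sequence[float],
                b: float, gamma: float) -> float:
    """f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b, with dual_coef[i] = alpha_i - alpha_i*."""
    return sum(d * rbf_kernel(xi, x, gamma) for xi, d in zip(support_x, dual_coef)) + b


# Made-up numbers for illustration only.
inside_tube = eps_insensitive_loss(10.0, 10.3, eps=0.5)   # deviation 0.3 is within eps
outside_tube = eps_insensitive_loss(10.0, 12.0, eps=0.5)  # deviation 2.0 exceeds eps
prediction = svr_predict(60.0, support_x=[40.0, 60.0, 80.0],
                         dual_coef=[-0.5, 1.0, 0.5], b=15.0, gamma=0.01)
```

Only training points whose residual lies on or outside the ε-tube end up with a non-zero dual coefficient, which is why a wider tube tends to leave fewer support vectors.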

Fruit Fly Optimization Algorithm (FOA)
FOA is an optimization algorithm developed based on the food search behavior of the Drosophila insect [48]. The fruit fly has superior senses of smell and vision, which distinguish it from other insects. Fruit flies track the smell of food sources dispersed through the air and head towards it. This insect is even capable of tracking smells from a distance of 40 km. When approaching the food source, the fruit fly uses its sense of vision to locate the food and other fruit flies. The best information is shared among the fruit flies, and finally, the route leading to the food source is identified [61]. Figure 5 illustrates the process of food search by a fruit fly. In this paper, this optimization algorithm has been selected to find the optimal values of the SVR parameters, and the resulting method is known as SVR-FOA. The flowchart of the SVR-FOA method is illustrated in Figure 6. The general steps of SVR-FOA can be summarized as follows [63,64].
Step 2. Parameter initialization, including the maximum number of iterations, the location of the initial population ($X_{axis}$, $Y_{axis}$), the population size, and the random flight distance domain.

Step 3. Population initialization. A random location ($X_i$, $Y_i$) and food-finding distance are assigned to each fruit fly:

$$X_i = X_{axis} + RandomValue, \qquad Y_i = Y_{axis} + RandomValue$$

where i = 1, ..., population size.

Step 4. Population evaluation. The distance from the origin to the food source ($D_i$) and the smell concentration parameter ($S_i$) are calculated:

$$D_i = \sqrt{X_i^2 + Y_i^2}, \qquad S_i = \frac{1}{D_i}$$

Step 5. Replacement. The value $S_i$ is substituted into the fitness function (the smell concentration judgment function) so that the smell concentration at each fruit fly location can be obtained:

$$Smell_i = Function(S_i)$$

Step 6. Detect the maximal smell concentration. At this point, the fruit fly with the highest smell concentration is identified and located within the population:

$$[bestSmell, \; bestIndex] = \max(Smell). \quad (22)$$

Step 7. Keep smell concentration. The coordinates of the maximum smell concentration are stored, and the fruit fly swarm flies in that direction:

$$Smellbest = bestSmell, \qquad X_{axis} = X(bestIndex), \qquad Y_{axis} = Y(bestIndex). \quad (23)$$

Step 8. Iterative optimization. Steps 3 to 6 are repeated until the smell concentration no longer improves on the previous one or the maximum number of iterations set in Step 2 is reached.

Step 9. Output the optimum parameters of SVR.
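The steps listed above can be sketched as a short, generic Python routine. This is a simplified illustration rather than the implementation used in this study: the smell function is passed in by the caller, the swarm always runs for the full number of iterations instead of stopping early, and all names and default values are assumptions introduced here.

```python
import math
import random
from typing import Callable


def foa_maximize(smell_fn: Callable[[float], float], pop_size: int = 20,
                 max_iter: int = 100, flight_range: float = 1.0, seed: int = 0):
    """Return (best_s, best_smell) found by a basic fruit fly search."""
    rng = random.Random(seed)
    # Step 2: random initial swarm location.
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_s, best_smell = None, float("-inf")
    for _ in range(max_iter):  # Step 8: iterate
        best_fly = None
        for _ in range(pop_size):
            # Step 3: random flight around the current swarm location.
            x = x_axis + rng.uniform(-flight_range, flight_range)
            y = y_axis + rng.uniform(-flight_range, flight_range)
            # Step 4: distance to the origin and smell concentration parameter.
            d = math.hypot(x, y)
            if d == 0.0:
                continue
            s = 1.0 / d
            # Step 5: evaluate the smell (fitness) function.
            smell = smell_fn(s)
            # Step 6: track the best fly of this iteration.
            if best_fly is None or smell > best_fly[0]:
                best_fly = (smell, s, x, y)
        # Step 7: move the swarm only if the best smell improved.
        if best_fly is not None and best_fly[0] > best_smell:
            best_smell, best_s = best_fly[0], best_fly[1]
            x_axis, y_axis = best_fly[2], best_fly[3]
    return best_s, best_smell


# Toy smell function with its maximum at S = 2 (illustrative only).
best_s, best_smell = foa_maximize(lambda s: -(s - 2.0) ** 2)
```

In an SVR-FOA setting, the smell function would typically train an SVR with parameters derived from S and return the negative validation error; the exact mapping from S to C, ε, and γ is not specified here and would have to be chosen by the user.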

Case Study
To implement the theory proposed in this paper, a stretch of 105 km of the Shahrood-Damghan highway in Iran was selected and inspected. Given that this highway is part of the route between Tehran (the capital of Iran) and Mashhad (the second most important city of Iran), it constitutes one of the major roads. The highway consists of two lanes in each direction and has a flexible pavement. A prerequisite of implementing the proposed theory is to select a number of sample segments from this highway. To do so, segments 100 m in length and 7 m in width were selected from the beginning of each kilometer. By inspecting the selected segments, the surface distress data (type, severity, and extent) of each segment were recorded in the assessment forms. The PCI of each segment was calculated as described in Section 3.1. With PCI known, the RSL of the pavement segments needs to be determined. Hence, HWD and GPR tests were performed on all segments and the RSL of each segment was determined. Figure 7 shows the location of the Shahrood-Damghan highway as well as the starting and ending points of the segments under study on a Google map.

Results and Discussion
The results of the analysis are presented in this section. First, the statistical specifications of the input and output modeling variables are listed in Table 5. The data were extracted from IBM SPSS 23 software (version 2015, International Business Machines Corporation (IBM), Armonk, New York, U.S.). As shown in Table 5, the mean PCI of all pavement segments is 59.97, which, according to Table 2, indicates a fair overall state of the segments. The mean RSL of pavement is 17.77 years. For interpreting the mean RSL, it is worth noting that ELMOD6 software does not suggest the application of an overlay layer for this RSL. Hence, the average RSL of the segments is fairly desirable. Before calculating the correlation coefficient of the modeling input and output, the normality of the data must be determined. This is determined by the kurtosis and skewness coefficients, as well as the results of the Kolmogorov-Smirnov test. In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Moreover, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, negative, or undefined. A zero-skewness dataset has perfect symmetry (as in a perfectly normal distribution). A positive/negative skew value indicates that the tail on the right/left side of the distribution is longer than that on the left/right side and that the bulk of the values lie to the left/right of the mean. Kurtosis acts as a measure of the peakedness of the distribution. For a perfectly normal distribution, kurtosis should be zero. Positive kurtosis means a high-peaked distribution curve, and negative kurtosis refers to a flat-topped distribution curve [65,66]. It is very difficult to achieve zero skewness and kurtosis, so the data distribution is considered normal when the skewness and kurtosis values are between −2 and 2 [67]. So, it is clear from Table 5 that PCI, with a lower kurtosis value, was closer to a normal distribution than RSL. The Kolmogorov-Smirnov test is a nonparametric test that helps to determine whether the data are normal. If the significance level (sig.) in this test is above 0.05, the data are normal [67]. When choosing a correlation test, the Spearman test is used if even one of the variables is not normally distributed. According to Table 5, since PCI has a normal distribution but RSL does not, the Spearman correlation test must be used. The correlation between PCI and RSL is 57.2%, which represents a moderate value.
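The normality screening and the rank correlation described above can be reproduced with a small Python sketch. It uses plain moment-based definitions of skewness and excess kurtosis, which may differ slightly from the bias-adjusted formulas reported by statistical packages such as SPSS, and it computes the Spearman coefficient as the Pearson correlation of average ranks. All function names and the sample values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness m3 / m2^1.5 (no small-sample correction)."""
    m = _mean(values)
    m2 = _mean([(v - m) ** 2 for v in values])
    m3 = _mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    """Moment-based kurtosis minus 3, so a normal distribution gives zero."""
    m = _mean(values)
    m2 = _mean([(v - m) ** 2 for v in values])
    m4 = _mean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = _mean(rx), _mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def roughly_normal(values: Sequence[float]) -> bool:
    """Screening rule quoted in the text: skewness and kurtosis both within [-2, 2]."""
    return abs(skewness(values)) <= 2.0 and abs(excess_kurtosis(values)) <= 2.0
```

Because the Spearman coefficient works on ranks, it captures any monotonic relationship between PCI and RSL and is not affected by the non-normal shape of the RSL distribution.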
As noted in Section 3.3.2, SVR has three basic parameters (C, ε, and γ), and the quality of SVR performance depends on the values selected for these three parameters. Table 6 shows the values of these basic parameters for the SVR as well as the values optimized by the FOA algorithm. Table 7 shows the characteristics of the GEP model used in this study, including the model parameters and genetic operators. It should be noted that the default parameters of Gene Expro Tools 4.0 software (Version 2.0, Gepsoft Limited, Bartolomeu Messines, Portugal) were selected for further computations. Equation (25) shows the proposed formula of the GEP method, extracted from Gene Expro Tools 4.0, for estimating pavement remaining service life in terms of PCI.
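The results of scientific research are generally assessed with indicators that exhibit the accuracy and error of the analysis. In this study, four criteria, namely the Correlation Coefficient (CC), Scattered Index (SI), Nash-Sutcliffe efficiency (NSE), and Willmott's Index of agreement (WI), were used to determine the quality of the outputs [68,69]:

$$CC = \frac{\sum_{i=1}^{n} (RSL_{Oi} - \overline{RSL_O})(RSL_{Pi} - \overline{RSL_P})}{\sqrt{\sum_{i=1}^{n} (RSL_{Oi} - \overline{RSL_O})^2 \, \sum_{i=1}^{n} (RSL_{Pi} - \overline{RSL_P})^2}}$$

$$SI = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (RSL_{Pi} - RSL_{Oi})^2}}{\overline{RSL_O}}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n} (RSL_{Oi} - RSL_{Pi})^2}{\sum_{i=1}^{n} (RSL_{Oi} - \overline{RSL_O})^2}$$

$$WI = 1 - \frac{\sum_{i=1}^{n} (RSL_{Oi} - RSL_{Pi})^2}{\sum_{i=1}^{n} \left( |RSL_{Pi} - \overline{RSL_O}| + |RSL_{Oi} - \overline{RSL_O}| \right)^2}$$

where $RSL_{Oi}$ = ith observed RSL value, $RSL_{Pi}$ = ith predicted RSL value, $\overline{RSL_O}$ = average of $RSL_{Oi}$, and $\overline{RSL_P}$ = average of $RSL_{Pi}$.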
The CC is a number in the range of [−1, +1], with values of +1 and −1 indicating a complete correlation between the observed and predicted values. Positive values represent a direct correlation and negative values an inverse correlation. As the absolute value of CC approaches zero, the strength of the correlation decreases. SI represents an error, and smaller values indicate lower modeling errors. The highest possible NSE value is one, and values close to one indicate greater modeling accuracy, so NSE = 1 represents the best modeling quality. WI is an index between 0 and 1, with values close to one suggesting higher modeling accuracy.
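For readers who want to reproduce the four criteria, the following Python sketch computes them from paired lists of observed and predicted values. It assumes the standard textbook definitions of CC, SI (root mean square error divided by the observed mean), NSE, and WI; the function names and the three-point example are illustrative and are not data from this study.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def correlation_coefficient(obs: Sequence[float], pred: Sequence[float]) -> float:
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))


def scattered_index(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error normalized by the mean of the observations."""
    rmse = math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))
    return rmse / _mean(obs)


def nash_sutcliffe(obs: Sequence[float], pred: Sequence[float]) -> float:
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def willmott_index(obs: Sequence[float], pred: Sequence[float]) -> float:
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential


# Illustrative values only (not data from the study).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = {
    "CC": correlation_coefficient(observed, predicted),
    "SI": scattered_index(observed, predicted),
    "NSE": nash_sutcliffe(observed, predicted),
    "WI": willmott_index(observed, predicted),
}
```

Note that CC measures only how well the predictions co-vary with the observations, so a model can reach a high CC while still being biased; SI, NSE, and WI all penalize such bias through the squared prediction errors.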
Table 8 reports the four criteria introduced above for all three methods employed in this study. To shed further light on the results of Table 8, a three-dimensional bar histogram of the criteria is presented in Figure 8. Based on the description in the preceding paragraph and the values in Table 8, it can be concluded that FOA has improved the SVR results. Comparing the SVR-FOA and GEP modeling results, it can be contended that the GEP results are partially superior in the modeling proposed in this paper. Moreover, although the CC values of the studied methods are not very high, their lower SI values mean that they can be used with acceptable accuracy in RSL estimation. There is no standard method for splitting data into training and testing sets. For instance, Choubin et al. [70] used 63% of their data for model development, whereas Shamshirband et al. [71] used 67%, Mohammadzadeh et al. [72] 70%, and Samadarianfard et al. [73] 80% of the total data to develop their models. In this study, the dataset contains 105 pavement segments, of which approximately 70% (i.e., 75 segments) are used for training and the remaining 30 segments are used for testing. Figure 9 shows the RSL predicted by the three methods (SVR, SVR-FOA, and GEP) as well as the RSL measured in the HWD test for the segments selected for testing. As can be seen in Figure 9, the studied models do not accurately predict the maximum and minimum RSL values, which may affect the scheduling of maintenance actions. Figure 10 displays the predicted RSL values versus the RSL values calculated by the HWD test for all three machine learning techniques adopted in this paper. In this regard, the method with the best prediction accuracy is the one whose fitted line is y = x, meaning that the line slope is equal to one and its intercept is equal to zero.
Figure 10 is plotted for the test dataset. Figure 10 shows that, although the slopes of the trend lines of all studied methods are lower than 0.5, the accuracy of GEP is higher than that of the SVR and SVR-FOA models. The presented Taylor diagram shows the superior accuracy of the GEP model, since its corresponding point (the magenta point) lies closest to the green point representing the observations. Overall, by examining Figures 8-11, it can be concluded that the SVR technique offers an average accuracy for the purpose of this article. Using the FOA algorithm to select the basic parameters of this technique significantly enhanced its accuracy. On the other hand, the GEP method provides a formula for RSL prediction. Re-examining Figures 8-11 shows that both the SVR-FOA and GEP methods yielded desirable accuracy for RSL prediction. However, the accuracy of the GEP method was slightly higher than that of the SVR-FOA method.

Conclusions
Pavement management at both the project and network levels is always associated with substantial costs. Because of the budget constraints faced by the organizations in charge of PMS, optimizing pavement management costs is one of their priorities. RSL is a crucial factor for pavement management at the network level. The current procedure for determining RSL involves FWD and GPR tests. These tests are not only costly but also interfere with traffic flow and compromise the safety of road users. The aim of this study was to present a new approach for predicting the RSL of flexible pavement that eliminates the drawbacks of current methods. The proposed method can therefore lead to lower costs, less time consumed, and increased safety. The idea of estimating RSL has been pursued by various researchers, whose studies differ mainly in methodology and methods of analysis. After a review of previous studies on estimating pavement RSL, we decided to use pavement surface distresses as the criterion for predicting RSL. Therefore, the pavement PCI was employed as the input variable in modeling pavement RSL. Modeling was carried out with ML techniques, which is the innovation of this research. The dataset used for modeling was collected from the Shahrood-Damghan highway in Iran. After selecting 105 pavement segments from the highway, the PCI and RSL of all segments were determined. Once the dataset was complete, modeling was conducted using the GEP and SVR techniques. The results of modeling with these techniques were evaluated based on four criteria, namely CC, SI, NSE, and WI, to determine the most appropriate technique for estimating pavement RSL. After examining all four criteria, it was found that the GEP outcomes were far more accurate than those of the SVR. Then, to improve the accuracy of the SVR method, the FOA optimization algorithm was employed, adding a third technique (namely SVR-FOA) as an innovative model to the methods applied in this paper. Again, the four criteria CC, SI, NSE, and WI revealed a significant improvement in the accuracy of the SVR-FOA method compared to the SVR method, yet the GEP method still had the highest prediction accuracy. In sum, the findings of this paper suggest that the GEP method (with values of 0.874, 0.598, 0.601, and 0.807 for the four criteria CC, SI, NSE, and WI, respectively) offers an alternative to current methods of predicting pavement RSL. Based on the desired accuracy and a cost-benefit analysis, transportation agencies can use the GEP method to optimize model accuracy, robustness, and reliability, either for new projects or for calibrating existing models.
In general, this study highlights the usefulness of artificial intelligence in transportation engineering. The results show that artificial intelligence techniques can be used as an optimization tool (in terms of time and cost) in pavement management systems. In terms of pavement engineering, this study concluded that the RSL of pavement, a fundamental factor in planning pavement network maintenance, is a flexible factor whose estimation is not limited to current cost-intensive methods. With methods similar to the one proposed in this paper, the RSL can be determined quickly, which will ease the future planning of road maintenance activities.

Symbol definitions for the earlier RSL models:
εt = horizontal tensile strain at the bottom of the asphalt layer; EAC = modulus of asphalt; a, b, and c = constant regression coefficients.
Hossain & Wu (2002): εt = tensile strain at the bottom of the asphalt layer; K and c = regression coefficients.
IRI model: initial IRI (where age is zero); b = curvature of the performance line.
Model based on the results of the nondestructive test: pavement surface curvature; material constants; Di = deflection of the pavement surface at a distance of i cm from the center of the loading plate in the FWD test.

Figure 1. The heavy falling weight deflectometer (HWD) used in this study for determining the remaining service life (RSL).

Figure 6. The flowchart of the support vector regression optimized by the fruit fly optimization algorithm (SVR-FOA) method [62].

Figure 8. Three-dimensional bar graphs of the statistical parameters.

Figure 9. Observed and estimated values of RSL with SVR, SVR-FOA, and GEP models for test data.

Figure 10. The scatter plots of RSL calculated by HWD and RSL estimated by the SVR, SVR-FOA, and GEP models for test data.

Figure 11 presents the Taylor diagram used in this study. Introduced by Taylor in 2001, the Taylor diagram is a mathematical diagram that allows a graphical comparison of several models of a system. The diagram contains three families of contours [74]:
• Blue contours show the Pearson correlation coefficient.
• Orange contours indicate the RMS error, which is proportional to the distance from the green point on the horizontal axis labeled "observed".
• Black contours indicate the standard deviation, which is proportional to the radial distance from the origin.
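The three quantities shown by these contours are linked geometrically, which is why one point per model can display all of them. The sketch below computes them and checks the law-of-cosines relation that underlies the diagram, E'^2 = σ_o^2 + σ_p^2 − 2 σ_o σ_p R, where E' is the centered RMS difference (the RMS error after removing the mean of each series). The function name `taylor_statistics` and the sample values are hypothetical and are not taken from this study.

```python
import numpy as np

def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered RMS difference)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    std_o = o.std()                       # population standard deviation (ddof=0)
    std_p = p.std()
    r = np.corrcoef(o, p)[0, 1]           # Pearson correlation coefficient
    # Centered RMS difference: RMS error after subtracting each series' mean.
    crmsd = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))
    return float(std_o), float(std_p), float(r), float(crmsd)

# Hypothetical RSL values in years (illustrative only, not data from the study).
observed = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
predicted = np.array([3.0, 4.0, 7.0, 7.0, 9.0])

std_o, std_p, r, crmsd = taylor_statistics(observed, predicted)

# Law of cosines linking the three contour families of the diagram.
reconstructed = std_o**2 + std_p**2 - 2.0 * std_o * std_p * r
print(f"std_obs={std_o:.3f} std_pred={std_p:.3f} R={r:.3f} E'={crmsd:.3f}")
print("identity holds:", np.isclose(crmsd**2, reconstructed))
```

A model's point lies at radius σ_p and at an angle whose cosine is R, so its distance from the observed point on the horizontal axis equals E'. A point closer to the observed point therefore indicates better overall agreement.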

Figure 11. Taylor diagrams of estimated RSL for all models.

Table 5. Statistical characteristics of the utilized data.

Table 6. Parameters of the SVR and SVR-FOA models.

Table 7. Characteristics of the GEP model.