1. Introduction
Transportation infrastructure is an essential enabler of mobility and economic development in modern societies and a pillar of the economy [1]. Pavement, as a component of transportation infrastructure, plays a significant role in roadway safety and performance. In the United States alone, the paved road network covers approximately 5 million kilometers, enabling the transport of people and large volumes of freight [2]. Pavements deteriorate over time for various reasons, such as aging, increasing traffic volumes, and severe weather events, so keeping them in operational condition is essential. Still, in 2023, around 39% of roads in the United States were in poor or mediocre condition, and only 45% were classified as being in good condition. Driving on such deteriorated pavements costs each driver around $1400 annually in vehicle operating costs and lost time [3]. In 2023, roughly 41,000 people died in vehicle traffic crashes, and poor pavement condition may have contributed to some of these fatalities [4]. Improving pavement conditions can address these issues.
Maintenance and rehabilitation (M&R) practices are required to keep pavements in acceptable performance condition. Effective M&R planning relies on precise forecasting of future pavement conditions, achievable through pavement condition indicators [5]. Pavement surface roughness, quantified by the international roughness index (IRI), is one of the most widely used indicators and has been extensively employed in previous research; M&R scheduling models typically minimize the IRI of pavement sections over a planning horizon [6]. Given the importance of the IRI, many studies have applied various models to predict its trend over time [7]. Although numerous IRI prediction models have been developed, modeling the uncertainty of IRI predictions has not received enough attention. This study applies conformal prediction intervals to fill this research gap.
The primary objectives of this study are as follows:
Accurate IRI prediction: All the vital variables are included in the modeling, and ten prediction methods (spanning ensemble learning, machine learning, and deep learning) are compared to identify the best-performing method for IRI prediction.
Identification of the top determinants of IRI: After detecting the best-performing prediction method, it is coupled with an interpretation technique to capture the relative influence of the variables on IRI.
Optimal uncertainty model: Various conformal prediction methods at different target coverage levels are compared using a multi-criteria decision-making method to identify the optimal uncertainty model.
Uncertainty parameters in IRI prediction: A further analysis is performed on the results of the optimal uncertainty model to determine which parameters drive uncertainty in IRI prediction.
2. Literature Review
Many prediction models have been applied to predict IRI. Most of these models involve several factors to investigate their influence on pavement roughness. Age is one of the primary factors widely used in previous studies. Al-Suleiman and Shiyab [8] presented two regression-based IRI prediction models for slow and fast traffic lanes separately, with pavement age as the primary independent variable. The coefficient of determination (R²) was 0.80 for the slow-lane model and 0.61 for the fast-lane model, suggesting a strong correlation between pavement age and IRI. Similarly, Tsunokawa and Schofer [9] proposed an IRI model with initial IRI and pavement age as independent variables. Models that rely only on initial conditions and pavement age, however, may not yield the most accurate predictions, so many studies have incorporated traffic loads and structural features as additional parameters to improve IRI prediction models.
Traffic-associated variables such as the equivalent single axle load (ESAL) and cumulative ESAL (CESAL) have been commonly used in past IRI prediction models [10,11,12,13,14,15,16]. Alongside the traffic-associated variables, structural characteristics have been represented by the structural number (SN), which has emerged as a major factor in many IRI prediction models [10,11,12,14,15]. For example, Choi et al. [11] proposed IRI prediction models based on CESAL, asphalt content, and SN using multiple linear regression (MLR) and a backpropagation neural network. The MLR model had a correlation coefficient (R) of 0.46, whereas the backpropagation neural network reached an R of 0.87, indicating higher predictive performance.
In another study, Jaafar and Fahmi [14] collected data from 34 long-term pavement performance (LTPP) observations and used initial IRI, pavement age, ESAL, construction number, and SN as independent variables to develop two IRI prediction models, one based on MLR and the other on an artificial neural network (ANN), achieving R² values of 0.25 and 0.80, respectively.
Furthermore, in addition to structural and traffic parameters, climatic characteristics have been included in IRI modeling. Common climatic parameters are the average annual temperature, the number of freeze–thaw cycles per year, the freezing index, total annual snowfall, and annual precipitation [12,13,16]. For instance, using average annual precipitation together with ESAL and SN, Albuquerque and Nuñez [12] developed two MLR-based IRI models and achieved an R² of 0.87. Khattak et al. [13] developed a regression-based IRI model for flexible pavements using climatic parameters (precipitation and temperature) as well as initial IRI, CESAL, and pavement thickness on a dataset of 623 observations; the developed model reached an R² of 0.47. Dalla Rosa et al. [17] developed a regression-based IRI prediction model that included climatic conditions, treatment type, subgrade properties, traffic loadings, and pavement type as independent variables, with a root mean square error (RMSE) of 0.21 m/km.
Multiple studies have highlighted that pavement distress ought to be included as an independent variable in IRI modeling [18,19,20,21]. Hence, some studies have incorporated pavement distress variables in their IRI prediction models. For example, Owolabi et al. [19] developed an MLR-based IRI prediction function using only three distress variables: rut depth, patches, and longitudinal crack length.
Although some research has focused strictly on pavement distress, other studies have included additional attributes alongside the distress indicators. For example, Abdelaziz et al. [21] combined initial IRI and pavement age with three pavement distress indicators (fatigue cracking, transverse cracking, and the standard deviation of rut depth) to create two IRI models, one using MLR and the other using an ANN. The R² values were 0.57 for the MLR-based model and 0.75 for the ANN model.
As the above studies show, MLR and ANN have been the most widely used algorithms for IRI prediction. However, other machine learning algorithms can outperform MLR and ANN in prediction accuracy, and researchers have begun to apply them to improve prediction performance. For example, Marcelino et al. [22] used random forest regression to develop a long-term framework that predicts IRI up to 10 years ahead; the model included climatic, structural, and traffic data and yielded a prediction error of 6.95%. In another publication, Marcelino et al. [23] used the AdaBoost algorithm to develop an IRI model for flexible pavements, considering pavement thickness, average annual daily traffic, SN, and climatic variables, including average annual temperature and precipitation; the AdaBoost-based model achieved the highest predictive accuracy, with an R² of 0.986. Damirchilo et al. [24] applied extreme gradient boosting, random forest regression (RFR), and support vector machine (SVM) algorithms for IRI prediction, obtaining R² values of 0.70, 0.66, and 0.44, respectively; their independent variables included ESAL, pavement age, precipitation, the number of freeze–thaw days, the freeze index, and the number of hot days.
A more in-depth review of the literature shows that earlier studies used only a select number of input features to avoid increasing model complexity. Although integrating all significant categories of variables (initial IRI, lane width, traffic loading, structural properties, climate conditions, and pavement distresses) is needed, the models reported in the literature have rarely employed all of them simultaneously. To address this, this study assembles a comprehensive dataset including all the vital parameters to maximize the performance of IRI prediction models. In addition, some powerful machine learning models, such as categorical boosting and the light gradient boosting machine, have seen limited application to IRI prediction; accordingly, this study employs many machine learning algorithms and compares their prediction performance using various performance indicators. Moreover, to the best of the authors' knowledge, all previous studies applied deterministic models to predict IRI, and uncertainty has not been modeled in this research domain. To address this gap, this study applies a conformal prediction interval framework to investigate the uncertainty in IRI prediction and identify which factors increase it. Unlike previous IRI prediction models, the proposed uncertainty framework can indicate for which data observations and under which conditions the predicted IRI may be unreliable. This feature improves the interpretability and practical usefulness of the model by allowing decision makers to assess the trustworthiness of individual predictions.
3. Methodology
In this section, the dataset used for modeling is first described. Then, the prediction methods are briefly introduced, followed by the performance metrics used to identify the best-performing prediction technique. Afterward, the uncertainty methods are described. Finally, the further analysis used to detect uncertain parameters is presented. The methodology flowchart is shown in Figure 1.
3.1. Datasets
The long-term pavement performance (LTPP) database, containing condition data for 526 flexible pavement sections across the 50 states of the United States, supplied the data required to build the IRI prediction model. The flexible pavement sections are from intercity highways, i.e., roads that connect cities but lie outside their urban centers.
One of the goals of this research is to identify which features are most effective in predicting IRI over time. The selected input variables spanned multiple categories, including initial pavement condition, pavement age, lane width, traffic data, structural data, climate characteristics, and pavement surface distress. Traffic loading was represented by cumulative equivalent single axle loads (CESAL), and structural capacity was represented by the structural number (SN). These variables were selected for two reasons: first, they have been used in previous studies (see the literature review), though never all together; second, they represent all the variables available in the LTPP dataset.
Climatic features included in the model were the average annual temperature, annual freezing index, freeze–thaw cycles per year, and annual precipitation. Summary statistics of the parameters are presented in Table 1, where the final row describes the dependent variable (IRI) used for developing and evaluating the prediction models. Although IRI is the predicted outcome, including its descriptive statistics in the methodology section is essential for understanding the characteristics of the dataset and the modeling context.
In total, 35 independent variables were used to build the IRI prediction model. We extracted pavement sections from the LTPP database with complete records for all 36 variables (the 35 predictors plus IRI) and no maintenance and rehabilitation (M&R) treatments. The data cover a wide range of observation periods, from 106 days to 7 years of IRI measurements per section [25]. Table 1 summarizes the descriptive statistics, including the maximum, minimum, average, and standard deviation of the selected variables.
From this dataset, we created two sample sets: a training set and a testing set. The training set was used to train and tune the machine learning algorithms, while the testing set contained pavement section data unseen during training and was used to evaluate the predictive performance of the developed models [26]. We randomly split the data into 80% training and 20% testing, and applied Optuna with k-fold cross-validation (k = 5) to tune the hyperparameters of the machine learning techniques.
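As an illustration, the split-and-tune workflow described above can be sketched as follows. This is a minimal example, not the authors' exact code; the LightGBM estimator, the search-space bounds, and the number of trials are assumptions for demonstration.

```python
import optuna
from lightgbm import LGBMRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# 80/20 random split of the LTPP observations (X: 35 predictors, y: IRI).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    # Hypothetical search space; the paper does not report the exact bounds.
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
    }
    model = LGBMRegressor(**params)
    cv = KFold(n_splits=5, shuffle=True, random_state=42)  # k = 5, as in the study
    return cross_val_score(model, X_train, y_train, cv=cv,
                           scoring="neg_mean_squared_error").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
```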
3.2. Prediction Techniques
Ten prediction techniques are applied to predict IRI and identify the best-performing technique for this task. These methods are briefly introduced in this section.
3.2.1. K-Nearest Neighbor
K-nearest neighbor (KNN) is a black-box machine learning method that has been widely employed for modeling and prediction since the 1970s. It estimates the dependent variable of a given testing observation from the dependent variables of similar observations in the training set. The K most similar observations are identified through a distance function applied to the independent variables, and their dependent variables are then used to predict the dependent variable of the testing observation [27].
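For illustration, a minimal KNN regressor along these lines (the neighbor count, distance weighting, and metric are assumptions, not values reported in the paper):

```python
from sklearn.neighbors import KNeighborsRegressor

# Predict IRI as the distance-weighted mean of the K nearest training sections.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance", metric="euclidean")
knn.fit(X_train, y_train)
iri_pred = knn.predict(X_test)
```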
3.2.2. Linear Regression
Linear regression (LR) is a common and well-known statistical method typically used to measure relationships between independent (input) variables and a dependent (output) variable. In recent years, LR has also gained favor as a machine learning algorithm [28]. LR generates linear models in which the dependent variable (y) is represented as a linear combination of one or more independent variables (x). A model with one independent variable is called simple linear regression; with two or more independent variables, it is called multiple linear regression. The simple linear regression model can be represented as:

$$y = \beta_0 + \beta_1 x \quad (1)$$

Equation (1) is the mathematical description of a straight line; $\beta_0$ is the y-intercept and $\beta_1$ designates the slope of the line.

With multiple linear regression, the model extends from a line to a plane and, in more than two dimensions, to a hyperplane. The general representation of the multiple linear regression model is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \quad (2)$$

In Equation (2), the model consists of n predictor variables, represented as $x_1, x_2, \ldots, x_n$, with coefficients $\beta_1, \beta_2, \ldots, \beta_n$.
3.2.3. Multi-Layer Perceptron
The multi-layer perceptron (MLP) is a deep learning method that introduces a more complex architecture to artificial neural networks [29]. MLPs are mainly used for supervised learning and can be thought of as feedforward neural networks that map one or more inputs to one or more outputs. The MLP consists of multiple layers of perceptrons, with the nodes of consecutive layers connected like a directed graph.

Historically, both the adaline (adaptive linear neuron) rule and the perceptron rule were used to train single-layer feedforward networks, in which the connection weights are adjusted according to a function applied to a linear combination of the inputs. An MLP is a fully connected neural network, typically including three types of layers: input, hidden, and output. The input layer receives the information and passes it to the hidden layers, the hidden layer(s) apply non-linear transformations to the inputs using activation functions, and the output layer produces the final prediction [30].
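A minimal sketch of such a network for IRI regression (the layer sizes, activation function, and iteration cap are illustrative assumptions):

```python
from sklearn.neural_network import MLPRegressor

# Two hidden layers with ReLU activations; the output layer is linear.
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   max_iter=2000, random_state=42)
mlp.fit(X_train, y_train)
iri_pred = mlp.predict(X_test)
```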
3.2.4. Decision Tree
Decision tree regression is a type of supervised learning that predicts continuous numerical outcomes through learned feature splits arranged in a tree structure. It works by applying a series of recursive binary decisions on the feature values, with each decision partitioning the dataset into smaller and more homogeneous subsets until the final output, a continuous value, is produced. This process allows the model to generate a meaningful prediction for the continuous target variable based on what it learned during training [31].

The tree consists of internal decision nodes and terminal leaves. Each internal node represents a split based on a feature value, and the topmost internal node, called the root node, splits on the feature that best predicts the dependent variable. Construction proceeds by top-down recursive partitioning, where the splits aim to minimize the variance within the resulting subsets. The standard deviation of the target values within a partition is commonly used as the splitting metric: a standard deviation of zero indicates a homogeneous subset, while larger values indicate greater heterogeneity [32].
3.2.5. Random Forest
The random forest (RF) algorithm is an ensemble method that can be used for classification and regression problems. It builds a predictive model by combining multiple decision trees as its base learners [33]. The method uses a technique called bootstrap aggregating, or bagging, to produce the model.

With bagging, the algorithm draws B random samples, with replacement, from the original training dataset X to form B training datasets. Each of these B datasets is used to train a single decision tree; the trees are grown in parallel and share no information during training. After the training phase is complete, the ensemble predicts an unseen instance x by averaging the outputs of the B trees:

$$\hat{y}(x) = \frac{1}{B}\sum_{b=1}^{B} f_b(x)$$

where $f_b(x)$ is the prediction of the b-th tree. Individual decision trees have high variance, but combining them in a random forest substantially lowers the variance of the model. In regression problems, the random forest uses the average of all the trees' predictions as the final output [34].
3.2.6. Adaptive Boosting
Adaptive boosting (AdaB) is an ensemble learning method that improves prediction performance by combining multiple weak learners into a single strong predictor. AdaB typically uses decision stumps, i.e., single-level decision trees, as its weak learners [35]. The algorithm assigns weights to the training samples according to how difficult they are to predict: harder-to-predict samples receive higher weights, while accurately predicted samples receive lower weights. This focuses the model on the more difficult cases in each training iteration.

AdaB for regression is a meta-estimator. It starts by fitting a base regressor to the original dataset and then fits additional regressors iteratively. Each time a new regressor is fitted, the sample weights are updated according to the prediction errors of the previous regressor, so each new regressor concentrates on the hardest examples, improving the model's predictive performance, as sketched below.
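A minimal version of this meta-estimator (the stump depth, number of estimators, and learning rate are assumed for illustration):

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Decision stumps (max_depth=1) as weak learners; sample weights are
# re-computed after each round based on the previous round's errors.
ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=1),
                        n_estimators=200, learning_rate=0.05, random_state=42)
ada.fit(X_train, y_train)
iri_pred = ada.predict(X_test)
```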
3.2.7. Gradient Boosting
Gradient boosting (GB) is another ensemble learning method used for prediction. It follows the boosting paradigm, in which multiple weak learners (usually decision trees) are combined sequentially to produce a stronger predictive model. A simple first tree is built, often a one-node stump, and each additional tree is trained on the errors (residuals) of the previous ensemble.

A learning rate regulates the contribution of each tree to the final prediction, damping the correction made by each individual tree. Trees are added over time to tune the model, and the process continues until a predetermined number of iterations is reached or the predictive accuracy no longer improves [36].
3.2.8. Light Gradient Boosting
Light gradient boosting (LGB) is another ensemble learning method applied to predict IRI. Like other ensemble techniques, LGB builds a given number of weak learners and combines them into a more accurate prediction model. The method is an optimized version of GB introduced by Microsoft; it is computationally efficient and reduces memory usage through histogram-based and parallel learning. LGB grows trees leaf-wise while controlling their depth: at each step, it splits the leaf that yields the largest loss reduction, which typically reaches a given accuracy with fewer splits than level-wise growth [37].
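A minimal LightGBM configuration reflecting this strategy (the parameter values are illustrative, not the tuned values from the study):

```python
from lightgbm import LGBMRegressor

# num_leaves caps the leaf-wise growth; max_depth bounds tree depth.
lgb = LGBMRegressor(num_leaves=63, max_depth=8,
                    learning_rate=0.05, n_estimators=500, random_state=42)
lgb.fit(X_train, y_train)
iri_pred = lgb.predict(X_test)
```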
3.2.9. Extreme Gradient Boosting
Extreme gradient boosting (XGB) is a powerful supervised learning algorithm for regression problems [38]. It is computationally efficient and scalable and provides much higher predictive ability than single decision trees, though it is less interpretable. It works by learning a series of base learners, normally decision trees, to build a strong ensemble.

The objective function of XGB has two parts: a loss function and a regularization term. The loss function measures the difference between predicted and actual target values, while the regularization term penalizes model complexity to reduce overfitting. XGB iteratively minimizes this objective during training, creating successive base learners that strengthen its predictive ability.
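A sketch of such a regularized booster (the penalty weights and other settings are assumptions for illustration):

```python
from xgboost import XGBRegressor

# reg_lambda / reg_alpha are the L2 / L1 complexity penalties in the objective.
xgb = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6,
                   reg_lambda=1.0, reg_alpha=0.0, random_state=42)
xgb.fit(X_train, y_train)
iri_pred = xgb.predict(X_test)
```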
3.2.10. Categorical Boosting
Categorical boosting (CatB) is another efficient and powerful ensemble learning method applied in this study to predict IRI. CatB uses Bayesian estimators to reduce the likelihood of overfitting, and it replaces categorical variables with numeric encodings during the training phase, which reduces computational complexity. Like other ensemble learning methods, CatB generates multiple decision trees and combines their outcomes to create a powerful ensemble model. However, it uses a symmetric tree structure in which the depth of the decision trees is uniform across all nodes, which distinguishes it from other ensemble learning methods such as XGB.
3.3. Performance Metrics
To compare the performance of the prediction techniques and identify the most accurate one, several statistical metrics are used: the coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), the Akaike information criterion (AIC), and the explained variance (EV). These metrics are calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{EV} = 1 - \frac{\mathrm{var}(y - \hat{y})}{\mathrm{var}(y)}$$

where $y_i$ and $\hat{y}_i$ represent the actual and predicted values of data observation i, respectively, $\bar{y}$ is the mean of the actual values, n is the number of observations, and var signifies the variance.

The AIC is normally used to compare the quality of statistical models on a given dataset; a lower value indicates a better fit, with a penalty for increasing the number of parameters:

$$\mathrm{AIC} = 2k - 2\ln(\hat{L})$$

where k is the number of the model's parameters and $\hat{L}$ is the maximum likelihood of the model. For linear regression models with normally distributed errors, the equivalent form is as follows:

$$\mathrm{AIC} = n\ln\left(\frac{\mathrm{RSS}}{n}\right) + 2k$$

where RSS is the residual sum of squares. In terms of evaluating model performance, higher R² and EV values signify a better fit to the observed data, whereas lower MAE, MSE, and MAPE values are preferable. For AIC, the model with the lowest (possibly negative) value performs best.
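For reference, these metrics can be computed as follows (a sketch; the AIC here uses the regression form above, with k taken as the number of fitted parameters):

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

def evaluate(y_true, y_pred, k):
    """Return the six performance metrics; k = number of model parameters."""
    n = len(y_true)
    rss = float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "MAPE": 100 * mean_absolute_percentage_error(y_true, y_pred),
        "EV": explained_variance_score(y_true, y_pred),
        "AIC": n * np.log(rss / n) + 2 * k,
    }
```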
3.4. Interpretation Method
Another objective of this study is to determine the relative influence of the variables on IRI. To this end, after the best-performing machine learning method is identified by the performance metrics, it is coupled with SHapley Additive exPlanations (SHAP) to interpret its results and identify the top determinants of IRI prediction. SHAP is a game-theory-based method that applies local explanations to prioritize independent variables according to their relative influence on the dependent variable, using a unitless metric called the SHAP value to compare these influences. We chose SHAP because it outperforms other interpretation techniques in terms of consistency and computational performance [39].
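A typical pairing of SHAP with a tree-based model looks like this (assuming the best performer is a tree ensemble such as LightGBM, which TreeExplainer supports; `best_model` is a placeholder for the fitted estimator):

```python
import shap

explainer = shap.TreeExplainer(best_model)    # fast, exact for tree ensembles
shap_values = explainer.shap_values(X_test)   # one value per feature per observation
shap.summary_plot(shap_values, X_test)        # ranks features by mean |SHAP value|
```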
3.5. Uncertainty Modeling
As mentioned, one of the main objectives of this study is to model uncertainty in IRI prediction. To this end, the best-performing machine learning method (identified by the performance metrics) is used in the uncertainty analyses. For modeling uncertainty, we applied the model-agnostic prediction interval estimator (MAPIE). MAPIE implements conformal prediction techniques to estimate uncertainties, offering strong theoretical guarantees on marginal coverage under mild assumptions about the model and the underlying data distribution. For more details about MAPIE, refer to Taquet et al. [40].

MAPIE relies on three main elements: the base prediction model, the method for constructing prediction intervals (the conformal prediction method), and the target coverage level. As mentioned, we used the best-performing machine learning technique (among the ten methods applied) as the base prediction model. We used four conformal prediction methods: Naïve, Jackknife, Jackknife+, and Minmax, as they are the conventional methods for creating prediction intervals in regression problems [40]. Rather than predicting a single value, these methods estimate a lower and an upper bound for each data observation, yielding a prediction interval computed from the residuals. The target coverage level is the desired proportion of observations whose true values fall within the predicted intervals [40]. In this study, three target coverage levels are considered: 80%, 90%, and 95%. The conventional values are 80%, 90%, 95%, and 99% [41]; we excluded 99% because one of the methods already achieved an effective coverage of 100% at a target level of 95%. That is, raising the target coverage level beyond this threshold was not effective, since the width of the intervals increased without any improvement in the effective coverage.
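The following sketch shows how such intervals can be produced with MAPIE (using the package's pre-1.0 API; the Jackknife+ method and the 90% target level shown here are just one of the twelve combinations, and `best_model` is a placeholder for the fitted base learner):

```python
from mapie.regression import MapieRegressor

# method="plus" with cv=-1 gives Jackknife+; alpha = 1 - target coverage.
mapie = MapieRegressor(estimator=best_model, method="plus", cv=-1)
mapie.fit(X_train, y_train)
y_pred, y_pis = mapie.predict(X_test, alpha=0.1)   # 90% target coverage

lower, upper = y_pis[:, 0, 0], y_pis[:, 1, 0]
avg_width = (upper - lower).mean()                 # average interval width
effective_coverage = ((y_test >= lower) & (y_test <= upper)).mean()
```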
Therefore, 12 uncertainty models are developed from the combinations of four conformal prediction methods and three target coverage levels. To identify the optimal combination, a multi-criteria decision-making method called grey relational analysis (GRA) is used; for more details about GRA, refer to Panda et al. [42]. GRA compares the combinations using two criteria: the average interval width and the actual coverage level on the testing data (i.e., the effective coverage). Each method predicts an interval (i.e., a range) for every observation; the average interval width is calculated as the mean difference between the upper and lower bounds of these intervals, and the actual coverage level is the percentage of testing observations whose true values fall within the predicted intervals. The optimal uncertainty model is the one with a lower average interval width and a higher actual coverage level.
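A compact sketch of GRA for this two-criterion ranking (the distinguishing coefficient rho = 0.5 is the customary default, assumed here rather than taken from the paper):

```python
import numpy as np

def grey_relational_grade(matrix, benefit, rho=0.5):
    """matrix: 12 rows (uncertainty models) x 2 columns (criteria).
    benefit[j] is True when higher is better (coverage), False for width."""
    m = matrix.astype(float).copy()
    for j in range(m.shape[1]):             # normalize each criterion to [0, 1]
        col, lo, hi = m[:, j], m[:, j].min(), m[:, j].max()
        m[:, j] = (col - lo) / (hi - lo) if benefit[j] else (hi - col) / (hi - lo)
    delta = np.abs(1.0 - m)                 # deviation from the ideal sequence
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return coeff.mean(axis=1)               # grey relational grade per model

# The model with the highest grade is selected as the optimal uncertainty model.
```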
After the optimal uncertainty model is identified, it is applied to detect the parameters driving uncertainty. In this analysis, the independent variables of the IRI prediction models (shown in Table 1) are kept as independent variables, while the interval width calculated by the optimal uncertainty model for each data observation serves as the dependent variable. An MLR model is then fitted to determine which variables significantly affect the uncertainty of IRI prediction.
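This regression step could be carried out as follows (a sketch using statsmodels; the variable names are placeholders):

```python
import statsmodels.api as sm

# Regress the per-observation interval width on the 35 predictors of Table 1.
X_design = sm.add_constant(X_features)
ols = sm.OLS(interval_width, X_design).fit()
print(ols.summary())   # coefficients with low p-values flag uncertainty drivers
```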
5. Conclusions
This study investigated IRI prediction and the uncertainty associated with IRI prediction models. Previous studies generally applied only a few independent variables to predict IRI; in contrast, this study assembled a comprehensive dataset including all the essential features to maximize the performance of IRI prediction models, using over 500 observations and 35 independent variables for modeling. Another contribution is the application of powerful machine learning models that have rarely been used for IRI prediction, such as LGB and CatB. A further novelty of this investigation is the application of a conformal prediction interval framework to analyze the uncertainty in IRI prediction and to identify the factors that contribute to increased uncertainty. The following paragraphs summarize the objectives and results.
The first objective of this study was to detect the best-performing method for IRI prediction. Ten prediction methods were tuned and trained on the dataset, and their performance was compared across different metrics. The results showed that the light gradient boosting machine (LGB) performed best, predicting the IRI of the testing observations with an R² of 0.927, an MAE of 0.09 m/km, an MSE of 0.016 (m/km)², an EV of 0.926, and a MAPE of 7.2%.
The second objective of this investigation was to analyze the relative influence of the independent variables on IRI prediction. SHAP was coupled with the best-performing method (LGB) to rank the variables by their relative influence on IRI. The results suggested that initial IRI was the top-ranked variable, followed by the mean annual temperature, the duration between data collections, and the length of high-severity longitudinal cracking in the non-wheel path. A closer look at the SHAP results revealed that higher initial IRI, lower mean annual temperature, longer durations, and a greater length of high-severity longitudinal cracking in the non-wheel path led to higher IRI values.
The third goal of this study was to determine the optimal uncertainty model for IRI prediction. To this end, 12 uncertainty models were developed from the combinations of four conformal prediction methods and three target coverage levels and compared by effective coverage and average interval width. A multi-criteria decision-making method (GRA) was then used to identify the optimal model. The GRA outcomes indicated that Minmax/80 was optimal, with a grey relational grade (GRG) of 0.667; it covered 93.4% of the testing observations with an average interval width of 0.256 m/km. Minmax/80 can be applied to predict the upper bound of IRI, so practitioners can use this upper bound in pavement M&R planning for sections where the IRI is critical.
The fourth objective of the current study was to determine which parameters and conditions increase the uncertainty of IRI prediction. A further analysis of the optimal uncertainty model's outcomes showed that higher initial IRI, a longer IRI prediction duration, lower ESALs, a shorter length of low-severity transverse cracks, fewer medium-severity sealed transverse cracks, a greater length of transverse cracks longer than 183 cm, and higher total annual precipitation were all associated with increased uncertainty in the predicted IRI.
These results offer actionable insights for pavement engineers and decision makers. First, they can apply LGB as a reliable tool for IRI prediction, increasing the accuracy of pavement M&R planning. Second, the SHAP results help researchers and practitioners identify the most influential variables on IRI, so they can monitor and manage the most impactful variables and develop powerful pavement management systems. We also present the optimal uncertainty model (Minmax/80), which can be used in conservative M&R planning, particularly for sections where safety and service quality are significant concerns. Finally, this study identifies the parameters that increase uncertainty in IRI prediction; decision makers can thus recognize situations where the uncertainty of IRI prediction is higher and apply stochastic programming to plan M&R activities for pavements under such conditions.
One limitation of this study is that only three target coverage levels were examined; future studies can consider additional levels to detect the optimal uncertainty model. Another limitation is that the influence of dataset size on the uncertainty of IRI prediction was not examined, so future studies are encouraged to use datasets of different sizes to check the relationship between dataset size and the level of uncertainty. A further limitation is the limited geographic diversity of the data: the findings may not generalize well to regions with different road conditions or construction practices. Hence, future studies are encouraged to apply the proposed technique to other case studies and compare their results with the findings of the current study.