1. Introduction
Pavement networks are an integral component of the critical infrastructure of any country, and they must maintain functionality at a reasonable cost. Effective Pavement Management System (PMS) programs include monitoring pavement distress, e.g., cracks, potholes, and faulting, and considering the numerous environmental and traffic loading factors [1,2]. An adequate PMS ensures the safety and functionality of pavement networks [3,4,5]. PMSs are used to select suitable intervention levels and maintenance plans, considering grip, bearing, and roughness levels [6,7,8,9,10]. Replacing a damaged pavement is expensive and often causes serious traffic delays [5,11,12]. A proactive approach based on continuous, limited pavement maintenance is less invasive and far more efficient: it ensures long-term performance and reduces road congestion while eliminating the safety concerns associated with total rehabilitation (replacement) [5,13]. However, a proactive approach, as an integral part of a modern PMS, necessitates information gathering and examination at several levels [14,15].
Growing attention has been paid to the applications of PMS in urban regions, often driven by the limited economic resources available to transportation agencies [16,17,18,19]. PMSs utilize several pavement performance indices encompassing visual and automatic inspections [10], the most popular of which is the Pavement Condition Index (PCI) [9,20]. PCI is a numerical value that indicates the distress level of the asphalt pavement surface and thus offers a measure of the current state of the pavement [21,22]. The U.S. Army Corps of Engineers proposed using the PCI technique in 1997 [23]. However, PCI is not a direct measure of structural capacity, skid resistance, or road roughness; rather, it is an objective tool for assessing the maintenance and rehabilitation (M&R) needs of a roadway section in the network.
This research aims to develop simpler, faster, and more economical tools for forecasting PCI from pavement distress. As such, this study attempts to build accurate predictive models for PCI assessment. The data were collected through a field survey. Machine learning algorithms, namely random forest (RF), support vector machine (SVM), and decision tree (DT), and a deep learning algorithm, the artificial neural network (ANN), are developed for this purpose, rather than relying on traditional regression or classical software packages. MicroPAVER software version 5.2 is a commonly used tool to estimate the PCI; this traditional method, however, is based on manual data entry, which is time-consuming and cumbersome. The AI-based models developed in this study eliminate this manual effort and save time.
2. Prediction Models of PCI
Several studies are available to model or estimate the PCI using collected pavement distresses. Galehouse et al. (2003) [21,24] identified a number of advantages of using PCI for pavement M&R: it helps improve road systems and supports preventive repair approaches [21,24]. Hajj et al. (2011) [25] explained that the PCI score of a roadway only reflects the surface distress collected from the field and is not a direct measure of structural capacity, skid resistance, or pavement roughness. As such, it is an impartial measure for evaluating the M&R requirements of roadways. In 2015, Alwan conducted site and laboratory experiments to evaluate pavement distress for a main road using the standard PCI technique [26].
Recent studies utilize newer modeling techniques, such as neural networks. Issa et al. (2022a) [27] utilized six pavement distresses and developed an optimized hybrid model to calculate PCI using the Long-Term Pavement Performance (LTPP) database. Their model consists of a cascade architecture with three traditional ML models, in addition to an ANN model. The model predicted PCI with excellent accuracy, with R2 values of 0.998, 0.997, and 0.997 for training, testing, and cross-validation, respectively [27]. Another study by Issa et al. (2022b) [2] explained how the PCI could be predicted through the development of an artificial intelligence (AI) approach. Furthermore, the use of ANN allows incorporating local variables, for instance, the presence of manholes in pavement sections. The results showed that the ANN model outperformed the other models in predicting the PCI with a great level of robustness, with R2 values of 0.997, 0.998, and 0.996 for training, testing, and validation, respectively. The regression slope between measured and predicted PCIs varies between 0.996 and 0.997 [2]. To this end, AI techniques can yield excellent PCI predictions (as illustrated in Figure 1).
The LTPP database can be a useful source for this purpose. Badr et al. (2022) [29] used the LTPP database for nine states of the United States of America to calculate the PCI of flexible pavement. Pavement segments were grouped into two sets. The first set of segments (codes SPS-1, SPS-3, and SPS-8) includes pavement sections with no strengthening overlays. The second set of segments (code SPS-5) comprises pavement sections overlain by a protection layer. The model indicated an excellent prediction of the PCI, with R2 values above 0.8 for almost all segments [29]. Jalal et al. (2017) [28] proposed an enhanced ANN model, selected among different ANN architectures, to estimate PCI. The model was developed using 173 datasets collected by Texas A&M University. All distresses were recognized, assessed, and measured from 2014 to 2016. The R2 values calculated for the training, testing, and validation subsets, as well as all data, were 0.978, 0.965, 0.973, and 0.974, respectively. The high values support the competence of the model [28].
3. Originality
This study proposes a new approach for predicting PCI using AI, particularly ML and deep learning algorithms. The suggested method provides reliable PCI estimates and can be integrated into PMS using widely available spreadsheet software. Unlike traditional methods relying on specialized software like MicroPAVER, this approach offers greater flexibility and accessibility. It enables seamless data import and export, eliminates the need for data re-entry, and provides a faster alternative for PCI calculation, making it particularly useful for regions where specialized software is scarce. Although this study was conducted on urban agricultural roads in Egypt, its approach is applicable to roads under similar climatic and operational conditions. This study uniquely offers a detailed visual and quantitative comparison between the proposed models. It is important to note that similar reported studies are generally limited in area and data size. For instance, the study area of Jalal et al. (2017) [28] is only 22 km2. Also, Issa et al. (2022b) [2] used merely 10 different roads, which yielded a simple ANN model; they also relied on the LTPP database for this purpose.
4. Objective and Methodology
This study attempts to develop a pavement performance model for flexible pavement. To this end, the methodology comprises four steps. As shown in Figure 2, the first step involves collecting data through visual inspection. This is followed by data analysis, including PCI calculation. The data are then used to develop four AI-based models using the following approaches: RF, SVM, DT, and ANN. The fourth step examines the models' performance via validation and error analysis and compares the four developed models. Finally, the statistical performance of the developed models is used to select the most accurate model based on error analysis.
5. Study Area and Data Collection
The data used in this research are based on selected urban flexible road segments in the governorate of Beni Suif, Egypt. The study area is located in an arid, non-freezing region. Approximately 15,000 pavement segments, each 100 m long, were surveyed over 3 years in regions that share the same environmental conditions (e.g., temperature, rainfall) and structural layer thicknesses and materials. The data were collected by integrating a desk study with new field observations. Office data consist of available maps, types of pavement layers and thicknesses, cross-section elements, traffic volumes, costs, and environment-related data provided by the Directorate of Roads and Transport in Beni Suif. The field data consist of geometry-related data and physical distress. The common pavement distresses were alligator cracking, longitudinal cracking, bleeding, rutting, and weathering. The PCI, based on visual examination, accounts for every type of distress, each categorized into three severity levels according to its effect on pavement functionality, structural performance, and ride quality: low (L), moderate (M), and high (H) [30,31].
Data collection of distress (i.e., defective areas) was carried out using devices that facilitate real-time data transfer and downloading. In this study, handheld computers and global positioning system (GPS) technology were used to capture pavement conditions and locate distressed areas of the pavement. The distresses were defined based on the 19 distresses listed in the PAVER system [32]. The PCI values for the 15,000 pavement segments were calculated using MicroPAVER software. The calculated values and observed distresses were used to develop the proposed forecasting models. The classification of the distresses and respective severities considered in this study is included as Supplementary Data (Table S1).
A sample of the statistical characteristics of the training and testing subsets is summarized in Table 1. The table summarizes the statistics of the surveyed distresses, including the number of occurrences (count), standard deviation, and minimum and maximum values. There is a wide range in the values of bleeding, with relatively low means and high maximum values. Upon close inspection, however, these high values are not indicative of outliers or anomalies; rather, they reflect genuine variability, naturally expected in large-scale urban pavement networks. Hence, observing a few segments with considerably larger areas of distress than the rest is not unusual given the data size.
Figure 3 illustrates the distribution of measured PCI values across the 15,000 pavement segments. The distribution exhibits a notable skewness toward higher PCI ranges (70–100), signaling satisfactory to good condition for most of the examined network [33]. Such skewness can influence the performance of predictive models, particularly those sensitive to data distribution, e.g., SVR (using the RBF kernel) and ANN. Generally, unbalanced data distributions can cause models to become biased, providing excellent predictions within the dominant range but potentially underperforming for infrequent or extreme cases (lower PCI values in this study).
6. Development of AI Models
Developing and evaluating the forecasting models for PCI prediction begins with preprocessing the dataset and preparing it for AI analysis. The dataset consists of pavement distresses collected from various pavement segments, each characterized by a set of distress indicators and an associated PCI value. The feature matrix, which contains 51 features (3 severity levels × 17 distress types) and one target variable (PCI), was extracted. Next, the dataset was split into training and testing sets to facilitate model training and evaluation. A standard train-test split strategy was employed, allocating 80% of the data for training and 20% for testing [34]. The adopted split ratio aligns with common practice and widely recommended guidelines in ML. Split ratios of 80–20 or 70–30 allow sufficient training data while reserving an adequately sized independent dataset for model validation and performance evaluation. This ratio also ensures a good balance between model stability and reliable performance for generalization [35].
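A minimal sketch of this preprocessing step is shown below, assuming the survey data have been exported to a CSV file with one column per distress–severity combination and a PCI column; the file name and column labels are illustrative, not those used in the study.

```python
# Sketch of the data preparation and 80/20 split described above.
# Assumes a hypothetical "distress_survey.csv" with 51 distress-severity
# columns (e.g., "alligator_cracking_L", ..., "weathering_H") and a "PCI" column.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("distress_survey.csv")

X = data.drop(columns=["PCI"])   # 51 distress features (3 severities x 17 types)
y = data["PCI"]                  # continuous target (0-100)

# 80% training / 20% testing, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)
print(X_train.shape, X_test.shape)
```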
The PCI was treated as a continuous variable to preserve the full resolution of the pavement performance data. It is common in some studies, however, to convert PCI into ordinal or discrete classes (e.g., “Good”, “Fair”, “Poor”) for simplification or rule-based decision-making. This transformation inevitably leads to a loss of information by discretizing inherently continuous measurements. Modeling PCI as a continuous outcome offers several key advantages: it enables learning algorithms to capture subtle variations in pavement conditions, leading to more precise predictions. Ordinal classification, in contrast, flattens this granularity and may ignore near-boundary effects that are critical in practice. Regression models also provide continuous outputs that can later be mapped to any decision threshold or management category (e.g., for maintenance planning or budget allocation), making them adaptable across agencies with different classification schemes. Continuous PCI predictions can also be directly compared against national or agency-specific performance targets, which allows monitoring deterioration trends without being limited to fixed condition bands.
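To illustrate how continuous predictions can later be mapped onto condition bands, the short helper below applies illustrative thresholds; the band names and cut-off values are assumptions for demonstration only, not bands prescribed by this study.

```python
# Illustrative post-hoc mapping of continuous PCI predictions to condition bands.
# The thresholds below are assumed for demonstration; agencies may use different cut-offs.
def pci_to_band(pci: float) -> str:
    if pci >= 70:
        return "Good"
    elif pci >= 40:
        return "Fair"
    else:
        return "Poor"

predicted_pci = [85.2, 63.7, 31.4]
print([pci_to_band(p) for p in predicted_pci])  # ['Good', 'Fair', 'Poor']
```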
6.1. Support Vector Regression (SVR)
SVR is a supervised learning algorithm that extends the principles of SVMs, proposed by Vapnik, to the regression setting [36]. The SVM is a machine learning technique that can solve a wide range of problems involving sample grouping, non-linearities, and high-dimensional statistics [37,38]. Its principle is based on the Vapnik–Chervonenkis (VC) theory of statistical learning and structural risk minimization, and the optimal solution is sought by constructing an ideal hyperplane [39]. Generally, the dimensionality of the samples is reduced to simplify a problem, whereas the SVM method works the other way: it uses kernel functions to map the samples into a high-dimensional (potentially infinite-dimensional) feature space in which the problem becomes linear [40].
Unlike traditional regression methods, SVR aims to find a hyperplane in a high-dimensional feature space that has the maximum margin with respect to the training data points. This hyperplane serves as the regression function, and SVR seeks to minimize the prediction error while keeping it within a specified margin of tolerance. One of the key advantages of SVR is its ability to handle non-linear relationships between input features and target variables using kernel functions. The radial basis function (RBF) kernel, employed in this study, is particularly well-suited for capturing complex, non-linear relationships between pavement distress indicators and the PCI. Unlike linear or polynomial kernels, the RBF kernel offers flexibility and enhanced performance for datasets exhibiting intricate non-linear patterns, such as pavement distress data. In this study, careful hyperparameter tuning was conducted to select optimal values that maximize the prediction accuracy.
Specifically, for the SVR model, the hyperparameters were systematically optimized using a grid-search methodology combined with 5-fold cross-validation. First, the SVR model was initialized with an RBF kernel for the reasons mentioned earlier. The regularization parameter (C) was tested at 0.1, 1, 4, 10, 50, and 100; the value C = 4 was ultimately selected, as it provided the best balance between model complexity and generalization [40]. Epsilon (ε) was set to 0.1, aligning with the commonly used default value recommended in the literature and balancing prediction accuracy and model robustness. Additionally, several values of gamma (γ) were tested for the RBF kernel [‘scale’, ‘auto’, 0.001, 0.01, 0.1, 1, 10], with the best-performing option identified as ‘scale’, which adjusts automatically based on the dataset’s characteristics. The selected hyperparameters yielded the lowest RMSE and highest R2. This systematic tuning ensured the SVR model achieved robust predictive performance.
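A minimal sketch of this tuning procedure is shown below, using scikit-learn's GridSearchCV with the grids reported above; the feature scaling step and the RMSE-based scoring choice are assumptions, since the study does not state them explicitly.

```python
# Sketch of the SVR grid search with 5-fold cross-validation described above.
# Feature standardization is assumed here; SVR with an RBF kernel is scale-sensitive.
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svr", SVR(kernel="rbf", epsilon=0.1)),
])

param_grid = {
    "svr__C": [0.1, 1, 4, 10, 50, 100],
    "svr__gamma": ["scale", "auto", 0.001, 0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)   # X_train, y_train from the 80/20 split sketched earlier
print(search.best_params_)     # expected near {'svr__C': 4, 'svr__gamma': 'scale'}
```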
In this trial, separate SVR models were trained. The model fitted on the training dataset was evaluated on unseen testing data to assess performance and generalization. The results for all data and the three groups (i.e., training, validation, and testing datasets) are shown in Figure 4. The scatter plot (left) and kernel density (K-D) plot (right) are indicative of the model performance. The graphs provide visual insight into the model behavior that cannot be fully captured by statistics alone. The high scatter around the line of equal values indicates the model's weakness; the outliers deviate significantly from the diagonal and may represent inaccurate predictions. This may also suggest overfitting or underfitting, as the R2 value is below 0.90. The K-D plot also shows a mismatch between the measured and predicted PCI curves. The error values of this model are high compared to those of the later models, thus indicating low predictability.
6.2. Decision Tree (DT)
DTs are among the most common classification algorithms. A reason for their popularity is their intelligibility and ease of interpretation [35]. DT regression is a non-parametric supervised learning method used for both classification and regression tasks. It works by recursively partitioning the input space into smaller regions and fitting a simple model (e.g., a constant value) within each region. In the case of regression, the predicted value for a given instance is the average (or another appropriate summary statistic) of the target values in the region to which that instance belongs. As their name implies, these algorithms create a classifier tree built on the trends in the dataset. Early versions of DTs, for instance, ID3 and CLS, could only learn from discrete data [41], while later versions (e.g., C4.5) can learn from continuous and discrete variables together [42].
In this study, DT regression was employed to predict PCI based on pavement distresses. The DT model can capture complex non-linear relationships between distress indicators and PCI values, making it well-suited for this predictive task. This study employed the CART (Classification and Regression Tree) algorithm, which utilizes variance reduction (based on MSE) as its splitting criterion. Commonly known criteria such as the ‘Gini Index’ and ‘Entropy’ are specifically associated with classification tasks and do not apply to regression trees; for regression trees, the optimal splits are determined by minimizing the variance (MSE) within each subset after splitting. The DT model was initialized with certain hyperparameters to control its complexity and generalization. The maximum depth of the tree was set to 10 to limit complexity, and the minimum number of samples required to split an internal node was set to 10, ensuring robust splits. Additionally, cost complexity pruning was incorporated by setting the cost complexity parameter (ccp_alpha) to 1 to provide additional pruning. These hyperparameters were selected systematically to optimize predictive performance, prevent overfitting, and enhance generalization.
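The configuration described above could be expressed in scikit-learn roughly as follows; this is a sketch of the reported settings, with the random seed and the evaluation call added only for illustration.

```python
# Sketch of the CART regression tree with the hyperparameters reported above.
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

dt_model = DecisionTreeRegressor(
    criterion="squared_error",  # variance reduction (MSE) split criterion
    max_depth=10,               # limit tree complexity
    min_samples_split=10,       # require enough samples for a robust split
    ccp_alpha=1.0,              # cost-complexity pruning
    random_state=101,           # seed assumed here for reproducibility
)

dt_model.fit(X_train, y_train)
print("Test MAE:", mean_absolute_error(y_test, dt_model.predict(X_test)))
```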
The DT model was trained using the training dataset to learn the underlying patterns between distress indicators and PCI values. The results for all data and the three groups (i.e., training, validation, and testing datasets) are shown in Figure 5. The R2 values of the datasets increase (close to 0.84) while the MAE and RMSE values decrease; the data in the scatter plot lie closer to the line of equal values than for the SVR model, and the K-D curves almost overlap, although some deviation still exists. Thus, the predictions of the decision tree algorithm are more accurate than those of the previous model.
6.3. Random Forest (RF)
RF is an ensemble learning method that combines the predictions of multiple individual DTs to improve predictive performance and robustness. The procedure was proposed by Breiman in 2001 [43,44]. In RF regression, each tree in the ensemble is trained on a random subset of the training data and a random subset of features, resulting in a diverse set of trees that collectively provide more accurate predictions. Ensemble approaches have stood the test of time and have proven to be extremely precise prediction and classification systems [45].
The RF algorithm offers several advantages, including the ability to handle high-dimensional datasets with a large number of features and the capability to capture complex non-linear relationships [46]. The RF model was initialized with hyperparameters tailored to the predictive task through experimentation and cross-validation; hence, the hyperparameters were systematically tuned to ensure optimal performance and robust generalization. Specifically, the maximum depth of each tree was set to 20 to control tree complexity and mitigate overfitting. The minimum number of samples required to split an internal node was set to 60 to ensure each decision split is supported by sufficient data, improving model robustness. Additionally, the maximum number of features considered for splitting at each node was set to the square root of the total number of features, in line with widely accepted practice for regression using RFs. An ensemble of 400 trees was constructed to provide robust and stable predictions by averaging outputs from multiple regression trees. The random state was fixed at 101, ensuring the reproducibility of the results.
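These settings correspond roughly to the scikit-learn sketch below; the parallelization flag and the fit/predict calls are illustrative additions, not details reported by the study.

```python
# Sketch of the random forest regressor with the hyperparameters reported above.
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor(
    n_estimators=400,        # ensemble of 400 regression trees
    max_depth=20,            # control tree complexity
    min_samples_split=60,    # require sufficient data per split
    max_features="sqrt",     # sqrt of total features considered at each split
    random_state=101,        # fixed seed for reproducibility
    n_jobs=-1,               # use all available cores (assumption, for speed)
)

rf_model.fit(X_train, y_train)
pci_pred = rf_model.predict(X_test)
```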
The RF model was trained using the training dataset to learn the underlying patterns between distress indicators and PCI values. The results for all data and the three groups (i.e., training, validation, and testing datasets) are shown in Figure 6. A noticeable improvement over the previous models can be observed, as the R2 value is close or equal to 0.90. This is also confirmed by the data's closer alignment to the neutral line in the scatter plot, as well as by the K-D plot, where the deviation between the curves decreases, showing the difference in accuracy compared to the two previous models.
6.4. Artificial Neural Network (ANN)
ANNs are a class of deep learning models inspired by the structure and function of the human brain. ANNs are capable of learning complex non-linear relationships between input features and target variables. In this study, an ANN model was employed to predict PCI based on pavement distress. ANNs offer the flexibility to capture intricate patterns and relationships in the data, making them suitable for the regression task at hand. A feedforward backpropagation ANN is usually formed from three layers, as shown in Figure 7 [47,48,49,50]. The first layer is the input layer, which is a vector that fully characterizes the input variables. Commonly, the inputs are normalized before being fed into the input layer; this normalization ensures that the ANN is unbiased, as all inputs share a similar range once normalized. In this case, the input variables are the pavement distresses, so the input layer serves as the entry point for the pavement distress data into the neural network. It consists of a number of neurons equal to the number of distress indicators, allowing the model to receive and process information about the condition of the pavement segments [2].
The hidden part of the network comprises two hidden layers, each containing a specific number of neurons responsible for learning and abstracting features from the input data. The first hidden layer comprises 64 neurons, each connected to every neuron in the input layer; this layer captures the primary patterns and relationships present in the distress indicators. The second hidden layer consists of 32 neurons, providing additional capacity for the model to learn complex representations of the input data. Each neuron in this layer integrates information from the previous layer to refine the learned features. The number of neurons in the hidden layers is a tunable hyperparameter. Lastly, the third layer of the ANN is the output layer, which accumulates all the signals transmitted from the hidden layers and performs a series of operations on those signals to produce the output vector. In this case, the output layer consists of a single neuron responsible for predicting the PCI for each pavement segment. As PCI prediction is a regression task, the output neuron produces continuous values without the application of an activation function [2]. The sigmoid activation function in the hidden layers was employed to introduce non-linearity into the model. This activation function enables the model to capture non-linear relationships within the data and improve its predictive performance. The specifications of the ANN architecture are presented in Table 2.
The model parameters are optimized using the Adam optimizer, a stochastic optimization algorithm widely used in neural network training. The MSE loss function is utilized to measure the discrepancy between predicted and measured PCI values and thus drives prediction accuracy. The network was trained for 50 epochs, using the default batch size (equal to the number of samples unless otherwise specified), with no dropout or batch normalization. This architecture was determined empirically using manual tuning, guided by performance on the validation set. Preliminary experiments explored various numbers of hidden layers (ranging from 1 to 3), neurons per layer (16, 32, 64, 128), and activation functions (ReLU, Sigmoid, Tanh). The two-layer, sigmoid-activation configuration showed the best balance between model complexity and performance. While automated tuning methods such as grid search or Bayesian optimization were considered, manual tuning was deemed sufficient for the current dataset size and problem scale.
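The described 64–32 architecture can be sketched roughly as follows; the use of Keras, the absence of input scaling, and the validation-split fraction are assumptions, since the study does not name the framework or these details.

```python
# Sketch of the two-hidden-layer ANN (64 and 32 sigmoid neurons, linear output)
# trained with Adam and an MSE loss for 50 epochs, as described above.
# Keras is assumed here; the original framework is not stated in the text.
from tensorflow import keras

n_features = X_train.shape[1]  # 51 distress features

ann_model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="sigmoid"),  # first hidden layer
    keras.layers.Dense(32, activation="sigmoid"),  # second hidden layer
    keras.layers.Dense(1),                         # linear output: continuous PCI
])

ann_model.compile(optimizer="adam", loss="mse", metrics=["mae"])

history = ann_model.fit(
    X_train, y_train,
    epochs=50,
    validation_split=0.1,   # validation fraction assumed for illustration
    verbose=0,
)
```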
The results for all data and the three groups (i.e., training, validation, and testing datasets) are shown in Figure 8. The results show that the R2 value reached its maximum (approx. 0.93), while the MAE (approx. 3) and RMSE (approx. 7.0) values were the minimum among the models; the plots clearly show the robustness of the ANN model. The points cluster along the line of equal values in both plots, indicating acceptable accuracy on unseen data as well as good generalization. The curves in the K-D plot overlap strongly, and the deviation between them is smaller, indicating an accurate prediction of PCI. These observations confirm that the ANN is also the most precise among the models considered.
7. Statistical Performance of the Models
Statistical indicators are used to evaluate the performance of the models and the quality of their predictions, namely, MAE, RMSE, and R2.
Table 3 summarizes the statistical performance for each data subset. The ANN model outperforms all other models, as captured by its high R2 value and low MAE and RMSE values. The least-performing model was SVR, with R2 = 0.8062, RMSE = 12.69%, and MAE = 7.74%. The similar R2 values for the training and testing datasets rule out data overfit, which promotes model generalization; this was observed for all models. Other error indices were used to test the performance of the models: the average bias (B) (Equation (1)), which is the ratio between predicted and measured PCI values, and the Willmott index of agreement (d) (Equation (2)), which relates the MSE to the potential error (PE). As indicated in Table 4, the ANN model has the highest d (0.993) and the lowest B (1.04). To this end, the ANN model has proven to be the most accurate among the tested models, supporting the previous findings.
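Equations (1) and (2) are not reproduced in this excerpt; the forms below are the commonly used definitions of the average bias and Willmott's index of agreement, stated here as assumptions consistent with the description above, where P_i and O_i denote predicted and measured PCI values and O-bar is the mean of the measured values.

```latex
% Assumed standard forms of the two indices (not copied from the source).
% Average bias between predicted and measured PCI values:
B = \frac{\sum_{i=1}^{n} P_i}{\sum_{i=1}^{n} O_i} \qquad \text{(1)}

% Willmott index of agreement, relating the squared-error term to the
% potential error (PE) in the denominator:
d = 1 - \frac{\sum_{i=1}^{n} \left( P_i - O_i \right)^2}
             {\sum_{i=1}^{n} \left( \lvert P_i - \bar{O} \rvert + \lvert O_i - \bar{O} \rvert \right)^2}
\qquad \text{(2)}
```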
Figure 9a depicts the R2 surface of the SVR model. The surface reveals a steady improvement in model performance with increasing values of “C”, stabilizing at values of 4 to 5. Similarly, the R2 values improved gradually as “epsilon” increases, with insignificant gains beyond an “epsilon” of 0.8. The selected configuration (“C” = 4, “epsilon” = 0.1) lies in the optimal region of the plot, where R2 values are maximum. The visualization confirms that this point falls within a stable and high-performing plateau, validating the robustness and suitability of the final hyperparameter settings.
Figure 9b illustrates that the DT model's performance improved significantly as the tree depth increased up to a value of approximately 10 and plateaued thereafter. Similarly, the R2 score showed a positive correlation with increasing “ccp alpha” up to a value of approximately 1.0, beyond which further gains were negligible. The highest performance was achieved in the region bounded by tree depth values between 10 and 16 and “ccp alpha” values between 0.8 and 1.0, indicating a zone of optimal model complexity.
Figure 9c illustrates the impact of “max depth” and “min samples split” on the RF's R2 value, with “n estimators” fixed at 400. As shown in the plot, R2 performance improves substantially as the tree depth increases, particularly between values of 5 and 20. The model performance becomes more stable at higher “min samples split” values (15 to 25), indicating reduced variance and improved generalization. The chosen combination of a “max depth” of 20 and a “min samples split” of 60 lies in the high-performance region of the surface, targeting the optimal zone. This is supported visually by the R2 reaching its maximum value and plateauing afterward. Figure 9d shows the corresponding surface for the ANN: the number of neurons in the first and second hidden layers was varied, holding all other parameters constant. The surface reveals a distinct performance peak around the selected 64–32 architecture, and minimal gain in R2 was observed with a further increase in the number of neurons. The visual evidence confirms that the selected architecture performed well and yielded stable, generalizable results.
The ANN achieved the highest predictive accuracy, as indicated by error metrics. This superior performance is attributed to the ANN’s ability to capture complex, non-linear relationships and interactions among the distress types. The ANN architecture provides a powerful framework capable of capturing these intricate relationships effectively. Its hierarchical learning approach allows the model to learn simple and complex patterns, which simpler models (like DTs and SVR) may struggle to represent adequately. Furthermore, the large and diverse dataset (15,000 pavement segments) used in this study provided sufficient variability and scale for the ANN model to generalize the predictions. Concurrently, this minimizes the risk of overfitting, thereby achieving superior predictive performance compared to the other algorithms tested.
8. Conclusions and Future Directions
PCI values were predicted in this study based on the pavement distresses collected in the study area, such as alligator cracks, bleeding, depression, and corrugation. The data were gathered through visual inspection of pavement sections on urban roads. Four different approaches were employed to build the forecasting models and estimate the PCI value: three machine learning approaches, SVR, DT, and RF, and a deep learning model, ANN. Eighty percent (80%) of the dataset was used for training the models, whereas the remaining 20% was used for testing. The results suggest that the ANN is the optimum model and can accurately estimate the PCI from several pavement distresses. The measured and predicted PCI values show a strong linear correlation, indicating an accurate and dependable prediction model. A close investigation of the four models supports the use of AI techniques to build a correlation between surface pavement distress and PCI. The ANN model can also be applied to large-scale pavement maintenance estimation, producing a more effective and accurate pavement condition assessment. A statistical evaluation of the models was made using different error norms. The R2 value was the highest (0.924) for the ANN among all models; similarly, its RMSE (7.93%) and MAE (3.25%) were the lowest.
Compared to the proposed ANN model, MicroPAVER software is time-consuming, less accessible, costly, and tedious. As such, the findings of this study are instrumental for countries with similar environmental and operational conditions. The proposed models can save effort and time if adequate pavement performance data are available. Unmanned aerial vehicles (UAVs) can be used to collect pavement distress data, enabling cheaper data collection and potentially more accurate predictions. In addition to pavement distress, PCI could also be correlated with other structural and environmental variables, e.g., pavement age, precipitation, or temperature.