Weighting Variables for Transportation Assets Condition Indices Using Subjective Data Framework

Al-Hamdan, Abdallah B.; Alatoom, Yazan Ibrahim; Nlenanya, Inya; Smadi, Omar

doi:10.3390/civileng5040048

by Abdallah B. Al-Hamdan, Yazan Ibrahim Alatoom, Inya Nlenanya and Omar Smadi *

Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA 50011, USA

* Author to whom correspondence should be addressed.

CivilEng 2024, 5(4), 949-970; https://doi.org/10.3390/civileng5040048

Submission received: 26 August 2024 / Revised: 13 September 2024 / Accepted: 9 October 2024 / Published: 17 October 2024

Abstract

This study proposes a novel framework for determining variables’ weights in transportation assets condition indices calculations using statistical and machine learning techniques. The methodology leverages subjective ratings alongside objective measurements to derive data-driven weights. The motivation for this study lies in addressing the limitations of existing expert-based weighting methods for condition indices, which often lack transparency and consistency; this research aims to provide a data-driven framework that enhances accuracy and reliability in infrastructure asset management. As a proof of concept, a case study applied the framework to obtain data-driven weights for pavement condition index (PCI) calculations using data for the city of West Des Moines, Iowa. Random forest models performed effectively in modeling the relationship between the overall condition index (OCI) and the objective measures and provided feature importance scores that were converted into weights. The data-driven weights showed strong correlation with existing expert-based weights, validating their accuracy while capturing contextual variations between pavement types. The results indicate that the proposed framework achieved high model accuracy, demonstrated by R-squared values of 0.83 and 0.91 for rigid and composite pavements, respectively. Additionally, the data-driven weights showed strong correlations (R-squared values of 0.85 and 0.98) with existing expert-based weights, validating their effectiveness. This advancement offers transportation agencies an enhanced tool for prioritizing maintenance and resource allocation, ultimately leading to improved infrastructure longevity. Additionally, this approach shows promise for application across various transportation assets based on the yielded results.

Keywords:

transportation assets; subjective rating; machine learning; feature importance; asset management; pavement performance; data science; pavement condition index (PCI); pavement condition data; weights estimate

1. Introduction

Infrastructure asset management involves the strategic management of physical assets to ensure their optimal performance and longevity. One crucial aspect of infrastructure asset management is the use of composite condition indices, which were initially developed to facilitate the management process since it is hard to manage what cannot be measured [1]. These indices are essential tools that provide a comprehensive assessment of the condition of various types of infrastructure assets [2]. Composite condition indices combine multiple individual condition indicators to give a holistic view of the overall health and performance of assets. They play a vital role in decision-making processes related to the maintenance, repair, and replacement of infrastructure assets [3]. The importance of composite condition indices in infrastructure asset management cannot be overstated. These indices enable asset managers to prioritize maintenance activities based on the criticality of assets, allocate resources efficiently, and extend the lifespan of infrastructure assets [2]. By utilizing composite condition indices, transportation agencies can make informed decisions that enhance the reliability, safety, and performance of their infrastructure assets [4].

In the realm of transportation assets, various types of assets have condition indices that are crucial for effective asset management. Some of these indicators, methods of assessment, and their corresponding scales are summarized in Table 1.

For example, in pavement infrastructure management, the pavement condition index (PCI) provides a standardized rating of overall pavement quality on a scale from 0 to 100 [5]. The PCI is calculated from visual inspections of pavement distresses, such as cracking, rutting, and roughness, which are converted into deductions that reduce the PCI score from a perfect 100 [6]. These deductions are weighted based on the relative impact of each distress type on ride quality, safety, and remaining service life. Similarly, in railway infrastructure management, the Track Quality Index (TQI) is employed to evaluate the condition of railway tracks [7,8]. The TQI is calculated based on various parameters such as track gauge, cant, longitudinal level, and lateral level, providing a holistic view of the track’s health [7,8]. On the other hand, the Bridge Condition Index (BCI) assesses the structural health of bridges. The BCI provides a quantitative assessment that aids in decision-making processes related to bridge maintenance and repair strategies, ensuring the safety and serviceability of bridge systems [2,9]. These condition indices provide valuable insights that help transportation agencies prioritize maintenance and rehabilitation efforts, ultimately ensuring the longevity and efficiency of transportation networks.

Objective condition indices are based on measurable, quantifiable data that can be consistently replicated. For instance, the PCI is an objective index as it relies on specific measurements of pavement distresses such as cracking and rutting. These measurements are standardized and can be consistently applied across different pavements. On the other hand, subjective condition indices are based on assessments that may involve personal judgment or qualitative evaluations. The overall condition index (OCI) is a pavement surface rating system that is utilized by some transportation agencies to evaluate the condition of road surfaces [10,11]. The OCI score is determined through the visual inspection of pavement distresses by trained raters using the Pavement Surface Evaluation and Rating (PASER) manual [12]. PASER provides guidance on identifying the types of distresses present and subjectively judging their severity levels. The OCI score ranges from 1 to 10, with lower scores indicating poorer pavement condition and higher scores representing better condition. A score of 1 suggests pavement failure, while a rating of 10 is for new or excellent pavement condition. While the OCI relies on subjective expert judgment for condition assessment, it serves a similar role to the PCI in helping agencies determine maintenance, repair, and rehabilitation needs based on surface condition assessments.

While these composite condition indices have proven invaluable in guiding infrastructure asset management decisions, a significant limitation exists in their current implementation. The selection of weights for various factors within these indices largely relies on engineering judgment, experience, and experimental approaches. While useful, this expert-based weighting has limitations in terms of transparency, consistency across contexts, and validation. The reliance on expert opinion introduces potential issues such as lack of standardization, bias, difficulty in validation, and challenges in knowledge transfer.

Data-driven techniques offer the potential to supplement or validate expert weights across various infrastructure asset management domains. Machine learning methods applied to comprehensive infrastructure inspection databases could help empirically derive relative impact factors for different condition indicators based on their measured correlations with overall asset performance. The relationships found between individual condition metrics and holistic assessment scores can quantify the predictive power of each variable. This study proposes a framework to utilize such condition data in tailoring composite index weights to specific infrastructure contexts. It applies feature selection techniques to determine the importance of various condition variables in predicting overall asset health. As a case study, this approach is demonstrated in the context of pavement management, where data-driven feature importance scores for distresses like cracking and roughness serve as updated weights in PCI calculations, validated against visually assessed OCI scores. This methodology, however, is adaptable to other infrastructure domains, such as bridges, railways, and water systems, where similar composite indices are employed. By leveraging data-driven approaches, asset managers can enhance the accuracy, consistency, and relevance of their condition assessments, leading to more informed and reliable decision-making across diverse infrastructure types. The purpose of this study is to develop a data-driven framework for weighting variables in transportation asset condition indices by integrating subjective expert condition assessments with objective data. This framework aims to improve the accuracy and transparency of condition assessments, enhancing decision-making in infrastructure management. This data-driven approach seeks to provide experts and asset managers with an advanced tool for estimating variable weights in condition index calculations. 
By correlating objective condition metrics of the assets with visual condition ratings, this method enables the assignment of weights to these metrics based on their relationship to the observed visual condition. This alignment has the potential to enhance the effectiveness of infrastructure asset management strategies significantly. This study holds significant value for the field of civil engineering, particularly in the domain of infrastructure management. Transportation asset condition indices are crucial tools for civil engineers in assessing, maintaining, and prolonging the lifespan of critical infrastructure. By introducing a novel data-driven framework that integrates both subjective and objective measurements, this study provides civil engineers with a more accurate, transparent, and adaptable approach to condition assessment. Ineffective evaluation methods can have significant consequences in infrastructure management, leading to poor decision-making that affects asset safety, performance, and cost efficiency. For instance, the miscalculation of pavement condition due to subjective assessments can result in delayed or unnecessary maintenance, ultimately leading to premature failure or increased repair costs. In bridge management, inadequate or biased assessments of structural health have, in some cases, led to the underestimation of critical maintenance needs, posing safety risks and increasing long-term rehabilitation expenses. Similarly, the inconsistent weighting of distress factors across different regions has contributed to inefficient resource allocation, where critical assets may not receive the necessary attention due to inaccurate condition assessments. These examples highlight the importance of adopting a more objective, data-driven approach, as proposed in this study, to ensure that transportation asset condition indices are reliable and lead to effective decision-making.

2. Literature Review

2.1. Condition Indicators

Infrastructure asset management relies heavily on condition indices to assess and monitor the health of various assets. These indices play a crucial role in decision-making processes for maintenance, repair, and replacement strategies. This review examines the examples of condition indices used across different infrastructure domains, focusing on their calculation methods, parameters, and weighting approaches.

One of the most widely used indices in bridge management is the BCI, a metric used to assess the condition of bridges. It involves inspecting each bridge component regularly, typically every two years, to evaluate its state and assign a rating if necessary [13]. The BCI calculation method varies but often involves summing the ratings of all bridge components multiplied by their relative importance, resulting in a value that represents the overall condition of the bridge [14]. Weighting methods play a crucial role in determining the significance of different parameters in the BCI calculation process. For instance, the Analytical Hierarchy Process (AHP) has been utilized to assign weights to various factors influencing the BCI, enabling the prioritization of maintenance, repair, and rehabilitation programs for bridges [15]. Additionally, some studies have proposed innovative approaches like using the views of bridge experts to determine the BCI in concrete bridges [16].

The parameters used in calculating the BCI can include various aspects of bridge components, such as structural integrity, material quality, and overall serviceability [9]. The BCI calculation process aims to provide a comprehensive evaluation of bridge systems by decomposing them into different layers, such as system, component, and element layers, and using weighted algorithms to generate an overall assessment result [17]. By incorporating different weighting methods and parameters, the BCI offers a systematic approach to bridge condition assessment, aiding in decision-making regarding maintenance priorities and resource allocation for bridge management.

Transitioning from bridges to roadways, the Present Serviceability Rating (PSR) was initially developed as part of the AASHO Road Test, a set of experiments carried out by the AASHTO from 1956 to 1961 [18]. The PSR was a subjective measure of pavement ride quality that required a panel of observers to ride in a car over the pavement of interest [19]. Building upon the PSR, the Present Serviceability Index (PSI) was developed. Unlike the PSR, the PSI does not require a panel of experts, making it a more practical approach for assessing large-scale pavement networks. The PSI is based on objective measurements of roughness, cracking, rutting, and patching [20]. It uses a scale from 0 to 5, with 5 representing a perfect pavement condition. Further advancing pavement assessment, the PCI was developed to provide an even more comprehensive evaluation. The PCI is a numerical rating that represents the general condition of a pavement section [5]. The PCI provides a standardized way to quantify pavement condition for management purposes. It is used by transportation agencies to assess current pavement quality, provide a warning for required maintenance, and estimate future funding needs [21]. The PCI rating scale ranges from 0 to 100, with 100 representing a newly built or perfectly reconstructed pavement. Field inspection data on distress and roughness are converted into deductions that reduce the PCI score from 100. The PCI thus provides a snapshot of a pavement’s functional and structural condition.

The PCI equation generally assigns weight to various pavement distresses based on their relative impact on ride quality, structural integrity, and overall condition [22]. For example, the Iowa Pavement Management Program (IPMP) [23,24] assigns 35% to the normalized IRI, 25% to D-cracking, 15% to spalling joints, and 25% to transverse cracking for concrete pavements. The distress weights were chosen using engineering judgment and experience based on their correlations with overall pavement condition. Bektas et al. [25] proposed an updated PCI that provides a consistent, unified approach to rating pavement condition in Iowa. The new index is based on a 100-point scale and consists of five component indices derived from specific distress data or pavement properties: cracking index, ride index, rutting index, and faulting index. The overall condition index combines these component indices using weighting factors determined based on input from DOT experts, the ratings and discussions of an expert panel, and a sensitivity analysis comparing the new index to the existing PCI. For Portland Cement Concrete (PCC) pavements, it is weighted as follows: 40% cracking index, 40% ride index, and 20% faulting index. For Asphalt Concrete (AC) pavements, it is weighted as follows: 40% cracking index, 40% ride index, and 20% rutting index. Compared to Iowa DOT’s existing pavement condition index equations, the new index, according to the authors, offers a more standardized approach across pavement types and relies more heavily on observed distress data.
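To make the weighting concrete, the following minimal sketch combines component indices into an overall score using the 40/40/20 weights quoted above. It assumes a simple linear weighted sum of component indices that are each on a 0 to 100 scale; the function name `composite_index` and the section scores are hypothetical and are not taken from the cited work.

```python
# Component weights as quoted in the text for the updated Iowa index.
PCC_WEIGHTS = {"cracking": 0.40, "ride": 0.40, "faulting": 0.20}
AC_WEIGHTS = {"cracking": 0.40, "ride": 0.40, "rutting": 0.20}


def composite_index(component_scores, weights):
    """Weighted sum of component indices (assumed form), each on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(component_scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical component scores for one PCC section (illustrative values only).
pcc_section = {"cracking": 80.0, "ride": 70.0, "faulting": 90.0}
pcc_pci = composite_index(pcc_section, PCC_WEIGHTS)
print(round(pcc_pci, 1))  # 0.4*80 + 0.4*70 + 0.2*90 = 78.0
```

Because the weights enter the score linearly in this sketch, changing one weight shifts every section's index in proportion to that component's score, which is why the choice of weights matters for ranking sections.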

Moving beyond roads to pedestrian infrastructure, the Sidewalk Condition Index (SCI) is a valuable metric used in infrastructure management to assess the overall condition of pedestrian walkways. Similar to BCI and PCI for bridges and pavements, the SCI involves evaluating various aspects of sidewalks to determine their state and prioritize maintenance efforts [26,27]. The SCI calculation method typically includes assigning weights to different sidewalk distresses based on their impact on walkability, safety, and overall condition [26,27]. This weighted sum approach allows for a comprehensive evaluation of sidewalk conditions, aiding in decision-making regarding maintenance priorities and resource allocation for sidewalk management.

While these condition indices provide valuable insights across various infrastructure types, there is an ongoing need for the refinement and validation of the weighting factors used in their calculations. The diverse approaches observed in the literature reflect the continued research into optimal methods for quantifying and weighting key indicators of infrastructure condition. However, a significant gap remains in developing a robust, data-driven framework for calculating these weights across different infrastructure types. This framework should consider various factors such as asset type, location, the distribution of distresses, and other relevant variables. By adopting a more data-driven approach, asset managers could enhance the accuracy and reliability of condition assessments across all infrastructure domains, from bridges and pavements to sidewalks and beyond. This would lead to more informed decision-making and a more efficient allocation of resources in infrastructure asset management.

2.2. Machine Learning Models

In recent years, various machine learning models have been adopted in many research studies to model the performance of diverse infrastructure assets. Some examples include the use of Artificial Neural Networks (ANNs) to predict condition indicators and overall asset health across different infrastructure types [28,29,30,31,32,33]. Models like random forests, support vector machines, decision trees, and Extreme Gradient Boosting (XGBoost) have also been employed to forecast asset condition and performance in various infrastructure domains [34,35,36,37,38,39,40]. Other studies, such as that by Al-Hamdan et al. [41], have utilized clustering to detect probable maintenance activities. These data-driven approaches have shown promise in accurately capturing the complex relationships between asset characteristics, usage patterns, environmental factors, and condition deterioration over time. Recent studies highlight the shift towards data-driven decision-making frameworks for pavement preservation. Arezoumand et al. [42] developed an analytical approach that evaluates pavement performance before and after preservation treatments, utilizing data from Iowa DOT’s pavement management system. Their research emphasizes the importance of selecting the right treatment based on pavement condition and traffic characteristics to maximize service life extension and cost-effectiveness. Similarly, the work of Arezoumand et al. [42] further expands on this approach by analyzing the cost-effectiveness of five different preservation methods, focusing on flexible and composite pavements. Their findings provide a decision matrix to help agencies select appropriate preservation strategies based on road conditions and expected benefits [43]. 
For instance, the application of hierarchical SVM for the semantic segmentation of 3D point clouds demonstrates the potential of supervised learning methods in accurately classifying infrastructure conditions, which is crucial for effective asset management [44]. Additionally, there is a growing body of research emphasizing the importance of data-driven approaches in understanding the life cycle of infrastructure assets, enabling organizations to make informed decisions regarding maintenance and rehabilitation strategies [2,45]. The ability to analyze historical data and predict future asset conditions not only aids in optimizing resource allocation, but also supports the development of comprehensive asset management frameworks that align with sustainability goals [46,47]. As such, the ongoing evolution of ML methodologies presents a promising avenue for improving the resilience and efficiency of infrastructure management practices.

A powerful machine learning method is ensemble learning, where multiple models are combined to boost predictive performance. As reviewed in Sagi and Rokach [48], ensemble techniques like random forests and boosting are considered state of the art for many prediction problems. By training multiple base learners and combining their outputs, ensembles can reduce overfitting, improve computational efficiency, extend the hypothesis space, and address challenges like class imbalance. Among ensemble methods, random forests have emerged as particularly powerful, demonstrating high accuracy across diverse tasks while remaining robust and easy to tune [49,50]. By building an ensemble of randomized decision trees trained on subsets of data and features, random forests achieve strong performance even with small training sets, nonlinear relationships, and high dimensionality. The popularity and empirical success of random forests highlight the capabilities of ensemble learning.

Random forests have emerged as a useful machine learning method for pavement condition modeling and estimation in recent years. Several studies have explored using random forest regression to estimate key pavement condition indices. Yu Ting et al. [51] developed a random forest model to predict the PCI using pavement distress data collected by a 3D road detection vehicle. The random forest model achieved an R-squared value of 0.895 compared to 0.562 for multiple linear regression in predicting PCI. The random forest model also had a lower Root Mean Square Error (RMSE) and training time compared to a neural network model. After additional outlier removal and model re-training, the random forest model R-squared value reached 0.898 for PCI prediction. Piryonesi and El-Diraby [52] developed random forest models to predict PCI 2, 3, 5, and 6 years into the future using data from the Long-Term Pavement Performance (LTPP) database. The random forest model achieved above 90% accuracy for multi-year PCI prediction, outperforming other machine learning methods like decision trees and K-nearest neighbors (kNN). The study demonstrated the capability of random forests for PCI modeling and estimating future pavement deterioration. Guo et al. [53] utilized random forests to identify key pavement condition indicators from standards, reducing unnecessary data dimensions. The key indicators were used to develop a simplified model, achieving over 90% consistency with traditional methods in predicting PCI. The approach illustrates the use of random forests for indicator importance analysis and developing parsimonious pavement condition models. Guo and Hao [54] used a random forest model to predict moisture damage in pavements using relevant influencing factors. Optimization improved the model’s accuracy to 83%. The study showed that random forests can fit complex pavement moisture damage issues with multiple factors compared to traditional linear regression. Jia et al. 
[55] analyzed the influence of data variability on network-level pavement condition using random forest models. They found that factors like measurement location impacted crack measurements. The results illustrate the benefit of using random forests with parallel pavement test data to understand variability and its impacts on evaluation.

2.3. Feature Importance

In the field of bridge condition assessment, various studies have explored the integration of machine learning techniques to improve the understanding and prediction of bridge conditions. Assaad and El-adaway [56] focused on the deck deterioration of bridges using machine learning, identifying the feature importance of bridges and implementing ANN and k-nearest neighbors (k-NN) algorithms to develop accurate predictive models. Kong et al. [57] utilized the Shapley additive explanation (SHAP) framework to investigate the relationships between different factors and bridges with deteriorating deck conditions.

Previous studies have utilized different techniques to determine the relative importance of variables when developing machine learning models for pavement performance prediction. For example, Damirchilo et al. [34] used two common methods to determine feature importance in predicting pavement roughness. For the XGBoost model, feature importance was calculated using the F-score, which indicates how many times each feature is split in the decision trees. This shows the relative contribution of that feature to the model. For the random forest model, feature importance was extracted inherently by the algorithm itself. Specifically, random forest calculates feature importance by looking at how much the error increases when a feature is randomly permuted. Features that result in a larger error increase when perturbed are considered more important by the algorithm. Damirchilo et al. [34] demonstrated that these methods can provide reliable metrics for ranking the predictive value of variables in ensemble models based on decision trees, as evidenced by the feature importance results for pavement roughness prediction. Cheng et al. [58] utilized a random forest algorithm to compute feature importance when developing their artificial neural network model for predicting rutting in pavements with flexible overlays. After extracting data from the Canadian Long-Term Pavement Performance database, the authors first conducted a sensitivity analysis using random forest to quantify the importance of different input variables on rut depth prediction. This random forest-based feature importance ranking helped identify key parameters to feed into their final artificial neural network model for rutting prediction.
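The permutation idea described above can be sketched in a few lines. This is an illustrative, model-agnostic version written with the Python standard library only; `toy_model` is a stand-in for a trained model (an assumption made for the example), and the data are synthetic. In practice a library routine such as scikit-learn's `sklearn.inspection.permutation_importance` would typically be applied to a fitted random forest.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y_true, [predict(r) for r in rows])
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            # Shuffle feature j across rows while leaving other features intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(y_true, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Stand-in for a trained model (assumption): the output depends strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
def toy_model(row):
    return 10.0 - 5.0 * row[0] - 1.0 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets)
print([round(s, 3) for s in scores])
```

The feature the model ignores receives an importance of zero, while the feature with the largest coefficient shows the largest error increase, which matches the ranking logic described in the paragraph above.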

Previous studies have utilized feature importance techniques to select the most predictive variables when modeling various infrastructure condition indices. However, these methods have typically been limited to ranking input attributes based on their contribution to a model’s forecasting accuracy, rather than directly estimating the weights and coefficients needed to calculate the index values themselves. This limitation is evident across different infrastructure domains. For instance, in pavement management, Piryonesi and El-Diraby [59] ranked pavement factors by importance for PCI prediction, but did not use this analysis to assign PCI deduction values to different distress types and severities. Likewise, Adesunkanmi et al. [60] developed models to predict OCI using several distress variables and ranked the variables based on their importance scores. Similar approaches have been observed in other infrastructure sectors, such as bridge management and water distribution systems, where feature selection identifies informative model inputs but falls short of determining the relative impacts of various condition indicators. Zeiada et al. [61] used a forward sequential feature selection (FSFS) algorithm in conjunction with an ANN model to determine feature importance for predicting pavement roughness (IRI) in warm climates. The FSFS-ANN method adds features sequentially to the ANN prediction model as long as model performance continues improving. Feature importance is quantified by the probability of selection over multiple FSFS-ANN runs. A threshold of 10% average probability of selection was used to identify the most critical features. Through this iterative process of adding features and tracking model accuracy, the study identified seven influential factors for IRI: initial roughness, relative humidity, wind velocity, albedo, emissivity, traffic volume, and structural capacity. Guo et al. 
[62] utilized SHAP values rather than traditional feature importance metrics to interpret the variables in their light gradient-boosting machine (LightGBM) models for predicting pavement roughness and rutting. After training LightGBM models on data from the Long-Term Pavement Performance database, the authors generated SHAP plots to visualize the impact of each input variable on the predicted IRI and rut depth. Features with higher mean absolute SHAP values were considered more important for the model output. This approach provided insights into how pavement structure, climate, traffic, and age factors influenced the performance forecasts. Consequently, while these techniques provide valuable insights into which factors are most influential in predicting asset condition, additional calculations are still needed to transform field data into actionable metrics like PCI. This gap between feature importance and weight determination represents a significant opportunity for advancing data-driven approaches in infrastructure asset management across multiple sectors.

While previous studies have successfully applied machine learning models like random forest and XGBoost to predict infrastructure conditions, most methods focus on ranking the predictive importance of variables without directly integrating these rankings into the calculation of condition indices. Traditional approaches, such as the AHP and expert-based methods, rely on subjective expert judgment to assign weights to condition indicators. These methods, though widely used, suffer from limitations related to transparency, bias, and difficulty in validation across different contexts. Our proposed framework goes beyond ranking predictive variables by utilizing feature importance scores to directly assign weights in condition index calculations. This novel integration of subjective and objective data allows for more context-specific adjustments, enhancing the accuracy and adaptability of condition assessments. Unlike existing methods, which often rely solely on expert input or focus on variable importance ranking, our approach provides a data-driven, empirically validated method for determining the relative impact of distress factors on condition indices, as demonstrated in our case study on PCI calculations.
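One simple way to turn importance scores into index weights is to rescale them so that they sum to one. The sketch below assumes this proportional normalization; the section above does not specify the exact conversion, so the rule, the function name `importances_to_weights`, and the raw scores are all hypothetical.

```python
def importances_to_weights(importances):
    """Rescale non-negative importance scores so that they sum to 1."""
    if any(value < 0 for value in importances.values()):
        raise ValueError("importance scores must be non-negative")
    total = sum(importances.values())
    if total == 0:
        raise ValueError("at least one importance score must be positive")
    return {name: value / total for name, value in importances.items()}


# Hypothetical raw importance scores for a rigid pavement model.
raw_scores = {
    "IRI": 0.9,
    "transverse_cracking": 0.6,
    "D_cracking": 0.3,
    "joint_spalling": 0.2,
}
weights = importances_to_weights(raw_scores)
for name, weight in weights.items():
    print(f"{name}: {weight:.0%}")
```

Normalizing in this way keeps the relative ordering of the variables and yields weights that can be compared directly with expert-assigned percentages.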

3. Materials and Methods

3.1. Framework

This research paper presents a framework to estimate variable weights for the calculation of transportation assets condition indices using statistical and machine learning techniques. Figure 1 illustrates the proposed procedural framework for this purpose.

3.1.1. Database

The initial step involves the acquisition of both subjective and objective data for transportation assets. Subjective data are collected through expert assessments, which provide a qualitative evaluation of the asset’s condition. This step requires a clear definition of the rating scale, which may vary depending on the type of asset (e.g., roads, bridges, etc.) and the specific evaluation criteria applied. For example, pavement condition could be rated on a scale of 1 to 10, where 10 represents excellent condition and 1 indicates poor condition. Objective data, on the other hand, include measurable parameters such as structural health, surface distress or defects, and roughness. These metrics are sourced from manual measurements, automated sensors, or other monitoring techniques.

3.1.2. Data Preprocessing

This step involves two main aspects, namely, model selection and data sampling. Firstly, for the model selection task, once data are collected, data preparation is necessary to ensure consistency and quality. This includes the following tasks:

Data Cleaning: This step involves addressing missing values, outliers, and inconsistencies within the datasets. Missing data can be handled either by removing incomplete records, provided this does not result in significant data loss, or through imputation techniques. Common methods include mean or mode imputation, while more advanced approaches, such as Fractional Hot Deck Imputation (FHDI) and Fully Efficient Fractional Imputation (FEFI), can also be employed [63].
Normalization: This involves standardizing the scales of objective measures to ensure comparability. This is particularly important when dealing with parameters that have different units (e.g., roughness in inches per mile vs. cracking in feet).
Data Splitting: This involves dividing the data into training and testing subsets, where the training set will be used to build the machine learning model and the testing set will validate its performance.
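The three preparation tasks above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the records, the column layout, the choice of listwise deletion and min-max scaling, and the helper names (`drop_incomplete`, `min_max_scale`, `train_test_split`) are all assumptions made for the example.

```python
import random

# Hypothetical records: (iri_in_per_mile, cracking_ft, subjective_rating).
# None marks a missing value.
records = [
    (95.0, 12.0, 8),
    (140.0, None, 6),
    (210.0, 85.0, 4),
    (120.0, 30.0, 7),
    (300.0, 150.0, 2),
    (None, 20.0, 7),
    (180.0, 60.0, 5),
]


def drop_incomplete(rows):
    """Data cleaning by listwise deletion: keep rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r)]


def min_max_scale(rows, n_features):
    """Rescale the first n_features columns to [0, 1]; leave the rating as-is."""
    cols = list(zip(*rows))
    scaled_cols = []
    for j, col in enumerate(cols):
        if j < n_features:
            lo, hi = min(col), max(col)
            span = (hi - lo) or 1.0  # guard against a constant column
            scaled_cols.append([(v - lo) / span for v in col])
        else:
            scaled_cols.append(list(col))
    return [tuple(r) for r in zip(*scaled_cols)]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split into (training, testing) subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


clean = drop_incomplete(records)
scaled = min_max_scale(clean, n_features=2)
train, test = train_test_split(scaled, test_fraction=0.3, seed=0)
```

The sketch follows the order in which the tasks are listed. In practice, scaling parameters are often estimated from the training subset only, so that no information from the testing subset enters model building.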

After the data are prepared, preliminary analysis can be performed on the processed dataset to select the machine learning model that best captures the relationship between the subjective rating and the objective distress variables. Models are trained using the training dataset and then tested and validated using the testing dataset. Different machine learning algorithms can be examined; however, the algorithm should provide feature importance functionality, as random forest, gradient-boosted trees, and extreme gradient-boosted trees do. The performance of each model is evaluated using metrics such as Mean Absolute Error (MAE), RMSE, and R-squared value. The model with the best performance on the test set is selected for further analysis.
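The evaluation metrics named above can be computed as in the following sketch. The test-set ratings and the two sets of predictions are hypothetical values chosen for illustration, and selecting the candidate with the lowest RMSE is one possible selection rule, assumed here for simplicity.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set ratings and predictions from two candidate models.
y_test = [8, 6, 4, 7, 2]
candidates = {
    "model_a": [7.5, 6.2, 4.4, 6.8, 2.9],
    "model_b": [6.0, 6.0, 5.0, 6.0, 4.0],
}

scores = {
    name: {
        "MAE": mae(y_test, pred),
        "RMSE": rmse(y_test, pred),
        "R2": r_squared(y_test, pred),
    }
    for name, pred in candidates.items()
}

# Assumed selection rule: lowest test-set RMSE.
best_model = min(scores, key=lambda name: scores[name]["RMSE"])
```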

Secondly, for the sampling step, given the variability inherent in transportation networks, this step focuses on creating smaller samples from the full dataset to examine the sensitivity of the estimated weights. This can be achieved via the following approaches:

Random Sampling: Suitable for small-scale analysis like city road networks, where random subsets of data can be drawn to create multiple samples for analysis.
Geographical Sampling: For large-scale analysis (e.g., state or national networks), data can be segmented based on jurisdictional boundaries (e.g., county or district) to ensure that different geographic or administrative regions are represented.

This step aims to capture the variability in the calculated weights across different samples, allowing for more robust and generalizable results. It is worth noting that the number of samples should be sufficient to avoid overfitting issues when using machine learning techniques.
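Both sampling approaches can be sketched in a few lines. The section identifiers, district labels, and sample counts below are hypothetical, and drawing sections without repetition within a sample is an assumption of this example.

```python
import random
from collections import defaultdict

# Hypothetical section records: (section_id, district).
sections = [(i, f"district_{i % 4}") for i in range(1000)]


def random_samples(rows, n_samples, sample_size, seed=0):
    """Random sampling: draw n_samples subsets of the network.

    Sections are not repeated within a subset, but the same section
    may appear in several subsets.
    """
    rng = random.Random(seed)
    return [rng.sample(rows, sample_size) for _ in range(n_samples)]


def geographic_samples(rows):
    """Geographical sampling: one sample per jurisdiction."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


city_samples = random_samples(sections, n_samples=20, sample_size=300)
district_samples = geographic_samples(sections)
```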

3.1.3. Initial Weight Estimation

Once the samples are created, the selected machine learning model (e.g., random forest) is applied to each sample to estimate the relationship between subjective and objective measures. Feature importance scores are computed for each objective metric in each sample. These scores represent the relative contribution of each metric to the prediction of the subjective rating. The weights for each metric are then calculated using the following equation:

\mathrm{Weight}_{ij} = \frac{\mathrm{FeatureImportanceScore}_{ij}}{\sum_{i} \mathrm{FeatureImportanceScore}_{ij}}, \quad \text{for each feature } i \text{ in sample } j

(1)

where

Weight_ij is the weight of the ith variable in the jth sample.
Feature Importance Score_ij is the feature importance score of the ith variable in the jth sample obtained from the random forest model.
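As a worked illustration of Equation (1), the following sketch normalizes the importance scores of one sample so that the resulting weights sum to one. The variable names and score values are hypothetical, and `importance_to_weights` is a helper introduced only for this example.

```python
def importance_to_weights(importance_scores):
    """Apply Equation (1) to one sample: divide each score by the sample total."""
    total = sum(importance_scores.values())
    if total <= 0:
        raise ValueError("Importance scores must have a positive sum.")
    return {name: score / total for name, score in importance_scores.items()}


# Hypothetical raw importance scores for a single sample j.
sample_importances = {
    "IRI": 0.9,
    "transverse_cracking": 0.6,
    "patching": 0.3,
    "rutting": 0.2,
}

weights = importance_to_weights(sample_importances)
```

With these illustrative scores the total is 2.0, so the IRI weight is 0.9 / 2.0 = 0.45 and the four weights sum to one.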

Following the computation of weights for each sample, statistical analysis is conducted to quantify the variability and distribution of each of the objective measures across all samples. The key statistical measures calculated for each variable include the following:

Measures of Central Tendency: Such as mean, median, and mode.
Dispersion Metrics: Including standard deviation and interquartile range, which provide insight into the variability and spread of the variables.
Other Statistical Measures: Such as skewness, kurtosis, etc., that describe the shape of the data distribution.
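The statistical measures listed above can be computed for one variable in one sample as sketched below. The IRI values are hypothetical, and the specific estimators (population standard deviation, inclusive quartiles, moment-based skewness and excess kurtosis) are assumptions of this example; other conventions exist and give slightly different values.

```python
import statistics


def describe(values):
    """Summary statistics for one objective measure within one sample."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("Skewness and kurtosis are undefined for constant data.")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    excess_kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "iqr": q3 - q1,
        "skewness": skewness,
        "kurtosis": excess_kurtosis,
    }


# Hypothetical IRI values (inches per mile) for the sections in one sample.
iri_values = [90, 100, 110, 120, 130]
iri_stats = describe(iri_values)
```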

Upon estimating the initial weights and obtaining the statistical measures for each objective metric across all samples, a new dataset can be formulated. This dataset should be structured to include the sample number, the objective measure represented as a categorical variable (i.e., the name of the objective measure), the weight estimate of the objective measure for that specific sample, and the statistical measures acquired for the objective measure within that sample. The construction of this dataset facilitates progression to the final step of the methodology.
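One possible layout for this dataset is a long table with one row per sample and objective measure. The sketch below assumes the weights and statistics are already available as nested dictionaries; the input values and the helper name `build_weight_dataset` are hypothetical.

```python
def build_weight_dataset(sample_weights, sample_stats):
    """Combine per-sample weights and statistics into one row per (sample, measure)."""
    rows = []
    for sample_id, weights in sample_weights.items():
        for measure, weight in weights.items():
            row = {"sample": sample_id, "measure": measure, "weight": weight}
            row.update(sample_stats[sample_id][measure])
            rows.append(row)
    return rows


# Hypothetical inputs for two samples and two objective measures.
sample_weights = {
    1: {"IRI": 0.6, "patching": 0.4},
    2: {"IRI": 0.7, "patching": 0.3},
}
sample_stats = {
    1: {"IRI": {"std": 14.1, "iqr": 20.0}, "patching": {"std": 3.2, "iqr": 4.0}},
    2: {"IRI": {"std": 18.5, "iqr": 26.0}, "patching": {"std": 2.9, "iqr": 3.5}},
}

weight_dataset = build_weight_dataset(sample_weights, sample_stats)
```

Each row then carries the sample number, the measure name as a categorical value, the weight, and the statistics for that measure in that sample.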

3.1.4. Final Weight Estimation

The final step of the methodology entails constructing a predictive model to estimate the weights of objective metrics based on their statistical characteristics. This model utilizes the calculated statistical measures (e.g., mean, standard deviation, etc.) as input variables to predict the weight of each objective measure, thereby offering a systematic approach to incorporate these weights into the asset condition index computations. To develop this predictive model, techniques such as multiple linear regression, generalized additive models (GAMs), machine learning, or deep learning methods may be employed.

The developed models should be rigorously evaluated using a testing dataset to ensure robustness. Once the model demonstrates satisfactory performance, supplementary analyses can be conducted to investigate the relationship between the calculated weights and the statistical measures of the objective variables. For linear or deterministic models, this analysis may involve examining the coefficient estimates, whereas for more advanced techniques such as machine learning, methods like partial dependence plots and SHAP can be used to interpret the model outputs and provide more insights about the estimated weights. Once the final weights are determined, they can be integrated into the calculation of the condition index.
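To make the idea of a partial dependence plot concrete, the following sketch computes a one-dimensional partial dependence curve by hand: the feature of interest is fixed at each grid value for every row, and the model predictions are averaged. The `toy_predict` function stands in for a fitted weight model and is purely hypothetical, as are the rows and the grid.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        modified = [
            row[:feature_index] + (value,) + row[feature_index + 1:]
            for row in rows
        ]
        predictions = predict(modified)
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_predict(rows):
    """Hypothetical stand-in for a fitted model: rows are (std, iqr) tuples."""
    return [0.1 + 0.02 * std - 0.01 * iqr for std, iqr in rows]


# Hypothetical (std, iqr) observations and a grid over the std feature.
rows = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
grid = [1.0, 2.0, 3.0]

pd_std = partial_dependence(toy_predict, rows, feature_index=0, grid=grid)
```

Because the toy model is linear, the curve rises by 0.02 per unit of standard deviation; for a random forest the same procedure yields the nonlinear curves typically shown in such plots.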

The entire framework can be automated using programming languages that support machine learning and statistical analysis, such as Python or R. This automation would facilitate ease of use and implementation by practitioners, allowing the procedure to be efficiently repeated whenever required, without the need to go through each step of the process manually.

3.2. Case Study

As a proof of concept for the proposed framework, a case study was performed to obtain weights for the PCI calculations utilizing OCI subjective ratings and the different objective measures that are included in PCI calculations, such as IRI, cracking, rutting, patching, and spalling. The case study was performed using the pavement condition data of the city of West Des Moines, Iowa. Figure 2 shows a map of the study area and the road network that has been investigated.

3.2.1. Data Preparation

The data preparation for this research encompassed the acquisition and compilation of pavement surface distress data (i.e., objective measures) and OCI data (i.e., subjective rating) specifically for the case study area. The dataset covered the years 2013 and 2015 and predominantly consisted of PCC pavements, which accounted for approximately 70% of the total road network length. Composite (COM) pavements constituted approximately 25% of the network. Asphalt Cement Concrete, forming only 5% of the total network length, was excluded from the study due to its insufficient sample size, potentially leading to overfitting when employing machine learning techniques.

The data used in this study are described below:

Pavement surface distress data, collected using automated pavement distress data collection equipment.
OCI data, acquired using the Pavement Surface Evaluation and Rating (PASER) method [12].

Once the raw pavement surface distress data were collected, they were aggregated into predefined pavement management sections and stored in both tabular and geospatial data formats. These sections were then intersected with the OCI data to compile a comprehensive database encompassing all relevant variables for the research.

To ensure data integrity and reliability, the dataset was divided into two distinct subsets: one for PCC pavements, comprising 5218 pavement sections, and the other for COM pavements, comprising 2006 pavement sections. Any pavement sections with incomplete cases, i.e., records with missing information about one or more variables, were meticulously removed from both subsets to ensure robust analysis.

The variables included in the analysis are the variables that are currently in use in the PCI calculation equation utilized by local transportation agencies in Iowa, without introducing any additional variables that could potentially skew the weights of the obtained variables. For PCC pavements, the variables encompassed transverse cracking, joint spalling, longitudinal wheel path cracking, longitudinal cracking, D-cracking, patching, IRI, and OCI. Similarly, for COM pavements, the same variables were considered, except that rutting and alligator cracking were included while D-cracking and joint spalling were excluded.

The inclusion of the IRI in the analysis, despite its difficulty in being captured by visual ratings and its measurement through specialized devices like road profilers, is justified for two key reasons. First, the IRI is one of the variables used in PCI calculation, and excluding it would lead to skewed weights since the sum of the variable weights must equal one. Second, although roughness cannot be directly and accurately assessed by the naked eye, it is often indirectly correlated with other distresses such as cracking, patching, and spalling [64]. Therefore, incorporating the IRI ensures a comprehensive evaluation of pavement conditions and maintains the integrity of the weight distribution in the analysis.

Table 2 and Table 3 below present the summary statistics of the variables for the PCC and COM pavements, respectively, allowing for a comprehensive understanding of their distribution. The summary statistics include standard deviation (STD), skewness, kurtosis, mean, median, minimum (Min), maximum (Max), and interquartile range (IQR). Nonetheless, the descriptive statistics of the dataset reveal that the distribution of the pavement sections in terms of the condition rating (i.e., OCI) is imbalanced, with the majority of segments classified as being in good or very good condition, and only a small portion of segments falling into the poor or very poor condition categories. Consequently, it is recommended that the proposed framework be applied to datasets with a more balanced distribution of pavement condition ratings to enhance the robustness and applicability of the results across different stages of the pavement life cycle.

3.2.2. Weight Estimation

The dataset comprises information for both PCC pavements and COM pavements, necessitating the segregation of the data into two subsets based on pavement type. Subsequently, the proposed framework is independently applied to each subset, considering the distinct characteristics of PCC and COM pavements as mentioned in the data preparation section. In both cases, selected variables are essential in evaluating the distresses and ride quality of pavements. The inclusion of these variables allows for a comprehensive analysis contributing to the development of accurate and effective models, which will yield better weight estimates. Figure 3 illustrates the procedural approach adopted in this study.

By employing advanced machine learning and statistical techniques, we aim to obtain precise weights for various distress factors involved in the PCI calculation. These accurate weights contribute to improving the efficacy of the PCI as a reliable indicator, accurately representing the condition of the pavement surface. To ensure the method’s robustness and mitigate potential overfitting issues, 500 distinct random samples are generated for each of the PCC and COM subsets, with each sample comprising 300 randomly drawn pavement sections. For each of the 500 random samples, an independent random forest model is developed to model the relationship between the OCI and the distress variables. The choice of the random forest model is motivated by its proficiency in managing intricate relationships among variables and its resilience against overfitting, substantiated by its remarkable accuracy, as reported in previous research [51].
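The resampling and modeling loop described above might look like the following sketch, which uses scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_`. Everything else is an assumption for illustration: the data are synthetic, only three variables and 20 samples are used to keep the run short (the case study uses 500 samples of 300 sections), sections are drawn without replacement, and the forest settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for one pavement-type subset (not the study data).
feature_names = ["IRI", "transverse_cracking", "patching"]
n_sections = 2000
X = rng.random((n_sections, len(feature_names)))
# Synthetic rating that depends most strongly on the first variable.
y = 10 - 6 * X[:, 0] - 3 * X[:, 1] - 1 * X[:, 2] + rng.normal(0, 0.2, n_sections)

n_samples, sample_size = 20, 300  # reduced from 500 samples for speed

weights_per_sample = []
for j in range(n_samples):
    # Draw one random sample of sections (assumed: without replacement).
    idx = rng.choice(n_sections, size=sample_size, replace=False)
    # Fit an independent random forest on the whole sample.
    model = RandomForestRegressor(n_estimators=50, random_state=j)
    model.fit(X[idx], y[idx])
    # Convert importance scores to weights as in Equation (1).
    importances = model.feature_importances_
    weights_per_sample.append(importances / importances.sum())

weights_per_sample = np.array(weights_per_sample)
mean_weights = dict(zip(feature_names, weights_per_sample.mean(axis=0)))
```

Each row of `weights_per_sample` holds the weights from one sample, so the spread of each column shows how sensitive a variable's weight is to the sample drawn.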

After constructing the random forest models, feature importance scores are calculated for all variables utilized in each of the 500 models. These scores signify the relative contributions of each variable in predicting the OCI. To convert the feature importance scores into weights, Equation (1) was utilized.

This normalization process ensures that the sum of weights for all variables equals unity, facilitating a coherent interpretation of variable contributions to the OCI prediction. Concurrently, descriptive statistics are calculated for each variable in the 500 random samples, including “Standard Deviation”, “Skewness”, “Kurtosis”, “Interquartile Range (IQR)”, “Min”, “Max”, and “Root Mean Squared (RMS)”. These statistics provide valuable insights into the distribution and characteristics of the variables across the random samples. Next, a new dataset is constructed that includes the obtained variable weights and descriptive statistics. Using this dataset, a new random forest model is trained to predict the weight of each variable. Before model training, the dataset is partitioned into two subsets: 70% is allocated for training the model, while the remaining 30% is reserved as a testing dataset. This split is designed to assess the model’s performance and ensure its robustness when applied to unseen data. To select the most informative variables and mitigate overfitting potential, Recursive Feature Elimination (RFE) is applied as a feature selection method [65].
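A hedged sketch of this second-stage model is given below, using scikit-learn's `RFE`, `train_test_split`, and `RandomForestRegressor`. The dataset is synthetic, the categorical distress-type predictor is omitted for brevity, and fixing the number of retained features at three is an assumption of this example; in the study the number of features was chosen from cross-validated RMSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic weight dataset: one row per (sample, variable); not the study data.
stat_names = ["std", "rms", "iqr", "skewness", "kurtosis", "max"]
X = rng.random((400, len(stat_names)))
# Synthetic target: the weight depends only on the first three statistics.
weight = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.02, 400)

# 70% of the records for training, 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, weight, test_size=0.3, random_state=0
)

# Recursive Feature Elimination: drop the least important feature each round.
selector = RFE(
    RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=3,  # assumed here; chosen by cross-validation in practice
    step=1,
)
selector.fit(X_train, y_train)
selected = [name for name, keep in zip(stat_names, selector.support_) if keep]

# Refit a random forest on the retained features and score both subsets.
final_model = RandomForestRegressor(n_estimators=100, random_state=0)
final_model.fit(selector.transform(X_train), y_train)
r2_train = r2_score(y_train, final_model.predict(selector.transform(X_train)))
r2_test = r2_score(y_test, final_model.predict(selector.transform(X_test)))
```

Comparing `r2_train` with `r2_test` gives the same overfitting check used in the case study: a large gap between the two would indicate that the model does not generalize to unseen records.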

The random forest model’s performance is evaluated using the R-squared metric, measuring the variance in weight data explained by the model. Partial dependency plots are generated to understand the impact of each variable on weight, providing insights into their influence on PCI calculation [66]. The proposed method is applied to the West Des Moines road network, with the results compared to existing PCI weights, demonstrating the method’s novelty and effectiveness. The conclusion outlines the approach’s advantages and limitations, and gives suggestions for future research.

4. Results

Upon the completion of data collection and thorough cleaning (i.e., excluding segments with missing data), the random sampling step was undertaken for the PCC and COM subsets, and 500 models were developed utilizing the 500 samples for each of the two subsets.

It is important to note that the entirety of the available pavement sections in each of the samples (i.e., 300 pavement sections) was utilized during model training. The primary objective at this stage was to derive accurate performance models, without a stringent emphasis on predictive power. As such, there was no requirement to reserve data for model testing. The decision to employ the full dataset for model training was driven by the aim to capture the underlying data patterns more effectively and obtain representative models that encapsulate the dataset’s characteristics.

The obtained R² values from the majority of the models demonstrated good performance, exceeding a value of 0.8. The minimum R² values observed among the models were 0.76 and 0.84 for the PCC and COM pavements, respectively. This signifies a satisfactory overall model performance.

Proceeding to the next step, summary statistics were computed for each variable within every sample of the 500 extracted samples. These summary statistics were then juxtaposed alongside the corresponding feature importance scores for each variable across the 500 samples. Consequently, the resulting dataset consists of 3500 records, representing the seven variables across the 500 individual samples. This comprehensive dataset allows for an extensive evaluation of the relationship between feature importance (i.e., weight) and the characteristics of the data distribution of each variable, such as standard deviation, skewness, kurtosis, RMS, and IQR, among others, within the models generated in this study. Through the analysis of 3500 records for each pavement type, obtained from the 500 individual samples of PCC pavement and the 500 individual samples of COM pavement, researchers can gain valuable insights into how the importance of each variable is influenced by the specific attributes of its data distribution. Additionally, this dataset facilitates the exploration of performance trends in relation to the aforementioned characteristics, providing a more profound understanding of the factors influencing the predictive power of the developed models.

Subsequently, the training subsets were used to train the weights model; during this step, the predictor variables were meticulously selected and encompassed a combination of factors, including distress type as a categorical variable, standard deviation, skewness, kurtosis, maximum value, RMS, and IQR. Prior to model development, the RFE method was employed to identify the optimal predictors that significantly contributed to the weight estimation.

Figure 4 presents the RMSE obtained from cross-validation, depicted against the number of predictors integrated into the modeling. Notably, the analysis revealed that five out of the seven features were selected for both the PCC pavement dataset and the COM pavement dataset. Specifically, the chosen features for both the PCC and COM datasets are shown in Table 4. The observed trend in Figure 4 showcased a gradual decrease in RMSE as the number of features increased, up to the point of five features. Beyond this point, a reversal in the trend occurred, with RMSE values beginning to rise. This behavior suggests a potential risk of overfitting, wherein the model becomes excessively tailored to the training data, compromising its generalization capability to unseen data.

The variables presented in Table 4 are ranked according to their significance in the weight estimation model, ordered from highest to lowest importance. Notably, the top four variables selected are consistent across both the PCC and COM models. Distress type is the most important variable, indicating that the variable of distress type explained a significant amount of variability in the weight given to the distress, followed by the standard deviation, RMS, and IQR, respectively. However, the fifth variable differs between the two models, with the maximum value variable being selected for the PCC model, while the skewness variable was identified as significant for the COM model.

Overall, the utilization of these selected features and the RFE technique allowed for the development of models that effectively estimate feature weights for both pavement types. This approach contributes to enhancing the accuracy and reliability of PCI calculations, ultimately bolstering the quality of pavement condition assessments.

Next, following the meticulous selection of features, the subsequent step involved the development of a random forest model, wherein the chosen features served as predictors to estimate the respective feature weights. Initially, the models were trained using the training dataset, and the corresponding training R² values were reported. Subsequently, the performance of the developed models was evaluated using the testing dataset, and the testing R² values were reported accordingly.

The random forest model for PCC pavement achieved R² values of 0.88 on the training set and 0.83 on the testing set. The model for COM pavement obtained R² values of 0.93 and 0.91 for training and testing, respectively. These results demonstrate high model accuracy and suggest an absence of overfitting, as indicated by the close correspondence between training and testing R² values. Figure 5 shows partial dependence plots for the weights of IRI and transverse cracking for PCC pavement vs. the three selected measures of RMS, standard deviation, and IQR. RMS measures the magnitude of the data, whereas IQR and standard deviation measure its variability and spread. Based on the partial dependence plots, the weight of the IRI variable has a generally inverse trend with both the IQR and RMS of the IRI data, but a positive trend with the standard deviation of the IRI values. When there is a higher spread in the IRI data, as indicated by greater standard deviation, the model places more emphasis on IRI, increasing its weight in the PCI equation. For transverse cracking, on the other hand, the weight increases with all three measures.

The partial dependence plots in Figure 6 illustrate how the model weights the IRI and transverse cracking variables in COM pavement for the PCI in relation to the same three measures. For the IRI, the weight has a generally inverse trend with the IRI’s own IQR. However, the IRI weight and IRI standard deviation have a generally positive association: as the dispersion of the IRI data increases, as indicated by a greater standard deviation, the IRI takes on more weight. In contrast, transverse cracking exhibits a positive trend between its weight and IQR. RMS and standard deviation exhibit opposite trends in their relationship with the estimated weight for the IRI, with RMS showing a declining trend and standard deviation showing an increasing trend. Describing the relationships shown in the partial dependence plots is complex, as the model does not treat all variables equally. Specifically, the model adjusts the weights for the IRI and transverse cracking differently in response to the same dispersion metrics, and for the IRI variable it responds differently to magnitude (RMS) than to variability (standard deviation).

When comparing the IRI weight ranges for PCC and COM pavements, the weights for the IRI in PCC pavements were higher than those in COM pavements. This may be because faulting and joints exist in PCC pavements but not in COM pavements. Notably, faulting is not used in the calculation of the PCI. Therefore, the higher weights given to the IRI in PCC pavements suggest that the IRI also captures the effects of faulting.

Finally, to elucidate the association between the old PCI calculated using the expert-based weights and the new PCI derived from the updated feature weights (i.e., data-driven weights), a comparative analysis was performed for the West Des Moines road network in the year 2015. Scatter plots were generated, as shown in Figure 7, showing the old PCI values against the new PCI values for both the PCC and COM pavement datasets. Notably, the visual representation of the scatter plots demonstrated a pronounced linear relationship between the old and new PCI values for both pavement types. For PCC pavements, the R² for this relationship was determined to be 0.85, whereas the R² value for COM pavements notably reached 0.98. These R² values indicate a strong correlation between the old and new PCI values, suggesting a high level of consistency between the two approaches.

An interesting insight can be drawn from Figure 7 regarding the comparison between data-driven and expert-based weights for different pavement types. For PCC pavements, the new PCI shows noticeable variation from the old PCI, indicating that the data-driven weights used in the new PCI calculations differ significantly from the expert-based weights used in the old PCI. In contrast, for COM pavements, the new and old PCIs are closely aligned, suggesting that the data-driven and expert-based weights do not differ as much. These findings suggest that asset managers can leverage the new approach to adjust and refine PCI weights for a more accurate representation of asset condition, as demonstrated in the PCC case. Alternatively, the approach can be used to validate expert-based weights, ensuring that the selected weights align closely with data-driven results, as seen in the COM pavements.

The observed higher R² value for COM pavements compared to PCC pavements underscores a more robust correlation between the old and new PCI values for the former. This enhanced correlation further validates the efficacy of the updated feature weights in accurately estimating pavement conditions for COM pavements.

Given the high degree of correlation, transitioning from the old PCI to the new PCI is straightforward with the aid of a linear regression equation. This regression equation enables conversion from the prior PCI calculation method, utilizing the old weights, to the updated approach employing the new feature weights. As such, this study’s findings not only verify the applicability of the new feature weights, but also provide a reliable means of transitioning to the enhanced PCI calculation method, thereby enhancing the accuracy and efficiency of pavement condition assessments.
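Such a conversion equation can be obtained by ordinary least squares, as in the sketch below. The old and new PCI values are hypothetical and do not reproduce the case study data; `fit_line` is a helper introduced for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x, with R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical PCI values for the same sections under the two weighting schemes.
old_pci = [40, 55, 60, 72, 85, 93]
new_pci = [45, 57, 66, 74, 88, 95]

slope, intercept, r2 = fit_line(old_pci, new_pci)


def convert_old_to_new(pci_old):
    """Estimate the new PCI from an old PCI value using the fitted line."""
    return intercept + slope * pci_old
```

An agency could apply a function like `convert_old_to_new` to historical PCI records so that older and newer values are expressed on a comparable scale.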

5. Discussion

The findings indicate that the proposed methodology effectively utilizes machine learning techniques to enhance the accuracy of PCI calculations. Consequently, this approach is applicable to various transportation assets where condition can be quantified using a condition index. The high testing accuracy and strong correlation between existing and data-driven PCI values validate the efficacy of the optimized distress weights. Key findings include the differences in weighting between pavement types, the influence of statistical properties on weight assignments, and the complex variable interactions governing PCI calculations.

Additionally, the accuracy achieved in the case study implementation was notably high, with R² values of 0.85 for PCC pavements and 0.98 for COM pavements. This level of accuracy ensures a smooth transition between the traditional PCI and the new PCI derived from the data-driven framework, reinforcing its practical applicability and reliability. In practical applications, the data-driven weights produced by the framework offer a standardized and more transparent alternative approach to traditional expert-based weighting schemes. By comparing the model-generated weights with those derived from expert judgments, the study reveals that data-driven methods may lead to different weight assignments. This divergence highlights the flexibility of machine learning models in adjusting to specific datasets, providing a more tailored assessment of asset conditions compared to static, expert-defined weightings. For example, a city transportation department might use this framework to prioritize pavement maintenance by focusing on the most critical distress factors as determined by the model. This provides a real-world application of the framework that enhances decision-making in asset management by delivering a more data-informed approach to resource allocation.

This research builds on previous studies in transportation asset management, particularly those that have explored the use of machine learning for condition index calculation. Prior work, such as the study by Kheirati and Golroo [67], focused on developing a new condition index for pavement assets, known as the Universal Condition Index (UCI). Their approach employed machine learning techniques to model and weight pavement distresses, with application solely to AC pavement assets. The UCI achieved a validation accuracy and precision exceeding 80% when compared to the current PCI. In contrast, the present study offers a significant contribution by generalizing a framework that can be applied to various types of transportation assets, rather than being limited to a specific pavement type.

Moreover, the integration of subjective ratings with machine learning techniques in this study introduces a novel aspect. By combining both objective data and subjective evaluations, the proposed framework delivers a more holistic approach to asset condition evaluation. The inclusion of statistical analysis across multiple samples further enhances the methodology, providing deeper insights into the variability of weight assignments under different data conditions.

The findings of this study also contribute to the growing body of literature on the application of machine learning in infrastructure management. Previous studies, such as those by El-Diraby [59] and Adesunkanmi et al. [60], have demonstrated the effectiveness of machine learning in predicting pavement conditions and reported feature importance to rank the variables used in asset condition evaluation based on their predictive significance. However, their analyses did not extend to improving the condition index calculation by utilizing the obtained importance scores. This study builds on that foundation by introducing a structured methodology that incorporates both subjective and objective data. By combining these two types of information, the proposed framework offers a more comprehensive and robust approach to infrastructure evaluation. This methodology has the potential to be applied in other areas of transportation asset management, providing a versatile tool for improving the accuracy and reliability of asset condition assessments.

While the results demonstrate the potential of the proposed methodology, certain limitations should be acknowledged. The dataset encompassed a relatively limited geographic region and timeframe. Applying the approach to more extensive multi-year data from diverse contexts would further validate its efficacy. Ongoing work should focus on implementing the methodology for various types of transportation assets and at a broader level. Different machine learning algorithms could also be explored beyond the ones used in this research. Testing the transferability and scalability of the methodology should be a priority. With additional validation and development, the proposed technique could provide an objective data-driven tool to complement expert judgment in transportation assets condition assessment.

6. Conclusions

This study proposed a novel framework to determine objective measure weights for calculating asset condition indices using statistical and machine learning techniques in the domain of transportation asset management. The framework’s application yielded improved pavement condition assessments, showcasing how data-driven methodologies can enhance accuracy in asset management practices. The findings illustrate the significant potential for this approach to be adapted across various types of transportation assets, leading to more reliable assessments that better inform maintenance and rehabilitation efforts.

Based on the results, it is recommended that transportation agencies integrate data-driven approaches into their asset management strategies. This integration can optimize resource allocation, assist in prioritizing maintenance activities, and contribute to extending the lifespan of infrastructure assets. Additionally, the research encourages civil engineers to adopt similar methodologies in other domains, such as bridges and railways, to enhance overall infrastructure management.

Overall, the framework integrates the objectivity of data analytics with the practical judgment of domain experts to enhance asset condition index calculations. By relying on data-driven solutions for obtaining an asset’s condition index, the condition index can become a better representative of the actual condition and more reliable. This facilitates better-informed maintenance and rehabilitation decisions, ultimately improving asset management. The adoption of this data-driven framework also presents significant economic benefits. By providing more accurate and context-specific condition assessments, transportation agencies can optimize resource allocation, prioritizing maintenance and rehabilitation based on actual infrastructure needs. This can help reduce unnecessary repairs, prevent premature failures, and extend the service life of assets. Moreover, the framework’s adaptability to new data allows for ongoing improvements in decision-making, ultimately leading to substantial long-term cost savings and a more efficient use of public funds.

By synthesizing subjective expert insights with objective data analytics, this study presents a robust and adaptable framework that reflects the actual conditions of transportation assets, fostering informed decision-making and improved outcomes in civil engineering practices.

Further research should evaluate the methodology using more extensive multi-year data covering full asset life cycles. Additionally, incorporating techniques such as regularization could help reduce overfitting and improve generalizability. There is also potential to integrate economic and usage considerations into the PCI weight optimization process. Nonetheless, this study demonstrates the promise of data science for augmenting infrastructure engineering best practices.

Author Contributions

Conceptualization, A.B.A.-H. and I.N.; methodology, A.B.A.-H.; validation, A.B.A.-H., I.N., and O.S.; formal analysis, A.B.A.-H. and Y.I.A.; investigation, A.B.A.-H. and I.N.; resources, A.B.A.-H. and I.N.; data curation, A.B.A.-H.; writing—original draft preparation, A.B.A.-H. and Y.I.A.; writing—review and editing, O.S.; visualization, A.B.A.-H.; supervision, O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Proposed framework.

Figure 2. Road network of the city of West Des Moines.

Figure 3. Proposed procedure for obtaining weights for PCI calculations.

Figure 4. Cross-validation RMSE for PCC pavement (left) and COM pavement (right) during the application of RFE for feature selection.

Figure 5. Partial dependence plots for the weights of IRI and transverse cracking for PCC pavement vs. RMS, standard deviation, and IQR.

Figure 6. Partial dependence plots for the weights of IRI and transverse cracking for COM pavement vs. RMS, standard deviation, and IQR.

Figure 7. The relationship between the old PCI and new PCI for PCC pavements (left) and COM pavements (right) for West Des Moines area in the year 2015.

Table 1. Example indicators for transportation asset condition assessment, methods of assessment, and corresponding scales.

Asset Type	Key Indicators	Method of Assessment	Scale of Assessment
Roads	Pavement Condition Index (PCI)	Visual inspection of cracks, rutting, etc.	0–100 scale
Roads	Overall Condition Index (OCI)	Visual inspection (subjective ratings)	1–10 scale (PASER system)
Roads	International Roughness Index (IRI)	Objective roughness measurement	Continuous scale (m/km)
Bridges	Bridge Condition Index (BCI)	Inspection of structural elements	0–9 component rating scale
Railways	Track Quality Index (TQI)	Measurements of track geometry	Varies by parameter (e.g., gauge, cant)
Sidewalks	Sidewalk Condition Index (SCI)	Visual inspection of sidewalk conditions	0–100 scale

Table 2. Variables’ descriptive statistics for PCC pavement.

Variable	Abbreviation	STD	Skewness	Kurtosis	Mean	Median	Min	Max	IQR
Transverse Cracking	TRANS	6.78	14.93	413.80	2.86	1.08	0.00	219.37	3.28
Joint Spalling	JSPAL	4.48	6.14	67.76	2.25	1.00	0.00	82.00	3.00
Longitudinal Wheel Path Cracking	LONG_WP	12.96	22.33	588.60	0.96	0.00	0.00	413.55	0.00
Longitudinal Cracking	LONG_NP	88.68	4.02	27.31	49.05	13.62	0.00	1387.31	58.14
D-Cracking	DCRK	9.44	6.57	68.10	3.86	0.00	0.00	160.00	3.50
Patching	PATCH	4.04	7.41	94.78	1.36	0.00	0.00	83.00	1.00
International Roughness Index	IRI	90.02	0.79	0.90	233.16	223.87	64.53	684.86	126.16
Overall Condition Index	OCI	0.756	−1.94	7.35	8.60	8.5	2.0	10.00	1.5

Table 3. Variables’ descriptive statistics for COM pavement.

Variable	Abbreviation	STD	Skewness	Kurtosis	Mean	Median	Min	Max	IQR
Transverse Cracking	TRANS	25.49	2.76	18.38	26.67	20.77	0.00	354.02	27.00
Rutting	RUT	0.09	0.88	0.76	0.20	0.19	0.04	0.60	0.13
Longitudinal Wheel Path Cracking	LONG_WP	127.68	2.78	10.42	92.35	50.01	0.00	1115.01	114.38
Longitudinal Cracking	LONG_NP	261.50	3.02	19.33	236.12	167.76	0.00	3179.33	261.93
Alligator Cracking	ALLIG	9.42	9.05	100.84	1.72	0.00	0.00	136.28	0.00
Patching	PATCH	1.28	5.49	39.65	0.42	0.00	0.00	13.00	0.00
International Roughness Index	IRI	79.19	1.49	4.35	205.76	191.79	70.10	712.59	96.36
Overall Condition Index	OCI	1.392	−0.55	−0.34	7.25	7.5	2.00	10.00	2.5

Table 4. Selected variables by the RFE for the weight estimation model.

Rank	Variable (PCC)	Variable (COM)
1	Distress Type	Distress Type
2	Standard Deviation	Standard Deviation
3	RMS	RMS
4	IQR	IQR
5	Maximum Value	Skewness
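
The descriptors listed in Table 4 can be computed for any single distress variable with standard-library tools. The sketch below is one possible way to do so: the sample values are hypothetical, the function name is illustrative, and the choices of population standard deviation, moment-based skewness, and the inclusive quartile method are assumptions, since the exact estimators used in the study are not stated here.

```python
import math
import statistics


def describe(values):
    """Return summary descriptors for one variable's measurements."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    rms = math.sqrt(sum(v * v for v in values) / n)  # root mean square
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Moment-based skewness; a constant series is reported as 0.0 by convention.
    if std == 0:
        skew = 0.0
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return {
        "std": std,
        "rms": rms,
        "iqr": q3 - q1,
        "max": max(values),
        "skewness": skew,
    }


# Hypothetical segment-level values for one distress variable (illustrative only).
sample = [0.0, 1.0, 2.0, 3.0, 4.0]
summary = describe(sample)
```

Unlike the standard deviation, RMS is not centered on the mean, so it reflects the overall magnitude of a variable as well as its spread.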

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
