Article

Machine Learning-Based Prediction Performance Comparison of Marshall Stability and Flow in Asphalt Mixtures

by
Muhammad Farhan Zahoor
1,
Arshad Hussain
1,* and
Afaq Khattak
2
1
School of Civil and Environmental Engineering (SCEE), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
2
Department of Civil, Structural and Environmental Engineering, Trinity College Dublin, The University of Dublin, D02 PN40 Dublin, Ireland
*
Author to whom correspondence should be addressed.
Infrastructures 2025, 10(6), 142; https://doi.org/10.3390/infrastructures10060142
Submission received: 18 December 2024 / Revised: 11 January 2025 / Accepted: 14 January 2025 / Published: 7 June 2025
(This article belongs to the Special Issue Advances in Artificial Intelligence for Infrastructures)

Abstract

The longevity and safety of asphalt pavements, which form the foundation of our transportation infrastructure, are directly impacted by their performance. Pavement performance has traditionally been measured using the Marshall Mix Design method, a time- and resource-intensive laboratory procedure. Machine learning algorithms (MLAs) are increasingly popular and are being utilized in various fields. Their performance varies; therefore, evaluating and comparing different MLAs is important. This work investigated the potential of various machine learning (ML) algorithms to predict Marshall Stability (MS) and Marshall Flow (MF). We collected data from published studies encompassing 732 data points to train and evaluate ML models. Eight key input parameters were considered for modeling. We used three feature importance analysis techniques (Random Forest, Permutation Importance, and Lasso Regression) to determine which parameters were the most significant. Six MLAs were assessed: Linear Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machines (SVMs), Gradient Boosting Machines (GBMs), and Artificial Neural Networks (ANNs). Robust statistical measures, namely MSE, MAE, R2, and RMSE, were employed to evaluate each model's performance. Our results indicate that the RF algorithm performed best for both MS and MF prediction, followed by ANN and DT. The strong correlation between predicted and actual values, evidenced by high R2 values and low error metrics, indicates good predictive performance. This highlights the significance of selecting an optimal machine learning algorithm for a particular predictive task.

1. Introduction

Many countries aim to improve their road infrastructure, as this plays a significant role in the economy’s growth. Road infrastructure is the backbone of any country, allowing the efficient movement of people, goods, and services for the fulfillment of commercial and social activities while ensuring easy access to land and business [1]. Due to its durability, maintenance, and cost-effectiveness, an asphalt mixture is commonly used in road construction [2]. It mainly consists of an asphalt binder, fine aggregate, and coarse aggregate. It provides a comfortable ride surface, reduces noise, and improves driving safety [3]. To guarantee the performance and quality of asphalt pavements, asphalt mixtures require optimum ratios of aggregate and binder. Several methods are used in asphalt mix design, but various countries, especially Pakistan, extensively use the Marshall Mix Design (MMD) method. Marshall Stability (MS) and Marshall Flow (MF) properties are common metrics for evaluating the performance of asphalt [4].
The asphalt mixture’s stability, durability, and flexibility all affect the performance of flexible pavement. Traditional hot mix asphalt (HMA) design techniques aim for the optimum asphalt content to achieve these objectives [5]. There are numerous steps in asphalt mix design in a laboratory that are used to calculate the appropriate amount of asphalt binder, known as a design binder content, used for bulk manufacturing. This is performed to fulfill the required volumetric and strength requirements [6]. An asphalt mix design method determines the optimum ratios of aggregate and asphalt for a pavement mix. Asphalt mix design methods include several methods, such as the Marshall method, along with the Hveem and Superpave systems [7]. Among them, the Marshall method is used extensively by various countries. In Pakistan, commonly employed methods for asphalt mix design are the MMD method and the Modified MMD method.
Marshall Stability and Flow are two important outcomes of this Marshall Mix Design method. These parameters provide performance measures for asphalt pavement [8]. Marshall Stability measures the maximum load a specimen supports before permanent deformation. By contrast, the Marshall Flow measures specimen deforming due to loading until cracking. High stability provides long-lasting pavement performance, while moderate flow allows for changes under different temperatures and traffic loading conditions [5].
Machine learning algorithms (MLAs) are frequently used in the engineering field for their ability to recognize complex patterns among various parameters or features and generate reliable predictions [9]. Their techniques are popular today and are being used in many fields for different purposes, including complex pattern recognition, language interpretation, and research [10]. Each MLA has its limitations, and they perform differently in prediction tasks. As the performance of MLAs changes depending on the datasets and the target prediction task, it is important to evaluate different MLAs and compare their performance [11]. This involves analyzing each algorithm’s performance in predicting a specific parameter based on the provided data. This helps us find the appropriate algorithm for the predictive task and choose the best-performing ML model, resulting in improved and more reliable outcomes.
Many researchers have employed machine learning techniques to predict the properties of asphalt mixtures [12], including Marshall Stability and Flow, and to assess performance in flexible pavements. These studies also compared various machine learning algorithms. For example, Ref. [13] used ML techniques to predict performance in flexible pavements by first utilizing Random Forest (RF) for feature selection and then employing several MLAs, including Ensemble Trees, Regression Trees, Gaussian Process Regression, Support Vector Machines, and Artificial Neural Networks, to compare their performance. Also, Ref. [14] evaluated an ANN model for predicting hot mix asphalt volumetrics in the Marshall Mix Design by training on 835 mix design data points. Their model predicted air void parameters effectively. Ref. [15] employed neural networks (NNs) to forecast MS and MF for asphalt mixtures treated with polypropylene (PP). The modifier improved the MS and the stiffness indicator, the Marshall Quotient (MQ). The NN model was able to accurately predict MS, MF, and MQ values when compared to mechanical tests. Ref. [16] used an ANN to evaluate the resilience modulus of bound and unbound C&D materials as alternatives to naturally quarried materials, reducing the costs and time associated with testing. The model predicted the modulus for C&D aggregates effectively. Four parameters of asphalt modified with graphene oxide (GO) were predicted using an ANN [17]. Those authors utilized random sampling via Monte Carlo simulation for effective model training, achieving high prediction accuracy. Similarly, Ref. [18] employed three MLAs, namely Artificial Neural Network, Adaptive Neuro-Fuzzy Inference System, and Multi Expression Programming, and compared their performance for MS and MF prediction; the MEP model outperformed both ANFIS and ANN for MS and MF.
Ref. [19] evaluated statistical methods and ANNs to develop a predictive model for assessing the rutting performance of asphalt mixtures modified with waste alumina. They incorporated recycled concrete aggregate (RCA) and conducted experiments using dynamic creep testing, wheel tracking testing, and volumetric properties analysis. Furthermore, Ref. [20] tested 13 different HMA mixtures using the Cooper wheel tracking test, the asphalt pavement analyzer test, and the repeated load axial test (RLAT). The resulting data were analyzed using three ANN algorithms, namely Backpropagation (BP), Conjugate Gradient (CG), and Broyden-Fletcher-Goldfarb-Shanno (BFGS), to predict the rutting parameter. Ref. [21] utilized a large experimental dataset to predict MS and MF in dense-graded glassphalt mixes using ANNs. The predictive models exhibited high accuracy, with R-squared values of 93.6% and 85.7% for MS and MF, respectively. Ref. [22] developed an ANN model for predicting deterioration in asphalt pavement using the Back-Propagation Neural Network (BPNN) technique; the model showed a strong correlation between the predicted IRI values and the corresponding measured values. Ref. [23] employed the least-squares support vector machine and ANN methods to assess Marshall parameters in bituminous mixes modified with waste polyethylene (PE). The LS-SVM model showed better and more reliable prediction capability than the neural network (NN) model. Ref. [4] developed predictive models for MS and MF for both the Asphalt Base Course (ABC) and the Asphalt Wearing Course (AWC), employing Multi-Expression Programming (MEP) with extensive datasets consisting of 253 data points for ABC and 343 for AWC. The predictive models demonstrated strong correlation coefficients (R > 0.90) and effective prediction capability for both ABC and AWC. Ref. [24] employed machine learning (ML) techniques to predict Marshall design parameters, collecting datasets from various literature sources. Four ML algorithms were employed, namely, Linear Regression, Polynomial Regression, k-Nearest Neighbor (KNN), and Support Vector Regression (SVR). Material properties and their ratios in the mixture were the input parameters, whereas six Marshall design parameters (MDPs) were the output parameters. Their results showed that SVR achieved the highest accuracy but exhibited reduced performance in nested cross-validation (CV), so KNN, which had the second-highest performance, was recommended instead.
Many researchers have attempted to predict pavement properties like MS and MF but often fail to undertake a thorough analysis of feature importance. This study attempts to address this gap by employing and comparing three feature-importance techniques to identify the key factors influencing model performance. Additionally, numerous researchers have employed a limited selection of ML algorithms for predicting Marshall test parameters but have rarely comprehensively compared their predictive performance. To address this gap, this study compares and evaluates the performance of several ML algorithms.
This study employs several MLAs to predict the MS and MF of asphalt mixtures. It evaluates the effectiveness of six MLAs, namely, Linear Regression (LR), Decision Tree Regressor (DT), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Artificial Neural Network (ANN). Furthermore, hyperparameter tuning is performed to optimize their performance. The study also aims to assess the relative significance of each parameter through feature importance analysis, employing and comparing three different algorithms, namely Random Forest (RF), Permutation Importance (PI), and Lasso Regression (LassoR). The performance of each model is evaluated using key performance metrics, namely, Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared, and the results are compared.

2. Methodology

This study employed Python (v3.11.7) in the Anaconda environment to develop MS and MF machine learning models. Several Python libraries were used, namely, NumPy (v1.26.3) for numerical computations, Pandas (v2.1.4) for data processing, Scikit-learn (v1.3) for machine learning algorithms, and Matplotlib (v3.8) and Seaborn (v0.13) for data visualization. The methodology employed in this study is depicted in Figure 1. The data were collected from the published literature and consolidated into a dataset. After compiling the dataset, we identified the input and output parameters: eight input parameters and two output parameters were selected. Subsequently, data preprocessing steps were undertaken, which involved handling missing data and identifying and addressing outliers. To guarantee that each feature contributed equally to the model during training, a feature normalization (scaling) technique was used, which prevented numerical instability and improved model performance.
A detailed literature review helped us select the models for this study. These models have been employed widely in the pavement engineering literature, proving their effectiveness and relevance for similar tasks. LR is one of the simplest ML models, making it a suitable starting point for comparison. It was selected as the baseline model because it provides a reference point for evaluating the performance of more complex algorithms. Since LR assumes a linear relationship between features and the target variable, it serves as a benchmark. If more advanced models, such as RF or ANN, do not significantly outperform LR, this implies that the relationships in the data may be simpler than expected, and the additional complexity of advanced models might not be justified. The DT algorithm is suitable for non-linear modelling and also works well with complex data to capture complex patterns. RF and GBM were chosen for their reliability and ability to handle limited datasets effectively. In addition, ANN and SVM were included to explore their potential, as they are known to perform well when properly tuned, even with limited data. The selected models enabled a comparative analysis of their performance, which was also a key objective of this study, i.e., to identify the most effective algorithm for predicting MS and MF.
To ensure data reliability in this study, we implemented several preprocessing steps. Missing data were carefully handled by identifying and addressing gaps before analysis, ensuring the dataset was complete. StandardScaler was applied for feature scaling, which was necessary to maintain consistency across variables with different ranges and units. Additionally, to remove any potential bias resulting from the original dataset order and ensure that the training and testing sets were representative of the entire dataset, we randomized the data using a shuffling technique. Descriptive statistics for each parameter were also calculated, providing valuable insights into the mean, median, standard deviation, range, skewness, and kurtosis of the dataset. A heatmap of multicollinearity among the input parameters was also created. Feature importance analysis was performed to identify the significance of the input parameters for MLA performance, using three algorithms: Lasso Regression, Random Forest, and Permutation Importance. The collected data were divided into training and testing subsets, with 80% allocated for training and the remaining 20% for testing. A method known as grid search was employed to find the optimum configuration of hyperparameters for the MLAs, which is important for model performance. Four key performance measures, namely, MAE, MSE, RMSE, and the coefficient of determination, were calculated for both the training and testing datasets to assess the performance of each model.
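The preprocessing steps above can be sketched with scikit-learn. This is a minimal illustration on synthetic stand-in data; the column names are hypothetical and do not reflect the study's actual dataset:

```python
# Sketch of the preprocessing pipeline: drop missing rows, shuffle and split
# 80:20, then standardize (scaler fit on training data only).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["Pb", "Va", "VMA"])
df["MS"] = rng.normal(size=100)

df = df.dropna()                        # handle missing data
X = df[["Pb", "Va", "VMA"]].to_numpy()
y = df["MS"].to_numpy()

# shuffled 80/20 split (train_test_split shuffles by default)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# fit the scaler on the training split, then apply it to both splits
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
print(X_tr_s.shape, X_te_s.shape)
```

Fitting the scaler on the training split alone, as sketched here, avoids leaking test-set statistics into training.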

2.1. Dataset

Data from various papers, consisting of 732 data points, were combined into a comprehensive dataset for the development of ML models, as shown in Table 1. A total of eight features were considered as input parameters, namely, Penetration (P), Softening Point (S.P.), Amount of Bitumen (Pb), Bulk Specific Gravity Aggregate (Gsb), Bulk Specific Gravity of Compacted Aggregate (Gmb), Air Void (Va), VMA, and VFA. Marshall Stability and Flow were the output parameters. Only those values were collected from the literature where the Marshall test was conducted on a virgin asphalt mixture specimen without any modifiers. Different units for the features described in the literature were converted to consistent units. Additionally, VFA values were derived using VMA and Va values, as they were not reported in some of the papers. The statistics of the dataset are summarized in Table 2, providing insights into the data distribution for each parameter. For both the MS and MF models, the dataset was divided into 80% and 20% subsets for the training and testing of MLAs, respectively.

2.2. Data Scaling

Since the values of parameters were on different scales, a data scaling or normalization technique was used. Scaling the data to a common range is a preprocessing step to ensure that every feature contributes equally to the model prediction. The data can be normalized using several methods, including decimal scaling, robust scaling, Z-score normalization, and Min-Max scaling [34]. This study employed the Z-score normalization method to normalize the dataset. In this method, the data are transformed to have a standard deviation of 1 and a mean of 0 using the formula [35] shown below.
$$X_{\text{standardized}} = \frac{x - \mu}{\sigma}$$
where x denotes the original parameter value, μ denotes the mean of the parameter values, σ denotes the standard deviation of parameter values, and Xstandardized is the standardized value.
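A minimal NumPy illustration of this standardization formula:

```python
# Z-score standardization: subtract the mean, divide by the standard
# deviation; the result has mean 0 and standard deviation 1.
import numpy as np

x = np.array([4.0, 5.0, 6.0, 7.0, 8.0])   # illustrative parameter values
mu, sigma = x.mean(), x.std()
x_std = (x - mu) / sigma
print(x_std.mean(), x_std.std())           # ≈ 0 and 1 after scaling
```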

2.3. Correlation Heatmap

The correlation heatmap helps to visualize the relationships between dataset features, indicating how strongly or weakly the features are correlated with each other. The correlation coefficient ranges from −1 to 1: a value of +1 indicates a strong positive (direct) linear relationship, a value of −1 indicates a strong negative (inverse) linear relationship, and 0 means no linear relationship. The correlation heatmap provides information about the linear relationships between parameters, but nonlinear relationships between the variables may still exist [36]. The correlations between the various properties in the data are presented in Table 3 and shown in Figure 2.
The graph illustrates that MS has inverse relationships of −0.61 and −0.51 with VMA and the amount of bitumen, respectively, while showing a positive relationship of 0.52 with Bulk Specific Gravity (Gmb). Air voids and Marshall Flow have a negative relationship of −0.52, suggesting that the Marshall Flow tends to decrease as the air voids increase. Furthermore, Marshall Flow has a positive relationship of 0.45 with VFA, implying that the Marshall Flow tends to increase as VFA increases.
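As a sketch, the correlation matrix behind such a heatmap can be computed with Pandas. The data below are synthetic stand-ins with hypothetical columns, constructed so that Va and MF are negatively related, mirroring the pattern described above:

```python
# Pearson correlation matrix of the kind visualized in Figure 2; in the
# study the matrix is rendered with seaborn.heatmap(corr, annot=True).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
va = rng.uniform(3, 7, 200)                        # air voids, %
df = pd.DataFrame({
    "Va": va,
    "VFA": 100 - 5 * va + rng.normal(0, 2, 200),   # inversely tied to Va
    "MF": 12 - va + rng.normal(0, 0.5, 200),       # flow drops as voids rise
})

corr = df.corr()                                   # values lie in [-1, 1]
print(corr.round(2))
```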

2.4. Feature Importance Analysis

In ML, feature importance analysis is a technique used to identify the relative significance of multiple features [37]. We used three algorithms, namely, Lasso Regression, Random Forest, and Permutation Importance, to determine the significance of each parameter in making model predictions for both Marshall Stability and Flow.
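A minimal sketch of the three techniques with scikit-learn, on synthetic data where one feature is constructed to dominate (the feature names and data are hypothetical, not the study's):

```python
# Three feature-importance measures: RF impurity importance, permutation
# importance, and the magnitude of Lasso coefficients on standardized inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                     # e.g. [Pb, Va, VMA]
y = 3 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(0, 0.1, 200)  # VMA dominates

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
rf_imp = rf.feature_importances_                  # impurity-based importance

pi = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
pi_imp = pi.importances_mean                      # score drop when shuffled

lasso = Lasso(alpha=0.01).fit(StandardScaler().fit_transform(X), y)
lasso_imp = np.abs(lasso.coef_)                   # |coefficient| magnitude

print(rf_imp.argmax(), pi_imp.argmax(), lasso_imp.argmax())
```

All three measures single out the dominant feature here; on real data they can disagree, which is why the study compares them.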

2.5. Algorithms

In this study, we employed six different MLAs to develop MS and MF prediction models, namely, Linear Regression (LR), Decision Tree Regressor (DT), Random Forest Regressor (RF), Support Vector Machines (SVM), Gradient Boosting Machines (GBM), and Artificial Neural Network (ANN). Except for LR, all models were trained using a set of hyperparameters. The grid search method was employed for these models. The complete set of hyperparameter values is listed in Table 4. The performance of ML models was measured using MAE, MSE, RMSE, and R2.

2.5.1. Linear Regression (LR)

LR is a popular technique for modeling the relationship between the dependent and one or more independent variables. It assumes a linear relationship between them [38]. Unlike other algorithms, no specific hyperparameters were used for the LR model, and it was trained and tested with 80% and 20% datasets, respectively.

2.5.2. Decision Tree Regressor (DT)

DT is a supervised learning technique that operates like a tree structure for target value prediction. The parameters are arranged either in ascending or descending order based on the values. The model consists of nodes and branches. A node represents the features of groups that are to be classified, and a branch shows the potential values for the node [39].
First, the data were preprocessed by normalization and then separated into subsets for testing and training at an 80:20 ratio. The model was trained with the hyperparameters criterion = friedman_mse, splitter = best, max_depth = 8, min_samples_split = 2, min_samples_leaf = 2, max_features = None, and random_state = None for the MS variable, whereas criterion = friedman_mse, splitter = best, max_depth = 12, min_samples_split = 5, min_samples_leaf = 2, max_features = sqrt, and random_state = None were used for the MF variable.
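For illustration, the MS configuration above can be written out with scikit-learn's DecisionTreeRegressor. The data below are synthetic stand-ins (max_features = None and random_state = None are the library defaults, used here with a fixed random_state so the sketch is reproducible):

```python
# Decision Tree with the MS hyperparameters listed above, fit on synthetic
# data with 8 input features standing in for the Marshall dataset.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))                     # 8 input features
y = X @ rng.normal(size=8) + rng.normal(0, 0.1, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

dt_ms = DecisionTreeRegressor(criterion="friedman_mse", splitter="best",
                              max_depth=8, min_samples_split=2,
                              min_samples_leaf=2, max_features=None,
                              random_state=0).fit(X_tr, y_tr)
print(round(dt_ms.score(X_te, y_te), 2))          # test-set R^2
```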

2.5.3. Random Forest Regressor (RF)

RF is an ML algorithm that creates numerous decision trees during training and combines their results to reach a prediction [40]. After normalization, the data were split into subsets at an 80:20 ratio. For optimal performance of the RF model, several sets of hyperparameters were tested using the grid search method. The optimal hyperparameters were identified as max_depth = 10, min_samples_split = 2, and min_samples_leaf = 2 for the MS variable; for the MF variable, they were identical except for min_samples_split = 5.
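A sketch of such a grid search with scikit-learn's GridSearchCV, using a reduced grid and synthetic data (the optima quoted above come from the study's full search, not from this sketch):

```python
# Grid search over RF hyperparameters with 3-fold cross-validation,
# scored by negative mean squared error.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 8))
y = X @ rng.normal(size=8) + rng.normal(0, 0.1, 150)

grid = {"max_depth": [5, 10], "min_samples_split": [2, 5],
        "min_samples_leaf": [2]}
search = GridSearchCV(RandomForestRegressor(n_estimators=50, random_state=0),
                      grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```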

2.5.4. Support Vector Machines (SVM)

SVM is an ML algorithm that locates a hyperplane to separate data points of one class from another. The hyperplane with the largest margin between the classes is considered optimal. Each data point is defined as a point in n-dimensional space, where n represents the number of features [41]. After normalization, the dataset was divided at an 80:20 ratio for training and testing. To optimize the performance of the SVM model, we used the grid search method to test different sets of hyperparameters. Different optimal hyperparameters were identified for the two target variables: for MS, the grid search identified kernel = rbf, C = 10, and gamma = auto as optimal, whereas kernel = rbf, C = 1, and gamma = scale were identified as optimal for the MF variable.
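The two tuned configurations above can be instantiated with scikit-learn's SVR (the regression form of SVM). The data below are synthetic stand-ins; as in the study, inputs are scaled first:

```python
# SVR with the tuned MS and MF hyperparameters listed above,
# fit on scaled synthetic data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = StandardScaler().fit_transform(rng.normal(size=(150, 8)))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=150)

svm_ms = SVR(kernel="rbf", C=10, gamma="auto").fit(X, y)   # MS settings
svm_mf = SVR(kernel="rbf", C=1, gamma="scale").fit(X, y)   # MF settings
print(round(svm_ms.score(X, y), 2), round(svm_mf.score(X, y), 2))
```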

2.5.5. Gradient Boosting Machines (GBM)

GBM is a machine learning algorithm that improves prediction performance by iteratively combining weak decision trees into a strong prediction model. The algorithm aims to gradually reduce prediction errors and increase the model's accuracy [42]. These algorithms have shown notable effectiveness in a variety of real-world applications [43]. Once the dataset had been normalized, it was divided at an 80:20 ratio, and the grid search method was used to test various sets of hyperparameters and find the optimal ones. Different optimal hyperparameters were identified for the two target variables: for MS, the grid search identified n_estimators = 200, learning_rate = 0.1, and max_depth = 5 as optimal, whereas n_estimators = 150, learning_rate = 0.01, and max_depth = 5 were identified as optimal for the MF variable.
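A sketch of the tuned MS settings with scikit-learn's GradientBoostingRegressor, on synthetic stand-in data. Comparing the training and test scores it prints is the kind of check that exposes the overfitting discussed in the results:

```python
# Gradient boosting with the tuned MS hyperparameters; a large gap between
# the training and test R^2 scores signals overfitting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(0, 0.5, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gbm_ms = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                   max_depth=5, random_state=0).fit(X_tr, y_tr)
print(round(gbm_ms.score(X_tr, y_tr), 3), round(gbm_ms.score(X_te, y_te), 3))
```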

2.5.6. ANN

ANN is an ML algorithm consisting of a network of interconnected artificial neurons that work together to process data, recognize patterns, and produce outputs. It has a three-layered structure comprising input, hidden, and output layers [44]. As with the other models, the dataset was split at an 80:20 ratio after normalization. Additionally, the grid search method was used to tune the hyperparameters and find the optimal ones.
For MS, two hidden layers were set, with each containing 100 neurons. The hyperparameter “solver” was set to “Adam,” which is an algorithm used to update weights during training. The activation function, which applies to each neuron’s output in hidden layers, was set to “relu,” and the model was trained for a maximum of 500 iterations.
For MF, two hidden layers were set, containing 100 and 50 neurons, respectively. The hyperparameter “solver” was set to “Adam”, the activation function was set to “relu”, and the model was trained for a maximum of 200 iterations.
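Both configurations map directly onto scikit-learn's MLPRegressor, sketched here on synthetic stand-in data (note that scikit-learn spells the solver name in lowercase, "adam"):

```python
# The two ANN configurations described above: (100, 100) neurons for MS
# and (100, 50) for MF, both with the adam solver and relu activation.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = StandardScaler().fit_transform(rng.normal(size=(150, 8)))
y = X[:, 0] + 0.5 * X[:, 1]

ann_ms = MLPRegressor(hidden_layer_sizes=(100, 100), solver="adam",
                      activation="relu", max_iter=500, random_state=0)
ann_mf = MLPRegressor(hidden_layer_sizes=(100, 50), solver="adam",
                      activation="relu", max_iter=200, random_state=0)
ann_ms.fit(X, y)
ann_mf.fit(X, y)
print(round(ann_ms.score(X, y), 2))
```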

2.6. Model Performance Assessment

The performance of each model was assessed using the following metrics:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

where $y_i$ denotes the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of observations.
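These four metrics can be computed directly with NumPy; a tiny worked example:

```python
# MAE, MSE, RMSE, and R^2 computed from the definitions above on a
# four-point toy example of observed vs. predicted values.
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])        # observed
y_hat = np.array([2.5, 5.0, 7.5, 9.0])    # predicted

mae = np.mean(np.abs(y - y_hat))                                   # 0.25
mse = np.mean((y - y_hat) ** 2)                                    # 0.125
rmse = np.sqrt(mse)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)    # 0.975
print(mae, mse, round(rmse, 4), round(r2, 4))
```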

3. Results and Analysis

3.1. Feature Importance Analysis

We employed three algorithms, i.e., Random Forest (RF), Permutation Importance (PI), and Lasso Regression (LassoR), to evaluate the feature significance for predicting MS and MF. The results of these algorithms are presented in Table 5 and Table 6 and also illustrated in Figure 3.

3.1.1. Marshall Stability (MS)

Feature importance analysis for MS using RF, PI, and LassoR revealed that the VMA was the most influential factor for the stability of the asphalt mix in RF and PI algorithms. The amount of bitumen Pb was the most influential factor for MS in the LassoR algorithm. Other features, such as Bulk Specific Gravity (Gsb) and Air Voids, also contributed, though less influentially, to the stability of the asphalt mixture.

3.1.2. Marshall Flow (MF)

For MF, Air Voids and the Bulk Specific Gravity of the compacted mixture (Gmb) exhibited the most significance, particularly in the RF and PI algorithms. Air Voids yielded importance scores of 0.37 for RF, 0.35 for PI, and 0.42 for LassoR, whereas Bulk Specific Gravity (Gmb) yielded scores of 0.18 and 0.39 for RF and PI, respectively. This indicates the importance of both Air Voids and Bulk Specific Gravity (Gmb) in controlling the flow behavior of the asphalt under loading, which is essential for its flexibility.

3.2. Machine Learning Models

3.2.1. Linear Regression (LR)

Linear Regression analysis was employed to predict the MS and MF parameters of the asphalt mixture. The model's performance on the test set is depicted in Figure 4a. The model exhibited poor predictive performance on both the training and testing datasets for the MS parameter. The training dataset yielded MSE, MAE, and RMSE values of 16.6, 3.4, and 4, respectively, with an R2 of 0.61, while the testing dataset showed a low R-squared of 0.53, an RMSE of 4, an MAE of 3.4, and an MSE of 16.6. Overall, the accuracy of the linear regression model was limited.
For the MF parameter, the LR algorithm also performed poorly, as indicated by the evaluation metrics. On the training set, the model showed limited performance, with an MSE of 0.47, an MAE of 0.46, an RMSE of 0.68, and an R-squared (R2) score of 0.46. The LR model's low predictive performance was also confirmed by the test set results, which yielded MSE, MAE, RMSE, and R-squared values of 0.34, 0.43, 0.58, and 0.47, respectively. Overall, the model exhibited a poor fit on both the training and test sets.

3.2.2. Decision Tree (DT)

The DT model was employed to predict the Marshall Stability and Flow of the asphalt mixture. Its performance on the test dataset is depicted in Figure 4a. The model performed well on the training dataset, yielding MSE, MAE, RMSE, and R-squared values of 0.91, 0.46, 0.95, and 0.97, indicating good model performance on the training set. Performance on the testing set showed a slight decline, with MAE, MSE, RMSE, and R2 values of 1, 6.9, 2.6, and 0.83. These results demonstrate that the DT model has good predictive capability for the MS parameter.
The evaluation of the model for MF showed moderate results. For the training set, the model showed moderate performance, with values for MSE, MAE, RMSE, and R-squared of 0.15, 0.14, 0.39, and 0.82. The test dataset results showed average model performance and indicated values of MSE, MAE, RMSE, and R2 as 0.19, 0.23, 0.43, and 0.70. Overall, the model exhibited moderate predictive performance.

3.2.3. Random Forest (RF)

The performance of the RF algorithm is presented in Figure 4a. The results after hyperparameter tuning and evaluation showed strong performance in predicting the MS variable. It yielded the metrics MSE, MAE, RMSE, and R2 as 0.86, 0.47, 0.92, and 0.98 on the training set. Low error metrics suggested high accuracy, and a high R-squared score indicated a good fit of the model on the training set. The test set also exhibited good performance with values of MSE, MAE, RMSE, and R2 as 2.9, 0.85, 1.7, and 0.93. These results indicate that the RF model had reasonably good performance for the MS variable, suggesting that the model is reliable for MS prediction.
The RF model for the MF parameter also demonstrated a good fit, performing well on both the training and test sets based on the evaluation metrics. For the training set, the model showed good performance, as indicated by MSE, MAE, RMSE, and R2 values of 0.13, 0.12, 0.36, and 0.85, suggesting that the model fit the training data well with decent accuracy. The test set results yielded MSE, MAE, RMSE, and R2 values of 0.13, 0.20, 0.35, and 0.80, showing that the model achieved good predictive accuracy with low error metrics. All things considered, the Random Forest model fit the training data well and generalized effectively to the test set, demonstrating good performance and predictive ability.

3.2.4. Support Vector Machines (SVM)

The performance of the SVM algorithm is shown in Figure 4b. For the provided data, the SVM algorithm for MS demonstrated moderate performance on the training set, yielding MSE, MAE, RMSE, and R2 values of 8.2, 1.6, 2.9, and 0.80, indicating a reasonable model fit. The test set results indicated similar performance, with MSE, MAE, RMSE, and R2 values of 9.7, 1.9, 3.1, and 0.77. Overall, the SVM model demonstrated moderate performance for the MS variable.
According to the SVM model’s evaluation metrics, it exhibited poor performance on the training and test datasets for the MF variable, as indicated by the R2 values of only 0.57 for the training and 0.62 for the testing datasets. In summary, the model resulted in poor performance and was not reliable for predicting the MF variable.

3.2.5. Gradient Boosting Machines (GBM)

The GBM model for MS performed exceptionally well on the training set, as indicated by MSE, MAE, RMSE, and R2 values of 0.12, 0.07, 0.11, and 0.99, respectively. This near-perfect performance suggests that the model may have overfitted, learning noise rather than the underlying patterns. The testing metrics demonstrated a drop in performance, with MSE, MAE, RMSE, and R2 values of 3.8, 0.8, 1.7, and 0.91, indicating that the model had less predictive capability for new data. Consequently, the GBM algorithm performed comparatively poorly for the Marshall Stability (MS) variable due to overfitting.
For the MF variable, the GBM model showed the same pattern: near-perfect performance on the training set (R2 = 0.997) and a marked decline on the test set (R2 = 0.848). These metrics again suggest an overfitted model that is unreliable for new data.
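The overfitting diagnosis used here amounts to comparing R2 on the training split against R2 on held-out data. The sketch below reproduces that check on synthetic, noise-dominated data with a deliberately flexible gradient-boosted model (scikit-learn assumed; the data and seed are illustrative):

```python
# Train/test R2 gap as an overfitting check, mirroring the comparison made
# for the GBM model. A large positive gap signals memorization of noise.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 8))
y = X[:, 0] + 0.5 * rng.normal(size=200)   # weak signal, lots of noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
gbm = GradientBoostingRegressor(n_estimators=300, max_depth=5,
                                learning_rate=0.1, random_state=1).fit(X_tr, y_tr)
gap = r2_score(y_tr, gbm.predict(X_tr)) - r2_score(y_te, gbm.predict(X_te))
print(f"train-test R2 gap: {gap:.2f}")
```

For the study's GBM, the analogous gaps were 0.99 vs. 0.91 (MS) and 0.997 vs. 0.848 (MF), which is what motivates the overfitting verdict above.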

3.2.6. Artificial Neural Network (ANN)

The ANN model performed well for the MS variable on the training set, yielding MSE, MAE, RMSE, and R2 values of 2.5, 1.1, 1.6, and 0.94. By contrast, the test dataset yielded values of 4.5, 1.4, 2.1, and 0.89. Overall, the ANN algorithm showed good performance for the MS variable.
For the MF parameter, the model exhibited moderate performance. On the training set, it yielded MSE, MAE, RMSE, and R2 values of 0.22, 0.28, 0.47, and 0.76. The test dataset gave broadly similar results, with MSE, MAE, RMSE, and R2 values of 0.18, 0.31, 0.43, and 0.72. Overall, the model showed moderate performance for the MF variable.
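For reference, the ANN configuration selected for MS in Table 4 (two hidden layers of 100 neurons, relu activation, adam solver, 500 iterations) can be sketched as follows. Scikit-learn's MLPRegressor is assumed as the implementation, and the data below is synthetic:

```python
# Sketch of the MS-variable ANN architecture from Table 4, fitted on
# synthetic stand-in data. Inputs are standardized, as is usual for MLPs.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 8))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=300)

ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100, 100), activation="relu",
                 solver="adam", max_iter=500, random_state=2),
)
ann.fit(X, y)
print(f"training R2: {ann.score(X, y):.2f}")
```

The MF configuration differs only in the hidden layers (100, 50) and the iteration cap (200), per Table 4.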

3.3. Comparison of Model Performance

The predictive performance of all six models on the testing datasets for the MS and MF target variables is depicted in Figure 5, where the performance metrics (MSE, MAE, RMSE, and R2) are compared using bar graphs. Detailed results for both the training and testing datasets of the MS and MF variables are summarized in Table 7, and a comparison of training and testing performance metrics is illustrated in Figure 6. For the test dataset, the analysis revealed that the Random Forest (RF) algorithm outperformed all other algorithms, achieving the lowest MSE, MAE, and RMSE while attaining the highest R2 score. Consequently, the RF model can be considered the top-performing model for predicting the MS parameter, followed by the ANN and DT.
For the Marshall Flow (MF) variable, the results demonstrated that the RF algorithm, followed by DT and ANN, outperformed the other models. It is also noteworthy that the GBM and LR algorithms performed poorly for both MS and MF prediction, the former owing to overfitting. This evaluation highlights the superior performance of the RF algorithm in predicting the MS and MF parameters, making it the preferable choice for reliable prediction of the target parameters.
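The ranking underlying this comparison can be reproduced directly from the MS test-set metrics in Table 7 (note that ranking by test R2 alone places GBM second; the discussion above discounts GBM separately because of its train/test gap):

```python
# MS test-set metrics from Table 7, ranked by R2 to reproduce the model
# comparison of Figure 5. Values are taken verbatim from the table.
ms_test = {
    "LR":  {"MSE": 20.153, "MAE": 3.868, "RMSE": 4.489, "R2": 0.530},
    "DT":  {"MSE": 6.989,  "MAE": 1.045, "RMSE": 2.644, "R2": 0.837},
    "RF":  {"MSE": 2.902,  "MAE": 0.848, "RMSE": 1.704, "R2": 0.932},
    "SVM": {"MSE": 9.771,  "MAE": 1.999, "RMSE": 3.126, "R2": 0.772},
    "GBM": {"MSE": 3.851,  "MAE": 0.896, "RMSE": 1.962, "R2": 0.910},
    "ANN": {"MSE": 4.535,  "MAE": 1.393, "RMSE": 2.129, "R2": 0.894},
}

ranked = sorted(ms_test, key=lambda m: ms_test[m]["R2"], reverse=True)
print("ranking by test R2:", ranked)   # RF first, LR last
```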

4. Conclusions

This study aimed to employ various machine learning algorithms and to evaluate and compare their effectiveness in predicting the Marshall Stability and Marshall Flow parameters. The main conclusions are as follows:
  • VMA was the most important factor influencing the stability of the asphalt mixture according to the RF and PI analyses, while the bitumen content (Pb) was the most influential factor for MS according to the Lasso Regression analysis.
  • The feature importance analysis for the Marshall Flow (MF) parameter identified the Air Voids and Bulk Specific Gravity (Gmb) as the most significant factors, particularly in RF and PI algorithms. Although VMA was identified as an influential factor for the MS parameter, it showed less significance for the MF parameter.
  • The Random Forest (RF) algorithm performed better than the other algorithms in predicting MS, achieving the lowest error metrics and the highest R2 score, making it a reliable MS prediction model. ANN and DT also performed well, but their performance was inferior to that of the RF model.
  • In the case of MF prediction, again, the Random Forest (RF) algorithm demonstrated good performance, followed by DT and ANN models, making them reliable for MF prediction.
  • The GBM and LR algorithms performed poorly compared to other models in predicting both the MS and MF parameters.
  • Implementing more advanced hyperparameter optimization techniques, such as Bayesian optimization, may help identify better model configurations, as overfitting was observed in some of the models.
  • Using larger datasets may help the models to make better predictions by allowing them to recognize patterns more effectively and reduce the chances of overfitting.
  • Although this study focused on specific ML algorithms, evaluating and comparing other algorithms may yield different insights and potentially lead to better performance.
  • Several other critical properties of asphalt pavements are also essential for long-term pavement performance, such as fatigue resistance, rutting potential, and thermal cracking resistance. It would be valuable to undertake a comparative analysis of the selected ML models to effectively predict these other properties as well.
  • Identifying influential features such as VMA, air void percentage, and Gmb can help engineers understand their impact on MS and MF, which may facilitate the optimization of asphalt mix designs to create durable and flexible pavements that withstand traffic loads.
  • This study also highlighted the importance of selecting suitable algorithms for specific prediction tasks. The superior performance of the RF algorithm, compared to others, can serve as a recommendation for researchers to leverage ensemble-based approaches for similar engineering tasks.
  • The methodology applied in this study can be extended to predict other critical pavement properties, such as fatigue resistance, rutting potential, and thermal cracking resistance. This approach facilitates the use of ML in a broader range of pavement design and analysis tasks.
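The three feature-importance techniques compared in the study (RF impurity importance, permutation importance, and Lasso coefficients; Tables 5 and 6) can be sketched as below. Scikit-learn is assumed, the data is synthetic, and the feature names simply follow Table 5:

```python
# Three feature-importance views of the same model/data, mirroring the
# study's analysis. Synthetic target is built so VMA dominates, as in Table 5.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

names = ["Penetration", "S.P.", "Pb", "Gsb", "Gmb", "Va", "VMA", "VFA"]
rng = np.random.default_rng(3)
X = rng.uniform(size=(300, 8))
y = 3.0 * X[:, 6] + X[:, 1] + 0.1 * rng.normal(size=300)  # VMA-dominated, by design

rf = RandomForestRegressor(random_state=3).fit(X, y)
rf_imp = dict(zip(names, rf.feature_importances_))         # impurity importance

perm = permutation_importance(rf, X, y, n_repeats=5, random_state=3)
perm_imp = dict(zip(names, perm.importances_mean))         # permutation importance

lasso = Lasso(alpha=0.01).fit(StandardScaler().fit_transform(X), y)
lasso_imp = dict(zip(names, np.abs(lasso.coef_)))          # |standardized coefficients|

print(max(rf_imp, key=rf_imp.get))  # expected: "VMA"
```

Agreement (or disagreement, as between Lasso and the other two for Pb) across these views is what Tables 5 and 6 summarize.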

Author Contributions

Conceptualization, M.F.Z. and A.H.; Data curation, M.F.Z.; Formal analysis, M.F.Z., A.H. and A.K.; Investigation, A.H.; Methodology, M.F.Z.; Software, M.F.Z.; Supervision, A.H.; Validation, A.K.; Visualization, A.K.; Writing—original draft, M.F.Z.; Writing—review & editing, M.F.Z. and A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ng, C.P.; Law, T.H.; Jakarni, F.M.; Kulanthayan, S. Road infrastructure development and economic growth. IOP Conf. Ser. Mater. Sci. Eng. 2019, 512, 012045.
  2. Luo, D.; Khater, A.; Yue, Y.; Abdelsalam, M.; Zhang, Z.; Li, Y.; Li, J.; Iseley, D.T. The performance of asphalt mixtures modified with lignin fiber and glass fiber: A review. Constr. Build. Mater. 2019, 209, 377–387.
  3. Li, Y.; Hao, P.; Zhang, M. Fabrication, characterization and assessment of the capsules containing rejuvenator for improving the self-healing performance of asphalt materials: A review. J. Clean. Prod. 2021, 287, 125079.
  4. Awan, H.H.; Hussain, A.; Javed, M.F.; Qiu, Y.; Alrowais, R.; Mohamed, A.M.; Fathi, D.; Alzahrani, A.M. Predicting Marshall Flow and Marshall Stability of Asphalt Pavements Using Multi Expression Programming. Buildings 2022, 12, 314.
  5. Kim, Y.; Kim, Y.R. Prediction of layer moduli from falling weight deflectometer and surface wave measurements using artificial neural network. Transp. Res. Rec. 1998, 1639, 53–61.
  6. Chakroborty, P.; Das, A.; Ghosh, P. Determining Reliability of an Asphalt Mix Design: Case of Marshall Method. J. Transp. Eng. 2010, 136, 31–37.
  7. Radzi, H.M.; Muniandy, R.; Hassim, S.; Law, T.H.; Jakarni, F.M. An overview of asphalt mix designs using various compactors. IOP Conf. Ser. Mater. Sci. Eng. 2019, 512, 012031.
  8. Nouman, M.; Maqbool, Z.; Ali, S.; Saleem, A. Performance Evaluation of Wearing Course Asphalt Mixes Based on Resilient Modulus, Indirect Tensile Strength and Marshall Stability. Int. J. Pavement Res. Technol. 2022, 15, 63–72.
  9. Ayoub, M. A review on machine learning algorithms to predict daylighting inside buildings. Sol. Energy 2020, 202, 249–275.
  10. Dhall, D.; Kaur, R.; Juneja, M. Machine Learning: A Review of the Algorithms and Its Applications. Lect. Notes Electr. Eng. 2020, 597, 47–63.
  11. Gupta, S.; Saluja, K.; Goyal, A.; Vajpayee, A.; Tiwari, V. Comparing the performance of machine learning algorithms using estimated accuracy. Meas. Sens. 2022, 24, 100432.
  12. Li, Y.; Wang, L. Computer-aided procedure for determination of asphalt content in asphalt mixture using discrete element method. Int. J. Pavement Eng. 2017, 18, 765–774.
  13. Alnaqbi, A.J.; Zeiada, W.; Al-Khateeb, G.; Abttan, A.; Abuzwidah, M. Predictive models for flexible pavement fatigue cracking based on machine learning. Transp. Eng. 2024, 16, 100243.
  14. Ozturk, H.I.; Saglik, A.; Demir, B.; Gungor, A.G. An artificial neural network base prediction model and sensitivity analysis for marshall mix design. In Proceedings of the 6th Eurasphalt & Eurobitume Congress, Prague, Czech Republic, 1–3 June 2016.
  15. Tapkin, S.; Çevik, A.; Uşar, Ü. Prediction of Marshall test results for polypropylene modified dense bituminous mixtures using neural networks. Expert. Syst. Appl. 2010, 37, 4660–4670.
  16. Oskooei, P.R.; Mohammadinia, A.; Arulrajah, A.; Horpibulsuk, S. Application of artificial neural network models for predicting the resilient modulus of recycled aggregates. Int. J. Pavement Eng. 2020, 23, 1121–1133.
  17. Hoang, H.G.T.; Nguyen, T.A.; Nguyen, H.L.; Ly, H.B. Neural network approach for GO-modified asphalt properties estimation. Case Stud. Constr. Mater. 2022, 17, e01617.
  18. Gul, M.A.; Islam, M.K.; Awan, H.H.; Sohail, M.; Al Fuhaid, A.F.; Arifuzzaman, M.; Qureshi, H.J. Prediction of Marshall Stability and Marshall Flow of Asphalt Pavements Using Supervised Machine Learning Algorithms. Symmetry 2022, 14, 2324.
  19. Ismael, M.Q.; Joni, H.H.; Fattah, M.Y. Neural network modeling of rutting performance for sustainable asphalt mixtures modified by industrial waste alumina. Ain Shams Eng. J. 2023, 14, 101972.
  20. Shan, A.; Hafeez, I.; Hussan, S.; Jamil, M.B. Predicting the laboratory rutting response of asphalt mixtures using different neural network algorithms. Int. J. Pavement Eng. 2022, 23, 1948–1956.
  21. Jweihan, Y.S.; Alawadi, R.J.; Momani, Y.S.; Tarawneh, A.N. Prediction of Marshall Test Results for Dense Glasphalt Mixtures Using Artificial Neural Networks. Front. Built Environ. 2022, 8, 949167.
  22. Solatifar, N.; Lavasani, S.M. Development of an artificial neural network model for asphalt pavement deterioration using LTPP data. J. Rehabil. Civil. Eng. 2020, 8, 121–132.
  23. Khuntia, S.; Das, A.K.; Mohanty, M.; Panda, M. Prediction of Marshall Parameters of Modified Bituminous Mixtures Using Artificial Intelligence Techniques. Int. J. Transp. Sci. Technol. 2014, 3, 211–227.
  24. Atakan, M.; Yıldız, K. Prediction of Marshall design parameters of asphalt mixtures via machine learning algorithms based on literature data. Road Mater. Pavement Des. 2024, 25, 454–473.
  25. Ibrahim, A.H.A. Effects of long-term aging on asphalt mixes containing SBS and PP-polymer. Int. J. Pavement Res. Technol. 2021, 14, 153–160.
  26. Azarhoosh, A.; Pouresmaeil, S. Prediction of Marshall Mix Design Parameters in Flexible Pavements Using Genetic Programming. Arab. J. Sci. Eng. 2020, 45, 8427–8441.
  27. Pasha, S.N.; Madhuri, D.M. Investigation of Modified Bitumen Using Glass Fibre in Bituminous Concrete. Int. J. Adv. Res. Innov. Ideas Educ. 2017, 3, 298–311.
  28. Zhu, J.; Ma, T.; Fan, J.; Fang, Z.; Chen, T.; Zhou, Y. Experimental study of high modulus asphalt mixture containing reclaimed asphalt pavement. J. Clean. Prod. 2020, 263, 121447.
  29. Abdel-Jaber, M.; Al-shamayleh, R.A.; Ibrahim, R.; Alkhrissat, T.; Alqatamin, A. Mechanical properties evaluation of asphalt mixtures with variable contents of reclaimed asphalt pavement (RAP). Results Eng. 2022, 14, 100463.
  30. Naser, M.; Abdel-Jaber, M.T.; Al-shamayleh, R.; Louzi, N.; Ibrahim, R. Evaluating the effects of using reclaimed asphalt pavement and recycled concrete aggregate on the behavior of hot mix asphalts. Transp. Eng. 2022, 10, 100140.
  31. Chowdhury, R.; Al Biruni, M.T.; Afia, A.; Hasan, M.; Islam, M.R.; Ahmed, T. Medical Waste Incineration Fly Ash as a Mineral Filler in Dense Bituminous Course in Flexible Pavements. Materials 2023, 16, 5612.
  32. Harsha, K.S.; Nikhil, M.; Raja, K.H. Partial Replacement of Bitumen with Glass Fiber in Flexible Pavement. 2017. Available online: https://iaeme.com/MasterAdmin/Journal_uploads/IJCIET/VOLUME_8_ISSUE_4/IJCIET_08_04_131.pdf (accessed on 1 December 2023).
  33. Pérez, I.; Pasandín, A.R.; Medina, L. Hot mix asphalt using C&D waste as coarse aggregates. Mater. Des. 2012, 36, 840–846.
  34. Raju, V.N.G.; Lakshmi, K.P.; Jain, V.M.; Kalidindi, A.; Padma, V. Study the Influence of Normalization/Transformation process on the Accuracy of Supervised Classification. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 August 2020; pp. 729–735.
  35. Mukhametzyanov, I.Z. MS-Transformation of Z-Score. In Normalization of Multidimensional Data for Multi-Criteria Decision Making Problems: Inversion, Displacement, Asymmetry; Springer International Publishing: Cham, Switzerland, 2023; pp. 151–166.
  36. Daoud, J.I. Multicollinearity and Regression Analysis. J. Phys. Conf. Ser. 2017, 949, 012009.
  37. Rengasamy, D.; Mase, J.M.; Kumar, A.; Rothwell, B.; Torres, M.T.; Alexander, M.R.; Winkler, D.A.; Figueredo, G.P. Feature importance in machine learning models: A fuzzy information fusion approach. Neurocomputing 2022, 511, 163–174.
  38. Yeturu, K. Machine learning algorithms, applications, and practices in data science. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2020; pp. 81–206.
  39. Matzavela, V.; Alepis, E. Decision tree learning through a Predictive Model for Student Academic Performance in Intelligent M-Learning environments. Comput. Educ. Artif. Intell. 2021, 2, 100035.
  40. Grillone, B.; Danov, S.; Sumper, A.; Cipriano, J.; Mor, G. A review of deterministic and data-driven methods to quantify energy efficiency savings and to predict retrofitting scenarios in buildings. Renew. Sustain. Energy Rev. 2020, 131, 110027.
  41. Ghosh, S.; Dasgupta, A.; Swetapadma, A. A Study on Support Vector Machine based Linear and Non-Linear Pattern Classification. In Proceedings of the 2019 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 21–22 February 2019; pp. 24–28.
  42. Ayyadevara, V.K. Gradient Boosting Machine. In Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R; Apress: Berkeley, CA, USA, 2018; pp. 117–134.
  43. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobotics 2013, 7, 21.
  44. Remesan, R.; Ahmadi, A.; Shamim, M.A.; Han, D. Effect of data time interval on real-time flood forecasting. J. Hydroinformatics 2010, 12, 396–407.
Figure 1. Workflow diagram of the study.
Figure 2. Correlation Heatmap.
Figure 3. Evaluation results of feature importance analysis for MS (left) and MF (right) parameters.
Figure 4. (a) Performance of Linear Regression (LR), Decision Tree (DT), and Random Forest (RF). (b) Performance of Support Vector Machines (SVM), Gradient Boosting Machines (GBM), and Artificial Neural Network (ANN).
Figure 5. Comparisons of model performance metrics of test sets for the target parameters.
Figure 6. Comparison of Prediction Performance on Training and Testing Datasets for Target Parameters.
Table 1. Summary of data collected from published literature.

| No | Reference | P (0.1 mm) | S.P. (°C) | Pb (%) | Gsb (g/cm³) | Gmb (g/cm³) | Va (%) | VMA (%) | VFA (%) | MS (kN) | MF (mm) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ibrahim (2021) [25] | 42 | 54 | 4–6 | 2.634 | 2.32–2.368 | 3.6–7.2 | 14–15.5 | 51.7–76.8 | 8.7–10.75 | 2.8–4 |
| 2 | Azarhoosh (2020) [26] | 63–91 | 49–54 | 4–6 | 2.49–2.58 | 2.151–2.263 | 2.64–2.74 | 13.72–17.76 | 54.5–84.2 | 6.9–15.1 | 1.83–3.55 |
| 3 | Pasha & Madhuri (2017) [27] | 68 | 47.25 | 5–7 | 2.63 | 2.237–2.261 | 3.16–6.15 | 18.54–20.44 | 66.8–84.5 | 6.26–9.23 | 2.2–4.5 |
| 4 | Tapkın et al. (2010) [15] | 55.4 | 48 | 3.5–7 | 2.703 | 2.37–2.458 | 2.5–8.8 | 15.3–18.2 | 48.23–86.3 | 7.2–15.62 | 2.4–6.9 |
| 5 | Zhu et al. (2020) [28] | 29 | 60 | 3.8–5.8 | 2.673 | 2.41–2.45 | 2.1–6.4 | 13.44–13.8 | 52.52–84.64 | 13.55–14.55 | 2.47–2.78 |
| 6 | Abdel-Jaber et al. (2022) [29] | 65 | 60 | 3.5–5.5 | 2.52 | 2.117–2.167 | 1.24–8.48 | 17.75–20.46 | 53.7–93.9 | 7.24–10.19 | 1.54–5.01 |
| 7 | Naser et al. (2022) [30] | 65 | 60 | 4–6.5 | 2.45 | 1.751–1.899 | 1–6.71 | 18.55–23.71 | 71.7–95.1 | 7.58–16.05 | 3.17–14.8 |
| 8 | Chowdhury et al. (2023) [31] | 61 | 49 | 4–6 | 2.72 | 2.32–2.41 | 2–9.0 | 14.2–16.2 | 44.4–86.21 | 18.5–22 | 2.8–4.4 |
| 9 | Harsha et al. (2017) [32] | 62.83 | 49.33 | 5–6 | 2.89 | 2.44–2.82 | 2.73–4.77 | 15.46–19.91 | 76.04–86.24 | 15.72–21.41 | 4.62–6.1 |
| 10 | Pérez et al. (2012) [33] | 69 | 48.5 | 4–5.5 | 2.68 | 2.32–2.4 | 1.3–6.3 | 13.3–14.95 | 57.86–90.37 | 9.25–10.9 | 2.2–2.65 |
| 11 | Awan et al. (2022) [4] | 61–68 | 45–50.5 | 2.5–5.5 | 2.625–2.751 | 2.29–2.483 | 1.27–10.6 | 11.23–17.4 | 19.9–89.3 | 10.0–29.15 | 1.5–5.7 |
Table 2. Statistical description of the data.

| Statistic | MS (kN) | MF (mm) | P (0.1 mm) | S.P. (°C) | Pb (%) | Gsb (g/cm³) | Gmb (g/cm³) | Va (%) | VMA (%) | VFA (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 17.253 | 3.087 | 65.9 | 49.07 | 4.112 | 2.65 | 2.352 | 5.344 | 14.57 | 62.95 |
| Median | 13.915 | 2.92 | 65 | 48.6 | 4 | 2.655 | 2.364 | 5.11 | 14.52 | 63.63 |
| Mode | 13.83 | 2.6 | 66 | 49 | 4.5 | 2.66 | 2.34 | 3.7 | 14.31 | 63.81 |
| Standard Deviation | 6.522 | 0.915 | 7.555 | 2.505 | 0.948 | 0.06 | 0.089 | 1.601 | 1.474 | 11.46 |
| Sample Variance | 42.538 | 0.837 | 57.1 | 6.277 | 0.898 | 0.004 | 0.008 | 2.564 | 2.172 | 131.3 |
| Kurtosis | −1.33 | 6.5 | 9.2 | 6.4 | 0.3 | 2.4 | 10.1 | 0.4 | 4.8 | 0.3 |
| Skewness | 0.51 | 3.56 | 1.19 | 2.06 | 0.64 | −0.76 | −1.86 | 0.56 | 1.24 | −0.4 |
| Range | 22.9 | 13.3 | 62 | 15 | 4.5 | 0.44 | 1.07 | 9.6 | 12.48 | 75.2 |
| Minimum | 6.26 | 1.5 | 29 | 45 | 2.5 | 2.45 | 1.751 | 1 | 11.22 | 19.89 |
| Maximum | 29.155 | 14.8 | 91 | 60 | 7 | 2.89 | 2.82 | 10.597 | 23.71 | 95.09 |
Table 3. Correlation of Input Parameters.

| | Penetration | S.P. | Pb % | Gsb | Gmb | Va % | VMA % | VFA % |
|---|---|---|---|---|---|---|---|---|
| Penetration | 1.00 | −0.23 | 0.23 | −0.40 | −0.37 | −0.07 | 0.13 | 0.09 |
| S.P. °C | −0.23 | 1.00 | 0.31 | −0.53 | −0.55 | −0.14 | 0.32 | 0.20 |
| Pb % | 0.23 | 0.31 | 1.00 | −0.45 | −0.39 | −0.74 | 0.54 | 0.85 |
| Gsb | −0.40 | −0.53 | −0.45 | 1.00 | 0.88 | 0.17 | −0.33 | −0.23 |
| Gmb | −0.37 | −0.55 | −0.39 | 0.88 | 1.00 | −0.01 | −0.61 | −0.14 |
| Air Void % | −0.07 | −0.14 | −0.74 | 0.17 | −0.01 | 1.00 | −0.06 | −0.96 |
| VMA % | 0.13 | 0.32 | 0.54 | −0.33 | −0.61 | −0.06 | 1.00 | 0.32 |
| VFA % | 0.09 | 0.20 | 0.85 | −0.23 | −0.14 | −0.96 | 0.32 | 1.00 |
Table 4. Hyperparameters and Grid Search ranges for predicting MS and MF variables.

| Model | Hyperparameter | Grid Search Values Evaluated | Marshall Stability (MS) | Marshall Flow (MF) |
|---|---|---|---|---|
| Linear Regression (LR) | – | – | – | – |
| Decision Tree (DT) | Criterion | [‘friedman_mse’] | friedman_mse | friedman_mse |
| | Splitter | [‘best’, ‘random’] | best | best |
| | Max Depth | [None, 2, 6, 8, 12] | 8 | 12 |
| | Min Samples Split | [2, 5, 10, 20] | 2 | 5 |
| | Min Samples Leaf | [2, 5, 10, 20] | 2 | 2 |
| | min_weight_fraction_leaf | 0 | 0 | 0 |
| | Max Features | [None, ‘sqrt’, ‘log2’] | None | sqrt |
| | random_state | [None] | None | None |
| Random Forest (RF) | Max Depth | [3, 5, 7, 10] | 10 | 10 |
| | Min Samples Split | [2, 5, 10] | 2 | 5 |
| | Min Samples Leaf | [1, 2, 4] | 2 | 2 |
| Support Vector Machines (SVM) | Kernel | [‘linear’, ‘rbf’] | rbf | rbf |
| | C | [0.1, 1, 10] | 10 | 1 |
| | Gamma | [‘scale’, ‘auto’] | auto | scale |
| Gradient Boosting Machines (GBM) | n_estimators | [50, 150, 200] | 200 | 150 |
| | Learning Rate | [0.01, 0.1, 0.5] | 0.1 | 0.01 |
| | Max Depth | [3, 5, 7] | 5 | 5 |
| Artificial Neural Network (ANN) | Hidden Layers | [(50, 50), (100, 50), (100, 100)] | 2 layers, 100 neurons each | 2 layers, 100 and 50 neurons |
| | Solver | adam | adam | adam |
| | Activation Function | [‘relu’, ‘tanh’] | relu | relu |
| | Max Iterations | [200, 500] | 500 | 200 |
Table 5. Feature Importance Analysis for Marshall Stability.

| Feature | Random Forest | Permutation Importance | Lasso Regression |
|---|---|---|---|
| VMA % | 0.537 | 0.498 | 2.265 |
| Softening Point °C | 0.135 | 0.338 | 1.369 |
| Gsb | 0.130 | 0.364 | 2.090 |
| Pb % | 0.078 | 0.169 | 3.347 |
| Gmb | 0.051 | 0.120 | 0.000 |
| Air Void % | 0.033 | 0.027 | 2.086 |
| VFA % | 0.021 | 0.009 | 0.000 |
| Penetration | 0.015 | 0.020 | 1.078 |
Table 6. Feature Importance Analysis for Marshall Flow.

| Feature | Random Forest | Permutation Importance | Lasso Regression |
|---|---|---|---|
| Air Void % | 0.371 | 0.348 | 0.423 |
| Gmb | 0.180 | 0.390 | 0.000 |
| VFA % | 0.124 | 0.091 | 0.000 |
| VMA % | 0.089 | 0.096 | 0.000 |
| Gsb | 0.074 | 0.260 | 0.120 |
| Softening Point °C | 0.064 | 0.140 | 0.000 |
| Penetration | 0.052 | 0.102 | 0.000 |
| Pb % | 0.045 | 0.058 | 0.000 |
Table 7. Performance metrics for all six algorithms.

| Target Variable | Dataset | Metric | LR | DT | RF | SVM | GBM | ANN |
|---|---|---|---|---|---|---|---|---|
| MS | Training | MSE | 16.635 | 0.913 | 0.855 | 8.196 | 0.120 | 2.531 |
| | | MAE | 3.385 | 0.445 | 0.486 | 1.661 | 0.070 | 1.061 |
| | | RMSE | 4.079 | 0.955 | 0.925 | 2.863 | 0.110 | 1.591 |
| | | R2 | 0.605 | 0.978 | 0.980 | 0.805 | 0.990 | 0.940 |
| | Testing | MSE | 20.153 | 6.989 | 2.902 | 9.771 | 3.851 | 4.535 |
| | | MAE | 3.868 | 1.045 | 0.848 | 1.999 | 0.896 | 1.393 |
| | | RMSE | 4.489 | 2.644 | 1.704 | 3.126 | 1.962 | 2.129 |
| | | R2 | 0.530 | 0.837 | 0.932 | 0.772 | 0.910 | 0.894 |
| MF | Training | MSE | 0.473 | 0.154 | 0.131 | 0.380 | 0.020 | 0.216 |
| | | MAE | 0.458 | 0.136 | 0.120 | 0.287 | 0.033 | 0.279 |
| | | RMSE | 0.688 | 0.392 | 0.361 | 0.616 | 0.043 | 0.465 |
| | | R2 | 0.465 | 0.826 | 0.852 | 0.570 | 0.997 | 0.756 |
| | Testing | MSE | 0.341 | 0.191 | 0.126 | 0.242 | 0.098 | 0.182 |
| | | MAE | 0.433 | 0.238 | 0.202 | 0.290 | 0.159 | 0.314 |
| | | RMSE | 0.584 | 0.437 | 0.355 | 0.492 | 0.314 | 0.427 |
| | | R2 | 0.473 | 0.705 | 0.805 | 0.626 | 0.848 | 0.718 |
