Article

Prediction of Skid Resistance of Asphalt Pavements on Highways Based on Machine Learning: The Impact of Activation Functions and Optimizer Selection

1 Jiangxi Ganyue Expressway Co., Ltd., Nanchang 330025, China
2 Jiangxi Provincial Key Laboratory of Pavement Performance Evolution and Life Extension of Highway Subgrade, Nanchang 330038, China
3 College of Transport & Communications, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1708; https://doi.org/10.3390/sym17101708
Submission received: 16 September 2025 / Revised: 30 September 2025 / Accepted: 4 October 2025 / Published: 11 October 2025

Abstract

Skid resistance is a key factor in road safety, directly affecting vehicle stability and braking efficiency. To enhance predictive accuracy, this study develops a multilayer perceptron (MLP) model for forecasting the Sideway Force Coefficient (SFC) of asphalt pavements and systematically examines the role of activation functions and optimizers. Seven activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Mish, Swish) and three optimizers (SGD, RMSprop, Adam) are evaluated using regression metrics (MSE, RMSE, MAE, R2) and loss-curve analysis. Results show that ReLU and Mish provide notable improvements over Sigmoid, with ReLU increasing goodness of fit and accuracy by 13–15%, and Mish further enhancing nonlinear modeling by 12–14%. For optimizers, Adam achieves approximately 18% better performance than SGD, offering faster convergence, higher accuracy, and stronger stability, while RMSprop shows moderate performance. The findings suggest that combining ReLU or Mish with Adam yields highly precise and robust predictions under multi-source heterogeneous inputs. This study offers a reliable methodological reference for intelligent pavement condition monitoring and supports safety management in highway transportation systems.

1. Introduction

Skid resistance is a critical indicator of road safety, directly influencing vehicle stability and braking efficiency. Its importance becomes particularly evident under adverse weather conditions such as rainfall or snow, when insufficient skid resistance significantly increases accident risk [1,2]. Therefore, developing accurate prediction models is essential for enhancing traffic safety management and reducing crash likelihood [3,4].
Current measurement techniques, including Sideway Force Coefficient (SFC) testing, the British Pendulum Number (BPN), and skid testers [5,6], provide reliable data but suffer from practical limitations. High costs, low efficiency, and restricted spatial and temporal coverage make large-scale and continuous monitoring infeasible [7,8,9]. These constraints underscore the need for predictive modeling approaches capable of utilizing available detection data to achieve broader applicability in real-world highway networks [10,11].
Traditional prediction methods, such as linear regression, ridge regression, and other statistical models, exhibit limited ability to capture nonlinear relationships inherent in pavement–environment–traffic interactions [12,13]. With the advancement of machine learning and deep learning techniques, models such as Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Random Forest (RF) have demonstrated superior flexibility, nonlinear representation, and generalization ability, offering promising alternatives for pavement performance prediction [14,15,16,17,18,19]. Meanwhile, with the growing emphasis on intelligent transportation and pavement management systems, especially for expressways, machine learning methods are increasingly needed to support automated detection and intelligent analysis [20,21].
Despite these advances, the predictive performance of machine learning models remains highly sensitive to hyperparameter selection, particularly activation functions and optimizers. These parameters strongly affect model convergence, stability, and accuracy, yet systematic investigations into their roles in skid resistance prediction are scarce. Thus, the present study develops an MLP-based framework with a comprehensive comparison of seven activation functions and three optimizers, aiming to identify optimal configurations and provide practical insights for the development of robust and accurate skid resistance prediction models.
In this study, the multilayer perceptron (MLP) was selected as the baseline model for predicting the skid resistance of asphalt pavements. While other classical machine learning models, such as Random Forest (RF) [22,23], Support Vector Regression (SVR) [24,25], Convolutional Neural Networks (CNNs) [18,26], and Long Short-Term Memory networks (LSTM) [27,28] have been successfully applied to pavement performance prediction [29,30,31], the primary objective of this research is to systematically evaluate the effects of activation functions and optimizers within a controlled neural network framework. Compared with tree-based or kernel-based models, MLP provides a flexible and widely adopted architecture for regression tasks, while avoiding structural heterogeneity that could confound the analysis of activation–optimizer impacts [22,32,33]. Therefore, focusing on MLP ensures that the comparative results are attributable to activation functions and optimizers rather than to differences in model structures.
In summary, this paper develops an MLP-based prediction model for asphalt pavement skid resistance and examines the influence of seven activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Mish, and Swish) and three optimizers (SGD, RMSprop, and Adam) on model performance. The results aim to enhance predictive accuracy and robustness, thereby supporting pavement condition monitoring and early-warning applications in intelligent transportation systems. Future research may further extend this framework by incorporating cross-model comparisons to strengthen the generalizability of the findings.

2. Overview of MLP Neural Network and Key Hyperparameters

2.1. Multilayer Perceptron (MLP)

The Multilayer Perceptron (MLP) is a typical feedforward neural network consisting of an input layer, one or more hidden layers, and an output layer. It is widely used in supervised learning tasks such as classification and regression. Compared to traditional linear models, MLP has the ability to model complex nonlinear relationships and effectively capture deep features in the data [34].
The development of the MLP can be traced back to the perceptron model proposed by Rosenblatt in 1958, which simulates the decision-making mechanism of biological neurons by computing a weighted sum of the inputs and applying an activation function for pattern recognition [35]. However, a single-layer perceptron has obvious limitations when dealing with nonlinearly separable problems. This deficiency was addressed in the 1980s with the introduction of multilayer structures and the backpropagation algorithm. Proposed by Rumelhart et al. in 1986, backpropagation significantly improved the effectiveness of neural networks in modeling nonlinear problems, making the MLP one of the foundational structures in the development of deep learning [36].
The core advantage of MLP lies in its deep structure, which allows for the extraction of high-order features from data in a layer-by-layer fashion. Through nonlinear mapping, it transforms the input space into a feature space that is easier to separate. In the forward propagation process, data passes sequentially from the input layer through the hidden layers to the output layer. During training, the Backpropagation algorithm iteratively optimizes the weights of each layer through error backpropagation and gradient descent, effectively improving the model’s fitting ability and generalization performance [37].
In the construction of MLP, the choice of activation function plays a key role in model performance. The activation function introduces nonlinear transformations, which are fundamental for the network’s ability to learn complex mappings. Without an activation function, the layers of the network would only form linear combinations, preventing the network from achieving nonlinear modeling. Different types of activation functions (such as Sigmoid, Tanh, ReLU, and their variants) significantly impact the network’s convergence speed, training stability, and prediction accuracy. Therefore, it is essential to choose the activation function that best matches the task characteristics during the modeling process.
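To make this structure concrete, the following is a minimal sketch of such a feedforward network, assuming a PyTorch environment; the class name, layer widths, and default activation are illustrative and do not represent the exact configuration used later in this study.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Minimal two-hidden-layer perceptron for regression (illustrative layer sizes)."""
    def __init__(self, n_inputs: int, n_hidden: int = 64, activation: nn.Module = nn.ReLU()):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),  # input layer -> hidden layer 1
            activation,                     # nonlinear activation (e.g., ReLU, Tanh, Mish)
            nn.Linear(n_hidden, n_hidden),  # hidden layer 1 -> hidden layer 2
            activation,
            nn.Linear(n_hidden, 1),         # hidden layer 2 -> single continuous output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLP(n_inputs=7)                     # e.g., seven input features
y_hat = model(torch.rand(32, 7))            # forward pass on a dummy batch of 32 samples
```

Swapping the activation argument is all that is required to compare different activation functions under an otherwise identical architecture.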

2.2. Overview of Activation Functions

In neural networks, activation functions apply nonlinear transformations that shape the output of neurons, enabling the model to capture complex, nonlinear relationships. They play a crucial role in influencing convergence speed, training dynamics, and overall model stability and performance [38]. With the rapid progress of deep learning, activation functions have evolved from traditional forms to more expressive and adaptive variants. Commonly used functions include Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Swish, and Mish.
The Sigmoid function is one of the earliest activation functions, originally introduced in the perceptron model for binary classification tasks [38]. It maps real-valued inputs to the interval (0, 1), allowing outputs to be interpreted as probabilities, which makes it especially suitable for binary classification. Its mathematical form is given by Equation (1).
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
When the input to the Sigmoid function is a large positive value, its output approaches 1; conversely, for large negative inputs, the output approaches 0. Despite its early popularity in neural networks, Sigmoid suffers from the vanishing gradient problem during backpropagation—particularly when inputs have large magnitudes—due to its near-zero derivatives in these regions. This limitation hampers its effectiveness in deep architectures.
The Tanh function can be viewed as an improved alternative to Sigmoid. It outputs values in the range [−1, 1], providing zero-centered activation and better symmetry. Compared to Sigmoid, Tanh’s broader output range facilitates faster convergence during training. Its mathematical expression is Equation (2).
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$
Because its output spans both positive and negative values, the Tanh function is better suited for representing inputs with bipolar characteristics. However, like Sigmoid, it remains vulnerable to the vanishing gradient problem in deep networks, which limits its effectiveness in training very deep models.
The Rectified Linear Unit (ReLU), introduced by Glorot et al. [39], is among the most commonly used activation functions in deep learning. Its mathematical definition is provided in Equation (3). ReLU effectively addresses the vanishing gradient problem by maintaining a constant gradient of 1 for positive inputs, thereby enhancing training efficiency in deep architectures. However, for negative inputs, the output is zero, which may lead to the “dying neuron” problem—where certain neurons become permanently inactive and cease to update their weights during training:
$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$
Leaky ReLU, introduced by Maas et al. [40], is an improved version of ReLU designed to mitigate the "dying neuron" problem. Unlike ReLU, Leaky ReLU introduces a small slope in the negative region instead of outputting zero. Its definition is as follows:
$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \tag{4}$$
Here, α is a small constant, typically set to 0.01. By maintaining a non-zero gradient in the negative region, Leaky ReLU reduces the risk of neuron deactivation, thereby improving the robustness of the network.
The Exponential Linear Unit (ELU), proposed by Clevert et al. [41], is a variant of ReLU designed to address the “dying neuron” problem. By introducing an exponential decay in the negative input region, ELU allows small negative outputs instead of zero, enabling neurons to remain active during training. This modification not only improves model robustness but also facilitates faster convergence. The function is defined as follows:
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha \left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{5}$$
ELU provides smooth nonlinear transformation in the negative range, retaining the advantages of ReLU in the positive range while enhancing the gradient propagation ability in the negative range. This results in faster network convergence and improved stability.
Swish is a novel activation function proposed by Ramachandran et al. [42]. It is defined as the product of the input and the Sigmoid function:
$$\mathrm{Swish}(x) = x \cdot \frac{1}{1 + e^{-x}} = x \cdot \mathrm{Sigmoid}(x) \tag{6}$$
The notable feature of Swish is its smoothness and non-monotonicity, which gives it excellent expressive power in deep neural networks. Compared to ReLU, Swish allows for smoother gradient propagation, resulting in better training outcomes and model performance in certain tasks.
Mish is a novel activation function introduced by Misra [43]. By integrating the benefits of ReLU and Swish, Mish facilitates improved gradient flow through an adaptive nonlinear formulation, thereby enhancing the training efficiency of deep neural networks. Its mathematical definition is as follows:
$$\mathrm{Mish}(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{7}$$
Compared to ReLU, Mish produces smooth, non-zero outputs for negative inputs, effectively mitigating the “dying neuron” issue. Relative to Swish, Mish exhibits stronger nonlinearity and expressiveness. Empirical studies have demonstrated that Mish outperforms both ReLU and Swish across various deep learning tasks, particularly in terms of gradient stability, convergence speed, and generalization performance.
The selection of activation functions is critical to the training efficiency and overall performance of neural networks. From early functions such as Sigmoid to more recent developments like Mish, each iteration has driven the advancement of deep learning architectures. Foundational functions—including Sigmoid, Tanh, and ReLU—enabled the initial breakthroughs in deep networks, while newer alternatives such as ELU, Swish, and Mish have demonstrated improved capability in mitigating issues like vanishing gradients and inactive neurons. These enhancements contribute to more effective training and better generalization. Therefore, in practice, activation functions should be chosen based on the characteristics of the specific task and network architecture to optimize learning efficiency and model performance.
Notation: In all activation function equations (Equations (1)–(7)), x denotes the scalar input to the activation function, typically the weighted sum of a neuron’s inputs plus bias. α is a constant controlling the slope or scaling factor in the negative input range (commonly set to 0.01 for Leaky ReLU and 1.0 for ELU, but adjustable in practice). “ln” denotes the natural logarithm, and e represents Euler’s number (~2.71828). In Equation (6), Sigmoid(x) refers to the function defined in Equation (1).
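For illustration, Equations (1)–(7) can be transcribed directly into code. The sketch below, assuming NumPy, uses the default α values stated above; the function names are illustrative.

```python
import numpy as np

# Direct NumPy transcriptions of Equations (1)-(7); alpha defaults follow the notation above.
def sigmoid(x):                   # Eq. (1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                      # Eq. (2)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):                      # Eq. (3)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):    # Eq. (4)
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):            # Eq. (5)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x):                     # Eq. (6)
    return x * sigmoid(x)

def mish(x):                      # Eq. (7)
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-3, 3, 7)
print(relu(x), mish(x))           # quick check of the nonlinear shapes
```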

2.3. Overview of Typical Optimizer

In neural networks, activation functions introduce nonlinearity into layer outputs, enabling the model to capture complex patterns and dependencies. While activation functions enhance the network’s expressive capacity, the optimization of model parameters is primarily governed by the optimizer. Through backpropagation, the optimizer updates weights and biases based on gradients derived from the loss function, aiming to minimize prediction errors. The choice of optimizer and its hyperparameters plays a crucial role in determining training efficiency and overall model performance.
Stochastic Gradient Descent (SGD) is a widely adopted optimization algorithm [44]. Unlike batch gradient descent, which computes gradients over the entire dataset, SGD updates model parameters using the gradient from a single sample per iteration. This approach significantly enhances computational efficiency, particularly for large-scale datasets, albeit at the cost of increased variance in the updates. The parameter update rule for SGD is given by
$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} J(\theta_t) \tag{8}$$
where $\theta_t$ represents the parameters, $\eta$ is the learning rate, $\nabla_{\theta} J(\theta_t)$ is the gradient of the loss function $J(\theta)$ with respect to the parameters, and $t$ denotes the iteration step index in the parameter update process.
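As a worked illustration of Equation (8), the following sketch (assuming NumPy; the loss, sample values, and learning rate are hypothetical) applies one single-sample update to a squared-error objective.

```python
import numpy as np

def sgd_step(theta: np.ndarray, grad: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One plain SGD update following Equation (8)."""
    return theta - lr * grad

# Single-sample update on a squared-error loss J = (x . theta - y)^2 (illustrative values).
theta = np.zeros(3)
x, y = np.array([1.0, 2.0, 3.0]), 4.0
grad = 2.0 * (x @ theta - y) * x            # gradient of the per-sample loss w.r.t. theta
theta = sgd_step(theta, grad, lr=0.01)
print(theta)
```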
Root Mean Square Propagation (RMSprop) is an adaptive learning rate optimization algorithm that adjusts the step size for each parameter using an exponentially weighted moving average of its squared gradients. This mechanism helps stabilize learning rates during training [45]. RMSprop is especially effective for optimizing non-stationary objective functions, such as those encountered in training Recurrent Neural Networks (RNNs). Its update rule is given by:
$$g_t = \nabla_{\theta} J(\theta_t) \tag{9}$$
$$v_t = \beta v_{t-1} + (1 - \beta)\, g_t^{2} \tag{10}$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t \tag{11}$$
where $g_t$ represents the current gradient, $v_t$ is the exponentially weighted average of the squared gradients, $\eta$ is the learning rate, $\beta$ is the decay factor (typically set to 0.9), and $\epsilon$ is a small constant that prevents division by zero.
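Equations (9)–(11) can be combined into a single update step, as in the following sketch (assuming NumPy; variable names and values are illustrative), where the running average $v_t$ is carried as explicit state.

```python
import numpy as np

def rmsprop_step(theta, grad, v, lr=0.001, beta=0.9, eps=1e-8):
    """One RMSprop update following Equations (9)-(11); v holds the running average of squared gradients."""
    v = beta * v + (1.0 - beta) * grad ** 2          # Eq. (10)
    theta = theta - lr * grad / (np.sqrt(v) + eps)   # Eq. (11)
    return theta, v

theta, v = np.zeros(3), np.zeros(3)                  # illustrative initial state
grad = np.array([-8.0, -16.0, -24.0])                # e.g., the gradient g_t from Eq. (9)
theta, v = rmsprop_step(theta, grad, v)
```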
Adaptive Moment Estimation (Adam), proposed by Kingma and Ba [46], is an adaptive optimization algorithm that integrates the benefits of momentum and RMSprop. It dynamically adjusts the learning rate for each parameter by estimating the first moment (mean) and second moment (uncentered variance) of the gradients, thereby enhancing optimization efficiency and stability. Adam is well-suited for large-scale datasets and high-dimensional parameter spaces, making it one of the most widely adopted optimizers in deep learning. Its update rule is as follows:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \tag{12}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2} \tag{13}$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}} \tag{14}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \tag{15}$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \tag{16}$$
where $m_t$ is the first moment estimate (momentum term), $v_t$ is the second moment estimate (the exponentially weighted average of the squared gradients), $\beta_1$ and $\beta_2$ are the decay factors for the momentum term and the second moment, typically set to 0.9 and 0.999, respectively, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates, and $\epsilon$ is a small constant that prevents division by zero.
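Analogously, Equations (12)–(16) correspond to one Adam update step. The sketch below (assuming NumPy; names and values are illustrative) makes the bias correction with the 1-based step index t explicit.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following Equations (12)-(16); t is the 1-based iteration index."""
    m = beta1 * m + (1.0 - beta1) * grad                  # Eq. (12): first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2             # Eq. (13): second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                        # Eq. (14): bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                        # Eq. (15): bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # Eq. (16): parameter update
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([-8.0, -16.0, -24.0])                     # illustrative gradient
theta, m, v = adam_step(theta, grad, m, v, t=1)
```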

3. Performance Evaluation of Asphalt Pavement Skid Resistance Prediction Model Based on MLP Neural Network

3.1. Data Source

This study utilized a dataset of skid resistance measurements spanning six years, collected from expressways in Jiangxi Province, China. The field data were obtained from the Jiangxi Provincial Highway Engineering Testing Center, an accredited institution responsible for routine pavement performance monitoring and evaluation. All measurements were conducted in accordance with the Field Test Methods of Highway Subgrade and Pavement (JTG 3450-2019) [47] and the Standards for Quality Inspection and Evaluation of Highway Engineering (Part I: Civil Engineering) (JTG F80/1-2017) [48]. Specifically, the dual-wheel sideway force coefficient (SFC) testing method was employed to perform continuous monitoring along all lanes of each highway, which is widely recognized as a reliable and standardized indicator of pavement skid resistance.
The use of standardized instruments and protocols guarantees the consistency, reliability, and comparability of the collected data across different highway sections. The dataset encompasses six representative highways: Tongwan Expressway, Ningding Expressway (Anding Section), Dongchang Expressway, Xiuping Expressway, Shangwan Expressway, and Ning’an Expressway. All selected highways are four-lane facilities paved with AC-13 asphalt mixtures, thereby ensuring a relatively uniform structural condition for analysis. Basic descriptive information for these highways is summarized in Table 1.
In total, 4222 valid records were compiled, covering both traffic directions. Each record integrates the effective SFC measurement results with contextual information, including the opening date of the highway, pavement surface temperature at the time of testing, actual testing speed, and surveyed traffic volumes for each expressway. These multi-year, multi-section records provide a representative and diverse empirical basis for model training and evaluation.

3.2. Key External Feature Variables

The friction coefficient and skid resistance of asphalt pavement surfaces are influenced by a range of factors, including material properties, environmental conditions, traffic loads, and pavement structure. These factors are generally categorized into intrinsic and external factors. Intrinsic factors include asphalt material characteristics, pavement structure, mixture types, and mix proportions. External factors encompass environmental conditions—such as temperature, rainfall, and humidity—as well as traffic-related variables like vehicle speed, traffic volume, and axle loads. In this study, the six expressways in Jiangxi Province were constructed in accordance with the same national and local design standards, using identical asphalt materials and similar construction techniques. As a result, construction quality and material properties are assumed to be consistent across the sites. Therefore, the analysis primarily focuses on key external factors affecting skid resistance performance.
Based on prior research, average annual precipitation and traffic volume are selected as key variables to capture the influence of climatic conditions and traffic loads on pavement skid resistance. Average annual precipitation serves as an indicator of pavement wetness across regions, while traffic volume, represented by Average Annual Daily Traffic (AADT) and annual cumulative traffic, reflects the long-term impact of vehicular loading on surface wear.
In addition, test vehicle speed and ambient temperature at the time of measurement are included in the analysis. Since frictional response varies with speed, the average test speed at each measurement point is recorded to improve the comparability of sideway force coefficient (SFC) values. Environmental temperature also significantly influences asphalt performance. High temperatures may soften the pavement, whereas low temperatures can cause brittleness, both of which affect friction characteristics.
The duration of pavement service is another important factor. As operational years increase, asphalt surfaces undergo aging due to traffic loads and environmental exposure, leading to a gradual decline in skid resistance. Therefore, the number of years since road opening is used as a key indicator of long-term performance. To calibrate the model and assess deterioration patterns, the initial SFC measured during acceptance testing is used as a baseline reference.
Therefore, these seven external variables were selected as they comprehensively capture the dominant environmental, traffic, measurement, and temporal influences on pavement skid resistance, while intrinsic material-related factors were controlled across the studied expressways. However, it is acknowledged that despite the use of identical asphalt binders, aggregates, mix proportions, and construction techniques, variations in pavement performance may still exist. Hence, future studies will expand the range of feature variables to further enhance the predictive performance of the model. In addition, the current SFC is employed as the output variable to quantify pavement skid resistance. Higher SFC values indicate better frictional performance and reduced skid risk, while lower values reflect diminished skid resistance.

3.3. Analysis of Feature Variable Correlation

To better understand the relationship between the external feature variables and the target variable, this study conducts a correlation analysis using Pearson’s correlation coefficient. This statistical metric quantifies the strength and direction of a linear relationship between two variables, with values ranging from −1 to 1. A coefficient near 1 indicates a strong positive correlation, a value near −1 indicates a strong negative correlation, and a value close to 0 suggests little or no linear relationship. Figure 1 shows the heatmap of the correlation matrix between each feature variable and the target variable.
The correlation analysis results show that some feature variables exhibit significant linear relationships with the SFC. The correlation coefficient between cumulative traffic volume (Annual Vol.) and SFC is −0.62, indicating that the long-term accumulation of traffic load significantly weakens the skid resistance of the pavement, which aligns with the engineering principles of pavement wear and structural degradation. The average annual daily traffic (AADT) is also moderately negatively correlated with SFC (r = −0.43), reflecting the short-term impact of traffic density on friction performance. Road opening time (OT) is negatively correlated with SFC (r = −0.40), suggesting that newly constructed roads typically have better initial skid resistance. Meanwhile, the initial sideway force coefficient (SFC@DT) is positively correlated with the current SFC (r = 0.40), indicating that sections with better initial performance are more likely to maintain good conditions over time, demonstrating some degree of performance continuity. Although the correlations of average testing speed (Avg Speed), pavement temperature (Avg Temp), and annual average rainfall (Avg Rainfall) with SFC are weak (r = 0.13, −0.13, and 0.06, respectively), these variables may still affect skid resistance through indirect mechanisms under specific conditions, such as extreme weather or particular traffic conditions, thus retaining their relevance.
Further analysis of the correlations between the variables reveals strong multicollinearity between certain input variables. For instance, the correlation coefficient between AADT and Annual Vol. is 0.79, indicating a high degree of information overlap, which suggests the need to address multicollinearity issues when modeling. The correlation between OT and Annual Vol. is 0.54, potentially reflecting the higher traffic load on older sections. Additionally, Avg Speed and Avg Temp exhibit a significant negative correlation (r = −0.65), suggesting that testing speed may be influenced by temperature conditions, especially in hot regions or periods.
Although some key external feature variables exhibit relatively weak linear correlations with SFC, and potential multicollinearity and interaction effects exist among certain variables, there are several reasons for retaining all feature variables in the subsequent modeling and analysis using the MLP neural network. First, neural networks have strong nonlinear fitting capabilities, allowing them to capture complex nonlinear relationships and higher-order interactions among variables. Second, variables that appear weakly correlated in a linear context may still contribute to predictive performance when combined with other features during training. Third, from an engineering perspective, all selected variables possess clear physical meaning and explanatory value. Including them can enhance the model’s generalization and practical applicability. Therefore, to ensure the completeness and robustness of the modeling process, all seven key external feature variables are retained in the construction of the predictive model.
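For reference, a correlation heatmap of this kind can be generated as in the following sketch, assuming pandas, seaborn, and Matplotlib; the column labels and random values are hypothetical placeholders for the dataset described in Section 3.1.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical column names mirroring the variables above; random values stand in for the 4222 records.
cols = ["AADT", "Annual_Vol", "OT", "SFC_at_DT", "Avg_Speed", "Avg_Temp", "Avg_Rainfall", "SFC"]
df = pd.DataFrame(np.random.rand(4222, len(cols)), columns=cols)

corr = df.corr(method="pearson")                       # Pearson correlation matrix (values in [-1, 1])
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature-SFC Pearson correlation")
plt.tight_layout()
plt.show()
```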

3.4. Normalization and Data Splitting

Due to the considerable differences in the value ranges of the feature variables, normalization is applied to both the feature and target values to prevent bias during model training caused by inconsistent data scales. This preprocessing step ensures that all variables contribute equally to the training process and prevents those with larger magnitudes from disproportionately influencing the model. The normalization formula is presented in Equation (17).
$$X_{\mathrm{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{17}$$
Here, $X$ represents the original value, and $X_{\min}$ and $X_{\max}$ represent the minimum and maximum values of the variable, respectively.
After normalization, the data is divided into a training set (3378 samples) and a test set (844 samples) with an 80:20 ratio (see Table 2). The training set is used for model learning, while the test set is used to evaluate the model’s generalization ability, ensuring the objectivity and reliability of the evaluation results.
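A minimal preprocessing sketch consistent with Equation (17) and the 80:20 split is shown below, assuming scikit-learn; the random arrays are placeholders for the actual feature matrix and SFC targets, and the fixed test size of 844 reproduces the sample counts reported here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder arrays: 4222 records with 7 external features and one SFC target.
X = np.random.rand(4222, 7)
y = np.random.rand(4222, 1)

x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()    # column-wise min-max scaling, Equation (17)
X_norm = x_scaler.fit_transform(X)
y_norm = y_scaler.fit_transform(y)

# 80:20 split -> 3378 training and 844 test samples (fixed test size matches Table 2)
X_train, X_test, y_train, y_test = train_test_split(X_norm, y_norm, test_size=844, random_state=42)
print(X_train.shape, X_test.shape)                     # (3378, 7) (844, 7)
```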

3.5. Model Structure and Preliminary Hyperparameter Settings

To predict the skid resistance of asphalt pavement, this study develops a regression model based on an MLP. The network architecture comprises an input layer, two hidden layers, and an output layer. Each hidden layer contains 64 neurons, which provides a balance between feature extraction capability and computational efficiency (see Figure 2). The output layer consists of a single neuron with a linear activation function, suitable for continuous variable prediction.
The model is trained using the Mean Squared Error (MSE) as the loss function, with the Adam optimizer employed for parameter updates. The learning rate is set to 0.001, the batch size is 32, and the number of training epochs is 200. Detailed parameter settings are provided in Table 3. To enhance training efficiency and stability, both the input features and target values are normalized prior to training.
Using a unified network architecture and fixed hyperparameters, this study evaluates model performance from two perspectives. First, seven activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Swish, and Mish) are compared under the same network configuration to assess their effects on convergence speed and prediction accuracy. Second, based on the optimal activation function identified, three commonly used optimizers (SGD, RMSprop, and Adam) are further evaluated in terms of training efficiency and generalization performance. These comparisons provide empirical support for the informed selection of activation functions and optimizers in asphalt pavement skid resistance modeling.
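The settings above can be expressed as the following training sketch, assuming PyTorch; the random tensors are placeholders for the normalized training data, and only the ReLU–Adam combination from Table 3 is instantiated, with other activation–optimizer pairs obtained by swapping the corresponding arguments.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def build_model(activation: nn.Module, n_inputs: int = 7) -> nn.Sequential:
    """Seven input features -> two 64-neuron hidden layers -> one linear output (Table 3)."""
    return nn.Sequential(nn.Linear(n_inputs, 64), activation,
                         nn.Linear(64, 64), activation,
                         nn.Linear(64, 1))

def train(model, X, y, optimizer, epochs=200, batch_size=32):
    """MSE-loss training loop with the batch size and epoch count listed in Table 3."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()
    return model

# Placeholder tensors for the normalized training set; one activation-optimizer pair shown.
X_train, y_train = torch.rand(3378, 7), torch.rand(3378, 1)
model = build_model(nn.ReLU())
model = train(model, X_train, y_train, torch.optim.Adam(model.parameters(), lr=0.001))
```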

3.6. Evaluation Metrics

The trained MLP models were evaluated on the test dataset, which was not used during training to ensure an unbiased assessment of generalization capability. Predictive performance was quantified using four commonly applied regression metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R2). In addition, graphical analyses including residual plots and scatter plots of predicted versus actual values were examined to visually assess the goodness of fit, error distribution, and potential bias.
MSE quantifies the average squared difference between predicted and actual values. It reflects the overall magnitude of prediction errors and is particularly sensitive to large deviations. The definition of MSE is as follows:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \tag{18}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of samples. A smaller MSE indicates a better fit.
RMSE is the square root of the MSE, which restores error to the same scale as the original data, making it more interpretable. Although RMSE remains sensitive to large errors, its intuitive meaning makes it a widely used evaluation metric in engineering applications. The calculation formula is given in Equation (19), where the symbols have the same definitions as in Equation (18).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{19}$$
MAE measures the average absolute difference between predicted and actual values. Compared to MSE and RMSE, it is less sensitive to outliers and offers greater robustness. The calculation formula is given in Equation (20), where the symbols have the same definitions as in Equation (18). A lower MAE indicates reduced overall prediction bias and is particularly suitable for scenarios with relatively uniform error distributions.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{20}$$
The Coefficient of Determination (R2) measures the model's ability to explain the variation in the target variable. Its value typically lies in the range [0, 1], with values closer to 1 indicating a better fit. It is defined as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{21}$$
where $\bar{y}$ is the mean of the actual values. The remaining symbols follow the same definitions as in Equation (18). $R^{2} = 1$ indicates that the model perfectly fits the data, while $R^{2} = 0$ indicates that the model has no explanatory power. It is important to note that R2 does not reflect whether the model is overfitting and should be used in conjunction with other metrics for a comprehensive assessment.
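For completeness, the four metrics of Equations (18)–(21) can be computed together, for example with scikit-learn, as in the sketch below; the numeric inputs are dummy values rather than results from this study.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Computes the four regression metrics of Equations (18)-(21)."""
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse,
            "RMSE": float(np.sqrt(mse)),
            "MAE": mean_absolute_error(y_true, y_pred),
            "R2": r2_score(y_true, y_pred)}

# Dummy values for illustration only; the actual evaluation uses the 844 held-out test samples.
print(evaluate(np.array([52.0, 60.0, 47.0]), np.array([50.5, 61.2, 48.3])))
```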

4. Results and Discussion

4.1. Analysis of Model Performance with Different Activation Functions

4.1.1. Analysis of Model Performance Results

The four evaluation metrics described above are used to comprehensively assess the predictive performance of MLP models employing different activation functions. To maintain consistency in interpretation where lower values indicate better performance, the coefficient of determination (R2) is presented as 1-R2. Figure 3 illustrates the performance of the MLP models with various activation functions in the SFC prediction task, based on the same dataset and network architecture.
The results indicate that the ReLU activation function performs the best overall. The model achieves an R2 of 0.86, with MSE, RMSE, and MAE values of 16.31, 4.04, and 3.14, respectively, all of which are the best among the tested activation functions. Additionally, it forms the smallest enclosing area on the radar chart, indicating good predictive accuracy and stability. ReLU maintains a linear response in the positive interval and suppresses weak signals in the negative interval, effectively preventing overfitting. It demonstrates strong robustness when handling features with significant noise, such as temperature disturbances and water film thickness.
Mish, Leaky ReLU, and ELU constitute the second tier in performance. Each achieves an R2 of 0.84, with MSE ranging from 19.48 to 22.20. Their RMSE and MAE values are close to those of ReLU, indicating good nonlinear expression capabilities. In particular, Mish’s smooth and continuously differentiable structure helps capture complex responses under local disturbances, making it suitable for prediction tasks like SFC, which are influenced by both macroclimate and microstructure.
Swish shows moderate performance, with RMSE of 4.82, MAE of 3.77, and R2 of 0.79. Despite its theoretical expressiveness, its weaker activation in the negative input range likely reduces sensitivity to features such as low temperature and traffic volume, limiting its ability to detect subtle variations.
Traditional functions, Sigmoid and Tanh, show the poorest results. Sigmoid yields an R2 of 0.76 and MSE, RMSE, and MAE values of 28.64, 5.35, and 4.21, respectively. Its gradient saturation impairs training efficiency and restricts effective feature mapping. Tanh performs slightly better with an R2 of 0.83 but remains limited by its output range [−1, 1], which is less suitable for regression tasks involving large numerical fluctuations like SFC.
Compared to Sigmoid, ReLU improves goodness of fit (R2) and accuracy (MSE, RMSE) by 13–15%, highlighting its strength in nonlinear modeling and feature extraction. Mish achieves 12–14% improvement over Sigmoid, particularly excelling in MSE and RMSE, enhancing adaptability to complex data. Leaky ReLU also improves by 13–15%, introducing a non-zero gradient in the negative domain to avoid neuron death, thus boosting training stability and predictive performance. Its performance is comparable to ReLU but slightly behind Mish. ELU shows modest gains—10–12% improvement over Sigmoid—through exponential decay in the negative region, though it slightly underperforms compared to ReLU and Mish.
In summary, ReLU delivers the best results in SFC regression and is recommended as the preferred activation function. Mish, Leaky ReLU, and ELU offer enhanced nonlinear modeling for scenarios with complex features and local disturbances. Swish’s effectiveness varies with feature distribution and requires task-specific evaluation. Traditional functions like Sigmoid and Tanh are unsuitable for this regression context. This study suggests prioritizing activation functions with simple structures and stable convergence for skid resistance prediction, while considering feature distribution and model depth for comprehensive optimization.
Figure 4 illustrates the fitting performance of the MLP models with different activation functions in the SFC prediction task. The analysis focuses on the angle between the weighted regression line and the ideal reference line (y = x), along with the distribution of scatter points, to evaluate the prediction accuracy across different value intervals.
As shown in Figure 4a, the Tanh activation function yields an angle of 1.22°, representing moderate performance among all functions. The model performs well in the low SFC range (20–50), where scatter points are densely distributed and closely aligned with the ideal line. However, in the higher value range (>60), the prediction deviation increases, suggesting reduced fitting capability for larger values. In Figure 4b, the ReLU activation function achieves a smaller angle of 0.94°, indicating better overall fitting. The model shows good accuracy in the low-value range but begins to exhibit deviations when the SFC exceeds 70, reflecting a decline in prediction accuracy at higher values. Figure 4c shows the results for the Sigmoid activation function, which has the largest angle of 2.04° among all activation functions. While the model retains some accuracy in the lower range, significant underestimation occurs when SFC exceeds 80, revealing a clear underfitting issue and limiting its applicability in high-precision regression tasks. In contrast, Figure 4d demonstrates that the Mish activation function delivers excellent performance, with an angle of only 0.74°. The scatter points are closely aligned with the ideal line across the entire value range, with no evident outliers. This indicates superior nonlinear modeling capacity and high prediction accuracy, making Mish highly suitable for practical applications requiring reliable performance. Figure 4e shows that the Swish activation function has an angle of 1.31°. While its performance in the low-value range (20–50) is acceptable, some deviations emerge at higher SFC values. This suggests a slight decline in accuracy as the target variable increases. In Figure 4f, the Leaky ReLU activation function achieves the best overall fitting stability, with the smallest angle of 0.45°. The scatter points are consistently concentrated near the ideal line throughout the full value range, and no significant outliers are observed. This indicates strong fitting reliability and minimal error fluctuation. Finally, Figure 4g presents the ELU activation function, which shows an angle of 1.39°. Although it performs well in the low-value range, substantial deviations occur when SFC exceeds 80, highlighting a decline in predictive accuracy under high-value conditions.
In summary, Leaky ReLU and Mish activation functions demonstrate the best overall performance in modeling asphalt pavement skid resistance. They provide both high accuracy and robust fitting across the entire value range, making them the recommended choices for such prediction tasks. ReLU and Swish are suitable alternatives, particularly effective in the low-value range and appropriate for applications with moderate tolerance for error. In contrast, Sigmoid and ELU are less suitable due to their poor responsiveness at high values and issues related to gradient saturation.

4.1.2. Comparison and Analysis of Loss Curves

This study further examines the evolution of the loss curves for models using different activation functions over training epochs. Figure 5 presents the loss trajectories of MLP models on both training and test sets, offering insight into the efficiency and stability of each activation function during model optimization.
The training loss curves show that ReLU and Leaky ReLU perform notably well. Both start with relatively low initial losses and converge rapidly within the first 10 epochs, stabilizing around 0.0045. These functions effectively mitigate the vanishing gradient problem. In particular, Leaky ReLU introduces a small gradient in the negative input range, preventing complete neuron deactivation. As a result, the model is able to capture both linear and nonlinear feature correlations efficiently, even with shallow architecture, thereby accelerating convergence.
Mish and Swish exhibit a slower but steady convergence pattern. Although their final losses also fall within the 0.0045 to 0.005 range, their early training progress is more gradual. Mish benefits from a smooth and continuous response in the negative input range and non-monotonic characteristics, which support more stable gradient flow during mid-stage training. This helps offset its higher initial loss, which starts around 0.03. Swish, through its self-gating mechanism, balances nonlinearity with information transmission efficiency. Its loss curve remains relatively smooth throughout training, indicating good numerical stability.
The ELU activation function shows rapid convergence in the initial phase, reaching a loss of approximately 0.006 within the first 20 epochs. However, it experiences noticeable fluctuations in later stages. This suggests high sensitivity to hyperparameters such as learning rate and weight initialization. Its exponential gradient decay in the negative domain may lead to oscillations in deeper networks, undermining training stability. The Tanh activation function shows a relatively smooth and consistent decline on the training sets. While its initial loss is slightly higher than that of ReLU and Leaky ReLU, it steadily decreases and eventually stabilizes around 0.005 on the training set, showing minimal oscillations. This suggests that the Tanh function maintains relatively good convergence behavior and numerical stability throughout training.
The Sigmoid function shows the poorest performance. The loss remains high throughout the training process and exhibits considerable fluctuations, particularly on the test set. Its final training loss settles above 0.007, and the lack of consistent decline indicates weak optimization ability. This is largely attributed to gradient saturation in both tails of the Sigmoid curve, which slows convergence and impedes effective weight updates.
The test loss curves provide further insight into the model’s generalization ability to unseen data. Both ReLU and Leaky ReLU exhibit stable loss values in the range of 0.0048 to 0.0052, with smooth trends and minimal fluctuations. This indicates strong generalization performance and suggests that the models have not overfitted to the training data. Their effectiveness can be attributed to the “sparse activation” mechanism inherent in the ReLU family, which suppresses irrelevant neuron activations and enables the network to focus on critical features, thereby enhancing generalizability.
Mish and Swish also demonstrate favorable performance on the test set. The loss curves for Mish and Swish stabilize at approximately 0.0047 and 0.0048, respectively. These results benefit from their structural adaptability to complex data distributions. The non-monotonic nature of Mish allows for richer feature representations, while Swish’s self-gating mechanism facilitates dynamic information modulation, enabling the model to maintain stable outputs in scenarios influenced by environmental variability and feature nonlinearity—such as skid resistance prediction.
The ELU function, with a test loss fluctuating between 0.0051 and 0.0053, shows less stability. Periodic oscillations in its curve indicate a lower tolerance to noisy inputs. Its exponential gradient in the negative region can amplify error propagation, particularly in environments with high input variability, reducing its effectiveness in maintaining consistent generalization performance.
The Tanh activation function performs comparably to Mish, Swish, and ELU on the test set. Its loss curve shows steady decline and stabilizes near 0.005, with limited oscillation, indicating reasonably strong generalization. While Tanh may suffer from gradient saturation in deep networks, in this specific regression task with moderate depth, its symmetric output range supports effective learning of both positive and negative feature patterns. Therefore, Tanh remains a viable option when model complexity is properly controlled.
Sigmoid exhibits the weakest generalization among all tested functions. Its test loss remains above 0.007 and shows more pronounced fluctuations, indicating difficulties in adapting to nonlinear and high-dimensional feature distributions. The limited output range and early gradient saturation prevent effective learning in deeper models, making Sigmoid poorly suited for complex regression tasks.
In deep learning-based skid resistance prediction, activation function selection should balance nonlinear representation, gradient flow stability, and robustness to noise. ReLU and Leaky ReLU offer fast convergence and strong generalization, making them the preferred choice for accurate and stable modeling. Mish and Swish also perform well, providing richer activation dynamics through non-monotonic structures and adaptive gating. They are suitable for capturing complex feature patterns and can be considered in scenarios requiring enhanced expressiveness. Tanh, though less powerful than the above functions, shows stable convergence and reasonable generalization in shallow networks. Its symmetric output range supports learning of both positive and negative features, making it a viable option under controlled model complexity. In contrast, Sigmoid performs poorly due to gradient saturation and limited output range, failing to adapt to high-dimensional, nonlinear data. It is not recommended for use in regression tasks with substantial feature variability.

4.2. Analysis of Model Performance Results with Different Optimizers

4.2.1. Analysis of Model Evaluation Results Based on Mish

This study constructs MLP neural network models for predicting the skid resistance of asphalt pavement using Mish and ReLU as activation functions, and Adam, RMSprop, and SGD as optimizers. The aim is to explore the impact of different optimizers on model performance.
Figure 6 compares the performance of the model using the Mish activation function under three different optimizers. As shown in Figure 6a, the term “predicted” refers to the SFC output by the trained MLP model based on the input features, whereas “measured” denotes the measured SFC. Both the horizontal and vertical axes represent dimensionless quantities. As we can see, Adam demonstrates the best overall performance, indicating that Adam effectively captures the underlying patterns in the data. While slight deviations exist in certain intervals, the overall prediction error remains low. The residual plot further supports this conclusion. Positive and negative residuals are evenly distributed around zero, with no evident bias or pattern, suggesting that the model does not suffer from underfitting or overfitting. Ideally, residuals should be randomly scattered with minimal trend, which is well achieved here. The error distribution approximates a normal curve, with most residuals concentrated near zero, indicating high prediction accuracy and a reasonable standard deviation. This reflects the model’s strong generalization and fitting capacity. Evaluation metrics further confirm the model’s performance: an MAE of 3.473, MSE of 19.813, RMSE of 4.451, and R2 of 0.834. These results demonstrate that the model maintains a controlled error level and satisfactory predictive ability, though there remains potential for further optimization.
The results shown in Figure 6b indicate that the RMSprop optimizer performs slightly worse than Adam in the skid resistance prediction task. In the time-series comparison between predicted and actual values, the RMSprop-predicted curve (green dashed line) exhibits more noticeable fluctuations. Although it generally follows the actual trend (blue line), the deviations are larger and more frequent, with error accumulation becoming evident over time. This suggests that RMSprop is less stable after convergence and has weaker resistance to noise, especially in complex data environments. The residual plot shows a relatively balanced distribution of positive and negative residuals, yet the overall magnitude of residuals is larger than that observed with Adam. This indicates greater variability in prediction errors, implying reduced generalization ability and an increased risk of bias when applied to unseen data. The error distribution plot further supports this assessment. While the errors roughly follow a normal distribution, the distribution is wider, with a larger standard deviation and reduced symmetry. This reflects lower robustness to data fluctuations and weaker control over prediction errors. Quantitative evaluation confirms these findings: the model trained with RMSprop yields a MAE of 3.579, MSE of 21.543, RMSE of 4.641, and R2 of 0.820, all of which are inferior to the results achieved with the Adam optimizer. Overall, RMSprop offers acceptable but suboptimal performance in this regression task.
The SGD optimizer, as illustrated in Figure 6c, demonstrates the weakest performance among the three. In the True vs. Predicted plot, the predicted curve shows the largest deviation from the actual values, with pronounced volatility and poor trend alignment. This indicates that the model failed to effectively learn the underlying data patterns during training, resulting in high prediction errors and limited generalization capability. The residual plot further reveals this weakness: residuals are widely scattered with large magnitudes, and the distribution between positive and negative errors is unbalanced, reflecting significant bias and instability in predictions. The error distribution plot shows a broad and flat distribution, with low concentration around zero, suggesting that prediction errors are both large and inconsistent across samples. Quantitative metrics confirm these observations. The SGD-optimized model yields a MAE of 4.281, MSE of 29.938, RMSE of 5.472, and R2 of 0.750, all of which point to poor fitting accuracy, high error levels, and weak adaptability to new data. Overall, SGD exhibits limited effectiveness in this regression task and is not recommended for use under similar modeling conditions.
In summary, the Adam optimizer demonstrates the best overall performance across all evaluation metrics, offering stable convergence, low prediction errors, and superior fitting quality. Its strong training stability and general applicability make it a reliable choice for most regression tasks. RMSprop ranks second, with slightly lower accuracy and stability compared to Adam, but remains effective in scenarios involving noisy or non-stationary data. By contrast, SGD performs the poorest, characterized by high error levels, unstable training, weak fitting ability, and limited generalization. These limitations render it unsuitable for complex prediction tasks such as skid resistance modeling.
Figure 7 illustrates the loss trajectories of MLP models trained with three different optimizers—Adam, RMSprop, and SGD—on both the training and test sets across epochs, offering a comparative assessment of their fitting and generalization capabilities.
In the training phase (Figure 7a), the Adam optimizer exhibits rapid convergence, with the training loss dropping sharply from 0.044 to approximately 0.012 within a few epochs and remaining stable thereafter, demonstrating strong training efficiency and stability. RMSprop also shows a steady decline in training loss, from 0.020 to about 0.012. Although its final loss is close to that of Adam, the convergence is slightly slower, and the overall training curve is less smooth. By contrast, the SGD optimizer shows a gradual decrease in training loss from 0.045 to around 0.0105. However, the curve lacks a clear convergence plateau, suggesting that the model had not fully converged by the end of 200 epochs. This indicates that under current hyperparameter settings, SGD is less effective in capturing feature patterns during training compared to Adam and RMSprop.
In the testing phase (Figure 7b), the Adam optimizer maintains a low and stable test loss, decreasing from 0.013 to approximately 0.011 with minimal fluctuation, reflecting strong generalization performance. RMSprop achieves a slightly lower final test loss of around 0.009 but exhibits more noticeable short-term oscillations, particularly in the later epochs. This instability may result from its adaptive learning rate being overly sensitive to local variations in the data. The test loss of the SGD optimizer gradually decreases from 0.027 to about 0.015. Although the curve is smooth and free from sharp oscillations, the convergence speed is slower, and the relatively narrow gap between training and test losses suggests slight underfitting.
Overall, Adam provides the most balanced performance in terms of convergence speed, training stability, and generalization, while RMSprop offers comparable performance but with less stability. SGD demonstrates the slowest convergence and the weakest fitting capability under the current experimental conditions.

4.2.2. Analysis of Model Evaluation Results Based on ReLU

Figure 8 presents the performance comparison of an MLP model with the ReLU activation function under three different optimizers. As shown in Figure 8a, the term “predicted” refers to the SFC output by the trained MLP model based on the input features, whereas “measured” denotes the measured SFC. Both the horizontal and vertical axes represent dimensionless quantities. In the regression plot (Predicted vs. Measured), although minor fluctuations exist, the predicted curve closely follows the actual trend, indicating effective learning of the underlying data patterns. The residual plot shows a symmetric distribution of errors around zero without discernible bias, suggesting that the model does not suffer from systematic errors. The error distribution plot reveals a concentrated, near-normal distribution centered at zero, further confirming that most prediction errors are small and stable. The quantitative results (MAE = 3.148, MSE = 17.089, RMSE = 4.134, R2 = 0.857) reflect the model’s superior fitting accuracy and error control.
In comparison, the RMSprop optimizer yields slightly lower performance, shown in Figure 8b. The regression plot reveals larger fluctuations, with more noticeable deviations from the actual values in some segments, particularly under volatile input conditions. The residuals remain roughly symmetric but show an expanded range and increased variability, indicating weaker consistency. The error distribution is still centered around zero but is wider and less symmetrical, reflecting reduced robustness. Corresponding metrics (MAE = 3.399, MSE = 19.793, RMSE = 4.449, R2 = 0.835) show that while RMSprop is capable of capturing general trends, its error control and fitting precision are inferior to Adam’s. Nevertheless, it may still be applicable to tasks with moderate accuracy requirements and noisy data environments.
The SGD optimizer performs the worst across all metrics, as shown in Figure 8c. The regression plot shows large deviations between predicted and actual values, especially in extreme ranges where the model fails to capture trend dynamics. The residual plot indicates a dispersed and unbalanced distribution, with greater variance and error magnitude. The error distribution plot is wide and less concentrated, suggesting instability and poor convergence. Quantitative metrics (MAE = 3.718, MSE = 22.364, RMSE = 4.729, R2 = 0.813) confirm that SGD suffers from slower learning, higher prediction variance, and reduced generalization, particularly in complex or nonlinear scenarios.
From the performance of the three optimizers, Adam stands out as the best choice in terms of fitting accuracy, error control, and model stability, making it suitable for most complex tasks, especially applications with high demands for prediction accuracy. While RMSprop is slightly less effective, it still offers a degree of noise resistance and can serve as a viable alternative to Adam, especially in the presence of strong noise interference. In contrast, SGD’s slow convergence, large error fluctuations, and poor model stability make it more suitable for simpler tasks with small feature variations, or tasks that require further tuning of learning rate decay and momentum factors for optimization.
Figure 9 illustrates the training and test loss trends across epochs for the ReLU-based MLP model under three optimizers (Adam, RMSprop, and SGD), highlighting the convergence behavior and generalization performance of each.
In the training loss curves (see Figure 9a), Adam demonstrates a distinct convergence advantage. Its loss rapidly decreases from 0.0162 to 0.0069 within the first epoch and stabilizes around 0.0059, indicating that the model quickly captures the input–output mapping and maintains stable error levels throughout training. RMSprop starts with a slightly higher initial loss (0.0171) and shows a slower decline but follows a similar overall trend, eventually converging to approximately 0.0069. Both optimizers maintain low, stable training losses in later epochs, reflecting smooth and efficient optimization. By contrast, SGD begins with a much higher training loss (0.0397), decreases at a slower rate, and requires more epochs to stabilize. Its final training loss plateaus around 0.0105, indicating reduced training efficiency and a limited ability to fully learn complex data features within the same epoch range.
On the test set (see Figure 9b), Adam again exhibits strong generalization. The test loss steadily decreases from 0.0072 to 0.0064 and remains tightly aligned with the training loss, showing minimal fluctuation. This consistency suggests that the model avoids overfitting and maintains high generalization capacity. RMSprop shows a similar trend, with its test loss decreasing from 0.0080 to 0.0062. However, mild fluctuations appear after epoch 50, likely due to sensitivity to local noise or disturbances, slightly affecting its generalization stability. SGD, starting from a much higher test loss of 0.0219, stabilizes around 0.0108. While the curve is relatively smooth and free of sharp oscillations, the overall loss remains significantly higher than that of Adam and RMSprop, reflecting its limited ability to model complex relationships in unseen data.
As shown in the optimizer performance comparison (Figure 7 and Figure 9), RMSprop exhibited noticeably greater fluctuations in training loss compared with Adam on the asphalt pavement skid-resistance dataset. RMSprop produced a higher standard deviation of loss values across epochs and a less smooth convergence curve, whereas Adam achieved more gradual and stable loss reduction. This difference can be attributed to both the characteristics of the input data and the internal mechanisms of the optimizers. RMSprop adapts each parameter's learning rate by dividing its gradient by the square root of an exponentially weighted running average of recent squared gradient magnitudes. While this allows rapid adaptation, it increases sensitivity to short-term gradient variations. In this dataset, traffic-related variables (e.g., AADT, annual traffic volume) had much larger numeric ranges than environmental variables (e.g., temperature, rainfall), leading to heterogeneous gradient magnitudes that amplified RMSprop's oscillatory behavior. Adam, by contrast, extends RMSprop with first-moment estimates (momentum) and bias correction, smoothing update steps and mitigating the impact of abrupt gradient changes. This stabilizing effect is particularly effective for datasets with uneven feature scales or irregular patterns, resulting in shorter convergence time and lower training-loss variability. These observations align with prior studies on ML-based pavement performance prediction, which reported Adam's superior stability and convergence efficiency in heterogeneous, non-stationary data environments.
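To make this mechanism explicit, the following Python sketch gives the standard textbook forms of the two update rules for a single parameter (a simplified illustration, not the exact implementations used by the training framework).

import numpy as np

def rmsprop_step(w, g, v, lr=0.001, beta2=0.9, eps=1e-8):
    # Running average of squared gradient magnitudes (the "RMS" in RMSprop).
    v = beta2 * v + (1 - beta2) * g ** 2
    # Each parameter's step is scaled by the root of that running average.
    w = w - lr * g / (np.sqrt(v) + eps)
    return w, v

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based iteration count.
    m = beta1 * m + (1 - beta1) * g          # first moment (momentum) smooths the update direction
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment scales the step size
    m_hat = m / (1 - beta1 ** t)             # bias correction for early iterations
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

Written in this form, the rules show why heterogeneous gradient magnitudes translate directly into step-size variability for RMSprop, whereas Adam's momentum term and bias correction damp abrupt changes.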
In summary, Adam and RMSprop both achieve fast convergence and strong generalization, with Adam showing better early-stage learning efficiency and more stable performance across epochs. RMSprop performs slightly less effectively but remains a viable alternative in many cases. In contrast, SGD suffers from slow convergence and underfitting, with higher residual errors on both training and test sets. Improving its performance would require tuning hyperparameters such as the learning rate and incorporating momentum mechanisms to enhance convergence and learning capacity.
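As a practical reference, the configuration recommended above (ReLU or Mish activation with the Adam optimizer, MSE loss, a learning rate of 0.001, a batch size of 32, and 200 training epochs, consistent with Table 3) can be assembled as in the following PyTorch sketch. The hidden-layer sizes, feature count, and synthetic data are illustrative assumptions rather than the exact setup used in this study.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

n_features = 8  # assumed feature count; the study's exact input set is not reproduced here

model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),   # replace nn.ReLU() with nn.Mish() for the alternative
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
loss_fn = nn.MSELoss()

# Synthetic placeholder data so the sketch runs end to end; not the study's dataset.
x = torch.randn(256, n_features)
y = torch.randn(256, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

for epoch in range(200):                    # 200 epochs, as in Table 3
    epoch_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * len(xb)
    epoch_loss /= len(loader.dataset)       # epoch-average training loss (cf. Figure 9a)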

5. Conclusions and Future Work

This study systematically investigates the influence of activation function and optimizer selection on the performance of a multilayer perceptron (MLP) model for predicting highway asphalt pavement skid resistance. Based on comprehensive evaluation metrics and visual analyses, the following conclusions are drawn:
(1)
Among the tested functions, ReLU exhibits the best overall performance, benefiting from its sparse activation and effective nonlinear feature extraction, making it the most suitable choice. Leaky ReLU and Mish also show strong performance, offering high accuracy and stable predictions across the full value range. Tanh, though a traditional function, shows moderate performance, outperforming Sigmoid in terms of convergence stability and fitting capability. It may still be applicable in certain scenarios with smoother feature distributions. In contrast, Sigmoid suffers from severe gradient saturation and limited expressiveness, resulting in poor training dynamics and generalization, and is not recommended for deep regression tasks.
(2)
The Adam optimizer consistently delivers the best performance across all evaluated metrics, with fast convergence, small errors, and stable training behavior. It demonstrates superior adaptability to the nonlinear and high-dimensional nature of skid resistance data. RMSprop provides relatively good performance in controlling fluctuations and noise sensitivity, though it converges more slowly and with less consistency than Adam. SGD, lacking adaptive learning mechanisms, shows limited optimization efficiency and fitting accuracy and tends to underfit in complex tasks.
(3)
For effective skid resistance prediction, combining the Adam optimizer with ReLU, Leaky ReLU, or Mish activation functions is recommended to ensure high accuracy and model robustness. Tanh may serve as a secondary option under certain task-specific conditions. To further enhance model adaptability, incorporating advanced feature engineering, data preprocessing, and regularization techniques is advised. Future studies could explore dynamic activation–optimizer combinations and adaptive training strategies to improve model performance in more diverse and complex transportation prediction tasks.
This study provides empirical evidence and methodological guidance for activation function and optimizer selection in deep learning-based pavement performance modeling, with implications extendable to other regression-based applications in engineering practice. However, several limitations of this study should be acknowledged. First, the analysis was restricted to the multilayer perceptron architecture in order to systematically examine the effects of activation functions and optimizers. While this design enhances internal validity, it inevitably limits the breadth of cross-model comparisons. Second, the dataset was collected from asphalt pavements along expressways within a single province, under relatively uniform climatic and structural conditions. This may constrain the generalizability of the model to other regions with different environmental or structural characteristics. Third, the input features mainly consist of macro-level inspection parameters, without incorporating microtexture descriptors, material composition variations, or environmental influences such as temperature and precipitation. This limitation may reduce the model’s ability to capture the complex mechanisms underlying skid resistance degradation. Fourth, the current validation was performed primarily on AC-13 asphalt pavements, leaving its applicability to other pavement structures untested. Finally, the framework does not explicitly capture seasonal variations or long-term temporal degradation patterns, which may further affect predictive performance in diverse real-world settings.
To address these limitations, future studies could extend the framework by incorporating additional machine learning and deep learning models (e.g., Random Forest, SVR, CNN, and LSTM) to enhance the robustness and generalizability of the findings beyond the MLP architecture. Systematic hyperparameter optimization—using techniques such as grid search, random search, or Bayesian optimization—combined with cross-validation, will also be explored to strengthen reproducibility and model stability. Moreover, the dataset will be expanded to include pavement sections from different climatic regions and structural types, allowing for broader applicability. Feature sets will be enriched with microtexture indicators, material properties, and environmental factors to better capture skid resistance degradation mechanisms. Finally, time-series modeling approaches will be employed to explicitly represent seasonal and long-term deterioration patterns, thereby advancing the predictive accuracy and practical relevance of skid resistance modeling for diverse transportation infrastructures.

Author Contributions

Conceptualization, X.W., X.Y. and Q.Y.; methodology, X.W., X.Y., M.C. and Q.Y.; software, H.Y., Z.L. and Q.Y.; validation, H.Y. and M.C.; formal analysis, X.W., X.Y., M.C. and Q.Y.; investigation, M.C., H.Y. and Z.L.; data curation, H.Y. and Q.Y.; writing—original draft preparation, X.Y., H.Y., M.C., Z.L. and Q.Y.; writing—review and editing, Q.Y. and X.Y.; visualization, H.Y., Z.L. and M.C.; supervision, X.W. and Q.Y.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of the Jiangxi Provincial Department of Transportation, China (Grant No. 2023C0017).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and the handling editor for their careful guidance, which have helped improve the manuscript.

Conflicts of Interest

Authors Xiaoyun Wan, Xiaoqing Yu, Maomao Chen, Haixin Ye, and Zhanghong Liu were employed by the company Jiangxi Ganyue Expressway Co., Ltd. Author Xiaoqing Yu was employed by the Jiangxi Provincial Key Laboratory of Pavement Performance Evolution and Life Extension of Highway Subgrade. Author Qifeng Yu was employed by Shanghai Maritime University. The authors declare that this study received funding from the Science and Technology Project of the Jiangxi Provincial Department of Transportation, China (Grant No. 2023C0017). The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

  1. Chu, L.J.; Fwa, T.F. Pavement skid resistance consideration in rain-related wet-weather speed limits determination. Road Mater. Pavement Des. 2016, 19, 334–352.
  2. Yang, Z.; Guo, Z.Y. Measurement of anti-slide performance of tunnel road surface and its effect on driving safety. J. Chongqing Jiaotong Univ. (Nat. Sci.) 2006, 25, 38–42. (In Chinese)
  3. Huang, X.M.; Ma, T. Skid Resistance Analysis Based on Tire-Asphalt Pavement Coupling: Theory and Practice; China Communications Press: Beijing, China, 2024. (In Chinese)
  4. Huang, X.M.; Zheng, B.S. Research status and progress for skid resistance performance of asphalt pavements. China J. Highw. Transp. 2019, 32, 32–49. (In Chinese)
  5. Fwa, T.F. Determination and prediction of pavement skid resistance—Connecting research and practice. J. Road Eng. 2021, 1, 43–62.
  6. Hofko, B.; Kugler, H.; Chankov, G.; Spielhofer, R. A laboratory procedure for predicting skid and polishing resistance of road surfaces. Int. J. Pavement Eng. 2017, 20, 439–447.
  7. Tan, Y.Q.; Xiao, S.Q.; Xiong, X.T. Review on detection and prediction methods for pavement skid resistance. J. Traffic Transp. Eng. 2021, 21, 32–47. (In Chinese)
  8. Wu, J.; Wang, X.; Wang, L.; Zhang, L.; Xiao, Q.; Yang, H. Temperature correction and analysis of pavement skid resistance performance based on RIOHTrack full-scale track. Coatings 2020, 10, 832.
  9. Han, S.; Liu, M.; Fwa, T.F. Testing for low-speed skid resistance of road pavements. Road Mater. Pavement Des. 2018, 21, 1312–1325.
  10. Guo, F.; Pei, J.; Zhang, J.; Li, R.; Zhou, B.; Chen, Z. Study on the skid resistance of asphalt pavement: A state-of-the-art review and future prospective. Constr. Build. Mater. 2021, 303, 124411.
  11. Du, Y.; Qin, B.; Weng, Z.; Wu, D.; Liu, C. Promoting the pavement skid resistance estimation by extracting tire-contacted texture based on 3D surface data. Constr. Build. Mater. 2021, 307, 124729.
  12. Kim, S.H.; Kim, N. Development of performance prediction models in flexible pavement using regression analysis method. KSCE J. Civ. Eng. 2006, 10, 91–96.
  13. He, Y.; Weng, Z.; Leng, Z.; Wang, D.; Wu, J. A review of asphalt pavement long-term skid resistance performance based on multi-scale texture evolution characterization. Friction 2025, 13, 944–962.
  14. Koné, A.; Es-Sabar, A.; Do, M.T. Application of machine learning models to the analysis of skid resistance data. Lubricants 2023, 11, 328.
  15. Xiao, F.; Chen, X.; Cheng, J.; Yang, S.; Ma, Y. Establishment of probabilistic prediction models for pavement deterioration based on Bayesian neural network. Int. J. Pavement Eng. 2022, 23, 1234–1248.
  16. Chen, W.; Li, Y.; Liu, Z.; Zhang, C.; Zhao, Y.; Yan, X. Prediction model for bearing surface friction coefficient in bolted joints based on GA-BP neural network and experimental data. Tribol. Int. 2024, 196, 110217.
  17. Yao, L.; Dong, Q.; Jiang, J.; Ni, F. Establishment of prediction models of asphalt pavement performance based on a novel data calibration method and neural network. Transp. Res. Rec. 2019, 2673, 66–82.
  18. Shi, W.; Niu, D.; Li, Z.; Niu, Y. Effective contact texture region aware pavement skid resistance prediction via convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 2054–2070.
  19. Hu, Y.; Sun, Z.; Li, H.W. Evaluate asphalt pavement frictional characteristics based on IGWO-NGBoost using 3D macro-texture data. Expert Syst. Appl. 2024, 242, 122786.
  20. Saleem, M.; Abbas, S.; Ghazal, T.M.; Khan, M.A.; Sahawneh, N.; Ahmad, M. Smart cities: Fusion-based intelligent traffic congestion control system for vehicular networks using machine learning techniques. Egypt. Inform. J. 2022, 23, 417–426.
  21. You, Z.; Liu, C.; Deng, Q.; Feng, Q.; Qiu, Y.; Zhang, A.; He, X. Integrated FFT and XGBoost framework to predict pavement skid resistance using automatic 3D texture measurement. Measurement 2022, 188, 110638.
  22. Marcelino, P.; de Lurdes Antunes, M.; Fortunato, E.; Gomes, M.C. Machine learning approach for pavement performance prediction. Int. J. Pavement Eng. 2021, 22, 341–354.
  23. Yu, T.; Pei, L.I.; Li, W.; Sun, Z.Y.; Huyan, J. Pavement surface condition index prediction based on random forest algorithm. J. Highw. Transp. Res. Dev. (Engl. Ed.) 2021, 15, 1–11.
  24. Ziari, H.; Maghrebi, M.; Ayoubinejad, J.; Waller, S.T. Prediction of pavement performance: Application of support vector regression with different kernels. Transp. Res. Rec. 2016, 2589, 135–145.
  25. Wang, X.; Zhao, J.; Li, Q.; Fang, N.; Wang, P.; Ding, L.; Li, S. A hybrid model for prediction in asphalt pavement performance based on support vector machine and grey relation analysis. J. Adv. Transp. 2020, 2020, 7534970.
  26. Yang, G.; Li, Q.J.; Zhan, Y.; Fei, Y.; Zhang, A. Convolutional neural network–based friction model using pavement texture data. J. Comput. Civ. Eng. 2018, 32, 04018052.
  27. Pu, Z.; Liu, C.; Shi, X.; Cui, Z.; Wang, Y. Road surface friction prediction using long short-term memory neural network based on historical data. J. Intell. Transp. Syst. 2021, 26, 34–45.
  28. Zhan, Y.; Chen, Y.; Lin, X.; Zhang, Y.; Zhang, A.; Ai, C. Prediction of the skid-resistance deterioration in asphalt pavement based on peephole–LSTM neural network. Int. J. Pavement Eng. 2023, 24, 2277815.
  29. Damirchilo, F.; Hosseini, A.; Mellat Parast, M.; Fini, E.H. Machine learning approach to predict international roughness index using long-term pavement performance data. J. Transp. Eng. Part B Pavements 2021, 147, 04021058.
  30. Kang, J.; Tavassoti, P.; Chaudhry, M.N.A.R.; Baaj, H.; Ghafurian, M. Artificial intelligence techniques for pavement performance prediction: A systematic review. Road Mater. Pavement Des. 2025, 26, 497–522.
  31. Mers, M.; Yang, Z.; Hsieh, Y.A.; Tsai, Y. Recurrent neural networks for pavement performance forecasting: Review and model performance comparison. Transp. Res. Rec. 2023, 2677, 610–624.
  32. Fernández-Delgado, M.; Sirsat, M.S.; Cernadas, E.; Alawadi, S.; Barro, S.; Febrero-Bande, M. An extensive experimental survey of regression methods. Neural Netw. 2019, 111, 11–34.
  33. Motamed, M. Approximation power of deep neural networks: An explanatory mathematical survey. arXiv 2022, arXiv:2207.09511.
  34. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
  35. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408.
  36. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
  37. LeCun, Y.; Bottou, L.; Orr, G.B.; Müller, K.R. Efficient backprop. In Neural Networks: Tricks of the Trade; Orr, G.B., Müller, K.R., Eds.; Lecture Notes in Computer Science; Springer: Berlin, Germany, 1998; Volume 1524, pp. 9–50.
  38. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 315–323.
  39. Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1999.
  40. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013; Volume 28, pp. 3–6.
  41. Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). In Proceedings of the 4th International Conference on Learning Representations (ICLR), San Juan, PR, USA, 2–4 May 2016.
  42. Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for activation functions. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
  43. Misra, D. Mish: A self-regularized non-monotonic activation function. arXiv 2019, arXiv:1908.08681.
  44. Robbins, H.; Monro, S. A stochastic approximation method. Ann. Math. Stat. 1951, 22, 400–407.
  45. Tieleman, T.; Hinton, G. Lecture 6.5—RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude. Coursera Neural Netw. Mach. Learn. 2012, 4, 26–31.
  46. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  47. JTG 3450-2019; Specifications for Field Test Methods of Highway Subgrade and Pavement. China Communications Press: Beijing, China, 2019. (In Chinese)
  48. JTG F80/1-2017; Standards for Quality Inspection and Evaluation of Highway Engineering, Part I: Civil Engineering. China Communications Press: Beijing, China, 2017. (In Chinese)
Figure 1. Correlation Matrix Between SFC and Other Variables.
Figure 2. MLP Neural Network Model Architecture.
Figure 3. Relative Proportion of Evaluation Metrics for Activation Functions.
Figure 4. Comparison of model-predicted values and actual values under different activation functions: (a) tanh activation function; (b) ReLU activation function; (c) Sigmoid activation function; (d) Mish activation function; (e) Swish activation function; (f) Leaky ReLU activation function; (g) ELU activation function.
Figure 5. Comparison of loss-curve changes of different activation functions on the training set and the test set when the Adam optimizer is adopted: (a) training set; (b) test set.
Figure 6. Performance Comparison of Each Optimizer in the MLP Model Based on the Mish Activation Function: (a) Adam Optimizer; (b) RMSprop Optimizer; (c) SGD Optimizer.
Figure 7. Comparison of loss-curve changes of different optimizers on the training set and the test set when the Mish activation function is adopted: (a) training set; (b) test set.
Figure 8. Performance comparison of the MLP model with the ReLU activation function under three different optimizers: (a) Adam optimizer; (b) RMSprop optimizer; (c) SGD optimizer.
Figure 9. Comparison of loss-curve changes of different optimizers on the training set and the test set when the ReLU activation function is adopted: (a) training set; (b) test set.
Table 1. Basic Information of the Surveyed Expressways.
No. | Expressway | Opened Length (km) | Opening Date | Completion Date | Design Speed (km/h)
1 | Tongwan | 68.797 | January 2017 | July 2021 | 80
2 | Ningding | 51.595 | December 2016 | April 2021 | 80
3 | Dongchang | 152.130 | January 2017 | January 2022 | 100
4 | Xiuping | 79.700 | January 2017 | December 2021 | 80
5 | Shangwan | 76.057 | December 2016 | December 2021 | 100
6 | Ning’an | 163.860 | January 2017 | December 2021 | 80

Table 2. Dataset Partitioning.
Dataset | Quantity | Proportion (%)
Train dataset | 3378 | 80
Test dataset | 844 | 20

Table 3. MLP Neural Network Model Parameter Settings.
Parameters | Values
Activation function | Mish / ReLU
Optimizer | Adam
Loss Function | Mean Squared Error
Learning Rate | 0.001
Number of Epochs | 200
Batch Size | 32
Beta_1 | 0.9
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
