1. Introduction
Road infrastructure plays an important and active role in the development of cities and communities and serves as the basis for safe, efficient, and sustainable transportation systems [1]. It has a significant impact on the well-being and comfort of road users [2] and is also an important driver of socio-economic growth. By enabling efficient transport, stimulating trade, creating jobs, improving accessibility, and encouraging investment, pavement management supports both infrastructure performance and societal progress in general [3].
In this context, effective pavement management requires the assessment and evaluation of various parameters, of which pavement roughness is one of the most important. Traffic loads, weather and environmental conditions, as well as the structural condition of the pavement, play a crucial role in the development of roughness. There are several methods to quantify roughness, of which the International Roughness Index (IRI) according to ASTM E1364-95 [4,5] is the best known. Several studies have been conducted to develop predictive models for the IRI value to support timely maintenance planning [3,4,6,7,8,9].
In general, IRI prediction relies on statistical methods such as linear and non-linear regression [4]. Although these methods are widely used, they have significant limitations, mainly because the relationships between IRI and the influencing factors are often complex and non-linear [10,11]. In addition, the performance of regression models tends to decrease as the size of datasets and the number of predictor variables increase, resulting in lower prediction accuracy. In response to these challenges, Artificial Intelligence (AI) methods, particularly Neural Networks (NNs), have emerged in recent years as a powerful alternative that can solve complex problems by relying on observational data rather than on fundamental physical laws. They are a type of machine learning modelled on the way the human brain and its neurons work, hence their name [12].
Traditionally, the term “neural network” referred to a network or circuit of biological neurons, but modern usage of the term most often refers to Artificial Neural Networks (ANNs). An ANN is a mathematical or computational model, an information processing paradigm inspired by the biological nervous system, such as the brain’s way of processing information. It consists of interconnected artificial neurons that are programmed to mimic the properties of biological neurons. These neurons work together to solve specific problems. ANNs are configured to solve artificial intelligence problems without modelling a real biological system. ANN applications are developed through a training process that is similar to learning in biological systems and involves adaptation of the synaptic connections between neurons [13].
Regarding the structure of neurons, each neuron acts as a functional processing unit that receives multiple inputs, applies a series of synaptic weights, and generates an output signal. As described by Montesinos-López et al. [14], the inputs {x1, x2, …, xp} are each associated with weights {w1, …, wp} that represent their relative influence. These weighted inputs are summed and passed through a non-linear activation function that determines the output of the neuron. As mentioned earlier, this structure mirrors the behavior of biological neurons, where dendrites receive external signals and synaptic strength modulates their effect on the cell body [15].
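Using the notation above, and with an additional bias term b (not mentioned explicitly in the cited description but common in practice), the output of a single artificial neuron can be written as

```latex
y = \varphi\!\left(\sum_{i=1}^{p} w_i x_i + b\right)
```

where φ denotes the non-linear activation function.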
There are different types of ANNs, which differ in their architecture. The choice of architecture depends on the type of problem the neural network is intended to solve. The four main categories of neural network architectures are the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN), the Long Short-Term Memory (LSTM), and the Convolutional Neural Network (CNN).
Among these, the Multilayer Perceptron (MLP) is a supervised feed-forward neural network [16] and one of the most commonly used ANN architectures for prediction studies. MLPs consist of multiple layers, including an input layer, hidden layers, and an output layer, with each layer containing a set of processing elements called neurons [17]. The input layer receives the signal to be processed, while the output layer performs tasks such as classification and prediction [18]. The hidden layers and the output layer use a non-linear activation function. Multilayer Perceptron networks are trained using a backpropagation algorithm [16].
In contrast, Recurrent Neural Networks (RNNs) are a class of deep learning models that are fundamentally designed to handle sequential data and are suited for applications such as natural language processing, speech recognition, and time series forecasting [19]. Although RNNs represent a powerful encoding model, they are often affected by long-term memory loss due to the vanishing gradient problem, in which the error gradients become progressively smaller as they are propagated back through time, so that earlier inputs contribute little to learning; conversely, when large error gradients accumulate, the resulting large weight updates can make the model unstable during training [20].
To overcome the vanishing gradient problem, Long Short-Term Memory (LSTM) networks have been developed. They are a subset of RNNs that can capture historical information from time series data and are suitable for predicting long-term non-linear series [21]. LSTM networks contain hidden layers with memory blocks that contain cells with self-connections instead of traditional non-linear units, allowing them to store information over long periods of time [22]. The network consists of gates, including forget gates, input gates, and output gates, which capture both long-term and short-term memory along time steps and avoid the exploding and/or vanishing gradients of standard RNNs [23]. LSTM networks are mainly used for sequential data analysis (automatic speech recognition, language translation, handwritten character recognition, time series data prediction, etc.) [24].
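For reference, a common formulation of the LSTM gate equations (standard textbook notation, not taken from the cited studies) is

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

where f_t, i_t, and o_t are the forget, input, and output gates, c_t is the cell state that carries long-term memory, and h_t is the hidden state passed to the next time step.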
Lastly, Convolutional Neural Networks (CNNs) are used for understanding images or, more generally, spatially dependent information, and they can both classify images and extract features from them. CNNs utilize a specialized type of linear operation known as convolution. A typical CNN architecture consists of a convolutional layer, a non-linearity (activation) layer, a pooling layer, and a fully connected layer [25].
2. Objectives
In recent decades, numerous studies have been conducted on the prediction of road roughness using neural networks [12,26,27,28]. Conventional regression-based models, although widely used, have shown their limitations in capturing the complex, non-linear relationships between the many factors that influence pavement deterioration, such as traffic load and pavement structure, which can lead to lower prediction accuracy over time [4,6,29]. To overcome these limitations, artificial intelligence and neural networks are increasingly being used, as they offer greater flexibility in modelling non-linearity and interactions between variables. However, despite their advantages, neural networks also have limitations, particularly in terms of interpretability, as their internal decision-making processes can be difficult to explain in practical applications. It is important to note that there is currently no clear consensus on which input variables (e.g., traffic load, climate, pavement type, maintenance history) are essential for IRI prediction, which makes it difficult to compare model performance across studies. In addition, few studies take into account prediction uncertainty or confidence intervals, which are crucial for pavement maintenance planning. Moreover, the larger the number of variables, the more complicated it becomes to analyze and assess the results. Finally, findings from comparisons of ANN performance with other models, such as regression-based models, remain rather limited.
With this in mind, this study aims to evaluate the effectiveness of Artificial Neural Networks (ANNs), specifically the Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) architectures, in predicting pavement roughness as measured and expressed by the International Roughness Index (IRI). The goal is to compare the predictive performance of these neural networks with traditional regression models, including linear, ridge, and lasso regression. A key focus is to investigate the ability of ANNs to capture complex, non-linear relationships between the IRI and influencing factors such as traffic data and pavement structure. Ultimately, the potential of neural networks as a reliable tool to support timely and cost-effective pavement maintenance planning will be evaluated.
3. Methodology
3.1. General Description
To evaluate the reliability of neural networks in predicting future IRI values, four experimental analyses were performed using the Python programming language and the PyTorch library. Data from two highway sections with similar structural characteristics, referred to as Section A and Section B, were used. Each section has a length of three kilometers and consists of three lanes. Data was collected for both sections over a period of 9 years. For each section, continuous roughness measurements were performed on the left (L1) and right (L2) lanes for the right wheel path using a high-speed inertial profiler system [30,31], at an interval of 10 m, to calculate the IRI values. The structure of the two highway sections was a typical flexible pavement, and the thickness of each layer was estimated using Ground-Penetrating Radar (GPR) measurements [32,33,34] taken during the first year of the investigation. In addition, the available traffic data was processed and included in the analysis. It should be noted that flexible pavement design is based on the load distribution properties of a layered system that transfers the load to the subgrade via a combination of layers.
Figure 1 shows a typical cross-section of a flexible pavement with the asphalt layers, the unbound layers (base-subbase layers), and the subgrade.
The four experimental analyses were conducted using five different prediction models: three based on linear regression techniques (linear, ridge, and lasso regression) and two based on NNs (MLP and LSTM). The selection of MLP and LSTM was based on their suitability for the structure and scope of the study. MLP is well suited for processing structured, non-sequential input data and is effective in modeling complex non-linear relationships. LSTM, on the other hand, is particularly advantageous for capturing temporal dependencies and trends in time series data, which is essential given the longitudinal nature of the IRI measurements collected over a 9-year period. This combination allows the analysis to benefit from both the modeling of static inputs and the capabilities of time series forecasting.
The programming process begins with the creation of a file called “.utils.py”, into which all data is imported and split into input and output components. In each experimental analysis, the input data consisted of the measurements from section A, which were used to train the models. First, IRI predictions were generated for the left and right lanes of section A for the last year of the dataset. Similarly, IRI predictions were generated for section B for the same year. By comparing the predicted IRI values with the actual measured values from the database, conclusions could be drawn about the ability of the NNs to accurately predict future pavement roughness, as well as their ability to generalize across different road sections. The study acknowledges that sections A and B are similar in terms of structural and traffic characteristics, which supports the assumption that a model trained on one section can generalize reasonably well to the other. However, it is important to clarify that the scope of this research is not to develop a universally applicable model for all pavement types, but rather to evaluate the effectiveness of neural networks in predicting IRI within comparable pavement contexts. If a structurally different section were to be evaluated, it would be necessary to retrain or adapt the model using data representative of that section’s unique characteristics. The intent of the study is to demonstrate that, for similar—though not identical—pavement sections, the developed model can be effectively applied to support predictive pavement management.
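As a rough illustration of this first step (the file layout, column names, and CSV format below are hypothetical assumptions, not taken from the actual “.utils.py” script), the import and separation into input and output components might look like this:

```python
import numpy as np
import pandas as pd

def load_section(path):
    """Load one highway section and separate it into input and output components.

    Hypothetical layout: one row per 10 m kilometre position, with columns for
    layer thicknesses, yearly AADT values, and yearly IRI values; the 9th-year
    IRI is used as the prediction target.
    """
    df = pd.read_csv(path)
    feature_cols = [c for c in df.columns if c != "IRI_year9"]
    x = df[feature_cols].to_numpy(dtype=np.float32)   # inputs: thicknesses, traffic, past IRI
    y = df["IRI_year9"].to_numpy(dtype=np.float32)    # output: IRI of the last year
    return x, y

x_a, y_a = load_section("section_A.csv")   # section used for training
x_b, y_b = load_section("section_B.csv")   # section used to test generalization
```

The same loading routine is reused for both sections, so that the model trained on section A can later be evaluated on section B without further preprocessing.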
Finally, the most effective prediction model was determined by calculating the prediction error for each method in each experimental analysis and comparing the performance of the linear regression models with that of the NNs. An overview of the research process is provided in Figure 2, which illustrates the methodological framework of this study.
3.2. Data
3.2.1. IRI Indices
Figure 3 and Figure 4 show the distributions of the IRI indices over a period of 9 years for the right wheel path of the left (L1) and right (L2) lanes in section A. Similarly, Figure 5 and Figure 6 show the corresponding IRI distributions for section B.
Figure 3, Figure 4, Figure 5 and Figure 6 show that the IRI values for sections A and B on lanes L1 and L2 differ. For section A, the values for L1 range from 0.3 to 9.15 and for L2 from 0.23 to 6. In both cases, the majority of values are between 0.3 and 0.7. In section B, L1 has a wider range from 0.2 to 13.8, and L2 from 0.3 to 11.1, with most values between 0.3 and 0.7. The occasional high IRI values are localized, while the observed fluctuations show the same trend over the study years.
3.2.2. Pavement Thickness
The stratigraphy of the investigated highway sections was recorded with a vehicle-mounted, air-coupled GPR system operating at 1 GHz. The GPR data was analyzed to estimate the thickness of the asphalt layers, as well as that of the base and subbase layers treated as a single uniform layer. As the thickness of the asphalt layers does not usually change significantly over short periods of time, the values obtained from the GPR measurement (which was carried out in a single year) were considered representative of the entire study period.
Figure 7 and Figure 8 show the thickness results for sections A and B, respectively.
As can be seen in Figure 7 and Figure 8, the asphalt layer thicknesses for the two highway sections are between 12 and 20 cm, while the base layer thicknesses are between 16 and 33 cm. These values do not deviate significantly from each other, which confirms the similarity of the pavement structure of the two sections.
3.2.3. Traffic Data
Traffic data was available for both highway sections over the 9-year study period. For the purposes of pavement design and performance evaluation, the analysis incorporated both light and heavy vehicles. Traffic volumes were quantified using the AADT (Annual Average Daily Traffic) metric.
Table 1 and Table 2 present the total AADT values for sections A and B, respectively, while Figure 9 and Figure 10 illustrate the corresponding traffic trends for these sections.
3.3. Linear Functions
In the regression approach, the input data “x train” is multiplied by the corresponding weights w to produce the output “y train”. The optimal prediction model is then obtained by minimizing a Mean Squared Error (MSE)-based objective function, which differs for each of the following methods. This model provides the prediction “y test” based on the input data “x test”.
Of the collected data, 80% is defined as training data. It should be noted that the input data “x train” consists of all the elements that influence the IRI, i.e., the asphalt layer thickness and the traffic data for 8 years, while the output data “y train” consists of the IRI indices for the same years. The remaining 20% of the data is used to predict the IRI indices. In other words, a random data set “x test”, which accounts for 20% of the total data, is used to predict the IRI indices for the 9th year.
In order to bring the variables to a comparable scale, the minimum and maximum values of the data are also determined (min-max normalization). Another prerequisite for the regression method is that the x and y data are supplied as arrays: the x data is stored in a two-dimensional array, while the y data is stored in a one-dimensional array. In this way, a multiple linear regression model is created.
After all data are properly imported and fitted, the three models are used to predict the IRI indices for the four experimental analyses.
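A minimal sketch of this step using scikit-learn is given below; the library choice, the random split, and the “alpha” values are assumptions (the study only reports that the optimal “alpha” was tuned, reaching values close to 10⁻¹² in several analyses), and the variables x_a and y_a are those produced by the hypothetical loading sketch above.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# 80% of the kilometre positions for training, 20% held out for the 9th-year prediction.
x_train, x_test, y_train, y_test = train_test_split(x_a, y_a, test_size=0.2, random_state=0)

# Min-max scaling of the inputs; x stays a two-dimensional array, y a one-dimensional array.
scaler = MinMaxScaler()
x_train_s = scaler.fit_transform(x_train)
x_test_s = scaler.transform(x_test)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1e-12),   # "alpha" assumed here; the study tunes it per analysis
    "lasso": Lasso(alpha=1e-12),
}

for name, model in models.items():
    model.fit(x_train_s, y_train)
    y_pred = model.predict(x_test_s)
    print(name, "MSE:", mean_squared_error(y_test, y_pred))
```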
3.4. Neural Networks
Three core processes are carried out for operation of the MLP and LSTM neural networks: feeding and organizing the data into the network, configuring the network architecture, and constructing the training function. The data is imported into the neural networks and organized using the Python programming language and a large number of built-in functions.
To configure the architecture of the MLP network, two layers are defined through which the data must pass (Figure 11). The first layer, the input layer, receives all data elements, including years, thicknesses, and traffic data, totaling 18 features. When the data leaves the input layer, the number of features is reduced. The data then passes through a non-linear ReLU (Rectified Linear Unit) activation function, which changes the values but not the number of elements. Finally, the processed data flows into the second layer, the output layer, where it is combined into a single output value. Both layers are simple linear functions, with the ReLU activation providing the non-linearity between them. The number of neurons and other hyperparameters were determined heuristically during creation of the architecture, based on preliminary tests and practical considerations.
By introducing an activation function between two linear layers, non-linearity is achieved within the network. This enables the network to recognize patterns that could not be recognized with a simple linear function alone.
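A minimal PyTorch sketch of this two-layer architecture is shown below; the hidden width of 8 is an assumption, since the paper states only that the hyperparameters were chosen heuristically.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two linear layers with a ReLU in between, mapping 18 input features to one IRI value."""
    def __init__(self, n_features: int = 18, hidden: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),  # input layer: reduces the number of features
            nn.ReLU(),                      # non-linearity between the two linear layers
            nn.Linear(hidden, 1),           # output layer: single predicted IRI value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

model = MLP()
pred = model(torch.randn(4, 18))  # e.g., 4 kilometre positions with 18 features each
```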
The architecture of the LSTM network is defined by its hidden layers, in which a series of mathematical operations take place. The network consists of two layers and sixteen hidden units. A linear layer is defined at the output, which generates a single output value. A two-dimensional time matrix “x” is also defined, in which the IRI indices and traffic data for each kilometer position are entered. In addition, the asphalt layer thicknesses and the base course thicknesses for each kilometer position are entered in the first hidden layer. This configuration forms the network structure shown in Figure 12.
The pavement thicknesses are introduced in the first hidden layer, as these values remain constant over time. Then the time-varying data, such as the IRI indices and the traffic information for each year, are fed into the remaining hidden layers. Finally, all the data passes through a linear layer that provides a single prediction result.
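A rough PyTorch sketch of this configuration is given below; the exact mechanism for feeding the constant thicknesses into the first hidden layer is an assumption based on the description above, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class IRILstm(nn.Module):
    """Two-layer LSTM with 16 hidden units, followed by a linear output layer."""
    def __init__(self, n_time_features: int, n_static_features: int, hidden: int = 16, layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_time_features, hidden, num_layers=layers, batch_first=True)
        self.static_to_h0 = nn.Linear(n_static_features, hidden)  # thicknesses enter the first hidden layer
        self.out = nn.Linear(hidden, 1)
        self.layers, self.hidden = layers, hidden

    def forward(self, x_seq: torch.Tensor, x_static: torch.Tensor) -> torch.Tensor:
        # x_seq: (positions, years, time-varying features such as IRI and AADT)
        # x_static: (positions, constant features such as layer thicknesses)
        h0 = torch.zeros(self.layers, x_seq.size(0), self.hidden)
        h0[0] = self.static_to_h0(x_static)            # static data injected into the first layer only
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(x_seq, (h0, c0))
        return self.out(out[:, -1, :]).squeeze(-1)     # prediction from the last time step

model = IRILstm(n_time_features=2, n_static_features=2)
pred = model(torch.randn(4, 8, 2), torch.randn(4, 2))  # 4 positions, 8 years of history
```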
The final step is to define the training function, which is implemented in the same way for both networks. At the beginning of the training process, a “seed” is introduced to ensure correct initialization of the weights of the model. More specifically, the weights are initially assigned random values, and, by defining a “seed”, the experiment can be repeated with identical results. This is important, as many of the processes involved are stochastic in nature.
For each road segment, 80% of the kilometer positions are selected to serve as training data, while the remaining 20% are used to predict future indices and form the test dataset. In addition, the training data (80% of the total data) was divided into two subsets corresponding to an 80–20% split. The first subset (64% of the total data), the so-called “training dataset”, is used to train the network, while the second subset (16% of the total data), the so-called “development dataset”, is used to monitor and evaluate the adequacy of the network training. Once the network is sufficiently trained, the iterative training process is terminated.
By comparing the predictions for the development data set with the actual values, an error metric, known as the development loss, is calculated. This is then compared with the training loss. The training process is stopped if a decrease in the training loss is observed without a corresponding decrease in the development loss. This signals the start of overfitting, where the network is too strongly tailored to the training data. As a result, the network performs well during training but is less effective in the final predictions.
Figure 13 illustrates the difference between training loss and development loss.
In the next step, the MLP and LSTM network models are introduced, the loss function is defined as the Mean Squared Error (MSE loss), and an optimizer is specified to minimize this loss. MSE was selected as a suitable evaluation metric, as it effectively captures overall prediction error and emphasizes larger deviations, making it well suited for comparing model performance across similar pavement sections. After completing the iterative training process, the optimally trained model is loaded, and the predictions of the future IRI values for the 9th year are calculated. The prediction error is then calculated for both neural network models for each of the four experimental analyses. Finally, by comparing the errors of the linear functions and the neural networks, the most effective method for estimating the future pavement roughness of the two highway sections is determined.
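A condensed sketch of such a training function is shown below; the optimizer, learning rate, number of epochs, and early-stopping patience are assumptions, as they are not reported in the paper, and x_dev/y_dev denote the development subset split off from the training data as described above.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)   # seed set before the model is built so the random initial weights are reproducible

def train(model, x_tr, y_tr, x_dev, y_dev, epochs=500, patience=20, lr=1e-3):
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice and learning rate assumed
    best_dev, best_state, stale = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        train_loss = loss_fn(model(x_tr), y_tr)   # training loss
        train_loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            dev_loss = loss_fn(model(x_dev), y_dev).item()   # development loss
        if dev_loss < best_dev:                   # development loss still improving: keep this model
            best_dev, best_state, stale = dev_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:                 # training loss falls but dev loss does not: overfitting, stop
                break

    model.load_state_dict(best_state)             # reload the optimally trained model
    return model
```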
It should be noted that the test dataset was randomly selected and represents 20% of the kilometer positions for each highway section. Therefore, the error shown in the prediction tables in the next chapter reflects the difference between the predicted and actual (measured) values at these random positions and not the IRI values over the entire length of the section.
4. Results and Discussion
Following the methodology described above, experimental analyses were carried out for each of the two highway sections using the five different prediction methods, i.e., linear functions and neural networks.
4.1. Results for Section A
The mean prediction errors resulting from the analyses for section A are summarized in Table 3.
The results of the analysis for the left lane (L1) of section A show that the MLP and LSTM neural networks perform better than the linear functions in estimating future IRI values. Of the linear prediction methods, ridge regression was found to be the most effective, as expected, because it has been shown to perform better in cases where the input features (in this case, pavement thickness and traffic data) are directly correlated with the predicted IRI values. Conversely, in situations where this correlation is weaker, lasso regression would be expected to yield lower prediction errors. Of all the methods investigated, the LSTM neural network achieved the highest prediction accuracy in this analysis. Therefore, further analysis is performed only for the LSTM predictions to describe the strength and direction of the relationship between the measured and predicted IRI values. The corresponding correlation result is shown in Figure 14, where it can be seen that the prediction error is much lower at low IRI values, which confirms that the size of the prediction error is related to the level of the index.
According to the results shown in Table 3, the LSTM neural network shows better performance in estimating future IRI values for the right lane (L2) of section A compared to the linear functions. Among the linear prediction methods, the error is approximately the same for the three linear models. This is due to the fact that, in both ridge and lasso regression, the optimal value of the regularization parameter “alpha” is 10⁻¹², which is practically equal to zero. As a result, the regularization terms added to the error function in ridge and lasso regression are almost eliminated, resulting in error values that are essentially the same as those of linear regression. The LSTM neural network proved to be the most effective prediction method in this analysis; its performance is shown in the graph in Figure 15, with similar comments as for Figure 14.
To summarize, the analysis results for section A underline the superiority of the LSTM network in predicting the future roughness of road pavements.
4.2. Results for Section B
The mean prediction errors resulting from the analyses for section B are summarized in Table 4.
The results of the analysis for the left lane (L1) of section B show that the MLP and LSTM neural networks perform better than the linear functions in predicting the future values of the IRI indices. Moreover, similar to the second experimental analysis for section A, the error of the three linear functions is approximately equal, since for the ridge and lasso regression functions the optimal “alpha” parameter is equal to 10⁻¹², which is practically zero. As a result, the regularization terms added to the error formula of the ridge and lasso regression functions (almost) cancel out, so that the error equals that of the linear regression. The LSTM neural network proved to be the optimal prediction method in this experimental analysis; the result is shown in the diagram in Figure 16. Again, it can be seen that the prediction error is much lower at low IRI values.
Furthermore, Table 4 shows that the LSTM neural network performs better than the linear regression models in predicting future values of the IRI indices for the right lane (L2) of section B. In line with the results of the previous analyses, the prediction errors of the three linear models remain almost identical, which is mainly due to the extremely low value of the regularization parameter “alpha”. This minimal value effectively cancels out the additional regularization terms in the ridge and lasso regression models, making their performance comparable to that of standard linear regression. In the final experimental analysis, the LSTM network was also identified as the most effective prediction method; its prediction results are shown in the diagram in Figure 17, with similar comments as for Figure 14, Figure 15 and Figure 16.
The analysis results for section B confirm the observations for section A and underline the ability of an LSTM model trained on data from a different but sufficiently similar road section to generalize when predicting the future roughness of a road pavement.
4.3. Comparative Analysis of Errors
For a better understanding of the sources of error in the predictions, a comparison of the results for the two lanes in each section follows.
Figure 18 and Figure 19 compare the errors in the prediction of the IRI values for the two lanes in sections A and B, respectively.
Comparative analysis for section A shows that the prediction errors for the right lane (L2) are generally higher than for the left lane (L1). The deviation is about 0.07 for both the linear regression models and the MLP network. Similarly, the error associated with the LSTM network increases by about 0.03 in the second analysis. However, this difference in error between the two experimental phases is not considered significant, indicating that the prediction results in both analyses are quite close to the actual values. Comparable observations are also made in the second comparative analysis for section B.
On the other hand, the observed error differences can be attributed to the relatively higher IRI index values, which occur more frequently in the measurements for the left lane (L1). It can also be noted that the error of the MLP network is slightly higher than that of the linear models, although the difference is not significant. This is probably due to the strong correlation between the input parameters used in the experimental analyses, i.e., pavement thickness and traffic data, and the IRI indices. As a result, the errors of the linear models do not differ significantly from those of the MLP network.
In any case, the LSTM neural network consistently delivers lower prediction errors than the other methods in each experimental phase. This better performance is due to the LSTM model’s ability to capture temporal dependencies, which makes it more effective at predicting future IRI values. To put these results in context, Table 5 compares the results of this study with those reported in the recent literature on IRI prediction, highlighting the differences in input variables, modelling approaches, and accuracy metrics obtained. It is evident that the errors obtained with LSTM modelling in the current study are significantly lower than the corresponding errors reported in the international literature.
Based on these results, it makes sense to investigate in the future whether similar modelling approaches can be used to predict the development of other pavement properties such as surface texture, skid resistance, and cracking. Although ANN models have demonstrated high predictive power, their black-box nature may limit their practical application. The integration of interpretation tools such as SHAP or sensitivity analysis can help to show the influence of input variables on predictions, increasing transparency and supporting more informed pavement management decisions. In any case, artificial intelligence and neural networks are a valuable tool for engineers, as decision making in a Pavement Management System (PMS) is based on synthesis of all the parameters involved in pavement condition assessment.
5. Conclusions
In this study, the ability of artificial intelligence and neural networks to predict pavement roughness in the form of IRI was investigated. Based on the results of the overall analysis, Long Short-Term Memory (LSTM) was identified as the most effective method for predicting pavement roughness among the investigated techniques.
Predicting the roughness of a road section can also be performed with a neural network trained on a different section, provided that both sections have similar pavement structures and traffic data, as well as comparable weather and environmental conditions. In cases where the input data is directly related to the target prediction, as in this study, where the pavement thickness and traffic data are strongly related to the predicted IRI values, linear functions appear to provide results similar to those of a Multilayer Perceptron (MLP) neural network.
However, a major limitation when using the IRI over a longer period of time is that the maintenance history is not taken into account, which can significantly influence IRI values. Furthermore, the IRI is influenced by several additional factors, such as the type of asphalt mix, road classification, environmental conditions, and pavement construction, all of which can affect long-term performance. It is true that this study focused on a rather limited type of data, and therefore the results cannot be generalized to predict IRI conditions for all pavement types or wider networks. For structurally different or more diverse road sections, retraining of the model with relevant data would be required to ensure reliable predictions.
As with any experimental approach in future prediction, it is important to realize that the predictions made by neural networks are not entirely reliable. Pavement roughness changes over time and is influenced by fluctuations in traffic loads. To improve the accuracy of the neural network training and minimize the prediction error, it is therefore advantageous to collect as much historical data as possible for the road under investigation.
To summarize, the ability to predict future pavement roughness for a given road makes neural networks a valuable tool for tackling engineering challenges, such as the timely planning of maintenance and interventions, which ultimately aim to reduce both response time and overall costs.