Asphalt Pavement Damage Detection through Deep Learning Technique and Cost-Effective Equipment: A Case Study in Urban Roads Crossed by Tramway Lines

Asphalt pavements are subject to regular inspection and maintenance activities over time. Many techniques have been suggested to evaluate pavement surface conditions, but most of these are either labour-intensive tasks or require costly instruments. This article describes a robust intelligent pavement distress inspection system that uses cost-effective equipment and the 'you only look once' detection algorithm (YOLOv3). A dataset for flexible pavement distress detection with 13,135 images and 30,989 bounding boxes of damage was used during the neural network training, calibration, and validation phases. During the testing phase, the model achieved a mean average precision of up to 80%, depending on the type of pavement distress. The performance metrics (loss, precision, recall, and RMSE) that were applied to estimate the object detection accuracy demonstrate that the technique can distinguish between different types of asphalt pavement damage with remarkable accuracy and precision. Moreover, the confusion matrix obtained in the validation process shows a distress classification sensitivity of up to 98.7%. The suggested technique was successfully implemented in an inspection car. Measurements conducted on urban roads crossed by tramway lines in the city of Palermo proved the real-time ability and great efficacy of the detection system, with potentially remarkable advances in asphalt pavement examination efficacy due to the high rates of correct distress detection.


Introduction
Most state road operators supervise road pavement distress to support their asset management systems. An efficient pavement management system (PMS) necessitates the integration of modules for pavement inspection, condition assessment, condition prediction, optimisation, and decision-making regarding maintenance actions. The initial stage involves determining pavement conditions, which can be performed using a range of techniques, from manual to fully automated, to reduce subjectivity and increase efficiency. The implementation of an efficient PMS requires accurate pavement inspections [1]. Correctly identifying pavement distress is of huge importance for maintaining high levels of safety and performance for road transportation systems. Road pavement degradation is predominantly due to light and heavy vehicle traffic, weather conditions, and sunlight. Pavements can be classified into four main categories according to the materials used, namely asphalt (also known as flexible pavement), concrete, gravel, and brick and block. Over 90% of the total European road network has a flexible pavement [2]. Flexible pavements are composed of different layers: surface, binder, base, and subbase courses. Distresses can be grouped into two different types (i.e., cracking and non-cracking), as shown in Figure 1 [3,4].
Infrastructures 2024, 9, x FOR PEER REVIEW 2 of 21
Other classifications can be found in the literature. For instance, according to [5], distress can be categorised into six groups. The shapes of some common distresses are depicted in Figure 2. In addition, cracks can be subdivided based on the crack width (Table 1) [6].
Table 1. Examples of distress severity (adapted from [6]).
The quality of road pavement can be assessed by applying several standards [2] (Qureshi et al., 2023).
Damage to road pavement mainly results from wear and tear, defects in the materials, or issues during the construction phases.
The pavement distress survey provides a helpful basis for suggesting corrective actions that allow for the scheduling of effective maintenance interventions with clear financial benefits for the road operator [7]. Traditional methods for pavement distress identification are complex and ineffective at dealing with large quantities of images to be inspected. Numerous automated procedures have been developed to identify pavement distress. However, the most widely used and most reliable technique for evaluating flexible pavement distress currently requires manual or semi-automated data collection by specialised technicians [7], which is time-consuming and labour-intensive [6].
Although some phases of pavement inspection have been moderately automated through image and video data acquisition, the following main problems persist. Damage detection is still a difficult activity [7]. Current automated systems are often expensive to acquire and operate, and they are not simple to use [8]. Consequently, distress is generally detected through visual inspections or manual measurement instruments [5]. Manual techniques aim to identify and classify pavement cracks on the basis of shape, dimensions, and other parameters. Manual procedures have several restrictions, such as modest precision, subjectivity, and inconsistency in analysis outcomes.
To overcome these and other limitations, several automated approaches have been proposed, starting from the evidence that applications of artificial intelligence (AI) are becoming common in transportation and pavement engineering. In particular, deep learning (DL) algorithms were applied in previous studies [9-12]. Recently, deep learning has shown several potential applications in many real-world domains [13], including target detection in pavement engineering. In deep learning, a computer model learns how to perform classification tasks based on different types of information, such as texts, sounds, or images. The models used in deep learning are built from a large amount of labelled data and neural network structures that contain different layers. The word 'deep' refers to the number of hidden layers in the neural network. Pavement conditions can be assessed with deep learning models, which analyse pavement images and videos to evaluate the state of the pavement, including cracks, potholes, and other defects. This appears to be a beneficial approach for prioritising maintenance and repair work.
Although PMS was originally designed for motorway and highway management, it can also be adapted for urban roads. Artificial intelligence techniques to identify pavement distress are increasingly used to overcome the inadequacies of manual techniques (Figure 3). AI detects pavement defects quickly and in real time; it is more efficient than manual methods, handles large-scale tasks with many defects, improves the accuracy of detection, and scales well. On the other hand, AI detection methods can be expensive in their initial implementation, require technical expertise to be operated and maintained, may need frequent updates to be adapted to new types of pavement distress, and entail a risk of error if not properly trained or in unexpected conditions.
In the present article, we describe a deep learning-based technique for monitoring asphalt pavement health and decreasing the overall time necessary for pavement evaluation. The detection of pavement damage is obtained using the YOLOv3 algorithm. The most recent editions of the YOLO family are YOLOv5 and YOLOv8. YOLOv8 was released in 2023. It uses a deep convolutional neural network (CNN) architecture like its predecessors but with some variations, comprising a new backbone architecture called CSPNet, a new neck architecture termed FPN+PAN, a new head architecture termed PANet, and a new training procedure. Because of its recent release, there are only a few YOLOv8 applications in the field of pavement engineering. YOLOv3 has both advantages and disadvantages. One considerable benefit is its speed, which preserves the detection speed typical of the YOLO family. Another advantage is its ability to detect small objects, which has improved with respect to the previous versions. Nevertheless, YOLOv3 has its own limitations. One is its difficulty in handling variations in scale, particularly when confronted with extremely small or extremely large objects. In short, YOLOv3 ensures fast speed and high detection accuracy for small entities, yet it may find it difficult to deal with scale variations and certain specialised detection tasks. Overall, its balance between speed and precision makes it a suitable choice for various object detection applications. Consequently, the YOLOv3 algorithm was used for detecting pavement damages in this research.
The present article explores case studies of flexible pavements on urban roads crossed by tramway lines. The proposed technique requires a simple vehicle-mounted camera system.
The experiments proved that a simple video recording device, together with the use of a deep learning-based approach, can successfully detect several types of pavement damage. The main results also showed that the proposed technique is both affordable and accurate.
The article is structured as follows: Section 2 explains the deep learning algorithms applied here. Section 3 briefly describes the known datasets for road pavement damage detection and, more specifically, those used in this research, as well as the required survey equipment. Section 4 explains the neural network training and the main outcomes in terms of loss, precision, recall, and RMSE. Section 5 illustrates the case study, results, and discussion. The main achievements, challenges, contributions, and limitations are summarised in Section 6.

Algorithms for Crack Detection
In recent times, deep learning (DL) has awakened great interest in several fields of highway and pavement engineering. In this regard, Mohan and Poobal [14] reviewed 50 scientific articles and collected several procedures for automated distress detection through image processing. Two-stage object detection algorithms (namely convolutional neural networks (CNNs)) have shown excellent performance in segmenting pavement cracks, but their calculation time is too long for real-time applications [14].
Such a problem can be solved by adopting faster 'one-stage' object detection (Figure 4) [15,16]. One-stage models include the single-shot multibox detector (SSD) [17], RetinaNet [18], and 'you only look once' (YOLO).
The YOLO algorithm was created by Redmon et al. [19]. It can classify and locate objects in only one step. YOLOv2 is an enhanced edition of the original YOLO model. It adopts the concept of anchor priors used in single-shot multibox detectors (SSD). YOLOv2 also differs markedly in its lower number of layers, with only 19 convolution layers composed of 3 × 3 and 1 × 1 filters.
Figures 5 and 6 show the YOLOv3 architecture. For more details about this algorithm, the interested reader may consult [22,23]. The image of interest is partitioned into an S × S grid; each grid cell determines whether the centre of the object of interest is located within it.
Each grid cell evaluates B bounding boxes and the confidence of each box, i.e., C(Ob), as follows [22]:
$$C(Ob) = P(Ob) \times IoU_{pred}^{truth} \tag{1}$$
$$P(Ob) = \begin{cases} 1 & \text{(there is a target in the cell)} \\ 0 & \text{(no target in the cell)} \end{cases} \tag{2}$$
where $IoU_{pred}^{truth}$ indicates the intersection over the union (Figure 7) between the predicted and the ground-truth bounding boxes.
The following variables are also calculated:
(x, y): position of the centre of the bounding box;
(w, h): width and height of the bounding box;
P(Class_i|Ob): probability that the centre of the i-th object falls into the grid cell.
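The box confidence described above can be illustrated with a minimal Python sketch. It assumes boxes in the centre format (x, y, w, h) listed above; the function names `iou` and `confidence` are illustrative and are not taken from the original implementation.

```python
from typing import Tuple

# Box in centre format: (x_centre, y_centre, width, height).
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union of a predicted and a ground-truth box."""
    px, py, pw, ph = pred
    tx, ty, tw, th = truth
    # Convert centre format to corner coordinates.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # Overlap along each axis (zero if the boxes do not intersect).
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    return inter / union if union > 0 else 0.0


def confidence(object_in_cell: bool, pred: Box, truth: Box) -> float:
    """C(Ob) = P(Ob) * IoU, with P(Ob) = 1 only if the cell contains a target."""
    p_ob = 1.0 if object_in_cell else 0.0
    return p_ob * iou(pred, truth)
```

For two 2 × 2 boxes shifted by one unit along x, the overlap is 2 and the union is 6, so the IoU is 1/3; a cell without a target always yields a confidence of zero.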

Detector Loss Function
The loss function comprises the following components [25]: the classification loss; the localisation loss [26]; the confidence loss [26]; and the confidence loss that results when an object is not detected [25]. The final loss is the sum of these components. Finally, the location prediction is calculated from the network outputs, as illustrated in Figure 8 [27].
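For reference, the components listed above are commonly written as follows in the original sum-squared-error YOLO formulation, together with the YOLOv3 box decoding. This is a sketch under stated assumptions rather than the exact form used in [25-27]: the weights $\lambda_{coord}$ and $\lambda_{noobj}$, the indicators $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$, the hatted predictions, the raw outputs $t_x, t_y, t_w, t_h$, the cell offsets $c_x, c_y$, and the anchor sizes $p_w, p_h$ are not defined in the text and are assumed here.

```latex
\begin{align}
L_{cls}   &= \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2 \\
L_{loc}   &= \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
             \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
             + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
L_{conf}  &= \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
L_{noobj} &= \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
L         &= L_{cls} + L_{loc} + L_{conf} + L_{noobj} \\[4pt]
b_x &= \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad
b_w = p_w\, e^{t_w}, \qquad b_h = p_h\, e^{t_h}
\end{align}
```

YOLOv3 implementations often replace the squared-error class and confidence terms with binary cross-entropy, so the expressions above should be read as the general structure of the loss rather than a definitive statement of the trained objective.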

Performance Metrics
The performance of the model in crack detection and classification can be evaluated using precision and recall, among other parameters [29,30]. Symbol definitions are summarised in Table 2 [29]. Finally, RMSE is obtained with the following relationship:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - f_i \right)^2}$$
RMSE allows the error to be assessed between the ground-truth distress numbers $y_i$ and the predicted distress numbers $f_i$, where n is the number of observations.
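The metrics above can be sketched in a few lines of Python. The sketch assumes the usual definitions of precision and recall in terms of true positives (TP), false positives (FP), and false negatives (FN); the exact symbols of Table 2 are not reproduced here, and the function names are illustrative.

```python
import math
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real distresses that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between ground-truth and predicted distress numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

For example, with 8 true positives and 2 false positives the precision is 0.8, and predictions that are each off by one distress give an RMSE of exactly 1.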

Survey Equipment
Nowadays, many data sources for flexible pavement distress are openly available; the most popular among them are the following [23,31-35] (Table 3): the CFD dataset, the AigleRN dataset, the CRACK500 dataset, the GAPs dataset, the CrackTree200 dataset, the Road Damage dataset 2018, and the Road Damage dataset 2019. The model was trained on the Road Damage dataset. Its initial version was made available in 2018, while the most recent version was released in 2019 [33]. Compared to the first version, in the Road Damage dataset 2019, the total number of annotated images increased from 9053 to 13,135 and the number of annotations increased from 15,435 to 30,989. Table 4 lists the damage categories, their definitions, and their class names. The present study therefore takes 13,135 images of pavement cracks into consideration; the ratio of training, validation, and testing sets was set at 7:1.5:1.5 (i.e., 9195, 1970, and 1970 images, respectively).
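The split sizes quoted above can be reproduced with a short Python sketch. It assumes that the validation and test sets each take 15% of the images (rounded down) and that the training set takes the remainder; the random shuffling and the seed are illustrative assumptions, since the text does not state how the images were assigned to each set.

```python
import random
from typing import List, Tuple


def split_counts(n_images: int, val_pct: int = 15, test_pct: int = 15) -> Tuple[int, int, int]:
    """Return (train, validation, test) sizes using integer arithmetic."""
    n_val = n_images * val_pct // 100
    n_test = n_images * test_pct // 100
    n_train = n_images - n_val - n_test  # the training set takes the remainder
    return n_train, n_val, n_test


def split_indices(n_images: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle image indices reproducibly and cut them into the three subsets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility only
    n_train, n_val, _ = split_counts(n_images)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Calling `split_counts(13135)` returns the 9195, 1970, and 1970 images reported for the training, validation, and testing sets.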

Survey Vehicle
In the experimental phase, the input data for pavement distress detection and classification are videos of flexible pavements on urban roads in the city of Palermo. The videos are obtained through a camera installed on the rear windscreen of a car (Figure 9) by means of a suction-cup mount. In order to reduce the motion blur due to vehicle speed and road conditions, the equipment was only tested at speeds v ≤ 50 km/h (the maximum speed limit of urban roads in Palermo). In addition, a camera angle of around forty-five degrees with respect to the road pavement surface was adopted. The acquired images were 640 × 480 pixels in size, which corresponds to a ground truth measurement area of about 1080 mm × 1447 mm. The accuracy of crack detection was ensured by a careful camera calibration performed as shown in Figure 10. More details and an example of this process can be found in [23].
The first step of this research was the camera calibration, obtained with Zhang's algorithm [35]. The calibration was performed by using several images of a chessboard in the outdoor environment (Figure 10). The extrinsic parameters estimated by the calibration process are depicted in Figure 11. Finally, the model was validated by comparing the real and predicted distances of some objects [36]. Figure 12 shows an example of distress annotation [34].

Neural Network Training
The crucial elements for an accurate model are the characteristics, in terms of quantity and quality, of the images selected for the training. In this study, the Road Damage dataset 2019 was used (Table 4). Therefore, the pre-trained model was able to identify the types of pavement surface distress given in Table 4.
The initial learning rate was fixed to 0.001, and the 'rmsprop' algorithm was adopted. RMSprop is a popular optimisation algorithm used in machine learning. It is designed to improve the speed of convergence and find the minimum of the loss function quickly. In addition, the following parameters were fixed: mini-batch size 8, total epochs 20 (38,840 iterations). For the training dataset, Figure 13a summarises the total number of bounding boxes for each damage class and the total images in which a given damage was detected. As expected, the bounding boxes have relatively small variance in the pixel dimensions for the damage types ID00, ID10, ID43, and ID44 and high variations for ID20 and ID40, as shown in Figure 13b.
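To make the optimiser concrete, the RMSprop update rule can be sketched as follows. This is a minimal illustration, not the training code used in the study: only the learning rate of 0.001 comes from the text, while the decay factor 0.9, the constant `eps`, and the toy quadratic loss in the usage note are assumptions.

```python
import math
from typing import List, Tuple


def rmsprop_step(
    params: List[float],
    grads: List[float],
    cache: List[float],
    lr: float = 0.001,   # initial learning rate reported in the text
    decay: float = 0.9,  # assumed decay factor of the running average
    eps: float = 1e-8,   # assumed constant that avoids division by zero
) -> Tuple[List[float], List[float]]:
    """One RMSprop update: scale each gradient by a running RMS of past gradients."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = decay * c + (1.0 - decay) * g * g  # running average of squared gradients
        new_cache.append(c)
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
    return new_params, new_cache


def minimise_quadratic(steps: int = 10000) -> float:
    """Toy usage: minimise f(w) = (w - 3)^2 starting from w = 0."""
    w, cache = [0.0], [0.0]
    for _ in range(steps):
        grad = [2.0 * (w[0] - 3.0)]
        w, cache = rmsprop_step(w, grad, cache)
    return w[0]
```

Because each gradient is divided by the running root mean square of recent gradients, the step size stays close to the learning rate regardless of the gradient scale, which is what allows fast progress in flat regions of the loss.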

For a more in-depth evaluation of the flexible pavement distress classification results, the confusion matrix was calculated (Figure 18). In deep learning applications, the confusion matrix is used to demonstrate the classification model's performance. In this matrix, the columns show the predicted values, while the rows show the actual values. The cell where the row and column for a certain pavement distress class intersect indicates the true positive values for that class. We can observe that the pavement distress class with the lowest sensitivity in the validation data is D43, with an 86.8% sensitivity. On the contrary, the class with the highest sensitivity is D20, with a 98.7% sensitivity.
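The per-class sensitivity read from the confusion matrix can be computed as sketched below, following the convention stated above (rows are actual classes, columns are predicted classes). The function name and the numbers in the usage note are illustrative and are not the values of Figure 18.

```python
from typing import Dict, List, Sequence


def per_class_sensitivity(matrix: Sequence[Sequence[int]], labels: List[str]) -> Dict[str, float]:
    """Sensitivity (recall) per class: diagonal cell divided by its row total."""
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])  # all samples whose actual class is `label`
        true_positive = matrix[i][i]
        result[label] = true_positive / row_total if row_total else 0.0
    return result
```

For a hypothetical two-class matrix `[[9, 1], [2, 8]]` with labels `["A", "B"]`, the function returns a sensitivity of 0.9 for class A and 0.8 for class B.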

Distress Tracking and Surface Evaluation
The correspondence between a point in the scene and its projection on a 2D image is determined with the use of a geometric model. The real dimensions of the distresses can be estimated by the inverse perspective mapping (IPM) method [35-37]. The IPM method eliminates the perspective effect in images by converting them to a bird's-eye view. This method corrects image distortion caused by tilt through a mathematical transformation deriving from the vanishing point, image plane, and slope. Multiple transformed images are stitched together to create a panoramic image. Thanks to the IPM, a top-down view of the damage to road pavements can be obtained (Figure 19) by means of a transformation matrix T, where the projection on the pavement surface of the generic point is denoted with ${}^{i}P = \{u, v, 1, 1\}$ and the point placed on the road pavement surface is denoted with ${}^{g}P = \{x_g, y_g, -h, 1\}$.
Figure 19. The IPM method used in this research.
A bird's-eye view allows us to project the coordinates of each distress from the input image onto the pavement surface and then determine the information of interest (e.g., the length or surface of each distress type). Therefore, the proposed technique can automatically detect asphalt pavement distress coordinates starting from video recording, since it implements a specific procedure for tracking pavement damage present in subsequent frames. The proposed tracking algorithm is divided into the phases illustrated in Figure 20. During detection, the noise was reduced by applying the linear Kalman filter. The use of this filter is necessary to estimate the coordinates of the points on the perimeter of any damage present on the road pavement. It is a recursive filter [38] that estimates the state of a dynamic system [39,40]. In the 'prediction step', the state at phase n + 1 is estimated from the state value $x_n$ at phase n, the state transition matrix $A_n$, and the input $u_n$ at phase n, and the error covariance [40,41] is propagated taking the white noise covariance $Q_n$ [41] into consideration. The Kalman gain $K_n$ is then obtained [41] from the error covariance, the measurement matrix C, and the measurement noise R. Finally, the error covariance is updated with the actual measurement:
$$P_n = (I - K_n H) P_n \tag{21}$$
where $K_n$ is the Kalman gain, H is the mapping matrix from the true state to the measurement, and the $P_n$ on the right-hand side is the predicted error covariance.
The combination of the IPM and tracking algorithm procedures allows us to determine the type and surface area of each damage. These data are essential for estimating numerous performance indices (e.g., PCI, RQI, RDI, and SRI [42-44]).
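The prediction, gain, and update steps of the linear Kalman filter can be illustrated with a scalar Python sketch that smooths one noisy coordinate of a tracked point. It is a simplified illustration under stated assumptions: the state is a single number, there is no control input, and the values of `a`, `h`, `q`, and `r` are hypothetical, since the state vector and the noise covariances used in the study are not given in the text.

```python
from typing import Iterable, List, Tuple


def kalman_1d(
    measurements: Iterable[float],
    x0: float,
    p0: float,
    a: float = 1.0,    # state transition (assumed: the coordinate is nearly constant)
    h: float = 1.0,    # mapping from the state to the measurement (assumed identity)
    q: float = 1e-3,   # process (white) noise covariance, hypothetical value
    r: float = 0.25,   # measurement noise covariance, hypothetical value
) -> List[Tuple[float, float]]:
    """Return the (state estimate, error covariance) pair after each measurement."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Prediction step: propagate the state and the error covariance.
        x_pred = a * x
        p_pred = a * p * a + q
        # Kalman gain: weight given to the new measurement.
        k = p_pred * h / (h * p_pred * h + r)
        # Update step: correct the prediction and shrink the covariance,
        # P = (1 - K H) P_pred, the scalar form of the covariance update.
        x = x_pred + k * (z - h * x_pred)
        p = (1.0 - k * h) * p_pred
        estimates.append((x, p))
    return estimates
```

With repeated measurements of the same coordinate, the estimate converges towards the measured value while the error covariance decreases, which is the behaviour exploited to stabilise the perimeter points of a distress across subsequent frames.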

The Case Study: Results and Discussions
The case study concerns several urban road sections in Palermo. We selected only roads crossed by tramway lines. The tramway transportation system of the city was opened on 30 December 2015 and comprises four lines for a total of 23.3 km. According to Bieberc's classifications [45,46], some sections of these tramway lines belong to class E (common corridor) and others to class B (exclusive protected corridor). Figure 21 depicts the tramway track details for both ballasted tracks and slab tracks. Figure 22 shows the construction phases of the tramway slab track in Palermo. See also Figures 23 and 24. In this research, the proposed technique was used for evaluating distress in lateral lanes adjacent to the tramway lines [47].
The inspection videos were obtained by the rear camera outside the test vehicle (Figure 9). As demonstrated before, the loss and the RMSE values obtained in the training phase of the neural network indicate that the model can detect flexible pavement distresses with high precision [48-50]. Some test outcomes are shown in Figure 25. The proposed method can accurately detect, classify, and measure different pavement cracks and other damage types in the recorded videos. The procedure described in the previous sections was applied to carry out an error analysis in the case study. Numerous images extracted from 10 video clips were analysed, and damages were hand-labelled.
A total of 459 labelled frames contain 1378 bounding boxes of damages. The ID50 class is not taken into consideration because it refers to manholes (cf. Table 4), which are not considered damage. A comparison was made between the detected and the actual distress. The outcomes summarised in Figure 26 show that the correct detection rate ranges from 91% to 98%.
Therefore, the empirical analyses demonstrate that the algorithm used is sufficiently precise in detecting and measuring pavement damages. This is true, although some errors are inevitable in the detection phase, especially due to the irregularities in the pavement, which generate oscillations in the vehicle and therefore in the camera [51-54].
To summarise, the practical importance of the proposed approach lies in the fact that the timely identification of damage helps the city administration or the road operator choose the appropriate technology for pavement detection and repair [55]. In addition, automated pavement monitoring makes it possible to choose in advance an environmentally friendly technology (in terms of life cycle assessment, LCA) for the maintenance phase, resulting in a significant social benefit.

Conclusions
The accurate detection and classification of road pavement distress requires the acquisition of three-dimensional depth images of the pavement, a process that needs expensive dedicated survey vehicles, sensors, and other devices. Therefore, alternative techniques should be considered to analyse road pavement carefully and at a low cost.
This research presents cost-effective equipment for evaluating the surface condition of asphalt pavements on urban roads. The potential benefits of this equipment were evaluated by means of experiments on several urban road sections in the city of Palermo (Italy). Road sections crossed by tramway lines were analysed because of the number, extent, and spread of their distress types. The procedure is based on examining videos of asphalt pavement taken by a test car equipped with a rear camera and then applying the YOLOv3 algorithm for the detection and measurement of different damage categories. A dataset with about 13,135 images and 30,989 bounding boxes was used. Damages were classified into seven different types. The outcomes of the analyses show that the pre-trained network can classify distress into several categories, i.e., longitudinal, lateral, and alligator cracks, rutting, bumps, potholes, etc. The loss and the RMSE indicate that the algorithm can detect damages with very good accuracy and speed. In addition, as shown by the confusion matrix calculated during the validation process, the pavement distress classification sensitivity reaches up to 98.7%.
The experiments showed that a simple video recording device, together with the implementation of a deep learning-based procedure, can successfully assess pavement quality, detect several damage types, and meet real-time requirements.
Even though the method needs to be validated with additional experiments in other road conditions, the sample data from 10 video clips analysed in this research show that the correct detection rate ranges from 91% to 98%.
However, there are still limitations that require future studies, for the following main reasons:
− Due to vehicle vibrations and visibility conditions, the algorithm is unable to identify some pavement distress.
− Use of a public dataset.
− In this research, YOLOv3 was applied even though it performs less well than more recent versions of the YOLO family (e.g., YOLOv5 and YOLOv8).
Despite this choice, it is possible to obtain excellent results in both the detection and classification of road surface damage using low-cost detection devices. Therefore, the application deep learning algorithms in pavement engineering even and cost-effective dedicated to application a generation of YOLO algorithms (i.e., and more expensive and higher-resolution cameras. Finally, it is worth highlighting that with the rapid development of autonomous vehicles, which will be equipped with numerous cameras and other sensors, the proposed technique will likely provide details about the state of flexible pavements and useful information for operators.

Figure 3. Image analysis method.

In the present article, we describe a deep learning-based technique for monitoring asphalt pavement health and reducing the overall time necessary for pavement evaluation. The detection of pavement damage is obtained using the YOLOv3 algorithm. The most recent editions of the YOLO family are YOLOv5 and YOLOv8. YOLOv8 was released in 2023. It uses a deep convolutional neural network (CNN) architecture like its predecessors but with some variations, comprising a new backbone architecture called CSPNet, a new neck architecture termed FPN+PAN, a new head architecture termed PANet, and a new training procedure. Because of its recent release, there are only a few YOLOv8 applications in the field of pavement engineering.

YOLOv3 has both advantages and disadvantages. One considerable benefit is its speed, which matches the detection speed of the rest of the YOLO family. Another advantage is its ability to detect small objects, which is better than in previous versions. Nevertheless, YOLOv3 has its own limitations. One is its difficulty in handling variations in scale, particularly when confronted with very small or very large objects. In short, YOLOv3 ensures fast speed and high detection accuracy for small objects, yet it may find it difficult to deal with scale variations and certain specialised detection tasks. These characteristics allow YOLOv3 to be applied successfully for object detection. Overall, its balance between speed and precision makes it a suitable choice for various applications. Consequently, the YOLOv3 algorithm was used for detecting pavement damage in this research.

The present article explores case studies of flexible pavements on urban roads crossed by tramway lines. The proposed technique requires a simple vehicle-mounted camera system. The experiments showed that a simple video recording device, together with the use of a deep learning-based approach, can successfully detect several types of pavement damage. The main results also showed that the proposed technique is both affordable and accurate. The article is structured as follows: Section 2 explains the deep learning algorithms applied here. Section 3 briefly describes the known datasets for road pavement damage detection and, more specifically, those used in this research, as well as the required survey equipment. Section 4 explains the neural network training and the main outcomes in terms of loss, precision, recall, and RMSE. Section 5 illustrates the case study, results, and discussion. The main achievements, challenges, contributions, and limitations are summarised in Section 6.

Figure 4. One-stage and two-stage pavement distress detection flows: (a) two-stage distress detection; (b) one-stage distress detection.

The YOLO ('you only look once') algorithm was created by Redmon et al. [19]. It can classify and locate objects in only one step. YOLOv2 is an enhanced edition of the original YOLO model. It adopts the concept of anchor priors used in single-shot multibox detectors (SSD). YOLOv2 also differs markedly in its lower number of layers, with only 19 convolution layers composed of 3 × 3 and 1 × 1 filters. YOLOv3 uses the new Darknet-53 architecture based on successive 3 × 3 and 1 × 1 filters and a residual block inspired by ResNet [20,21]. It showed considerable progress in detecting small objects in real time. YOLOv8 is the latest variant of the YOLO family, first released in 2023. It uses a deep convolutional neural network (CNN) architecture like its predecessors but with some changes, including a new backbone architecture called CSPNet, a new neck architecture called FPN+PAN, a new head architecture called PANet, and a new training procedure. Because it was launched recently, YOLOv8 has not yet been widely used in the field of pavement engineering. Therefore, this research has applied only the YOLOv3 algorithm to detect pavement damages. Figures 5 and 6 show the YOLOv3 architecture. For more details about this algorithm, the interested reader may consult [22,23].

Infrastructures 2024, 9, x FOR PEER REVIEW 8 of 21

$$b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h} \tag{10}$$
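Equation (10) can be illustrated with a few lines of Python. This is a minimal sketch, assuming the usual YOLOv3 reading of the symbols: (p_w, p_h) are the width and height of the anchor prior, (t_w, t_h) are the raw network outputs, and (b_w, b_h) are the predicted box width and height. The function name `decode_box_size` and the anchor values in the usage lines are hypothetical.

```python
import math


def decode_box_size(t_w, t_h, p_w, p_h):
    """Equation (10): scale the anchor prior by the exponential of the raw outputs."""
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_w, b_h


# Zero raw outputs leave the anchor prior unchanged.
same = decode_box_size(0.0, 0.0, 116.0, 90.0)
# A raw output of ln(2) doubles the corresponding anchor dimension.
wider = decode_box_size(math.log(2.0), 0.0, 116.0, 90.0)
```

The exponential keeps the predicted width and height positive whatever the sign of the raw outputs, which is why the size is predicted as a multiplicative correction to the anchor rather than directly.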

Figure 10. Some phases of camera calibration.

Figure 12. An example of distress annotation.

Figures 14-17 show the base learning rate, loss, RMSE, and precision-recall curves as a function of the number of iterations in the training process. The curves indicate that the model detects flexible pavement damage accurately once around 8000 iterations are reached. The precision-recall curves show that the average precision ranges from 0.5 to 0.8, depending on the distress type (Figure 17).
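As an illustration of how an average precision value can be read off a precision-recall curve, the sketch below computes the area under the curve with all-point interpolation. This is an assumption: the text does not state which interpolation scheme was used to obtain the values in Figure 17, and the function name `average_precision` and the sample points are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order, with one precision per recall.
    """
    # Sentinel points at recall 0 and recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Illustrative points only (not read from Figure 17).
ap = average_precision([0.5, 1.0], [1.0, 0.5])
```

For these two points the result is 0.75. The mean average precision is then the mean of the per-class values.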

Figure 13. (a) Sample of images used for different types of damage; (b) object size variance across classes.

Figure 15. Loss related to the number of iterations.

Figure 16. RMSE related to the number of iterations.

Figure 18. Confusion matrix (zero values omitted for clarity). The diagonal shows the true positive values for each class (i.e., those labelled and classified as that class).
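The per-class sensitivity quoted for the confusion matrix can be computed as the diagonal entry divided by the total for the true class. The sketch below assumes that rows hold the true (labelled) classes and columns the predicted classes, which the caption does not state. The function name `class_sensitivity` and the 2 × 2 matrix are hypothetical.

```python
def class_sensitivity(matrix):
    """Per-class sensitivity (recall): true positives / all samples of that class.

    Assumption: matrix[i][j] counts samples of true class i predicted as class j.
    """
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(row[i] / total if total else 0.0)
    return result


# Illustrative counts only (not taken from Figure 18).
sensitivities = class_sensitivity([[8, 2], [1, 9]])
```

Here the two classes reach sensitivities of 0.8 and 0.9. With the transposed convention (columns as true classes), the same function applied to the transposed matrix gives the equivalent result.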

Figure 19. The IPM method used in this research.
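For a flat road surface, inverse perspective mapping (IPM) amounts to applying a planar homography that maps image pixels to road-plane coordinates. The sketch below shows only that generic step. It is not the calibration used in this research: the function name `apply_homography` and the matrix values are hypothetical, and in practice the 3 × 3 matrix would come from the camera calibration.

```python
def apply_homography(h, u, v):
    """Map the pixel (u, v) to road-plane coordinates with a 3x3 homography h."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (on the horizon line)")
    # Divide by the homogeneous coordinate to get Cartesian coordinates.
    return x / w, y / w


# Hypothetical matrix: a pure scaling of 0.01 m per pixel in both directions.
scale_only = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
ground_point = apply_homography(scale_only, 200.0, 100.0)
```

Once the corners of a detected bounding box are mapped to the road plane in this way, lengths and areas can be measured in metric units rather than in pixels.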

Figure 21. Tramway track details: (a) ballasted track and (b) slab track.

Figure 22. Construction phases of the tramway track on the analysed urban roads: (a) casting of a lean concrete layer and curb construction; (b) laying of steel reinforcement bars; (c) laying of the flat framework; (d) casting of the concrete slab and paving.

Figure 23. Photos of the analysed road pavement (via Antonio Laudicina), class E tramway line.

Figure 24. Photos of the analysed road pavement (via Carlo Gulì), class B tramway line.

Figure 25. Examples of damage detection and surface estimation.

Figure 26. The correct detection rate for a sample of ten clips.

Table 3. Main properties of some public pavement distress datasets.

Table 4. Classes of pavement distress.