Automated Road Defect and Anomaly Detection for Traffic Safety: A Systematic Review

Recently, there has been a substantial increase in the development of sensor technology. As an enabling factor, computer vision (CV) combined with sensor technology has made progress in applications intended to mitigate high rates of fatalities and the costs of traffic-related injuries. Although past surveys and applications of CV have focused on subareas of road hazards, there is yet to be a comprehensive, evidence-based systematic review that investigates CV applications for Automated Road Defect and Anomaly Detection (ARDAD). To present ARDAD's state of the art, this systematic review determines the research gaps, challenges, and future implications from selected papers (N = 116) published between 2000 and 2023, relying primarily on Scopus and Litmaps services. The survey presents a selection of artefacts, including the most popular open-access datasets (D = 18) and research and technology trends with reported performance, which can help accelerate the application of rapidly advancing sensor technology in ARDAD and CV. The produced survey artefacts can assist the scientific community in further improving traffic conditions and safety.


Introduction
Traffic accidents caused by road surface defects or unwanted objects lead to deaths, injuries and billions of dollars in property damage [1][2][3][4]. According to Justo-Silva and Ferreira [4], over 1.25 million lives are lost, and 20 to 50 million people are injured annually in traffic accidents worldwide. Moreover, highway accidents are predicted to be the fifth-highest cause of mortality by 2030. A 2019 survey based on approximately 166 countries by Chen et al. [5] estimated that road injuries would cost the world economy USD 1.8 trillion from 2015 to 2030, equivalent to a 0.12% annual tax on the global gross domestic product. Mohammed et al. [6] found that road accidents are now one of the top three causes of predicted deaths, posing a global threat to lives and economies. Among the multiple causes of crashes reported by the American Association of State and Highway Transportation Officials (AASHTO) [7], roadway factors such as road defects and anomalies account for approximately 34% [4].
The scientific community's aim to help reduce road accidents by detecting surface defects and predicting anomalies has existed since the advent of high-speed roads. A positive shift in momentum started with the advancements of sensor technology and the application of computer vision (CV) combined with soft-computing approaches such as machine learning (ML) and deep learning (DL) for adaptive automated road defect and anomaly detection (ARDAD) systems. As a consumer-grade example, modern mobile phones are equipped with features such as inertial sensors, high-speed video, and other sensors such as light detection and ranging (LiDAR).
The first contribution of this systematic review is the discovery of an upward trend in surveillance automation since 2000, with a correlation between the scientific community's growing interest and technological advancement.
As a second contribution, our systematic review uniquely combines all ARDAD methods and focuses on traffic safety impacted by various on-road hazards (Figure 1). This approach distinguishes our review from others in the field and provides a comprehensive analysis of the current state-of-the-art ARDAD systems, making it a valuable resource for researchers and professionals working in the field of traffic safety.

Background
Road surfaces are constructed using different materials, which degrade over time due to wear, environmental effects, or external factors. Figure 2 provides a generally established unifying process of automated anomaly/defect detection. To ensure safety and maintain infrastructural integrity, various types of structural damage (Figure 3) must be regularly monitored and addressed to determine the underlying causes. Structural damage caused by poor construction techniques or external factors may take the form of potholes, cracks (due to thermal action), debonding, stripping, ravelling, bleeding, shrinkage of road layers, and swelling [20,21].
Potholes, for example, are random excavations caused by wear and tear on the affected section of the road. If not attended to in time, they can cause further damage by collecting water, which accelerates wear and tear [22]. According to Staniek [23], road surface cracks in sections of roads supported by pillars can lead to regions falling off, posing a significant risk to human life and vehicles. Debonding, caused by the loss of strength in the adhesive used in road construction, leads to structural degradation of roads [24]. The structural degradation identified as stripping is caused by the loss of bonds between solid aggregations of road construction material [25]. Stripping begins from the bottom layers of the road and progresses upward, causing significant damage to the road surface. Ravelling of the road surface occurs when stripping starts on the upper layers and goes downward [26]. Road surface bleeding is another form of structural degradation, which occurs when asphalt rises from the lower concrete layers to the surface layer of the road, leading to a shiny surface. The leading causes of road surface bleeding are hot weather, poor-quality asphalt, and low air void content. Timely structural damage detection on roads supports taking necessary measures to repair or rebuild the damaged structures [27]. Regular assessments help to uphold motorists' safety and save taxpayer money [14,28].
Scholars classify anomaly types into contextual, point, and collective anomalies [18]. Contextual anomalies are out-of-place objects such as fallen-off road cones [29] or animals on the road [30]. Point anomalies on the road refer to specific locations where unusual events or incidents occur, such as potholes or traffic accidents. Collective anomalies, on the other hand, refer to broader patterns or trends in road data that deviate significantly from the norm, such as a sudden increase in traffic volume or a rise in the number of vehicle breakdowns. Common anomalies include unsecured objects and debris that fly out of vehicles involved in accidents [31], small obstacles often overlooked such as speed bumps [32] or abnormalities in road terrain overlay, affecting self-driving cars [33,34]. Figure 1 illustrates a collage of on-road hazards from around the globe that ARDAD systems can help to mitigate.
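The difference between point and collective anomalies can be illustrated with a minimal sketch. It assumes a one-dimensional series of road measurements (for example, hourly vehicle counts); the function names, the z-score rule, the window length and the tolerance are illustrative assumptions rather than methods taken from the reviewed studies.

```python
from statistics import mean, stdev


def point_anomalies(values, z_threshold=3.0):
    """Return indices whose z-score against the whole series exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


def collective_anomalies(values, baseline, window=4, tolerance=0.5):
    """Return start indices of windows whose mean deviates from a known
    baseline by more than `tolerance`, expressed as a fraction of the baseline."""
    hits = []
    for start in range(len(values) - window + 1):
        window_mean = mean(values[start:start + window])
        if abs(window_mean - baseline) / baseline > tolerance:
            hits.append(start)
    return hits


# A single spike is a point anomaly.
spike = [10] * 20
spike[7] = 100
print(point_anomalies(spike))

# A sustained rise is missed by the point rule but caught by the window rule.
surge = [10, 10, 10, 10, 18, 19, 17, 18, 10, 10]
print(point_anomalies(surge), collective_anomalies(surge, baseline=10))
```

In this sketch no single value of the sustained rise is extreme enough to trip the point rule, whereas the windowed mean exposes it as a group, which is the sense in which collective anomalies are defined above.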
CV-based ARDAD systems mostly employ data-driven ML algorithms that are trained on captured data samples representing the normal and abnormal behaviour and characteristics of the surveillance scene. The process typically uses supervised, semi-supervised, or unsupervised learning [35]. In other words, ARDAD methods rely on visual observation that depends on the surveillance scene's behaviour and characteristics. Hence, the performance of ML algorithms also depends on the data supplied for training.
As the survey's third contribution, we summarise the most popular publicly available datasets.
Due to the dynamic nature of road surveillance, ARDAD systems require expert feedback in the form of expert labelling or categorising of data into finite sets, such as roadside anomalies and defects (Figure 1), which could also result in re-training the model with an updated dataset. Supervised, semi-supervised or unsupervised learning are typically used in training such ARDAD frameworks [35]. Figure 2 illustrates the standard methodology for ARDAD system training and operations.
Figure 2. Overview of automated anomaly/defect detection process (derived from [36]).
Over the last few decades, the emergence of DL has brought the End-to-End (E2E) learning approach to the forefront of anomaly and defect detection modelling. Traditional ML models often rely on domain knowledge or domain experts to design or improve data pre-processing and feature extraction algorithms. E2E learning, on the other hand, reduces this dependency on expert knowledge and simplifies the process of extracting features or analysing discriminative properties from input data. Instead, the focus is on the input, such as an image vector, and the intended classification result from the system output [37]. In E2E learning, the model learns to extract invariant road features, recognise anomalies, or extract different surface textures in defect recognition.
As the fourth contribution, the systematic review reports on the popular ML and DL approaches applied to ARDAD systems and their performance.

Motivation and Contribution
The motivation for this systematic review lies in the understanding that road defects and anomalies significantly impact traffic safety and the overall economy. In this systematic review, studies from 2000 to 2023 are selected to capture the evolution of ARDAD methods and technologies over the past two decades. The selected time frame covers crucial developments, including a mathematical morphological method at the turn of the millennium [38], automated anomaly detection a decade later [39], and sophisticated surveillance techniques employing UAV swarms by 2023 [40].
Identifying road defects and anomalies helps reduce drivers' risks while supporting road maintenance [12]. ARDAD systems can play a significant role in augmenting visual surveillance to safeguard public and private transportation on modern city roads [41], sub-urban and rural roads [42,43], animal hazard-prone hinterlands such as wilderness roads [30], and avalanche-prone mountainous roads [44]. This systematic review is the first in which the authors summarise the performance and accuracy of hazard detection systems used in road infrastructure surveillance achieved globally. The review proposes perspectives on existing technology, explores anomaly detection methods developed since 2000, and presents examples of anomalies and methods applied to detect and predict various static/dynamic anomalies and defects (Figure 3). The survey analysed various processes based on environmental representation, features, approaches, and ML models. The systematic review's contributions are listed as follows:
• Selection criteria and resulting review of globally relevant, peer-reviewed articles uniquely combining automated road defect and anomaly detection (ARDAD) research since 2000.
• Discovery of the upward and exponentially growing trend of ARDAD surveillance automation since 2000.
• Taxonomy of ML and DL approaches combined with CV, including data acquisition technology and algorithms.
• List of popular and current open access ARDAD datasets.
• Critical analysis of the current state-of-the-art ARDAD methods to highlight the shortcomings that could be addressed in future research, including increasing environmental awareness of connected/self-driving cars.
• Compliance list adopted from the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) (http://prisma-statement.org, accessed on 20 December 2022) and applied to the ARDAD research context.
The introduction section of this systematic review discusses the significance of ARDAD methods and their development over the past two decades. Emphasis is placed on the role of sensor technology, computer vision, and ML techniques in enhancing traffic safety. The growing trend in surveillance automation is highlighted as a premise for the upcoming sections focusing on the systematic review approach, dataset analysis, and a critical analysis of ARDAD methods.

Research Questions and Review Approach
According to the systematic literature review guidelines [45][46][47], screening on-road anomalies and defects addresses a problem that can be prevented by detection, leading to the genesis of screening or intervention-type research questions. Furthermore, since the problem's solution also depends on early problem detection, the research questions address the "preventive screening" problem. The research questions' scope should be balanced so as not to be too specific or too broad. A well-formulated question determines (a) the criteria used to select studies, (b) the development of the search strategy, and (c) the data to be extracted. The research questions answered by the systematic review are as follows:
• What are the best ML methods for improving classification performance and creating a robust detection and alert system?
• What implications does the up-to-date research have on motorists' safety and future applications to related contexts, such as improving the environmental awareness of connected/self-driving cars?
The review process draws on empirical evidence from previous experiments, data collection, and studies. Figure 4 illustrates (a) literature review types and (b) how a systematic review of similar studies uses specific methods to identify, select, appraise, and synthesise the results.

Data Gathering and Inclusion-Exclusion Criteria
The leading search engines used during data-gathering are Scopus and Google Scholar, linked by journal article search and subscription-based access from Auckland University of Technology's (AUT) library (Table 1). The Boolean search (1) for article selection includes default settings for analysing titles, keywords, and abstracts. The criteria (Table 1) are defined and aligned with the research focus. The articles were selected according to Equation (1). Once the refinement process was completed, a total of 195 articles were excluded; of the selected papers, 48 deal with structural damage detection, 47 deal with anomaly detection, and the remaining 21 are surveys. In Figure 5, the scatter plot shows the number of articles reviewed for the various types of anomalies by date of publication, from January 2000 to May 2023.
A comprehensive selection process was conducted to identify relevant papers for further review within the scope of this study. Out of the initial pool of 311 papers, a total of 116 papers were chosen, which included 21 surveys and reviews (Figure 5). The remaining six papers consisted of reports, citations to research tools, or other types of valuable evidence that supported the review process. The criteria for advancing papers to the subsequent review stage were established based on the predefined guidelines outlined in Table 1.
Analysing the scatterplot data, we can observe gaps of approximately 18 months between peaks and a notable increase in publications during 2020, followed by a decline in subsequent years. However, this pattern should be interpreted in light of the pandemic-induced lockdowns during 2020-2022, which resulted in reduced traffic, data collection, and occurrences of road damage. Further research is needed to gain deeper insights into the underlying factors driving these trends in ARDAD.

Datasets Reviewed
The research community can benefit from appropriate datasets when evaluating their road anomaly and defect detection models. Table 2 briefly describes prevalent open-access datasets for on-road anomalies and defect detection, which have been experimented on in the related literature. The datasets cover on-road hazards from simple potholes to more complex tunnels, concrete bridge defects, to avalanche debris flow affecting motorists' safety. The motivation for assembling this diverse dataset is to provide unrestricted access without login requirements or paywall barriers. This section reports a selection of single-point, one-click access routes to frequently downloaded open access datasets for the research community (Table 2). Our goal is to promote inclusivity and remove possible discrimination, ensuring that researchers from all backgrounds can contribute to and benefit from the advancements in the field of ARDAD. To verify unrestricted, all-inclusive access and to promote privacy, we have tested the dataset access to ensure that all data are freely accessible without needing a login or being restricted by paywalls.
In summary, Table 2 provides an exhaustive review of 18 selected open access datasets pertinent to anomaly and defect detection, including potholes, debris flow, animals on the road, tunnel defects, and concrete bridge defects. As added parameters, each dataset is characterised by its purpose, configuration, origin or citation, strengths, limitations, and usage statistics.
Apart from the datasets provided in Table 2, the systematic review inspected datasets used by the studies that are not open access. This led to the identification of datasets that are available upon request or need a paid subscription. For instance, the research on pavement crack detection [48] makes the CFD dataset, Crack500 dataset, and a customised dataset called CrackSC available on request. In another study [49], a wide variety of road obstacle datasets are available on request. The road anomaly detection study [50] provides multiple datasets; however, login access is needed to download them. The research on the AdaBoost algorithm for pavement distress detection [51] provides access to the dataset through the journal's website for readers with paper access. The research on thermal image analysis for defect detection [52] provided the dataset upon request.

Literature Review
The review summarises perspectives on existing detection technologies and presents examples of methods developed since 2000 for applications to detect and predict static/dynamic anomalies and defects. Unlike the subjective nature of topic-oriented narrative literature reviews, the systematic literature review approach represents an opportunity for repeatable article selection and synthesis of follow-up reviews. As no detection system can be applied globally, Figure 3 illustrates a review-based breakdown of on-road hazards to motorist safety.
Image processing-based ARDAD systems play a significant role in enhancing traffic safety through visual surveillance [53]. Images of road sections are taken and analysed periodically to detect structural variations and anomalies. In addition to image processing, CV combines artificial intelligence (AI) approaches to derive meaningful information from images and videos [54]. When merged with Global Positioning Systems (GPS), telescopes, binoculars, closed-circuit television (CCTV), vehicle-mounted video recorders and cameras, and low-cost mobile cameras, image processing-based visual surveillance can significantly increase the efficiency of ARDAD systems [18,40,55–57].
Maya et al. [58] proposed a delayed long short-term memory (dLSTM)-based technique that is trained in a normal state and predicts abnormalities depending on the m-score defined in Equation (2). Here, the m-score is the normalised anomaly score R(t) within the abnormal state, and if in a dataset, the T_2 anomaly occurs at time t_1, the resultant m-score value follows from Equation (2). Based on expert feedback, the anomaly is detected if the m-score is above the set threshold. The method is reported to be flexible when combined with other anomaly prediction models.
The U.S. Department of Transportation (USDOT) devised a convention to rank road surface distress [59], as shown in Equation (3). The pavement condition index (PCI) is generated using a weighted sum of surface condition rating (SCR) and roughness condition index (RCI).
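As a minimal sketch of what a weighted-sum index of this kind looks like in practice, the function below combines an SCR and an RCI value into a single PCI. The weights 0.6 and 0.4, the function name and the requirement that the weights sum to one are placeholder assumptions for illustration only; they are not the USDOT coefficients, which are not reproduced here.

```python
def pavement_condition_index(scr, rci, w_scr=0.6, w_rci=0.4):
    """Combine surface condition rating (SCR) and roughness condition
    index (RCI) as a weighted sum.

    The default weights are hypothetical placeholders, not USDOT values.
    """
    if abs(w_scr + w_rci - 1.0) > 1e-9:
        raise ValueError("this sketch assumes the two weights sum to 1")
    return w_scr * scr + w_rci * rci


# Example with made-up ratings on a 0-100 scale.
print(pavement_condition_index(80, 60))
```

With weights that sum to one, the index stays on the same scale as its inputs, which makes sections rated on SCR and RCI directly comparable.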
Structural damage on roads is caused by thermal action, external conditions and physical strain exerted by vehicles. Various anomalies and surface defects such as cracks and potholes are caused mainly by deformities such as debonding, stripping, ravelling, bleeding, shrinkage of road layers and swelling of road layers [60,61]. A wide range of datasets are produced to derive meaning from images of surfaces with such defects and identify the underlying conditions of the roads. According to Bhatt et al. [10], three distress categories are used to classify anomalies: cracking, visco-plastic deformations, and surface defects.
Thus, the current research mainly focuses on surface damage detection, anomaly detection, analysis and prediction using computer vision in association with traditional ML and DL technologies. The literature review of surveys between 2000 and 2023 shows that most of these can be classified into road surface defects or on-road anomalies (Table 3).
Table 3 (excerpt). Anomaly detection, analysis and prediction: [12,16–18,20,37,63–70].
Image processing based on traditional methods (statistical and classical ML) has been used to analyse road sections' images to detect defects [10]. Examples of applied methods in image processing include logistic and linear regression [60], naïve Bayes [71], support vector machine (SVM) [57], random forest (RF) [61] and more. Statistical and traditional ML-based image processing might be inefficient due to known difficulties in handling noise in previously unseen images, complex textures in different backgrounds or variations in lighting conditions of surfaces. These shortcomings motivated researchers to investigate new approaches. Other developments in anomaly and defect detection suggest the following three main approaches: feature extraction-based image processing using DL, ML and ensemble learning models [72,73]. Different works have also used 3D imaging and LiDAR-based anomaly and defect detection methods [19].

ML-Based ARDAD
Li et al. [74] developed the defects detection and localisation network (DDLNet), a vision-based method for detecting, classifying, and geolocating defects using region-growing, edge detection, and threshold segmentation techniques. The DDLNet achieved 80.7% detection and 86% localisation accuracy. Cha et al. [75] proposed the utilisation of traditional Canny and Sobel edge detection methods, achieving an accuracy of 98% in detecting block edges, edge cracks, and longitudinal and transverse cracks. For visco-plastic deformations, edge detection can efficiently detect pothole edges [76], depressions, stripping, and ravelling with a high accuracy of 99.11% [77]. In some cases, edge detection is achieved using Prewitt, Canny, and Sobel operators [78]. The detected edges differ from operator to operator, depending on each operator's characteristics. Chatterjee and Saeedfar [79] proposed an improved Canny edge detection method that incorporates genetic algorithms and enhances the blurred edges using the Mallat wavelet transform, with an average detection accuracy of 91%. Vigneshwar et al. [80] proposed the binary conversion of greyscale images for anomaly detection. They set a threshold, compared pixels for background or target area identification, and used threshold segmentation, edge detection, and K-Means clustering for crack and defect detection with an average accuracy of 80.60%, 90.19%, and 82.47%, respectively.
Table 4 presents a comprehensive overview of recent studies employing ML algorithms to detect road anomalies, showcasing a range of accuracy rates between 86.3% and 97.8%. While these studies significantly contribute to the advancement of traffic safety, autonomous driving, and urban planning, critical evaluation reveals limitations. These limitations include concerns about data accuracy, constrained feature applicability, and challenges in domain adaptation, which warrant further investigation and development.
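The edge detection and threshold segmentation steps discussed in this section can be sketched in a few lines. The example below is a generic pure-Python Sobel operator followed by a fixed threshold, assuming a small greyscale image stored as a list of rows; it is not the implementation used in any of the cited studies, and the function names and threshold value are illustrative.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for the interior pixels of a 2-D greyscale image.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


def threshold_mask(values, threshold):
    """Binary segmentation: 1 where the value exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in values]


# Synthetic patch: bright pavement (200) next to a dark region (50).
patch = [[200, 200, 200, 50, 50, 50] for _ in range(5)]
edges = threshold_mask(sobel_magnitude(patch), threshold=100)
for row in edges:
    print(row)
```

In practice a library routine would normally replace the explicit loops, and the threshold would be tuned or learned; the fixed value here only serves to show how a gradient map becomes a binary edge mask.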
The AdaBoost-based method proposed by Wang et al. [51] uses labelled data for detecting surface defects such as ravelling and bleeding. Its base learner is a decision tree with elements categorised as root, leaf, and decision nodes. The collected data are passed through the root node and classified at each layer of the decision tree until they cannot be classified further. The sample data are divided into subsets for precise and optimal classification results. Each subset of the training data is assigned a leaf node, which should also have an associated class. Fan et al. [84] proposed three different decision trees in the AdaBoost algorithm for detecting road surface defects. Among these, the C4.5 decision tree continually prunes leaf nodes and adopts the root node as the new leaf node. The CART decision tree's pruning process, unlike that of C4.5, uses a verification data set to prevent overfitting. The ID3 decision tree calculates the information gain over all the sample data and assigns the features with the maximum gain to nodes. The decision tree is then generated recursively from these features [85].
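The boosting principle behind these methods can be sketched as follows. For brevity, the example swaps the C4.5, CART and ID3 trees described above for depth-one decision trees (decision stumps), and it assumes labels encoded as -1 and +1; the function names and the toy data are illustrative and do not come from the cited studies.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Depth-one tree: predict `polarity` above the threshold, its negation otherwise."""
    return polarity if row[feature] > threshold else -polarity


def train_stump(X, y, weights):
    """Search all features, thresholds and polarities for the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for w, row, target in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, polarity) != target
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, rounds=5):
    """Train a weighted ensemble of stumps; labels must be -1 or +1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature, threshold, polarity = train_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [
            w * math.exp(-alpha * target * stump_predict(row, feature, threshold, polarity))
            for w, row, target in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump separates: the negative class sits in the middle.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X])
```

The toy data show why boosting helps: each stump alone misclassifies at least two samples, yet the weighted vote of three stumps can represent the middle band of the negative class.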
Feature extraction-based ML methods are considered advantageous for their simplicity, according to Avci et al. [13]. By performing feature extraction and classification, these techniques can be made more generic and effective in detecting structural damage. Hoang and Nguyen have developed various ML methods to detect different classes of static anomalies and structural damages within roads [86], including support vector machines (SVM), random forest (RF), and artificial neural networks (ANN), using both labelled and unlabelled datasets. The supervised approaches relying on labelled training data in road anomaly detection are naïve Bayesian, RF, SVM, ANN, and logistic regression. Artificial neural networks and RF facilitate the efficient detection of visco-plastic deformations and cracking defects. Fakhri and Saadatseresht [87] proposed a supervised random forest model to detect cracks whilst overcoming the challenges of uneven crack edges and cracks existing in complex topologies. Table 5 highlights the evolution of ARDAD methods, demonstrating a shift from traditional ML and statistical techniques to DL approaches. While both categories have contributed to automating defect detection and enhancing road safety, they exhibit limitations such as dependency on image quality and environmental conditions. In detecting on-road anomalies, unsupervised learning models hold potential advantages as they do not rely on labelled data for sample classification, unlike supervised learning models, which depend on subjective human input [93,94]. As a result, the output of unsupervised learning models is not predetermined, allowing computers to independently discern anomalies in the data through classification processes [95]. Ishtiak et al. [43] proposed a system for identifying and categorising various road conditions, including visco-plastic deformities and defects.
This approach uses a statistical analysis method and a scoring function considering several factors, such as road colour, material, and image quality. Despite the model's high accuracy, ranging from 77% to 89% across diverse road conditions, it has shown limitations in distinguishing shadows from road anomalies and analysing roads with water on the surface. Chatterjee et al. [79] proposed a machine-learning approach for crack detection, relying on feature extraction from image superpixels. The approach involves extracting 40 features, including variance, skewness, six Grey Level Co-occurrence Matrix (GLCM) features, and 32 Variance-of-Gabor (VoG) features. The study compared four classifiers, with gradient boosting (GB) being the most accurate at 92.77%, followed by random forest (RF), artificial neural network (ANN), and linear support vector machine (L-SVM). Naddaf-Sh et al. [96] proposed a novel model for detecting visco-plastic deformations and cracks, leveraging a multivariate statistical hypothesis and a minimum intensity path window for anomaly extraction. Despite a competitive F1 score of 56%, increased inference time during real-time prediction and transferred augmentation policies might hinder the model's performance. Mahadevan et al. [97] proposed a model that detects abnormalities in crowded scenes by considering temporal and spatial normalcy using a mixture of dynamic textures (MDT). The algorithms tested in the study show varying performance (25% to 42%) in terms of equal error rate and anomaly localisation, with MDT outperforming the others with a detection rate of 45%. Table 6 presents diverse, evolving methodologies in road defect detection, from simple image processing to sophisticated deep-learning models. These research efforts have led to accuracy rates ranging from 54% to over 99%, indicating a promising trend in the field.
Table 6 provides a comprehensive overview of road defect detection and classification research, offering a roadmap for further advancements towards safer and more efficient transportation systems.
Cha et al. [75] noted that DL, as a powerful approach for object detection, image segmentation, and classification, has been used to detect anomalies and defects such as cracks, surface defects, visco-plastic deformations, and traffic anomalies. As a case in point, their CNN-based approach achieved accuracies of 98.22% on 32K training images and 97.95% on 8K validation images. The proposed CNN method showed very robust performance compared to traditional edge detection methods. Opara et al. [71] proposed a DL approach involving binary and multi-class classifications to detect anomalies in RGB images (2400 × 2000 pixels) with an F1 value of approximately 60% at 18,000 iterations. The study utilised a loss function that included terms for localisation, confidence, and classification errors to detect objects more accurately and effectively. Non-maximum suppression was applied to select the appropriate bounding box from the many predictions. Multi-class classification, in turn, is suggested for analysing road sections with multiple anomalies. The authors also provided insights on performance trade-offs by adjusting hyperparameters and achieved state-of-the-art performance with an F1 score of up to 94.4% on three benchmark datasets [102].
The pixel segmentation method for pavement damage detection using a thermal-RGB fusion image-based model achieved a detection accuracy of up to 98.34% with a pre-trained EfficientNet B4 backbone architecture and an augmented dataset [52]. To detect visco-plastic deformation, surface defects and cracks, Minhas et al. [103] proposed an efficient pixel segmentation model (F1 score 0.89) with four convolutional layers, three layers for segmenting the sample image directly connected to the input, and two for maximum pooling. To achieve optimal results, pixel segmentation models are divided into encoder and decoder layers [104]. The encoder layer is used to map the image features, while the decoder layer establishes feature vectors of images during the segmentation process. The decoder layer also develops a probability distribution for every pixel identified within the images. However, the object detection approach, which identifies objects and bounds them with boxes within captured images, has limited usability. Table 7 summarises various studies, including wildlife-vehicle collision analysis, pothole detection, road surface monitoring, and anomaly detection for autonomous vehicles. While the studies propose different methods ranging from traditional statistical analysis to ML and edge AI-based approaches, each method has limitations, including limited data, reliance on manual labelling, lack of road roughness estimation, and potential false positives. Nonetheless, these studies demonstrate the potential for technology to improve road safety and maintenance. The selection of the appropriate neural network for a given problem depends on various factors, including the complexity of the intended solution, computing resources, and data availability [70]. Traditional ML methods can be advantageous when the dataset is small or limited, but their performance may plateau with more data.
In contrast, deep neural networks tend to perform better with a large amount of data, enabling the identification of subtle dependencies through more dense layers. Oliveira and Correia [98] have reported that less sophisticated traditional machine-learning methods can be effective in the case of small datasets, particularly in dynamic anomaly detection systems. However, the performance of deep neural networks can improve with more data and complex architectures [37].
Neural networks' complexity increases with the need to process large amounts of data. Shallow neural networks typically have fewer layers and may not use backpropagation algorithms. Given enough data and sufficient computing resources, deep neural networks usually perform better than traditional approaches. Nevertheless, according to Cui et al. [107], traditional machine-learning methods, such as support vector machines, usually perform better at anomaly detection and generally require fewer computing resources for data processing.
Due to their high efficiency in local filtering, noise detection, and transform-domain and non-local means filtering, convolutional neural networks (CNNs) have been increasingly used in anomaly detection and in denoising images of sections containing anomalies [108]. Akagic et al. [109] proposed a two-step CNN model for road anomaly detection. Different images are fed into 32 by 32 CNN layers during the first step to train the model. Greyscaling is performed, followed by thresholding to detect identifiable anomalies within the image. According to Chambon and Moliard [28], CNNs are trained using various data such as different target anomaly types, road width, weather and lighting condition patterns, condition of the road surface, and the height of elevated road supports such as pillars of natural elevators. In another study by Li et al. [74], a Deep Dual Localisation Network (DDLNet) is proposed for defects detection and geolocalisation in a unified model. The model combines a novel defects RPN and a NetVLAD module for detection and geolocalisation. The authors also propose a novel data augmentation method and hard negative mining strategy to improve detection accuracy and reduce the possibility of triggering false alarms.
The twice-threshold segmentation method demonstrates a higher accuracy of up to 98% in detecting cracks in runway images containing road markings, outperforming traditional threshold segmentation algorithms such as Otsu (40%) while maintaining adaptability for various applications [110]. Amhaz and Chambon [100] proposed the Minimum Path Selection (MPS) algorithm for crack detection, achieving a Dice Similarity Coefficient (DSC) of 0.77 on 2D pavement images. In this approach, the Dijkstra algorithm is used to estimate the minimal path between two points, which can be manually corrected if a false minimal path intersects with the crack. A post-processing method is then applied to estimate the crack's thickness and provide the complete crack pattern. However, further advancements in computation time and adaptability to 3D imaging systems are necessary for broader applications. Shankar and Wang [111] proposed a Fully Convolutional Neural Network (FCNN) model for anomaly detection, while Ishtiak and Ahmed [43] utilised a two-step image classification approach in their FCNN model. The first step involved feeding road surface images into the FCNN, with the model achieving 87% accuracy for all classes. In the second step, the model was trained with threshold images to establish cutoff levels for anomalies and structural damage detection.
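The minimal-path step mentioned above for crack detection can be sketched with Dijkstra's algorithm on a greyscale grid. This is a simplified illustration under stated assumptions, not the MPS implementation of Amhaz and Chambon [100]: pixel intensity is used directly as the step cost (so dark crack pixels are cheap to traverse), moves are 4-connected, and the two endpoints are supplied by hand. The function name and the toy image are hypothetical.

```python
import heapq


def minimal_path(intensity, start, goal):
    """Dijkstra shortest path on a 2D grid; cost of a path is the sum of pixel intensities."""
    height, width = len(intensity), len(intensity[0])
    dist = {start: intensity[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break  # the goal's distance is final once it is popped
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                nd = d + intensity[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Toy image: low values (1) mimic a dark crack on a bright (9) pavement background.
image = [
    [9, 9, 9, 9],
    [1, 1, 9, 9],
    [9, 1, 1, 1],
    [9, 9, 9, 9],
]
crack_path, crack_cost = minimal_path(image, start=(1, 0), goal=(2, 3))
```

On this toy image the recovered path follows the dark pixels between the two endpoints. A practical system would additionally need a way to choose endpoints automatically and to reject paths whose cost is too high to be a crack.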

Ensemble Learning for Improved Anomaly and Defect Detection
Doshi and Yilmaz [112] proposed ensemble learning to improve the efficiency of different ML approaches used in static anomaly and structural damage detection, describing three approaches. The ensemble model (EM) approach uses a variety of trained models for static anomaly prediction. The ensemble prediction (EP) approach utilises images generated through test time augmentation (TTA) and ensembles the anomaly predictions derived from these images. The hybrid approach uses both EM and EP to conduct anomaly predictions. In each case, ensemble learning aims to improve the accuracy of the resulting predictions. Hegde et al. [42] proposed a DL approach for road damage detection and classification using YOLO and ensemble learning, achieving an F1 score of up to 0.67, demonstrating the potential of these methods for smart city applications.
Alipour et al. [83] investigated the use of ensemble learning for crack detection and proposed a method that combines pre-trained models developed for specific types of materials. The proposed method leverages the knowledge stored in both material-specific models to make a single prediction for each future image regardless of the material. To achieve this, the softmax operator was utilised to extract the probability of each prediction. The softmax operator is shown in Equation (4), where S_j(x) represents the observation probability of class j, the index l in the denominator's exponent e^{x_l} runs over the class labels, and n_class is equal to two for the binary crack vs. non-crack problem.
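Written out in its conventional form, and consistent with the symbols defined above, the softmax operator reads as follows. Here x_j, the raw score that a model assigns to class j, is assumed notation.

```latex
S_j(x) = \frac{e^{x_j}}{\sum_{l=1}^{n_{\mathrm{class}}} e^{x_l}},
\qquad j = 1, \dots, n_{\mathrm{class}}
```

For the binary case (n_class = 2), this reduces to S_1(x) = 1 / (1 + e^{-(x_1 - x_2)}), so the crack probability depends only on the difference between the two class scores.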
An algorithm such as non-maximum suppression (NMS) derives a single output from the outputs of the individual models [42]. The algorithm works by filtering out overlapping or duplicate predictions from the pool of predictions. All the images captured from road surfaces are then passed through the models for state prediction by applying the NMS. In EMs, the one-stage detector models include the ultralytics-You Only Look Once (u-YOLO) model, so it is possible to combine various u-YOLO models. To train a u-YOLO model, different input parameters are tuned [113]. Different trained models are obtained by selecting different combinations of data for tuning. A favourable subset of these models is chosen for use, although the choice depends on the available training data. Ensemble learning significantly reduces the prediction variance, making the approach highly accurate. The hybrid approach applies the EP approach to each EM model. After every test image has been transformed through TTA, each EM model receives the augmented images as input. The models output bounding boxes once NMS is applied to derive a prediction. The corresponding structural damage or anomaly on the road section is determined from the bounding boxes.
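The filtering of overlapping predictions described above can be sketched as a greedy NMS over a pooled set of bounding boxes. This is a minimal illustration of the generic technique, not the code used in [42]. The box format (x1, y1, x2, y2), the IoU threshold of 0.5, and the sample boxes and scores are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical pooled predictions from several models (or several TTA views):
# the first two boxes describe the same defect, the third a separate one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In an ensemble setting, `boxes` and `scores` would be the concatenated outputs of all member models, so a defect found by several models ends up represented by its single most confident box.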
Based on the availability of computing resources and data volume, both ML and DL have their uses and potential for future technologies (Table 8).

Detection Based on 3D Imaging Methods
Traditional anomaly detection methods have predominantly relied on 2D imaging techniques, such as Bidimensional Empirical Mode Decomposition (BEMD), used for pavement crack detection [114]. However, with the development of range-based sensors and stereo cameras, 3D imaging methods have become more efficient. In addition, 3D stereo vision is particularly effective in estimating the depths of cracks and visco-plastic deformations, with a precision score of up to 90% [115]. Microsoft Kinect and laser-imaging techniques are used both in traditional methods and, as a newer research direction, in DL neural networks such as CrackNet, CrackNet II, and CrackNet V. Table 9 shows that these methods have demonstrated significant potential in object recognition, pose estimation, and autonomous navigation applications. However, the accuracy and reliability of 3D imaging methods heavily depend on factors such as sensor resolution, calibration accuracy, and environmental conditions. Nonetheless, the continual development of 3D imaging technologies presents promising opportunities for enhancing the capabilities of various applications in fields such as robotics, autonomous driving, and industrial automation.
Chen et al. [30] proposed a cost-effective approach for detecting deer crossing roads using 360° LiDAR sensors. The proposed algorithm can detect deer within a maximum radius of 37.74 m around the LiDAR sensor, which can trigger warning signs for drivers. While the method has limitations in detecting small animals and tracking individual deer, it shows promise for improving traffic safety and analysing wildlife behaviour. Zhang et al. [116] proposed a CNN model that utilises 3D imaging methods to combine different 3D view data of images into one compact shape descriptor. Such models extract 3D data from the images and pass it to an ML model to detect anomalies and structural damages [102]. The 3D data are then used to train classifiers. The spatial information of road surfaces, such as width, length and depth, is represented by the 3D data [121]. Medina et al. [122] proposed a 3D imaging method based on laser imaging that models road surfaces using dense networks of 3D points.
Frequency analysis, mostly based on the Fourier transform, is applied to distinguish between the different anomalies. Akarsu et al. [101] proposed an improved Fourier transformation model that takes into account non-uniform illumination on surfaces. The method differs from the methods mentioned above because it utilises probabilistic relaxation, and it is the only effective 3D imaging method to detect road surface defects such as bleeding and ravelling. Furthermore, it also detects likely occurrences of visco-plastic deformations and cracks. Figure 6 depicts the volume of literature reviewed based on detection origins divided by the ML methods' taxonomy. Deep learning (34%) is the most popular method, followed by traditional ML (26%) and ensemble learning (26%). In comparison, 3D image-based techniques (14%) are the least represented among the reviewed ARDAD systems. Considering the timeframe of the literature reviewed, recent advancements in ML techniques may impact the overall taxonomy distribution and the road defect detection landscape. The growing popularity of DL approaches is likely due to their ability to process large datasets and automatically extract relevant features. However, traditional ML and ensemble learning methods are still widely used across ARDAD systems. In discussing the benefits and drawbacks of each taxonomy, it is essential to acknowledge the gaps in the literature and encourage further research to explore underrepresented ML methods or road defect types.

Gaps, Challenges, and Limitations
The road surveillance research domain is highly dynamic; the road surface and supporting infrastructure defects do not appear in uniform shapes or sizes, nor do the anomalies follow a uniform pattern, which leads to multiple challenges. An example of a significant gap and future opportunity is that the current detection methods do not evaluate or provide implications on how the defect or anomaly can directly affect motorists' safety.
While the review provides insights into the state-of-the-art ARDAD methods, it has some limitations. First, the review primarily covers peer-reviewed articles written in English, which may exclude valuable information from other sources such as technical reports, some conference proceedings, commercial product documentation and patents. Second, the inclusion and exclusion criteria rely on ARDAD-associated terminology and concepts, which may not capture relevant studies that use different terminology or naming conventions. To address these limitations, future reviews could consider broadening the search criteria to include additional sources of information and exploring alternative terminologies or approaches.
Reportedly, supervised techniques usually perform better when labelled data are available [69] because using labelled data during training allows supervised learning methods to detect boundaries and classify normal or anomalous classes. However, sometimes the training data do not include all types of anomalies, which leads to supervised approaches overfitting and performing poorly on new anomaly data. Hence, the availability of labelled anomaly data (or rather lack of it) creates an opportunity for applications and advancements in semi-supervised and unsupervised ML techniques. To address this observation, within the scope of this survey, we have provided a list of frequently downloaded open access on-road anomalies and defect image datasets.
Developing a robust ARDAD system is challenging; when summing up the body of literature on the topic, the main challenges and opportunities are as follows:
• Despite setting the inclusion parameters for publishing dates between 2000 and 2023, the literature search yielded only 311 papers. Due to the focused selection criteria, the systematic review included only 116 papers (Table 1).
• Contrary to our expectations, the number of computer vision-based studies directly impacting motorist safety was low.
For research replication, we adapted the PRISMA (http://prisma-statement.org, accessed on 20 December 2022) checklist, which is common for systematic reviews in health science. The adapted PRISMA checklist extension is important for future systematic reviews of ARDAD (and similar CV contexts). The PRISMA checklist extension is provided in the Supplementary Materials.

Conclusions and Future Work
Motivated by the need to accelerate technological advancements that can improve traffic safety and reduce incidents, this systematic review analyses the literature on automated road defects and anomaly detection (ARDAD) systems from 2000 to 2023. As a result, the systematic review covers peer-reviewed articles (N = 116) associated with types of roadside anomalies and defects that are jointly intended to help prevent the loss of lives, injuries and infrastructure damage, ensuring on-road and structural integrity.
In the context of augmenting on-road surveillance for ease of maintenance, such as structural damage detection and hazard prevention via predictive monitoring, the review summarises the ARDAD methods, including the achieved performance using traditional ML, and DL, combined with sensor technology. Notably, it quantifies the achieved performance of these methods, providing insights into their effectiveness. Additionally, the review provides a taxonomy of ARDAD methods and descriptions, including a list of frequently downloaded open access on-road anomalies and defect image datasets (D = 18), facilitating future research and benchmarking.
Considering the current publication trends, the advancements in video technology, availability of sensors and computing resources in general, there is an exponential growth in ARDAD research publications from 2000 to the present day. As anomaly detection intersects with automatic road traffic surveillance, this survey can also be a valuable resource for interested researchers working on related contexts.
Due to the impact of the global pandemic and lockdowns from 2020 to 2022, there was less traffic and fewer opportunities for new data collection compared to previous years. The exponentially growing trend in the number of research publications from 2015 to 2020 could be explained by data collected before the global pandemic (Figure 5). In the authors' view, the growing trend surrounding ARDAD technologies and research is likely to reach its peak, aligning itself with the "Innovation Trigger" stage of "Gartner's technology adoption hype cycle framework" [123]. As such, future work on ARDAD technologies is likely to consider Gartner's framework for a better understanding of a current project position on the hype cycle, to project the adaptation and maturity levels (of ARDAD technologies), to identify practical aspects of technology transfer such as self-driving vehicles and to identify the possible impact on society.
Considering the state-of-the-art ARDAD methods, we conclude that the latest IoT, 5G and 6G communication technologies, swarm drones, satellite imagery, cloud computing and GPS have the potential for near-future research and further expansion of related research contexts. The benefits of ARDAD methods to humanity include the utilisation and advancements of AI, CV, and semi and self-learning techniques to support intelligent vehicles, urban planning, intelligent transportation systems, connected or self-driving vehicles, improved road surveillance, reduced road maintenance costs, and increased traffic safety.
To enhance future research in the field of ARDAD systems, there is a crucial need for more comprehensive performance analyses and meta-analyses that can evaluate the efficacy and efficiency of various ARDAD methods in real-world settings. While not a full meta-analysis, our systematic review provides a strong foundation, serving as a platform for future research. Extending it into a full meta-analysis would enable quantitative data synthesis, further advancing our understanding of ARDAD technologies and facilitating evidence-based decision-making. Additionally, quantifying the societal and stakeholder impacts resulting from the implementation of ARDAD systems would offer valuable insights for policymakers and industry professionals.
Overall, this systematic review is a significant milestone in ARDAD systems, bridging a crucial research gap with its comprehensive analysis of traffic hazards ranging from urban cities to the wild hinterlands. Our commitment to inclusivity is evident in examining often-overlooked road hazards such as avalanches or cattle on the road, showcasing our genuine belief in uncovering hidden knowledge from future data or previously unseen or untested datasets. This systematic review establishes a foundation for future research endeavours in ARDAD systems and highlights the potential of emerging technologies to drive advancements in traffic safety and road maintenance. Our research findings inspire optimism based on emerging technologies' potential to facilitate advancements aimed at improving safety and saving lives and making a positive impact on global society.