An Exploration of Recent Intelligent Image Analysis Techniques for Visual Pavement Surface Condition Assessment

Road pavement condition assessment is essential for maintenance, asset management, and budgeting for pavement infrastructure. Countries allocate a substantial annual budget to maintain and improve local, regional, and national highways. Pavement condition is assessed by measuring several pavement characteristics such as roughness, surface skid resistance, pavement strength, deflection, and visual surface distresses. Visual inspection identifies and quantifies surface distresses, and the condition is assessed using standard rating scales. This paper critically analyzes the research trends in the academic literature, professional practices, and current commercial solutions for surface condition ratings by civil authorities. We observe that various surface condition rating systems exist, and each uses its own defined subset of pavement characteristics to evaluate pavement conditions. Automated visual sensing systems using intelligent algorithms can help reduce the cost and time required for assessing the condition of pavement infrastructure, especially for local and regional road networks. However, environmental factors, pavement types, and image collection devices are significant in this domain and lead to challenging variations. Commercial solutions for automatic pavement assessment exist, with certain limitations. The topic is also a focus of academic research. More recently, academic research has pivoted toward deep learning, given that image data is now available in some form. However, research to automate pavement distress assessment often focuses on the regional pavement condition assessment standard that a country or state follows. We observe that the criteria a region adopts to make the evaluation depend on factors such as pavement construction type, type of road network in the area, flow and traffic, environmental conditions, and the region’s economic situation.
We summarized a list of publicly available datasets for distress detection and pavement condition assessment. We listed approaches focusing on crack segmentation and methods concentrating on distress detection and identification using object detection and classification. We segregated the recent academic literature in terms of the camera’s view and the dataset used, the year and country in which the work was published, the F1 score, and the architecture type. It is observed that the literature tends to focus more on distress identification (“presence/absence” detection) but less on distress quantification, which is essential for developing approaches for automated pavement rating.


Introduction
Two vital elements of road pavement (referred to as pavements in the rest of this paper) management are inventory management and periodic condition evaluation; both are used to set future priorities for pavement construction management and maintenance. In this paper, pavement refers to hard surfaces used for motor vehicles. A complete pavement management system consists of inventory data collection (i.e., width, length, shoulder, and pavement type) and pavement characteristic assessment (i.e., roughness (ride), surface condition (distresses), surface skid resistance, pavement strength, and deflection). The current pavement networks, including motorways across a country, have been developed and modernized over centuries. The construction, width, and length of a pavement depend on the traffic it will carry and the type of connection it will make. Pavements are classified into different categories; for example, in Ireland, they are classified as motorways, national primary, national secondary, regional roads, and local roads [1]. A common way to periodically evaluate surface condition, including distresses on a pavement network, is for the civil authority to conduct a visual surface condition assessment and a ride smoothness test. Surface condition is assessed through visual surveying and usually consists of three steps: (1) pavement condition data collection, (2) distress identification and quantification, and (3) assigning a pavement rating index to a stretch of pavement using a standard rating scale (e.g., pavement surface evaluation rating, PASER [2]) that is typically localized to a specific geographical region [3]. Figure 1 gives a complete picture of the three-step process: data collection is followed by distress occurrence and severity measurement, and then by pavement condition rating decisions.
Data collection, the first step of visual surface assessment, is usually carried out by specially adapted vehicles (or, more recently, with devices such as smartphones [4] or unmanned aerial vehicles) for visual surface surveying. The vehicle is fitted with a computer, a Global Positioning System (GPS) sensor, and an imaging sensor. In step 2, pavement distresses are identified and quantified using their shape, size, and texture. Due to environmental and geographical conditions and the actual pavement construction process, pavement distresses may vary in shape, size, and texture. Variations can also be caused by different image capture technologies and the placement of sensors in the specialized vehicles used to collect pavement data. In step 3, a rating is assigned to a stretch of pavement based on the distress identification and quantification from step 2. A rating is applied to an initial stretch after inspection and is then adjusted along the road if the pavement surface changes noticeably. The length of the stretch typically ranges from 50 m to 200 m, while its width ranges from 4 m to the entire width of the road. The rating is performed directly by civil authority staff or subcontracted to private companies. Civil authorities use this condition rating to estimate pavement service life and select treatment measures to improve the condition.
Maintenance and improvement of pavements are expensive. For example, Ireland's government spent 850 million euros in 2021 to improve and maintain local, regional, and national primary and secondary roads [5]. There are 5413 km of national highways (primary, secondary, and motorways), 13,124 km of regional roads, and 81,300 km of local roads in Ireland. The road network in Ireland totaled 99,830 km in 2018, meaning that 95% of it consists of regional and local roads [5]. Moreover, it takes most of the year to complete mechanical surveys on the national highways, which are only 5% of the network; a quicker method is therefore needed for the other 95%. Manual rating requires cognitive skills built through extensive training and experience. It is also impossible for a manual rater to traverse the whole road network across the country in a specific time. It is a challenging process and prone to errors. To make the process faster, more economical, and more reliable, researchers have investigated automated processes for pavement condition evaluation, usually based on computer vision, machine learning, and, more recently, deep learning [6][7][8][9]. In recent years, researchers have reviewed different data acquisition technologies, including 1D, 2D, and 3D sensors, to automate pavement condition assessment [10][11][12][13][14][15]. Commercial solutions for automatic pavement assessment exist, with certain limitations; the topic is also a focus of academic research. More recently, academic research has pivoted toward deep learning, given that image data is now available in some form. However, research to automate pavement distress assessment often focuses on the regional pavement condition assessment standards the country or state follows. This paper contributes a list of significant pavement condition rating indices (segregated based on granularity and measurement criteria) used in various parts of the world.
A comprehensive list of distresses for asphalt and concrete roads is presented and segregated into six main groups. Commercial solutions for data capture and assisted image analysis are reported along with their limitations. We then present a comprehensive list of publicly available datasets along with download links, segregated based on view type, type of distress, resolution, type of ground truth, number of images available, and country of origin. A review of recent (2018-2022) deep learning techniques for pavement distress detection, classification, segmentation, and direct pavement rating classification is presented. We segregated the literature in terms of the camera's view and the dataset used, the year and country in which the work was published, the F1 score, and the architecture type, which helps identify the latest trends. We observe that much of the literature focuses on automating step 2 (distress identification and quantification), while there is less emphasis on automating step 3 (automatically computing a pavement rating). We observe that the criteria a region adopts to make the evaluation depend on factors such as pavement construction type, type of road network in the area, flow and traffic, environmental conditions, and the region's economic situation.
This paper is organized as follows. Section 2 explains the types of pavement surfaces, the types of distresses, and the pavement rating indicators used around the world, including their advantages and limitations. Section 3 reviews data collection techniques for visual pavement inspection and commercial practices. Section 4 generalizes an automated rating system, lists publicly available datasets, and reviews classical and deep machine learning approaches. Then, we discuss the limitations of an AI-based automated pavement rating system. Finally, we conclude in Section 5.

Pavement Surface Types and Distress Assessment Indicators
This section briefly explains various pavement types, visual pavement distresses, and pavement assessment indicators.

Pavement Surface and Distress
Pavement or road surfaces can be categorized into four general classes, i.e., asphalt, concrete, gravel, and brick and block [16]. Asphalt, also known as flexible pavement, is widely used to construct national, regional, or local roads across the road network and has different sub-categories depending on its construction. Over 90% of the total European road network has an asphalt surface. Concrete surfaces are usually used in urban environments and can be subdivided into jointed cement concrete and continuously reinforced concrete surfaces [17]. Concrete pavements are expensive and time-consuming to construct, but they are typically stronger and more durable than asphalt roadways. They are more common in the USA; for example, approximately 60 percent of the interstate system in the USA is concrete. Pavement condition assessment considers several pavement characteristics, i.e., roughness, surface condition (distress detection), surface skid resistance, and pavement strength. Surface condition plays a significant role in pavement assessment, which requires pavement distress detection and quantification. Pavement surface distresses that occur in different geographical regions can be divided into six groups, i.e., cracks, surface openings, surface deformation, surface defects, joint deficiencies, and miscellaneous distress [17,18] (see Table 1). Most of these distresses can be detected through visual inspection (standard practice) of pavement surfaces, and their severity and quantity can be recorded using manual measurement tools [17]. Visual distresses appear on the surface due to wear and tear, which may indicate a fault in the construction. They may appear differently in rural and urban regions, depending on the surface type, the severity (low, medium, high) of the underlying problem, and other environmental conditions.

Pavement Assessment Indicators
Measuring different pavement characteristics is essential for tracking long-term pavement performance, and condition rating systems incorporate all or a subset of these characteristics to conduct pavement assessments. These condition rating systems vary from country to country (or from state to state in the USA), considering local variations, the characteristics of the pavements, and economic conditions. Pavement characteristics that are generally measured separately include pavement roughness, a vital characteristic measured on a rating index known as the International Roughness Index (IRI) [19]. It is estimated from measurements taken in a moving vehicle [20,21]. Another essential characteristic is transverse deflection, also known as rut depth, measured manually or using sensors that generate transverse pavement profiles [9]. Visual pavement condition assessment requires distress detection and quantification to measure pavement conditions and is more reliable than other methods. Engineers and professionals have proposed several standards for visual surface assessment, such as Pavement Surface Evaluation Rating (PASER) [2], Pavement Condition Index (PCI) [19,22], Pavement Surface Condition Index (PSCI) [23], and the Road Condition Indicator (RCI) [24]. Table 2 lists different pavement condition ratings used around the world. The standard ratings of various regions differ in scale granularity, the formula used to estimate a value on the rating scale, and the data acquisition procedure.
Table 2. A summary of different pavement condition rating systems used by regional road transportation departments or proposed by academics.
The earliest work in creating a standardized condition assessment scale dates from the 1960s in the United States [25]. The scale used two pavement characteristics, pavement roughness and visual surface distress identification, to determine the Present Serviceability Index (PSI), which ranges from zero (very poor) to five (very good condition). Roughness was rated by 3-5 individual raters trained to qualitatively estimate pavement roughness by driving a vehicle on the pavement. This was followed by visual inspection for cracks, patches, and potholes. These two were then combined mathematically to calculate the PSI score (0-5) [25].

Type of Indicators
Over the years, data acquisition techniques have evolved, and different pavement condition assessment ratings have been proposed that mainly focus on assessing the different types of pavement characteristics, their quantity, and their effect on the overall condition of the pavement. PASER is a direct rating on a scale of 10 to 1 (9-10 is excellent condition, while 1-2 is extremely poor). On the other hand, the ASTM standard for pavement is PCI, a rating on a scale of 100 to 0 (85-100 is good condition, while 0-10 is completely deteriorated). It is computed mathematically from distress occurrence and severity level. The Irish PSCI [18,23,26] rating is on a scale from 1 to 10, similar to PASER, where 1 is the lowest (surface completely worn out or failed) and 10 is the highest (no distress, new pavement). It covers flexible urban pavements, urban concrete pavements, and flexible rural pavements separately. PSCI ratings are given to continuous stretches of pavement with similar conditions, with 200 m being the minimum length for a stretch to have its own distinct rating [26]. In the United States, the Federal Land Transportation program recommends visual distress detection based on PASER for direct pavement condition evaluation [27]. Some transportation departments (or road authorities) that use scales similar to PCI use a subset of the visual distresses and the roughness index to calculate the rating. For example, the New Zealand Road Assessment and Maintenance Management System (RAMM) assigns a Condition Index (CI) from 0 to 100 (0 is excellent, 100 is failed); it includes a visual inspection of not only the pavement but also the surface water channels along the pavement [28]. China uses the Chinese Pavement Condition Index (CPCI), a scale similar to PCI, which considers cracking, raveling, potholes, rutting, and roughness. Japan used the Maintenance Control Index (MCI) until 2005, a function of cracking, rutting, and roughness, on a scale of 10 to 0 [29]. Since 2005, the Ministry of Transportation of Japan has used RRI, which is a function of cracking ratio, rutting depth, and the International Roughness Index [29]. A similar index is used in Tajikistan under the Japan International Cooperation Agency [30]. The RCI is a rating from 1 to 4 (with 1 meaning no physical deterioration and 4 meaning severe deterioration), adopted in England, Wales, Scotland, and Northern Ireland, which fuses visual condition and gauging parameters of pavement condition [24]. In Germany, the RMA (Road Monitoring and Assessment) protocol rates the pavement into four categories based on visual distresses [31]. Some states in the USA use four classes, i.e., Good, Fair, Poor, and Very Poor, as a condition scale based on the original PSI rating. In some countries, such as India and Brazil, a visible pavement distress condition rating on a scale of 0 to 3 is used [32]. Ratings are based on cracking, rutting, raveling, patching, and potholes, while roughness is not considered [3]. Pavement condition surveys of national and local roads are commonly conducted annually, every two years, or every five years in different regions across the world (for example, in Ireland, they are conducted every two years, while in Florida, state highway surveys are completed annually [33]). Therefore, these survey methods should be fast, reliable, and economical.
In summary, different regions perform pavement condition rating in different ways; some combine roughness and visual condition to assign a rating from a standard scale (e.g., China, Japan, and some states in the USA), while others rate only a subset of visual distresses (e.g., UK, Ireland, Brazil, Germany, New Zealand, India, and some states in the USA). Some of these indices are very granular, such as PCI (0 to 100) in some parts of the USA, in contrast to the 0 to 3 scale used in Brazil and India. The choice of scale has evolved with economic prosperity and the maturity of the road network.
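Because the scales in this section differ in both range and direction (on some a high value is good, on others it is bad), comparing ratings across regions first requires putting them on a common footing. The following is a minimal sketch of one way to do that, assuming a simple linear mapping between the endpoints quoted in the text; the class name `RatingScale`, the `SCALES` table, and the linear mapping itself are illustrative assumptions and are not part of any of the cited standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingScale:
    """A condition scale described by the scores that mean best and worst."""
    name: str
    best: float
    worst: float

    def normalize(self, score: float) -> float:
        """Map a score linearly onto [0, 1], where 1.0 is the best condition.

        The linear mapping is an assumption made for illustration only.
        """
        low, high = sorted((self.best, self.worst))
        if not low <= score <= high:
            raise ValueError(f"{score} is outside the {self.name} scale")
        return abs(score - self.worst) / abs(self.best - self.worst)


# Endpoints as quoted in this section; direction differs between scales.
SCALES = {
    "PSI": RatingScale("PSI", best=5, worst=0),
    "PASER": RatingScale("PASER", best=10, worst=1),
    "PCI": RatingScale("PCI", best=100, worst=0),
    "PSCI": RatingScale("PSCI", best=10, worst=1),
    "RAMM_CI": RatingScale("RAMM_CI", best=0, worst=100),
    "RCI": RatingScale("RCI", best=1, worst=4),
}

if __name__ == "__main__":
    for name, score in [("PCI", 85), ("RAMM_CI", 15), ("RCI", 1)]:
        print(name, score, "->", round(SCALES[name].normalize(score), 3))
```

Under this sketch a PCI of 85 and a RAMM CI of 15 both map to 0.85, which shows only that the two scores sit at the same relative position on their scales; it does not imply that the underlying distress measurements are equivalent.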

Data Acquisition Process and Commercial Practices
This section describes how the data is acquired for pavement distress assessment and current commercial practices.

Data Acquisition Process
Different sensing technologies, sensing positions, and vehicles have been used to capture data to assess pavement conditions. The choice of technology depends on economic factors, availability of resources, and the pavement characteristics to be measured [4,34]; the sensor's position depends on the sensing technology used to acquire data for pavement condition assessment [18,26]. Figure 2 lists three types of sensing technologies available for structural monitoring and distress detection that can be integrated into vehicles with a GPS. 1D or point sensors, which measure the pavement surface's vibration, deflection, displacement, stress, temperature, or humidity, are usually used for structural condition assessment, which is an indirect method of pavement surface condition assessment. Data from 2D or 3D sensors is generally used for visual distress identification and direct pavement condition assessment. 3D sensors, including laser imaging, stereo pairs, and ground penetrating radar, usually capture data from the top view of the pavement. In contrast, 2D sensors, including RGB (color) cameras, are used mainly in the frontal wide-view camera configuration.
Recently, research in [35][36][37][38][39][40][41][42] has shown promise in using aerial vehicles to help detect visual distress, such as potholes, cracks, and aging on asphalt pavement. Data collection using aerial imagery poses other difficulties, such as occlusion due to ongoing traffic, permission to fly in urban areas, and coarser ground sampling distances. However, it does have limited use in pavement condition assessment, especially on airport runways [41]. A top view of the road, captured using a laser or color camera mounted at the back of the vehicle, as used by [43][44][45][46][47], focuses on detecting cracks and potholes. A wide-angle view of the road, captured using a camera mounted on the dashboard or on top of the car, as used by [32,48,49], is used for detecting types of cracks, potholes, types of surfaces, and surface ratings.
Hand-held mobile cameras, as used by [47,50,51,52,53], have also been used extensively for road surface distress detection. The top-view camera setup provides a finer ground sampling distance than the wide-view setup, while the wide-view setup is much quicker as it covers more area per image. Thus, the literature review highlights that different camera capturing techniques have been used for visual distress detection: frontal wide-view, top-view, handheld smartphone, and aerial view.
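The trade-off between resolution and coverage can be made concrete with the ground sampling distance (GSD), i.e., the ground length represented by one pixel. The sketch below assumes a downward-looking camera with a uniform scale across the image, which is a simplification: in a frontal wide-view image the GSD grows with distance from the camera. The function names and the example swath and sensor widths are illustrative assumptions.

```python
def ground_sampling_distance_mm(ground_width_m: float, image_width_px: int) -> float:
    """Ground length covered by one pixel, in millimetres.

    Assumes a top-view image with uniform scale across its width.
    """
    if ground_width_m <= 0 or image_width_px <= 0:
        raise ValueError("ground width and image width must be positive")
    return ground_width_m * 1000.0 / image_width_px


def pixels_across_feature(feature_width_mm: float, gsd_mm: float) -> float:
    """How many pixels a feature of the given width spans at a given GSD."""
    if gsd_mm <= 0:
        raise ValueError("GSD must be positive")
    return feature_width_mm / gsd_mm


if __name__ == "__main__":
    # Hypothetical example: a 4 m swath imaged at two different widths.
    for width_px in (4000, 1920):
        gsd = ground_sampling_distance_mm(4.0, width_px)
        print(f"{width_px} px across 4 m: {gsd:.2f} mm/px, "
              f"a 3 mm crack spans {pixels_across_feature(3.0, gsd):.2f} px")
```

A smaller GSD means finer detail, so a narrow crack spans more pixels and is easier to detect; widening the covered area at a fixed sensor width makes the GSD coarser.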

Current Commercial Practices
Many commercial systems are available for image data collection for pavement condition assessment. These systems are reconfigurable and can be customized to carry different data sensors and inbuilt data analysis software for manual or semi-automated rating. The inbuilt software uses automated image analysis techniques to detect and quantify visual cracks for a pavement rating system. This section discusses currently available reconfigurable commercial systems used in different regions for data collection and assessment. A commercial vehicle usually consists of a GPS/GNSS module, transverse profile logger for rutting, laser profilometer for roughness estimation, high-resolution odometer, laser cracking measurement system, video logging modules for frontal wide-view capture, bump integrator, and an onboard computer for recording data (see Figure 3).
PaveVision3D [12] is a system comprising a data vehicle and an automated surface imaging system capable of conducting a complete lane-width distress-detection survey at 1-mm resolution at speeds up to 100 km/h. It uses a top-view approach with laser scanners and an intensity camera looking down on the pavement. It has dedicated software for crack identification and optional software and hardware for laser rut measurement and laser roughness measurement. Pavemetrics [33] provides a similar solution called the Laser Crack Measurement System (LCMS), which uses 3D laser scanners fitted on a vehicle. The LCMS software can geo-tag, detect, measure, and quantify cracks, potholes, bleeding, shoving, raveling, and roughness. It can capture one lane of the pavement at 1-mm resolution at a speed of 100 km/h. The automated IRI and distress detection reports produced by LCMS comply with ASTM and American Association of State Highway and Transportation Officials (AASHTO) standards. These specialized vehicles are costly, and the distress detection software is calibrated for national highway pavement conditions in the USA or Canada.
In England, Wales, Scotland, and Northern Ireland, the pavement maintenance authorities use TRACS (Traffic-speed Condition Surveys) and SCANNER (Surface Condition Assessment for the National Network of Roads), which consist mainly of laser scanners mounted on the front and back of a van, giving a top view of the pavement surface. ROMDAS [54] is another customizable data-capturing solution that can provide either a top view using laser scanners for crack measurement or a frontal view using a color camera for detecting other distresses. STIER [55] is a customizable vehicle with a top-down stereo vision monochrome camera and a frontal-view camera for data capture. It uses its own software to detect distresses defined by the German FGSV regulation, i.e., cracks, potholes, inlaid patches, applied patches, open joints, and bleeding. This information is then used to rate roads into four categories [31].
PMS video survey van [56] is equipped with distance-measuring sensors and a GPS sensor attached to the onboard computer to provide accurate distance measurements. A frontal wide-view video camera mounted on the dashboard captures the pavement surface and delivers a high-quality compressed video stream, using a state-of-the-art compression algorithm to retain high definition (1920 × 1080) at minimum storage space. The real-time software integrated with the onboard computer lets the expert rate the pavement condition on the go or record the video for offline manual PSCI rating. The ground sampling distance is coarser than that of cameras providing top-view imaging. However, the system can cover the whole pavement in one direction on multi-lane pavements, with imaging every 5 m.
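To give a sense of the data volume implied by imaging every 5 m, the following sketch estimates the number of frames for a given road length and the frame rate needed at a given driving speed. The function names are illustrative, and the 90 km/h survey speed is a hypothetical value chosen for the example, since the source does not state one for this van; the 94,424 km figure is the sum of the regional (13,124 km) and local (81,300 km) road lengths quoted in the Introduction.

```python
def images_per_survey(length_km: float, spacing_m: float = 5.0) -> int:
    """Number of frames needed to cover a road length at a fixed spacing."""
    if length_km < 0 or spacing_m <= 0:
        raise ValueError("length must be non-negative and spacing positive")
    return round(length_km * 1000.0 / spacing_m)


def required_frame_rate_hz(speed_kmh: float, spacing_m: float = 5.0) -> float:
    """Frames per second needed to capture one image every spacing_m metres."""
    if speed_kmh < 0 or spacing_m <= 0:
        raise ValueError("speed must be non-negative and spacing positive")
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s / spacing_m


if __name__ == "__main__":
    # One pass, one direction, over Ireland's regional and local roads.
    print(images_per_survey(13_124 + 81_300))
    # Hypothetical survey speed of 90 km/h.
    print(required_frame_rate_hz(90.0))
```

At a 5 m spacing this is 200 frames per kilometre, so a single one-directional pass over the regional and local network would produce close to 19 million frames, which illustrates why manual rating of the recorded imagery does not scale.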
The speed of data collection with PaveVision3D, Pavemetrics, and Ricoh [57] (with a top-view 3D laser camera) is comparable to that of less expensive dedicated vehicles [56] (with a high-resolution frontal wide-view camera). However, the wide-view camera systems can cover more lanes than the top-view systems due to frontal coverage. Top-view vehicles generate more data for records and further computation than vehicles that use only the frontal view for pavement rating. The methods using 3D laser sensors (PaveVision3D, Pavemetrics, or Ricoh) produce a finer ground sampling distance than frontal wide-view images. Still, the confidence in the final output of any of these dedicated vehicles is much higher than when using a GoPro camera or a smartphone, which do not have a customized processing unit to fuse readings from different sensors such as GPS and distance-measuring sensors.
Therefore, a vehicle with a wide-view camera mounted at the dashboard and no external sensors (laser scanners or profilers) is more economical for an extensive network of local and regional roads with lower maintenance requirements. It requires less storage to record pavement images because it compromises on spatial resolution; however, the images still contain enough distress information to manually rate a pavement condition on a standard scale. On the other hand, vehicles with a top-view camera and external sensors are recommended for national highways and motorways [2]. They provide a higher spatial resolution, which is better for distinguishing different types of cracks and patches. Such vehicles have higher maintenance costs and would not be cost-effective when driven on the regional or local road network.
The commercial solutions discussed are usually limited to automated data capture and semi-automated analysis for distress identification and quantification, followed by an assisted or manual pavement condition rating assessment for a stretch of pavement. Some companies in the USA and Japan do provide automated solutions for pavement condition ratings. RoadBotics [58], working locally on USA roads, uses a limited version of PASER [2], i.e., it rates pavement from 1 to 5, with 5 being the lowest rating. An automated rating system from Ricoh [57] estimates the amount and location of cracks on a 50 cm × 50 cm patch and has adopted its own rating system for Japanese roads based on PCI. The automated solutions for frontal wide-view and top-view imaging are still evolving toward robustness and generalization and require calibration and transfer learning with local data.
In summary, there is no off-the-shelf solution for automated pavement condition rating. Most existing commercial solutions provide automatic data and image capture, while their ability to detect and quantify distress from images is limited to a few distresses. The limited automated solutions for intelligent distress detection (identification and quantification) from imagery require recalibration to capture regional variations in the shape, size, or texture of pavement distresses due to environmental conditions. These automated solutions also do not support adaptation to the different pavement condition rating standards used by different regional and local authorities. The choice of imaging technology for visual inspection depends on the type of distress, environmental conditions, and economic factors, and on how adequate the technology is for identifying those distresses.

Literature Review on Automated Visual Pavement Condition Rating Systems
Automated visual pavement surface condition rating can be broken down into several processes: a pavement surface classification process, a distress detection and quantification process, prediction of a rating score computed from the type of distress detected and its quantification based on a standard rating scheme, and prediction of the rating for a given stretch of pavement based on a majority voting scheme. The manual PCI system works similarly, i.e., it identifies all the individual distresses and their quantities, calculates 'deduct' values based on each distress type, severity, and amount, and then generates an overall rating by subtracting the sum of the weighted deduct values from a perfect score of 100.
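The last two steps of this pipeline can be illustrated with a minimal sketch. It assumes a simplified deduct scheme in which each distress contributes a weight times its percentage area; the weights, the band thresholds, and the function names are hypothetical and are not taken from the PCI standard, which derives deduct values from published curves.

```python
from collections import Counter

# Hypothetical per-distress weights; real PCI deduct values come from
# standard deduct curves indexed by distress type, severity, and density.
ILLUSTRATIVE_WEIGHTS = {"crack": 0.5, "pothole": 2.0, "patch": 0.3}


def image_rating(distresses, weights=ILLUSTRATIVE_WEIGHTS, perfect=100.0):
    """Subtract the weighted deduct of each detected distress from a perfect score.

    `distresses` is a list of (distress_type, percent_area) tuples.
    """
    deduct = sum(weights[kind] * area for kind, area in distresses)
    return max(0.0, perfect - deduct)


def to_band(score):
    """Map a 0-100 score to a coarse band (thresholds are illustrative)."""
    if score >= 70:
        return "good"
    if score >= 40:
        return "fair"
    return "poor"


def stretch_rating(image_scores):
    """Majority vote over the per-image bands of one pavement stretch."""
    bands = [to_band(score) for score in image_scores]
    return Counter(bands).most_common(1)[0][0]
```

For example, an image with 10% cracking and 5% potholes scores 100 − (0.5 × 10 + 2.0 × 5) = 85 under these assumed weights, and a stretch whose images fall mostly in the "good" band is rated "good".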
To generalize the above statement, let $D_0$ to $D_n$ be the distinct types of distress, $A_m$ be the area of the $m$th instance of the $n$th distress, and $w_n$ be the effect or weight of the $n$th distinct distress on the rating score; then we can define the rating score of a pavement condition in an image using Equation (1).
where $D_n$ is the $n$th type of distress, $A_m$ is the area of the $m$th instance of $D_n$, and $w_n$ is the weight of the $n$th distress in the overall score.

For decades, extracting useful information from images has been a task of computer vision-based systems. Early researchers used image processing techniques (such as gradient or change-in-intensity detection, color or intensity thresholding, and morphological processing) to extract useful information directly from the pixels [59]. We first present a few prominent image analysis techniques and their limitations. Then we present a brief history of the evolution of machine learning techniques, benchmarking, and state-of-the-art deep learning models for pavement condition assessment.

Evolution of Machine Learning in Computer Vision
With the development of machine learning algorithms such as K-means clustering [60], support vector machines (SVM) [61], artificial neural networks (ANN) [62], and many others [63], researchers started using hand-crafted features such as SIFT (scale-invariant feature transform) [64], ORB (Oriented FAST and Rotated BRIEF) [65], or AKAZE [66] to uniquely describe an image, object, or region of interest. Algorithms use these features to learn to classify images, detect objects, or segment areas of interest. Hand-crafted features and classical machine learning provide robustness across scale, lighting, rotation, and other environmental conditions. Advancements in machine learning, including the development of dense neural networks [67], convolutional neural networks (CNN) [68], and more recently, Transformers [69], have provided solutions for computer vision tasks that are more robust to changes in the input data; these are termed 'deep learning' computer vision or image analysis techniques. With deep learning algorithms, features are learned automatically for a particular computer vision problem rather than being hand-crafted.

Automated Distress Detection and Identification
A review of the literature shows that, over the last decade, researchers have investigated pavement distress detection using different imaging technologies, computing suitable features, and learning data models to detect, classify, or segment distresses. In [7], the authors listed technologies to enable researchers to choose the imaging technique for pavement distress detection. They mentioned the state-of-the-art methods using image processing techniques for crack detection and pothole detection while highlighting the problems that need to be investigated, including pavement texture detection, temperature segregation detection, rutting detection, and joint faulting detection. The distress identification literature can be divided into image processing, classical machine learning, and deep learning techniques.

Publicly Available Datasets
Over the years, researchers have made available datasets for benchmarking automated distress detection systems, mainly covering different types of cracking and potholes [4]. Only a few focus on other distresses or visual pavement classification. In [14], the authors reviewed different methods to detect pavement surfaces and highlighted different benchmarks for pavement surface detection. In [4], the authors list contributions to existing publicly available pavement image datasets for distress detection. These very limited datasets can be categorized based on the view angle (top-view, wide-view, hand-held) and imaging technology (3D or intensity), and they mainly focus on a subset of distress types (different crack types, potholes, and patches) found locally in the geographical regions covered (USA, China, India, Japan, Czech Republic, Brazil, Italy, and Mexico). In [70,71], the authors generated a pavement distress detection dataset using images available from Google APIs. The images available through Google APIs give both top-view and wide-view images; however, the images are old, captured over the years, and not labeled for pavement distress. The authors in [4] highlight that most of the literature on distress detection is based on image datasets that are not publicly available. Table 3 summarizes the current publicly available datasets for distress detection and pavement condition assessment. The datasets used as benchmarks to verify crack segmentation algorithms include CrackTree200 [72], Crack500 [73], CrackForest [74], and Agile-RN [70]. Although several researchers have used them to verify their deep learning-based architectures, they are limited in terms of covering the various shapes, sizes, and textures of cracks formed under different environmental conditions.
GAPs (German Asphalt Pavement Distress Dataset), used by [31,71], provides a top-view, good-quality, close-range, high-resolution dataset (approximately 2468 images) for surface distress identification, labeled by trained operators in the field. Six different distresses defined by the German FGSV regulation [71], i.e., cracks, potholes, inlaid patches, applied patches, open joints, and bleeding, are labeled in the images using bounding boxes. The dataset is limited to only a few distresses regulated by the German FGSV and does not contain severity levels of these distresses.
The second main contribution to distress detection datasets is by [75], which has three variants, namely RDD2019, RDD2020 [75], and RDD2022 [76]. The dataset contains frontal-view images that are mainly labeled using bounding boxes for four distresses, i.e., alligator, transverse, and longitudinal cracks and potholes. The 2019 variant contains images from Japan, while the 2020 variant contains images from India and the Czech Republic. The 2022 variant contains images from China, Norway, and the USA. The dataset may be prone to labeling errors, as it was labeled through crowdsourcing by labelers who are not experts in the field. A similar dataset for cracks, captured with a wide-view camera located at the back of the vehicle, is contributed by [77].
Two frontal-view datasets focus on pavement rating: the first is the Paris-Saclay dataset [79] and the second is the Road Quality Dataset (RQ) [78]. The Paris-Saclay dataset is annotated for pavement condition rating for a stretch of a road based on PASER for New York roads. The frontal-view images are extracted from the Google Maps API, while the ground truth annotation for each stretch is extracted from the pavement condition rating of New York in [80]. The ground truth annotation contains the street index, the number of images in the street, the PASER rating for each street segment, and a rating of Good, Fair, or Poor for each street segment. A similar image dataset can be extracted from Google images for Oakland, USA, while the street segment pavement rating based on PCI can be generated from the database available at [81].
The RQ Dataset [78] is a manually annotated frontal-view image dataset for pavement condition index ratings, based on six different condition ratings for the Czech Republic. The pavement condition rating criteria are defined in [78], while the images are obtained using the Google Maps API. FHWA-LTPP [34] is another image-level classification resource covering five distresses (alligator, longitudinal and transverse cracking, deflection, and longitudinal profiles) captured from different states of the USA.

Image Processing Techniques
Techniques using decision-based rules and image processing mainly focus on crack segmentation and identification. The authors in [12] described different image processing techniques for edge detection to find surface defects and divided the recent literature on pavement distress identification using machine learning models into classification, object detection, and pixel-level segmentation problems. The authors of [82] proposed a modified Otsu-Canny edge detection algorithm for pavement crack detection. They evaluated the technique on the publicly available CrackForest dataset [83]. Peng et al. [84] proposed a double thresholding segmentation technique. After applying an enhanced Otsu threshold segmentation algorithm to eliminate pavement symbols in a runway image, they applied an adaptive iterative threshold segmentation algorithm. Lastly, the shape of the crack is obtained through a morphological denoising technique. In [85], the authors propose a multiscale local optimal threshold segmentation for pavement crack segmentation and crack density distribution. The method achieves better results than the optical threshold and global thresholding techniques. Zhao et al. [86] proposed an improved pavement edge detection method for crack identification. In [87], the authors used image processing, including thresholding, filtering, and morphological processing, to identify fatigue cracks. CrackIT [88] uses image pre-processing techniques before applying machine learning models for crack detection.
Image processing techniques are mainly applied to pictures with a top view of the pavement. Moreover, the early literature focuses on identifying characteristics of cracks or potholes. The image processing techniques are less robust to changes in intensity, noise, environmental factors, and pavement construction variations.
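Several of the methods above build on Otsu thresholding. The following is a minimal sketch of the basic Otsu method only, not of the modified or enhanced variants proposed in [82,84]; it assumes 8-bit grey-level images given as lists of rows and assumes that cracks are darker than the surrounding pavement. The function names are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximizes between-class variance (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their intensities
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment_dark_cracks(image):
    """Label pixels at or below the Otsu threshold as crack candidates (1)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]
```

Because the threshold is derived from the global histogram, a sketch like this is sensitive to shadows, lane markings, and texture changes, which is consistent with the robustness limitations noted above.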

Classical Machine Learning Techniques
Machine learning approaches for distress identification can be classified as image or object classification problems, object localization or detection problems, or pixel-segmentation problems. Many classical machine-learning approaches have been investigated for crack detection, including [89,90,91]. Daniel et al. [92] proposed a method to detect and classify cracks and potholes on asphalt pavements. They offered a two-step approach: pavement defect detection and classification, followed by defect severity detection and evaluation. The second stage is important for automated pavement condition assessment and computes the severity of each defect by calculating the area of the blobs. The method achieved 86% classification accuracy for cracks and potholes.
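The blob-area idea in the second stage can be sketched as follows. This is a generic connected-component illustration under stated assumptions, not the implementation in [92]: it assumes a binary defect mask given as a list of rows, 4-connectivity, and hypothetical area cut-offs for the severity labels.

```python
from collections import deque


def blob_areas(mask):
    """Return the pixel area of each 4-connected blob in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited defect pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def severity(area, low=4, high=20):
    """Map a blob area to a severity label; the cut-offs are hypothetical."""
    if area < low:
        return "low"
    if area < high:
        return "medium"
    return "high"
```

In practice the pixel area would need to be converted to a physical area using the ground sampling distance of the camera before it is compared with the thresholds of a rating standard.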
Raveling is a common visual distress in asphalt pavements, which occurs due to the loss of surface stones. It is recognized visually by observing the change in the macrotexture of the asphalt pavement along the stretch of the pavement. The severity of raveling increases with a higher chip loss from the pavement surface. In [93], the authors evaluated different classical machine learning techniques, such as AdaBoost with decision trees, support vector machine, and random forest, to detect and classify different levels of raveling severity. For data collection for raveling, they used 3D images from PaveVision3D [12]. They observe that random forest is better than the other techniques, with a recall ranging from 86.9% for level-1 severity to 75.6% for level-3 severity. Very little work is reported on raveling detection and severity classification. In [94], the authors highlight the limitations in generalizing classical machine learning methods for crack detection. In [4], the authors have listed many classical machine-learning approaches for distress detection. These approaches mainly focus on detecting fatigue cracks, longitudinal and transverse cracks, potholes, rutting, and raveling. The image dataset is mainly captured through the top-view camera on a specialized vehicle, a hand-held camera view, or a UAV. Different hand-crafted features have been extracted in these techniques. Models such as K-nearest neighbor, support vector machine, artificial neural network, and random forests are used to train a pixel classifier (image segmentation) or an object detector. The precision ranged from 65.8% to 99% and recall from 79.4% to 98% [4].
However, these evaluation parameters are not generalizable as they depend highly on the image capture process. The datasets used mainly contain localized cracks or potholes, do not have different severity levels and are limited to a particular pavement type in a specific geographic region.
In summary, the public datasets available are limited to certain distresses and are mainly annotated by the presence or absence of the distress. The image datasets annotated for pavement rating indices are also limited and do not cover the full range of standard visual rating scales, i.e., PASER [2] and PSCI [18]. The datasets do not cover distinct types of distress (mentioned in Table 1), or different shapes and textures, which vary due to different viewing angles, camera sensors, and geographical locations. Therefore, evaluation metrics based on these benchmarks are less helpful in developing real-world automated pavement condition assessment systems.

Deep Learning Techniques
We focused on literature from 2018 onwards for deep learning techniques. The techniques are mainly divided into segmentation, classification, and object detection algorithms. Deep learning techniques, which are mainly built from convolutional (filtering) layers, require large amounts of data. Deep learning techniques are now widely used for computer vision tasks, including semantic segmentation, image classification, object detection, and image generation [95].
Deep architectures have also been used for the classification of hyperspectral imagery in remote sensing [96,97].
Deep learning algorithms or architectures mainly consist of two parts: a feature extraction phase and a classification, segmentation, or detection phase. In simple terms, for deep learning-based classification, the CNN provides the feature extraction layers, and dense neural layers are added to estimate a class based on the features extracted by the CNN. In deep learning-based segmentation, the CNN provides the feature extraction layers and is termed the encoder, while a set of de-convolutional layers, termed the decoder, is added to obtain pixel-level classification. In deep learning-based object detection or localization, the CNN is used for feature extraction, followed by region proposal layers that produce object detection bounding boxes on the original image [68]. The interlinked deep learning layers are usually termed 'architectures' or, when referred to alongside the weights and biases, 'models'.
Distress detection using a deep neural network can be separated into object detection, segmentation, and classification-based approaches [4,8]. One major bottleneck in developing a model using deep learning is obtaining a good set of training data that is balanced across the different distresses in terms of images, instances, and quantity [98]. In [99], the authors present the first CNN-based raveling detection, trained on macrotexture features obtained from the 3D images from PaveVision3D [12]. They achieved a highest accuracy of 90.8% for raveling detection and an accuracy of 85% for severity classification.

Classification Approaches to Distress Detection
Classification-based distress detection focuses on whether an image or part of the image is classified as a particular type of distress. The authors in [100,101] proposed a flexible pavement distress classification convolutional neural network (CNN) framework to classify whether a patch is a crack or not. The images used are taken from a hand-held mobile phone camera. They evaluated the accuracy of their approach by comparing it with different classification approaches. Aparna et al. [102] assessed the feasibility of hand-held thermal imaging for pothole patch classification. Image data is acquired under various lighting conditions with offline data augmentation. A residual CNN model with pre-trained weights gave an accuracy of 99.7% for pothole patch image classification with an image size of 224 × 224 pixels. Yusof et al. [103] proposed a multi-label classifier for crack-type classification, i.e., transverse, alligator, and longitudinal. The images were taken from a hand-held Nikon digital camera with a dimension of 1024 × 768 pixels. Each image was divided into 32 × 32 pixel patches to classify different crack types. The data collection was carried out for Malaysian pavements. An average accuracy of 98% was achieved in classifying crack types, with a precision of 97%.
In [104], the authors presented an algorithm for occurrence and severity classification in images captured from a top-view camera of urban flexible pavements in Spain. Their occurrence detection is based on patch classification using a ResNet architecture, while the severity classifier is also a ResNet architecture. Each image is cropped to remove the background, broken down into three smaller blocks, resized to 224 × 224 pixels, and labeled for six classes, i.e., alligator cracks, longitudinal cracks, transverse cracks, potholes, raveling, and patches. To determine the severity of four distresses, namely longitudinal cracks, transverse cracks, potholes, and patches, they labeled each distress with a bounding box in each image block. Although there were multiple distresses in each image, the smaller block size minimized the likelihood of having different types of distress in each block. For the distress occurrence stage, the classifier's average F1-score was reported to be 0.9262 on validation data, while the average Intersection over Union (IoU) was 0.729.
Researchers [31,105–109] have also used a similar patch-based approach, i.e., dividing a higher-resolution image into small image patches to detect localized distresses, i.e., distinct types of cracks and potholes. In summary, most classification-based approaches focus on identifying types of distress in image patches of higher-resolution images. Localized distresses are investigated, i.e., potholes and cracks. The images are taken from a top view or a hand-held camera view, and each dataset is specific to one region. The number of image patches is reasonable, but the patches are extracted from a limited number of higher-resolution images.
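The patch-based strategy shared by these studies can be sketched as below. This is a minimal illustration under stated assumptions: the image is a 2D list of rows, patches are square and non-overlapping by default, incomplete border patches are skipped, and `classify_patch` is a hypothetical stand-in for any trained patch classifier.

```python
def split_into_patches(image, patch_size, stride=None):
    """Yield (top, left, patch) tuples covering a 2D image given as a list of rows.

    Border regions that do not fill a whole patch are skipped in this sketch.
    """
    stride = stride or patch_size
    height, width = len(image), len(image[0])
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            yield top, left, patch


def classify_image(image, patch_size, classify_patch):
    """Apply a patch classifier (any callable) and collect labeled locations."""
    return [(top, left, classify_patch(patch))
            for top, left, patch in split_into_patches(image, patch_size)]
```

With these defaults, a 1024 × 768 image divided into 32 × 32 patches, as in [103], yields 32 × 24 = 768 patches per image, which is how a modest number of high-resolution images can produce a large patch dataset.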
Many researchers have used patch or image classification techniques for multiple distress detection. Researchers in [43,44] used top-view color images for their experiments and then used ResNet-based architectures to develop a model for multiple distress classification. The ResNet model used in [43] has an F1-score of 0.92, whereas the model used in [44] has an F1-score of 0.90 on their test datasets. Image classification for multiple distress detection has mainly been used for bleeding and raveling detection and severity classification, by [110] for pavements in Iran and [99] for pavements in the USA. In [46], the authors used detection and segmentation algorithms to classify four different types of cracks and then segment crack pixels. The authors of [46] report a better pixel segmentation F1-score on the CrackForest dataset than others, using a multiple image-resolution training strategy.
Most researchers focus on multiple crack classification and obtain a better F1-score using images from a camera with an orthogonal view of the pavement and a fine ground sampling distance (i.e., more pixels per inch). The evaluation of patch-based classification approaches for distress and its severity classification is limited in the literature. Patch-based classification and identification of distress instances are helpful for localized distresses; the technique is suitable for images that capture the top view of the road. It is computationally less expensive than pixel-level segmentation approaches. Table 4 provides a summary of the literature focusing on distress classification using either patch classification, image classification, or semantic segmentation.

Pixel Segmentation Approaches to Distress Detection
Segmentation-based approaches classify or label each pixel as belonging to a group or distress. Usually, distinct types of crack distresses are good candidates for pixel-level segmentation. The precise location of a crack can be determined using pixel-level labeling. In [111], the authors used the U-Net architecture to segment crack pixels using a publicly available crack image database. Although the number of training and test images is small, the experiment shows promise for segmenting crack pixels. In [112], the authors review 68 manuscripts covering deep learning techniques for crack detection using segmentation. The authors evaluated eight segmentation models on 3D pavement images obtained from systems like [33]. They observed that FCN [113] and U-Net [114] performed better than the others for 3D pavement images. In another attempt, the author of [115] proposed a CNN-based segmentation algorithm named DeepCrack. The images come from publicly available datasets of cracks captured by an intensity camera with a top view of the pavement, with a dimension of 512 × 512 pixels. The DeepCrack architecture is built with different scales and is inspired by the SegNet network [115].
The authors in [100] used a VGG-16 DCNN to detect cracks by dividing high-resolution images into smaller patches and applying an image classification approach.
The authors have extensively evaluated DeepCrack against other state-of-the-art pixel segmentation models. The experimental result was an average F1-score of 0.85 for DeepCrack. The researchers in [116] investigated the U-Net model architecture for crack segmentation; they used transfer learning on pre-trained weights to train the classifier. The data on concrete pavement was collected with a mobile phone at various locations at the Huazhong University of Science and Technology, China, with an image dimension of 512 × 512 pixels. The authors claim a higher accuracy and precision for crack pixel classification for concrete pavement types. In [117], researchers proposed asphalt pavement crack segmentation using a new CNN architecture. The data were collected from 12 cities in Liaoning province, China, with a hand-held mobile phone camera. The researchers compared the results with existing segmentation models such as U-Net [118], SegNet [119], PSPNet [120], and DeepLabV3 [121]. The proposed model performs better than the existing segmentation CNN architectures. Tang et al. [122] proposed an encoder-decoder network, EDNet, for crack segmentation. The network caters to the quantity imbalance between crack and non-crack pixels. The images are taken with a top-view laser scanning camera to acquire 3D pavement images. The proposed method achieves an average F1 score of 97.80% and 97.82%.
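The F1-scores quoted for segmentation models are computed at the pixel level. A minimal sketch of that computation is given below, assuming binary masks stored as lists of rows with 1 marking crack pixels; the function name is illustrative, and individual papers may apply additional conventions (such as a pixel tolerance around the annotated crack) that are not modeled here.

```python
def pixel_scores(predicted, truth):
    """Precision, recall, and F1 for the crack class of two binary masks."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because crack pixels are a small fraction of an image, a mask that predicts no cracks at all can still have high pixel accuracy while its F1-score is zero, which is why the F1-score is the usual metric for this task.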
Researchers focus more on cracks than on other visual distresses on the pavement surface when using deep learning techniques. One reason is that a crack is a sign of fatigue in the surface that can further disintegrate into potholes or total failure of the pavement surface. In Section 2.2, we observe that the occurrence and severity of visual distress, especially cracks, are essential for estimating the pavement condition index. Table 5 summarizes crack segmentation and detection using deep learning. Researchers have mainly used encoder-decoder convolutional neural network architectures to segment crack pixels. Researchers in [94,107,109,123] proposed smaller customized CNN encoder-decoder networks, with the model trained on smaller patches extracted from the higher-resolution image, while [115,124–126] used a modified U-Net [113] based architecture, which is a fully convolutional network for semantic segmentation, and used a smaller resized image. Researchers combined three [127] and five [51] publicly available datasets for training their models, using three deep learning architectures, namely Holistically-Nested Edge Detection (HED), Richer Convolutional Features (RCF), and the Feature Pyramid and Hierarchical Boosting network (FPHB), and reported lower F1-scores than previous studies. For crack segmentation, the top orthogonal view is preferred over the front wide view of the pavement. The orthogonal view has the advantage of controlled lighting and more pixels per inch; however, its disadvantage is that it covers less of the pavement. We observe that the biases and weights of the encoder (feature extraction part) are primarily trained from scratch instead of being pre-trained on the ImageNet benchmark when developing a crack pixel classifier. Most crack detection and segmentation models are evaluated on publicly available datasets.
Table 5 shows the test performance (F1-score) of different models developed using different architectures and training datasets. The deep-learning-based algorithms perform well when the test data is similar to the training images (i.e., from the same device); however, the performance degrades when multiple training datasets are combined, or when the test dataset comes from a different capturing device and region. We also observe that automated deep learning segmentation algorithms using orthogonal images show a higher F1-score than those using front- or back-view images. Methods like DeepCrack [115] hold promise for identifying linear (transverse, longitudinal) cracks that are difficult to detect with patch-based methods.

Object Detection Approach to Distress Detection
Distress detection can also be approached using object detection. The approach is somewhat like patch-based image classification; however, the implementation differs in terms of the input and output of the CNN architecture. The object detection method can be used to find multiple object (distress) instances in a high-resolution image using CNN networks like Faster R-CNN [135], SSD MobileNet [136], or YOLOv3 [137]. Object detection-based techniques are usually used to detect different distresses, mainly potholes, patches, and cracks, along with their various types and severities (see Table 6).
In [138], the author proposed a pothole detection system trained on images taken from a hand-held camera. The model was tested and compared with four object detectors. The authors observed that single-shot multi-box detectors (SSD) have higher accuracy but lower computational speed than YOLOv3. YOLOv3 fails in cases where the size of the pothole is small. In [139], researchers used the SqueezeNet architecture to train a model on image patches of size 64 × 64 extracted from two datasets with an orthogonal view of the pavement. The F1-score on the GAPs dataset was poorer than the F1-score on the custom dataset obtained in the USA. Researchers in [45,140–145] used a version of the YOLO [137] architecture to train a model to detect different distresses. A crack severity detector for the top view of the pavement using YOLO, with an average F1-score of 0.70, was proposed in [45]. Similarly, in [149], the authors experimented with thermal imagery and used object detection algorithms with an average precision of 91.15%. Maeda et al. [150] proposed an object detector based on SSD MobileNet and Inception V2 architectures. They achieved an average recall of 77% with a precision of 71% for potholes, alligator cracking, and blurry line marks. However, 'presence/absence' detection is not very helpful for quantifying distress, which is essential for pavement surface evaluation. Table 6 shows a summary of object-detection-based distress detection. The F1-scores indicate that the performance of an object detector deteriorates for multiple distress detection compared to detectors that detect one or two distresses. We observe that YOLO architectures show promise for detecting distresses from a frontal camera view; however, developing a robust model for a region will require calibration with local distresses. Top-view, hand-held, and wide-view images have been used in experiments.
Object detection-based algorithms can be used for localized distress detection, such as alligator cracks and potholes. The localization and detection accuracy is better than that of the patch-based method. When object detection networks such as YOLO [137] are used, recall or accuracy for detecting (linear or edge) cracks is lower on frontal-view images than on top-view images.
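The precision and recall values reported for object detectors depend on how predicted boxes are matched to annotated boxes. A common convention, sketched here as an assumption rather than as the protocol of any cited study, is to count a detection as correct when its label matches and its Intersection over Union (IoU) with an unmatched ground-truth box is at least 0.5. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def match_detections(detections, ground_truth, iou_threshold=0.5):
    """Greedily match detections (sorted by confidence) to ground-truth boxes.

    `detections` is a list of (box, label, confidence); `ground_truth` is a
    list of (box, label). Returns (precision, recall).
    """
    unmatched = list(range(len(ground_truth)))
    tp = 0
    for box, label, _ in sorted(detections, key=lambda d: d[2], reverse=True):
        best, best_iou = None, iou_threshold
        for idx in unmatched:
            gt_box, gt_label = ground_truth[idx]
            overlap = iou(box, gt_box)
            if gt_label == label and overlap >= best_iou:
                best, best_iou = idx, overlap
        if best is not None:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Thin, elongated cracks illustrate why this matters: a small positional error on a narrow box lowers the IoU sharply, so a linear crack can be visually detected yet still counted as a miss.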

Automated Direct Pavement Condition Rating
It is highlighted in the introduction of this paper that the primary purpose of distress detection and identification is to evaluate the condition of the pavement using a standardized scale. Distresses must first be identified to compute a rating for an extensive pavement network. Then the number of distinct distresses and their severity must be considered over a given stretch of pavement. Most research focuses on distress identification but falls short of computing a direct pavement rating for a stretch of pavement. One approach to computing direct ratings is described in [151], where the authors present a hybrid model of an object detector and semantic segmentation for classifying and quantifying distress severity on pavements and predict PASER indices for each patch. The images are collected from Google Street View maps: 70-degree wide-angle views and 90-degree bird's-eye view images. Wide-view photos are used for crack and pothole detection, and top-view images are used to quantify crack severity. They trained YOLO to detect distress and used U-Net (based on fully convolutional layers) to classify crack severity. The results from the two models are then combined to find the crack density per pavement defect and are fed to a linear and a weighted regressor to assign a PASER index to each image. The photos are from USA pavements, and the PASER calibration set is minimal. The PASER prediction model fits the test data with an R² of 0.9382 and a root mean square error of 10.45. One of the limitations of this research is the use of Google API images, which are usually older.
This system considers only two distresses for the rating (cracks and potholes); however, in most practical scenarios, patches, raveling, and bleeding also need to be considered. Adding further localized distresses would require transfer learning, and handling raveling and bleeding would require further modification of the algorithm.
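The final regression stage described above can be illustrated with a minimal sketch. It is not the implementation of [151]: it assumes a single feature (crack density per patch), uses ordinary least squares as a stand-in for the linear regressor, and fits a small set of made-up calibration values. All function names and numbers are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_paser(density, a, b):
    """Predict a rating and clamp it to the 1-10 PASER range."""
    return min(10.0, max(1.0, a + b * density))


def r2_and_rmse(ys, preds):
    """Coefficient of determination and root mean square error."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)


# Hypothetical calibration data (illustration only, not from [151]):
# crack density per patch and the rating an expert assigned to that patch.
densities = [0.00, 0.05, 0.10, 0.20, 0.30]
ratings = [10, 9, 7, 6, 4]

a, b = fit_line(densities, ratings)
preds = [predict_paser(d, a, b) for d in densities]
r2, rmse = r2_and_rmse(ratings, preds)
print(f"rating = {a:.2f} + ({b:.2f}) * density, R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Extending such a sketch to raveling, bleeding, or patches would mean adding one feature per distress type and switching to a multi-feature regression, which is where the calibration set size becomes the limiting factor.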
In [152], the authors present an image classification approach to surface rating using a three-level rating index: good, regular, and bad. The datasets used for the experiments are RTK [32], caRINE [153], and KITTI [154]. The approach classifies roads into three surface types and three quality ratings. The images are cropped to extract the region of interest that contains the road, and data augmentation is performed to increase robustness and avoid overfitting. The authors used three convolutional layers, a flattening layer, and two fully connected dense layers to classify the road type as asphalt, paved, or unpaved. The classified images are then passed through another classifier to estimate the quality of each road as good, regular, or bad for each surface type. The surface type accuracy is reported as 98% for the three types. The classification accuracy for the quality ratings is 98% for good asphalt and 96% for bad asphalt. The precision is 86.7% for the good class and 81% for the bad asphalt class. The rating index is limited to three levels (good, regular, and bad), which relate only to Brazil's actual standard rating system. However, a three-level scale is of limited use in practice, where maintenance decisions are based on the overall rating and on the individual distresses that lead to that rating. Moreover, further experiments are required to increase the number of image classes before the approach can be adopted for visual standards such as PASER or PSCI. High recall and precision are much easier to obtain when the images are simple; complex images with multiple distresses in different quantities are much more difficult.
In [155], the author discusses the complexity of manual PSCI practices. The work uses a CNN-based semantic segmentation model from [156] to extract road, road-marking, and background pixels, and analyzes the state-of-the-art EfficientNet V2 [157] image classification approach for automating PSCI ratings. Each image in the training and test sets has a 'segmented' pavement image, an 'augmented' image, and an 'original' image. Each image is cropped by 250 pixels from the top and 50 pixels from the bottom to remove the sky and pavement pixels far from the camera, as well as pavement pixels too close to the camera. The 'augmented' image is computed by combining the pavement-segmented intensity image, the pavement-plus-marking pixel intensity image, and the original intensity image. Combinations of these images were used to evaluate the performance of the classifier. For a 10-class classification, the best model achieved an F1-score of 0.57; for a five-class classification obtained by combining adjacent classes, it achieved 0.73.
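The cropping and class-merging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of [155]: the function names are hypothetical, the image is represented as a plain sequence of pixel rows, and adjacent classes are assumed to be paired as 1-2, 3-4, ..., 9-10, which may differ from the grouping actually used.

```python
def crop_rows(image, top=250, bottom=50):
    """Drop `top` rows from the top and `bottom` rows from the bottom.

    `image` is any sliceable sequence of pixel rows (a list of lists here;
    the same slice also works on a NumPy array).
    """
    height = len(image)
    if top + bottom >= height:
        raise ValueError("crop would remove the whole image")
    return image[top:height - bottom]


def merge_adjacent(rating):
    """Map a 10-class rating (1..10) to 5 classes (1..5).

    Assumed pairing: 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4, 9-10 -> 5.
    """
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    return (rating + 1) // 2


# Toy 720-row image whose pixel values equal the row index.
image = [[row] * 4 for row in range(720)]
cropped = crop_rows(image)
print(len(cropped), [merge_adjacent(r) for r in range(1, 11)])
```

Merging neighbouring classes reduces the penalty for near-miss predictions, which is consistent with the higher F1-score reported for the five-class case.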

Benchmarking and State-of-the-Art Models
During the last decade, researchers have developed benchmark datasets, consisting of images labeled for a particular computer vision task, to evaluate deep learning models, especially CNN feature extraction layers [158]. A model is known as state-of-the-art if its performance metrics are the best when evaluated against these benchmarks [159]. The website [160] gives a structured approach to finding state-of-the-art models for different computer vision, natural language processing, and signal processing tasks on the respective datasets. Improving state-of-the-art models using benchmark datasets is one approach; however, researchers have recently argued that an application-centric process must be followed for a deep learning solution. In [160], Hooker argues that chasing benchmarks is the wrong way to evolve a machine learning model. Instead, as suggested by [161], smartly chosen training images specific to a particular application lead to a better understanding when developing a deep learning-based solution. Across different subfields of AI, and specifically in machine learning, current benchmarking practices tend to distort the development of fair and flexible AI systems for real-world scenarios. In [162], the authors systematically explored the limitations of influential dataset-based benchmarks, revealed construct validity issues, pointed out the risks associated with their framing, and proposed alternative performance evaluation methods. The authors of [162] argue that state-of-the-art performance on these benchmarks does not validate the general-purpose capabilities of AI models, particularly in visual and language understanding domains.
Therefore, benchmarking is a conservative approach to assessing general model capabilities because of limited task design, de-contextualized data, hidden biases, false performance reporting, and inappropriate community use in the machine learning context. Benchmarks are arbitrarily selected subsets of objects from the real world and cannot cover the domain knowledge for a particular application. It is recommended that, along with recalibration or transfer learning on a localized dataset, alternative methods such as unit testing and failure mode analysis be used to measure the broader capabilities of an automated pavement rating system.
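One possible reading of failure mode analysis in this setting is to report detection recall separately for each capture condition rather than as a single aggregate score. The sketch below assumes a simple record format and hypothetical condition labels ("dry", "wet"); neither comes from the cited works.

```python
from collections import defaultdict


def recall_by_condition(records):
    """Compute recall per capture condition.

    `records` is an iterable of (condition, distress_present, detected)
    tuples; only images where a distress is actually present count.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for condition, present, detected in records:
        if present:
            positives[condition] += 1
            if detected:
                hits[condition] += 1
    return {c: hits[c] / positives[c] for c in positives}


# Hypothetical evaluation records (illustration only).
records = [
    ("dry", True, True),
    ("dry", True, True),
    ("dry", False, False),
    ("wet", True, True),
    ("wet", True, False),
]
per_condition = recall_by_condition(records)
print(per_condition)
```

A large gap between conditions points to a failure mode that an aggregate benchmark score would hide and that a localized recalibration set should target.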

Limitations of AI-Based Automated Pavement Rating Systems
Whether a country or region adopts a standard or defines a local variant for pavement condition assessment depends on environmental factors, local pavement distresses, and economic factors in the data collection process. The evolution of imaging technology, computational power, and CNNs has made pavement condition assessment through visual distress detection fast, easy, and cost-effective for a country's comprehensive pavement survey. In [9], the authors summarized different imaging technologies, types and sub-types of distress, and distinct levels of distress severity (i.e., low, medium, and high). Distresses vary in shape, size, and texture with their severity level and with the weather conditions in different geographical locations [17]. The variation in data across regions arises not only from changes in the shape, size, and texture of distresses but also from differences in imaging technology and in the placement of sensors on specialized vehicles. Automated condition rating for a pavement stretch depends on the types and sub-types (severity levels) of distress and the amount of distress present in that stretch. CNN-based automated decision tools learn from statistical information present in images; therefore, the data used for learning needs to be centric to the problem domain, smartly sized, and less noisy [161]. The accuracies and precisions mentioned in the literature are reported on limited datasets, and not on complex images with multiple distresses of different shapes, sizes, or textures. Any automated rating system using imaging technology needs to be recalibrated (for example, using transfer learning techniques) for regional distresses to capture variation in the shape, size, and texture of distresses and variation in light intensity. The highlighted environmental factors (such as rain, standing water, poor lighting, and moisture) play a crucial role in distress shape, size, and texture.
Moreover, an algorithm trained on images of these distresses does not generalize across geographical locations, because environmental factors change the shape, size, and texture of the distresses.
Capturing orthogonal views of the pavement requires expensive external 2D and 3D sensors mounted outside the back of the vehicle, which are costly to maintain. This increases the budget for pavement condition assessment of a road network across the country, especially the local network. However, orthogonal capture provides images with controlled lighting conditions, which helps in the automated detection and segmentation of cracks and patches. It is recommended for use on national highways and motorways. Capturing the frontal view of the pavement requires only low-cost cameras that can be mounted inside the vehicle, which makes it less expensive. It captures a wide view of the pavement, which helps in the automated detection and classification of different distress types, including raveling, bleeding, different types of patches, cracks, and potholes. It is recommended for covering a larger network of pavement surfaces, including local and regional roads. Another challenge for such approaches is the unavailability of very large labeled datasets: labeled images identifying multiple distress types and their severity levels are expensive to create, requiring both time and expert knowledge.

Conclusions
Technology and intelligent algorithms for automated pavement surface condition evaluation have evolved during the last decade. The literature shows experimentation in evaluating different imaging technologies (such as intensity, color, and 3D laser cameras) and imaging road views (top-view, wide-view, or hand-held), and in developing robust algorithms for detecting distinct instances of distresses in an image; however, very little work is found on pavement condition rating. The current limitations include the lack of a general evaluation metric for the robustness of detection algorithms across different shapes, sizes, and textures of distinct distresses in different geographical locations. There is also a lack of algorithms for quantifying these distresses in images and, finally, for rating a stretch of pavement from a sequence of images, both of which are needed to develop a real-world automated pavement condition rating system. In practice, a rating is assigned to a stretch (200 m or 100 m) of pavement instead of one image; different regions follow different assessment standards.
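The step from per-image outputs to a per-stretch rating can be sketched as below. This is only an illustration: it assumes each image carries a chainage (distance along the road in metres) and a per-image rating, and it uses the median as the aggregation rule, which is an arbitrary choice rather than something prescribed by a rating standard.

```python
from collections import defaultdict
from statistics import median


def rate_stretches(image_ratings, stretch_m=100):
    """Group per-image ratings into fixed-length stretches.

    `image_ratings` is an iterable of (chainage_m, rating) pairs.
    Returns {stretch_start_m: median rating of the images in that stretch}.
    """
    buckets = defaultdict(list)
    for chainage_m, rating in image_ratings:
        start = int(chainage_m // stretch_m) * stretch_m
        buckets[start].append(rating)
    return {start: median(values) for start, values in sorted(buckets.items())}


# Hypothetical per-image ratings along 200 m of road.
per_image = [(0, 8), (40, 7), (95, 9), (100, 4), (180, 5)]
stretch_ratings = rate_stretches(per_image)
print(stretch_ratings)
```

A deployed system would replace the median with the rule of the standard in use, for example one driven by the worst distress present or by the proportion of the stretch affected.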
We found little work on automatically computing direct pavement ratings. Recent literature reviews of pavement condition evaluation summarize imaging technologies and different machine learning approaches for distress detection and identification; however, they offer limited insight into the relationship between standard condition rating practice and distress detection and quantification. Pavement condition ratings depend on the type of distress and its quantification, and distresses change (in shape, size, and texture) with several factors, including environmental conditions (weather) and the pavement construction process. The highlighted environmental factors (such as rain, standing water, poor lighting, and moisture) play a crucial role in the shape, size, and texture of these distresses.
For data collection, both the top view and a wide-angle view of the pavement have been used for distress detection, identification, segmentation, and pavement condition rating. Choosing an imaging technology for visual inspection depends not only on the type of distress but also on environmental conditions, economic factors, and how adequate the technology is for identifying those distresses. The top view gives a finer ground sampling distance (higher spatial resolution) but covers less area per image than wide-view images. Vehicles with external laser scanners and stereo pairs are more expensive to operate and maintain than vehicles with an internal high-resolution camera with a frontal view. In summary, there is no off-the-shelf solution for automated pavement condition rating. Most existing commercial solutions provide automatic data and image capture, while their ability to detect and quantify distress from images is limited to a few distresses.
Many of the datasets available as benchmarks are limited to cracks and potholes and are localized to a geographical location. Research on automated pavement distress assessment is often limited to the regional pavement condition assessment standard that a country or state follows. The criteria a region adopts to make the evaluation depend on factors such as pavement construction type, the type of road network in the area, traffic flow, environmental conditions, and the region's economic situation.
Most automated image-analysis-based pavement condition assessment focuses on two primary distresses, i.e., distinct types of cracks and potholes. Very few experiments can be seen in the literature on raveling or bleeding (see [163,164]), which are forms of surface defects and contribute toward a unified pavement surface rating. Other surface distresses, such as patching, utility patches, and utility covers, are seldom considered (see [165]). In PASER (used in the USA and other regions) and PSCI (used in Ireland), the ratings 10 to 7 are decided based on the amount of raveling and bleeding alone. Similarly, the study of direct pavement ratings from images as a classification problem is limited, apart from [152] and [151]. 'Presence/absence' detection is not very helpful for quantifying distress, which is essential for pavement surface evaluation. Higher levels of recall and precision are much easier to obtain if the images are simple; complex images with multiple distresses in varying quantities are much more difficult. Automated distress detection and condition rating is not a time-critical process and can be conducted offline, so accuracy and precision are more important than computational time.
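The difference between presence/absence detection and quantification can be made concrete with a small sketch that turns a binary segmentation mask into an area and an extent. It assumes a top-view image with a uniform ground sampling distance, so that every pixel covers the same ground area; the function names and the toy mask are hypothetical.

```python
def distress_pixels(mask):
    """Count the pixels labeled as distress (non-zero) in a binary mask."""
    return sum(1 for row in mask for value in row if value)


def distress_area_m2(mask, gsd_m):
    """Convert the pixel count to square metres.

    `gsd_m` is the ground sampling distance in metres per pixel and is
    assumed to be constant over the image (top-view capture).
    """
    return distress_pixels(mask) * gsd_m ** 2


def distress_extent_percent(mask):
    """Share of the imaged surface covered by distress, in percent."""
    total = sum(len(row) for row in mask)
    return 100.0 * distress_pixels(mask) / total if total else 0.0


# Toy 2 x 4 mask with three distress pixels, imaged at 2 mm per pixel.
mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
area = distress_area_m2(mask, gsd_m=0.002)
extent = distress_extent_percent(mask)
print(area, extent)
```

For frontal-view images the constant-GSD assumption does not hold, so a perspective correction would be needed before pixel counts can be compared across the image.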
In the future, automatically computing a rating for a stretch of pavement will need to combine several methods. For example, image processing techniques such as cropping may be required to remove objects such as the sky, buildings, cars, and sidewalks to prepare images for use by machine or deep learning models. Then, segmentation may be used to segment the distinct distresses, and the number of pixels of each instance can be used to calculate the area of the distress. A similar approach could be implemented using object detection-based approaches to detect individual distresses. Distresses such as rutting and sag may require multiple images or a fusion of point sensor information to establish the presence of such distresses. Deep learning models will need to be calibrated (trained) to capture the severity levels of each distress for the local region where they are deployed. The number of distresses and their severity can be used to compute a rating score averaged over a set of images for a given stretch of road. Advances in deep learning may allow computing a rating directly using image classification. Still, a lack of benchmark datasets containing various distresses for learning may hinder such approaches. Developing a benchmark dataset for a diverse set of distinct distresses and their severity levels is challenging, as it requires extensive data collection to capture different environmental conditions and regional variations. We propose that any automated rating system for pavement conditions using imaging technologies will require re-calibration (i.e.,