Mapping Roofing with Asbestos-Containing Material by Using Remote Sensing Imagery and Machine Learning-Based Image Classification: A State-of-the-Art Review

Abstract: Building roofing produced with asbestos-containing materials is a significant concern due to its health hazards. Efficiently locating asbestos roofing is essential to proactively mitigate and manage potential health risks from this legacy building material. Several studies have utilised remote sensing imagery and machine learning-based image classification methods for mapping roofs with asbestos-containing materials. However, there has not yet been a critical review of classification methods that provides coherent guidance on the use of different remote sensing images and classification processes. This paper critically reviews the latest works on mapping asbestos roofs to identify the challenges and discuss possible solutions for improving the mapping process. Peer-reviewed studies addressing asbestos roof mapping published from 2012 to 2022 were reviewed to synthesise and evaluate the input imagery types and classification methods. Then, the significant challenges in the mapping process were identified, and possible solutions were suggested to address them. The results showed that hyperspectral imagery classification with traditional pixel-based classifiers caused large omission errors. Classifying very-high-resolution multispectral imagery by adopting object-based methods improved the accuracy results of ACM roof identification; however, non-optimal segmentation parameters, inadequate training data in supervised methods, and analyst subjectivity in rule-based classifications were reported as significant challenges. While only one study investigated convolutional neural networks for asbestos roof mapping, other applications of remote sensing demonstrated promising results using deep-learning-based models. This paper suggests further studies on utilising Mask R-CNN segmentation and 3D-CNN classification in the conventional approaches and developing end-to-end deep semantic classification models to map roofs with asbestos-containing materials.


Introduction
Asbestos exposure remains a significant cause of death and health problems [1]. Preventing asbestos-related diseases such as asbestosis, mesothelioma, and lung cancer [2–4] is difficult because inadvertent exposure to unknown asbestos contaminations in buildings and the environment is unavoidable [5]. Moreover, asbestos-related diseases have a long latency and continuous disease progression after exposure, which may not be diagnosed early [6]. The release of airborne asbestos fibres can be caused by human activities that produce dust [7] (e.g., the maintenance, repair, and demolition of buildings with asbestos-containing materials (ACMs) and the unsafe disposal of waste) and degradation
classification model is developed based on training data in supervised methods or analysts' knowledge in expert system methods [49].
The review paper is structured to cover all stages of the ACM roof mapping procedure in the targeted studies. Firstly, the various types of RSI used in the literature, including HSI, MSI, and the fusion of RSI, are explained, and the preprocessing methods are discussed in detail. Secondly, the feature engineering and classification methods are explained based on their classifying approaches, including pixel-based, object-based, and deep learning classification techniques. Based on the conventional procedure of ACM roof mapping, the challenges affecting the accuracy results of the studies are extracted and synthesised in tabular structures. Then, the challenges are studied in other applications of RSI and image classification to identify possible solutions for enhancing the ACM roof mapping process and accuracy results.

Materials and Methods
Ref. [51] defined the characteristics of a state-of-the-art review by comparing it with other common review types using the Search, Appraisal, Synthesis, and Analysis (SALSA) analytical framework. For this review, neither a formal quality assessment nor a predefined method (e.g., systematic review) was used to collect and analyse the sources [51]. Due to the non-numerical nature of the input data, a qualitative research design was adopted to structure the paper. The identified challenges [10,11,13,16,20–23,26–28,30,31] are narratively synthesised and presented in Section 5.

As shown in Figure 2, the study was designed in three stages: (1) data acquisition, (2) data synthesis, and (3) solution development. In the first stage, the search was conducted within Scopus, ScienceDirect, and Google Scholar; then, the target studies were selected based on criteria noted in Section 2.1. In the second stage, selected sources were summarised and grouped based on the input RSI (i.e., HSI, MSI, and multisource data) and the image classification approach (i.e., PBIA, OBIA, and DL). In the third stage, the studies were critically reviewed to identify challenges affecting the ACM roof mapping process and classification accuracy results. Finally, the synthesised challenges were studied in reviewed sources and other applications of RSI (e.g., object detection in urban areas) to develop possible strategies to address the ACM roof identification challenges.

Data Acquisition
The targeted sources for reviewing mainly originate from two categories of RSI applications: (1) ACM roof identification, or (2) roof classification with at least one ACM roof class. Data for the review were initially gathered from related research satisfying the following three criteria: (1) studied the identification of ACM roofs or studied roof classification with at least one ACM class, (2) used at least one type of RSI, and (3) used machine learning methods in the classification process.
Peer-reviewed journal articles and conference papers from 2012 onwards were extracted from the collected sources in the initial search. ACM roof identification studies prior to 2012 were often conducted on small study samples and mainly included flat urban areas [24,52]. Moreover, the input data had a low spatial resolution (e.g., more than 5 m), causing the problem of mixed pixels in urban studies [36,53,54]. Hence, focusing on the peer-reviewed research of the past ten years is considered appropriate for studying the recent findings of ACM roof identification. The searched databases were Scopus, ScienceDirect, Google Scholar, and some high-impact journals such as Remote Sensing of Environment, the ISPRS Journal of Photogrammetry and Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing, and the International Journal of Remote Sensing.

Data Analysis
Accuracy assessment indicates the performance of a classification process by calculating (1) user accuracy (UA), which is the complement of the commission error (CE), where pixels are incorrectly classified as a labelled class; (2) producer accuracy (PA), which represents the probability that a particular pixel of an area on the ground is classified as such and is the complement of the omission error (OE); and (3) overall accuracy (OA), which represents the percentage of total pixels correctly classified [33]. As shown in Figure 1, the process of ACM roof identification includes different stages that could be performed by various methods, techniques, and tools. ACM roof mapping studies have used numerous remote sensing data, feature engineering methods, and machine learning classification algorithms [28,29,32]. The accuracy assessment of ACM classes is thus affected by different attributes of RSI data and the various factors used in the mapping procedures [55,56]. Hence, it may be inappropriate to compare the accuracy results in a quantitative analysis. In other words, comparing the percentage-based accuracies of the works might be inconclusive in terms of determining the most effective methods and techniques of ACM roof mapping. Moreover, other contributing factors, such as the available RSI data, the research cost, and the researchers' expertise, affect the mapping accuracy results.
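As an illustration of these measures, the following minimal sketch derives OA, PA, UA, OE, and CE from a confusion matrix under the standard convention (UA is the complement of the commission error, and PA is the complement of the omission error). The function name `accuracy_metrics`, the row and column layout, and the counts are hypothetical and are not taken from any reviewed study.

```python
def accuracy_metrics(matrix, labels):
    """Compute accuracy measures from a square confusion matrix.

    Assumed layout: matrix[i][j] is the number of samples whose reference
    (ground truth) class is labels[i] and whose mapped class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"OA": correct / total}
    for k, name in enumerate(labels):
        reference_total = sum(matrix[k])                    # row sum
        mapped_total = sum(matrix[i][k] for i in range(n))  # column sum
        hit = matrix[k][k]
        pa = hit / reference_total if reference_total else float("nan")
        ua = hit / mapped_total if mapped_total else float("nan")
        # OE is the share of reference samples that were missed;
        # CE is the share of mapped samples that were wrongly included.
        report[name] = {"PA": pa, "OE": 1 - pa, "UA": ua, "CE": 1 - ua}
    return report


# Hypothetical counts for a two-class map (ACM roofs versus everything else).
labels = ["ACM", "other"]
matrix = [
    [43, 57],   # reference ACM: 43 mapped as ACM, 57 omitted
    [5, 395],   # reference other: 5 wrongly mapped as ACM
]
report = accuracy_metrics(matrix, labels)
print(f"OA = {report['OA']:.3f}")
print({key: round(value, 3) for key, value in report["ACM"].items()})
```

With these illustrative counts, the OA is high even though more than half of the reference ACM samples are omitted, which is why class-level PA and UA are worth reporting alongside OA.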
Given the unmeasurable factors in the ACM roof mapping process, a qualitative approach was adopted to review, synthesise, and discuss the conducted works. First, a summary of each selected ACM roof mapping study is provided; then, all studies are grouped based on their input imagery data and classification methods. Second, each study is critically reviewed in terms of RSI attributes, image processing, feature extraction, classification method, and the accuracy or presence of errors in the mapping results. The major challenges affecting the mapping process are determined, and sources are compared to identify the common challenges. Then, the results are narratively synthesised, and their general themes are evaluated in both the reviewed studies and other closely related applications of RSI. The conclusive information is discussed to develop strategies associated with input RSI, image fusion and processing, feature engineering, and the classification process. Moreover, further avenues of study are recommended by reviewing several successful applications of image classification in urban studies.

Input Data for ACM Roof Classification
Satellite and aerial sensors provide a wide array of remote sensing data (i.e., optical, LiDAR, and synthetic aperture radar images), differing in spatial, spectral, and radiometric resolution. RSI provides useful data for classifying different surfaces on the Earth. The first remote sensing studies attempted to map aggregated land-cover classes (for example, forests and artificial surfaces) and, later, to detect land-cover and land-use change [57]. The evolution of remote sensing systems over the past 30-40 years has led several studies to focus on roof classification, particularly ACM roof mapping.
In studies of ACM roof mapping, various types of hyperspectral data, multispectral data, and their fusion with auxiliary data have been used as the input data. The first studies investigated the detection of ACM roofs based on spectral features without considering the pixel sizes of images [20–23]. Due to issues such as mixed pixels caused by the low spatial resolution of the images acquired from early hyperspectral sensors, these studies did not generate consistent accuracy results. However, with advancements in hyperspectral sensors, a number of studies have demonstrated acceptable classification accuracy of ACM roofs [58]. Furthermore, other studies have focused on the use of a combination of bands from multispectral images [10–12,31]. While these studies reported acceptable accuracy results, issues such as spectral confusion affected the mapping process of ACM roofs. However, the fusion of different image types, such as the combination of multispectral and LiDAR images [30,32], significantly improved the accuracy results of ACM roof mapping. From the different results gained by using various remote sensing data types, it can be concluded that there is no one-size-fits-all input data type for obtaining an acceptable ACM roof accuracy result.
Different image types are used due to the various features that each RSI type can provide. Selection of the input data requires prior knowledge of the quality and adequacy of RSI and the corresponding sensors. Despite their different resolutions, RSI types share common characteristics. First, they are geolocated, which means that each pixel of an image corresponds to a spatial coordinate. This facilitates the fusion of multiple spatial and non-spatial sources with a single pixel or a segment of the image [37]. Second, remote sensing images are geodetic measurements that provide accurate estimates of geoparameters. This allows the utilised images to provide geophysical quantities through detection or classification processes. On the other hand, remote sensing images differ in their attributes and contents. For instance, while multispectral infrared visible imaging spectrometer (MIVIS) images are rich in spectral bands, most satellite sensors provide a limited number of spectral bands but wide area coverage.
Therefore, selecting the required imagery types and their processing architecture is a trade-off between multiple factors. While the image resolution (i.e., spatial, spectral, temporal, and radiometric) is a key factor, other factors such as cost and the availability of data [22] play a role in selecting the required imagery type. Moreover, potential challenges should be considered when gathering the input data. For example, there are generally errors in roof classification due to some independent factors affecting the OA of the classification [23]. These errors may be caused by the diverse colour of ACM roof coverings exposed to UV radiation, decreased homogeneity of the surface due to mixed materials, or different geometrical shapes resulting in different lighting effects [29]. In this regard, adopting multisource RSI by combining the input RSI with other auxiliary data could improve the classification accuracy results. Using LiDAR data acquired from active sensors has been shown to enhance the performance of the conventional classification methods [30,32]. Despite the advantages of RSI's rich content, "big data" are a challenge, requiring the process architecture and classification algorithms to be efficient and fast enough to handle the massive volume of remote sensing data [37]. While this issue could be unavoidable, DL-based architectures perform better than PBIA and OBIA methods when extracting features from large datasets [37,49]. Moreover, attempts have been made to design lightweight and compact classification models, which could significantly assist in handling the challenges of the large volume and complexity of remote sensing data [59,60].

Hyperspectral Imagery (HSI)
HSI has been widely used in studies of ACM roof mapping. Hyperspectral systems generally record from more than 20 up to 1000 bands, and their images are generally acquired from airborne platforms [25]. Furthermore, hyperspectral systems provide images with both high spatial and spectral resolutions, making them efficient input data for ACM studies [24]. In fact, HSI from aerial sensors represents the spectral heterogeneity of roof classes and provides very high spatial resolution [52]. Thus, several studies have focused on the single use of spectral data to investigate the efficiency of HSI itself or compare the OA gained by different classifiers [13,20–24]. Table 1 provides a summary of studies associated with ACM roof classification using airborne HSI.
The early HSI used for ACM mapping did not provide adequate spatial resolution to classify small roofs. While sensors such as MIVIS were rich in hyperspectral data, they led to significant omission errors (OEs) for ACM roofs with a small surface [12]. Fiumi et al. (2012) [20] used MIVIS images with a spatial resolution of 4 × 4 m, including 102 spectral channels ranging between 0.433 and 12.70 µm. The Spectral Angle Mapper (SAM) algorithm was adopted to classify HSI using ENVI software. Moreover, the error caused by the atmosphere layer between the sensor and the Earth's surface was corrected using the internal average relative reflectance radiometric calibration method. While a high classification accuracy was reported, the identified roofs were large, with an average roof area of around 1200 m². As accuracy tends to increase with the average roof area [53], the results of this study may not be consistent for areas with smaller roofs.
Large OEs for ACM roofs were reported in other studies in which the OA was acceptable. Frassy et al. (2014) [22] worked on the quantitative mapping of roofs with asbestos cement (AC) covering a large region in Italy. The 102-channel MIVIS data with different pixel resolutions from 4 m to 9 m were classified using the SAM algorithm. Concerning AC roof units, while the result revealed a small commission error with less than 10% misclassification (false positives), the OE was quite large. Approximately 50% of the ACM roofs were not identified with the MIVIS images (false negatives) and thus were omitted in the thematic map. As a result, only 43% of the ACM roofs were correctly mapped considering the roof units, despite the OA of the roof surface being approximately 80%.
To detect ACM roof units, Frassy et al. (2014) [22] suggested that a window of 3 × 3 pixels is a threshold for classifying an individual ACM roof. For example, using MIVIS images with a 4 m spatial resolution, the window spans 12 m per side, so only roof units with an area of at least 144 m² (RSLT144) are detected with reasonable confidence. Computing statistics for RSLT144 led to significant improvements in the classification results of another independent study by [20]. They reported that OA increased from 43% to 75%, and OE decreased from 48% to 19% for the entire dataset. The study indicated that the spatial resolution of images was the main source of errors, while the topographic features of the area did not significantly affect the accuracy of roof classification results.
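The 3 × 3 pixel threshold translates directly into a minimum roof area for a given pixel size. The short sketch below reproduces the 144 m² figure for 4 m pixels and applies the same arithmetic to the coarsest (9 m) MIVIS pixel size mentioned above; the function name `min_detectable_roof_area` is hypothetical.

```python
def min_detectable_roof_area(pixel_size_m, window_px=3):
    """Area (m^2) of a square window of window_px x window_px pixels."""
    side_m = pixel_size_m * window_px
    return side_m * side_m


for pixel_size in (4, 9):
    area = min_detectable_roof_area(pixel_size)
    print(f"{pixel_size} m pixels -> roofs of at least {area} m^2")
```

Under this rule, moving from 4 m to 9 m pixels raises the smallest reliably detectable roof from 144 m² to 729 m², which is consistent with the reported sensitivity of the omission error to spatial resolution.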
Another issue with HSI in early studies was that the single use of spectral data was not practical for discriminating ACM roofs with complex geometry. Other than flat roofs, numerous forms of ACM roofs (e.g., vaulted and pitched typologies) have a geometry that reflects complex patterns captured by sensors. With particular attention to the various forms, Fiumi et al. (2014) [21] utilised MIVIS images with 3 m spatial resolution and 102 spectral channels to investigate the mapping of AC roofs over the Lazio Region in Italy. The results indicated that the SAM classifier alone failed to accurately discriminate these ACM roofs, and even after an additional process using the minimum bounding algorithm, only 63% of vaulted ACM roofs and 71% of pitched ones were classified accurately. However, the use of HSI for these roof typologies has received little attention in ACM roof mapping studies, despite the low accuracies that result from the complex relationship between a light source and roof-covering geometry [21].
The mixed spectral reflectance caused by complex geometry shapes was also reported as a source of error. Szabo et al. (2014) [23] used AISA Eagle II images with a spatial resolution of 1 m and 126 spectral bands over a 7 km² area of Debrecen, East Hungary. The original HSI dataset was denoised using the maximum noise fraction (MNF) transformation, and a new dataset was created. The original HSI and the MNF-transformed dataset were classified by the SAM, support vector machine (SVM), and maximum likelihood (ML) algorithms. While the classification details of ACM roofs were not provided in the study, SVM performed best for both datasets with an OA of around 80% [23].
With the advancement of hyperspectral sensors, several studies have reported significant improvement in the classification accuracy of ACM roofs. Recent aerial sensors can provide both high spectral and high spatial resolution. Ref. [24] worked on two subsets of HSI by combining a data mining (DM) method with object-based image analysis (OBIA) to classify the urban surface materials of the study areas. One of the datasets provided 20 spectral bands with a 1 m spatial resolution, and the other provided 102 spectral bands with 0.68 m pixel sizes. An OA of 93.42% was achieved by one of the subsets with a PA and UA of 100% for asbestos roofs. Cilia et al. (2015) [13] also reported acceptable results from deploying MIVIS images with 102 channels classified by the SAM algorithm. An average UA of 89% and PA of 86% were achieved for the ACM class, which indicated a reliable result of using HSI in roof mapping studies [11].
However, despite the advantages of using HSI, some limiting factors for accurately classifying ACM roofs exist. The high dimensionality of hyperspectral data can adversely affect the accuracy of the classification process. Traditional classification algorithms generally do not perform well due to the massive volume of spectral data [52]. The intrinsic dimensionality of the original data must usually be compressed, and the maximum amount of information should be preserved for classification. This is due to the traditional algorithms' limited capabilities to perform in high-dimensional spaces. In particular, the selection of bands, wavelengths, and an optimal resolution of hyperspectral data is a challenging task in the classification process [28,29]. Moreover, the scene features could cause challenges. The mixed materials of roof coverings or the near homogeneity of spectral data [32], different materials in parts of the roofs, and different colours of aged roofs [16] could cause spectral confusion in the classification of roofs [13]. Given the issues with hyperspectral data, other studies have investigated different remote sensing paradigms for ACM roof mapping.

Multispectral Imagery (MSI)
Several studies have indicated the possibility of using MSI for mapping ACM roofs. MSI commonly comprises remote sensing images with 2 to 13 spectral bands (e.g., Sentinel-2 provides 13 bands). The number of bands is the main difference between MSI and HSI, despite their similar 3D structure [49]. MSI acquired by spaceborne sensors provides a broader coverage area [11] at a lower cost [10] than airborne hyperspectral sensors, which have limited coverage and availability. Furthermore, although HSI usually provides both higher spatial and higher spectral resolution [11], the use of MSI could address the problems with HSI, such as high dimensionality and the noise caused by the large number of bands [27]. Hence, many ACM roof mapping studies have focused on MSI and its fusion with other remote sensing data. Table 2 provides a summary of studies that adopted MSI in ACM roof mapping.
Table 2. Recovered entries (study purpose and reference): mapping roofing materials, including one ACM class [27,31]; mapping impervious and pervious surfaces, including one ACM class [29]; mapping intra-urban land cover, including three ACM classes [28]; eight bands, fusion with LiDAR: mapping intra-urban land cover, including one ACM class [30]; WV-: mapping intra-urban land cover, including three ACM classes [26].
In the reviewed studies of ACM roof mapping, very-high-resolution MSI from spaceborne sensors is mostly used as the input data. Among the MSI categories based on spatial resolution (i.e., coarse, medium, and high spatial resolution data) [13], images with a pixel size coarser than 5 m are usually inappropriate for urban studies due to the problem of mixed pixels [56]. When the pixel size is larger than a roof unit, the signal of the ACM roof mixes with that of the surrounding roof units or other objects, causing salt-and-pepper errors [57].
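The mixed-pixel problem can be illustrated with a simple linear mixing model, in which the spectrum recorded for a pixel is the area-weighted average of the materials it covers. The linear model, the function name `linear_mixture`, and the two-band reflectance values below are assumptions made for illustration only; they do not come from the reviewed studies.

```python
def linear_mixture(components):
    """Area-weighted mixture of spectra.

    components: list of (area_fraction, spectrum) pairs whose fractions
    are assumed to sum to 1.
    """
    total = sum(fraction for fraction, _ in components)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    n_bands = len(components[0][1])
    return [
        sum(fraction * spectrum[band] for fraction, spectrum in components)
        for band in range(n_bands)
    ]


# Hypothetical reflectances in two bands.
acm_roof = [0.20, 0.30]
surroundings = [0.10, 0.50]

# A 3 m x 3 m roof inside a 5 m x 5 m pixel covers 9/25 of the pixel area.
roof_fraction = 9 / 25
mixed = linear_mixture([(roof_fraction, acm_roof),
                        (1 - roof_fraction, surroundings)])
print([round(value, 3) for value in mixed])
```

In this example the recorded spectrum lies closer to the surroundings than to the roof, so a purely spectral classifier would be unlikely to label the pixel as ACM.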
WorldView-2 (WV-2) images with eight multispectral bands (2 m spatial resolution) and a panchromatic band (0.5 m resolution) are considered to be appropriate input data for ACM roof mapping [10,27–31]. Moreover, since the launch of the WorldView-3 (WV-3) satellite in 2014, images with finer resolution (eight bands with a 1.24 m resolution and a 0.3 m-resolution panchromatic band) have been used in the studies of ACM roof mapping [32].
Multispectral data from aerial imagery have also been used for ACM roof mapping. While the input data used in the literature stemmed primarily from satellite sensors, airborne MSI provides higher resolution than satellite images. However, aerial images may not be available for the areas under study and have limited coverage compared with images acquired from satellite sensors. Krówczyńska et al. (2020) [12] used two sets of aerial images in natural colour (RGB) and colour infrared (CIR) to identify AC roofs in Chęciny, Poland. As one of the few studies of ACM roof mapping to do so, the work investigated a deep learning algorithm, namely the convolutional neural network (CNN). The classification results revealed a PA of 89% and an OA of more than 87% for both datasets. However, although the proposed method yielded acceptable results for rural areas, it may not successfully classify roofs in dense urban areas. Furthermore, the trained database may not be transferable to other study areas, and studying different areas required the creation of newly trained databases.
Attempts have been made to achieve the autonomous identification of ACM roofs using MSI. Tommasini et al. (2019) [11] worked on autonomous identification of ACM roofs over a metropolitan area in Prato (Italy) using a tool in QGIS software. WV-3 images were used as the input data, and the classification result was reasonably good, with only some false positives and negatives. Although this study illustrated a proper tool for automatically identifying asbestos, preprocessing images (i.e., pan sharpening) required a non-automatic process. The preprocessing of MSI, mostly pan sharpening, is a significant factor in improving the resolution of images by combining required bands.

Pan Sharpening of Satellite Imagery
For extracting land-cover information (e.g., roofs with ACM coverings), spatial resolution is considered a key factor. Images with finer spatial resolutions are more valuable than those with a higher spectral resolution [11]. The pan sharpening method is commonly applied to improve the spatial resolution of multispectral bands by combining them with panchromatic (PAN) images. The reason is that the spatial resolution of multispectral images is lower than that of panchromatic images of the same scenes acquired by the same satellite sensor [61,62]. Therefore, while multispectral bands provide the required spectral content for discriminating roof covers, their fusion with PAN images of the same scenes improves the spatial resolution of the input dataset, resulting in a more accurate description of roof textures and shapes [63].
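To make the idea concrete, the sketch below implements a Brovey-style ratio sharpening, in which each upsampled multispectral band is scaled by the ratio of the PAN value to the mean multispectral intensity. This is a deliberately simple stand-in chosen for illustration and is not the method of any reviewed study; the function names, the nearest-neighbour upsampling, and the toy values are all assumptions.

```python
def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling of a 2D band by an integer factor."""
    rows, cols = len(band), len(band[0])
    return [
        [band[r // factor][c // factor] for c in range(cols * factor)]
        for r in range(rows * factor)
    ]


def brovey_pansharpen(ms_bands, pan, factor):
    """Brovey-style sharpening (mean-intensity variant).

    ms_bands: list of 2D low-resolution bands.
    pan: 2D panchromatic band, `factor` times finer in each direction.
    Returns one sharpened 2D band per input band, on the PAN grid.
    """
    upsampled = [upsample_nearest(band, factor) for band in ms_bands]
    rows, cols = len(pan), len(pan[0])
    sharpened = [[[0.0] * cols for _ in range(rows)] for _ in ms_bands]
    for r in range(rows):
        for c in range(cols):
            values = [band[r][c] for band in upsampled]
            intensity = sum(values) / len(values)
            # Scale every band by the same ratio so band proportions
            # (the spectral shape) are preserved while PAN detail is injected.
            ratio = pan[r][c] / intensity if intensity else 0.0
            for k, value in enumerate(values):
                sharpened[k][r][c] = value * ratio
    return sharpened


# Toy example: one low-resolution pixel in three bands, and a 2 x 2 PAN patch.
ms_bands = [[[0.2]], [[0.4]], [[0.6]]]
pan = [[0.3, 0.5], [0.4, 0.4]]
result = brovey_pansharpen(ms_bands, pan, factor=2)
print([[round(v, 3) for v in row] for row in result[0]])
```

For WV-2 data, the corresponding factor would be 4 (2 m multispectral bands against a 0.5 m PAN band). Ratio-based methods of this kind can distort radiometry, which is one reason component-substitution methods are often preferred in practice.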
The pan sharpening method has been widely used to improve the classification accuracy of ACM roof classes in studies of roofing material identification. Abriha et al. (2018) [31] conducted a study on WV-2 to identify roof materials and investigate the potential efficiency of pan sharpening. By adopting pan sharpening with the Gram-Schmidt method, the panchromatic band (450-800 nm) with a 0.5 m resolution was fused with the lower-resolution multispectral bands with a 2 m geometric resolution. The roof materials were divided into three classes (asbestos, brown tile, and red tile) and, after the data were further divided into shadowed and sunny roof parts, into six classes; the data were then classified by discriminant function analysis (DFA) and random forest (RF) classifiers. The results revealed that while the OA was above 85%, asbestos was classified with more than 95% accuracy and identified successfully with all classifiers.
On the other hand, the classifier adopted for pan-sharpened images plays a significant role in increasing the classification accuracy. While the accuracy of classified ACM roofs increases with pan sharpening, some image processing techniques and classification algorithms work more efficiently with pan-sharpened images [64]. To evaluate the accuracy results of different classifiers on pan-sharpened WV-2 images, Gibril et al. (2017) [10] employed both supervised classifiers and a rule set combined with the Taguchi optimisation technique. The supervised classifiers showed some misclassifications between ACM roofs and spectrally similar materials, although some classifiers provided acceptable results. In contrast, combining the rule-based OBIA method with the Taguchi technique showed a high overall accuracy of around 90% and 93% for two pan-sharpened WV-2 datasets.

Fusion of MSI with Light Detection and Ranging (LiDAR)
The fusion of MSI with 3D point cloud data from LiDAR sensors has shown significant enhancement in ACM roof classification results [30,32]. LiDAR provides 3D topographic information of surveyed areas on the Earth's surface. These sensors do not provide information on the lighting colour and intensity of the surface, but the depth measurement offers information to extract 3D building information and the features of roof shapes. In fact, LiDAR sensors are insensitive to lighting conditions and are unaffected by shadows or poor lighting intensity [65]. While 2D planimetric multispectral data could be affected by factors such as poor contrast, skewed image perspectives, and shadows [66], the fusion of MSI with LiDAR data could not only provide 3D measurements of ACM roof shapes, but could also address the lighting condition problems affecting 2D data.
Using multisource images, moreover, could allow the details of roof surfaces to be assessed. Norman et al. [32] utilised the combination of WV-3 with LiDAR data to map detailed roof surfaces and evaluate the condition of the covering materials. The results reported a high OA of roof detection and analysed the degradation status of detected roofs. While the layer stacking method for the combination of WV-3 and LiDAR data resulted in an OA of 87%, another tested fusion method, namely principal components spectral sharpening, was not efficient enough, resulting in only 43% OA. Hence, while multisource images could enhance the mapping performance, the fusion method used in the process could significantly affect the results.
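Layer stacking, in its simplest form, appends co-registered layers so that every pixel carries one feature vector combining spectral and height information. The sketch below assumes that the layers have already been resampled to a common grid; the function name `layer_stack`, the band names, and the values are hypothetical.

```python
def layer_stack(*layers):
    """Stack co-registered 2D layers into per-pixel feature vectors.

    Every layer must have the same number of rows and columns.
    Returns a rows x cols grid of lists, one value per input layer.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same grid")
    return [
        [[layer[r][c] for layer in layers] for c in range(cols)]
        for r in range(rows)
    ]


# Two pixels that look identical spectrally but differ in LiDAR-derived height.
red = [[0.30, 0.30]]
nir = [[0.50, 0.50]]
height_m = [[3.0, 9.5]]

stacked = layer_stack(red, nir, height_m)
print(stacked[0][0], stacked[0][1])
```

The added height value gives a classifier a feature that separates the two pixels, which spectral bands alone could not do in this toy case.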
Furthermore, the algorithms and techniques adopted in the classification phase are key factors in improving ACM roof mapping accuracy. Ref. [30] studied the combination of WV-2 and LiDAR for detailed urban mapping. The study used different pixel-based classifiers and developed a ruleset to evaluate the results of the classification methods. The pixel-based classifiers provided lower-accuracy results than rule-based classification methods with the same multisource images of the studied area. Furthermore, while pixel-based classifiers resulted in mixed pixels and a salt-and-pepper effect, using fused images significantly improved classification results by decreasing the spectral variation and spatial heterogeneities of intra-urban classes. This indicates that while the input data and image preprocessing method are important, the accuracy of the mapping process also depends on the classification methods and techniques.

Classification Methods of ACM Roof Mapping
This section reviews the image processing tasks performed to map ACM roofs. The targeted studies are categorised based on the adopted image classification approaches and critically reviewed in terms of deployed classifiers and their accuracy assessment results.
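The accuracy measures reported throughout this section, namely overall accuracy (OA), producer's accuracy (PA), and user's accuracy (UA), are conventionally derived from a confusion matrix. The sketch below shows the standard calculation. The matrix counts and class names are hypothetical and do not correspond to any reviewed study.

```python
def accuracy_measures(matrix, labels):
    """Compute OA, PA and UA from a confusion matrix.

    matrix[i][j] is the number of samples whose reference class is
    labels[i] and whose mapped class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    overall = correct / total
    producers, users = {}, {}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])              # row sum
        mapped_total = sum(row[i] for row in matrix)  # column sum
        # PA = 1 - omission error; UA = 1 - commission error.
        producers[label] = matrix[i][i] / reference_total
        users[label] = matrix[i][i] / mapped_total
    return overall, producers, users


# Hypothetical counts: rows are reference classes, columns are mapped classes.
labels = ["acm_roof", "other"]
matrix = [[45, 5],
          [3, 47]]
oa, pa, ua = accuracy_measures(matrix, labels)
```

With these illustrative counts, 45 of the 50 reference ACM roofs are mapped correctly (PA of 90%), while 45 of the 48 objects mapped as ACM roofs are correct (UA of 93.75%).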

Pixel-Based Image Analysis (PBIA) Approach
Early studies on mapping ACM roofs adopted the PBIA approach [13,20–23], which relies on a basic image processing concept developed in the 1970s [38]. Pixel-based methods analyse the spectral information of individual pixels. In traditional PBIA methods, each pixel is assigned to a predefined class by comparing the similarity of its spectral values (i.e., its spectral signature) with those of the class. In particular, RSI is classified by conducting spectral pattern recognition, a category of image classification that uses pixel-by-pixel spectral features as the basis of image processing [39], wherein no spatial patterns or contextual information are employed in the classification process.
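As an illustration of per-pixel spectral matching, the sketch below implements the general spectral angle idea behind SAM-type classifiers: a pixel is assigned to the class whose reference signature forms the smallest angle with the pixel spectrum. The reference signatures, the angle threshold, and the function names are hypothetical, and the sketch is not the implementation used in any reviewed study.

```python
import math


def spectral_angle(pixel, reference):
    """Angle (radians) between two non-zero spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))  # guard rounding
    return math.acos(cosine)


def classify_pixel(pixel, signatures, max_angle=0.10):
    """Label a pixel with the closest signature, or leave it unclassified."""
    best_label, best_angle = None, float("inf")
    for label, reference in signatures.items():
        angle = spectral_angle(pixel, reference)
        if angle < best_angle:
            best_label, best_angle = label, angle
    return best_label if best_angle <= max_angle else "unclassified"


# Hypothetical four-band reference signatures (reflectance values).
signatures = {
    "acm_roof": [0.20, 0.25, 0.30, 0.35],
    "vegetation": [0.05, 0.08, 0.04, 0.50],
}
bright_roof = classify_pixel([0.40, 0.50, 0.60, 0.70], signatures)
grass = classify_pixel([0.06, 0.09, 0.05, 0.55], signatures)
unknown = classify_pixel([0.50, 0.10, 0.50, 0.10], signatures)
```

Because the angle ignores vector length, the brighter roof pixel still matches the roof signature. No neighbouring pixel is consulted at any point, which is the limitation of PBIA discussed in this section.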
Selecting a suitable classifier is a challenging task in per-pixel spectral-based image classification [67]. Uncertainty often exists in identifying the appropriate classification algorithm because multiple factors, such as the type of input data and the size and quality of the training data, should be considered when comparing the available methods [68]. ACM roof mapping studies have thus used different algorithms for the pixel-based classification of images. While several studies have utilised the SAM classifier [20–22], others have obtained the best results from classifiers such as DFA and RF [13,31]. Moreover, few studies have attempted to classify the same subset of input data by using different classifiers. The study in [69] claimed that SVM is a suitable classifier for classifying impervious surfaces, while the SAM algorithm exhibits a better performance when an adequate amount of training data is available. Therefore, it is difficult to define a fit-for-all algorithm for classifying ACM roofs. Table 3 summarises the PBIA studies of ACM roof mapping.
As explained in Section 3.1, the pixel size of the images in early studies was not fine enough for one to identify roof units in an image [22,23]. As indicated by [70], a 4 m wide object requires a minimum spatial resolution of 2 × 2 m (i.e., a minimum of four pixels is required). Consequently, the pixel size must be no larger than half the size of the smallest roof unit. However, roof units rarely align with the pixel grid in practice [71]. For example, an object with a 4 m length and width is unlikely to fit perfectly over four pixels with a 2 m spatial resolution [72]. Frassy et al. (2014) [22] indicated that 3 × 3 image pixels are a threshold for optimal detection of ACM roof units. Considering this assumption, when the remotely sensed image has a 4 m spatial resolution, only roofs larger than 144 m² will be detected with reasonable confidence, as illustrated in Figure 3.
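The 144 m² figure follows directly from the 3 × 3 pixel threshold: three pixels of 4 m give a 12 m side, and 12 m × 12 m = 144 m². The short sketch below reproduces this arithmetic for other pixel sizes. The function name is hypothetical, and square roofs aligned with the pixel grid are assumed.

```python
def min_detectable_roof_area(pixel_size_m, pixels_per_side=3):
    """Smallest roof area (m^2) covered by pixels_per_side x pixels_per_side pixels."""
    side_m = pixel_size_m * pixels_per_side
    return side_m * side_m


area_4m = min_detectable_roof_area(4)      # 3 x 4 m = 12 m side
area_2m = min_detectable_roof_area(2)      # 3 x 2 m = 6 m side
area_50cm = min_detectable_roof_area(0.5)  # 3 x 0.5 m = 1.5 m side
```

Under the same assumption, a 2 m pixel lowers the threshold to 36 m² and a 0.5 m pixel to 2.25 m².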
On the other hand, with the improvement of RSI resolution, the per-pixel spectral-based method was no longer effective for classifying ACM roofs. Several features on the Earth's surface are composed of materials that may show similar spectral responses when using high-resolution data [72]. By deploying very high-resolution RSI, the spectral responses of some roofing classes in urban areas showed complex patterns [73]. This affects the accuracy results of image classification because of the increasing intra-class variability of classes. The use of the PBIA approach with traditional classifiers led to unreliable classification results because it works on only spectral information without considering the spatial patterns of roofs [74]. Hence, due to issues such as confusion among classes, adopting traditional pixel-based classifiers may not be efficient for classifying very high-resolution images [30].

Object-Based Image Analysis (OBIA) Approach
Taking into account the limitations associated with PBIA (refer to Section 4.1), object-based image analysis (OBIA) has emerged as a more sophisticated approach, classifying images with a high spatial resolution by analysing both spatial and spectral features of RSI [75]. In OBIA methods, images are segmented at multiple scales, and the resulting segments, rather than pixels, serve as the analysis units [76–78]. Segments (also known as objects) are contiguous homogeneous pixels grouped by similar attributes. Once objects are created, segment attributes are computed, and a number of rules are developed for classifying features. These rules assist the classification process using attributes such as texture, size, and geometry (area and length) [26]. Several ACM roof mapping studies utilised OBIA methods using different RSI data. Table 4 summarises the studies that have used OBIA methods.

Image Segmentation in the OBIA Process
OBIA methods typically comprise two main steps: image segmentation and classification [79,80]. OBIA starts with image segmentation, which critically affects the accuracy of the subsequent feature engineering and classification steps [81,82]. In the image segmentation process, images are divided into homogeneous patches [83]. These patches represent the Earth's surface features, such as roofs, roads, and grasslands [84]. Studies of ACM roof mapping have often employed two segmentation techniques: (1) edge based and (2) region based. While edge-based and region-based techniques theoretically represent the same object in different ways, edge-based methods could lead to different results than region-based methods [85,86].
Among the object-based ACM roof classification literature, several studies have implemented an edge-based segmentation approach using a multiscale segmentation algorithm featured as an ENVI feature extraction tool [24,27–30]. In edge-based techniques, the edges are first identified and then connected to one another using contouring algorithms [87–90] with the assumption that the pixel features abruptly change between edges [91]. From this perspective, edges are defined as boundaries between objects and where changes occur [92]. A multiscale edge-based segmentation [93] generates objects based on having similar attributes such as textural, spatial, and spectral index indicators. An example of multiscale edge-based segmentation of HSI is shown in Figure 4.
Other object-based ACM roof classification studies have adopted region-based methods as a suitable way to create roof segments [10,26,32]. In region-based methods, the determination of segments starts from the inside of the objects and then expands outward until the value of neighbouring pixels changes [94,95]. Moreover, it is assumed that the neighbouring pixels within the same region have similar values [43], which is the opposite of the assumption made in edge-based methods [96]. Multiresolution segmentation (MRS) is a region-growing algorithm that has been widely used in remote sensing applications [41,80,97]. This algorithm considers a pixel as a seed that grows until it produces meaningful objects based on predefined local thresholds of scale, shape, and compactness parameters [44,98].
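The region-growing principle can be illustrated with a deliberately simplified sketch, in which a seed pixel absorbs 4-connected neighbours whose values stay within a fixed threshold of the seed value. This is not the MRS algorithm, which merges objects according to scale, shape, and compactness criteria. The function name, the threshold, and the pixel values are hypothetical.

```python
from collections import deque


def grow_region(image, seed, threshold):
    """Return the set of (row, col) pixels grown from a seed pixel.

    A neighbour joins the region when its value differs from the seed
    value by at most `threshold` (a simplification of real criteria).
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical single-band image with two homogeneous patches and one outlier.
image = [
    [10, 11, 50, 52],
    [12, 10, 51, 49],
    [11, 12, 50, 90],
]
left_patch = grow_region(image, (0, 0), threshold=5)
right_patch = grow_region(image, (0, 2), threshold=5)
```

Growth stops where neighbouring values change sharply, so the pixel with a value of 90 is excluded from both patches. A larger threshold would merge the patches, which mirrors the under-segmentation caused by high scale values.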
The segmentation scale is a significant aspect of the RSI segmentation process. Large image objects caused by high-scale values (under-segmentation) and small objects produced by low-scale values (over-segmentation) make it challenging to generate meaningful objects effectively [47]. For MRS, the segmentation scale is even more significant, as it determines the size and shapes of the objects affecting the classification results [99–102]. It is therefore crucial to identify the appropriate segmentation scale for generating meaningful objects [103].
Several methods have been used to optimise segmentation scale parameters [104,105]. In ACM roof-mapping studies, integrative image segmentation optimisation [47] and the robust statistical Taguchi method [10] have been used to optimise MRS parameters. However, a number of ACM studies have adopted a trial-and-error approach, in which the segmentation parameters are determined based on analysts' experience, which may not be reliable [106]. In another paradigm of segmentation approaches, some studies suggested that combining segmentation with classification effectively addresses the challenges of selecting optimal parameters for segmentation algorithms [41]. In this approach, while the object parameters may not be optimised, adding a segmentation step in the classification process addresses issues such as over-segmentation [42,107].

Image Classification in the OBIA Process
Once the objects have been generated and their features (i.e., object characteristics) computed, the classification step is performed in the OBIA methods. In the ACM roof identification studies, three classification approaches are often adopted to assign a class label to objects: supervised [10,27,29,32], supervised rule-based [24,26], and expert systems [28,30]. The first approach is the supervised classification approach, also known as nearest neighbour classifiers [72], which utilises training data to assign the segmented objects to the classes. The second approach is a form of rule-based classification, also known as membership function classifiers (providing fuzzy or crisp membership functions), which utilise a ruleset to extract the features. The crisp function computes the rules from a training sample, also known as a supervised ruleset. The third approach uses a ruleset to assign labels to roofs, which relies on an analyst's knowledge. Some studies adopted multiple methods to compare their efficiency [108].

Supervised Classification
Supervised classification utilises training data to assign segmented objects to roofing classes, which can be defined by any existing data, such as field surveys, laboratory measurements, and libraries of RSI with high spatial resolution [109]. The ease of use, high degree of accuracy [110], and features offered by various software have led a vast number of studies to adopt the supervised approach in applications such as roofing material classification. The supervised classification approach is a dominant and widely used trend in OBIA studies [111]. However, the studies that adopted supervised classifiers reported several issues in workflow steps [112], such as sample selection [113], feature selection [114], and accuracy assessment [115].
Reliance on training data in supervised classification causes major problems that make this method unsuitable for universal use. The size and representativeness of training data require the consideration of different factors, such as the spatial resolution of input data, the availability of ground references, and the geophysical situation of study areas [116]. Sample collection strategies could significantly affect the classification accuracy results [117]. Moreover, gathering sufficient training data could be challenging when the study areas are complex and heterogeneous. Therefore, poorly defined or insufficient training samples could significantly affect the accuracy of the classification [118,119]. Furthermore, the training data make the supervised technique less transferable to other images [65,120]. The classification of wider-scale and multi-image areas requires configuration or a new set of training samples due to the different inherent attributes among images [121], while the collection of the training data could be costly in terms of time and money [122]. This is a limitation of autonomous ACM roof classification over wide-scale study sites.

Supervised Rule-Based Classification
Generally, rule-based systems in the OBIA image classification methods adopt "If . . . Then . . . Else" threshold rules to assign classes (e.g., ACM roof) to segmented pixels [123]. In supervised rule-based classification methods, rules are autonomously developed from training data to determine the relationships between RSI and the defined classes. In ACM roof studies, decision trees (DTs) are widely used for autonomously generating rulesets for roof classification [124]. In this method, training data are divided into homogeneous subsets based on the threshold values of image features. By defining subset variances, subsequent subsets are generated in a hierarchical structure until they reach the predefined classification tree levels [125].
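To illustrate how a threshold rule can be derived from training data, the sketch below finds the single feature threshold that yields the most homogeneous subsets, which corresponds to one level of a decision tree. Gini impurity is used here as the homogeneity measure, which is an assumption of this sketch rather than the criterion of any reviewed study. The object features, their values, and the class labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means fully homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels):
    """Return (feature, threshold, weighted impurity) of the best single split."""
    best = None
    n = len(samples)
    for feature in samples[0]:
        values = sorted({s[feature] for s in samples})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # midpoint between adjacent values
            left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical training objects described by two segment attributes.
samples = [
    {"brightness": 0.30, "area_m2": 120},
    {"brightness": 0.35, "area_m2": 300},
    {"brightness": 0.32, "area_m2": 80},
    {"brightness": 0.60, "area_m2": 150},
    {"brightness": 0.70, "area_m2": 90},
    {"brightness": 0.65, "area_m2": 310},
]
labels = ["acm_roof", "acm_roof", "acm_roof", "other", "other", "other"]

feature, threshold, impurity = best_split(samples, labels)
below = Counter(
    y for s, y in zip(samples, labels) if s[feature] <= threshold
).most_common(1)[0][0]
rule = f"IF {feature} <= {threshold:.3f} THEN {below} ELSE not {below}"
```

A full decision tree would repeat this search within each subset until a stopping condition, such as a maximum tree depth, is reached. The resulting rule remains readable by the analyst.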
A significant advantage of rule-based methods is the transparency of the feature engineering and classification procedure because the attributes and rulesets are represented [108,126,127]. Unlike "black-box" supervised methods, the analyst can measure data on different scales and interpret the ruleset data [128]. Moreover, DTs make no assumptions regarding the frequency distributions of image classes, and classification performance can be improved using boosting techniques [129]. Hence, the supervised rule-based OBIA may be a more appropriate method than the supervised OBIA due to the transparency of image processing steps.
Successful applications of DT methods are reported in ACM roof studies. Pinho et al. (2012) [26] adopted DT to classify urban feature classes using IKONOS images. ACM roofs were defined in three categories: dark, medium-toned, and light asbestos. The study found that while light asbestos roofs gained acceptable accuracy with a PA of 90% and a UA of 94%, dark and medium-toned asbestos roofs had lower classification accuracies. In this study, while the method's transferability was not tested in more study areas, DT significantly improved image classification accuracy, achieving an OA of 71.91% [130].
In another study [24], two sets of AISA imagery and the C4.5 data mining algorithm were combined with DT to develop a knowledge model of HSI. The DT method identified relevant attributes with a high classification capability based on the training images. The DT learning algorithm used in this study offered an autonomous feature selection capability to develop the classification ruleset. The results of the HSI study demonstrated high accuracy for asbestos roofs, with an OA between 88% and 93% and an asbestos PA of 100% and 84% for the two sets of images, respectively [24]. This method is suitable for autonomously developing rulesets from training data with several attributes and image segments, addressing the high dimensionality of HSI.

Fuzzy Rule-Based Classification (Expert System)
The ruleset in an expert system is based on the analyst's knowledge and reasoning to extract the features [131]. In an expert system, the analyst determines various parameters such as spectral, spatial, and contextual attributes to create classification rules [10]. In the fuzzy rule-based methods, fuzzy logic is utilised as an effective tool for generating ruleset parameters, and based on the knowledge of the analyst, the attributes are selected and examined to create the classification ruleset. This approach has several advantages compared to supervised and supervised rule-based OBIA approaches. The ruleset data are presented in a logical and flexible structure, and the modular arrangement of the ruleset allows classification to be enhanced via easy updates and alterations [123].
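A fuzzy rule of this kind can be sketched with trapezoidal membership functions combined by a fuzzy AND (the minimum operator). The attributes, breakpoints, and rule below are hypothetical placeholders for values that an analyst would choose, and they are not taken from the rulesets of the reviewed studies.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership, assuming a < b <= c < d.

    Membership rises from 0 at a to 1 at b, stays at 1 until c,
    and falls back to 0 at d.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def acm_roof_membership(obj):
    """Degree to which an object satisfies a hypothetical two-attribute rule."""
    brightness_ok = trapezoid(obj["brightness"], 0.2, 0.3, 0.4, 0.5)
    shape_ok = trapezoid(obj["rectangularity"], 0.5, 0.7, 1.0, 1.1)
    return min(brightness_ok, shape_ok)  # fuzzy AND


typical_roof = acm_roof_membership({"brightness": 0.35, "rectangularity": 0.8})
borderline = acm_roof_membership({"brightness": 0.25, "rectangularity": 0.8})
too_bright = acm_roof_membership({"brightness": 0.60, "rectangularity": 0.8})
```

Unlike a crisp threshold, the borderline object receives a partial membership of about 0.5 rather than being rejected outright. The modular structure means that a single breakpoint can be adjusted without rebuilding the whole ruleset.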
Studies of intra-urban mapping with ACM classes showed that expert systems often outperform supervised classifiers [10,28–30,127]. Gibril et al. (2018) [10] utilised two subsets of WV-2 to map AC roofs by using an expert-developed ruleset (see Figure 5) and the supervised classifiers SVM, RF, k-NN, and Bayes. While the overall accuracy of the supervised classifiers was acceptable, the results showed misclassification of surfaces with spectral responses similar to those of ACM roof materials. On the other hand, the ruleset performed better than the supervised classifiers in the classification of ACM roofs for the two subsets of WV-2 images. The study in [30] developed a ruleset considering spatial, elevation, and spectral features to map urban areas by combining WV-2 and LiDAR images. The PBIA classifiers (i.e., SVM and ML) were also used to compare the accuracy results with the rule-based classification. While the OA of the RF and SVM classifiers was 72.46% and 75.69%, respectively, the rule-based classification achieved an OA of 92.84%. Moreover, asbestos roofs gained a higher PA and UA than with the supervised classifiers. However, the transferability of the proposed ruleset was not investigated in other images.
Transferability is a significant advantage of a classification model, which allows acceptable results to be obtained when reusing the model for other image datasets [132]. The OBIA rulesets are often transferable or reproducible, and can be used to classify other RSI datasets with different spatial and spectral features [133]. As mentioned before, the supervised OBIA methods rely on training data that may not be transferable to other study areas [65], whereas developed rulesets in expert systems can be reapplied to other areas [134] and temporal images with or without some manual editing [135]. In studies of ACM roof classification, successful attempts have been made to develop transferable models using fuzzy rule-based methods. Hamedianfar et al. (2015) [28] tested the transferability of fuzzy rulesets on the WV-2 images of three different study areas. The classification rules were developed based on expert knowledge of the spatial features and spectral indexes. Without any changes in the feature thresholds, the rulesets performed well for all datasets, resulting in OAs of more than 86%.

Deep-Learning-Based (DL-Based) Approach
Deep-learning-based (DL-based) methods and techniques have recently been rapidly adopted in several image processing applications [136,137]. By increasing the number of "depths" or "hidden layers" of machine learning methods, these architectures improve the performance and accuracy of the computation process [40]. While OBIA was considered more suitable than PBIA for many years, both approaches had limitations in widespread applications because of issues such as classification errors and imbalances in classes [37]. Deep learning methods enhance the image classification process through efficient and automatic feature extraction from a large number of images, and they reduce classification errors by adopting complex models in regression [138]. Additionally, DL-based methods perform better than PBIA and OBIA methods when handling massive volumes of image data, particularly in supervised classifications with large training datasets [37,49,60]. Hence, several studies have shifted to deep-learning-based methods, such as the pixel-based semantic segmentation of VHR images [139–143].
Despite the promising results of using deep-learning-based methods, they are not extensively explored in ACM roof mapping studies. To date, the only study that took a deep learning approach to ACM roofing identification was conducted by Krówczyńska et al. (2020) [12], who used convolutional neural networks (CNNs) to identify AC roofs. In this study, the RGB and CIR compositions of aerial photographs with 25 cm resolution were classified by a CNN network consisting of two convolutional blocks. ACM roofs were detected with an OA of around 89% and a PA of 89%. While the result of this study may not be conclusive for the successful adoption of the proposed model in dense urban areas [12], using the DL-based method resulted in an acceptable classification accuracy using images with few spectral bands.
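For readers unfamiliar with the term, a convolutional block is commonly understood as a convolution followed by a non-linear activation and a pooling step. The sketch below chains two such blocks on a single-band image in plain Python. The block composition, kernel values, and image size are assumptions made for illustration and do not describe the network architecture of [12].

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, as used in common deep learning libraries."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def conv_block(image, kernel):
    """One block: convolution, activation, then pooling."""
    return max_pool2x2(relu(conv2d(image, kernel)))


# Hypothetical 10 x 10 single-band patch and a fixed 3 x 3 kernel.
patch = [[1.0] * 10 for _ in range(10)]
kernel = [[1.0] * 3 for _ in range(3)]
after_block_1 = conv_block(patch, kernel)          # 10x10 -> 8x8 -> 4x4
after_block_2 = conv_block(after_block_1, kernel)  # 4x4 -> 2x2 -> 1x1
```

In a trained network, the kernel values are learnt from labelled samples rather than fixed, and the output of the final block is passed to a classification layer.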
DL architectures such as fully convolutional networks (FCNs) have shown promising accuracy results in other image classification applications, such as urban object detection and LULC classification [144–147]. Moreover, DL-based techniques have been successfully combined with conventional approaches in image processing [148–150]. DL-based three-dimensional spectral-spatial classification methods such as 3D-CNN (see Figure 6) have shown good performance in handling the high dimensionality of HSI [151]. Mask R-CNN has been widely used for object instance segmentation, which simultaneously generates high-quality segmentation masks and identifies objects in images [45]. Furthermore, the end-to-end deep learning architectures are a new paradigm wherein features are learnt automatically instead of being engineered [49]. Over the last few years, the end-to-end use of deep learning models has been prevalent within the remote sensing community. Despite the challenges arising from model development complexities and computational requirements, the end-to-end DL-based approach is a future avenue in ACM roof mapping.

Discussion
This section presents a summary of the reviewed studies (Section 5.1), discusses the synthesised challenges, and highlights possible solutions identified in both ACM roof mapping studies and other remote sensing applications (Section 5.2). Figure 7 illustrates a tree diagram of the studies categorised according to their RSI, classifiers, and date order. The RSI, classifier, and OA of each study are shown with colour codes for a comparison of their features. This study reviewed sixteen peer-reviewed journal articles investigating ACM roof mapping in a ten-year timeframe from 2012 to 2022. Furthermore, the studies are grouped based on the input RSI and classification methods.

Summary of the Reviewed Studies
In terms of RSI, six papers [13,20–24] used HSI as the input data, whereas ten studies worked on MSI [10–12,26–29,31], including two that combined RSI with LiDAR data [30,32] (see Table 5). The early HSI studies adopted the PBIA approach and often used SAM and SVM algorithms as the classifiers. Relying only on spectral information in traditional PBIA methods caused many issues, such as large OE and salt-and-pepper error. Hence, many recent studies adopted the OBIA approach, which takes contextual factors into account for more accurate classification results. Supervised, supervised rule-based, and fuzzy rule-based methods were investigated for feature engineering and the classification process of ACM roof mapping. Only one paper [12] investigated the potential of deep-learning-based methods for ACM roof classification. While these studies mostly reported acceptable classification results, several challenges affected the mapping process or the classification accuracy. The following section discusses the major challenges synthesised from the reviewed works and highlights solutions adopted in other studies across different applications.

Table 6 presents the synthesis of major challenges in the reviewed ACM roof mapping studies and highlights the opportunities to enhance the procedure. The challenges and their associated opportunities are further discussed in the corresponding paragraphs. As explained in Section 2.2, a systematic assessment approach was not appropriate for comparing the accuracy results, because the reviewed ACM roof mapping studies worked on different study areas with different characteristics. In addition, the mapping processes often differ in details such as image preprocessing, feature extraction, and segmentation optimisation techniques. A valid comparison of the mapping processes would assist in identifying the optimal approaches and opportunities.
Future work is therefore suggested to focus on a particular challenge and its corresponding opportunities while using the same study area and minimising variation in research conditions and mapping workflow. This synthesis aims to provide guidance on the general performance of the reviewed methods, assisting researchers in identifying best practices and areas for improvement in future research.

Table 6 (excerpt). Challenge: high dimensionality [23,24]; RSI: HSI; associated issues: curse of dimensionality [34], inter-class confusions [35], noisy map [36]; opportunities: dimensionality reduction techniques (e.g., SFS and PCA) and OBIA classification [23,152,153].
1.
The low spatial resolution of early HSI was a significant challenge that caused large OEs [13,20–23]. As explained in Section 4.1, a window of 3 × 3 image pixels is required to detect a roof unit with reasonable confidence [72]; consequently, for HSI with a 4 m spatial resolution, traditional pixel-based classifiers could often detect only roof units with an area larger than 144 m² [22]. Although large OEs were reported in early studies of ACM roof mapping using HSI, two studies [13,20] reported high OAs. This was due to the large areas of the classified roof units, because classification accuracy increases with roof area. In the study by Cilia et al. (2014) [13], roofs with an area smaller than 36 m² were excluded from the accuracy assessment, which resulted in a high OA. Hence, the results of the above studies may not be consistent, owing to the high OEs of the asbestos classes. Owing to advances in remote sensors, low spatial resolution is no longer a challenge with current hyperspectral images, as they often provide both high spectral and high spatial resolution. Additionally, for low-resolution HSI, DL-based semantic classification could enhance accuracy by assigning class membership to subpixels instead of a single label per pixel [141,142,145].
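The arithmetic behind the 144 m² figure, and the way a high OA can coexist with a large OE for the asbestos class, can be sketched as follows. This is an illustrative sketch only: the function names and the confusion-matrix counts are hypothetical and are not taken from any of the reviewed studies.

```python
def min_detectable_area(pixel_size_m, window=3):
    """Area (m^2) covered by a window x window block of square pixels."""
    side = window * pixel_size_m
    return side * side


def overall_accuracy(matrix):
    """OA = correctly classified samples / all samples.

    matrix[i][j] counts samples of reference class i labelled as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def omission_error(matrix, cls):
    """OE = share of the reference samples of class `cls` that were missed."""
    row_total = sum(matrix[cls])
    missed = row_total - matrix[cls][cls]
    return missed / row_total


# 4 m HSI and a 3 x 3 pixel window, as in the text: (3 * 4 m)^2.
print(min_detectable_area(4))  # 144

# Hypothetical counts: rows = reference (ACM, non-ACM), columns = predicted.
toy = [[40, 60],
       [10, 890]]
print(overall_accuracy(toy))   # 0.93
print(omission_error(toy, 0))  # 0.6
```

In this toy case the OA is 93% even though 60% of the ACM reference samples are omitted, because the non-ACM class dominates the sample. This is why OA alone can be misleading for a minority class.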

2.
High dimensionality remained a challenge in ACM roof mapping studies, despite the enhanced spatial resolution of recent HSI [23,24]. Using a non-optimal number of bands and features could adversely affect the classification accuracy (i.e., the curse of dimensionality) [34]. Furthermore, the high dimensionality of the data increases image processing time and causes inter-class confusion [35]. A high-dimensional feature space also produces noisy maps because of atypical or mixed pixels in pixel-based methods [36]. Dimensionality reduction techniques such as principal component analysis (PCA) and sequential forward selection (SFS) could significantly enhance the classification accuracy via optimal selection of features [152]. In addition, [24] indicated that integrating a data mining (DT) algorithm with OBIA classification could significantly improve attribute selection and classification accuracy.
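As an illustration of how SFS selects bands, the following minimal sketch greedily adds the band that most improves a classification score and stops when no band helps. It makes several assumptions that do not come from the reviewed studies: the score is the training accuracy of a simple nearest-centroid classifier (a stand-in for whatever criterion a study would use), and the four-band reflectance values are invented for the example.

```python
def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on chosen features."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    hits = 0
    for x, label in zip(X, y):
        def dist(c):
            # Squared Euclidean distance to the centroid of class c.
            return sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c]))
        if min(classes, key=dist) == label:
            hits += 1
    return hits / len(y)


def sequential_forward_selection(X, y, max_features):
    """Greedily add the band that most improves the score; stop when none helps."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {f: centroid_accuracy(X, y, selected + [f]) for f in remaining}
        candidate = max(remaining, key=lambda f: scores[f])
        if scores[candidate] <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best


# Hypothetical samples with four bands; only band index 2 separates the classes.
X = [
    [0.5, 0.2, 0.10, 0.9],
    [0.1, 0.8, 0.15, 0.3],
    [0.9, 0.5, 0.20, 0.6],
    [0.4, 0.3, 0.80, 0.8],
    [0.2, 0.7, 0.85, 0.2],
    [0.8, 0.6, 0.90, 0.5],
]
y = ["acm", "acm", "acm", "other", "other", "other"]

selected, best = sequential_forward_selection(X, y, max_features=3)
print(selected, best)  # [2] 1.0
```

Training accuracy is used here only to keep the sketch short; it is optimistic, and a practical workflow would score each candidate subset on held-out or cross-validated samples.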

3.
On the other hand, while very-high-resolution MSIs have been suitable input data, pixel-based classification methods could not handle the detailed data of fine-resolution MSI [72]. The Earth's surface is a mixture of various natural and artificial materials, so the finer resolution of MSI captures more detail, resulting in complex spectral responses, particularly in heterogeneous urban areas [30]. Using only spectral information in PBIA methods increases intra-class variability, confusion among classes, and salt-and-pepper errors when classifying very-high-resolution MSI [37,154]. Other challenges may arise from the interconnection among image resolutions, bands, and the cost of acquiring very-high-resolution data; new technologies such as drones could reduce the survey costs [29]. As mentioned above, OBIA methods have been an appropriate alternative for classifying very-high-resolution MSI [10,11,26–32]. However, misclassification errors, often connected to variability in roof condition and geometry, were reported as another challenge [27,29,32]. In this regard, [30,32] showed that fusing MSI with LiDAR could not only significantly enhance the classification accuracy but also support the evaluation of ACM roof conditions.

4.
Generating optimised segmentation parameters plays a significant role in achieving acceptable accuracy in OBIA classifications. Over-segmentation produces too many small objects, whereas under-segmentation produces large objects that correspond to mixed classes [157]. A trial-and-error approach was adopted in ACM roof mapping studies [29] to identify the optimal object parameters; however, this segmentation process is subjective and may lead to poor results [155]. In addition, ENVI tools were used for image segmentation [28,30], but no method was adopted to evaluate the segmentation quality. Several segmentation optimisation techniques have been shown to enhance the results, reduce the processing time, and minimise trial efforts [48,158]. The Taguchi optimisation technique, which has been widely used in OBIA studies, showed acceptable results [32,47,130,155]. In other remote sensing applications, DL-based image segmentation methods such as Mask R-CNN have shown good performance [45]. Moreover, it is suggested that combining segmentation with classification [115], by adding a segmentation step to the classification process, could identify more accurate objects [41,42,107].
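The idea of replacing trial and error with a Taguchi design can be sketched as below, using the standard L9 orthogonal array (nine runs for up to four factors at three levels) and the larger-is-better signal-to-noise ratio. The parameter names (scale, shape, compactness), their candidate levels, and the quality function are assumptions made purely for illustration; in practice the quality score would come from a segmentation quality measure computed on real imagery, and the reviewed studies may have configured their designs differently.

```python
import math

# Standard L9 orthogonal array: 9 runs, up to 4 factors, 3 levels (0-based).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]


def sn_larger_is_better(values):
    """Taguchi signal-to-noise ratio (dB) when larger responses are better."""
    return -10 * math.log10(sum(1 / v ** 2 for v in values) / len(values))


def taguchi_best_levels(levels, quality):
    """Run the L9 design and keep, per factor, the level with the highest mean S/N.

    levels:  dict mapping factor name -> three candidate values (max 4 factors)
    quality: callable(**params) returning a positive quality score
    """
    names = list(levels)
    runs = []
    for row in L9:
        params = {n: levels[n][row[i]] for i, n in enumerate(names)}
        runs.append((row, sn_larger_is_better([quality(**params)])))
    best = {}
    for i, n in enumerate(names):
        mean_sn = []
        for lvl in range(3):
            sns = [sn for row, sn in runs if row[i] == lvl]
            mean_sn.append(sum(sns) / len(sns))
        best[n] = levels[n][mean_sn.index(max(mean_sn))]
    return best


def toy_quality(scale, shape, compactness):
    """Hypothetical stand-in for a segmentation quality score (higher is better)."""
    penalty = abs(scale - 50) / 50 + abs(shape - 0.3) + abs(compactness - 0.5)
    return 1 / (1 + penalty)


# Hypothetical candidate levels for three segmentation parameters.
levels = {
    "scale": [25, 50, 100],
    "shape": [0.1, 0.3, 0.5],
    "compactness": [0.3, 0.5, 0.7],
}
print(taguchi_best_levels(levels, toy_quality))
```

The design evaluates nine parameter combinations instead of the 27 required by a full grid over three factors at three levels, which is the source of the reduced trial effort.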

5.
The reliance on training data was the main barrier to the transferability and widespread use of supervised classification methods [111,112]. The size, representativeness, cost, and collection strategy of the samples could significantly affect the classification accuracy [117]. When training samples are gathered from field surveys, the availability of ground references and the geophysical situation of study areas raise further challenges, particularly in complex and heterogeneous urban areas [110,118]. Moreover, because of the differences in inherent attributes among study areas [40], a new dataset usually has to be developed to transfer the model to other areas, which limits autonomous ACM roof classification over wide-scale study sites. While several factors must be considered when defining training data, Mather et al. (2011) [118] suggested that the sample size should preferably be 30 times the number of RSI spectral bands used in classification. Furthermore, advanced nonparametric classifiers such as RF and SVM could perform acceptably when only a small amount of training data is available [46]. Supervised classifications were often outperformed by rule-based methods in the reviewed studies of ACM roof mapping [10,32]. Hence, fuzzy rule-based methods (expert systems) could be an alternative when adequate training data are not available.

6.
Analysts' subjectivity in expert systems could cause biased and error-prone results because the rules are defined based on the analyst's knowledge and reasoning about feature classes [101,105,153,156]. Consequently, rulesets developed by different operators may yield different results. In the ACM roof mapping studies, while [10,28–30] reported acceptable accuracy for rule-based classification, a systematic evaluation of the subjectivity [101] is missing. In this regard, adopting automatic induction (i.e., data mining) methods could be a suitable solution to reduce the effects of the analyst's subjectivity [24,65,120]. Moreover, within an end-to-end deep learning structure, feature extraction is replaced by feature learning as a part of the classifier training phase. In this case, instead of defining the inner steps of the feature engineering phase, the end-to-end architecture generalises model generation, with feature learning as part of it [49]. Hence, DL-based end-to-end architectures have been suggested as a practical approach to replace conventional rule-based OBIA classifications.
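To make the contrast between an analyst-defined rule and an induced rule concrete, the sketch below learns a single threshold rule (a decision stump) from labelled objects rather than having an analyst choose the threshold. It is a deliberately simple stand-in for the data mining methods cited above, not a reproduction of them; the feature (a per-object mean brightness) and all values are hypothetical.

```python
def induce_threshold_rule(values, labels, positive):
    """Find the threshold and direction that best separate `positive` objects.

    Returns (threshold, direction, accuracy). Direction ">=" means objects
    with value >= threshold are labelled positive; "<" means the opposite.
    """
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = (None, ">=", 0.0)
    for t in candidates:
        for direction in (">=", "<"):
            hits = 0
            for v, lab in zip(values, labels):
                predicted_pos = v >= t if direction == ">=" else v < t
                if predicted_pos == (lab == positive):
                    hits += 1
            acc = hits / len(values)
            if acc > best[2]:
                best = (t, direction, acc)
    return best


# Hypothetical per-object mean brightness and reference labels.
values = [0.30, 0.35, 0.40, 0.42, 0.55, 0.60, 0.65, 0.70]
labels = ["acm"] * 4 + ["other"] * 4

threshold, direction, accuracy = induce_threshold_rule(values, labels, "acm")
print(threshold, direction, accuracy)
```

Two analysts might set this threshold at different values by eye and obtain different maps, whereas the induced rule depends only on the labelled samples, so the same samples always give the same rule.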

Conclusions and Future Direction
This paper reviewed the state of the art in ACM roof mapping using remote sensing imagery and machine-learning-based classification. It synthesised the critical challenges of the ACM mapping process in two main contexts: (1) RSI and image preprocessing, and (2) classification methods and techniques. Further, other applications of machine learning and remote sensing were used to inform priorities for future research. Results reported in the literature were provided as examples for each modelling approach. Because the approaches and the ways of analysing results vary widely, a direct comparison of results was not provided. Therefore, this review does not conclude with an optimal method, but rather provides an overview of best practices from the existing literature.
Six major challenges were identified in the reviewed studies: (1) the low spatial resolution of early HSI classified by traditional PBIA; (2) the high dimensionality of HSI, which often causes inter-class confusion; (3) complex patterns of surface materials in VHR MSI classified by traditional PBIA; (4) non-optimised segmentation parameters causing low classification accuracy and mixed objects in the OBIA approach; (5) inadequate training data in supervised methods; and (6) analysts' subjectivity causing biased or error-prone classification results in fuzzy rule-based OBIA (expert systems). By identifying these challenges and integrating them in this review paper, we aim to assist in the research and development of ACM roof mapping models.
Among the reviewed studies of ACM roof mapping, OBIA classification of MSI often produced more accurate results than PBIA methods. The fusion of MSI with LiDAR resolved skewed image perspectives and the complex geometry of roofs. Segmentation optimisation techniques such as the Taguchi technique improved accuracy in OBIA classification. While the use of DL-based classification methods is in its infancy in ACM roof mapping studies, the review of other remote sensing applications has shown several opportunities to enhance the performance of ACM roof mapping. Two future research directions for ACM roof mapping are to explore: (1) deploying DL-based end-to-end semantic classification on aerial imagery; and (2) integrating conventional classification approaches in ACM roof mapping with DL-based architectures such as Mask R-CNN and 3D-CNN.

Data Availability Statement:
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.