A QGIS Tool for Automatically Identifying Asbestos Rooﬁng

: Exposure to asbestos ﬁbers implies a long-term risk for human health; therefore, the development of information systems that are able to detect the extent and status of asbestos over a certain territory has become a priority. This work presents a tool (based on the geographic information system open source software, QGIS) that is conceived for automatically identifying buildings with asbestos rooﬁng. The area under investigation is the metropolitan area around Prato (Italy). The performance analysis of this system was carried out by classifying images that were acquired by the WorldView-3 sensor. These images are available at a low cost when compared with those obtained by means of aerial surveys, and they provide adequate resolution levels for rooﬁng classiﬁcation. The tool, a QGIS plugin, has shown fairly good performance in identifying asbestos rooﬁng, with some false negatives and some false positives when applying a per-pixel classiﬁcation. A performance improvement is obtainable when considering the percentage of asbestos pixels that are contained in each roof of the analyzed image. This value is also available with the plugin. In the future, this tool should make it possible to monitor the asbestos roof removal process over time in the area of interest, in accordance with other image data that give evidence of such removals.


Introduction
Between the 1970s and the 1990s, asbestos was widely used in buildings for its particular characteristics of resistance and its good insulating properties. Specifically, flat sheets and cement boards containing asbestos were used as building materials in roofing, both in industrial and civilian infrastructures. Some years after their installation, such plates tended to release a huge amount of fibers, particles, and fractions of asbestos that were of inhalable size. The main problem lies in the fact that asbestos fibers tend to divide lengthways into thinner fibrils that are small enough to penetrate the lung alveoli, thus causing very serious diseases [1,2].
Despite this danger, and regardless of the ban on using asbestos in any business context, a prohibition that is carried out in many countries, there are still many buildings with asbestos roofs. In Italy, many old buildings still have Eternit cladding (asbestos fibers reinforced with cement material), even though the national law (no. 257 of March 1992) forbids the use of asbestos.
Most people are not aware of the danger, and thus, they may not properly dispose of it during roof renovations. Accordingly, a thorough and exact list of the roofs posing a risk in terms of asbestos pollution is required [3].
Therefore, one priority is to set up and to finalize information systems that are capable of describing the extent and conditions of asbestos-related materials in a given area.

•
Orthophoto: aerial photographs or images that have been geometrically corrected and georeferenced ("orthorectified"), with radiometric resolution in four spectral bands (R,G,B, NIR) and with a spectral coverage of up to~800 nm. Such images are available free of charge at the Geoscope Observatory in Tuscany (http://www502.regione.toscana.it/geoscopio/ortofoto.html). Their low spectral resolution does not make it possible for their use in any asbestos classification. • Vector graphics files, available at the Geoscope Observatory in Tuscany. Once the Geoscope website is accessed, the layer with the information requested can be chosen; for instance, surveyor maps. • High spectral resolution images, acquired by airborne sensors (MIVIS) [14]. They are very expensive if there are specific acquisition procedures. Their broad representative potential is limited by the noise occurring in high-depth bands. • Satellite images acquired by the WorldView-3 sensor working on eight spectral bands. WorldView satellite imagery is often more expensive than RGB aerial imagery if a comparison is made between the archives. In the presented case, no archive aerial imagery is available, and additionally, WorldView images of the area of interest can be purchased at fairly low prices (about 2200 euros) [24].
The need to have multispectral imaging and high spatial resolution [25] while trying to keep costs down, influenced the choices of the images to be used for this research, favoring those that were obtained by the WorldView-3 sensor. This sensor, owned by DigitalGlobe, is mounted on a commercial Earth observation satellite [24]. In August 2014, DigitalGlobe launched this multi-payload, super-spectral, very high spatial resolution satellite. Operating at an altitude of 620 km, WorldView-3 acquires panchromatic imagery at a 0.31 m maximum spatial resolution, eight visible and near-infrared (VNIR) bands at 1.24 m maximum spatial resolution, and eight short-wave infrared (SWIR) bands at 3.7 m [17]. Since these images do not cover the asbestos emissions wavelength, it was possible to detect the fiber-cement roofing by analyzing the corresponding spectrometric signature, namely, by analyzing the typical reflection characteristics of this material to such wavelengths. Therefore, the classification process was adjusted accordingly, so as to fully exploit each and every piece of information that could be obtained from the available satellite images. Figure 1 shows the steps of the classification process, organized in a block diagram. Two phases made up the process. The first non-automatic phase focused on preprocessing the available data, and it included three steps.
1. Pansharpening, the process that enabled a merger of the information collected from satellite imaging, as acquired by the WorldView-3 sensor, in order to create a single image with the resolution of the initial panchromatic image ( Figure 2). The low-resolution image was interpolated with bi-cubic interpolation. In this activity, pansharpening was carried out using a component substitution algorithm. The, RCS (relative component substitution) algorithm was selected among those available, through Orfeo Tool-box [26], a software integrated into QGIS. Table 1 describes the features of the images used for such processing: Two phases made up the process. The first non-automatic phase focused on preprocessing the available data, and it included three steps.

1.
Pansharpening, the process that enabled a merger of the information collected from satellite imaging, as acquired by the WorldView-3 sensor, in order to create a single image with the resolution of the initial panchromatic image ( Figure 2). The low-resolution image was interpolated with bi-cubic interpolation. In this activity, pansharpening was carried out using a component substitution algorithm. The, RCS (relative component substitution) algorithm was selected among those available, through Orfeo Tool-box [26], a software integrated into QGIS. Table 1 describes the features of the images used for such processing: • A panchromatic image with a higher spatial resolution (50 cm); • A multispectral image having eight spectral bands (coastal, blue, green, yellow, red, red edge, NIR-1, NIR-2) and characterized by a spatial resolution of 2 m.
These images were georeferenced in UTM33 WGS84 and orthorectified. They were also calibrated and corrected radiometrically. Our research started in 2016, and it aimed to document the diffusion of asbestos roofs in the Prato area in 2014. For this reason, the selected images were among those acquired in 2014. In fact, from that moment onwards, the authorities of Prato have updated their database by inputting the removal procedures of asbestos-contaminated artifacts by individuals and companies.

2.
Raster image filtering, meaning filtering of the pixel grid, was obtained through a vector mask of the chosen area. The vector mask was something that ensued from the cadastral shape layer, namely, from the surveyor map where all of the buildings that were related to the selected area were described via a vector graphics editor ( Figure 3). The inferred mask only contained information that related to the buildings' roofing. This process enabled the removal of everything but the buildings' roofing from the initial satellite image ( Figure 4). The cropping operation was performed using the corresponding functions offered by QGIS and based on the Gdal libraries.

3.
Partition of the filtered image in tiles, namely sections that had suitable dimensions, so as to carry out the classification as quickly as possible. The partition of the satellite image was carried out with a plugin called GridSplitter, thus obtaining 49 sections, each of them correctly geo-referenced according to the projection of the original image. Such tiles were saved into a single folder containing only and exclusively these files.  Two phases made up the process. The first non-automatic phase focused on preprocessing the available data, and it included three steps.
1. Pansharpening, the process that enabled a merger of the information collected from satellite imaging, as acquired by the WorldView-3 sensor, in order to create a single image with the resolution of the initial panchromatic image ( Figure 2). The low-resolution image was interpolated with bi-cubic interpolation. In this activity, pansharpening was carried out using a component substitution algorithm. The, RCS (relative component substitution) algorithm was selected among those available, through Orfeo Tool-box [26], a software integrated into QGIS. Table 1 describes the features of the images used for such processing: namely, from the surveyor map where all of the buildings that were related to the selected area were described via a vector graphics editor ( Figure 3). The inferred mask only contained information that related to the buildings' roofing. This process enabled the removal of everything but the buildings' roofing from the initial satellite image ( Figure 4). The cropping operation was performed using the corresponding functions offered by QGIS and based on the Gdal libraries.   3. Partition of the filtered image in tiles, namely sections that had suitable dimensions, so as to carry out the classification as quickly as possible. The partition of the satellite image was carried out with a plugin called GridSplitter, thus obtaining 49 sections, each of them correctly geo-referenced according to the projection of the original image. Such tiles were saved into a single folder containing only and exclusively these files.

Classification Procedure
The second phase was the real classification of the cladding available in the area according to the pre-processed image. Digital image classification techniques grouped pixels into classes, to represent the different types of land cover data. There were three main approaches in digital image classification: supervised, unsupervised, and object-oriented. The method applied hereafter was based on supervised classification, which operated a per-pixel classification of the image with some representative samples, the true points, for each and every land cover data class that appeared in the digital image. The choice of a supervised approach was not limiting, because it enabled the subsequent application of an object-oriented classification, if required, as shown further on.
In this case study, there were six selected classes: asbestos cladding, fiber cement cladding, red/red roofing, black/shadow, green, and background. The last class collected pixels from objects that were different from the roofing. These pixels could be incorrectly located on a building roof due to incorrect processing of the image, such as inaccurate georeferencing. The white background pixels of Figure 4 also contributed to this class if the classification process also considered them.
The corresponding six training sets were composed of samples that were collected during a survey campaign carried out in the Prato area. At each site, the geographical position (latitude, longitude, WGS84 datum) of the surface was detected, and a link was associated with the vector record of the building, as provided by the land registry.
The image classification software therefore worked with the training sets to define the kinds of land cover in the entire image. Digital image classification determined the membership to each class according to the "resemblance" of the observed object's radiative properties to the properties defined in the training set. Among the different algorithms operating supervised classifications, the

Classification Procedure
The second phase was the real classification of the cladding available in the area according to the pre-processed image. Digital image classification techniques grouped pixels into classes, to represent the different types of land cover data. There were three main approaches in digital image classification: supervised, unsupervised, and object-oriented. The method applied hereafter was based on supervised classification, which operated a per-pixel classification of the image with some representative samples, the true points, for each and every land cover data class that appeared in the digital image. The choice of a supervised approach was not limiting, because it enabled the subsequent application of an object-oriented classification, if required, as shown further on.
In this case study, there were six selected classes: asbestos cladding, fiber cement cladding, red/red roofing, black/shadow, green, and background. The last class collected pixels from objects that were different from the roofing. These pixels could be incorrectly located on a building roof due to incorrect processing of the image, such as inaccurate georeferencing. The white background pixels of Figure 4 also contributed to this class if the classification process also considered them.
The corresponding six training sets were composed of samples that were collected during a survey campaign carried out in the Prato area. At each site, the geographical position (latitude, longitude, WGS84 datum) of the surface was detected, and a link was associated with the vector record of the building, as provided by the land registry.
The image classification software therefore worked with the training sets to define the kinds of land cover in the entire image. Digital image classification determined the membership to each class according to the "resemblance" of the observed object's radiative properties to the properties defined in the training set. Among the different algorithms operating supervised classifications, the random forest algorithm was selected and applied to the available images. Indeed, this algorithm seemed to be more suitable, because as an ensemble of classifiers, it was capable of classifying each pixel by resorting to its characteristics, such as: its spectral properties in the eight different bands of the satellite sensor, its panchromatic value, and its membership to building class. Moreover, it worked well when homogeneous training data were available, and it was relatively robust to outliers. Last, but not least, implementation of this algorithm ensured a high computational speed [19,[27][28][29]. Within this work, the classification process was totally automatic, and it was implemented with an algorithm that was specifically designed through a QGIS plugin.

The RoofClassify Plugin
The QGIS open source environment enables any data collection coming from different sources (raster, vector, database, and so forth), to be used to build a single project of spatial analysis that is capable of meeting the basic functionalities that a GIS has to cope with. Furthermore, a useful aspect of QGIS is the option of expanding basic functionalities through plugins in Python language, which is simple for any user to understand. This means that Python libraries that are related to both image management and processing can interact with the functionalities that a GIS has (for instance displaying an indexed map in some reference systems). This leads to the possibility of creating a simple application that efficiently copes with a particular problem: the application software has a user interface, and it works with both new and already existing application fields in QGIS, as defined by the user in Python language.
The RoofClassify plugin was developed within the application field of this current paper, and it made it possible to classify a pre-processed satellite image, as already described in Section 2.1.1. The plugin outputs included both the image that had its pixels classified, and an optional shapefile "describing" the outcome, which is to say, a shapefile having all of the pieces of information concerning the number of pixels for each polygon in each class of the image. The percentage of pixels classified as asbestos within each polygon was optionally reported.

Plugin User Interface (UI)
The assumption is based on having the initial image to be classified as something that is organized in a single folder, which includes only the tiles that the image is made of, or the entire image itself.
Being a supervised classification, it is also assumed that the user has a raster with some true points, namely, some buildings' cladding in a definitely known material, which can be used as a classification training set. Each single class is identified by a shapefile, with one different shapefile for each class, thus uniquely identifying a type of cladding that is known in advance. These shapefiles are saved, for convenience, in a single folder containing only such files.
In short, there must be: • The image to be classified, whether it is single or divided into tiles, kept together in a single folder; • A folder containing the shapes corresponding to each identified class; • The image to be used as a training set, where different shapes are represented.
The plugin interface is depicted in Figure 5. The different items have the following meanings: • Select raster(s) to be classified (folder): by selecting the 'browse' button, it is possible to search the folder that has the image (or images) to be classified. • Select training shapes (folder): by selecting the 'browse' button, it is possible to search the folder that has shapefiles corresponding to the different classes of the training set. • Select training raster (.tif): by selecting the 'browse' button, it is possible to search the geo-referenced image (filename extension '.tif') whose pixels conform to the shapefiles set out in point 2. • Create shape count: by ticking this (which is optional), the plugin creates not only a classified image, but also a shapefile (*.dbf) containing all of the polygons that correspond to each cladding. In this shape, each polygon (building) is provided with the following information: the identification code, the total number of pixels that each polygon has, and the pixel number of each identified class in the polygon. • Create shape percentage: by ticking this (which is optional), the plugin creates not only a classified image, but also a shapefile (*.dbf) containing all of the polygons that correspond to each cladding. In this shape, each polygon (building) is provided with the following information: the identification code, and the percentage of classified pixels within each polygon for each identified class in that polygon. • Select the file (.shp) to create a shape count/percentage: whether one of the two previously mentioned options or both of them have been ticked, this box makes it possible to select the shapefile to be used when calculating the required statistics. Generally speaking, this shapefile is the same as the one that is used in the filtering phase of the satellite image. • Select the output folder: when clicking on 'browse', it is possible to select the folder where the output files will be saved. • Select the file (.shp) to create a shape count/percentage: whether one of the two previously mentioned options or both of them have been ticked, this box makes it possible to select the shapefile to be used when calculating the required statistics. Generally speaking, this shapefile is the same as the one that is used in the filtering phase of the satellite image. • Select the output folder: when clicking on 'browse', it is possible to select the folder where the output files will be saved.

Focusing on How the Classification Plugin is Organized
The plugin performs the classification according to the following procedural points: 1. Importing the raster image to be classified (images available in the selected folder as input dataset) into a Python data structure (multi-array), where each and every pixel matches exactly with a cell of the structure, made of eight components (one for each band). 2. Importing the raster image, which represents the selected training set for classification purposes (same as point 1). 3. Extracting the training set from the shapefile: a shapefile is defined for each class to be identified, and it includes the polygons matched with the related roofing. As previously explained, each and every shape (one shape matching with the asbestos class, one shape for a red roof, and so forth) is saved in a single folder. The algorithm will recursively access this folder, looking for the shapes in order to extract the training set. These data are also imported into a Python multiarray. 4. Training the classifier file: the training and classification algorithm belongs to the "Scikit-learn"

Focusing on How the Classification Plugin is Organized
The plugin performs the classification according to the following procedural points:

1.
Importing the raster image to be classified (images available in the selected folder as input dataset) into a Python data structure (multi-array), where each and every pixel matches exactly with a cell of the structure, made of eight components (one for each band).

2.
Importing the raster image, which represents the selected training set for classification purposes (same as point 1).

3.
Extracting the training set from the shapefile: a shapefile is defined for each class to be identified, and it includes the polygons matched with the related roofing. As previously explained, each and every shape (one shape matching with the asbestos class, one shape for a red roof, and so forth) is saved in a single folder. The algorithm will recursively access this folder, looking for the shapes in order to extract the training set. These data are also imported into a Python multiarray.

4.
Training the classifier file: the training and classification algorithm belongs to the "Scikit-learn" Python library. The multiclass "RandomForest" classification function was used in this case, which was more suitable for execution speed. The inputs were: the multiarray raster of the training set (point 2); the multiarray shapes of the training set (point 3).

5.
Classification according to the selected function. The output will be a multiarray comprising the classified pixels' values. 6.
Reconstructing and saving the correctly geo-referenced image as the starting image for use, beginning with the output provided in point 5.
Some items were left out of the above description for the sake of brevity, namely, the option of creating the shapefile with all of the information on classification.

Classification Results and Validation
The outcome that is achieved by applying the plugin to the pre-processed image is an image that has pixels classified according to the different types of roofing. Such an image is correctly geo-referenced according to the source satellite image. Figure 6 is an example of a part of an image with classified pixels. 5. Classification according to the selected function. The output will be a multiarray comprising the classified pixels' values. 6. Reconstructing and saving the correctly geo-referenced image as the starting image for use, beginning with the output provided in point 5.
Some items were left out of the above description for the sake of brevity, namely, the option of creating the shapefile with all of the information on classification.

Classification Results and Validation
The outcome that is achieved by applying the plugin to the pre-processed image is an image that has pixels classified according to the different types of roofing. Such an image is correctly geo-referenced according to the source satellite image. Figure 6 is an example of a part of an image with classified pixels. With regard to each building in the input shape file (Shapefile Mask), once the classified image is available, the designed algorithm identifies the total number of pixels in that image, and for each identified class, the total number of pixels belonging to that specific class.
When the classification is complete, the worthiness of such a process needs to be demonstrated, by validating its outcome. However, given the nature of this use, specifically, the lack of sufficient samples of a well-known nature, especially in dealing with asbestos as the class of interest, it becomes more and more difficult to carry out validations with the criteria and techniques that are most common in the literature. Therefore, the following actions were performed to verify the worthiness of this classification method: 1. Identifying the building roofing with well-known construction materials as well as their tagging according to the corresponding class. 2. Creating a shapefile for each class, which included the previously tagged roofing types for each With regard to each building in the input shape file (Shapefile Mask), once the classified image is available, the designed algorithm identifies the total number of pixels in that image, and for each identified class, the total number of pixels belonging to that specific class.
When the classification is complete, the worthiness of such a process needs to be demonstrated, by validating its outcome. However, given the nature of this use, specifically, the lack of sufficient samples of a well-known nature, especially in dealing with asbestos as the class of interest, it becomes more and more difficult to carry out validations with the criteria and techniques that are most common in the literature. Therefore, the following actions were performed to verify the worthiness of this classification method:

1.
Identifying the building roofing with well-known construction materials as well as their tagging according to the corresponding class.

2.
Creating a shapefile for each class, which included the previously tagged roofing types for each file, according to a specific class.

3.
Developing a Python script. Provided with input data, which consisted of both the image created from the classification process, and the shapefiles, as described in point 2, this script is capable of providing a confusion matrix as an output. 4.
Analyzing the confusion matrix and estimating any possible classification errors.
The first phase is very critical, and it draws the line at the validation process; in fact, the classification process classifies each pixel according to the dominantly present classes, but an ambiguity arises when asbestos and fiber cement classes are concerned. Indeed, there is a problem with correctly allocating the tag "cement" or "asbestos" to pixels belonging to the available images.
The action taken in order to address a similar situation for such surfaces was to use the corresponding "true points"-obviously not the ones selected for the training set-as test buildings for the asbestos class. Other asbestos-free, newly registered, and recently built structures were used as test buildings for the cement class.
The outcomes are described in Table 2, which represents the confusion matrix, or the taking into account of the outcomes of mistaken classification relating to the parts of the image: the first-line cells show the true classes, whereas the first-column cells show the output classes according to the classification tool. There were six true/output classes of this matrix, according to the number of classes that collected roofing and background. In this regard, if an "i" index is assigned to the seven rows representing true classes, and a "j" index is also assigned to the seven output classes, the cell corresponding to the "i"-indexed row and the "j"-indexed column shows the number of pixels that are labeled as belonging to the "j"-indexed class, according to the classification process, whereas in reality they belong to the "i"-indexed class.
In order to evaluate the quality of the results, the eighth row and the eighth column were added, respectively containing the percentage values of the commission accuracy (CA), namely, the user's accuracy, and the omission accuracy (OA) [30]. Errors of omission represented false negatives. Errors of commission represented false positives. The overall or average accuracy was 81.77% (reported at the bottom right corner of the matrix). The Kappa index of the classifier was 0.7514 [31].
As evident from the table, many pixels belonging to "cement" and "asbestos" classes were used in this test, with respect to other classes like "green" or "shadow". This choice was motivated by the classification method, which focused on asbestos detection. Therefore, when calculating the process accuracy, greater consideration should have been given to the class to be detected, as well as to the other classes that asbestos could have been misclassified with.
It turns out that the percentage of correction classification that was related to asbestos detection amounted to 75.12% when analyzing the confusion matrix.
Another validation was carried out on the basis of the data collected in the Health Information System for Collective Prevention (SISPC) [32], which is a regional database that provides the stake-holders with a tool to manage health activities and data in a uniform way throughout the regional territory. Clean-up plans, as well as removal and disposal procedures of products and artifacts contaminated with asbestos, are presented online, by means of the SISPC. Therefore, this system provided data with the highest "evidentiary" value, which was needed for a complete validation of the classification data. A validation of the SISPC data was carried out by visual examination, where the related depollution location was checked via Google maps, by identifying the roofing that could be consistent with the declared remediation, and by verifying whether the classification, previously carried out on the image before any clean-up operation, had detected asbestos in that area. A selection was made for both residential and industrial buildings, with 11 different addresses. As for classification accuracy, the results on asbestos-classified pixels were better than the outcome obtained with the previously explained validation method. Indeed, the plugin RoofClassify detected 90% of asbestos roofs. The differences between the validation tests lied in the lower spectral ambiguities of the selected buildings in the second validation procedure.

Discussion
The QGIS tool described in this paper was conceived for identifying building roofs with asbestos. Roofing classification was performed through the digital processing of images obtained by the WorldView-3 sensor. Such images were pre-processed, so as to make them suitable for subsequent classification [20]. The outcomes, as achieved in an open context, underwent a validation process, which consisted of analyzing the confusion matrix, and subsequently assessing the classification errors. Given that the most difficult task was testing the tool's impact in discerning asbestos and cement, the decision was made to only use structures with well-known roofing materials as the test buildings, according to recent surveys [18][19][20]. A second validation was carried out on the basis of a verification process, with data being collected at the Health Information System for Collective Prevention (SISPC) [32].
The results obtained from the first validation showed that about 25% of the asbestos roofing was a false negative. At the same time, a high number of asbestos pixels were classified as cement pixels, because asbestos-reinforced cement often contains only 6% of asbestos fiber. These observations could lead to the conclusion that a weak performance by the classification method was obtained [20,30]. Improvements in asbestos roofing identification are possible when resorting to object-oriented classification [22]. As previously mentioned, the designed algorithm identifies the total numbers of pixels in that image, and the total number of pixels belonging to each specific class. The plugin can also compose an optional shapefile that describes all of the pieces of information concerning the number of pixels in each class, for each polygon that composes the images. Optionally, the plugin provides the percentage of pixels that are classified as asbestos, within each polygon. This optional shapefile makes it possible to estimate the probability that a building has asbestos roofing. Therefore, it is possible to assign the "asbestos" or "not asbestos" labels to each roof. This procedure should significantly improve classification performance. This last step is left to the user, who can visualize the data that is extracted from the algorithm.
The tool for asbestos identification in the QGIS platform requires a basic grasp of GIS, but the open-source environment makes it easier for both software use and algorithm distribution. A plugin limit is built into the algorithm, which is conceived to only identify asbestos. Strictly speaking, this feature cannot be considered a flaw, nor a limit on flexibility: in fact, the plugin is meant to perform a specific classification of asbestos, and it has been optimized for this individual purpose, making it different from other proprietary software.

Conclusions
This paper has focused on an asbestos identification system to give support to the asbestos roofing disposal policies and programs in small geographical areas. The classification tool is based on GIS open source software, and it was used to analyze Prato district satellite images obtained with the WorldView-3 sensor. The QGIS tool has shown quite good performance in identifying asbestos roofing when applying a per-pixel classification. The percentage of asbestos pixels contained in each roof of the analyzed image can also be determined with this tool. This value can be used for improving the classification process. The presented classification methods can be explained in terms of restoring a data sheet, or a list of records on asbestos roofing buildings, and it can provide a low-cost basis for building up a larger knowledge database, where geographical data are integrated with other data from different administrative sources (as, for instance, SISPC).