Deep-Learning-Based Edge Detection for Improving Building Footprint Extraction from Satellite Images

Abstract: Buildings are objects of great importance that need to be observed continuously. Satellite and aerial images are now valuable resources for building footprint extraction. Since these images cover large areas, detecting buildings manually is time-consuming. Recent studies have demonstrated that deep learning algorithms can extract building footprints automatically. However, these algorithms need vast amounts of training data and may not perform well under low-data conditions. Digital surface models (DSMs) provide height information, which helps discriminate buildings from surrounding objects. However, DSMs may suffer from noise, especially at building edges, which may result in low boundary resolution. In this research, we address this problem by using edge bands detected by a deep learning model alongside DSMs to improve building footprint extraction when training data are scarce. Since satellite images have complex backgrounds, conventional edge detection methods such as the Canny or Sobel filters produce many noisy edges, which can degrade model performance. Therefore, we first train a U-Net model for building edge detection on the WHU dataset and fine-tune it on our target training dataset, which contains a small number of satellite images. Then, the building edges of the target test images are predicted using this fine-tuned U-Net and concatenated with our RGB-DSM test images to form 5-band RGB-DSM-Edge images. Finally, we train a U-Net on the 5-band training images of our target dataset, whose fifth band contains precise building edges, and use this model to extract building footprints from the 5-band test images, whose fifth band contains the edges predicted by the deep learning model in the first stage. We compared the results of our proposed method with those obtained from 4-band RGB-DSM and 3-band RGB images. Our method obtained 82.88% in IoU and 90.45% in F1-score, which indicates that using edge bands alongside the DSMs improved the performance of the model by 2.57% and 1.59% in IoU and F1-score, respectively. In addition, the predictions made from 5-band images have sharper building boundaries than those made from RGB-DSM images.


Introduction
Automatic building footprint extraction from remote-sensing imagery has various applications in urban planning, 3D modelling and disaster management [1]. Owing to advances in the acquisition of high-resolution satellite images, valuable resources are now available for building footprint extraction [2]. Satellite images cover vast areas and contain complex backgrounds and rich information [3]. Since manually extracting building footprints from satellite images is laborious and challenging, automatic approaches should be considered [4].
With recent developments in data science and artificial intelligence, deep learning algorithms are widely used in remote sensing [5,6]. Deep learning algorithms can extract features from satellite images automatically and use this information to solve problems [7]. Recently, much research has focused on extracting building footprints from remote-sensing images with deep learning models [3]. Deep convolutional neural networks and fully convolutional networks are frequently used for this task [8].
Aryal et al. proposed two scale-robust fully convolutional networks that focus on multi-scale feature utilization and domain-shift minimization [9]. Yu et al. proposed a convolutional neural network called ConvBNet, which uses deep supervision in training with weighted and mask cross-entropy losses to ensure stable convergence [10]. Ji et al. proposed Siamese U-Net to improve the classification of larger buildings [11]. Ma et al. proposed GMEDN, which has a local and a global encoder with a distilling decoder and focuses on using global and local features and mining multi-scale information [12].
In some studies, LiDAR point clouds or digital surface models (DSMs) are used alongside RGB images to improve the accuracy of building footprint extraction. For example, Yu et al. proposed MA-FCN and used DSMs with RGB images to extract buildings from aerial images, which resulted in better predictions [13].
Although many studies have proposed deep learning models for building footprint extraction from remote-sensing imagery, most of these models need a considerable amount of training data, which may not always be available. Moreover, noisy DSMs may lead to noisy building edges. To address these problems, we propose using building edge bands detected by a deep learning model alongside RGB images and DSMs to improve building footprint extraction from satellite images when only a small amount of training data is available.

Methodology
In this study, our goal is to improve the accuracy of building segmentation maps in low-training-data conditions by using building edge bands alongside RGB-DSM images.
To this end, we use the U-Net [14] model to detect building edges in satellite images. In this section, we first present a brief review of the U-Net model. Then, we discuss the advantages of deep-learning-based building edge detection over traditional edge detection methods.

U-Net
U-Net is a fully convolutional network with an encoder-decoder structure and skip connections between the encoder and the decoder. Convolutional blocks extract features from the input data. In the encoder, pooling layers halve the spatial dimensions of each block's output before passing it to the next block. In the decoder, up-convolution layers double the spatial dimensions of the output and pass it to the next convolutional block. The skip connections help the model retrieve spatial information from its early stages and reduce information loss. The structure of the U-Net model is shown in Figure 1.
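The halving and doubling of spatial dimensions described above can be traced with a small sketch. The function name `unet_feature_sizes`, the depth of four pooling steps and the 256-pixel input are illustrative assumptions rather than the configuration used in this study, and the sketch assumes padded convolutions so that the convolutional blocks themselves leave the spatial size unchanged.

```python
def unet_feature_sizes(input_size, depth):
    """Trace the spatial size of feature maps through a U-Net-like network.

    Hypothetical helper: assumes padded ("same") convolutions, 2x2 pooling in
    the encoder and 2x up-convolutions in the decoder.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if input_size % (2 ** depth) != 0:
        raise ValueError("input_size must be divisible by 2**depth")

    # Encoder: each pooling layer halves the spatial size.
    # The last entry is the bottleneck.
    encoder = [input_size // (2 ** level) for level in range(depth + 1)]

    # Decoder: each up-convolution doubles the size, and the result is paired
    # with the encoder feature map of the same size (the skip connection).
    decoder = []
    size = encoder[-1]
    for skip_size in reversed(encoder[:-1]):
        size *= 2
        decoder.append((size, skip_size))
    return encoder, decoder


encoder_sizes, decoder_pairs = unet_feature_sizes(256, 4)
print(encoder_sizes)  # [256, 128, 64, 32, 16]
print(decoder_pairs)  # [(32, 32), (64, 64), (128, 128), (256, 256)]
```

Each decoder pair holds the size after up-convolution and the size of the encoder feature map it is concatenated with; the two must match for the skip connection to be valid.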

Edge Detection Methods
Satellite images are rich in information and have complex backgrounds. Since our aim is to improve the accuracy of building segmentation maps by using edge bands, these edge bands should contain only the edges of buildings. Conventional edge detection methods such as the Canny or Sobel filters detect complex edges in satellite images that do not belong exclusively to buildings.
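To illustrate why a generic gradient filter is not building-specific, the following sketch computes a Sobel gradient magnitude with NumPy on a synthetic image. The function name `sobel_magnitude` and the toy image (a bright square standing in for a building and a fainter stripe standing in for a road) are assumptions made for illustration only; they are not data from this study.

```python
import numpy as np


def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2D grayscale image."""
    image = np.asarray(image, dtype=float)
    # Replicate border pixels so the output has the same shape as the input.
    p = np.pad(image, 1, mode="edge")

    # Horizontal gradient: weighted right column minus weighted left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    # Vertical gradient: weighted bottom row minus weighted top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return np.hypot(gx, gy)


image = np.zeros((8, 8))
image[1:4, 1:4] = 1.0  # toy "building"
image[6, :] = 0.5      # toy "road" stripe

magnitude = sobel_magnitude(image)
print(magnitude[1, 1] > 0)  # True: response at the building corner
print(magnitude[5, 6] > 0)  # True: response next to the road stripe
```

In this toy case the filter responds to the stripe as well as to the square, which is the behaviour that makes such edge maps unsuitable as a building-only band.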
To address this problem, we use a deep learning model to detect building edges in remote-sensing images. For this purpose, we need a training dataset that contains images with corresponding binary building edge labels. First, we create binary building edge labels for the training data by applying the Canny filter to the binary building masks of the WHU dataset and our target satellite dataset. Then, the U-Net model is trained on the WHU dataset and fine-tuned on our target satellite dataset. With this strategy, U-Net detects only building edges in the test images, so the predicted edge maps do not contain the edges of other objects such as trees or roads.
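The label-creation step can be sketched as follows. The study applies the Canny filter to the binary building masks; the sketch below substitutes a simpler NumPy-only boundary rule (a building pixel is an edge if any of its four neighbours is background), which gives a comparable one-pixel outline on a clean binary mask but is not the Canny filter itself. The function name `mask_to_edge_label` is hypothetical.

```python
import numpy as np


def mask_to_edge_label(mask):
    """Derive a binary edge label from a binary building mask.

    Stand-in for the Canny step described in the text: a building pixel is
    marked as an edge when at least one 4-connected neighbour is background.
    Pixels outside the image are treated as background, so buildings that
    touch the image border get an edge along the border.
    """
    mask = np.asarray(mask).astype(bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)

    # A pixel is interior when its up, down, left and right neighbours
    # are all building pixels.
    interior = (
        padded[:-2, 1:-1]
        & padded[2:, 1:-1]
        & padded[1:-1, :-2]
        & padded[1:-1, 2:]
    )
    return (mask & ~interior).astype(np.uint8)


building_mask = np.zeros((5, 5), dtype=np.uint8)
building_mask[1:4, 1:4] = 1  # a 3x3 toy building

edge_label = mask_to_edge_label(building_mask)
print(edge_label)
```

For the 3x3 toy building, the eight outer pixels are marked as edges and the centre pixel is not.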
To evaluate the impact of edge bands on building footprint extraction, we create 5-band RGB-DSM-Edge training and test images from the satellite data. The fifth band of the training images contains precise building edges, since it is created by applying the Canny filter to the binary building masks of the training data. The fifth band of the test images, on the other hand, contains building edges predicted by the U-Net model.
Finally, we train U-Net on RGB, RGB-DSM and RGB-DSM-Edge satellite images and compare the results to evaluate the impact of using edge bands alongside RGB images and DSMs in the building footprint extraction task. Edge bands make the model aware of building edges, which can lead to more complete segmentation maps with sharper building boundaries.
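Forming a 5-band image amounts to stacking the bands along the channel axis. The sketch below assumes channel-last NumPy arrays and a hypothetical helper name `stack_rgb_dsm_edge`; any normalisation of the RGB or DSM values is left out because it is not described here.

```python
import numpy as np


def stack_rgb_dsm_edge(rgb, dsm, edge):
    """Stack an RGB image, a DSM and an edge band into one (H, W, 5) array."""
    rgb = np.asarray(rgb, dtype=np.float32)
    dsm = np.asarray(dsm, dtype=np.float32)
    edge = np.asarray(edge, dtype=np.float32)

    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if dsm.shape != rgb.shape[:2] or edge.shape != rgb.shape[:2]:
        raise ValueError("dsm and edge must have shape (H, W)")

    # Bands 0-2: RGB, band 3: DSM, band 4: building edges.
    return np.concatenate([rgb, dsm[..., None], edge[..., None]], axis=-1)


rgb = np.zeros((4, 4, 3))
dsm = np.full((4, 4), 12.5)  # toy heights
edge = np.eye(4)             # toy edge band

five_band = stack_rgb_dsm_edge(rgb, dsm, edge)
print(five_band.shape)  # (4, 4, 5)
```

For training images the `edge` argument would be the label-derived edge band, and for test images it would be the edge map predicted by the fine-tuned U-Net.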

Datasets
In this research, we used two datasets: the WHU dataset and the IEEE Data Fusion Contest 2019 dataset [15]. The WHU dataset consists of aerial images with buildings of various shapes, sizes and colors. The IEEE Data Fusion Contest 2019 dataset consists of satellite images and DSMs. For both datasets, binary building edge labels are created by applying the Canny filter to the binary building footprint labels. An example from each dataset with building footprint and building edge labels is shown in Figure 2.

Results
In this section, we discuss the results of building edge detection and of building footprint extraction with RGB-DSM-Edge images. The edge detection results of U-Net are compared with those of the Canny and HED [16] edge detection methods. We also used the Mask-RCNN, MA-FCN and U-Net models with RGB and RGB-DSM images for comparison in the building footprint extraction task. The quantitative results of these models are shown in Table 1.
As shown in Figure 3, the Canny and HED edge detection results contain a lot of noise and therefore cannot be used to improve building footprint extraction with deep learning models. In contrast, the U-Net trained with building edge labels produced building edges that can be used alongside RGB-DSM images to improve building footprint extraction. As shown in Table 1, RGB-DSM images improved the results of the U-Net and MA-FCN models compared to RGB images. Moreover, using edge bands alongside RGB images and DSMs improved the results further and outperformed all other models in the F1-score and IoU metrics. Since Mask-RCNN uses a Region Proposal Network, ROI Align and a ResNet architecture, it performs better in low-data conditions, which helps it detect more true positives and leads to better results with RGB images and in the precision metric. Although edge bands helped the U-Net model perform better than in the other cases, the edge detection U-Net was trained with a small amount of training data, so the edge bands it detects are not fully accurate, which may lead to fewer true positives and lower precision. Our proposed method achieved 90.45% and 82.88% in the F1-score and IoU metrics, respectively, which indicates the better quality of the segmentation maps created by the U-Net model using RGB-DSM-Edge images. These results indicate the effectiveness of using edge bands alongside RGB-DSM images for datasets with a small number of training images.
Figure 4 shows the predictions made by these models together with the test images and ground-truth binary labels. DSMs clearly improved the quality of the segmentation maps produced by the MA-FCN and U-Net models compared to RGB images, but there is still room for improvement, especially in the number and boundaries of detected buildings. Edge bands addressed these problems effectively: the segmentation maps produced from RGB-DSM-Edge images are more complete, especially in the third image, and the detected buildings have sharper boundaries.
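For reference, the IoU and F1-score metrics used above can be computed from a predicted and a ground-truth binary mask as in the following sketch. It uses the standard pixel-wise definitions; the function name `segmentation_metrics` is hypothetical, and how the scores were aggregated over the test images in this study (per image or over all pixels) is not stated, so the sketch handles a single pair of masks.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Compute pixel-wise precision, recall, F1-score and IoU for binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    tp = int(np.sum(pred & truth))    # building predicted, building in label
    fp = int(np.sum(pred & ~truth))   # building predicted, background in label
    fn = int(np.sum(~pred & truth))   # background predicted, building in label

    def ratio(numerator, denominator):
        # Assumed convention: return 0.0 when the denominator is zero.
        return numerator / denominator if denominator > 0 else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
print(segmentation_metrics(pred, truth))
```

In the toy example there is one true positive, one false positive and one false negative, so precision, recall and F1-score are all 0.5 and IoU is 1/3.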

Conclusions
In this study, we aimed to improve the results of building footprint extraction from satellite images with deep learning models both quantitatively and qualitatively. Since satellite images have complex backgrounds with various objects, traditional and state-of-the-art edge detection methods are not capable of detecting building edges exclusively. Therefore, we proposed preparing building edge labels to train a U-Net model for the building edge detection task. The detected edge bands were then attached to RGB-DSM images to create RGB-DSM-Edge images, which were used for building footprint extraction with U-Net. We compared the results of our proposed method with those of other deep learning models using RGB and RGB-DSM images. The U-Net model trained with RGB-DSM-Edge images outperformed Mask-RCNN with RGB images as well as the U-Net and MA-FCN models with both RGB and RGB-DSM images. Our proposed method reached 90.45% and 82.88% in the F1-score and IoU metrics, respectively. Also, the segmentation maps produced from RGB-DSM-Edge images had sharper building boundaries than those produced from RGB-DSM images.

Figure 3. Building edge detection results: (a) satellite image, (b) ground truth building edge label, (c) Canny edge detection, (d) HED edge detection and (e) U-Net edge detection.

Table 1. Results of the mentioned models with RGB, RGB-DSM and RGB-DSM-Edge images.
