Article

Detection, Classification and Boundary Regularization of Buildings in Satellite Imagery Using Faster Edge Region Convolutional Neural Networks

Institute of Geospatial Engineering and Geodesy, Faculty of Civil Engineering and Geodesy, Military University of Technology, 00-908 Warsaw, Poland
*
Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(14), 2240; https://doi.org/10.3390/rs12142240
Submission received: 9 June 2020 / Revised: 5 July 2020 / Accepted: 9 July 2020 / Published: 13 July 2020
(This article belongs to the Special Issue Urban Land Use Mapping and Analysis in the Big Data Era)

Abstract

With the development of effective deep learning algorithms, it has become possible to achieve high accuracy in remote sensing analyses of very high-resolution satellite (VHRS) images, especially in the context of building detection and classification. In this article, in order to improve the accuracy of building detection and classification, we propose a Faster Edge Region Convolutional Neural Networks (FER-CNN) algorithm. The proposed algorithm is trained and evaluated on different datasets. In addition, we propose a new method to improve the detection of the boundaries of detected buildings. The results of our algorithm are compared with those of other methods, such as the classical Faster Region Convolutional Neural Network (Faster R-CNN) with the original VGG16 and the Single-Shot Multibox Detector (SSD). The experimental results show that our methods make it possible to obtain an average detection accuracy of 97.5% with a false positive classification rate of 8.4%. An additional advantage of our method is better resistance to shadows, which are a very common issue in satellite images of urban areas. Future research will include designing and training the neural network to detect small buildings, as well as irregularly shaped buildings that are partially obscured by shadows or other occlusions.

1. Introduction

High-resolution remote sensing satellite imagery can provide the geometric features, spatial features and textures of many objects, including various types of buildings. For many years, automatic methods of object detection and classification using very high-resolution satellite (VHRS) images have been an important research problem. The dynamic technological development of satellite systems has made it possible to acquire images with ever better spatial resolution, allowing more details of the imaged objects to be extracted and making object detection easier and more effective. Furthermore, because of their coverage and temporal resolution, satellite images provide large amounts of information in a short time, and thus they play an increasingly important role in updating, controlling and analyzing the spatial development of many studied areas [1,2,3]. This opens up new possibilities for obtaining information on the land cover of urban areas at a very detailed level [4,5]. Because satellite images carry a large amount of data, it is necessary to extract only the data relevant to a given analysis, for example, data on buildings. Such an ability can form the basis of a system for detecting buildings that are erected illegally or used contrary to the local land-use plan. Knowing the functional purpose of the structures, it is also possible to perform a statistical analysis of the area. However, performing these analyses manually, or with frequently inaccurate classification methods, is time-consuming and laborious. Therefore, research on new and intelligent methods of land cover classification, especially in urban areas, remains relevant. In addition, given the variability in the information content of satellite imagery, it is difficult to implement a single effective classical classification method. Because of the benefits of detecting buildings in high-resolution satellite images, this topic has attracted many researchers. Methods of extracting object features can be distinguished according to whether they are based on data or on a model [6]. The former rely on mathematical operations applied to a given image, without prior knowledge of what it may contain; examples are various types of filtering and edge detection algorithms. Another way to detect objects is segmentation, a process in which the image is divided into regions that are homogeneous in terms of selected values. Neither of these approaches provides an unambiguous answer about the location of the buildings in the image, let alone their purpose.
Work on object detection has accelerated significantly since the introduction of deep neural networks, which are very powerful models for more efficient object classification. Research efforts have developed even more dynamically since the second decade of the twenty-first century, when deep convolutional networks were introduced. These models greatly facilitate all work with images, enabling object detection by means of semantic segmentation (Fully Convolutional Network (FCN) [7], U-Net [8]), classification (AlexNet [9], Visual Geometry Group Network (VGG) [10], GoogleNet [11], Residual Network (ResNet) [12]), classification and localization, and object detection using bounding boxes (Region Convolutional Neural Network (R-CNN) [13], Fast R-CNN [14], Faster R-CNN [15], Region Fully Convolutional Network (R-FCN) [16], You Only Look Once (YOLO) [16], Single-Shot Multibox Detector (SSD) [17]) or masks (Mask R-CNN [18]). These possibilities have inspired researchers to apply them to contemporary problems. In recent years, much research has been carried out on building extraction from different types of imagery using algorithms based on segmentation [19,20,21,22] or structure detection [23,24,25,26,27,28,29]. Convolutional neural networks allow image features to be extracted at the semantic level, which makes the detection of objects with various shapes and colors possible. The work of Vakalopoulou's team is an example of building detection based on an ImageNet framework [22], whereas Ghandour et al. presented a building detection method with shadow verification, in which separate blocks detect tile-roofed buildings, detect non-tile flat-roofed buildings according to shape features, and fuse and aggregate the results of the previous blocks [30]. Bai's team used an improved Faster R-CNN (region-based Convolutional Neural Network) algorithm, which adopts DRNet (Dense Residual Network) and RoI (Region of Interest) Align to utilize texture information and to solve region mismatch problems [31]. Another example of a solution to this problem is the work of Maltezos' team, which used LiDAR (Light Detection and Ranging) data to detect buildings [32]. Algorithms using semantic segmentation assign each pixel to one of two classes, namely, building or surroundings. Boonpook et al. [23] proposed another approach and reached over 90% accuracy of building detection in UAV images using semantic segmentation. Ji et al. presented a CNN-based two-part structure used as a change detection framework for locating changed building instances as well as changed building pixels in very high-resolution (VHR) aerial images [33]. Li et al. proposed a U-Net-based semantic segmentation method for extracting building footprints from high-resolution multispectral satellite images using the SpaceNet building dataset [34]. However, these methods do not solve the problem of detecting and classifying buildings according to their purpose.
This article (1) presents how to use convolutional neural networks for the automatic detection and classification of buildings, (2) proposes a modification of the convolutional network known as Faster R-CNN, i.e., Faster Edge Region Convolution Neural Network (FER-CNN), (3) verifies the impact of the selected optimization method on the detection and classification of buildings, (4) verifies the detection of a building’s shape based on Mask R-CNN, (5) determines a new method of correcting the shape of detected buildings while maintaining good results of building categorization by using the Ramer–Douglas–Peucker (RDP) algorithm and (6) proposes a new method of building boundary regularization.
This paper is structured as follows. In Section 2, the research method is explained. In Section 3, the test data and results are presented. Section 4 presents the discussion. Finally, Section 5 provides a brief summary of this work.

2. Methodology

The proposed research methodology uses convolutional neural networks for the detection and classification of buildings in satellite imagery of urban and suburban areas. A new method of boundary correction for the detected buildings is also presented. First, the performances of the Faster Region Convolution Neural Network (Faster R-CNN), modified by the authors and called the Faster Edge Region Convolution Neural Network (FER-CNN), and the Single-Shot Multibox Detector (SSD) were compared in the detection and classification of structures; in this analysis, the focus was on the choice of network training parameters, because they play a significant role in successful and effective training and, therefore, in the attainment of the most accurate results. For this purpose, a brief analysis of the impact of the optimization algorithm on the results was carried out. The aim of optimization is to find the extremum of a given objective function, i.e., optimal solutions to the problems posed. These algorithms differ from classic optimization methods: they work on the principle of indirectly optimizing the performance of the trained models by reducing a cost function, which minimizes the expected error, referred to as risk. The reason for using indirect optimization is that only a training database is available, so the distribution of the generated data is unknown.
Because there is a need to know the exact location of buildings within the imagery, the operation capabilities of Mask R-CNN were tested, and as a result, buildings were detected by means of polygons with shapes similar to those of the buildings. By means of additional processing of the obtained results, a correction of the building’s shape was performed using the RDP algorithm, and a new method of performing building boundary regularization was developed.

2.1. Faster R-CNN

Faster R-CNN (Faster Region Convolution Neural Network) operates in two stages [15]. In the first stage, the input data are processed using a feature extractor based on the VGG16 model [10]. The result of this process is a feature map that is used in the next stage. In this part, Faster R-CNN consists of two networks. The first is the Region Proposal Network (RPN), which is responsible for generating region proposals, on the basis of which the second network performs structure detection. Detection is thereby directed at those regions that most likely contain objects, so the time needed to generate the proposed regions is reduced (Figure 1).
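To make the role of the RPN concrete, the short sketch below generates the kind of candidate anchor boxes that an RPN scores at a single feature-map location; the scales and aspect ratios are generic Faster R-CNN-style defaults chosen for illustration, not values confirmed by this paper.

```python
# Illustrative sketch: the anchor boxes an RPN evaluates at one
# feature-map position (3 scales x 3 aspect ratios = 9 candidates).
import numpy as np

def anchors_at(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # keeps box area close to s**2
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)  # shape (9, 4), boxes as (x1, y1, x2, y2)

print(anchors_at(256.0, 256.0).shape)  # -> (9, 4)
```

The RPN assigns each such candidate an objectness score and a box refinement, and only the highest-scoring proposals are passed to the second-stage detector.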

2.2. Our Method: Faster Edge Region CNN

In FER-CNN (Figure 2), our own convolutional network is used to generate the feature map. This network consists of six modules, followed by MaxPooling and Dropout layers. In addition, the parametric PReLU (Parametric Rectified Linear Unit) activation function is used.
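As an illustration, the Keras sketch below builds a backbone of the kind described: six convolutional modules with PReLU activations, interleaved with MaxPooling and Dropout layers. The filter counts, kernel size and dropout rate are our assumptions for the sketch, not the exact FER-CNN configuration.

```python
# A minimal sketch of a six-module convolutional backbone with PReLU,
# MaxPooling and Dropout; all hyperparameter values here are assumptions.
from tensorflow.keras import layers, models

def conv_module(x, filters):
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)  # parametric ReLU activation
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    return x

inputs = layers.Input(shape=(512, 512, 3))    # tile size used in this study
x = inputs
for filters in (32, 64, 128, 256, 256, 512):  # six modules
    x = conv_module(x, filters)
backbone = models.Model(inputs, x, name="fer_cnn_backbone")
backbone.summary()  # the final output serves as the feature map
```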

2.3. Single-Shot MultiBox Detector Network

The second stage of research on the impact of the optimization method on the detection and classification of structures was to train a model based on the Single-Shot MultiBox Detector (SSD) network [17]. This model was proposed by Liu et al. in 2016 [17] to increase the efficiency and precision of structure detection. It is based on the VGG-16 architecture; however, the fully connected layers were replaced by additional convolution layers, enabling the detection of structures of different sizes and the gradual reduction of the output size for subsequent layers (Figure 3).

2.4. Mask R-CNN

Mask R-CNN is a deep neural network designed for instance segmentation which, unlike Faster R-CNN and SSD, not only provides information about the bounding box but also overlays a mask on the image similar in shape to the outline of the object [35]. This architecture differs significantly from the previously described methods, and it works in two stages: first, it generates proposals for regions in which a structure may be located; next, it predicts the class of the object, corrects the coordinates of the bounding box and then creates a mask at the pixel level.
By using Mask R-CNN, it is possible to detect a structure in the image and obtain its approximate shape. The disadvantage is that the building is represented by a mask, so the exact locations of its edges and their coordinates are unknown. In order to solve this problem, we propose a method that makes it possible to outline buildings in a satellite image while maintaining their membership in one of the defined categories.
This algorithm displays the structures belonging to one class in the image and then uses the Ramer–Douglas–Peucker algorithm [36] to minimize the number of points that make up the building contour. This operation is performed for each of the seven classes, and the result is an image that shows the boundaries of the buildings (while maintaining their classification) together with a set of image coordinates for each structure. As a result, we obtain a building contour that consists of a smaller number of points (Figure 4).
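A minimal sketch of this simplification step is shown below, using OpenCV (one of the libraries used in this study), whose approxPolyDP function implements the Ramer–Douglas–Peucker algorithm; the synthetic mask and the epsilon tolerance are illustrative assumptions.

```python
# Sketch: simplify a building contour with the Ramer-Douglas-Peucker
# algorithm as implemented by OpenCV's approxPolyDP.
import cv2
import numpy as np

mask = np.zeros((512, 512), dtype=np.uint8)           # per-class binary mask
cv2.rectangle(mask, (100, 100), (300, 260), 255, -1)  # stand-in for one building

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
for contour in contours:
    # epsilon bounds the deviation of the simplified contour from the
    # original; 1% of the perimeter is a reasonable starting point.
    epsilon = 0.01 * cv2.arcLength(contour, True)
    simplified = cv2.approxPolyDP(contour, epsilon, True)
    print(len(contour), "->", len(simplified), "points")
```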

2.5. Building Boundary Regularization Method

Given the nature of buildings, it can be seen that, in most cases, adjacent walls meet at a right angle. On this basis, we propose the following algorithm to correct the detected edges, while bearing in mind that wall outlines can also take other shapes (e.g., arcs) (Figure 5).
The proposed method takes the first pair of points as the "base" edge P_nP_{n+1} and then checks the angle that it forms with the next edge P_{n+1}P_{n+2}. If the sine of this angle is in the range from 0.9925 to 1.0000, the angular error is assumed to be less than 7° (condition I); the method corrects this edge by constructing a straight line perpendicular to the base edge that passes through the vertex P_{n+1}, and then projects the point P_{n+2} onto this line to create the point P′_{n+2} (the coordinates of P_{n+2} are replaced by those of P′_{n+2}). As a result, a new, corrected building edge is created, beginning at P_{n+1} and ending at P′_{n+2} (Figure 6).
Note that not all building edges intersect at right angles. Therefore, the condition in which the sine of the angle is in the range from 0.1219 to 0.9925 (condition II) is considered. In this case, the length of the segment P_{n+1}P_{n+2} is checked first; if it is less than 5 pixels (for a pixel size of 0.5 m, this corresponds to less than 2.5 m), this indicates a "truncated" corner of the building, a phenomenon very often caused by mask rounding at the corners. In such an instance, the program places the corner of the building at the intersection of the lines that pass through the points P_n, P_{n+1} and P_{n+2}, P_{n+3} (after first checking that these lines form an angle of 90° ± 7° with each other). If the edge is longer than 5 pixels, the algorithm checks the next pair of edges; if the number of points describing the curve is less than five, no correction is made, but if there are at least five, the edge is approximated on the basis of these points (Equation (1)). The program also checks whether all the angles of the figure meet this condition; if so, the outline is approximated by an ellipse (Equation (2)).
\varphi(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m = \sum_{k=0}^{m} a_k x^k \quad (1)
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \quad (2)
Another condition that the program checks is the case in which the sine of the angle is between 0.0000 and 0.1219 (condition III). If this condition is met, the distance of the point P_{n+1} from the line passing through the points P_n and P_{n+2} is checked. If this distance is less than 5 pixels, the program removes the point P_{n+1}.
On the basis of the above conditions, the algorithm checks all the edges of the figure; this constitutes one iteration (the algorithm performs three iterations, because the differences in the shape of the building outline for a larger number are insignificant) (see Figure 7). For a different spatial resolution, the distance thresholds between points and lines should be adjusted accordingly.
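The sketch below illustrates condition I in isolation: when two consecutive edges are close to perpendicular, the vertex P_{n+2} is projected onto the line through P_{n+1} perpendicular to the base edge. Only the 0.9925 sine threshold is taken from the text; the helper itself is our own reconstruction for illustration, not the authors' implementation.

```python
# Condition I sketch: square a near-right corner by projecting the next
# vertex onto the perpendicular through P_{n+1}. Illustrative reconstruction.
import numpy as np

def square_corner(p_n, p_n1, p_n2, sin_min=0.9925):
    base, nxt = p_n1 - p_n, p_n2 - p_n1
    cross = base[0] * nxt[1] - base[1] * nxt[0]  # 2D cross product
    sin_angle = abs(cross) / (np.linalg.norm(base) * np.linalg.norm(nxt))
    if sin_angle < sin_min:
        return p_n2  # angle deviates from 90 degrees by more than ~7 degrees
    perp = np.array([-base[1], base[0]]) / np.linalg.norm(base)
    # project P_{n+2} onto the line through P_{n+1} perpendicular to the base
    return p_n1 + np.dot(p_n2 - p_n1, perp) * perp

p = square_corner(np.array([0.0, 0.0]), np.array([10.0, 0.0]), np.array([10.4, 6.0]))
print(p)  # -> [10.  6.]: the corner is corrected to a right angle
```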

3. Experiments and Results

This work explores the potential of convolutional networks in the detection and classification of buildings in satellite images of urban and suburban areas.
All calculations in this work were carried out using a PC with an Intel Core i5 CPU, an NVIDIA RTX 2070 GPU, 16 GB of RAM and the Ubuntu 16.04 operating system. To prepare the images for network training, the Rasterio (v. 1.1.3), OpenCV (v. 4.1.0) and NumPy (v. 1.16.1) libraries were mainly used, while the Keras (v. 2.2.4) and TensorFlow (v. 1.13.1) libraries were used to implement the neural network models.
First, a comparison was performed between Faster R-CNN, FER-CNN and SSD in terms of their abilities to detect and classify structures, with a focus on the choice of network training hyperparameters, because they play a very important role in successful and effective training and, therefore, in the attainment of the most accurate results. The hyperparameters include the number of filters in the convolution layers and the activation function. The ReLU (Rectified Linear Unit) activation function was used to train the Faster R-CNN and SSD networks, while the parametric version of this function (PReLU) was used for FER-CNN. Each model was trained for 200 epochs with a batch size of 2. A brief analysis of the impact of the optimization algorithm on the results was also carried out; the rationale for this indirect optimization is described in Section 2. Overall, the optimizer's task is to adjust the network using feedback from the loss function, allowing the best possible efficiency to be obtained on the training data; the learning process can also be shortened by an appropriate choice of hyperparameters. Examples of such optimizers are Momentum, RMSProp and Adam.
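For reference, the snippet below shows how the three optimizers can be instantiated in Keras (using the keyword names of the Keras 2.2.4 release cited above); the learning rates are illustrative defaults, not the values used in the experiments, and the commented training call uses the epoch and batch settings stated above.

```python
# Hedged sketch: the three optimizers compared in this study. Keras has no
# separate "Momentum" class, so SGD with a momentum term plays that role.
from keras import optimizers

optimizer_variants = {
    "Adam": optimizers.Adam(lr=1e-4),
    "Momentum": optimizers.SGD(lr=1e-3, momentum=0.9),
    "RMSProp": optimizers.RMSprop(lr=1e-4),
}

# model.compile(optimizer=optimizer_variants["Adam"],
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=200, batch_size=2)
```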
Because the exact location of buildings in the images needs to be determined, the performance of Mask R-CNN was tested. With it, buildings were detected by means of polygons with shapes similar to the buildings' outlines. Through additional processing of the obtained results, the buildings' shapes were corrected using the RDP algorithm and the boundary regularization method proposed in Section 2.5.

3.1. Experiment Data

The database was created on the basis of satellite images obtained by the WorldView-2 and Pléiades satellites (see Table 1), showing a fragment of the city of Warsaw and its periphery (Poland).
The study area covers the western part of Warsaw and lies within a rectangle bounded by the meridians 20°52′19″ E and 21°00′07″ E and the parallels 52°14′05″ N and 52°17′58″ N (Figure 8), in the WGS-84 reference system (UTM projection). Within this image, six areas with different urban characteristics were distinguished: (1) apartment blocks with garages, (2) blocks of flats with small shops, (3) diversified cases with occlusion, (4) dense block buildings, (5) structures shaded by trees or with low contrast between buildings and the surroundings and (6) dense single-family houses and terraced buildings.
Before performing operations on the images, pan sharpening was applied to increase the resolution of the multispectral image to 0.5 m. These images were then divided into smaller ones measuring 512 × 512 pixels using our own script, which cuts the image into parts with a vertical step of 250 pixels and a horizontal step of 350 pixels. As a result, a database of 500 images with red, green and blue channels and a spatial resolution of 0.5 m was created, which was divided into three sets: training (350 images), validation (75 images) and test (75 images).
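A minimal NumPy sketch of this tiling step, under the stated tile size and step values, might look as follows; the scene array is a stand-in for a real pansharpened image read, e.g., with Rasterio.

```python
# Sketch: cut a scene into 512 x 512 tiles with a vertical step of 250 px
# and a horizontal step of 350 px, as described above. Shapes are illustrative.
import numpy as np

def tile_image(image, tile=512, step_y=250, step_x=350):
    tiles = []
    height, width = image.shape[:2]
    for y in range(0, height - tile + 1, step_y):
        for x in range(0, width - tile + 1, step_x):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

scene = np.zeros((4096, 4096, 3), dtype=np.uint8)  # stand-in for an RGB scene
tiles = tile_image(scene)
print(len(tiles), "tiles of shape", tiles[0].shape)  # overlapping 512 x 512 tiles
```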
In the resulting database, about 12,500 buildings of various types were marked with LabelImg [37]. They were divided into six categories: shopping center, block of flats, church, terraced houses, single-family house and garage (see Figure 9).
Selected types of objects differ from each other by many parameters, including size, height, shape and roof (along with the installations located on it). A crucial feature that allows buildings to be distinguished is their size, which is why Figure 10 shows the difference between the horizontal dimensions of sample buildings in each category. The largest building of the featured classes is the shopping center, while the smallest is the garage (see Figure 10).
Architectural variability within specific classes is an additional difficulty in classification. This phenomenon is very clearly visible among single-family houses: these structures can have different dimensions, shapes, numbers of floors, or even different types of construction. In the case of terraced buildings (or garages), another difficulty is the variable ratio of the lengths of a structure's sides, because such a building can consist of anywhere from three to ten segments (see Figure 11).
The number of image pixels also affects the classification of buildings. In the WorldView-2 images, a small single-family house with dimensions of 8 × 10 m is made up of about 320 pixels, which makes it possible to identify many elements on the surface of the roof (installations, chimneys) and to determine the type of roof. In the case of images with lower resolution, e.g., acquired using the Ikonos satellite (where the resolution of the images is 1 m), the same building will consist of about 80 pixels, which makes it impossible to identify the details of the roof, and so models trained on the basis of these data cannot cope with classification.
The similarity among classes is another problem. This phenomenon is visible in the case of terraced houses and garages: these buildings can be distinguished from each other by the details on their roofs. In images with lower resolution (or quality), the differences between these categories are small, which is why proper identification is possible only on the basis of their surroundings (see Figure 12). Classical methods of building detection in satellite images are very time-consuming and do not allow buildings to be divided by their intended purpose. Although edge detection and segmentation techniques make it possible to detect buildings, they require parameters to be chosen individually for each image. In our method, preparing the data and training the network are the most time-consuming steps; once the network is trained, obtaining results depends only on importing the images.
In addition, it should be noted that, because of their diverse architecture, these types of structures are characterized by highly variable shapes, and they are often accompanied by shadows, occlusion, noise, deformations, variable lighting or different resolutions (see Table 2). These factors have a significant impact on the algorithms' performance, especially in urban areas, where building density is very high. There, an often-occurring phenomenon is the overlapping of building shadows on neighboring objects, or the "laying" of buildings, which can lead to the merging of many buildings into one.
When working with satellite imagery, these structures are identified only on the basis of roofs and the surroundings. In this case, one must pay attention to the heterogeneity of the roofs and the possibility of additional installations being present on them (which is possible, but not necessary).

3.2. Accuracy Assessment

To assess the accuracy of the detection and classification of structures, three parameters can be used: accuracy detection (AD) (Equation (3)), missing ratio (MR) (Equation (4)), and false-positive classification rate (FPCR) (Equation (5)).
\mathrm{AD} = \frac{\text{number of structures detected}}{\text{number of all structures}} \quad (3)

\mathrm{MR} = \frac{\text{number of undetected structures}}{\text{number of all structures}} \quad (4)

\mathrm{FPCR} = \frac{\text{number of incorrectly classified structures}}{\text{number of all classified structures detected}} \quad (5)
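Expressed in code, the three measures reduce to simple ratios; the counts in the example below are illustrative, chosen only to roughly match the headline values quoted in the abstract.

```python
# Hedged helpers for Equations (3)-(5); the inputs are assumed to come from
# a manual comparison of detections against reference buildings.
def accuracy_detection(detected, total):
    return detected / total                       # AD, Equation (3)

def missing_ratio(undetected, total):
    return undetected / total                     # MR, Equation (4)

def false_positive_classification_rate(misclassified, classified_detected):
    return misclassified / classified_detected    # FPCR, Equation (5)

# Illustrative counts: 39 of 40 buildings detected, 1 missed,
# 3 of 36 detected-and-classified buildings misclassified.
print(accuracy_detection(39, 40))                 # 0.975
print(missing_ratio(1, 40))                       # 0.025
print(false_positive_classification_rate(3, 36))  # ~0.083
```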

3.3. Experimental Results

3.3.1. Results of Faster R-CNN

First, three Faster R-CNN networks were trained using three optimization methods: Adam, Momentum and RMSProp. The training process was carried out for a constant number of iterations.
The optimization results are presented in Table 3, which records the detectability of structures in the images; this was used to check the correct operation of the models. The assessment was made using parameters that determine the quality of building detection and classification, and detailed assessment results are provided in Appendix A. Table A1, Table A2 and Table A3 (see Appendix A) compare five parameters that determine the correctness of network operation: detected objects (W), true objects (T), false objects (F), undetected objects (B) and the number of buildings in the image (S). These parameters allow an algorithm's results to be assessed; the main ones are the detected objects, the true objects and the number of buildings in the image. Their application shows that the Adam and RMSProp algorithms yield good results (see Table 3). The second of these methods achieves slightly better results for Images 1 and 3, but the Adam algorithm is unrivaled in visual analysis: it correctly determines the locations of structures in the image.
The classification of detected structures is also performed most accurately with the Adam method. In addition, it can be seen that blocks of flats are the most recognizable structures, while garages are the least recognizable.


From the three parameters that define the accuracy of detection (Table 4), it can be seen that the Adam algorithm provides the best results for the second, fourth, fifth and sixth sets, while RMSProp does better in the other cases. In addition, the correctness of detecting individual building types was examined. The fewest errors in the classification of buildings occur with the Adam algorithm; the exception is Set 3, for which better results are obtained with the RMSProp algorithm.

3.3.2. Result of Our Method, FER-CNN

To make the results of these two networks comparable, the same assumptions were made: the same image database and the same set of images for visual assessment were used, and both networks were trained for 200,000 steps.
The results, depending on the optimization used, are presented in the tables below (see Table 5 and Table 6). A significant improvement is visible in the fitting of structures into bounding boxes, which now include less of the buildings' shadows (Table 5). In addition, as in the first case, the best detection and classification results are obtained with the Adam and RMSProp optimizers.


Owing to the modification of the VGG-16 network architecture, significant improvements in the results are noticeable: the accuracy detection parameter increases (more buildings were detected), and a significant reduction in the classification error can be observed. The program operating on this network model classifies structures significantly better, especially those that are particularly vulnerable to errors due to between-class similarity, e.g., garages built side by side in rows (Set 1).

3.3.3. Result of Single-Shot MultiBox Detector (SSD)

In the case of SSD, much better results are achieved with the RMSProp and Momentum algorithms. After applying the accuracy parameters, it can be seen that the former achieves better results; Image 1 is an exception, for which the Momentum model detects buildings slightly better (Table 7).
With the SSD model, one can notice a much larger number of undetected buildings than with Faster R-CNN, while the number of incorrectly classified structures decreases (Table 8). Thus, it can be concluded that the SSD model is more accurate in terms of classification, but less accurate for structure detection.
In the case of SSD architecture, the accuracy detection parameter significantly worsens (depending on the image; for the RMSProp algorithm, it ranges from 0.65 to 0.91), which is also true for the missing ratio (for the same optimization method, it ranges from 0.25 to 0.40). Because fewer buildings are detected, there is a slight improvement in the classification error parameter, whose maximum value is 0.20.

3.3.4. Google Earth Image Database

In order to check the correctness of our method, an additional database was prepared, consisting of image fragments from Google Earth. These images show cities of different sizes, building densities and building types; among the selected cities are Koszalin, Żyrardów, Opole and Suwałki. The database consists of 500 images of different resolutions (the smallest image measures 241 × 306 pixels and the largest 856 × 1144 pixels), which were divided into two sets: training (400 images) and test (100 images). Over 2200 buildings were marked in these images and, as in the previous database, divided into six categories: shopping center, block of flats, church, terraced houses, single-family house and garage. As before, three networks differing only in the optimization method were trained for each architecture (Faster R-CNN, FER-CNN, SSD). The results are presented in Table 9.
As in the previous case of the database consisting of fragments of WorldView-2 satellite images, the best results are achieved by the FER-CNN-based algorithm with the Adam optimization method. This algorithm has a significantly higher object detection rate (AD = 92%) while maintaining over 95% correctness of building classification. Additionally, the FER-CNN-based algorithm detects small objects (e.g., garages) with much higher accuracy, even when they are partly shaded. In the case of totally shaded objects whose roof color contrasts only slightly with the surrounding area, this algorithm fails to detect them, as do the SSD- and Faster R-CNN-based detectors.

3.3.5. Box Detection

When comparing the operation of the above methods for the detection and classification of buildings in satellite images, one can see a significant advantage of our proposed FER-CNN (Table 10). An additional advantage of this network is better resistance to shadows, which is very common in satellite images of urban areas.
Analyzing the comparison in Table 9, it can be seen that, in many cases, the smallest classification error occurs for the SSD network. On this basis, it can be concluded that algorithms based on this network detect fewer objects, but with a smaller classification error.

3.3.6. Results of Edge Detection Using Mask R-CNN and Ramer–Douglas–Peucker (RDP)

The algorithm based on Mask R-CNN allows for the detection and classification of structures using only 200 images in the training base. In addition, in order to check the algorithm's ability to detect garages and small halls, a seventh category of buildings, halls, was added. The algorithm works well with objects of various sizes, with objects of low visibility due to low contrast with the surroundings, and with objects obscured by shadows. It also copes well with small structures such as garages or garden sheds (assigned to the garage category), as well as slightly larger warehouses (shopping centers).
However, attention should be paid to the alignment of the mask generated by the algorithm with the actual contour of the building. For structures with simple shapes, this error is much smaller than it is for structures with complex shapes, e.g., churches and shopping centers (Figure 13a,b).
To obtain building outlines based on the generated masks, the RDP algorithm was used, followed by the proposed building boundary regularization method. Its application introduces a significant improvement in mapping the edges of buildings. In addition, as can be seen in Figure 13, distinguishing buildings with unusual architecture, such as churches or single-family terraced houses, is difficult, but our method can successfully recognize and locate them with faithful edge retention.
The algorithm we propose improves the edges generated by the RDP algorithm, but its performance depends on the quality of the masks generated by Mask R-CNN. For a Mask R-CNN model trained on a small database of 200 images, the proposed building shape correction method corrects over 67% of angles to right angles, as well as over 83% of edges. Thanks to these operations, the shape of the detected buildings is much closer to the real one. By increasing the value of the parameter d_min (e.g., d_min = 10) and the angle a_1 (e.g., a_1 = 12), we can increase the proportion of correctly made corrections to 89% for angles and 93% for edges.

4. Discussion

This article presents the results of a comparative analysis of state-of-the-art CNN-based object detection methods for detecting and classifying buildings, along with a new method of building boundary regularization. All networks used in the study were trained on a dataset consisting of 500 images with red, green and blue channels and a spatial resolution of 0.5 m. The best results were obtained with FER-CNN. The test results demonstrate the universality of the presented approach for building detection in high-resolution satellite imagery.
Comparing the results obtained by using Faster R-CNN, FER-CNN and SSD, it can be concluded that (1) models based on the Adam algorithm achieved good results only for Faster R-CNN, while they generated errors when used in SSD networks. (2) The modification of the VGG network resulted in the better detection of structures in the image, and it was more resistant to shadows. (3) The time required to train an SSD network was approximately three times longer than that for Faster R-CNN. (4) The SSD-based model did not detect buildings that had low contrast with the surroundings, but it generated fewer errors when classifying objects. (5) The size of the files needed to run Faster R-CNN was about five times larger than that for an SSD network.
On the basis of the comparative analyses performed, the effectiveness of our method was 97.5%; however, there is a significant difference between classes. In most cases, buildings that represent categories differ significantly, including in size and appearance (e.g., garages and shopping centers). In addition, it should be kept in mind that these networks mainly make mistakes when classifying the above objects. The proposed network is most often wrong in the classification of garages (especially those in terraced houses), where the detection accuracy error is 37% and the error of classification is only 6%. Therefore, before using the presented networks for utility purposes, it is necessary to improve their algorithms in this area.
Comparing our method with classic techniques that allow for the detection of buildings, the time needed to detect the defined categories of buildings is significantly reduced. In addition, this approach does not require the selection of individual parameters for each image.
In related studies using a Faster R-CNN model based on the ResNet101 network, researchers obtained a classification accuracy of 99% after 2000 epochs, whereas building extraction using a support vector machine achieved 88.3% [38]. WorldView-2 images were also part of our test material, and tests performed on these data confirmed our conclusion that convolutional neural networks better extract features and detect structures in high-resolution images. The training and prediction times of our algorithm are not much longer than those of Faster R-CNN, which is due to the higher value of the input resolution parameter (Table 11). Other studies related to building detection have used a Res-U-Net architecture with guided filtering; an accuracy of 97% was achieved, but without faithful reproduction of the edges [21]. Other research results, obtained by [39], showed that the use of an artificial neural network could produce an accuracy of 91.7%.
From our research results and the analysis of the data contained in Table 10, it can be seen that our method, compared with the others investigated, makes it possible to increase the number of detected and classified buildings in fragments of satellite images. In addition, the correctness of detecting buildings using bounding boxes significantly increases. However, we believe that these results can be improved by:
  • Increasing the training base by adding images of other types of areas and buildings, paying particular attention to increasing the number of images of garages, shopping centers and churches;
  • Increasing the number of iterations during network training;
  • Further modifying Faster R-CNN.
In the second part of the research, buildings were detected using Mask R-CNN, and then the building boundary regularization process was performed. Our proposed method significantly improved the shape of the detected boundaries. However, a certain limitation of our method (as can be seen in Figure 13) is that the identified edges deviate slightly from the actual shape of the building. The reason for this phenomenon is that completely covering a structure with a mask is a significant problem, which may be caused by too small a database or too few iterations during network training. Apart from this disadvantage, the network detects and classifies structures very well, even those with a small surface area, although it sometimes fails to detect small structures that do not contrast sufficiently with their surroundings.

5. Conclusions

In this work, the capabilities of neural networks in the detection and classification of buildings in satellite images were examined. In the first stage of research, three network models were examined: Faster R-CNN, FER-CNN and SSD; additionally, the optimization method was taken into account. For the first of these architectures, the model using the Adam algorithm achieved the best results (slightly better than the Momentum algorithm). After the modification of the VGG network, a significant increase in the correctness of detecting buildings in images was noted, especially for the Adam algorithm, although the RMSProp algorithm also performed very well. In the case of the SSD-based model, the best results were obtained with the RMSProp algorithm, while the worst were obtained with the Adam algorithm.
In the second stage of this work, the capabilities of the algorithm based on Mask R-CNN for the detection and classification of objects were investigated. Then, because of the irregular shape of the training polygons, a method was proposed that allowed us to obtain an image with the edges of classified buildings marked on it.
This algorithm coped well with the detection and classification of structures from satellite imagery; however, the generated edges deviated slightly from the actual shape of the building, especially for buildings with a complex structure, e.g., churches. The application of the method that corrects the edges of buildings significantly improved the rendition of their actual shapes. Attention should also be paid to the size of the database on which the Mask R-CNN model was trained: in the first case, it contained 100 images, and in the second, 200 images. This shows how great the potential of neural networks is, even with such a small amount of data.
From the above, it can also be stated that building extraction is still an important and current research topic that requires further experiments. In future work, we will focus on constructing a network for more reliable building classification, improving model performance for accurate edge extraction and developing a new model that uses relationships between specific groups of buildings. We also plan to design and train a neural network to detect small buildings and buildings with irregular shapes that are partially obscured by shadows or other occlusions, and to increase the efficiency and automation of the proposed method.

Author Contributions

Formal analysis M.K.; Methodology, K.R. and M.K.; Project administration, M.K.; Software, K.R.; Validation, M.K.; Visualization, K.R.; Writing—original draft, K.R. and M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This paper has been supported by the Military University of Technology, the Faculty of Civil Engineering and Geodesy.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Source Code and Database

Appendix A

Table A1. Summary of structure detection for Sets 1–6: Faster R-CNN architecture.
Set 1Set 2Set 3Set 4Set 5Set 6
AMRAMRAMRAMRAMRAMR
garageW446100000000100000
T446000000000000000
F000100000000100000
B775000000000000000
S111111000000000000000
terraced houseW65500032300100026123
T53300022200000025122
F122000101001000101
B577000111000000113514
S101010000333000000363636
Single-family houseW71214000232000223738871
T466000111000000605959
F368000121000223132912
B322000000000000455
S888000111000000646464
block of flatsW212423191921747363334161516000
T211921181719637363333161516000
F052122110001000000
B351242140588232000
S242424212121777414141181818000
churchW000000111000000000
T000000111000000000
F000000000000000000
B000000111000000000
S000000222000000000
Shopping centerW000111111100121000
T000111111100111000
F000000000000000000
B000111000122111000
S000222111222222000
W: detected objects; T: true objects; F: false objects; B: undetected objects; S: number of buildings on the image; A: Adam optimizer; M: Momentum; R: RMSProp optimizer.
Table A2. Summary of structure detection for Sets 1–6: FER-CNN architecture.
Set 1Set 2Set 3Set 4Set 5Set 6
AMRAMRAMRAMRAMRAMR
garageW578100000000200000
T577000000000000000
F000100000000000000
B643000000000000000
S111111000000000000000
terraced houseW73300032300100028024
T62300022200000027023
F110000101001000101
B00000011100000093613
S101010000333000000363636
Single-family houseW111315000232000223729174
T567000111000000626363
F671000121000223102811
B100000000000000211
S888000111000000646464
block of flatsW232424221821747383635161516000
T232022211720637383634161516000
F042111110001000000
B142041140357232000
S242424212121777414141181818000
churchW000000111000000000
T000000111000000000
F000000000000000000
B000000111000000000
S000000222000000000
Shopping centerW000111111100121000
T000111111100111000
F000000000000000000
B000111000122111000
S000222111222222000
W: detected objects; T: true objects; F: false objects; B: undetected objects; S: number of buildings on the image; A: Adam optimizer; M: Momentum; R: RMSProp optimizer.
Table A3. Summary of structure detection for Sets 1–6: SSD architecture.
Set 1Set 2Set 3Set 4Set 5Set 6
AMRAMRAMRAMRAMRAMR
garageW100000000000000000
T000000000000000000
F100000000000000000
B111111000000000000000
S111111000000000000000
terraced houseW07800002301000001413
T06600001200000001013
F112000011010000040
B1044000321000000362623
S101010000333000000363636
Single-family houseW09700000000000046857
T05400000000000005349
F0430000000000004158
B834000111000000641115
S888000111000000646464
block of flatsW42221418212772293211113000
T02120018200660293201113000
F411401011200000000
B24342131711411291875000
S242424212121777414141181818000
churchW000000011000000000
T000000001000000000
F000000000000000000
B000000211000000000
S000000222000000000
Shopping centerW000030111000111010
T000010111000111000
F000020000000000010
B000212000222111000
S000222111222222000
W: detected objects; T: true objects; F: false objects; B: undetected objects; S: number of buildings on the image; A: Adam optimizer; M: Momentum; R: RMSProp optimizer.

References

  1. Li, L.; Wang, C.; Zhang, H.; Zhang, B.; Wu, F. Urban Building Change Detection in SAR Images Using Combined Differential Image and Residual U-Net Network. Remote Sens. 2019, 11, 1091.
  2. Cui, B.; Zhang, Y.; Yan, L.; Wei, J.; Wu, H. An Unsupervised SAR Change Detection Method Based on Stochastic Subspace Ensemble Learning. Remote Sens. 2019, 11, 1314.
  3. Neuville, R.; Pouliot, J.; Poux, F.; Billen, R. 3D Viewpoint Management and Navigation in Urban Planning: Application to the Exploratory Phase. Remote Sens. 2019, 11, 236.
  4. Luo, N.; Wan, T.; Hao, H.; Lu, Q. Fusing High-Spatial-Resolution Remotely Sensed Imagery and OpenStreetMap Data for Land Cover Classification Over Urban Areas. Remote Sens. 2019, 11, 88.
  5. Qin, Y.; Wu, Y.; Li, B.; Gao, S.; Liu, M.; Zhan, Y. Semantic Segmentation of Building Roof in Dense Urban Environment with Deep Convolutional Neural Network: A Case Study Using GF2 VHR Imagery in China. Sensors 2019, 19, 1164.
  6. Wang, J.; Qin, Q.; Ye, X.; Wang, J.; Yang, X.; Qin, X. A Survey of Building Extraction Methods from Optical High Resolution Remote Sensing Imagery. Remote Sens. Technol. Appl. 2016, 31, 653–662.
  7. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  8. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
  9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
  10. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  11. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  13. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  14. Girshick, R. Fast R-CNN. In Proceedings of the International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
  15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
  16. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
  17. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
  18. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
  19. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  20. Wu, G.; Shao, X.; Guo, Z.; Chen, Q.; Yuan, W.; Shi, X.; Xu, Y.; Shibasaki, R. Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks. Remote Sens. 2018, 10, 407.
  21. Xu, Y.; Wu, L.; Xie, Z.; Chen, Z. Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters. Remote Sens. 2018, 10, 144.
  22. Vakalopoulou, M.; Karantzalos, K.; Komodakis, N.; Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Milan, Italy, 26–31 July 2015; pp. 1873–1876.
  23. Boonpook, W.; Tan, Y.; Ye, Y.; Torteeka, P.; Torsri, K.; Dong, S. A Deep Learning Approach on Building Detection from Unmanned Aerial Vehicle-Based Images in Riverbank Monitoring. Sensors 2018, 18, 3921.
  24. Zuo, T.; Feng, J.; Chen, X. HF-FCN: Hierarchically Fused Fully Convolutional Network for Robust Building Extraction. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Cham, Switzerland, 2016; pp. 291–302.
  25. Chen, K.; Fu, K.; Gao, X.; Yan, M.; Sun, X.; Zhang, H. Building extraction from remote sensing images with deep learning in a supervised manner. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, TX, USA, 23–28 July 2017; pp. 1672–1675.
  26. Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149.
  27. Zou, Z.; Shi, T.; Li, W.; Zhang, Z.; Shi, Z. Do Game Data Generalize Well for Remote Sensing Image Segmentation? Remote Sens. 2020, 12, 275.
  28. Perez, H.; Tah, J.; Mosavi, A. Deep learning for detecting building defects using convolutional neural networks. Sensors 2019, 19, 3556.
  29. Zhang, L.; Wu, J.; Fan, Y.; Gao, H.; Shao, Y. An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN. Sensors 2020, 20, 1465.
  30. Ghandour, A.J.; Jezzini, A.A. Autonomous Building Detection Using Edge Properties and Image Color Invariants. Buildings 2018, 8, 65.
  31. Bai, T.; Pang, Y.; Wang, J.; Han, K.; Luo, J.; Wang, H.; Lin, J.; Wu, J.; Zhang, H. An Optimized Faster R-CNN Method Based on DRNet and RoI Align for Building Detection in Remote Sensing Images. Remote Sens. 2020, 12, 762.
  32. Maltezos, E.; Doulamis, A.; Doulamis, N.; Ioannidis, C. Building extraction from LiDAR data applying deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2018, 16, 155–159.
  33. Ji, S.; Shen, Y.; Lu, M.; Zhang, Y. Building Instance Change Detection from Large-Scale Aerial Images using Convolutional Neural Networks and Simulated Samples. Remote Sens. 2019, 11, 1343.
  34. Li, W.; He, C.; Fang, J.; Zheng, J.; Fu, H.; Yu, L. Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data. Remote Sens. 2019, 11, 403.
  35. Huang, Z.; Zhong, Z.; Sun, L.; Huo, Q. Mask R-CNN with pyramid attention network for scene text detection. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 764–772.
  36. Hershberger, J.E.; Snoeyink, J. Speeding up the Douglas–Peucker Line-Simplification Algorithm; University of British Columbia, Department of Computer Science: Vancouver, BC, Canada, 1992.
  37. Tzutalin. LabelImg. Git Code. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 20 February 2020).
  38. Shetty, A.R.; Krishna Mohan, B. Building Extraction in High Spatial Resolution Images Using Deep Learning Techniques. In Computational Science and Its Applications - ICCSA 2018; Gervasi, O., Murgante, B., Misra, S., Stankova, E., Torre, C.M., Rocha, A.M., Taniar, D., Apduhan, B.O., Tarantino, E., Ryu, Y., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 10962.
  39. Lari, Z.; Ebadi, H. Automatic extraction of building features from high resolution satellite images using artificial neural networks. In Proceedings of the ISPRS Conference on Information Extraction from SAR and Optical Data, with Emphasis on Developing Countries, Istanbul, Turkey, 16–18 May 2007.
Figure 1. Architecture of Faster Region Convolution Neural Network (Faster R-CNN).
Figure 2. VGG-16 modification.
Figure 3. Single-Shot Multibox Detector (SSD) architecture.
Figure 4. Scheme of the building edge detection algorithm.
Figure 5. Scheme of the proposed building boundary regularization method.
Figure 6. Building boundary regularization example.
Figure 7. The result of recreating the building’s real boundaries.
Figure 8. Location of the study area.
Figure 9. Database components.
Figure 10. Comparing the lengths of buildings.
Figure 11. (a) Block of flats, (b) single-family house, (c) terraced housing, (d) garage, (e) shopping center, (f) church.
Figure 12. Example of building similarity.
Figure 13. (a) Mask R-CNN (100 images); (b) Mask R-CNN (200 images); (c) algorithm based on Ramer–Douglas–Peucker (RDP); (d) our method (third iteration).
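For context on the RDP baseline compared in Figure 13, the sketch below is a minimal textbook implementation of the Ramer–Douglas–Peucker polyline simplification [36]; it is included for illustration only, and the pixel tolerance `epsilon` in the usage example is an assumed value, not a parameter taken from our experiments.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline.

    Keeps the interior point farthest from the chord between the two
    endpoints whenever its perpendicular distance exceeds epsilon,
    then recurses on both halves.
    """
    if len(points) < 3:
        return list(points)

    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    chord = math.hypot(dx, dy) or 1e-12  # guard against degenerate chords

    dmax, index = 0.0, 0
    for i, (px, py) in enumerate(points[1:-1], start=1):
        # Perpendicular distance of the interior point to the chord.
        d = abs(dy * (px - x1) - dx * (py - y1)) / chord
        if d > dmax:
            dmax, index = d, i

    if dmax > epsilon:
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [points[0], points[-1]]

# Example: simplify a traced building outline with a 2-pixel tolerance;
# the near-collinear jitter along the bottom edge is removed.
outline = [(0, 0), (1, 0.1), (2, -0.1), (10, 0), (10, 5), (0, 5)]
print(rdp(outline, epsilon=2.0))  # [(0, 0), (10, 0), (10, 5), (0, 5)]
```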
Table 1. Technical characteristics of the WorldView-2 and Pléiades satellites.

| Satellite   | Launch Date     | Orbital Altitude [km] | Pan Resolution at Nadir [m] | MS Resolution at Nadir [m] | Swath Width [km] |
|-------------|-----------------|-----------------------|-----------------------------|----------------------------|------------------|
| WorldView-2 | 8 October 2009  | 770                   | 0.46                        | 1.8                        | 16.5             |
| Pléiades    | 2 December 2012 | 695                   | 0.50                        | 2.0                        | 20               |
Table 2. Factors influencing the accuracy of building detection and classification.
[Image panels illustrating each factor: occlusion, lighting, background, scale, deformation, type, resolution variation, shadows.]
Table 3. Detection of structures using the Faster R-CNN architecture.
[Detection result images for six test scenes, one column per optimizer (Adam, Momentum, RMSProp).]
Table 4. Accuracy assessment of the model’s operation: Faster R-CNN. Columns G–S give the correctness of class detection.

| Set   | Opt. | AD   | MR   | FPCR | G    | T    | F    | B    | CH   | S    |
|-------|------|------|------|------|------|------|------|------|------|------|
| Set 1 | A    | 0.72 | 0.34 | 0.12 | 0.36 | 0.60 | 0.63 | 0.88 | -    | -    |
| Set 1 | M    | 0.85 | 0.40 | 0.41 | 0.36 | 0.20 | 0.50 | 0.79 | -    | -    |
| Set 1 | R    | 0.91 | 0.28 | 0.33 | 0.55 | 0.30 | 0.50 | 0.88 | -    | -    |
| Set 2 | A    | 0.91 | 0.13 | 0.11 | -    | -    | -    | 0.90 | -    | 0.50 |
| Set 2 | M    | 0.87 | 0.22 | 0.11 | -    | -    | -    | 0.90 | -    | 0.50 |
| Set 2 | R    | 0.96 | 0.13 | 0.10 | -    | -    | -    | 1.00 | -    | 0.50 |
| Set 3 | A    | 0.88 | 0.19 | 0.27 | -    | 0.67 | 1.00 | 0.86 | 0.50 | 1.00 |
| Set 3 | M    | 0.69 | 0.38 | 0.38 | -    | 0.67 | 1.00 | 0.43 | 0.50 | 1.00 |
| Set 3 | R    | 0.88 | 0.13 | 0.17 | -    | 0.67 | 1.00 | 1.00 | 0.50 | 1.00 |
| Set 4 | A    | 0.86 | 0.14 | 0.00 | -    | -    | -    | 0.88 | -    | 0.50 |
| Set 4 | M    | 0.77 | 0.23 | 0.00 | -    | -    | -    | 0.80 | -    | 1.00 |
| Set 4 | R    | 0.81 | 0.23 | 0.06 | -    | -    | -    | 0.80 | -    | 0.00 |
| Set 5 | A    | 1.00 | 0.15 | 0.18 | -    | -    | -    | 0.89 | -    | 0.50 |
| Set 5 | M    | 0.95 | 0.20 | 0.13 | -    | -    | -    | 0.83 | -    | 1.00 |
| Set 5 | R    | 1.00 | 0.15 | 0.18 | -    | -    | -    | 0.89 | -    | 0.50 |
| Set 6 | A    | 0.99 | 0.15 | 0.16 | -    | 0.69 | 0.94 | -    | -    | -    |
| Set 6 | M    | 0.89 | 0.40 | 0.48 | -    | 0.03 | 0.92 | -    | -    | -    |
| Set 6 | R    | 0.94 | 0.19 | 0.16 | -    | 0.61 | 0.92 | -    | -    | -    |
A: Adam optimizer; M: Momentum; R: RMSProp optimizer; AD: accuracy detection; MR: missing ratio; FPCR: false-positive classification rate; G: garage; T: terrace house; F: single-family house; B: block of flats; CH: church; S: shopping center.
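To ease reading of Tables 4, 6, 8 and 10, the sketch below computes the three summary statistics from raw match counts. The exact formulas are not restated at this point in the paper, so the definitions in the comments are conventional assumptions rather than quotations of our implementation.

```python
def detection_metrics(tp, fn, fp):
    """Summary detection statistics.

    Assumed definitions:
    tp -- detections matched to a ground-truth building
    fn -- ground-truth buildings left undetected
    fp -- detections that match no ground-truth building
    """
    ground_truth = tp + fn
    detections = tp + fp
    ad = tp / ground_truth                          # accuracy detection (AD)
    mr = fn / ground_truth                          # missing ratio (MR)
    fpcr = fp / detections if detections else 0.0   # false-positive classification rate (FPCR)
    return ad, mr, fpcr

# Example: 87 matched buildings, 13 misses and 9 false alarms give
# AD = 0.87, MR = 0.13, FPCR ≈ 0.09 -- the scale of values seen in Set 2.
print(detection_metrics(87, 13, 9))
```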
Table 5. Detection of structures using the Faster Edge Region Convolution Neural Network (FER-CNN) architecture.
[Detection result images for six test scenes, one column per optimizer (Adam, Momentum, RMSProp).]
Table 6. Accuracy assessment of the model’s operation: FER-CNN. Columns G–S give the correctness of class detection.

| Set   | Opt. | AD   | MR   | FPCR | G    | T    | F    | B    | CH   | S    |
|-------|------|------|------|------|------|------|------|------|------|------|
| Set 1 | A    | 0.87 | 0.15 | 0.13 | 0.45 | 0.60 | 0.63 | 0.96 | -    | -    |
| Set 1 | M    | 0.89 | 0.15 | 0.23 | 0.64 | 0.20 | 0.75 | 0.83 | -    | -    |
| Set 1 | R    | 0.94 | 0.09 | 0.06 | 0.64 | 0.30 | 0.88 | 0.92 | -    | -    |
| Set 2 | A    | 1.00 | 0.04 | 0.09 | -    | -    | -    | 1.00 | -    | 0.50 |
| Set 2 | M    | 0.83 | 0.22 | 0.04 | -    | -    | -    | 0.81 | -    | 0.50 |
| Set 2 | R    | 0.96 | 0.09 | 0.04 | -    | -    | -    | 0.95 | -    | 0.50 |
| Set 3 | A    | 1.00 | 0.21 | 0.21 | -    | 0.67 | 1.00 | 0.86 | 0.50 | 1.00 |
| Set 3 | M    | 0.79 | 0.43 | 0.21 | -    | 0.67 | 1.00 | 0.43 | 0.50 | 1.00 |
| Set 3 | R    | 1.00 | 0.14 | 0.14 | -    | 0.67 | 1.00 | 1.00 | 0.50 | 1.00 |
| Set 4 | A    | 0.91 | 0.09 | 0.00 | -    | -    | -    | 0.93 | -    | 0.50 |
| Set 4 | M    | 0.84 | 0.16 | 0.00 | -    | -    | -    | 0.88 | -    | 0.00 |
| Set 4 | R    | 0.84 | 0.21 | 0.05 | -    | -    | -    | 0.83 | -    | 0.00 |
| Set 5 | A    | 1.00 | 0.15 | 0.10 | -    | -    | -    | 0.89 | -    | 0.50 |
| Set 5 | M    | 0.95 | 0.20 | 0.15 | -    | -    | -    | 0.83 | -    | 0.50 |
| Set 5 | R    | 1.00 | 0.15 | 0.15 | -    | -    | -    | 0.89 | -    | 0.50 |
| Set 6 | A    | 1.00 | 0.11 | 0.11 | -    | 0.75 | 0.97 | -    | -    | -    |
| Set 6 | M    | 0.91 | 0.37 | 0.28 | -    | 0.00 | 0.98 | -    | -    | -    |
| Set 6 | R    | 0.98 | 0.14 | 0.12 | -    | 0.64 | 0.98 | -    | -    | -    |
A: Adam optimizer; M: Momentum; R: RMSProp optimizer; AD: accuracy detection; MR: missing ratio; FPCR: false-positive classification rate; G: garage; T: terrace house; F: single-family house; B: block of flats; CH: church; S: shopping center.
Table 7. Detection of structures using the SSD architecture.
[Detection result images for six test scenes, one column per optimizer (Adam, Momentum, RMSProp).]
Table 8. Accuracy assessment of the model’s operation: SSD architecture. Columns G–S give the correctness of class detection.

| Set   | Opt. | AD   | MR   | FPCR | G    | T    | F    | B    | CH   | S    |
|-------|------|------|------|------|------|------|------|------|------|------|
| Set 1 | A    | 0.09 | 1.00 | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | -    | -    |
| Set 1 | M    | 0.72 | 0.40 | 0.11 | 0.00 | 0.60 | 0.63 | 0.88 | -    | -    |
| Set 1 | R    | 0.68 | 0.43 | 0.11 | 0.00 | 0.60 | 0.50 | 0.83 | -    | -    |
| Set 2 | A    | 0.17 | 1.00 | 0.17 | -    | -    | -    | 0.00 | -    | 0.00 |
| Set 2 | M    | 0.91 | 0.17 | 0.09 | -    | -    | -    | 0.86 | -    | 0.50 |
| Set 2 | R    | 0.91 | 0.13 | 0.04 | -    | -    | -    | 0.95 | -    | 0.00 |
| Set 3 | A    | 0.21 | 0.93 | 0.00 | -    | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| Set 3 | M    | 0.79 | 0.36 | 0.14 | -    | 0.33 | 0.00 | 0.86 | 0.00 | 1.00 |
| Set 3 | R    | 0.86 | 0.29 | 0.14 | -    | 0.67 | 0.00 | 0.86 | 0.50 | 1.00 |
| Set 4 | A    | 0.05 | 0.95 | 0.05 | -    | -    | -    | 0.00 | -    | 0.00 |
| Set 4 | M    | 0.70 | 0.28 | 0.02 | -    | -    | -    | 0.71 | -    | 0.00 |
| Set 4 | R    | 0.74 | 0.26 | 0.00 | -    | -    | -    | 0.78 | -    | 0.00 |
| Set 5 | A    | 0.10 | 0.95 | 0.00 | -    | -    | -    | 0.00 | -    | 0.50 |
| Set 5 | M    | 0.60 | 0.40 | 0.00 | -    | -    | -    | 0.61 | -    | 0.50 |
| Set 5 | R    | 0.65 | 0.25 | 0.00 | -    | -    | -    | 0.72 | -    | 0.00 |
| Set 6 | A    | 0.04 | 1.00 | 0.00 | -    | 0.00 | 0.00 | -    | -    | -    |
| Set 6 | M    | 0.83 | 0.37 | 0.20 | -    | 0.28 | 0.83 | -    | -    | -    |
| Set 6 | R    | 0.70 | 0.38 | 0.08 | -    | 0.36 | 0.77 | -    | -    | -    |
A: Adam optimizer; M: Momentum; R: RMSProp optimizer; AD: accuracy detection; MR: missing ratio; FPCR: false-positive classification rate; G: garage; T: terrace house; F: single-family house; B: block of flats; CH: church; S: shopping center.
Table 9. Detection of structures using the Faster R-CNN, FER-CNN and SSD architectures.
[Side-by-side detection result images for three test scenes: one column per architecture (Faster R-CNN, FER-CNN, SSD) and one row per optimizer (Adam, Momentum, RMSProp) for each scene.]
Table 10. Comparing the results for Faster R-CNN, FER-CNN and SSD.

| Set   | Opt. | Faster R-CNN (AD / MR / FPCR) | Our FER-CNN (AD / MR / FPCR) | SSD (AD / MR / FPCR) |
|-------|------|-------------------------------|------------------------------|----------------------|
| Set 1 | A    | 0.72 / 0.34 / 0.12            | 0.87 / 0.15 / 0.13           | 0.09 / 1.00 / 0.11   |
| Set 1 | M    | 0.85 / 0.40 / 0.41            | 0.89 / 0.15 / 0.23           | 0.72 / 0.40 / 0.11   |
| Set 1 | R    | 0.91 / 0.28 / 0.33            | 0.94 / 0.09 / 0.06           | 0.68 / 0.43 / 0.11   |
| Set 2 | A    | 0.91 / 0.13 / 0.11            | 1.00 / 0.04 / 0.09           | 0.17 / 1.00 / 0.17   |
| Set 2 | M    | 0.87 / 0.22 / 0.11            | 0.83 / 0.22 / 0.04           | 0.91 / 0.17 / 0.09   |
| Set 2 | R    | 0.96 / 0.13 / 0.10            | 0.96 / 0.09 / 0.04           | 0.91 / 0.13 / 0.04   |
| Set 3 | A    | 0.88 / 0.19 / 0.27            | 1.00 / 0.21 / 0.21           | 0.21 / 0.93 / 0.00   |
| Set 3 | M    | 0.69 / 0.38 / 0.38            | 0.79 / 0.43 / 0.21           | 0.79 / 0.36 / 0.14   |
| Set 3 | R    | 0.88 / 0.13 / 0.17            | 1.00 / 0.14 / 0.14           | 0.86 / 0.29 / 0.14   |
| Set 4 | A    | 0.86 / 0.14 / 0.00            | 0.91 / 0.09 / 0.00           | 0.05 / 0.95 / 0.05   |
| Set 4 | M    | 0.77 / 0.23 / 0.00            | 0.84 / 0.16 / 0.00           | 0.70 / 0.28 / 0.02   |
| Set 4 | R    | 0.81 / 0.23 / 0.06            | 0.84 / 0.21 / 0.05           | 0.74 / 0.26 / 0.00   |
| Set 5 | A    | 1.00 / 0.15 / 0.18            | 1.00 / 0.15 / 0.10           | 0.10 / 0.95 / 0.00   |
| Set 5 | M    | 0.95 / 0.20 / 0.13            | 0.95 / 0.20 / 0.15           | 0.60 / 0.40 / 0.00   |
| Set 5 | R    | 1.00 / 0.15 / 0.18            | 1.00 / 0.15 / 0.15           | 0.65 / 0.25 / 0.00   |
| Set 6 | A    | 0.99 / 0.15 / 0.16            | 1.00 / 0.11 / 0.11           | 0.04 / 1.00 / 0.00   |
| Set 6 | M    | 0.89 / 0.40 / 0.48            | 0.91 / 0.37 / 0.28           | 0.83 / 0.37 / 0.20   |
| Set 6 | R    | 0.94 / 0.19 / 0.16            | 0.98 / 0.14 / 0.12           | 0.70 / 0.38 / 0.08   |
Table 11. Comparing the training/prediction time for Faster R-CNN, FER-CNN and SSD.

| Architecture | Input Resolution | Training Time | Prediction Time |
|--------------|------------------|---------------|-----------------|
| Faster R-CNN | 224 × 224        | 15 h          | 0.8 s           |
| SSD          | 300 × 300        | 35 h          | 0.9 s           |
| FER-CNN      | 512 × 512        | 19 h          | 1.0 s           |
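The timings in Table 11 are hardware-dependent; a minimal sketch of how a per-image prediction time of this kind can be measured is shown below, where `predict_fn` stands in for any single-image forward pass (a hypothetical callable, not an API from our implementation).

```python
import time

def mean_prediction_time(predict_fn, images, warmup=3):
    """Average wall-clock time of one forward pass per image.

    predict_fn -- hypothetical callable running a single-image forward pass
    images     -- sequence of preprocessed input tensors
    """
    for img in images[:warmup]:
        predict_fn(img)  # discard warm-up runs (graph build, caching)
    start = time.perf_counter()
    for img in images:
        predict_fn(img)
    return (time.perf_counter() - start) / len(images)
```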
