Detecting Building Edges from High Spatial Resolution Remote Sensing Imagery Using Richer Convolution Features Network

As a basic feature of buildings, building edges play an important role in many fields, such as urbanization monitoring, city planning, and surveying and mapping. Detecting building edges from high spatial resolution remote sensing (HSRRS) imagery has long been a challenging problem. Inspired by the recent success of deep-learning-based edge detection, this paper employs a richer convolutional features (RCF) network to detect building edges. First, a dataset for building edge detection is constructed with the proposed most peripheral constraint conversion algorithm. Then, the RCF network is retrained on this dataset. Finally, the edge probability map obtained with the RCF-building model is refined using a geomorphological concept, based on a geometric morphological analysis of the topographic surface. The experimental results suggest that the RCF-building model can detect building edges accurately and completely, and that its edge detection F-measure is at least 5% higher than those of three other typical building extraction methods. In addition, an ablation experiment shows that the most peripheral constraint conversion algorithm generates a superior dataset, and that the involved refinement algorithm yields a higher F-measure and better visual effect than the non-maximal suppression algorithm.


Introduction
Buildings are one of the most important and most frequently updated parts of urban geographic databases [1]. As an important and fundamental feature for building description, building edge detection plays a key role in building extraction [2,3]. It has extensive applications in real estate registration, disaster monitoring, urban mapping, and regional planning [4-6]. With the rapid development of remote sensing imaging technology, the amount of high spatial resolution remote sensing (HSRRS) imagery has increased dramatically. HSRRS imagery improves the spectral features of objects and highlights information on their structure, texture, and other details. At the same time, it also brings severe image noise and problems such as "different objects with similar spectrum" [7]. In addition, due to the structural diversity of buildings themselves and the complexity of their surroundings, detecting building edges from HSRRS imagery remains a challenge in computer vision and remote sensing urban applications.
In the rich history of edge detection, early edge detectors were typically designed around gradient and intensity. Later, researchers began to use handcrafted features to detect edges. These traditional algorithms, however, rely mainly on handcrafted low-level features, so their accuracy is difficult to guarantee and hard to adapt to new applications. With the rapid progress of artificial intelligence, deep learning has shown excellent performance in natural image edge detection. N4-Fields [8], DeepContour [9], DeepEdge [10], HFL [11], HED [12], and the richer convolutional features (RCF) network [13] were successively proposed. The accuracy of their results on the BSDS500 dataset [14] has continuously improved, and the accuracy of the recently proposed RCF network has even exceeded human performance.
Many studies have shown that deep-learning-based edge detection models can not only detect image edges effectively but also achieve higher accuracy than traditional edge detection algorithms. However, it is not feasible to directly extract building edges from HSRRS imagery with a pre-trained deep learning network, for the following reasons:

• The datasets used in network training consist of natural images rather than remote sensing imagery. Remote sensing imagery has features that natural images do not possess, such as resolution information [15] and spatial autocorrelation.

• Remote sensing imagery contains many other objects besides buildings. A network trained on natural images cannot identify the edges of a specific object class, so it is difficult to obtain building edges directly from a pre-trained deep learning network.
Although it is difficult to acquire a high-quality building edge dataset for deep learning, this data limitation can be overcome by modifying existing datasets. Given the special architecture of RCF and its excellent performance in deep-learning-based edge detection, this paper presents a new method to detect building edges. Using the most peripheral constraint conversion algorithm, a high-quality HSRRS imagery building edge dataset for deep learning is built for the first time. A building edge detection model is then constructed by fine-tuning the pre-trained RCF network on this self-built dataset, and the resulting RCF-building model can exclusively detect building edges. In the post-processing stage, a geomorphological concept is involved to refine the edge probability map generated by the RCF-building model and obtain accurate building edges. In particular, the method exploits the advantage of the RCF network's special architecture, which makes full use of all convolutional layers to improve edge detection accuracy.
The rest of this paper is organized as follows. Section 2 briefly presents the related work. The RCF-based building edge detection model is described in Section 3. Section 4 presents the experimental and comparison results and analyzes the performance of the proposed methods. Finally, the discussion and conclusions are drawn in Sections 5 and 6, respectively.

Related Work
Although there are various edge detection algorithms and theories, there is a great gap between theory and application: edge detection alone cannot directly extract buildings from imagery. Because an edge detection algorithm cannot distinguish which kind of object an edge belongs to, it is difficult to obtain building edges directly by edge detection. Previous building edge detection methods can be grouped into the following three categories:

• Region-driven methods. The building region feature and the edge feature are both important elements of building description. Under certain circumstances, building edges can be converted from building regions. Various classification strategies have been utilized to extract building regions; here are a few for HSRRS imagery:

 – Object-based image analysis (OBIA) has gradually been accepted as an efficient method for extracting detailed information from HSRRS imagery [7,32-39]. For example, references [7,32-37] comprehensively used object-based image segmentation and various object features, such as spectrum, texture, shape, and spatial relations, to detect buildings. Because the scale parameter has an important influence on OBIA, Guo et al. [38] proposed a parameter mining approach to mine parameter information for building extraction. In addition, Liu et al. [39] adopted the probabilistic Hough transform to delineate building regions extracted by multi-scale object-oriented classification, and the results showed that, with the boundary constraint, most rectangular building roofs can be correctly detected, extracted, and reconfigured.

 – Deep-learning-based extraction methods have been a research hotspot in recent years [40-48]. References [40-45] designed image segmentation with convolutional neural networks, fully convolutional networks, or other networks to effectively extract building regions from imagery. This research is still pixel-based; references [46-48] proposed a superpixel-based convolutional neural network (SML-CNN) model for hyperspectral image classification, in which superpixels rather than pixels are taken as the basic analysis unit. Compared with other deep-learning-based methods, the superpixel-based method gains promising classification results. Gao et al. [49] combined contour maps with fully convolutional neural networks to offer a higher level of detection capability, providing a new idea for building detection. In addition, newly proposed techniques such as transfer hashing [50] and structured autoencoders [51] can also be introduced into this field to solve problems such as data sparsity and data mining.
• Auxiliary-information-based methods. Due to the complexity of building structures and their surrounding environment, many scholars have proposed extracting buildings with the assistance of shadows, stereoscopic aerial images, or digital elevation model (DEM) data. Liow et al. [59] pioneered the idea of using shadows to extract buildings. Later, studies [59-62] proposed identifying and extracting buildings based on shadow features and graph-based segmentation in high-resolution remote sensing imagery. In addition, the local contrast in image areas where shadows and buildings adjoin is increased.
Among the methods mentioned above, the first category normally uses semantic analysis to group line segments, and such methods have shown relatively good performance on moderate and low spatial resolution remote sensing imagery because of its high signal-to-noise ratio (SNR). However, for HSRRS imagery, the high spatial resolution and low SNR substantially increase the difficulty of locating and identifying accurate building edges [39]. The second category has many advantages, such as a comprehensive consideration of prior knowledge, image features, pattern recognition theory, and other factors. However, these methods still involve cumbersome workflows, require more prior knowledge, and cannot meet the practical requirements of building extraction from high spatial resolution images with high scene complexity. Their applicability is also limited by building type, density, and size. Moreover, the edges of the extraction results are often not ideal, so it is difficult to ensure edge integrity for complex objects. For the last category, although the accuracy of building extraction can be improved with stereo information, it is greatly limited by the scarcity of multiple data sources and by data misalignment.
Therefore, to overcome these limitations of single data sources, building structure, surrounding complexity, and prior knowledge, this paper detects building edges using a state-of-the-art deep-learning-based edge detection method that relies only on two-dimensional HSRRS imagery and needs no prior knowledge once the deep-supervision-based dataset is properly built.

Methodology
As shown in Figure 1, the workflow of the proposed method is divided into three stages. In the dataset construction stage, the initial dataset is processed by conversion, clipping, rotation, and selection into a special dataset dedicated to deep-learning-based edge detection. The second stage is network training: based on the training set, the RCF network is retrained to generate the RCF-building edge detection model. The third stage is detection and post-processing: the edge probability map is obtained with the RCF-building model and then refined by the involved algorithm, yielding the building edges.


Dataset Construction
As mentioned previously, in the field of deep learning there is no experimental HSRRS imagery dataset available for building edge detection. Therefore, this paper builds an edge-based sample dataset that satisfies the training and testing requirements of the RCF network by pre-processing the Massachusetts Buildings Dataset [79]. This dataset was constructed by Mnih and is publicly available at http://www.cs.toronto.edu/vmnih/data/. It has a resolution of 1 m and image sizes of 1500 × 1500 pixels, and it contains 137 training images, 10 testing images, and four validation images with no overlap between them. Each set of data includes an original remote sensing image and a manually traced building region map, as shown in Figure 2a,b. Since the output of the RCF network is based on the fusion of multiple layers, the network is tolerant of slight overfitting; thus, it does not need a validation set.

Edge detection differs from region extraction: a location shift of even one pixel may cause the model to fail to extract features and reduce the overall precision. To ensure that no error occurs when converting building regions to building edges, this paper proposes the most peripheral constraint algorithm. The "most peripheral" constraint emphasizes extracting only the outermost pixels of the building region features as building edges, so that the edge width is exactly one pixel. Figure 3 shows the diagram of this conversion algorithm. The steps are as follows:

(1) Binarize the building region map, with building pixels set to 1 and non-building pixels set to 0;
(2) Generate an image of the same size as the original, with all pixel values set to 0. Scan the building region map row by row to find all pixels (marked as Pr) satisfying two conditions: the pixel value is 1, and the value shifts from 1 to 0 or from 0 to 1. In the newly generated image, set the pixel values at the same locations as Pr to 1. Thus, the building edge pixels in each row are detected;
(3) Generate another all-zero image of the same size and repeat step (2) column by column to detect all building edge pixels in each column;
(4) Combine the building edge pixels detected in each row and each column. Thus, the building edge is finally obtained.
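The steps above can be sketched as a small NumPy routine (a sketch of the stated procedure, not the authors' exact implementation; the function name `region_to_edges` is illustrative):

```python
import numpy as np

def region_to_edges(region):
    """Convert a binary building-region map to one-pixel-wide building edges
    following the most peripheral constraint: keep a building pixel only when
    the value transitions between 1 and 0 at one of its row or column
    neighbours (image borders are treated as background)."""
    region = (np.asarray(region) > 0).astype(np.uint8)

    # Step (2): row scan -- edge pixel when a horizontal neighbour is 0.
    padded = np.pad(region, ((0, 0), (1, 1)))
    left, right = padded[:, :-2], padded[:, 2:]
    row_edges = region & ((left == 0) | (right == 0))

    # Step (3): column scan, same rule along the vertical direction.
    padded = np.pad(region, ((1, 1), (0, 0)))
    up, down = padded[:-2, :], padded[2:, :]
    col_edges = region & ((up == 0) | (down == 0))

    # Step (4): union of row-wise and column-wise edge pixels.
    return (row_edges | col_edges).astype(np.uint8)
```

For a solid 4 × 4 building block, only the 12 boundary pixels survive, and the 2 × 2 interior is discarded, which is exactly the one-pixel-wide outline the constraint demands.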

RCF Network
The RCF network was originally proposed by Liu et al. in 2017 [13]. It was optimized on the basis of the VGG16 network [80]. The input of the RCF network is an RGB image of arbitrary size, and the output is an edge probability map of the same size. Figure 4 shows the architecture of the RCF network for an input image of 224 × 224 pixels. The main convolutional layers in RCF (shown in the red dashed rectangle) are divided into five stages, and adjacent stages are connected through pooling layers. After the down-sampling of a pooling layer, features at different scales can be extracted, and useful information is obtained while the amount of data is reduced. Unlike VGG16, the RCF network discards all the fully connected layers as well as the fifth pooling layer, and each main convolutional layer is connected to a convolutional layer with kernel size 1 × 1 and channel depth 21. An element-wise layer then accumulates these outputs after each stage, and each element-wise layer is connected to a convolutional layer with kernel size 1 × 1 and channel depth 1. The difference between the RCF network and traditional neural networks lies in the output: for boundary extraction, previous networks use only the last layer as the output and lose many feature details, whereas the RCF network fuses the convolved element-wise layers of each stage (those of stages 2-5 are restored to the original image size by deconvolution) with equal weights to produce a fused output. This special architecture allows the RCF network to make full use of both semantic and detailed information for edge detection.

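The fusion step can be illustrated with a minimal NumPy sketch, assuming nearest-neighbour upsampling as a stand-in for the learned deconvolution layers and equal fusion weights (the function names and the list-of-score-maps interface are illustrative, not RCF's actual API):

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling, standing in for the deconvolution
    layers that restore each stage's side output to the input size."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse_side_outputs(side_outputs, input_size):
    """Equal-weight fusion of per-stage side outputs, as in RCF.

    side_outputs: list of 2-D score maps, one per stage; stage k is assumed
    to be downsampled by 2**k relative to the input (stage 0 is full size).
    Returns the fused edge probability map at the input resolution.
    """
    h, w = input_size
    restored = [upsample(s, 2 ** k)[:h, :w]
                for k, s in enumerate(side_outputs)]
    fused = np.mean(restored, axis=0)        # same weight for every stage
    return 1.0 / (1.0 + np.exp(-fused))      # sigmoid -> probabilities
```

The point of the sketch is the architecture's key idea: every stage contributes a full-resolution score map, so fine detail from early stages and semantics from late stages both reach the output.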

Refinement of Edge Probability Map
The test results of the RCF network are grey-scale edge probability maps, on which a greater grey value indicates a higher probability that the pixel lies on an edge. To accurately detect the building edges, the edge probability map must be refined. In the computer vision field, the non-maximal suppression (NMS) algorithm is a commonly used refining method. However, as observed in Figure 5, refining building edges with the NMS algorithm leads to broken outlines, isolated points, and flocculent noise. Therefore, this paper involves a geomorphological concept to refine the edge probability map according to a geometric morphological analysis of the topographic surface. As illustrated in Figure 6, the basic idea is to regard the edge probability value as elevation and, following the principles of geometric morphology, extract the points with maximum elevation (i.e., the watershed points) on the topographic profile curve as accurate edges. As described in Figure 7, the procedure of this refinement algorithm is as follows:

(1) Scan in four directions (vertical, horizontal, left diagonal, and right diagonal) to find the local maxima as candidate points;
(2) Set a threshold to discard candidate points whose probability is less than 0.5 (after many experiments, the highest accuracy was obtained at this threshold; for a grey image, the threshold value is 120);
(3) Count how many times each candidate point is detected. When a candidate point is detected at least twice, it is classified as an edge point;
(4) Check the edge points obtained in step (3) one by one. When a point has no other edge point in its eight-neighborhood, it is declared isolated and deleted;
(5) Generate an edge mask map based on the edge point map obtained in step (4) to refine the edge probability map and obtain the final edge refinement map.
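The refinement procedure can be sketched as follows (a direct transcription of the stated steps, assuming a 0-255 grey-scale probability map and strict local maxima; the function name `refine_edge_map` is illustrative):

```python
import numpy as np

def refine_edge_map(prob, threshold=120):
    """Refine a grey-scale edge probability map (0-255) by the
    geomorphology-inspired procedure: 4-direction local-maximum scan,
    probability threshold, vote count, and isolated-point removal."""
    prob = np.asarray(prob, dtype=float)
    h, w = prob.shape
    # Scan directions: horizontal, vertical, and the two diagonals.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    votes = np.zeros((h, w), dtype=int)
    for dy, dx in directions:
        for y in range(h):
            for x in range(w):
                y0, x0, y1, x1 = y - dy, x - dx, y + dy, x + dx
                n0 = prob[y0, x0] if 0 <= y0 < h and 0 <= x0 < w else -1
                n1 = prob[y1, x1] if 0 <= y1 < h and 0 <= x1 < w else -1
                # Step (1): strict local maximum along this direction.
                if prob[y, x] > n0 and prob[y, x] > n1:
                    votes[y, x] += 1
    # Steps (2)-(3): drop weak candidates, keep pixels detected >= twice.
    edges = (votes >= 2) & (prob >= threshold)
    # Step (4): delete points with no edge neighbour in the 8-neighborhood.
    padded = np.pad(edges, 1)
    neigh = sum(padded[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
    return edges & (neigh > 0)
```

On a synthetic one-pixel-wide ridge (a row of high probabilities in a flat background), the routine keeps exactly the ridge: each ridge pixel is a local maximum in the vertical and both diagonal directions, so it collects three votes, while flat-background pixels collect none.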

Experiments and Analysis
The experimental environment for RCF network re-training and testing is the Caffe framework [81] on a Linux system with an NVIDIA GTX 1080 GPU. The learning rate governs the rate of descent toward a local minimum of the cost function; the initial learning rate is 1 × 10−7, and it is divided by 10 every ten thousand iterations during training. The experimental data are the self-processed Massachusetts Building-edge dataset introduced in Section 3.1.
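The schedule described above is the standard step-decay policy; as a concrete sketch (the function name is illustrative):

```python
def learning_rate(iteration, base_lr=1e-7, step=10000, gamma=0.1):
    """Step-decay schedule: starting from base_lr, the learning rate is
    multiplied by gamma (i.e., divided by 10) every `step` iterations."""
    return base_lr * gamma ** (iteration // step)
```

So the rate is 1e-7 for iterations 0-9999, 1e-8 for 10000-19999, and so on.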

Experimental Results
A trained model generated after 40,000 iterations is selected to extract the building edges. Examples of the building edge detection results are shown in Figure 8(e1-e3). From a visual perspective, the RCF-based building edge detection method adapts to the background very well. As can be seen from the third row of data (Figure 8(a3,b3,c3,d3,e3)), highlighted by the red rectangle, the fine-tuned RCF-building model can not only detect building edges correctly but also extract building edges that humans fail to recognize. Additionally, the refinement results of the involved refinement algorithm (Figure 8(e1-e3)) are experimentally compared with those of the NMS algorithm (Figure 8(d1-d3)); the involved refinement algorithm produces fewer isolated points and less flocculent noise.


Precision and Recall Evaluation
Inspired by references [82-84], we use recall, precision, and F-measure as the evaluation criteria for the RCF-building model. The evaluation indices can be described by Equations (1)-(3):

Precision = TP / (TP + FP) (1)
Recall = TP / (TP + FN) (2)
F-measure = 2 × Precision × Recall / (Precision + Recall) (3)

where true positive (TP) is the number of detected edge pixels that coincide with the referenced building edges of the ground truth, false positive (FP) is the number of detected edge pixels that do not coincide with the referenced building edges, and false negative (FN) is the number of referenced building edge pixels that are not detected. F-measure is a synthetic measurement of precision and recall. In fact, precision and recall are two contradictory measurements; generally, they are negatively correlated [85,86]. Based on recall and precision, the precision-recall (P-R) curve can be drawn. As shown in Figure 9, the RCF-building model has an F-measure of 0.89 on the test set, which is higher than the 0.51 of the original RCF network. In addition, compared with the original RCF network, the precision of the RCF-building model increases by at least 45%. This means that the retrained RCF network can recognize the edges of buildings: the generated RCF-building model exclusively detects building edges and effectively avoids the edges of superfluous objects.


Comparison with Other Building Extraction Methods
In this paper, four remote sensing images with different characteristics are selected from the testing set to compare the performance of our method with that of three representative building detection methods. Figure 10 illustrates the visual results of our method, the OBIA-based ENVI

Ablation Experiment
To verify the effectiveness of the individual steps of the proposed method, we compare, on the whole testing set, the RCF model trained on the self-processed dataset (Massachusetts Building-edge dataset) with an RCF model trained on a dataset converted by the Canny algorithm [89]. We also quantitatively compare the involved edge refinement algorithm with the NMS refinement algorithm. Table 3 lists the evaluation results of the different pre-processing and post-processing methods. Our methods achieve the best precision, recall and F-measure. The experimental results verify the effectiveness of the proposed conversion algorithm for dataset pre-processing and confirm that a higher-quality dataset has a positive influence on the RCF network. Furthermore, the comparison shows that the good performance of our approach also benefits from the involved refinement algorithm, which performs better over the whole testing set.

Influence of the RCF Fusion Output
To explore why RCF-building can recognize building edges, this paper compares the average precision, recall and F-measure values over all testing-set imagery at each stage of the network. As shown in Figure 11, as the network deepens, precision and recall rise gradually during the first three stages and then descend (or roughly descend) during the fourth and fifth stages. During the first three stages, the network gradually learns the characteristics of building edges, so the precision and recall of the detected edges increase. During the fourth and fifth stages, however, the network overfits and treats the characteristics of individual training samples as the general nature of all potential samples. This reduced generalization eventually causes some parts of the building edges to be missed. On the other hand, overfitting in edge detection differs from overfitting in other fields: after overfitting, if a pixel is judged to be an edge, the probability that it actually is an edge is higher. To make full use of the information generated at each stage, the RCF network therefore uses an architecture that traditional neural networks lack: the fusion output layer. The fusion layer combines the outputs of all stages with equal weights, so that it inherits the advantages of each stage while suppressing the useless information of the first two stages. Thus, the fusion output attains the highest precision and recall. Taking a test image as an example, the outputs of each stage and the fusion output are shown in Figure 12. It is clear that, as the network stages deepen, the model gradually extracts the building edges and eliminates the edges of other superfluous objects, but in the fourth and fifth stages the building edges cannot be extracted completely. The fusion output image shows the best visual result: the building edges are extracted completely and accurately compared with the other stage outputs. Therefore, RCF's special fusion output architecture makes it well suited to building edge extraction from high-resolution remote sensing images.
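With equal fusion weights, the fusion layer behaves like an average of the stage outputs. A minimal sketch of this equal-weight fusion, assuming the side outputs have already been resampled to a common size (the function name is an assumption):

```python
import numpy as np

def fuse_stage_outputs(stage_maps):
    """Fuse the side-output probability maps of the RCF stages with
    equal weights, i.e. a simple per-pixel average."""
    stacked = np.stack(stage_maps, axis=0)  # (n_stages, H, W)
    return stacked.mean(axis=0)
```

In the trained network the fusion weights are learned, so this equal-weight average only illustrates the limiting case discussed above.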


Conclusions
This paper proposes a method for detecting building edges from HSRRS imagery based on the RCF network. The highlights of this work are listed as follows:

•
The RCF network is combined with HSRRS imagery for the first time to detect building edges, and an RCF-building model that can accurately and comprehensively detect building edges is built. Compared to traditional building edge extraction methods, the method used in this paper exploits high-level semantic information and achieves higher accuracy evaluation values and better visual effects. Compared to deep-learning-based building extraction methods, RCF-building better preserves the corner parts of building edges. In addition, this paper analyzes the influence of the RCF fusion output architecture on building edge detection accuracy; the precision and recall curves confirm that this unique architecture of RCF inherits the advantages of each stage and is well suited to the detection of building edges.

•
In the pre-processing stage, on the basis of the Massachusetts Building dataset, we proposed the most peripheral constraint edge conversion algorithm and created the Massachusetts Building-edge dataset specifically for deep-learning-based building edge detection. The comparison results show that the dataset produced by the most peripheral constraint algorithm effectively improves the performance of the RCF-building model, confirming the positive impact of accurately labeled data on network training. The Massachusetts Building-edge dataset lays the foundation for future research on deep-learning-based building edge detection.

•
In the post-processing stage, this paper involves a geomorphological concept to refine the edge probability map according to a geometric morphological analysis of the topographic surface. Compared to the NMS algorithm, the involved refinement algorithm better balances precision and recall and obtains a higher F-measure. It preserves the integrity of the building edges to the greatest extent and reduces noise points. However, some broken lines and discontinuities remain in the detected building edges after post-processing.
Additionally, it is worth noting that building edge detection is not the final goal of building extraction from HSRRS imagery. Future work will include: (1) connecting the broken edges of buildings; (2) vectorizing building edge features; (3) improving the RCF network architecture; and (4) using various strategies to ensure that large images can be processed in memory [90].

Figure 2. Dataset sample. (a) Original image; (b) building region map; and (c) building edges ground truth map.

Figure 3. Diagram of the conversion from building region into building edges.

Figure 2c shows the conversion result of Figure 2b. After conversion, in order to improve the accuracy of the trained network, we augment the data by rotating the imagery by 90, 180, and 270 degrees. Meanwhile, to avoid memory overflow and invalid imagery, the dataset is constructed after image clipping and selection. The final dataset, named the Massachusetts Building-edge dataset, contains 1856 training images and 56 testing images, each of 750 × 750 pixels.
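A common way to derive an edge ground truth from a building region mask is to keep the region pixels that touch the background, i.e. the outermost ring of each region. The sketch below is only a minimal illustration of this region-to-edge conversion; the paper's most peripheral constraint algorithm imposes additional constraints, and the function name is an assumption.

```python
import numpy as np

def region_to_edge(region):
    """Keep the most peripheral pixels of a binary building region:
    region pixels with at least one 4-neighbour outside the region."""
    r = np.pad(region, 1, mode="constant", constant_values=False)
    interior = (r[1:-1, 1:-1]            # the pixel itself
                & r[:-2, 1:-1] & r[2:, 1:-1]   # up and down neighbours
                & r[1:-1, :-2] & r[1:-1, 2:])  # left and right neighbours
    return region & ~interior
```

The rotation-based augmentation mentioned above can be done with `np.rot90` on both the image and this edge map.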

Figure 4. Overview of the RCF network architecture.

Figure 5. The refinement results by the NMS algorithm.

Figure 7. Workflow of edge probability map refinement.

Figure 9. The P-R curves. The solid curve is the result of the proposed RCF-building model on the test set; the dotted curve is the original RCF network.

Figure 11. Comparison of precision, recall and F-measure of the output maps at different stages.

Table 3. The performance of the training sets generated by different conversion methods, and a performance comparison of different refinement algorithms.