Automatic Building Detection from High-Resolution Remote Sensing Images Based on Joint Optimization and Decision Fusion of Morphological Attribute Profiles

Abstract: High-resolution remote sensing (HRRS) images play a key role in urban planning and other fields when used for building detection. Compared with deep learning methods, methods based on morphological attribute profiles (MAPs) perform well in the absence of massive annotated samples. MAPs have been proven to have a strong ability to extract detailed characterizations of buildings with multiple attributes and scales, and this application has received a great deal of attention. Nevertheless, two constraints must be overcome to establish reliable unsupervised detection models: the rational selection of attribute scales and the evidence conflicts between attributes. To this end, this research proposes a building detection method based on the joint optimization and decision fusion of MAPs. First, in the pre-processing step, the set of candidate building objects is extracted by image segmentation and a set of discriminant rules. Second, the differential profiles of MAPs are screened by using a genetic algorithm, for which a cross-probability adaptive selection strategy is proposed; on this basis, an unsupervised decision fusion framework is established by constructing a novel statistics-space building index (SSBI). Finally, the automated detection of buildings is realized. We show that the proposed method is significantly better than state-of-the-art methods on HRRS images from different regions and different sensors, and that the overall accuracy (OA) of the proposed method is more than 91.9%.

Author Contributions: Conceptualization, C.W.; methodology, C.W. and Y.Z.; software, Y.Z.; validation, X.C., Y.Z., and H.J.; formal analysis, Y.Z. and S.W.; investigation, M.M. and H.J.; resources, C.W.; writing (original draft preparation), Y.Z.; writing (review and editing), C.W.; visualization, C.W. and Y.Z.; supervision, C.W., X.C., and H.J.; project administration, C.W.


Introduction
With the rapid development of earth observation technology, building detection based on high-resolution remote sensing (HRRS) images has become one of the research hotspots in the field of remote sensing [1]. Remote sensing images offer wide coverage, strong timeliness, and a large amount of obtainable information, which help in the cognition and interpretation of geographical targets. Buildings occupy an important position in the area of human activities. The spatial characteristics and distribution of urban buildings are important basic data for urban construction management, such as national survey monitoring, urban and rural planning management, and real estate management [2]. The study of automatic high-precision detection of buildings in remote sensing images is significant for further developing remote sensing image information mining technology and promoting its applications in digital cities and other related fields [3].
Compared with traditional medium- and low-resolution remote sensing images, HRRS images contain a wealth of spatial structure information, which is conducive to the fine description of buildings in complex urban scenes. On the other hand, the low signal-to-noise ratio (SNR) of HRRS images limits the detection accuracy. In addition, buildings are often surrounded by other artificial or natural geographic objects due to their complex structures. Moreover, there may be significant differences even between buildings in the same area. All of these negative factors make it difficult to implement high-precision, reliable building detection with HRRS images [4].
In recent years, morphological attribute profiles (MAPs) have been proven to have a strong ability to detect buildings in complex urban backgrounds, and they have become one of the most effective spatial structure modelling methods for HRRS images. The morphological feature set of a local area constructed by MAPs can be used to realize the multi-attribute and multi-scale expression of different ground objects, thus significantly improving the separability of buildings and other ground objects [5][6][7]. However, the following limitations must be overcome to realize high-precision, unsupervised building detection based on MAPs:
(1) The potential building pixels are directly determined by the differential attribute profiles (DAPs) extracted from the difference of neighboring attribute profiles (APs), but morphological attribute profile (MAP) theory gives no clear rules for setting the scale parameters, so a reasonable parameter set must be constructed adaptively according to the scale of the original image. If the interval between the scales is too large, it is difficult to describe different types of buildings with different attributes. Conversely, if the interval is too small, it is easy to retain too many pixels of other features with attributes similar to those of buildings in the detection results.
(2) As a basis for determining whether a pixel belongs to a building, the DAPs extracted by different attributes may give opposite conclusions. The experimental results in this article verify that the common practice of taking the union of the DAPs of all attributes and scales rarely achieves reliable detection results, so effective decision rules need to be designed to deal with this evidential conflict.
(3) Buildings are geographical objects with closed contours, and how to automatically convert the potential building pixels extracted based on MAPs into object-level building detection results is another challenge to be tackled.
In response to these challenges, we propose an automatic building detection method from HRRS images based on the joint optimization and decision fusion of MAPs. The contributions of this study can be summarized as follows: (1) A new adaptive cross-probability genetic algorithm based on DAPs (ACGA-DAPs) is proposed to detect the pixels of potential buildings by transforming the scale parameter selection of MAPs into the joint optimization of multi-attribute DAPs. To meet the application requirements of building detection, a wide range of scale parameter values and tight sampling intervals are set and traversed to ensure that the initial DAPs can extract the property details of the building. Based on this, the genetic algorithm (GA) is introduced to optimize the DAPs with different attributes, and a cross-probability adaptive selection strategy is proposed. The constructed ACGA-DAPs are helpful in significantly improving the detection accuracy of buildings.
(2) Based on ACGA-DAPs and image segmentation results, we propose an unsupervised decision fusion framework, which bridges the gap between potential building pixels and object-level building detection results. This framework combines statistics and spatial information to construct a novel statistics-space building index (SSBI), finally realizing the automatic detection of building sets.
The rest of the paper is organized as follows: Section 2 reviews the relevant literature on building detection, and introduces MAP theory; Section 3 presents the implementation steps of the proposed method in detail; in Section 4, the experimental results are evaluated; Section 5 discusses the setting of proportion parameters; and in Section 6 conclusions and future lines of research are summarized.

Building Detection from Remote Sensing Images
Building detection from remote sensing images can be implemented by combining manual interpretation and field investigation. However, these methods require a great deal of manpower and material resources, and their detection efficiency is very low. In recent years, extensive building detection research, in regard to both theory and methods, has been undertaken, such as demolished building detection from aerial imagery using deep learning [8] and automatic building extraction with rooftop detectors [9]. Considering the particularity of deep neural structures, we divide the existing methods into deep learning methods and non-deep learning methods.

Deep Learning Methods
Deep learning technology is inspired by biological understanding and has had a strong impact in the field of remote sensing image processing. Deep learning has been proven to have a strong ability to concentrate on the essential building characteristics of a dataset from non-annotated samples [10].
Many scholars have applied various deep network structures to building detection. Hamed et al. [11] proposed a building detection approach based on deep learning using the fusion of light detection and ranging (LiDAR) data and orthophotos. A convolutional neural network (CNN) was adopted in that article to transform compressed features into high-level features, which were used to distinguish buildings from backgrounds. Wang et al. [12] proposed a fully convolutional dense connection network to better learn rich architectural features. Its innovative design of top-down short connections promotes the fusion of high- and low-level feature information. Since the first version of the DeepLab model was released in 2015, Google has evolved and expanded it to DeepLab V3+. This model further applies depthwise separable convolution to the atrous spatial pyramid pooling (ASPP) and decoder modules, resulting in a faster and more powerful semantic segmentation encoder-decoder network [13].
Despite this, deep learning requires an abundance of annotated samples to participate in the training of the model; otherwise, overfitting will occur, which restricts the feasibility and effectiveness of such methods in practical application [14].

Non-Deep Learning Methods
Since the number of annotated samples is often limited, which negatively affects the building detection performance in deep learning, a variety of non-deep learning building detection methods have been proposed.
Building indexes can effectively describe the characteristics of buildings from different aspects and have been widely used in building detection. You et al. [15] proposed a scale-invariant feature point detection method considering the multi-scale and multi-directional texture characteristics of the stacking area. In that article, the traditional morphological building index (MBI) was applied to the extracted built-up area, and then threshold segmentation of the MBI feature images was carried out to obtain the results. Bi et al. [16] proposed a multi-channel multi-scale filtering building index (MMFBI) to overcome the drawbacks of MBI. This index helps to fully utilize the relatively limited spectral information in HRRS images. However, these methods require appropriate thresholds to obtain the final results and are always limited by the threshold method.
In addition, many scholars have conducted in-depth research on the application of MAPs in building detection. Hu et al. [17] proposed a method combining the new alternating sequential filters (NASFs) strategy with MAPs for building detection from high-resolution synthetic aperture radar (SAR) images. Wang et al. [18] proposed a novel adaptive morphological attribute profile under the object boundary constraint (AMAP-OBC) method. By investigating the associated attributes in MAPs, this method established corresponding relationships between AMAP-OBC and building characteristics in HRRS images. Compared with a building index, MAPs adopt multi-category and multi-scale attributes as evidence for building detection and can obtain more reliable results.
Most of the existing MAP research directly optimizes APs and ignores the information redundancy and evidence conflict between DAPs. As described in Section 1 of this paper, these processing strategies will bring some specific problems in building detection. To this end, we propose a MAP method based on the joint optimization and decision fusion of MAPs in this paper.

MAP Theory and Constitution of Attribute Set
Developed from traditional morphological filtering, MAP theory has a powerful ability to portray geographical objects in fine detail across different scales and attributes. At present, MAP theory is widely used in the classification and change detection of HRRS images. MAP uses a Max-Tree structure to represent the image and performs attribute thickening and thinning operations based on a given set of scale parameters N to evaluate the attribute values of the connected components in the image.
The basic processing flow is as follows. For a given grey-scale image M, let j be any pixel and $Bn_j^n(M)$ be a binary image determined by the scale parameter n ∈ N. The thickening operation profile $\phi_j(M)$ and the thinning operation profile $\theta_j(M)$ are obtained by Equations (1) and (2), respectively. By traversing all the scale parameters, the sets of thickening and thinning profiles can be extracted. On this basis, the difference operation is carried out on the profiles of adjacent scales to obtain the DAPs, as given by Equation (3). Therefore, by treating M as a superposition of the binary images $Bn_j^n(M)$, the specific attribute characteristics in different scale profiles can be enhanced, and the corresponding geographic objects can then be extracted through the DAPs.
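As a rough illustration of the difference step only, the following sketch computes DAPs from a stack of attribute profiles that are assumed to be already available and ordered by increasing scale parameter. The function name `differential_profiles`, the nested-list image layout, and the use of the absolute difference are assumptions made for illustration; the attribute filtering that produces the profiles is not shown.

```python
def differential_profiles(profiles):
    """Return pixel-wise absolute differences between adjacent profiles.

    `profiles` is a list of equally sized 2-D images (lists of rows),
    ordered by increasing scale parameter. Taking the absolute value
    is an assumption of this sketch.
    """
    daps = []
    for lower, upper in zip(profiles, profiles[1:]):
        dap = [[abs(b - a) for a, b in zip(row_lo, row_up)]
               for row_lo, row_up in zip(lower, upper)]
        daps.append(dap)
    return daps


# Toy 2x2 attribute profiles at three consecutive scales (made-up values).
aps = [
    [[10, 10], [50, 50]],
    [[10, 10], [20, 50]],
    [[10, 10], [20, 20]],
]
daps = differential_profiles(aps)
```

In this toy case each DAP is non-zero only at the pixel whose value changes between the two adjacent scales, which is the property used to flag potential building pixels.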
Here, four attributes, namely area, diagonal, standard deviation, and normalized moment of inertia (NMI), are adopted for the fine description of buildings. The reasons are as follows: the area attribute describes the size of a building; the diagonal describes the length of the building's shape; the standard deviation describes the complexity of the building texture; and the NMI reflects the mass distribution of the building and has the advantage of invariance to translation, rotation, and scaling. Studies have shown that the combination of these four attributes endows buildings and other ground objects with strong interclass separability [19].
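To make the NMI attribute more concrete, the sketch below uses one common definition, the square root of the moment of inertia about the centroid divided by the mass, applied to a binary region in which every pixel has unit mass. This definition and the function name `normalized_moment_of_inertia` are assumptions for illustration; the definition used in [19] may differ in detail.

```python
import math


def normalized_moment_of_inertia(pixels):
    """NMI of a binary region given as (row, col) pixels of unit mass.

    Assumed definition: sqrt(moment of inertia about centroid) / mass.
    """
    mass = len(pixels)
    row_c = sum(r for r, _ in pixels) / mass
    col_c = sum(c for _, c in pixels) / mass
    inertia = sum((r - row_c) ** 2 + (c - col_c) ** 2 for r, c in pixels)
    return math.sqrt(inertia) / mass


square = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(r + 5, c + 7) for r, c in square]   # translated copy
rotated = [(c, -r) for r, c in square]          # copy rotated by 90 degrees

nmi_square = normalized_moment_of_inertia(square)
nmi_shifted = normalized_moment_of_inertia(shifted)
nmi_rotated = normalized_moment_of_inertia(rotated)
```

The three values agree, which illustrates the translation and rotation invariance mentioned above for this toy region.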

Method
The implementation of the proposed method mainly included three steps: data preprocessing, ACGA-DAPs extraction based on multi-attribute joint optimization, and the construction of an unsupervised decision fusion framework. The implementation process is shown in Figure 1.

Image Segmentation by WJSEG
In the data pre-processing step, the original image is first segmented to divide the discrete pixels into connected sets of pixels with semantic information, thus providing the basic analysis unit for subsequent building detection. At the same time, during the construction of MAPs, object boundaries are used to determine the connectivity domain for thickening and thinning operations so that the calculated results reflect the properties of the actual geographical objects.
For this reason, we adopted wavelet J-Segmentation (WJSEG), an HRRS image segmentation method for urban scenes. This method maintains the integrity of object contours well and avoids the false "narrow, long units" that arise in the segmentation results of the mainstream commercial software eCognition [20,21]. WJSEG mainly includes multi-band fusion, seed region initialization and secondary extraction, and region merging. The specific implementation process is detailed elsewhere [20]. It should be pointed out that the segmentation method adopted in this study is not limited to WJSEG; other methods can be used without affecting the implementation of the subsequent building detection phase.

Remote Sens. 2021, 13, 357

Non-Building Pre-Screening
The image segmentation results inevitably contain objects whose features differ significantly from those of buildings, such as vegetation, vehicles, and other non-building objects. Eliminating such objects in the pre-processing stage not only reduces the computational burden but also avoids subsequent false detections.
Scholars have proposed many preliminary screening strategies for non-building objects. This article adopts the four discriminant rules proposed in [18]: shadow index, normalized difference vegetation index (NDVI), area index, and rectangularity. The objects rejected in the initial screening are not considered in the subsequent building detection, while the remaining objects constitute the candidate building set $R_{cdi}$.
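The pre-screening can be pictured with the following minimal sketch, which applies three of the four rules to a single segmented object. Everything specific here is an assumption for illustration: the thresholds `ndvi_max`, `rect_min`, and `area_min` are hypothetical, rectangularity is approximated with an axis-aligned bounding box, NDVI presumes that a near-infrared band is available, and the shadow index is omitted because its definition is given only in [18].

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from mean NIR and red values."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def rectangularity(pixels):
    """Object area divided by the area of its axis-aligned bounding box."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(pixels) / box_area


def keep_as_candidate(pixels, mean_nir, mean_red,
                      ndvi_max=0.3, rect_min=0.5, area_min=4):
    """Return True if the object survives the (hypothetical) rules."""
    if len(pixels) < area_min:               # area rule
        return False
    if ndvi(mean_nir, mean_red) > ndvi_max:  # vegetation rule
        return False
    return rectangularity(pixels) >= rect_min


block = [(r, c) for r in range(3) for c in range(4)]  # 3x4 rectangular object
roof_like = keep_as_candidate(block, mean_nir=0.30, mean_red=0.25)
lawn_like = keep_as_candidate(block, mean_nir=0.60, mean_red=0.10)
```

Objects for which `keep_as_candidate` returns True would form the candidate set; the rejected ones are dropped before any MAP computation.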

ACGA-DAPs Extraction Based on Multi-Attribute Joint Optimization
The premise of constructing MAPs is to determine the set of scale parameters corresponding to different attributes, and the setting of scale parameters is one of the key factors affecting the accuracy of building detection. However, MAPs only realize the quantitative expression of morphological attributes, and the DAPs obtained by subtracting APs of adjacent scales are the basis for identifying potential building pixels. Therefore, it is difficult to objectively evaluate the rationality of scale parameter selection by directly using traditional measurement methods such as mutual information between scales of MAPs. For this reason, we proposed to transform the scale parameter selection problem of MAPs into the joint optimization problem of multi-attribute DAPs. By using fixed adjacent scale spacing to fully extract the morphological attribute features contained in the original image, the improved genetic algorithm is used to carry out multi-attribute joint optimization screening of the difference features.

Candidate Object Set of DAPs
In the process of MAPs extraction, a wide range of values and a tight sampling interval are set for each attribute, and a complete set of MAPs is then generated by traversing all the scale parameters within the interval. The purpose is to expand the MAPs with a small sampling interval, at the cost of increased computation, so as to obtain a complete representation of the spatial structure of the scene.
To this end, following the suggestions in [22,23], the ranges of the area, diagonal, standard deviation, and NMI attributes were set to [500, 28000], [10, 100], [10, 70], and [0.2, 0.5], respectively. For each of the four attributes, 50 scale parameters were extracted at equal intervals, resulting in a total of 200 scales of MAPs for the four attributes [18]. On this basis, the initial DAPs set, denoted $DAPs_{cdi}$, was obtained by applying the difference operation to all adjacent scales.
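The construction of the candidate scale sets can be sketched as follows. The ranges and the count of 50 come from the text above; the helper name `equal_interval_scales`, the dictionary keys, and the choice to include both endpoints of each range are assumptions.

```python
def equal_interval_scales(low, high, count=50):
    """Return `count` equally spaced scale parameters from low to high.

    Including both endpoints is an assumption of this sketch.
    """
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


# Attribute ranges quoted in the text; the key names are illustrative.
ranges = {
    "area": (500, 28000),
    "diagonal": (10, 100),
    "std": (10, 70),
    "nmi": (0.2, 0.5),
}
scales = {name: equal_interval_scales(lo, hi) for name, (lo, hi) in ranges.items()}
total_scales = sum(len(values) for values in scales.values())
```

Each attribute then yields one profile per scale, and the candidate DAPs are formed from profiles at adjacent entries of each list.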

ACGA-DAPs
Based on $DAPs_{cdi}$, the GA is used to screen out representative DAPs sequences for the different attributes, and a novel ACGA-DAPs is proposed. The specific steps are as follows:
Step 1: for the DAPs belonging to the same attribute in $DAPs_{cdi}$, random sampling with replacement is first performed to obtain Q sets of DAPs corresponding to the attribute.
Step 2: calculate the fitness $f(D)$ of each set D by Equation (4), in which two difference indices appear: one computed between two DAPs within set D, and one computed between two DAPs from different sets. Each difference index d can be calculated by Equation (5) from H, the information entropy of a DAP, and MI, the mutual information of two DAPs.
Step 3: keep the set of DAPs corresponding to the minimum fitness, defined as $D_{min}$. According to the roulette wheel selection (RWS) method [24], reselect Q-1 sets of DAPs. On this basis, the one-point crossover method is used to perform pair-wise cross-over operations on $D_{min}$ and the Q-1 sets of DAPs with cross-over probability $P_c$ [25]. The RWS method is then adopted to re-select Q sets of DAPs, and $D_{min}$ is updated according to Step 2. Whether the setting of $P_c$ is reasonable significantly affects the genetic performance: if $P_c$ is too large, the model may be completely ineffective; if it is too small, the search may fall into a local optimum. To this end, the distance distribution matrix S of all DAPs sets is calculated, where each element s represents the distance between two sets, and the minimum distance set $s_{min}$ of each row can be obtained. On this basis, let the fitness corresponding to the maximum distance be $f_m$; the cross-over probability $P_c$ is then determined adaptively from $f_m$ and from $f_{max}$, $f_{min}$, and $f_{avg}$, the maximum, minimum, and average fitness of the Q DAPs sets, respectively.
Step 4: Steps 2 and 3 are repeated to obtain the representative DAPs corresponding to the current attribute. Four attributes are traversed, and all DAPs screened jointly constitute ACGA-DAPs.
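The building blocks named in Steps 2 and 3 can be sketched as follows: the entropy H and mutual information MI that enter the difference index, roulette wheel selection, and one-point crossover. This is only a sketch under stated assumptions: each DAP is flattened to a list of discrete values, each individual is encoded as a list of DAP indices, and the selection weights are supplied by the caller. The fitness of Equation (4), the difference index of Equation (5), and the adaptive rule for `p_c` are not implemented here.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy H (in bits) of a flattened, discretized DAP."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """MI(a, b) = H(a) + H(b) - H(a, b) for two DAPs of equal length."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def roulette_wheel_select(population, weights, k, rng):
    """Draw k individuals with probability proportional to their weight."""
    return rng.choices(population, weights=weights, k=k)


def one_point_crossover(parent_a, parent_b, p_c, rng):
    """Swap the tails of two parents at a random cut with probability p_c."""
    if len(parent_a) < 2 or rng.random() >= p_c:
        return list(parent_a), list(parent_b)
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


rng = random.Random(0)
dap_x = [0, 0, 1, 1]          # toy flattened DAPs
dap_y = [0, 1, 0, 1]
parents = ([1, 2, 3, 4], [5, 6, 7, 8])   # hypothetical DAP index sets
children = one_point_crossover(*parents, p_c=1.0, rng=rng)
picked = roulette_wheel_select(["D1", "D2", "D3"], [0.2, 0.5, 0.3], k=2, rng=rng)
```

Because Step 3 keeps the set with the minimum fitness, a caller would need to map fitness to selection weights so that lower fitness receives a larger weight; how that mapping is done is left open here.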

Construct an Unsupervised Decision Fusion Framework
In practical application, buildings are a type of geographical object with complete contours, while ACGA-DAPs only provides the detection results of potential buildings at pixel level. On the other hand, the traditional decision-making method of taking the union directly for the DAPs corresponding to different attributes ignores the evidential conflict and redundant information. Therefore, based on Dempster-Shafer (D-S) evidence theory, this paper proposed an unsupervised decision fusion framework combining ACGA-DAPs and image segmentation [5].

Identification Framework Based on D-S Theory
As an uncertain reasoning method, D-S evidence theory requires weaker conditions than Bayesian probability theory and can deal with uncertain information directly. Denote I as the total number of objects in $R_{cdi}$ and define the recognition framework $U = \{B, NB\}$ for any object $R_i$ ($i = 1, 2, 3, \ldots, I$), where B and NB represent building and non-building, respectively. Thus, a non-empty subset A of U can be {B}, {NB}, or {B, NB}. We define the basic probability assignment formula (BPAF) as $m: 2^U \to [0, 1]$, which satisfies the constraints $m(\emptyset) = 0$ and $\sum_{A \subseteq U} m(A) = 1$. Let the total number of DAPs in ACGA-DAPs be K; the K mass functions $m_1, m_2, \ldots, m_K$ are then combined by the D-S synthesis rule.
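For the two-element frame {B, NB}, the standard Dempster rule of combination can be written out in a few lines. The sketch below represents each mass function as a dictionary with the keys `"B"`, `"NB"`, and `"U"` (the latter standing for the whole frame {B, NB}); these key names, the function names, and the toy mass values are assumptions for illustration.

```python
from functools import reduce


def combine(m1, m2):
    """Combine two mass functions on {B, NB} with Dempster's rule."""
    # Mass assigned to contradictory pairs (B with NB).
    conflict = m1["B"] * m2["NB"] + m1["NB"] * m2["B"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {
        "B": (m1["B"] * m2["B"] + m1["B"] * m2["U"] + m1["U"] * m2["B"]) / norm,
        "NB": (m1["NB"] * m2["NB"] + m1["NB"] * m2["U"] + m1["U"] * m2["NB"]) / norm,
        "U": m1["U"] * m2["U"] / norm,
    }


def combine_all(masses):
    """Fold Dempster's rule over K mass functions."""
    return reduce(combine, masses)


# Two toy pieces of evidence for one object (made-up values).
m_a = {"B": 0.6, "NB": 0.1, "U": 0.3}
m_b = {"B": 0.5, "NB": 0.2, "U": 0.3}
fused = combine_all([m_a, m_b])
```

The normalization by one minus the conflict is what lets the rule discount contradictory evidence instead of simply taking a union of detections.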

Calculation of SSBI
In the decision fusion framework, the degree of uncertainty of belonging to buildings (or non-buildings) must be quantified when constructing the BPAF. For this reason, we combined statistics and spatial information to construct a novel SSBI. The SSBI is calculated as follows:
Step 1: calculation of the statistical indicators $D_{pro}^{B}$ and $D_{pro}^{NB}$. According to the proportion of building pixels in all objects, the fuzzy C-means (FCM) method is first used to determine the two proportion parameters $\nu_B$ and $\nu_{NB}$, which correspond to the cluster centers of building and non-building objects, respectively. On this basis, $D_{pro}^{B}$ and $D_{pro}^{NB}$ are defined from $\nu_i$, the building pixel ratio of $R_i$ in a DAP; they measure how likely $R_i$ is to be a building and a non-building object, respectively (the smaller the distance, the higher the likelihood).
Step 2: calculation of the spatial information indices $D_{spa}^{B}$ and $D_{spa}^{NB}$. Since the center of mass reflects how an object is distributed in space, this paper holds that the closer a pixel is to the center of mass, the more reliable it is as evidence for building detection. Based on this assumption, let $W_B$ and $W_{NB}$ be the number of building pixels and non-building pixels in $R_i$, respectively. $D_{spa}^{B}$ and $D_{spa}^{NB}$ of $R_i$ are calculated from $s_w^{B}$ and $s_w^{NB}$, the distances from the center of mass to a pixel belonging to a building or a non-building, respectively.
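The two ingredients of the SSBI can be sketched as follows: a one-dimensional fuzzy C-means with two clusters for the proportion parameters, and the distance of marked pixels from an object's center of mass. The function names, the fuzzifier `m = 2`, the initialization at the minimum and maximum ratio, the assumption that the larger center corresponds to buildings, and the use of a mean distance are all choices of this sketch, not the exact formulas of the SSBI.

```python
import math


def memberships(value, centers, m):
    """Fuzzy membership of one value in each cluster (standard FCM update)."""
    dists = [abs(value - c) for c in centers]
    if min(dists) == 0.0:
        hit = dists.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [x / total for x in inv]


def fcm_two_centers(ratios, m=2.0, iters=100, tol=1e-9):
    """Return (nu_NB, nu_B): the lower and upper FCM centers of 1-D ratios."""
    centers = [min(ratios), max(ratios)]
    if centers[0] == centers[1]:
        return centers[0], centers[1]
    for _ in range(iters):
        u = [memberships(v, centers, m) for v in ratios]
        new = []
        for k in range(2):
            weights = [row[k] ** m for row in u]
            new.append(sum(w * v for w, v in zip(weights, ratios)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return min(centers), max(centers)


def mean_centroid_distance(object_pixels, marked_pixels):
    """Mean distance from the object's centroid to a subset of its pixels."""
    n = len(object_pixels)
    row_c = sum(r for r, _ in object_pixels) / n
    col_c = sum(c for _, c in object_pixels) / n
    dists = [math.hypot(r - row_c, c - col_c) for r, c in marked_pixels]
    return sum(dists) / len(dists) if dists else 0.0


ratios = [0.05, 0.10, 0.08, 0.90, 0.85, 0.95]   # toy building-pixel ratios
nu_nb, nu_b = fcm_two_centers(ratios)
obj = [(r, c) for r in range(3) for c in range(3)]
d_center = mean_centroid_distance(obj, [(1, 1)])
d_corner = mean_centroid_distance(obj, [(0, 0), (2, 2)])
```

Under these assumptions, an object whose ratio lies close to `nu_b` and whose building pixels sit near the centroid would receive stronger support for the building hypothesis.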

BPAF and Discrimination Rules
Traversing all $R_i$, the BPAF is constructed according to Equations (8) to (15), together with a confidence factor $\gamma_{ate}$ that is designed to cope with a possible imbalance in the number of DAPs belonging to the four different attributes. Let $g_t$ ($t = 1, 2, 3, 4$) be the number of DAPs for each of the four attributes. Then $\gamma_{ate}$ can be calculated from Equation (17).

Experiments and Evaluation
Three datasets of HRRS images of urban scenes in different regions and with different spatial resolutions are used in the experiment. Compared with various advanced building detection methods, the proposed method is found to have excellent performance by combining quantitative and visual analysis for accuracy evaluation.

Dataset Description
Dataset 1 was a pan-sharpened WorldView image with red, green, and blue bands of Chongqing, China; the acquisition date was August 2011, the spatial resolution was 0.5 m, and the size was 1370 pixels × 1370 pixels, as shown in Figure 2a. Dataset 2 was an aerial remote sensing image with red, green, and blue bands of Nanjing, China; the acquisition date was October 2011, the spatial resolution was 2 m, and the image size was 300 pixels × 500 pixels, as shown in Figure 2b. Dataset 3 was a WorldView pan-sharpened image with red, green, and blue bands of Nanjing, China; the acquisition date was December 2012, the spatial resolution was 0.5 m, and the image size was 1400 pixels × 1400 pixels, as shown in Figure 2c. In addition, as the basis for accuracy evaluation, the ground truth map was manually created by field survey and visual interpretation, where white objects represent buildings and black objects represent non-buildings. Some representative areas marked in red boxes (patches I1, I3, and I5) and blue boxes (patches I2, I4, and I6) in Figure 2 were chosen for more detailed comparison and analysis.
As shown in Figure 2, the three datasets are all typical urban scenes composed of buildings, roads, vegetation, shadows, and other features, but at the same time, there are significant differences in the image lighting conditions, acquisition time, and imaging side view. In addition, the buildings in Dataset 1 are mainly low-rise residential buildings and regularly-shaped factory buildings; in Dataset 2, there are many high-rise buildings and in Dataset 3, the geometric shapes of old commercial buildings to be demolished are irregular. Therefore, experiments on these datasets are helpful to reflect the detection performance of the algorithm in real urban scenes from different angles.

Experimental Set-Up
To evaluate the performance of the proposed method comprehensively and objectively, we used six building detection methods for comparative experiments: the adaptive MAPs-based method (Method 1) [18]; the grey-level co-occurrence matrix (GLCM) and support vector machine (SVM) based method (Method 2) [26]; the top-hat filter and K-means classification based method (Method 3) [27]; and two methods based on a DeepLab V3+ network (Methods 4 and 5) [13], which are combined with the Otsu method and with the evidence fusion strategy proposed in this paper, respectively, to obtain the object-level building detection results. The comparison with Method 1 helps to analyze the optimization strategies of different DAPs. Method 2 is a common machine learning method. Method 3 is an automatic detection method based on building descriptors. Methods 4 and 5 are deep learning methods. Comparison with these methods helps to assess the overall performance of the proposed method. In addition, to further investigate the influence of ACGA-DAPs on building detection performance separately, Method 6 replaces ACGA-DAPs with $DAPs_{cdi}$ and keeps the other steps the same as in the proposed method. Meanwhile, to ensure consistency of the geographic object set, all comparison methods are based on the segmentation results of WJSEG to achieve the object-level building detection results. Finally, for the initialization parameters of the improved GA model, we adopt the recommendation in [28] and take Q = 20 with 500 iterations.
According to Section 3.2.1, $DAPs_{cdi}$ can be obtained. After screening $DAPs_{cdi}$ with the improved GA model, the adaptively extracted ACGA-DAPs contain 85, 76, and 84 DAPs in the three datasets, respectively. Each DAP in ACGA-DAPs is calculated by applying the difference operation to two adjacent APs. Let the smaller of the two scale parameters corresponding to these APs be the initial parameter. The obtained scale parameter sets are listed in Tables 1-3.

General Results and Analysis
The building detection results of the three datasets are shown in Figures 3-5, where true positives (TP), false positives (FP), false negatives (FN), and other non-buildings are represented by four colors. The visual analysis shows that the results of the proposed method are significantly better overall than those of the five comparison methods. In addition, four accuracy evaluation indices, namely overall accuracy (OA), FP, FN, and Kappa, are adopted for quantitative accuracy evaluation in this work. The results are reported in Tables 4-6. In the three groups of experiments, the OA of the proposed method exceeds 91.9%, the best performance among all experimental methods, in line with the conclusions of the visual analysis. Despite the differences between the datasets, the OA of the proposed method fluctuates by less than 2%, showing its stability.

Table 6. Evaluation of building detection accuracy in Dataset 3.
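For reference, the accuracy indices can be computed from the four pixel counts as in the sketch below. OA and Cohen's Kappa follow their standard definitions; reporting FP and FN as fractions of all evaluated pixels is an assumption of this sketch, since the normalization is not restated here. The function name and the example counts are illustrative.

```python
def accuracy_indices(tp, fp, fn, tn):
    """Return OA, Kappa, and FP/FN fractions from confusion-matrix counts."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total
    # Chance agreement between predicted and reference labels.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = 0.0 if expected == 1.0 else (oa - expected) / (1.0 - expected)
    return {
        "OA": oa,
        "Kappa": kappa,
        "FP": fp / total,   # assumed normalization: share of all pixels
        "FN": fn / total,   # assumed normalization: share of all pixels
    }


# Made-up counts for a 100-pixel example.
scores = accuracy_indices(tp=40, fp=5, fn=5, tn=50)
```

Kappa corrects OA for agreement expected by chance, which matters when buildings cover only a small part of the scene.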

Evaluation Criteria
As automated building detection methods based on MAPs, Method 1 and the proposed method achieve an OA higher than 90% and FNs lower than 3.1% in all three sets of experiments, which confirms the powerful ability of MAPs to portray buildings in complex urban environments. However, all other accuracy indices of the proposed method are better than those of Method 1, except for the FPs in Dataset 3. Therefore, compared with the strategy of Method 1, adaptively selecting the DAPs set directly, based on the statistics and spatial information of potential building pixels, is conducive to more accurate building characterization.
Method 2 is an SVM-based classification method, which requires manual intervention and whose detection accuracy is susceptible to the quality and number of training samples. For example, the number of samples in Dataset 1 is 833, which is higher than the 462 in Dataset 2 and the 212 in Dataset 3, and its OA is correspondingly higher by 3.7% and 2.9%, respectively. Although Method 3 uses automated building descriptors, it uses fixed shapes for structural elements and ignores the complexity and diversity of the buildings, so its OA is less than 80% for both Datasets 2 and 3.
As deep learning methods, Methods 4 and 5 show low accuracy and bad stability in all three datasets of experiments. For example, the fluctuation range of the OA reaches 16.9%, and the lowest OA is only 66.6%. Compared with the proposed method, deep learning methods are not applicable to specific building detection applications where a prior knowledge is sparse, the reasons are as follows: (1) Due to the architecture of multilayer neural networks, deep learning models require large, diverse training datasets to avoid the overfitting problem. In the implementation of building detection within a specific area, the efforts to curate these datasets is regarded as the main barrier to adopt the deep learning method. This is the case for the three sets of experiments in this paper. (2) The proposed method can automaticly extract appropriated morphological attributes according to the characteristics of the remote sensing images, which is not limited by the amount of training samples. In addition, compared with the traditional treatment of taking the union of all DAPs adopted in Method 4, the improvement of the OA proves that the proposed fusion strategy is both feasible and effective.
After the ACGA-DAPs are replaced with DAPs_cdi in Method 6, the OA decreases in all three sets of experiments and, in particular, the FPs increase significantly. This indicates that using more DAPs is not necessarily better: information redundancy and evidential conflicts among DAPs with different attributes and scale parameters may degrade detection performance. Therefore, it is necessary to optimize the selection of DAPs_cdi to improve both OA and automation, which is also the goal of the proposed cross-probability adaptive genetic algorithm.
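The notion of evidential conflict can be made concrete with Dempster's rule of combination from D-S evidence theory, on which the decision fusion framework of this paper is based. The sketch below fuses two mass functions over the frame {building, non-building} and reports the conflict coefficient K. The mass values are invented for illustration, and the way the paper derives masses from the SSBI is not reproduced here; the function name `dempster_combine` is likewise an assumption.

```python
from itertools import product

# Focal elements over the frame of discernment {B, NB}.
B = frozenset({"B"})     # building
NB = frozenset({"NB"})   # non-building
THETA = B | NB           # uncertainty (either hypothesis)


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Returns the fused mass function and the conflict coefficient K,
    i.e. the total mass assigned to pairs with an empty intersection.
    """
    fused = {}
    conflict = 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            fused[inter] = fused.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule undefined")
    # Normalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


# Hypothetical masses from two profiles for the same candidate object.
m_a = {B: 0.7, NB: 0.1, THETA: 0.2}
m_b = {B: 0.6, NB: 0.3, THETA: 0.1}
fused, K = dempster_combine(m_a, m_b)
print(K, fused[B], fused[NB], fused[THETA])
```

A larger K means the two sources disagree more strongly, and the normalization then amplifies whatever agreement remains. This is one reason why adding redundant or contradictory profiles can hurt rather than help.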

Visual Comparison of Representative Patches
The results for the representative patches in each dataset are reported in Figure 6 (patches I1 and I2), Figure 7 (patches I3 and I4), and Figure 8 (patches I5 and I6) and are discussed below. Residential and industrial buildings are two common building types that are widely distributed across urban HRRS images. Moreover, they are always regions of interest (ROIs) in building detection applications based on HRRS images. Therefore, we focused on both residential and industrial buildings to assess the detection performance of the proposed method.
As shown in the figures, the WJSEG segmentation extracts the complete outlines of the buildings without obvious over-segmentation or under-segmentation, which lays a good foundation for the subsequent object-level building detection. The proposed method performs significantly better than the other methods for both low-rise residential buildings (the green rectangle in I1 and the green and purple rectangles in I5) and high-rise residential buildings (the green rectangles in I3 and I4); FPs occur only for a few regularly shaped objects adjacent to buildings, and no FNs occur. For irregularly shaped buildings, such as villas (e.g., the purple rectangle in I1) and industrial buildings (e.g., the green rectangle in I6), only the proposed method and Methods 1, 4, and 5 have no FNs; for industrial buildings to be demolished (e.g., the yellow and green rectangles in I6), Methods 2, 3, and 4 produce severe FPs. For large, regularly shaped industrial buildings (e.g., the yellow rectangle in I2 and the purple rectangle in I6), only the proposed method and Methods 1, 4, and 5 have no FPs. For geographic objects that are located between buildings and have morphological features similar to those of buildings (e.g., the green rectangle in I4 and the green and purple rectangles in I5), the detection results of the proposed method and Method 6 are better than those of the other comparison methods. In addition, the screening strategy employed in this research reduces the influence of non-building objects such as vegetation and shadows (e.g., the green and yellow rectangles in I1 and the purple rectangles in I2 and I4).
For high-rise buildings imaged from a side view, as in Dataset 2, the proposed method obtains correct detection results both when the building roof and side elevations are partitioned into the same object (e.g., the green rectangles in I3 and I4) and when the side elevations are partitioned into separate objects (e.g., the yellow and purple rectangles in I4).
The visual analysis of the representative patches shows that the proposed method can detect different types of buildings in complex urban backgrounds and is insensitive to interference from factors such as false targets and side-view imaging confusion. It performs significantly better than the comparison methods, which agrees with the conclusions of the quantitative analysis.

Discussion
In the decision fusion of ACGA-DAPs, we employed the idea of fuzzy clustering to adaptively determine the proportion parameters ν_B and ν_NB. The results are given in Table 7.
On this basis, to further examine the influence of the settings of ν_NB and ν_B on OA, we constructed the ν_NB-ν_B-OA three-dimensional curves with ν_NB in [0.05, 0.45] and ν_B in [0.5, 0.95], sampled at a step of 0.05, as shown in Figure 9.
As shown in the figure, OA can exceed 88% when ν_NB and ν_B lie in particular sub-intervals of these ranges. Meanwhile, according to Table 7, the adaptively determined ν_NB and ν_B also lie in these sub-intervals. Therefore, we resampled these sub-intervals at a step of 0.02 to describe the relationship between the settings of ν_NB and ν_B and OA. The results are shown in Figure 10.
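The parameter sweep described above can be sketched as a plain grid evaluation. The helper below is a hypothetical illustration: `sweep_oa` and `frange` are names introduced here, and `toy_oa` is a synthetic stand-in for running the full detection pipeline and scoring it, so its values say nothing about the real OA surface in Figure 9.

```python
def frange(start, stop, step):
    """Inclusive list of evenly spaced values, rounded to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


def sweep_oa(evaluate, nb_range=(0.05, 0.45), b_range=(0.5, 0.95), step=0.05):
    """Evaluate OA on every (nu_NB, nu_B) pair of a regular grid.

    `evaluate` is any callable (nu_nb, nu_b) -> OA; in practice it would run
    the detector with those parameters and compare against reference data.
    """
    grid = {}
    for nu_nb in frange(nb_range[0], nb_range[1], step):
        for nu_b in frange(b_range[0], b_range[1], step):
            grid[(nu_nb, nu_b)] = evaluate(nu_nb, nu_b)
    return grid


def toy_oa(nu_nb, nu_b):
    """Synthetic OA surface with a single peak, for demonstration only."""
    return 0.9 - abs(nu_nb - 0.25) - abs(nu_b - 0.75)


grid = sweep_oa(toy_oa)
best = max(grid, key=grid.get)
mean_oa = sum(grid.values()) / len(grid)
print(len(grid), best, round(grid[best], 3), round(mean_oa, 3))
```

Calling `sweep_oa` again with narrower ranges and `step=0.02` gives the finer resampling used for the second set of curves.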

In the above intervals, the mean OA values for the three datasets are 91.2%, 90.8%, and 90.5%, and the peak OA values are 94%, 93.5%, and 93.3%, respectively, while the OA values of the proposed method are 93.2%, 92.2%, and 91.9%. Thus, the OA of the proposed method is only slightly lower, by 0.8%, 0.13%, and 0.16%, than the corresponding highest OA in the three datasets, and significantly higher than the interquartile range of the overall mean accuracy. The automation of parameter setting is thus achieved.

Conclusion and Future Lines of Research
For HRRS images of buildings in complex urban backgrounds, an automatic detection method based on the joint optimization and decision fusion of MAPs is proposed. The method aims to preserve detailed information about the morphological attributes of buildings by transforming the scale parameter setting of MAPs into an optimal selection problem for DAPs, for which a cross-probability adaptive selection method is developed. On this basis, a building index, SSBI, that combines statistical and spatial information is designed, and an unsupervised decision fusion framework based on D-S evidence theory is established to achieve automated building detection. In experiments on HRRS images from different groups of different regions and different sensors, the proposed method outperforms the other six advanced comparison methods in both visual and quantitative analysis, with the OA exceeding 91.9% and FPs and FNs below 6.13% and 3.03%, respectively. In addition, the setting of the value intervals of different attributes in this paper was limited