The proposed framework for newly built building detection, which comprises three core components (pixel-level change detection, post-temporal building extraction, and newly built building recognition), was evaluated on bi-temporal high-resolution remote sensing imagery. This section discusses (1) the sensitivity of detection performance to the likelihood threshold, angle tolerance, and fusion weight, together with the contributions of the change intensity components and the geometric shape index (GI); (2) the effects of different fusion strategies; and (3) a comparative analysis against existing methods.
5.1. Sensitivity Analysis
(1) Analysis of the Likelihood Threshold
To obtain the final map of newly built buildings, the likelihood map must be binarized, so the choice of threshold directly affects the extraction accuracy of newly built buildings.
Figure 6a shows how the accuracy of newly built building detection varies with the likelihood threshold. PA and MPA are accuracy indicators computed over two categories, newly built buildings and non-newly built buildings. Both increase as the threshold increases, reflecting the correctness of the detected newly built buildings, but neither accounts for omissions. IoU, by contrast, measures the extraction quality of newly built buildings comprehensively: as the likelihood threshold gradually increases, IoU first increases and then decreases.
Figure 6b examines the influence of the likelihood threshold on the IoU of newly built buildings in more detail. IoU jointly accounts for the recall and precision of newly built buildings. As the gap between recall and precision narrows, IoU increases and reaches its maximum where the two become approximately equal; as the gap widens again, IoU decreases.
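The threshold sweep described above can be sketched as follows. This is a minimal illustration that assumes the standard definitions of PA (overall pixel accuracy), MPA (mean of the per-class accuracies of the newly built and non-newly built classes), and IoU of the newly built class; the function names and the toy inputs are hypothetical and are not taken from the framework's implementation.

```python
import numpy as np


def binary_metrics(pred, ref):
    """Pixel-level metrics for the newly built class (standard definitions assumed)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.sum(pred & ref)    # newly built, correctly detected
    tn = np.sum(~pred & ~ref)  # non-newly built, correctly rejected
    fp = np.sum(pred & ~ref)   # false alarms
    fn = np.sum(~pred & ref)   # omissions
    eps = 1e-12                # guards against division by zero
    return {
        "PA": (tp + tn) / pred.size,
        "MPA": 0.5 * (tp / (tp + fn + eps) + tn / (tn + fp + eps)),
        "IoU": tp / (tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }


def sweep_thresholds(likelihood, ref, thresholds):
    """Binarize the likelihood map at each threshold and collect the metrics."""
    likelihood = np.asarray(likelihood, dtype=float)
    return [(t, binary_metrics(likelihood >= t, ref)) for t in thresholds]


def best_threshold(likelihood, ref, thresholds):
    """Return the threshold that maximizes IoU of the newly built class."""
    results = sweep_thresholds(likelihood, ref, thresholds)
    return max(results, key=lambda item: item[1]["IoU"])[0]


if __name__ == "__main__":
    lik = np.array([[0.9, 0.8], [0.3, 0.1]])  # toy likelihood map
    ref = np.array([[1, 1], [0, 0]])          # toy reference (1 = newly built)
    for t, m in sweep_thresholds(lik, ref, [0.05, 0.5, 0.85]):
        print(t, round(m["IoU"], 3), round(m["precision"], 3), round(m["recall"], 3))
    print("best:", best_threshold(lik, ref, [0.05, 0.5, 0.85]))
```

For these toy inputs, IoU is 0.5 at both the lowest and the highest threshold and peaks at the middle threshold (0.5), where precision and recall coincide, which mirrors the behavior described for Figure 6b.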
(2) Sensitivity Analysis of Angle Tolerance
A sensitivity analysis was conducted to evaluate the impact of the angle tolerance parameter θ in the Line segment Verticality Criterion (LVC) function on pixel-level detection accuracy, with the fusion weight φ fixed at 0.4. The results, summarized in Table 2, reveal a clear non-linear relationship between angle tolerance and detection performance.
A tolerance of θ = 4° provides the best balance for building detection, achieving the highest F1-score (0.689) with closely matched precision (0.692) and recall (0.683). At this tolerance, the characteristic perpendicularity of buildings is captured effectively while both false positives and omissions are kept low.
When θ was reduced below this optimum (θ < 4°), performance deteriorated significantly because recall became severely constrained. At θ = 0°, recall dropped sharply to 0.402, reducing the F1-score to 0.522 despite relatively high precision. This confirms that excessively strict angular criteria cannot capture the near-perpendicular line segments commonly present in real building imagery. When θ was increased to 5°, the F1-score already declined slightly to 0.682, indicating that the optimal window is relatively narrow. As the tolerance widened further (θ > 5°), performance declined progressively, with the F1-score dropping to 0.670 at θ = 6° and continuing to decrease as the tolerance expanded. This occurs because wider angular windows admit more non-orthogonal line segments from non-building features, which systematically reduces precision, while recall also gradually diminishes.
This analysis supports θ = 4° as the optimal setting for the LVC function, since it makes the most effective use of perpendicular line segments as discriminative structural features for pixel-level building detection.
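The role of θ can be illustrated with a simplified perpendicularity check. The LVC function itself is defined elsewhere in the paper and is not reproduced here; the sketch below is a hypothetical stand-in that marks a line segment as supported when at least one other segment meets it at 90° ± θ. It assumes segments are given as (x1, y1, x2, y2) and ignores the spatial adjacency of segments, which a real criterion would presumably enforce.

```python
import numpy as np


def segment_angles(segments):
    """Orientation of each segment in degrees, folded into [0, 180)."""
    seg = np.asarray(segments, dtype=float)
    dx = seg[:, 2] - seg[:, 0]
    dy = seg[:, 3] - seg[:, 1]
    return np.degrees(np.arctan2(dy, dx)) % 180.0


def verticality_mask(segments, theta_deg=4.0):
    """Hypothetical LVC-style test: True for a segment if some other segment
    is perpendicular to it within +/- theta_deg degrees (adjacency ignored)."""
    ang = segment_angles(segments)
    diff = np.abs(ang[:, None] - ang[None, :]) % 180.0
    diff = np.minimum(diff, 180.0 - diff)  # angle between the two lines, in [0, 90]
    near_perp = np.abs(diff - 90.0) <= theta_deg
    np.fill_diagonal(near_perp, False)     # a segment never supports itself
    return near_perp.any(axis=1)


if __name__ == "__main__":
    a = np.radians(93.0)
    segs = [(0, 0, 10, 0), (0, 0, np.cos(a), np.sin(a)), (0, 0, 10, 10)]
    for theta in (0.0, 4.0, 6.0):
        print(theta, verticality_mask(segs, theta))
```

A 93° corner is rejected at θ = 0° but accepted at θ = 4°, which illustrates why an overly strict tolerance lowers recall, while a large tolerance lets progressively more oblique pairs pass.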
(3) Sensitivity Analysis of the Fusion Weight
To assess the impact of the fusion weight φ between the Building Line Index (BLI) and the Morphological Building Index (MBI), a sensitivity analysis was conducted with the angle tolerance fixed at its optimal value (θ = 4°). The parameter φ, which ranges from 0 to 1, controls the relative contributions of the structural feature (BLI) and the spectral–spatial feature (MBI) to the fused Building Intensity (BI), as defined in Equation (14). The pixel-level detection accuracy under different φ values is summarized in Table 3.
The results show that the proposed fusion strategy yields the best performance, with the highest F1-score of 0.689 achieved at φ = 0.4. This balanced weighting effectively integrates the high precision of the MBI with the high recall of the BLI.
When φ = 0 (relying solely on the MBI), the model maintained a relatively high F1-score of 0.673, with high precision (0.721) but moderate recall (0.630). This indicates that while the MBI alone reliably detects clear building signatures, it is conservative and misses a substantial number of true positives, particularly buildings with weak spectral–spatial responses. Conversely, when φ = 1.0 (relying solely on the BLI), the model achieved the highest recall (0.758) but the lowest precision (0.592), resulting in a lower F1-score of 0.665. This pattern confirms that the structural BLI feature, while highly sensitive to potential building pixels, introduces significant false alarms from non-building objects with linear structures when used without spectral–spatial constraints.
The performance plateau observed for φ values between 0.2 and 0.6 (F1-score > 0.683) indicates that the fusion framework is robust to the choice of φ. However, the clear peak at φ = 0.4 confirms that a balanced contribution from structural (BLI) and spectral–spatial (MBI) features outperforms either feature type alone. The empirically determined value φ = 0.4 was therefore adopted for all other experiments in this study.
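The weighting scheme can be sketched as follows. Equation (14) is not reproduced in this section, so the sketch assumes a linear form, BI = φ·BLI + (1 − φ)·MBI, applied to min-max rescaled inputs; this assumption is consistent with φ = 0 using the MBI only and φ = 1 using the BLI only, but the rescaling step and function names are hypothetical.

```python
import numpy as np


def minmax(x):
    """Rescale an array to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def building_intensity(bli, mbi, phi=0.4):
    """Assumed linear fusion: BI = phi * BLI + (1 - phi) * MBI on rescaled inputs.

    phi = 0 uses the MBI only and phi = 1 uses the BLI only, matching the
    extreme cases discussed in the sensitivity analysis.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return phi * minmax(bli) + (1.0 - phi) * minmax(mbi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bli = rng.random((4, 4))  # placeholder BLI map
    mbi = rng.random((4, 4))  # placeholder MBI map
    # Candidate weights for a sensitivity sweep: 0.0, 0.1, ..., 1.0.
    for phi in np.linspace(0.0, 1.0, 11):
        bi = building_intensity(bli, mbi, phi)
        print(f"phi={phi:.1f}  mean BI={bi.mean():.3f}")
```

In a full sweep, each BI map would be passed through the rest of the pipeline and scored with precision, recall, and F1 against the reference, as in Table 3.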
(4) Ablation Study on Change Intensity (CI) Components
An ablation study was conducted to evaluate the individual contributions of the spectral and textural features to the final pixel-level change detection result. The Change Intensity (CI) map, a key input to the framework (Equation (7)), was computed under three configurations while all other parameters (θ = 4°, φ = 0.4, and the likelihood threshold) were kept at their optimal values. The pixel-level accuracy for detecting newly built buildings under each configuration is presented in Table 4.
The results confirm the effectiveness of the proposed feature fusion strategy: the combined CI map (ISFA + Texture) yielded the highest overall performance, with a balanced F1-score of 0.689.
The ISFA-only configuration exhibited a higher recall (0.705) but notably lower precision (0.662), resulting in a lower F1-score (0.683). This pattern suggests that spectral change information is highly sensitive in identifying potential changes but is less specific to building-related changes, leading to more false alarms from phenomena like vegetation growth or soil moisture variation. Conversely, the texture-only configuration showed the opposite trend, achieving the highest precision (0.698) among the three but the lowest recall (0.645). This indicates that textural change is a more specific indicator for structured objects like buildings, effectively suppressing some false alarms. However, it fails to capture all relevant building changes, especially those with less pronounced textural variation, resulting in significant under-detection.
The fusion of both features successfully mitigates the individual weaknesses of each. The combined approach balances the high sensitivity of the spectral feature with the high specificity of the textural feature, achieving a superior trade-off between false positives (precision) and false negatives (recall). This ablation study validates the design choice of integrating both spectral and textural information at the pixel-level change detection stage, confirming that their synergistic combination is crucial for the framework’s high performance.
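The three ablation configurations can be expressed compactly as below. Equation (7) is not reproduced in this section, so the sketch assumes, for illustration only, that CI is a weighted average of min-max rescaled ISFA and texture change maps with a hypothetical weight w = 0.5; the combination rule actually used in the framework may differ.

```python
import numpy as np


def minmax(x):
    """Rescale an array to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def change_intensity(isfa, texture, mode="combined", w=0.5):
    """Change Intensity under the three ablation settings.

    isfa    : per-pixel spectral change magnitude (e.g. an ISFA change map)
    texture : per-pixel texture change magnitude
    mode    : "isfa", "texture" or "combined"
    w       : hypothetical weight of the spectral term in the combined setting
    """
    s, t = minmax(isfa), minmax(texture)
    if mode == "isfa":
        return s
    if mode == "texture":
        return t
    if mode == "combined":
        return w * s + (1.0 - w) * t
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    isfa = rng.random((3, 3))     # placeholder spectral change map
    texture = rng.random((3, 3))  # placeholder texture change map
    for mode in ("isfa", "texture", "combined"):
        print(mode, change_intensity(isfa, texture, mode).round(2).tolist())
```

Averaging rescaled maps lets a pixel with strong spectral change but weak texture change (a likely vegetation or moisture false alarm) score lower than one supported by both cues, which is the trade-off reported in Table 4.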
(5) Analysis of the Geometric Shape Index (GI)
Table 5 shows the change in the accuracy of newly built building detection after filtering by GI. The improvement in PA is small, whereas MPA and IoU improve by 0.017 and 0.023, respectively. The gain in IoU indicates that the overall accuracy of newly built building detection has improved, which verifies the effectiveness of GI in this study. GI is introduced mainly to filter out false alarms with extremely large aspect ratios, thereby improving the detection accuracy of newly built buildings. In this experiment, the detection accuracy of newly built buildings increased by 0.031.
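The GI filtering step can be sketched as an aspect-ratio test on candidate objects. The exact definition of GI is given elsewhere in the paper; this sketch assumes, for illustration, that GI is the ratio of the long side to the short side of each object's axis-aligned bounding box and uses a hypothetical cut-off of 5.

```python
import numpy as np


def aspect_ratios(labels):
    """Long/short side ratio of each object's axis-aligned bounding box.

    labels: 2-D integer array, 0 = background, 1..K = candidate object ids.
    """
    ratios = {}
    for obj_id in np.unique(labels):
        if obj_id == 0:
            continue
        rows, cols = np.nonzero(labels == obj_id)
        h = rows.max() - rows.min() + 1
        w = cols.max() - cols.min() + 1
        ratios[int(obj_id)] = max(h, w) / min(h, w)
    return ratios


def filter_by_aspect_ratio(labels, max_ratio=5.0):
    """Drop objects whose aspect ratio exceeds max_ratio (hypothetical cut-off)."""
    ratios = aspect_ratios(labels)
    keep = [k for k, r in ratios.items() if r <= max_ratio]
    return np.where(np.isin(labels, keep), labels, 0)


if __name__ == "__main__":
    lab = np.zeros((10, 10), dtype=int)
    lab[0:3, 0:3] = 1  # compact, building-like object
    lab[5, 0:9] = 2    # thin strip, e.g. a road edge false alarm
    print(aspect_ratios(lab))
    print(np.unique(filter_by_aspect_ratio(lab)))
```

An axis-aligned box underestimates the elongation of diagonally oriented strips; a minimum-area rotated rectangle would be a more robust basis for GI in that case.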
5.2. Analysis of Different Decision Fusion Methods
To evaluate the discriminative performance of the Building Intensity (BI) and Change Intensity (CI) features across major land-cover classes, representative sample points were selected for Newly Built Buildings (NBs), Unchanged Buildings (UCBs), and Other Land (including water bodies, roads, and farmland) within the study area (Figure 7c).
Figure 7a,b depict the CI map and post-temporal BI map, respectively, generated by our method. Regions exhibiting high intensity values in both maps correspond to areas with a high likelihood of being newly built buildings.
Figure 7d presents the mean feature values for each land-cover class across the different feature maps. Analysis reveals that NB samples consistently demonstrate higher values in both BI and CI features compared to UCBs and Other Land. Furthermore, NB samples exhibit significantly higher spectral feature differences (DS), particularly in the first three bands. In contrast, UCB samples show lower CI values but BI values comparable to NB samples. Other Land samples consistently display low values for both BI and CI features.
Figure 7e,f visualize the spatial distribution of these land-cover classes within the spectral difference (DS) feature space and the combined BI-CI feature space, respectively. The BI-CI feature space demonstrates superior class separability, effectively distinguishing NBs from other land-cover classes compared to the DS feature space. These results collectively indicate that the BI and CI features employed in this study provide enhanced discriminative power for the extraction of newly built buildings.
In the proposed framework for newly built building extraction, the final result is derived by fusing pixel-level change detection with object-level building information. To investigate the impact of the fusion strategy on the outcome, comparative experiments were conducted that evaluated both single-feature performance and an alternative decision-level fusion approach [36]. This study employs feature-level fusion; by contrast, the decision-level fusion method binarizes the two features separately and then combines them through logical operations to obtain the final result.
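The difference between the two fusion schemes can be illustrated on three pixels. The sketch assumes BI and CI are already rescaled to [0, 1] and, purely for illustration, uses their product as the fused likelihood in the feature-level scheme; the fused likelihood actually used in the framework, and all thresholds shown here, are assumptions.

```python
import numpy as np


def feature_level(bi, ci, t=0.25):
    """Feature-level fusion (assumed form): fuse the two maps into one
    likelihood, here their product, and apply a single threshold."""
    likelihood = np.asarray(bi, dtype=float) * np.asarray(ci, dtype=float)
    return likelihood >= t


def decision_level(bi, ci, t_bi=0.5, t_ci=0.5):
    """Decision-level fusion: binarize each map separately, then intersect."""
    return (np.asarray(bi) >= t_bi) & (np.asarray(ci) >= t_ci)


if __name__ == "__main__":
    bi = np.array([0.9, 0.45, 0.2])  # post-temporal building intensity
    ci = np.array([0.9, 0.9, 0.9])   # change intensity
    print("feature-level :", feature_level(bi, ci))
    print("decision-level:", decision_level(bi, ci))
```

The second pixel has strong change evidence and a BI just below 0.5. Decision-level fusion discards it because its hard BI threshold is applied before the intersection, whereas a single threshold on the fused likelihood retains it, a small-scale illustration of the errors that independent thresholding can introduce.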
Figure 8 visually compares the results of decision-level and feature-level fusion. Figure 8a,b show scatter plots in the spectral difference space that compare the newly built buildings detected by each fusion method with the reference data. Although the two sets of results are broadly similar, the point set of the decision-level fusion result deviates more noticeably from the reference point set.
Figure 8c overlays the newly built building maps from both fusion methods on the reference map. The decision-level fusion result contains more false alarms (blue areas in Figure 8c). The yellow patches further indicate that decision-level fusion produces more omission errors than the feature-level method, whereas the scarcity of pink patches suggests that feature-level fusion achieves more complete detection with fewer omissions.
Figure 9 quantitatively compares the accuracy achieved using the individual features (BI and CI, representing post-temporal building intensity and change intensity, respectively) and the two fusion methods. The single-feature comparison clearly indicates that CI yields significantly higher accuracy than BI. This discrepancy primarily arises because BI solely captures building presence in the post-temporal image and lacks explicit change information. Furthermore, the decision-level fusion method exhibits slightly lower accuracy than the feature-level fusion approach adopted in this study. This performance difference is largely attributable to the requirement in decision-level fusion to independently threshold each feature before performing the logical intersection operation. This multi-step process inherently introduces potential errors, leading to an increase in false alarms and consequently reducing overall extraction accuracy.
5.3. Experiments Compared to Other Methods
Because the proposed method is an unsupervised technique aimed at practical applications with limited labeled data, it was compared with three established unsupervised building change detection approaches. (1) OBISFA: this method extends ISFA with an object-oriented processing step in which the mean ISFA value within each object serves as the detection result. (2) SFAMBI: change information is extracted by fusing the SFA and MBI difference features; of the three fusion methods described in [37], the weighted fusion method was used in this comparison. (3) SPETEXMBI: as described in [38], this method derives multi-temporal change information from spectral and texture features and then uses MBI change information to screen for building change objects.
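The object-oriented step of OBISFA can be sketched as replacing each pixel's ISFA change value with the mean over its segment. The sketch assumes a precomputed pixel-level ISFA map and a segmentation label image with non-negative integer ids; the segmentation algorithm used by OBISFA is not specified here, and the function name is hypothetical.

```python
import numpy as np


def object_mean(pixel_map, labels):
    """Replace each pixel value with the mean over its segment (object).

    pixel_map : 2-D float array, e.g. a pixel-level ISFA change map
    labels    : integer array of the same shape holding segment ids >= 0
    """
    values = np.asarray(pixel_map, dtype=float).ravel()
    ids = np.asarray(labels).ravel()
    sums = np.bincount(ids, weights=values)  # per-segment sum of values
    counts = np.bincount(ids)                # per-segment pixel count
    means = sums / np.maximum(counts, 1)     # guard against unused ids
    return means[ids].reshape(np.shape(pixel_map))


if __name__ == "__main__":
    isfa = np.array([[1.0, 3.0], [10.0, 20.0]])
    segs = np.array([[0, 0], [1, 1]])
    print(object_mean(isfa, segs))
```

Averaging within objects suppresses isolated noisy pixels, but a segment that straddles a changed and an unchanged area receives a diluted value, so results depend on segmentation quality.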
Figure 10 visually compares the newly built building extraction results from the different methods. In the figure, green denotes results from the proposed method, red represents the reference data (ground truth), and blue indicates results from the alternative method being compared in each sub-figure.
Figure 10a compares the proposed method with OBISFA. A higher prevalence of blue patches (e.g., in Region 1) indicates that OBISFA produces more false alarms.
Figure 10b compares the proposed method with SFAMBI. Similarly, this sub-figure shows numerous blue patches, signifying false alarms, alongside omissions (e.g., in Regions 2 and 3).
Figure 10c presents the comparison with SPETEXMBI. Here, numerous yellow patches highlight substantial omissions in the SPETEXMBI results, particularly in large-scale industrial areas (e.g., Region 4). Conversely, the scarcity of blue patches in Figure 10c suggests that SPETEXMBI generates fewer false alarms. Across all sub-figures, green, yellow, and white patches predominate, visually demonstrating the superior detection performance of the proposed framework compared with the three alternative methods.
To qualitatively assess the differences in newly built building extraction across the evaluated methods, scatter plots were generated in both the spectral and texture feature spaces.
Figure 11 presents these scatter plots for the four algorithms, where blue points represent the extracted results for each method and red points represent the reference data. Each point corresponds to the mean spectral or texture feature difference value calculated for an individual object within the respective extraction results.
The degree of overlap between the blue (method result) and red (reference) point sets in these feature spaces serves as an indicator of extraction accuracy: greater overlap signifies closer agreement between a method's output and the ground truth. Visual inspection of Figure 11 reveals distinct distributions for the different methods. The scatter points for spectral features are relatively dispersed, whereas those for texture features are more concentrated. The scatter plots for OBISFA, SFAMBI, and SPETEXMBI show visibly lower overlap with the reference points than that of the proposed method.
To quantify this overlap, the centroid distance between each method's point set and the reference point set was computed; a smaller centroid distance indicates greater overlap and thus better agreement with the reference data. As shown in Table 6, the proposed method achieves the smallest centroid distance in both feature spaces. Together, the qualitative (visual overlap) and quantitative (centroid distance) analyses show that the extraction results of the proposed method agree most closely with the reference results.
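The centroid distance can be computed as the distance between the mean feature vectors of the two point sets. The use of the Euclidean norm is an assumption, as the metric is not specified in this section; each row of the input arrays is one object's mean spectral or texture difference.

```python
import numpy as np


def centroid_distance(method_points, reference_points):
    """Euclidean distance between the centroids of two point sets.

    Each input is shaped (n_objects, n_features); the sets may differ in size.
    """
    a = np.asarray(method_points, dtype=float)
    b = np.asarray(reference_points, dtype=float)
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


if __name__ == "__main__":
    method = np.array([[0.0, 0.0], [2.0, 0.0]])     # toy method point set
    reference = np.array([[4.0, 4.0], [4.0, 4.0]])  # toy reference point set
    print(centroid_distance(method, reference))
```

Because a centroid summarizes only the location of a point set, two sets with very different spreads can still have a small centroid distance; the visual overlap in Figure 11 complements it in that respect, and features on very different scales may need standardizing before the distance is computed.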
Figure 12 presents the newly built building extraction accuracy of all compared methods. The proposed method achieved the highest accuracy. Among the benchmark methods, OBISFA performed better than the other two, which can be attributed to the iterative weighting in ISFA that improves on the basic SFA approach. SPETEXMBI yielded the lowest accuracy, for two main reasons. First, it relies on threshold selection and logical operations over multiple features, which introduces considerable uncertainty into the extracted results and increases the proportion of false alarms. Second, it detects spectral and texture changes by direct differencing, which leaves many noise points in the results and reduces precision.