Denoising of Binary Built-Up Maps Using Multi-Temporal Image Processing Thresholding

Becker, Sarah J.; Wayant, Nicole M.

doi:10.3390/land15020271


by Sarah J. Becker * and Nicole M. Wayant

Geospatial Research Laboratory, Engineer Research & Development Center, U.S. Army Corps of Engineers, 7701 Telegraph Road, Alexandria, VA 22315-3864, USA

* Author to whom correspondence should be addressed.

Land 2026, 15(2), 271; https://doi.org/10.3390/land15020271

Submission received: 23 December 2025 / Revised: 28 January 2026 / Accepted: 31 January 2026 / Published: 6 February 2026

(This article belongs to the Special Issue GeoAI for Land Use Observations, Analysis and Forecasting (Second Edition))


Abstract

Accurate identification of built-up land from remotely sensed imagery is essential for urban planning, environmental monitoring, and disaster response. However, binary built-up maps derived from single-date classifications often contain semantic noise—misclassified pixels resulting from shadows, bare soil confusion, or seasonal conditions. Common denoising methodologies, such as smoothing or filtering, are designed for continuous imagery and can distort small or fragmented features and fail to correct underlying classification errors. To overcome these limitations, this study evaluated a multi-date summation and thresholding workflow as a denoising alternative. Five Sentinel-2 images per site were classified as built-up maps, summed into a composite “built-up frequency” raster, and thresholded using Otsu, adaptive, and voting methods to produce refined binary maps. The results across nine international study sites show that the Otsu thresholding method outperformed the other methods in most locations when comparing their accuracies using the Matthews Correlation Coefficient (MCC), showing that using multiple images can improve identification of built-up land.

Keywords: binary; built-up land; denoising; thresholding; Otsu

1. Introduction

Accurately delineating built-up areas is essential for applications such as urban planning [1,2], disaster response [3], and environmental monitoring [4,5]. Numerous studies have used remotely sensed imagery to produce binary maps of built-up land cover [6,7,8], yet such classifications inevitably contain noise and misclassified pixels [8,9]. Satellite imagery offers the opportunity to improve classification through repeated observations. This raises a central question: how can binary built-up maps be effectively denoised to improve their accuracy and reliability?

Traditional denoising and smoothing methods, largely developed for continuous data such as spectral reflectance values, aim to reduce random variation while preserving essential image features [10]. Techniques like Gaussian filtering, median filtering, and bilateral filtering are commonly employed. Gaussian filters apply a weighted average, blurring the image and reducing high-frequency noise [11]. Median filters replace each pixel value with the median value within a defined neighborhood, effectively removing salt-and-pepper noise while preserving edges more effectively than a Gaussian filter [12,13]. Bilateral filters weight pixels based on both spatial proximity and spectral similarity, aiming to smooth while preserving sharp boundaries [14]. However, the direct application of these techniques to binary maps is problematic. Smoothing filters can erode legitimate built-up areas or create spurious connections, fundamentally altering the image’s classification integrity.

Related approaches, including classification smoothing, filtering, and generalization, suffer from analogous limitations. For example, Wang et al. [15] noted that applying a majority filter will remove some noise but its ability to improve overall map accuracy is limited because it does not bring in new information. While intuitively appealing for refining map edges and removing isolated pixels, these methods can distort the size and shape of built-up areas, especially when dealing with small or fragmented features. More importantly, these techniques address symptoms rather than the root cause of misclassifications—the semantic errors inherent in the initial classification process—and may even reinforce existing biases.

Beyond these conventional techniques, more sophisticated denoising methods have emerged. Non-Local Means (NLM) filtering [16] exploits image redundancy by averaging pixels based on the similarity of their surrounding patches, potentially preserving details better than simple smoothing. Block-Matching and 3D Filtering (BM3D) [17] groups similar image blocks, performs a 3D transform, and filters in the transform domain, demonstrating effectiveness for Gaussian noise. Total Variation (TV) denoising [18,19] minimizes image variation, promoting piecewise smoothness and preserving edges. However, even these advanced methods are often ill suited for denoising binary built-up maps. The core issue resides in the fundamental nature of the “noise” itself.

In binary maps, “noise” is not typically random variation; it is semantic noise—a fundamentally incorrect classification of land cover. A misclassified pixel represents a flawed interpretation of the scene, not a minor spectral anomaly. Smoothing filters treat all deviations equally, regardless of their semantic validity. NLM and BM3D, while detail-preserving, still rely on averaging, risking the alteration of critical built-up area boundaries. Deep learning approaches, while powerful, are heavily reliant on the quality and representativeness of their training data; biased or limited datasets can exacerbate existing errors [20].

A potentially more effective approach involves using multiple observations within a single year and thresholding the binary raster composite data. This technique constructs a composite “built-up frequency” raster by aggregating binary maps representing built-up areas over the period of a year. A threshold is then applied to this built-up frequency raster, classifying pixels as built-up or non-built-up based on whether their summed value exceeds the threshold. Several global methods are available for threshold calculation, including Otsu’s method, Isodata, voting, and Boolean overlay operations (AND, OR, XOR) [21]. Among these, Otsu’s method [22] is widely adopted [18,23,24]. Adaptive thresholding methodologies, which handle data more locally, also warrant further investigation.

Studies have found that the Otsu threshold can improve accuracy when applied to feature identification in imagery. Che et al. [25] performed binary mapping of water using various water indices in combination with Otsu thresholding to choose the best water index to improve water mapping compared to using a single fixed threshold. Similarly, Karakus [26] applied the Otsu threshold to various water indices to find the optimal boundary and water surface area of a lake in different years.

Research has also shown that a multi-date approach to classifying land cover improves accuracy, especially when only a small number of land cover classes are being identified [27]. Lawer [27] used random forest to classify two, four, six, and nine land cover classes and found that binary classification (green space and non-green space) yielded the best accuracy under the multi-date approach. However, although that work shows increased accuracy with multi-date imagery, the single-date approach applied to two classes was only slightly less accurate, suggesting that both single-date and multi-date approaches can be used for binary land cover classification.

To investigate the effect of multi-date built-up frequency and threshold denoising, we focused on the binary built-up mapping approach of Maloney et al. [28]. Maloney et al. [28] presented a novel approach to mapping built-up areas using Sentinel-2 imagery, leveraging existing built-up indices, a novel texture metric derived from the red band, and an ensemble machine learning framework. Their algorithm employed optimized global thresholds for each index to classify pixels, demonstrating the potential for scalable, worldwide application. Although the methodology covers diverse geographic locations and land cover types, it exhibits limitations and misclassifications, especially in complex environments. Quantitative analysis indicates low performance in one or both images in multiple locations: winter in Lhasa, China; summer and fall in Mykonos, Greece, and Riga, Latvia; spring and winter in Shobara, Japan; summer and winter in Ta’izz, Yemen; and summer and spring in Leiden, The Netherlands. These classifications rely on single Sentinel-2 images. There is potential for denoising through the incorporation of multi-temporal classified Sentinel-2 imagery, summed together and denoised by global and local thresholds.

Accordingly, this research proposes comparing multi-date thresholding methods and the single-date method to denoise binary built-up maps. Greater detail on the threshold methodologies used and an evaluation of the accuracy of the approaches are described in subsequent sections of this paper.

2. Materials and Methods

The objective of this study is to determine whether thresholding the summation of multi-date binary built-up imagery is an effective form of denoising binary built-up maps. To achieve this, we compute five binary built-up maps of a single area, one per image date, and add them together to obtain a built-up frequency raster. We then threshold this raster to make it binary (Figure 1). Comparisons are made between the single-date binary built-up classification map and the thresholded multi-date binary classification map. This is repeated for several study areas, and comparisons are made using the Matthews Correlation Coefficient.

2.1. Imagery

This project uses Sentinel-2 (S2) Level 2A (L2A) Bottom of Atmosphere (BoA) reflectance imagery. Data were accessed through the Copernicus Open Access Hub via SpatioTemporal Asset Catalogs (STAC) (https://stacspec.org/en, accessed on 1 July 2025), which provides a convenient and efficient way to discover and index remotely sensed data. A total of 5 mostly cloud-free images were downloaded for each study site between 2020 and 2021. Five images were selected to try to reduce false positives from other land covers such as bare land, which has been shown to be confused with built-up land [28,29,30]. Additional images beyond 5 were not considered as excess data can increase computational costs [31] without increasing accuracy [32]. We employed the blue, green, red, near-infrared, and shortwave infrared bands—as described in Maloney et al. [28] and summarized below—to classify built-up areas using spectral indices. Specific band center wavelengths and widths are provided in Table 1.

The Global Human Settlement Layer (GHSL) is a static map that was used as a benchmark for the accuracy of the single-date and multi-date thresholding outputs. The GHSL is produced by a deep learning method that uses convolutional neural networks on high-performance computing equipment. It was calculated using 2020 Sentinel-2 imagery sampled to 100 m and achieved an accuracy of 0.92 using the Intersection over Union statistic for built-up land [33]. Since it is a static map, it is not directly comparable to the approach examined in this study, but it offers a verified, highly accurate output for comparison. It is freely available and requires no additional licensing, in contrast to commercial software, which may require it.

2.2. Study Sites

To enable direct comparison with previous work, we used the same study locations as Maloney et al. [28]. However, we excluded Guangzhou, China; Shobara, Japan; and Petropavl, Kazakhstan, due to a lack of sufficient cloud-free S2 imagery during the study period. Obtaining multiple clear images is crucial for robust analysis. Figure 2 provides a visual overview of the study site locations. Table 2 includes their corresponding S2 granule IDs and collection dates. These locations were chosen by Maloney et al. [28] based on the Human Development Index, infrastructure density, and geographic features of interest.

When possible, the date range for each site was chosen based on the season that achieved the higher F1 accuracy in Maloney et al. [28]; however, in some locations, such as Leiden, The Netherlands, the F1 values were low in both seasons. Where too few cloud-free images were available in the higher performing season, imagery from both the higher and lower performing seasons was used. Because imagery could not be consistently acquired within one season, a comparison of seasonal results is outside the scope of this study.

2.3. Automated Approach to Identify Built-Up Land

This research asks how the single-date binary built-up classification approach compares to multi-date methods that combine the Maloney et al. [28] approach with adaptive, voting, or Otsu thresholding for identifying built-up land. The reader is referred to Maloney et al. [28] for details of the single-date binary built-up approach.

The multi-date methodology centers on creating a “built-up frequency” raster to identify true built-up areas (Figure 1 and Figure 3). This raster is generated by adding together binary built-up area maps derived from imagery across a full temporal period. For our research, we applied the binary built-up approach by Maloney et al. [28] to each of five images per site, as shown in Table 2, and then summed the binary outputs together. Each pixel in the built-up frequency raster therefore represents the number of times that location was classified as “built-up” throughout the year, where 0 represents pixels that were classified as not built-up in all five images; and 1–5 represents the number of images that classified that pixel as built-up. The underlying principle is that genuinely built-up areas will consistently receive a high cumulative value, while misclassified areas will exhibit lower values.
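The summation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `built_up_frequency` and the tiny 2 x 3 input maps are hypothetical, and the sketch assumes the five binary maps are already co-registered on the same pixel grid.

```python
import numpy as np

def built_up_frequency(binary_maps):
    """Sum co-registered binary maps (1 = built-up, 0 = not built-up) pixel by pixel."""
    stack = np.stack([np.asarray(m, dtype=np.uint8) for m in binary_maps])
    if not np.isin(stack, (0, 1)).all():
        raise ValueError("inputs must be binary (0/1) maps")
    # Each output pixel counts how many dates classified it as built-up.
    return stack.sum(axis=0, dtype=np.uint8)

# Five hypothetical classifications of the same 2 x 3 site on different dates.
maps = [
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 1], [1, 0, 0]],
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 0], [1, 0, 0]],
    [[1, 1, 0], [1, 1, 0]],
]
frequency = built_up_frequency(maps)
print(frequency)
```

In this toy input, the left column is classified as built-up on every date and reaches 5, while pixels flagged on a single date stay at 1, which is the separation the thresholding step relies on.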

To convert this built-up frequency raster into a final binary built-up map, we applied a threshold. Pixels with a cumulative value exceeding the threshold were classified as built-up; those below were classified as non-built-up. The critical challenge lies in determining the optimal threshold value. We applied several established image processing techniques to achieve this, including Otsu’s method [22], adaptive, and voting thresholding.

Otsu’s method [22] stands out due to its widespread adoption and demonstrated effectiveness [14,23,24,34,35]. Originally designed for grayscale image segmentation—separating foreground from background—Otsu’s method automatically calculates a threshold that minimizes the weighted within-class variance of the two resulting classes (built-up and non-built-up). This means it seeks a threshold that creates classes that are internally as homogenous as possible. Recent advancements have refined Otsu’s method, exploring strategies to maximize variance [24], incorporate statistical distributions [36], and even integrate deep learning techniques [14]. However, these enhancements have not been systematically applied to the specific task of binarizing composite binary geographic maps like our built-up frequency raster.
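As a rough illustration of how Otsu's criterion behaves on a six-level frequency raster (values 0 to 5), the sketch below searches every candidate level and keeps the one with the smallest weighted within-class variance. The function name `otsu_threshold`, the toy raster, and the convention that pixels strictly above the returned level are built-up are assumptions made for this example; the text does not state which implementation was used.

```python
import numpy as np

def otsu_threshold(frequency, n_levels=6):
    """Return the level t minimizing the weighted within-class variance
    of the split {value <= t} versus {value > t}."""
    values = np.asarray(frequency).ravel()
    counts = np.bincount(values, minlength=n_levels).astype(float)
    levels = np.arange(counts.size)
    best_t, best_score = 0, np.inf
    for t in range(counts.size - 1):
        lo_c, hi_c = counts[: t + 1], counts[t + 1:]
        lo_l, hi_l = levels[: t + 1], levels[t + 1:]
        w0, w1 = lo_c.sum(), hi_c.sum()
        if w0 == 0 or w1 == 0:
            continue  # skip splits that leave one class empty
        mu0 = (lo_l * lo_c).sum() / w0
        mu1 = (hi_l * hi_c).sum() / w1
        var0 = (lo_c * (lo_l - mu0) ** 2).sum() / w0
        var1 = (hi_c * (hi_l - mu1) ** 2).sum() / w1
        score = (w0 * var0 + w1 * var1) / counts.sum()
        if score < best_score:  # ties keep the lowest level
            best_t, best_score = t, score
    return best_t

# Hypothetical raster: mostly zeros, scattered 1s (noise), a cluster of 4s and 5s.
frequency = np.array([
    [0, 0, 0, 1, 5, 5],
    [0, 0, 1, 0, 5, 5],
    [0, 1, 0, 0, 4, 5],
    [0, 0, 0, 1, 4, 5],
])
t = otsu_threshold(frequency)
built_up = frequency > t
print(t, int(built_up.sum()))  # prints: 1 8
```

Here the isolated 1s fall into the non-built-up class and only the cluster of 4s and 5s survives. Because the raster has only six levels, an exhaustive search is cheap; library routines typically reach the same split by maximizing the between-class variance, which is an equivalent criterion.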

It is important to note that Otsu’s method is a global thresholding technique. It applies a single, uniform threshold across the entire image. This differs from local or adaptive thresholding methods, which dynamically adjust the threshold based on the characteristics of surrounding pixels. Adaptive thresholding is particularly valuable when dealing with imagery exhibiting varying illumination or strong contrast gradients [37,38,39,40]. Adaptive thresholding involves subtracting a constant value, C, from the mean of a local neighborhood based on the “blockSize” parameter, which determines the size of the neighborhood area. We used a blockSize of 11 and a C value of 2. A blockSize of 11 is a relatively small local region around a pixel, and is effective at handling variations over small areas. The C value provides a small offset to make the local neighborhood mean more conservative. By responding to local variations, adaptive thresholding can improve classification accuracy in heterogeneous landscapes.
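The mean-minus-C rule described above can be sketched as follows. This is an illustrative NumPy version under several assumptions that the text does not specify: the raster is used at its native 0 to 5 scale, edge pixels are replicated at the borders, and a pixel is kept when its value is strictly greater than the local threshold. The function name `adaptive_mean_threshold` is hypothetical; the defaults mirror the stated blockSize of 11 and C of 2, while the demo uses a 3 x 3 window so that the toy raster stays small.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def adaptive_mean_threshold(frequency, block_size=11, c=2.0):
    """Return (mask, local_threshold), where local_threshold is the mean of the
    block_size x block_size neighborhood minus c."""
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd integer >= 3")
    data = np.asarray(frequency, dtype=float)
    pad = block_size // 2
    padded = np.pad(data, pad, mode="edge")  # assumption: replicate border pixels
    windows = sliding_window_view(padded, (block_size, block_size))
    local_threshold = windows.mean(axis=(-2, -1)) - c
    return data > local_threshold, local_threshold

# Hypothetical raster: consistently built-up except one pixel flagged only once.
frequency = np.array([
    [5, 5, 5, 5],
    [5, 1, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
])
mask, threshold = adaptive_mean_threshold(frequency, block_size=3, c=2.0)
print(mask.astype(int))
```

In this example the pixel with value 1 falls below its local threshold and is dropped, while its neighbors are kept. Because the threshold is relative to the local mean, the outcome depends strongly on how the raster is scaled, on the sign of C, and on the comparison direction; with a positive C, a pixel in a perfectly uniform neighborhood always exceeds its own local threshold under this rule. Those details are assumptions here rather than settings reported in the text.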

Lastly, as the third approach to refining the built-up classification derived from the built-up frequency raster, we implemented a straightforward global “voting” method. This technique uses the cumulative value of each pixel in the built-up frequency raster, that is, the number of times the location was identified as “built-up” across the time series. The core principle is to filter out noise or sporadic misclassifications caused by shadow interference or cloud cover. We hypothesized that genuinely built-up areas would consistently receive higher cumulative values, reflecting repeated identification as such, whereas areas incorrectly flagged as built-up in isolated instances would have lower cumulative values. We therefore established a simple global threshold: any pixel in the built-up frequency raster with a value of 3 or greater was classified as “built-up,” meaning the location had to be classified as built-up in at least three of the five images to be included in the final map. Any pixel with a cumulative value below 3 was classified as “not built-up.” This threshold of 3 balances retaining genuine built-up areas against removing potentially erroneous classifications, and it amounts to requiring a majority consensus across the time series. By setting this threshold, we aim to reduce the impact of temporary conditions (such as shadows) or isolated errors in the underlying classification that might falsely identify an area as built-up.
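The voting rule reduces to a single comparison on the frequency raster. The sketch below is illustrative only: the function name `voting_threshold` and the toy raster are hypothetical, and the default of 3 votes matches the threshold stated above for five input images.

```python
import numpy as np

def voting_threshold(frequency, min_votes=3):
    """Keep pixels classified as built-up in at least min_votes images."""
    return np.asarray(frequency) >= min_votes

n_images = 5
majority = n_images // 2 + 1  # 3 of 5: the smallest strict majority

# Hypothetical built-up frequency raster (values 0-5).
frequency = np.array([[5, 1, 1],
                      [5, 3, 0]])
built_up = voting_threshold(frequency, min_votes=majority)
print(built_up.astype(int))
```

Unlike Otsu's method, the cut-off here is fixed in advance and does not depend on the distribution of values in a particular raster.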

2.4. Ground Truth Validation

Ground truth points for accuracy assessment were generated via an equalized stratified random sampling of an independent reference product, the European Space Agency WorldCover 2021 map (76.7% overall accuracy) [41,42]. For each location, we generated at least 100 “built-up” and 100 “non-built-up” points. To create this binary reference, the WorldCover product’s ten non-urban land cover classes were merged into a single layer and the built-up class was used as the second layer. The WorldCover dataset, itself a machine learning product derived from 2021 Sentinel imagery, was chosen to ensure temporal consistency and to avoid any bias that could arise from using our own classification outputs for validation.
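An equalized stratified random sample of this kind can be drawn as sketched below. The function name `equalized_stratified_sample`, the synthetic 50 x 50 reference array, and the random seed are hypothetical and serve only to illustrate the idea of drawing the same number of points from each class; the GIS tooling actually used to generate the points is not described here.

```python
import numpy as np

def equalized_stratified_sample(reference, n_per_class=100, seed=0):
    """Draw n_per_class random (row, col) locations from each class of a
    binary reference map (1 = built-up, 0 = non-built-up)."""
    reference = np.asarray(reference)
    rng = np.random.default_rng(seed)
    samples = {}
    for label in (0, 1):
        rows, cols = np.nonzero(reference == label)
        if rows.size < n_per_class:
            raise ValueError(f"class {label} has only {rows.size} pixels")
        pick = rng.choice(rows.size, size=n_per_class, replace=False)
        samples[label] = np.column_stack((rows[pick], cols[pick]))
    return samples

# Synthetic binary reference. With a real WorldCover tile, the binary layer
# could be derived as (worldcover == 50), since legend code 50 is "Built-up".
reference = np.zeros((50, 50), dtype=np.uint8)
reference[10:30, 10:30] = 1
points = equalized_stratified_sample(reference, n_per_class=100, seed=42)
print(points[0].shape, points[1].shape)
```

Sampling the two classes equally keeps the rarer built-up class from being swamped by the far more common non-built-up pixels in the accuracy assessment.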

The ground truth points in the accuracy assessment were verified through examination of the underlying base layer imagery from ArcPro 3.1.1, which uses 0.6 cm to 1.2 m resolution imagery for most of the world. The base layer was compared to the 10 m composite image used in the single-image approach for each location by Maloney et al. [28] to verify if the land cover was consistent with the ArcPro base layer.

2.5. Accuracy Statistics

Built-up accuracy for the results of the single-scene approach and the multi-scene thresholding approaches was calculated using the Matthews Correlation Coefficient (MCC). The MCC measures the accuracy of a binary classification, giving a score of −1 (worst value) to +1 (best value). When MCC = 0, it is the equivalent of an expected value of a coin tossing classifier [43]. It is a measure of the correlation of true classes against predicted labels [44].

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (FN + TN) \times (FP + TN) \times (TP + FN)}} \tag{1}$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. MCC was chosen over the more widely known Kappa and F1-score, because while Kappa and MCC converge when FP and FN are equal, Kappa and MCC diverge when FP and FN are very different, which is more likely when the two statistics produce negative values. An MCC value of −1 can correspond to any negative value of Kappa, including a Kappa value of close to 0, which would indicate the prediction is similar to a random guess, while the MCC value −1 would indicate an opposite prediction [45].

The MCC is also a more comprehensive metric than the F1-score because it uses all four quadrants of the confusion matrix, whereas the F1-score uses only TP, FP, and FN and ignores TN. MCC is preferred when TN is important or when the class distribution is imbalanced [43], and both conditions are met in this study.
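To make the contrast with the F1-score concrete, the sketch below computes both statistics from confusion-matrix counts. The counts are invented for illustration and are not results from this study, and returning 0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts (Equation 1)."""
    denom = math.sqrt((tp + fp) * (fn + tn) * (fp + tn) * (tp + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention

def f1_score(tp, fp, fn):
    """F1-score; note that TN does not appear anywhere in the formula."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Two hypothetical assessments that differ only in the number of true negatives.
few_tn = dict(tp=90, tn=60, fp=40, fn=10)
many_tn = dict(tp=90, tn=600, fp=40, fn=10)

for name, cm in (("few TN", few_tn), ("many TN", many_tn)):
    print(name,
          "MCC = %.3f" % mcc(**cm),
          "F1 = %.3f" % f1_score(cm["tp"], cm["fp"], cm["fn"]))
```

The F1-score is identical for both cases because TN never enters it, whereas the MCC rises when more non-built-up points are classified correctly.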

3. Results

3.1. Overview of Results

A comparison of the single-date and multi-date approaches showed that built-up land identification differed by approach and by location. The multi-date adaptive thresholding approach consistently removed false positive pixels in areas containing trees, soil, and other non-built-up land covers, but it produced more false negatives in the built-up areas. The Otsu and voting approaches exhibited the opposite pattern: they identified built-up pixels more accurately in areas containing built-up land covers, such as buildings, roads, and houses, but produced false positives over trees, vegetation, and bare ground. The single-date approach was accurate in identifying built-up pixels in built-up areas but also created false positives in non-built-up areas (see Figure 4 for a comparison). The GHSL output is also shown; although it is calculated using 100 m pixels, its built-up results more fully cover the built-up area.

3.2. Matthews Correlation Coefficient Accuracy Statistic Results

Analysis of MCC scores across all study areas reveals that Otsu’s thresholding applied to the built-up frequency raster outperformed the other multi-image and single-image approaches in most instances (Figure 5). Otsu’s thresholding yielded higher MCC scores than the single built-up image in seven of the nine study areas. Otsu was higher than adaptive in eight of the nine study areas, and higher than voting in six of the nine areas and tied in two areas. When comparing voting to the single-image and adaptive approaches, voting outperformed the single-image approach in six of the nine areas, and outperformed the adaptive approach in eight of the nine areas while tying in one area. The adaptive approach was the only multi-image approach to perform more poorly than the single-image approach in most locations. The GHSL static layer performed better than Otsu across most locations, except for Greece and Haiti. In these locations, the GHSL overestimated built-up land in areas where the land cover was bare, which occurred less frequently in other locations.

When comparing the best performing with the poorest performing multi-image approach, the Otsu approach performed best in Australia, where the city, Alice Springs, is surrounded by vegetation and barren land (Figure 6). Built-up land was correctly classified as built-up, while some barren land was misclassified as built-up land. The adaptive approach routinely missed built-up land within Alice Springs but was more effective at correctly classifying non-built-up land.

The Otsu approach scored lowest in China (Figure 7). Similar to Australia, Otsu performed well in the city, Lhasa, but was more likely to misclassify other non-built-up land covers. Unlike Australia, the imagery of China contains a large amount of mountainous land cover. The shadows in the mountainous areas were misclassified as built-up land. While shadows less frequently caused misclassification as built-up land in the adaptive approach, the adaptive approach was more likely to omit built-up land.

4. Discussion

Accurate and efficient identification of built-up land is critical for urban planning and disaster response. This study investigated the efficacy of leveraging multi-temporal data to improve built-up land classification accuracy compared to an already-established single-image approach. Additionally, the classification of each approach was compared to the classification from a static assessment using the GHSL. The results of this study indicate that Otsu multi-image thresholding is the optimal approach to identify built-up land when compared to single-image and other multi-image thresholding approaches. When compared to the GHSL, the GHSL outperformed Otsu in most locations; however, the GHSL is optimal when only a static layer is necessary. When analysis is required for a different date or across a time series, the Otsu multi-image approach offers an option to identify built-up land.

The accuracies for the multi-image and single-image approaches were still lower than the static GHSL ground truth layer, because the GHSL more effectively classified both built-up and non-built-up land covers accurately. The Otsu approach optimized identification of built-up land at the expense of misclassifying non-built-up land, which occurred in imagery of the mountainous region surrounding Lhasa, China, where shadows were misclassified as built-up land. Recent research has examined issues caused by shadows in imagery analysis [46,47], yielding results that could improve the results of this research. In locations where smaller clusters of non-built-up pixels were misclassified as built-up, post-processing approaches have been undertaken to remove clusters of erroneous pixels as carried out in Becker et al. [9].

This study was unable to address the seasonal component that contributes to semantic noise. Ideally, in a multi-temporal approach, the imagery collection timeframe would be consistent, either collected within one season or throughout the same months across sites, but our collection was constrained by the availability of cloud-free imagery and we collected within season for some sites and across seasons for other sites. Prior research has shown higher accuracy in imagery taken from the dry season when identifying vegetation [9,48] and when taken from multiple scenes within one season [48]. Prior research has also shown imagery taken from the non-growing season contains more built-up land classifications based on seasonal effects in the imagery rather than true land cover classifications [49]. For this research in identifying built-up land, the seasonal effect on semantic noise was not studied.

The adaptive approach identified built-up pixels less accurately while identifying non-built-up pixels more accurately, which could be attributed to the use of the fixed blockSize parameter. Sezgin and Sankur [50] found that adaptive methods can be effective but sensitive to removing noise in the foreground and the background, degrading the quality of the classification, which occurred in all our locations. Adaptive thresholding requires initial parameterization, and its performance depends on the selection of appropriate parameters [51]. We chose a fixed blockSize of 11, corresponding to a window of 121 pixels, because we found many built-up features to be larger than 121 pixels; however, pixel values within urban features could vary greatly, rendering the block size of 11 too large and causing built-up features to be missed, which occurred frequently. Optimizing the accuracy of adaptive thresholding would require the user to tune these parameters, which would remove the automation that was the goal of this study.

Additionally, since the distribution of points in the summed image is bimodal, modifying adaptive parameter thresholds might not yield an increase in accuracy because of the domination of non-built-up pixels in the built-up frequency rasters. The higher accuracies of the Otsu and voting approaches in our study compared to the adaptive approach could be attributed to the domination of 0s. With the adaptive approach, since it is a local moving window and can correct for local over-classification of built-up pixels by removing individual and small groups of built-up pixels, the adaptive approach might have removed too many pixels due to a suboptimal parameter selection.

Future Research

While the purpose of this research was to test global thresholding approaches using Otsu and voting and fixed parameters for the adaptive threshold, future research could investigate techniques to fine-tune the results at each location. This could be achieved by modifying the parameters used in the adaptive threshold and applying post-processing steps to determine the effect on the results.

Future research in adaptive thresholding could focus on testing different block sizes and constants at each location, exploring the relationship between image characteristics and adaptive parameter selection, and quantifying differences by location, type of built-up land, and non-built-up land cover.

Future research could focus on a post-processing modification of the results where a simpler fixed non-adaptive approach could be applied to remove small clusters of pixels. Becker et al. [9] developed a binary forest cover algorithm and used a 200-pixel group minimum and an 8-neighbor sieve to remove small clusters of pixels that should have been identified as forest and a 3 × 3 pixel clump that filled in small areas of land covers that were incorrectly identified as forest. Becker et al. [9] based the neighbor, sieve, and clump parameters on the size of groups of pixels they deemed erroneous and on the pixel size of WorldView-2 imagery at 2 m resolution. Becker et al. [52] applied their binary forest cover algorithm and tested the same neighbor, sieve, and clump parameters with PlanetScope imagery (3 m resolution) and achieved favorable accuracy, but Feliciano-Cruz et al. [53], who expanded on work completed by Becker et al. [9], did not perform a sieve and clump when using Landsat imagery at 30 m resolution because the accuracy would have decreased, as discussed in Becker et al. [52]. The Sentinel-2 imagery used in our study was at a 10 m resolution, suggesting a fixed pixel approach could achieve results in removing pixels and increase accuracy.

5. Conclusions

This study has demonstrated that a multi-image approach combined with an Otsu thresholding technique provides the most effective method for identifying built-up land cover when compared to a single-image approach. These findings indicate that aggregating multiple images improves the robustness of binary built-up classifications while maintaining methodological simplicity. The implications of this work are significant, as the approach requires relatively modest computational and data resources compared to more complex classification frameworks. Consequently, the resulting methods and derived datasets could be incorporated into existing land-cover products or distributed as reproducible analytical workflows. Such outputs would be valuable to a wide range of users, including urban planners, disaster response and humanitarian organizations, defense and security analysts, and environmental researchers, particularly in data-limited or resource-constrained contexts. Moreover, the transferable nature of the multi-image Otsu-based methodology supports interdisciplinary linkages with fields such as urban studies, environmental monitoring, risk and resilience analysis, and socio-economic research, where accurate delineation of built-up areas is a critical input.

This work could be extended across different sensors, spatial resolutions, and geographic contexts. Additionally, it can be extended to examine how variations in temporal aggregation, image seasonality, and the number of input scenes influence classification performance, particularly in regions experiencing rapid urban change or strong seasonal effects. Independent validation strategies using high-resolution reference data, crowdsourced annotations, or alternative ground truth sources would further strengthen confidence in the results and enable more robust uncertainty assessment. Beyond methodological refinement, future work could explore the integration of built-up land cover outputs with socio-economic, environmental, and hazard-related datasets to support interdisciplinary analyses of urbanization, exposure, and vulnerability.

Author Contributions

Conceptualization, S.J.B. and N.M.W.; methodology, S.J.B. and N.M.W.; validation, S.J.B.; formal analysis, S.J.B. and N.M.W.; writing—original draft preparation, S.J.B. and N.M.W.; writing—review and editing, S.J.B. and N.M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Army Engineer Research and Development Center, supported under PE 0602146A/AT9, Project “Tactical Geospatial Information Capabilities”, Task “Geospatial Analytics and Prediction.” This article is a U.S. Government work and is in the public domain in the USA.

Data Availability Statement

All data used in this study were accessed from publicly available repositories, as referenced in the text. Distribution Statement A. Approved for public release: distribution is unlimited.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Gaur, S.; Singh, R. A comprehensive review on land use/land cover (LULC) change modeling for urban development: Current status and future prospects. Sustainability 2023, 15, 903.
2. Naikoo, M.W.; Rihan, M.; Ishtiaque, M.; Shahfahad. Analyses of land use land cover (LULC) change and built-up expansion in the suburb of a metropolitan city: Spatio-temporal analysis of Delhi NCR using Landsat datasets. J. Urban Manag. 2020, 9, 347–359.
3. Rakuasa, H. Spatial-temporal analysis of built-up land development in landslide-prone areas: Disaster risk assessment. Calam. J. Disaster Technol. Eng. 2025, 2, 143–151.
4. Ullah, S.; Qiao, X.; Abbas, M. Addressing the impact of land use land cover changes on land surface temperature using machine learning algorithms. Sci. Rep. 2024, 14, 18746.
5. Halder, B.; Bandyopadhyay, J.; Banik, P. Monitoring the effect of urban development on urban heat island based on remote sensing and geo-spatial approach in Kolkata and adjacent areas, India. Sustain. Cities Soc. 2021, 64, 103186.
6. Firozjaei, M.K.; Sedighi, A.; Kiavarz, M.; Qureshi, S.; Haase, D.; Alavipanah, S.K. Automated built-up extraction index: A new technique for mapping surface built-up areas using LANDSAT 8 OLI imagery. Remote Sens. 2019, 11, 1966.
7. Waqar, M.M.; Mirza, J.F.; Mumtaz, R.; Hussain, E. Development of new indices for extraction of built-up area & bare soil from Landsat data. Open Access Sci. Rep. 2012, 1, 4.
8. Zha, Y.; Gao, J.; Ni, S. Use of normalized difference built-up index in automatically mapping urban areas from TM imagery. Int. J. Remote Sens. 2003, 24, 583–594.
9. Becker, S.J.; Daughtry, C.S.; Russ, A.L. Robust forest cover indices for multispectral images. Photogramm. Eng. Remote Sens. 2018, 84, 505–512.
10. Schindler, K. An overview and comparison of smooth labeling methods for land-cover classification. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4534–4545.
11. Akhter, M.; Ullah, F.; Mostarda, L.; Cacciagrano, D. A Deep Learning-Based RIDNet Approach for Enhanced Denoising of SAR Images. In Proceedings of the International Conference on Advanced Information Networking and Applications, Cham, Switzerland, 9–11 April 2025; Springer Nature: Cham, Switzerland, 2025; pp. 268–279.
12. Ullah, F.; Kumar, K.; Rahim, T.; Khan, J. A new hybrid image denoising algorithm using adaptive and modified decision-based filters for enhanced image quality. Sci. Rep. 2025, 15, 8971.
13. Golilarz, N.A.; Gao, H.; Pirasteh, S.; Yazdi, M.; Zhou, J.; Fu, Y. Satellite multispectral and hyperspectral image de-noising with enhanced adaptive generalized gaussian distribution threshold in wavelet domain. Remote Sens. 2020, 13, 101.
14. Yang, Y.; Sun, Y.; Gao, W.; Wang, X.; Zeng, L. Bilateral regularized optimization model for edge-preserving image smoothing. Image Vis. Comput. 2024, 146, 105031.
15. Wang, W.; Li, W.; Zhang, C.; Zhang, W. Improving object-based land use/cover classification from medium resolution imagery by Markov chain geostatistical post classification. Land 2018, 7, 31.
16. Painam, R.K.; Manikandan, S. A comprehensive review of SAR image filtering techniques: Systematic survey and future directions. Arab. J. Geosci. 2020, 14, 37.
17. Lin, T.; Hong, H.; Wu, L. Improved BM3D for real image denoising. In Proceedings of the 13th International Conference on Wireless Communications and Signal Processing (WCSP), Changsha, China, 20–22 October 2021; pp. 1–5.
18. Yang, F.; Hu, Q.; Su, X. Hyperspectral image denoising based on hyper-Laplacian total variation in spectral gradient domain. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5507917.
19. Wang, P.; Tianm, S.; Chen, Y.; Ge, K.; Wang, X.; Wang, L. Hyperspectral image denoising based on deep and total variation priors. Remote Sens. 2024, 16, 2071.
20. Davydezenka, T.; Tahmasebi, P.; Carroll, M. Improving remote sensing classification: A deep-learning-assisted model. Comput. Geosci. 2022, 164, 105123.
21. González, R.C.; Woods, R.E. Digital Image Processing, 3rd ed.; Pearson Education: Upper Saddle River, NJ, USA, 2008.
22. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
23. Bangare, S.L.; Vidyavihar, S.; Dubai, A.A.; Patil, S. Reviewing Otsu’s method for image thresholding. Int. J. Appl. Eng. Res. 2015, 10, 21777–21783.
24. Yousefi, J. Image Binarization using Otsu Thresholding Algorithm. Master’s Thesis, University of Guelph, Guelph, ON, Canada, 2011.
25. Che, L.; Li, S.; Liu, X. Improved surface water mapping using satellite remote sensing imagery based on optimization of the Otsu threshold and effective selection of remote-sensing water index. J. Hydrol. 2025, 654, 132771.
26. Karakus, P. Detection of Water Surface Using Canny and Otsu Threshold Methods with Machine Learning Algorithms on Google Earth Engine: A Case Study of Lake Van. Appl. Sci. 2025, 15, 2903.
27. Lawer, E.A. An evaluation of single and multi-date Landsat image classifications using random forest algorithm in a semi-arid savanna of Ghana, West Africa. Sci. Afr. 2024, 26, e02434.
28. Maloney, M.C.; Becker, S.J.; Griffin, A.W.H.; Lyon, S.L.; Lasko, K. Automated built-up infrastructure land cover extraction using index ensembles with machine learning, automated training data, and red band texture layers. Remote Sens. 2024, 16, 868.
29. Azedou, A.; Amine, A.; Kisekka, I.; Lahssini, S.; Bouziani, Y.; Moukrim, S. Enhancing Land Cover/Land Use (LCLU) classification through a comparative analysis of hyperparameters optimization approaches for deep neural network (DNN). Ecol. Inform. 2023, 78, 102333.
30. Becker, S.J.; Maloney, M.C.; Griffin, A.W.; Lasko, K.; Sussman, H.S. Bare ground classification using a spectral index ensemble and machine learning models optimized across 12 international study sites. Geocarto Int. 2025, 40, 2465452.
31. Adugna, T.; Xu, W.; Fan, J. Effect of using different amounts of multi-temporal data on the accuracy: A case of land cover mapping of parts of Africa using FengYun-3C data. Remote Sens. 2021, 13, 4461.
32. Silvey, S.; Liu, J. Sample size requirements for popular classification algorithms in tabular clinical data: Empirical study. J. Med. Internet Res. 2024, 26, e60231.
33. Pesaresi, M.; Schiavina, M.; Politis, P.; Freire, S.; Krasnodębska, K.; Uhl, J.H.; Kemper, T. Advances on the Global Human Settlement Layer by joint assessment of Earth Observation and population survey data. Int. J. Digit. Earth 2024, 17, 2390454.
34. Yardimci, O.; Ulusoy, I. Evaluation of Pre-Processing, Thresholding and Post-Processing Steps for Very Small Target Detection in Infrared Images. In Proceedings of the Automatic Target Recognition XXVI, Baltimore, MD, USA, 17–21 April 2016; SPIE: Bellingham, WA, USA, 2016; Volume 9844, p. 984405.
35. Kumar, A.; Tiwari, A. A Comparative Study of Otsu Thresholding and K-means Algorithm of Image Segmentation. Int. J. Eng. Technol. Res. 2019, 9, 12–14.
36. Tu, T.N. Improving a New Global Thresholding Algorithm Based on Gray Average for Binaryizating Image. Int. J. Latest Eng. Sci. 2024, 7, 5.
37. Zhang, Y.; Zhang, Z.; Xu, R.; Xiong, P. Adaptive Multi-Threshold Image Segmentation Using Neighborhood Minimum Gray Values for Enhanced 2D Histogram Construction. In Proceedings of the Fifth International Conference on Optical Imaging and Image Processing (ICOIP 2025), Shanghai, China, 11–13 July 2025; SPIE: Bellingham, WA, USA, 2025; Volume 13688, pp. 604–611.
38. Roy, P.; Dutta, S.; Dey, N.; Dey, G.; Chakraborty, S.; Ray, R. Adaptive thresholding: A comparative study. In Proceedings of the 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kanyakumari, India, 10–11 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1182–1186.
39. Niblack, W. An Introduction to Digital Image Processing; Prentice-Hall: Englewood Cliffs, NJ, USA, 1986.
40. OpenCV. Adaptive Thresholding. Available online: https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html (accessed on 12 December 2025).
41. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2021 v200; Zenodo: Geneva, Switzerland, 2022.
42. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. WorldCover Product User Manual V2.0; ESA WorldCover consortium: Paris, France, 2022; Available online: https://esa-worldcover.s3.eu-central-1.amazonaws.com/v200/2021/docs/WorldCover_PUM_V2.0.pdf (accessed on 22 January 2026).
43. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
44. Chicco, D.; Tötsch, N.; Jurman, G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min. 2021, 14, 13.
45. Chicco, D.; Warrens, M.J.; Jurman, G. The Matthews correlation coefficient (MCC) is more informative than Cohen’s Kappa and Brier score in binary classification assessment. IEEE Access 2021, 9, 78368–78381.
46. Lasko, K. Gap filling cloudy Sentinel-2 NDVI and NDWI pixels with multi-frequency denoised C-band and L-band Synthetic Aperture Radar (SAR), texture, and shallow learning techniques. Remote Sens. 2022, 14, 4221.
47. Wang, Z.; Zhou, Y.; Wang, F.; Wang, S.; Qin, G.; Zhu, J. Shadow detection and reconstruction of high-resolution remote sensing images in mountainous and hilly environments. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1233–1243.
48. Liu, J.; Heiskanen, J.; Aynekulu, E.; Pellikka, P.K.E. Seasonal Variation of Land Cover Classification Accuracy of Landsat 8 Images in Burkina Faso. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 455–460.
49. Myers, D.T.; Jones, D.; Oviedo-Vargas, D.; Schmit, J.P.; Ficklin, D.L.; Zhang, X. Seasonal Variation in Land Cover Estimates Reveals Sensitivities and Opportunities for Environmental Models. Hydrol. Earth Syst. Sci. 2024, 28, 5295–5310.
50. Sezgin, M.; Sankur, B.L. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–168.
51. Zaman, E.A.K.; Ahmad, A.; Mohamed, A. Adaptive threshold optimisation for online feature selection using dynamic particle swarm optimisation in determining feature relevancy and redundancy. Appl. Soft Comput. 2024, 156, 111477.
52. Becker, S.J.; Maloney, M.C.; Griffin, A.W.H. A Multi-Biome Study of Tree Cover Detection Using the Forest Cover Index; ERDC/GRL TR-21-4; U.S. Army Engineer Research and Development Center: Vicksburg, MS, USA, 2021.
53. Feliciano-Cruz, L.I.; Becker, S.J.; Lasko, K.D.; Daughtry, C.S.; Russ, A.L. Forest Cover Index for Tree Cover Detection Using Landsat-7 Multispectral Imagery; ERDC/GRL TR-19-1; U.S. Army Engineer Research and Development Center: Vicksburg, MS, USA, 2019.

Figure 1. Flowchart of the methodology used to denoise a binary built-up map. First, five binary built-up maps are created. They are then summed into a “built-up frequency” raster, which is thresholded to produce a new binary built-up map. Three different thresholding techniques were used: Otsu, adaptive, and voting.
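The workflow in Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`built_up_frequency`, `otsu_threshold`, `majority_vote`) and the toy 4 × 4 maps are hypothetical, the voting rule is assumed to be a simple majority, and the adaptive threshold is omitted.

```python
import numpy as np

def built_up_frequency(binary_maps):
    """Sum a list of binary (0/1) built-up maps into a per-pixel count."""
    stack = np.stack([np.asarray(m, dtype=np.uint8) for m in binary_maps])
    # int64 keeps the result compatible with np.bincount below.
    return stack.sum(axis=0, dtype=np.int64)

def otsu_threshold(freq, n_maps):
    """Return the level t that maximizes between-class variance,
    with pixels > t treated as the built-up class."""
    counts = np.bincount(freq.ravel(), minlength=n_maps + 1).astype(float)
    probs = counts / counts.sum()
    levels = np.arange(n_maps + 1)
    best_t, best_var = 0, -1.0
    for t in range(n_maps):  # candidate split: values <= t vs. values > t
        w0 = probs[: t + 1].sum()
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue  # one class is empty, so no valid split here
        mu0 = (levels[: t + 1] * probs[: t + 1]).sum() / w0
        mu1 = (levels[t + 1 :] * probs[t + 1 :]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def majority_vote(freq, n_maps):
    """Assumed voting rule: built-up if flagged in more than half the maps."""
    return (freq > n_maps / 2).astype(np.uint8)

# Hypothetical reference pattern: a 2 x 2 built-up block in one corner.
truth = np.zeros((4, 4), dtype=np.uint8)
truth[:2, :2] = 1

# Five single-date maps, four of them with one misclassified pixel each.
maps = [truth.copy() for _ in range(5)]
maps[0][3, 3] = 1  # false positive
maps[1][0, 0] = 0  # false negative
maps[3][2, 2] = 1  # false positive
maps[4][1, 1] = 0  # false negative

freq = built_up_frequency(maps)
t = otsu_threshold(freq, n_maps=5)
otsu_map = (freq > t).astype(np.uint8)
vote_map = majority_vote(freq, n_maps=5)
```

In this toy case both thresholds discard the isolated single-date errors and recover the reference pattern, which is the behavior the summation step is intended to produce.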

Figure 2. Nine study sites from the Maloney et al. [28] study were chosen based on cloud-free imagery availability.

Figure 3. (a) RGB image over Paraty, Brazil; (b) example of built-up frequency raster over the same area.

Figure 4. (a) ArcPro base image of Paraty, Brazil, compared with (b) the single-image output; the multi-image outputs using the (c) adaptive, (d) Otsu, and (e) voting thresholds; and (f) the GHSL. Yellow denotes built-up areas.

Figure 5. MCC scores across all study areas for each thresholding technique applied to the built-up frequency raster and the single built-up image. The higher scores are in green while the lower scores are in red.
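For readers unfamiliar with the statistic reported in Figure 5, the MCC can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definition; the function name `mcc_from_counts` and the example counts are hypothetical and are not taken from the study's results.

```python
import math

def mcc_from_counts(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for a built-up map compared against reference pixels.
mcc_noisy = mcc_from_counts(tp=4, fp=1, fn=1, tn=10)    # 39 / 55
mcc_perfect = mcc_from_counts(tp=4, fp=0, fn=0, tn=12)  # 1.0
```

Because the numerator rewards agreement in both classes, a map that labels everything as non-built-up scores 0 rather than the high value that overall accuracy would give on a sparsely built-up scene.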

Figure 6. The results of the Otsu (b) and adaptive (c) multi-image thresholding approaches overlaid in yellow over the base image (a) in ArcPro in Alice Springs, Australia. Yellow denotes built-up areas.

Figure 7. The results of the Otsu (b) and adaptive (c) multi-image thresholding approaches overlaid in yellow over the base image (a) in ArcPro in Lhasa, China. Yellow denotes built-up areas.

Table 1. The bands used in this workflow were subset from Sentinel-2 multispectral imagery. Visible and Near-Infrared (VNIR) bands were provided at 10 m. The SWIR band was provided at 20 m and resampled to 10 m using the nearest neighbor approach for processing.

Name	Band Center and Width (nm)	Band Number	Resolution (m)
Blue	492.4 (66)	02	10
Green	559.8 (36)	03	10
Red	664.6 (31)	04	10
Near-Infrared (NIR)	832.8 (106)	08	10
Shortwave Infrared (SWIR)	1613.7 (91)	11	20
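The nearest neighbor resampling of the 20 m SWIR band to 10 m described in the Table 1 caption amounts to duplicating each pixel into a 2 × 2 block when the two grids are aligned. The sketch below illustrates that case only; the function name `upsample_nearest` and the toy array are hypothetical, and the software the authors used for this step is not stated here.

```python
import numpy as np

def upsample_nearest(band, factor=2):
    """Nearest neighbor upsampling by an integer factor.

    Assumes the coarse and fine grids share the same origin, so each
    coarse pixel maps onto an exact factor x factor block of fine pixels.
    """
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

# Toy 2 x 2 "20 m" band resampled to a 4 x 4 "10 m" grid.
swir_20m = np.array([[1, 2],
                     [3, 4]])
swir_10m = upsample_nearest(swir_20m, factor=2)
```

Nearest neighbor resampling introduces no new pixel values, which keeps the SWIR reflectances unchanged when they are combined with the 10 m bands.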

Table 2. The S2 granule IDs and collection dates for each study site.

Location	Granule ID	Date 1	Date 2	Date 3	Date 4	Date 5
Alice Springs, Australia	T53SLU	1 August 2021	6 August 2021	11 August 2021	16 August 2021	27 October 2021
Paraty, Brazil	T23KNQ	7 July 2021	10 July 2021	25 July 2021	19 August 2021	23 October 2021
Lhasa, China	T46RCT	17 January 2020	27 January 2020	16 February 2020	12 October 2020	2 November 2020
Mykonos, Greece	T35SLB	4 June 2021	11 June 2021	21 June 2021	1 July 2021	11 July 2021
Port-au-Prince, Haiti	T18QYF	2 February 2021	12 February 2021	27 February 2021	18 April 2021	5 May 2021
Savannakhet, Laos	T48QVD	9 January 2020	19 January 2020	9 March 2020	28 April 2020	14 November 2020
Riga, Latvia	T35VLD	6 June 2021	18 June 2021	21 June 2021	19 July 2021	26 July 2021
Leiden, The Netherlands	T31UET	5 May 2020	23 May 2020	30 May 2020	17 September 2020	22 September 2020
Ta’izz, Yemen	T38PLA	18 August 2021	23 August 2021	2 September 2021	1 December 2021	16 December 2021

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

