Article

Predicting Future Built-Up Land Cover from a Yearly Time Series of Satellite-Derived Binary Urban Maps

by Francis D. O’Neill *, Nicole M. Wayant and Sarah J. Becker
Geospatial Research Laboratory, Engineer Research & Development Center, U.S. Army Corps of Engineers, 7701 Telegraph Road, Alexandria, VA 22315-3864, USA
* Author to whom correspondence should be addressed.
Land 2025, 14(8), 1630; https://doi.org/10.3390/land14081630
Submission received: 9 June 2025 / Revised: 31 July 2025 / Accepted: 11 August 2025 / Published: 13 August 2025
(This article belongs to the Special Issue Integration of Remote Sensing and GIS for Land Use Change Assessment)

Abstract

We compare several methods for predicting future built-up land cover using only a short yearly time series of satellite-derived binary urban maps. Existing methods of built-up expansion forecasting often rely on ancillary datasets such as utility networks, distance to transportation nodes, and population density maps, along with remotely sensed aerial or satellite imagery. Such ancillary datasets are not always available and lack the temporal density of satellite imagery. Moreover, existing work often focuses on quantifying the expected volume of built-up expansion, rather than predicting where exactly that expansion will occur. To address these gaps, we evaluate six methods for the creation of prediction maps showing expected areas of future built-up expansion, using yearly built/not-built maps derived from Sentinel-2 imagery as inputs: Cellular Automata, logistic regression, Support Vector Machines, Random Forests, Convolutional Neural Networks (CNNs), and CNNs with the addition of long short-term memory (ConvLSTM). Of these six, we find CNNs to be the best-performing method, with an average Cohen’s kappa score of 0.73 across nine study sites in the continental United States.


1. Introduction

Built-up land cover—referring to human-modified areas like buildings, roads, parking lots, and other infrastructure—occupies a steadily increasing fraction of Earth’s surface [1]. Given its implications for population growth, climate change, hydrology, and more, it is crucial not only to map the current extent of built-up land cover but also to predict where it will expand in the future. Rapid urban land cover expansion poses unique challenges that degrade the quality of life of urban residents, including urban heat islands [2,3,4], health access and outcomes [5], and traffic congestion [6]. While each of these issues interferes with daily urban life, residents of some cities experience them at greater rates than others. The following study examines techniques for predicting future urban land cover, in the form of built-up land, in urban areas throughout the United States of America that are undergoing urban development at different rates.
The ready availability and spatiotemporal density of remotely sensed imagery, both from airborne and spaceborne sensors, make it ideal for analysis of land cover change. Earth observation data has long been used in built-up mapping applications, evolving over the decades from manual image analysis to cutting-edge machine-learning techniques [7,8,9,10,11,12,13]. Robust techniques exist for both the mapping of current built-up land cover and the prediction of future built-up trends.
When predicting future built-up land cover expansion, existing methods often integrate remotely sensed aerial or satellite imagery with ancillary datasets such as utility networks, distance to transportation nodes, and population density maps [14,15,16,17,18]. However, such ancillary datasets are often static layers that lack the temporal density of satellite imagery. Moreover, existing work often elides the question of where exactly future built-up expansion will occur, focusing instead on quantifying the expected volume of said expansion [16]. Sussman and Becker [17] recently presented a technique that leverages the binary mapping methods of Maloney et al. [13] to predict the location of future built-up areas without ancillary datasets. The Sussman and Becker method is limited by its pixel-wise approach, presenting an opportunity to explore how the information latent in spatial neighborhoods informs built-up expansion.
Approaches to forecasting built-up cover expansion tend to fall into one of a few categories. The most widely used method in the literature, the Cellular Automata Markov Chain (CAMC) approach, relies on an update rule that uses inputs limited to a local neighborhood around each output pixel [18,19,20,21,22,23]. This neighborhood-based approach suggests that the information needed to predict the future built-up status of a single pixel is localized around that pixel. However, the very “Markov assumption” that forms the basis of this approach—where future states of a cell depend only on the current state and not past states—precludes the exploitation of latent spatiotemporal patterns in time-series satellite imagery [24,25,26]. Despite the popularity of the CAMC method for predicting land use change, studies using this method have, in many cases, either been unable to produce highly accurate results predicting change in the built-up category [19], omitted reporting the spatial accuracy of built-up land prediction and change entirely [20], reported only a combined land cover result without quantifying built-up land [21], or relied on the inclusion of ancillary data to improve results [19,27].
Similar limitations affect other existing methods. Logistic regression, for example, is inherently constrained by its simplicity and struggles to capture complex spatiotemporal patterns [14,15,28,29,30]. In contrast, more sophisticated shallow machine-learning techniques such as Random Forests (RFs) and Support Vector Machines (SVMs) have demonstrated more success, achieving prediction accuracies in the range of 58–90%, although these methods tend to rely on ancillary datasets [4,15,29,30,31,32,33].
Recent advancements in computing power have led to an increased use of deep learning methods for mapping and forecasting applications. These neural network architectures are powerful and flexible, but they require both substantial processing power to train and careful fine-tuning to implement effectively [34]. Of particular interest to the problem of predicting built-up expansion are Convolutional Neural Networks (CNNs), known for their ability to learn and leverage spatial patterns in images, and Recurrent Neural Networks (RNNs), which are designed to process time-series datasets [35,36]. Previous work has even combined CNNs and RNNs into a single structure: the Convolutional Long Short-Term Memory (ConvLSTM) layer [37]. ConvLSTM-based algorithms have been successfully applied to built-up forecasting tasks [38,39].
The goal of this project is to assess the feasibility of predicting future built-up cover expansion using only local patterns of historical change. Most existing methodologies draw on diverse datasets, such as elevation, transportation nodes, and utility networks. We hypothesize, however, that future built-up cover expansion at a given location on the Earth’s surface can be predicted solely from past patterns within a local spatial neighborhood around that location. To test this, we implement and evaluate versions of the forecasting methodologies described above. We limit input data to a binary time series (built-up/not built-up), with one image per year over a four-year period. Our classifiers produce specific prediction map outputs, rather than just a point estimate of overall future built-up area change. By comparing the accuracy of these methods, we aim to determine the level of predictive information that can be extracted from a relatively simple time series of binary built-up land cover maps, and whether built-up expansion forecasting can be accomplished without the use of ancillary datasets.

2. Data

We acquired Sentinel-2 satellite data to construct an annual time series of multispectral imagery across nine study sites within the continental United States, with one image per year between 2019 and 2023, summarized in Table 1 [40]. Sentinel-2 datasets divide the Earth’s surface into a grid of 100 km × 100 km tiles [41]; the tiles over which we acquired data are shown in Figure 1, identified by their Sentinel-2 tile ID: 10SEG (San Francisco, CA, USA), 12TVR (Bozeman, MT, USA), 13SED (Denver/Colorado Springs, CO, USA), 14RPU (Austin, TX, USA), 15RXP (New Orleans, LA, USA), 16TDL (Chicago, IL, USA), 17RMM (Orlando, FL, USA), 18SUJ (Washington, DC, USA), and 18TWL (New York City, NY, USA). These sites cover a variety of urban growth rates, ranging from 20,171 (New Orleans, LA, USA) to 206,343 (Austin, TX, USA) new building permits issued per city during the period of interest [42].
To test our models’ prediction abilities without requiring them to generalize across global variations in built-up land cover expansion patterns, study sites were restricted to the continental United States, spanning the arid, continental, and temperate Köppen climate zones.
One additional dataset was included for the CAMC approach, which is described in more detail in Section 3.2: a 30 m-resolution Digital Elevation Model (DEM) generated by the Shuttle Radar Topography Mission [43]. The CAMC algorithm was tested both with and without this ancillary input, based on the recommendations of Aburas et al. [44].

3. Methods

3.1. Pre-Processing

The Sentinel-2 imagery described in Section 2 was preprocessed into time series of yearly binary built-up/not built-up segmented maps following the method described by Maloney et al., which produces 10 m-resolution output [13]. We input the first four years of each time series (2019–2022) into our forecasting methods to generate a prediction of built-up land in 2023; this prediction map was then compared to the observed 2023 data to evaluate the method’s performance.
Because built-up land cover is relatively sparse, even in tiles that include large urban cores, the input datasets exhibit severe class imbalance. Built-up cover represents less than 5% of input pixels for most tile-year combinations, as shown in Table 2. Such an imbalance can have negative effects when training machine-learning classifiers, as the classifier may learn to achieve high accuracies simply by predicting the dominant class. For an image that is 95% not built-up, a classifier could attain 95% accuracy simply by labeling the entire scene as not built-up.
In order to compensate for the class imbalance, the training dataset was subset using equalized stratified random sampling [45,46,47]. Equalized stratified random sampling improves results on imbalanced datasets by ensuring equal representation of each class, unlike standard random sampling, which can be biased toward the majority class. This technique reduces bias, improves minority-class performance, and provides more reliable evaluation metrics, which are crucial for accurate modeling in scenarios like anomaly detection or prediction. For our use case, the number of samples drawn from each class was fixed at the total number of samples in the non-dominant class (built-up for all scenes), so that all of the non-dominant samples were retained, along with an equal number of randomly selected samples from the dominant class (not built-up). As a result, the subset dataset contains an equal number of records from both classes, with a minimum of 500,000 total records per scene. This balancing step prevents the model from learning to label all outputs as not built-up.
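As a concrete illustration, the balancing step might look like the following minimal NumPy sketch; the array names and the fixed seed are our own choices for illustration, not part of the published pipeline.

```python
import numpy as np

def balance_classes(features, labels, seed=0):
    """Equalized stratified random sampling: keep every record of the
    minority class (built-up) and an equal-sized random draw from the
    majority class (not built-up)."""
    rng = np.random.default_rng(seed)
    built = np.flatnonzero(labels == 1)                    # minority class
    not_built = np.flatnonzero(labels == 0)                # majority class
    keep = rng.choice(not_built, size=built.size, replace=False)
    idx = rng.permutation(np.concatenate([built, keep]))   # shuffle together
    return features[idx], labels[idx]
```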
Some misclassification by the preprocessing algorithm is also evident in Table 2, particularly in tile 15RXP for the year 2019. This is the New Orleans scene, and for that year, flooding in marshland areas resulted in a large volume of false positives, flagging waterlogged soil as built-up. These data were deliberately retained in order to evaluate the algorithms’ robustness to outlier years within the time series.
Once the preprocessing was complete, we implemented five forecasting methodologies: CAMC, a derived-feature approach powered by SVM and RF, CNN, ConvLSTM, and Linear-Kernel Logistic Regression (LKLR). The performance of each of these methods was compared based on overall accuracy, Cohen’s kappa, and log-loss metrics. Metric selection rationale and definitions are presented in Section 3.7.
These five prediction methods were based on existing approaches, although not all of them had previously been applied to built-up expansion prediction or adapted to leverage binary time series inputs. To enable finer-grained evaluation of a method’s predictions, where possible, the algorithms were made to output per-pixel built-up probabilities (between 0.0 and 1.0) rather than binary predictions. These probabilistic maps could then be either converted to binary maps (using a cutoff at 0.5) or used to assess the model’s confidence in its predictions.

3.2. Cellular Automata Markov Chain

We implemented the CAMC method using the Modules for Land Use Change Simulation (MOLUSCE) plugin for QGIS version 2.8.9. The combination of Cellular Automata with a Markov Chain algorithm is common in the existing literature [19,20,21]. A Markov Chain was first generated to calculate year-to-year transition matrices for each scene, containing the probabilities of a given pixel in that scene changing from not built-up to built-up, vice versa, or remaining unchanged.
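As an illustration of the Markov Chain step, a 2 × 2 transition matrix can be estimated from two binary maps as in the minimal sketch below; this conveys the idea rather than reproducing the MOLUSCE implementation.

```python
import numpy as np

def transition_matrix(map_t0, map_t1):
    """Estimate the 2x2 Markov transition matrix between two binary
    built-up maps: rows are the starting class (0 = not built-up,
    1 = built-up), columns the ending class; each row sums to 1."""
    counts = np.zeros((2, 2))
    for start in (0, 1):
        for end in (0, 1):
            counts[start, end] = np.sum((map_t0 == start) & (map_t1 == end))
    return counts / counts.sum(axis=1, keepdims=True)
```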
Because CAMC is calculated from inter-year transitions, some years of the input time series were not used for training. The CAMC is trained on one year-to-year transition and then used to predict another transition over the same temporal gap. In order to cover the entire span of the input dataset (2019–2023), we chose to train on the transition between 2019 and 2021 and evaluate on the transition between 2021 and 2023. Imagery from 2020 and 2022 was therefore not included for CAMC.
To predict the year 2023, the transition matrix for each tile was fed into an artificial neural network (ANN) to train the update rule for the Cellular Automata. This update rule was unique for each tile, and it was then applied to the 2021 data to produce a prediction map for 2023. Hyperparameters for this ANN were based on MOLUSCE defaults (Table 3). As mentioned in Section 2, the CAMC pipeline was evaluated both with and without an ancillary DEM layer, as recommended by Aburas et al. [44].
Unlike the other approaches, MOLUSCE only outputs binary maps rather than probabilistic predictions. This limited the possible evaluation metrics, preventing the log-loss calculation for CAMC outputs (see Section 3.7). However, we were still able to calculate overall accuracy and Cohen’s kappa.

3.3. Random Forest and Support Vector Machine

Our second approach integrates several existing methods to produce derived feature variables from the binary time series input (Table 4). These derived features then form a training dataset to fit one of two machine-learning classifiers: an RF or an SVM. The two classifier types are grouped because they use an identical method of constructing derived features; the only difference is the model itself. The Random Forest has 500 trees with a maximum depth of 30, while the SVM uses a linear kernel.
Each derived feature aggregates information from a local spatial and temporal neighborhood around a pixel into a single value. Spatial features include Moore’s 3 × 3-pixel Neighborhood for each of the four input years, based on Karimi et al. [32], as well as a 10 × 10-pixel spatial neighborhood taken from Huang et al. [14]. Temporal and spatiotemporal features combine multiple years of information, such as temporal lags and spatiotemporal weights [30]. In total, sixteen variables were included in the training dataset for this approach, summarized in Table 4.
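For illustration, the spatial neighborhood features can be derived with simple convolutions. The following minimal sketch assumes a (4, H, W) binary time series array and mean occupancy as the aggregation; both the array layout and the aggregation choice are our assumptions rather than the cited methods’ exact weightings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_features(series):
    """series: (4, H, W) array of yearly binary built-up maps.
    Returns mean built-up occupancy in a Moore 3 x 3 and a 10 x 10
    neighborhood for each year, stacked as (8, H, W)."""
    feats = []
    for year in series:
        year = year.astype(float)
        feats.append(uniform_filter(year, size=3))   # Moore 3 x 3 neighborhood
        feats.append(uniform_filter(year, size=10))  # 10 x 10 neighborhood
    return np.stack(feats)
```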
The number of samples for each class was balanced as described in Section 3.1. The modified training dataset was then used to fit a prediction model through the Scikit-Learn library in Python 3.11, using the last year (2023) of the binary time series as the target variable. In order to avoid training and evaluating on the same data, we used leave-one-out cross-validation over our nine Sentinel-2 tiles: for each tile, a model was trained using the other eight tiles, then evaluated over the ninth “unseen” tile [48]. This process was repeated for both RFs and SVMs, resulting in a total of 18 models being evaluated. Both RF and SVM architectures are able to produce probabilistic predictions.
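The tile-wise leave-one-out loop might be implemented along these lines; the tiles dictionary of per-tile feature and label arrays is assumed to have been prepared as described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def leave_one_out(tiles, make_model):
    """tiles: dict mapping tile ID -> (features, labels).
    Trains on eight tiles, evaluates on the held-out ninth."""
    scores = {}
    for held_out in tiles:
        X_train = np.vstack([tiles[t][0] for t in tiles if t != held_out])
        y_train = np.concatenate([tiles[t][1] for t in tiles if t != held_out])
        model = make_model().fit(X_train, y_train)
        scores[held_out] = model.score(*tiles[held_out])
    return scores

rf_scores = leave_one_out(tiles, lambda: RandomForestClassifier(n_estimators=500, max_depth=30))
svm_scores = leave_one_out(tiles, lambda: SVC(kernel="linear", probability=True))
```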

3.4. Convolutional Neural Network

We constructed a simple 2-layer CNN architecture using Tensorflow in Python 3.11 (Figure 2). The first layer consists of 10 independent filter kernels, each convolved over all four timesteps for patches of shape (100, 100). The second layer is a single 1 × 1 kernel that flattens the temporal dimension, effectively weighting the relative importance of each input timestep. The output of the second layer is passed through a sigmoid activation function, producing probabilistic outputs between 0 and 1.
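A Keras sketch of this architecture is shown below. The 11 × 11 filter size is an assumption rather than a reported value; we chose it because, combined with the 10 shared filters and the 1 × 1 temporal-collapse layer, it reproduces the 1261 trainable parameters cited in Section 3.5.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(4, 100, 100, 1))           # 4 yearly binary maps
x = layers.TimeDistributed(
    layers.Conv2D(10, (11, 11), padding="same"))(inputs)  # 10 filters shared across timesteps
x = layers.Permute((2, 3, 1, 4))(x)                       # move time next to the filter axis
x = layers.Reshape((100, 100, 40))(x)                     # 4 timesteps x 10 filters = 40 channels
outputs = layers.Conv2D(1, (1, 1), activation="sigmoid")(x)  # 1 x 1 kernel weighting timesteps
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```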
The CNN was trained to minimize log-loss between its predictions and the observed 2023 binary built-up map. (Details on the log-loss metric can be found in Section 3.7). Similar to the RF/SVM approach, the CNNs were evaluated with leave-one-out cross-validation, trained on 8 of the 9 Sentinel tiles, and evaluated on the held-out tile.
As with our other methods, we encountered the issue of class imbalance. However, the end-to-end training structure of deep learning models makes pixel-wise random sampling infeasible. Instead, we created a custom modification of the binary cross-entropy loss function, termed “weighted log-loss,” that assigns proportionally more weight to built-up pixels than to not-built-up pixels. We varied the weighting ratio from 1:1 (equal weight) up to 32:1, increasing by powers of 2 (2:1, 4:1, etc.). Model performance was re-evaluated at each weight ratio.
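A minimal TensorFlow sketch of such a weighted binary cross-entropy follows, reusing the model from the sketch above; the exact form used in the study may differ.

```python
import tensorflow as tf

def weighted_log_loss(built_up_weight):
    """Binary cross-entropy with built-up (positive) pixels weighted
    built_up_weight times more heavily than not-built-up pixels."""
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)  # numerical safety
        per_pixel = -(built_up_weight * y_true * tf.math.log(y_pred)
                      + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(per_pixel)
    return loss

# e.g., recompiling the CNN sketched in this section with the 32:1 ratio
model.compile(optimizer="adam", loss=weighted_log_loss(32.0))
```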

3.5. Convolutional Long Short-Term Memory

Although CNNs can be applied to time-series problems, as demonstrated in Section 3.4, their architecture is not designed for this purpose. It is crucial to prevent data leakage backward in time, as there are no inherent safeguards against a model learning to use future data points to predict past ones. Our CNN architecture avoids this problem through careful structuring and relative simplicity, but there is another deep learning framework that is optimized for time-series forecasting: Long Short-Term Memory (LSTM).
LSTM architecture is a subset of the RNN family: cyclical structures that repeatedly update a single “state” tensor by adding information from successive timesteps of the input dataset. LSTM improves on earlier generations of RNNs by allowing the network to prioritize which patterns it “remembers” and which it “forgets” [36]. Although LSTM-based models are designed for time-series data, they lack a spatial component.
To combine the spatial pattern-recognition power of CNNs with the time-series capabilities of RNNs, we implemented the Convolutional Long Short-Term Memory (ConvLSTM) architecture first developed by Shi et al. [37]. This deep learning structure combines a CNN and RNN into a single network layer, allowing the model to learn spatial and temporal patterns simultaneously.
ConvLSTM is the most computationally intensive of our methods. Our initial efforts to replicate the architecture described by Shi et al. were constrained by processing power limitations. While Shi et al. describe a four-layer encoder/decoder structure, we were ultimately limited to a single ConvLSTM layer, a batch normalization layer, and a final 1 × 1 2D convolutional layer (Figure 3). Despite this simplification, the ConvLSTM model remains the most complex of our five forecasting methods, with 2,013,889 trainable parameters (compared to just 1261 for the CNN model).
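A Keras sketch of this simplified structure is given below; the filter count and kernel size are assumptions on our part, not the authors’ reported settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(4, 100, 100, 1))        # four yearly binary maps
x = layers.ConvLSTM2D(64, (5, 5), padding="same",
                      return_sequences=False)(inputs)  # single ConvLSTM layer
x = layers.BatchNormalization()(x)                     # batch normalization layer
outputs = layers.Conv2D(1, (1, 1), activation="sigmoid")(x)  # final 1 x 1 2D convolution
convlstm_model = tf.keras.Model(inputs, outputs)
# the weighted loss from Section 3.4 can be swapped in here
convlstm_model.compile(optimizer="adam", loss="binary_crossentropy")
```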
Like the CNN, the ConvLSTM was run over image patches of shape (100, 100) and optimized for log-loss, using the 2023 binary built-up map as the target data. We performed leave-one-out cross-validation to evaluate the model over all 9 study site tiles, and we also tested the same custom loss weighting described in Section 3.4 to compensate for class imbalance.

3.6. Linear Kernel Logistic Regression

One of the longstanding challenges in machine learning is the bias-variance tradeoff: more complex models are better able to pick up on patterns in the underlying data, but they are also more prone to overfitting and memorizing their inputs [36]. With ConvLSTM at one end of the complexity spectrum, we decided to add a very simple model in order to determine how much predictive power that complexity contributed to model performance.
This “maximally simple” model is structured similarly to the RF/SVM framework (Section 3.3) but reduced to the least possible complexity. Instead of many derived features, there is a single additive kernel that sums the pixel values in a spatiotemporal neighborhood around the input, weighted linearly by proximity in space and constant across years. The RF and SVM classifiers were also replaced with logistic regression, one of the simplest classification algorithms. The smallest kernel we tested was 1 × 1 pixel (that is, just the input pixel with no spatial neighborhood information at all), along with neighborhoods from 5 × 5 to 55 × 55 pixels (50 m to 550 m on a side at the 10 m pixel resolution) at 10-pixel size intervals. For each of these 7 kernel sizes, we performed a full leave-one-out cross-validation over all data tiles—a total of 63 models evaluated.
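The single-feature construction and fit might look like the following sketch; the exact linear falloff of the kernel weights and the final fitting call are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import convolve
from sklearn.linear_model import LogisticRegression

def linear_kernel(size):
    """Square kernel whose weights decrease linearly with distance
    from the center pixel (normalized to sum to 1)."""
    r = (size - 1) / 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.maximum(0.0, 1.0 - np.hypot(y, x) / (r + 1.0))
    return k / k.sum()

def lklr_feature(series, size):
    """Sum the kernel response over all years; the kernel is constant
    across time, so each year contributes equally."""
    k = linear_kernel(size)
    return sum(convolve(year.astype(float), k) for year in series)

# fit a logistic regression on the single derived feature
# (series_2019_2022 and target_2023 are assumed to be prepared arrays)
feature = lklr_feature(series_2019_2022, size=5)
lklr = LogisticRegression().fit(feature.reshape(-1, 1), target_2023.ravel())
```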

3.7. Accuracy Metrics

To evaluate the relative performance of each model, we compared their outputs across three metrics: overall accuracy, Cohen’s kappa, and log-loss.

3.7.1. Overall Accuracy

Overall Accuracy measures the ratio of correctly labeled samples to the overall number of samples, ranging from 0.0 (no correctly-labeled samples) to 1.0 (all samples correctly labeled). It is defined as
$$\mathrm{OA} = \frac{\text{true positives} + \text{true negatives}}{\text{total samples}}$$
Overall accuracy is simple and intuitive but it can be less representative when a dataset is imbalanced and one class is more prevalent than another, in which case accuracy may be skewed with a bias toward the majority class [49,50]. Overall accuracy can also overestimate performance in general, as part of the accuracy could be due to chance agreement [51].

3.7.2. Cohen’s Kappa

Cohen’s kappa is a robust measure of categorical agreement, taking into account the possibility of chance agreement, which makes it a more conservative measure of accuracy [52]. The kappa statistic is defined as
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the relative observed agreement and $p_e$ is the hypothetical probability of chance agreement [53]. In the binary case, this is equivalent to
$$\kappa = \frac{2 \times (TP \times TN - FP \times FN)}{(TP + FP)(FP + TN) + (TP + FN)(FN + TN)}$$
where TP, TN, FP, and FN represent the true positives, true negatives, false positives, and false negatives, respectively [51]. A kappa of 1 represents perfect agreement, 0 means that any agreement is what would be expected by chance, and negative values between −1 and 0 indicate agreement that is worse than chance; however, less-than-zero agreement values have been shown to be skewed toward 0, inflating kappa for models that perform poorly [51].

3.7.3. Log-Loss

Log-loss, also called binary cross-entropy, measures how well a probabilistic prediction agrees with binary (0 or 1) ground truth. It is defined for N predictions as
$$\mathrm{loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ \mathrm{truth}_i \times \log(\mathrm{prediction}_i) + (1 - \mathrm{truth}_i) \times \log(1 - \mathrm{prediction}_i) \right]$$
In addition to rewarding correct predictions and penalizing incorrect ones (as accuracy and kappa also do), log-loss accounts for the confidence of each prediction, ranging from a minimum of 0 loss for a perfectly accurate prediction to a hypothetical infinite loss for total confidence in an incorrect prediction. Thus, lower log-loss corresponds to more accurate output.

3.7.4. Metric Summary

Given the relative merits of each accuracy assessment metric, we compared model outputs across all three where feasible. However, the CAMC algorithm was an exception, as log-loss could not be calculated; the QGIS MOLUSCE plugin produces categorical—rather than probabilistic—predictions. Within-model evaluations, such as the effect of kernel size on the LKLR model, relied exclusively on log-loss due to its finer granularity in assessing individual pixel-level predictions.
As with model training, evaluating performance can introduce bias from class distribution in the test data. For example, a classifier that disproportionately outputs the dominant class will appear to overperform. To mitigate this risk, validation datasets were balanced through the same equalized random sampling method described in Section 3.1 before any accuracy metrics were calculated. All reported metrics were calculated from balanced validation data, containing an equal number of built-up and not-built-up samples.
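For reference, all three metrics are available as standard scikit-learn calls; a minimal sketch of the evaluation on a balanced validation set:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, log_loss

def evaluate(y_true, y_prob):
    """y_true: binary labels from a balanced validation set;
    y_prob: predicted built-up probabilities in [0, 1]."""
    y_pred = (y_prob >= 0.5).astype(int)   # binarize at the 0.5 cutoff
    return {
        "overall_accuracy": accuracy_score(y_true, y_pred),
        "cohens_kappa": cohen_kappa_score(y_true, y_pred),
        "log_loss": log_loss(y_true, y_prob),
    }
```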

4. Results

4.1. Spatial Kernel Size

In the LKLR model, larger spatial kernels consistently perform worse (Figure 4). Across most scenes, the lowest log-loss is achieved by the 1 × 1 kernel, eliminating all spatial neighborhood information. There are two exceptions to this extreme: tiles 16TDL (Chicago, IL, USA) and 18SUJ (Washington, DC, USA) exhibit minimum loss at the second-smallest kernel, the 3 × 3 spatial neighborhood.
The 3 × 3 kernel was also the second-best performing model across the other 7 tiles. Because of this, combined with our interest in spatial neighborhood information, we use the 3 × 3 kernel model as the basis for comparison to other approaches.
Only one group of logistic regression models performs worse than minimum-information (predicting 0.5 for all pixels): kernels 35 pixels and larger over tile 15RXP (New Orleans, LA, USA). The New Orleans scene was difficult in general, as the binary built-up preprocessing algorithm of Maloney et al. [13] struggled to differentiate built-up land from wetland, especially in 2019.

4.2. Loss Function Weighting

The two models where custom loss functions were tested (CNNs and ConvLSTM) reacted differently to the weighting of built-up samples. CNN loss decreased monotonically with increased built-up weighting, while loss weighting had no consistent effect on the performance of ConvLSTM models (Figure 5).
Improvements to CNN performance show diminishing returns to weight ratios larger than 8:1, so we used the model with the largest tested ratio (32:1) for comparison to our other approaches, under the assumption that any gains from further increasing the built-up weight would be marginal. For the sake of consistency, the 32:1 ConvLSTM model was also chosen for inter-method comparison, despite the absence of any similar performance pattern in the ConvLSTM data.

4.3. Visual Comparison

Examination of the output prediction masks reveals major differences in value range for the probabilistic maps (Figure 6). The minimum probability assigned by the LKLR method in the Austin, TX scene (tile 14RPU), for example, is 6.7%, while ConvLSTM output for the same scene goes as low as 0.04%. Most of the models approach 100% confidence in at least one pixel, although ConvLSTM has a maximum of 96.3% for the 14RPU scene.
For the reasons described in Section 3.7, the CAMC outputs are binary; all pixels are either 0 or 1. Surprisingly, however, the LKLR output is also highly bimodal, with all values either below 20% or above 85%. The other four methods exhibit more intermediate values.

4.4. Metric Comparison

SVMs and CNNs were the two standout approaches. The SVM method had the highest overall accuracy for 8 out of 9 data tiles, while CNNs performed best in both log-loss and kappa (Table 5). The CNN approach also had the lowest standard deviation for performance between different tiles, across all three metrics.
Notably, CAMC—the most popular method in the literature—performed poorly both on overall accuracy and kappa, failing to achieve the best result for any scene. The CAMC model that was given access to an ancillary DEM did not perform any better than the CAMC trained only on the binary built-up time series. In fact, the no-DEM version achieved higher overall accuracy on 3 out of the 9 tiles and higher kappa in 6 out of 9.
Other methods’ performance varied across tiles and between metrics. The RF approach achieved the best log-loss for 2 tiles and second best for 6 others. Its kappa scores, however, were the lowest of any method. SVMs, despite their high scores in overall accuracy, were unimpressive in terms of kappa and log-loss. This could be due to kappa and log-loss taking into account the probability of chance agreement and calibration of confidence, respectively. LKLR produced mixed scores—achieving the highest kappa for tile 12TVR—while the ConvLSTM outputs were consistently poor across all three metrics.

5. Discussion

The comparative success of our medium-complexity methods suggests that time series maps of built-up cover do contain some information on future built-up expansion but also that this information is limited.
At the low end of the complexity spectrum, the LKLR method struggles to extract any patterns from the neighborhood around a target pixel, as shown by the performance gain when the neighborhood size is reduced to zero. It must be noted, however, that the pre-processing algorithm uses a texture layer with a 3 × 3 kernel, which may reintroduce spatial neighborhood information [13]. It is not immediately apparent which specific characteristics of the Chicago, IL, and Washington, DC, sites make them exceptions to the general rule that smaller LKLR kernels perform better. The bimodal quality of the LKLR output maps suggests that the learned logistic curve is quite compressed, approximating a sharp cutoff between pixels with fewer than N positive labels in the time series, for some N, and those with more. This behavior could be caused by the fact that in the degenerate zero-kernel case, the input values are discrete counts (how many times the given pixel was flagged as built-up by the pre-processing algorithm); the discrete nature of the input leads to sharp cutoffs in the output.
At the high-complexity end of the spectrum, the ConvLSTM model suffers from its own structural intricacy, and it fails to establish a stable training regime on the relatively shallow (4 years) and coarse (10 m) input dataset. While it is possible that more fine-tuning of the hyperparameters for this model could have improved its performance, limited computational resources precluded a rigorous search of the hyperparameter space.
The fact that CNNs dominate when measured in log-loss or kappa but not overall accuracy raises the question of why SVMs outperform CNNs in accuracy alone. One possibility lies in the nature of the kappa statistic: SVMs’ high overall accuracy scores might represent chance agreement, for which the kappa statistic compensates. This issue becomes particularly prominent in the case of extremely unbalanced classes, as is the case with built-up mapping. The reversal in performance between overall accuracy and kappa is mostly driven by a decrease in the SVM scores rather than an increase in CNN scores, which suggests that the SVMs’ overall accuracy results were inflated.
The discrepancy between accuracy and log-loss ranking is easier to explain. Log-loss accounts for specific probabilities, rewarding models calibrated beyond the binary threshold of 0.5. Better log-loss scores imply that CNNs (along with the RF models) are the best-calibrated in terms of the specific probabilities they output. The high performance of CNNs provides evidence for latent predictive information in the spatial neighborhood around each pixel, since the CNN architecture’s convolutional filters are particularly suited to picking up such spatial patterns.
Notably, CNN metrics also exhibit the lowest standard deviation across tiles. This stability suggests a robustness to inter-scene variability and challenging years such as the 2019 flooding in tile 15RXP. While other methods struggle with tile 15RXP in particular—the ConvLSTM score for that tile drops to zero—CNN performance is stable and achieves the highest accuracy of any method. Although our study sites are deliberately limited to the continental United States, this robustness hints that the CNN approach might generalize well to larger geographic regions. Performance is strong across humid, mountainous, and temperate sites but does degrade on the arid Texas site, possibly due to the spectral similarity between bare ground and impervious surfaces. Future research should more thoroughly investigate how these methods perform under different climatic conditions, along with the effects of development levels, architectural traditions, and legal regimes that shape construction materials and techniques.
The less favorable results of the CAMC are unsurprising for two reasons. First, we removed the ancillary datasets on which CAMC typically relies, such as transportation networks, utilities, and population density; however, the lack of improvement from including a DEM challenges this explanation [32,54]. It is worth noting that the majority of built-up areas in this study, even if surrounded by mountain ranges or hills, were built on flat terrain, which could explain why the DEM did not have more of an influence on performance. Future studies could test built-up areas with greater variation in elevation. Second, CAMC is more effective at forecasting general trends and the volume of land cover change than pixel-level predictions, so our chosen accuracy assessment method and metrics do not leverage CAMC’s strengths [16].
Performance results from our RF and SVM models are comparable to those seen in other studies. Karimi et al. [32] achieve 85% accuracy for urban expansion prediction with an SVM model (compared to 91% for our approach), but their model includes many proximity and neighborhood variables related to land cover and land use, as well as access to roads and certain building types. Other studies using SVMs, RFs, and logistic regression to predict urban expansion have achieved accuracy rates between 85% and 92%, again comparable to our results [4,14,15,29,30,31]. Once again, however, additional spatial and temporal datasets were used to inform the prediction, and assessment was limited to a single city, rather than sites across the United States.
The same is true for CNNs and ConvLSTMs. Kim et al. [30,39] were able to predict urban expansion with an accuracy ranging from 50 to 90%, while Boulila et al. produced a very low mean square error [38]. Our results, at 87% average CNN accuracy, sit at the high end of the Kim et al. accuracy range.
Studies leveraging CAMC, which include either proximity and neighborhood variables [3,55,56,57] or multiple bands from multispectral sensors [58,59,60,61], obtain accuracies in the 80% range, compared to the 76% average for our CAMC (without the DEM input). For those studies where the kappa coefficient was reported, results were also comparable to our own.
It is important to note several limitations of this study. First, there is the limited observed urban expansion across most sites within the study period. As noted, this limited growth causes difficulties in training and assessing built-up expansion models; extending the time series would likely capture more significant growth patterns and provide a more comprehensive assessment. Second, although this methodology demonstrates the ability to predict built-up land cover expansion without ancillary datasets such as elevation, transportation nodes, or utility networks, it omits a comparison to built-up growth prediction that does use such ancillary datasets. This omission leaves unanswered the question of how performance would improve with the addition of outside datasets. Third, the geographic limitation of the study areas to the continental United States precludes confident statements regarding the models’ ability to generalize globally. Lastly, other approaches to handling imbalanced data could have been explored, such as data-level, algorithm-specific, and ensemble methods, or balancing techniques like the Synthetic Minority Oversampling Technique (SMOTE) [62,63].
Future research should expand beyond the continental United States to regions experiencing more rapid urbanization, such as Southeast Asia or sub-Saharan Africa, to assess model performance in contexts with different developmental trajectories. These studies should also incorporate comparative analyses with alternative modeling approaches that explicitly leverage social (e.g., population density, migration patterns) and terrain attributes (e.g., slope, elevation) to evaluate their impact on predictive accuracy. Crucially, future work must account for variations in human development indices and climate zones, recognizing that model performance may vary significantly based on these factors [13]. Specifically, research should investigate how models incorporating social and terrain attributes perform relative to those relying solely on binary built-up datasets. This refined approach will yield a more comprehensive understanding of the conditions under which incorporating additional data sources enhances built-up area prediction, ultimately informing more robust and globally applicable urban modeling strategies.

6. Conclusions

All methods outperform a minimum-information baseline on log-loss and score substantially above chance agreement on kappa (with the exception of ConvLSTM for tile 15RXP), which supports our hypothesis that relatively small spatiotemporal neighborhoods contain latent information on future built-up land cover expansion patterns. Even with just four yearly observations of binary data as input, multiple methods achieve accuracies in the high-80 to low-90 percent range on a class-balanced test set, comparable to previous studies that rely on substantial ancillary inputs. Of the six methods tested, CNNs perform best as measured by log-loss and Cohen’s kappa, while our SVM model with derived spatiotemporal variables performs best on overall accuracy.
Such data-sparse prediction workflows are valuable for application in regions where ancillary datasets are difficult to obtain, unreliable, or outdated. Whether for applications in public health, urban planning, or population dynamics, forecasting the growth of built-up areas becomes ever more important. As built-up land cover continues to expand globally, accurate predictions of that expansion are valuable everywhere, not just in data-rich areas. The abundance of satellite imagery, along with machine-learning techniques, makes such predictions possible.

Author Contributions

Conceptualization, F.D.O.; methodology, F.D.O., N.M.W. and S.J.B.; software, F.D.O. and N.M.W.; writing—original draft, F.D.O., N.M.W. and S.J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Army Engineer Research and Development Center, supported under PE 0602146A/AT9, Project ‘Tactical Geospatial Information Capabilities’, Task ‘Geospatial Analytics and Prediction’.

Data Availability Statement

All data used in this study were accessed from publicly available repositories, as referenced in the text.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gao, J.; O’Neill, B. Different spatiotemporal patterns in global human population and built-up land. Earths Future 2021, 9, e2020EF001920. [Google Scholar] [CrossRef]
  2. Zhang, K.; Wang, R.; Shen, C.; Da, L. Temporal and spatial characteristics of the urban heat island during rapid urbanization in Shanghai, China. Environ. Monit. Assess. 2010, 169, 101–112. [Google Scholar] [CrossRef]
  3. Ullah, S.; Ahmad, K.; Sajjad, R.U.; Abbasi, A.M.; Nazeer, A.; Tahir, A.A. Analysis and simulation of land cover changes and their impacts on land surface temperature in a lower Himalayan region. J. Environ. Manag. 2019, 245, 348–357. [Google Scholar] [CrossRef]
  4. Zhang, M.; Zhang, C.; Kafy, A.; Tan, S. Simulating the relationship between land use/land cover change and urban thermal environment using machine learning algorithms in Wuhan City, China. Land 2022, 11, 14. [Google Scholar] [CrossRef]
  5. Rahaman, M.A.; Kalam, A.; Al-Mamun, M. Unplanned urbanization and health risks of Dhaka City in Bangladesh: Uncovering the associations between urban environment and public health. Front. Public Health 2023, 11, 1269362. [Google Scholar] [CrossRef]
  6. Bao, Z.; Ng, S.T.; Yu, G.; Zhang, X.; Ou, Y. The effect of the built environment on spatial-temporal pattern of traffic congestion in a satellite city in emerging economies. Dev. Built Environ. 2023, 14, 100173. [Google Scholar] [CrossRef]
  7. Slonecker, E.T.; Jennings, D.B.; Garofalo, D. Remote Sensing of Impervious Surfaces: A Review. Remote Sens. Rev. 2001, 20, 227–255. [Google Scholar] [CrossRef]
  8. Schneider, A.; Friedl, M.A.; Potere, D. A new map of global urban extent from MODIS satellite data. Environ. Res. Lett. 2009, 4, 044003. [Google Scholar] [CrossRef]
  9. Brown de Colstoun, E.C.; Huang, C.; Wang, P.; Tilton, J.C.; Tan, B.; Phillips, J.; Niemczura, S.; Ling, P.Y.; Wolfe, R.E. Global Man-Made Impervious Surface (GMIS) Dataset from Landsat (Dataset); NASA Socioeconomic Data and Applications Center (SEDAC): Palisades, NY, USA, 2017. [Google Scholar] [CrossRef]
  10. Guneralp, B.; Reba, M.; Hales, B.U.; Wentz, E.A.; Seto, K.C. Trends in urban land expansion, density, and land transitions from 1970 to 2010: A global synthesis. Environ. Res. Lett. 2020, 15, 044015. [Google Scholar] [CrossRef]
  11. Huang, X.; Yang, J.; Wang, W.; Liu, Z. Mapping 10m global impervious surface area (GISA-10m) using multi-source geospatial data. Earth Syst. Sci. Data 2022, 14, 3649–3672. [Google Scholar] [CrossRef]
  12. Lasko, K.; O’Neill, F.D. Automated method for artificial impervious surface area mapping in temperate, tropical, and arid environments using hyperlocal training data with sentinel-2 imagery. Appl. Earth Obs. Remote Sens. 2023, 99, 1–23. [Google Scholar] [CrossRef]
  13. Maloney, M.C.; Becker, S.J.; Griffin, A.W.; Lyon, S.L.; Lasko, K. Automated Built-Up Infrastructure Land Cover Extraction Using Index Ensembles with Machine Learning, Automated Training Data, and Red Band Texture Layers. Remote Sens. 2024, 16, 868. [Google Scholar] [CrossRef]
  14. Huang, B.; Xie, C.; Tay, R.; Wu, B. Land-use-change modeling using unbalanced support-vector machines. Environ. Plan. B Plan. Des. 2009, 3, 398–416. [Google Scholar] [CrossRef]
  15. Huang, B.; Xie, C.; Tay, R. Support vector machines for urban growth modeling. Geoinformatica 2010, 14, 83–99. [Google Scholar] [CrossRef]
  16. Santé, I.; García, A.M.; Miranda, D.; Crecente, R. Cellular automata models for the simulation of real-world urban processes: A review and analysis. Landsc. Urban Plan. 2010, 96, 108–122. [Google Scholar] [CrossRef]
  17. Sussman, H.S.; Becker, S.B. Automated global method to detect rapid and future urban areas. Land 2025, 14, 1061. [Google Scholar] [CrossRef]
  18. Kari, J. Theory of cellular automata: A survey. Theor. Comput. Sci. 2005, 334, 3–33. [Google Scholar] [CrossRef]
  19. Yang, X.; Zheng, X.Q.; Lv, L.N. A spatiotemporal model of land use change based on ant colony optimization, Markov chain and cellular automata. Ecol. Model. 2012, 233, 11–19. [Google Scholar] [CrossRef]
  20. Al-sharif, A.A.; Pradhan, B. Monitoring and predicting land use change in Tripoli Metropolitan City using an integrated Markov chain and cellular automata models in GIS. Arab. J. Geosci. 2014, 7, 4291–4301. [Google Scholar] [CrossRef]
  21. Rimal, B.; Zhang, L.; Keshtkar, H.; Haack, B.N.; Rijal, S.; Zhang, P. Land use/land cover dynamics and modeling of urban land expansion by the integration of cellular automata and markov chain. ISPRS Int. J. Geo-Inf. 2018, 7, 154. [Google Scholar] [CrossRef]
  22. Kushwaha, K.; Singh, M.M.; Singh, S.K.; Patel, A. Urban growth modeling using earth observation datasets, Cellular Automata-Markov Chain model and urban metrics to measure urban footprints. Remote Sens. Appl. Soc. Environ. 2021, 22, 100479. [Google Scholar] [CrossRef]
  23. Okwuashi, O.; Ndehedehe, C.E. Integrating machine learning with Markov chain and cellular automata models for modelling urban land use change. Remote Sens. Appl. Soc. Environ. 2021, 21, 100461. [Google Scholar] [CrossRef]
  24. Von Neumann, J.; Burks, A.W. Theory of self-reproducing automata. IEEE Trans. Neural Netw. 1966, 5, 3–14. [Google Scholar]
  25. Wolfram, S. Statistical mechanics of cellular automata. Rev. Mod. Phys. 1983, 55, 601. [Google Scholar] [CrossRef]
  26. Hougaard, P. Multi-state models: A review. Lifetime Data Anal. 1999, 5, 239–264. [Google Scholar] [CrossRef] [PubMed]
  27. Xu, T.; Zhou, D.; Li, Y. Integrating ANNs and cellular automata–Markov chain to simulate urban expansion with annual land use data. Land 2022, 11, 1074. [Google Scholar] [CrossRef]
  28. Pal, S.; Ghosh, S.K. Rule based end-to-end learning framework for urban growth prediction. arXiv 2017, arXiv:1711.10801. [Google Scholar]
  29. Gomez, J.A.; Patino, J.; Duque, J.C.; Passos, S. Spatiotemporal modeling of urban growth using machine learning. Remote Sens. 2020, 12, 109. [Google Scholar] [CrossRef]
  30. Kim, Y.; Safikhani, A.; Tepe, E. Machine learning application to spatio-temporal modeling of urban growth. Comput. Environ. Urban Syst. 2022, 94, 101801. [Google Scholar] [CrossRef]
  31. Pozdnoukhov, A.; Matasci, G.; Kanevski, M.; Purves, R.S. Spatio-temporal avalanche forecasting with Support Vector Machines. Nat. Hazards Earth Syst. Sci. 2011, 11, 367–382. [Google Scholar] [CrossRef]
  32. Karimi, F.; Sultana, S.; Babakan, A.S.; Suthahatan, S. An enhanced support vector machine model for urban expansion and prediction. Comput. Environ. Urban Syst. 2019, 75, 61–75. [Google Scholar] [CrossRef]
  33. Mahboob, M.A.; Celik, T.; Genc, B. Predictive modelling of mineral prospectivity using satellite remote sensing and machine learning algorithms. Remote Sens. Appl. Soc. Environ. 2024, 36, 101316. [Google Scholar] [CrossRef]
  34. Bakhshayesh, P.R.; Ejtehadi, M.; Taheri, A.; Behzadipour, S. The Effects of Data Augmentation Methods on the Performance of Human Activity Recognition. In Proceedings of the 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Behshahr, Iran, 28–29 December 2022; pp. 1–6. [Google Scholar] [CrossRef]
  35. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  36. Raschka, S.; Mirjalili, V. Python Machine Learning, 3rd ed.; Packt Publishing Limited: Birmingham, UK, 2015. [Google Scholar]
  37. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, Canada, 7–12 December 2015; Curran Associates, Inc.: Red Hook, NY, USA, 2015. ISBN 9781510825024. [Google Scholar]
  38. Boulila, W.; Ghandorh, H.; Khan, M.A.; Ahmed, F.; Ahmad, J. A novel CNN-LSTM-based approach to predict urban expansion. Ecol. Inform. 2021, 64, 101325. [Google Scholar] [CrossRef]
  39. Kim, J.; Park, J.; Lee, C.; Lee, S. Predicting of urban expansion using convolutional LSTM network model: The case of Seoul metropolitan area, Korea. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, X-4/W3-2022, 113–118. [Google Scholar] [CrossRef]
  40. Copernicus Sentinel-2 Level-2A Imagery. SentinelHub. Available online: https://dataspace.copernicus.eu/analyse/apis/sentinel-hub (accessed on 15 January 2024).
  41. European Space Agency. Sentinel-2 User Handbook, Issue 1 Rev. 2. 2015. Available online: https://sentinels.copernicus.eu/documents/247904/685211/Sentinel-2_User_Handbook (accessed on 9 July 2024).
  42. New Privately Owned Housing Units Authorized. Building Permits Survey (BPS) Permits by CBSA; U.S. Census Bureau: Washington, DC, USA, 2025; Available online: https://www.census.gov/construction/bps/msamonthly.html (accessed on 20 July 2025).
  43. U.S. Geological Survey. Shuttle Radar Topography Mission (SRTM) 1 Arc-Second Global Dataset; Earth Resources Observation and Science (EROS) Center; U.S. Geological Survey: Reston, VA, USA, 2017. [Google Scholar] [CrossRef]
  44. Aburas, M.M.; Ho, Y.M.; Ramli, M.F.; Ash’aari, Z.H. The simulation and prediction of spatio-temporal urban growth trends using cellular automata models: A review. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 380–389. [Google Scholar] [CrossRef]
  45. Madow, W.G.; Madow, L.H. On the Theory of Systematic Sampling. Ann. Math. Stat. 1944, 15, 1–24. Available online: https://www.jstor.org/stable/2236209 (accessed on 19 December 2024). [CrossRef]
  46. Sadaiyandi, J.; Arumugam, P.; Sangaiah, A.K.; Zhang, C. Stratified sampling-based deep learning approach to increase prediction accuracy of unbalanced dataset. Electronics 2023, 12, 4423. [Google Scholar] [CrossRef]
  47. Zhao, T.; Zhang, X.; Gao, Y.; Mi, J.; Liu, W.; Wang, J.; Jian, M.; Liu, L. Assessing the accuracy and consistency of six fine-resolution global land cover products using a novel stratified random sampling validation dataset. Remote Sens. 2023, 15, 2285. [Google Scholar] [CrossRef]
  48. Schneider, J. Cross Validation. CMU School of Computer Science, 7 February 1997. Available online: https://www.cs.cmu.edu/~schneide/tut5/node42.html (accessed on 7 March 2025).
  49. Turk, G. GT index: A measure of the success of prediction. Remote Sens. Environ. 1979, 8, 65–75. [Google Scholar] [CrossRef]
  50. Khan, A.A.; Chaudhari, O.; Chandra, R. A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Syst. Appl. 2023, 244, 122778. [Google Scholar] [CrossRef]
  51. Chicco, D.; Warrens, M.J.; Jurman, G. The Matthews correlation coefficient (MCC) is more informative than Cohen’s Kappa and Brier score in binary classification assessment. IEEE Access 2021, 9, 78368–78381. [Google Scholar] [CrossRef]
  52. Vieira, S.M.; Kaymak, U.; Sousa, J.M. Cohen’s kappa coefficient as a performance measure for feature selection. In Proceedings of the International Conference on Fuzzy Systems, Barcelona, Spain, 18–23 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–8. [Google Scholar] [CrossRef]
  53. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
  54. Al-Dousari, A.E.; Mishra, A.; Singh, S. Land use land cover change detection and urban sprawl prediction for Kuwait metropolitan region, using multi-layer perceptron neural networks (MLPNN). Egypt. J. Remote Sens. Space Sci. 2023, 26, 381–392. [Google Scholar] [CrossRef]
  55. Rana, M.S.; Sarkar, S. Prediction of urban expansion by using land cover change detection approach. Heliyon 2018, 7, e08437. [Google Scholar] [CrossRef]
  56. Zhang, J.; Wu, D.; Zhu, A.X.; Zhu, Y. Modelling urban expansion with cellular automata supported by urban growth intensity over time. Ann. GIS 2023, 29, 337–353. [Google Scholar] [CrossRef]
  57. Shahfahad; Naikoo, M.W.; Das, T.; Talukdar, S.; Asgher, M.S.; Rahman, A. Prediction of land use changes at a metropolitan city using integrated cellular automata: Past and future. Geol. Ecol. Landsc. 2024, 8, 287–305. [Google Scholar] [CrossRef]
  58. Asif, M.; Kazmi, J.H.; Tariq, A.; Zhao, N.; Guluzade, R.; Soufan, W. Modelling of land use and land cover changes and prediction using CA-Markov and Random Forests. Geocarto Int. 2022, 28, 2210532. [Google Scholar] [CrossRef]
  59. Lukas, P.; Melesse, A.M.; Kenea, T.T. Prediction of future land use/land cover changes using a coupled CA-ANN model in the upper Omo-Gibe river basin, Ethiopia. Remote Sens. 2023, 15, 1148. [Google Scholar] [CrossRef]
  60. Mansour, S.; Ghoneium, E.; El-Kersh, A.; Said, S.; Abedlnaby, S. Spatiotemporal monitoring of urban sprawl in a coastal city using GIS-based Markov chain and artificial neural network (ANN). Remote Sens. 2023, 15, 601. [Google Scholar] [CrossRef]
  61. Nath, N.; Sahariah, D.; Meraj, G.; Debnath, J.; Kumar, P.; Lahon, D.; Chand, K.; Farooq, M.; Chandan, P.; Singh, S.K.; et al. Land use and land cover change monitoring and prediction of a UNESCO world heritage site: Kaziranga eco-sensitive zone using cellular automata-markov model. Land 2023, 12, 151. [Google Scholar] [CrossRef]
  62. Altalhan, M.; Algarni, A.; Alouane, M.A. Imbalanced data problem in machine learning: A review. IEEE Access 2025, 13, 13686–13699. [Google Scholar] [CrossRef]
  63. Wang, H.; Meng, Y.; Xu, H.; Wang, H.; Guan, X.; Liu, Y.; Liu, M.; Wu, Z. Prediction of flood risk levels of urban flooded points though using machine learning with unbalanced data. J. Hydrol. 2024, 630, 130742. [Google Scholar] [CrossRef]
Figure 1. Location of data tile footprints, labeled with Sentinel-2 tile ID. Basemap imagery ESRI World Topographic.
Figure 2. Model architecture of the Convolutional Neural Network.
Figure 3. Architecture of the ConvLSTM model.
Figure 4. Binary cross-entropy (log-loss) of logistic regression model vs. the size of linear kernel in the two spatial dimensions of the input time series. The dashed line shows the hypothetical loss for a “minimum-information” model that assigns a built-up probability of 0.5 to all output pixels.
Figure 5. Effect of increasing the relative weight of built-up pixels in the custom loss function on the CNN model and ConvLSTM model.
Figure 6. Visualization of binary built-up time series and output predictions by various methods for a subset of tile 14RPU (Austin, TX, USA). RGB Sentinel-2 imagery is included for reference.
Table 1. Study site locations, Sentinel-2 Tile IDs, and image acquisition dates for each year in the time series.

| Study Site | Tile ID | 2019 | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|---|
| San Francisco, CA | 10SEG | 9 June 2019 | 18 June 2020 | 18 June 2021 | 13 June 2022 | 18 June 2023 |
| Bozeman, MT | 12TVR | 3 September 2019 | 7 October 2020 | 2 October 2021 | 17 October 2022 | 17 September 2023 |
| Denver, CO | 13SED | 10 November 2019 | 30 October 2020 | 4 November 2021 | 20 October 2022 | 4 November 2023 |
| Austin, TX | 14RPU | 13 August 2019 | 29 August 2020 | 25 July 2021 | 4 August 2022 | 19 August 2023 |
| New Orleans, LA | 15RXP | 21 January 2019 | 15 April 2020 | 6 March 2021 | 31 March 2022 | 30 April 2023 |
| Chicago, IL | 16TDL | 8 October 2019 | 7 October 2020 | 17 October 2021 | 1 November 2022 | 2 October 2023 |
| Orlando, FL | 17RMM | 29 May 2019 | 8 May 2020 | 8 May 2021 | 8 April 2022 | 8 May 2023 |
| New York, NY | 18TWL | 2 November 2019 | 6 November 2020 | 6 November 2021 | 26 November 2022 | 21 December 2023 |
| Washington, DC | 18SUJ | 28 October 2019 | 11 December 2020 | 26 December 2021 | 1 December 2022 | 16 November 2023 |
Table 2. Percent built-up cover for each tile-year pair, calculated after binary preprocessing.

| Tile ID | 2019 | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|
| 10SEG | 2.2 | 2.9 | 3.5 | 2.5 | 2.7 |
| 12TVR | 3.2 | 1.1 | 6.5 | 1.2 | 1.3 |
| 13SED | 3.3 | 2.9 | 3.0 | 3.0 | 2.7 |
| 14RPU | 4.5 | 5.5 | 4.7 | 7.6 | 7.6 |
| 15RXP | 27.7 | 1.6 | 1.7 | 1.4 | 1.4 |
| 16TDL | 1.2 | 1.6 | 1.4 | 3.3 | 1.4 |
| 17RMM | 2.7 | 2.4 | 2.6 | 3.3 | 2.8 |
| 18TWL | 2.2 | 1.9 | 1.9 | 1.9 | 2.2 |
| 18SUJ | 4.4 | 2.1 | 4.4 | 3.3 | 5.5 |
Table 3. Hyperparameters for the ANN (multilayer perceptron) used to train the Cellular Automata update rule.

| Hyperparameter | Value |
|---|---|
| Neighborhood | 1 pixel |
| Learning rate | 0.155 |
| Maximum iterations | 100 |
| Hidden layers | 10 |
| Momentum | 0.05 |
Table 4. Derived variables used to train Random Forest and Support Vector Machine classifiers.
Variable
Lag 1 Year
Lag 2 Years
Lag 3 Years
Lag 4 Years
Moore’s 3 × 3 Neighborhood, Year 1
Moore’s 3 × 3 Neighborhood, Year 2
Moore’s 3 × 3 Neighborhood, Year 3
Moore’s 3 × 3 Neighborhood, Year 4
10-Pixel Spatial Neighborhood, Year 1
10-Pixel Spatial Neighborhood, Year 2
10-Pixel Spatial Neighborhood, Year 3
10-Pixel Spatial Neighborhood, Year 4
Spatial Temporal Weight, Lag 1 Year
Spatial Temporal Weight, Lag 2 Years
Spatial Temporal Weight, Lag 3 Years
Spatial Temporal Weight, Lag 4 Years
Table 5. Performance of prediction models on (a) overall accuracy, (b) Cohen’s kappa, and (c) log-loss. Best scene-wise results for each metric are bolded. The number of scenes for which each method performed best is summarized at the bottom, along with the standard deviation of model performance across all scenes. The greatest number of scene bests is bolded for each metric. Higher accuracy and kappa are better; lower log-loss is better.

(a) Overall accuracy

| Tile ID | SVM | RF | CNN | ConvLSTM | LKLR | CAMC (DEM) | CAMC (no DEM) |
|---|---|---|---|---|---|---|---|
| 10SEG | **0.96** | 0.90 | 0.93 | 0.82 | 0.90 | 0.85 | 0.85 |
| 12TVR | **0.88** | 0.75 | 0.82 | 0.68 | 0.83 | 0.70 | 0.71 |
| 13SED | **0.94** | 0.87 | 0.91 | 0.81 | 0.88 | 0.69 | 0.68 |
| 14RPU | **0.86** | 0.68 | 0.81 | 0.62 | 0.56 | 0.70 | 0.69 |
| 15RXP | 0.73 | 0.62 | **0.86** | 0.50 | 0.76 | 0.78 | 0.78 |
| 16TDL | **0.94** | 0.86 | 0.89 | 0.74 | 0.84 | 0.74 | 0.71 |
| 17RMM | **0.96** | 0.89 | 0.87 | 0.79 | 0.82 | 0.81 | 0.80 |
| 18SUJ | **0.97** | 0.93 | 0.90 | 0.77 | 0.87 | 0.79 | 0.79 |
| 18TWL | **0.94** | 0.90 | 0.82 | 0.62 | 0.81 | 0.83 | 0.84 |
| Scene Bests | **8** | 0 | 1 | 0 | 0 | 0 | 0 |
| STDEV | 0.08 | 0.11 | 0.04 | 0.11 | 0.10 | 0.06 | 0.06 |

(b) Cohen’s kappa

| Tile ID | SVM | RF | CNN | ConvLSTM | LKLR | CAMC (DEM) | CAMC (no DEM) |
|---|---|---|---|---|---|---|---|
| 10SEG | 0.50 | 0.30 | **0.85** | 0.63 | 0.79 | 0.69 | 0.69 |
| 12TVR | 0.45 | 0.30 | 0.64 | 0.36 | **0.65** | 0.43 | 0.45 |
| 13SED | 0.36 | 0.25 | **0.82** | 0.61 | 0.76 | 0.34 | 0.36 |
| 14RPU | 0.35 | 0.19 | **0.62** | 0.23 | 0.11 | 0.44 | 0.43 |
| 15RXP | 0.71 | 0.04 | **0.72** | 0.00 | 0.51 | 0.49 | 0.51 |
| 16TDL | 0.28 | 0.13 | **0.77** | 0.47 | 0.69 | 0.44 | 0.42 |
| 17RMM | 0.40 | 0.28 | **0.74** | 0.57 | 0.64 | 0.64 | 0.62 |
| 18SUJ | 0.50 | 0.33 | **0.80** | 0.55 | 0.73 | 0.59 | 0.61 |
| 18TWL | 0.49 | 0.37 | **0.63** | 0.24 | 0.62 | 0.60 | 0.63 |
| Scene Bests | 0 | 0 | **8** | 0 | 1 | 0 | 0 |
| STDEV | 0.12 | 0.10 | 0.09 | 0.22 | 0.21 | 0.12 | 0.11 |

(c) Log-loss

| Tile ID | SVM | RF | CNN | ConvLSTM | LKLR | CAMC (DEM) | CAMC (no DEM) |
|---|---|---|---|---|---|---|---|
| 10SEG | 0.34 | 0.29 | **0.21** | 0.57 | 0.31 | N/A | N/A |
| 12TVR | 0.38 | **0.29** | 0.41 | 0.87 | 0.40 | N/A | N/A |
| 13SED | 0.40 | 0.33 | **0.24** | 0.61 | 0.34 | N/A | N/A |
| 14RPU | 0.58 | 0.57 | **0.47** | 1.18 | 1.16 | N/A | N/A |
| 15RXP | 0.79 | 0.74 | **0.37** | 1.73 | 0.45 | N/A | N/A |
| 16TDL | 0.39 | 0.33 | **0.30** | 0.72 | 0.38 | N/A | N/A |
| 17RMM | 0.40 | 0.32 | **0.32** | 0.64 | 0.40 | N/A | N/A |
| 18SUJ | 0.34 | 0.26 | **0.26** | 0.75 | 0.35 | N/A | N/A |
| 18TWL | 0.46 | **0.31** | 0.48 | 1.62 | 0.43 | N/A | N/A |
| Scene Bests | 0 | 2 | **7** | 0 | 0 | N/A | N/A |
| STDEV | 0.14 | 0.16 | 0.10 | 0.44 | 0.26 | N/A | N/A |
