1. Introduction
High-resolution, web-based satellite imagery has become a popular data source for geospatial analysis, urban mapping, and decision-making because of its global coverage and ease of access. Among these sources, Google Earth imagery is widely used in both academic and practical settings, especially in regions with limited access to authoritative geospatial data. Despite its visual quality and usefulness, Google Earth imagery is not designed for precise measurement, and it often exhibits spatially varying geometric distortions that restrict its use in high-precision geospatial tasks. Geometric correction is therefore an essential preprocessing step when using Google Earth imagery for quantitative spatial analysis. The precision of this correction depends largely on the number, distribution, and quality of Ground Control Points (GCPs). Inadequate placement, such as point clustering, missing boundary coverage, or redundancy, can significantly reduce geometric accuracy even when the number of control points is sufficient. Creating an effective GCP network thus remains a key challenge for reliable geospatial analysis of Google Earth imagery.
Most existing studies on the geometric accuracy of Google Earth imagery rely on manually selected or evenly distributed GCPs and focus on a single acquisition time. While these methods offer useful insights into positional accuracy, they assume that a GCP setup optimized for one acquisition will also be suitable for subsequent ones. In practice, this assumption rarely holds. Google Earth imagery is generated from multiple sensors, collected on different dates, and processed using various techniques, resulting in geometric inconsistencies that vary across space and over time. As a result, a GCP configuration that is effective in one year may not be appropriate, or even valid, for another.
Most current research on the geometric accuracy of Google Earth imagery optimizes GCP configurations for a single acquisition time, implicitly assuming that the resulting control network is appropriate for other image captures. This assumption is problematic for multi-temporal imagery, where geometric inconsistencies differ across space and time. In this study, observations from multiple time points are used not for change detection but to assess whether a GCP network remains stable and informative over the years, thereby reducing the risk of overfitting to a single image epoch. Recent developments in artificial intelligence have shown the potential of automated GCP selection strategies to improve geometric correction [1, 2]. However, existing optimization strategies apply only to single-image or single-epoch problems and do not address temporal robustness. The lack of a robust spatio-temporal design framework is a key limitation in current GCP optimization research, particularly for applications that use historical Google Earth imagery or long-term urban monitoring, where maintaining consistent geometric accuracy over many years is essential. This study addresses this limitation by introducing a multi-epoch, robust GCP network design framework for Google Earth imagery, grounded in optimal experimental design theory.
Instead of optimizing GCP configurations for a single snapshot, the new approach explicitly evaluates GCP performance across multiple time points at the same location. Rather than analyzing temporal change itself, multi-temporal observations are used to assess cross-epoch stability of the GCP network design and to reduce sensitivity to a single image epoch. By integrating high-accuracy ground-truth data from a mobile mapping system with automated GCP optimization, the approach ensures consistent geometric correction. The approach is tested using multi-temporal Google Earth images from two urban areas, enabling assessment of its spatial and temporal robustness. Rather than averaging geometric performance across epochs, the proposed framework evaluates control-network performance across all available acquisition years and emphasizes the least favorable epoch in the multi-year evaluation.
This study explores the following research questions:
RQ1. Can the proposed multi-temporal GCP design framework ensure robust and stable geometric correction across different acquisition epochs?
RQ2. Can an optimized GCP network sustain consistent geometric accuracy across different temporal epochs of the same urban area?
RQ3. How does the proposed GCP optimization framework perform across geographically distinct urban environments with varying spatial characteristics?
The main contributions of this study are summarized as follows:
A multi-epoch robust GCP network design for georeferencing based on optimal experimental design.
A novel hybrid DI-optimality criterion that balances transformation stability and interior prediction accuracy under temporal variability.
A fully automated strategy for selecting the optimal number of GCPs, validated through multi-temporal experiments and random feasible-subset benchmarks.
Instead of averaging data across acquisition epochs, the proposed framework directly optimizes GCP selection based on the most challenging acquisition epoch, thereby ensuring robustness under adverse temporal conditions.
The imagery used in this study was obtained from Google Earth, which combines optical satellite data from multiple sensors with varying spatial resolutions and geolocation accuracies. Previous studies report that the horizontal positional accuracy of Google Earth imagery generally ranges from 5 to 15 m, depending on the acquisition date and location.
2. Literature Review
Researchers have frequently used Google Earth as a source of geospatial data because of its global coverage, high visual detail, and free access. Nevertheless, geometric inconsistencies that vary by region and over time limit its use for precise spatial analysis. Studies that have systematically evaluated the positional accuracy of Google Earth images have consistently found moderate positional errors that vary significantly by region. Nwilo et al. conducted a multi-epoch assessment of historical Google Earth imagery and found substantial temporal variation in horizontal accuracy [3]. Their results showed root-mean-square errors (RMSEs) exceeding 16 m in earlier periods, decreasing to approximately 6 m in more recent images [3]. This indicates that geometric accuracy is inconsistent over time, even within the same geographic area.
Guo et al. [4] evaluated Google Earth imagery across various regions in Asia and found horizontal accuracies ranging from roughly 5 m in flat urban areas to more than 16 m in mountainous regions, a finding that emphasizes the impact of terrain on positional accuracy. Youngu et al. [5] compared Google Earth images with ground-surveyed control points and reported planimetric RMSE values of approximately 6 m, along with noticeable directional biases that varied by location. Together, these studies show that Google Earth imagery is affected by spatial and temporal geometric distortions, so careful georeferencing and validation are important when it is used for quantitative spatial analysis.
GCPs remain essential for reliable image georeferencing. Related studies consistently show that georeferencing accuracy depends not only on the number of GCPs but also on their spatial distribution within the imaged area. Liu et al. [6] examined how the number and placement of GCPs influence accuracy and found that accuracy improves as well-distributed GCPs are added up to a point, after which additional points provide only limited benefit and mainly introduce redundancy. Stott et al. [7] showed that increasing the number of GCPs does not necessarily improve accuracy once sufficient geometric enclosure is achieved. Recent comparative studies further highlight the significance of spatial distribution: Atik and Arkali [8] tested various GCP distribution models and demonstrated that small, well-distributed GCP networks can outperform larger, poorly arranged ones. These findings indicate that effective georeferencing depends more on geometric configuration than on the total number of GCPs, especially for affine and other low-order transformation models.
Other researchers have sought to mitigate the bias and inefficiency of manual GCP selection by employing automated, optimization-based methods. Sánchez et al. [9] proposed a semi-automated framework for assessing positional accuracy that uses large, well-distributed control sets to meet international accuracy standards. Their results highlighted the advantages of placing GCPs systematically rather than selecting them randomly. Muradás Odriozola et al. showed that automated GCP detection and selection strategies can improve consistency and decrease operator-dependent bias in georeferencing tasks [1]. Although many of these approaches focus on single-image or single-epoch scenarios, they collectively indicate a trend toward data-driven, reproducible GCP design methods.
Although interest from developers and end users in automated GCP selection is growing, most current research focuses on optimizing control-point setups for a single acquisition time. This is particularly problematic for Google Earth imagery, which comprises multisensor, multitemporal mosaics of a specific area. Recent research on robust georeferencing emphasizes stability across varying conditions rather than peak performance on a single dataset. Research by Soszyńska et al. showed that georeferencing techniques that remain stable despite changes in imaging conditions yield more consistent long-term results than methods optimized for a single epoch snapshot [10]. Their findings support the growing understanding that robustness and reproducibility are crucial for multi-temporal geospatial analysis.
In summary, research shows that Google Earth imagery varies across space and time. Distributing GCPs evenly yields better results than having many GCPs clustered in one area. Additionally, automated and optimization-driven methods provide significant benefits over heuristic approaches. This study aims to create a detailed spatio-temporal framework for designing GCP networks that accounts for variability in multi-epoch data collection, ensuring both geometric stability and efficiency.
3. Methodology
The GCP selection problem involves identifying ground control points that can georeference images accurately and consistently. Each candidate GCP contributes geometric information for estimating the transformation model parameters. The objective is therefore to select a compact subset that remains informative and stable across different acquisition epochs. In this study, the problem is formulated within an optimal experimental design framework, in which GCP network quality is evaluated using the transformation model’s information matrix. For the multi-temporal case, the framework seeks configurations that remain reliable even under the least favorable acquisition conditions.
Multi-temporal imagery is integrated into the GCP assessment process to improve resilience to temporal changes in image geometry. Rather than using temporal differences to detect changes, images from different years are used to determine whether a chosen GCP network remains consistent and informative across acquisition periods. This approach evaluates robustness under the most challenging conditions across all available time points, thereby reducing overfitting to any single image epoch.
Formulating GCP network design as an optimal experimental design problem follows classical statistical design theory, in which control-point locations are selected to maximize the information content of model parameter estimates [11, 12, 13].
The proposed framework treats GCP selection as a robust optimal experimental design problem focused on accurate absolute georeferencing of multi-temporal imagery. Control points are evaluated simultaneously across all epochs, and the network setup is optimized by maximizing a robust D-optimality criterion, which is defined as the minimum log-determinant of the information matrix over time. Practical constraints are explicitly included in spatial deployment, such as minimum boundary coverage and inter-point spacing. Unlike traditional methods that specify the number of GCPs in advance or rely on heuristic spatial arrangements, this approach determines both the number and the spatial distribution of control points through a data-driven optimization process. Network size is determined either by using marginal-gain stopping during greedy selection or by using cost-regularized μ-sweep analysis, which identifies a knee point in the information–complexity trade-off curve.
To further improve interior prediction, an optional hybrid D–I criterion is introduced that combines parameter stability with an interior-prediction variance term evaluated on a weighted grid in the image domain. This hybrid approach does not explicitly enforce interior-point selection but, when necessary, discourages configurations that yield poorly conditioned interior predictions. The resulting framework creates compact, well-conditioned GCP networks that are robust across multiple epochs without relying on arbitrary design rules. This study employs an affine transformation model because the Google Earth imagery used is orthorectified, near-nadir, and covers small urban areas. In these cases, residual misregistration mainly results from translation, scale differences, rotation, and shear, which are well captured by an affine model [14]. More complex projective models would introduce unnecessary complexity and reduce conditioning in sparse control networks, whereas simpler similarity models often cannot fully address local distortions in multi-temporal web-based imagery.
Figure 1 shows the methodology workflow. Candidate control points and multi-temporal image measurements are used to construct per-point information matrices under an affine transformation model. A DI-optimality-based design objective, combining D-optimality (parameter stability) and I-optimality (interior prediction stability), is then applied, and candidate configurations are evaluated across acquisition years using a worst-case multi-epoch criterion. GCP subsets are generated through two different approaches: greedy forward selection with marginal-gain stopping and cost-regularized μ-sweep analysis. A genetic algorithm is employed solely for validation purposes. The resulting configurations are tested with independent checkpoints and a random feasible-subset RMSE benchmark to measure both absolute accuracy and stability across multiple epochs.
To further clarify the practical interpretation of the proposed formulation, Figure 2 illustrates how the affine model, information matrix, D-optimality, I-optimality, and worst-case temporal robustness relate to the proposed GCP network design problem.
3.1. Multi-Temporal Affine Coregistration Model
In the multi-temporal setting, each candidate GCP configuration is evaluated across all acquisition years using a worst-case information criterion, thereby strengthening the selection of control points that remain informative under varying imaging conditions.
Consider a set of image acquisitions at epochs t ∈ T and a pool of N candidate GCPs indexed by i ∈ {1, …, N}. Each candidate GCP has known ground coordinates and measured pixel coordinates for each epoch. A summary of symbols used throughout the formulation is provided in Appendix B.
An affine transformation model is adopted for each epoch, with a separate six-element affine parameter vector for each epoch t. The affine model captures translation, rotation, scale, and shear, and is widely used for medium-scale coregistration of satellite and aerial images.
For each GCP i at epoch t, a design matrix is defined from the affine model. For analytical simplicity, the observation noise is assumed to be independent and identically distributed across GCP image measurements. Under this assumption, the Fisher Information Matrix (FIM) for a selected GCP subset S at epoch t, denoted M_t(S), is the sum of the per-point information matrices Q_{i,t} of the selected points.
The inverse of the information matrix provides the Cramér–Rao lower bound on the covariance of the affine-parameter estimates, thereby directly linking network geometry to registration accuracy. The Fisher Information Matrix shows how well a GCP setup constrains the affine transformation parameters: a better-conditioned matrix yields more accurate estimates from the control points. GCPs therefore not only provide correspondence points but also strengthen the geometric estimation, and well-distributed configurations offer more independent information than clustered ones. The resulting per-epoch information measures are then combined through a worst-case robustness criterion.
In this study, the affine sensitivity matrix is defined using the fixed ground coordinates of the candidate GCPs. Under the simplifying assumption of identical measurement variance across epochs, the resulting information matrix primarily reflects the spatial conditioning of the candidate configuration. The framework’s temporal dimension arises not from changes in ground geometry but from repeatedly evaluating the selected configurations across multi-temporal image observations and assessing their stability across all acquisition epochs.
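As a minimal sketch of how network geometry enters the information matrix, the following Python snippet assumes the common affine design matrix with rows (x, y, 1, 0, 0, 0) and (0, 0, 0, x, y, 1), built from ground coordinates (x, y), and unit measurement variance. Under that assumption the 6 × 6 information matrix is block-diagonal with two identical 3 × 3 blocks, so its log-determinant is twice that of one block. The function names and example coordinates are illustrative and are not taken from the study.

```python
import math

def point_info(x, y):
    """3x3 block h h^T for h = (x, y, 1); the assumed 6x6 per-point
    information matrix is block-diagonal with two copies of this block."""
    h = (x, y, 1.0)
    return [[a * b for b in h] for a in h]

def mat_add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def log_det_info(points, eps=1e-9):
    """log det of the assumed 6x6 information matrix (= 2 * log det of one block).
    A small eps * I keeps the matrix invertible for degenerate layouts."""
    m = [[eps if r == c else 0.0 for c in range(3)] for r in range(3)]
    for x, y in points:
        m = mat_add(m, point_info(x, y))
    return 2.0 * math.log(det3(m))

# Two hypothetical four-point layouts in a 100 m x 100 m scene.
spread = [(0, 0), (100, 0), (0, 100), (100, 100)]
clustered = [(48, 48), (52, 48), (48, 52), (52, 52)]
print(log_det_info(spread), log_det_info(clustered))
```

In this toy example the four-corner layout has a much larger log-determinant than the tightly clustered one, which is consistent with the statement that well-distributed configurations carry more independent information.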
3.2. Robust D-Optimal GCP Selection
The selection of GCPs is formulated as a robust D-optimal experimental design. For a given candidate subset, robust D-optimality is defined as the minimum log-determinant of the per-epoch information matrix across all acquisition years, so performance is governed by the least favorable acquisition year among the considered epochs. In this formulation, robustness should be understood at the evaluation stage as consistent georeferencing over different years, not as a fully epoch-dependent information-theoretic criterion.
Candidate GCPs are selected using a greedy forward selection strategy subject to two deployment constraints: a minimum number of boundary points, which ensures spatial containment, and a minimum distance between points, which prevents clustering. The selection is iterative: at each step, the candidate point that yields the greatest improvement in the robust D-optimality measure is added, until no further significant improvement is observed. The D-optimality criterion maximizes the determinant of the information matrix [11, 13], which is equivalent to minimizing the volume of the confidence ellipsoid of the estimated transformation parameters.
This criterion promotes geometrically stable configurations and naturally favors control points near the spatial boundaries of the study area. Under affine geometry, points near the boundary provide more spatial leverage than those near the center, which is why D-optimal solutions tend to emphasize boundary points.
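For reference, the robust criterion described in this subsection can be written compactly. The notation below is a sketch: the per-epoch matrices and per-point matrices follow the naming of Algorithm 1, while the labels \Phi_D and \mathrm{Vol}_t are introduced here for illustration and may differ from the symbols used in the study.

```latex
% Robust D-optimality: worst-case log-determinant over the acquisition epochs
\Phi_D(S) = \min_{t \in T} \log\det M_t(S),
\qquad M_t(S) = \sum_{i \in S} Q_{i,t}

% Link to the confidence ellipsoid of the affine parameters at epoch t
\mathrm{Vol}_t(S) \;\propto\; \det\!\big(M_t(S)^{-1}\big)^{1/2} = \det\!\big(M_t(S)\big)^{-1/2}
```

Maximizing \Phi_D(S) therefore shrinks the largest of the per-epoch ellipsoid volumes, which is the sense in which the design is robust to the least favorable epoch.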
3.3. Hybrid D–I Criterion
While D-optimality ensures parameter stability, it does not explicitly control prediction accuracy within the domain’s interior. To address this, an I-optimality term is introduced, which minimizes the average prediction variance over a predefined set of interior evaluation locations. These locations enter the criterion through an aggregated interior sensitivity matrix computed from the interior grid points. The term penalizes configurations that yield large average prediction variance across the interior of the evaluation grid. It therefore complements D-optimality by emphasizing prediction quality within the spatial domain, not only parameter stability.
To further enhance the performance of interior prediction, an optional hybrid D–I criterion is introduced. Along with the D-optimality term, a prediction variance for the interior is calculated using a weighted grid of test locations within the image domain. This term penalizes poorly conditioned interior predictions by evaluating the trace of the projection of the inverse information matrix onto the interior grid. The hybrid criterion balances overall transformation stability with the quality of interior predictions through a weighting parameter, α.
Importantly, the hybrid formulation does not require selecting interior control points. Instead, it offers a systematic method for penalizing configurations that lead to unstable interior predictions. In compact affine scenes, this may still yield boundary-dominated solutions, an empirical outcome of the optimization process.
The final fitness function combines both criteria into a single scalar objective, in which the weighting parameter α controls the trade-off between parameter stability and interior prediction accuracy. To balance the scales of the I- and D-optimality terms, a logarithmic transformation is applied to the I-optimality term, reducing the impact of large variance values. This scalar form balances parameter stability and prediction quality in a single criterion.
This hybrid formulation ensures that selected GCP configurations simultaneously:
Stabilize the geometric transformation model (D-optimality), and
Minimize spatially averaged prediction error within the study area (I-optimality).
In this paper, the optimal number of selected GCPs is determined using greedy forward selection with marginal-gain stopping and cost-regularized μ-sweep analysis.
In Equation (6), the weighting parameter α is selected empirically to reflect the intended emphasis of each strategy and is then evaluated through the sensitivity analysis reported in Appendix B. The default greedy and optimized greedy methods use α = 0.6, placing greater emphasis on parameter stability (D-optimality) while still considering interior prediction quality. The μ-sweep method uses α = 0.2, which places greater weight on interior prediction accuracy (I-optimality). This lower α value helps identify minimal networks that maintain good interior coverage. This method-specific choice of α allows each strategy to focus on its intended role: greedy methods prioritize transformation stability, while μ-sweep emphasizes prediction accuracy with fewer points.
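One way to read the hybrid criterion is as a weighted difference between a log-determinant term and a log average prediction variance term. The sketch below assumes that form, J = α·log det M − (1 − α)·log(mean prediction variance), together with an illustrative affine design (h = (x, y, 1) per coordinate, unit noise variance, unweighted interior grid). The exact expression, grid weights, and scaling used in the study may differ, and all names are hypothetical.

```python
import math

def info3(points):
    """3x3 block: sum of h h^T with h = (x, y, 1) over the selected points."""
    m = [[0.0] * 3 for _ in range(3)]
    for x, y in points:
        h = (x, y, 1.0)
        for r in range(3):
            for c in range(3):
                m[r][c] += h[r] * h[c]
    return m

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    dt = det3(m)
    (a, b, c), (d, e, f), (g, h, i) = m
    return [[(e * i - f * h) / dt, (c * h - b * i) / dt, (b * f - c * e) / dt],
            [(f * g - d * i) / dt, (a * i - c * g) / dt, (c * d - a * f) / dt],
            [(d * h - e * g) / dt, (b * g - a * h) / dt, (a * e - b * d) / dt]]

def hybrid_score(points, grid, alpha):
    """Assumed hybrid D-I score; larger is better."""
    m = info3(points)
    log_det = 2.0 * math.log(det3(m))      # 6x6 matrix = two identical 3x3 blocks
    m_inv = inv3(m)

    def pred_var(x, y):
        h = (x, y, 1.0)
        # h^T M^-1 h is the variance of one predicted coordinate; two coordinates
        return 2.0 * sum(h[r] * m_inv[r][c] * h[c] for r in range(3) for c in range(3))

    mean_var = sum(pred_var(x, y) for x, y in grid) / len(grid)
    return alpha * log_det - (1.0 - alpha) * math.log(mean_var)

# Hypothetical interior evaluation grid and two candidate networks.
grid = [(x, y) for x in (25, 50, 75) for y in (25, 50, 75)]
corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
clustered = [(45, 45), (55, 45), (45, 55), (55, 55)]
for alpha in (0.6, 0.2):
    print(alpha, hybrid_score(corners, grid, alpha), hybrid_score(clustered, grid, alpha))
```

With either weighting, the corner network scores higher than the clustered one in this toy setting, illustrating that a boundary-dominated solution can still be preferred when interior prediction variance is penalized.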
It is worth noting that the DI-optimality criterion specifies the design goal regardless of the optimization method used to select the subset. In this study, greedy forward selection with marginal-gain stopping and μ-sweep analysis serve as the primary optimizers, whereas a genetic algorithm (GA) is used as a separate global-optimization baseline for validation.
3.3.1. Spatial Feasibility Constraints
To ensure practical deployability and geometric robustness, two spatial constraints are enforced. The first is the minimum spacing constraint: to prevent clustering and ensure an even distribution of GCPs, the Euclidean distance in ground coordinates between any two selected GCPs i and j must be at least d_min, the minimum allowable spacing (in meters), typically set to 50–200 m depending on scene scale.
3.3.2. Boundary Coverage Constraint
At least k_boundary selected GCPs must lie within a boundary zone defined as an outer fraction of the image extent. Boundary points stabilize affine transformations by constraining extrapolation at image edges and corners. To prevent ill-conditioning and extrapolation errors, adequate coverage of the image boundary is required: |S ∩ B| ≥ k_boundary, where S is the selected subset of GCPs, B is the set of boundary points, and k_boundary is the minimum number of required boundary points (typically ≥4).
Boundary points are critical because:
They approximate the convex hull of the control network and limit extrapolation at the image edges
They minimize extrapolation in transformation
They stabilize the corner and edge distortion correction
In practical applications, spatial design parameters depend on scene extent and candidate density. The minimum spacing should prevent GCP clustering while still leaving enough feasible subsets; experiments suggest using 10–20% of the shorter scene dimension as a good starting point. The boundary-zone fraction determines how boundary points are classified, and a value of about 10% typically works well for compact scenes. However, sparser point sets may require smaller zones to avoid over-constraining the system. For affine transformations, setting k_boundary to at least 4 is recommended to ensure geometric enclosure and to stabilize the image edges and corners.
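The two feasibility checks can be sketched as follows. The snippet assumes the boundary zone is a band of width `frac` times the scene width or height along each edge, which is one plausible reading of "outer fraction of the image extent"; the function names, the scene extent, and the example network are hypothetical.

```python
import math

def spacing_ok(points, d_min):
    """True if every pair of selected points is at least d_min apart."""
    return all(math.dist(p, q) >= d_min
               for k, p in enumerate(points) for q in points[k + 1:])

def in_boundary_zone(point, extent, frac=0.10):
    """True if the point lies in the outer `frac` band of the scene extent."""
    xmin, ymin, xmax, ymax = extent
    bx, by = frac * (xmax - xmin), frac * (ymax - ymin)
    x, y = point
    return x <= xmin + bx or x >= xmax - bx or y <= ymin + by or y >= ymax - by

def is_feasible(points, extent, d_min, k_boundary=4, frac=0.10):
    """Both constraints: minimum spacing and minimum boundary coverage."""
    n_boundary = sum(in_boundary_zone(p, extent, frac) for p in points)
    return spacing_ok(points, d_min) and n_boundary >= k_boundary

extent = (0.0, 0.0, 1000.0, 800.0)       # hypothetical scene extent in metres
d_min = 0.15 * min(1000.0, 800.0)        # 15% of the shorter dimension, about 120 m
network = [(20, 30), (980, 40), (30, 770), (960, 760), (500, 400)]
print(is_feasible(network, extent, d_min))
```

Adding a point close to an existing one, or using only interior points, makes `is_feasible` return False, which mirrors how the two constraints prune candidate subsets before the information criterion is evaluated.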
3.3.3. Greedy Robust D-Optimal Selection
The optimal subset is constructed via sequential forward selection, leveraging the near-submodular nature of the log-determinant objective.
At each iteration k, the feasible candidate GCP that maximizes the robust marginal gain, that is, the smallest gain in the objective across all epochs, is selected. Only candidates that satisfy the spatial constraints are considered, and information matrices are updated incrementally to enable efficient computation. Greedy forward selection is used because the log-determinant objective is approximately submodular, a property for which greedy maximization under cardinality constraints has well-established approximation guarantees [15].
3.3.4. Selection Strategy Variants
In the experimental evaluation, three configurations of the subset selection method were compared: Default Greedy, Optimized Greedy, and cost-regularized μ-sweep. They represent different ways to assess trade-offs among network size, conditioning, and accuracy. The three selection strategies, along with their respective parameter settings, are described in detail in Appendix C.
All three strategies enforce the spatial feasibility constraints and use a robust worst-case aggregation across acquisition epochs. In all three cases, the subset selection method rewards geometrically stable subsets that provide adequate boundary enclosure. The objective function typically increases quickly in the early iterations and then stabilizes, indicating diminishing marginal returns. In the greedy configurations, the stopping rule identifies the point where the marginal benefit becomes small relative to the initial benefit, while the μ-sweep configuration identifies the network size from the knee of the Pareto trade-off curve.
Algorithm 1 provides the standard greedy forward selection method used for the Default Greedy and Optimized Greedy configurations. The cost-regularized μ-sweep follows the same subset-construction logic but adds a cost penalty on subset size, as described in Section 3.4 and detailed in Appendix C.
The pseudocode of the proposed greedy selection algorithm is shown as follows:
Algorithm 1: Greedy Robust D-Optimal GCP Selection
Input:
  - Candidate GCPs: {1, 2, …, N}
  - Information matrices: {Q_{i,t}}, i = 1, …, N, t ∈ T
  - Constraints: d_min, k_boundary
  - Stopping parameters: k_min, k_max, γ_stop
Initialize:
  S ← ∅
  M_t ← ε·I_6 for all t ∈ T
  J_prev ← min_t log det(M_t)
  Δ_1 ← undefined
Iteration k = 1, 2, …, k_max:
  1. Enforce boundary constraint:
     IF |S ∩ B| < k_boundary THEN Candidates ← B \ S
     ELSE Candidates ← {1, …, N} \ S
     END IF
  2. Find the best candidate:
     best_candidate ← none; best_gain ← −∞
     FOR EACH i ∈ Candidates:
       IF spacing constraint violated (Equation (7)) THEN CONTINUE END IF
       Compute robust gain: g(i) ← min_t [J(M_t + Q_{i,t}) − J(M_t)]   // J = D or DI objective
       IF g(i) > best_gain THEN best_candidate ← i; best_gain ← g(i) END IF
     END FOR
  3. Check feasibility:
     IF no feasible candidate found THEN STOP (constraints too restrictive) END IF
  4. Accept the best candidate:
     S ← S ∪ {best_candidate}
     M_t ← M_t + Q_{best_candidate,t} for all t ∈ T
  5. Update objective:
     J_current ← min_t log det(M_t)
     Δ_k ← J_current − J_prev
     IF k = 1 THEN Δ_1 ← Δ_k
  6. Check stopping criteria:
     IF k ≥ k_min AND |S ∩ B| ≥ k_boundary THEN
       IF Δ_k/Δ_1 < γ_stop THEN STOP (elbow detected) END IF
     END IF
  7. Update for next iteration:
     J_prev ← J_current
Output: Selected subset S
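The steps of Algorithm 1 can be transcribed into a short runnable sketch. The version below uses the plain log-determinant (D) objective and, for the toy example only, builds the per-point information matrices from an assumed affine design with rows (x, y, 1, 0, 0, 0) and (0, 0, 0, x, y, 1) and unit noise variance. Function names, the candidate grid, and the parameter values are illustrative assumptions, not the settings used in the study.

```python
import math

def affine_info(x, y):
    """Assumed 6x6 per-point information matrix A^T A (unit noise variance)."""
    rows = [(x, y, 1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, x, y, 1.0)]
    return [[sum(r[a] * r[b] for r in rows) for b in range(6)] for a in range(6)]

def mat_add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def log_det(m):
    """log|det| by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, total = len(a), 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return -math.inf
        a[c], a[p] = a[p], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total

def greedy_select(Q, coords, boundary, d_min, k_boundary, k_min, k_max,
                  gamma_stop, eps=1e-6):
    """Q[i][t] is the information matrix of candidate i at epoch t."""
    n, n_epochs, p = len(Q), len(Q[0]), len(Q[0][0])
    M = [[[eps if r == c else 0.0 for c in range(p)] for r in range(p)]
         for _ in range(n_epochs)]
    S, delta_1 = [], None
    j_prev = min(log_det(m) for m in M)
    for k in range(1, k_max + 1):
        need_boundary = sum(i in boundary for i in S) < k_boundary
        pool = [i for i in range(n)
                if i not in S and (i in boundary or not need_boundary)]
        best, best_gain = None, -math.inf
        for i in pool:
            if any(math.dist(coords[i], coords[j]) < d_min for j in S):
                continue  # spacing constraint violated
            gain = min(log_det(mat_add(M[t], Q[i][t])) - log_det(M[t])
                       for t in range(n_epochs))  # worst-case gain over epochs
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # constraints too restrictive
        S.append(best)
        M = [mat_add(M[t], Q[best][t]) for t in range(n_epochs)]
        j_cur = min(log_det(m) for m in M)
        delta = j_cur - j_prev
        if k == 1:
            delta_1 = delta
        enough_boundary = sum(i in boundary for i in S) >= k_boundary
        if k >= k_min and enough_boundary and delta / delta_1 < gamma_stop:
            break  # marginal gain has flattened
        j_prev = j_cur
    return S

# Toy example: 5 x 5 candidate grid, two epochs sharing the same geometry.
coords = [(25.0 * a, 25.0 * b) for a in range(5) for b in range(5)]
boundary = {i for i, (x, y) in enumerate(coords)
            if x in (0.0, 100.0) or y in (0.0, 100.0)}
Q = [[affine_info(x, y)] * 2 for x, y in coords]
selected = greedy_select(Q, coords, boundary, d_min=30.0, k_boundary=4,
                         k_min=4, k_max=10, gamma_stop=0.05)
print([coords[i] for i in selected])
```

The sketch recomputes determinants from scratch for clarity; an incremental update would be the natural optimization for larger candidate pools.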
3.4. Automatic Determination of the Optimal Number of GCPs
In addition to marginal-gain-based stopping in greedy selection, the framework provides a fully cost-regularized mechanism to automatically identify the optimal number of GCPs.
Rather than fixing the number of GCPs a priori, the framework introduces a cost-regularized objective in which the robust D- or DI-optimality objective defined in Section 3.2 and Section 3.3 is reduced by a penalty on network size weighted by μ. By sweeping μ, a Pareto frontier of information versus network size is obtained. The cost-regularized formulation follows standard convex trade-off principles, balancing model complexity against information gain to identify Pareto-optimal solutions [16].
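A minimal sketch of the μ-sweep, assuming the cost-regularized objective takes the form J(S) − μ·|S| and that the candidate networks are nested subsets from a forward selection, so that one objective value is available per network size. The objective values and the μ grid are made up for illustration.

```python
def best_size_for_mu(objective_by_size, mu):
    """Network size k that maximizes J_k - mu * k."""
    return max(objective_by_size, key=lambda k: objective_by_size[k] - mu * k)

def mu_sweep(objective_by_size, mus):
    """Map each penalty weight mu to the network size it selects."""
    return {mu: best_size_for_mu(objective_by_size, mu) for mu in mus}

# Hypothetical robust objective values for nested networks of size 3..10.
objective_by_size = {3: 0.0, 4: 5.0, 5: 8.0, 6: 9.0,
                     7: 9.5, 8: 9.7, 9: 9.8, 10: 9.85}
sweep = mu_sweep(objective_by_size, [0.01, 0.3, 0.75, 2.0, 4.0, 6.0])
print(sweep)  # larger mu selects smaller networks
```

Each μ picks the size at which the marginal information gain of one more GCP drops below μ, so the sweep traces the information versus size trade-off from large networks (small μ) to minimal ones (large μ).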
Knee-Point Detection
The optimal GCP count is selected as the knee point of the Pareto curve, identified by maximum curvature after normalization. This provides a fully computational, non-heuristic method for determining network size, thereby eliminating the need for subjective parameter tuning.
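As an illustration of knee detection on a normalized trade-off curve, the sketch below uses the maximum vertical distance above the chord joining the curve's endpoints. This is a simple discrete stand-in for the maximum-curvature rule described above, not necessarily the estimator used in the study, and the curve values are hypothetical.

```python
def knee_point(sizes, values):
    """Size whose normalized point lies farthest above the straight line
    joining the first and last points of an increasing, saturating curve."""
    x0, x1 = sizes[0], sizes[-1]
    y0, y1 = values[0], values[-1]
    best_size, best_gap = sizes[0], float("-inf")
    for k, v in zip(sizes, values):
        xn = (k - x0) / (x1 - x0)   # normalize size to [0, 1]
        yn = (v - y0) / (y1 - y0)   # normalize objective to [0, 1]
        gap = yn - xn               # height above the chord y = x
        if gap > best_gap:
            best_size, best_gap = k, gap
    return best_size

sizes = [3, 4, 5, 6, 7, 8, 9, 10]
values = [0.0, 5.0, 8.0, 9.0, 9.5, 9.7, 9.8, 9.85]   # hypothetical objective values
knee = knee_point(sizes, values)
print(knee)
```

For this curve the knee falls at five GCPs, the size after which additional points add little information relative to their cost.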
The computational complexity of the algorithm depends on the number of selected GCPs, the number of candidates, the number of acquisition epochs, and the number of affine parameters. For typical problem sizes, execution time is under one second. Although greedy selection does not guarantee global optimality, log-determinant objectives are known to provide strong approximation guarantees and are standard in optimal experimental design.
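As a rough operation count for Algorithm 1, offered as an estimate under stated assumptions rather than the expression used in the study: let k be the number of selected GCPs, N the number of candidates, |T| the number of epochs, and p the number of affine parameters (these symbol choices are assumptions). Each of the k iterations evaluates at most N candidates in each epoch, and each evaluation needs the determinant of a p × p matrix, which costs O(p^3) when no incremental update is used.

```latex
\underbrace{k}_{\text{iterations}} \times
\underbrace{N}_{\text{candidates}} \times
\underbrace{|T|}_{\text{epochs}} \times
\underbrace{O\!\left(p^{3}\right)}_{\text{determinant}}
\;=\; O\!\left(k\,N\,|T|\,p^{3}\right)
```

For the affine model p = 6, so the p^3 factor is a small constant and the cost grows essentially with the product of network size, candidate count, and number of epochs.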
3.5. Accuracy Evaluation
The performance of the proposed GCP selection framework is assessed using independent checkpoints and multi-temporal imagery. The evaluation highlights both the geometric accuracy for each acquisition year and the temporal robustness over multiple years. It separates absolute geometric accuracy, measured by per-year RMSE, from temporal robustness, evaluated through mean, variance, and worst-year RMSE. This approach allows assessment not only of the georeferencing precision for a specific year but also of the consistency of the GCP configuration across different acquisition periods.
3.5.1. Per-Year Geometric Accuracy
Let T denote the set of acquisition years. For each year t ∈ T, an affine transformation is estimated using the selected optimal GCPs, based on the corresponding image pixel coordinates and their known ground coordinates.
Using the estimated affine parameters, the ground position of each independent checkpoint is predicted from its pixel coordinates. The positional errors in the Easting and Northing directions are then computed as the differences between the predicted and surveyed coordinates. From these errors, the two-dimensional root-mean-square error (RMSE) for year t is defined as the square root of the mean, over all checkpoints, of the sum of the squared Easting and Northing errors.
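The per-year evaluation can be sketched as a least-squares affine fit on the selected GCPs followed by a two-dimensional RMSE on the checkpoints. The parameterization E = a0·u + a1·v + a2, N = b0·u + b1·v + b2 (pixel to ground), the function names, and the synthetic coordinates are assumptions made for illustration.

```python
import math

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        out.append(det(m) / d)
    return out

def fit_affine(pixels, grounds):
    """Least-squares affine fit via the normal equations (one solve per axis)."""
    H = [[0.0] * 3 for _ in range(3)]
    rhs_e, rhs_n = [0.0] * 3, [0.0] * 3
    for (u, v), (e, n) in zip(pixels, grounds):
        h = (u, v, 1.0)
        for r in range(3):
            rhs_e[r] += h[r] * e
            rhs_n[r] += h[r] * n
            for c in range(3):
                H[r][c] += h[r] * h[c]
    return solve3(H, rhs_e), solve3(H, rhs_n)

def rmse_2d(params, pixels, grounds):
    """Square root of the mean squared Easting plus Northing error."""
    (a0, a1, a2), (b0, b1, b2) = params
    sq = [(a0 * u + a1 * v + a2 - e) ** 2 + (b0 * u + b1 * v + b2 - n) ** 2
          for (u, v), (e, n) in zip(pixels, grounds)]
    return math.sqrt(sum(sq) / len(sq))

def truth(u, v):
    """Synthetic ground truth used only for this example."""
    return (0.5 * u + 1000.0, -0.5 * v + 2000.0)

gcp_px = [(0, 0), (100, 0), (0, 100), (100, 100)]
params = fit_affine(gcp_px, [truth(u, v) for u, v in gcp_px])
chk_px = [(50, 50), (20, 80), (70, 30)]
# Surveyed checkpoint coordinates offset by (-3, -4) m from the synthetic truth.
chk_gr = [(e - 3.0, n - 4.0) for e, n in (truth(u, v) for u, v in chk_px)]
rmse_t = rmse_2d(params, chk_px, chk_gr)
print(rmse_t)
```

Because every checkpoint error in this synthetic case is (3 m, 4 m), the resulting RMSE is 5 m, which gives a quick sanity check on the error definition.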
3.5.2. Temporal Aggregation and Robustness Metrics
For a given selected configuration, the same checkpoint set is used across all acquisition years. The per-year RMSE values can be combined to evaluate temporal stability. The average RMSE over the years is calculated as the sum of the per-year RMSE values divided by |T|, where T is the set of acquisition years and |T| is the number of evaluated epochs.
To quantify the sensitivity of georeferencing accuracy to temporal variability, the temporal spread of the RMSE is measured as its standard deviation across years. A lower standard deviation indicates higher temporal stability of the GCP configuration.
Finally, a worst-case performance metric is reported as the maximum per-year RMSE, which reflects the maximum georeferencing error observed among all evaluated years.
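The three temporal metrics can be computed directly from the per-year RMSE values, as in the sketch below. The population form of the standard deviation is used here as an assumption (the sample form is an equally plausible choice), and the years and RMSE values are hypothetical.

```python
import statistics

def temporal_metrics(rmse_by_year):
    """Mean, temporal spread, and worst-case RMSE over acquisition years."""
    values = list(rmse_by_year.values())
    worst_year = max(rmse_by_year, key=rmse_by_year.get)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),   # population standard deviation (assumed)
        "worst": rmse_by_year[worst_year],
        "worst_year": worst_year,
    }

rmse_by_year = {2019: 1.0, 2021: 2.0, 2023: 3.0}   # hypothetical RMSE values in metres
metrics = temporal_metrics(rmse_by_year)
print(metrics)
```

Reporting the worst year alongside the mean makes it visible when a configuration performs well on average but degrades sharply in a single epoch.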
4. Results
This section presents the results of the proposed spatio-temporal GCP network design framework across two urban case studies. The analysis evaluates (i) how geometric information evolves during GCP selection, (ii) absolute georeferencing accuracy using independent checkpoints, and (iii) temporal robustness across multiple acquisition epochs. Results are reported for different selection strategies to assess the trade-offs among network size, accuracy, and robustness. For each case study, a single set of independently surveyed points is used. In Case Study I, ground control coordinates were obtained using high-resolution aerial photogrammetry, with a reported average accuracy of 10 cm.
In Case Study 2, GCPs were obtained using a ground-based Global Navigation Satellite System (GNSS) survey. All surveyed points are treated equally as candidate control points, with no prior manual distinction between GCPs and checkpoints. GCPs are automatically selected by the optimization framework, which relies solely on geometric and spatial criteria. After the GCPs are selected, all remaining surveyed points are treated as independent checkpoints and excluded from the estimation process. These checkpoints are used only to measure accuracy. This procedure avoids user-induced bias in the evaluation because the split between GCPs and checkpoints is entirely data-driven and requires no user interaction. Geometric accuracy is evaluated using planimetric residuals at the checkpoints, reported as east, north, and total RMSE. In addition to the single-split analysis, Monte Carlo resampling of the checkpoints is applied to assess the robustness of the error statistics.
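The Monte Carlo resampling of checkpoints mentioned above can be sketched as repeated subsampling of the checkpoint residuals. This is a hedged illustration rather than the authors' procedure: the function name `monte_carlo_rmse`, the subsample fraction `keep_fraction`, and sampling without replacement are all assumptions, since the text only states that randomized checkpoint subsets are evaluated.

```python
import numpy as np


def monte_carlo_rmse(residuals, n_trials=100, keep_fraction=0.8, seed=0):
    """Planimetric RMSE over random checkpoint subsets.

    residuals: array of shape (n_checkpoints, 2) holding (e_E, e_N) in metres.
    keep_fraction: share of checkpoints kept per trial (assumed value).
    Returns an array with one RMSE per trial.
    """
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    n = len(residuals)
    m = max(1, int(round(keep_fraction * n)))
    rmse = np.empty(n_trials)
    for i in range(n_trials):
        # Draw a random checkpoint subset without replacement.
        idx = rng.choice(n, size=m, replace=False)
        rmse[i] = np.sqrt(np.mean(np.sum(residuals[idx] ** 2, axis=1)))
    return rmse
```

The mean of the returned array corresponds to an average RMSE over randomized subsets, and its spread indicates how strongly the error statistic depends on which checkpoints happen to be included.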
Alongside the greedy and cost-regularized optimization methods, a genetic algorithm (GA) is used as a global optimization baseline to evaluate how close the greedy DI-optimal solution is to the global best. The GA optimizes the same DI-optimality objective under the same boundary and spacing constraints, ensuring a fair comparison at the design level rather than the accuracy level. The GA serves only for validation, not as the primary selection method, since this work focuses on the design criterion itself rather than the choice of optimization algorithm.
In addition to the overall checkpoint RMSE, the temporal robustness analysis provides per-year RMSE values for each assessed acquisition epoch. It also includes their mean, temporal standard deviation, and worst-case value, in accordance with the definitions in Section 3.5.
All experiments were conducted on a workstation with the following specifications:
− Processor: Intel Core i5 @ 2.8 GHz
− RAM: 32 GB DDR4
− Operating System: Windows 11 Home Edition
− Software: QGIS, MATLAB R2024a
4.1. Case Study 1: Enschede City Center
This case study assesses the proposed spatio-temporal GCP network design framework using multi-temporal Google Earth imagery of Enschede’s city center. It includes 16 surveyed points that were initially treated as candidate control points. For each selection strategy, the proposed optimization framework automatically selected the GCP subset, while the remaining surveyed points served as independent checkpoints for that specific configuration.
Figure 3 shows the Enschede city center and the spatial distribution of the 16 candidate ground control points used in Case Study 1. Pixel coordinates were measured independently for 2015, 2020, and 2025, while ground coordinates were assumed to remain constant over time.
Figure 4 shows the distributions of candidate points and GCP networks selected by various strategies. The DI-optimality objective function rises sharply during the initial iterations and then stabilizes, with the marginal information gain falling below the stopping threshold after about six to eight GCPs are chosen. The μ-sweep method finds a knee point at k = 6, matching the optimized greedy solution. The default greedy algorithm picks k = 8 GCPs, resulting in a planimetric RMSE of 7.42 m on independent checkpoints. Both the optimized greedy and μ-sweep settings (k = 6) produce an RMSE of 7.80 m, which is just 0.38 m higher, even though they use 25% fewer control points. Directional errors are evenly spread between the easting and northing components (RMSE_E ≈ 5 m, RMSE_N ≈ 6 m), with no systematic directional bias. Monte Carlo checkpoint testing confirms the stability of these configurations: the average RMSE across one hundred randomized checkpoint subsets is 7.03 m for the default, 7.61 m for the optimized, and 7.61 m for the μ-sweep configuration. When all methods are limited to k = 6, they converge to an RMSE of 7.79 m. Accordingly, the main differences come from network size rather than the selection method.
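The saturation behaviour described above, where the objective rises quickly and selection stops once the marginal gain drops below a threshold, can be illustrated with a simplified greedy routine. Note the substitutions: this sketch uses plain D-optimality (log-determinant of the affine information matrix) as a stand-in for the paper's DI-optimality criterion, omits the boundary and spacing constraints, and uses a hypothetical threshold `min_gain` and ridge term; the function name `greedy_d_optimal` is likewise an assumption.

```python
import numpy as np


def greedy_d_optimal(candidates, k_max, min_gain=0.05, ridge=1e-6):
    """Greedy selection maximizing log det of the affine information matrix.

    candidates: array of shape (n, 2) with candidate point coordinates.
    min_gain: stop once the best marginal log-det gain falls below this
              value (hypothetical threshold, applied only after 3 points).
    Returns the list of selected candidate indices in selection order.
    """
    pts = np.asarray(candidates, dtype=float)
    # Scale to the unit square so the ridge term and the gains are unit-free.
    span = np.ptp(pts, axis=0)
    pts = (pts - pts.min(axis=0)) / np.where(span > 0, span, 1.0)
    design = np.column_stack([np.ones(len(pts)), pts])

    info = ridge * np.eye(3)  # small ridge keeps the matrix invertible early on
    current = np.linalg.slogdet(info)[1]
    selected = []
    while len(selected) < k_max:
        best_idx, best_val = None, -np.inf
        for i in range(len(pts)):
            if i in selected:
                continue
            trial = info + np.outer(design[i], design[i])
            val = np.linalg.slogdet(trial)[1]
            if val > best_val:
                best_idx, best_val = i, val
        if best_idx is None:
            break
        # Once the affine model is estimable, stop when the gain saturates.
        if len(selected) >= 3 and best_val - current < min_gain:
            break
        selected.append(best_idx)
        info = info + np.outer(design[best_idx], design[best_idx])
        current = best_val
    return selected
```

On a toy layout consisting of the four corners of a square plus a few central points, this routine picks the corners first and stops before adding interior points when `min_gain` is set moderately high, which mirrors the early saturation reported for the case study.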
To evaluate the selected GCP configuration in terms of absolute georeferencing accuracy, a benchmark was conducted against randomly sampled subsets of fixed cardinality (k = 6). A total of 2000 random subsets satisfying identical boundary and spacing constraints were evaluated using independent checkpoints. The optimized GCP configuration achieved a planimetric RMSE of 7.80 m, placing it within the lowest 1.2% of the RMSE distribution (see Figure 5). Although some random subsets exhibited lower RMSE values, most performed considerably worse, yielding a median RMSE of 11.45 m. This outcome indicates that the proposed framework produces a GCP configuration that is near-optimal in absolute georeferencing accuracy while also offering robustness and favorable geometric conditioning.
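The random feasible-subset benchmark can be sketched as two small steps: sampling fixed-size subsets that satisfy a feasibility rule, and ranking a candidate RMSE within the resulting distribution. The code below is an assumption-laden illustration: only a minimum-spacing rule is enforced (the boundary constraint is omitted), the percentile is defined as the share of random subsets with strictly lower RMSE, and both function names are hypothetical.

```python
import numpy as np


def sample_feasible_subsets(points, k, min_spacing, n_samples, seed=0, max_tries=10000):
    """Rejection-sample distinct k-subsets whose pairwise distances are >= min_spacing."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    upper = np.triu_indices(k, 1)  # index pairs above the diagonal
    found = set()
    for _ in range(max_tries):
        if len(found) >= n_samples:
            break
        idx = tuple(sorted(rng.choice(len(pts), size=k, replace=False).tolist()))
        sub = pts[list(idx)]
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        if dist[upper].min() >= min_spacing:
            found.add(idx)
    return sorted(found)


def rmse_percentile(candidate_rmse, random_rmses):
    """Percentage of random subsets with strictly lower RMSE than the candidate."""
    random_rmses = np.asarray(random_rmses, dtype=float)
    return 100.0 * float(np.mean(random_rmses < candidate_rmse))
```

Each sampled subset would be used to fit the transformation and evaluated on its own complementary checkpoints; the candidate's rank is then read off with `rmse_percentile`. When the feasible set is small, the sampler returns fewer subsets than requested, so the number of evaluated subsets should be reported alongside the percentile.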
Furthermore, a genetic algorithm (GA) was employed to directly optimize the same robust DI-optimality criterion under identical spatial constraints and a fixed cardinality of k = 6. The GA was used to assess whether the greedy DI-optimal solution is close to the global optimum for the same objective. The GA (see Figure 5) converged to a slightly higher DI score and achieved a marginally better RMSE percentile (0.8%) than the greedy solution (1.5%) on the random feasible-subset benchmark. This small difference indicates that the greedy strategy already performs near the global optimum while requiring considerably less computational effort.
Table 1 presents the per-year RMSE values for the 2015, 2020, and 2025 acquisitions, along with the mean RMSE, temporal standard deviation, and worst-case RMSE for each selection strategy. These findings indicate that temporal variability remains constrained across the evaluated years, thereby confirming that the chosen GCP networks are not confined to a single acquisition epoch.
The default greedy solution achieves the lowest mean RMSE (7.355 m) and the smallest temporal spread (σ_RMSE = 0.096 m), whereas the optimized greedy and μ-sweep solutions yield slightly higher but still stable values (mean RMSE = 7.652 m, σ_RMSE = 0.129 m). All three configurations therefore maintain consistent performance across the tested epochs, with worst-case RMSE values remaining close to their mean levels.
4.2. Case Study 2: University of Baghdad Campus
This case study applies the proposed spatio-temporal GCP network design framework to a second, smaller study area with fewer candidate control points. Thirteen surveyed points were available and were initially treated as candidate control points. As in Case Study 1, GCPs were then automatically selected by the proposed optimization framework, while the remaining points served as independent checkpoints for accuracy assessment (Figure 6). Pixel coordinates were measured independently across multiple acquisition years, while ground coordinates were assumed to remain constant over time.
The set of candidate GCPs includes 13 surveyed locations, of which 9 are classified as boundary points and 4 as interior points. Under all proposed selection strategies, the selected network consistently includes four boundary points. This highlights the importance of spatial enclosure in stabilizing affine transformations. Under the default greedy strategy, the algorithm selects k = 8 GCPs, achieving a strong D-optimality score of J = 66.86. The optimized greedy configuration stops earlier at k = 6 (J = 65.86), while the cost-regularized μ-sweep finds a clear inflection at k = 5 (J = 65.38). Five GCPs form the smallest network that retains most of the available geometric information. Checkpoint-based evaluation shows that reducing network size does not always decrease accuracy. Using the available checkpoint sets for each configuration, the default greedy solution achieves a planimetric RMSE of 2.64 m (RMSE_E = 2.09 m, RMSE_N = 1.60 m). The μ-sweep solution reaches a comparable RMSE of 2.68 m (RMSE_E = 2.08 m, RMSE_N = 1.69 m) while using only five GCPs. Conversely, the optimized greedy configuration results in an RMSE of 2.80 m (RMSE_E = 2.14 m, RMSE_N = 1.80 m). The directional components remain balanced across all methods, indicating no strong systematic anisotropy. Monte Carlo resampling is used to verify these trends: the default configuration has the lowest mean RMSE (2.08 m), followed by μ-sweep (2.66 m) and optimized greedy (2.75 m). A fixed-k comparison at k = 6 further shows that the default selection yields a lower error (2.27 m) than the optimized configuration (2.80 m), while μ-sweep does not produce a k = 6 solution within the tested μ grid. This behavior is common in small networks because removing a few high-influence boundary points can increase variance even while maintaining overall conditioning.
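As a quick consistency check on the reported figures, the planimetric RMSE equals the root sum of squares of its two directional components whenever all three are computed on the same checkpoints. The worked values below assume that the two numbers given in parentheses for each configuration are the easting and northing components; since the identity is symmetric, their order does not affect the result.

```latex
\[
\mathrm{RMSE} = \sqrt{\mathrm{RMSE}_E^{2} + \mathrm{RMSE}_N^{2}}:
\qquad
\sqrt{2.09^{2} + 1.60^{2}} \approx 2.63\ \text{m},
\quad
\sqrt{2.08^{2} + 1.69^{2}} \approx 2.68\ \text{m},
\quad
\sqrt{2.14^{2} + 1.80^{2}} \approx 2.80\ \text{m}.
\]
```

These agree with the reported totals of 2.64 m, 2.68 m, and 2.80 m up to rounding of the components.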
Figure 7 illustrates the spatial distribution of candidate and selected GCPs, along with the progression of the DI-optimality objective, marginal information gain, and the μ-sweep Pareto frontier for Case Study 2.
Overall, Case Study 2 shows that the proposed μ-sweep DI-optimal framework can automatically select a smaller, well-conditioned GCP network (k = 5) with accuracy similar to that of larger heuristic networks, while consistently preserving the critical boundary-enclosure structure.
To assess whether the selected GCP configuration is competitive in terms of absolute accuracy, a random feasible-subset test was conducted with a fixed cardinality (k = 6). A total of 877 randomly generated subsets, all meeting the same boundary and spacing requirements, were tested using independent checkpoints. The optimized solution yielded a planimetric RMSE of 2.80 m, placing it in the lowest 8.5% of the RMSE distribution (Figure 8). While some of the random subsets had lower RMSE values, most were considerably worse, with a median RMSE of 3.44 m. Unlike in Case Study 1, the genetic algorithm failed to improve the absolute accuracy of the selected GCP configuration in Case Study 2. In the same benchmark, the greedy DI-optimal solution fell within the lowest 8.5% of the RMSE distribution, whereas the GA-based solution ranked worse, at 23.2%. This suggests that, for smaller candidate sets with fewer interior points, stochastic global optimization can yield less robust solutions even when optimizing the same objective function.
The temporal robustness indicators for Case Study 2 are summarized in Table 2, including per-year RMSE, mean RMSE, temporal standard deviation, and worst-case RMSE. As in Case Study 1, these results indicate limited temporal variation across the evaluated acquisition years.
The per-year checkpoint RMSE values indicate that all three strategies remain stable in the years 2015, 2020, and 2025, though their accuracy–compactness trade-offs differ. The default greedy solution achieves the lowest mean RMSE (2.160 m) but uses the largest network (k = 8). The μ-sweep solution identifies the smallest network (k = 5) while maintaining a comparable mean RMSE (2.490 m) and the lowest temporal spread (σ_RMSE = 0.187 m). The optimized greedy solution yields a slightly higher mean RMSE (2.581 m) and an intermediate temporal spread (σ_RMSE = 0.235 m). Overall, these results show that the proposed framework can maintain stable multi-epoch performance in the second case study while enabling different trade-offs between network size and accuracy.
4.3. Comparison with Single-Epoch Baseline Designs
To directly assess the influence of the multi-epoch design, an explicit single-epoch baseline experiment was conducted for both case studies. In this configuration, the GCP network was optimized independently using only the 2015, 2020, or 2025 image observations, and subsequently tested across all available years using the same temporal RMSE metrics. The results are summarized in Table 3.
Table 3 shows a case-dependent pattern. In Case Study 1, both the multi-epoch robust design and all three single-epoch baselines converged on the same six-point subset, yielding identical cross-epoch RMSE results. This suggests that the optimal GCP configuration is already stable over time for that dataset. In Case Study 2, the multi-epoch robust design and the single-epoch 2015 baseline again selected the same subset, whereas the single-epoch 2020 and 2025 baselines chose a different six-point network with higher mean and worst-case RMSE values across the years. This demonstrates that the benefits of a multi-epoch design depend on the dataset and are especially valuable when different acquisition years favor different local optima.
Because the checkpoint set depends on the selected subset, these values should be interpreted as configuration-specific cross-epoch evaluations rather than as fixed-test-set comparisons.
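The difference between single-epoch and multi-epoch design can be illustrated with a small max-min search: a subset is scored by its weakest epoch, so a network that is well conditioned in one year but poor in another is penalized. The sketch below is a simplified stand-in, not the paper's method: it uses plain D-optimality instead of the DI criterion, exhaustive enumeration instead of greedy selection, and no boundary or spacing constraints; the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def log_det_info(pixel_xy):
    """Log-determinant of the affine information matrix for one epoch."""
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    design = np.column_stack([np.ones(len(pixel_xy)), pixel_xy])
    sign, logdet = np.linalg.slogdet(design.T @ design)
    return logdet if sign > 0 else -np.inf


def robust_best_subset(pixels_by_epoch, k):
    """Exhaustively find the k-subset maximizing the worst-epoch log-det.

    pixels_by_epoch: mapping from year to an (n, 2) array of pixel coordinates,
                     with the same point ordering in every epoch.
    Passing a single epoch reproduces a single-epoch design.
    """
    epochs = [np.asarray(px, dtype=float) for px in pixels_by_epoch.values()]
    n = len(epochs[0])
    best, best_score = None, -np.inf
    for subset in combinations(range(n), k):
        idx = list(subset)
        # Score the subset by its least informative epoch (max-min design).
        score = min(log_det_info(px[idx]) for px in epochs)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score
```

Calling `robust_best_subset({2020: pixels_2020}, k)` and `robust_best_subset(all_epochs, k)` and comparing the returned subsets mirrors the comparison in Table 3: when both calls return the same subset, the multi-epoch design offers no additional benefit for that dataset. Exhaustive enumeration is only practical for small candidate pools such as those used here.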
5. Discussion
Classical photogrammetric guidelines typically recommend placing GCPs near the corners and the center to stabilize geometric correction. However, these suggestions are mainly heuristic and lack formal optimal design criteria. In contrast, the proposed framework models GCP selection as a robust optimal experimental design problem under an affine transformation model. According to optimal design theory, optimal configurations typically concentrate on boundary points to improve geometric reliability. In affine transformation models, the Fisher Information Matrix depends on the second-order spatial moments of the control-point coordinates. Points near the convex hull enlarge these moments and thus increase the determinant of the information matrix, thereby reducing parameter variance. This explains why boundary-dominated configurations naturally emerge under D-optimal design.
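The link between point spread and the determinant can be made explicit with a short derivation sketch. The assumptions here are not stated in the text: independent, equally weighted coordinate observations with variance $\sigma^2$, and $k$ selected GCPs with image coordinates $\mathbf{p}_i = (x_i, y_i)^\top$. The symbols $\mathbf{M}$ and $\mathbf{S}$ are introduced only for this illustration, and $\mathbf{M}$ is the information matrix for the three parameters of one ground coordinate (the same block applies to the other).

```latex
\[
\mathbf{M}
= \frac{1}{\sigma^{2}} \sum_{i=1}^{k}
\begin{bmatrix} 1 \\ x_i \\ y_i \end{bmatrix}
\begin{bmatrix} 1 & x_i & y_i \end{bmatrix}
= \frac{1}{\sigma^{2}}
\begin{bmatrix}
k & \sum x_i & \sum y_i \\
\sum x_i & \sum x_i^{2} & \sum x_i y_i \\
\sum y_i & \sum x_i y_i & \sum y_i^{2}
\end{bmatrix},
\qquad
\det \mathbf{M} = \frac{k}{\sigma^{6}} \det \mathbf{S},
\quad
\mathbf{S} = \sum_{i=1}^{k} (\mathbf{p}_i - \bar{\mathbf{p}})(\mathbf{p}_i - \bar{\mathbf{p}})^{\top}.
\]
```

Under these assumptions the determinant grows with the scatter of the points about their centroid. For instance, four points at the corners of a unit square give $k \det \mathbf{S} = 4$, whereas the same four-point pattern shrunk to a square of side 0.2 gives $k \det \mathbf{S} = 0.0064$, a factor of 625 lower.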
This pattern is evident in both case studies, in which all optimized configurations contain the minimum number of necessary boundary points and achieve stable georeferencing accuracy without dependence on centrally located GCPs (Figure 4 and Figure 7). Once the convex hull of the scene is sufficiently constrained, interior points contribute only marginal additional information, as evidenced by the early saturation of the marginal information-gain curves in Case Study 1 and the boundary-focused solutions in Case Study 2. An important insight from the experimental results is that the proposed DI-optimality criterion penalizes configurations with poor stability of interior predictions. In Case Study 1, the optimized greedy and μ-sweep solutions attain RMSEs similar to the default greedy method, even with fewer GCPs. This means that interior prediction accuracy stays consistent despite network simplification (Figure 4). Although DI-optimal solutions do not favor interior GCPs, they avoid geometries that result in unstable interior-point predictions once the boundary enclosure requirement is satisfied. This explains why boundary-oriented solutions can still offer competitive accuracy, particularly in compact affine domains.
The saturation of marginal gains seen in Case Study 1 (Figure 4) has important practical consequences. Beyond six GCPs, additional control points offered only slight improvements in geometric information, mainly adding redundancy rather than meaningful conditioning benefits. The 5% accuracy difference between the eight-point and six-point networks (7.42 m vs. 7.80 m) falls well within the typical positional uncertainty of Google Earth imagery itself, which ranges from 5 to 15 m according to prior studies [3,4,5]. Additionally, the convergence of all three methods to nearly identical RMSE at k = 6 (7.79 m) confirms that observed performance differences are related to network size rather than selection strategy. Once the boundary enclosure is satisfied, the greedy DI-optimal criterion yields near-optimal point configurations.
In Case Study 2, a slightly smaller candidate pool (13 points compared to 16 in Case Study 1) made the trade-offs between network size and selection strategy more evident. The μ-sweep approach identified a minimal network of five GCPs (Figure 7, Pareto frontier) that achieved accuracy comparable to larger networks: RMSE = 2.68 m for k = 5 versus 2.64 m for k = 8 (default greedy), a difference of just 0.04 m. This 37.5% reduction in network size, with comparable accuracy, supports the effectiveness of cost-regularized optimization in identifying the smallest viable networks. However, the optimized greedy method shows a noticeably higher error at k = 6 (RMSE = 2.80 m compared to 2.27 m for the default greedy at the same size; see Figure 7 for comparison). This highlights how sensitive small networks are to point selection, since removing just a few high-leverage boundary points can significantly raise prediction variance. The spacing distribution (Figure 7, bottom-right panel) confirms that all selected networks meet the minimum spacing constraint, indicating that configuration geometry, rather than point density, drives the differences in accuracy.
Comparison of greedy and genetic algorithm optimization highlights key algorithmic trade-offs. Although the GA achieves a slight improvement in the DI-optimal objective for the larger candidate set in Case Study 1 (Figure 5, lowest 0.8% vs. 1.5% for greedy), the gain is minimal, supporting the conclusion that greedy optimization provides an efficient, near-optimal solution for robust GCP network design. More importantly, global optimization with genetic algorithms does not necessarily enhance robustness and can even yield less stable solutions: in Case Study 2, the GA-based solution ranked at 23.2% of the RMSE distribution, compared with 8.5% for the greedy DI-optimal solution (Figure 8). This reversal in performance with the smaller candidate set suggests that stochastic global optimization may produce less robust solutions when boundary constraints strongly limit the feasible solution space. Greedy optimization tends to be more dependable in such scenarios because it has a deterministic bias toward high-leverage boundary points.
The random feasible-subset benchmark (Figure 5 and Figure 8) shows that the proposed framework not only optimizes an information-based criterion but also identifies GCP configurations that are competitive in absolute georeferencing accuracy. Although a few random subsets achieve slightly lower RMSE values, these configurations are not guaranteed to be geometrically stable or reliable across different acquisition epochs. The optimized solutions rank in the top decile of RMSE in both case studies (top 1.2% in Case Study 1, top 8.5% in Case Study 2) while also maximizing robust DI-optimality. This demonstrates that robustness and good conditioning can be achieved with only a small trade-off in single-epoch absolute accuracy, which is a worthwhile compromise for multi-temporal applications where temporal stability is critical.
The RMSE values of about 7–8 m in Case Study 1 and 2–3 m in Case Study 2 match the expected positional accuracy of Google Earth imagery. These figures align with the reported accuracies of medium-resolution optical satellite data such as Sentinel-2, ASTER, and Landsat. The consistent RMSE across acquisition years further demonstrates that the framework improves robustness rather than absolute accuracy, as intended. Although Google Earth imagery’s absolute positional accuracy is fundamentally limited by sensor geolocation and mosaicking processes, the results show that optimal GCP network design remains vital under noisy observation conditions. In both case studies, reducing the number of control points does not significantly degrade accuracy once geometric enclosure is achieved, confirming that redundant GCPs primarily add cost without providing additional stability. The DI-optimal framework, therefore, aims to provide reliable parameter estimation while minimizing redundancy, which is especially useful in practical applications where GCP collection is costly. Finally, the results indicate that incorporating multi-temporal imagery enhances robustness by reducing sensitivity to year-to-year variability. By optimizing GCP selection based on the worst-performing epoch, the framework produces networks that remain stable over multiple years, as evidenced by the small temporal variation in RMSE across both case studies. This confirms that multi-temporal information effectively reduces overfitting to individual acquisitions.
The temporal robustness tables further show that the selected networks remain stable across the years evaluated. In Case Study 1, temporal variability is consistently low across all methods, with σ_RMSE values ranging from 0.096 to 0.129 m. In Case Study 2, the μ-sweep solution shows the least temporal variation (σ_RMSE = 0.187 m) and utilizes the smallest network size (k = 5). These results confirm that the proposed framework maintains stable multi-epoch performance while allowing for different trade-offs between efficiency and accuracy.
The framework is evaluated using an affine transformation model, which is suitable for orthorectified near-nadir imagery and compact urban environments such as those examined here. The candidate GCP pools (16 and 13 surveyed points) reflect realistic field-survey conditions where control collection is limited. Pixel coordinates were systematically extracted across epochs to ensure comparability. However, manual measurement procedures may introduce operator-dependent uncertainties and repeatability issues that are not currently modeled as distinct stochastic effects. Developing a method that explicitly accounts for image-measurement uncertainty is an important direction for future research. Additionally, the framework is planned to be extended to larger candidate sets and higher-order transformation models.
Within this scope, the proposed framework is well-suited for orthorectified near-nadir imagery of urban areas where the registration is adequately captured by a low-order affine model and the number of candidate control points is constrained by real-world field conditions. The two experiments showed that the method can consistently identify compact GCP networks that enclose the scene boundaries and maintain stable performance across multiple time epochs. However, broader applications should be approached with caution. The current experiments do not yet establish performance for strongly non-linear distortions, spatially heterogeneous error structures, very large candidate pools, or substantially different landscape types. These cases remain important targets for future validation.
An additional limitation is that the proposed Fisher-information formulation does not explicitly incorporate epoch-dependent measurement covariance, visibility, or point-specific reliability. Consequently, temporal robustness in this study is demonstrated through multi-epoch evaluation and worst-case performance assessment, rather than through a comprehensive epoch-dependent information-matrix model.
Accordingly, the experimental results provide clear answers to the research questions posed in the Introduction:
RQ1: Can the proposed multi-temporal GCP design framework ensure robust and stable geometric correction across different acquisition epochs?
The additional single-epoch baseline experiment shows that the answer is case-dependent. In Case Study 1, the multi-epoch and single-epoch designs converged to the same GCP subset and therefore produced identical cross-epoch accuracy. However, in Case Study 2, the multi-epoch design outperformed the single-epoch 2020 and 2025 baselines in mean and worst-case RMSE across years, while matching the single-epoch 2015 solution. These results indicate that the proposed framework can provide a practical advantage when individual epochs favor different GCP subsets, whereas in temporally stable scenes, it may converge to the same solution as single-epoch optimization.
RQ2: Can an optimized GCP network sustain consistent geometric accuracy across different temporal epochs of the same urban area?
Yes. Both case studies demonstrate temporal consistency: Case Study 1 achieves stable RMSE values across the three evaluated epochs with only k = 6 GCPs, while Case Study 2 also maintains stable accuracy across epochs with k = 5 GCPs. The optimized networks perform reliably across all acquisition years without requiring epoch-specific reconfiguration.
RQ3: How does the proposed GCP optimization framework perform across geographically distinct urban environments with varying spatial characteristics?
The framework generalized effectively across the two study areas. Despite differences in spatial extent, candidate pool size, and urban morphology, it produced compact, boundary-focused GCP networks with competitive accuracy in both case studies. The automatically determined network size (k = 6 for Case Study 1, k = 5 for Case Study 2) adapts to scene characteristics without manual intervention.
6. Conclusions
This paper presents a multi-epoch, robust method for automatically designing GCP networks for georeferencing Google Earth images. The approach applies optimal experimental design under an affine transformation model and assesses candidate configurations over several acquisition years to reduce overfitting to specific images. It combines multi-temporal image observations with a worst-case optimization strategy and uses a hybrid DI-optimality criterion to balance transformation performance against the accurate prediction of interior points. Results from two urban case studies indicate that the compact, well-conditioned GCP networks designed by this method achieve absolute georeferencing accuracy comparable to that expected from larger, manually designed control networks. These studies also indicate that the geometric leverage available from image boundaries provides sufficient stability in the affine transformation process; once spatial enclosure is established, additional GCPs primarily add redundancy. Across the years evaluated, the proposed framework demonstrates consistent georeferencing performance, indicating stability despite the temporal variability observed in the Google Earth imagery used in the tests.
Although the proposed design framework is demonstrated using Google Earth imagery, it is directly applicable to unmanned aerial vehicle (UAV) orthomosaics, historical aerial archives, and medium-resolution satellite data where low-order transformation models are used. The methodology is therefore model-driven rather than platform-specific. The current findings demonstrate the practical usefulness of the proposed framework for the tested multi-temporal urban scenarios, although broader comparative validation remains a key area for future research.
Future research will concentrate on extending the framework to incorporate higher-order transformation models. In addition, spatially variable uncertainty will be incorporated and performance will be evaluated across larger, more diverse geographic regions. Overall, the proposed methodology provides a clear and reproducible foundation for designing robust GCP networks in multi-temporal geospatial analysis.