Archival Aerial Images Georeferencing: A Geostatistically-Based Approach for Improving Orthophoto Accuracy with Minimal Number of Ground Control Points

: Georeferenced archival aerial images are key elements for the study of landscape evolution in the scope of territorial planning and management. The georeferencing process proceeds by applying to photographs advanced digital photogrammetric techniques integrated along with a set of ground truths termed ground control points (GCPs). At the end of that stage, the accuracy of the ﬁnal orthomosaic is assessed by means of root mean square error (RMSE) computation. If the value of that index is deemed to be unsatisfactory, the process is re-run after increasing the GCP number. Unfortunately, the search for GCPs is a costly operation, even when it is visually carried out from recent digital images. Therefore, an open issue is that of achieving the desired accuracy of the orthomosaic with a minimal number of GCPs. The present paper proposes a geostatistically-based methodology that involves performing the spatialization of the GCP errors obtained from a ﬁrst gross version of the georeferenced orthomosaic in order to produce an error map. Then, the placement of a small number of new GCPs within the sub-areas characterized by the highest local errors enables a ﬁner georeferencing to be achieved. The proposed methodology was applied to 67 historical photographs taken on a geo-morphologically complex study area, located in Southern Italy, which covers a total surface of approximately 55,000 ha. The case study showed that 75 GCPs were su ﬃ cient to garner an orthomosaic with coordinate errors below the chosen threshold of 10 m. The study results were compared with similar works on georeferenced images and demonstrated better performance for achieving a ﬁnal orthomosaic with the same RMSE at a lower information rate expressed in terms of nGCPs / km 2 .


Introduction
The study of the evolutionary dynamics of environmental landscapes is typically performed by means of archival aerial image analysis. From such photographs, a large range of details can be grasped about areas of interest at different times that can provide support to many study fields, such as analyses of land-use [1][2][3][4][5], urban area [6], glacier volume and lake surface [7], and landslide evolution [8][9][10] changes. Despite AAPs' (archival aerial photos) informative potential, so far, their usage has been scarce owing to the processing difficulty related to the lack of some key information (e.g., external and

Study Area and Data Description
The study area covers a surface of approximately 55,000 ha, and is located on the border between two regions, Puglia and Basilicata, in Southern Italy. It is included in the 'Murge', a low limestone plateau dominating the landscape of the central part of the Puglia region. The delineation of the study area was carried out considering the hydrographic basin system of the streams crossing the western part of the Regional Natural Park 'Terra delle Gravine'.
The area can be divided into three different longitudinal strips that differ in terms of altitude, morphology, land cover, and presence of forest vegetation (Figure 1a).
The northern zone, between 500 and 380 asl, is characterized by hills generally with south-and south-east exposure, where there are forests of Quercus trojana Webb. The central area, between 400 and 220 asl, consists of a slightly wavy plain, used almost everywhere for agricultural crops [33]. The southern zone, between 350 and 160 asl, is morphologically more complex owing to the presence of a series of deep incisions that appear similar to North American canyons [34], termed 'gravine' (Figure 1b). The vegetation mainly comprises forest communities characterized by an accentuated physiognomic and compositional heterogeneity [33]. The transects reported in Figure 2 show the Remote Sens. 2020, 12 The northern zone, between 500 and 380 asl, is characterized by hills generally with south-and south-east exposure, where there are forests of Quercus trojana Webb. The central area, between 400 and 220 asl, consists of a slightly wavy plain, used almost everywhere for agricultural crops [33]. The southern zone, between 350 and 160 asl, is morphologically more complex owing to the presence of a series of deep incisions that appear similar to North American canyons [34], termed 'gravine' (Figure 1b). The vegetation mainly comprises forest communities characterized by an accentuated physiognomic and compositional heterogeneity [33]. The transects reported in Figure 2 show the inhomogeneity of the study area and the different degrees of morphological complexity of the three zones reported in Figure 1.  The northern zone, between 500 and 380 asl, is characterized by hills generally with south-and south-east exposure, where there are forests of Quercus trojana Webb. The central area, between 400 and 220 asl, consists of a slightly wavy plain, used almost everywhere for agricultural crops [33]. The southern zone, between 350 and 160 asl, is morphologically more complex owing to the presence of a series of deep incisions that appear similar to North American canyons [34], termed 'gravine' (Figure 1b). The vegetation mainly comprises forest communities characterized by an accentuated physiognomic and compositional heterogeneity [33]. The transects reported in Figure 2 show the inhomogeneity of the study area and the different degrees of morphological complexity of the three zones reported in Figure 1. The present study was based on 67 AAPs captured in black and white between 1954 and 1955 during the first Italian National Planimetric Survey. The photographs (23 cm x 23 cm) were taken with metric cameras at a height of 6000 m with an acquisition scale of 1:34,000. The analogue AAPs were digitalized by means of an Epson Expression 1640 XL nonphotogrammetric scanner with a resolution of 800 dpi (or equivalently, 31.75 µm). AAPs were purchased already digitalized from the IAGO/IGM The present study was based on 67 AAPs captured in black and white between 1954 and 1955 during the first Italian National Planimetric Survey. The photographs (23 cm × 23 cm) were taken with metric cameras at a height of 6000 m with an acquisition scale of 1:34,000. The analogue AAPs were digitalized by means of an Epson Expression 1640 XL nonphotogrammetric scanner with a resolution of 800 dpi (or equivalently, 31.75 µm). AAPs were purchased already digitalized from the IAGO/IGM retailer (Italian Army Geographical Office/Istituto Geografico Militare). Scanned images had different orientations relative to the original images, and they had an overlap of approximately 60%.
Furthermore, a digital terrain model (DTM) was obtained by merging DTMs of Apulian and Basilicata regions, which have a resolution of 8 m and 5 m, respectively. Digital orthoimages of the 2016 flight for the Puglia region and that of 2013 for the Basilicata region were used. The two orthophotographs have a scale of 1:5000 with a 50 cm pixel resolution.

Methodology
The proposed workflow can be divided into three main steps ( Figure 3): (i) orthomosaic production and GCP selection; (ii) absolute orthoimage orientation and GCPs' coordinates error assessment; and (iii) spatial analysis and error map production. Orthomosaic production is described in Section 3.1.1, and GCP selection is outlined in Section 3.1.2. Variables considered for the improvement of the final orthomosaic accuracy (Section 3.1.3) are the coordinate errors along the X and Y directions that were subjected to spatial analysis (Section 3.1.4). The present study was based on 67 AAPs captured in black and white between 1954 and 1955 during the first Italian National Planimetric Survey. The photographs (23 cm x 23 cm) were taken with metric cameras at a height of 6000 m with an acquisition scale of 1:34,000. The analogue AAPs were digitalized by means of an Epson Expression 1640 XL nonphotogrammetric scanner with a resolution of 800 dpi (or equivalently, 31.75 µm). AAPs were purchased already digitalized from the IAGO/IGM retailer (Italian Army Geographical Office/Istituto Geografico Militare). Scanned images had different orientations relative to the original images, and they had an overlap of approximately 60%.
Furthermore, a digital terrain model (DTM) was obtained by merging DTMs of Apulian and Basilicata regions, which have a resolution of 8 m and 5 m, respectively. Digital orthoimages of the 2016 flight for the Puglia region and that of 2013 for the Basilicata region were used. The two orthophotographs have a scale of 1:5000 with a 50 cm pixel resolution.

Methodology
The proposed workflow can be divided into three main steps ( Figure 3): (i) orthomosaic production and GCP selection; (ii) absolute orthoimage orientation and GCPs' coordinates error assessment; and (iii) spatial analysis and error map production. Orthomosaic production is described in Section 3.1.1, and GCP selection is outlined in Section 3.1.2. Variables considered for the improvement of the final orthomosaic accuracy (Section 3.1.3) are the coordinate errors along the X and Y directions that were subjected to spatial analysis (Section 3.1.4). In conclusion, by analyzing the obtained error maps, areas overcoming a given user-threshold are sought (Section 3.1.4). If these areas are found, the GCP number is then increased (Section 3.1.5); otherwise, the final fine georeferenced orthomosaic is gained. In conclusion, by analyzing the obtained error maps, areas overcoming a given user-threshold are sought (Section 3.1.4). If these areas are found, the GCP number is then increased (Section 3.1.5); otherwise, the final fine georeferenced orthomosaic is gained.

SfM and Orthomosaic Production
The conventional stereo-approach requires information regarding interior and exterior camera orientation, such as focal length, radial distortion, and fiducial marks [7] located at the four angles of each image. In our case, scans lack the required information. The data characteristics resemble those used in [7,20], and the authors of those contributions highlighted that such cases can be successfully managed by means of the structure from motion-multi-view stereo (SfM-MVS) approach. The basic difference between SfM-MVS and classical photogrammetry consists of three aspects: (i) features that can be automatically identified and matched in images at differing scales, viewing angles, and orientations are considered; (ii) the equations used in the algorithm can be solved without information of camera positions or GCPs, although both can be added and used; and (iii) camera calibration can be automatically solved or refined during the process. The SfM-MVS workflow can be summarized in a number of steps, as follows. During the initial processing, a binary descriptor of the SIFT (scale-invariant feature transform) algorithm is used to extract and then match the features from photographs. On the basis of these features and GCPs, (i) an iterative routine of camera self-calibration, (ii) automatic aerial triangulation (AAT), and (iii) BBA to determine and optimize interior and exterior parameters are performed. After initial processing, maximum point cloud densification is carried out, based on multi-view stereo (MVS) [35][36][37][38]. The final processing steps concern the derivation of a digital elevation model (DEM) and the orthomosaic.
The above-mentioned procedure has been implemented in the software Pix4Dmapper [39]; a commercial product that has a primary point of strength of processing large numbers of images without any knowledge of the camera's calibration parameters [40].
The procedure includes the following steps: -Image matching In each image, key points (points of interest characterized by high contrast or particular textures in the images) are automatically recognized and tied together by the software in relation to their neighbourhood characteristics [7], using SfM algorithms. The number of key points depends on the following: (i) the size of the images and (ii) the visual content. At this step, the image size where the key points were searched and extracted was set to 1 2 , whereas the minimum number of key points to search in each image was set at 10,000 to provide more calibrated images.
-Performing bundle adjustment Using these key points, the AAT and BBA algorithms were performed to recognize the position and orientations of the camera [41]. During this step, the outputs of SfM-MVS are scaled and georeferenced based on GCPs. The 3D point cloud was densified by setting in the software the point density to optimal, which means that every 3D point was calculated for each 4/image scale pixel, and the minimum number of matching among images was set to 2.
-Creation of final orthomosaic At the end of the three stages of SfM-MVS, the mosaic of AAPs was orthorectified on the basis of the DEM obtained from the photogrammetric process.

GCP Selection
The GCP selection was carried out considering, for the entire study area, initially 50 GCPs that were easily identifiable, such as man-made structures [17], in order to balance the need to obtain an initial gross georeferencing of the orthomosaic and a nearly sufficient points number for geostatistical treatment. The amount of GCPs considered is a medium-low number with respect to that reported in the scientific literature [42][43][44][45][46], because it is related to the local spatial variability of the considered variable, which, for the case at hand, is not excessively large.
GCPs were found manually after a visual comparison between old scanned photographs and present orthoimages, which returned a substantially random GCPs sampling [47] with an initial ratio of 50/550~0.1 GCPs/km 2 . The planimetric coordinates of the GCPs (X and Y) were determined in a geographical information system (GIS) using the digital orthophotos, whereas Z coordinates were extrapolated from the DTMs. The reference system used for frame image georeferencing was WGS84/UTM 33N. After the geostatistical analysis, the second stratum of GCPs is selected within the areas characterized by high local errors. In conclusion, the set of GCPs is sampled in a stratified fashion with a first random stratum and a second preferential stratum.

Variables of Interest
The error value at each GCP coordinate (Error X and Error Y) was computed as the difference between the coordinates estimated by Pix4Dmapper in the archival orthomosaic and the real ones identified by the user in a more recent orthophotograph.
In more formal language, an error at a spatial point with coordinates x can be modelled as a sum of two components (1): where η is the spatially auto-correlated component, ε is the pure random spatially un-correlated component (white noise), and D is the study area [48]. The white noise component represents the uncompressible error, whereas η (auto-correlated error) can be suitably modelled and reduced (filtered out) by adding new GCPs. Errors in the X and Y directions were selected as variables to be modelled. The variable error Z was not considered during the geostatistical analysis because the study purpose, being oriented to forestry scope, was to assess only the planimetric accuracy of the final orthomosaic. The maximum error (in absolute value) allowed in terms of accuracy was set to 10 m for both the X and Y coordinates. This threshold was considered appropriate by the authors for the purposes of the study.

Spatial Analysis
Spatial analysis scope includes the following: • statistics for checking the spatial auto-correlation with Moran index; • techniques to analyze and model (variography) the spatial heterogeneity of the variables of interest; • methods to perform predictions of considered variables over the study area (kriging) [46].

-Moran Index
The predisposition of the considered variables to be spatialized is commonly checked using Moran's autocorrelation index (2), which is an extension of the Pearson product-moment correlation coefficient [49]: j=1 ω i,j Moran's index computes a weighted correlation of a variable against itself, where the weights are related to the variable's spatial arrangement [50].

-Geostatistics
The geostatistical analysis is applied for evaluating the spatial heterogeneity of the considered variables (errors along the Cartesian axis) through the structural analysis (variography) and producing maps by means of spatial prediction methods (kriging).
Variography is a two-stage approach, where a function derived from observations, namely the experimental variogram (the grey dotted line in Figure 4), is used to derive a variogram model (the solid line in Figure 4). This model is the mathematical law ruling the spatial heterogeneity of the considered variable [48].
Remote Sens. 2020, 12, 2232 8 of 23 Variography is a two-stage approach, where a function derived from observations, namely the experimental variogram (the grey dotted line in Figure 4), is used to derive a variogram model (the solid line in Figure 4). This model is the mathematical law ruling the spatial heterogeneity of the considered variable [48]. . Experimental and variogram model with corresponding parameters. The total sill ( ) corresponds to the sum of the nugget effect and partial sill (the maximum variability degree of the process); the range (α) can be interpreted as the distance beyond which the spatial correlation becomes negligible [48].

-Variogram fitting
The computational process from experimental variogram to variogram model is referred to as variogram fitting The selection of the best theoretical model is performed by means of an index of goodness-of-fitting termed , which measures the difference between the discrete experimental variogram and the fitted theoretical model at the same lag value (3): where is the variogram model value for the distance ℎ , ℎ is the analogous value of the experimental variogram, and N is the number of point pairs for lag i [51]. The selected model is that having the lowest value of among all those analyzed during the optimization process. -Kriging Variography requires a complementary interpolation method to provide predictions of the variables of interest at unsampled points. The main predictive method in the geostatistical framework is kriging. It is substantially a weighted average, the weights of which are derived from the variogram model. Formula (4) refers to ordinary kriging, which is the most commonly used predictive geostatistical method: where is the spatial location associated to the desired prediction, are kriging weights, and is the observations at the generic location . On the basis of its definition, kriging is a best linear unbiased predictor (BLUP) [46]. Afterwards, a spatial map of predictions can be produced.

-Cross-validation and validation stages
Cross-validation and validation are two quite different processes used for model validation. In the first, validation data are the same as those used previously to calibrate the variogram model; whereas in the second case, validation data are different from those used for model calibration. corresponds to the sum of the nugget effect and partial sill (the maximum variability degree of the process); the range (α) can be interpreted as the distance beyond which the spatial correlation becomes negligible [48].

-Variogram fitting
The computational process from experimental variogram to variogram model is referred to as variogram fitting The selection of the best theoretical model is performed by means of an index of goodness-of-fitting termed SSErr, which measures the difference between the discrete experimental variogram and the fitted theoretical model at the same lag value (3): where γ theor (i) is the variogram model value for the distance h i , γ exp (h i ) is the analogous value of the experimental variogram, and N i is the number of point pairs for lag i [51]. The selected model is that having the lowest value of SSErr among all those analyzed during the optimization process. -Kriging Variography requires a complementary interpolation method to provide predictions of the variables of interest at unsampled points. The main predictive method in the geostatistical framework is kriging. It is substantially a weighted average, the weights of which are derived from the variogram model. Formula (4) refers to ordinary kriging, which is the most commonly used predictive geostatistical method:ẑ where s 0 is the spatial location associated to the desired prediction, λ i are kriging weights, and z(s i ) is the observations at the generic location s i . On the basis of its definition, kriging is a best linear unbiased predictor (BLUP) [46]. Afterwards, a spatial map of predictions can be produced. -

Cross-validation and validation stages
Cross-validation and validation are two quite different processes used for model validation. In the first, validation data are the same as those used previously to calibrate the variogram model; whereas in the second case, validation data are different from those used for model calibration.
Remote Sens. 2020, 12, 2232 9 of 23 (Leave-one-out) cross-validation is a procedure in which, given Z(s i ) a set of n observations at spatial locations s i , the method extracts one observation Z(s i ) at a time and makes the prediction of that observation, referred to asẐ s j −i , using the remaining observations ( Figure 5).
Remote Sens. 2020, 12, 2232 9 of 23 (Leave-one-out) cross-validation is a procedure in which, given a set of n observations at spatial locations , the method extracts one observation at a time and makes the prediction of that observation, referred to as ^ , using the remaining observations ( Figure 5). In the present study, the validation stage was performed using another 25 CPs, selected independently from the GCPs used for the georeferencing. Finally, a scatterplot of the predictions versus the observations can be drawn to determine whether they align along the first quadrant bisector, which would indicate a good capability of the model for capturing the features of the spatial process [48].

-Lin's concordance index
In general, the agreement between predictions and observations is carried out by means of Pearson or Spearman indices. In the present paper, Lin's concordance (agreement) coefficient is preferred to quantify the goodness of model adaptation, checking the agreement between observed versus predicted values. Lin's concordance coefficient provides a measure of overall accuracy that takes into account both bias correction (closeness of the prediction to the actual value) and precision (variability in the predictions). Bias correction is calculated from two bias measures, constant (location-shift) and systematic (scale-shift) bias [52]. The formula as follows (5): where r is the Pearson's coefficient; • are the standard deviations of the true and predicted values, respectively; and are the means; and and are the variances of the true and predicted values, respectively.

GCP Increasing Strategy
If the local accuracy assessed by means of the kriging maps of error X and error Y shows values above the given threshold, to improve that accuracy, a number of new GCPs will be introduced. The applied strategy to increase the GCP number works as follows: the areas surrounding the worst local errors for error X and error Y maps are delineated; each area is multiplied by the ratio between the initial number of GCPs and the study area size to obtain the number of GCPs to be introduced (0.1 for the case at hand); these new GCPs are located with priority within the weak areas where man-made structures are recognizable, otherwise they are placed in the closest allowed positions. In the present study, the validation stage was performed using another 25 CPs, selected independently from the GCPs used for the georeferencing. Finally, a scatterplot of the predictions versus the observations can be drawn to determine whether they align along the first quadrant bisector, which would indicate a good capability of the model for capturing the features of the spatial process [48].
-Lin's concordance index In general, the agreement between predictions and observations is carried out by means of Pearson or Spearman indices. In the present paper, Lin's concordance (agreement) coefficient is preferred to quantify the goodness of model adaptation, checking the agreement between observed versus predicted values. Lin's concordance coefficient provides a measure of overall accuracy that takes into account both bias correction (closeness of the prediction to the actual value) and precision (variability in the predictions). Bias correction is calculated from two bias measures, constant (location-shift) and systematic (scale-shift) bias [52]. The formula as follows (5): where r is the Pearson's coefficient; s X ·s Y are the standard deviations of the true and predicted values, respectively; m X and m Y are the means; and s 2 Y and s 2 X are the variances of the true and predicted values, respectively.

GCP Increasing Strategy
If the local accuracy assessed by means of the kriging maps of error X and error Y shows values above the given threshold, to improve that accuracy, a number of new GCPs will be introduced. The applied strategy to increase the GCP number works as follows: the areas surrounding the worst local errors for error X and error Y maps are delineated; -each area is multiplied by the ratio between the initial number of GCPs and the study area size to obtain the number of GCPs to be introduced (0.1 for the case at hand); -these new GCPs are located with priority within the weak areas where man-made structures are recognizable, otherwise they are placed in the closest allowed positions.

Stage 1: 50 GCPs
According to the methodology described at the specific section, at the first stage, the orthomosaic with a ground sampling distance (GSD) of 1.73 m. was generated, based on the technical characteristics of the considered photographs ( Figure 6).

Stage 1: 50 GCPs
According to the methodology described at the specific section, at the first stage, the orthomosaic with a ground sampling distance (GSD) of 1.73 m. was generated, based on the technical characteristics of the considered photographs ( Figure 6). In the first step, 50 GCPs were randomly selected over the area of interest for georeferencing such an orthomosaic.

Basic Statistics
At the end of the georeferencing process, an error matrix (50 x 3) was provided containing the errors of each GCP (termed error X, error Y, and error Z, respectively) along the (X, Y, Z) axes and three summary RMSE values ( Table 1). As already mentioned, only the error X and error Y variables were analyzed; nevertheless, for the sake of completeness, the statistics related to error Z will also be reported at each process stage. From comparison between RMSE from GCPs and CPs, it emerges that the differences are not particularly marked, even if the RMSEs of CPs are generally slightly larger, as expected. It is noteworthy that all the RMSE values (Table 1) are far below the chosen threshold (10 m), but it should be borne in mind that such an index, being based on an average value, hides possible large local errors. According to such an index, the error Z value appears to be the most uncertain, in contrast to error X. Conversely, the general analysis of GCP errors (Table 2) can be helpful in understanding the spatial distribution of the large errors to ensure a final orthomosaic with the desired degree of accuracy over the entire study area. In the first step, 50 GCPs were randomly selected over the area of interest for georeferencing such an orthomosaic.

Basic Statistics
At the end of the georeferencing process, an error matrix (50 × 3) was provided containing the errors of each GCP (termed error X, error Y, and error Z, respectively) along the (X, Y, Z) axes and three summary RMSE values ( Table 1). As already mentioned, only the error X and error Y variables were analyzed; nevertheless, for the sake of completeness, the statistics related to error Z will also be reported at each process stage. From comparison between RMSE from GCPs and CPs, it emerges that the differences are not particularly marked, even if the RMSEs of CPs are generally slightly larger, as expected. It is noteworthy that all the RMSE values (Table 1) are far below the chosen threshold (10 m), but it should be borne in mind that such an index, being based on an average value, hides possible large local errors. According to such an index, the error Z value appears to be the most uncertain, in contrast to error X. Conversely, the general analysis of GCP errors (Table 2) can be helpful in understanding the spatial distribution of the large errors to ensure a final orthomosaic with the desired degree of accuracy over the entire study area. In fact, from the observation of the extreme (maximum and minimum) values, it can be understood if GCPs errors that overcome the given error threshold exist. In fact, all three maximum values of the considered variables overcome the threshold.

Spatial Analysis
For assessing the extent of weak areas of such high error values, geostatistics provides an effective working environment and tools capable of delineating such areas populated by values above the given threshold.
Because geostatistics requires data Gaussianity, skewness and kurtosis were analyzed. The skewness is always different from zero, indicating an asymmetric shape of the empirical distributions, whereas the analysis of kurtosis values highlights that both the distributions (error X and error Y) have high peaks compared with the normal one, particularly accentuated for the second variable ( Table 2). The errors along the Z-axis (error Z) were neglected in the following analysis, as mentioned previously.
The quantile-quantile plots (QQ-plots) corresponding to the considered variables (Figure 7) indicate the presence of some outliers, confirming that the distributions generally depart slightly from Gaussianity; however, it was preferred not to transform the datasets to ease the interpretation of the phenomenon.  In fact, from the observation of the extreme (maximum and minimum) values, it can be understood if GCPs errors that overcome the given error threshold exist. In fact, all three maximum values of the considered variables overcome the threshold.

Spatial Analysis
For assessing the extent of weak areas of such high error values, geostatistics provides an effective working environment and tools capable of delineating such areas populated by values above the given threshold.
Because geostatistics requires data Gaussianity, skewness and kurtosis were analyzed. The skewness is always different from zero, indicating an asymmetric shape of the empirical distributions, whereas the analysis of kurtosis values highlights that both the distributions (error X and error Y) have high peaks compared with the normal one, particularly accentuated for the second variable ( Table 2). The errors along the Z-axis (error Z) were neglected in the following analysis, as mentioned previously.
The quantile-quantile plots (QQ-plots) corresponding to the considered variables (Figure 7) indicate the presence of some outliers, confirming that the distributions generally depart slightly from Gaussianity; however, it was preferred not to transform the datasets to ease the interpretation of the phenomenon. Before applying geostatistics, a correlation analysis was carried out to determine whether the variables error X and error Y are correlated with the three coordinates (X, Y, Z). Because such a correlation was found (Table 3), it is possible to apply as predictor the universal kriging rather than the ordinary kriging, and consequently produce more accurate maps.
Because the distributions were not Gaussian, the non-parametric Spearman correlation index was computed. For the case at hand, a correlation was found, although not particularly strong, with the X coordinate (Table 3).  Before applying geostatistics, a correlation analysis was carried out to determine whether the variables error X and error Y are correlated with the three coordinates (X, Y, Z). Because such a correlation was found (Table 3), it is possible to apply as predictor the universal kriging rather than the ordinary kriging, and consequently produce more accurate maps. Because the distributions were not Gaussian, the non-parametric Spearman correlation index was computed. For the case at hand, a correlation was found, although not particularly strong, with the X coordinate (Table 3).
Moran's local autocorrelation index showed a significant predisposition of the errors to be spatialized at different ranges. Error X showed a range of 2500 m (Moran's index = 0.325), whereas error Y showed a Moran's index of 0.44 at a range of 3500 m. Therefore, the structural analysis (variography) of these variables was performed, and the experimental and spherical variograms and its main parameters are reported below (Figure 8a,b). The selection of the theoretical models was performed using as the fitness function the lowest value of SSErr, which measures 6.3 × 10 −6 for error X and 7.8 × 10 −6 for error Y. Moran's local autocorrelation index showed a significant predisposition of the errors to be spatialized at different ranges. Error X showed a range of 2500 m (Moran's index = 0.325), whereas error Y showed a Moran's index of 0.44 at a range of 3500 m. Therefore, the structural analysis (variography) of these variables was performed, and the experimental and spherical variograms and its main parameters are reported below (Figure 8a,b). The selection of the theoretical models was performed using as the fitness function the lowest value of , which measures 6.3 x 10 -6 for error X and 7.8 x 10 -6 for error Y. The goodness of fit of the variogram models was checked using the cross-validation and the successive validation stage. The validation was performed using a set of 25 CPs selected independently from the 50 GCPs. The outcomes of this stage are also reported in Table 4.  The goodness of fit of the variogram models was checked using the cross-validation and the successive validation stage. The validation was performed using a set of 25 CPs selected independently from the 50 GCPs. The outcomes of this stage are also reported in Table 4. The goodness-of-fit quality of cross-validations can be considered quite satisfactory. Conversely, the validations are not equally good, not because of the value assumed by Lin's coefficient, but, in fact, owing to its non-significance. The reasons for the low performances of the validation stages can be assumed to be related to different causes. Firstly, a cause can be found in the relatively large extent of the nugget with respect to the sill. In fact, the nugget represents substantially a form of spatially uncorrelated random error that, for the case at hand, impacts adversely on the validation outcomes. Nonetheless, because a correlation structure, although weak, is evident from the variograms (Figure 8a,b), the maps were taken into account, anyway. Secondly, from a purely statistical perspective, the low performances of the validation stage can depend on the number of GCPs (50), which is, probably, too low to be a statistically significant sample of the population. In the following, two maps of errors along X and Y, respectively, are reported ( Figure 9).
Remote Sens. 2020, 12, 2232 13 of 23 The goodness-of-fit quality of cross-validations can be considered quite satisfactory. Conversely, the validations are not equally good, not because of the value assumed by Lin's coefficient, but, in fact, owing to its non-significance. The reasons for the low performances of the validation stages can be assumed to be related to different causes. Firstly, a cause can be found in the relatively large extent of the nugget with respect to the sill. In fact, the nugget represents substantially a form of spatially uncorrelated random error that, for the case at hand, impacts adversely on the validation outcomes. Nonetheless, because a correlation structure, although weak, is evident from the variograms ( Figure  8a,b), the maps were taken into account, anyway. Secondly, from a purely statistical perspective, the low performances of the validation stage can depend on the number of GCPs (50), which is, probably, too low to be a statistically significant sample of the population. In the following, two maps of errors along X and Y, respectively, are reported ( Figure 9). The produced maps can be considered acceptable because a substantial correspondence can be found between observations and predictions, confirming the overall accuracy of values reported above (Table 4). A quick visual inspection of such maps highlights the presence of weak areas that overcome the given error threshold (10 m). The percentage size of such inaccurate areas is 0.21% for error X and 2.53% for error Y, corresponding to approximately 114.5 ha and 1377 ha of the total study area, respectively. The largest errors are located in the north-west and southern parts of the considered maps. Such errors can be related to the poor quality of the archival photographs, leading to increased uncertainty in the choice of points and to an edge effect. The final result highlights that the orthomosaic accuracy cannot be considered completely satisfying for the purpose of the study; hence, it was necessary to increase the number of GCPs, the location of which was suggested by the two error maps.

Stage 2: 75 GCPs
For the case at hand, 25 new GCPs were introduced following the methodology outlined in Section 3.1.5. The number of GCPs was achieved by summing the weak areas' sizes of error X and error Y, approximately 250 km 2 , and multiplying that value by 0.1; that is, the ratio between the initial number of GCPs and the entire study area size (see Section 3.1.2). The placement of the new GCPs was pursued by seeking man-made structures over or close to weak areas; hence, such additional sampling is preferential. In Figure 10, the spatial distribution over the two errors maps of the 25 GCPs just introduced are reported. The produced maps can be considered acceptable because a substantial correspondence can be found between observations and predictions, confirming the overall accuracy of values reported above (Table 4). A quick visual inspection of such maps highlights the presence of weak areas that overcome the given error threshold (10 m). The percentage size of such inaccurate areas is 0.21% for error X and 2.53% for error Y, corresponding to approximately 114.5 ha and 1377 ha of the total study area, respectively. The largest errors are located in the north-west and southern parts of the considered maps. Such errors can be related to the poor quality of the archival photographs, leading to increased uncertainty in the choice of points and to an edge effect. The final result highlights that the orthomosaic accuracy cannot be considered completely satisfying for the purpose of the study; hence, it was necessary to increase the number of GCPs, the location of which was suggested by the two error maps.

Stage 2: 75 GCPs
For the case at hand, 25 new GCPs were introduced following the methodology outlined in Section 3.1.5. The number of GCPs was achieved by summing the weak areas' sizes of error X and error Y, approximately 250 km 2 , and multiplying that value by 0.1; that is, the ratio between the initial number of GCPs and the entire study area size (see Section 3.1.2). The placement of the new GCPs was pursued by seeking man-made structures over or close to weak areas; hence, such additional sampling is preferential. In Figure 10, the spatial distribution over the two errors maps of the 25 GCPs just introduced are reported.

Basic Statistics
Once identified, the GCPs were added to the dataset, and the georeferencing of the orthomosaic was repeated, producing an error matrix of 75 x 3.
The results of the RMSE values are shown in Table 5. The table shows that the RMSEs of GCPs values related to error X and error Y decreased (13.94% and 31.87%, respectively), but as already noted, the analysis of such a parameter is insufficient to establish the absence of weak areas. The RMSE related to error X improved by 13.55%, confirming the improvement of RMSE of GCPs. The RMSE related to error Y showed a worsening of 1.35%. Therefore, a statistical analysis of GCPs was performed once more. The basic statistics referring to the errors of the three coordinates of the GCPs highlight a noticeable improvement characterized by the decrease in absolute terms of the extreme values (minimum and maximum) ( Table 6).
The QQ-plots, skewness, and kurtosis of error X and error Y variables ( Figure 11) show a distribution closer to a Gaussian one, with the exception of error Y, which still has outlier values, but they were lower than the threshold.

Basic Statistics
Once identified, the GCPs were added to the dataset, and the georeferencing of the orthomosaic was repeated, producing an error matrix of 75 × 3.
The results of the RMSE values are shown in Table 5. The table shows that the RMSEs of GCPs values related to error X and error Y decreased (13.94% and 31.87%, respectively), but as already noted, the analysis of such a parameter is insufficient to establish the absence of weak areas. The RMSE related to error X improved by 13.55%, confirming the improvement of RMSE of GCPs. The RMSE related to error Y showed a worsening of 1.35%. Therefore, a statistical analysis of GCPs was performed once more. The basic statistics referring to the errors of the three coordinates of the GCPs highlight a noticeable improvement characterized by the decrease in absolute terms of the extreme values (minimum and maximum) ( Table 6).
The QQ-plots, skewness, and kurtosis of error X and error Y variables ( Figure 11) show a distribution closer to a Gaussian one, with the exception of error Y, which still has outlier values, but they were lower than the threshold.  It can be demonstrated that the worst values (minimums and maximums) had improved significantly after the increase of the GCPs, as reported in Table 7. In more detail, such an analysis can be broadened to check how many points have improved and how many worsened and to what extent they were changed (Figure 12). It can be demonstrated that the worst values (minimums and maximums) had improved significantly after the increase of the GCPs, as reported in Table 7. In more detail, such an analysis can be broadened to check how many points have improved and how many worsened and to what extent they were changed ( Figure 12). Table 8 summarizes the results in Figure 12 for all variables. Concerning error X, the result can seem poor in absolute terms because the number of improvements is less than that of worsening. However, when examining the average values related to the improvements and worsening, it is clear that the average improvements outperform the worsening by approximately threefold. Hence, worsening, when it takes place, can be considered to be practically negligible. In more detail, such an analysis can be broadened to check how many points have improved and how many worsened and to what extent they were changed (Figure 12).  Table 8 summarizes the results in Figure 12 for all variables. Concerning error X, the result can seem poor in absolute terms because the number of improvements is less than that of worsening. However, when examining the average values related to the improvements and worsening, it is clear that the average improvements outperform the worsening by approximately threefold. Hence, worsening, when it takes place, can be considered to be practically negligible. Concerning the error Y results, in this case, the number of improvements overcomes the number of worsening and, once more, the average intensity of the improvement outperforms by about two times the worsening. The comparison of the standard deviation values indicates a tendency to decrease. The error X value goes from 2.84 to 2.22 m; error Y from 3.25 to 2.34 m; and, in particular, the error Z value decreases from 4.88 to 3.25 m. This result indicates that, in general, the model equalizes the error values, making them more similar to each other in this second step than in the first one.

Spatial Analysis
The most relevant result is that both error X and error Y present no more local values that overcome the given threshold. Hence, the addition of a further 25 points enables the desired goal to be achieved, and the spatial analysis was performed anyway to compare the maps obtained by 50 and 75 GCPs.
As noted previously, the correlation analysis between the variables and coordinates was performed. From Table 9, it can be seen that error Y is correlated significantly to all three coordinates. The other two errors do not present any significant correlation.  The most relevant result is that both error X and error Y present no more local values that overcome the given threshold. Hence, the addition of a further 25 points enables the desired goal to be achieved, and the spatial analysis was performed anyway to compare the maps obtained by 50 and 75 GCPs. As noted previously, the correlation analysis between the variables and coordinates was performed. From Table 9, it can be seen that error Y is correlated significantly to all three coordinates. The other two errors do not present any significant correlation.    Table 10 reports the overall accuracies related to cross validation and validation stages. Error X shows an accuracy that is nearly sufficient, whereas error Y shows accuracy values that are quite satisfactory. An explanation for such different behaviour can be tied to the selection of the 25 new points; they lowered the structural part of error X, whereas error Y was only impacted negligibly. This difference may be owing to a different uncertainty degree related to the X coordinate (uncompressible error) with respect to the Y coordinate (auto-correlated error).
Finally, two maps of errors along X and Y, respectively, were reported ( Figure 14).
shows an accuracy that is nearly sufficient, whereas error Y shows accuracy values that are quite satisfactory. An explanation for such different behaviour can be tied to the selection of the 25 new points; they lowered the structural part of error X, whereas error Y was only impacted negligibly. This difference may be owing to a different uncertainty degree related to the X coordinate (uncompressible error) with respect to the Y coordinate (auto-correlated error).
Finally, two maps of errors along X and Y, respectively, were reported ( Figure 14). By analyzing both maps, it can be observed that, after using 75 GCPs for georeferencing the orthomosaic, some residual weak areas with high error values are located approximately at the same positions (Error Y map); however, in absolute terms, the error values are far below those at the earlier stage.  Figure 15a shows the variation of the mean values related to the three errors before and after the application of the proposed methodology. As a first remark, it is noticeable that the mean values were reduced and tend to zero. Another feature of this result is the sign of the mean value that is positive after the methodology application, which suggests that there is a tendency to overestimate the error values during the kriging application. Concerning Figure 15b, a decrease in the standard deviation values is apparent. An interpretation of this result is that a tendency exists towards an equalization of the error values; that is to say, the structural error vanishes, and the white noise remains, as expected. In fact, the three error distributions become even more similar to the general white noise distribution that is Gaussian centred on zero. By analyzing both maps, it can be observed that, after using 75 GCPs for georeferencing the orthomosaic, some residual weak areas with high error values are located approximately at the same positions (Error Y map); however, in absolute terms, the error values are far below those at the earlier stage. Figure 15a shows the variation of the mean values related to the three errors before and after the application of the proposed methodology. As a first remark, it is noticeable that the mean values were reduced and tend to zero. Another feature of this result is the sign of the mean value that is positive after the methodology application, which suggests that there is a tendency to overestimate the error values during the kriging application. Concerning Figure 15b, a decrease in the standard deviation values is apparent. An interpretation of this result is that a tendency exists towards an equalization of the error values; that is to say, the structural error vanishes, and the white noise remains, as expected. In fact, the three error distributions become even more similar to the general white noise distribution that is Gaussian centred on zero. To provide a judgement regarding the appropriateness of this methodology, it is necessary to compare the achieved results with the existing literature. After a review, it emerged that authors typically tend to lower RMSE values and achieve a fine georeferencing using a large information rate for unit area (ratio between number of GCPs and the size of the study area), which can be measured as nGCPs/Km 2 [18,32] (see Table 11). Hence, there is a general tendency for neglecting the GCPs' positions in favour of their number.

Discussion
The issue of minimizing the GCP number optimally has been addressed by many authors. Table  11 reports the results derived from a set of papers, the contributions of which can be grouped into three main categories: (i) the use of archival aerial photogrammetry to quantify the landscape change [1,25]; (ii) the application of SfM for orientation, orthorectification, and mosaic composition of historical aerial images [7,21]; and (iii) the optimal distribution [18] and number [27] of GCPs to optimize the accuracy of the georeferencing process obtained by unmanned aerial vehicle (UAV) photogrammetry. Table 11 demonstrates clearly that, by comparing results from the scientific literature with those achieved in the present paper, it is, in fact, possible to reach the same RMSE values with a reduced information rate for area unit. This was reached by selecting GCPs in the areas highlighted during the geostatistical analysis (weak areas).
Therefore, this finding demonstrates that GCP position can be a key piece of information for achieving a fine georeferencing as much as the GCP number. In Table 11, the references reporting a better result compared with the present study are those of [25,27] and [18]. However, it should be highlighted that the nGCPs/Km 2 is approximately 65, 10, and 871 times larger than the results obtained here, and the corresponding size areas are substantially smaller than the presented study area. It may be that it would have been possible to achieve a similar goal with a smaller increase in the information rate, but such an analysis was beyond the objective of the present study. To provide a judgement regarding the appropriateness of this methodology, it is necessary to compare the achieved results with the existing literature. After a review, it emerged that authors typically tend to lower RMSE values and achieve a fine georeferencing using a large information rate for unit area (ratio between number of GCPs and the size of the study area), which can be measured as nGCPs/Km 2 [18,32] (see Table 11). Hence, there is a general tendency for neglecting the GCPs' positions in favour of their number. The issue of minimizing the GCP number optimally has been addressed by many authors. Table 11 reports the results derived from a set of papers, the contributions of which can be grouped into three main categories: (i) the use of archival aerial photogrammetry to quantify the landscape change [1,25]; (ii) the application of SfM for orientation, orthorectification, and mosaic composition of historical aerial images [7,21]; and (iii) the optimal distribution [18] and number [27] of GCPs to optimize the accuracy of the georeferencing process obtained by unmanned aerial vehicle (UAV) photogrammetry. Table 11 demonstrates clearly that, by comparing results from the scientific literature with those achieved in the present paper, it is, in fact, possible to reach the same RMSE values with a reduced information rate for area unit. This was reached by selecting GCPs in the areas highlighted during the geostatistical analysis (weak areas).
Therefore, this finding demonstrates that GCP position can be a key piece of information for achieving a fine georeferencing as much as the GCP number. In Table 11, the references reporting a better result compared with the present study are those of [25,27] and [18]. However, it should be highlighted that the nGCPs/Km 2 is approximately 65, 10, and 871 times larger than the results obtained here, and the corresponding size areas are substantially smaller than the presented study area. It may be that it would have been possible to achieve a similar goal with a smaller increase in the information rate, but such an analysis was beyond the objective of the present study.
Another important result is that the optimal (minimal) number of GCPs required to achieve the study's main objective is within the range of 50 < opt ≤ 75. Furthermore, it is noticeable that the RMSE related to error Z improved strongly, notwithstanding the fact the GCPs were selected with respect to errors X and Y only. This association suggests the possible existence of inter-relationships between the three errors and that, probably, an improvement along a single direction can impact positively upon all the remaining errors. Therefore, applying the methodology to a single direction could achieve similar results in terms of accuracy by simplifying the overall complexity of the procedure to a significant degree.
From a computational standpoint, it can be stated that the proposed methodology converged after only a single run for the presented case study. This is not surprising, because simulation trials demonstrated that, at most, a couple of runs are sufficient to reach the desired accuracy. Therefore, the methodology is generally also particularly rapid.

Conclusions
This research has presented a geostatistically-based methodology aimed at the following: (i) assessing and mapping the local accuracy of a grossly georeferenced archival orthomosaic; and (ii) improving that accuracy to meet a given error threshold (10 m) by selecting a number of GCPs within or close to areas affected by large errors [16] reported by error maps.
After the methodology's application, (i) the target error was, in fact, reached along all three axes, and (ii) the target was reached with a minimal quantity of GCPs. The (ii) point was gained not by chance, but as an effect of limiting areas within which GCPs were found.
To compare two different GCP configurations over two different areas, the information rate per unit area, measured in terms of nGCPs/Km 2 , can be an effective index.
A review reported in Table 11 shows that the rate was substantially lower than those considered in this study. To test the proposed methodology, a wide and morphologically complex study area was considered. The methodology found the convergence towards a satisfying solution after only one run. The results showed a significant improvement for error X, which decreased far below the given threshold; in fact, the largest local error was approximately 5 m, only half of the required target. The same large positive impact was found for error Z, notwithstanding the fact that the GCP selection strategy was not driven by such a variable. At first sight, error Y, observing the behaviour of the largest errors, would appear to be influenced in a lower measure by the proposed procedure, but this impression is incorrect because the RMSE improved by approximately 32%. In addition, as emerged following the geostatistical analysis, the structural error was not completely filtered out [31], demonstrating potential room for further improvement. However, because the largest errors decreased below the given threshold (even if by only a small amount), and the local errors converged towards a uniform value over the entire study area, the procedure was stopped.
Therefore, the procedure can be considered validated and the application fully successful. Regarding the next developments, we consider that the addressed topic deserves further investigation. In particular, research into the error propagation during georeferencing process is presently in progress. The initial results are interesting and are likely to be of high significance.