Multi-Image Robust Alignment of Medium-Resolution Satellite Imagery

: This paper describes an automatic multi-image robust alignment (MIRA) procedure able to simultaneously co-register a time series of medium-resolution satellite images in a bundle block adjustment (BBA) fashion. Instead of the direct co-registration of each image with respect to a reference ‘master’ image on the basis of corresponding features, MIRA also considers those tie points that may be not be shared with the master, but they only connect the other images (‘slaves’) among them. In a ﬁrst stage, tie points are automatically extracted by using pairwise feature-based matching based on the SURF operator. In a second stage, such extracted features are re-ordered to ﬁnd corresponding tie points visible on multiple image pairs. A ‘master’ image is then selected with the only purpose to establish the datum of the ﬁnal image alignment and to instantiate the computation of approximate registration parameters. All the available information obtained so far is fed into a least-squares BBA to estimate the unknowns, which include the registration parameters and the coordinates of tie points re-projected in the ‘master’ image space. The analysis of inner and outer reliability of the observations is applied to assess whether the residual blunders may be located using data snooping, and to evaluate the inﬂuence of undetected outliers on the ﬁnal registration results. Three experiments with simulated datasets and one example consisting of eleven Landsat-5/TM images are reported and discussed. In the case of real data, results have been positively checked against the ones obtained by using alternative procedures (BBA with manual measurements and ‘slave-to-master’ registration based on automatically extracted tie points). These experiments have conﬁrmed the correctness of the MIRA approach and have highlighted the potential of the inner control on the ﬁnal quality of the solution that may come from the reliability analysis.


Introduction
The growing availability of medium-resolution satellite images for Earth observation (EO) gives an unprecedented opportunity to monitor land cover changes and dynamic processes, up to a hyper-temporal resolution of a few days. Operating satellites such as Landsat, Disaster Monitoring Constellation, and Sentinel-2 may provide data at geometric resolution between 10 m and 30 m in terms of ground sample distance (GSD), while covering a large radiometric spectrum [1][2][3][4].
Typically, such datasets are delivered after topographic correction to remove relief displacement errors and the effect of Earth curvature [5]. These pre-processing steps result in the fact that image registration, which is a fundamental pre-requisite for any multi-image analyses, can be accomplished by using 2-D transformations [6]. The direct use of metadata information for registering image time series is not enough to carry out the precise alignment at pixel and subpixel levels, as required in many applications. series is not enough to carry out the precise alignment at pixel and subpixel levels, as required in many applications.
Traditional methods for image alignment (or registration) [7][8][9] rely on the use of corresponding features identified between a reference image (frequently called 'master' image) and each generic image ('slave' image) in the time series. This registration method is usually carried out for all the 'masterto-slave' combinations, as shown in Figure 1a. Corresponding features may be measured manually, but several procedures have been developed for their automatic extraction [10][11][12]. Such features are used for estimating a geometric 2D transformation to map the images to each other and to obtain pixel-to-pixel overlap after resampling. The 'master' image defines the absolute spatial datum, for example by using metadata information or ground control points (GCPs), which may come from in situ GNSS measurements or from existing digital maps [13].
In remote-sensing applications for change detection, the acquisition of data collected over the same geographic area but at different epochs (e.g., time of the day, season, year) may be accomplished in uneven conditions of cloud cover, illumination, viewing geometry, and/or imaging sensors [14]. Consequently, the radiometric content of a multi-temporal dataset may sharply differ from one image to another [15], even if the sensed areas overlap. In such a case, the image alignment of long time series may become an involved task. This may also result in the fact that all 'slave' images may not share enough corresponding features with the 'master' image, resulting in need of concatenating additional 'master' images with consequent error propagation.
(a) (b) Figure 1. Traditional 'master-to-slave' (a) versus multi-image alignment (b). Lines represent the existence of enough corresponding features shared between two images to enable the computation of the registration parameters.
Starting from Orun and Natarajan [16], an alternative approach for the registration of satellite images on the basis of bundle block adjustment (BBA) was introduced. BBA is commonly applied in photogrammetry and computer vision for image orientation [17]. The basic concept is to use not only corresponding features for estimating the 'master-to-slave' registration parameters, but also to introduce corresponding points shared between the 'slave' images, see Figure 1b. The coordinates of these correspondencies, which are usually addressed to as tie points (TPs), are used to instantiate a system of redundant equations to be solved within a least-squares (LS) framework for the determination of the unknown registration parameters and the coordinates of TPs in a given geodetic datum. This can be defined by introducing enough GCPs or using some inner constraints [18]. In such a way, also those 'slave' images without corresponding features that are directly shared with the 'master' may be registered, limiting error propagation. In addition, since TPs may be observed on more than two images and used to write multiple equations in the BBA formulation, the inner reliability of the observations increases and the procedure may gain robustness against gross errors. Moreover, some additional images could be added into the data processing workflow to improve the network geometry and to benefit to the alignment.
The application of BBA approach to the registration of satellite data has been already proposed in the technical literature. Toutin [19] developed a solution for the BBA of Landsat 7 ETM+ based on Starting from Orun and Natarajan [16], an alternative approach for the registration of satellite images on the basis of bundle block adjustment (BBA) was introduced. BBA is commonly applied in photogrammetry and computer vision for image orientation [17]. The basic concept is to use not only corresponding features for estimating the 'master-to-slave' registration parameters, but also to introduce corresponding points shared between the 'slave' images, see Figure 1b. The coordinates of these correspondencies, which are usually addressed to as tie points (TPs), are used to instantiate a system of redundant equations to be solved within a least-squares (LS) framework for the determination of the unknown registration parameters and the coordinates of TPs in a given geodetic datum. This can be defined by introducing enough GCPs or using some inner constraints [18]. In such a way, also those 'slave' images without corresponding features that are directly shared with the 'master' may be registered, limiting error propagation. In addition, since TPs may be observed on more than two images and used to write multiple equations in the BBA formulation, the inner reliability of the observations increases and the procedure may gain robustness against gross errors. Moreover, some additional images could be added into the data processing workflow to improve the network geometry and to benefit to the alignment.
The application of BBA approach to the registration of satellite data has been already proposed in the technical literature. Toutin [19] developed a solution for the BBA of Landsat 7 ETM+ based on a Remote Sens. 2018, 10,1969 3 of 25 3D analytical geometric model for multi-sensor images, including orbital constraints. Different sets of 3D GCPs integrated to TPs with only known elevations were tested. Results obtained from BBA were similar to the ones from single 'image-to-GCP' alignment, but with a significant reduction of GCP number. The same approach was then extended to deal with high-resolution Ikonos data [20] and to multi-sensor fusion [21,22]. At the same time, Grodecki and Dial [23] investigated a similar technique but based on rational polynomial coefficients (RPCs). Since this model can deal with a large variety of sensors, it applies to any imaging systems with a narrow field-of-view, a calibrated stable interior orientation, and an accurate a priori exterior orientation. The same research direction was continued in Fraser and Hanley [24] and Rottensteiner [25], where a bias-compensated BBA based on RPCs was demonstrated to provide subpixel accuracy notwithstanding the minimum ground control. High-resolution Ikonos, QuickBird, and ALOS data were used in this study. As far as new sensors have been launched, new approaches for BBA of high-resolution satellite images were introduced, as in the case of Chinese ZY3 data [26].
The studies mentioned above mainly focus on the analytical aspects of BBA and the evaluation of the obtainable accuracy when using the proposed methods. In Barazzetti [27] the focus was given on the automatic extraction of TPs for the registration of medium-resolution satellite image sequences, instead of using manual interactive measurements. These TPs are obtained on the basis of robust feature-based matching (FBM) techniques (see Section 2.1), and then input in the BBA for the simultaneous computation of all image registration parameters. Of course, in the case of poor image texture, the automatic extraction of TPs may easily fail. This case frequently happens, for instance, when a significant portion of the image depicts a water body. On the other hand, this problem does not depend on the method used for the measurement of TPs, since, in the case of poor image texture, the interactive approach may also result in severe problems.
The automatic extraction of TPs is a fundamental step towards the complete automation of the registration process, which represents a task of high relevance when dealing with large datasets. An important objective in automatic procedures is to minimize the influence of possible residual measurement errors. Even though FBM may limit the number of blunders, also a small fraction of them in the set of observations to be processed within the BBA might lead to significant biases in the final image registration parameters. The methods usually adopted to reject outliers in BBA are more efficient when data redundancy is large. This property may be obtained on one side by extracting multiple connections between images. On the other hand, the application of the inner and outer reliability concepts, widely popular in geodesy and photogrammetry, also allows evaluating which is the risk to have registration errors larger than prefixed thresholds. Thus, the focus in this paper is given to develop the concept presented in Barazzetti [27] to be integrated into a full, robust procedure (multi-image robust alignment) for the automatic registration of medium-resolution satellite images, with special emphasis on the reliability analysis (see Section 2). In Section 3 an application to a dataset of Landsat-5/TM images is presented. After a discussion on the experimental results in Section 4, Section 5 draws some conclusions and addresses future work.

Overview
The multi-image robust alignment (MIRA) procedure consists in a multi-step process, as shown in Figure 2. The core is a preliminary subdivision of the original dataset made up of n images into all n(n − 1)/2 possible image-pair combinations. Image correspondences are looked for independently in each image pair by using FBM, as described in Section 2.2. In a second stage, all the extracted features are reordered to find multiple TPs (Section 2.3). A reference 'master' image is selected with the only purpose to set up the spatial datum. This selection is based on the analysis of extracted connections between images (see Section 2.4). The TP set is used to jointly estimate all transformations' parameters Remote Sens. 2018, 10,1969 4 of 25 within a BBA (see Section 2.5). One crucial aspect of this procedure is the analysis of observations' inner reliability, which will be the subject of Section 2.6.
Remote Sens. 2018, 11, x FOR PEER REVIEW 4 of 24 Figure 2. Flowchart of the MIRA method.

Analysis of Individual Image Pairs
After the publication of SIFT [28], a new category of SIFT-like algorithms has started to be successfully applied in FBM procedures for the registration of images in close-range photogrammetry and computer vision (see Wu [29]). Thanks to a multi-resolution analysis, SIFT-like algorithms are able to extract distinctive features in the images with a high degree of robustness against shift, scale, and rotation. Each feature is also assigned a descriptor (for example, a vector of 128 elements that can be characterized using its norm) that may be exploited for matching features in different images. SIFT-like algorithms have replaced correlation-based techniques that were previously used in FBM. These were based on the similarity between radiometric values and consequently were less robust against geometric and radiometric changes [11].
Here, the SURF operator has been specifically used [30]. SURF has a lower computational time than other SIFT-like algorithms [31,32] and is prone to extract a large number of key-points (i.e., points that are candidates to become TPs) that may be observed in more than a single image pair (manifold features), as proved in Barazzetti [33]. This capability is very important to obtain a redundant dataset of TPs, which is the primary purpose of MIRA.
Corresponding features are obtained by exhaustively comparing the SURF descriptors (in vectors Dm and Dn) of all key-points extracted on the image pair to register. No preliminary information about the spatial location in the images is used for registration. Then the Euclidean distance (dmn) between descriptors of points in different images is used as metrics for FBM: The following procedure is applied to find a set of homologous points: 1. Distances dmn are worked out between all combinations of key-points in the image pair;

Analysis of Individual Image Pairs
After the publication of SIFT [28], a new category of SIFT-like algorithms has started to be successfully applied in FBM procedures for the registration of images in close-range photogrammetry and computer vision (see Wu [29]). Thanks to a multi-resolution analysis, SIFT-like algorithms are able to extract distinctive features in the images with a high degree of robustness against shift, scale, and rotation. Each feature is also assigned a descriptor (for example, a vector of 128 elements that can be characterized using its norm) that may be exploited for matching features in different images. SIFT-like algorithms have replaced correlation-based techniques that were previously used in FBM. These were based on the similarity between radiometric values and consequently were less robust against geometric and radiometric changes [11].
Here, the SURF operator has been specifically used [30]. SURF has a lower computational time than other SIFT-like algorithms [31,32] and is prone to extract a large number of key-points (i.e., points that are candidates to become TPs) that may be observed in more than a single image pair (manifold features), as proved in Barazzetti [33]. This capability is very important to obtain a redundant dataset of TPs, which is the primary purpose of MIRA.
Corresponding features are obtained by exhaustively comparing the SURF descriptors (in vectors D m and D n ) of all key-points extracted on the image pair to register. No preliminary information about the spatial location in the images is used for registration. Then the Euclidean distance (d mn ) between descriptors of points in different images is used as metrics for FBM: 1. Distances d mn are worked out between all combinations of key-points in the image pair; 2.
Distances d mn are ranked from the shortest d 1 mn to the largest d p mn ; 3.
The couple of key-points corresponding to the smallest distance d 1 mn are assumed to be matched; 4.
A ratio test [34] (first distance d 1 mn /second distance d 2 mn ) is applied to scrutinize distinctive matches; 5.
If the ratio test has passed, both key-points are assigned as corresponding points and are removed from the list of potential matches; otherwise, the points are kept available to be considered at a later stage; and 6.
The analysis moves on to consider the following distance d 2 mn in the rank up to the completion of the list.
When SURF retrieves a sufficient number of image correspondences, some mismatches may often still be present. To remove such outliers, the procedure makes use of the robust estimation of a 2D similarity transformation between both sets of extracted features in the images. In Barazzetti [27] more details can be found about the implementation of this step, which is based on the use of the popular high-breakdown point estimator RANSAC [35].
The similarity transformation may not be the best model to fit real data, depending on the sensor and the ground topography [13]. On the other hand, at this stage, the aim is to remove large blunders, which may account for tens of pixels. The recourse to a similarity transformation may help cope with this problem, while the rejection of smaller measurement errors is afforded afterward by using more refined models (see Table 1) that may be selected during the successive multi-image adjustment. For this reason, a relatively large threshold (e.g., 2-3 pixels) is selected for discriminating outliers with RANSAC.
The geometric distribution of corresponding points is analyzed to seek for weak configurations during RANSAC estimation. The covariance matrix C xx of the registration parameters is compared against a criterion matrix H constructed by assuming an optimal point distribution. The analysis is based on the maximum eigenvalue λ max of matrix K, which is computed as follows: (2) Table 1. Geometric registration models encompassed as subcases of the general planar transformation implemented in MIRA during multi-image adjustment.

Mapping Model # Unknowns (n p ) Equations N min Geometric Deformations
Similarity 4 x i = a 0 + x j mcosα − y j msinα y i = b 0 + x j msinα + y j mcosα 12 2D shifts, rotation, scaling 1st Degree Polynomial (Affine) u = 1 6 x i = a 00 + a 10 x j + a 11 y j y i = b 00 + b 10 x j + b 11 y j 18 2D shifts, rotation, shear, scaling along both axes 2nd Degree Polynomial (u = 2) 12 x i = a 00 + a 10 x j + a 11 y j + +a 20 x 2 j + a 21 x j y j + a 22 y 2 j y i = b 00 + b 10 x j + b 11 y j + +b 20 x 2 j + b 21 x j y j + b 22  x i = a 00 + a 10 x j + a 11 y j + a 20 x 2 j + a 21 x j y j + +a 22 y 2 j + a 30 x 3 j + a 31 x 2 j y j + a 32 x j y 2 j + a 33 y 3 j y i = b 00 + b 10 x j + b 11 y j + b 20 x 2 j + b 21 x j y j + +b 22 y 2 j + b 30 x 3 j + b 31 x 2 j y j + b 32 x j y 2 j + b 33 y 3 j 60 2D shifts, rotation, scaling along both axes, torsion, and convexity along both axes, other deformations without geometric interpretation Here the approach proposed in Förstner and Wrobel [36] has been modified to account for the different number of points in the real (n c ) and the ideal case (n h ), see Syrris [15]. If λ max ≤ 1 the real configuration is better than the ideal one. In such a case, the solution is accepted, and the RANSAC Remote Sens. 2018, 10,1969 6 of 25 estimate is terminated. If the condition λ max ≤ 1 is never verified, the solution corresponding to the minimum eigenvalue, but less than a safety threshold (λ th = 4) is selected.
After the completion of RANSAC cycle, the set of inliers is rejected if the number of corresponding points is below N min = 12, otherwise they are used for estimating the four parameters of the similarity transformation on the basis of LS regression [37]. The threshold N min corresponds to six times the minimum dataset to compute a similarity transformation (i.e., two points), which is twice the safety value (three) frequently adopted in geodesy to guarantee a sufficient global redundancy in LS regressions. A statistical testing procedure based on data snooping [38] is applied to remove the possible remaining tiny fraction of outliers (<5%).
The quality assessment of the solution is based on checking the theoretical accuracy of shift parameters (σ c andσ r along columns and rows, respectively) that may be extracted from the estimated covariance matrix C xx . These are related to the standard deviation of an observation with unit weight (σ 0 ) estimated after LS regression, and to the number of extracted inliers (F): A threshold onσ c andσ r is introduced to check weak configurations. In fact, results worse than σ c =σ r = ±3 pixels for shift parameters are mainly caused by incorrect matches and should be removed from the datasets. It should be also mentioned that the FBM process is highly prone to be parallelized. On one side the extraction of key-points with the SURF operator may be independently carried out in each image. On the other, the pairwise FBM procedure may be applied to each considered image pair disregarding others.

Derivation of Multiple Tie Points
As expected from the multiple overlaps, it is likely that some corresponding features are 'visible' on more than a single image pair. Indeed, as already addressed in Section 2.2, an essential characteristic of SURF operator is the capability of finding the same feature in multiple images under different geometric and radiometric conditions. A comparison of the numerical value of the extracted pixel coordinates of corresponding points obtained from the image-to-image matching stage may provide a regular structure of multiple (or manifold) tie points (TPs), i.e., TPs that may be measured on more than two images. Such points improve the inner reliability of the observations and make the registration process less sensitive to residual measurement errors [39], as it will be illustrated in Section 2.6.

Selection of the 'Master' Image
The selection of the 'master' image is based on the distribution of corresponding points between the images. A connectivity graph is drawn to show the relationship between the images and to highlight possible weak connections in the network. An example of connectivity graph related to the example presented in Section 3 is displayed in Figure 3 using a matrix representation, where a cross indicates the presence of sufficient TPs for co-registering the pair of images corresponding to the intersecting row and column. While a rapid look at the connectivity matrix is enough to check out its consistency in the case of visual interpretation, a test is implemented to afford this task automatically. Once the consistency of the connectivity graph is assessed, the 'master' image is chosen on the basis of the following criteria: to maximize the number of images that share enough TPs with the 'master', i.e., at least N min ; and 2.
the 'master' image should be preferably in a central position in the time series t = 1, 2, 3, . . . , n. Indeed, a time series covers the same spatial region within a time span and, consequently, the temporal factor is more important than the spatial factor for selecting the 'master' image. The connectivity graph is also useful to work out the approximate values for any parameters to be estimated in the global multi-image BBA, see Section 2.5. The adopted procedure is similar to the scheme followed in the 'structure-from-motion' technique [40] for image orientation in close-range photogrammetry. By looking at the connectivity graph, all 'slave' images that are directly linked to the 'master' image may be approximately registered in pairwise, independent manner. In this way, any images in this first group of registered images can be used as new 'master' images to register other 'slaves' that would share sufficient corresponding points with one of them. By using such approximate registration parameters it is then possible to re-project the coordinates of any points on the space of the main 'master' image.
Remote Sens. 2018, 11, x FOR PEER REVIEW 7 of 24 photogrammetry. By looking at the connectivity graph, all 'slave' images that are directly linked to the 'master' image may be approximately registered in pairwise, independent manner. In this way, any images in this first group of registered images can be used as new 'master' images to register other 'slaves' that would share sufficient corresponding points with one of them. By using such approximate registration parameters it is then possible to re-project the coordinates of any points on the space of the main 'master' image.

Analysis of Multiple Images
Since the MIRA method provides corresponding TPs not only for 'master-to-slave' combinations but also for 'slave-to-slave' pairs, all transformation parameters can be concurrently estimated.
A generic 2D polynomial transformation of degree p between two images is implemented at this stage. Given a feature k, the transformation of its coordinates between images i and j is given by:

Analysis of Multiple Images
Since the MIRA method provides corresponding TPs not only for 'master-to-slave' combinations but also for 'slave-to-slave' pairs, all transformation parameters can be concurrently estimated.
A generic 2D polynomial transformation of degree p between two images is implemented at this stage. Given a feature k, the transformation of its coordinates between images i and j is given by: The number (n p ) of transformation parameters (a uvj , b uvj ) depends on the selected geometric model (see Table 1).
The implementation of Equation (4) in the BBA has been revised with respect to the previous version published in Barazzetti [27] to define a more rigorous stochastic model. All transformations in the global adjustment are written between the 2D image space of the generic 'slave' j and the 2D image space of the 'master' (M). The functional model of the observation equations based on Equation (4) then becomes: where x Mk and y Mk are the image coordinates of tie point k projected in the image space of the 'master' image, while v xjk and v yjk are residuals. Underlined parameters in Equation (5) are considered as unknowns, including TP coordinates re-projected on the 'master' image. This formulation is stochastically corrected, since the observed quantities (with errors) are bounded on the left side and unknowns are on the right side. Consequently, the observed coordinates (x jk , y jk ) of any TPs may be properly weighted if their measurement precision is uneven. Equation (5) is not linear in the parameters and requires linearization around approximate values (see Section 2.4). Consequently, the unknown parameters in Equation (5) are replaced by corrections da uvj , db uvj , dx MK , and dy Mk .
Since it is not possible to predict the accuracy of individual features extracted by SURF, the observations are assigned unit weights. In the case a TP is cast into Equation (5), its coordinates on the 'master' image will have to be fixed to the measured value. A set of additional pseudo-observation equations is introduced to keep corresponding corrections constrained to zero. Additional constraint equations may also be included in the functional model to enforce conditions between parameters, for example, when a first-degree polynomial has to be reduced to a similarity transformation.
A TPs found on n im image (see Section 2.3) provides 2n im observation equations. The total number of unknowns depends on the number of images (n), the parameters of the adopted geometric transformation (n p ), the coordinates of 'slave-to-slave' matches re-projected onto the 'master' image (2n rep ), and the number of 2D points (2u) used as a reference (GCPs) on the 'master' image.
The system of observation, pseudo-observation and constrain equations can be rewritten in compact form as follows: where: vector of corrections to the transformation parameters between any 'slaves' and the 'master' image; dx 2 : vector of corrections to point coordinates on the 'master' image; A 1 , A 2 : coefficient (or design) matrices of parameter vectors dx 1 and dx 2 ; y: vector of measured coordinates of TPs; c: vector of constants in linearized observation equations; v, w 1 , w 2 : vectors of residuals; and D: coefficient matrix of additional constraint equations, if any.
After casting all unknown parameters into a single unknown vector dx = [dx 1 dx 2 ] T the design matrix can be redefined as follows: The system of linearized equations may be solved to estimate the vector of unknownsx is the constant vector and W the weight matrix. The theoretical accuracy of the solution may be derived from the estimate of the covariance matrix C xx =σ 2 0 N −1 , whereσ 2 0 is the estimated variance of unit weight observations (or sigma naught).
Data snooping is applied again on the residuals after BBA to remove small errors. The effectiveness of this procedure is directly related to the evaluation of reliability that is described in the following section.

Analysis of the Reliability of Multi-Image BBA
With the term reliability the chance to identify a gross error in the observations (inner reliability) and the estimated parameters (outer reliability) is referred to. While the readers may find the theoretical background about reliability analysis in Förstner [41] and Kraus [42], the basic concepts are briefly reviewed in the following of this section.
It is well known that a gross error in the ith observation may be reflected only to a limited extent into the corresponding residual v i after BBA. Thus, the largest residual does not necessarily correspond to a gross error. Moreover, outliers and random errors mask one another, so that the localization of gross errors may be difficult. In the case no gross errors are in the observations, the probability distribution of each residual is supposed to be Gaussian as N(0,σ 2 vi ). The estimated variance of residual v i may be derived from the estimated covariance matrix of residuals: The data snooping [38] technique implemented for scrutinizing gross errors is based on the analysis of standardized residuals z i = v i /σ vi , which are expected to be distributed as a standard Gaussian probability density function N(0,1). In the hypothesis that each z i should follow the distribution N(0,1), an upper threshold k α on the absolute size of acceptable standardized residuals is then fixed to filter out possible outliers. An observation is accepted only if the corresponding standardized residual |z i | ≤ k α , where k α is directly related to a given risk probability (α) that |z i |> k α (see Figure 4). The rejection might occur in two different cases: (1) the residual is supposed not to follow the distribution N(0,1) because the related observation is a gross error; or (2) a rare event belonging to the distribution N(0,1) but with low risk probability α has happened. In such a case, an inlier would be erroneously discarded (type I error).
is then fixed to filter out possible outliers. An observation is accepted only if the corresponding standardized residual |zi| ≤ kα, where kα is directly related to a given risk probability (α) that |zi|> kα (see Figure 4). The rejection might occur in two different cases: (1) the residual is supposed not to follow the distribution N(0,1) because the related observation is a gross error; or (2) a rare event belonging to the distribution N(0,1) but with low risk probability α has happened. In such a case, an inlier would be erroneously discarded (type I error). The test adopted to scrutinize the residuals may also fail in the presence of an outlier that does not follow the standard Gaussian distribution, but has a biased expectation E(zi) = δ. As shown in Figure 4, a small outlier may not be detected because it features a corresponding standardized residual zi that is smaller than the acceptance threshold kα. The probability of accepting an outlier (type II error) is given by β, which is related either to the risk probability α and to the bias δ. This last parameter is commonly referred to as the non-centrality parameter (δ) and gives in standardized coordinates the size of the minimum measurement error that may be detected using data snooping. The test adopted to scrutinize the residuals may also fail in the presence of an outlier that does not follow the standard Gaussian distribution, but has a biased expectation E(z i ) = δ. As shown in Figure 4, a small outlier may not be detected because it features a corresponding standardized residual z i that is smaller than the acceptance threshold k α . The probability of accepting an outlier (type II error) is given by β, which is related either to the risk probability α and to the bias δ. This last parameter is commonly referred to as the non-centrality parameter (δ) and gives in standardized coordinates the size of the minimum measurement error that may be detected using data snooping. Once the risk probability (α) and the power of the test (β) have been selected, the non-centrality parameter (δ) can be worked out. A typical set of parameters that has been also adopted in the experiments reported in this paper is: α = 1%, β = 93%, k α = 2.56, and δ = 4.
The inner reliability (E(∆l a ) rel ) of the observation ith is defined as the size of the minimum detectable error, according to selected parameters α and β. Its corresponding value can be computed using the expression: where σ wi is the expected precision of the ith observation and r i is its local redundancy, which may be obtained from the ith element of the main diagonal of the redundancy matrix R: Equation (9) shows that the test on standardized residuals may leave undetected a gross error whose size is δ i times the precision of the observation ith, given prefixed values for α and β. On the other hand, the local redundancy r i may have a mitigating effect on the inner reliability. Higher values of r i lead to a smaller size for the detectable errors in corresponding observations. In a BBA, a manifold observation results in a high r i and consequently in a better inner reliability.
The final step is to evaluate the effect of an undetected gross error on the estimated parameters. This may be computed through the corresponding outer reliability: where vector ∆l a reports the value of inner reliability E(∆l a ) rel on the line ith and zero elsewhere.
To demonstrate how the inner and outer reliabilities may be obtained in the case of a BBA adopted to register a satellite time series implementing a 2D similarity transformation model, the following simulated examples are proposed (see also Scaioni [43]). In Table 2 the range of inner reliabilities and their average values are shown for three different cases. Additionally, the outer reliability corresponding to the average inner reliability is reported in the same table.
In Case 1, two images have been registered by using 16 corresponding features. It may be seen that the limit for the minimum detectable error corresponds to a bias of 0.27 pixels in the estimated shift in the same direction. In Case 2, a total number of three 'slave' images have been included, in addition to the 'master'. Any 'slaves' share 16 TPs with the 'master'. 'Slaves' also share nine TPs among them. As may be seen from Table 2, the inner reliability of TPs shared with the 'master' is only slightly better than the one in Case 1. Looking at TPs between 'slaves', the minimum detectable error is larger with respect to the previous group of TPs. However, when looking at the corresponding outer reliability, the maximum detectable errors lead to biases in the estimated parameters that are significantly smaller than in Case 1.
In Case 3, the dataset adopted in Case 2 has been integrated by three additional images, leading to a total number of six 'slaves.' New images are not directly connected to the 'master', but they share 16 TPs among them.
The result regarding the inner reliability shows a further improvement concerning both Cases 1 and 2. Similarly, TPs have a higher threshold for non-detectable errors when they are shared among 'slave' images only. In Table 2 the inner reliabilities have been separately computed for TPs shared between three (as in Case 2) and six 'slave' images, respectively. As expected, values of inner reliability are lower in the case of points visible on six images (Subset 'SS6') than in the case of points visible on three images only (Subset 'SS3'). It is also interesting to observe how the outer reliability has improved in Case 3 concerning the previous cases with less redundant observations. In particular, the effect of maximum non-detectable gross errors in TPs is quite low (less than 1/10 pixels for shifts). In Section 3, some results related to a real dataset are also presented. Table 2. Inner and outer reliabilities computed for the simulated Cases 1, 2, and 3. In the second column, the subset of TPs adopted for evaluating the inner/outer reliabilities are described using the following symbols: 'MS' is the subset of TPs shared between the 'master' and one or more 'slaves'; 'SS' is the subset of TPs shared between two or more 'slaves'; 'All' is the subset including all types of TPs.

Dataset and Data Processing
The MIRA algorithm has been tested on a Landsat-5/TM time series (Level 1 products) made up of eleven frames imaged over the city of Multan, Pakistan. The images were collected from February 1998 to December 1998 (ID1: 9 February; ID2: 23 March; ID3: 14 April; ID4: 16 May; ID5: 4 August; ID6: 20 August; ID7: 7 October; ID8: 23 October; ID9: 8 November; ID10: 24 November; and ID11: 10 December). The GSD (ground sample distance) is 30 m for all the available spectral bands except the thermal infrared channel (TIR-Band TM6), whose GSD is 120 m. Due to such a low geometric resolution, images from Band TM6 have not been considered during next processing stages.
Since the radiometric content may largely differ in different wavelengths of multispectral imagery (see Table 3), uneven results are expected during pairwise FBM aimed at extracting corresponding TPs. Indeed, different land-cover types and surface materials are characterized by a non homogeneous spectral response. Thus, they may exhibit different image contrast according to the specific wavelength.
In addition, shorter wavelengths in the visible domain (mainly bands TM1/TM2) are more affected by atmospheric scattering. In a previous paper [44], the robustness of MIRA against atmospheric effects was demonstrated. For this reason, FBM has been applied to the images in all bands without any preprocessing to correct these effects. Table 3 shows the results in terms of image pairs that have been successfully matched in different bands, and the total number of extracted TPs. A single image pair is considered as 'successfully matched' if at least N min = 12 TPs have been found. As can be seen, all bands except TM4 show results that are very close to one another regarding the fraction of successfully matched image pairs, ranging between 96% and 100% of 55 total possible combinations. This achievement is also motivated by the large spatial overlap between all the images. In the case of Band TM4 (NIR-near infrared), changes in vegetation cover during different seasons of the year may be strongly correlated to the variation of NIR content of the images, resulting in problems when seeking for corresponding features.
After the pairwise FBM stage is accomplished, image pairs need to be linked together to find multiple TPs and to detect which image should be preferably used as 'master' for the time series. This selection is done by analyzing the pairwise connections between images that are summarized in the connectivity matrices for different bands displayed in Figure 3. According to the results obtained from the most wavelengths, Image ID8 has been selected as 'master' using the methodology reported in Section 2.4. The choice is motivated by the highest number of images that are connected to it, i.e., images that share at least N min = 12 TPs. Image ID8 is shown in Figure 5, where extracted TPs are also overlaid. In some bands, only two images are not directly connected to Image ID8. The poor results obtained with FBM in the case of Band TM4 have also reflected in the connectivity graph, which is split into two clusters ( Figure 6). The lack of connections suggests that the time series cannot be processed as a whole when using TPs extracted in spectral Band TM4. Looking at the other wavelengths, the total number of extracted TPs after reordering spans from 47,812 (TM1) to 75,073 (TM3) in terms of image observations (Table 3). This variability shows that the number of extracted multiple features significantly depends on the adopted wavelength, even though TPs are always sufficient to compute the image registration parameters. In Figure 7 three patches of the images from the 'Multan' dataset are shown before and after the alignment obtained using MIRA procedure.
wavelengths, the total number of extracted TPs after reordering spans from 47,812 (TM1) to 75,073 (TM3) in terms of image observations (Table 3). This variability shows that the number of extracted multiple features significantly depends on the adopted wavelength, even though TPs are always sufficient to compute the image registration parameters. In Figure 7 three patches of the images from the 'Multan' dataset are shown before and after the alignment obtained using MIRA procedure. Figure 5. In the upper-left subfigure, the master image (ID8) of 'Multan' dataset is shown. In the lower-left subfigure, the same image is overlaid with all the TPs extracted with MIRA procedure. Also, those features that have not been found in the 'master' but they connect other images of the same dataset have been re-projected over the master. The two zoom-in windows on the right side display some details of the extracted features in two specific areas.
In Table 4 the focus is given to the multiplicity of extracted TPs, which directly reflects in the inner reliability of the observations. For simplicity, here the results obtained with spectral Band TM3 have been reported, since this band has provided the largest number of TPs. Similar outcomes have been also found from other spectral bands (except TM4). While the majority of TPs (approx. 70% of the total) are measured on two images only, the remaining 30% comprehend manifold points, i.e., TPs observed on at least three images. As expected, the most numerous classes are the ones including TPs visible on three (approx. 18%) and four images (approx. 7%), while the number of TPs Figure 5. In the upper-left subfigure, the master image (ID8) of 'Multan' dataset is shown. In the lower-left subfigure, the same image is overlaid with all the TPs extracted with MIRA procedure. Also, those features that have not been found in the 'master' but they connect other images of the same dataset have been re-projected over the master. The two zoom-in windows on the right side display some details of the extracted features in two specific areas.
In Table 4 the focus is given to the multiplicity of extracted TPs, which directly reflects in the inner reliability of the observations. For simplicity, here the results obtained with spectral Band TM3 have been reported, since this band has provided the largest number of TPs. Similar outcomes have been also found from other spectral bands (except TM4). While the majority of TPs (approx. 70% of the total) are measured on two images only, the remaining 30% comprehend manifold points, i.e., TPs observed on at least three images. As expected, the most numerous classes are the ones including TPs visible on three (approx. 18%) and four images (approx. 7%), while the number of TPs dramatically drops down when considering higher multiplicities. It should be observed that a small group of TPs are visible on more than seven images.  Table 5 shows some figures illustrating the ratio between equations and unknowns of the system adopted to estimate the image registration parameters and the coordinates of all TPs re-projected on the 'master' image space. In any spectral bands, this ratio is over three, resulting in a good global redundancy. In the case of Band TM4 as well, whose processing has been more problematic, the ratio equations/unknowns has been found close to three (2.8). As can be seen, subpixel accuracy has been obtained for all the spectral bands, as pinpointed by the values of estimated σ 0 , which can be assumed as metrics for the average theoretical accuracy of measured image coordinates. Results achieved with visible bands (TM1, TM2, TM3) have been slightly better (σ 0 ∼ = 0.5 pixels) than the ones from short-wave infrared (TM5, TM7), which yielded σ 0 ∼ = 0.6/0.7 pixels. As mentioned before, NIR (TM4) has output two different clusters of images. Thus, the system as a whole has shown a rank deficiency and could not be solved together.
Remote Sens. 2018, 11, x FOR PEER REVIEW 14 of 24 Figure 6. Connectivity graph obtained from Band TM4 (on the left) of the 'Multan' dataset, showing the separation of the entire blocks in two parts. This graph is compared to the one obtained from band TM3 (on the right), which corresponds to the best connections. In these graphs, circles represent the images while lines indicate the availability of sufficient TPs to connect each pair. A set of estimated parameters (aj, bj, cj, dj) for the 2D similarity transformation mapping each image j to the reference datum has been obtained per each spectral band. Since all bands are already mutually co-registered, the use of a specific band should provide the same parameters as in the case of others. The computed unknown parameters for all the spectral bands are graphically displayed in Figure 8, where it can be seen that similar results have been obtained. On one side, parameters aj and bj that provide information about scale and rotation, have been estimated for all the images as aj ≅ 1 and bj ≅ 0. These values demonstrate that all the images have neither scale variation nor rotation. On the other side, results for shift parameters (cj, dj) have been different for individual images, as obvious, but consistent values have been obtained from different spectral bands. Indeed, the root mean square (RMS) of their variations have resulted approximately 0.05 pixels in both directions. This result shows that for the 'Multan' dataset all Landsat-5/TM spectral bands may be potentially used for image alignment, exception made for Bands TM4 and TM6. This graph is compared to the one obtained from band TM3 (on the right), which corresponds to the best connections. In these graphs, circles represent the images while lines indicate the availability of sufficient TPs to connect each pair. A set of estimated parameters (a j , b j , c j , d j ) for the 2D similarity transformation mapping each image j to the reference datum has been obtained per each spectral band. Since all bands are already mutually co-registered, the use of a specific band should provide the same parameters as in the case of others. The computed unknown parameters for all the spectral bands are graphically displayed in Figure 8, where it can be seen that similar results have been obtained. On one side, parameters a j and b j that provide information about scale and rotation, have been estimated for all the images as a j ∼ = 1 and b j ∼ = 0. These values demonstrate that all the images have neither scale variation nor rotation. On the other side, results for shift parameters (c j , d j ) have been different for individual images, as obvious, but consistent values have been obtained from different spectral bands. Indeed, the root mean square (RMS) of their variations have resulted approximately 0.05 pixels in both directions. This result shows that for the 'Multan' dataset all Landsat-5/TM spectral bands may be potentially used for image alignment, exception made for Bands TM4 and TM6.

Validation
A first evaluation of the results described in the previous section has been based on the estimated theoretical accuracy and the consistency of the outcomes achieved in different spectral bands (see

Validation
A first evaluation of the results described in the previous section has been based on the estimated theoretical accuracy and the consistency of the outcomes achieved in different spectral bands (see Section 3.1). On the other hand, a further assessment has been accomplished by comparing these results with the ones obtained from different methods applied to the same dataset:

1.
A standard 'slave-to-master' registration approach based on automatic measurements; and 2.
A multi-image approach based on BBA but using TPs that have been manually measured by a human operator.

Comparison between MIRA and 'Slave-To-Master' Registration
The goal of this experiment has been to compare the traditional 'slave-to-master' approach for image registration and MIRA. To this purpose, all the images have been directly aligned to the same 'master' (ID8) selected within MIRA application. The pairwise corresponding points obtained from FBM have been considered for computing the registration parameters of 2D similarity transformations between each 'slave' and the 'master'. In this case, the registration of each individual image is completely independent from the others. Some statistics on the results are shown in Table 6.
As it can be seen, the estimates of shift parameters have shown an average variation in the order of 0.15 pixels in x direction and 0.08 pixels in y direction, with maximum absolute differences of 0.39 pixels and 0.17 pixels in the case of Images ID1 and ID2, respectively. No relevant differences of scale and rotation have been found between both methods. It is interesting to notice that the largest variations have been found in the case of Images ID1 and ID2. This outcome might be motivated by the longer elapsed time between the 'master' (recorded on October) and these two images (recorded on February and March of the same year, respectively), which have resulted in large changes in the image content due to the different seasons.

Comparison between MIRA and Manual Registration
The main difference between automated (MIRA) and manual measurements is the number of extracted corresponding features. The automatic MIRA method has found 75,073 corresponding features in the images after pairwise FBM (see Table 3), corresponding to 37,275 different TPs after re-ordering. Manual measurements have provided only a few tens TPs, as shown in Table 4. Here, the number is limited by the operator's skill and time. This means that the ratio equations/unknowns and therefore the size of the design and normal matrix are very different in both cases. Table 5 shows the 'manual' results obtained with band TM3. This band has been chosen for manual processing because of the highest number of extracted features when using MIRA method. Table 4 shows a comparison between the 'visibility' of the corresponding features in two or more images of the time series for both MIRA and manual measurements. The human operator only identified 210 corresponding features (in one working day), corresponding to only 33 multiple features, since they have been measured in several images.
Another critical difference is the size of the normal matrix N. In the case of manual measurements, a higher fraction of manifold TPs connecting the 'master' image with multiple 'slaves' has been measured. The number of 'slave-to-slave' TPs is lower. This gives to the N matrix a more compact form around the main diagonal, which may help reduce the computational cost. With the automatic MIRA method, the matrix N is instead very sparse as there are many more TPs also connecting 'slaves' images among them. This means that more off-diagonal non-zero elements are present. Table 7 summarizes the average accuracy of the MIRA method compared to manual measurements. The departure between the average differences of shift vectors is 0.12 pixel in both x and y directions, whereas it is evident there is neither rotation nor scale variation for all the images of the dataset. The theoretical standard deviations achieved with the automated measurements is better (RMS = 0.06 pixels for shifts) than the one obtained with the manual method (RMS = 0.12 pixels for shifts), notwithstanding results are at subpixel level in both cases. The larger number of observations obtained with the MIRA method may be addressed as the main reason for the better theoretical accuracy.

Discussion
The quality of image registration obtained from the MIRA method may be evaluated by considering three different aspects: the estimated value of the variance of unit weight observations (σ 2 0 ), the comparison against results obtained from 'slave-to-master' registration based on automatically extracted TPs, and the results obtained using manual measurements of TPs, but within a BBA solution.
The estimatedσ 2 0 after BBA based on observation extracted with MIRA has resulted in the order of approximately half pixel size, with small variations depending on the adopted spectral band (see Table 5). The achieved subpixel value has confirmed the good fit with the adopted geometric model implemented for image registration (2D similarity) as well as the precision of automatically extracted TPs remained after data snooping. In Figures 9 and 10 the results achieved with all three applied registration methods are graphically compared. No significant variations may be noticed about rotations and scales, which can be derived from parameters a j and b j . This is due to the imaging scheme of Landsat: fixed nadir-looking. Thus, images of the same site, but collected at different times, will have tiny variations in image scale and very small rotations. However, when processing time series imaged with different viewing angles, image rotations and scale variations could not be neglected. That is the typical case of high-resolution satellite images, but also that of multi-sensor datasets, including the data processing of radar images, which are usually collected with different incidence viewing angles and/or different orbits [45,46]. On the other hand, subpixel discrepancies have been found for shifts (parameters c j and d j ,) in the Landsat time-series. The consistency between all these outputs confirms the correctness of MIRA approach.   Inner and outer reliabilities have been computed for all three processing methods based on MIRA and manual observations. Values are reported in Table 8, where the results based on automatically extracted TPs have been also recomputed using the same precision of 'manual' measurements, which is supposed to be ±1 pixel. This solution has allowed to normalize the results and to make them comparable, since the TP precision had a linear scaling effect. In fact, the theoretical accuracy of MIRA observations has been estimated in the order of 0.5 pixels, which is consistent with the expected precision of FBM [33]. The analysis of inner reliability clearly highlights that the minimum size of detectable errors during data snooping significantly decreases when moving from 'slave-to-master' to BBA approach. In the former case, the inner reliability of twofold TPs (i.e., the ones visible on two images) is in the order of 6.1 pixels, while it improves up to 4.1 pixels when considering multiple TPs that can be obtained from BBA. As can be seen in Table 8, the inner reliabilities do not depend only on the multiplicity and the precision of the considered TP, but also Inner and outer reliabilities have been computed for all three processing methods based on MIRA and manual observations. Values are reported in Table 8, where the results based on automatically extracted TPs have been also recomputed using the same precision of 'manual' measurements, which is supposed to be ±1 pixel. This solution has allowed to normalize the results and to make them comparable, since the TP precision had a linear scaling effect. In fact, the theoretical accuracy of MIRA observations has been estimated in the order of 0.5 pixels, which is consistent with the expected precision of FBM [33]. The analysis of inner reliability clearly highlights that the minimum size of detectable errors during data snooping significantly decreases when moving from 'slave-to-master' to BBA approach. In the former case, the inner reliability of twofold TPs (i.e., the ones visible on two images) is in the order of 6.1 pixels, while it improves up to 4.1 pixels when considering multiple TPs that can be obtained from BBA. As can be seen in Table 8, the inner reliabilities do not depend only on the multiplicity and the precision of the considered TP, but also on the fact that this is shared with the 'master' image or not (see the second column). These discrepancies are in the order of a few tenths pixels and are motivated by the fact that the observation equations implemented in BBA are always referred to the 'master'. When looking at the outer reliability, a first comment should concern the evident advantage of using the large number of TPs extracted by the automatic MIRA procedure. Indeed, the negative effect of a maximum undetected error, which is indicated by the inner reliability value, is mitigated by the redundancy of the observations. Consequently, similar values for the inner reliability for manual and automatic measurements (in the range between 4-6 pixels), when they are processed within the BBA, may result in errors on the final shift estimates up to approximately 1.5 pixels in the former case and 0.04 pixels in in the latter case, respectively. The differences of scaling and rotation parameters are less influenced. Table 8. Inner and outer reliabilities computed for the 'Multan' dataset. In the second column, the subset of TPs adopted for evaluating the inner/outer reliabilities are described using the following symbols: 'MS' is the subset of TPs shared between the 'master' and one or more 'slaves'; 'SS' is the subset of TPs shared between two or more 'slaves'; 'All' is the subset including all types of TPs. These results conclude that the high data redundancy and the improved inner/outer reliability that may be obtained when using MIRA are two fundamental properties supporting the use of such an automatic procedure. The high redundancy allows to mitigate the degrading effect of residual measurement errors. The low values for the inner reliability may support the chance to limit the size of undetected errors. Of course, this second advantage depends on the fact that a small number of residual outliers are input in the BBA observation dataset. This chance is supported by the preliminary application of multiple scrutinizing technique to detect outliers during the FBM stage, which is supposed to leave a small number of outliers.

Case
It should be also recalled that the other important advantage of the MIRA procedure is given by the chance to register possible images that are not directly connected to the 'master' because they do not individually share enough TPs with it. If these images may be linked to other images in the block that are connected to the 'master', the BBA solution allows to compute the registration of the whole dataset. This alternative solution is not possible using traditional 'slave-to-master' registration approach.

Conclusions
This paper presented some important developments of an automatic method (multi-image robust alignment-MIRA) for the registration of remotely sensed time series, where multiple images are simultaneously registered in a bundle block adjustment fashion after the extraction of corresponding features using feature-based matching (FBM).
This approach has two main advantages if compared to standard 'slave-to-master' registration methods. The first consists in the chance to align also those images without direct connection with the 'master', which in the MIRA procedure is only adopted for setting up the spatial datum. The second is the higher reliability of the solution since the redundancy of the observations is fully exploited. While in a previous paper [27] the basic concept of this approach was presented, here the focus is given to the automation of the whole procedure, which requires a high-degree of robustness against blunders, the availability of objective parameters and criteria to support decisions within the process, a rigorous stochastic formulation for the observation equations in least-squares adjustment, and the presence of intermediate quality checks (e.g., after pairwise FBM).
A set of 2D polynomial transformations is available to better fit different datasets and images from diverse sensors. Consequently, MIRA may be successfully applied to medium-resolution satellite data (GSD between 10 m and 30 m), while its implementation with high-and very high-resolution imagery still needs additional development to integrate more suitable geometric models, e.g., rational polynomial functions or physical sensor models [47]. Anyway, it should be mentioned that with more involved geometric models that require a larger number of parameters, the reliability analysis that has been proposed here may be less meaningful. On the other hand, 2D polynomial transformations are sufficient for the registration of medium-resolution satellite images, such as the ones derived from NASA Landsat, ESA Sentinel-2 platform, and the British Disaster Monitoring Constellation. These datasets and other similar ones, which may be expected in the future, are highly prone to be exploited for hyper-temporal remote sensing with very short revisit time (a few days between observations). The MIRA procedure may play a vital role in the automatic subpixel alignment of such datasets.
In the perspective of processing huge datasets as well, the large size may create some problems in the inversion of the normal matrix N, due to the high computational cost of this operation. This would prevent the computation of the covariance matrix of the solution, and then the evaluation of the theoretical accuracy of estimated parameters as well as the redundancy matrix that is necessary for the reliability analysis [48]. To overcome this shortcoming, an ad hoc procedure for the decimation of the corresponding features based on their image multiplicity will be implemented in future developments.
At the moment, the selection of the spectral band to be used for the image alignment is left to the user. In the considered case study, similar results regarding the achievable precision have been obtained from different wavelengths, with the exception of near and thermal infrared. On the other hand, combining the point correspondences obtained in different bands may also be an opportunity to extend the use of the MIRA procedure, especially when different types of images should be registered together.