Groupwise Image Alignment via Self Quotient Images.

Compared with pairwise registration, the groupwise one is capable of handling a large-scale population of images simultaneously in an unbiased way. In this work we improve upon the state-of-the-art pixel-level, Least-Squares (LS)-based groupwise image registration methods. Specifically, the registration technique is properly adapted by the use of Self Quotient Images (SQI) in order to become capable for solving the groupwise registration of photometrically distorted, partially occluded as well as unimodal and multimodal images. Moreover, the proposed groupwise technique is linear to the cardinality of the image set and thus it can be used for the successful solution of the problem on large image sets with low complexity. From the application of the proposed technique on a series of experiments for the groupwise registration of photometrically and geometrically distorted, partially occluded faces as well as unimodal and multimodal magnetic resonance image sets and its comparison with the Lucas–Kanade Entropy (LKE) algorithm, the obtained results look very promising, in terms of alignment quality, using as figures of merit the mean Peak Signal to Noise Ratio (mPSNR) and mean Structural Similarity (mSSIM), and computational cost.


Introduction
Groupwise image alignment/registration or congealign is a joint alignment process that handles a large scale of images simultaneously, in contrast to pairwise alignment/registration. The goal is to align all images with one another in an unbiased way, no specific image should introduce a registration bias. A good congealing algorithm can be used as preprocessing to notably improve the performance of other vision tasks within different research areas such as medical, satellite and aerial image registration [1][2][3].
Congealign algorithms tend to utilize one image at a time as the held out image and the rest as the stack, that keep changing while a dissimilarity/similarity function is iteratively minimized/maximized. This is done by estimating warp updates for each image that best align them with the stack. Methods based on the aforementioned idea include groupwise methods with entropy-based cost functions as well as LS-based cost functions. In [4] entropy-based congealign was introduced, by defining a method that minimizes the joint entropy across pixel stacks distributions. In [5], entropy-based congealing was adapted to handle large sets of gray-scale valued, 3D medical images. In [6] the need for ad hoc regularization of the calculated transformations over an iteration was removed. In [7] and [8] Least Square (LS)-based methods were introduced, using sum-of-squared-differences (SSD)-based objective functions. Their efficiency compared to entropy-based techniques was demonstrated, regarding both the convergence rate and accuracy. In the LS framework, given a held out image and a stack and

Preliminaries
Adopting the notation used in [25], we consider the following sets of images: Set S i contains a group of N similar in shape and aligned images, with i denoting the column-wise of length N x N y vectorized version of size N x × N y image I, while set S i w (P N ) contains the geometrically distorted vectorized images of set S i in (1). This latter set depends on the set containing N warp parameter vectors used in warping transformation w( . ; p n ) which is parameterized by the vector p n ∈ R M , that deforms the support of image i n , of set S i and maps its values onto the corresponding image i w (p n ) of set S i w (P N ). In this paper to model the warping process we use the class of affine transformations with p n ∈ R 6 . Then, groupwise registration, or congealing [1] can be defined as the minimization problem of a misalignment function, let us denote it by C(P N ), which is defined over the set S i w (P N ). Solving this problem is not an easy task and its complexity depends on several factors, such as the size of the ensemble, the size of images and the strongness of the geometric distortions, to name a few, and in most cases its solution results in a highly nonlinear and computationally demanding procedure. This is basically because the goal of estimating the set P N of the unknown parameters should be achieved by defining a misalignment function C(P N ) over the entire ensemble of images. Such a function, which is known as the Cumulative Squared Misalignment Error (CSME): where was proposed in [7]. However, this total cost function is difficult optimize directly [26] and the minimization of the individual cost function (p n ) for each geometrically distorted image i w (p m ), that was proposed, demands the solution of O(N 2 ) pairwise alignment problems. Instead of the CSME defined in Equation (4), in [25] the following total mean misalignment function: was proposed, withī denoting the "mean" image of set S i , that is: that constitutes the most representative image of the above mentioned set. Assuming that the "mean" image is known, it is clear that the above misalignment function, although non-linear with respect to each member of the set of warp parameters defined in (3), is separable and demands the solution of O(N) pairwise alignment problems. In such a case, for each one of the cost functions involved in (6) its minimization requires nonlinear optimization techniques either by using direct search or by following gradient-based approaches.
It is a common strategy in iterative techniques the original minimization problem to be replaced by a sequence of secondary ones. Each secondary problem relies on the outcome of its previous one, thus generating a chain of parameter estimates which hopefully converges to the desired optimal solution of the original problem. Adopting an additive update rule for the parameters vector, that is p n (k) = p n (k − 1) + ∆p n (k), where ∆p n (k) denotes a vector of perturbations, their optimal values result from the solution of the following optimization problem: with the cost function C 0 (∆P N (k);ī ) defined as follows and the set ∆P N (k) contains the N perturbations of the warp parameter vectors at the k-th iteration of the minimization process, that is: The optimal solution of the optimization problem (8) is given by: where is the M × N x N y pseudo inverse of the Jacobian matrix G w (p n (k − 1)) evaluated at p n (k − 1) [27]. Note however, that since the "mean" imageī is unknown, the optimal values of the perturbations in (11) can not be computed. For avoiding this obstacle a strategy based on the use of a particle system and imposing its center of mass to be motionless during the optimization process was proposed in [25]. In the next paragraph, we propose the solution of the above-mentioned problem formulating a different optimization problem.

The Proposed Solution
To this end, let us define the following sequence of vectorized images: with the following two properties: • P 1 : The k-th member of the sequence, is an approximation of the "mean" image in the k-th iteration of the minimization process • P 2 : The limit of the sequence we would like to be the unknown "mean" image, that is: Then, we can redefine the secondary cost function (9) as follows: and our goal now is its double minimization; namely, with respect to parameter's set ∆P N (k) as well as the k-th "mean" image i(k).
Solving the above optimization problem does not constitute a very difficult task. However, the computation of its optimal solution is expensive.
We can avoid this obstacle by defining a cost function that can be minimized with respect to the "mean" image and in combination, in an alternating way, with the minimization of the cost function C 0 (∆P N (k);ī ) (9), permits at each iteration a computationally cheap, but not necessarily equivalent, solution of the desired problem.
To this end, let us define the following cost function: where the parameter's set P N (k − 1) is known and we would like to minimize with respect to the k-th approximation of the "mean" image i(k). It is clear that the new cost function is strongly related to both above defined cost functions C 0 (P N (k − 1); i(k)) and C 1 (∆P N (k), i(k)) in (6) and (15) respectively. Minimization of the cost function C 2 (i(k); P N (k − 1)) with respect to the "mean" image i(k), results in the following optimal solution: which, actually, is the average of the warped vectorized images and its computation is cheap. An outline of the proposed algorithm shown in Algorithm 1.

Algorithm 1: Outline of the Proposed LS-Groupwise Algorithm
Input: Image set S i w (P N (0)), the parameter vectors set P N (0) the maximum number of iterations k max and the maximum desired error e max . Set k = 1 repeat Use Equation (17) to update the "mean" image i(k) for n = 1 : N do Compute the Jacobian of the image i w (p n (k − 1)) Use (11) to compute the optimal perturbations ∆p n (k) that align image i w (p n (k − 1)) to the "mean" image i(k) and warp the image accordingly end k = k + 1 until C 2 (i(k); P N (k − 1)) ≤ e max or k = k max ; Output: Image set S i Before we proceed in presenting the image registration problem in the case of multi modal images, let us apply the above-mentioned algorithm in the groupwise alignment problem of the ten strongly geometrically deformed images from the Yale database [28] shown in the first row of Figure 1. The strongness of the geometric deformations is evident in their mean image which is shown at the end of first row of this figure. In the second row of this figure we can see the results we have obtained after 20 iterations of the proposed algorithm. The successful alignment of the image set can be validated by looking at their mean image which is clearly enhanced, compared to the original mean image before the application of the proposed groupwise alignment technique.

Registration of Multimodal Images
In this section, we are going to examine the registration problem of images of different modalities. Such problems appear when we want to register photometrically deformed and/or occluded images as well as MR Images of different modalities, to name a few. We are going to present examples of photometrically deformed and partially occluded images that we would like to align. Then, we present the existing different modalities of MR images. Finally, an edge-preserving filtering scheme, originally proposed in [29], used for their preprocessing will be shortly explained.

Photometrically-Distorted Images
In this paragraph, we focus on the alignment of photometrically distorted and occluded images, that constitutes a well-known and difficult problem [30,31]. In these kinds of alignment problems, there is a large number of outliers, thus not all pixels must be used during the optimization. In the first and fourth column of Figure 2, three photometrically distorted images from the Yale database [28] and an equal number of images with occluded areas from the AR database [32] are shown. Clearly, these images have totally different intensity distributions.

Multimodal MR Images
In MR Images the contrast depends on the magnetic properties and the number of hydrogen nuclei existing in the area being imaged. Common type MR Images include T1 and T2-weighted resulting from different timing radiofrequency pulse sequences, Proton Density (PD) that display the number of nuclei in the area and magnetic resonance angiography (MRA) that highlights movement in the body's blood vessels, among others. These different types are presented in Figure 3. Examining different types of medical images, such as different types MRIs, CT images etc, that have totally different intensity distributions, can often be presented as a problem of strong photometric distortions, so we are going to address it as such in the next subsection.

Self Quotient Images
Considering images such as presented above, the use of intensity-based techniques for aligning, either pairwise or groupwise, is not a good choice for the solution of the problem. In order to be able to use such area-based techniques, the preprocessing of the images with a known edge-preserving filter [33], was proposed in [29], which is briefly presented in this section. The SQI is defined as: where I σ (x) is a smoothed version of the image I(x) resulting from its convolution with the isotropic Gaussian kernel G σ (x), with the subscript denoting its standard deviation. Note that the deviation of the Gaussian kernel controls the width of the edges in the image defined in Equation (18). To address the noise as well as the outliers problem from which SQI suffers from we use a hard thresholding procedure. To this end, let σ Q be the standard deviation of the vectorized counterpart of SQ Image Q(x), with x in the support of Q(x). Then, we can define the following threshold: with 0 < µ < 1 a data-dependent parameter that can be used to have additional control over the value of threshold T µ and use it for "denoising" purposes. In all the experiments we conducted we set that parameter equal to 0.5. The SQIs resulting from the application of the above-mentioned procedure on the photometrically distorted images and the four different contrast type MRI slices, are shown in Figures 2 and 3 respectively. Having filtered out the strong photometric distortions, we can use the above mentioned area-based technique to solve the groupwise alignment problem.

Figure 2.
Photometrically distorted images from Yale database (first column) and their Self Quotient Images (SQI) counterparts before (second column) and after thresholding (third column). Images with occluded areas from Yale database (fourth column) and their SQI counterparts before (fifth column) and after thresholding (sixth column).

Experiments
In this section, we are going to present our results. In order to demonstrate the performance of the proposed technique, we conducted three experiments. To test the effectiveness in highly deformed sets of images we applied warps to the original images using the framework presented in [27] was used, with the distortion parameter σ 2 taking values in the interval [1,10], with the values 1 and 10 corresponding to the smallest and strongest geometric distortions respectively.
Specifically, we are going to evaluate the performance of the proposed against Lucas-Kanade Entropy [24]-based technique, a technique that outperforms other well-known alignment methods such as least square congealing [7], data-driven image models through continuous joint alignment [4], and robust alignment by sparse and low-rank decomposition for linearly correlated images [34]. Gradient Correlation Coefficient-based techniques [10], as it was already mentioned, are appropriate for the small size of image sets (N ≤ 100). Its computational cost is prohibitive for sets of greater cardinalities [10] and special strategies must be followed for solving efficiently the groupwise alignment problem.

Figures of Merit:
Since there are not ground truth images, in order measure performance we are going to use as figures of merit the "mean" Peak Signal to Noise Ratio (m PSNR ) and the "mean" Structural Similarity (m SSI M ). To this end, for the computation of the "mean" PSNR we compute the PSNR of each image i w (p n (k max )), n = 1, 2, · · · , N of the ensemble with respect to the "mean" N x × N y image i(k max ) obtained after k max iterations of the algorithms, i.e., where the MSE(n) is the mean squared error between the "mean" image i(k max ) and i w (p n (k max )), that is: and take their mean value: Similarly, for the computation of the "mean" SSIM we compute the SSIM of each image with the "mean" one, i.e., where µ 0 , µ n , are the mean values of the "mean" image i(k max ) and the warped image i w (p n (k max )) respectively, σ 2 0 , σ 2 n and σ 0n their variancies and covariance respectively and c 1 = 0.01max{i(k max )}, c 2 = 0.03max{i(k max )} two constants for avoiding possible arithmetic problems, and take their mean value, ie., We have run all the experiments on a 2.2 GHz Intel Core i7 processor and 16 GB RAM. Finally, in all the experiments we used k max = 200 and set e max = 10 −8 .
Computational Complexity: In groupwise alignment techniques computational complexity is a critical aspect, since they handle large sets of images. Considering the alignment process of a set of N images, of size N x × N y each, aiming to estimate a deformation vector with N p parameters, the computational complexity of LKE [24], LS centroid [25] and the proposed are presented in Table 1  Table 1. Computational complexity of groupwise alignment techniques.

LKE [24] LS Centroid [25] Proposed
where M = 10 is the proposed number of clusters computed in the preprocessing stage of LKE, while 10 < K < 20 is the cost of computing the quantities necessary for the update of ∆p n (k) in each iteration of LS centroid.

Experiment 1
In this experiment, we are going to apply the proposed technique in solving the groupwise alignment problem in a set of geometrically and strongly photometrically distorted images of size 85 × 100 from the YALE database [28]. In order to evaluate our technique, we compare its performance against the Lucas-Kanade entropy (LKE)-based method [24] which is considered, although more computationally expensive than the proposed one, as a state of the art technique for solving groupwise and multi-modal image alignment problems. In Figure 4 the obtained results from the application of the methods in a set of 110 photometrically distorted images for three different values of the distortion parameter σ 2 are depicted with the corresponding m PSNR as well as m SSI M achieved by the methods after 200 iterations. It is clear that all methods succeed to visually improve the mean misalignment image although it is not clearly reflected the m PSNR as well as m SSI M values. It also seems that the performances of the two techniques are similar but with the computational cost of LKE much higher. More specifically, the computational cost per iteration for the proposed technique was 0.5 s while the cost of LKE was 1.1 s excluding the cost that is needed in its pre-processing step. However, as we can see in Figure 5 it is not true. Indeed, although the resulting total mean images seem to be, at least visually, alike, the mean images resulting from our method of each one of the ten aforementioned subsets are better not only visually but also in terms of their corresponding m PSNR as well as m SSI M . Indeed, in most subsets, the proposed technique, even marginally, has better performance.

Experiment 2
In this experiment, we are going to apply the methods on geometrically distorted images from the AR database [32]. Specifically, we used images of size 80 × 100 selected from the AR database. We tested the methods on a set containing 300 images composed by 100 in neutral frontal face pose, 100 partially occluded by sunglasses images and another subset of 100 more partially occluded by scarves images. Samples of these three different kinds of images are shown in the fourth column of Figure 2. Note that in this database the images are not centered on a common center of coordinates, meaning they are already geometrically distorted each other, occasionally including large rotation and/or translation distortions. The initial average of the warped images, the optimal ones as well as the optimal average of each one of the three aforementioned subsets obtained by the methods for three different values of the distortion parameter σ 2 are shown in Figure 6. Specifically, for the distortion parameter σ 2 taking the values 2, 6 and 10 respectively and as we are moving from the top to bottom of this figure, we can see in the odd rows the obtained results from the application of the proposed method while in the even ones the results from the application of LKE method. We can see that the performance of the proposed technique in terms of m PSNR as well as m SSI M is marginally better. The achieved mean alignment images per image subset are of high quality. In addition, the computational cost of the proposed technique is substantially lower. More specifically, the mean computational cost per iteration for the proposed technique was 1.00 s while the cost of LKE 2.05 s, leaving out the heavy cost that is needed in its pre-processing stage. Figure 6. Mean misalignment images (first column for σ 2 = 2 (top), σ 2 = 6 (middle) and σ 2 = 10 (bottom)) of 300 images from AR database (100 neutral frontal pose (third column), 100 partially occluded by sunglasses (fourth column) and 100 partially occluded by scarfs (fifth column)) and mean images (second column) resulting from their groupwise alignment by the proposed first, third and fifth row and the LKE technique second, fourth and sixth row.

Experiment 3
In this experiment we tested the methods with artificial MRI data obtained from Brainweb Database (https://www.mcgill.ca/bic/software/brainweb-mri-simulator) and real data from IXI Dataset (https://brain-development.org/ixi-dataset/), to test the alignment of images on the same or different modalities. We also applied artificial warps to the images, specifically small of size σ 2 = 1 and larger of size σ 2 = 5. We conducted a series of experiments with different MR image modalities, namely T1, T2, PD and MRA. First, we aim to align images from IXI Dataset containing the same slice, of the same modality (that is T1), from 100 different subjects of size 124 × 124. In Figure 7 we can see the "mean" image resulting after alignment, with artificial warping of σ 2 = 5 and no warping.
Next, we tested the methods with artificial data from the Brainweb Database, where we applied warps of σ 2 = 1, 5 to the images. We used neighboring slices of T1, T2 and PD imaging, in a total of 35 images of size 109 × 91. In Figure 8 we can see the resulting "mean" image of LKE and the proposed method, as well as the original "mean" of the warped images.
Last, we tested using a dataset from IXI Dataset, aligning across the same slice different modalities T2, MRA and PD of size 128 × 128. In Figure 9 we may see the "mean" image for randomly selected slices from 28 subjects from each modality.   It is evident in all three experiments conducted with MR images, unimodal or multimodal, that the mean image resulting from the proposed method has more defined edges, especially when larger warps are applied to the images, leading to a more blurred original mean, even when in some cases, in terms of m PSNR , as well as m SSI M , the results of the two methods, are very close. The computational cost per iteration was 0.3, 0.1 and 0.5 s, per experiment, for the proposed and 2.0, 1.5 and 6.3 s, per experiment, for LKE, excluding preprocessing costs. As results indicate, clearly, iteration running time depends on the size of the image set. For LKE the increase in time cost may result in large overall time cost in the case of very large sets, i.e., over 1000 images, while in the case of the proposed the increase in iteration running time is very much smaller, allowing possible testing on sets of much larger size.

Conclusions
In this work, a new least-squares-based groupwise image registration method based on the use of self quotient images was proposed. The proposed technique has a very low computational cost. This was achieved by optimally defining a sequence of images whose limit was the desired but unknown "mean" image for solving the groupwise problem. Since the proposed technique is based on the self quotient images it was successfully used in solving the alignment problem of strongly photometrically distorted images, partially occluded images as well as in successfully solving the groupwise registration of multimodal MR images. Using as figures of merit the mean Peak Signal to Noise Ratio and mean Structural Similarity, the performance of the proposed technique from its application on a series of experiments was very good. The extensive evaluation of its performance against another state of the art groupwise registration techniques and its extension for solving the corresponding groupwise volume problem are currently under investigation. Funding: This research is implemented through the Operational Program "Human Resources Development, Education and Lifelong Learning" and is co-financed by the European Union (European Social Fund) and Greek national funds.

Conflicts of Interest:
The authors declare no conflict of interest.