Water-Air Interface Imaging: Recovering the Images Distorted by Surface Waves via an Efficient Registration Algorithm

Imaging through the wavy water–air interface is challenging because the random fluctuations of the water surface cause complex geometric distortion and motion blur in the images, seriously hindering effective identification of the monitored object. To balance recovery accuracy and computational efficiency, this paper proposes an efficient reconstruction scheme that combines lucky-patch search and image registration. First, a high-quality reference frame is rebuilt using a lucky-patch search strategy. Then an iterative registration algorithm removes severe geometric distortions by registering the warped frames to the reference frame. During registration, we integrate the JADE and LBFGS algorithms as an optimization strategy to expedite the optimization of the control parameters. Finally, the registered frames are refined using PCA and the lucky-patch search algorithm to remove residual distortions and random noise. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art methods in terms of sharpness and contrast.


Introduction
Imaging through water is an important research direction in underwater and marine optics, widely employed in environmental, economic, and military applications [1][2][3][4][5][6]. Unlike other underwater imaging systems, the main challenge in this imaging scenario comes from the air–water interface. Random fluctuations of surface waves introduce a variety of effects, such as non-single-viewpoint distortion [7], double images [8], scene distortion, illumination caustics [9], and black holes [10,11]. These effects can cause extreme distortion and motion blur in the images. Therefore, inverting these distortions and retrieving high-quality images remains a formidable task.
Several efforts have been made in earlier works to retrieve a distortion-free image from a video sequence distorted by surface waves. If the turbulence of the surface waves is small, averaging the sequence or blind deconvolution [12] performs well, but the reconstructions become extremely blurry once the waves are strong. Better techniques in prior research selected the image patches with the least distortion and combined them [13][14][15][16][17][18]. The limitation of these methods is their assumption that corresponding patches from different times are spatially invariant; as a result, such algorithms handle only mildly fluctuating situations and tend to lose information. Later, another distortion mitigation approach, which models the water surface from the image sequence, was described in [19][20][21][22][23][24]. However, it is challenging to capture the variety of complicated fluid motions found in real environments with such simplistic spatial models.
Recently, registration technology, initially developed to recover images degraded by atmospheric turbulence, has been applied to this problem. The typical strategy [25][26][27] is to use the temporal mean as the reference frame and to improve mean-frame correlation by blurring the sharp frames with a blur kernel; the mean and the frames are then iteratively refined via robust registration. Although this scheme can significantly eliminate severe geometric distortions, it is prone to inadvertent misalignment and motion blur, and these effects become more pronounced in strong wave environments. Subsequently, Zhen Zhang et al. [28,29] attempted to reconstruct a higher-quality reference frame using a lucky-patch search and a deblurring process, and used its sharper boundaries to guide the registration that removes geometric distortions. However, they ignored two key factors that affect the registration process: (1) the premise of a lucky-patch search is that corresponding regions from different frames are spatially invariant or approximately so; otherwise, double images appear in the synthetic frame; (2) the deconvolution required for deblurring is often ill-posed; even the most advanced deblurring algorithms, such as [30][31][32], frequently fail and may produce visual artifacts (commonly described as "artificial" edges) [33]. Although their results improve image contrast, the overall performance evaluation is not satisfactory, and is even inferior to Oreifej's method [25] on some individual metrics.
In this work, we present a video-sequence-based method for image reconstruction. First, a new lucky-patch search method is used to reconstruct a high-quality reference frame: all warped frames are aligned in the first iteration, and the patch search follows in subsequent iterations. Second, the warped frames are registered against the reference frame to eliminate the severe geometric distortions in the images. Finally, the registered frames are refined using PCA and the lucky-patch search algorithm to remove residual distortions and random noise. Notably, in the registration stage we introduce a hybrid optimization strategy to speed up the optimization of the control parameters. The experimental results show that the proposed approach outperforms state-of-the-art methods in terms of sharpness and contrast.

Related Work
Researchers from various fields have attempted to tackle the problem of recovering the real scene from a distorted video sequence. According to the Cox–Munk law [34], the normals of the water surface generally follow a Gaussian distribution when the interface is sufficiently large and calm. On this basis, numerous lucky-patch approaches were presented. Efros et al. [13] described the reconstruction issue as a manifold embedding problem in which the patch with the smallest global distance is chosen as the distortion-free patch. Donate et al. [14] presented a multi-stage clustering approach to eliminate regions with significant amounts of translation and motion blur, respectively. Wen et al. [15] used a bi-spectral analysis approach to transform the reconstruction issue into a phase recovery issue. A.V. Kanaev et al. [16] developed an image restoration method based on optical flow and the lucky region, in which the lucky patch is determined at each location using an image evaluation metric and the current frame's nonlinear gain coefficient. Later, A.V. Kanaev et al. [17] proposed a structural-tensor-oriented image quality measure (STOIQ) to evaluate image quality. Zhang et al. [18] recently proposed "center evolution imaging", a simple yet effective progressive recovery method that improves restoration quality by continually updating with the following warped frames. Subsequently, several researchers attempted to recover the distortion-free image with model-based methods. Tian et al. [19] used a model-based tracking approach to reconstruct the real scene and the instantaneous shape of the water surface simultaneously. In later work, they proposed a data-driven iterative algorithm [20] and introduced the "pulling back" operation. In [21], Seemakurthy et al. proposed methods to remove motion blur caused by unidirectional cyclic waves and circular ripples, respectively. More recently, Li et al. [22] demonstrated a trained convolutional neural network that removes dynamic refraction from a single image. James et al. [26] proposed a method for restoring underwater images using compressive sensing (CS) and a local polynomial image representation. Thapa et al. [24] proposed a distortion-guided network (DG-Net) to restore distortion-free underwater images: a physically constrained convolutional network first estimates the distortion map from the refracted image, and a generative adversarial network, guided by that map, then retrieves a sharp image with no distortion.
Recently, another typical approach, registration technology, has been used to tackle this issue. Oreifej et al. [25] proposed a two-stage restoration method that first removes structured noise with a robust registration process and then eliminates sparse noise by rank minimization. Hu et al. [26] proposed a motion-field kernel-regression-based method to reconstruct distortion-free images. Halder et al. [27] introduced a pixel-shift-map-based restoration method that first uses image registration to estimate the pixel displacement maps of the warped images and then reconstructs the distortion-free scene from these maps. Zhen Zhang et al. [28,29] attempted to reconstruct a reference frame using a lucky-patch search and a deblurring process, trying to guide the registration of the distorted frames toward sharper boundaries to remove geometric distortions, but the results were not convincing. Cai et al. [35] proposed a reconstruction algorithm based on a two-round registration scheme, which clearly improves restoration accuracy but reduces the computational efficiency of image registration.
To improve restoration accuracy, visual quality, and computational efficiency, we propose a new image reconstruction approach that uses a high-quality reference image to guide the distorted frames toward sharper boundaries or regions. Moreover, an optimization strategy that combines JADE with LBFGS is introduced to accelerate the iterative registration process. Unlike other methods, ours needs no particular camera height [19], image priors [20], or special illumination [36,37]. Like the state-of-the-art methods [25,28,29,30], our approach needs only a short sequence (61 frames) rather than 800 frames as in [14], 300k frames as in [22], 100 frames as in [23], or 43,600 frames as in [24]; a short sequence is a prerequisite for underwater imaging systems to achieve covert observation.

Methods
Assume a water-deformed video sequence V = {I_1, I_2, ..., I_n} in which each frame I_k ∈ R^{w×h}, k = 1, ..., n, is warped by an unknown deformation Γ(x). Our primary objective is to rebuild a distortion-free frame sequence V_f = {I_f1, ..., I_fn} and a high-quality dewarped frame I_F. The framework of the proposed image reconstruction method is shown in Figure 1.

Principles
As Figure 1 shows, we optimize both the reference frame and the registered frames through a robust iterative registration procedure: at each iteration, the distorted frames are registered against a reference frame that is progressively closer to the real-scene image. First, a new lucky-patch search strategy generates a reference frame of higher quality than the temporal mean. In the first iteration, we align all distorted frames to the average frame with an initial registration, which removes the spatial variability between images introduced by the random fluctuations of the water surface; the patch search is then performed in subsequent iterations. Next, similar to [25], we estimate a blur kernel so that the distorted frames have the same blur level as the reference frame. Subsequently, a non-rigid image registration based on the hybrid optimization strategy estimates the motion vector field Γ(x) and removes the severe geometric distortions. The hybrid optimization strategy combines the global optimizer JADE [38] with the local optimizer LBFGS [39], with the purpose of keeping the deformation parameters from falling into local optima and accelerating the registration process.

After a few iterations, the robust iterative registration converges to new registered frames V_r = {I_r1, I_r2, ..., I_rn} that are less distorted and blurred. However, some sparse random noise and misalignments remain among them, which are difficult to remove by robust registration alone. To acquire a distortion-free sequence V_f = {I_f1, ..., I_fn} and a high-quality dewarped frame I_F, the registered frames are refined using PCA and the lucky-patch search method described above to remove residual distortions and random noise.

Lucky-Patch Search with Initial Registration
To reconstruct a synthetic image of higher quality than the temporal mean of the sequence, we introduce an image-quality-assessment-driven lucky-patch search. First, each video frame is divided into N overlapping sub-regions of the same size, R_k = {P_1, P_2, ..., P_N}, k = 1, ..., n, where adjacent patches overlap by 50%. The k-medoids clustering algorithm [41] is then used to search each patch sequence P_m = {b_m1, ..., b_mn}, m = 1, ..., N, for the set of high-quality patches with the fewest distortions. Subsequently, the high-quality patches from each patch sequence are averaged into a single patch. Finally, a complete fused image is constructed by splicing all the averaged patches together.
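As a concrete illustration, the following is a minimal sketch of this fusion step, assuming aligned grayscale frames stacked in a NumPy array. The function names are hypothetical; the quality score here is a simple distance-to-the-mean proxy (a stand-in for the ESSIM criterion the paper actually uses), and k-medoids is reduced to a tiny 1-D two-cluster split on the scores.

```python
import numpy as np

def kmedoids_1d(scores, iters=20):
    """Tiny k-medoids-style split (k = 2) on 1-D quality scores.
    Returns a boolean mask selecting the higher-quality cluster
    (medoids are approximated by cluster medians)."""
    med = np.array([scores.min(), scores.max()])
    for _ in range(iters):
        hi = np.abs(scores - med[1]) <= np.abs(scores - med[0])
        if hi.all() or (~hi).all():
            break
        new = np.array([np.median(scores[~hi]), np.median(scores[hi])])
        if np.allclose(new, med):
            break
        med = new
    return hi

def fuse_lucky_patches(frames, patch=16):
    """frames: (n, H, W) aligned grayscale frames. Splits each frame
    into 50%-overlapping patches, keeps the higher-quality cluster at
    each patch location, averages it, and splices the averages back
    together with a weight map to normalize the overlaps."""
    n, H, W = frames.shape
    step = patch // 2                      # 50% overlap
    fused = np.zeros((H, W))
    weight = np.zeros((H, W))
    mean = frames.mean(axis=0)
    for y in range(0, H - patch + 1, step):
        for x in range(0, W - patch + 1, step):
            seq = frames[:, y:y + patch, x:x + patch]
            ref = mean[y:y + patch, x:x + patch]
            # Quality proxy: closeness to the temporal mean patch
            # (a stand-in for the ESSIM clustering criterion).
            scores = -np.array([np.abs(p - ref).mean() for p in seq])
            lucky = kmedoids_1d(scores)
            fused[y:y + patch, x:x + patch] += seq[lucky].mean(axis=0)
            weight[y:y + patch, x:x + patch] += 1.0
    return fused / np.maximum(weight, 1e-12)
```

The weight map handles the 50% overlap: each pixel accumulates contributions from up to four patch positions, which are then averaged.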
In this paper, we use edge strength similarity (ESSIM) [42] as the clustering criterion. ESSIM is a structural similarity index with better performance than other image quality metrics such as SSIM [43] and FSIM [44]. ESSIM measures the similarity between a patch b_mk, m = 1, ..., N; k = 1, ..., n, and the mean u_m of the corresponding patch sequence:

ESSIM(u_m, b_mk) = (1/M) Σ_{η=1}^{M} [2 E(u_m, η) E(b_mk, η) + c] / [E(u_m, η)² + E(b_mk, η)² + c]

where M denotes the total number of pixels, E(u_m, η) and E(b_mk, η) represent the edge strength of u_m and of b_mk at the η-th pixel, respectively, and c is a scaling parameter. Furthermore, to ensure that corresponding patches from different times are spatially invariant or approximately so, we first register all frames to the mean in the first iteration. The framework to reconstruct the reference frame is shown in Figure 2.
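A sketch of the ESSIM computation follows, with edge strength approximated by the larger of the axis-aligned and diagonal directional-derivative magnitudes; the exact directional kernels of [42] may differ, so this is an illustrative assumption rather than a reference implementation.

```python
import numpy as np

def edge_strength(img):
    """Edge strength approximated as the larger of the axis-aligned
    and diagonal directional-derivative magnitudes at each pixel."""
    img = img.astype(float)
    gy, gx = np.gradient(img)
    d_axis = np.hypot(gx, gy)                  # 0/90-degree pair
    p = np.pad(img, 1, mode='edge')            # diagonal pair via shifts
    d45 = (p[2:, 2:] - p[:-2, :-2]) / 2.0
    d135 = (p[2:, :-2] - p[:-2, 2:]) / 2.0
    d_diag = np.hypot(d45, d135)
    return np.maximum(d_axis, d_diag)

def essim(a, b, c=1e-3):
    """Mean over pixels of the edge-strength similarity between a and b."""
    ea, eb = edge_strength(a), edge_strength(b)
    return np.mean((2 * ea * eb + c) / (ea**2 + eb**2 + c))
```

By construction each per-pixel term lies in (0, 1], with equality exactly when the two edge-strength maps agree, so identical images score 1.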

Figure 2. The framework to reconstruct the high-quality reference frame.

Image Registration Based on the JADE & LBFGS Optimization Strategy
Similar to [30], a nonrigid image registration algorithm based on the JADE and LBFGS optimization strategy is applied to estimate the deformation Γ(x) and remove the distortion. Let the image domain be Ω = {(x, y) | 0 ≤ x < X, 0 ≤ y < Y}. The goal of the iterative registration is to find the optimal transformation Γ(x) that registers the video sequence to the reference frame. The deformation Γ(x) can be expressed as

Γ(x) = Γ_global(x) + Γ_local(x)

where Γ_global and Γ_local are the global transformation and the local transformation, respectively. Γ_global describes the overall motion of the moving image and is generally represented by an affine transformation with six parameters:

Γ_global(x, y) = [θ_11 θ_12; θ_21 θ_22] [x; y] + [θ_13; θ_23]

where Θ = {θ_11, θ_12, θ_13, θ_21, θ_22, θ_23} represents the set of the six parameters. Γ_local models the local deformation of the distorted frames. Local deformation is difficult to characterize with parameterized transformations, since the wavy water surface introduces anisoplanatic effects when imaging through water. Instead, we employ the B-spline-based free-form deformation model (FFD) to represent the local deformation. The basic idea of FFD is to describe the geometric distortions between the moving image and the reference frame by manipulating an underlying mesh of control points. Let Φ be an n_x × n_y mesh of control points φ_{i,j} with uniform spacing; the FFD can be expressed as

Γ_local(x, y) = Σ_{l=0}^{3} Σ_{m=0}^{3} B_l(u) B_m(v) φ_{i+l, j+m}

where i = ⌊x/δ_x⌋ − 1, j = ⌊y/δ_y⌋ − 1, u = x/δ_x − ⌊x/δ_x⌋, v = y/δ_y − ⌊y/δ_y⌋, with δ_x, δ_y the control-point spacings and B_l the l-th cubic B-spline basis function. To determine the optimal transformation Γ(x), we minimize a cost function of the global transformation parameters Θ and the local control parameters Φ:

C(Θ, Φ) = C_similarity(Θ, Φ) + λ C_smooth(Φ)

where C_similarity(Θ, Φ) is the similarity metric describing the difference between the reference frame I_R and the distorted frame I_k. Here we use the squared sum of intensity differences (SSD):

C_similarity(Θ, Φ) = Σ_{x∈Ω} [I_k(Γ(x)) − I_R(x)]²

where I_k(Γ(x)) and I_R(x) are the intensities of the corresponding pixels of the two images. Furthermore, λ is a weighting parameter that defines the trade-off between the alignment of the two images and the smoothness of the transformation, and C_smooth(Φ) is a regularization penalty term that controls the smoothness of the transformation. In 2-D, the penalty term is expressed as

C_smooth(Φ) = (1/A) ∫∫ [ (∂²Γ/∂x²)² + 2 (∂²Γ/∂x∂y)² + (∂²Γ/∂y²)² ] dx dy

where A denotes the area of the image domain.
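Evaluating the local B-spline FFD can be sketched as follows. The control-grid indexing convention (a border of control points padding the grid, so that the 4×4 neighborhood is always valid) is an implementation assumption, and the function names are illustrative.

```python
import numpy as np

def bspline_basis(u):
    """Cubic B-spline basis functions B0..B3 at parameter u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    ])

def ffd_displacement(x, y, phi, sx, sy):
    """Evaluate the local B-spline FFD at pixel (x, y).
    phi: control-point displacements of shape (ny, nx, 2) with grid
    spacing (sx, sy); the grid is assumed padded with a border of
    control points so that phi[j:j+4, i:i+4] is always in bounds."""
    i, u = int(x // sx), x / sx - int(x // sx)
    j, v = int(y // sy), y / sy - int(y // sy)
    Bu, Bv = bspline_basis(u), bspline_basis(v)
    disp = np.zeros(2)
    for l in range(4):
        for m in range(4):
            disp += Bv[l] * Bu[m] * phi[j + l, i + m]
    return disp
```

Because the four basis functions sum to 1 at any u (partition of unity), a uniform control-point displacement moves every pixel by exactly that displacement, which is a quick sanity check for the implementation.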
Unlike the previous works [25][26][27][28][29][30], we no longer use the LBFGS search algorithm or the gradient descent method alone to minimize Equation (6), but combine the JADE and LBFGS algorithms as the search strategy that is described in detail in Section 3.3.1.

Optimization Strategy
Consider the minimization of a smooth nonlinear function f : R^τ → R,

min f(δ)    (9)

where δ ∈ R^τ denotes the multi-variable vector and τ is the number of variables. First-order techniques, such as gradient descent, can be utilized to solve Equation (9). However, gradient descent may not perform well in ill-conditioned situations [45], and its convergence slows as it approaches the minimum of the objective function. Typically, second-order techniques, such as Newton's method, are used to address this deficiency. Unfortunately, Newton's method needs the inverse of the Hessian matrix ∇²f(δ) at each iteration, and its computational cost is prohibitively expensive for high-dimensional variables. Quasi-Newton approaches such as [39,46,47] use an approximated Hessian matrix to exploit curvature while reducing computational complexity. Generally speaking, BFGS performs best in practice. The limited-memory variant of the original BFGS algorithm (LBFGS) reduces the computational cost of each iteration even further, making it one of the most effective algorithms in this domain. However, like other local optimization methods, LBFGS easily converges to a local optimum if the initial value δ_0 is not selected properly. Therefore, how to select the initialization parameters becomes the key to obtaining the optimal parameters. JADE [38] is a simple and effective global optimization method, a variant of the differential evolution algorithm. Differential evolution follows the general process of evolutionary algorithms: an initial population of NP individuals is generated, where NP is the population size.
If nothing is known about the solution of the objective function, the initial population is chosen at random according to a uniform distribution; if a preliminary solution is available, the initial population is usually generated by adding normally distributed random deviations to that solution. The JADE algorithm then enters a cycle of evolutionary operations: mutation, crossover, and selection. By repeating this process, the algorithm gradually approaches the global optimum. Therefore, to prevent the transformation parameters (Θ, Φ) from falling into a local optimum, we minimize Equation (6) using JADE combined with LBFGS: JADE computes a solution near the global optimum, and LBFGS then refines it to the optimal solution. Furthermore, for computational efficiency, we no longer use an iterative multi-resolution search strategy [40,41] in the stage where the affine transformation parameters Θ are optimized; in practice, the combined optimization strategy is sufficient.
Specifically, the optimization process in this paper can be divided into the following two stages. In the first stage, the integrated JADE and LBFGS algorithms are used to optimize the global transformation parameters Θ. In the second stage, the local transformation parameters Φ are optimized through a hierarchical multi-resolution strategy [40] such that the resolution of the control mesh is increased, along with the image resolution in a coarse-to-fine fashion.
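The two-stage idea can be illustrated with SciPy, using stock differential evolution as a stand-in for JADE (JADE additionally adapts its mutation and crossover parameters) and a standard multimodal test function in place of the actual registration cost — a sketch of the strategy, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def rastrigin(d):
    """Multimodal test function standing in for the registration cost:
    many local minima, one global minimum at the origin."""
    d = np.asarray(d)
    return 10.0 * d.size + np.sum(d**2 - 10.0 * np.cos(2 * np.pi * d))

def hybrid_minimize(f, bounds, seed=0):
    # Stage 1: global search (stock differential evolution as a
    # stand-in for JADE) locates the basin of the global optimum.
    coarse = differential_evolution(f, bounds, maxiter=200,
                                    seed=seed, polish=False)
    # Stage 2: L-BFGS refines the coarse solution to high precision.
    return minimize(f, coarse.x, method='L-BFGS-B', bounds=bounds)

res = hybrid_minimize(rastrigin, [(-5.12, 5.12)] * 2)
```

Starting L-BFGS directly from a random point in this landscape would usually terminate in one of the many local minima; the global stage supplies the good initialization that Section 3.3.1's discussion calls for.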

Post-Processing
After a few iterations, we obtain a new image sequence V_r = {I_r1, I_r2, ..., I_rn} that is less distorted and blurred. However, some sparse random noise and misalignments remain, which are difficult to remove by robust registration. To obtain a distortion-free image sequence V_f = {I_f1, ..., I_fn}, we refine the image sequence using PCA [48]. Subsequently, we perform the patch search strategy described in Section 3.2 again to select the patches with the least distortion from the registered frames and fuse them into a higher-quality single image I_F. Unlike the first fusion, we divide each patch sequence into five groups rather than two using the k-medoids clustering algorithm, because the differences between adjacent images are already small after iterative robust registration and sparse denoising. Moreover, to improve registration efficiency, a suitable termination condition ξ(v) can be added to determine whether the registration process has ended. The comprehensive process for image reconstruction is described in Algorithm 1.

Algorithm 1: An efficient method for recovering the images distorted by surface waves
Input: distorted video sequence V = {I_1, I_2, ..., I_n}, I_k ∈ R^{w×h}, k = 1, 2, ..., n
Output: distortion-free sequence V_f and high-quality frame I_F
while the termination condition ξ(v) is not satisfied do
    for k = 1, ..., n do
        I_blur ⇐ GaussianFilter(I_k, B);
        Γ ⇐ ComputeDeformation(I_blur, I_R);
        V_r(:, :, k) ⇐ Dewarp(I_k, Γ);
    end
    V = {I_1, I_2, ..., I_n} ⇐ V_r = {I_r1, I_r2, ..., I_rn};
    r = r + 1;
end
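The PCA refinement can be sketched as a truncated-SVD projection of the frame stack: treating each registered frame as one sample, the leading components capture the shared scene, while sparse noise and residual misalignments fall into the discarded components. The exact formulation of [48] may differ; this is a minimal stand-in.

```python
import numpy as np

def pca_refine(frames, k=3):
    """Low-rank refinement of registered frames: stack each frame as a
    column, keep the top-k principal components, and reshape back.
    Sparse noise and small misalignments fall into the discarded
    components."""
    n, H, W = frames.shape
    X = frames.reshape(n, -1).T                 # (H*W, n) data matrix
    mu = X.mean(axis=1, keepdims=True)          # mean frame as a column
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k] + mu
    return Xk.T.reshape(n, H, W)
```

With k equal to the number of frames the projection is lossless, so the rank parameter directly controls how aggressively residual variation is suppressed.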

Results
In the experiment, the proposed method was implemented in MATLAB (MathWorks, Natick, MA, USA). To validate its effectiveness, we first conducted extensive experiments using several standard underwater sequences from [19], and then further tested it on two typical real-scene sequences from [18]. Moreover, we compared our experimental results with other state-of-the-art methods: Tian's method [19], Oreifej's method [25], Zhen Zhang's method [28], and Tao Sun's method [29]. The maximum number of iterations for all image registration algorithms was set to five, and each patch of the corresponding sequence was set to the same size. The source code of our method is available online at [49].

Test with the Data Set from the Air-to-Water Imaging Scenario
To validate the proposed method more rigorously, the experiment employed the same standard underwater scene sequences as [19]. The data sets contain five image sequences (as Table 1 shows), each of 61 frames. Table 1 lists the size of each frame and of the patches utilized in the patch search procedure. To quantify the results of these methods, we employed three standard full-reference image quality/similarity metrics, PSNR [30], MSE [6], and SSIM, which measure the difference between the recovered frame and the reference frame:

MSE = (1/M) Σ_{η=1}^{M} [F(η) − G(η)]²

PSNR = 10 log_10( max(F)² / MSE )

SSIM = (2 u_F u_G + c_1)(2 σ_FG + c_2) / [ (u_F² + u_G² + c_1)(σ_F² + σ_G² + c_2) ]

where F and G are the estimated image and the reference image, respectively, max(·) represents the maximum pixel value, u_F and u_G are the means of F and G, σ_F² and σ_G² denote their variances, and σ_FG is their covariance. The constants are c_1 = (0.01 × β)² and c_2 = (0.03 × β)², in which β represents the dynamic range of the pixel values. Moreover, since the template images for "Checkerboard" and "Large fonts" are unknown, a no-reference underwater image quality measure, UIQM [50], is used as an additional metric:

UIQM = λ_1 × UICM + λ_2 × UISM + λ_3 × UIConM

where UICM, UISM, and UIConM represent the image colorfulness metric, the image sharpness metric, and the image contrast metric, respectively, and λ_1, λ_2, λ_3 are constants generally set to 0.0282, 0.2953, and 3.5753. Since the image sequences are grayscale, UICM is identically equal to zero. Figure 3 shows the recovery results of our method and other state-of-the-art methods; the image quality evaluation results are shown in Table 2.
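For reference, the metrics above can be computed as follows. Note that `ssim_global` evaluates the SSIM statistic over the whole image, whereas standard implementations average it over local windows; the UICM/UISM/UIConM components are assumed to be computed elsewhere.

```python
import numpy as np

def mse(F, G):
    return np.mean((F.astype(float) - G.astype(float)) ** 2)

def psnr(F, G, peak=255.0):
    return 10.0 * np.log10(peak**2 / mse(F, G))

def ssim_global(F, G, beta=255.0):
    """SSIM evaluated over the whole image (single window)."""
    F, G = F.astype(float), G.astype(float)
    c1, c2 = (0.01 * beta) ** 2, (0.03 * beta) ** 2
    uF, uG = F.mean(), G.mean()
    vF, vG = F.var(), G.var()            # variances
    cov = ((F - uF) * (G - uG)).mean()   # covariance
    return ((2 * uF * uG + c1) * (2 * cov + c2)) / \
           ((uF**2 + uG**2 + c1) * (vF + vG + c2))

def uiqm(uicm, uism, uiconm, w=(0.0282, 0.2953, 3.5753)):
    """Weighted UIQM combination; the three component metrics are
    assumed computed elsewhere (UICM is 0 for grayscale sequences)."""
    return w[0] * uicm + w[1] * uism + w[2] * uiconm
```
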
Compared with Oreifej's results, ours perform better in terms of sharpness and contrast and preserve more detail of the underwater scene, which illustrates that a reference frame of higher quality than the mean better guides the warped image toward sharp boundaries or areas. In addition, we find that the results of Z. Zhang's and T. Sun's methods still contain many residual distortions such as misalignment, ghosting, and local deformation. The main reason is that they ignore the premise of spatial invariance, which is crucial for lucky-patch search strategies. Unlike the methods of [28,29], we reconstruct a high-quality reference frame using a new patch search strategy that adds an alignment step at the first iteration. The results show that our method can reconstruct all of the underwater scenes and produces higher-quality results than the others.

Analysis of Computational Efficiency
In terms of computational efficiency, we compared the running times of the different algorithms under two termination conditions. To ensure fairness, all methods were run on the same laptop computer (i7-6700HQ CPU, 8 GB RAM) using the "Middle fonts" data set. Moreover, since Oreifej's work [25] showed that the processing time of Tian's method is about twice that of Oreifej's method, Tian's method is not compared here in terms of computational efficiency.

Figure 3. Recovery results on the standard data sets: the results of Tian's method [19], Oreifej's method [25], Z. Zhang's method [28], T. Sun's method [29], and our method.

Table 2. Numerical comparison based on the standard data sets 1.

Method | Large fonts (UIQM) | Checkerboard (UIQM)
Tian [19] | 2.2026 | —
Oreifej [25] | 2.0533 | 2.4900
Z. Zhang [28] | 2.1072 | 2.5504
T. Sun [29] | — | 2.5240
Our method | — | 2.5928
1 Because the template images of "Large fonts" and "Checkerboard" are unavailable, the PSNR, MSE, and SSIM entries for these sequences are marked "×"; entries lost in extraction are marked "—".

Firstly, the termination condition is set as the maximum number of iterations. The comparison of the running times of the different algorithms during the registration process is shown in Figure 4. To raise efficiency further, we also consider another approach: setting the termination condition as the difference between the registered frames and the average of the current sequence V. Similar to [25], the iteration stops once this difference falls below a threshold, which is set to 0.025. The restoration times of the different methods under this condition are shown in Figure 5. We can see that our method's overall running time and number of iterations are significantly lower than those of the other approaches. For Oreifej's method, the main reason is the severely blurred mean image: the blurring of the mean worsens as the fluctuation of the water surface becomes stronger, and to remove geometric distortions the video frames must then be blurred to the same level.
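The exact expression of the termination condition ξ(v) is not preserved above; a plausible sketch, assuming the criterion compares consecutive mean frames against the 0.025 threshold, is:

```python
import numpy as np

def should_stop(prev_mean, curr_mean, threshold=0.025):
    """Assumed form of the termination test xi(v): stop once the
    normalized change between consecutive mean frames falls below
    the threshold."""
    num = np.linalg.norm(curr_mean - prev_mean)
    den = np.linalg.norm(prev_mean) + 1e-12   # guard against division by zero
    return num / den < threshold
```
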
However, the above strategy may reduce registration efficiency because of the large amount of overlap and ghosting present in the mean image. As for Zhen Zhang's and Tao Sun's methods, the main reason lies in two aspects: (1) they ignored the effect of spatial inconsistency between different frames on the fused image; and (2) they did not consider the artificial artifacts introduced by deblurring methods. These problems can easily lead to distorted images being registered to the wrong sharp edges, which often take priority in their strategy. Therefore, a new patch search strategy is applied to generate the reference frame in our method, in which we first perform an alignment operation on the input video sequence; our reference frame is better than the mean image in terms of sharpness and contrast. Moreover, another important reason is that our combined optimization strategy is significantly more computationally efficient than the other algorithms, as we discuss in detail in Section 5.3.

Test with the Data Set from the Water-to-Air Imaging Scenario
In order to further confirm the effectiveness of the proposed approach, we tested data sets from another imaging scenario, the underwater-to-air imaging scenario. The data sets come from Zhang's experiment [18], whose source data are also available online. They contain two typical sequences, "gravity" and "ripple", in which the first is distorted by a gravity wave and the second by wind-generated capillary waves. Each sequence is also composed of 61 frames. The maximum number of iterations for all image registration algorithms was set to five, and each patch in the related sequences was set to the same size.
The recovery results of the different methods are shown in Figure 6. Moreover, because template images are not available for these data sets, we compared the algorithms with a no-reference quality metric, UIQM; the comparison results are shown in Table 3. The results show that our algorithm remains effective in water-to-air imaging scenarios and produces higher-quality results. In addition, the method has better computational efficiency when the number of iterations is the same or unknown, which helps to promote the application of this technique in underwater covert observations, such as a virtual periscope.

Figure 6. The restoration results with different methods in water-to-air imaging scenarios.

Table 3. Numerical comparison based on Zhang's data sets [18] (Bold values are the best).

Selection of the Image Quality Metric
During the patch fusion process, ESSIM was chosen as the quality metric. We also tested our algorithm using the SSIM, FSIM, and NIQE [51] metrics, where NIQE is a no-reference metric. The fusion results based on the different image quality metrics at the second iteration stage are shown in Figure 7. The quality scores for the above four metrics (here, SSIM is used as the evaluation standard) are 0.6500, 0.6571, 0.6510, and 0.6289, and the computation times are 4.56 s, 11.03 s, 78.12 s, and 33.03 s, respectively. Based on these results, and considering the computational efficiency of the algorithm, we chose ESSIM as the image quality metric at the patch fusion stage. Other reliability metrics can be considered in future work to further optimize our patch fusion process.

Figure 8 demonstrates the evolution of the reference frame for the different methods in the registration process, and the image quality evaluation results for the reference frames of the different algorithms at each iteration are shown in Table 4. In Z. Zhang's and T. Sun's approaches [28,29], the patch search process is performed directly to generate reference frames, ignoring the issue of spatial differences in the input sequences. For this reason, their reference frames inevitably exhibit ghosting, misalignment, and overlapping effects, as shown in Figure 8 (marked by the red rectangle). Furthermore, we find artifacts unrelated to image details in the reference frames produced by Z. Zhang's method, which are clearly the result of blind deconvolution. In other words, deblurring operations similar to blind deconvolution algorithms are infeasible here and tend to degrade the restoration results.
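As a rough illustration of how the similarity metrics compared above are computed (the standard SSIM slides an 11×11 Gaussian window, and ESSIM additionally weights edge strength; a single global window is used here to keep the sketch short):

```python
import numpy as np

def ssim_global(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image: luminance, contrast,
    and structure terms combined in the usual closed form."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den
```

Identical images score 1.0, and the score falls as structure diverges; windowed variants trade extra computation for spatial locality, which is why metric choice affects the fusion time so strongly.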
However, our method can always generate higher-quality reference frames than the mean, which guides the registration process and leads to better restoration results, as shown in Figure 3. The reference frames generated by our patch search strategy perform better in sharpness and contrast than those of the other methods. As shown in Figure 8, compared with the other methods, the text information in the reference frame reconstructed by our method is clearer and more accurate (marked by the green rectangle). The results show that it is essential to align the input sequence before applying the lucky-patch fusion technique; our patch search strategy consistently produces higher-quality results.

Figure 8. Evolution of the reference frame for different methods in the registration process: the results of [25], the results of Z. Zhang's method [28], the results of T. Sun's method [29], and the results of our method.

Figure 9 demonstrates the evolution process of the synthetic frame I_F for the "Middle fonts" sequence in the iterative registration process under three cases. As shown in Figure 9, our method recovers high-quality results after several iterations under all three strategies. However, textual details in the images, such as "Imaging", "Tracking", and "computer", can only be clearly reconstructed using our method.

Figure 9. Evolution of the synthetic frame I_F in the registration process. Top to bottom: our method without blurring or deblurring, with reference frame deblurring using [52], and with frame blurring using our method.

Analysis of the Evolution of the Reference Frame
Furthermore, it is worth noting that, compared to the first strategy, both the frame blurring and reference frame deblurring strategies can produce sharper results, which suggests that a more effective deblurring scheme could be sought to obtain even higher-quality results.
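The frame-blurring strategy above can be sketched as a simple low-pass step applied before registration; the fixed `sigma` here is a hypothetical placeholder, whereas in practice it would be matched to the reference frame's blur level:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_frames(frames, sigma=1.5):
    """Low-pass each video frame so its fuzziness roughly matches the
    blurred reference before registration (sketch; `sigma` is a
    hypothetical fixed value, not the paper's)."""
    return np.stack([gaussian_filter(f.astype(float), sigma=sigma)
                     for f in frames])
```

Matching blur levels keeps the registration cost from being dominated by sharpness differences rather than geometric misalignment.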

Comparison of the Optimization Strategy
The comparison of the running times of the different optimization strategies after each iteration is shown in Figure 10. The results show that our optimization strategy outperforms the state-of-the-art optimization methods used in [25][26][27][28][29][30][41] in terms of computational efficiency. The main reasons can be attributed to two aspects: (1) the combined strategy of JADE and LBFGS is used to estimate the initial values of the optimization parameters (Θ, Φ), which avoids the local-optimum problem; (2) in the affine parameter optimization stage, the multi-resolution optimization strategy is no longer used.
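The two-stage idea can be sketched on a toy cost with local minima; scipy's differential_evolution stands in for JADE (JADE adds adaptive mutation and crossover parameters), and L-BFGS-B polishes the global estimate:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def misalignment(p):
    """Toy registration cost with local minima; stands in for the
    similarity cost over the control parameters (Theta, Phi).
    Global minimum at (1.5, -0.5) with value 0."""
    x, y = p
    return ((x - 1.5) ** 2 + (y + 0.5) ** 2
            + 0.3 * np.sin(4 * (x - 1.5)) ** 2
            + 0.3 * np.sin(4 * (y + 0.5)) ** 2)

# Stage 1: population-based global search avoids the local optima.
seed = differential_evolution(misalignment, bounds=[(-3, 3), (-3, 3)],
                              seed=0, tol=1e-8).x
# Stage 2: L-BFGS refines the coarse estimate quickly to high precision.
result = minimize(misalignment, seed, method="L-BFGS-B")
```

In the actual registration the cost is the dissimilarity between a warped frame and the reference frame, but the division of labor is the same: a global heuristic supplies a basin, and a quasi-Newton step converges inside it.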

Comparison with Deep Learning Methods
To more thoroughly validate its performance, we compare our method with another cutting-edge strategy, deep learning techniques, to which Tian's method [19] also belongs. For reproducibility, James' technique [23], whose source code is available online, was chosen as the reference method. The proposed method and James' method were tested on identical testing data, "large fonts" and "ripple". Figure 11 shows the restoration results of the two methods. Both methods can effectively reduce the geometric distortion and motion blur of the images and recover good results; however, in terms of contrast and sharpness, our algorithm performs better. As Figure 11 shows, when the water-surface disturbance is small, both James' method and ours restore the distorted scene well, with our algorithm showing better sharpness (indicated by the green markers). Once the disturbance becomes stronger, James' method can no longer recover high-quality results, especially the details of the scene (marked by the red rectangle), whereas our method reconstructs better boundaries and more complete details. The experimental results show that simple spatial distortion models similar to [23] may fail in heavily disturbed environments, while our algorithm still produces high-quality results.

Figure 11. The comparison between James' method [23] and the proposed method. Left to right: the sample images, the results of James' method [23], and the results of our method.

Conclusions
This paper proposes a new image reconstruction scheme for distorted video sequences that keeps both image restoration accuracy and computational efficiency in mind. The main contributions are as follows: (1) an effective combination of lucky-patch fusion and iterative registration algorithms is realized, which eliminates the ghosting problem in the fused reference frame and raises the recovery accuracy of the distorted frames; (2) a mixed optimization strategy of JADE and LBFGS is introduced to accelerate the registration process.
We extensively tested the proposed method on five standard underwater sequences from [18] and two typical data sets from [19]. The experiments show that our approach performs better in terms of sharpness and contrast, which suggests good development prospects in applications such as underwater or airborne object monitoring, obstacle avoidance for underwater vehicles, and seafloor mapping.
To further raise the recovery accuracy and computational efficiency, we will develop a more effective method for generating a higher-quality reference frame in future work. Moreover, it is worth noting that machine learning methods [53][54][55][56][57][58] have unique advantages in improving image recognition. For application scenarios where covert monitoring is not required, once the model is established, it can serve as a means of quickly detecting objects.
