Underwater Image Enhancement and Mosaicking System Based on A-KAZE Feature Matching

Feature extraction and matching is a key component of image stitching and a critical step in advancing image reconstruction, machine vision and robotic perception algorithms. This paper presents a fast and robust underwater image mosaicking system based on (2D)²PCA and A-KAZE key-point extraction and optimal seam-line methods. The system uses image enhancement as a preprocessing step to improve image quality and allow for better keyframe extraction and matching performance, leading to higher-quality mosaics. The application focus of this paper is underwater imaging, and it demonstrates the suitability of the developed system for advanced underwater reconstructions. The results show that the proposed method can address the noise, mismatching and quality issues typically found in underwater image datasets. The results also demonstrate that the proposed method is scale-invariant and show improvements in processing speed and system robustness over other methods found in the literature.


Introduction
Underwater imaging is an important technique for documenting and reconstructing biologically or historically important sites that are generally inaccessible to most of the public and scientific communities. A variety of technical and advanced equipment has been used to acquire underwater images, including sonar, thermal, optical and laser sensors. In modern underwater imaging, high-quality optical cameras are typically preferred for a host of computer vision tasks, ranging from scene reconstruction and navigation [1] to intervention tasks [2]. However, processing underwater images and maintaining robust, high image quality raise some critical issues. Environmental and optical noise, wave disturbances, unstable and uneven lighting, temperature fluctuations and other environmental factors can all degrade underwater image quality, making the documentation of underwater domains one of the most challenging tasks. Image processing techniques can help researchers in other fields to process such data and to find the most relevant objects and key-points in the images. In the underwater environment, optical scattering is one of the main issues and can cause various distortions, including color loss [3].
Research on image processing for underwater domains is of growing importance and relevance to the blue economy and has also been a focus for the energy production sector. Lu et al. [4] observed flicker in underwater images and proposed a median dark channel prior technique for descattering. Li et al. [3] proposed a system for improving the quality of underwater images. Lu et al. [5] proposed a Multi-Scale Cycle Generative Adversarial Network (MCycle GAN) system that restores underwater images by transferring them from an underwater style to a recovered style. They included a Structural Similarity Index Measure loss (SSIM loss) for underwater image restoration and designed an adaptive SSIM loss that adapts to underwater image quality using the dark channel prior (DCP) algorithm.
Several recent methods perform feature extraction in nonlinear scale spaces. Alcantarilla et al. [6] introduced a multiscale 2D feature detection method in nonlinear scale spaces called KAZE, which means wind in Japanese. Other, more common techniques extract features by building a Gaussian scale space of the image at different levels, which smooths object boundaries and fine details to the same degree as noise. Alcantarilla et al.'s approach instead describes 2D features in a nonlinear scale space built with nonlinear diffusion filtering, which increases repeatability and distinctiveness compared to SIFT (scale-invariant feature transform) [7] and SURF (speeded-up robust features) [8]. A disadvantage of this approach, however, is that it can be computationally intensive.
Image stitching includes image matching, feature matching, bundle adjustment, gain compensation, automatic panorama straightening and multiband blending [9]. Such image stitching systems are very useful, but they may not guarantee accuracy for underwater images. Chen et al. [10] proposed an optical-flow-based method for UAV image mosaicking built on nonrigid matching algorithms, local transformation descriptions and aerial image mosaicking. Zhang et al. [11] used the classic SIFT algorithm and matching for image registration, removing false matching points with the RANSAC (random sample consensus) algorithm. However, these systems also produce a very high proportion of mismatches when applied to underwater image features.
Nunes et al. [12] presented a mosaicking method for underwater robotic operations such as real-time object detection. Their model, called robust and large-scale mosaicking (ROLAMOS), composes sequences of seafloor images from visual data. Elibol et al. [13] proposed an image mosaicking technique for visual mapping in underwater environments using multiple underwater robots, which classifies overlapping image pairs along the trajectories followed by the robot formation. In an experimental study, Ferentinos et al. [14] used objective computer vision and mosaicking techniques to process sidescan sonar seafloor images, separating potential ancient shipwreck targets from other seafloor features with similar acoustic signatures. This paper proposes and investigates a novel image processing method and an advanced image stitching technique suited to the underwater domain, addressing many of the concerns raised over quality, domain suitability and robustness. The paper is organized as follows: Section 2 introduces the preprocessing of underwater images, such as noise removal and image enhancement, and presents the proposed method for feature extraction and image matching using principal component analysis and other techniques. Section 3 presents the results of the proposed model and compares them with results reported in the literature. Finally, Section 4 presents the conclusions.

Materials and Methods
This paper puts forward an image stitching technique for underwater pipe images based on (1) the Fast Fourier Transform (FFT) for image noise removal, (2) Mix-CLAHE for image enhancement, (3) A-KAZE and (2D)²PCA for image matching and (4) the optimal seam-line technique for image stitching. The flow chart of the proposed method is shown in Figure 1.

Noise Reduction
Noise is assumed to be uncorrelated with the image, i.e., there is no relationship between the noise values and the image pixel values. Underwater images are frequently corrupted by noise due to several environmental factors, such as haze and turbidity. The computational problem solved by the Fast Fourier Transform (FFT) is to compute the sequence $X_k$ of $N$ complex-valued numbers from another sequence of data $x_n$ of length $N$, according to [15]:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, \ldots, N-1. \quad (1)$$
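The FFT denoising step can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the paper does not state which frequency-domain filter it applies, so the ideal circular low-pass mask and the `cutoff` parameter are assumptions. `dft_direct` evaluates the DFT sum that the FFT computes, literally and in O(N²), so that it can be checked against `np.fft.fft`.

```python
import numpy as np


def dft_direct(x):
    """Evaluate X_k = sum_n x_n * exp(-2*pi*i*k*n/N) directly (O(N^2)), for checking only."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    k = n[:, None]
    return (x[None, :] * np.exp(-2j * np.pi * k * n / x.size)).sum(axis=1)


def fft_lowpass_denoise(gray, cutoff=0.1):
    """Suppress high-frequency noise in a grayscale image with an ideal low-pass mask.

    `cutoff` (hypothetical parameter) is the radius of the retained frequency disc
    in cycles per pixel; NumPy's frequencies range from -0.5 to 0.5.
    """
    gray = np.asarray(gray, dtype=float)
    rows, cols = gray.shape
    spectrum = np.fft.fft2(gray)
    # Frequency coordinates in the same (unshifted) order as fft2's output.
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    mask = np.hypot(fx, fy) <= cutoff
    # The mask is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(spectrum * mask))
```

For example, a flat gray image with a one-pixel checkerboard added to it has all its noise energy at the highest spatial frequency, so the filter returns the flat image. An ideal mask can cause ringing near strong edges; a Gaussian or Butterworth mask is a common smoother alternative.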

Image Enhancement
Much current research on underwater image enhancement develops algorithms designed around the characteristics of underwater images, such as low contrast and color cast. Outdoor image enhancement models can be adapted for underwater images; these methods change the pixel values either in a transform domain or in the spatial domain. More advanced models for underwater image enhancement include deep learning models, Convolutional Neural Networks (CNNs), transform-domain image enhancement and spatial-domain image enhancement [16]. Many methods have been reviewed in this work for both the outdoor and the underwater image enhancement domains. The area of research is broad and includes: Contrast Limited Adaptive Histogram Equalization (CLAHE) [17], Gamma Correction and Generalized Unsharp Masking (GUM) [18], relative global histogram stretching (RGHS) [19], homomorphic and anisotropic filters [20], wavelet-based fusion [21], a wavelet-based perspective enhancement technique [22], a CNN-based underwater image enhancement method [23], UIE-net (Underwater Image Enhancement-net) [24], WaterGAN [25], adopted GANs [26], Wasserstein GAN [27] and others.
The refractive index is an important factor in underwater imaging and has a perceptible impact on underwater imaging sensors. Underwater, even the camera housing can act as a lens and refract light. Jordt et al. [28] proposed a geometrical imaging model for 3D reconstruction with optical cameras that addresses the geometric effect of refraction. The geometric effect of refraction can be seen in the NIR wavelength as radial distortion. Anwer et al. [29] proposed a time-of-flight correction method to overcome the effect of light refraction in underwater imaging. Łuczyński et al. [30] presented the Pinax model for calibration and rectification correction of underwater cameras in flat-pane housings. Their model takes the refraction indices of water into account, so it is enough to calibrate the underwater camera only once, in air, for underwater imaging. In the proposed model, a method called Mixture Contrast Limited Adaptive Histogram Equalization (Mix-CLAHE) [31] is applied to improve the visibility and contrast of underwater images. The method applies CLAHE in the HSV and RGB color models to generate two images, which are then combined using the Euclidean norm.
CLAHE is a variant of Adaptive Histogram Equalization (AHE) that limits contrast amplification by clipping the histogram at a user-defined clip limit. The clip limit therefore determines how much noise is amplified, or smoothed, during contrast enhancement. Mix-CLAHE mixes the results of CLAHE-RGB and CLAHE-HSV. It first normalizes the result of CLAHE-RGB, computed on the Red (R), Green (G) and Blue (B) channels of the RGB color model; the normalized values are denoted $(r_{c1}, g_{c1}, b_{c1})$. CLAHE-HSV operates on the Hue (H), Saturation (S) and Value (V) channels of the HSV color model, and its result is converted back to RGB, with the converted values denoted by $(r_{c2}, g_{c2}, b_{c2})$. The two results are then combined channel-wise using a Euclidean norm:

$$R_n = \sqrt{r_{c1}^2 + r_{c2}^2}, \qquad G_n = \sqrt{g_{c1}^2 + g_{c2}^2}, \qquad B_n = \sqrt{b_{c1}^2 + b_{c2}^2}. \quad (3)$$
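In practice, the per-tile CLAHE step is usually taken from a library (for example OpenCV's `cv2.createCLAHE`). The NumPy sketch below only illustrates two ideas from the text: clipping a tile histogram at the clip limit before equalization, and combining the CLAHE-RGB and CLAHE-HSV results with a Euclidean norm. The single-pass redistribution, the absence of interpolation between tiles and the final division by √2 (to bring the combined values back into [0, 1]) are simplifying assumptions, not details taken from [31].

```python
import numpy as np


def clip_histogram(hist, clip_limit):
    """Clip a tile histogram at `clip_limit` and spread the excess uniformly over all bins.

    Single redistribution pass: bins may end slightly above the limit afterwards.
    """
    hist = np.asarray(hist, dtype=float)
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    return np.minimum(hist, clip_limit) + excess / hist.size


def equalization_lut(hist, clip_limit, levels=256):
    """Gray-level mapping from the CDF of the clipped histogram (one tile, no interpolation)."""
    clipped = clip_histogram(hist, clip_limit)
    cdf = np.cumsum(clipped) / clipped.sum()
    return np.round(cdf * (levels - 1)).astype(int)


def mix_clahe_combine(rgb_from_clahe_rgb, rgb_from_clahe_hsv):
    """Combine two enhanced RGB images (floats in [0, 1]) channel-wise with a Euclidean norm.

    The division by sqrt(2) that keeps the output in [0, 1] is an assumption of this sketch.
    """
    a = np.asarray(rgb_from_clahe_rgb, dtype=float)
    b = np.asarray(rgb_from_clahe_hsv, dtype=float)
    return np.sqrt(a ** 2 + b ** 2) / np.sqrt(2.0)
```

Clipping `[10, 0, 2, 0]` at 4 removes an excess of 6 counts, which is shared equally by the four bins to give `[5.5, 1.5, 3.5, 1.5]`. The total count is unchanged, but the dominant bin can no longer stretch the mapping as steeply, and this is what limits noise amplification.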

Image Matching
An advanced technique for feature extraction is two-directional two-dimensional principal component analysis ((2D)²PCA). In this method, a two-dimensional PCA (2DPCA) is first applied in the row direction of the images, and an alternative 2DPCA is then applied in the column direction. (2D)²PCA therefore reduces the dimensionality of the rows and columns of the images simultaneously [32]. To describe the different patterns and angles of an underwater image within one image, texture attributes can be used, since texture contains information about the spatial distribution of gray levels and captures variations in brightness, orientation and angle. However, the high dimensionality of the feature vectors that represent texture attributes in an underwater image limits computational efficiency, so a method is needed that combines texture representation with dimensionality reduction, making the retrieval and mosaicking algorithm more effective and computationally tractable. (2D)²PCA is a fast and accurate feature extraction and data representation technique that finds a less redundant, more compact representation of the data, in which a reduced number of components can independently account for the variation in the data.
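To apply this technique to underwater images $A$ with $m$ rows and $n$ columns, consider $M$ training samples $A_k$ ($k = 1, 2, \ldots, M$), each an $m \times n$ matrix, with average matrix $\bar{A}$. The row-direction covariance matrix $C$ can be defined as:

$$C = \frac{1}{M}\sum_{k=1}^{M}\sum_{i=1}^{m}\left(A_k^{(i)} - \bar{A}^{(i)}\right)^{T}\left(A_k^{(i)} - \bar{A}^{(i)}\right), \quad (4)$$

where $A_k^{(i)}$ and $\bar{A}^{(i)}$ denote the $i$-th row vectors of $A_k$ and $\bar{A}$, respectively. Equation (4) is a 2DPCA operator on the image rows; another 2DPCA can be applied on the image columns as:

$$C' = \frac{1}{M}\sum_{k=1}^{M}\sum_{j=1}^{n}\left(A_k^{(j)} - \bar{A}^{(j)}\right)\left(A_k^{(j)} - \bar{A}^{(j)}\right)^{T}, \quad (5)$$

where $A_k^{(j)}$ and $\bar{A}^{(j)}$ denote the $j$-th column vectors of $A_k$ and $\bar{A}$, respectively. The eigenvectors corresponding to the $q$ largest eigenvalues of $C'$ form the columns of the matrix $Z \in \mathbb{R}^{m \times q}$; likewise, the eigenvectors corresponding to the $d$ largest eigenvalues of $C$ form the columns of $X \in \mathbb{R}^{n \times d}$. Projecting an image $A$ onto $Z$ yields the $q \times n$ matrix $Y = Z^{T}A$, and projecting $A$ onto both $Z$ and $X$ generates the $q \times d$ matrix $Y = Z^{T}AX$. This $q \times d$ matrix is then used as the extracted feature matrix in the proposed method.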
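A compact NumPy sketch of the (2D)²PCA projection follows. It uses the standard two-directional 2DPCA formulation: the row-direction covariance of Eq. (4) and the analogous column-direction covariance are each reduced to their leading eigenvectors. The function names, the choice to project the raw (uncentered) images and the example sizes are assumptions. The paper does not say how it chooses `d` and `q`.

```python
import numpy as np


def top_eigenvectors(sym, count):
    """Eigenvectors of a symmetric matrix for its `count` largest eigenvalues, as columns."""
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1][:count]
    return eigvecs[:, order]


def two_directional_2dpca(images, d, q):
    """Return Z (m x q), X (n x d) and the q x d feature matrices Z^T A_k X.

    `images` has shape (M, m, n): M training images with m rows and n columns.
    """
    A = np.asarray(images, dtype=float)
    centered = A - A.mean(axis=0)
    M = A.shape[0]
    # Row-direction covariance (n x n), Eq. (4): sum over k and rows i of (row)^T (row).
    c_row = np.einsum("kij,kil->jl", centered, centered) / M
    # Column-direction covariance (m x m): sum over k and columns j of (col)(col)^T.
    c_col = np.einsum("kij,klj->il", centered, centered) / M
    X = top_eigenvectors(c_row, d)
    Z = top_eigenvectors(c_col, q)
    # Project every image onto both bases: Z^T A_k X has shape (q, d).
    features = np.einsum("ip,kij,jr->kpr", Z, A, X)
    return Z, X, features
```

Each m × n image is reduced to a q × d matrix. With 1292 × 964 MARIS frames and, say, q = d = 20, this is a reduction of more than three orders of magnitude. Because `eigh` returns orthonormal eigenvectors, the columns of `Z` and `X` are orthonormal.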
Alcantarilla et al. [33] also proposed a fast multiscale feature detection approach that exploits the benefits of nonlinear scale spaces, called Accelerated-KAZE (A-KAZE). Conventional noise removal techniques typically damage and blur the main content of the image. A-KAZE instead builds a nonlinear scale space that blurs the image data adaptively, reducing noise without damaging image details such as edges. The nonlinear scale space is built using the Fast Explicit Diffusion (FED) algorithm and the principle of nonlinear diffusion filtering, in which the image luminance is diffused according to a nonlinear partial differential equation. The classic nonlinear diffusion is defined by:

$$\frac{\partial Lum}{\partial t} = \operatorname{div}\left(C(x, y, t)\,\nabla Lum\right), \quad (6)$$

where $Lum$ is the luminance of the image, $\operatorname{div}$ is the divergence, $\nabla$ is the gradient operator, $t$ is the scale parameter and $C$ is the conductivity function, which depends on the local image structure and makes the diffusion adaptive to it. The function $C$ can be either a scalar or a tensor, depending on the image structure, and is defined by:

$$C(x, y, t) = g\left(\left|\nabla Lum_\sigma(x, y, t)\right|\right), \quad (7)$$

where $Lum_\sigma$ and $\nabla Lum_\sigma$ are a Gaussian-smoothed version of the image and its gradient, respectively. Several conductivity functions exist; the conductivity function $g_2$ favors wide regions and can be expressed by:

$$g_2 = \frac{1}{1 + \dfrac{\left|\nabla Lum_\sigma\right|^2}{\lambda^2}}, \quad (8)$$

where $\lambda$ is a contrast factor that controls the level of diffusion: edges whose gradient magnitude is well below $\lambda$ are smoothed away, while stronger edges are preserved. For feature detection, A-KAZE computes the determinant of the Hessian for each filtered image $Lum^i$ in the nonlinear scale space:

$$Lum^i_{Hessian} = \sigma_{i,norm}^2\left(Lum^i_{xx}\,Lum^i_{yy} - Lum^i_{xy}\,Lum^i_{xy}\right), \quad (9)$$

where $\sigma_{i,norm}$ is the scale factor normalized by the octave $o_i$ of each image in the nonlinear scale space (i.e., $\sigma_{i,norm} = \sigma_i / 2^{o_i}$), $Lum^i_{xx}$ and $Lum^i_{yy}$ are the horizontal and vertical second-order derivatives of the image, respectively, and $Lum^i_{xy}$ is the cross-partial derivative. The feature vectors and their main orientations are then computed. In this step, feature vectors that are invariant to scale and rotation are extracted from the first-order derivative images.
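Two A-KAZE building blocks, diffusion with the g₂ conductivity and the scale-normalized determinant of the Hessian, can be illustrated with NumPy. This is a simplified sketch, not A-KAZE itself. It takes one fixed explicit time step instead of FED cycles, computes the conductivity from the unsmoothed gradient rather than from $Lum_\sigma$, and uses central differences (`np.gradient`) instead of the scale-dependent derivative filters of the original implementation. The function names and default parameters are hypothetical.

```python
import numpy as np


def g2_conductivity(grad_sq, lam):
    """Perona-Malik g2: 1 / (1 + |grad|^2 / lambda^2); near 1 in flat areas, small at strong edges."""
    return 1.0 / (1.0 + grad_sq / lam ** 2)


def explicit_diffusion_step(lum, lam, tau=0.2):
    """One explicit step of dL/dt = div(g2 * grad L); tau <= 0.25 keeps this 2D scheme stable."""
    lum = np.asarray(lum, dtype=float)
    # Forward differences; repeating the last row/column gives zero flux across the border.
    dx = np.diff(lum, axis=1, append=lum[:, -1:])
    dy = np.diff(lum, axis=0, append=lum[-1:, :])
    c = g2_conductivity(dx ** 2 + dy ** 2, lam)
    flux_x, flux_y = c * dx, c * dy
    # Backward differences of the flux approximate the divergence.
    div = np.diff(flux_x, axis=1, prepend=0.0) + np.diff(flux_y, axis=0, prepend=0.0)
    return lum + tau * div


def hessian_response(lum, sigma_i, octave):
    """Scale-normalized determinant of the Hessian, with sigma_norm = sigma_i / 2**octave."""
    sigma_norm = sigma_i / 2 ** octave
    ly, lx = np.gradient(np.asarray(lum, dtype=float))  # d/drow, d/dcol
    lxy, lxx = np.gradient(lx)
    lyy, _ = np.gradient(ly)
    return sigma_norm ** 2 * (lxx * lyy - lxy * lxy)
```

Because the flux between two neighbors is computed once and applied with opposite signs, the step preserves the mean intensity. With `tau = 0.2` each new value is a weighted average of its neighbors, so no new extrema are created. Feature candidates are the local maxima of `hessian_response` across space and scale.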
The A-KAZE algorithm uses a Modified-Local Difference Binary (M-LDB) descriptor to describe the feature points, exploiting gradient and intensity information from the nonlinear scale space. Yang and Cheng [34] introduced the LDB descriptor, which follows the same principle as BRIEF [35].
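Because M-LDB descriptors are binary strings, they are typically compared with the Hamming distance. The brute-force matcher below is a sketch under stated assumptions. Descriptors are packed into `uint8` arrays, one row per key-point. The nearest-neighbor ratio test and its 0.8 threshold are a common heuristic, not a step stated in this paper. The function names are hypothetical.

```python
import numpy as np


def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between two sets of packed binary descriptors (uint8 rows)."""
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def match_binary(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with a ratio test; returns (index_a, index_b, distance)."""
    dist = hamming_matrix(desc_a, desc_b)
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best = order[0]
        # Reject ambiguous matches whose best distance is close to the second best.
        if len(order) > 1 and row[best] >= ratio * row[order[1]]:
            continue
        matches.append((i, int(best), int(row[best])))
    return matches
```

XOR followed by a bit count is why binary descriptors such as M-LDB and BRIEF are much cheaper to match than floating-point SIFT or SURF vectors.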

Image Mosaicking
Image mosaicking is a technology that combines several overlapping images into one large image, and includes image acquisition, image registration and image fusion steps [36]. A fast and highly accurate image matching method is necessary to achieve a high-resolution mosaic of underwater images.
After feature and key-point extraction, the system selects the main pixels and images to be placed in the final stitched image. In the optimal seam-line method, these pixels are combined so as to minimize visible seams and ghosting [37]. When stitching two images (a) and (b), the optimal seam line can be defined as:

$$S = \min_{(x,y)} E(x, y), \qquad E(x, y) = \alpha\,E_{clo}(x, y) + \beta\,E_{str}(x, y),$$

where $c_a$ and $c_b$ represent the changes of the two images in the axis directions $x$ and $y$, respectively. Meanwhile, $E_{clo}$ and $E_{str}$ are, respectively, the difference of the average energy in the related neighborhood and the similarity of the geometrical structure between the images, as indicated by gradients. $\alpha$ and $\beta$ are weighting factors that set the relative contribution of structural change and color change.
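A common way to find a seam of minimum accumulated energy is dynamic programming over the overlap region. The sketch below is hypothetical. The squared intensity difference used for E_clo, the gradient difference used for E_str and the defaults α = β = 0.5 are stand-ins, since the exact terms of [37] are not reproduced here. The seam is also restricted to run from top to bottom, moving at most one column per row.

```python
import numpy as np


def seam_energy(overlap_a, overlap_b, alpha=0.5, beta=0.5):
    """E = alpha * E_clo + beta * E_str on two aligned grayscale overlap regions (assumed terms)."""
    a = np.asarray(overlap_a, dtype=float)
    b = np.asarray(overlap_b, dtype=float)
    e_clo = (a - b) ** 2  # intensity difference (stand-in for E_clo)
    ga_y, ga_x = np.gradient(a)
    gb_y, gb_x = np.gradient(b)
    e_str = (ga_x - gb_x) ** 2 + (ga_y - gb_y) ** 2  # structural difference (stand-in for E_str)
    return alpha * e_clo + beta * e_str


def find_vertical_seam(energy):
    """Column index of the minimum-cost top-to-bottom seam in each row (dynamic programming)."""
    rows, cols = energy.shape
    cost = np.array(energy, dtype=float)
    back = np.zeros((rows, cols), dtype=int)
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 2, cols)
            k = lo + int(np.argmin(cost[r - 1, lo:hi]))  # best predecessor among 3 neighbors
            back[r, c] = k
            cost[r, c] += cost[r - 1, k]
    seam = np.zeros(rows, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(rows - 1, 0, -1):
        seam[r - 1] = back[r, seam[r]]
    return seam


def blend_along_seam(overlap_a, overlap_b, seam):
    """Take pixels left of the seam from image a and the rest from image b."""
    cols = np.arange(np.shape(overlap_a)[1])
    take_a = cols[None, :] < seam[:, None]
    return np.where(take_a, overlap_a, overlap_b)
```

Where the two images agree in both intensity and structure, the energy is low. The seam is therefore routed through those regions, which is what hides the transition and avoids ghosting.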


Results
The proposed model was tested on a sample of underwater pipe images from the online MARIS dataset [38]. The MARIS dataset was acquired underwater near Portofino, Italy, using a stereo vision imaging system. It provides images of cylindrical pipes of different colors submerged at a depth of 10 m. The dataset includes 9600 stereo images in Bayer-encoded format with 1292 × 964 resolution.
The (2D)²PCA technique has been used for feature extraction. Several feature matrices have been selected and compared with other techniques from the literature: PCA, 2DPCA and SVD, which are used for feature extraction in machine vision applications, were applied to the same dataset as a baseline for comparison. The system achieves improved results while using only a few principal components. Figure 2 shows two input images and the corresponding results of the FFT noise reduction algorithm. As can be seen, the processed images have improved visual clarity and are ready to be passed to the image enhancement stage.

Figure 4 shows a comparison between this model and the results of other image enhancement methods [39]. Histogram Equalization (HE) is a traditional technique that adjusts image intensities to enhance contrast. Integrated Color Model (ICM) converts …

Figure 4: … [19]; (e) Unsupervised Color Correction Method [43]; (f) Rayleigh Distribution [44]; (g) Screened Poisson [40].

Figure 5 shows the result of feature extraction and image matching using the (2D)²PCA and A-KAZE algorithms. The results show the features matched between two video frames captured from different camera angles by a moving underwater camera. Figure 6 compares this model with another image matching method for underwater images, Oriented FAST and Rotated BRIEF (ORB) [45], which is essentially a fusion of the FAST (Features from Accelerated Segment Test) key-point detector and the Binary Robust Independent Elementary Features (BRIEF) descriptor, with many modifications to enhance performance. As shown, the key-point extraction and matching of the proposed method are accurate, and the method performs better than the compared method for image mosaicking. The number of match points used for both the proposed method and ORB (Figure 6) is 110. The image pair with the largest difference in viewing angle, the first and last images, was selected to show the accuracy of matching and the mismatched points in this comparison.

Figure 7 shows the result of the proposed method for a collection of video frames with different angles and contrasts. The results show that this method can create a good underwater mosaic from frames that are far apart in time in the underwater pipe image dataset. The frames were acquired at 15 frames per second, and in the first step one image in every 10 frames was selected for the mosaicking process. Because of the large angle difference between images, a slight curvature of the final image is inevitable. This curvature arises from keeping all image pixels in the final image; in the image mosaicking process, it can be reduced based on the number of match points.

Figure 8 shows the result of the proposed method for a collection of video frames with larger angle differences; in this step, one image in every 20 frames was selected, and the frame difference between the first and last images is 60.

Figure 9 presents the underwater pipe images mosaicked by another method, based on SIFT and random sample consensus (RANSAC) [46], and shows the differences between that method and the proposed model. RANSAC is used to preserve the accuracy and quality of the mosaic by eliminating the influence of mismatched point pairs in underwater images. Red rectangles show clearly mismatched points and regions produced by the SIFT and RANSAC method, and green rectangles show differences between the results of the proposed method and the other technique.

Accuracy assessment is critical for discerning the performance of image matching and mosaicking methods and is an important consideration when using image mosaicking in smart underwater robots. Note that the input images for this technique (shown in Figure 9) are underwater pipe images denoised with the FFT technique.
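The outlier-rejection idea behind the SIFT and RANSAC baseline can be sketched in a few lines of NumPy. The sketch fits a 2D affine transform, whereas mosaicking pipelines often estimate a homography. The threshold, the iteration count and the function names are illustrative assumptions, not settings from [46].

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine parameters P (3 x 2) such that [x, y, 1] @ P approximates dst."""
    design = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params


def apply_affine(params, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ params


def ransac_affine(src, dst, threshold=3.0, iterations=500, seed=0):
    """Fit minimal 3-point models repeatedly and keep the one with the most inliers."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)
        params = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(apply_affine(params, src) - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consistent model found")
    # Refit on all inliers of the best model.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

The mismatched pairs never agree with the model supported by the majority of matches, so they are excluded before the final fit. When mismatches become the majority, as the Introduction reports for underwater features, this consensus breaks down.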

Discussion
To evaluate accuracy, the Root-Mean-Square Error (RMSE) is used. The distance between a point in the first image of each image pair and the corresponding point in the second image of the pair (a later frame with a different view and angle) is [47]:

$$d_i = \sqrt{(x_i - U_i)^2 + (y_i - V_i)^2},$$

where $(x_i, y_i)$ and $(U_i, V_i)$ are the coordinates of a pair of corresponding points in the first and second images, respectively. RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2},$$

where $N$ is the total number of correct matching pairs. To evaluate the accuracy and effectiveness of the proposed method, five image pairs are tested (Figure 5). Table 1 compares the matching accuracy (RMSE), the number of extracted feature points and the correct matching rate on the test images. It can be seen from Figure 9 and Table 1 that the proposed method is more accurate than the other methods. Table 1 shows that although SURF and SIFT extract more feature points than the proposed method for each image pair, the proposed technique achieves better matching accuracy. The correct matching rate (CMR) can be defined as:

$$CMR = \frac{N_C}{N},$$

where $N_C$ and $N$ are the number of correct matching pairs and the total number of matching pairs, respectively.
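Both matching metrics can be computed directly from the matched coordinates. This sketch assumes that $(x_i, y_i)$ and $(U_i, V_i)$ have already been expressed in a common coordinate frame (for example, after the estimated transform has been applied). The text does not state this explicitly. The function names are hypothetical.

```python
import numpy as np


def matching_rmse(first_points, second_points):
    """RMSE over N correct matches: sqrt(mean(d_i^2)), d_i = Euclidean distance of pair i."""
    first = np.asarray(first_points, dtype=float)
    second = np.asarray(second_points, dtype=float)
    d = np.linalg.norm(first - second, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))


def correct_matching_rate(n_correct, n_total):
    """CMR = N_C / N; multiply by 100 for a percentage."""
    if n_total == 0:
        raise ValueError("no matches")
    return n_correct / n_total
```

For two pairs with distances 0 and 5 pixels, `matching_rmse([[0, 0], [3, 4]], [[0, 0], [0, 0]])` gives √12.5 ≈ 3.536. Squaring before averaging makes RMSE penalize a few large misalignments more heavily than the mean distance would.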
Finally, as a further evaluation of mosaicking accuracy, the mosaicked results have been compared with the ground truth [48]. The mosaicking accuracy is calculated as the mean distance between the coordinates of corresponding pixels, i.e., the average reprojection error in pixels, $e_M$:

$$e_M = \frac{1}{N}\sum_{i=1}^{N}\left\lVert x_i - \hat{x}_i \right\rVert,$$

where $N$ is the number of corresponding pixel pairs, and $x_i$ and $\hat{x}_i$ are matching pixels in the mosaicked image and the ground truth, respectively. A large $e_M$ indicates deformation introduced by the mosaicking process. Table 2 shows the mosaicking error for the proposed method and other techniques. For this comparison, data 1 and data 2 are mosaicked images from the main dataset with 10 and 20 frames, respectively, between consecutive images.
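$e_M$ can be computed from the matched pixel coordinates in the mosaic and the ground truth. The following is a minimal sketch. It assumes both are given as N × 2 arrays of (x, y) coordinates in the same frame, and the function name is hypothetical.

```python
import numpy as np


def mean_reprojection_error(mosaic_pixels, ground_truth_pixels):
    """e_M: mean Euclidean distance between corresponding pixel coordinates."""
    mosaic = np.asarray(mosaic_pixels, dtype=float)
    truth = np.asarray(ground_truth_pixels, dtype=float)
    return float(np.linalg.norm(mosaic - truth, axis=1).mean())
```

For example, `mean_reprojection_error([[0, 0], [1, 1]], [[3, 4], [1, 1]])` returns `2.5`: one pixel is displaced by 5 pixels and the other by none.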

Conclusions
In this paper, we propose an optimized underwater image mosaicking method based on (2D)²PCA, A-KAZE and the optimal seam-line technique, and evaluate it against competing techniques on an underwater pipe image dataset. First, the images are preprocessed with FFT-based noise removal and Mix-CLAHE enhancement, after which features are extracted and matched. Mosaicked images are then generated using the optimal seam-line method for image fusion. The effectiveness of the proposed method has been shown in comparison with other approaches reported in the literature: both visual demonstration and the Root-Mean-Square Error show a significant improvement over the comparison systems. Future work will investigate adapting this system to real-time underwater image mosaicking.

Conflicts of Interest:
The authors declare no conflict of interest.