Blind Restoration of a Single Real Turbulence-Degraded Image Based on Self-Supervised Learning

Abstract: Turbulence-degraded image frames are distorted by both turbulent deformations and space–time varying blurs. Restoration of atmospheric turbulence-degraded images is of great importance in applications such as remote sensing, surveillance, and traffic control.


Introduction
Space detection research, long-range video surveillance, and drone imaging systems are often affected by the Earth's atmosphere [1,2]. This is mainly due to the continuous heating and cooling of the atmosphere during day and night: the refractive index of the atmosphere changes randomly, which keeps the Earth's atmosphere in constant, irregular motion. This phenomenon is generally called atmospheric turbulence. Atmospheric turbulence seriously affects the imaging resolution and imaging quality of long-range imaging systems [3]. Therefore, eliminating the influence of atmospheric turbulence from images is crucial for remote imaging systems.
In the mid-1960s, inverse convolution was first used in the restoration of turbulence-degraded images. Subsequently, Labeyrie proposed a speckle interferometry method, which superimposed the power spectra of a large number of short-exposure images; however, it does not seem possible to use it for discriminating faint stars against the sky background [4]. Dainty and Ayers proposed a blind deconvolution restoration method based on a single frame, called the iterative blind deconvolution (IBD) algorithm; it should be stressed that the uniqueness and convergence properties of the algorithm are uncertain, and the effect of various amounts of noise in the convolution data is at present unknown [5]. Non-iterative algorithms mainly include the lucky imaging algorithm and the additive system of photographic exposure (APEX) algorithm [6,7]. However, both the iterative and non-iterative algorithms are sensitive to the system noise contained in the image, and the turbulence degradation model is difficult to express with an accurate mathematical analytical expression, which further increases the difficulty of removing the turbulence effect from an image using model-based methods or related prior conditions.
In recent years, machine learning has shown outstanding performance on various image processing tasks, especially Convolutional Neural Networks (CNNs) for image feature extraction [8][9][10][11]. Accordingly, researchers have increasingly begun to apply machine learning to restore images degraded by atmospheric turbulence. The existing machine learning methods mainly follow a supervised learning framework, which is simply a formalization of the idea of learning from examples; this type of learning relies on large quantities of data to train the model. However, it is difficult to obtain plenty of real blurred-sharp image pairs for training a network. Therefore, the training set is generally constructed using synthetic blurred-sharp image pairs, including in studies by Gao et al. [12,13] and Chen et al. [14]. Although the existing networks have obtained good results on simulated degraded images, their restoration results on real turbulence-distorted images are unknown. For instance, Bai et al. [15] presented Fully Convolutional Network (FCN) and Conditional Generative Adversarial Network (CGAN) models, respectively, for the blind restoration of turbulence-distorted images, which produced excellent output on simulated degraded images. However, no experiment was carried out on natural turbulence-distorted images. Subsequently, in 2020, a deep autoencoder combined with the U-Net network model was proposed by Gao et al. [13] to remove the turbulence effect; they conducted restoration experiments on real degraded images but did not report evaluation indicators for the output results.
In general, restoring images corrupted by high-level turbulence is a demanding task: as the image degradation becomes more severe, obtaining a correct reconstruction becomes harder. In this article, instead of requiring a massive training sample size in deep networks, we adopt a self-supervised training strategy to model the turbulence from a relatively small dataset. Specifically, we propose an effective model that requires only a single real turbulence-degraded image for training and testing. The whole network consists of two parts: one generates the latent clear image using the Deep Image Prior (DIP) network [16], and the other estimates the blur kernel of the distorted image with a three-layer Convolutional Neural Network (CNN). Additionally, an effective regularization is introduced into the whole network framework to ensure a better restoration effect. In fact, the proposed restoration approach can be viewed as a kind of "zero-shot" self-supervised learning of generative networks. Presently, "zero-shot" self-supervised learning has been widely applied in multiple task domains [17][18][19].
Here, we summarize the novelties of this paper as follows: (1) We propose an effective self-supervised model that can minimize fine geometrical distortions while requiring only a single turbulent image to produce the restored image; (2) Effective regularization by denoising (RED) is introduced into the whole network framework in order to ensure a better restoration effect; (3) We conduct extensive experiments with the proposed approach, which demonstrate that our method can surpass previous state-of-the-art methods both quantitatively and qualitatively (for more information, see Sections 4.1-4.4).
The rest of this article is organized as follows. Section 2 mainly introduces the meaning of self-supervised learning, the network architecture, and the training process. Section 3 describes the comparison methods, the data sources, and the indices of the comparative experiments. Section 4 presents the results and discussions. Finally, this work is concluded in Section 5.

Self-Supervised Learning
Self-supervised learning was first introduced in robotics, where training data are automatically labeled by leveraging the relationship between different input sensor signals. Recently, self-supervised learning has received extensive attention from a growing number of machine learning researchers [20][21][22][23]. Yann LeCun also spoke highly of self-supervised learning at AAAI 2020. Specifically, the approach is essentially a special form of unsupervised learning that takes a supervised form. To complete the specific task requirements, the label information usually comes from the data itself, and various auxiliary tasks are used to improve the quality of the representation learning. In terms of representation learning, self-supervised learning has great potential to replace fully supervised learning. Its objective function can be expressed as follows:

max/min_θ E_{x∼D, t,t′∼T} L( f_θ(t(x)), f_θ(t′(x)) )  (1)

where, given the training data distribution D and the augmentation distribution T, the model parameter θ is trained to maximize/minimize Equation (1); f_θ is the encoder, which contains a backbone and a projector, and L is the similarity/distance function [24]. In contrast to supervised learning, where an artificial label is provided for each datum, self-supervised learning generates corresponding pseudo labels for training according to the specific task requirements. Therefore, the method for automatically obtaining pseudo-labels is very important. Self-supervised representation learning methods can be divided into three types according to the type of pseudo-labels: (I) Generative [25,26]; (II) Contrastive [27,28]; and (III) Generative-Contrastive (adversarial) [29,30].
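As a minimal numerical illustration of this kind of objective, the sketch below computes a similarity-based loss between two augmented views of the same input. The linear "encoder" and Gaussian-noise "augmentation" are toy stand-ins chosen for brevity, not the components used in this paper.

```python
import numpy as np

def augment(x, rng):
    """Toy augmentation t ~ T: add small Gaussian noise (illustrative only)."""
    return x + 0.01 * rng.standard_normal(x.shape)

def encoder(x, W):
    """Toy encoder f_theta: one linear projection standing in for backbone + projector."""
    return W @ x

def cosine_distance(a, b):
    """Distance function L: 1 - cosine similarity of the two embeddings."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def self_supervised_loss(x, W, rng):
    """No external label: the pseudo-label is the other view of the same input."""
    v1, v2 = augment(x, rng), augment(x, rng)
    return cosine_distance(encoder(v1, W), encoder(v2, W))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W = rng.standard_normal((8, 16))
loss = self_supervised_loss(x, W, rng)  # small: the two views are near-identical
```

Because both views come from the same input, a good encoder drives this loss toward zero without any human-provided label, which is the essence of the pseudo-label idea described above.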
Since the restoration task in this article mainly corresponds to the first type, a data recovery scheme is considered meaningful if it can restore the characteristics of the original data through the elements learned from the data. Currently, the usual method of this type is the Autoencoder, whose architecture is as follows: it can be seen from Figure 1 that the Autoencoder mainly includes an encoder and a decoder [31], and its structure is usually symmetrical. Overall, the purpose of the Autoencoder is to reconstruct the input data at the output layer. The ideal situation is that the output signal, y, is the same as the input signal, x. Thus, the encoding and decoding process of the Autoencoder can be described by the following expressions:

h = σ_e(W_1 x + b_1)  (2)
y = σ_d(W_2 h + b_2)  (3)

where Equation (2) denotes the encoding process, Equation (3) is the decoding process, and W_1 and b_1 represent the weight and bias of the encoder, respectively. Similarly, W_2 and b_2 are the weight and bias of the decoder, respectively. σ_e represents the non-linear transformation in the encoding process, the more commonly used ones being sigmoid, tanh, and ReLU, and σ_d denotes the non-linear or affine transformation in the decoding process. Consequently, the loss function of the Autoencoder minimizes the error between y and x, which is expressed as follows:

L(W, b) = min Σ ||y − x||²  (4)

In Equation (4), W represents the weights and b denotes the biases. In addition, L is the loss function calculated by the mean square error.
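A minimal numerical sketch of the encode-decode-reconstruct cycle of Equations (2)-(4), assuming a sigmoid encoder and an affine decoder; the weights here are random placeholders rather than trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W1, b1):
    # Equation (2): h = sigma_e(W1 x + b1), with sigma_e = sigmoid
    return sigmoid(W1 @ x + b1)

def decode(h, W2, b2):
    # Equation (3): y = sigma_d(W2 h + b2); here sigma_d is an affine map
    return W2 @ h + b2

def mse_loss(y, x):
    # Equation (4): mean squared reconstruction error between output y and input x
    return float(np.mean((y - x) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
W1, b1 = rng.standard_normal((8, 32)) * 0.1, np.zeros(8)   # encoder weight/bias
W2, b2 = rng.standard_normal((32, 8)) * 0.1, np.zeros(32)  # decoder weight/bias
y = decode(encode(x, W1, b1), W2, b2)
loss = mse_loss(y, x)
```

Training would adjust W_1, b_1, W_2, b_2 by gradient descent on this loss; with untrained random weights the reconstruction error is simply nonzero.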


Proposed Network Architecture
Atmospheric turbulence distortion of an image is a complex physical process. It produces geometric distortion, space- and time-variant defocus blur, and motion blur. According to the literature (Zhu and Milanfar) [32], the imaging process can be formulated as follows:

y_j = M_j H_j * x + n_j  (5)

where x denotes the ideal image, " * " is the convolution operation, and M_j and H_j represent the geometric deformation matrix and blurring matrix, respectively. n_j denotes additive noise and y_j is the j-th observed frame. When we ignore the impact of noise, the formula above can be simplified to:

y_j = K_j * x  (6)

where K_j denotes the turbulence-distorted operator. In that case, the proposed framework is shown in Figure 2. Specifically, it contains two parts: F_x and F_K. The function of F_x is mainly to generate the latent clear image from a two-dimensional random noise input (r_x). F_x is composed of a four-layer symmetrical encoder-decoder network, and each pair of encoding and decoding modules is linked by a skip connection, which can effectively reduce the problems of gradient disappearance and network degradation. Simultaneously, the function of F_K is to generate a turbulence-distorted operator from a real degraded image input (y_j); it is mainly a fully convolutional network (see Figure 3) that can recover the complicated 2D degradation kernel. Moreover, the network parameters of each layer of F_x are shown in Figure 4. In most cases, the turbulence-degraded image can be modeled as the convolution of the turbulence-degraded operator with the potentially clear image [32,33]. Therefore, by substituting x and K_j with F_x and F_K, the deconvolution can be written as follows:

min_{F_x, F_K} L( F_x(r_x) * F_K(y_j), y_j )  (7)

where (·)_i and (·)_m represent the i-th and m-th elements, respectively. Figure 4 shows the specific composition of the F_x network. The network uses the same encoder-decoder architecture.
Therefore, the network composition for one layer of the encoder-decoder is listed, which is roughly similar to the composition of the DIP network [16]. The down-sampling is implemented using stride = 2, and the up-sampling is implemented using 2× bilinear interpolation.
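The data flow of such a skip-connected symmetric encoder-decoder can be sketched as follows. Average pooling stands in for the stride-2 convolutions and nearest-neighbour enlargement for the bilinear up-sampling, so this is a structural illustration only, not the learned network.

```python
import numpy as np

def downsample(x):
    """Stride-2 down-sampling via 2x2 average pooling (stand-in for a conv layer)."""
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def upsample(x):
    """2x up-sampling; nearest-neighbour used for brevity instead of bilinear."""
    return np.kron(x, np.ones((2, 2)))

def encoder_decoder(x, depth=4):
    """Four-layer symmetric encoder-decoder with skip connections between
    each matching encoder/decoder resolution."""
    skips = []
    h = x
    for _ in range(depth):              # encoder: keep each resolution for the skips
        skips.append(h)
        h = downsample(h)
    for _ in range(depth):              # decoder: upsample and fuse the stored skip
        h = upsample(h) + skips.pop()   # skip connection from the matching layer
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
y = encoder_decoder(x)                  # output at the original resolution
```

The skip additions reuse encoder features at each scale, which is what mitigates gradient vanishing and degradation in the real network.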


Regularization by Denoising (RED)
In essence, the image restoration process is a classical ill-posed problem, especially for the unconstrained self-supervised learning solution used in this research. Therefore, we added a denoising regularization term to the existing objective function to produce a better restored image. The denoising regularization term was first proposed by Romano et al. [34]. It uses existing denoisers to regularize the inverse problem and relies on the Alternating Direction Method of Multipliers (ADMM) optimization technique. Meanwhile, it has been proven effective in image denoising, super-resolution, deblurring, etc. The objective function after adding denoising regularization is:

min_{F_x, F_K, x} || F_x(r_x) * F_K(y_j) − y_j ||² + ρ(x),  s.t. x = F_x(r_x)  (9)

Equation (9) is the objective function used in this study, where ρ(x) represents the denoising regularization. Here, ADMM and the Augmented Lagrangian penalty term are introduced. The specific expression is shown in the following formula:

min_{F_x, F_K, x} || F_x(r_x) * F_K(y_j) − y_j ||² + ρ(x) + (β/2) || x − F_x(r_x) + u ||²  (10)

where u is a set of Lagrangian multiplier vectors for the equality constraint. The ADMM algorithm alternately updates the three parameters u, x, and F_x. When fixing x and u, F_x is updated by solving the following expression:

min_{F_x, F_K} || F_x(r_x) * F_K(y_j) − y_j ||² + (β/2) || x − F_x(r_x) + u ||²  (11)

When fixing F_x and u, x is updated by solving the following form:

min_x ρ(x) + (β/2) || x − F_x(r_x) + u ||²  (12)

According to the Augmented Lagrangian (AL) method [35], the update of parameter u can be realized by the following equation:

u ← u + x − F_x(r_x)  (13)
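The RED term of Romano et al., ρ(x) = (λ/2) xᵀ(x − D(x)), and its gradient can be sketched as follows. A 3×3 box filter is used here as a stand-in for the denoiser D(·), and λ and the step size are illustrative choices; under RED's assumptions on D, the gradient of the prior is simply λ(x − D(x)).

```python
import numpy as np

def box_denoiser(x):
    """Stand-in denoiser D(x): 3x3 box filter with edge padding."""
    p = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out += p[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out / 9.0

def red_prior(x, lam=1.0):
    """RED regularizer rho(x) = (lam/2) * x^T (x - D(x))."""
    return 0.5 * lam * float(np.sum(x * (x - box_denoiser(x))))

def red_gradient(x, lam=1.0):
    """Under RED's conditions on D, grad rho(x) = lam * (x - D(x))."""
    return lam * (x - box_denoiser(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
p0 = red_prior(x)
for _ in range(20):                     # gradient descent on the prior alone
    x = x - 0.1 * red_gradient(x)
p1 = red_prior(x)                       # the prior decreases as x gets smoother
```

In the full method this gradient would be combined with the data-fidelity term inside the ADMM updates rather than minimized on its own.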

Implementation Details
The designed network is implemented in the PyTorch 1.5 deep learning framework [36] and is trained on the Ubuntu 18.04 system using a single NVIDIA GTX 1080Ti GPU. As shown in Figure 2, the input of the F_x network is two-dimensional random noise sampled from a uniform distribution. On the other hand, the input of the F_K network is a single distorted image frame. Furthermore, the network does not rely on external data for pre-training and only uses the designed objective function for continuous iterative calculation. Since the model weights are randomly initialized, choosing a larger learning rate would cause the model to oscillate; the initial learning rate is therefore set to 0.001. Meanwhile, the loss function is set as a two-stage loss to ensure a superior restoration effect. The first-stage loss function is set to smooth L1 loss [37]; using pixel-by-pixel comparison, the contour of the image to be restored can be quickly obtained. When the iteration reaches a certain number of times and the loss function no longer decreases, the SSIM loss is used as the second-stage loss function to restore the image detail information [38].
Our observation system (Figure 5), listed in Table 1, mainly includes a Ritchey-Chrétien telescope (RC12), a German equatorial telescope mount (CGX-L), an optical camera (ASI071MC Pro), and a high-performance computer (PC). Subsequently, the iterative process of the turbulence-distorted moon image taken by our system is shown in Figure 6a, and the output result of each stage in Figure 6b.
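The two-stage loss schedule can be sketched as follows. `global_ssim` is a single-window simplification of the usual windowed SSIM, and the stage-switching is reduced to a flag, so the numbers are illustrative only.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss: quadratic near zero, linear for large errors."""
    d = np.abs(pred - target)
    return float(np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)))

def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image at once (simplified, no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2)))

def two_stage_loss(pred, target, stage):
    """Stage 1: pixel-wise smooth L1 for coarse contours;
    stage 2: 1 - SSIM to recover structural detail."""
    if stage == 1:
        return smooth_l1(pred, target)
    return 1.0 - global_ssim(pred, target)

rng = np.random.default_rng(0)
target = rng.random((16, 16))
pred = target + 0.1 * rng.standard_normal((16, 16))
l1 = two_stage_loss(pred, target, stage=1)
l2 = two_stage_loss(pred, target, stage=2)
```

In training one would monitor the stage-1 loss and switch to stage 2 once it plateaus, as described above.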



Existing Restoration Methods
The proposed method is compared with the following state-of-the-art methods, including physics-based approaches (CLEAR [39], SGL [40], IBD [41]) and a learning-based method (DNCNN [42]). Specifically, the CLEAR algorithm uses a region-level fusion algorithm based on the dual-tree complex wavelet transform to solve the turbulence distortion problem. The SGL algorithm applies a Sobolev Gradient method to sharpen individual frames and mitigates the temporal distortions with the Laplace operator [43]. The IBD algorithm directly estimates the blur kernel of the degraded image and the latent clear image according to prior conditions. DNCNN is based on supervised learning [42] and was proven to play an important role in restoring turbulence-distorted images by researchers from the University of Bristol [44]. The outputs of all compared methods are generated using the authors' codes, with the related parameters unchanged.
Obviously, DNCNN can recover high-quality images when it is trained on a large synthetic turbulence dataset. Therefore, we use the power spectrum inversion method combined with sub-harmonics to simulate turbulence-degraded images of different intensities as the training set. The random phase screen ϕ(m, z) of atmospheric turbulence is obtained using the Fourier transform, and the equation can be expressed as follows [45]:

ϕ(m, z) = F⁻¹{ r(κ_m, κ_z) √(Φ_ϕ(κ_m, κ_z)) }  (14)

where κ_m and κ_z are the spatial frequencies in the m and z directions, respectively, and r(κ_m, κ_z) is a complex Gaussian random number. The power spectral density function Φ_ϕ(κ_m, κ_z) can be written as follows:

Φ_ϕ(κ_m, κ_z) = 0.023 r_0^(−5/3) (κ_m² + κ_z²)^(−11/6)  (15)

where r_0 represents the atmospheric coherence length, a characteristic scale reflecting the intensity of the atmospheric turbulence. According to the parameter settings in Table 2, random phase screens with different turbulence intensities are finally obtained. As shown in Figure 7, the Point Spread Function (PSF), the phase screen, and the simulated turbulence-degraded images obtained with different intensities are shown, respectively. There are 1800 images, including training data (1500) and validation data (300), and all image sizes are 256 × 256. We use the training set to pre-train the DNCNN network. The hyperparameter settings of the network adopt the default settings in the original paper. When the number of iterations reaches approximately 150 epochs, the loss function curve has stabilized and no longer converges downwards; the DNCNN model reaches the minimum convergence standard at this time.
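The FFT-based phase screen synthesis can be sketched as follows, assuming a Kolmogorov-type spectrum. The sub-harmonic compensation mentioned above is omitted for brevity, and the grid spacing and normalization constants are illustrative.

```python
import numpy as np

def phase_screen(n, r0, dx=0.01, seed=0):
    """Random phase screen: inverse FFT of complex Gaussian noise shaped by
    the square root of a Kolmogorov-type power spectral density."""
    rng = np.random.default_rng(seed)
    f = np.fft.fftfreq(n, d=dx)                 # spatial frequencies kappa_m, kappa_z
    km, kz = np.meshgrid(f, f)
    k2 = km ** 2 + kz ** 2
    k2[0, 0] = np.inf                           # suppress the undefined DC term
    psd = 0.023 * r0 ** (-5.0 / 3.0) * k2 ** (-11.0 / 6.0)
    r = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.real(np.fft.ifft2(r * np.sqrt(psd)))

weak = phase_screen(64, r0=0.20)    # larger r0 -> weaker turbulence
strong = phase_screen(64, r0=0.02)  # smaller r0 -> stronger turbulence
```

Because the spectrum scales as r_0^(−5/3), shrinking the coherence length r_0 directly increases the phase variance of the screen, which is how different turbulence intensities are generated.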


Experimental Datasets
It is necessary to obtain real degraded images under different conditions for experimental analysis to test the proposed network's robustness and adaptability. Therefore, extensive experiments with the proposed algorithm and the four comparison methods are conducted on Hirsch's dataset [46], the Open Turbulent Image Set (OTIS) [47], and the YouTube dataset, respectively. These datasets are introduced as follows: Hirsch's dataset: Hirsch's dataset was used for testing the Efficient Filter Flow (EFF) framework. The dataset was taken with a Canon EOS 5D Mark II camera equipped with a 200 mm zoom lens. It captures a static scene through the hot air discharged from the vents of a building; each image sequence is a video stream of 100 frames (with an exposure time of 1/250 s per frame) degraded by spatially varying blur. The image sequences mainly include chimneys, buildings, water tanks, etc.
OTIS: OTIS was put forward by Jérôme Gilles et al. for comparison between algorithms. All image sequences are real turbulence-distorted images acquired in the hot summer. The dataset includes 4628 static sequences and 567 dynamic sequences. The turbulence impact is divided into three levels: strong, medium, and weak. All sequences are captured with GoPro Hero 4 Black cameras, and the camera equipment is modified with a Ribcage Air chassis, permitting adaptation to different lens types.
YouTube dataset: Since there is no publicly available turbulence-distorted image dataset of astronomical objects, we obtained astronomical object videos from YouTube. These video frames include the moon's surface and Jupiter, taken by astronomy enthusiasts. The captured data come from different devices, which further tests the restoration capability of the proposed algorithm.

Evaluation Metrics
This study is aimed at the restoration of real turbulence distorted images. Therefore, no-reference metrics are used for objective evaluation. The selected no-reference metrics include: Entropy, Average Gradient, Natural Image Quality Evaluator (NIQE) [48], and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [49]. Simultaneously, the specific explanations are as follows:

Entropy:
Typically, entropy indicates the average amount of information contained in the image. In particular, the greater the information entropy of an image, the better the image quality. It is defined as follows:

H = −Σ_i p(x_i) log₂ p(x_i)  (16)

where p(x_i) is the probability that a random event x takes the value x_i.
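A direct implementation of this entropy measure for an 8-bit image might look as follows; histogram binning over the 256 gray levels is an assumption about how p(x_i) is estimated.

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy of an 8-bit image: H = -sum_i p(x_i) * log2 p(x_i),
    with p estimated from the gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

flat = np.full((64, 64), 128, dtype=np.uint8)            # one gray level: H = 0
rng = np.random.default_rng(0)
textured = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # rich texture: H near 8
```

A constant image carries no information (entropy 0), while a richly textured image approaches the 8-bit maximum of 8 bits per pixel, matching the "higher is better" reading above.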

Average Gradient:
The Average Gradient puts more emphasis on the layering of the image and whether the image details are rich, and is expressed as follows:

AG = 1/((m−1)(n−1)) Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √( ((∂f/∂i)² + (∂f/∂j)²) / 2 )  (17)

where m and n represent the width and height of the image, respectively, and i and j denote the position of an image pixel. Generally, the larger the Average Gradient value, the more detailed and the clearer the image.
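A sketch of the Average Gradient, using forward finite differences as a simple estimate of the partial derivatives:

```python
import numpy as np

def average_gradient(img):
    """Mean of sqrt(((df/di)^2 + (df/dj)^2) / 2) over the image interior,
    with forward differences approximating the partial derivatives."""
    f = img.astype(float)
    di = f[1:, :-1] - f[:-1, :-1]     # finite difference along rows
    dj = f[:-1, 1:] - f[:-1, :-1]     # finite difference along columns
    return float(np.mean(np.sqrt((di ** 2 + dj ** 2) / 2.0)))

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64)).astype(float)
blurred = 0.25 * (sharp[0::2, 0::2] + sharp[1::2, 0::2]
                  + sharp[0::2, 1::2] + sharp[1::2, 1::2])  # crude blur/downscale
```

Blurring damps the local differences, so the blurred image scores a lower Average Gradient than the sharp one, consistent with the interpretation above.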

Near-Ground Turbulence-Distorted Image Results
The real near-ground turbulence-distorted images contain two parts, randomly selected from Hirsch's dataset [46] and OTIS [47]. As shown in Figure 8, the first two rows of degraded images are from Hirsch's dataset, and the last two rows are from OTIS. Table 3 presents the average values of the results of the several algorithms. Entropy estimates the complexity of the image texture; the Average Gradient refers to the obvious difference in grayscale near the border of the image or on both sides of a shadow line, and reflects the rate of density change in the multi-dimensional directions of the image to characterize the relative clarity; NIQE is based on the construction of a 'quality aware' collection of statistical features based on a simple and successful space domain Natural Scene Statistic (NSS) model, with these features derived from a corpus of natural, undistorted images [48]; BRISQUE does not compute distortion-specific features, such as ringing, blur, or blocking, but instead uses scene statistics of locally normalized luminance coefficients to quantify possible losses of "naturalness" in the image due to the presence of distortions, thereby leading to a holistic measure of quality [49]. The specific content is shown in Figure 8 and Table 3.
a Note: "↑" indicates that bigger scores represent better perceptual quality of the images, and "↓" indicates the opposite; blue indicates that the restoration effect is worse than the input, and black bold is our result.
For the above recovery results, it can be seen that different degrees of artifacts appear in all restoration results. Significantly, the output of the IBD algorithm may exhibit deformation (see the last row in Figure 8d), and the CLEAR and DNCNN algorithms change the background of the testing images. Additionally, it can be clearly seen that the various image quality indicators of the DNCNN algorithm are worse than those of the input degraded image. These results have been marked in blue, which illustrates that DNCNN pre-trained with simulated data may aggravate the degradation of the input image. The related reasons are analyzed in the subsequent ablation experiments. On both Hirsch's dataset and OTIS, the proposed approach shows an excellent restoration effect, especially under the BRISQUE indicator.

Turbulence-Degraded Astronomical Object Results
In this section, the test images include the surface of the moon and Jupiter from YouTube. Moreover, we add a sunspot image taken by the National Astronomical Observatory in order to enrich the content of the test data. Accordingly, the restoration effects and no-reference metrics for the abovementioned images are shown in the following figure.
The comprehensive experimental results of the different algorithms are shown in Figure 9 and Table 4. Specifically, the SGL algorithm yields almost no improvement on the input degraded image, which may be because the default input of SGL is a video sequence; the restoration is ineffective for a single turbulence-distorted image since it lacks additional feature information. The IBD algorithm seems to have a good restoration effect from a subjective point of view; still, the objective evaluation indices do not produce such a result, which may be because the IBD algorithm over-enhances the restored image. The DNCNN algorithm changes the main characteristics of the image, which is especially obvious in the last row of Figure 9e. Instead, the proposed model fully combines the advantages of regularization and machine learning, so it achieves a relatively excellent restoration effect.

Motion-Blurred Image Results
As stated in the first half of the article, the proposed approach can output excellent results without pre-training. Therefore, the proposed model is still applicable to a single motion-blurred image. To verify the restoration effect on motion-blurred images, we randomly chose motion-blurred images from the GoPro dataset [50] and the Internet. The specific restoration effects and objective evaluations are displayed in Figure 10 and Table 5. The above comprehensive experimental results show that: (1) in the car scene, the restoration results of the four comparative algorithms cannot effectively reveal the license plate information and the vehicle brand, and there are varying degrees of artifacts; (2) the output of the CLEAR algorithm seems to increase the blur of the image, which may be because the physical model of the CLEAR algorithm does not match motion blur; (3) the supervised learning method (DNCNN) has poor generalization ability for motion-blurred images (blue marker); and (4) from the comprehensive analysis of multiple indicators and the visual effects of the restored images, the proposed approach is the best of all compared methods.

Ablation Study
As the above experiments show (Tables 3-5), the proposed self-supervised model restores real degraded images better than the DNCNN network. The reason may be that the simulated data used for pre-training is inconsistent with data captured under natural conditions. Ablation experiments are therefore conducted to test this analysis. A total of 150 turbulence-degraded images are randomly selected from simulation data not used during training, comprising 50 images each at D/r0 = 5, D/r0 = 10, and D/r0 = 15. Examples of the testing images are illustrated in Figure 11: the Jupiter, Nebula1, Galaxy, Nebula2, and Mars images. The simulated degraded images of different intensities are fed to DNCNN and the proposed network for testing, with the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) as test indicators. Figure 12 shows that when the input is consistent with the training dataset, whether the turbulence intensity is relatively weak (D/r0 = 5) or medium and strong (D/r0 = 10, D/r0 = 15), the DNCNN network achieves better results than the proposed algorithm. This indicates that when the data distributions of the pre-training set and the test set are consistent, DNCNN produces the better output. When the distributions are inconsistent, the DNCNN restoration results are as shown in Tables 3-5: the output is often not as expected, and the network can even aggravate the degradation of the input. The ablation experiment therefore demonstrates that the DNCNN architecture, like most supervised neural networks, struggles to generalize outside its training data. In contrast, the proposed approach does not rely on external data, and may be more adaptable and robust to real degraded images.
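The turbulence strength above is parameterized by D/r0, the ratio of the aperture diameter D to the Fried parameter r0. As a point of reference, for a constant Cn^2 along a horizontal path, r0 follows the standard plane-wave expression r0 = (0.423 k^2 Cn^2 L)^(-3/5). The sketch below evaluates it; the numerical inputs are chosen purely for illustration and are not values from this paper.

```python
import math

def fried_parameter(cn2, path_len, wavelength):
    """Plane-wave Fried parameter r0 = (0.423 k^2 Cn^2 L)^(-3/5),
    assuming Cn^2 is constant along the propagation path of length L."""
    k = 2.0 * math.pi / wavelength  # optical wavenumber
    return (0.423 * k ** 2 * cn2 * path_len) ** (-3.0 / 5.0)

# Illustrative values: moderate turbulence (Cn^2 = 1e-14 m^(-2/3))
# over a 1 km path at a 500 nm wavelength.
r0 = fried_parameter(cn2=1e-14, path_len=1000.0, wavelength=500e-9)
print(f"r0 = {r0 * 100:.1f} cm")  # ~2 cm
print(f"D/r0 = {0.2 / r0:.1f}")   # ~10 for a 20 cm aperture
```

Larger D/r0 means the aperture spans more turbulence cells, i.e., stronger degradation, which is why the three test groups above represent weak, medium, and strong turbulence.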
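The two test indicators above, PSNR and SSIM, can be computed directly from image arrays. The following is a minimal NumPy sketch for reference; note that it uses a single global SSIM window rather than the usual sliding-window average, so its SSIM values are only indicative, and the function names are our own.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def global_ssim(ref, test, max_val=255.0):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Example: a constant offset of 10 grey levels gives MSE = 100.
a = np.zeros((64, 64))
b = a + 10.0
print(round(psnr(a, b), 2))         # 28.13 (dB)
print(round(global_ssim(a, a), 4))  # 1.0 for identical images
```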

Conclusions
A large sample size of training data is typically necessary for solving tasks with deep learning approaches, but unfortunately, little turbulence-distorted data is available. In this study, we first presented a blind restoration method for a single turbulence-distorted image based on self-supervised learning. The designed framework is mainly aimed at real turbulence-degraded images without labels. To our knowledge, this work is also the first to apply self-supervised learning to the task of alleviating the turbulence effect on images. We also curated a natural turbulence dataset from Hirsch's dataset, OTIS, and YouTube to show the generalization ability of the proposed model. Moreover, we conducted ablation experiments to further examine the differences between supervised learning and the proposed self-supervised learning method. From the work carried out above, we can draw the following conclusions.
The proposed model recovers the information of the image itself well under different degradation levels. Additionally, instead of a single loss function, the proposed approach uses a smooth L1 loss in the first stage and an SSIM loss in the second stage, so that the overall contour of the image is restored first and the edges and details are recovered afterwards; meanwhile, the brightness of the image does not change during this processing. For the most critical part of the network, an effective self-supervised learning mechanism is designed to fully extract the turbulence-distortion features implicit in the image itself, arrived at through repeated experiments and parameter debugging. In particular, quantitative evaluation on four image-quality indicators shows that the proposed method outperforms the competing methods in both sharpness and visual consistency. Furthermore, unlike previous methods, our approach neither uses any prior knowledge about atmospheric turbulence conditions nor requires the fusion of multiple images to obtain a single restored result, which has great engineering value in long-range video surveillance, defense systems, and drone imaging systems. However, the proposed algorithm still needs some improvements. For instance, artifacts remain in some restored results (see Figure 8(f2), the second image in column f, and Figure 9(f1), the first image in column f), which will be optimized in subsequent debugging.
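The first-stage loss has a standard closed form. The NumPy sketch below shows the usual smooth L1 (Huber-style) formulation; the threshold beta = 1 is an illustrative default, not necessarily the paper's exact setting, and the stage-two switch is indicated only by a comment.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Stage-one loss: quadratic for residuals below beta, linear above,
    so large outliers do not dominate while the global contour is recovered."""
    d = np.abs(pred - target)
    loss = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return loss.mean()

# In stage two, training switches to an SSIM-based loss (1 - SSIM) to
# sharpen edges and details once the overall contour has been restored.
print(smooth_l1(np.array([0.5]), np.array([0.0])))  # 0.125 (quadratic regime)
print(smooth_l1(np.array([2.0]), np.array([0.0])))  # 1.5   (linear regime)
```

The quadratic-to-linear transition is what makes the first stage tolerant of the large residuals typical of strongly distorted inputs.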
In the future, we also intend to remove turbulence from video, and to consider feeding atmospheric turbulence parameters (e.g., r0, Cn^2) measured by our instruments into the neural network model for mitigating the turbulence effect. Furthermore, since the network requires multiple iterations to output a result, it is not yet suitable for real-world astronomical observation or video surveillance. We hope that in future work the proposed model can be embedded in high-level vision tasks and run as a "real-time" task. Meanwhile, in subsequent remote sensing observations, the parameters of the proposed self-supervised network will be optimized by acquiring a large number of real turbulence-degraded images.

Data Availability Statement: The data were prepared and analyzed in this study.