A New Blind Video Quality Metric for Assessing Different Turbulence Mitigation Algorithms

Electronics 2021, 10, 2277

Abstract: Although many algorithms have been proposed to mitigate air turbulence in optical videos, there do not seem to be consistent blind video quality assessment metrics that can reliably assess different approaches. Blind video quality assessment metrics are necessary because many videos containing air turbulence do not have ground truth. In this paper, a simple and intuitive blind video quality assessment metric is proposed. This metric can reliably and consistently assess various turbulence mitigation algorithms for optical videos. Experimental results using more than 10 videos in the literature show that the proposed metrics correlate well with human subjective evaluations. Compared with an existing blind video metric and two other blind image quality metrics, the proposed metrics performed consistently better.


Introduction
Air turbulence can seriously distort image contents and consequently can negatively affect target detection and classification performance in video surveillance [1][2][3][4]. Figure 1 shows the impact of air turbulence on video quality. All the fine features of the tower are smeared. In the past, researchers have developed numerous algorithms to mitigate turbulence effects [5][6][7][8][9][10][11][12][13], among which there are simultaneous turbulence mitigation and super-resolution (SR) algorithms [8,9]. In recent years, new SR algorithms using deep learning approaches have also appeared [14,15]. Combining SR with a turbulence-mitigation-only algorithm is of interest to the community as well.

In the aforementioned turbulence mitigation studies, researchers used simulated and real turbulence videos for demonstrations. For real videos, it is difficult to assess which algorithm performs better because of the lack of ground truth. Hence, most of the time, subjective evaluations are used, which may not be consistent in the sense that some methods with close performance may be difficult for humans to differentiate. For simulated turbulence videos, objective metrics can be generated to compare different algorithms.
In [16], a blind video quality assessment metric known as the Video Intrinsic Integrity and Distortion Evaluation Oracle (VIIDEO) was developed. However, it was tailored towards assessing compressed video quality. At this time, there does not seem to exist a consistent blind video quality assessment tool for evaluating different turbulence mitigation algorithms. In [11], a blind video quality assessment tool was mentioned. However, the code is not available to the public. In [17,18], blind video quality assessment methods were proposed for videos containing natural scenes without air turbulence. In [19], the authors compared three air turbulence mitigation algorithms. One idea uses a reference marker in the scene and this may not be practical because many videos with air turbulence are recorded in the wild.
In this paper, a new blind video quality metric for assessing different turbulence mitigation algorithms is proposed. First, given a video containing turbulence or a video that is already turbulence mitigated, one can apply existing blind still image quality assessment metrics such as the Perception-based Image Quality Evaluator (PIQE) [20] and the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [21] to assess the intra-frame spatial quality. PIQE and BRISQUE are well-known tools that can meet this intra-frame spatial quality assessment need. In this paper, the proposed metric utilizes BRISQUE, as it was found to behave more consistently with visual inspection. Second, in order to capture the inter-frame fluctuations due to turbulence, an inter-frame root mean square error (IFRMSE) metric is proposed, which simply computes the RMSE between two neighboring frames and then takes the average of all the RMSEs of the frame pairs in the video. The intuition behind IFRMSE is that turbulence causes random fluctuations between the same pixels of different frames. If the IFRMSE is small, then the turbulence effect should be small as well. Third, a hybrid metric that computes the geometric mean of the intra-frame scores and the inter-frame scores is proposed. As a result, both the intra- and inter-frame qualities are taken into account. The proposed blind metric is in sharp contrast to the BRISQUE metric, which does not explicitly consider inter-frame fluctuations.
Here, several research questions are addressed. First, is the proposed hybrid blind video quality metric consistent with subjective evaluation results? Several turbulence mitigation algorithms in the literature were investigated, as well as the combination of those algorithms with a deep learning-based SR algorithm known as Zooming Slow Motion (ZSM) [22]. This question is important for assessing some algorithms that have very close turbulence mitigation performance. Second, can the use of additional alignment and registration techniques alongside the well-known method CLEAR [11] improve air turbulence mitigation? Third, is the existing blind video quality metric known as VIIDEO [16] suitable for assessing turbulence mitigation algorithms? Answering this will motivate new research in blind video quality assessment specifically for turbulence mitigation.
The contributions of this paper are as follows. A new blind video quality assessment metric specifically for assessing turbulence mitigation algorithms is proposed. The new metric combines both intra-frame and inter-frame qualities in videos. This metric is consistent with subjective evaluations, meaning that the new metric can help differentiate algorithms that are too close in visual inspection. Hence, the first question raised earlier is answered. The use of additional alignment and registration techniques is demonstrated to visually improve videos with air turbulence. This answers the second question raised earlier. The proposed metric is compared with an existing metric, and it is observed that the previous metric in [16] is not suitable for assessing turbulence mitigation algorithms. This answers the third question above.
The remainder of this paper is organized as follows. In Section 2, a few representative and recent turbulence mitigation methods are summarized. Relevant works in the fields of blind quality assessment, air turbulence mitigation, and super resolution are described. In Section 3, the proposed blind video quality assessment metric and a workflow for performing air turbulence mitigation are explained in detail. Section 4 showcases the experimental results of this workflow and metric on 12 videos. In Section 5, a few concluding remarks and future directions are mentioned.

Turbulence Mitigation Approaches
This section gives a glimpse of some representative papers in turbulence mitigation. It is not meant to be an exhaustive literature survey. Moreover, since the focus is on assessing different turbulence mitigation algorithms and not on the advancement of those algorithms, some algorithms may not be included in the experiments.

Complex Wavelet Fusion for Atmospheric Turbulence
As shown in Figure 2, Complex waveLEt fusion for Atmospheric tuRbulence (CLEAR) consists of four key modules [11]: frame selection, image registration, image fusion, and image deblurring. There are some key parameters to be aware of in CLEAR. Frame selection is an essential parameter to turn on in order to stabilize frames where there might be significant motion. Once this is active, the results drastically improve as can be seen in Figure 3. The number of frames used is also an integral parameter to fine-tune. The number of frames used can improve the quality of the air turbulence removal. However, if there are too many frames with motion, there can be significant motion blurring. For the case in Figure 3, 10 frames are used. Another key aspect of CLEAR is the region of interest (ROI) selector. This allows users to specify a particular bounding box location of their subjective ROI. When used, this ROI selector significantly improves the registration portion of CLEAR, as can be seen below. This is especially useful when there is a particular object in the frame that is of interest to the user, as is the case with the vehicles in the SENSIAC dataset [23].
Figure 2. Overview of the CLEAR processing chain [11]. Frame selection generates lucky regions, which refer to image patches that are least affected by air turbulence. Details can be found in [11].

A Recent Turbulence Mitigation Approach Using Image Reconstruction
The approach proposed by Mao et al. [13] makes several improvements to previous air turbulence mitigation methods. First, they improve on previous methods by implementing a new way to generate reference frames, which is discussed in detail in Section 3.1. Second, they implement a geometric and sharpness metric to create the lucky frame, which is the frame with the least distortion. Finally, they implement a novel blind deconvolution algorithm.

Video Super-Resolution (VSR)
In recent research [24], two video super-resolution algorithms were compared. The Zoom Slow-Motion (ZSM) algorithm [22] for video super-resolution performed better than the Dynamic Upsampling Filter (DUF) approach [17]. As such, only ZSM is used in the experiments in this paper. The objective is to investigate whether or not VSR can help mitigate air turbulence, because VSR normally uses multiple frames together for resolution improvement. The use of multiple frames has some inherent deblurring effects that reduce air turbulence.
ZSM can be broken down into three key components: a feature temporal interpolation network, a deformable convolutional long short-term memory (ConvLSTM) network, and a deep reconstruction network [22]. The feature temporal interpolation network is used to interpolate missing temporal information between the input low-resolution frames. Next, the deformable ConvLSTM is used to align and aggregate the temporal information. Lastly, the deep reconstruction network predicts and generates the super-resolved, upsampled video frames. This overall architecture is outlined in Figure 4.


Blind Metrics to Assess Turbulence Mitigation Algorithms
PIQE

PIQE is a blind spatial quality assessment metric [20]. The input image is first normalized and split into blocks. Each block is then fed into a distortion estimator. The block scores are then pooled together, and a final quality score is output. The lower the score, the better the image quality. MATLAB has a built-in function for computing PIQE [25].

BRISQUE
BRISQUE is a no-reference image quality assessment metric [21]. It first extracts natural scene statistics and then calculates several feature vectors. Using those features and statistics in a support vector machine (SVM), it predicts an image quality score. The lower the score, the better the image quality. No custom fitting was done for this model. There is a MATLAB function for computing BRISQUE [26].


Using Reference Frame Only for Turbulence Mitigation
One way to potentially mitigate air turbulence is by using a new approach to generate a reference frame, which is important for image alignment. The authors of [13] outline this approach in Equation (1) below. For each given patch in a frame, a search is performed over a set number of subsequent and previous frames for that same patch within a search window slightly larger than the patch size. The assumption is that there is no significant motion of objects between consecutive frames. Based on the Euclidean distance between the patch in the current frame and the candidate patches in the subsequent and previous frames, the best possible match for the patch at the current frame is found. Afterwards, a weighted average of the matched patches is used to generate a new reference frame ŷ_ref(r, t):

ŷ_ref(r, t) = [ Σ_{Δt ∈ Ω_t} ω_{r,t} y(r + Δt, t) ] / [ Σ_{Δt ∈ Ω_t} ω_{r,t} ]    (1)
where r denotes the given patch center location, Ω_t is a search window surrounding the current patch, Δt denotes the shift in terms of pixels between the current patch and a patch in the search window, and ω_{r,t} represents the weighting of a given patch and is the inverse of the Euclidean distance between the current patch and a given patch within the search window. Figure 5 illustrates how to generate a sequence of reference (Ref) frames. For a group of N frames, Equation (1) is applied to generate one reference frame. The window of frames then shifts to the right by one, and another reference frame is generated. This process repeats for the whole video.
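The patch search and inverse-distance weighting described above can be sketched in pure Python as a simplified, single-patch illustration. The patch size, search radius, and the small stabilizing constant added to the distance are illustrative choices, not values from [13], and for simplicity the sketch iterates over all frames rather than a bounded temporal window:

```python
import math

def patch_at(frame, r, c, p):
    """Extract a p x p patch whose top-left corner is at (r, c)."""
    return [row[c:c + p] for row in frame[r:r + p]]

def patch_distance(patch_a, patch_b):
    """Euclidean distance between two equally sized patches."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(patch_a, patch_b)
                         for a, b in zip(row_a, row_b)))

def reference_patch(frames, t, r, c, p=4, search=1):
    """Weighted average of the best-matching patch from each frame,
    weighted by the inverse Euclidean distance to the current patch."""
    current = patch_at(frames[t], r, c, p)
    acc = [[0.0] * p for _ in range(p)]
    weight_sum = 0.0
    for frame in frames:  # neighboring frames (here: all frames, for simplicity)
        best, best_d = None, float("inf")
        # search window slightly larger than the patch itself
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr, cc = r + dr, c + dc
                if rr < 0 or cc < 0 or rr + p > len(frame) or cc + p > len(frame[0]):
                    continue
                cand = patch_at(frame, rr, cc, p)
                d = patch_distance(current, cand)
                if d < best_d:
                    best, best_d = cand, d
        w = 1.0 / (best_d + 1e-6)  # inverse-distance weight (stabilized)
        weight_sum += w
        for i in range(p):
            for j in range(p):
                acc[i][j] += w * best[i][j]
    return [[acc[i][j] / weight_sum for j in range(p)] for i in range(p)]
```

When all frames are identical, the best match in every frame is the current patch itself, so the reference patch reproduces it exactly; under turbulence, the weighted average suppresses the random per-frame displacements.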

Previous approaches, such as averaging the frames [8][9][10], do not work as well as this approach. If there is air turbulence or even motion in the frames, simply averaging the frames will only smear the moving objects. This new approach to generating reference frames creates a much smoother frame that maintains the edges and shapes of moving objects. Figure 6 shows a comparison of the two approaches and the advantages offered by the patch-based reference generation.


Proposed Workflow for Air Turbulence Mitigation
Instead of independently using reference frame alignment, CLEAR, or ZSM, these can be used in conjunction with one another to further improve the image quality of air turbulent videos. The following order of algorithms is proposed: reference frame alignment, then CLEAR, then ZSM. As shown above in Section 3.1, the reference frame alignment can provide crisper frames. In theory, this will make objects sharper even before the air turbulence mitigation. The final step is to apply ZSM to the output of CLEAR. Although ZSM is a super-resolution method, it also performs image fusion. It does so by aligning neighboring sequences of frames. These alignments further improve the quality of the frames. In essence, this proposed workflow utilizes a variety of registration and alignment methods to perform more holistic air turbulence mitigation.
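Structurally, the proposed workflow is a fixed-order composition of the three stages. The sketch below shows only the plumbing; the stage bodies are placeholders standing in for the actual reference-frame alignment [13], CLEAR [11], and ZSM [22] implementations, which are not reproduced here:

```python
def reference_frame_alignment(frames):
    # Placeholder: align each frame to a patch-based reference frame (Section 3.1).
    return frames

def clear_mitigation(frames):
    # Placeholder: CLEAR wavelet-fusion turbulence mitigation [11].
    return frames

def zsm_super_resolution(frames):
    # Placeholder: ZSM video super-resolution and frame fusion [22].
    return frames

def mitigation_pipeline(frames):
    """Proposed order: reference alignment first, then CLEAR, then ZSM."""
    for stage in (reference_frame_alignment, clear_mitigation, zsm_super_resolution):
        frames = stage(frames)
    return frames
```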

Proposed Simple Inter-Frame RMSE (IFRMSE) Metric for Inter-Frame Quality Assessment
Simply using a blind quality assessment metric on a still frame does not give a holistic representation of the performance of different turbulence mitigation algorithms in a video with turbulence. The issue with air turbulence is not the noise in the individual frame but the noise that randomly changes from frame to frame due to air turbulence. In order to better assess air turbulence mitigation, a metric needs to measure the consistency of pixels across frames.
One simple way to measure this consistency is to take the RMSE at each pixel location between frame pairs. Using an inter-frame RMSE to measure the effect of air turbulence across frames is proposed. For every frame pair, the RMSE between the current frame, F_i, and the subsequent frame, F_{i+1}, is taken. That is, the inter-frame RMSE for frame pair i is defined as

IFRMSE_i = sqrt( (1/N) Σ_{j=1}^{N} (F_{i,j} − F_{i+1,j})² )    (2)

where F_{i,j} denotes the jth pixel in frame F_i and N is the total number of pixels in a frame.
To assess the video quality, it is necessary to compute IFRMSE values from multiple frame pairs in order to reduce statistical variations. For a video sequence, v, of n frames, there are n − 1 frame pairs. The mean of the IFRMSEs is then calculated as

IFRMSE(v) = (1/(n − 1)) Σ_{i=1}^{n−1} IFRMSE_i    (3)

A few cautionary notes are needed. First, if the camera moves, then an image registration step is needed between the two frames in a frame pair. The aligned images can then be used for computing the IFRMSE. Second, if there are moving objects in the videos, the number and size of the moving objects may limit the effectiveness of the metric. If the number of pixels belonging to moving objects is a small percentage (<5%) of the total number of pixels in a frame, then it is reasonable to proceed with the calculation of IFRMSE. However, if the number of moving pixels is too large (10% or more), then one must apply optical flow techniques to detect those large moving objects and exclude them from the computation of IFRMSE.
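The two equations above translate directly into a few lines of Python. This is a minimal sketch assuming grayscale frames stored as lists of pixel rows; the function names are illustrative, not from the original implementation:

```python
import math

def ifrmse_pair(frame_a, frame_b):
    """Equation (2): RMSE between two equally sized grayscale frames."""
    total, n = 0.0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            n += 1
    return math.sqrt(total / n)

def mean_ifrmse(video):
    """Equation (3): average IFRMSE over all consecutive frame pairs."""
    scores = [ifrmse_pair(a, b) for a, b in zip(video, video[1:])]
    return sum(scores) / len(scores)
```

For a perfectly static, turbulence-free video, every pair score is zero and the mean IFRMSE is zero; turbulence-induced pixel fluctuations raise the score.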

Proposed Hybrid Blind Video Quality Assessment Metric
One shortcoming of the IFRMSE metric is that it does not measure the actual quality of individual frames. For example, if the image quality in a video were consistently poor across the sequence, the IFRMSE metric would be quite low (indicating a high-quality video) even though every frame is of poor quality. Such a case can happen with a highly compressed video in which all frames have poor quality due to compression. To overcome this shortcoming of IFRMSE, one can combine the score with one of the blind quality assessment scores. More specifically, the geometric mean of the IFRMSE metric over many frame pairs and the intra-frame BRISQUE scores over many frames is taken to generate a hybrid score. That is, the hybrid metric is given by

Hybrid(v) = sqrt( BRISQUE(v) × IFRMSE(v) )    (4)

where BRISQUE(v) is the average BRISQUE score over the frames of video v and IFRMSE(v) is the mean inter-frame score from Equation (3). Experiments were also conducted with PIQE, but consistent results could not be obtained. Hence, only Equation (4) is used in the experiments. PIQE could be used as a replacement for BRISQUE for certain datasets if needed.
The geometric mean has a few advantages over the arithmetic mean. First, when combining metrics from different domains, it is good practice to use the geometric mean, which is fair to each of the contributing metrics in the product: a given percentage change in either of the two metrics has the same effect on the product. For example, a 10% change in the intra-frame metric from 0.1 to 0.11 has the same effect on the overall geometric mean as a 10% change in the inter-frame metric from 0.5 to 0.55. Second, the geometric mean can better handle large dynamic ranges between the two metrics. For instance, if the intra-frame metric is 0.01 and the inter-frame metric is 0.9, the geometric mean of the two will be about 0.095, whereas the arithmetic mean will give 0.455. As a result, the arithmetic mean favors the metric that has large values.
The hybrid metric above should be more indicative of air turbulence mitigation across videos. Additionally, the metric can be used on datasets without having a ground truth video. This makes it more flexible in real-world scenarios where the ground truth video may not be available or possible to attain.
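The hybrid score described above is just the geometric mean of the two averaged metrics. In the minimal sketch below, `brisque_scores` stands in for per-frame BRISQUE values produced by an external tool (e.g., MATLAB's BRISQUE function), and `ifrmse_scores` for the per-pair IFRMSE values; both are assumed precomputed:

```python
import math

def hybrid_score(brisque_scores, ifrmse_scores):
    """Equation (4): geometric mean of the mean BRISQUE score (intra-frame
    quality) and the mean IFRMSE score (inter-frame stability); lower is better."""
    mean_brisque = sum(brisque_scores) / len(brisque_scores)
    mean_ifrmse = sum(ifrmse_scores) / len(ifrmse_scores)
    return math.sqrt(mean_brisque * mean_ifrmse)
```

For instance, `hybrid_score([0.01], [0.9])` returns `sqrt(0.009)`, about 0.095, illustrating how the geometric mean keeps the small intra-frame score from being drowned out by the larger inter-frame score.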
One might think that the proposed metric in Equation (4) is too simple and lacks novelty. Indeed, the proposed blind metric is simple and intuitive. However, there were no similar approaches in the literature. Moreover, scientific discoveries are usually incremental in nature. In this sense, the proposed metric is contributing new knowledge to the literature.
It is worth mentioning some differences between VIIDEO and the proposed metric. First, the VIIDEO metric is a patch-based method that computes the local contrast of each patch across multiple frames. The proposed metric is simpler in that one only needs to compute the RMSE between neighboring frames without using patches. Second, VIIDEO does not have a spatial-only component for assessing the image quality in each frame, whereas the proposed approach has an explicit spatial quality component.

Experimental Results
Here, an overview of the 12 video datasets containing air turbulence, as well as the assessment results, is presented. Eleven of the datasets are well known in the literature.

Dataset and Workflow Overview
A combination of simulated and real video datasets was used to validate both the blind quality assessment metric and the proposed air turbulence mitigation workflow. Table 1 and Figure 7 summarize the videos. 'Barcode' #1-#3 have different levels of turbulence. If a video is not labeled as wild, it is a simulated video. 'Moving' refers to whether the object or camera is moving in the video. This is an important distinction because object or camera motion can interfere with blind quality measurements.
For each video, the following workflows are compared. Raw: the raw video with turbulence. Ref: the video generated using the method described in Section 3. In the tables showcasing the experimental results, the IFRMSE and BRISQUE scores are averaged across the frames used in a particular video. This gives a more holistic view of how a particular method performs over the course of a video. Videos comparing the various methods can be found at the following link: https://rb.gy/wdwxh7 (accessed on 16 September 2021). Readers are encouraged to watch those videos and check the visual performance of the different methods.

Hybrid Blind Quality Assessment Results
As one can observe from Table 2, the rankings produced by the hybrid score of BRISQUE and the proposed IFRMSE are very consistent. The Raw and Ref methods are always ranked 5-6, the CLEAR-based methods are ranked 3-4, and the ZSM-based methods are always ranked 1-2. These rankings are strongly aligned with visual inspection of the outputs of each method. Using the reference alignment produces a slight improvement in IFRMSE over Raw in all videos except 'Barcode #3'. Although the difference in IFRMSE between Raw and Ref is not very significant, the effects are amplified when used in conjunction with CLEAR and ZSM: both have drastically lower IFRMSE when using Ref-based frames rather than Raw for most videos. BRISQUE scores across the Raw and Ref methods are fairly consistent, with only the 'Chimney' sequence as an exception. This makes sense because BRISQUE does not take inter-frame variability into account.
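The ranking pattern described here follows directly from the hybrid scores. The sketch below uses hypothetical score values (the actual values are in Table 2) and ranks the six workflows by ascending hybrid score, since lower scores indicate better quality:

```python
# Hypothetical hybrid scores (lower is better); actual values are in Table 2.
scores = {
    "Raw": 1.90,
    "Ref": 1.85,
    "Raw + CLEAR": 1.10,
    "Ref + CLEAR": 1.05,
    "Raw + CLEAR + ZSM": 0.70,
    "Ref + CLEAR + ZSM": 0.65,
}

# Rank from 1 (best, lowest score) to 6 (worst, highest score)
ranking = {method: rank + 1
           for rank, (method, _) in enumerate(
               sorted(scores.items(), key=lambda kv: kv[1]))}
```

With scores of this shape, the ZSM-based methods land in ranks 1-2, the CLEAR-based methods in ranks 3-4, and Raw/Ref in ranks 5-6, matching the pattern observed in Table 2.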

Effects of CLEAR and ZSM
If CLEAR is applied to the outputs of Raw and Ref, there is a significant improvement in the IFRMSE and therefore in the hybrid score H_IB as well. When CLEAR removes the air turbulence and the frame contents become stable, there is significantly less motion between frames. The scores of any method utilizing CLEAR are significantly lower than those of its respective counterpart, meaning the IFRMSE agrees with these visual observations in all the video datasets. CLEAR also has mostly positive effects on the BRISQUE scores of the Raw- and Ref-based methods. For example, in all three 'Barcode' sequences there is approximately a 25% improvement in BRISQUE scores from Raw and Ref to Raw + CLEAR and Ref + CLEAR. In the Watertower and Base videos, the effects are more subtle.
When ZSM is applied to the outputs of the CLEAR-based methods, there is another significant improvement in IFRMSE scores. Although the visual effects of ZSM are subtle, the IFRMSE does indicate a significant improvement. These improvements can be attributed to the ConvLSTM module in ZSM, which temporally aligns sequences of frames and thereby further reduces pixel fluctuations between frames. There is no consistent trend in the BRISQUE scores of the ZSM-based methods, perhaps because BRISQUE is not sensitive enough to notice the minor improvements from ZSM.
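The "pixel fluctuations" that temporal alignment reduces can also be quantified directly, for example as the per-pixel standard deviation over time. A minimal sketch, assuming frames are available as NumPy arrays (this is an illustrative measurement, not part of the proposed metric):

```python
import numpy as np

def temporal_fluctuation(frames):
    """Per-pixel standard deviation across time.

    High values mark pixels that still 'wobble' between frames;
    averaging the map gives a single residual-fluctuation score.
    """
    stack = np.asarray(frames, dtype=np.float64)
    fluctuation_map = stack.std(axis=0)
    return fluctuation_map, float(fluctuation_map.mean())
```

A well-aligned output sequence should produce a fluctuation map that is near zero everywhere except at genuinely moving objects.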

Chimney and Building Videos
Here, a detailed analysis of the Chimney and Building videos is presented. Readers are encouraged to view these videos at the link: https://rb.gy/wdwxh7 (accessed on 16 September 2021). Snapshots of one frame from each video are shown in Figures 8 and 9 for the Chimney and Building videos, respectively.
From Tables 3 and 4 and Figures 8 and 9, the following observations can be made. It is difficult to visually differentiate the ZSM results from their non-ZSM counterparts, which is why an objective metric is important for distinguishing the subtle differences between the images. In the included cases, the hybrid IFRMSE and BRISQUE metric better assessed the quality of the videos. It can also be observed in the videos that CLEAR greatly reduces the 'wobbling' of scenes suffering from significant air turbulence.
Although it is difficult to determine the extent of improvement from a still image, the images do provide a general indication of the level of improvement. Figures 8 and 9 show how the various methods affect the quality of individual frames.

Comparison with VIIDEO
The VIIDEO metric was tested on three videos. Initially, the 'Building' sequence was used to test various parameters of the metric. In particular, experiments were conducted to determine the optimal block size. From this initial study, it was found that a small block size, such as 4, rates the videos more accurately. As can be seen in Table 5 and Figure 10, VIIDEO with a smaller block size ranks the methods more appropriately: Raw is rated the worst and Raw + CLEAR the best. These rankings are more closely aligned with visual inspection than those from the other block sizes.

A block size of 4 was used for VIIDEO in all three videos. The VIIDEO metrics are shown in Table 6 and Figure 11. There is inconsistency across the three video sequences in terms of which method performs best. For instance, VIIDEO ranked the raw turbulence video second in the Chimney case, which is clearly inconsistent with subjective evaluation. Another inconsistent instance is that the Ref + CLEAR + ZSM videos were ranked 4, 6, and 4 in the Building, Chimney, and SENSIAC videos, respectively. This inconsistency indicates that the VIIDEO metric is still not robust at handling various types of distortions in videos. For example, the authors in [16] found that the VIIDEO metric worked well in distinguishing between compression artifacts but did not report its performance on other types of distortions, such as air turbulence. From these preliminary studies, one can see that the VIIDEO metric was inconsistent for air turbulence. Figure 11. Bar charts depicting the contents in Table 6.
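Such cross-sequence ranking inconsistency can be quantified with a rank correlation between the per-sequence method rankings. A simple Spearman rank correlation (assuming no ties), which could be applied to the score lists of any two sequences, might look like:

```python
import numpy as np

def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between two score lists (assumes no ties).

    rho = 1 means the two sequences rank the methods identically;
    rho = -1 means the rankings are fully reversed.
    """
    # Double argsort converts raw scores into 0-based ranks
    ranks_a = np.argsort(np.argsort(scores_a))
    ranks_b = np.argsort(np.argsort(scores_b))
    n = len(scores_a)
    d = ranks_a - ranks_b
    return 1.0 - 6.0 * float(np.sum(d ** 2)) / (n * (n ** 2 - 1))
```

For instance, comparing the VIIDEO scores of the six workflows on the Building sequence against those on the Chimney sequence would flag the kind of disagreement described above, whereas a consistent metric would yield a rho close to 1 for every pair of sequences.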

Conclusions
A new blind video quality metric was proposed to assess air turbulence mitigation performance in video sequences. Twelve commonly used videos were used for validation, and the experimental results showed that the metric is consistent with subjective evaluation and visual inspection of the videos. The experiments also demonstrated the effectiveness of using CLEAR in combination with reference frame alignment and ZSM to mitigate air turbulence.
One anonymous reviewer pointed out that turbulence may be treated as shake and blur effects, and hence some denoising algorithms such as [27,28] could be used. This would be a reasonable future research topic. Another future direction is to investigate how one can adapt other blind assessment metrics, such as those in [16,29,30], to turbulence-mitigated videos. Finally, there could be further research into the effects of motion within videos and how they may affect blind quality assessment metrics such as IFRMSE. Some new developments in video quality assessment [31][32][33] in non-air-turbulence areas could also potentially be adapted to blind quality assessment of air turbulence videos.