No-Reference Objective Video Quality Measure for Frame Freezing Degradation

In this paper we present a novel no-reference video quality measure, NR-FFM (no-reference frame–freezing measure), designed to estimate quality degradations caused by frame freezing of streamed video. The performance of the measure was evaluated using 40 degraded video sequences from the laboratory for image and video engineering (LIVE) mobile database. Proposed quality measure can be used in different scenarios such as mobile video transmission by itself or in combination with other quality measures. These two types of applications were presented and studied together with considerations on relevant normalization issues. The results showed promising correlation values between the user assigned quality and the estimated quality scores.


Introduction
Quality of media content can be evaluated using different types of measures. The most reliable way of doing this is by conducting visual experiments under controlled conditions, in which human observers grade the quality of the multimedia contents under evaluation [1]. Unfortunately, such experiments are time-consuming and costly, making the search for alternative quality estimation methods an important research topic. A much simpler approach is to use some computable objective measure that equates quality degradation with the (numerical) error between the original and the distorted media [2,3]. Every objective quality measure has as its aim approximating the human quality perception (or human visual system, HVS) as closely as possible, meaning that good correlation with subjective measures (mean opinion score, MOS) is sought. Objective quality measures for image and video can be generally divided into three categories according to the reference information they use, as follows: • full-reference (FR) quality measures; • reduced-reference (RR) quality measures; • no-reference (NR) quality measures.
FR quality measures require the original undistorted or unprocessed signal. NR quality measures require only the processed/degraded signal. RR quality measures need information derived from the original signal [3].
In this paper, we present no-reference video quality measure for frame freezing degradations called NR-FFM (no-reference frame-freezing measure). This type of degradation often occurs during video transmission in different types of TV broadcasting (internet, mobile), with low SNR (signal to noise ratio) margin. It will be shown that the proposed measure achieves high correlation with MOS and that it can be used in cases where only this type of degradation is expected or in combination with other degradation types, so that NR-FFM is combined with some other FR or RR objective measure.
This paper is organized as follows: Section 2 describes related work, Section 3 describes the test of the laboratory for image and video engineering (LIVE) mobile database, Section 4 describes the development of the proposed NR-FFM measure, Section 5 presents the experimental results, Section 6 provides the discussion, and Section 7 draws the conclusions.

Related Work
A survey that classifies and compares current objective video quality assessment algorithms can be found in [4]. Generally, any image quality measure can be extended to video sequence analysis by computing the quality difference of each image of the distorted video to its corresponding image in the original video (or by computing the quality of each image of the distorted video alone, for NR measures). These measures include the well-known PSNR (peak signal-to-noise ratio) [5], SSIM (structural similarity index) [6], VSNR (visual signal to noise ratio) [7], and reduced reference image quality assessment meter (IQAM) [8], among others. Because these image-based measures are not able to capture temporal distortions, different full reference or reduced reference video quality measures for simultaneous capturing of spatial and temporal distortions have been developed. These measures include MOVIE (motion-based video integrity evaluation) [9], VQM (video quality measure) [10], STRRED (spatio-temporal reduced reference entropic differences) [11], and RVQM (reduced video quality measure) [12], among others. VQM accounting for variable frame delay distortions (VQM-VFD) have been also proposed in [13].
Because frame-freezes produce video sequences that may have a total play time longer than reference video sequence (e.g., in the case of stored video delivery), it is unclear how to compare such video sequences by using full or reduced reference quality measures. Also, by matching segments of the degraded sequence to corresponding segments of the reference, usual quality measures would always be perfect (because the sequences tested, from the LIVE mobile database, have no other degradations besides frame freezing).
The paper [14] presents a subjective test that estimates the quality of video subject to frame freeze and frame skip distortions, showing that they impact the video quality differently. The paper [15] describes a no-reference temporal quality measure to model the effect of frame freezing impairments on perceived video quality; however, their subjective tests results are not available, preventing comparison with our proposed measure NR-FFM. In the paper [16], a new objective video quality measure based on spatiotemporal distortion is described. However, this measure also takes into account other degradations besides frame freezes and so the quality estimate reflects the combination of these effects. For sequences with frame freezing degradation only, this measure would calculate the same objective grade (even with different AFR (affected frame rate) for frame freezing degradations). The paper [17] tested video quality model (VQM) that accounts for the perceptual impact of variable frame delays (VFD) in videos. VQM-VFD demonstrated top performance on the laboratory for image and video engineering (LIVE) mobile video quality assessment (VQA) database [18].
There are also other algorithms for the detection of dropped video frames. In one of the state-of-the-art algorithms [19], the author proposed a jerkiness measure that is also a NR method. Jerkiness is a perceptual degradation that includes both jitter (repeating of frames and dropping frames) and freezing (exceedingly long display of frames). The jerkiness measure takes two variables into consideration: the display time and the motion intensity. It is a sum of the product of relative display time and a monotone function of display time, as well as a monotone function of motion intensity over all frame times. This measure is also applicable to a variety of video resolutions. In [20], Sensors 2019, 19, 4655 3 of 20 the motion energy of videos is used in order to detect dropped video frames. The algorithm calculates temporal perceptual information (TI) and compares it to a dynamic threshold in order to detect long and short duration of frame repetition. However, the problem with this NR algorithm is the periods of low motion causing false-positive results, which is the reason why the RR version is better to apply. The authors in [21] use algorithms from [20] and [15] for comparison and in terms per frame computation times, their algorithm outperforms the mentioned NR methods. This algorithm uses two separate thresholds for videos with high and low motion content. The NR algorithm in [22], aside from [20] and [15], uses the algorithm in [19] for comparison. However, the comparison with [15] did not show good results in cases of freeze duration less than 0.5 s, and thus the focus was put on [20] and [19]. This algorithm [22] showed better performance in terms of estimating the impact of multiple frame freezing impairments and is closer to subjective test results. The new NR method in [23] that measures the impact of frame freezing due to packet loss and delay in video streaming network is based on [21] with a few modifications. This method was also tested on three more databases and detailed comparison is given for the algorithm in [20]. The study in [24] showed that the algorithm in [20] has better performance. A recent work from [25] is a novel temporal jerkiness quality metric that uses the algorithm from [20] and neural network for mapping between features (number of freezes, freeze duration statistics, inter-freeze distance statistics, frame difference before and after the freeze, normal frame difference, and the ratio of them) and subjective test scores. In recent papers [26,27], algorithms were tested on UHD (ultra high definition) videos. In [26], the histogram-based freezing artifacts detection algorithm (HBFDA) is proposed. HBFDA methodology includes comparison of consecutive video frames which consist of splitting a frame into regions and then comparing the regions' consecutive frame histograms. The HBFDA algorithm adapts its parameters in real-time and can process 75 frames per second. The authors of [27] propose a new real-time no-reference freezing detection algorithm, called the RTFDA. A detailed comparison of results is given for the algorithm in [20]. RTFDA has a high detection rate of video freezing artefacts (in videos of different content type, resolution, and compression norm) with low false-positive rate in comparison to other commonly used freezing detection algorithms. It achieves real-time performance on 4K UHD videos by processing 216 frames per second.
In this study, we developed a no-reference measure for frame-freezing degradations (NR-FFM) on the basis of AFR and spatial information of the video content. NR-FFM can be used to quantify frame freezing degradations alone, or in combination with other measures to calculate quality in the presence of different types of degradations. LIVE mobile database [18] was used for NR-FFM evaluation. Additionally, the VQEG (Video Quality Experts Group) HD5 dataset [28] was also used to further test NR-FFM measures using a new set of video sequences.
Compared to the existing measures, contributions in this paper are: • NR-FFM measure based on overall frame duration, number of frame freezings, and spatial information of the tested video sequence; • NR-FFM obtains best correlation among other tested frame freezing objective measures in LIVE mobile dataset and similar correlation in VQEG HD5 dataset as other tested measures; • NR-FFM can be combined with other objective measures designed for other degradation types, tested on LIVE mobile dataset with five degradation types, and combined with RVQM and STRRED as reduced reference measures.

LIVE Mobile Dataset
To develop and test the proposed measure NR-FFM, we used sequences from the LIVE Mobile Video Quality Database [18] (later called LIVE mobile) using data from the mobile study and from the tablet study. The 10 video files in mobile study (out of which the first 5 were used in tablet study) are stored in planar YUV 4:2:0 format, with a spatial resolution of 1280 × 720 pixels and 30 fps and The first frame from each video sequence is shown in Figure 1. The first frame from each video sequence is shown in Figure 1. In [18], the authors compared different objective measures using four degradation types from the dataset, omitting frame freezing degradation. In our study, we developed a no-reference objective measure for frame-freezing degradations, which was tested using 10 degraded sequences and 4 degradation levels, obtained from LIVE mobile database [18] (mobile subjective testing)-the first 3 degradations simulated the transmission of stored video material, which meant that after a freeze, playing re-started from the next frame in time, whereas the fourth degradation simulated live streaming, for example, after a frame freeze, all skipped frames were lost. Affected frame rate (AFR) was the same for the first three degradations, for example, the first degradation had 8 × 30 freezing frames, the second degradation had 4 × 60, and the third had 2 × 120 freezing frames; AFR was 240/690 = 34.78%. The fourth degradation had 1 freezing episode in the middle of the sequence lasting 120 frames, and 1 frame freeze at the end of the sequence lasting 60 frames; AFR was 180/450 = 40%. The first 3 s of all degraded video sequences were without freezing and duration of non-freezing video parts, set to be 1.5× the duration of the freezing parts. Figure 2 shows a graph with the DMOS (differential mean opinion score) results for these sequences. All 40 degraded videos had only frame freezing degradations, with no other degradation types included in them.  In [18], the authors compared different objective measures using four degradation types from the dataset, omitting frame freezing degradation. In our study, we developed a no-reference objective measure for frame-freezing degradations, which was tested using 10 degraded sequences and 4 degradation levels, obtained from LIVE mobile database [18] (mobile subjective testing)-the first 3 degradations simulated the transmission of stored video material, which meant that after a freeze, playing re-started from the next frame in time, whereas the fourth degradation simulated live streaming, for example, after a frame freeze, all skipped frames were lost. Affected frame rate (AFR) was the same for the first three degradations, for example, the first degradation had 8 × 30 freezing frames, the second degradation had 4 × 60, and the third had 2 × 120 freezing frames; AFR was 240/690 = 34.78%. The fourth degradation had 1 freezing episode in the middle of the sequence lasting 120 frames, and 1 frame freeze at the end of the sequence lasting 60 frames; AFR was 180/450 = 40%. The first 3 s of all degraded video sequences were without freezing and duration of non-freezing video parts, set to be 1.5× the duration of the freezing parts. Figure 2 shows a graph with the DMOS (differential mean opinion score) results for these sequences. All 40 degraded videos had only frame freezing degradations, with no other degradation types included in them. frames, the second degradation had 4 × 60, and the third had 2 × 120 freezing frames; AFR was 240/690 = 34.78%. The fourth degradation had 1 freezing episode in the middle of the sequence lasting 120 frames, and 1 frame freeze at the end of the sequence lasting 60 frames; AFR was 180/450 = 40%. The first 3 s of all degraded video sequences were without freezing and duration of non-freezing video parts, set to be 1.5× the duration of the freezing parts. Figure 2 shows a graph with the DMOS (differential mean opinion score) results for these sequences. All 40 degraded videos had only frame freezing degradations, with no other degradation types included in them.  Several conclusions can be drawn from the graph (similar conclusions can be found in [14] where authors used a different database): • DMOS score was lower (better) for longer frame freezing in the case of the stored streaming scenario (degradation types 1-3) with the same affected frame rate; • DMOS score was higher (worse) in the case of live streaming scenario (1 × 120 + 1 × 60 frame freeze) where frames were 'lost', compared with the case of stored video and 2 × 120 frame freezes; however, it was not clear by how much, as the former had a freeze at the end lasting for 60 frames; • DMOS score was different for different video sequences and the same degree of AFR, which means that the final DMOS score depended also upon video content-a good example would be sequence number 11, with a very low DMOS score, as this sequence had low spatial and temporal information, so the long frame freeze did not have a higher impact on the subjective score.
It was assumed that the positions of all frame freezing events were known. This assumption was not unrealistic, as in a real transmission application it is possible to determine the starting time of the freezes from other parameters, such as headers in transport stream or in video compressed packets.
Furthermore, the proposed NR-FFM measure was evaluated, comparing its estimates and the database scores using data for frame freezing degradations only, as well as in combination with other quality measures, to obtain the overall correlation computed on all 200 distorted sequences.

VQEG HD5 Dataset
The VQEG HD5 dataset [28] is used later in this study to compare newly proposed objective measure NR-FFM and several existing objective measures designed for frame freezing degradation. Details of the LIVE mobile dataset have been previously discussed. The VQEG-HD5 dataset consists of several original full-HD video sequences, with 25 frames per second and 10 s overall duration. All video sequences can be downloaded from [29]. Degraded video sequences have general degradations due to the MPEG-2 (Moving Picture Experts Group) and H.264/AVC (advanced video coding) compression artefacts and packet losses, which introduce slicing errors and/or frame freezing; however, only in some video sequences do packet losses introduce frame freezing. Because of this, we used 7 video sequences (src01, src02, src04, src05, src06, src08, and src09) and 4 degradation types with frame freezing (hrc10, hrc11, hrc12, and hrc15), resulting overall in 28 degraded video sequences.
Specifically, degradations that were included in later comparison are, briefly:

NR-FFM Measure Development
To be able to further analyze video content, we used spatial and temporal activity indicators, the spatial information (SI), and the temporal information (TI) for all degraded video sequences. SI and TI are defined as follows [30]: where F n represents luminance plane at time n, Sobel represents the Sobel operator and by convolvingwith 3 × 3 kernel, calculation of the SI is defined as maximal value of all Sobel-filtered frames standard deviation (stdspace ) values. By default, Sobel operator should be calculated for both horizontal and vertical edges. However, we later calculated SI for only horizontal (later described as SI H ), only vertical (later described as SI V ), and both horizontal and vertical edges (later described as SI H,V ): Results of TI versus SI are presented in Figure 3 for the LIVE mobile dataset and the VQEG dataset. For the LIVE mobile dataset, these were presented separately for the first three degradations (stored transmission) and fourth degradation (live transmission) separately. It can be seen that TI in the case of live transmission scenario was higher than in the case of the stored scenario. For the VQEG dataset, generally SI and TI characteristics were grouped for each video sequence. Also, it can be seen that the VQEG dataset was more diverse in terms of their dynamic characteristics when compared to the LIVE dataset. Later in this study, VQEG is used to compare proposed NR-FFM measure and several existing objective quality measures.
Results of TI versus SI are presented in Figure 3 for the LIVE mobile dataset and the VQEG dataset. For the LIVE mobile dataset, these were presented separately for the first three degradations (stored transmission) and fourth degradation (live transmission) separately. It can be seen that TI in the case of live transmission scenario was higher than in the case of the stored scenario. For the VQEG dataset, generally SI and TI characteristics were grouped for each video sequence. Also, it can be seen that the VQEG dataset was more diverse in terms of their dynamic characteristics when compared to the LIVE dataset. Later in this study, VQEG is used to compare proposed NR-FFM measure and several existing objective quality measures. . Temporal information versus spatial information (SIH,V) for: (a) LIVE mobile dataset, stored transmission scenario (blue) and live transmission scenario (red); (b) VQEG dataset (video sequences src01, src02, src04, src05, src06, src08, and src09 with degradations hrc10, hrc11, hrc12, and hrc15, previously described).
Firstly, we tried to find if there was a relationship between spatial information (SI), temporal information (TI), and DMOS scores, of all the 40 degraded sequences in the LIVE dataset. A genetic algorithm (GA) [31] was used to find a relationship between the four observed degradation types with the goal of maximizing Spearman's correlation (GA population size was set to 50).
We then hypothesized the following form for the reference frame freezing measure (NR-FFM) for degraded video sequence j (j℮{1,40}):  . Temporal information versus spatial information (SI H,V ) for: (a) LIVE mobile dataset, stored transmission scenario (blue) and live transmission scenario (red); (b) VQEG dataset (video sequences src01, src02, src04, src05, src06, src08, and src09 with degradations hrc10, hrc11, hrc12, and hrc15, previously described). Firstly, we tried to find if there was a relationship between spatial information (SI), temporal information (TI), and DMOS scores, of all the 40 degraded sequences in the LIVE dataset. A genetic algorithm (GA) [31] was used to find a relationship between the four observed degradation types with the goal of maximizing Spearman's correlation (GA population size was set to 50).
We then hypothesized the following form for the reference frame freezing measure (NR-FFM) for degraded video sequence j (j ∈ {1,40}): where higher value means worse quality and n is number of freezes in one sequence j, NFD(i,j) is normalized frame duration of one freezing (normalized by sequence frame length), SI(j) and TI(j) are spatial and temporal information of the whole sequence j (according to the Equation (1)), respectively, and coefficients α, β, and γ were optimized using a GA to maximize Spearman's correlation. Coefficient α takes into account impact of the duration of one frame freeze as well as overall frame freeze duration on final video quality. It should be expected that 0 < α < 1. α < 1 means that one freezing event with longer duration will have a better (lower) objective grade than few shorter freezing events (with same overall duration as one longer freezing event), which is in accordance with Figure 2. Also, α > 0 means that a longer frame freeze will have a higher impact (worse, higher objective grade) on video quality than a shorter frame freeze. Coefficients β and γ show the impact of spatial and temporal information on video quality. It would be expected that β and γ were higher than 0, meaning that higher spatial or temporal activity will have a higher influence on video quality.

Results Using Frame Freezing Degradations-Overall Measure, LIVE Mobile Dataset
We used LIVE mobile database frame freezing degradation sequences (40 video sequences) to train the proposed model from Equation (2). We calculated GA over the training set and tested calculated parameters α and β over the non-overlapping test set. We divided 10 original sequences in 7 or 8 for training (multiplied by 4 degradation levels per video sequence, giving 28 or 32 sequences overall) and 3 or 2 for testing (12 or 8 sequences overall). This gave (10 over 7) = 120 combinations or (10 over 8) = 45 combinations. Each combination was run 10 times and best (highest) Spearman's correlation was taken as being the correct model. Afterwards, the model was tested on the remaining part of the frame-freezing dataset. The model was trained to have the highest possible Spearman's correlation. Running GA 10 times (0 < α <1, β > 0, γ > 0, population size 50), using all combinations, we obtained the best results with mean γ near 0. Because of that, we decided to calculate NR-FFM using SI information only. By discarding TI from Equation (2), proposed NR-FFM should then have a following form: In order to use them for comparison, Pearson's, Spearman's, and Kendall's correlation were employed as follows. Pearson's product-moment correlation coefficient was calculated as a normalized covariance between two variables, x and y: where x i and y i are sample values (e.g., x is results from different objective measures and y is results from subjective tests), whereas x and y are sample mean and s x and s y are standard deviations from variables x and y. Spearman's correlation assesses how well an arbitrary monotonic function can describe the relationship between two variables without making any assumptions about the frequency distribution of the variables [32]. It was calculated similarly as a Pearson's correlation, Equation (5), but over ranked variables (every sample from both variables was firstly put in order: first, second, third, etc.; average rank was assigned to all tied ranks). Kendall's correlation [33] is also a ranked correlation coefficient. It takes into account pair observations over ranked variables and calculates concordant pairs (sort order by variables x and by y having the same direction), discordant pairs (sort order by variables x and by y having the opposite direction), and possibly adjusts for tied pair observations (neither concordant nor discordant pairs). Generally, three types of Kendall's correlation coefficient have been defined: τ a , τ b , and τ c . Kendall's τ a does not make any adjustment for ties, whereas τ b and τ c make adjustments. Kendall's τ b is used if the underlying scale of both variables has the same number of possible values (before ranking) and τ c if they differ. As in our case, both variables x and y can have many different possible values, and Kendall's τ b coefficient (which is also defined in Matlab as the Kendall's correlation coefficient, τ b ) is used later in this study. Results for Spearman's and Pearson's correlation are presented in Tables 1-3 for SI with only horizontal (SI H ), both horizontal and vertical edges (SI H,V ), and only vertical edges (SI V ), respectively. Mean parameters α and β over non-overlapping test sets for those cases are presented in Table 4. After nonlinear regression using four parameter logistic functions Q 1 , Q 2, Q 3 , and Q 4 , Pearson's correlation was calculated according to: Q 1 and Q 2 are defined in [34] and [35], respectively, whereas Q 3 and Q 4 represent cubic and linear fit. From Tables 1-3, generally it can be concluded that the highest Pearson's correlation was obtained using Q 1 and Q 3 fitting functions. Q 2 had a somewhat lower correlation and Q 4 function gave the lowest correlation (as could be expected from linear fitting function).
When the entire mobile dataset was used for training, we obtained Kendall's, Spearman's, and Pearson's correlation according to Table 5. Pearson's correlation was also calculated according to Equation (4) for Q 1 , Q 2 , Q 3 , and Q 4 . When comparing Spearman's and Pearson's correlations from Tables 1-3, we obtained similar correlations in all three cases. However, according to Table 5, the best Kendall's, Spearman's, and Pearson's correlations were obtained for SI with only a horizontal Sobel operator. Because of this, we further used this case (SI with horizontally calculated Sobel operator). Proposed NR-FFM should then have an explicit form: Table 6 presents Kendall's, Spearman's, and Pearson's correlations using Equation (7) for LIVE mobile, stored transmission video sequences (30 video sequences), and LIVE mobile live transmission video sequences (10 video sequences) separately. To further compare proposed NR-FFM measure, we also used DMOS tablet scores (20 video sequences with frame freezing degradations) with proposed NR-FFM from Equation (7). It has to be noted that these scores were obtained from the same video sequences as in the mobile dataset, however, they were shown on the tablet display.
If we used the NR-FFM measure defined in Equation (7), we obtained Kendall's correlation of 0.6000; Spearman's correlation of 0.8030; and Pearson's correlation of 0.8357, 0.8353, 0.8358, and 0.8253 for Q 1 , Q 2, Q 3 , and Q 4 , respectively. Figure 4 shows the estimated values of NR-FFM measure versus the subjective DMOS scores using Equation (7). Also, Table 7 shows the fitting coefficients b 1 -b 5 , as defined in Equation (6) versus the subjective DMOS scores using Equation (7). Also, Table 7 shows the fitting coefficients b1-b5, as defined in Equation (6), for this case.   Another solution to determine NR-FFM measure coefficients would be to train it on a tablet dataset (with 20 video sequences) and test it on a mobile dataset (with 40 video sequences). Training was performed equally as previously described for the overall mobile dataset. Parameters α and β in NR-FFM, Equation (4), were in this case determined to be 0.6356 and 0.1098, respectively. Table 8 presents Kendall's, Spearman's, and Pearson's correlation (using Q1, Q2, Q3, and Q4 as fitting functions), also for this case.

Combined Results, Overall LIVE Mobile Dataset
To be able to combine the proposed NR measure with other objective measures, it has to be rescaled and to have the same regression as the targeted objective measure. One way of achieving this is by using the nonlinear regression model proposed in Equations (8) and (9), where free parameter δ was calculated using GA to give the highest possible Spearman's correlation with DMOS. Rescaling was done using the maximum and minimum grade from the objective measure tested in advance for other degradation types, as well as from the proposed NR measure. We implemented proposed NR measure from Equation (4)    Another solution to determine NR-FFM measure coefficients would be to train it on a tablet dataset (with 20 video sequences) and test it on a mobile dataset (with 40 video sequences). Training was performed equally as previously described for the overall mobile dataset. Parameters α and β in NR-FFM, Equation (4), were in this case determined to be 0.6356 and 0.1098, respectively. Table 8 presents Kendall's, Spearman's, and Pearson's correlation (using Q 1 , Q 2 , Q 3 , and Q 4 as fitting functions), also for this case.

Combined Results, Overall LIVE Mobile Dataset
To be able to combine the proposed NR measure with other objective measures, it has to be rescaled and to have the same regression as the targeted objective measure. One way of achieving this is by using the nonlinear regression model proposed in Equations (8) and (9), where free parameter δ was calculated using GA to give the highest possible Spearman's correlation with DMOS. Rescaling was done using the maximum and minimum grade from the objective measure tested in advance for other degradation types, as well as from the proposed NR measure. We implemented proposed NR measure from Equation (4) on two reduced video quality measures, RVQM [12], and STRRED [10]. RVQM uses a scale of 0-1, where 0 means worst quality and 1 means no degradation. STRRED uses a scale from 0, where 0 means no degradation.
RVQM measure is based on 3D (three-dimensional) steerable wavelet transform (Riesz transform [36]) and modified SSIM (structural similarity index [6]) measure (using only contrast and structure terms). Here, we resized video frames into rectangular cuboid with size 64 × 64 per frame with 32 frames, and averaging filter with four pixels.
Step was set to half the size of the tested cube in time, with 16 pixels. Modified SSIM index was calculated from the third component from the first Riesz order decomposition with Shannon prefiltering and Simoncelli filter, as it was concluded in [12].
STRRED (spatio-temporal reduced reference entropic differences) is another measure with variable range of the reference information (from single scalar to the full reference information). It combines spatial RRED index [37] (SRRED) and temporal RRED index (TRRED).
In the case of implementing the proposed measure of frame freezing degradations in RVQM, rescaling was done according to and in the case of STRRED according to Equation (8): At the end, combined measures could be calculated as follows: RVQM combined = NR-FFM RVQM · RVQM STRRED combined = (NR-FFM STRRED + 1) · (STRRED + 1) .
In Equation (10), STRRED combined was rescaled to start from 1 (meaning perfect quality), otherwise the combined measure would give 0 even if only one part of the measure gave a perfect grade (zero).
As in the previous case (only frame-freezing degradations), we divided the dataset in seven or eight (original video sequences) for training and the rest for testing (three and two, respectively). This gave overall (multiplied by four degradation levels per degradation type and five degradation types) 140 or 160 video sequences for training. The rest of the non-overlapping video sequences were used for testing (60 and 40, respectively). Results for Kendall's, Spearman's, and Pearson's correlation between RVQM, STRRED, and DMOS are presented in Table 9. If the entire dataset was used for training in Equations (8) and (9), δ was calculated using GA running 10 times (again with target to maximize Spearman's correlation, with α = 0.6327 and β = 0.1167, from Equation (7)); δ1 = 3.2023 and δ2 = 3.8115. In this case, rescaled measures can be explicitly written as in following equations: NR-FFM STRRED = 3.3464 · 10 5 · NR-FFM 3.8115 .
(12) Figure 5 shows the relationship between combined RVQM measure and combined STRRED measure with DMOS scores (200 degraded video sequences for mobile dataset and 100 degraded sequences for tablet dataset), using Equations (10)- (12). In the tablet dataset, the same coefficients have been used as in the mobile dataset, Equations (11) and (12). If the entire dataset was used for training in Equations (8) and (9), δ was calculated using GA running 10 times (again with target to maximize Spearman's correlation, with α = 0.6327 and β = 0.1167, from Equation (7)); δ1 = 3.2023 and δ2 = 3.8115. In this case, rescaled measures can be explicitly written as in following equations: (12) Figure 5 shows the relationship between combined RVQM measure and combined STRRED measure with DMOS scores (200 degraded video sequences for mobile dataset and 100 degraded sequences for tablet dataset), using Equations (10)- (12). In the tablet dataset, the same coefficients have been used as in the mobile dataset, Equations (11) and (12).  Table 10 shows Kendall's, Spearman's, and Pearson's correlation, before and after combining RVQM and STRRED measures with NR-FFM measure, using the entire mobile dataset. Table 11 shows Kendall's, Spearman's, and Pearson's correlation for the tablet dataset using the fitting coefficient from the mobile dataset study, Equations (11) and (12).
Pearson's correlation was calculated after nonlinear regression using Equation (6). In these tables, results for VQM measure [13] that incorporates variable frame delay distortion and full or reduced reference calibration, called VQM-VFD-FR or VQM-VFD-RR measures, have also been  Table 10 shows Kendall's, Spearman's, and Pearson's correlation, before and after combining RVQM and STRRED measures with NR-FFM measure, using the entire mobile dataset. Table 11 shows Kendall's, Spearman's, and Pearson's correlation for the tablet dataset using the fitting coefficient from the mobile dataset study, Equations (11) and (12). Table 10. Kendall's, Spearman's, and Pearson's correlation, before and after combining RVQM and STRRED with freezing degradations; VQM-VFD-RR and VQM-VFD-FR correlation from [17]; mobile dataset (best values are marked in bold).

Measure
Kendall Pearson's correlation was calculated after nonlinear regression using Equation (6). In these tables, results for VQM measure [13] that incorporates variable frame delay distortion and full or reduced reference calibration, called VQM-VFD-FR or VQM-VFD-RR measures, have also been included from [17]. It can be concluded that NR-FFM can be included in other measures without lowering overall correlation with DMOS. Regarding the fitting functions for Pearson's correlation, Q 1 , Q 2 , and Q 3 have similar correlations, whereas Q 4 produces a much lower correlation. This means that the linear fitting function Q 4 cannot be used in this case. It has to be also noted that in this database, only one part from Equation (10) has an influence on the final objective metric (frame-freezing degradation or any other degradation type), whereas the other part shows perfect quality (with score 1).

NR-FFM Measure Comparison with Other Objective Measures for Frame Freezing Degradations
In this subsection we will compare newly proposed NR-FFM measure with several existing measures, named "Borer" (no reference objective measure) [19], "FDF" (no reference objective measure) [20], "FDF-RR" (reduced reference objective measure) [20], and "Quanhuyn" (no reference objective measure) [15]. Table 12 presents Kendall's, Spearman's, and Pearson's correlation for the LIVE mobile dataset (40 degraded video sequences described earlier), whereas Table 13 presents Kendall's, Spearman's, and Pearson's correlation for the VQEG dataset (28 degraded video sequences described earlier). In both tables, Borer measure was calculated according to the paper [19], but with the motion intensity value set to 1 for all degraded video sequences (otherwise, correlation was always lower). FDF and FDF-RR measures were calculated using Matlab code from [38] (with read_avi function from command line video quality metric, CVQM, MATLAB source code for CVQM version 3.0 [39]). Quanhuyn measure was also calculated according to the paper [15]. Our proposed measure, NR-FFM, was in Table 12 equal to that in Table 1, training-test ratio 8:2, mean value. In Table 13, NR-FFM was calculated according to Equation (7) and using SI H (from Equations (1) and (2)) as spatial information values.

Discussion
In this paper we developed no reference objective measure NR-FFM and tested it using two different datasets, LIVE mobile and VQEG-HD5. NR-FFM measure was also compared with some existing objective measures for frame freezing degradations named Borer (no-reference), FDF (no-reference), Quanhuyn (no-reference), and FDF-RR (reduced reference). In the LIVE mobile dataset, NR-FFM gave the best correlation results between all tested measures (Table 12). In the VQEG-HD5 dataset, NR-FFM (which was trained on the LIVE mobile dataset) had the best Pearson's correlation for Q1, Q2, and Q3 fitting functions (Table 13). Borer and FDF-RR measures obtained somewhat higher Spearman's and Kendall's correlations (Table 13).
The main difference between LIVE mobile frame freezing degradations and VQEG-HD5 frame freezing degradations were the type and overall duration of the freezing occurrences. In the LIVE mobile dataset, there were online degradations (due to, for example, packet losses), which resulted in frame drops and offline degradations (due to, for example, packet delay), where no frame was actually lost, only delayed. Also, in the LIVE mobile dataset, overall freezing duration was generally the same-in offline frame freezing 8 × 1 s long, 4 × 2 s long, and 2 × 4 s long. Only online freezing had one 4 s long freeze and one 2 s long freeze. In the VQEG-HD5 dataset, all freezing frames were online degradations, resulting in frame drops. Also, higher packet loss ratio (PLR) resulted in longer frame freezes and, consequently, more lost frames. This meant that overall frame freezing duration was different for different degradation types (in the VQEG-HD5 dataset this was due to the different PLR ratio). Probably because of this, the NR measures FDF and Borer had lower correlation in the LIVE mobile dataset compared to the VQEG-HD5 dataset. Also, motion intensity or temporal information (or any other measure that would take differences between consecutive frames into account) did not have a higher value in offline frame freezing compared to the original video sequence (and that degradation was present only in the LIVE mobile dataset). NR-FFM measure had the best correlation in the LIVE mobile dataset compared to the other tested measures, probably also because it was calculated only on the basis of frame freezing duration and spatial information. NR-FFM measure also had different values for the equal overall frame freezing, but with different numbers of occurrences (see Equation (7)). Nonetheless, in the VQEG-HD5 dataset, NR-FFM had similar correlation to Borer and FDF measures without pretrained values on the VQEG-HD5 dataset.
When comparing online degradation (e.g., stored video sequences) and offline degradation (e.g., live video sequences) types in the LIVE mobile dataset (Table 6), offline degradation had higher correlation. However, in online degradation there were only 10 video sequences with equal degradation type-two equally spaced frame drops that were 4 and 2 s long (3-7 and 13-15 s in each video sequence); thus, these correlation coefficients had lower confidence. Also, compared with Table 13, also with only an online degradation type, NR-FFM had in this case a higher correlation for the tested VQEG-HD5 dataset, probably due to the different degradation levels for each video sequence (4 video sequences), as well as more tested video sequences (28 video sequences overall).
NR-FFM measure was also tested using both the LIVE mobile (40 video sequences) and LIVE tablet study (20 video sequences) and was trained/tested using another dataset, showing similar fitting coefficients and correlations in both cases (trained on the LIVE mobile and tested on the LIVE tablet and vice versa).
Furthermore, we combined the proposed NR-FFM measure with some existing reduced-reference measures (RVQM and STRRED) and tested it using all video sequences from the LIVE mobile/tablet dataset. Results have shown that it is possible to obtain similar correlation for the overall combined objective measure using all five degradation types in the LIVE mobile dataset (Tables 10 and 11).

Conclusions
In this paper we proposed a new NR-FFM measure for frame freezing degradations. It can be used only for this degradation type or in combination with other degradation types, not lowering their overall correlation (which we checked on RVQM and STRRED video quality measures in the LIVE mobile and tablet video databases).
Future research may be based on temporal information (or motion intensity) implementation in the proposed NR-FFM measure to obtain higher correlation with subjective grades for video sequences with both offline and online frame freezing. Alternatively, different formulae maybe defined for spatial and temporal information instead of the proposed formula in Equation (7). A larger dataset should also then be developed to be able to calculate correlations with higher confidence.