Sensors
  • Article
  • Open Access

26 October 2019

No-Reference Objective Video Quality Measure for Frame Freezing Degradation

1 Department of Electrical Engineering, University North, 42000 Varaždin, Croatia
2 Department of Electrical Engineering and Computing, University of Dubrovnik, 20000 Dubrovnik, Croatia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advance and Applications of RGB Sensors

Abstract

In this paper we present a novel no-reference video quality measure, NR-FFM (no-reference frame–freezing measure), designed to estimate quality degradations caused by frame freezing of streamed video. The performance of the measure was evaluated using 40 degraded video sequences from the laboratory for image and video engineering (LIVE) mobile database. The proposed quality measure can be used in different scenarios, such as mobile video transmission, either by itself or in combination with other quality measures. Both types of application are presented and studied, together with considerations on the relevant normalization issues. The results show promising correlation between the user-assigned quality and the estimated quality scores.

1. Introduction

Quality of media content can be evaluated using different types of measures. The most reliable way is to conduct visual experiments under controlled conditions, in which human observers grade the quality of the multimedia content under evaluation [1]. Unfortunately, such experiments are time-consuming and costly, making the search for alternative quality estimation methods an important research topic. A much simpler approach is to use a computable objective measure that equates quality degradation with the (numerical) error between the original and the distorted media [2,3]. Every objective quality measure aims to approximate human quality perception (the human visual system, HVS) as closely as possible, meaning that good correlation with subjective scores (mean opinion score, MOS) is sought. Objective quality measures for image and video can generally be divided into three categories according to the reference information they use, as follows:
  • full-reference (FR) quality measures;
  • reduced-reference (RR) quality measures;
  • no-reference (NR) quality measures.
FR quality measures require the original undistorted or unprocessed signal, RR quality measures need only information derived from the original signal, and NR quality measures require only the processed/degraded signal [3].
In this paper, we present a no-reference video quality measure for frame freezing degradations called NR-FFM (no-reference frame–freezing measure). This type of degradation often occurs during video transmission in different types of TV broadcasting (internet, mobile) with a low SNR (signal-to-noise ratio) margin. It will be shown that the proposed measure achieves high correlation with MOS and that it can be used in cases where only this type of degradation is expected, or in combination with other degradation types, where NR-FFM is combined with some other FR or RR objective measure.
This paper is organized as follows: Section 2 describes related work, Section 3 describes the test of the laboratory for image and video engineering (LIVE) mobile database, Section 4 describes the development of the proposed NR-FFM measure, Section 5 presents the experimental results, Section 6 provides the discussion, and Section 7 draws the conclusions.

3. Description of Used Datasets with Subjective Ratings

3.1. LIVE Mobile Dataset

To develop and test the proposed measure NR-FFM, we used sequences from the LIVE Mobile Video Quality Database [18] (hereafter LIVE mobile), using data from both the mobile study and the tablet study. The 10 video files in the mobile study (the first 5 of which were used in the tablet study) are stored in planar YUV 4:2:0 format, with a spatial resolution of 1280 × 720 pixels at 30 fps, and each is 15 s long. There are 20 degraded video sequences per original sequence across 5 distortion types (altogether 200 distorted video sequences in the mobile study and 100 video sequences in the tablet study):
  • H.264 compression (four test videos per reference);
  • Wireless channel packet-loss (four test videos per reference);
  • Frame-freezes (four test videos per reference);
  • Rate adaptation (three test videos per reference);
  • Temporal dynamics (five test videos per reference).
The first frame from each video sequence is shown in Figure 1.
Figure 1. First frame, laboratory for image and video engineering (LIVE) mobile database: (a) bulldozer with fence, (b) Barton Springs pool diving, (c) friend drinking Coke, (d) Harmonicat, (e) landing airplane, (f) panning under oak, (g) Runners skinny guy, (h) two swans dunking, (i) students looming across street, (j) trail pink kid.
In [18], the authors compared different objective measures using four degradation types from the dataset, omitting frame freezing degradation. In our study, we developed a no-reference objective measure for frame-freezing degradations, which was tested using 10 degraded sequences and 4 degradation levels obtained from the LIVE mobile database [18] (mobile subjective testing). The first 3 degradations simulated the transmission of stored video material, meaning that after a freeze, playback re-started from the next frame in time, whereas the fourth degradation simulated live streaming, i.e., after a frame freeze all skipped frames were lost. The affected frame rate (AFR) was the same for the first three degradations: the first degradation had 8 × 30 freezing frames, the second 4 × 60, and the third 2 × 120; AFR was 240/690 = 34.78%. The fourth degradation had 1 freezing episode in the middle of the sequence lasting 120 frames and 1 frame freeze at the end of the sequence lasting 60 frames; AFR was 180/450 = 40%. The first 3 s of all degraded video sequences were free of freezing, and the duration of the non-freezing video parts was set to 1.5× the duration of the freezing parts. Figure 2 shows a graph with the DMOS (differential mean opinion score) results for these sequences. All 40 degraded videos had only frame freezing degradations, with no other degradation types included.
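The AFR figures above follow directly from the sequence layout (15 s at 30 fps, i.e., 450 source frames); a minimal sketch of the arithmetic, where the stored-video denominator includes the repeated frames that the freezes add to the playback:

```python
def afr(frozen_frames, total_displayed_frames):
    """Affected frame rate: fraction of displayed frames that are frozen."""
    return frozen_frames / total_displayed_frames

# Stored-video degradations 1-3: 8 x 30 = 4 x 60 = 2 x 120 = 240 frozen frames;
# playback resumes from the next frame, so the 450 source frames stretch to 690.
stored_afr = afr(240, 450 + 240)

# Live-streaming degradation 4: 120 + 60 = 180 frozen frames; skipped frames
# are lost, so the displayed length stays at 450 frames.
live_afr = afr(120 + 60, 450)

print(f"{stored_afr:.2%}")  # 34.78%
print(f"{live_afr:.2%}")    # 40.00%
```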
Figure 2. DMOS scores on LIVE mobile database frames with freezing degradation only.
Several conclusions can be drawn from the graph (similar conclusions can be found in [14] where authors used a different database):
  • DMOS score was lower (better) for longer frame freezing in the case of the stored streaming scenario (degradation types 1–3) with the same affected frame rate;
  • DMOS score was higher (worse) in the case of live streaming scenario (1 × 120 + 1 × 60 frame freeze) where frames were ‘lost’, compared with the case of stored video and 2 × 120 frame freezes; however, it was not clear by how much, as the former had a freeze at the end lasting for 60 frames;
  • DMOS score was different for different video sequences with the same degree of AFR, which means that the final DMOS score also depended upon video content. A good example is sequence number 11, with a very low DMOS score: this sequence has low spatial and temporal information, so the long frame freeze did not have a strong impact on the subjective score.
It was assumed that the positions of all frame freezing events were known. This assumption was not unrealistic, as in a real transmission application it is possible to determine the starting time of the freezes from other parameters, such as headers in transport stream or in video compressed packets.
Furthermore, the proposed NR-FFM measure was evaluated, comparing its estimates and the database scores using data for frame freezing degradations only, as well as in combination with other quality measures, to obtain the overall correlation computed on all 200 distorted sequences.

3.2. VQEG HD5 Dataset

The VQEG HD5 dataset [28] is used later in this study to compare the newly proposed objective measure NR-FFM with several existing objective measures designed for frame freezing degradation. The VQEG-HD5 dataset consists of several original full-HD video sequences, with 25 frames per second and 10 s overall duration. All video sequences can be downloaded from [29]. The degraded video sequences have general degradations due to MPEG-2 (Moving Picture Experts Group) and H.264/AVC (advanced video coding) compression artefacts and packet losses, which introduce slicing errors and/or frame freezing; however, only in some video sequences do packet losses introduce frame freezing. Because of this, we used 7 video sequences (src01, src02, src04, src05, src06, src08, and src09) and 4 degradation types with frame freezing (hrc10, hrc11, hrc12, and hrc15), resulting in 28 degraded video sequences overall.
Specifically, degradations that were included in later comparison are, briefly:
  • hrc10: H.264/AVC compression, bitrate 16 Mbit/s, 1-pass encoding, bursty packet loss, packet loss rate 0.125%; freezing errors;
  • hrc11: H.264/AVC compression, bitrate 16 Mbit/s, 1-pass encoding, bursty packet loss, packet loss rate 0.25%; freezing errors;
  • hrc12: H.264/AVC compression, bitrate 16 Mbit/s, 1-pass encoding, bursty packet loss, packet loss rate 0.5%; freezing errors;
  • hrc15: H.264/AVC compression, bitrate 4 Mbit/s, 1-pass encoding, bursty packet loss, packet loss rate 0.25%; freezing errors.

4. NR-FFM Measure Development

To further analyze video content, we used two spatial and temporal activity indicators, the spatial information (SI) and the temporal information (TI), computed for all degraded video sequences. SI and TI are defined as follows [30]:
$$\mathrm{SI} = \max_{\mathrm{time}}\left\{\mathrm{std}_{\mathrm{space}}\left[\mathrm{Sobel}_{H,V}(F_n)\right]\right\}, \qquad \mathrm{TI} = \max_{\mathrm{time}}\left\{\mathrm{std}_{\mathrm{space}}\left[F_n - F_{n-1}\right]\right\}$$
where F_n represents the luminance plane at time n and Sobel denotes convolution with a 3 × 3 Sobel kernel; SI is defined as the maximum, over all frames, of the standard deviation over space (std_space) of the Sobel-filtered frame. By default, the Sobel operator is calculated for both horizontal and vertical edges. However, we additionally calculated SI for only horizontal edges (denoted SIH), only vertical edges (SIV), and both horizontal and vertical edges (SIH,V):
$$\mathrm{Sobel}_H = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}; \quad \mathrm{Sobel}_V = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}; \quad \mathrm{Sobel}_{H,V}(F_n) = \sqrt{\mathrm{Sobel}_H^2(F_n) + \mathrm{Sobel}_V^2(F_n)}.$$
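Equation (1) can be sketched in a few lines of Python; the snippet below is a minimal illustration, assuming `frames` is a list of luminance (Y) planes as 2-D float arrays (SciPy's `convolve` stands in for whatever filtering routine was actually used in the study):

```python
import numpy as np
from scipy.ndimage import convolve

SOBEL_H = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
SOBEL_V = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

def spatial_information(frames):
    """SI: maximum over time of the spatial std of the Sobel-filtered frame."""
    stds = []
    for f in frames:
        gh = convolve(f, SOBEL_H)
        gv = convolve(f, SOBEL_V)
        stds.append(np.std(np.sqrt(gh ** 2 + gv ** 2)))  # Sobel_{H,V}
    return max(stds)

def temporal_information(frames):
    """TI: maximum over time of the spatial std of the frame difference."""
    return max(np.std(b - a) for a, b in zip(frames, frames[1:]))

# A vertical edge yields nonzero SI; a changing frame yields nonzero TI.
edge = np.zeros((16, 16))
edge[:, 8:] = 255.0
print(spatial_information([edge]) > 0)
print(temporal_information([np.zeros((16, 16)), edge]) > 0)
```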
Results of TI versus SI are presented in Figure 3 for the LIVE mobile dataset and the VQEG dataset. For the LIVE mobile dataset, results are presented separately for the first three degradations (stored transmission) and the fourth degradation (live transmission). It can be seen that TI in the live transmission scenario was higher than in the stored scenario. For the VQEG dataset, the SI and TI characteristics were generally grouped per video sequence. It can also be seen that the VQEG dataset was more diverse in its dynamic characteristics than the LIVE dataset. Later in this study, the VQEG dataset is used to compare the proposed NR-FFM measure with several existing objective quality measures.
Figure 3. Temporal information versus spatial information (SIH,V) for: (a) LIVE mobile dataset, stored transmission scenario (blue) and live transmission scenario (red); (b) VQEG dataset (video sequences src01, src02, src04, src05, src06, src08, and src09 with degradations hrc10, hrc11, hrc12, and hrc15, previously described).
First, we investigated whether there was a relationship between spatial information (SI), temporal information (TI), and the DMOS scores of all 40 degraded sequences in the LIVE dataset. A genetic algorithm (GA) [31] was used to find a relationship between the four observed degradation types, with the goal of maximizing Spearman's correlation (GA population size was set to 50).
We then hypothesized the following form for the no-reference frame freezing measure (NR-FFM) for degraded video sequence j (j ∈ {1, …, 40}):
$$\text{NR-FFM}(j) = \left(\sum_{i=1}^{n} \mathrm{NFD}(i,j)^{\alpha}\right) \cdot \mathrm{SI}(j)^{\beta} \cdot \mathrm{TI}(j)^{\gamma}$$
where a higher value means worse quality, n is the number of freezes in sequence j, NFD(i,j) is the normalized duration of the i-th freeze (normalized by the sequence length in frames), and SI(j) and TI(j) are the spatial and temporal information of the whole sequence j (according to Equation (1)), respectively; the coefficients α, β, and γ were optimized using a GA to maximize Spearman's correlation. The coefficient α captures the impact of both the duration of a single frame freeze and the overall freeze duration on the final video quality. We expect 0 < α < 1: α < 1 means that one long freezing event receives a better (lower) objective grade than a few shorter freezing events of the same overall duration, which is in accordance with Figure 2, while α > 0 means that a longer frame freeze has a greater (worse, higher) impact on the objective grade than a shorter one. The coefficients β and γ describe the impact of spatial and temporal information on video quality; both would be expected to be greater than 0, meaning that higher spatial or temporal activity increases the impact on video quality.
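As a sketch (with illustrative, not trained, coefficient values), Equation (2) can be written as:

```python
import numpy as np

def nr_ffm(freeze_durations, seq_len, si, ti, alpha, beta, gamma):
    """NR-FFM for one sequence: (sum_i NFD_i^alpha) * SI^beta * TI^gamma.
    freeze_durations are in frames; seq_len is the sequence length in frames.
    Higher values mean worse quality."""
    nfd = np.asarray(freeze_durations, dtype=float) / seq_len
    return np.sum(nfd ** alpha) * si ** beta * ti ** gamma

# With 0 < alpha < 1, one long freeze scores better (lower) than several short
# freezes of the same overall duration, consistent with Figure 2:
one_long   = nr_ffm([240],    690, si=50.0, ti=10.0, alpha=0.6, beta=0.1, gamma=0.0)
many_short = nr_ffm([30] * 8, 690, si=50.0, ti=10.0, alpha=0.6, beta=0.1, gamma=0.0)
print(one_long < many_short)  # True
```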

5. Experiments and Results

5.1. Results Using Frame Freezing Degradations—Overall Measure, LIVE Mobile Dataset

We used the LIVE mobile database frame freezing degradation sequences (40 video sequences) to train the proposed model from Equation (3). We ran the GA over the training set and tested the resulting parameters over a non-overlapping test set. We divided the 10 original sequences into 7 or 8 for training (multiplied by 4 degradation levels per video sequence, giving 28 or 32 sequences overall) and 3 or 2 for testing (12 or 8 sequences overall). This gave (10 over 7) = 120 combinations or (10 over 8) = 45 combinations. Each combination was run 10 times, and the best (highest) Spearman's correlation was taken as the model; afterwards, the model was tested on the remaining part of the frame-freezing dataset. The model was trained to have the highest possible Spearman's correlation. Running the GA 10 times (0 < α < 1, β > 0, γ > 0, population size 50) over all combinations, we obtained the best results with the mean γ near 0. Because of that, we decided to calculate NR-FFM using SI information only. Discarding TI from Equation (3), the proposed NR-FFM takes the following form:
$$\text{NR-FFM}(j) = \left(\sum_{i=1}^{n} \mathrm{NFD}(i,j)^{\alpha}\right) \cdot \mathrm{SI}(j)^{\beta}.$$
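The split counts quoted above can be verified with a short sketch (the GA optimization itself is not reproduced here):

```python
from itertools import combinations
from math import comb

refs = range(10)  # 10 reference sequences, each with 4 freeze-degraded versions

train7 = list(combinations(refs, 7))  # 7 references for training, 3 for testing
train8 = list(combinations(refs, 8))  # 8 references for training, 2 for testing

print(len(train7), comb(10, 7))  # 120 120
print(len(train8), comb(10, 8))  # 45 45
print(7 * 4, 3 * 4)              # 28 training / 12 test degraded sequences
```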
In order to use them for comparison, Pearson’s, Spearman’s, and Kendall’s correlation were employed as follows. Pearson’s product–moment correlation coefficient was calculated as a normalized covariance between two variables, x and y:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$
where x_i and y_i are sample values (e.g., x contains results from different objective measures and y results from subjective tests), x̄ and ȳ are the sample means, and s_x and s_y are the standard deviations of variables x and y. Spearman's correlation assesses how well an arbitrary monotonic function can describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables [32]. It is calculated in the same way as Pearson's correlation, Equation (5), but over ranked variables (every sample from both variables is first put in order: first, second, third, etc.; tied ranks are assigned their average rank). Kendall's correlation [33] is also a rank correlation coefficient. It considers pairs of observations over the ranked variables and counts concordant pairs (the sort orders by x and by y have the same direction) and discordant pairs (the sort orders have opposite directions), possibly adjusting for tied pairs (neither concordant nor discordant). Generally, three types of Kendall's correlation coefficient have been defined: τ_a, τ_b, and τ_c. Kendall's τ_a makes no adjustment for ties, whereas τ_b and τ_c do. Kendall's τ_b is used if the underlying scales of both variables have the same number of possible values (before ranking), and τ_c if they differ. Since in our case both variables x and y can take many different possible values, Kendall's τ_b coefficient (which is also what Matlab computes as the Kendall correlation coefficient) is used later in this study.
Results for Spearman’s and Pearson’s correlation are presented in Table 1, Table 2 and Table 3 for SI with only horizontal (SIH), both horizontal and vertical edges (SIH,V), and only vertical edges (SIV), respectively. Mean parameters α and β over non-overlapping test sets for those cases are presented in Table 4. After nonlinear regression using four parameter logistic functions Q1, Q2, Q3, and Q4, Pearson’s correlation was calculated according to:
$$\begin{aligned}
Q_1(z) &= b_1\left(\frac{1}{2} - \frac{1}{1 + e^{b_2 (z - b_3)}}\right) + b_4 z + b_5 \\
Q_2(z) &= \frac{b_1 - b_2}{1 + e^{-(z - b_3)/b_4}} + b_2 \\
Q_3(z) &= b_1 z^3 + b_2 z^2 + b_3 z + b_4 \\
Q_4(z) &= b_1 z + b_2
\end{aligned}$$
Table 1. Kendall’s, Spearman’s, and Pearson’s correlation for different train-test dataset ratio—SIH.
Table 2. Kendall’s, Spearman’s, and Pearson’s correlation for different train-test dataset ratio—SIH,V.
Table 3. Kendall’s, Spearman’s, and Pearson’s correlation for different train-test dataset ratio—SIV.
Table 4. Mean parameters α and β over non-overlapping test sets. SIH: only horizontal spatial information, SIH,V: horizontal and vertical spatial information, SIV: only vertical spatial information.
Q1 and Q2 are defined in [34] and [35], respectively, whereas Q3 and Q4 represent cubic and linear fit.
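As an illustration of this regression step, the sketch below fits Q1 from Equation (6) with SciPy on synthetic data before computing Pearson's correlation (the data and initial b-coefficients are hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def q1(z, b1, b2, b3, b4, b5):
    """Logistic-plus-linear fitting function Q1 from Equation (6)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (z - b3)))) + b4 * z + b5

rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 40)                   # toy objective scores
dmos = 40.0 * z + 20.0 + rng.normal(0, 1, 40)   # toy subjective scores

b, _ = curve_fit(q1, z, dmos, p0=[1.0, 1.0, 0.5, 1.0, 1.0], maxfev=20000)
r = pearsonr(q1(z, *b), dmos)[0]                # Pearson after regression
print(r > 0.95)
```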
From Table 1, Table 2 and Table 3, it can generally be concluded that the highest Pearson's correlation was obtained using the Q1 and Q3 fitting functions. Q2 had a somewhat lower correlation, and the Q4 function gave the lowest correlation (as could be expected from a linear fitting function).
When the entire mobile dataset was used for training, we obtained the Kendall's, Spearman's, and Pearson's correlations given in Table 5. Pearson's correlation was again calculated according to Equation (6) for Q1, Q2, Q3, and Q4.
Table 5. Kendall’s, Spearman’s, and Pearson’s correlation for the overall mobile dataset, different SI calculations (best values are marked in bold). H: horizontal, V: vertical.
When comparing Spearman’s and Pearson’s correlations from Table 1, Table 2 and Table 3, we obtained similar correlations in all three cases. However, according to Table 5, the best Kendall’s, Spearman’s, and Pearson’s correlations were obtained for SI with only a horizontal Sobel operator. Because of this, we further used this case (SI with horizontally calculated Sobel operator). Proposed NR-FFM should then have an explicit form:
$$\text{NR-FFM}(j) = \left(\sum_{i=1}^{n} \mathrm{NFD}(i,j)^{0.6327}\right) \cdot \mathrm{SI}(j)^{0.1167}.$$
Table 6 presents the Kendall's, Spearman's, and Pearson's correlations using Equation (7) for the LIVE mobile stored transmission video sequences (30 video sequences) and the LIVE mobile live transmission video sequences (10 video sequences) separately.
Table 6. Kendall’s, Spearman’s, and Pearson’s correlation, using LIVE mobile, stored transmission video sequences, and LIVE mobile live transmission video sequences.

5.2. NR-FFM Measure: Comparison between “Mobile” and “Tablet” Sub-Dataset from LIVE Mobile Dataset

To further evaluate the proposed NR-FFM measure, we also used the DMOS tablet scores (20 video sequences with frame freezing degradations) with the NR-FFM from Equation (7). It has to be noted that these scores were obtained for the same video sequences as in the mobile dataset; however, the sequences were shown on a tablet display.
If we used the NR-FFM measure defined in Equation (7), we obtained Kendall’s correlation of 0.6000; Spearman’s correlation of 0.8030; and Pearson’s correlation of 0.8357, 0.8353, 0.8358, and 0.8253 for Q1, Q2, Q3, and Q4, respectively. Figure 4 shows the estimated values of NR-FFM measure versus the subjective DMOS scores using Equation (7). Also, Table 7 shows the fitting coefficients b1–b5, as defined in Equation (6), for this case.
Figure 4. No-reference frame–freezing measure (NR-FFM) versus DMOS (trained using LIVE mobile sub-dataset): (a) LIVE mobile dataset; (b) LIVE tablet dataset.
Table 7. Fitting coefficients b1–b5.
Another way to determine the NR-FFM coefficients is to train on the tablet dataset (20 video sequences) and test on the mobile dataset (40 video sequences). Training was performed in the same way as previously described for the overall mobile dataset. Parameters α and β in NR-FFM, Equation (4), were in this case determined to be 0.6356 and 0.1098, respectively. Table 8 presents the Kendall's, Spearman's, and Pearson's correlations (using Q1, Q2, Q3, and Q4 as fitting functions) for this case.
Table 8. Kendall’s, Spearman’s, and Pearson’s correlation for the overall frame freezing dataset, using the LIVE tablet sub-dataset for training and the LIVE mobile sub-dataset for testing.

5.3. Combined Results, Overall LIVE Mobile Dataset

To combine the proposed NR measure with other objective measures, it has to be rescaled to the same range and regression behavior as the targeted objective measure. One way of achieving this is the nonlinear rescaling model in Equations (8) and (9), where the free parameters δ were calculated using the GA to give the highest possible Spearman's correlation with DMOS. Rescaling used the maximum and minimum grades of the objective measure tested in advance for the other degradation types, as well as of the proposed NR measure. We applied the proposed NR measure from Equation (4) to two reduced-reference video quality measures, RVQM [12] and STRRED [10]. RVQM uses a scale of 0–1, where 0 means worst quality and 1 means no degradation; STRRED uses a scale starting from 0, where 0 means no degradation.
The RVQM measure is based on a 3D (three-dimensional) steerable wavelet transform (the Riesz transform [36]) and a modified SSIM (structural similarity index [6]) measure (using only the contrast and structure terms). Here, video frames were resized into rectangular cuboids of 64 × 64 pixels per frame and 32 frames, using an averaging filter of four pixels. The step in time was set to half the size of the tested cube, i.e., 16 frames. The modified SSIM index was calculated from the third component of the first-order Riesz decomposition with Shannon prefiltering and the Simoncelli filter, following [12].
STRRED (spatio-temporal reduced reference entropic differences) is another measure with variable range of the reference information (from single scalar to the full reference information). It combines spatial RRED index [37] (SRRED) and temporal RRED index (TRRED).
In the case of implementing the proposed measure of frame freezing degradations in RVQM, rescaling was done according to
$$\text{NR-FFM}_{\mathrm{RVQM}} = 1 - \frac{1 - \min(\mathrm{RVQM})}{\max\left(\text{NR-FFM}^{\delta_1}\right)} \cdot \text{NR-FFM}^{\delta_1},$$
and in the case of STRRED according to Equation (9):
$$\text{NR-FFM}_{\mathrm{STRRED}} = \frac{\max(\mathrm{STRRED})}{\max\left(\text{NR-FFM}^{\delta_2}\right)} \cdot \text{NR-FFM}^{\delta_2}$$
Finally, the combined measures are calculated as follows:
$$\mathrm{RVQM}_{\mathrm{combined}} = \text{NR-FFM}_{\mathrm{RVQM}} \cdot \mathrm{RVQM}, \qquad \mathrm{STRRED}_{\mathrm{combined}} = \left(\text{NR-FFM}_{\mathrm{STRRED}} + 1\right) \cdot \left(\mathrm{STRRED} + 1\right).$$
In Equation (10), the STRRED-based combination was rescaled to start from 1 (meaning perfect quality); otherwise, the combined measure would give 0 whenever either one of its parts alone gave a perfect grade (zero).
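The rescale-and-combine chain of Equations (8)–(10) can be sketched as follows (the score arrays are illustrative, and the default δ exponents are the GA-optimized values from this section):

```python
import numpy as np

def combine_measures(nr_ffm, rvqm, strred, delta1=3.2023, delta2=3.8115):
    """Rescale NR-FFM to the RVQM and STRRED scales (Eqs. (8)-(9)) and
    combine (Eq. (10)). nr_ffm, rvqm, strred: per-sequence score arrays."""
    p1, p2 = nr_ffm ** delta1, nr_ffm ** delta2
    nr_rvqm = 1.0 - (1.0 - rvqm.min()) / p1.max() * p1    # Eq. (8)
    nr_strred = strred.max() / p2.max() * p2              # Eq. (9)
    rvqm_combined = nr_rvqm * rvqm                        # Eq. (10)
    strred_combined = (nr_strred + 1.0) * (strred + 1.0)  # Eq. (10), offset by 1
    return rvqm_combined, strred_combined

# A sequence with no freezes (NR-FFM = 0) leaves the base measure untouched:
nr = np.array([0.0, 0.5, 1.2])
rv = np.array([1.0, 0.8, 0.6])
st = np.array([0.0, 2.0, 5.0])
rc, sc = combine_measures(nr, rv, st)
print(rc[0], sc[0])  # 1.0 1.0
```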
As in the previous case (only frame-freezing degradations), we divided the dataset in seven or eight (original video sequences) for training and the rest for testing (three and two, respectively). This gave overall (multiplied by four degradation levels per degradation type and five degradation types) 140 or 160 video sequences for training. The rest of the non-overlapping video sequences were used for testing (60 and 40, respectively). Results for Kendall’s, Spearman’s, and Pearson’s correlation between RVQM, STRRED, and DMOS are presented in Table 9.
Table 9. Kendall’s, Spearman’s, and Pearson’s correlation for different train-test dataset ratio (best values are marked in bold).
If the entire dataset was used for training in Equations (8) and (9), δ was calculated by running the GA 10 times (again maximizing Spearman's correlation, with α = 0.6327 and β = 0.1167 from Equation (7)), giving δ1 = 3.2023 and δ2 = 3.8115. In this case, the rescaled measures can be written explicitly as:
$$\text{NR-FFM}_{\mathrm{RVQM}} = 1 - 64.6136 \cdot \text{NR-FFM}^{3.2023},$$
$$\text{NR-FFM}_{\mathrm{STRRED}} = 3.3464 \cdot 10^{5} \cdot \text{NR-FFM}^{3.8115}.$$
Figure 5 shows the relationship between the combined RVQM measure, the combined STRRED measure, and the DMOS scores (200 degraded video sequences for the mobile dataset and 100 degraded sequences for the tablet dataset), using Equations (10)–(12). For the tablet dataset, the same coefficients were used as for the mobile dataset, Equations (11) and (12).
Figure 5. Combined objective measure: (a) RVQM (reduced video quality measure), mobile dataset; (b) STRRED (spatio-temporal reduced reference entropic differences), mobile dataset; (c) RVQM, tablet dataset; (d) STRRED, tablet dataset.
Table 10 shows Kendall’s, Spearman’s, and Pearson’s correlation, before and after combining RVQM and STRRED measures with NR-FFM measure, using the entire mobile dataset. Table 11 shows Kendall’s, Spearman’s, and Pearson’s correlation for the tablet dataset using the fitting coefficient from the mobile dataset study, Equations (11) and (12).
Table 10. Kendall’s, Spearman’s, and Pearson’s correlation, before and after combining RVQM and STRRED with freezing degradations; VQM-VFD-RR and VQM-VFD-FR correlation from [17]; mobile dataset (best values are marked in bold).
Table 11. Kendall’s, Spearman’s, and Pearson’s correlation, before and after combining RVQM and STRRED with freezing degradations; VQM-VFD-RR and VQM-VFD-FR correlation from [17]; tablet dataset (best values are marked in bold).
Pearson’s correlation was calculated after nonlinear regression using Equation (6). In these tables, results for VQM measure [13] that incorporates variable frame delay distortion and full or reduced reference calibration, called VQM-VFD-FR or VQM-VFD-RR measures, have also been included from [17]. It can be concluded that NR-FFM can be included in other measures without lowering overall correlation with DMOS. Regarding the fitting functions for Pearson’s correlation, Q1, Q2, and Q3 have similar correlations, whereas Q4 produces a much lower correlation. This means that the linear fitting function Q4 cannot be used in this case. It has to be also noted that in this database, only one part from Equation (10) has an influence on the final objective metric (frame-freezing degradation or any other degradation type), whereas the other part shows perfect quality (with score 1).

5.4. NR-FFM Measure Comparison with Other Objective Measures for Frame Freezing Degradations

In this subsection we compare the newly proposed NR-FFM measure with several existing measures: “Borer” (no-reference) [19], “FDF” (no-reference) [20], “FDF-RR” (reduced-reference) [20], and “Quanhuyn” (no-reference) [15]. Table 12 presents the Kendall's, Spearman's, and Pearson's correlations for the LIVE mobile dataset (the 40 degraded video sequences described earlier), whereas Table 13 presents those for the VQEG dataset (the 28 degraded video sequences described earlier). In both tables, the Borer measure was calculated according to the paper [19], but with the motion intensity value set to 1 for all degraded video sequences (otherwise, correlation was always lower). The FDF and FDF-RR measures were calculated using the Matlab code from [38] (with the read_avi function from the command line video quality metric, CVQM, MATLAB source code for CVQM version 3.0 [39]). The Quanhuyn measure was also calculated according to the paper [15]. In Table 12, our proposed NR-FFM corresponds to the mean value from Table 1 for the 8:2 training-test ratio. In Table 13, NR-FFM was calculated according to Equation (7), using SIH (from Equations (1) and (2)) as the spatial information value.
Table 12. Kendall’s, Spearman’s, and Pearson’s correlation for the LIVE mobile dataset—40 degraded video sequences (best values are marked in bold). Borer: no reference objective measure, FDF: no reference objective measure, FDF-RR: reduced reference objective measure, Quanhuyn: no reference objective measure.
Table 13. Kendall’s, Spearman’s, and Pearson’s correlation for the VQEG dataset—28 degraded video sequences (best values are marked in bold).

6. Discussion

In this paper we developed the no-reference objective measure NR-FFM and tested it using two different datasets, LIVE mobile and VQEG-HD5. The NR-FFM measure was also compared with several existing objective measures for frame freezing degradations, namely Borer (no-reference), FDF (no-reference), Quanhuyn (no-reference), and FDF-RR (reduced reference). On the LIVE mobile dataset, NR-FFM gave the best correlation results among all tested measures (Table 12). On the VQEG-HD5 dataset, NR-FFM (which was trained on the LIVE mobile dataset) had the best Pearson's correlation for the Q1, Q2, and Q3 fitting functions (Table 13), while the Borer and FDF-RR measures obtained somewhat higher Spearman's and Kendall's correlations (Table 13).
The main difference between the LIVE mobile and VQEG-HD5 frame freezing degradations was the type and overall duration of the freezing occurrences. The LIVE mobile dataset contains online degradations (due to, for example, packet losses), which result in frame drops, and offline degradations (due to, for example, packet delay), where no frame is actually lost, only delayed. Also, in the LIVE mobile dataset the overall freezing duration was generally the same: in offline frame freezing 8 × 1 s, 4 × 2 s, and 2 × 4 s long; only online freezing had one 4 s freeze and one 2 s freeze. In the VQEG-HD5 dataset, all frame freezes were online degradations resulting in frame drops, and a higher packet loss ratio (PLR) resulted in longer frame freezes and, consequently, more lost frames. This meant that the overall frame freezing duration differed between degradation types (in the VQEG-HD5 dataset, due to the different PLR). Probably because of this, the NR measures FDF and Borer had lower correlation on the LIVE mobile dataset than on the VQEG-HD5 dataset. Also, motion intensity or temporal information (or any other measure based on differences between consecutive frames) does not take a higher value under offline frame freezing than in the original video sequence (and that degradation was present only in the LIVE mobile dataset). The NR-FFM measure had the best correlation on the LIVE mobile dataset compared with the other tested measures, probably also because it was calculated only on the basis of frame freezing duration and spatial information. The NR-FFM measure also takes different values for equal overall frame freezing with different numbers of occurrences (see Equation (7)). Nonetheless, on the VQEG-HD5 dataset, NR-FFM had similar correlation to the Borer and FDF measures without values pretrained on the VQEG-HD5 dataset.
When comparing offline degradations (i.e., stored video sequences) and online degradations (i.e., live video sequences) in the LIVE mobile dataset (Table 6), the offline degradations had higher correlation. However, for the online degradation there were only 10 video sequences with an identical degradation pattern (two frame drops, 4 and 2 s long, at 3–7 s and 13–15 s in each video sequence); thus, these correlation coefficients had lower confidence. Also, compared with this case, NR-FFM had a higher correlation on the VQEG-HD5 dataset (Table 13), which likewise contains only online degradations, probably due to the different degradation levels for each video sequence (4 per sequence) as well as the larger number of tested video sequences (28 overall).
The NR-FFM measure was also evaluated in a cross-dataset setup using the LIVE mobile (40 video sequences) and LIVE tablet (20 video sequences) studies, trained on one dataset and tested on the other. It showed similar fitting coefficients and correlations in both directions (trained on LIVE mobile and tested on LIVE tablet, and vice versa).
Furthermore, we combined the proposed NR-FFM measure with existing reduced-reference measures (RVQM and STRRED) and tested the combination on all video sequences from the LIVE mobile/tablet datasets. The results showed that the combined objective measure can achieve similar correlation across all five degradation types in the LIVE mobile dataset (Table 10 and Table 11).
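The correlation coefficients reported throughout these comparisons, Pearson's linear correlation coefficient (PLCC) and Spearman's rank-order correlation coefficient (SROCC), can be computed as in the NumPy-only sketch below. This is an illustrative example, not the evaluation code used in the paper; in particular, the customary nonlinear (e.g., logistic) fitting step applied before PLCC is omitted here, the toy data are invented, and the `rank` helper assumes no tied scores.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def rank(v):
    """1-based ranks of the values; assumes no ties for simplicity."""
    order = np.argsort(v)
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(rank(np.asarray(x, float)), rank(np.asarray(y, float)))

# Toy data: objective scores related to MOS by a monotone, nonlinear mapping
obj = np.linspace(0.0, 1.0, 20)
mos = 1 + 4 * obj**2
print(spearman(obj, mos))  # -> 1.0 (rank order is fully preserved)
print(round(pearson(obj, mos), 3))
```

The example illustrates why both coefficients are reported: SROCC is invariant to any monotone mapping between objective scores and MOS, whereas PLCC is reduced by the nonlinearity unless a fitting function is applied first.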

7. Conclusions

In this paper we proposed a new NR-FFM measure for frame freezing degradations. It can be used for this degradation type alone or in combination with measures for other degradation types without lowering their overall correlation, which we verified using the RVQM and STRRED video quality measures on the LIVE mobile and tablet video databases.
Future research may focus on incorporating temporal information (or motion intensity) into the proposed NR-FFM measure to obtain higher correlation with subjective grades for video sequences containing both offline and online frame freezing. Alternatively, separate formulae may be defined for spatial and temporal information instead of the single formula proposed in Equation (7). A larger dataset should then also be developed so that correlations can be calculated with higher confidence.

Author Contributions

Writing—original draft preparation, E.D.; investigation, E.D.; methodology, E.D. and A.B.; software, E.D.; validation, E.D. and A.B.; writing—review and editing, A.B.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. ITU-R. BT.500-13 Methodology for the Subjective Assessment of the Quality of Television Pictures; International Telecommunication Union/ITU Radiocommunication Sector: Geneva, Switzerland, 2012.
  2. Video Quality Experts Group. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Multimedia Quality. 2008. Available online: http://www.vqeg.org/ (accessed on 14 October 2019).
  3. Bjelopera, A.; Dumic, E.; Grgic, S. Evaluation of Blur and Gaussian Noise Degradation in Images Using Statistical Model of Natural Scene and Perceptual Image Quality Measure. Radioengineering 2017, 26, 930–937.
  4. Chikkerur, S.; Sundaram, V.; Reisslein, M.; Karam, L.J. Objective video quality assessment methods: A classification, review, and performance comparison. IEEE Trans. Broadcast. 2011, 57, 165–182.
  5. Loncaric, M.; Tralic, D.; Brzica, M.; Vukovic, J.; Lovrinic, J.; Dumic, E.; Grgic, S. Testing picture quality in HDTV systems. In Proceedings of the 50th International Symposium ELMAR, Zadar, Croatia, 10–12 September 2008; pp. 5–8.
  6. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  7. Chandler, D.M.; Hemami, S.S. VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images. IEEE Trans. Image Process. 2007, 16, 2284–2298.
  8. Haque, M.I.; Qadri, M.T.; Siddiqui, N. Reduced Reference Blockiness and Blurriness Meter for Image Quality Assessment. Imaging Sci. J. 2015, 63, 296–302.
  9. Seshadrinathan, K.; Bovik, A.C. Motion Tuned Spatio-temporal Quality Assessment of Natural Videos. IEEE Trans. Image Process. 2010, 19, 335–350.
  10. Pinson, M.H.; Wolf, S. A new standardized method for objectively measuring video quality. IEEE Trans. Broadcast. 2004, 50, 312–322.
  11. Soundararajan, R.; Bovik, A.C. Video Quality Assessment by Reduced Reference Spatio-temporal Entropic Differencing. IEEE Trans. Image Process. 2013, 23, 684–694.
  12. Dumic, E.; Grgic, S. Reduced Video Quality Measure Based on 3D Steerable Wavelet Transform and Modified Structural Similarity Index. In Proceedings of the 55th International Symposium ELMAR, Zadar, Croatia, 25–27 September 2013; pp. 65–69.
  13. Wolf, S.; Pinson, M.H. Video Quality Model for Variable Frame Delay (VQM_VFD); The National Telecommunications and Information Administration: Boulder, CO, USA, 2011.
  14. Qi, Y.; Dai, M. The Effect of Frame Freezing and Frame Skipping on Video Quality. In Proceedings of the International Conference on Intelligent Information Hiding and Multimedia, Pasadena, CA, USA, 18–20 December 2006; pp. 423–426.
  15. Huynh-Thu, Q.; Ghanbari, M. No-reference temporal quality metric for video impaired by frame freezing artefacts. In Proceedings of the 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–11 November 2009; pp. 2221–2224.
  16. You, J.; Hannuksela, M.M.; Gabbouj, M. An objective video quality metric based on spatiotemporal distortion. In Proceedings of the 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–11 November 2009; pp. 2229–2232.
  17. Pinson, M.H.; Choi, L.K.; Bovik, A.C. Temporal Video Quality Model Accounting for Variable Frame Delay Distortions. IEEE Trans. Broadcast. 2014, 60, 637–649.
  18. Moorthy, A.K.; Choi, L.K.; Bovik, A.C.; de Veciana, G. Video Quality Assessment on Mobile Devices: Subjective, Behavioral and Objective Studies. IEEE J. Sel. Top. Signal Process. 2012, 6, 652–671.
  19. Borer, S. A model of jerkiness for temporal impairments in video transmission. In Proceedings of the 2010 Second International Workshop on Quality of Multimedia Experience (QoMEX), Trondheim, Norway, 21–23 June 2010; pp. 218–223.
  20. Wolf, S. A no reference (NR) and reduced reference (RR) metric for detecting dropped video frames. In Proceedings of the Fourth International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), Scottsdale, AZ, USA, 15–16 January 2009; pp. 1–6.
  21. Usman, M.A.; Usman, M.R.; Shin, S.Y. A no reference method for detection of dropped video frames in live video streaming. In Proceedings of the Eighth International Conference on Ubiquitous and Future Networks (ICUFN), Vienna, Austria, 5–8 July 2016; pp. 839–844.
  22. Usman, M.A.; Shin, S.Y.; Shahid, M.; Lövström, B. A no reference video quality metric based on jerkiness estimation focusing on multiple frame freezing in video streaming. IETE Tech. Rev. 2017, 34, 309–320.
  23. Usman, M.A.; Usman, M.R.; Shin, S.Y. A novel no-reference metric for estimating the impact of frame freezing artifacts on perceptual quality of streamed videos. IEEE Trans. Multimed. 2018, 20, 2344–2359.
  24. Usman, M.A.; Usman, M.R.; Shin, S.Y. The impact of temporal impairment on quality of experience (QOE) in Video streaming: A no reference (NR) subjective and objective study. Int. J. Comput. Electr. Autom. Control Inf. Eng. 2015, 9, 1570–1577.
  25. Xue, Y.; Erkin, B.; Wang, Y. A novel no-reference video quality metric for evaluating temporal jerkiness due to frame freezing. IEEE Trans. Multimed. 2014, 17, 134–139.
  26. Babić, D.; Stefanović, D.; Vranješ, M.; Herceg, M. Real-time no-reference histogram-based freezing artifact detection algorithm for UHD videos. Multimed. Tools Appl. 2019, 1–23.
  27. Grbić, R.; Stefanović, D.; Vranješ, M.; Herceg, M. Real-time video freezing detection for 4K UHD videos. J. Real-Time Image Process. 2019, 1–15.
  28. VQEG. Final Report of HDTV Validation Test, 2010. Available online: https://www.its.bldrdoc.gov/vqeg/projects/hdtv/hdtv.aspx (accessed on 14 October 2019).
  29. The Consumer Digital Video Library. Available online: www.cdvl.org (accessed on 14 October 2019).
  30. ITU-T. P.910: Subjective Video Quality Assessment Methods for Multimedia Applications; International Telecommunication Union: Geneva, Switzerland, 1999.
  31. Miettinen, K.; Neittaanmäki, P.; Mäkelä, M.M.; Périaux, J. Evolutionary Algorithms in Engineering and Computer Science; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 1999.
  32. Hauke, J.; Kossowski, T. Comparison of Values of Pearson's and Spearman's Correlation Coefficient on the Same Sets of Data. In Proceedings of the MAT TRIAD 2007 Conference, Birmingham, UK, 13–15 September 2007; pp. 87–93.
  33. Garson, G.D. Correlation (Statistical Associates "Blue Book" Series Book 3); Statistical Associates Publishing: Asheboro, NC, USA, 2013.
  34. Sheikh, H.R. Image Quality Assessment Using Natural Scene Statistics. Ph.D. Thesis, University of Texas at Austin, Austin, TX, USA, 2004.
  35. Larson, E.C.; Chandler, D.M. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J. Electron. Imaging 2010, 19.
  36. Chenouard, N.; Unser, M. 3D Steerable Wavelets in practice. IEEE Trans. Image Process. 2012, 21, 4522–4533.
  37. Soundararajan, R.; Bovik, A.C. RRED indices: Reduced reference entropic differencing for image quality assessment. IEEE Trans. Image Process. 2012, 21, 517–526.
  38. Wolf, S. A No Reference (NR) and Reduced Reference (RR) Metric for Detecting Dropped Video Frames; NTIA Technical Memorandum TM-09-456; October 2008. Available online: https://www.its.bldrdoc.gov/publications/details.aspx?pub=2493 (accessed on 14 October 2019).
  39. CVQM Source Code for Matlab. Available online: https://www.its.bldrdoc.gov/resources/video-quality-research/guides-and-tutorials/description-of-vqm-tools.aspx (accessed on 14 October 2019).