Hierarchical-P Reference Picture Selection Based Error Resilient Video Coding Framework for High Efficiency Video Coding Transmission Applications

In this paper, a new reference picture selection (RPS) method is proposed for the high efficiency video coding (HEVC) framework. Recent studies have shown that HEVC is sensitive to packet errors, which are unavoidable in transmission applications, especially over wireless networks. RPS is an effective error resilience technique for video transmission systems where a feedback channel with a short round trip delay is available. However, its procedure cannot be applied directly to the HEVC framework, and thus this paper extends it. In RPS, error propagation can still happen during the round trip delay. To alleviate the effect of error propagation and improve quality, the proposed algorithm combines the RPS technique with a region-based intra mode selection method by using some novel features of HEVC. Experimental results demonstrate that the proposed method outperforms the hierarchical-P RPS algorithm in terms of PSNR and other metrics. The average PSNR improvement of the proposed algorithm over the reference algorithm under a 10% packet error rate is 1.56 dB for 1080p sequences, 2.32 dB for 720p sequences and 1.01 dB for wide video graphics array (WVGA) sequences. The performance of the proposed method is also tested for applications where feedback information is not available. The proposed method shows noticeable improvement for video sequences that contain low or moderate levels of motion.


Introduction
Video transmission over mobile devices has increased considerably in recent years. Due to bandwidth limitations and network packet errors, providing acceptable video quality to mobile users is a challenging task. The high compression ratio of high efficiency video coding (HEVC) [1] helps to reduce the network traffic load and mitigates bandwidth demand. However, according to recent studies on the error robustness of HEVC in loss-prone networks [2][3][4], HEVC encoded bit streams are very sensitive to packet errors, and the quality of the decoded video is unacceptable for packet loss rates higher than 1%. Therefore, an effective error resilience framework is essential for HEVC video transmission.
The work in Reference [5] uses a discrete wavelet transform based coding approach and achieves significant gain at low bit rates compared to HEVC. This method can reduce the output bit rate below that of HEVC at the same visual quality. But for applications in loss-prone environments, this method requires further investigation of its error robustness. The major video coding standards like [...] The feedback signal can be either a negative acknowledgement (NACK) signal or an acknowledgement (ACK) signal. For ACK-based RPS, the encoder will not receive an ACK signal if a frame is lost or corrupted. The encoder then selects the last correctly received frame as the reference frame so that the effect of error propagation can be suppressed. The operation of NACK-based RPS is similar to that of ACK-based RPS. The RPS method can effectively stop error propagation and improve quality by breaking temporal dependencies between frames. A proxy-based reference picture selection method for the mobile video telephony scenario was proposed in Reference [10]. In that method, adaptive reference selection using H.264/AVC was proposed for wireless uplink transmission; it is an extended version of the NEWPRED method with extensions for slice-level reference selection. Liu et al. proposed an RPS method for H.264/AVC that uses a long-term reference picture [11].
However, previous studies on RPS mainly focused on the IPPP coding structure because this structure is suitable for constrained bit rate applications that require ultra-low delay and/or low complexity. But HEVC uses a hierarchical coding structure to achieve temporal scalability and high compression efficiency [12]. Moreover, HEVC introduces a new reference picture management concept called the reference picture set, which is different from the sliding window process and memory management control operation (MMCO) of H.264 [13]. Thus, the above-mentioned RPS techniques cannot immediately be adopted for HEVC. We propose a new RPS algorithm for HEVC, called hierarchical-P RPS, which uses a hierarchical coding structure and a reference picture set for low delay conversational video applications. Since the target applications have both a bit rate constraint and a delay constraint, the proposed algorithm is designed for a hierarchical-P coding structure. Although the hierarchical-B coding structure is well known for random access applications, the hierarchical-P structure is more appropriate for low delay cases. The pros and cons of the hierarchical-P structure are presented in Reference [14].
The hierarchical-P RPS method generates a control picture upon receiving a feedback signal from the decoder. Under certain packet loss conditions, there is a high probability that some portions of this control picture will be lost. In that case, error propagation continues until the next control picture is received. If the feedback delay is long or the feedback signal is lost, the quality of the decoded picture degrades significantly. To reduce this effect, the proposed method inserts intra-coded blocks in hierarchical-P RPS. This work has been partially presented in References [15][16][17] with some initial results. However, a detailed discussion of the hierarchical-P RPS method for HEVC and the impact of feedback delay on the performance of the proposed algorithm have not yet been provided.
In addition, there are some RPS methods that do not require feedback information. The checkerboard pattern reference picture selection method is one of these techniques [18]. This method stores two reference pictures in the buffer (the default is four reference pictures for the HEVC low delay P configuration). The current frame to be encoded is divided into smaller blocks called largest coding units (LCUs). These LCUs are categorized into two groups based on the checkerboard pattern. The blocks in one group are encoded using only one reference picture, while the blocks in the other group are encoded using the remaining reference picture. Thus, if one reference picture contains an error, it propagates to only one group. This method can reduce the prediction mismatch at the decoder due to frame loss.
Our proposed method requires packet loss information in order to select the members of the reference picture set and to decide the coding mode of the region of interest (ROI). For applications where feedback information is not available, our proposed method can still be applied by using an error estimation model. The Gilbert-Elliott model [19], a simple packet loss estimation method, is used in this work to obtain the packet loss information. According to the estimated packet loss information, the proposed algorithm selects the reference pictures and the coding mode of the ROI. The performance of the proposed algorithm is compared with that of checkerboard pattern RPS.
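The text does not say how the Gilbert-Elliott model is parameterised in this work. As a rough illustration only, the standard two-state Markov formulation (a "good" state and a "bad" state, each with its own loss probability) can be sketched as follows; the function names, parameter names and default values are assumptions made for this example and are not taken from the paper.

```python
import random


def gilbert_elliott_losses(n_packets, p_good_to_bad, p_bad_to_good,
                           loss_in_good=0.0, loss_in_bad=1.0, seed=0):
    """Simulate per-packet loss flags with a two-state Markov chain.

    All parameter names and defaults are illustrative assumptions.
    Returns a list of booleans, True meaning the packet is lost.
    """
    rng = random.Random(seed)
    in_bad_state = False  # assumed to start in the good state
    losses = []
    for _ in range(n_packets):
        # State transition before each packet.
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        elif rng.random() < p_good_to_bad:
            in_bad_state = True
        # Loss decision depends on the current state.
        loss_prob = loss_in_bad if in_bad_state else loss_in_good
        losses.append(rng.random() < loss_prob)
    return losses


def stationary_loss_rate(p_good_to_bad, p_bad_to_good,
                         loss_in_good=0.0, loss_in_bad=1.0):
    """Long-run average loss rate implied by the chain parameters."""
    pi_bad = p_good_to_bad / (p_good_to_bad + p_bad_to_good)
    return pi_bad * loss_in_bad + (1.0 - pi_bad) * loss_in_good
```

An encoder without a feedback channel could, under these assumptions, treat each simulated loss flag as if it were a missing ACK for the corresponding frame.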
This paper is organized as follows. In Section 2, some important background is presented. The details of the proposed algorithm are discussed in Section 3. Section 4 provides the experimental results and Section 5 concludes this paper.

Background
In this section, the necessary background for this work, namely two novel features of HEVC and the lambda domain rate control method, is briefly reviewed.

Novel Features of HEVC
The improvement in coding efficiency of HEVC is obtained by accumulating several small improvements from almost all parts of the encoder. In this section, the two features most relevant to this research are briefly reviewed: the coding tree structure and the reference picture set.

Coding Tree Structure
In block-based hybrid video coding, each input video frame is first split into several blocks, and each block serves as the basic coding unit for the whole encoding process. The size of the basic unit is typically 16 × 16 pixels in almost all previous ITU-T and ISO/IEC video coding standards. However, a fixed basic unit size is difficult to adapt to varied video content at different resolutions. In order to get the required flexibility, HEVC introduces the coding tree unit (CTU) [1,13], whose size can be configured from 16 × 16 to 64 × 64. Depending on the content, the CTU can also be divided into four smaller blocks called coding units (CUs). This splitting process repeats recursively until the block size reaches the minimum allowed CU size, which can be as small as 8 × 8. After the splitting process, each CU serves as the basic unit for the coding process. It should be noted that not every CU is split down to the smallest size, because the splitting process uses rate-distortion optimization (RDO) to make the partitioning decision.
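The recursive splitting described above can be illustrated with a small sketch. The `should_split` callback is a hypothetical stand-in for the encoder's RDO decision, which is far more involved in practice; the function name and the tuple layout are likewise assumptions for this example.

```python
def split_ctu(x, y, size, min_cu_size, should_split):
    """Recursively partition a square block into leaf coding units.

    should_split(x, y, size) is a hypothetical placeholder for the
    rate-distortion decision. Returns a list of (x, y, size) leaves.
    """
    if size > min_cu_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, min_cu_size, should_split)
                )
        return leaves
    return [(x, y, size)]


# Toy decision rule: keep splitting only the block at the top-left corner.
cus = split_ctu(0, 0, 64, 8, lambda x, y, size: (x, y) == (0, 0))
```

With this toy rule the 64 × 64 block ends up as a mix of 32 × 32, 16 × 16 and 8 × 8 leaves, which shows that only some branches of the tree reach the minimum CU size.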

Reference Picture Set of HEVC
The only information that is required for controlling the reference pictures in H.264/AVC concerns the changes in the decoded picture buffer (DPB). The DPB is updated after the encoding of a frame is completed.
The new reference picture set concept of HEVC adds the information about the entire reference picture list to each slice header [13]. The DPB is updated by extracting this information from the slice header of a frame before decoding it. Thus, information from earlier pictures in decoding order is not necessary. This concept avoids the use of an incorrect reference picture and, as a result, achieves better error robustness.
When the hierarchical-P coding structure is used, each picture in a group of pictures (GOP) is encoded using a different value of the quantization parameter (QP) depending on the hierarchical level of the picture. Pictures at a higher hierarchical level use higher QP values. Hence, the pictures in the lowest hierarchical level have the highest quality, and they are referred to as core frames in this paper. Other pictures are referred to as common frames. Since core frames have better quality than common frames, they are usually kept in the DPB longer than common frames for use as reference pictures.

Lambda Domain Rate Control Method
The lambda domain rate control [20,21], which is adopted in the HEVC reference software (HM15.0) [22], uses the hyperbolic rate-distortion model. This method includes bit allocation, quantization parameter computation and updating of model parameters. Bit allocation at the GOP level is computed based on the target bit rate and the current buffer status. This GOP-level bit budget and the hierarchical level of the picture are used to calculate the picture-level bit budget. Then the bit budget for each largest coding unit (LCU) of a frame is computed. For LCU-level bit allocation, the generated header bits are also taken into account.
After getting the bit budgets, the bits per pixel (bpp) value for a given target bit rate is obtained. This value is then used for computing the slope of the rate-distortion curve (λ) by using (1), where α and β are two model parameters. Once the λ value is obtained, the QP can be computed by using (2).
After encoding a frame or a CTU, the corresponding parameter values are updated. To update the parameters, bpp_real, computed from the actual generated bits, and the actual λ value, λ_real, are used. New parameters can be obtained by using (3) to (5) as follows.
where λ_comp is the computed λ, α_old is the previous α value and β_old is the previous β value. δ_α and δ_β are two constant values. The detailed implementation of the lambda domain rate control can be found in References [20,21].
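Equations (1) to (5) are referenced above but are not reproduced in this text. For orientation, the lambda domain rate control of References [20,21] is commonly written in the following form. The numerical constants in (2) are those commonly associated with the HM reference software; they are stated here as an assumption and should be checked against the cited references.

```latex
\begin{align}
\lambda &= \alpha \cdot bpp^{\beta} \tag{1}\\
QP &= 4.2005 \,\ln \lambda + 13.7122 \tag{2}\\
\lambda_{comp} &= \alpha_{old} \cdot bpp_{real}^{\beta_{old}} \tag{3}\\
\alpha_{new} &= \alpha_{old} + \delta_{\alpha}\,\bigl(\ln \lambda_{real} - \ln \lambda_{comp}\bigr)\,\alpha_{old} \tag{4}\\
\beta_{new} &= \beta_{old} + \delta_{\beta}\,\bigl(\ln \lambda_{real} - \ln \lambda_{comp}\bigr)\,\ln bpp_{real} \tag{5}
\end{align}
```

In this form, (4) and (5) move α and β in the direction that reduces the gap between the λ actually used and the λ that the model would have predicted from the real bit consumption.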

Proposed Method
The overall block diagram of the proposed feedback-based error resilient framework for the HEVC video transmission system is shown in Figure 2. Before encoding a frame, the moving region (MR) extraction procedure generates the MR-map. Then, the hierarchical-P RPS algorithm manages the reference picture set according to the feedback information. The MR-map is used in the CTU-level coding mode decision and in rate control. The CTU-level coding mode decision can further improve the quality of the video by selecting intra mode for some important blocks. Rate control is modified to exploit the MR-map and keep the generated bits at the target bit rate while maintaining good quality for blocks in the MR.

Proposed Hierarchical-P RPS Algorithm
The hierarchical-P coding structure can be obtained by using the "low-delay-P" configuration. Under the common test conditions for HEVC [23], this configuration defines four pictures in a GOP and four active reference pictures (ref_pics_active). With the "low-delay-P" configuration, generally, three core frames and the immediately preceding encoded frame serve as reference pictures for the current frame. The immediately preceding frame can be either a core or a common frame. For the first few frames of the sequence, more than one common frame is used as a reference since the number of encoded core frames is less than three. The design of the proposed hierarchical-P RPS algorithm is based on this configuration of HEVC. Regarding the feedback channel, the ACK-based system was chosen because it provides better error robustness than the NACK-based system, especially when the feedback channel has some errors [3,8].
The proposed hierarchical-P RPS algorithm has two steps: error handling and DPB updating. If an ACK signal is not detected, the index of the error frame is determined first. The picture order count (POC) is used as the frame index. If the POC of a reference picture in the DPB is the same as or higher than that of the error frame, this reference picture is considered unreliable. If the POC of the reference picture is the same as the POC of the error frame, this picture contains some errors and any prediction from it will cause error propagation. If the POC of the reference picture is higher than that of the error frame, that picture was encoded using the error frame as a reference, and error propagation has already happened in it. Thus, such frames are removed from the DPB. This step stops temporal dependencies between the error frame and future encoded frames.
The next step includes reference picture selection and DPB updating. Once the error-related reference frames are removed from the buffer, the error control process is completed. However, the proposed algorithm always tries to keep the number of reference pictures in the reference picture list as close as possible to the ref_pics_active value from the configuration file. Thus, the encoder searches the DPB for new possible reference pictures for the current frame. Both core and common frames can be used as new reference pictures, but they must be available in the DPBs of both the encoder and the decoder. If no error is detected before encoding the current frame but the number of reference pictures in the DPB is less than ref_pics_active, all reference pictures in the DPB are used for encoding the current frame.
To better explain the concept, an example from Figure 3 is used. Suppose the detected POC of the error frame is 8 and the next frame to be encoded is frame 10 (POC 10). In the error-free condition, the reference pictures kept in the DPB for encoding POC 10 are POC 9, POC 8, POC 4 and POC 0. Since frame 8 has an error, it is unreliable and needs to be removed from the DPB. Frame 9 is also unreliable because frame 8 is one of its reference frames, so frame 9 also needs to be removed from the DPB. Hence, the DPB contains only frame 4 and frame 0 (the reliable frames). Note that the POCs of the reliable frames, POC 4 and POC 0, are lower than the POC of the error frame, POC 8. The algorithm simply removes from the DPB every reference frame whose POC is the same as or higher than that of the error frame. If there is a reliable previous reference frame that is still available in both the encoder and the decoder, it is added to the DPB for encoding frame 10. In this example, POC 7 is such a frame, because it serves as a reference picture for encoding POC 9 and remains in the buffer until the DPB update and RPS process for POC 10 is completed. If no error occurs, POC 7 is removed during the DPB update process for POC 10. In case of error, however, it is added to the reference picture list for encoding POC 10, so the reference pictures for encoding POC 10 become POC 7, POC 4 and POC 0. Using POC 7 as a reference picture can be beneficial because the temporal distance between POC 7 and POC 10 is shorter than that for the other reference pictures; there is a high probability that it contains more redundancy and yields better motion prediction. The detailed procedure of reference selection and DPB updating of the proposed hierarchical-P RPS is shown in Algorithm 1.
Algorithm 1. Reference selection and DPB updating of the proposed hierarchical-P RPS (only the final steps of the listing survive in this text: [...] make this picture an active reference picture; 13: process next frame).
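The two steps and the worked example above can be sketched in a few lines. This is a minimal illustration and not the paper's Algorithm 1: the function name, the argument layout and the rule of trying the temporally nearest spare picture first are assumptions made for this example.

```python
def select_reference_pictures(configured_refs, spare_pocs, error_poc,
                              ref_pics_active):
    """Pick reference POCs for the current frame.

    configured_refs: POCs that the error-free configuration would use.
    spare_pocs: other POCs still held by both encoder and decoder
        (hypothetical input; the source only says they must be available).
    error_poc: POC of the frame reported as erroneous, or None.
    """
    if error_poc is None:
        return sorted(configured_refs, reverse=True)[:ref_pics_active]

    # Step 1 (error handling): drop every picture whose POC is the same
    # as or higher than the POC of the error frame.
    reliable = [poc for poc in configured_refs if poc < error_poc]

    # Step 2 (selection and DPB update): refill the list from reliable
    # spare pictures, nearest first (assumed ordering), up to the limit.
    candidates = sorted(
        (poc for poc in spare_pocs
         if poc < error_poc and poc not in reliable),
        reverse=True,
    )
    for poc in candidates:
        if len(reliable) >= ref_pics_active:
            break
        reliable.append(poc)
    return sorted(reliable, reverse=True)


# Example from the text: error at POC 8, encoding POC 10.
refs = select_reference_pictures([9, 8, 4, 0], [7], error_poc=8,
                                 ref_pics_active=4)
```

For the example in the text, the result contains POC 7, POC 4 and POC 0, one fewer than ref_pics_active because no further reliable picture is available.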

Proposed CTU-Level Coding Mode Decision
The hierarchical-P RPS algorithm responds quickly to errors because it modifies the DPB based on the error status. However, error propagation can still happen in some situations, such as a long back channel delay or an error in the control picture.
To reduce this effect and to improve the quality, an ROI-based intra mode selection method is proposed. MR information, which serves as the ROI in this research, is extracted by using the frame differencing method discussed in References [24,25]. If an error is detected, the coding mode for CTUs in the MR is set to intra mode. However, using a large number of intra-coded blocks can cause bit fluctuation and can affect the quality of the reconstructed video. The decision-making parameter for intra mode selection is the temporal distance between the current frame and the last intra refresh frame (lidst). Since the GOP size of the hierarchical coding structure with three temporal layers is four, the minimum lidst value for the coding mode (CM) decision is set to a 4-frame distance in this work. The idea is that each GOP contains at most one intra refresh frame under the worst-case condition. Hence, intra mode is selected only when lidst is a 4-frame distance or higher and an error message has been received. The CM of each CTU can be determined by using (6).
where p denotes the p-th LCU in a frame.
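Equation (6) is not reproduced in this text, but the rule it expresses is stated in words above. A minimal sketch of that rule follows; the function names and the "rdo" label (meaning the block is left to the encoder's normal mode decision) are assumptions for this example.

```python
def ctu_coding_mode(in_moving_region, error_reported, lidst, min_lidst=4):
    """Coding mode for one CTU/LCU under the rule described in the text.

    Returns "intra" only when an error has been reported, the block lies
    in the moving region, and at least min_lidst frames have passed since
    the last intra refresh frame. Otherwise returns "rdo", a hypothetical
    label for the encoder's normal mode decision.
    """
    if error_reported and in_moving_region and lidst >= min_lidst:
        return "intra"
    return "rdo"


def frame_coding_modes(mr_map, error_reported, lidst):
    """Apply the rule to every LCU p of a frame, given a flat MR-map."""
    return [ctu_coding_mode(in_mr, error_reported, lidst) for in_mr in mr_map]
```

The lidst threshold is what limits intra refresh to at most once per GOP, which in turn bounds the bit fluctuation mentioned above.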

Proposed Rate Control Scheme
Under packet loss conditions, the quality of the MR is protected by using intra coding mode because the MR receives more attention than the non-MR. Since intra mode produces a large number of output bits, the rate control algorithm must adjust its parameters in order to meet the target bit rate. The lambda domain rate control proposed in References [20,21] considers both frame-level and slice-level intra mode selection. But the blocks in the MR normally extend across multiple slices in a frame. Hence, a slice may contain a few blocks in the MR that must be encoded with intra mode while the others still use inter mode. For such conditions, block-level intra selection is needed, and this paper extends the lambda domain rate control to meet this requirement.
The bit budget of the current frame, T_CurrPic, can be considered as the combination of the bit budget of the MR, T_MR, and the bit budget of the non-MR, T_NMR, as shown in (7).
From the frame-level bit allocation process, T_CurrPic is obtained. Since the total number of pixels in the whole frame, N_Pic, and the number of pixels in the MR, N_MR, are known from the MR-map information, T_MR can be computed by using (8).
Both T_MR and T_NMR are then calculated. This T_MR is for inter mode, and a new T_MR is required for intra coding. The bit refinement process is carried out by using the computed T_MR as an input. Then the bpp value of each region is obtained.
To get the QP of the MR, the α and β values assigned to the intra frame are used. On the other hand, the α and β values assigned to the current frame are still used for computing the QP of the non-MR. After encoding the entire frame, the parameter updating process is carried out for each region separately. Based on the actual generated bits of the MR and the non-MR, bpp_real_MR and bpp_real_NMR are computed. λ_comp_MR is then computed by using bpp_real_MR, α_old_MR and β_old_MR, as in (9). New parameter values for the MR are then computed by (10) and (11). These values are used for updating the α and β values of the intra frame. A similar procedure is used for the non-MR, and the new parameter values are used for updating the α and β values of the inter frame that has the same hierarchical level as the current frame.
If the MR is encoded with inter mode, a single bpp is computed by using T_CurrPic. The QP is calculated by using the α and β values assigned to the current frame. After encoding, the normal parameter updating process is applied by using (3) to (5).
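Equations (7) to (11) are not reproduced in this text, so the following is only a rough sketch of the region-based flow under stated assumptions: the frame budget is split in proportion to pixel counts, the bit refinement step for intra coding is omitted because it is not specified here, λ and QP follow the usual lambda domain forms, and the intra (α, β) pair and the δ defaults are placeholders rather than values from the paper.

```python
import math


def lambda_from_bpp(bpp, alpha, beta):
    """Hyperbolic model: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def qp_from_lambda(lam):
    """QP from lambda; constants are those commonly used with the HM
    software (assumption), clamped to the 8-bit QP range 0..51."""
    qp = int(round(4.2005 * math.log(lam) + 13.7122))
    return max(0, min(51, qp))


def split_bit_budget(t_curr_pic, n_pic, n_mr):
    """Assumed proportional split of the frame budget into T_MR and T_NMR."""
    t_mr = t_curr_pic * n_mr / n_pic
    return t_mr, t_curr_pic - t_mr


def region_qp(t_region, n_region, alpha, beta):
    """QP of one region from its bit budget and pixel count."""
    return qp_from_lambda(lambda_from_bpp(t_region / n_region, alpha, beta))


def update_params(alpha_old, beta_old, bpp_real, lambda_real,
                  delta_alpha=0.1, delta_beta=0.05):
    """Per-region parameter update; delta defaults are assumptions."""
    lambda_comp = lambda_from_bpp(bpp_real, alpha_old, beta_old)
    log_err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha_old + delta_alpha * log_err * alpha_old
    beta_new = beta_old + delta_beta * log_err * math.log(bpp_real)
    return alpha_new, beta_new


# Illustrative numbers only: a WVGA frame with a quarter of it in the MR.
n_pic = 832 * 480
n_mr = n_pic // 4
t_mr, t_nmr = split_bit_budget(60000, n_pic, n_mr)
qp_nmr = region_qp(t_nmr, n_pic - n_mr, alpha=3.2003, beta=-1.367)
qp_mr = region_qp(t_mr, n_mr, alpha=4.0, beta=-1.2)  # placeholder intra pair
```

Keeping a separate (α, β) pair per region is what lets the MR and the non-MR converge to different operating points even when they start from the same bpp.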

Summary of the Proposed Error Resilient Algorithm
Algorithm 2 summarizes the proposed algorithm. Based on feedback information and the MR-map, the proposed algorithm combines the hierarchical-P RPS and the CTU-level coding mode decision to mitigate network errors. The region-based rate control also enhances the quality of the important region while maintaining the desired target bit rate.

Experimental Results and Discussion
The proposed algorithm is primarily designed for systems in which feedback information is available; however, it can also be applied in non-feedback systems with the help of a packet loss estimation model or distortion model at the encoder. Thus, in the experiments, the proposed algorithm is tested under both conditions. All the sequences used in the experiments are standard HEVC test sequences. Three different resolutions are included in the experiments: WVGA (832 × 480), 720p (1280 × 720) and 1080p (1920 × 1080).
Since the target application is delay-sensitive video transmission over wireless networks such as Wi-Fi or mobile networks, all experiments were carried out by using the "low-delay-P" configuration of HEVC. The reason is that the "random-access" and the "low-delay-main" configurations require more computation time than the "low-delay-P" configuration. Each row of LCUs forms a slice in the experiments. To do so, slice mode 1, which defines slices by the number of blocks per slice, is used. For the 64 × 64 LCU size, the slice argument value is set to 13 for WVGA sequences, 20 for 720p sequences and 30 for 1080p sequences. If a smaller LCU size is chosen, the number of slices per frame increases. This increases the number of packets and also the amount of overhead data. However, the effect of a single packet loss is less severe than in the 64 × 64 LCU case because each packet contains less data. In the experiments, the GOP size is 4 frames. Changing the GOP size affects the decoder refresh period, which should be an integer multiple of the GOP size so that the decoder refresh frame is at the beginning of a GOP. Five different target bit rates for each video resolution, as recommended in Reference [26], are tested in the experiments.
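The slice argument values quoted above follow directly from the number of 64 × 64 LCUs that fit in one picture row. A short check, with a hypothetical helper name:

```python
def lcus_per_dimension(length, lcu_size=64):
    """Number of LCUs needed to cover `length` pixels (ceiling division)."""
    return -(-length // lcu_size)


# Width and height in pixels for each resolution used in the experiments.
resolutions = {"WVGA": (832, 480), "720p": (1280, 720), "1080p": (1920, 1080)}

# One slice per LCU row: the slice argument is the LCU count per row,
# and the number of slices per frame is the number of LCU rows.
slice_layout = {
    name: (lcus_per_dimension(width), lcus_per_dimension(height))
    for name, (width, height) in resolutions.items()
}
```

The first element of each pair reproduces the slice argument (13, 20 and 30), and the second gives the resulting number of slices per frame under the one-row-per-slice arrangement.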
For packet loss simulation, the loss simulation software from [27] is used. Three different packet loss rates, 3%, 5% and 10%, are available with the software. A co-located block copying technique was added to the decoder for error concealment. For video quality measurement, three different metrics are used, namely Peak Signal to Noise Ratio of the luminance component (Y-PSNR), Structural Similarity Index (SSIM) [28] and Feature Similarity Index (FSIM) [29].
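As a reminder of how the first of these metrics is defined, Y-PSNR for a single frame can be computed as below. This is a generic sketch assuming 8-bit luma samples stored as equally sized lists of rows; it is not the measurement tool used in the experiments.

```python
import math


def y_psnr(reference, decoded, max_value=255):
    """PSNR in dB between two luma planes given as lists of rows.

    Assumes both planes have the same dimensions and 8-bit samples
    unless max_value is overridden.
    """
    squared_error = 0
    n_samples = 0
    for ref_row, dec_row in zip(reference, decoded):
        for ref_px, dec_px in zip(ref_row, dec_row):
            squared_error += (ref_px - dec_px) ** 2
            n_samples += 1
    mse = squared_error / n_samples
    if mse == 0:
        return math.inf  # identical planes
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Sequence-level figures are typically obtained by averaging the per-frame values, although the averaging convention used in the experiments is not stated here.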

Experimental Results for Feedback Available Case
In the experiments where feedback is available, the performance of the proposed algorithm is compared with that of the two configurations of the hierarchical-P RPS algorithm, HP_RPS_I and HP_RPS_II.In the HP_RPS_I configuration, only one intra-frame is included at the beginning of the sequence but in the HP_RPS_II configuration, an intra-coded frame was regularly inserted at every second.The strength and weakness of each HP_RPS configuration are also explored.Both hierarchical-P RPS and the proposed algorithm are implemented by using HM15.0.Twelve test sequences are involved in this experiment: three WVGA sequences, six 720p sequences and three 1080p sequences.Firstly, a 3-frame feedback delay under 10% PER case is explored.
In the HP_RPS_I configuration, the temporal distance between the reference frame and the current frame can be very long for some error conditions. This long temporal distance can degrade encoder performance. In contrast, the HP_RPS_II configuration can reduce this temporal distance by using the last intra-coded frame as a reference frame. However, adding too many intra frames can also reduce encoder performance because of bit fluctuation. According to the experimental results, the HP_RPS_I configuration performed better than HP_RPS_II for most test cases. This is because HP_RPS_II introduces too many intra-coded blocks, which consume a large portion of the available bit budget, so large QP values must be used to meet the bit rate constraint. Because of its better performance, HP_RPS_I was selected for comparison with the proposed algorithm.
Generally, for all test sequences, the proposed algorithm achieved better PSNR, SSIM and FSIM values than HP_RPS_I and HP_RPS_II. The detailed results of all test sequences for five different target bit rates are shown in Tables 1 and 2. The FSIM metric is computed from the luminance values of the input videos, whereas the FSIMc metric takes into account both luminance and chrominance values.
For 1080p sequences, the average PSNR improvement of the proposed algorithm over the HP_RPS_I configuration of hierarchical-P RPS is 1.56 dB. The minimum improvement is 0.43 dB and the maximum is 3.24 dB. For 720p sequences, the proposed algorithm achieves an average PSNR improvement of about 2.32 dB over HP_RPS_I. For WVGA sequences, the average PSNR improvement of the proposed algorithm is 1.01 dB. Rate-distortion curves of selected sequences are shown in Figure 4.
In terms of average SSIM improvement, the proposed algorithm achieves 5% improvement for 1080p sequences, 2% for 720p sequences and 4% for WVGA sequences. Rate-SSIM curves of selected sequences are shown in Figure 5.
From the viewpoint of the FSIM metric, the average improvement of the proposed algorithm over HP_RPS_I is 1.9% for 1080p sequences. The maximum and minimum values are 2.84% and 0.29%, respectively. For 720p sequences, the maximum improvement is 3.09% whereas the minimum is 0.09%. The average value for 720p resolution is 0.79% and that for WVGA resolution is 0.8%. The maximum and minimum improvement values for WVGA sequences are 2.66% and −0.02%, respectively. Rate-FSIM curves are shown in Figure 6.
For the BasketballDrive sequence, which contains several running players and camera movement, the proposed algorithm achieves up to 3.24 dB PSNR improvement and 9% SSIM improvement over HP_RPS_I. In terms of FSIM, the proposed algorithm scores considerably higher than HP_RPS_I at low target bit rates.
For the BQTerrace sequence, which contains several small moving objects, significant changes in background and camera movement, the PSNR improvement of the proposed algorithm is 2.35 dB, but the SSIM improvement varies from 1% to 9% depending on the target bit rate. Up to 2.84% FSIM improvement is achieved for this sequence. It should be noted that co-located block copying for error concealment greatly affects the quality of this kind of sequence. For the Cactus sequence, the average PSNR and SSIM improvements are 1.55 dB and 3.5%, respectively. This sequence has a still background and foreground objects moving at moderate speed. Noticeable improvements in PSNR, SSIM and FSIM can be seen for all 1080p sequences, especially at the lower bit rates.
For the Johnny, Vidyo1, Vidyo3 and Vidyo4 sequences, significant PSNR improvement is achieved for all target bit rates. The major contents of these videos are located around the middle region of the frame. The MR region extraction method used in the proposed algorithm assigns more weight to the middle region. Thus, the major contents of these sequences fall well within the important region and, as a result, significant improvements are obtained. In terms of SSIM, up to 6% improvement is obtained for these sequences. The maximum FSIM improvement for these sequences is 3.09% and the minimum is 0.21%.
For the KristenAndSara and FourPeople sequences, the PSNR improvement is more noticeable at low bit rates, but the SSIM improvement is only about 1%. According to the SSIM values, the performance of the proposed algorithm and that of HP_RPS_I are about the same for these two sequences, since their major contents are located not only within the middle region but also in some areas close to the boundary region, and the motion of these contents is very low. Thus, the MR region extraction method cannot extract some important regions of those sequences, and the encoded quality of those regions with the proposed algorithm is the same as or lower than with HP_RPS_I. Therefore, the average improvement of the proposed algorithm over HP_RPS_I is small. The average FSIM improvement is about 0.37%. As the bit rate increases, the performance of HP_RPS_I approaches that of the proposed system.
For the PartyScene and BQMall sequences, more than 1 dB improvement in PSNR is achieved for all bit rates except the highest one. Up to 5% SSIM improvement for the BQMall sequence and 9% for the PartyScene sequence are obtained. In terms of FSIM, the average improvement for BQMall is about 1.56% whereas that for PartyScene is 0.86%. These sequences include persons moving at moderate speed. It can be seen that the proposed algorithm performs quite well for this sort of sequence.
For the BasketballDrillText sequence, which includes several running players and camera movements, both the average PSNR improvement and the average SSIM improvement are small. Less than 1 dB improvement in PSNR and about 1% SSIM improvement are achieved for all bit rates. The average FSIM improvement is only 0.15% for this sequence. The MR region extraction method uses the same block size as the LCU for its computation. Since the BasketballDrillText sequence contains a lot of motion, the MR region is quite large, and the proposed algorithm inserts more intra-coded blocks than it does for other sequences. Consequently, inter-coded blocks must use the highest QP value to meet the target bit rate, hence the overall improvement is small for that sequence. Selected frames of the BasketballDrive sequence and the Vidyo1 sequence are shown in Figures 7 and 8, respectively.

Impact of Feedback Delay
The impact of feedback delay is also explored by conducting additional experiments. For this experiment, six test sequences are used: two 1080p sequences, two 720p sequences and two WVGA sequences. All sequences are encoded at three target bit rates using the same configuration as in the first experiment. The feedback delays used in this experiment are 3, 4 and 5 frames. The detailed results are presented in Table 3. According to the results, the longer the delay, the greater the quality degradation. When the delay is long, it is possible that all available reference pictures in the buffer were encoded using prediction from the erroneous frame. In that case, the current frame must be coded in intra mode because there is no reliable reference picture in the buffer. Frequent insertion of I-frames therefore occurs in the long feedback delay case. These intra frames cause bit fluctuation and a drop in quality. In the experiment, with a 5-frame delay the proposed algorithm produces less than 1 dB improvement in average PSNR for the BasketballDrive, BQTerrace, BQMall and PartyScene sequences. For the Vidyo4 and FourPeople sequences, the performance of the proposed algorithm is lower than that of HP_RPS_I, especially at low bit rates. This is the effect of too many intra-coded frames, as discussed above.
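The interaction between feedback delay and the reference picture buffer can be illustrated with a small Python sketch. It is a deliberately simplified worst-case model, not the hierarchical-P structure used in the experiments: every frame is assumed to be predicted from the immediately preceding frame, so all frames coded between the loss and the arrival of the feedback are treated as unreliable. The buffer size of 4 pictures and all function names are hypothetical.

```python
def tainted_frames(lost_frame, current_frame):
    """Frames that may carry propagated error: the lost frame and every frame
    coded after it but before the feedback arrived (simple P-chain assumption)."""
    return set(range(lost_frame, current_frame))


def select_reference(current_frame, dpb_size, tainted):
    """Return the most recent reliable frame still held in the buffer,
    or None when no reliable reference exists and intra coding is needed."""
    oldest = current_frame - dpb_size
    for frame in range(current_frame - 1, oldest - 1, -1):
        if frame >= 0 and frame not in tainted:
            return frame
    return None


def decision_after_loss(lost_frame, delay, dpb_size):
    """Reference chosen for the first frame coded after the feedback arrives."""
    current = lost_frame + delay
    return select_reference(current, dpb_size, tainted_frames(lost_frame, current))


# Loss at frame 10, hypothetical buffer of 4 pictures, delays of 3 to 5 frames.
for delay in (3, 4, 5):
    ref = decision_after_loss(lost_frame=10, delay=delay, dpb_size=4)
    print(delay, "intra" if ref is None else f"reference frame {ref}")
```

In this toy setting a 3-frame delay still leaves frame 9 available as a clean reference, while delays of 4 and 5 frames force an intra frame. The actual threshold depends on the buffer size and prediction structure, which are not modelled here.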

Experimental Results for No Feedback Case
In the experiments where feedback is not available, the checkerboard pattern RPS (chkRPS) method is used as the reference method. Since the proposed method requires packet loss information, a simple packet loss estimation model, the Gilbert-Elliott model, is used in this experiment. Only the 10% PER case is considered in this work. The estimated packet error information used in the encoding process has the same packet error rate as the packet error trace file of the network abstraction layer (NAL) unit loss software, but the packet error locations in the two files are not identical. Nine test sequences (i.e., three WVGA sequences, three 720p sequences and three 1080p sequences) are encoded using both the proposed method and the chkRPS method. Each sequence is encoded at five different target bit rates. A regular intra frame is added every second for both methods. PSNR and FSIM metrics are used for performance evaluation.
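The idea of two loss traces that share a packet error rate but differ in error locations can be sketched with a two-state Gilbert-Elliott chain in Python. This is a minimal sketch under assumptions: it uses the simplified form in which every packet is lost in the bad state and none in the good state, and the bad-to-good transition probability of 0.5 is a hypothetical choice, since the model parameters are not given in the text.

```python
import random


def ge_good_to_bad(target_loss, p_bad_to_good):
    """Solve p / (p + r) = target_loss for the good-to-bad probability p,
    where r is the bad-to-good probability (stationary loss rate)."""
    return target_loss * p_bad_to_good / (1.0 - target_loss)


def ge_trace(num_packets, p_good_to_bad, p_bad_to_good, seed):
    """Generate a loss trace (True = packet lost) from a two-state chain.

    Simplified Gilbert-Elliott model: all packets are lost in the bad state
    and none in the good state. The chain starts in the good state."""
    rng = random.Random(seed)
    bad = False
    trace = []
    for _ in range(num_packets):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        trace.append(bad)
    return trace


R = 0.5                       # hypothetical bad-to-good probability
P = ge_good_to_bad(0.10, R)   # good-to-bad probability for a 10% loss rate

encoder_estimate = ge_trace(200_000, P, R, seed=1)   # used while encoding
channel_trace = ge_trace(200_000, P, R, seed=2)      # applied in transmission

for name, trace in (("estimate", encoder_estimate), ("channel", channel_trace)):
    print(name, round(sum(trace) / len(trace), 3))
```

Both traces converge to a loss rate near 10%, yet the positions of the lost packets differ, so the encoder protects regions that are not necessarily the ones hit in transmission.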
According to the experimental results, the proposed method performs better than chkRPS for six of the nine test sequences. For the remaining three sequences, chkRPS is better than the proposed algorithm. These three sequences are BasketballDrillText, BasketballDrive and Cactus.
For the BasketballDrillText sequence, chkRPS obtains an average PSNR improvement of 0.45 dB over the proposed method. In terms of FSIM, as shown in Figure 9f, the proposed method is still better than chkRPS by about 0.25% at the two lowest target bit rates. Similarly, for the BasketballDrive sequence, the FSIM of the proposed algorithm is about 0.9% higher than that of chkRPS at the two lowest target bit rates, as presented in Figure 9e. It is noted that the proposed algorithm is not suitable for sequences that contain fast moving objects if the estimated error location is not accurate enough. For the Cactus sequence, chkRPS achieves about 0.6 dB improvement over the proposed algorithm. This sequence has a still background but rotating foreground objects. As with the above two sequences, if the contents of the video scene move at a considerably fast pace and the error location estimation is not accurate, the performance of the proposed algorithm is not good. Again, in the Cactus sequence, the FSIM value of the proposed algorithm at the lowest target bit rate is higher than that of chkRPS. Although the results for these three sequences are not good, they still demonstrate that the proposed algorithm is more suitable for bit rate constrained applications.
For the three 720p sequences, the proposed method outperforms the chkRPS method. The average PSNR improvement is about 0.62 dB. With the FSIM metric, the average improvement over chkRPS is 0.47%. The maximum value is 1.05% and the minimum is 0.04%. The lower the target bit rate, the larger the improvement. The contents of these three videos are located mainly in the central region and involve only facial expressions and hand gestures. The same background appears throughout the scene. This kind of sequence is the most suitable for the proposed algorithm.
For the BQMall and PartyScene sequences, the performance of the proposed system is significantly better than that of chkRPS, as shown in Figure 9a,b. For these two sequences, the minimum PSNR improvement is 0.28 dB and the maximum is 1.32 dB. In terms of FSIM, the minimum improvement is 0.57% and the maximum is 2.3%. The BQMall sequence includes people walking at moderate speed and camera movement, whereas the PartyScene sequence includes a still background with moving foreground objects. For the BQTerrace sequence, which contains slow camera movement and scene changes, the proposed method is slightly better than chkRPS, especially at low target bit rates. Table 4 summarizes the FSIM results of the proposed method and the chkRPS method for all test sequences in the experiment. According to the results, the proposed method is mainly suitable for sequences that contain moderate or little motion. For sequences that contain fast moving objects, chkRPS is more appropriate than the proposed method.

Conclusions
In this paper, a feedback-based error resilient framework for HEVC video transmission is presented. Noticeable performance gains are achieved both for high resolution videos that contain significant motion and for sequences that contain foreground objects in the central region. When feedback is available, the proposed algorithm achieved 1.9% average FSIM improvement for 1080p sequences, 3.09% for 720p sequences and 0.8% for WVGA sequences. More than 1 dB average PSNR improvement is achieved for all resolutions. In terms of SSIM, 2%, 4% and 5% average improvements are obtained for WVGA, 720p and 1080p resolutions, respectively. The accuracy and size of the MR is one of the main factors in the performance of the proposed algorithm. All three quality metrics used in this paper demonstrate that the proposed algorithm outperforms the HP_RPS method for all test sequences under 10% PER. Similar results are obtained for lower packet error rates such as 5% PER and 3% PER. Feedback delay is another important factor that can affect the quality of decoded video, and its impact is also explored in the experiments. For the 3-frame and 4-frame delay cases, more than 1 dB average PSNR improvement is achieved. For the 5-frame delay, however, only about 0.3 dB improvement is obtained. A feedback delay longer than 5 frames is therefore not suitable for the proposed algorithm.
The proposed algorithm is also tested for systems with no feedback channel. Better performance is obtained for sequences that include objects with moderate or low levels of motion. For sequences with fast moving objects, the performance of the proposed algorithm is not as good as that of the reference method. This is due to the limited accuracy of the packet loss estimation model. If the encoder uses a more advanced distortion model or packet error estimation model, the performance can be improved.
The experimental results from both the feedback available case and the no feedback case show that the proposed method is mainly suitable for low bit rate applications. Further improvement in perceived quality can be achieved if the decoder uses a more advanced error concealment technique than the simple co-located block copying technique employed in this paper.

Figure 1. Acknowledgement (ACK) based NEWPRED for a round trip delay of 2 frame intervals.

Figure 2. Block diagram of proposed feedback-based error resilient video codec.

Table 1. PSNR and SSIM results comparison for 10% PER (feedback available case).