Article
Peer-Review Record

Neural Network-Based Reference Block Quality Enhancement for Motion Compensation Prediction

Appl. Sci. 2023, 13(5), 2795; https://doi.org/10.3390/app13052795
by Yanhan Chu, Hui Yuan *, Shiqi Jiang and Congrui Fu
Reviewer 1:
Reviewer 2:
Submission received: 19 December 2022 / Revised: 19 February 2023 / Accepted: 20 February 2023 / Published: 22 February 2023
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)

Round 1

Reviewer 1 Report

The manuscript documents a lot of work that has been done; however, in the end, it is not evident that the improvement obtained is significant and hence worthy of archival publication.  My specific comments are as follows:

1.  The 1.35% improvement in BD rate on the average does not appear significant in my experience.  The PSNR vs rate plots also seem to show insignificant improvements.  Some comments concerning visual tests would be helpful here, as would some SSIM values showing where the SSIM is significantly improved.

2.  Is there a theoretical expectation that this approach can yield visually significant improvements?  This approach looks like just throwing deep learning at another isolated problem.

3.  What is the additional workload required of the approach both off line and in implementation?

4.  The application of the approach is stated to be for low resolution images and the main improvement appears to be for antialiasing and noise reduction.  What happens if the proposed module is used for higher resolutions?  Can it degrade the quality?

5.  As I understand it, the PSNR values and BD rate are evaluated only for the first 32 frames of the video sequence?  If this is true, more video frames in each sequence need to be processed since motion compensation can track off without intra correction.

In summary, if the manuscript is to be archival, more visual improvement needs to be demonstrated.

Author Response

Response to Reviewer 1's Comments

Manuscript:

Title: Neural Network-based Reference Block Quality Enhancement for Motion Compensation Prediction

Authors: Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

Dear Editor,

We would like to thank you for the constructive comments and valuable suggestions. We have made all the necessary amendments in the revised manuscript. In the following, we give a point-by-point reply to the comments.

Sincerely

Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

 

Point 1: The 1.35% improvement in BD rate on the average does not appear significant in my experience.  The PSNR vs rate plots also seem to show insignificant improvements.  Some comments concerning visual tests would be helpful here, as would some SSIM values showing where the SSIM is significantly improved.

Response 1: Thank you for the suggestion.

H.266/VVC is currently the most efficient video coding standard. Although many researchers are committed to improving the performance of H.266/VVC, it is now very difficult for a single coding tool to achieve a large coding gain under the H.266/VVC common test conditions (except for neural network-based post-processing filters). In recent MPEG-JVET meetings, a newly adopted coding tool typically achieves only about 0.3-2% BD-rate reduction. Recently published papers show the same picture: the method in [1] achieves a BD-rate reduction of 1.4% compared with H.266/VVC, and the method in [2] achieves a 1.77% BD-rate reduction. In [3], 0.77%, 1.27%, and 2.25% BD-rate savings are achieved on average for lower-resolution sequences under the random access, low-delay B, and low-delay configurations, respectively; however, that method disables MHIntra, SBT, MMVD, SMVD, IMV, SubPuMvp, and TMVPMode, which degrades the performance of the standard anchor. The method in [4] achieves a BD-rate reduction of 0.58% compared to H.266/VVC under the LDP configuration. In this context, the performance of our method is reasonable.

To verify the performance in terms of perceptual quality, we used the SSIM quality metric and added a comparison of BD-SSIM in Table 1 and Fig. 9 (a sketch of how such BD metrics are computed is given after the references below).

[1] Galpin, F.; Bordes, P.; Dumas, T.; Nikitin, P.; Le Leannec, F. Neural Network Based Inter Bi-prediction Blending. In Proceedings of the 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 5-8 December 2021; pp. 1-5.

[2] Jin, D.; Lei, J.; Peng, B.; Li, W.; Ling, N.; Huang, Q. Deep Affine Motion Compensation Network for Inter Prediction in VVC. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 3923-3933.

[3] Murn, L.; Blasi, S.; Smeaton, A.F.; Mrak, M. Improved CNN-Based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding. IEEE Open J. Signal Process. 2021, 2, 453-465.

[4] Katayama, T.; Song, T.; Shimamoto, T.; Jiang, X. Reference Frame Generation Algorithm Using Dynamical Learning PredNet for VVC. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 10-12 January 2021; pp. 1-5.
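[Editorial note] For readers unfamiliar with the metric discussed above, the following is a minimal sketch of the Bjøntegaard delta-rate (BD-rate) calculation from four rate/quality points per codec, the standard way such percentages are obtained. The function and the sample numbers are illustrative only and are not taken from the manuscript; substituting SSIM for PSNR on the quality axis gives the BD-SSIM-style comparison mentioned in the response.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta-rate (%) between two rate-distortion curves.

    Fits a cubic polynomial of log10(rate) as a function of quality for
    each curve, integrates both fits over the overlapping quality range,
    and returns the average rate difference in percent (negative = saving).
    """
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Definite integral of each fitted polynomial over [lo, hi].
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100

# Illustrative numbers only (kbps, dB), e.g. four QP points per curve.
r_anchor, q_anchor = [1000, 1800, 3200, 6000], [33.0, 35.1, 37.2, 39.3]
r_test, q_test = [970, 1750, 3120, 5880], [33.1, 35.2, 37.3, 39.4]
print(f"BD-rate: {bd_rate(r_anchor, q_anchor, r_test, q_test):.2f} %")
```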

Point 2: Is there a theoretical expectation that this approach can yield visually significant improvements?  This approach looks like just throwing deep learning at another isolated problem.

Response 2: Thank you for the comment.

Yes. From our analysis of the video coding process, the reconstruction quality of the current frame depends strongly on the quality of the reference CU. If the reference CU differs too much from the current CU, or the quality of the reference CU is too poor, the reconstruction quality of the current CU decreases. A large number of experiments have shown that neural networks are effective at improving image quality. Therefore, we use a neural network to enhance the reference CU so that it becomes closer to the lossless current CU, which in turn reduces the bitstream size and improves the quality of the video.

 

Point 3: What is the additional workload required of the approach both off line and in implementation?

Response 3: Thank you for the suggestion. The network contains 738.43 K parameters. In practical applications, an NVIDIA GeForce GTX 1080 GPU is also needed. Because the proposed method needs to process all the possible CU candidates, the coding complexity is very high, as shown in Table 2. Complexity is indeed a problem for all neural network-based coding methods. In the future, we will investigate lightweight networks to reduce the coding complexity while preserving the coding efficiency.
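[Editorial note] As an aside on how such a figure is measured: a parameter count like 738.43 K is typically obtained by summing the element counts of a model's trainable tensors. The snippet below is a generic PyTorch sketch; the Sequential model is a placeholder stand-in, not the network from the manuscript.

```python
import torch.nn as nn

# Placeholder model; the actual enhancement network is defined in the
# manuscript and is not reproduced here.
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),
)

# Total trainable parameters, reported in thousands (K).
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e3:.2f} K parameters")
```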

 

Point 4: The application of the approach is stated to be for low resolution images and the main improvement appears to be for antialiasing and noise reduction.  What happens if the proposed module is used for higher resolutions?  Can it degrade the quality?

Response 4: Thank you for the comment. It is because the coding complexity is so high that we did not test the proposed method on high-resolution videos. Since the proposed method achieves better performance for large CUs, we believe the performance on high-resolution videos should be better rather than degraded. For example, we tested the BQMall sequence with a resolution of 832 × 480, which yields a BD-rate of -1.71% (a 1.71% bitrate saving) for the luma component.

 

Point 5: As I understand it, the PSNR values and BD rate are evaluated only for the first 32 frames of the video sequence?  If this is true, more video frames in each sequence need to be processed since motion compensation can track off without intra correction.

Response 5: Thank you for the comment. Yes. Because the method enhances the quality of every CU block, its complexity is very high, so we only tested the first 32 frames of each video sequence. We should also mention that a long video sequence is first divided into intra periods (usually 24 or 32 frames), each of which is further separated into several groups of pictures (GOPs). Therefore, it is reasonable to test only the first 32 frames of a video sequence. Similar settings can also be found in [5] (the first 5 frames tested), [3][6] (32 frames), [7] (33 frames), and [8] (64 frames).

 

[5] Pham, C.D.; Zhou, J. Deep Learning-Based Luma and Chroma Fractional Interpolation in Video Coding. IEEE Access 2019, 7, 112535-112543.

[6] Murn, L.; Blasi, S.; Smeaton, A.F.; O’Connor, N.E.; Mrak, M. Interpreting CNN for Low Complexity Learned Sub-Pixel Motion Compensation in Video Coding. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25-28 October 2020; pp. 798-802.

[7] Yan, N.; Liu, D.; Li, H.; Li, B.; Li, L.; Wu, F. Invertibility-Driven Interpolation Filter for Video Coding. IEEE Trans. Image Process. 2019, 28, 4912-4925.

[8] Huo, S.; Liu, D.; Wu, F.; Li, H. Convolutional Neural Network-Based Motion Compensation Refinement for Video Coding. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27-30 May 2018; pp. 1-4.

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

The authors should clarify some of the background explanations of the problem, possibly by using more illustrations. Some existing Figures need clarification, for example Fig. 5 is a bit confusing (the terms "predicted CU" and "reference CU" don't seem to be adequately explained).

The results should be put into a better context. For example, the BD-rate reduction seems to be only average (compared to the other existing methods). Therefore, this result could be put into a better context, for example by comparing computational speed, codec complexity, and compatibility issues (from introducing a new module), etc.

English needs extensive checking.

Author Response

Response to Reviewer 2's Comments

Manuscript:

Title: Neural Network-based Reference Block Quality Enhancement for Motion Compensation Prediction

Authors: Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

 

Dear Editor,

We would like to thank you for the constructive comments and valuable suggestions. We have made all the necessary amendments in the revised manuscript. In the following, we give a point-by-point reply to the comments.

Sincerely

Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

 

Point 1: The authors should clarify some of the background explanations of the problem, possibly by using more illustrations. Some existing Figures need clarification, for example Fig. 5 is a bit confusing (the terms "predicted CU" and "reference CU" don't seem to be adequately explained).

Response 1: Thank you for the suggestion.

Yes. To clarify the background of the problem, we followed your suggestion and added Fig. 1 and Fig. 2 to explain ME and MC in inter prediction. We also revised the explanation of Fig. 9 (previously Fig. 5) to make it clearer.

 

Point 2: The results should be put into a better context. For example, the BD-rate reduction seems to be only average (compared to the other existing methods). Therefore, this result could be put into a better context, for example by comparing computational speed, codec complexity, and compatibility issues (from introducing a new module), etc.

Response 2: Thank you for the comment. To verify the performance in terms of perceptual quality, we used the SSIM quality metric and added a comparison of BD-SSIM in Table 1 and Fig. 9 (a per-frame SSIM computation is sketched below).
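[Editorial note] For reference, this is a minimal sketch of the per-frame SSIM computation that underlies a BD-SSIM comparison, using scikit-image. The frames here are synthetic placeholders; in practice they would be the original and reconstructed luma planes of each decoded frame.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Synthetic stand-ins for one original/reconstructed luma plane pair
# (832x480, matching the BQMall resolution mentioned earlier).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(480, 832), dtype=np.uint8)
noise = rng.integers(-3, 4, size=original.shape)
reconstructed = np.clip(original.astype(int) + noise, 0, 255).astype(np.uint8)

# SSIM over the full frame; data_range must match the 8-bit pixel depth.
score = structural_similarity(original, reconstructed, data_range=255)
print(f"SSIM: {score:.4f}")
```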

The network contains 738.43 K parameters. In practical applications, an NVIDIA GeForce GTX 1080 GPU is also needed. Because the proposed method needs to process all the possible CU candidates, the coding complexity is very high, as shown in Table 2. Complexity is indeed a problem for all neural network-based coding methods. In the future, we will investigate lightweight networks to reduce the coding complexity while preserving the coding efficiency.

Point 3: English needs extensive checking.

Response 3: Thank you for the suggestion.

We carefully corrected the grammatical errors and improved the English expression throughout the manuscript, especially in the background introduction and the result analysis.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I appreciate the authors' honest and clear responses to my comments.  My question concerning a theoretical reason for expecting improvement was answered with a motivational response.  I don't see a theoretical justification but that is okay in today's "just process more data" world.  I saw a couple of typos that need to be corrected:

In Fig. 3, the block diagram has "motion compendation" that should be compensation.  In the Conclusions, an "e" is left off the end of "demonstrate."  I think a careful proofreading is needed to find and correct any other such errors.

In the end, the complexity is high and the performance gains are less than modest in terms of PSNR, SSIM, and visual viewing.  I am glad the authors did not try to overstate this in the Conclusions.

Author Response

Response to Reviewer 1's Comments

Manuscript: applsci-2139084

Title: Neural Network-based Reference Block Quality Enhancement for Motion Compensation Prediction

Authors: Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

 

Comment

I appreciate the authors' honest and clear responses to my comments.  My question concerning a theoretical reason for expecting improvement was answered with a motivational response.  I don't see a theoretical justification but that is okay in today's "just process more data" world.  I saw a couple of typos that need to be corrected:

In Fig. 3, the block diagram has "motion compendation" that should be compensation.  In the Conclusions, an "e" is left off the end of "demonstrate."  I think a careful proofreading is needed to find and correct any other such errors.

In the end, the complexity is high and the performance gains are less than modest in terms of PSNR, SSIM, and visual viewing.  I am glad the authors did not try to overstate this in the Conclusions.

Response:

We would like to thank you for your comments. In response to your suggestions, we corrected the spelling mistakes. We then carefully checked the manuscript and corrected the remaining errors of expression and grammar. We hope the revisions meet with your approval.

Sincerely

Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have incorporated most of this reviewer's suggestions, and the paper is -- in this reviewer's opinion -- improved. The introduction section shows the most significant improvement and better eases the reader into the further technical details. The conclusions and future work outlook have been improved, although a stronger, clearer, and more concise conclusion that puts the proposed method into better context (compared to its contemporaries) would make the paper stronger.

Author Response

Response to Reviewer2’ Comments

Manuscript: applsci-2139084

Title: Neural Network-based Reference Block Quality Enhancement for Motion Compensation Prediction

Authors: Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

 

Comment:

The authors have incorporated most of this reviewer's suggestions, and the paper is -- in this reviewer's opinion -- improved. The introduction section shows the most significant improvement and better eases the reader into the further technical details. The conclusions and future work outlook have been improved, although a stronger, clearer, and more concise conclusion that puts the proposed method into better context (compared to its contemporaries) would make the paper stronger.

 

Response:

We would like to thank you for your comments. We revised the conclusions and future work to make the paper stronger. We hope the revisions meet with your approval.

Sincerely

Yanhan Chu, Hui Yuan*, Shiqi Jiang, and Congrui Fu

Author Response File: Author Response.docx
