Article

Masked Feature Residual Coding for Neural Video Compression

1 School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
2 Samsung Seoul R&D Campus, Seoul 06765, Republic of Korea
* Author to whom correspondence should be addressed.
Sensors 2025, 25(14), 4460; https://doi.org/10.3390/s25144460
Submission received: 29 May 2025 / Revised: 9 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025

Abstract

In neural video compression, an approximation of the target frame is predicted, and a mask is subsequently applied to it. Then, the masked predicted frame is subtracted from the target frame and fed into the encoder along with the conditional information. However, this structure has two limitations. First, in the pixel domain, even if the mask is perfectly predicted, the residuals cannot be significantly reduced. Second, reconstructed features with abundant temporal context information cannot be used as references for compressing the next frame. To address these problems, we propose Conditional Masked Feature Residual (CMFR) Coding. We extract features from the target frame and the predicted features using neural networks. Then, we predict the mask and subtract the masked predicted features from the target features. Thereafter, the difference is fed into the encoder with the conditional information. Moreover, to more effectively remove conditional information from the target frame, we introduce a Scaled Feature Fusion (SFF) module. In addition, we introduce a Motion Refiner to enhance the quality of the decoded optical flow. Experimental results show that our model achieves an 11.76% bit saving over the model without the proposed methods, averaged over all HEVC test sequences, demonstrating the effectiveness of the proposed methods.
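The core coding structure described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the feature extractors, mask predictor, and entropy coding are all learned networks in CMFR, whereas here random arrays stand in for them. The sketch only shows the arithmetic of masked feature residual coding, i.e., subtracting the masked predicted features from the target features and inverting that step at the decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps (channels x height x width). In the paper these
# come from learned neural extractors; random stand-ins are used here.
target_feat = rng.standard_normal((4, 8, 8))
# Predicted features from the previous reconstructed frame (temporal prediction).
pred_feat = target_feat + 0.1 * rng.standard_normal((4, 8, 8))

# Predicted mask in [0, 1]: close to 1 where the temporal prediction is
# trusted, close to 0 where it is not. Here it is just random.
mask = rng.uniform(0.0, 1.0, size=(4, 8, 8))

# Masked feature residual: subtract only the masked part of the prediction.
residual = target_feat - mask * pred_feat

# The decoder reverses the subtraction to recover the target features
# (quantization and entropy coding are omitted in this sketch).
reconstructed = residual + mask * pred_feat

assert np.allclose(reconstructed, target_feat)
```

Because the residual is formed in the feature domain rather than the pixel domain, the reconstructed features (not only the reconstructed frame) remain available as temporally rich references for coding the next frame, which is one of the two limitations the paper targets.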
Keywords: neural video compression; deep learning; residual; mask; feature; conditional coding

Share and Cite

MDPI and ACS Style

Shin, C.; Kim, Y.; Choi, K.; Lee, S. Masked Feature Residual Coding for Neural Video Compression. Sensors 2025, 25, 4460. https://doi.org/10.3390/s25144460

AMA Style

Shin C, Kim Y, Choi K, Lee S. Masked Feature Residual Coding for Neural Video Compression. Sensors. 2025; 25(14):4460. https://doi.org/10.3390/s25144460

Chicago/Turabian Style

Shin, Chajin, Yonghwan Kim, KwangPyo Choi, and Sangyoun Lee. 2025. "Masked Feature Residual Coding for Neural Video Compression" Sensors 25, no. 14: 4460. https://doi.org/10.3390/s25144460

APA Style

Shin, C., Kim, Y., Choi, K., & Lee, S. (2025). Masked Feature Residual Coding for Neural Video Compression. Sensors, 25(14), 4460. https://doi.org/10.3390/s25144460

