Multi-Scale Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec: Towards Enhanced Visual Quality and Overall Coding Performance
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Shallow Feature Extraction
3.2. Deep Feature Extraction
3.2.1. Channel-Wise Self-Attention
3.2.2. Multi-Scale Block-Wise Spatial Self-Attention
3.2.3. Refined Patch-Wise Self-Attention
3.3. Image Reconstruction
3.4. Training and Testing Configuration
3.4.1. Training Dataset
3.4.2. Training Strategy
3.4.3. Experimental Setup
4. Results
4.1. Evaluation Process
4.1.1. Testing Dataset
4.1.2. Testing Strategy
4.1.3. Objective Quality Evaluation: PSNR-Based Analysis
4.1.4. Objective Quality Evaluation: SSIM-Based Perceptual Analysis
4.1.5. Qualitative Visual Quality Evaluation
4.2. Computational Complexity Analysis
5. Discussion
5.1. Ablation Studies
5.1.1. Reconsidering Patch-Wise Self-Attention
5.1.2. Effect of Patch-Wise Self-Attention on Structural Consistency
5.1.3. Attempting to Replace Patch-Wise Attention with Inter-Block Self-Attention
5.1.4. Visual Artifact Removal via Refined PWSA Retraining
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Uhrina, M.; Sevcik, L.; Bienik, J.; Smatanova, L. Performance Comparison of VVC, AV1, HEVC, and AVC for High Resolutions. Electronics 2024, 13, 953.
2. Chen, Y.; Mukherjee, D.; Han, J.; Grange, A.; Xu, Y.; Liu, Z.; Parker, S.; Chen, C.; Su, H.; Joshi, U.; et al. An Overview of Core Coding Tools in the AV1 Video Codec. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 41–45.
3. Han, J.; Li, B.; Mukherjee, D.; Chiang, C.; Grange, A.; Chen, C.; Su, H.; Parker, S.; Deng, S.; Joshi, U.; et al. A Technical Overview of AV1. arXiv 2020, arXiv:2008.06091.
4. Gwun, W.; Choi, K.; Park, G.H. Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec. Mathematics 2024, 12, 2874.
5. Zhao, X.; Lei, Z.; Norkin, A.; Daede, T.; Tourapis, A. AOM Common Test Conditions v2.0. Alliance for Open Media, Codec Working Group. 2021. Available online: https://aomedia.org/docs/CWG-B075o_AV2_CTC_v2.pdf (accessed on 29 April 2025).
6. Wang, Y.; Zhu, H.; Li, Y.; Chen, Z.; Liu, S. Dense Residual Convolutional Neural Network based In-Loop Filter for HEVC. In Proceedings of the 2018 IEEE Visual Communications and Image Processing (VCIP), Taichung, Taiwan, 9–12 December 2018; pp. 1–4.
7. Chen, S.; Chen, Z.; Wang, Y.; Liu, S. In-Loop Filter with Dense Residual Convolutional Neural Network for VVC. In Proceedings of the 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020; pp. 149–152.
8. Zhao, Y.; Lin, K.; Wang, S.; Ma, S. Joint Luma and Chroma Multi-Scale CNN In-loop Filter for Versatile Video Coding. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 27 May–1 June 2022; pp. 3205–3208.
9. Kathariya, B.; Li, Z.; Van der Auwera, G. Joint Pixel and Frequency Feature Learning and Fusion via Channel-Wise Transformer for High-Efficiency Learned In-Loop Filter in VVC. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 4070–4082.
10. Guan, Z.; Xing, Q.; Xu, M.; Yang, R.; Liu, T.; Wang, Z. MFQE 2.0: A New Approach for Multi-Frame Quality Enhancement on Compressed Video. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 949–962.
11. Lin, W.; He, X.; Han, X.; Liu, D.; See, J.; Zou, J.; Xiong, H.; Wu, F. Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC. IEEE Trans. Multimed. 2020, 22, 2749–2763.
12. Zhang, F.; Feng, C.; Bull, D.R. Enhancing VVC through CNN-Based Post-Processing. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
13. Ma, D.; Zhang, F.; Bull, D.R. MFRNet: A New CNN Architecture for Post-Processing and In-Loop Filtering. IEEE J. Sel. Top. Signal Process. 2021, 15, 956–969.
14. Liu, T.; Cui, W.; Hui, C.; Jiang, F.; Gao, Y.; Xie, S.; Wu, P. AHG11: Post-Process Filter Based on Fusion of CNN and Transformer. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 26th Meeting, Teleconference, 20–29 April 2022. JVET-Z0101-v2.
15. Sullivan, G.J.; Ohm, J.-R.; Han, W.-J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
16. Bross, B.; Wang, Y.-K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.-R. Overview of the Versatile Video Coding (VVC) Standard and Its Applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
17. Yu, L.; Chang, W.; Wu, S.; Gabbouj, M. End-to-End Transformer for Compressed Video Quality Enhancement. arXiv 2022, arXiv:2210.13827.
18. Li, H.; He, X.; Xiong, S.; Zhang, Y.; Chen, B. A Compressed Video Quality Enhancement Algorithm Based on CNN and Transformer Hybrid Network. J. Supercomput. 2025, 81, 144.
19. Zheng, M.; Xing, Q.; Qiao, M.; Xu, M.; Jiang, L.; Liu, H.; Chen, Y. Progressive Training of a Two-Stage Framework for Video Restoration. arXiv 2022, arXiv:2204.09924.
20. Qiu, Z.; Yang, H.; Fu, J.; Fu, D. Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution. arXiv 2022, arXiv:2208.03012.
21. Ma, Z.; Wang, Y.; Tohidypour, H.R.; Nasiopoulos, P.; Leung, V.C.M. A Swin Transformer Based Restoration Scheme for VVC Compressed Images. In Proceedings of the International Conference on Advances in Multimedia (MMEDIA), Nice, France, 26–30 June 2023; IARIA: Nice, France, 2023; pp. 1–5, ISBN 978-1-68558-072-8.
22. Santamaria, M.; Yang, R.; Cricri, F.; Zhang, H.; Lainema, J.; Youvalari, R.G.; Tavakoli, H.R.; Hannuksela, M.M. Overfitting Multiplier Parameters for Content-Adaptive Post-Filtering in Video Coding. In Proceedings of the 10th European Workshop on Visual Information Processing (EUVIP), Lisbon, Portugal, 11–14 September 2022; pp. 1–6.
23. Das, T.; Choi, K.; Choi, J. High Quality Video Frames From VVC: A Deep Neural Network Approach. IEEE Access 2023, 11, 54254–54264.
24. Zhang, F.; Ma, D.; Feng, C.; Bull, D.R. Video Compression with CNN-Based Postprocessing. IEEE Multimed. 2021, 28, 74–83.
25. Ramsook, D.; Kokaram, A. A Neural Enhancement Post-Processor with a Dynamic AV1 Encoder Configuration Strategy for CLIC 2024. arXiv 2024, arXiv:2401.18021.
26. Jiang, Y.; Nawała, J.; Feng, C.; Zhang, F.; Zhu, X.; Sole, J.; Bull, D. RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content. arXiv 2024, arXiv:2411.13362.
27. Dehaghi, A.M.; Razavi, R.; Moshirpour, M. Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression. arXiv 2024, arXiv:2412.08912.
28. Wang, L.; Zhang, R.; Zhang, Y.; Dong, C.; Loy, C.C. NTIRE 2022 Challenge on Quality Enhancement of Compressed Video: Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, New Orleans, LA, USA, 19–20 June 2022; pp. 2531–2543.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034.
30. Xiao, T.; Singh, M.; Mintun, E.; Darrell, T.; Dollár, P.; Girshick, R. Early Convolutions Help Transformers See Better. arXiv 2021, arXiv:2106.14881.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
32. Li, Y.; Rusanovskyy, D.; Karczewicz, M. EE1-1.5: Report on implementation of HOP In-loop filter with Transformer blocks. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 33rd Meeting, Online, 17–26 January 2024. Document JVET-AG0162_v1.
33. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
34. Ma, D.; Zhang, F.; Bull, D.R. BVI-DVC: A Training Database for Deep Video Compression. arXiv 2020, arXiv:2003.13552.
35. Nawała, J.; Jiang, Y.; Zhang, F.; Zhu, X.; Sole, J.; Bull, D. BVI-AOM: A New Training Dataset for Deep Video Compression Optimization. arXiv 2024, arXiv:2408.03265.
36. AOMediaCodec. SVT-AV1: Scalable Video Technology for AV1 Encoder. Available online: https://gitlab.com/AOMediaCodec/SVT-AV1 (accessed on 29 April 2025).
37. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2366–2369.
38. Prangnell, L. Visible Light-Based Human Visual System Conceptual Model. Computer Vision and Pattern Recognition. arXiv 2016, arXiv:1609.04830.
39. Alshina, E.; Galpin, F.; Rusanovskyy, D. AhG11/AhG14 teleconference. In Proceedings of the Joint Video Experts Team (JVET) Meeting, Online, 17–26 January 2024. Document JVET-AG0041-v1.
40. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019; Article 721, pp. 8026–8037.
41. Bjontegaard, G. Response to Call for Proposals for H.26L. ITU-T SG16 Doc. Q15-F-11. In Proceedings of the International Telecommunication Union, Sixth Meeting, Seoul, Republic of Korea, 3–6 November 1998.
42. Barman, N.; Martini, M.G.; Reznik, Y. Revisiting Bjontegaard Delta Bitrate (BD-BR) Computation for Codec Compression Efficiency Comparison. In Proceedings of the 1st Mile-High Video Conference, MHV ’22, New York, NY, USA, 1–3 March 2022; pp. 113–114.
43. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
44. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.

Reference | Year | Method Name/Acronym | Target Codec(s) | Core Technique | Key Features | Reported Gains (Contextualized) |
---|---|---|---|---|---|---|
Guan et al. [10] | 2021 | MFQE 2.0 | HEVC, H.264 | Multi-frame CNN; BiLSTM PQF detector | Uses PQFs; MC and QE subnets; efficient QE-subnet | HM16.5: −14.06% BD-BR (PSNR) |
Lin et al. [11] | 2020 | Partition-Aware CNN + ASN | HEVC | Partition-aware CNN; adaptive switching NN (ASN) | Uses HEVC CU info; ASN selects CNN per patch | HM16.0: −10.97% BD-BR (Y-PSNR) |
Zhang et al. [12] | 2020 | CNN Post-Proc. | VVC | SRResNet-like CNN | QP-specific models | VTM4.01: −3.90% BD-BR (PSNR) |
Ma et al. [13] | 2021 | MFRNet | HEVC, VVC | Advanced CNN (MFRBs) for post-processing (PP) and in-loop filtering (ILF) | MFRBs; cascading structure; PP and ILF modes | HM16.20: −21.0% BD-BR (VMAF); VTM7.0: −6.7% BD-BR (PSNR) |
Liu et al. [14] | 2022 | DFNN | VVC | Hybrid CNN-transformer (Swin) | Deep fusion blocks (CNN + transformer); QP map input | VTM11.0 NNVC1.0: −0.97% BD-BR (Y-PSNR) |
Ma et al. [21] | 2023 | STRS-VVC | VVC (Intra frames/images) | Swin transformer (SwinIR-like) | Optimized 16 × 16 window for VVC; QP-specific models | VTM 18.2 (Intra): −27.44% BD-BR (PSNR) |
Yu et al. [17] | 2024 | TVQE | HEVC | Full transformer (Swin-AE + restormer) | Multi-frame input; Channel and spatiotemporal attention | HM 16.51 (LDP): −23.04% BD-BR (PSNR) |
Li et al. [18] | 2025 | CTVE | HEVC | Hybrid DCN-CNN + Swinv2-transformer | Multi-frame (5 frames); DCN for alignment, Swinv2 for LR | HEVC HM16.20 (LDP, @QP37): +5.5% (PSNR) |
Santamaria et al. [22] | 2022 | CA-NNPF | VVC | Content-adaptive CNN (overfitting “multipliers”) | Encoder-side overfitting; weight update via SEI | VTM11.0 NNVC1.0: −5.01% BD-BR (Y-PSNR) |
Das et al. [23] | 2023 | QP-Adaptive VVC CNN | VVC | CNN with QP map input | Uses QP map as input; QP-specific models | VVenC: −4.54% BD-BR (Y-PSNR) |
Zheng et al. [19] | 2022 | PTTSR | Compressed video (NTIRE challenge; HEVC/VVC likely) | Two-stage: recurrent (BasicVSR++) + transformer (SwinIR) | Progressive training (stage I); transfer learning (stage II) | NTIRE 2022 [28] winner (Track 1 QE: 32.07 dB absolute PSNR on the test set) |
Zhang et al. [24] | 2021 | CNN Post-Proc. + GAN | VVC, AV1 | SRResNet-like CNN; optional GAN | L1 and perceptual loss versions; QP-specific models | VVC VTM 4.0.1: −3.9% BD-BR (PSNR); AV1 libaom 1.0.0: −5.8% BD-BR (PSNR) |
Qiu et al. [20] | 2022 | FTVSR | H.264 (for SR task) | Frequency-transformer (DCT tokens) | Spatiotemporal-frequency attention; recurrent structure | H.264 (REDS4, CRF25 for SR): +1.6 dB PSNR (vs. COMISR) |
Ramsook & Kokaram [25] | 2024 | E2E AV1 Enhancer | AV1 | E2E system: encoder strategy + adversarial post-proc. | Pre-downsampling; temporal NN input; perceptual focus | AV1 (libaom-av1 3.6.1 in E2E pipeline): VMAF +6.72 pts @ 50 kbps, +1.81 pts @ 500 kbps; Y-PSNR −0.09 dB @ 50 kbps, −0.18 dB @ 500 kbps |
Jiang et al. [26] | 2024 | RTSR | AV1 | Low-complexity CNN | Knowledge distillation; PixelUnshuffle | AV1 SR (SVT-AV1 1.8.0): ×3 (360p → 1080p): +4.20 VMAF pts (vs. Lanczos5); ×4 (540p → 4K): +3.78 VMAF pts (vs. Lanczos5) |
Dehaghi et al. [27] | 2025 | DiQP | AV1, HEVC (8K/4K Video) | Transformer-diffusion model (LeWin transformer) | QP as diffusion step; look around/ahead modules; LOST embedding | SEPE8K (AV1/HEVC, max QP): +1.77 to +1.99 dB PSNR vs. baselines |
Gwun et al. [4] | 2024 | MTSA | AV1 | Multi-type self-attention CNN | CWSA, BWSSA, PWSA; QP-specific models | SVT-AV1 1.8.0 (AVM): −10.17% BD-BR (Y-PSNR) |
This Paper | 2025 | MS-MTSA | AV1 | Multi-scale self-attention CNN | MS-BWSSA (16 × 16 + 12 × 12); refined PWSA; patch overlap | SVT-AV1 1.8.0 (AVM): −12.44% BD-BR (Y-PSNR) |

Class | Video Resolution | Number of Videos | Frames | Bit Depth | Chroma Sampling |
---|---|---|---|---|---|
A | 3840 × 2176 | 200 | 64 | 10 | 4:2:0 |
B | 1920 × 1088 | 200 | 64 | 10 | 4:2:0 |
C | 960 × 544 | 200 | 64 | 10 | 4:2:0 |
D | 480 × 272 | 200 | 64 | 10 | 4:2:0 |

Configuration Parameter | Command Line Option | Range | Value Used | Description |
---|---|---|---|---|
RateControlMode | --rc | (0–2) | 0 | Rate control mode (0: CRF, or CQP if --aq-mode is 0 [default]; 1: VBR; 2: CBR) |
AdaptiveQuantization | --aq-mode | (0–2) | 0 | Set adaptive QP level (0: off, 1: variance base using AV1 segments, 2: deltaq pred efficiency) |
QuantizationParameter | --qp | (1–63) | 20, 32, 43, 55, 63 | Initial QP level value |
FrameRate | --fps | (1–240) | Sequence dependent | Input video frame rate, integer values only, inferred if Y4M |
EncoderColorFormat | --color-format | (0–3) | 1 | Color format, only yuv420 is supported at this time (0: yuv400, 1: yuv420, 2: yuv422, 3: yuv444) |
EncoderBitDepth | --input-depth | (8, 10) | 10 | Input video file and output bitstream bit-depth |
PredStructure | --pred-struct | (1, 2) | 2 | Set prediction structure (1: low delay, 2: random access) |
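
To make the encoder settings above easier to reproduce, the following Python sketch assembles a command line for the SVT-AV1 application `SvtAv1EncApp` [36] from the values in the table. It is an illustration under stated assumptions, not the authors' script: the `-i`/`-b` input and bitstream options and all file names are additions for the example, only the options listed in the table are set (everything else stays at encoder defaults), and the function only builds the argument list without running the encoder.

```python
from typing import List, Optional

# QP values from the configuration table; one encode per QP and sequence.
TEST_QPS = (20, 32, 43, 55, 63)


def build_svt_av1_args(input_path: str, output_path: str, qp: int,
                       fps: Optional[int] = None) -> List[str]:
    """Return an SvtAv1EncApp argument list matching the table's settings.

    Options not listed in the table are left at encoder defaults. Pass `fps`
    only for raw YUV input; for Y4M input the frame rate is read from the file.
    """
    if not 1 <= qp <= 63:
        raise ValueError(f"--qp must be in 1..63, got {qp}")
    args = [
        "SvtAv1EncApp",
        "-i", input_path,          # input video (Y4M or raw YUV)
        "-b", output_path,         # output bitstream
        "--rc", "0",               # mode 0; with --aq-mode 0 this is fixed QP (CQP)
        "--aq-mode", "0",          # adaptive quantization off
        "--qp", str(qp),           # initial (base) QP
        "--color-format", "1",     # yuv420
        "--input-depth", "10",     # 10-bit input and bitstream
        "--pred-struct", "2",      # random access
    ]
    if fps is not None:
        if not 1 <= fps <= 240:
            raise ValueError(f"--fps must be an integer in 1..240, got {fps}")
        args += ["--fps", str(fps)]
    return args


if __name__ == "__main__":
    # Hypothetical file names: prints one command per test QP.
    for qp in TEST_QPS:
        print(" ".join(build_svt_av1_args("Tango.y4m", f"Tango_qp{qp}.ivf", qp)))
```

Returning a list rather than a shell string lets the result be passed directly to `subprocess.run` without quoting issues.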

Model | QPbase Range |
---|---|
Model QP20 | QPbase < 26 |
Model QP32 | 26 ≤ QPbase < 37.5 |
Model QP43 | 37.5 ≤ QPbase < 49 |
Model QP55 | 49 ≤ QPbase < 59 |
Model QP63 | 59 ≤ QPbase ≤ 63 |
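
The QPbase ranges above place each boundary at the midpoint between adjacent training QPs: (20 + 32)/2 = 26, (32 + 43)/2 = 37.5, (43 + 55)/2 = 49 and (55 + 63)/2 = 59. A minimal Python sketch of the resulting model lookup follows. How QPbase is obtained for a given frame or sequence is not described in this excerpt, so the function simply takes it as an argument; the 0–63 validity check is an assumption.

```python
import bisect

# Training QPs of the five QP-specific models, in ascending order.
MODEL_QPS = (20, 32, 43, 55, 63)
# Range boundaries: midpoints between adjacent training QPs -> (26.0, 37.5, 49.0, 59.0).
BOUNDARIES = tuple((lo + hi) / 2 for lo, hi in zip(MODEL_QPS, MODEL_QPS[1:]))


def select_model(qp_base: float) -> str:
    """Name of the post-filter model whose QPbase range contains qp_base."""
    if not 0 <= qp_base <= 63:
        raise ValueError(f"QPbase out of range: {qp_base}")
    # bisect_right sends a value equal to a boundary to the higher model,
    # matching the inclusive lower bounds (e.g. 26 <= QPbase) in the table.
    return f"Model QP{MODEL_QPS[bisect.bisect_right(BOUNDARIES, qp_base)]}"


if __name__ == "__main__":
    for qp in (20, 26, 37.5, 48, 59, 63):
        print(qp, "->", select_model(qp))
```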

Class | Sequence | Resolution | Frame Rate (fps) | Bit-Depth |
---|---|---|---|---|
A1 | BoxingPractice | 3840 × 2160 | 59.94 | 10 |
A1 | Crosswalk | 3840 × 2160 | 59.94 | 10 |
A1 | FoodMarket2 | 3840 × 2160 | 59.94 | 10 |
A1 | Neon1224 | 3840 × 2160 | 29.97 | 10 |
A1 | NocturneDance | 3840 × 2160 | 60 | 10 |
A1 | PierSeaSide | 3840 × 2160 | 29.97 | 10 |
A1 | Tango | 3840 × 2160 | 59.94 | 10 |
A1 | TimeLapse | 3840 × 2160 | 59.94 | 10 |
A2 | Aerial3200 | 1920 × 1080 | 59.94 | 10 |
A2 | Boat | 1920 × 1080 | 59.94 | 10 |
A2 | CrowdRun | 1920 × 1080 | 50 | |
A2 | DinnerSceneCropped | 1920 × 1080 | 29.97 | 10 |
A2 | FoodMarket | 1920 × 1080 | 59.94 | 10 |
A2 | GregoryScarf | 1080 × 1920 | 30 | 10 |
A2 | MeridianTalksdr | 1920 × 1080 | 59.94 | 10 |
A2 | Motorcycle | 1920 × 1080 | 30 | |
A2 | OldTownCross | 1920 × 1080 | 30 | |
A2 | PedestrianArea | 1920 × 1080 | 25 | |
A2 | RitualDance | 1920 × 1080 | 59.94 | 10 |
A2 | Riverbed | 1920 × 1080 | 25 | |
A2 | RushFieldCuts | 1920 × 1080 | 29.97 | |
A2 | Skater227 | 1920 × 1080 | 30 | 10 |
A2 | ToddlerFountainCropped | 1080 × 1920 | 29.97 | 10 |
A2 | TreesAndGrass | 1920 × 1080 | 30 | |
A2 | TunnelFlag | 1920 × 1080 | 59.94 | 10 |
A2 | Verticalbees | 1080 × 1920 | 29.97 | |
A2 | WorldCup | 1920 × 1080 | 30 | |
A3 | ControlledBurn | 1280 × 720 | 30 | |
A3 | DrivingPOV | 1280 × 720 | 59.94 | 10 |
A3 | Johnny | 1280 × 720 | 60 | |
A3 | KristenAndSara | 1280 × 720 | 60 | |
A3 | RollerCoaster | 1280 × 720 | 59.94 | 10 |
A3 | Vidyo3 | 1280 × 720 | 60 | |
A3 | Vidyo4 | 1280 × 720 | 60 | |
A3 | WestWindEasy | 1280 × 720 | 30 | |
A4 | BlueSky | 640 × 360 | 25 | |
A4 | RedKayak | 640 × 360 | 29.97 | |
A4 | SnowMountain | 640 × 360 | 29.97 | |
A4 | SpeedBag | 640 × 360 | 29.97 | |
A4 | Stockholm | 640 × 360 | 59.94 | |
A4 | TouchdownPass | 640 × 360 | 29.97 | |
A5 | FourPeople | 480 × 270 | 60 | |
A5 | ParkJoy | 480 × 270 | 50 | |
A5 | SparksElevator | 480 × 270 | 59.94 | 10 |
A5 | VerticalBayshore | 270 × 480 | 29.97 | |

BVI-DVC Sequence | AVM-CTC Sequence |
---|---|
BoxingPracticeHarmonics | BoxingPractice |
DCrosswalkHarmonics | Crosswalk |
CrowdRunMCLV | CrowdRun |
TunnelFlagS1Harmonics | TunnelFlag |
DrivingPOVHarmonics | DrivingPOV |

Class | Sequence | BD-BR Y | BD-BR Cb | BD-BR Cr |
---|---|---|---|---|
A1 | FoodMarket2 | −14.48% | −22.27% | −24.98% |
A1 | Neon1224 | −19.93% | −27.08% | −29.74% |
A1 | NocturneDance | −23.15% | −30.33% | −31.86% |
A1 | PierSeaSide | −17.25% | −28.34% | −35.58% |
A1 | Tango | −19.56% | −40.76% | −34.91% |
A1 | TimeLapse | −10.12% | −22.71% | −17.74% |
A1 | Average | −17.41% | −28.58% | −29.14% |
A2 | Aerial3200 | −5.22% | −17.18% | −31.39% |
A2 | Boat | −11.10% | −40.12% | −32.03% |
A2 | DinnerSceneCropped | −15.39% | −37.35% | −23.18% |
A2 | FoodMarket | −13.48% | −33.68% | −27.23% |
A2 | GregoryScarf | −13.65% | −42.11% | −26.94% |
A2 | MeridianTalksdr | −14.64% | −34.99% | −29.50% |
A2 | Motorcycle | −13.45% | −27.86% | −27.39% |
A2 | OldTownCross | −15.06% | −48.00% | −27.13% |
A2 | PedestrianArea | −18.79% | −20.76% | −24.87% |
A2 | RitualDance | −19.01% | −29.04% | −37.87% |
A2 | Riverbed | −14.14% | −15.48% | −18.56% |
A2 | RushFieldCuts | −13.06% | −21.63% | −17.82% |
A2 | Skater227 | −20.46% | −22.46% | −21.47% |
A2 | ToddlerFountainCropped | −15.13% | −25.99% | −31.11% |
A2 | TreesAndGrass | −5.30% | −19.45% | −12.95% |
A2 | Verticalbees | −13.05% | −14.13% | −13.90% |
A2 | WorldCup | −21.07% | −26.87% | −27.84% |
A2 | Average | −14.23% | −28.06% | −25.36% |
A3 | ControlledBurn | −10.80% | −36.08% | −27.69% |
A3 | Johnny | −15.18% | −17.94% | −18.18% |
A3 | KristenAndSara | −13.42% | −18.52% | −18.33% |
A3 | RollerCoaster | −19.48% | −20.43% | −29.65% |
A3 | Vidyo3 | −12.24% | −10.65% | −9.62% |
A3 | Vidyo4 | −12.75% | −14.23% | −17.50% |
A3 | WestWindEasy | −15.64% | −48.39% | −30.38% |
A3 | Average | −14.21% | −23.75% | −21.62% |
A4 | BlueSky | −12.01% | −20.23% | −39.46% |
A4 | RedKayak | −9.06% | −15.94% | 16.02% |
A4 | SnowMountain | 0.96% | 6.24% | 5.35% |
A4 | SpeedBag | −11.96% | −9.42% | −14.76% |
A4 | Stockholm | −12.78% | −13.99% | −20.33% |
A4 | TouchdownPass | −9.05% | −21.43% | −17.27% |
A4 | Average | −8.98% | −13.88% | −14.58% |
A5 | FourPeople | −8.67% | −22.64% | −10.30% |
A5 | ParkJoy | −4.84% | −11.21% | −9.62% |
A5 | SparksElevator | −6.06% | −14.92% | −12.06% |
A5 | VerticalBayshore | −9.91% | −15.66% | −11.64% |
A5 | Average | −7.37% | −14.25% | −11.57% |
All | Average | −12.44% | −21.70% | −19.90% |

Class | MTSA BD-BR Y | MTSA BD-BR Cb | MTSA BD-BR Cr | MS-MTSA BD-BR Y | MS-MTSA BD-BR Cb | MS-MTSA BD-BR Cr |
---|---|---|---|---|---|---|
A1 | −14.56% | −24.53% | −22.63% | −17.41% | −28.58% | −29.14% |
A2 | −12.00% | −25.07% | −20.01% | −14.23% | −28.06% | −25.36% |
A3 | −11.66% | −18.51% | −17.74% | −14.21% | −23.75% | −21.62% |
A4 | −6.55% | −11.70% | −9.99% | −8.98% | −13.88% | −14.58% |
A5 | −6.07% | −14.59% | −11.32% | −7.37% | −14.25% | −11.57% |
Average | −10.17% | −18.88% | −16.34% | −12.44% | −21.70% | −19.90% |
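
The BD-BR figures in the tables above express the average bitrate difference between two rate-distortion curves at equal quality [41,42]; negative values mean the post-filtered stream needs fewer bits for the same PSNR. Several interpolation variants are in use (the original cubic fit and piecewise-cubic alternatives), and the exact variant behind these tables is not stated in this excerpt. The Python sketch below assumes the classic form: a least-squares cubic fit of log10(bitrate) against PSNR for each curve, averaged over the overlapping PSNR interval. The rate/PSNR points in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def _fit_cubic(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares cubic fit y ~ c0 + c1*x + c2*x**2 + c3*x**3.

    Solves the 4x4 normal equations by Gaussian elimination with partial pivoting.
    """
    n = 4
    aug = [
        [sum(xi ** (i + j) for xi in x) for j in range(n)]
        + [sum(yi * xi ** i for xi, yi in zip(x, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def _mean_log_rate(psnr: Sequence[float], rate: Sequence[float],
                   lo: float, hi: float) -> float:
    """Mean of the fitted log10(rate)-versus-PSNR curve over [lo, hi]."""
    if len(psnr) != len(rate) or len(psnr) < 4:
        raise ValueError("need at least four matching (rate, PSNR) points")
    centre = sum(psnr) / len(psnr)  # centring keeps the normal equations well conditioned
    coeffs = _fit_cubic([p - centre for p in psnr], [math.log10(r) for r in rate])
    a, b = lo - centre, hi - centre
    area = sum(c * (b ** (i + 1) - a ** (i + 1)) / (i + 1) for i, c in enumerate(coeffs))
    return area / (hi - lo)


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """BD-rate of test vs. anchor in percent; negative means bitrate savings."""
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    delta = (_mean_log_rate(psnr_test, rate_test, lo, hi)
             - _mean_log_rate(psnr_anchor, rate_anchor, lo, hi))
    return (10 ** delta - 1) * 100


if __name__ == "__main__":
    # Hypothetical five-point RD curves (kbps, dB), one point per QP.
    anchor_rate = [12000, 6000, 3000, 1500, 800]
    anchor_psnr = [43.0, 40.5, 38.0, 35.5, 33.0]
    test_rate = [r * 0.88 for r in anchor_rate]  # same quality at 12% lower rate
    print(f"BD-BR: {bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr):.2f}%")
```

Because the test curve in the example is the anchor curve scaled by 0.88 in rate, the computed BD-BR is −12%, which is a quick sanity check on any BD-rate implementation.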

Class | Sequence | BD-BR Y | BD-BR Cb | BD-BR Cr |
---|---|---|---|---|
A1 | FoodMarket2 | −15.8% | −30.6% | −36.7% |
A1 | Neon1224 | −35.7% | −49.7% | −53.3% |
A1 | NocturneDance | −35.7% | −37.6% | −48.7% |
A1 | PierSeaSide | −18.2% | −38.6% | −44.0% |
A1 | Tango | −28.8% | −65.5% | −54.0% |
A1 | TimeLapse | −11.4% | −31.6% | −32.9% |
A1 | Average | −24.3% | −42.2% | −44.9% |
A2 | Aerial3200 | −5.4% | −24.3% | −43.1% |
A2 | Boat | −8.2% | −51.4% | −40.5% |
A2 | DinnerSceneCropped | −18.3% | −61.0% | −54.2% |
A2 | FoodMarket | −14.0% | −42.1% | −41.2% |
A2 | GregoryScarf | −16.0% | −61.3% | −42.9% |
A2 | MeridianTalksdr | −20.3% | −57.4% | −50.7% |
A2 | Motorcycle | −23.0% | −43.5% | −42.7% |
A2 | OldTownCross | −13.7% | −55.8% | −32.1% |
A2 | PedestrianArea | −26.3% | −31.9% | −40.5% |
A2 | RitualDance | −27.2% | −40.9% | −53.0% |
A2 | Riverbed | −18.0% | −25.7% | −26.5% |
A2 | RushFieldCuts | −18.1% | −29.6% | −21.8% |
A2 | Skater227 | −39.5% | −58.0% | −59.2% |
A2 | ToddlerFountainCropped | −14.8% | −36.4% | −39.1% |
A2 | TreesAndGrass | −4.5% | −30.8% | −18.3% |
A2 | Verticalbees | −18.9% | −20.6% | −22.0% |
A2 | WorldCup | −31.4% | −43.8% | −46.0% |
A2 | Average | −18.7% | −42.0% | −39.6% |
A3 | ControlledBurn | −18.7% | −54.5% | −46.3% |
A3 | Johnny | −19.2% | −28.0% | −23.8% |
A3 | KristenAndSara | −17.3% | −24.8% | −24.2% |
A3 | RollerCoaster | −20.5% | −23.8% | −36.4% |
A3 | Vidyo3 | −19.4% | −19.3% | −13.9% |
A3 | Vidyo4 | −14.4% | −22.4% | −24.8% |
A3 | WestWindEasy | −21.4% | −64.9% | −17.8% |
A3 | Average | −18.7% | −34.0% | −26.7% |
A4 | BlueSky | −14.1% | −30.0% | −50.7% |
A4 | RedKayak | −12.0% | −27.7% | 109.7% |
A4 | SnowMountain | −2.6% | 14.7% | 8.2% |
A4 | SpeedBag | −18.3% | −15.5% | −19.7% |
A4 | Stockholm | −10.3% | −20.2% | −22.9% |
A4 | TouchdownPass | −3.5% | −30.8% | −20.1% |
A4 | Average | −10.2% | −18.2% | 0.8% |
A5 | FourPeople | −12.4% | −18.1% | −16.3% |
A5 | ParkJoy | −7.0% | −31.9% | −16.9% |
A5 | SparksElevator | −8.9% | −14.7% | −7.9% |
A5 | VerticalBayshore | −10.2% | −16.1% | −20.0% |
A5 | Average | −9.6% | −20.2% | −15.3% |
All | Average | −16.3% | −31.3% | −25.2% |

Class | MTSA BD-BR Y | MTSA BD-BR Cb | MTSA BD-BR Cr | MS-MTSA BD-BR Y | MS-MTSA BD-BR Cb | MS-MTSA BD-BR Cr |
---|---|---|---|---|---|---|
A1 | −20.4% | −35.5% | −36.6% | −24.3% | −42.2% | −44.9% |
A2 | −15.9% | −37.5% | −32.5% | −18.7% | −42.0% | −39.6% |
A3 | −15.7% | −24.8% | −23.4% | −18.7% | −34.0% | −26.7% |
A4 | −7.7% | −18.5% | −4.5% | −10.2% | −18.2% | 0.8% |
A5 | −7.3% | −20.1% | −16.0% | −9.6% | −20.2% | −15.3% |
Average | −13.4% | −27.3% | −22.6% | −16.3% | −31.3% | −25.2% |

Module | Params (MTSA) | Params (MS-MTSA) | Params Change | MACs (MTSA) | MACs (MS-MTSA) | MACs Change |
---|---|---|---|---|---|---|
CWSA | 49.54 k | 49.54 k | 0.0% | 3.25 GMac | 3.45 GMac | +6.2% |
BWSSA | 49.54 k | 99.08 k | +100.0% | 3.25 GMac | 7.11 GMac | +118.8% |
PWSA | 16.78 M | 16.93 M | +0.9% | 278.11 GMac | 324.89 GMac | +16.9% |
Refine | − | 147.6 k | − | − | 10.93 GMac | − |
Total | 19.24 M | 19.74 M | +2.6% | 446.44 GMac | 529.54 GMac | +18.6% |
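
The change columns in the complexity table are relative increases of MS-MTSA over MTSA, (MS-MTSA − MTSA) / MTSA × 100. A small Python check over a few rows is sketched below; it uses the rounded values printed in the table, so a recomputed percentage can differ from the reported one in the last digit.

```python
# MTSA and MS-MTSA values transcribed from the complexity table (rounded as printed).
ROWS = {
    "BWSSA params (k)": (49.54, 99.08),
    "Total params (M)": (19.24, 19.74),
    "CWSA MACs (GMac)": (3.25, 3.45),
    "BWSSA MACs (GMac)": (3.25, 7.11),
    "Total MACs (GMac)": (446.44, 529.54),
}


def percent_change(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (mtsa, ms_mtsa) in ROWS.items():
        print(f"{name}: {percent_change(mtsa, ms_mtsa):+.1f}%")
```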

PSNR (dB) at QP20

Class | Sequence | MTSA Y | MTSA Cb | MTSA Cr | IBSA Y | IBSA Cb | IBSA Cr |
---|---|---|---|---|---|---|---|
A1 | FoodMarket2 | 42.97 | 48.84 | 48.97 | 42.91 | 48.78 | 48.90 |
A1 | Neon1224 | 45.59 | 49.16 | 48.67 | 45.41 | 49.10 | 48.54 |
A1 | NocturneDance | 46.96 | 43.33 | 47.93 | 46.75 | 43.31 | 47.83 |
A1 | PierSeaSide | 44.16 | 47.12 | 44.52 | 44.06 | 46.96 | 44.44 |
A1 | Tango | 40.60 | 49.61 | 47.83 | 40.55 | 49.52 | 47.76 |
A1 | TimeLapse | 43.68 | 49.53 | 50.72 | 43.60 | 49.44 | 50.77 |
A1 | Average | 43.99 | 47.93 | 48.11 | 43.88 | 47.85 | 48.04 |
A2 | Aerial3200 | 41.87 | 45.32 | 45.00 | 41.80 | 45.31 | 44.87 |
A2 | Boat | 41.92 | 45.62 | 47.39 | 41.77 | 45.48 | 47.31 |
A2 | DinnerSceneCropped | 38.45 | 41.35 | 46.84 | 38.39 | 41.34 | 46.80 |
A2 | FoodMarket | 39.48 | 44.57 | 45.49 | 39.46 | 44.56 | 45.43 |
A2 | GregoryScarf | 40.86 | 47.27 | 47.03 | 40.75 | 47.29 | 46.97 |
A2 | MeridianTalksdr | 42.48 | 46.61 | 52.75 | 42.43 | 46.34 | 52.46 |
A2 | Motorcycle | 41.87 | 44.04 | 44.84 | 41.74 | 44.03 | 44.79 |
A2 | OldTownCross | 37.69 | 39.82 | 42.45 | 37.65 | 39.86 | 42.50 |
A2 | PedestrianArea | 42.48 | 46.65 | 48.52 | 42.42 | 46.51 | 48.47 |
A2 | RitualDance | 46.20 | 49.22 | 50.19 | 46.00 | 48.93 | 49.99 |
A2 | Riverbed | 41.00 | 43.83 | 45.49 | 40.91 | 43.81 | 45.46 |
A2 | RushFieldCuts | 40.41 | 44.10 | 45.32 | 40.26 | 43.89 | 45.23 |
A2 | Skater227 | 48.27 | 54.88 | 55.33 | 48.02 | 53.58 | 55.15 |
A2 | ToddlerFountainCropped | 40.55 | 45.59 | 42.99 | 40.35 | 45.22 | 42.91 |
A2 | TreesAndGrass | 39.74 | 43.61 | 45.81 | 39.73 | 43.68 | 45.78 |
A2 | Verticalbees | 45.16 | 49.23 | 49.97 | 45.10 | 48.62 | 49.88 |
A2 | WorldCup | 45.74 | 51.16 | 50.94 | 45.53 | 50.93 | 50.83 |
A2 | Average | 42.01 | 46.05 | 47.43 | 41.90 | 45.85 | 47.34 |
A3 | ControlledBurn | 43.01 | 49.53 | 48.31 | 42.86 | 49.33 | 48.21 |
A3 | Johnny | 43.10 | 49.53 | 50.23 | 43.01 | 49.36 | 50.30 |
A3 | KristenAndSara | 43.80 | 49.07 | 50.03 | 43.72 | 48.89 | 50.03 |
A3 | RollerCoaster | 42.82 | 45.86 | 46.52 | 42.59 | 45.82 | 46.43 |
A3 | Vidyo3 | 44.16 | 50.20 | 50.22 | 44.01 | 49.38 | 50.09 |
A3 | Vidyo4 | 43.35 | 50.13 | 50.31 | 43.27 | 49.92 | 50.28 |
A3 | WestWindEasy | 42.92 | 49.16 | 51.18 | 42.68 | 49.33 | 51.29 |
A3 | Average | 43.31 | 49.07 | 49.54 | 43.16 | 48.86 | 49.52 |
A4 | BlueSky | 42.69 | 44.06 | 46.39 | 42.60 | 44.06 | 46.22 |
A4 | RedKayak | 40.89 | 43.86 | 47.94 | 40.77 | 43.78 | 48.04 |
A4 | SnowMountain | 42.01 | 48.81 | 49.31 | 41.95 | 48.79 | 49.31 |
A4 | SpeedBag | 46.76 | 50.18 | 51.75 | 46.61 | 50.06 | 51.72 |
A4 | Stockholm | 40.60 | 46.75 | 46.36 | 40.60 | 46.54 | 46.37 |
A4 | TouchdownPass | 41.88 | 48.21 | 48.07 | 41.82 | 48.16 | 48.04 |
A4 | Average | 42.47 | 46.98 | 48.30 | 42.39 | 46.90 | 48.28 |
A5 | FourPeople | 45.13 | 48.20 | 49.28 | 45.02 | 48.09 | 49.24 |
A5 | ParkJoy | 39.74 | 41.91 | 43.91 | 39.76 | 41.88 | 43.96 |
A5 | SparksElevator | 41.15 | 49.14 | 47.34 | 40.82 | 48.51 | 47.35 |
A5 | VerticalBayshore | 42.71 | 52.07 | 51.52 | 42.55 | 51.25 | 51.49 |
A5 | Average | 42.18 | 47.83 | 48.01 | 42.04 | 47.43 | 48.01 |
All | Average | 42.79 | 47.57 | 48.28 | 42.67 | 47.38 | 48.24 |
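
The table above reports per-plane (Y, Cb, Cr) PSNR for 10-bit content. For a B-bit plane, PSNR = 10·log10((2^B − 1)^2 / MSE). The Python sketch below assumes that peak value (1023 at 10 bits) and averages per-frame PSNR over a sequence; both choices are assumptions, since this excerpt does not state the exact peak definition or averaging rule behind the table, and some tools use a slightly different 10-bit peak, which shifts absolute values.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], bit_depth: int = 10) -> float:
    """PSNR in dB between two equally sized, flattened sample planes."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("planes must be non-empty and the same size")
    peak = (1 << bit_depth) - 1  # 1023 for 10-bit samples (assumed peak definition)
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes
    return 10.0 * math.log10(peak * peak / mse)


def mean_frame_psnr(ref_frames, dec_frames, bit_depth: int = 10) -> float:
    """Average of per-frame PSNR values for one plane (one common convention)."""
    values = [psnr(r, d, bit_depth) for r, d in zip(ref_frames, dec_frames)]
    return sum(values) / len(values)
```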

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gwun, W.; Choi, K.; Park, G.H. Multi-Scale Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec: Towards Enhanced Visual Quality and Overall Coding Performance. Mathematics 2025, 13, 1782. https://doi.org/10.3390/math13111782