Search Results (35)

Search Parameters:
Keywords = H.265 High Efficient Video Coding (HEVC)

32 pages, 4311 KiB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Viewed by 455
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user’s viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the residual connections reinforce the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement. Full article
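The gating-plus-residual idea described in this abstract can be sketched in a few lines. This is a minimal single-channel NumPy illustration, not the paper's trained network: the convolution weights are hypothetical placeholders, and a real DRGNet layer would operate on multi-channel tensors.

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same' 3x3 convolution over a single-channel image."""
    h, wid = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(wid):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def gated_residual_block(x, w_feat, w_gate):
    """One dual-path block: a feature path and a sigmoid gate path.
    The gate (values in (0, 1)) selects which extracted features pass,
    while the residual connection preserves the original content."""
    feat = conv3x3(x, w_feat)                           # feature path
    gate = 1.0 / (1.0 + np.exp(-conv3x3(x, w_gate)))    # gating path
    return x + gate * feat                              # residual + gated features
```

With an all-zero feature kernel the block reduces to the identity, which is exactly the property that keeps gradient flow smooth when the gate suppresses a path.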

18 pages, 1845 KiB  
Article
Fast Intra-Prediction Mode Decision Algorithm for Versatile Video Coding Based on Gradient and Convolutional Neural Network
by Nana Li, Zhenyi Wang, Qiuwen Zhang, Lei He and Weizheng Zhang
Electronics 2025, 14(10), 2031; https://doi.org/10.3390/electronics14102031 - 16 May 2025
Viewed by 540
Abstract
The latest Versatile Video Coding (H.266/VVC) standard introduces the QTMT structure, enabling more flexible block partitioning and significantly enhancing coding efficiency compared to its predecessor, High-Efficiency Video Coding (H.265/HEVC). However, this new structure results in changes to the size of Coding Units (CUs). To accommodate this, VVC increases the number of intra-prediction modes from 35 to 67, leading to a substantial rise in computational demands. This study presents a fast intra-prediction mode selection algorithm that combines gradient analysis and CNN. First, the Laplace operator is employed to estimate the texture direction of the current CU block, identifying the most probable prediction direction and skipping over half of the redundant candidate modes, thereby significantly reducing the number of mode searches. Second, to further minimize computational complexity, two efficient neural network models, MIP-NET and ISP-NET, are developed to determine whether to terminate the prediction process for Matrix Intra Prediction (MIP) and Intra Sub-Partitioning (ISP) modes early, avoiding unnecessary calculations. This approach maintains coding performance while significantly lowering the time complexity of intra-prediction mode selection. Experimental results demonstrate that the algorithm achieves a 35.04% reduction in encoding time with only a 0.69% increase in BD-BR, striking a balance between video quality and coding efficiency. Full article
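The direction-based mode pruning in the first step can be sketched as follows. This uses simple gradient-energy sums as a stand-in for the paper's Laplace-based direction estimate, and the pruning window of ±12 modes is a hypothetical choice; only the VVC mode anchors (18 = horizontal, 50 = vertical, 0/1 = Planar/DC) come from the standard.

```python
import numpy as np

def dominant_direction(cu):
    """Estimate a CU's dominant texture direction from gradient energy
    (a simplified stand-in for the paper's Laplace-operator estimate)."""
    gy, gx = np.gradient(cu.astype(float))   # row- and column-direction gradients
    return "vertical" if np.abs(gx).sum() > np.abs(gy).sum() else "horizontal"

def candidate_modes(cu):
    """Keep only angular modes near the dominant direction, plus
    Planar (0) and DC (1), skipping over half of VVC's 67 intra modes.
    VVC angular modes run 2..66; 18 is horizontal, 50 is vertical."""
    center = 50 if dominant_direction(cu) == "vertical" else 18
    return [0, 1] + [m for m in range(2, 67) if abs(m - center) <= 12]
```

A CU dominated by vertical structures (large horizontal gradients) keeps only the modes clustered around the vertical angular mode, so the rate-distortion search touches 27 candidates instead of 67.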

16 pages, 433 KiB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 873
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest video coding standard, H.266/VVC, features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. In view of these considerations, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block’s layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a 0.83% increase in bitrate under the top-three configuration, which is negligible. In comparison, the top-two configuration resulted in a higher complexity reduction of 63.78%, although this was accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
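The "top-three" versus "top-two" trade-off mentioned above comes down to how many of the ranked split candidates survive for rate-distortion checking. A minimal sketch, assuming a six-way score vector over the QTMT split options (the mode names are hypothetical labels, not VTM identifiers):

```python
import numpy as np

QTMT_SPLITS = ("no_split", "qt", "bt_h", "bt_v", "tt_h", "tt_v")

def top_k_partitions(scores, k=3):
    """Rank the six QTMT split options by predicted score and keep only
    the top-k candidates; only these are then verified against the
    encoder's partitioning constraints. k=3 corresponds to the
    'top-three' configuration, k=2 to 'top-two'."""
    order = np.argsort(np.asarray(scores))[::-1]   # descending by score
    return [QTMT_SPLITS[i] for i in order[:k]]
```

Shrinking k prunes more of the search (larger complexity reduction) at the cost of occasionally discarding the truly optimal split (higher bitrate), which matches the 48.14%/0.83% versus 63.78%/2.08% numbers reported.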

24 pages, 6380 KiB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Cited by 1 | Viewed by 1644
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1, developed by the Alliance for Open Media, which offers excellent compression efficiency, reducing bandwidth usage and improving video quality, making it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to revolutionize post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos. Full article
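The self-attention layers this abstract adds to the CNN post-filter follow the standard scaled dot-product form. Below is a single-head NumPy sketch over flattened spatial positions of a feature map; the projection matrices are hypothetical stand-ins for learned parameters, and the paper's three layer variants are not distinguished here.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: (N, D) sequence of N feature vectors (e.g. flattened pixels).
    Each output position is a softmax-weighted mix of all positions,
    letting the filter use non-local context that plain convolution
    kernels cannot reach."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])               # (N, N) affinities
    att = np.exp(scores - scores.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)                # row-wise softmax
    return att @ v
```

With a single position the softmax is trivially 1, so the layer reduces to the value projection, which is a quick sanity check on the implementation.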
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)

26 pages, 7340 KiB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Cited by 2 | Viewed by 2175
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep-feature-fusion-based convolutional neural network (CNN) architecture, VVC-PPFF, as a post-processing approach to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for luma components in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR). Full article
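The early/late feature fusion described above is, at its core, channel-wise concatenation followed by a learned mixing layer. A minimal NumPy sketch, with a 1×1-convolution-style per-pixel linear map standing in for the paper's fusion layers (shapes and weights are hypothetical):

```python
import numpy as np

def fuse_features(early, late, w_fuse):
    """Fuse shallow (low-level) and deep (high-level) feature maps.
    early: (C1, H, W), late: (C2, H, W), w_fuse: (C_out, C1 + C2).
    Concatenation keeps both granularities; the per-pixel linear map
    (equivalent to a 1x1 convolution) learns how to mix them."""
    stacked = np.concatenate([early, late], axis=0)   # (C1 + C2, H, W)
    c, h, w = stacked.shape
    flat = stacked.reshape(c, -1)                     # (C1 + C2, H * W)
    return (w_fuse @ flat).reshape(-1, h, w)          # (C_out, H, W)
```

Because early layers keep fine texture and later layers keep semantic structure, fusing both is what lets a single network generalize across QP values instead of training one model per QP.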

22 pages, 2143 KiB  
Article
Optimization of the Generative Multi-Symbol Architecture of the Binary Arithmetic Coder for UHDTV Video Encoders
by Grzegorz Pastuszak
Electronics 2023, 12(22), 4643; https://doi.org/10.3390/electronics12224643 - 14 Nov 2023
Cited by 2 | Viewed by 1247
Abstract
Previous studies have shown that the application of the M-coder in the H.264/AVC and H.265/HEVC video coding standards allows for highly parallel implementations without decreasing maximal frequencies. Although the primary limitation on throughput, originating from the range register update, can be eliminated, other limitations are associated with low register processing. Their negative impact is revealed at higher degrees of parallelism, leading to a gradual throughput saturation. This paper presents optimizations introduced to the generative hardware architecture to increase throughputs and hardware efficiencies. Firstly, it can process more than one bypass-mode subseries in one clock cycle. Secondly, aggregated contributions to the codestream are buffered before the low register update. Thirdly, the number of contributions used to update the low register in one clock cycle is decreased to save resources. Fourthly, the maximal one-clock-cycle renormalization shift of the low register is increased from 32 to 64 bit positions. As a result of these optimizations, the binary arithmetic coder, configured for series lengths of 27 and 2 symbols, increases the throughput from 18.37 to 37.42 symbols per clock cycle for high-quality H.265/HEVC compression. The logic consumption increases from 205.6k to 246.1k gates when synthesized on 90 nm TSMC technology. The design can operate at 570 MHz. Full article
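The first optimization, processing more than one bypass-mode subseries per clock cycle, can be illustrated with a deliberately simplified model. In the M-coder, an equiprobable bypass bin effectively contributes one bit to the low register, so a whole run can be merged with a single wide shift instead of one bin per cycle; the code below is that toy model only (real CABAC renormalization interleaves regular and bypass bins and carries, which this ignores), with the 64-bit word mirroring the widened renormalization shift mentioned in the abstract.

```python
def encode_bypass_run(low, n_bits, bins, word=64):
    """Toy model: merge a run of equiprobable bypass bins into the low
    register with one shift-and-or (one 'clock cycle' instead of one
    per bin), then flush completed words, standing in for the widened
    64-bit renormalization shift."""
    run = 0
    for b in bins:                       # pack the run into an integer
        run = (run << 1) | b
    low = (low << len(bins)) | run       # single wide shift
    n_bits += len(bins)
    flushed = []
    while n_bits >= word:                # renormalize by whole words
        flushed.append(low >> (n_bits - word))
        low &= (1 << (n_bits - word)) - 1
        n_bits -= word
    return low, n_bits, flushed
```

The point of the sketch is structural: the per-bin loop only does integer packing, while the register update and flush happen once per run, which is where the throughput gain in the hardware architecture comes from.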
(This article belongs to the Special Issue New Technology of Image & Video Processing)

20 pages, 6779 KiB  
Article
Fast CU Partition Algorithm for Intra Frame Coding Based on Joint Texture Classification and CNN
by Ting Wang, Geng Wei, Huayu Li, ThiOanh Bui, Qian Zeng and Ruliang Wang
Sensors 2023, 23(18), 7923; https://doi.org/10.3390/s23187923 - 15 Sep 2023
Cited by 2 | Viewed by 1618
Abstract
High-efficiency video coding (HEVC/H.265) is one of the most widely used video coding standards. HEVC introduces a quad-tree coding unit (CU) partition structure to improve video compression efficiency. The determination of the optimal CU partition is achieved through the brute-force search rate-distortion optimization method, which may result in high encoding complexity and hardware implementation challenges. To address this problem, this paper proposes a method that combines convolutional neural networks (CNN) with joint texture recognition to reduce encoding complexity. First, a classification decision method based on the global and local texture features of the CU is proposed, efficiently dividing the CU into smooth and complex texture regions. Second, for the CUs in smooth texture regions, the partition is determined by terminating early. For the CUs in complex texture regions, a proposed CNN is used for predictive partitioning, thus avoiding the traditional recursive approach. Finally, combined with texture classification, the proposed CNN achieves a good balance between the coding complexity and the coding performance. The experimental results demonstrate that the proposed algorithm reduces computational complexity by 61.23%, while only increasing BD-BR by 1.86% and decreasing BD-PSNR by just 0.09 dB. Full article
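The global-plus-local texture classification in the first step can be sketched with plain variances. This is an illustrative stand-in for the paper's decision method: the thresholds are hypothetical, and the quadrant-variance spread is one simple way to combine a global and a local feature.

```python
import numpy as np

def classify_texture(cu, global_thr=25.0, local_thr=10.0):
    """Label a CU 'smooth' or 'complex' from its global variance plus
    the spread of its four quadrant variances (thresholds hypothetical).
    Smooth CUs terminate partitioning early; complex CUs are handed to
    the CNN for predictive partitioning instead of recursive search."""
    h, w = cu.shape
    quads = [cu[:h // 2, :w // 2], cu[:h // 2, w // 2:],
             cu[h // 2:, :w // 2], cu[h // 2:, w // 2:]]
    local_spread = np.var([q.var() for q in quads])
    if cu.var() < global_thr and local_spread < local_thr:
        return "smooth"
    return "complex"
```

Routing only the complex-texture CUs to the CNN is what keeps the average cost low: the cheap variance test filters out the easy cases before any network inference runs.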
(This article belongs to the Section Sensor Networks)

22 pages, 4694 KiB  
Article
Reducing Video Coding Complexity Based on CNN-CBAM in HEVC
by Huayu Li, Geng Wei, Ting Wang, ThiOanh Bui, Qian Zeng and Ruliang Wang
Appl. Sci. 2023, 13(18), 10135; https://doi.org/10.3390/app131810135 - 8 Sep 2023
Cited by 4 | Viewed by 1735
Abstract
High-efficiency video coding (HEVC) outperforms H.264 in coding efficiency. However, the rate–distortion optimization (RDO) process in coding tree unit (CTU) partitioning requires an exhaustive exploration of all possible quad-tree partitions, resulting in high encoding complexity. To simplify this process, this paper proposes a convolutional neural network (CNN) based optimization algorithm combined with a hybrid attention mechanism module. Firstly, we designed a CNN compatible with the current coding unit (CU) size to accurately predict the CU partitions. In addition, we also designed a convolution block to enhance the information interaction between CU blocks. Then, we introduced the convolutional block attention module (CBAM) into the CNN, called CNN-CBAM. This module concentrates on important regions in the image and attends to the target object correctly. Finally, we integrated the CNN-CBAM into the HEVC coding framework for CU partition prediction in advance. The proposed network was trained, validated, and tested using a large-scale dataset covering various scenes and objects, which provides extensive samples for intra-frame CU partition prediction in HEVC. The experimental findings demonstrate that our scheme can reduce the coding time by up to 64.05% on average compared to a traditional HM16.5 encoder, with only 0.09 dB degradation in BD-PSNR and a 1.94% increase in BD-BR. Full article
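CBAM applies channel attention and then spatial attention in sequence. A minimal NumPy sketch of that two-stage structure follows; the pooled descriptors and sigmoid gating match the CBAM recipe, but the weight matrices are hypothetical stand-ins for the learned shared MLP and the spatial convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w_ch, w_sp):
    """Minimal CBAM-style attention on a (C, H, W) feature map:
    first reweight channels from pooled per-channel statistics,
    then reweight pixels from pooled per-pixel statistics."""
    # channel attention: avg- and max-pool over space, shared linear map
    ch_desc = np.stack([x.mean(axis=(1, 2)), x.max(axis=(1, 2))])  # (2, C)
    ch_att = sigmoid((ch_desc @ w_ch).sum(axis=0))                 # (C,)
    x = x * ch_att[:, None, None]
    # spatial attention: avg- and max-pool over channels, linear mix
    sp_desc = np.stack([x.mean(axis=0), x.max(axis=0)])            # (2, H, W)
    sp_att = sigmoid(np.tensordot(w_sp, sp_desc, axes=1))          # (H, W)
    return x * sp_att[None]
```

The channel stage answers "which feature maps matter" and the spatial stage "which regions matter", which is why the abstract describes the module as concentrating on important regions and attending to the target object.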

13 pages, 3064 KiB  
Communication
Visual Perception Based Intra Coding Algorithm for H.266/VVC
by Yu-Hsiang Tsai, Chen-Rung Lu, Mei-Juan Chen, Meng-Chun Hsieh, Chieh-Ming Yang and Chia-Hung Yeh
Electronics 2023, 12(9), 2079; https://doi.org/10.3390/electronics12092079 - 1 May 2023
Cited by 6 | Viewed by 3398
Abstract
The latest international video coding standard, H.266/Versatile Video Coding (VVC), supports high-definition videos, with resolutions from 4 K to 8 K or even larger. It offers a higher compression ratio than its predecessor, H.265/High Efficiency Video Coding (HEVC). In addition to the quadtree partition structure of H.265/HEVC, the nested multi-type tree (MTT) structure of H.266/VVC provides more diverse splits through binary and ternary trees. It also includes many new coding tools, which tremendously increases the encoding complexity. This paper proposes a fast intra coding algorithm for H.266/VVC based on visual perception analysis. The algorithm applies the factor of average background luminance for just-noticeable-distortion to identify the visually distinguishable (VD) pixels within a coding unit (CU). We propose calculating the variances of the numbers of VD pixels in various MTT splits of a CU. Intra sub-partitions and matrix weighted intra prediction are turned off conditionally based on the variance of the four variances for MTT splits and a thresholding criterion. The fast horizontal/vertical splitting decisions for binary and ternary trees are proposed by utilizing random forest classifiers of machine learning techniques, which use the information of VD pixels and the quantization parameter. Experimental results show that the proposed algorithm achieves around 47.26% encoding time reduction with a Bjøntegaard Delta Bitrate (BDBR) of 1.535% on average under the All Intra configuration. Overall, this algorithm can significantly speed up H.266/VVC intra coding and outperform previous studies. Full article
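The visually distinguishable (VD) pixel statistic at the center of this algorithm can be sketched as follows. The piecewise JND curve below is a simplified, hypothetical luminance-adaptation shape standing in for the paper's just-noticeable-distortion factor, and the quad splitter in the usage note is only one example of an MTT split.

```python
import numpy as np

def jnd_threshold(bg_luma):
    """Toy luminance-adaptation JND: distortion tolerance is higher on
    dark backgrounds and grows again for bright ones (the exact curve
    here is a simplification, not the paper's model)."""
    if bg_luma < 60:
        return 8.0 - bg_luma / 12.0
    return 3.0 + (bg_luma - 60) / 40.0

def vd_pixel_count(cu):
    """Count visually distinguishable pixels: those deviating from the
    CU's average background luminance by more than the JND."""
    bg = cu.mean()
    return int(np.sum(np.abs(cu - bg) > jnd_threshold(bg)))

def split_variance(cu, splitter):
    """Variance of VD-pixel counts over the sub-blocks of one MTT split.
    Comparing these variances across candidate splits drives the early
    termination of ISP/MIP and the horizontal/vertical split decisions."""
    return float(np.var([vd_pixel_count(s) for s in splitter(cu)]))
```

A split whose sub-blocks have very uneven VD-pixel counts separates perceptually distinct regions, which is the signal the thresholding criterion and the random-forest classifiers exploit.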

19 pages, 1269 KiB  
Article
A Highly Pipelined and Highly Parallel VLSI Architecture of CABAC Encoder for UHDTV Applications
by Chen Fu, Heming Sun, Zhiqiang Zhang and Jinjia Zhou
Sensors 2023, 23(9), 4293; https://doi.org/10.3390/s23094293 - 26 Apr 2023
Cited by 2 | Viewed by 2541
Abstract
Recently, specifically designed video codecs have been preferred due to the expansion of video data in Internet of Things (IoT) devices. Context Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module widely used in recent video coding standards such as HEVC/H.265 and VVC/H.266. CABAC is a well-known throughput bottleneck due to its strong data dependencies. Because the required context model of the current bin often depends on the results of the previous bin, the context model cannot be prefetched early enough, which results in pipeline stalls. To solve this problem, we propose a prediction-based context model prefetching strategy that effectively eliminates the clock cycles the context model spends accessing data in memory. Moreover, we offer a multi-result context model update (MCMU) to reduce the critical path delay of context model updates in the multi-bin/clock architecture. Furthermore, we apply pre-range-update and pre-renormalization techniques to reduce the path delay of the multiplexed binary arithmetic encoder (BAE) caused by incomplete dependencies in the encoding process. To further speed up processing, we propose handling four regular bins and several bypass bins in parallel with a variable bypass bin incorporation (VBBI) technique. Finally, a quad-loop cache is developed to improve the compatibility of data interactions between the entropy encoder and the other video encoder modules. As a result, the pipeline architecture based on the context model prefetching strategy can remove up to 45.66% of the coding time caused by stalls of the regular bin, and the parallel architecture can save, on average, 29.25% of the coding time spent on model updates when the Quantization Parameter (QP) is equal to 22. At the same time, the throughput of our proposed parallel architecture reaches 2191 Mbin/s, which is sufficient to meet the requirements of 8 K Ultra High Definition Television (UHDTV). Additionally, the hardware efficiency (Mbins/s per k gates) of the proposed architecture is higher than that of existing advanced pipeline and parallel architectures. Full article

20 pages, 8135 KiB  
Article
Learning-Based Rate Control for High Efficiency Video Coding
by Sovann Chen, Supavadee Aramvith and Yoshikazu Miyanaga
Sensors 2023, 23(7), 3607; https://doi.org/10.3390/s23073607 - 30 Mar 2023
Cited by 4 | Viewed by 2384
Abstract
High efficiency video coding (HEVC) has dramatically enhanced coding efficiency compared to the previous video coding standard, H.264/AVC. However, the existing rate control updates its parameters according to a fixed initialization, which can cause errors in predicting the bit allocation to each coding tree unit (CTU) in a frame. This paper proposes a learning-based mapping between rate control parameters and video content to achieve an accurate target bit rate and good video quality. The proposed framework contains two main coding structures: spatial coding and temporal coding. We introduce an effective learning-based particle swarm optimization for spatial and temporal coding to determine the optimal parameters at the CTU level. For temporal coding at the picture level, we incorporate semantic residual information into the parameter updating process to regulate bit allocation accurately for the actual picture. Experimental results indicate that the proposed algorithm is effective for HEVC and outperforms the state-of-the-art rate control in the HEVC reference software (HM-16.10) by 0.19 dB on average and up to 0.41 dB for the low-delay P coding structure. Full article

17 pages, 12907 KiB  
Article
A Hardware-Friendly and High-Efficiency H.265/HEVC Encoder for Visual Sensor Networks
by Chi-Ting Ni, Ying-Chia Huang and Pei-Yin Chen
Sensors 2023, 23(5), 2625; https://doi.org/10.3390/s23052625 - 27 Feb 2023
Cited by 7 | Viewed by 3236
Abstract
Visual sensor networks (VSNs) have numerous applications in fields such as wildlife observation, object recognition, and smart homes. However, visual sensors generate vastly more data than scalar sensors. Storing and transmitting these data is challenging. High-efficiency video coding (HEVC/H.265) is a widely used video compression standard. Compared to H.264/AVC, HEVC reduces the bit rate by approximately 50% at the same video quality; it compresses visual data at a high compression ratio but incurs high computational complexity. In this study, we propose a hardware-friendly and high-efficiency H.265/HEVC accelerating algorithm to overcome this complexity for visual sensor networks. The proposed method leverages texture direction and complexity to skip redundant processing in CU partition and accelerate intra prediction for intra-frame encoding. Experimental results revealed that the proposed method could reduce encoding time by 45.33% and increase the Bjøntegaard delta bit rate (BDBR) by only 1.07% as compared to HM16.22 under all-intra configuration. Moreover, the proposed method reduced the encoding time for six visual sensor video sequences by 53.72%. These results confirm that the proposed method achieves high efficiency and a favorable balance between the BDBR and encoding time reduction. Full article
(This article belongs to the Section Sensor Networks)

23 pages, 3802 KiB  
Article
Time Delay Optimization of Compressing Shipborne Vision Sensor Video Based on Deep Learning
by Hongrui Lu, Yingjun Zhang and Zhuolin Wang
J. Mar. Sci. Eng. 2023, 11(1), 122; https://doi.org/10.3390/jmse11010122 - 6 Jan 2023
Cited by 4 | Viewed by 2907
Abstract
As the technology for offshore wireless transmission and collaborative innovation in unmanned ships continues to mature, research has been gradually carried out in various countries on methods of compressing and transmitting perceptual video while driving ships remotely. High Efficiency Video Coding (H.265/HEVC) has played an extremely important role in the fields of Unmanned Aerial Vehicles (UAVs) and autopilot, and as one of the most advanced coding schemes, its performance in compressing visual sensor video is excellent. According to the characteristics of shipborne vision sensor video (SVSV), optimizing the coding stages with high computational complexity is one of the important methods to improve video compression performance. Therefore, an efficient video coding technique is proposed to improve the efficiency of SVSV compression. To optimize the compression performance of SVSV, an intra-frame coding delay optimization algorithm is proposed in combination with deep learning methods; it works in the intra-frame predictive coding (PC) stage by predicting the Coding Unit (CU) division structure in advance. The experimental results show that the total compression time of the algorithm is reduced by about 45.49% on average compared with the official testbed HM16.17 for efficient video coding, while the Bjøntegaard Delta Bit Rate (BD-BR) increased by an average of 1.92%, and the Bjøntegaard Delta Peak Signal-to-Noise Ratio (BD-PSNR) decreased by an average of 0.14 dB. Full article
(This article belongs to the Section Ocean Engineering)

20 pages, 520 KiB  
Article
Detection of Double-Compressed Videos Using Descriptors of Video Encoders
by Yun Gu Lee, Gihyun Na and Junseok Byun
Sensors 2022, 22(23), 9291; https://doi.org/10.3390/s22239291 - 29 Nov 2022
Cited by 2 | Viewed by 2070
Abstract
In digital forensics, video becomes important evidence in an accident or a crime. However, video editing programs are easily available in the market, and even non-experts can delete or modify a section of an evidence video that contains adverse evidence. The tampered video is compressed again and stored. Therefore, detecting a double-compressed video is one of the important methods in the field of digital video tampering detection. In this paper, we present a new approach to detecting a double-compressed video using the proposed descriptors of video encoders. The implementation of real-time video encoders is so complex that manufacturers should develop hardware video encoders considering a trade-off between complexity and performance. According to our observation, hardware video encoders practically do not use all possible encoding modes defined in the video coding standard but only a subset of the encoding modes. The proposed method defines this subset of encoding modes as the descriptor of the video encoder. If a video is double-compressed, the descriptor of the double-compressed video is changed to the descriptor of the video encoder used for double-compression. Therefore, the proposed method detects the double-compressed video by checking whether the descriptor of the test video is changed or not. In our experiments, we show descriptors of various H.264 and High-Efficiency Video Coding (HEVC) video encoders and demonstrate that our proposed method successfully detects double-compressed videos in most cases. Full article
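The descriptor idea above reduces to set operations: collect the encoding modes an encoder actually emits, then test whether a suspect video's modes fit inside that set. A minimal sketch, with `extract_modes` as a hypothetical parser callback standing in for real bitstream analysis (and with the caveat that, as in the paper, re-encoding by an encoder with an identical descriptor would go undetected):

```python
def encoder_descriptor(streams, extract_modes):
    """Build an encoder's descriptor: the union of encoding modes it
    actually emits across sample bitstreams. Hardware encoders use
    only a subset of the modes the standard allows, which is what
    makes the descriptor discriminative."""
    desc = set()
    for s in streams:
        desc |= set(extract_modes(s))
    return desc

def is_double_compressed(test_stream, claimed_descriptor, extract_modes):
    """If the test video's modes are not a subset of the claimed source
    encoder's descriptor, the video was re-encoded by an encoder with a
    different descriptor, flagging possible tampering."""
    return not set(extract_modes(test_stream)) <= claimed_descriptor
```

In a forensic workflow the descriptor would be built once per camera or encoder model from reference recordings, and evidence videos checked against it.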

17 pages, 2073 KiB  
Article
A Study on Fast and Low-Complexity Algorithms for Versatile Video Coding
by Kiho Choi
Sensors 2022, 22(22), 8990; https://doi.org/10.3390/s22228990 - 20 Nov 2022
Cited by 9 | Viewed by 3536
Abstract
Versatile Video Coding (VVC)/H.266, completed in 2020, provides half the bitrate of the previous video coding standard (i.e., High-Efficiency Video Coding (HEVC)/H.265) while maintaining the same visual quality. The primary goal of VVC/H.266 is to achieve a compression capability that is noticeably better than that of HEVC/H.265, as well as the functionality to support a variety of applications with a single profile. Although VVC/H.266 has improved its coding performance by incorporating new advanced technologies with flexible partitioning, the increased encoding complexity has become a challenging issue in practical market usage. To address the complexity issue of VVC/H.266, significant efforts have been expended to develop practical methods for reducing the encoding and decoding processes of VVC/H.266. In this study, we provide an overview of the VVC/H.266 standard, compare it with previous video coding standards, and examine a key challenge for VVC/H.266 coding. Furthermore, we survey and present recent technical advances in fast and low-complexity VVC/H.266, focusing on key technical areas. Full article
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor II)
