
Search Results (185)

Search Parameters:
Keywords = bitrate

17 pages, 1788 KiB  
Article
Detection of Double Compression in HEVC Videos Containing B-Frames
by Yoshihisa Furushita, Daniele Baracchi, Marco Fontani, Dasara Shullani and Alessandro Piva
J. Imaging 2025, 11(7), 211; https://doi.org/10.3390/jimaging11070211 - 27 Jun 2025
Abstract
This study proposes a method to detect double compression in H.265/HEVC videos containing B-frames, a scenario underexplored in previous research. The method extracts frame-level encoding features—including frame type, coding unit (CU) size, quantization parameter (QP), and prediction modes—and represents each video as a 28-dimensional feature vector. A bidirectional Long Short-Term Memory (Bi-LSTM) classifier is then trained to model temporal inconsistencies introduced during recompression. To evaluate the method, we created a dataset of 129 HEVC-encoded YUV videos derived from 43 original sequences, covering various bitrate combinations and GOP structures. The proposed method achieved a detection accuracy of 80.06%, outperforming two existing baselines. These results demonstrate the practical applicability of the proposed approach in realistic double compression scenarios. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
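The pipeline this abstract describes (per-frame coding features collected into a fixed-length vector, then fed as a temporal sequence to a Bi-LSTM) can be sketched in outline. The abstract does not enumerate the 28 features, so the fields and layout below are illustrative assumptions, not the paper's actual feature set:

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    frame_type: str     # "I", "P", or "B"
    qp: float           # average quantization parameter for the frame
    avg_cu_size: float  # mean coding-unit size
    intra_ratio: float  # fraction of intra-predicted CUs

def frame_feature_vector(f: FrameStats) -> list[float]:
    """One-hot frame type plus scalar coding statistics.
    Illustrative subset only -- not the paper's 28-dimensional layout."""
    one_hot = [float(f.frame_type == t) for t in ("I", "P", "B")]
    return one_hot + [f.qp, f.avg_cu_size, f.intra_ratio]

def video_to_sequence(frames: list[FrameStats]) -> list[list[float]]:
    """Per-frame vectors form the temporal sequence a Bi-LSTM classifier
    would consume to spot recompression inconsistencies."""
    return [frame_feature_vector(f) for f in frames]
```

The sequence-model step itself (the Bi-LSTM) is omitted here; any recurrent classifier over these per-frame vectors would fill that slot.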

24 pages, 19576 KiB  
Article
Evaluating HAS and Low-Latency Streaming Algorithms for Enhanced QoE
by Syed Uddin, Michał Grega, Mikołaj Leszczuk and Waqas ur Rahman
Electronics 2025, 14(13), 2587; https://doi.org/10.3390/electronics14132587 - 26 Jun 2025
Abstract
The demand for multimedia traffic over the Internet is growing exponentially. HTTP adaptive streaming (HAS) is the leading video delivery system that delivers high-quality video to the end user. The adaptive bitrate (ABR) algorithms running on the HTTP client select the highest feasible video quality by adjusting the quality according to fluctuating network conditions. Recently, low-latency ABR algorithms have been introduced to reduce the end-to-end latency commonly experienced in HAS. However, comprehensive study of low-latency algorithms remains limited. This paper investigates the effectiveness of low-latency streaming algorithms in maintaining a high quality of experience (QoE) while minimizing playback delay. We evaluate these algorithms in the context of both Dynamic Adaptive Streaming over HTTP (DASH) and the Common Media Application Format (CMAF), with a particular focus on the impact of chunked encoding and transfer mechanisms on the QoE. We perform both objective and subjective evaluations of low-latency algorithms and compare their performance with traditional DASH-based ABR algorithms across multiple QoE metrics, various network conditions, and diverse content types. The results demonstrate that low-latency algorithms consistently deliver high video quality across various content types and network conditions, whereas traditional ABR algorithms exhibit performance variability under fluctuating network conditions and diverse content characteristics. Although traditional ABR algorithms download higher-quality segments in stable network environments, their effectiveness declines significantly under unstable conditions. Furthermore, the low-latency algorithms maintained a high user experience regardless of segment duration, whereas the performance of traditional algorithms varied significantly with changes in segment duration. In summary, the results underscore that no single algorithm consistently achieves optimal performance across all experimental conditions. Performance varies with network stability, content characteristics, and segment duration, highlighting the need for adaptive strategies that can respond dynamically to varying streaming environments. Full article
(This article belongs to the Special Issue Video Streaming Service Solutions)
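The traditional ABR family this study compares against is often built on throughput-based rate selection; a minimal sketch of that baseline idea (the bitrate ladder and safety margin are illustrative values, not taken from the paper):

```python
def select_bitrate(ladder_kbps, measured_kbps, safety=0.8):
    """Throughput-based ABR baseline: pick the highest ladder rung that
    fits within a safety fraction of the measured throughput; fall back
    to the lowest rung when none fits."""
    budget = measured_kbps * safety
    feasible = [r for r in sorted(ladder_kbps) if r <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)
```

Low-latency algorithms layer chunked (CMAF) transfer and faster bandwidth estimation on top of this basic decision loop.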

32 pages, 4311 KiB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user's viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the introduction of residual connections reinforces the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement. Full article
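PSNR, the quality metric this abstract reports, is a direct function of mean squared error; a minimal 8-bit implementation for reference:

```python
import math

def psnr(orig, recon, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for two equally sized 8-bit
    signals (flattened pixel lists)."""
    assert len(orig) == len(recon) and orig
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return math.inf  # identical signals: distortion-free
    return 10.0 * math.log10(peak * peak / mse)
```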

18 pages, 15092 KiB  
Article
Ultra-Low Bitrate Predictive Portrait Video Compression with Diffusion Models
by Xinyi Chen, Weimin Lei, Wei Zhang, Yanwen Wang and Mingxin Liu
Symmetry 2025, 17(6), 913; https://doi.org/10.3390/sym17060913 - 10 Jun 2025
Abstract
Deep neural video compression codecs have shown great promise in recent years. However, considerable challenges remain for ultra-low bitrate video coding. Inspired by recent attempts to apply diffusion models to image and video compression, we leverage diffusion models for ultra-low bitrate portrait video compression. In this paper, we propose a predictive portrait video compression method that leverages the temporal prediction capabilities of diffusion models. Specifically, we develop a temporal diffusion predictor based on a conditional latent diffusion model, with the predicted results serving as decoded frames. We symmetrically integrate a temporal diffusion predictor at the encoding and decoding sides. When the perceptual quality of the predicted results at the encoding end falls below a predefined threshold, a new frame sequence is employed for prediction; meanwhile, the predictor at the decoding side directly generates predicted frames as reconstructions based on the evaluation results. This symmetry ensures that the prediction frames generated at the decoding end are consistent with those at the encoding end. We also design an adaptive coding strategy that incorporates frame quality assessment and adaptive keyframe control. To ensure consistent quality of subsequent predicted frames and achieve high perceptual reconstruction, this strategy dynamically evaluates the visual quality of the predicted results during encoding, retains the predicted frames that meet the quality threshold, and adaptively adjusts the length of the keyframe sequence based on motion complexity. The experimental results demonstrate that, compared with traditional video codecs and other popular methods, the proposed scheme provides superior compression performance at ultra-low bitrates while maintaining competitive visual quality, achieving more than 24% bitrate savings compared with VVC in terms of perceptual distortion. Full article
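The encoder-side control loop described here (keep predicted frames while their quality clears a threshold; otherwise restart a keyframe sequence) can be sketched. The quality scores and threshold below are illustrative stand-ins for the paper's perceptual metric:

```python
def keyframe_positions(qualities, threshold=0.9):
    """Return indices where a keyframe sequence starts: frame 0 is always
    a keyframe, and any predicted frame whose quality falls below the
    threshold restarts a sequence (a simplified model of the paper's
    adaptive keyframe control)."""
    keyframes = [0]
    for i, q in enumerate(qualities[1:], start=1):
        if q < threshold:
            keyframes.append(i)
    return keyframes
```

In the paper the sequence length additionally adapts to motion complexity; this sketch only captures the quality-threshold trigger.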

18 pages, 780 KiB  
Article
Reinforcement Learning-Based Layered Lossy Image Semantic Coding
by Jun Yan, Youfang Wang, Yongyi Miao and Die Hu
Electronics 2025, 14(10), 1986; https://doi.org/10.3390/electronics14101986 - 13 May 2025
Abstract
Semantic communication has garnered increasing attention due to its ability to shift the focus from pixel-level transmission to the transfer of fundamental semantic information representing the core content of images. This paper proposes a novel reinforcement learning-based layered semantic coding (RL-SC) method aimed at optimizing image communication systems under bandwidth constraints. By leveraging deep reinforcement learning (DRL), the proposed method efficiently allocates semantic bits to maximize semantic fidelity while minimizing bitrates. By incorporating semantic segmentation and image reconstruction networks, the framework utilizes both semantic maps and residual information to enhance the image coding process. Experiments demonstrate that the proposed method outperforms traditional image compression techniques and layered image encoding methods without reinforcement learning in preserving semantic content and achieving high-quality image reconstruction. In particular, the proposed method excels in low-bitrate scenarios, effectively maintaining both semantic accuracy and perceptual quality. Full article
(This article belongs to the Section Microwave and Wireless Communications)

31 pages, 1200 KiB  
Article
Power-Efficient UAV Positioning and Resource Allocation in UAV-Assisted Wireless Networks for Video Streaming with Fairness Consideration
by Zaheer Ahmed, Ayaz Ahmad, Muhammad Altaf and Mohammed Ahmed Hassan
Drones 2025, 9(5), 356; https://doi.org/10.3390/drones9050356 - 7 May 2025
Abstract
This work proposes a power-efficient framework for adaptive video streaming in UAV-assisted wireless networks specially designed for disaster-hit areas where existing base stations are nonfunctional. Delivering high-quality videos requires higher video rates and more resources, which leads to increased power consumption. With the increasing demand for mobile video, efficient bandwidth allocation becomes essential. In shared networks, users with lower bitrates experience poor video quality when high-bitrate users occupy most of the bandwidth, leading to a degraded and unfair user experience. Additionally, frequent video rate switching can significantly impact user experience, making smooth transitions between video rates essential. The aim of this research is to maximize the overall users' quality of experience in terms of power-efficient adaptive video streaming by fair distribution and smooth transition of video rates. The joint optimization includes power minimization, efficient resource allocation (i.e., transmit power and bandwidth), and efficient two-dimensional positioning of the UAV while meeting system constraints. The formulated problem is non-convex and difficult to solve with conventional methods. Therefore, to avoid the curse of complexity, the block coordinate descent method, successive convex approximation technique, and an efficient iterative algorithm are applied. Extensive simulations are performed to verify the effectiveness of the proposed solution method. The simulation results reveal that the proposed method outperforms equal allocation by 95–97%, random allocation by 77–89%, and joint allocation schemes by 17–40%. Full article

12 pages, 1671 KiB  
Article
Crosstalk Suppression in Multi-Core Fiber Through Modulation of the Refractive Index
by Er’el Granot
Fibers 2025, 13(5), 56; https://doi.org/10.3390/fib13050056 - 3 May 2025
Abstract
One promising method to increase the bit-rate capacity of optical fibers is the use of Multi-Core Fibers (MCFs). However, the close proximity of the cores can lead to data interference due to crosstalk between them. A novel approach is proposed to suppress crosstalk in MCFs. It is demonstrated that if the refractive index of the cores is weakly modulated harmonically, with each core having a different phase, crosstalk in two-core and three-core fibers can be entirely eliminated. Furthermore, by using specific configurations—either by selecting the fiber length or by arranging the cores’ spatial layout—crosstalk can be suppressed even in fibers with more than three cores. Full article
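For reference, the standard two-core coupled-mode model that such an analysis starts from, with the paper's phase-shifted index modulation entering as a z-dependent detuning; this generic form is an assumption sketched from the abstract, not the paper's exact equations:

```latex
\frac{dA_1}{dz} = i\,\delta\cos(\Omega z + \phi_1)\,A_1 + i\,\kappa A_2,
\qquad
\frac{dA_2}{dz} = i\,\delta\cos(\Omega z + \phi_2)\,A_2 + i\,\kappa A_1,
```

where $A_{1,2}$ are the modal amplitudes in the two cores, $\kappa$ the inter-core coupling coefficient, $\delta$ the (weak) modulation depth of the propagation constant, $\Omega$ its spatial frequency, and $\phi_{1,2}$ the per-core modulation phases. Crosstalk suppression then corresponds to choosing the phase difference $\phi_2 - \phi_1$ so that the effective coupling averages to zero along the fiber.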

17 pages, 5607 KiB  
Article
Tampering Detection in Absolute Moment Block Truncation Coding (AMBTC) Compressed Code Using Matrix Coding
by Yijie Lin, Ching-Chun Chang and Chin-Chen Chang
Electronics 2025, 14(9), 1831; https://doi.org/10.3390/electronics14091831 - 29 Apr 2025
Abstract
With the increasing use of digital image compression technology, ensuring data integrity and security within the compression domain has become a crucial area of research. Absolute moment block truncation coding (AMBTC), an efficient lossy compression algorithm, is widely used for low-bitrate image storage and transmission. However, existing studies have primarily focused on tamper detection for AMBTC compressed images, often overlooking the integrity of the AMBTC compressed code itself. To address this gap, this paper introduces a novel anti-tampering scheme specifically designed for AMBTC compressed code. The proposed scheme utilizes shuffle pairing to establish a one-to-one relationship between image blocks. The hash value, calculated as verification data from the original data of each block, is then embedded into the bitmap of its corresponding block using the matrix coding algorithm. Additionally, a tampering localization mechanism is incorporated to enhance the security of the compressed code without introducing additional redundancy. The experimental results demonstrate that the proposed scheme effectively detects tampering with high accuracy, providing protection for AMBTC compressed code. Full article
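Matrix coding, the embedding primitive named in this abstract, hides k payload bits in 2^k − 1 cover bits while flipping at most one of them. A minimal (1, 7, 3) sketch of the idea — the paper's actual group size, pairing scheme, and hash embedding layout are not specified in the abstract:

```python
def syndrome(bits):
    """Hamming-style syndrome of a 7-bit group: XOR of the (1-indexed)
    positions holding a 1, yielding 3 extractable payload bits."""
    s = 0
    for i, b in enumerate(bits, start=1):
        if b:
            s ^= i
    return s

def matrix_embed(cover7, payload3):
    """Embed a 3-bit value into a 7-bit cover group, flipping at most
    one cover bit (classic (1, 7, 3) matrix coding)."""
    flip = syndrome(cover7) ^ payload3
    out = list(cover7)
    if flip:  # flip == 0 means the payload already matches the syndrome
        out[flip - 1] ^= 1
    return out
```

Extraction is just `syndrome(stego_bits)`, which is why the scheme adds no redundancy to the compressed code: the verification bits ride on the bitmap itself.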

35 pages, 44299 KiB  
Article
Lossy Infrared Image Compression Based on Wavelet Coefficient Probability Modeling and Run-Length-Enhanced Huffman Coding
by Yaohua Zhu, Ya Liu, Yanghang Zhu, Mingsheng Huang, Jingyu Jiang and Yong Zhang
Sensors 2025, 25(8), 2491; https://doi.org/10.3390/s25082491 - 15 Apr 2025
Abstract
Infrared line-scanning images have high redundancy and large file sizes. In JPEG2000 compression, the MQ arithmetic encoder’s complexity slows down processing. Huffman coding can achieve O(1) complexity based on a code table, but its integer-bit encoding mechanism and ignorance of the continuity of symbol distribution result in suboptimal compression performance. In particular, when encoding sparse quantized wavelet coefficients that contain a large number of consecutive zeros, the inaccuracy of the one-bit shortest code accumulates, reducing compression efficiency. To address this, this paper proposes Huf-RLC, a Huffman-based method enhanced with Run-Length Coding. By leveraging zero-run continuity, Huf-RLC optimizes the shortest code encoding, reducing the average code length to below one bit in sparse distributions. Additionally, this paper proposes a wavelet coefficient probability model to avoid the complexity of calculating statistics for constructing Huffman code tables for different wavelet subbands. Furthermore, Differential Pulse Code Modulation (DPCM) is introduced to address the remaining spatial redundancy in the low-frequency wavelet subband. The experimental results indicate that the proposed method outperforms JPEG in terms of PSNR and SSIM, while maintaining minimal performance loss compared to JPEG2000. Particularly at low bitrates, the proposed method shows only a small gap with JPEG2000, while JPEG suffers from significant blocking artifacts. Additionally, the proposed method achieves compression speeds 3.155 times faster than JPEG2000 and 2.049 times faster than JPEG. Full article
(This article belongs to the Section Sensing and Imaging)
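The core Huf-RLC idea — give a whole zero run one code word so the average cost per coefficient can drop below one bit — can be sketched with a zero-run tokenizer plus a stdlib Huffman length computation. The token format and run cap are illustrative assumptions:

```python
import heapq
from collections import Counter

def tokenize_zero_runs(coeffs, max_run=16):
    """Replace runs of zeros in quantized wavelet coefficients with
    ('Z', run_length) tokens so one code word covers many zeros."""
    tokens, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
            if run == max_run:
                tokens.append(("Z", run))
                run = 0
        else:
            if run:
                tokens.append(("Z", run))
                run = 0
            tokens.append(("V", c))
    if run:
        tokens.append(("Z", run))
    return tokens

def huffman_lengths(tokens):
    """Huffman code lengths from token frequencies (heap merge; the
    unique integer tiebreaker keeps tuple comparison well-defined)."""
    freq = Counter(tokens)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, counter, merged))
        counter += 1
    return heap[0][2]
```

The paper additionally derives the code tables from a wavelet coefficient probability model rather than per-image statistics; this sketch only shows why run tokens beat per-zero code words on sparse subbands.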

19 pages, 5667 KiB  
Article
Content-Symmetrical Multidimensional Transpose of Image Sequences for the High Efficiency Video Coding (HEVC) All-Intra Configuration
by Tamer Shanableh
Symmetry 2025, 17(4), 598; https://doi.org/10.3390/sym17040598 - 15 Apr 2025
Abstract
Enhancing the quality of video coding whilst maintaining compliance with the syntax of video coding standards is challenging. In the literature, many solutions have been proposed that apply mainly to two-pass encoding, bitrate control algorithms, and enhancements of locally decoded images in the motion-compensation loop. This work proposes a pre- and post-coding solution using the content-symmetrical multidimensional transpose of raw video sequences. The content-symmetrical multidimensional transpose results in images composed of slices of the temporal domain whilst preserving the video content. Such slices have higher spatial homogeneity at the expense of reduced temporal resemblance. As such, an all-intra configuration is an excellent choice for compressing such images. Prior to displaying the decoded images, a content-symmetrical multidimensional transpose is applied again to restore the original form of the input images. Moreover, we propose a lightweight two-pass encoding solution in which we apply systematic temporal subsampling on the multidimensional transposed image sequences prior to the first-pass encoding. This noticeably reduces the complexity of the first encoding pass and indicates whether the proposed solution suits the video sequence at hand. Using the HEVC video codec, the experimental results revealed that the proposed solution yields a lower percentage of coding unit splits in comparison to regular HEVC coding without the multidimensional transpose of image sequences. This finding supports the claim of increased spatial coherence resulting from the proposed solution. Additionally, using four quantization parameters, and in comparison to regular HEVC encoding, the resulting BD rate is −15.12%, which indicates a noticeable bitrate reduction. The BD-PSNR, on the other hand, was 1.62 dB, indicating an enhancement in the quality of the decoded images. Despite these benefits, the proposed solution has limitations, which are also discussed in the paper. Full article
(This article belongs to the Section Computer)
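The transpose at the heart of this method swaps a spatial axis with the temporal axis, so each output image is a temporal slice; applying the same swap again restores the original sequence (the "content-symmetrical" property). The exact axis permutation is not specified in the abstract, so this time-row swap is a representative assumption:

```python
def temporal_transpose(frames):
    """Swap the time and row axes of a video tensor frames[t][y][x],
    producing images whose rows are temporal slices. Applying the same
    swap twice is the identity, which makes the pre/post pair lossless."""
    T, H = len(frames), len(frames[0])
    return [[frames[t][y] for t in range(T)] for y in range(H)]
```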

17 pages, 2527 KiB  
Article
Three-Stage Multi-Frame Multi-Channel In-Loop Filter of VVC
by Si Li, Honggang Qi, Yundong Zhang and Guoqin Cui
Electronics 2025, 14(5), 1033; https://doi.org/10.3390/electronics14051033 - 5 Mar 2025
Abstract
For the Versatile Video Coding (VVC) standard, extensive research has been conducted on in-loop filtering to improve encoding efficiency. However, most methods use only spatial characteristics without exploiting the content correlation across multiple frames or fully utilizing the inter-channel relational information. In this paper, we introduce a novel three-stage Multi-frame Multi-channel In-loop Filtering (3-MMIF) method for VVC that improves the quality of each encoded frame by harnessing the correlations between adjacent frames and channels. Firstly, we establish a comprehensive database containing pairs of encoded and original frames across various scenes. Then, we select the nearest frames in the decode buffer as the reference frames for enhancing the quality of the current frame. Subsequently, we propose a three-stage in-loop filtering method that leverages spatio-temporal and inter-channel correlations. The three-stage method is grounded in the recently developed Residual Dense Network, benefiting from its enhanced generalization ability and feature reuse mechanism. Experimental results demonstrate that our 3-MMIF method, with the encoder’s standard filter tools activated, achieves 2.78%/4.87%/5.13% Bjøntegaard delta bit-rate (BD-Rate) reductions for the Y, U, and V channels over the VVC 17.0 codec for random access configuration on the standard test set, outperforming other VVC in-loop filter methods. Full article
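The BD-Rate figures reported here come from Bjøntegaard delta analysis: integrating the gap between two rate-distortion curves in the log-rate domain over the overlapping PSNR range. A simplified pure-Python approximation (the standard tool fits a cubic polynomial; this sketch uses piecewise-linear interpolation, which is an assumption, and illustrative sample points):

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])

def bd_rate(ref, test, samples=1000):
    """Approximate Bjontegaard delta rate (percent) from (kbps, psnr)
    points: average log-rate difference over the shared PSNR range,
    converted back to a percentage. Negative means bitrate savings."""
    r_psnr, r_log = zip(*sorted((p, math.log10(r)) for r, p in ref))
    t_psnr, t_log = zip(*sorted((p, math.log10(r)) for r, p in test))
    lo, hi = max(r_psnr[0], t_psnr[0]), min(r_psnr[-1], t_psnr[-1])
    diffs = []
    for i in range(samples + 1):
        p = lo + (hi - lo) * i / samples
        diffs.append(_interp(p, t_psnr, t_log) - _interp(p, r_psnr, r_log))
    avg = sum(diffs) / len(diffs)
    return (10 ** avg - 1) * 100.0
```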

18 pages, 13263 KiB  
Article
Efficient Large-Scale Point Cloud Geometry Compression
by Shiyu Lu, Cheng Han and Huamin Yang
Sensors 2025, 25(5), 1325; https://doi.org/10.3390/s25051325 - 21 Feb 2025
Abstract
Due to the significant bandwidth and memory requirements for transmitting and storing large-scale point clouds, considerable progress has been made in recent years in the field of large-scale point cloud geometry compression. However, challenges remain, including suboptimal compression performance and complex encoding–decoding processes. To address these issues, we propose an efficient large-scale scene point cloud geometry compression algorithm. By analyzing the sparsity of large-scale point clouds and the impact of scale on feature extraction, we design a cross-attention module in the encoder to enhance the extracted features by incorporating positional information. During decoding, we introduce an efficient generation module that improves decoding quality without increasing decoding time. Experiments on three public datasets demonstrate that, compared to the state-of-the-art G-PCC v23, our method achieves an average bitrate reduction of 46.64%, the fastest decoding time, and a minimal network model size of 2.8 M. Full article

19 pages, 2254 KiB  
Article
Hierarchical Reinforcement Learning-Based Adaptive Initial QP Selection and Rate Control for H.266/VVC
by Shuqian He, Biao Jin, Shangneng Tian, Jiayu Liu, Zhengjie Deng and Chun Shi
Electronics 2024, 13(24), 5028; https://doi.org/10.3390/electronics13245028 - 20 Dec 2024
Abstract
In video encoding rate control, adaptive selection of the initial quantization parameter (QP) is a critical factor affecting both encoding quality and rate control precision. Due to the diversity of video content and the dynamic nature of network conditions, accurately and efficiently determining the initial QP remains a significant challenge. The optimal setting of the initial QP not only influences bitrate allocation strategies but also impacts the encoding efficiency and output quality of the encoder. To address this issue in the H.266/VVC standard, this paper proposes a novel hierarchical reinforcement learning-based method for adaptive initial QP selection. The proposed method introduces a hierarchical reinforcement learning framework that decomposes the initial QP selection task into high-level and low-level strategies, handling coarse-grained and fine-grained QP decisions, respectively. The high-level strategy quickly determines a rough QP range based on global video features and network conditions, while the low-level strategy refines the specific QP value within this range to enhance decision accuracy. This framework integrates spatiotemporal video complexity, network conditions, and rate control objectives to form an optimized model for adaptive initial QP selection. Experimental results demonstrate that the proposed method significantly improves encoding quality and rate control accuracy compared to traditional methods, confirming its effectiveness in handling complex video content and dynamic network environments. Full article
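The coarse-then-fine decomposition this abstract describes can be sketched as two cooperating policies: a high-level one that maps global features to a QP window, and a low-level one that refines within it. The thresholds and the bit-cost model below are illustrative assumptions, not the paper's learned policies:

```python
def coarse_qp_range(complexity, bandwidth_kbps):
    """High-level policy sketch: map global video complexity (0..1) and
    available bandwidth to a coarse QP window. Thresholds are illustrative."""
    base = 22 if bandwidth_kbps > 4000 else 32
    if complexity > 0.7:
        base += 5  # complex content needs coarser quantization
    return (base, base + 8)

def refine_qp(qp_range, target_bits, bits_for_qp):
    """Low-level policy sketch: within the window, pick the QP whose
    predicted bit cost is closest to the rate-control target."""
    lo, hi = qp_range
    return min(range(lo, hi + 1), key=lambda q: abs(bits_for_qp(q) - target_bits))
```

In the paper both levels are reinforcement-learning agents trained jointly; this sketch only shows the hierarchical division of the decision.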

16 pages, 433 KiB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest standard in universal video coding, H.266/VVC, features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. In view of the aforementioned considerations, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block's layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. In conclusion, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. In comparison, the top-two configuration yields a higher complexity reduction of 63.78%, accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article

17 pages, 771 KiB  
Article
PaCs: Playing Time-Aware Chunk Selection in Short Video Preloading
by Sen Fu, Guanyu Yang, Zhengjun Yao, Shuxin Tan, Yongxin Shan and Wanchun Jiang
Electronics 2024, 13(24), 4864; https://doi.org/10.3390/electronics13244864 - 10 Dec 2024
Abstract
Short video applications are popular nowadays. Typically, a short video may be played for a few seconds or swiped away by users at any time. Due to this uncertain user behavior, video chunks should be preloaded to ensure a smooth viewing process for users. In other words, the short video preloading scheme is crucial to the quality of experience (QoE) of users and revenue. Specifically, the short video preloading scheme should determine which video chunk to download, when to download it, and at what bitrate. Existing schemes either fail to consider all these factors together or find it hard to make the best decision. In this work, we argue that the selection of downloaded video chunks is the foremost task due to the uncertain user behaviors and the corresponding huge number of possible playing sequences of video chunks. As a step further, we propose the playing time-aware chunk selection (PaCs) scheme, which downloads the video chunk with the smallest expected playing time first. After the selection of the video chunk, the bitrate is selected according to the classic MPC algorithm and then whether the downloading process is paused or executed is discussed. In total, PaCs can improve the score consisting of the QoE of downloaded video chunks and the utilized network bandwidth under different conditions. Simulations confirm that PaCs achieves a higher score than the existing scheme Dashlet proposed in NSDI 2024. Full article
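The "smallest expected playing time first" rule can be sketched as a scoring pass over candidate chunks. The field names and the reach-probability surrogate below are illustrative assumptions; the paper computes a proper expectation over possible playing sequences:

```python
def rank_chunks(chunks):
    """Order candidate chunks for download: the earliest moment a chunk
    could start playing, divided by the probability the user's swiping
    behavior ever reaches it (likely-soon chunks come first)."""
    return sorted(chunks, key=lambda c: c["start_s"] / c["reach_p"])
```

After this selection step, PaCs picks the bitrate with an MPC-style controller and decides whether to pause or continue downloading, as the abstract describes.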
