## 1. Introduction

Vehicular ad-hoc networks (VANETs) can provide multimedia communication between vehicles with the aim of providing efficient and safe transportation [1]. Vehicles equipped with different sensors can exchange and share information for safe braking, localization, and obstacle avoidance. Moreover, sharing live video of a traffic accident can improve rescue efficiency and alleviate traffic jams. However, video transmission is considered a challenging task for VANETs, because it significantly increases the bandwidth demand [2]. This work focuses on the development of a video codec that supports real-time video transmission over VANETs for road safety applications.

The demanding challenges of VANETs are bandwidth limitations and opportunities, connectivity, mobility, and high loss rates [3]. Because video data in road safety applications is resource-demanding, limited bandwidth is the bottleneck for real-time video transmission over VANETs [4,5]. Moreover, due to the limited battery lifetime of vehicle nodes, video delivery over VANETs remains extremely challenging. A low-complexity video encoder can accelerate real-time video transmission and achieve low delay for video streaming. Hence, in order to transmit real-time video under a bandwidth constraint, it is vital to develop an efficient video encoder. A video encoder with high encoding efficiency and low encoding complexity is the core requirement of VANETs [6].

Currently, the main video codecs are High Efficiency Video Coding (HEVC, or H.265), developed by the Joint Collaborative Team on Video Coding (JCT-VC), and AV1, developed by the Alliance for Open Media (AOMedia) [7]. Nevertheless, both codecs face several challenges. AV1 is a newer codec that is royalty-free and open source. However, a hardware implementation of the AV1 encoder will take a long time. H.265/HEVC is the state-of-the-art standardized video codec [8]. Compared with H.265/HEVC, AV1 increases the complexity significantly without achieving an increase in coding efficiency. When supporting the most widely available (resource-limited/mobile) devices, or when real-time, low-latency encoding is needed, it is better to stick to H.265/HEVC. However, the encoding complexity of the H.265/HEVC encoder increases dramatically due to its recursive quadtree representation [9]. Although excellent previous works have been proposed for reducing H.265/HEVC encoder complexity [10,11,12,13,14], most of them fail to balance encoding complexity and encoding efficiency.

To address this issue, spatial and temporal information is widely used to reduce the computational redundancy of the H.265/HEVC encoder. However, the spatiotemporal correlation between the coding unit and its neighboring coding units has not been fully exploited. To the best of the authors’ knowledge, complexity reduction in real-time coding with the computational power available at the VANET nodes has not been well studied, especially from the viewpoint of hardware implementation. The key contributions of this work are summarized as follows:

We propose a low-complexity and hardware-friendly H.265/HEVC encoder. The proposed encoder reduces the encoding complexity significantly, so that the low delay requirements for video transmission in power-limited VANET nodes are satisfied.

A novel spatiotemporal neighboring set is used to predict the depth range of the current coding tree unit. The prior probability of coding unit splitting or non-splitting is calculated with the spatiotemporal neighboring set. Moreover, Bayes’ rule and a Gibbs Random Field (GRF) are used to reduce the encoding complexity of the H.265/HEVC encoder by combining the coding tree unit depth decision with the prediction unit mode decision.

The proposed algorithm can balance the encoding complexity and encoding efficiency successfully. The encoding time can be reduced by 50% with negligible encoding efficiency loss, and the proposed encoder is suitable for real-time video applications.

The rest of this paper is organized as follows. Related works are reviewed in Section 2. Section 3 discusses background details. In Section 4, the fast CU decision algorithm is presented. Simulation results are discussed in Section 5. Section 6 concludes this work.

## 4. The Proposed Low-Complexity and Hardware-Friendly H.265/HEVC Encoder for VANETs

#### 4.1. The Novel Spatiotemporal Neighborhood Set

Object motion in video sequences is regular, and there is some continuity in depth between adjacent CUs. If the depth range of the current CU can be inferred from the encoded neighboring CUs, some hierarchical partitioning can be directly skipped or terminated, which significantly reduces the computational complexity.

In order to utilize the spatiotemporal correlation, the four-neighbor set $G$ is defined as

$$G=\{C{U}_{L},C{U}_{TL},C{U}_{TR},C{U}_{CO}\}$$

Set $G$ is shown in Figure 4, where $C{U}_{L}$, $C{U}_{TL}$, $C{U}_{TR}$, and $C{U}_{CO}$ denote the left, top-left, top-right, and collocated CUs of the current CU, respectively.

#### 4.2. CTU Depth Decision

In video compression, a smooth coding block usually has a smaller CU depth, whereas a larger depth value is suitable for a complex area. Previous works show that object motion in the same frame remains directional, and that the motion and texture of neighboring CUs are similar. In this work, the depths of the neighboring CTUs in the set $G$ are used to predict the depth range of the current CTU, and the predicted depth of the current CTU is calculated as

$${\widehat{Dep}}_{CTU}=\sum_{k\in G}{\theta}_{k}\cdot De{p}_{k}$$

where $k$ is the index of a neighboring CTU in set $G$, $De{p}_{k}$ is the depth of that neighboring CTU, and ${\theta}_{k}$ is the weight factor of its depth. In the H.265/HEVC standard, the CTU depth takes the values 0, 1, 2, and 3. Hence, the calculated depth of the current CTU (${\widehat{Dep}}_{CTU}$) satisfies

$$0\le {\widehat{Dep}}_{CTU}\le 3$$

In Equation (3), $De{p}_{k}\le 3$. Therefore, the weight factor ${\theta}_{k}$ satisfies

$$\sum_{k\in G}{\theta}_{k}\le 1$$

If the depth of the current CTU ranges over 0, 1, 2, and 3, then the sum of the weight factors ${\theta}_{k}$ is 1 in this work. Moreover, Zhang’s work confirms that, when the weight factor of the spatial neighboring CTUs’ depth is larger than the weight factor of the temporal neighboring CTU’s depth, the calculated CTU depth is closer to the actual depth of the current CTU [29]. In this work, the weight factors of the spatial neighboring CTUs’ depths are equal, and each is larger than the weight factor of the temporal CTU’s depth. Then ${\theta}_{k}$ satisfies

$${\theta}_{L}={\theta}_{TL}={\theta}_{TR}>{\theta}_{CO},\qquad \sum_{k\in G}{\theta}_{k}=1$$

However, the calculated value of ${\widehat{Dep}}_{CTU}$ is a non-integer most of the time, so it is not suitable to use it directly as the depth of the current CTU. Therefore, the rule for the CTU depth range is formulated in Table 1, and the depth range of the current CTU is generated from the value of ${\widehat{Dep}}_{CTU}$.

According to the predicted depth, each CTU is classified into one of three types: $T1$, $T2$, $T3$. The CTU depth range is then decided from Table 1. The relations between CTU type, ${\widehat{Dep}}_{CTU}$, and CTU depth range are as follows.

- (1)
When the predicted depth of the current CTU satisfies ${\widehat{Dep}}_{CTU}$ ≤ 1.5, the motions of the neighboring CTUs are smooth and their depths are small. The current CTU belongs to the still or homogeneous motion region and is classified as type $T1$. In this case, the minimum depth of the current CTU $De{p}_{min}$ is equal to 0, and the maximum depth $De{p}_{max}$ is equal to 2.

- (2)
When the predicted depth of the current CTU satisfies 1.5 < ${\widehat{Dep}}_{CTU}$ ≤ 2.5, the depths of the neighboring CTUs are moderate. The current CTU belongs to the moderate motion region and is classified as type $T2$. In this case, the minimum depth of the current CTU $De{p}_{min}$ is equal to 1, and the maximum depth $De{p}_{max}$ is equal to 3.

- (3)
When the predicted depth of the current CTU satisfies 2.5 < ${\widehat{Dep}}_{CTU}$ ≤ 3, the motions of the neighboring CTUs are intense and their depths are high. The current CTU belongs to the fast motion region and is classified as type $T3$. In this case, the minimum depth of the current CTU $De{p}_{min}$ is equal to 2, and the maximum depth $De{p}_{max}$ is equal to 3.
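The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the weight values (0.3 for each spatial neighbor, 0.1 for the collocated CTU) are placeholders chosen to satisfy the constraints described in the text (equal spatial weights, a smaller temporal weight, sum equal to 1), not values taken from this work.

```python
from typing import Dict, Tuple

# ASSUMED weights: equal spatial weights, smaller temporal weight, sum = 1.
# These numbers are illustrative placeholders, not the paper's values.
WEIGHTS: Dict[str, float] = {"L": 0.3, "TL": 0.3, "TR": 0.3, "CO": 0.1}


def predict_depth(depths: Dict[str, int],
                  weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the neighboring CTU depths over the set G."""
    value = sum(weights[k] * depths[k] for k in weights)
    # Clamp to [0, 3] to absorb floating-point rounding at the boundaries.
    return min(3.0, max(0.0, value))


def depth_range(predicted: float) -> Tuple[str, int, int]:
    """Map the predicted depth to (CTU type, Dep_min, Dep_max)."""
    if predicted <= 1.5:
        return ("T1", 0, 2)   # still or homogeneous motion region
    if predicted <= 2.5:
        return ("T2", 1, 3)   # moderate motion region
    return ("T3", 2, 3)       # fast motion region


if __name__ == "__main__":
    neighbors = {"L": 1, "TL": 2, "TR": 2, "CO": 3}
    print(depth_range(predict_depth(neighbors)))
```

With such a mapping, the encoder would only evaluate CU depths between the returned minimum and maximum, which is where the complexity saving of the CTU depth decision comes from.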

#### 4.3. PU Mode Decision

The CU splitting or non-splitting decision is formulated as a binary classification problem over ${\omega}_{i}$, where $i$ = 0, 1. In this work, ${\omega}_{0}$ and ${\omega}_{1}$ represent CU non-splitting and CU splitting, respectively, and the variable $x$ represents the RD cost of the PU. According to Bayes’ rule, the posterior probability $p({\omega}_{i}|x)$ can be calculated as follows:

$$p({\omega}_{i}|x)=\frac{p(x|{\omega}_{i})\,p({\omega}_{i})}{p(x)}$$

According to Bayesian decision theory, the prior probability $p({\omega}_{i})$ and the conditional probability $p(x|{\omega}_{i})$ must be known. CU non-splitting (${\omega}_{0}$) is then chosen if the following condition holds true:

$$p(x|{\omega}_{0})\,p({\omega}_{0})>p(x|{\omega}_{1})\,p({\omega}_{1})$$

Otherwise, CU splitting (${\omega}_{1}$) is chosen.

The conditional probabilities $p(x|{\omega}_{0})$ and $p(x|{\omega}_{1})$ are the probability density functions of the RD cost, and they are approximated by normal distributions. Denoting the mean and standard deviation of the RD cost for CU non-splitting and splitting by $N({\mu}_{0},{\sigma}_{0})$ and $N({\mu}_{1},{\sigma}_{1})$, the normal density is given by

$$p(x|{\omega}_{i})=\frac{1}{\sqrt{2\pi}\,{\sigma}_{i}}\exp\left(-\frac{{(x-{\mu}_{i})}^{2}}{2{\sigma}_{i}^{2}}\right)$$

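The prior probability $p({\omega}_{i})$ is modeled with a Gibbs Random Field (GRF) model over set $G$ [30], so $p({\omega}_{i})$ always has the Gibbsian form

$$p({\omega}_{i})=\frac{1}{Z}\exp\left(-E({\omega}_{i})\right),\qquad E({\omega}_{i})=\sum_{k\in G}\phi({\omega}_{i},{\overline{\omega}}_{k})$$

where $Z$ is a normalization constant and $E({\omega}_{i})$ is the cost function. $k$ is the index over set $G$, and ${\overline{\omega}}_{k}$ denotes the non-splitting or splitting value of the $k$-th neighboring CU (${\overline{\omega}}_{k}\in \{-1,1\}$). The CU size decision is a binary classification problem (${\omega}_{i}\in \{-1,1\}$), and the clique potential $\phi({\omega}_{i},{\overline{\omega}}_{k})$ obeys the Ising model [31]:

$$\phi({\omega}_{i},{\overline{\omega}}_{k})=-\gamma\,{\omega}_{i}\,{\overline{\omega}}_{k}$$

where the parameter $\gamma$ is the coupling factor, which denotes the strength of the correlation between the current CU and the $k$-th neighboring CU in set $G$. In this work, $\gamma$ is set to 0.75. Then, the prior $p({\omega}_{i})$ can be written in the factorized form:

$$p({\omega}_{i})\propto \exp\left(\gamma\sum_{k\in G}{\omega}_{i}\,{\overline{\omega}}_{k}\right)=\prod_{k\in G}\exp\left(\gamma\,{\omega}_{i}\,{\overline{\omega}}_{k}\right)$$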

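Finally, Equation (6) can be written as

$$p({\omega}_{i}|x)\propto p(x|{\omega}_{i})\prod_{k\in G}\exp\left(\gamma\,{\omega}_{i}\,{\overline{\omega}}_{k}\right)$$

and the final CU decision function $S({\omega}_{i})$ can be defined in the exponential form

$$S({\omega}_{i})=\exp\left(-\frac{{(x-{\mu}_{i})}^{2}}{2{\sigma}_{i}^{2}}-\ln {\sigma}_{i}+\gamma\sum_{k\in G}{\omega}_{i}\,{\overline{\omega}}_{k}\right)$$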

It should be noted that the statistical parameters of $p(x|{\omega}_{i})$ are estimated by using a non-parametric estimation with online learning, and are stored in a lookup table (LUT). The frames used for the online update of $({\mu}_{0},{\sigma}_{0})$ and $({\mu}_{1},{\sigma}_{1})$ are shown in Figure 5. In each group of pictures (GOP), the first frame, which is encoded with the original H.265/HEVC coding, is used for the online update, while the successive frames are coded with the proposed algorithm.
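To make the decision rule concrete, the following Python sketch combines a Gaussian likelihood of the RD cost with an Ising-type neighbor prior and compares the two classes in the log domain. It is a minimal sketch under assumptions: all function names are hypothetical, the non-splitting and splitting classes are encoded as -1 and +1, and a plain sample mean and standard deviation stand in for the estimator used in this work, whose details are not given here.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

GAMMA = 0.75  # coupling factor stated in the text


def fit_gaussian(rd_costs: Sequence[float]) -> Tuple[float, float]:
    """Estimate (mu, sigma) from RD costs gathered on the first frame of a GOP.

    ASSUMPTION: sample mean / population std as a simple stand-in estimator.
    """
    sigma = max(pstdev(rd_costs), 1e-6)  # guard against a zero deviation
    return mean(rd_costs), sigma


def log_score(x: float, omega: int, mu: float, sigma: float,
              neighbors: Sequence[int], gamma: float = GAMMA) -> float:
    """Log of the decision score: Gaussian log-likelihood plus Ising prior term."""
    log_likelihood = -math.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2)
    log_prior = gamma * sum(omega * w for w in neighbors)
    return log_likelihood + log_prior


def choose_split(x: float, lut: Dict[int, Tuple[float, float]],
                 neighbors: Sequence[int]) -> bool:
    """Return True when the splitting class scores higher than non-splitting."""
    s0 = log_score(x, -1, *lut[0], neighbors)  # class 0: non-splitting
    s1 = log_score(x, +1, *lut[1], neighbors)  # class 1: splitting
    return s1 > s0


if __name__ == "__main__":
    # Hypothetical RD-cost samples for the two classes.
    lut = {0: fit_gaussian([100, 110, 90, 105]),
           1: fit_gaussian([300, 320, 280, 310])}
    print(choose_split(305.0, lut, neighbors=[1, 1, 1, -1]))
```

Working with logarithms avoids underflow when the RD cost lies far from a class mean, and comparing the two log scores is equivalent to comparing the scores themselves because the exponential is monotonic.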

Through the above analysis, the proposed PU decision based on Bayes’ rule includes the CU termination decision (inter $2N*2N$) and CU skip decision (inter $2N*2N$, $N*2N$, $2N*N$). In the case of the CU termination decision, the current CU is not divided into sub-CUs in the sub-depth. In the case of the CU skip decision, the current PU mode in current CU depth is determined at the earliest possible stage. Therefore, the flowchart of the proposed PU mode decision is described as follows.

- (1)
At encoding time for inter prediction, first look up the statistical parameters in the LUT. Then, the RD cost of the inter $2N*2N$ PU mode is checked. If $S({\omega}_{0})$ > $S({\omega}_{1})$ and cbf = 0, the CU termination decision is made. Otherwise, if $S({\omega}_{1})$ > $S({\omega}_{0})$ and cbf = 1, a CU skip decision is made.

- (2)
The RD cost of the inter $2N*N$ PU mode is checked. If $S({\omega}_{1})$ > $S({\omega}_{0})$ and cbf = 1, a CU skip decision is made.

- (3)
The RD cost of the inter $N*2N$ PU mode is checked. If $S({\omega}_{1})$ > $S({\omega}_{0})$ and cbf = 1, a CU skip decision is made.

- (4)
Other PU modes are checked according to the H.265/HEVC reference model.
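The four steps above can be read as an early-exit loop over the PU modes. The sketch below is one possible interpretation, not the reference implementation: `evaluate` and `score` are hypothetical callbacks standing in for the encoder's RD-cost evaluation and for the decision function $S$, and the returned labels are illustrative names.

```python
from typing import Callable, List, Tuple

# Hypothetical callback types (assumptions for illustration only):
#   evaluate(mode) -> (rd_cost, cbf) for one inter PU mode
#   score(rd_cost) -> (S(omega_0), S(omega_1))
Evaluate = Callable[[str], Tuple[float, int]]
Score = Callable[[float], Tuple[float, float]]


def pu_mode_decision(evaluate: Evaluate, score: Score) -> Tuple[str, List[str]]:
    """Walk the PU modes in the listed order and stop at the first decision."""
    checked: List[str] = []
    for step, mode in enumerate(("2Nx2N", "2NxN", "Nx2N")):
        rd_cost, cbf = evaluate(mode)
        checked.append(mode)
        s0, s1 = score(rd_cost)
        # Step (1) only: CU termination, the CU is not split into sub-CUs.
        if step == 0 and s0 > s1 and cbf == 0:
            return "terminate", checked
        # Steps (1)-(3): CU skip decision, remaining modes are not evaluated.
        if s1 > s0 and cbf == 1:
            return "skip", checked
    # Step (4): fall back to the regular reference-model mode search.
    checked.append("remaining modes")
    return "full_search", checked


if __name__ == "__main__":
    action, modes = pu_mode_decision(lambda m: (100.0, 0), lambda x: (1.0, 0.0))
    print(action, modes)
```

Passing the encoder hooks as callbacks keeps the control flow separate from the cost model, so the same loop can be exercised with stub functions before it is wired into an actual encoder.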

#### 4.4. The Overall Framework

Based on the above analysis, the proposed overall algorithm incorporates the CTU depth decision and the PU mode decision algorithms to reduce the computational complexity of the H.265/HEVC encoder. The flowcharts are shown in Figure 6 and Figure 7, respectively. The proposed CTU depth decision and PU mode decision algorithms have been discussed in Section 4.2 and Section 4.3.

Note that the maximum GOP size is 8 in this work, and the values of $({\mu}_{0},{\sigma}_{0})$ and $({\mu}_{1},{\sigma}_{1})$ are updated every GOP for the PU mode decision.

#### 4.5. Encoder Hardware Architecture

Figure 8 shows the core architecture of the H.265/HEVC encoder with mode decision. In this architecture, inter-frame prediction is used to eliminate spatiotemporal redundancy. The proposed CU decision method accelerates the inter-prediction module before fast rate-distortion optimization (RDO). The novel spatiotemporal neighboring set reduces the complexity of the inter encoder, which leads to a very low power cost. Moreover, video codecs on mobile vehicles in VANETs need to be more energy efficient and more reliable, so reducing the complexity of the video encoder is important. The proposed low-complexity and hardware-friendly H.265/HEVC encoder can therefore significantly improve the reliability of the video codec for VANETs. In addition, as a benefit of the high complexity reduction rate, the energy consumption of the hardware design can be reduced significantly.

## 5. Experimental Results

To evaluate the performance of the proposed low-complexity and hardware-friendly H.265/HEVC encoder for VANETs, this section shows the experimental results obtained by implementing the proposed algorithms with the H.265/HEVC reference software [32]. The simulation environments are shown in Table 2.

The Bjontegaard delta bit-rate (BDBR) is used to represent the average bit-rate difference [33], and the average time saving (TS) is calculated as

$$TS=\frac{1}{4}\sum_{i=1}^{4}\frac{Tim{e}_{HM16.0}(Q{P}_{i})-Tim{e}_{proposed}(Q{P}_{i})}{Tim{e}_{HM16.0}(Q{P}_{i})}\times 100\%$$

where $Tim{e}_{HM16.0}(Q{P}_{i})$ and $Tim{e}_{proposed}(Q{P}_{i})$ denote the encoding times of HM16.0 and the proposed algorithm at the different QPs.
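As a small worked illustration of the time-saving metric, the sketch below averages the relative encoding-time reduction over the tested QP values. The function name and the timing numbers are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def time_saving(time_ref: Dict[int, float], time_prop: Dict[int, float]) -> float:
    """Average relative time saving in percent over all QP values."""
    if time_ref.keys() != time_prop.keys():
        raise ValueError("both encoders must be timed at the same QP values")
    savings = [(time_ref[qp] - time_prop[qp]) / time_ref[qp] for qp in sorted(time_ref)]
    return 100.0 * sum(savings) / len(savings)


if __name__ == "__main__":
    # Hypothetical encoding times in seconds, keyed by QP.
    ref = {22: 100.0, 27: 80.0, 32: 60.0, 37: 40.0}
    prop = {22: 50.0, 27: 40.0, 32: 30.0, 37: 20.0}
    print(f"TS = {time_saving(ref, prop):.2f}%")
```

Averaging the per-QP ratios, rather than dividing the summed times, gives each QP the same weight even though low-QP encodes take longer.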

In this work, the scenarios have been chosen carefully. This work focuses on the development of a video codec that supports real-time video transmission over VANETs for road safety applications. The experiments are conducted under the common test conditions (CTC) [34]. The test sequences in the CTC have different spatial and temporal characteristics and frame rates. Furthermore, video sequences of traffic scenarios, including ‘Traffic’ and ‘BQTerrace’ (as in Figure 9), are tested in this work. Moreover, we selected the low delay (LD) configuration to reflect the real-time application scenario for all encoders.

Table 3 and Table 4 show the performance results of the CTU depth decision, the PU mode decision, and the overall (proposed) methods, compared to the H.265/HEVC reference software in the random access (RA) and low delay (LD) configurations. From the experimental results in Table 3, it can be seen that the encoding time is reduced by 15.59%, 55.79%, and 50.96% on average for the CTU depth decision, PU mode decision, and overall methods, while the BDBR increases by only 0.11%, 0.96%, and 0.80%, respectively. From the experimental results in Table 4, the encoding time is reduced by 14.05%, 50.28%, and 50.23% on average for the CTU depth decision, PU mode decision, and overall methods, while the BDBR increases by only 0.15%, 0.79%, and 0.76%, respectively. For high-resolution sequences such as “BQTerrace” and “Vidyo4”, the time saving is particularly high. Therefore, the overall (proposed) algorithm significantly reduces the encoding complexity with little effect on encoding efficiency, achieving a trade-off between the two. In addition, this trade-off can be adjusted through the coupling factor $\gamma $. The target trade-off is a significant reduction in encoding complexity with an encoding efficiency loss of at most 0.8% under the low delay (LD) and random access (RA) configurations. In order to find the optimal trade-off, $\gamma $ is set to 0.5, 0.75, and 0.85 under the same simulation environments. The average efficiency and time saving are compared in Table 5. From this table we can see that the encoding performance is optimal for $\gamma $ = 0.75 in this work.

Objective video quality can be expressed by the rate–distortion (R–D) curve. The R–D curve is fitted through four data points, with PSNR and bit-rate obtained for QP = 22, 27, 32, 37. In this paper, the objective video quality is evaluated by using bit-rate and PSNR. When an error occurs in the predicted depth of the current CTU, the bit-rate increases; thus, the lower the accuracy of the CTU depth prediction, the more the bit-rate increases.

Figure 10 shows the R–D curves of the proposed method compared with the H.265/HEVC reference software. The enlarged part of the figure shows that the proposed algorithm is close to HM16.0 under the LD and RA configurations. In addition, Figure 11 shows the time saving for the sequences “Cactus” and “BlowingBubbles”. The encoding time is reduced under the different configurations.

The performance of the proposed method is compared with previous works [12,13,14,24,25,26] in Table 6. Goswami’s work is based on Bayesian decision theory and a Markov Chain Monte Carlo (MCMC) model. Zhang’s work is based on the Bayesian method and Conditional Random Fields (CRF). Tai’s algorithm is based on depth information and RD cost. Zhu’s algorithm is based on a machine learning method. Ahn’s work is based on spatiotemporal encoding parameters. Xiong’s work is based on latent sum of absolute differences (SAD) estimation. In contrast, the proposed approach is based on Bayes’ rule and a Gibbs Random Field. Although Zhu’s method can achieve a 65.60% encoding time reduction, its BDBR is higher than that of the proposed method. Moreover, the BDBR increase of the proposed method is smaller than that of the state-of-the-art works, while its time saving is more than 50% on average. Compared with previous works [19,20], the proposed work successfully trades off encoding complexity and encoding efficiency.