Fusion-Based Versatile Video Coding Intra Prediction Algorithm with Template Matching and Linear Prediction

The new generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and consequently, remarkable results have been achieved. In practical applications, a lower bitrate reduces the burden on sensors and improves their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between the reference and candidate templates into account. Furthermore, to better utilize the correlation between reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of a single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. Compared with the VVC anchor, our proposed fusion-based algorithm saves 0.89%, 0.84%, and 0.90% bitrate on average for the Y, Cb, and Cr components, respectively. In addition, when compared with other existing works, our algorithm shows superior bitrate savings.


Introduction
In recent years, Internet traffic comprising multimedia data has increased rapidly, with video content expected to account for 80% of it in the coming years. Therefore, efficient video compression is very important. The new generation video coding standard Versatile Video Coding (VVC) [1] was officially released in July 2020. On the basis of the High Efficiency Video Coding (HEVC) [2] standard, VVC has introduced many advanced technologies to further improve compression performance. Simultaneously, VVC supports a wide variety of video types such as High Definition (HD) and Ultra HD (UHD) resolution, Wide Color Gamut (WCG) video sequences, Virtual Reality (VR) videos, ultra-low-delay applications, and so on.
Intra prediction predicts the current pixels by using the reconstructed pixels of the current frame to remove spatial redundancy. Recently, many works on intra prediction have been performed. Schneider et al. [26] proposed an algorithm named Sparse Coding-Based Intra Prediction (SCIP) to further improve the performance of VVC. To better predict the areas with complex textures, Li et al. [27] presented an improved intra prediction mode combination method and introduced an efficient mode coding method of syntax elements to enhance the coding performance. Yoon et al. [28] designed a method to obtain the parameters of CCLM more precisely, which compensated for the coding loss of the simplified CCLM mode. To fully utilize the advantages of Intra Block Copy (IBC) and palette coding, Zhu et al. [29] designed a compound palette mode to improve the performance of VVC on screen content coding. Since IBC cannot deal well with geometric transformations, Jayasingam et al. [30] extended IBC to adapt contents with zooms, rotations, and stretches. In [31], Weng et al. presented an L-shape-based iterative algorithm to improve intra prediction accuracy, and a residual median edge detection method was also proposed to address edge information. In [32], Yoon et al. exploited the number of occurrences of the modes in the neighboring blocks to extend the Most Probable Mode (MPM). Wang et al. [33] designed a Sample Adaptive Offset (SAO) acceleration method to reduce the complexity of VVC. Saha et al. [34] analyzed the decoder complexity of VVC on two different platforms.
Intra prediction plays an important role in video coding, and effective compression reduces the workload of sensors. Although some existing works have achieved good results, few have taken the different features of different blocks and different modes into account; hence, there is still room for further improvement. In this paper, we propose a fusion-based algorithm. The main contributions of our work are as follows: (1) We propose a fusion-based adaptive template matching method, whose core idea is adaptive template matching. This method sufficiently considers the influence of matching errors on the prediction results. The mode using this method is named mode 67. (2) We design a fusion-based linear prediction method, whose core idea is linear prediction. This method fully considers the linear relationship between reference pixels and the pixels to be predicted, as well as the correlation between different models. The mode utilizing this method is named mode 68.
The remainder of the paper is organized as follows. Section 2 introduces the related works. In Section 3, our proposed fusion-based algorithm is presented in detail. In Section 4, we conduct some experiments to verify the effectiveness of our proposed algorithm, and conclusions are drawn in Section 5.

Related Works
The Template Matching Prediction (TMP) algorithm is a widely used method in video coding. The TMP algorithm takes the reference pixels of the current Prediction Unit (PU) block as a reference template. In the reconstructed area, a certain criterion is employed to select some candidate templates with the least error, from which the final prediction value is obtained. To reduce the effect of compression noise, Tan et al. [35] proposed using the average value of several candidate templates. Gayathri et al. [36] presented a region-based TMP algorithm that could reduce the complexity and obtain good performance. In [37], Gayathri et al. further decreased the memory requirements and the number of computations at the decoder side. Considering that local reference samples alone could not deal well with complex areas, Lei et al. [38] designed a two-step progressive method that uses both local information (derived from the high-frequency coefficients) and non-local information (obtained through TMP). These methods achieved good performance; however, simple averaging does not take the different importance of different blocks into account, and a fixed linear approach is not adaptive and does not directly consider the error. To better predict PUs with rich texture information, we propose a fusion-based adaptive template matching method in this paper.
Linear prediction assumes a linear relationship between reference samples and the samples to be predicted. The final prediction value is obtained by constructing a linear function. Typically, CCLM linearly obtains the chroma prediction value through the corresponding reconstructed luminance samples. The process of CCLM is shown in Figure 1. First, the co-located luminance block is down-sampled. Then, the reference pixels of the current chroma block and the down-sampled luminance block are used to calculate the parameters by constructing a linear function. Finally, the chroma prediction value is obtained by the linear function.
Many related works have been undertaken based on CCLM. Ghaznavi-Youvalari et al. [39] merged CCLM with an angle mode derived from the corresponding luma block to improve the chroma prediction accuracy. Zhang et al. [40] introduced three methods, including Multi-Model CCLM (MMCCLM), Multi-Filter CCLM (MFCCLM), and linear mode angle prediction, to further enhance the coding efficiency of CCLM. However, there have been few linear prediction optimizations for the luminance component. In [41], Ghaznavi-Youvalari et al. presented a three-parameter linear function to improve the intra prediction performance; its parameters were obtained through a Mean Square Error (MSE) minimization over the reference pixels and their locations. This method achieves good efficiency; however, when the pixel to be predicted is far from the reference samples, the prediction performance decreases with the decline in correlation. To address this problem, we propose a fusion-based linear prediction method. The CCLM model is given in Equation (1):

pred_C(i, j) = α · rec_L(i, j) + β, (1)

where pred_C(i, j) is the chroma pixel to be predicted, rec_L(i, j) denotes the down-sampled reconstructed luma pixel at position (i, j), and α and β are the model parameters.
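As a concrete sketch of the CCLM-style linear model above, the following snippet fits α and β on the neighboring reference samples and applies them to the down-sampled luma block. Note that this uses an ordinary least-squares fit for clarity; the actual VVC CCLM derives its parameters from minimum/maximum sample pairs with integer arithmetic, and the function name here is illustrative.

```python
import numpy as np

def cclm_predict(luma_ref, chroma_ref, luma_block):
    """Illustrative CCLM-style prediction: fit pred_c = alpha * rec_L + beta
    on the neighboring reference samples, then apply the model to the
    down-sampled co-located luma block. (VVC derives alpha/beta from
    min/max sample pairs; plain least squares is used here for clarity.)"""
    luma_ref = np.asarray(luma_ref, dtype=np.float64)
    chroma_ref = np.asarray(chroma_ref, dtype=np.float64)
    # Degree-1 least-squares fit returns (slope, intercept) = (alpha, beta)
    alpha, beta = np.polyfit(luma_ref, chroma_ref, 1)
    return alpha * np.asarray(luma_block, dtype=np.float64) + beta
```

For reference samples that follow chroma = 2 · luma + 5 exactly, a luma sample of 40 is mapped to a chroma prediction of 85.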

Proposed Method
As mentioned previously, our proposed fusion-based algorithm includes two parts: fusion-based adaptive template matching (mode 67) and fusion-based linear prediction (mode 68). The overall flowchart is depicted in Figure 2. We term the VVC's original intra prediction modes (mode 0~mode 66) the traditional/original modes. When conducting intra prediction, we first decide whether the prediction mode is traditional. If it is, the original VVC mode is utilized to obtain the prediction pixels. If not, we decide whether the mode is 67 or 68: for mode 67, fusion-based adaptive template matching is employed; for mode 68, fusion-based linear prediction is used. A rough calculation first selects several low-distortion modes, called candidate modes. Finally, the Rate-Distortion Cost (RDC) is utilized to determine the best mode from these candidates. Since the decoder can perform the same operations as the encoder to obtain the prediction and reconstructed values based on the mode number, no extra flag bits need to be sent.
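The mode-dispatch flow described above can be sketched as follows; the three helper functions are hypothetical stand-ins for the prediction paths, not VTM APIs.

```python
# Hypothetical stand-ins for the three prediction paths; a real encoder
# would compute actual prediction blocks here.
def traditional_intra(mode, pu):
    return ("traditional", mode)

def fusion_template_matching(pu):
    return ("mode67",)

def fusion_linear_prediction(pu):
    return ("mode68",)

def intra_predict(mode, pu=None):
    """Dispatch following the flow in Figure 2: modes 0-66 are the original
    VVC intra modes; 67 and 68 are the proposed fusion-based modes."""
    if 0 <= mode <= 66:
        return traditional_intra(mode, pu)
    if mode == 67:
        return fusion_template_matching(pu)
    if mode == 68:
        return fusion_linear_prediction(pu)
    raise ValueError(f"unknown intra mode {mode}")
```

Because the mode number alone selects the prediction path, the decoder can mirror this dispatch without extra signaling, as the text notes.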


Fusion-Based Adaptive Template Matching
Usually, high correlation and similar textures are prevalent between blocks, and the TMP algorithm can find candidate blocks with the least errors in the reconstructed area to better predict regions with similar textures. Since it searches and compares pixel by pixel, it usually performs well for blocks with similar textures. The process of template matching is shown in Figure 3, in which T_r is the reference template, T_i is the i-th candidate template, B_p is the block to be predicted, B_i is the prediction block corresponding to T_i, and w_i is the weighting factor of the corresponding candidate block. First, we find the candidate templates T_i that match T_r in the reconstructed area; then, the blocks B_i are used to obtain B_p.

Considering that the error between T_r and T_i has a large influence on the prediction result, we used it as the key basis for weight selection. To balance compression performance and time complexity, we limited the search area to 64 × 64 and chose the best four candidate templates through the MSE minimization criterion. The MSE is obtained by

σ_i^2 = (1/N) · Σ_{k=1}^{N} (T_r(k) − T_i(k))^2,

where σ_i^2 is the MSE between the reference template T_r and the i-th candidate template T_i, and N is the total number of pixels in T_r. If σ_i^2 is large, the difference between T_r and T_i is large, and the corresponding weight is set to be small; if σ_i^2 is small, the candidate template is close to the reference template, and the weight is set to be large. Consequently, the temporary weight λ_i of the i-th candidate template is obtained by introducing a logarithm function that decreases with σ_i^2. By normalization, the final weighting factor w_i of the i-th candidate template is obtained by

w_i = λ_i / Σ_{j=1}^{4} λ_j.

Then, the final prediction value p(x, y) is obtained by

p(x, y) = Σ_{i=1}^{4} w_i · B_i(x, y).

Generally, the TMP has a high time cost due to the pixel-by-pixel comparison and error calculation; it trades time for performance gains. Hence, fusion-based adaptive template matching is only employed when the PU size is smaller than or equal to 32 × 32. This limitation is important because, when the PU size is large, the TMP algorithm would trade high time complexity for little coding gain, which is not worthwhile.
Simultaneously, the texture information of large PUs is simple, so other modes can achieve good results.
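The weighting steps above can be sketched as follows. Since the paper does not spell out the exact logarithmic weighting function, `1 / log2(mse + 2)` is used here as one plausible decreasing function of the template error; everything else (selecting the four best-matching templates by MSE, normalizing, and fusing the prediction blocks) follows the description above.

```python
import numpy as np

def fused_tmp_predict(ref_template, candidates, cand_blocks, k=4):
    """Sketch of fusion-based adaptive template matching. `candidates` are
    candidate templates T_i found in the reconstructed area, `cand_blocks`
    the prediction blocks B_i behind them. The exact logarithmic weighting
    is not spelled out in the text; 1 / log2(mse + 2) is an assumed
    decreasing function of the template error."""
    ref = np.asarray(ref_template, dtype=np.float64)
    mses = np.array([np.mean((ref - np.asarray(t, dtype=np.float64)) ** 2)
                     for t in candidates])
    best = np.argsort(mses)[:k]               # keep the k best-matching templates
    lam = 1.0 / np.log2(mses[best] + 2.0)     # temporary weights: small error -> large weight
    w = lam / lam.sum()                       # normalize so the weights sum to 1
    blocks = np.stack([np.asarray(cand_blocks[i], dtype=np.float64) for i in best])
    return np.tensordot(w, blocks, axes=1)    # weighted fusion of prediction blocks
```

With this assumed weighting, a perfectly matching template (MSE 0) contributes twice the weight of one with MSE 2, since 1/log2(2) = 1 and 1/log2(4) = 0.5.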

Fusion-Based Linear Prediction
Linear prediction is a simple but efficient method because there is usually a high linear correlation between the pixels to be predicted and the reference samples. However, as the pixels to be predicted move away from the reference pixels, the correlation between them will weaken, and the performance of linear prediction will decline. To address this issue, we present a fusion-based linear prediction method based on Ghaznavi-Youvalari's work [41].
When the PU size is small, single three-parameter linear prediction can achieve good results. However, when the PU size is large, prediction accuracy declines with the weakening in correlation. Hence, we combined the three-parameter mode with planar mode to obtain the final prediction value. Planar mode obtains the prediction value by weighting the pixels in the horizontal and vertical directions, where the weights are related to the distance. The prediction process of the planar mode is shown in Figure 4.

When the PU size is small, single three-parameter linear prediction can achieve good results. However, when the PU size is large, prediction accuracy declines with the weakening in correlation. Hence, we combined the three-parameter mode with planar mode to obtain the final prediction value. Planar mode obtains the prediction value by weighting the pixels in the horizontal and vertical directions, where the weights are related to the distance. The prediction process of the planar mode is shown in Figure 4. The horizontal prediction value p_h(x, y) is obtained by

p_h(x, y) = ((lw − 1 − x) · a + (x + 1) · b) / lw,

where (x, y) is the position of the current pixel, a and b are the reference samples, and lw is the width of the PU. Similarly, the vertical prediction value p_v(x, y) is obtained by

p_v(x, y) = ((lh − 1 − y) · c + (y + 1) · d) / lh,

where c and d are the reference samples and lh is the height of the PU. The final prediction value is then the average of p_h(x, y) and p_v(x, y).
We can see that the prediction process of the planar mode is very close to linear prediction. Hence, the combination of planar mode and three-parameter linear mode can partially compensate for the shortcomings of a single linear prediction and improve the prediction precision.
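Under the formulation above, a floating-point sketch of planar prediction might look like this (VVC uses shifted integer arithmetic; `left`, `top`, `top_right`, and `bottom_left` are assumed reference arrays and corner samples):

```python
import numpy as np

def planar_predict(left, top, top_right, bottom_left):
    """Planar prediction as described above: the average of a horizontal and
    a vertical linear interpolation. `left`/`top` are the reconstructed
    reference column/row, `top_right` and `bottom_left` the corner
    reference samples (a simplified floating-point formulation)."""
    lh, lw = len(left), len(top)
    pred = np.empty((lh, lw))
    for y in range(lh):
        for x in range(lw):
            # horizontal pass: interpolate between left[y] (a) and top_right (b)
            ph = ((lw - 1 - x) * left[y] + (x + 1) * top_right) / lw
            # vertical pass: interpolate between top[x] (c) and bottom_left (d)
            pv = ((lh - 1 - y) * top[x] + (y + 1) * bottom_left) / lh
            pred[y, x] = (ph + pv) / 2
    return pred
```

A quick sanity check: when all reference samples are equal, every predicted pixel takes that same value, as expected for a distance-weighted interpolation.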
When the PU size is smaller than or equal to 32 × 32, three-parameter linear prediction alone is used to obtain the final pixels. The linear function is constructed as

p(x, y) = a_0 · x + a_1 · y + a_2,

where p(x, y) is the prediction value and a_0, a_1, and a_2 are the parameters to be calculated; the specific solution is given in [41]. When the PU size is larger than 32 × 32, both three-parameter linear prediction and planar mode are utilized to obtain the final prediction value:

p(x, y) = w_1 · p_linear(x, y) + w_2 · p_planar(x, y),

where p_linear(x, y) is the prediction value of three-parameter linear prediction, p_planar(x, y) is that of the planar mode, and w_1 and w_2 are the corresponding weighting factors. Since three-parameter linear prediction and planar mode adopt different prediction methods and provide different prediction information, they are equally important; therefore, both weighting factors were set to 0.5.
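A sketch of the size-dependent fusion described above, with the three-parameter model taken as p(x, y) = a0·x + a1·y + a2 (the parameters are assumed to have already been fitted by the MSE minimization of [41]):

```python
import numpy as np

def three_param_linear(a, w, h):
    """Evaluate p(x, y) = a0*x + a1*y + a2 over a w x h block; the
    parameters a = (a0, a1, a2) are assumed to come from the MSE fit
    against the reference samples described in [41]."""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    return a[0] * xs + a[1] * ys + a[2]

def fused_linear_predict(a, planar_pred, w1=0.5, w2=0.5):
    """Fusion used for PUs larger than 32x32: equal-weight blend of the
    three-parameter linear prediction and the planar prediction."""
    linear_pred = three_param_linear(a, planar_pred.shape[1], planar_pred.shape[0])
    return w1 * linear_pred + w2 * planar_pred
```

The equal 0.5/0.5 weights match the design choice stated above; other splits could be substituted if one predictor were known to be more reliable for a given block.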

Experimental Environment
To verify the effectiveness of our proposed fusion-based algorithm, we implemented it on top of the VVC Test Model 9.1 (VTM9.1). The coding conditions followed the Joint Video Exploration Team (JVET) Common Test Conditions (CTC) [42]. Sixteen test sequences with four resolutions were utilized: 416 × 240, 832 × 480, 1280 × 720, and 1920 × 1080. We selected the four common test Quantization Parameter (QP) values {22, 27, 32, 37} to code the video sequences. Due to the experimental conditions, only the first 30 frames of each sequence were coded with the All Intra (AI) configuration.

Compression Performance
First, we tested the compression performance of our proposed fusion-based algorithm by comparing it with the VVC anchor. The Bjøntegaard Delta Rate (BD-Rate) method [43] was utilized to assess the compression performance. In addition to the respective BD-Rates of the three components, we also calculated the weighted BD-Rate of the three components by

BDRate_YUV = (6 · BDRate_Y + BDRate_U + BDRate_V) / 8,

where BDRate_YUV is the weighted YUV BD-Rate, and BDRate_Y, BDRate_U, and BDRate_V represent the BD-Rates of the Y, Cb, and Cr components, respectively. Considering that class F contains screen content sequences while the others are natural content sequences, we distinguished between them when calculating the average BD-Rate. A negative BD-Rate indicates that the performance over the VVC has improved; otherwise, the coding performance has deteriorated. As shown in Table 1, compared with the VVC anchor, our proposed fusion-based algorithm saved 0.89%, 0.84%, and 0.90% bitrate on average (up to 2.69%, 2.81%, and 2.81%) for the Y, Cb, and Cr components, respectively. Simultaneously, our algorithm was particularly efficient for sequences such as "BasketballDrive" (1920 × 1080), "BQTerrace" (1920 × 1080), "Johnny" (1280 × 720), "BasketballPass" (416 × 240), and so on. This is mainly because there are many similar areas in these sequences: the correlation between blocks, and between the reference pixels and the pixels to be predicted, is relatively high. For most sequences in Class C and Class D, the texture content is very rich and varies greatly; for these two classes, the performance of our algorithm was not particularly outstanding, since it is suited to videos with more similar texture areas. For almost all sequences, our proposed fusion-based algorithm achieved good results, which verifies its effectiveness. Figure 6 shows the Rate-Distortion (RD) curves, where the horizontal axis is the bitrate and the vertical axis is the YUV-PSNR.
If the curve of our proposed algorithm is above that of the VVC, the peak signal-to-noise ratio (PSNR) of our algorithm is higher than that of the VVC for the same bitrate, indicating that our algorithm enhances the compression performance; otherwise, it deteriorates the performance. In Figure 6, the blue curve is VTM9.1 and the red one is the proposed algorithm. We enlarged a small part (the position of the green rectangle) of each curve to show the comparison more intuitively. Clearly, our RD curves were higher than those of the VVC. For areas with similar texture, our proposed fusion-based algorithm could better model the relationship between the reference pixels and the pixels to be predicted. Hence, for the same PSNR, our algorithm needs a lower bitrate, which means that it improves the compression performance of VVC.
Moreover, we compared our proposed fusion-based algorithm with some existing works to further verify its effectiveness; the results are detailed in Table 2. In most cases, the performance of our proposed algorithm was superior to the others. In [28], Yoon et al. optimized the CCLM algorithm and achieved good results, although their work only enhanced the cross-channel correlation between the luminance and chroma components and did not consider the stronger correlation within the chroma components themselves; therefore, overall, our algorithm performed better. In [32], Yoon et al. obtained the mode of the current PU block through the neighboring blocks and achieved good results. However, they only considered the correlation between the current block and its neighbors, not the correlation with more distant blocks or the correlation between the reference pixels and the pixels to be predicted; therefore, in general, our proposed algorithm has superior performance. In [26], Schneider et al. obtained the final prediction value through sparse coding and achieved good results, although the dictionary is finite, which limits the performance of their algorithm. Our proposed fusion-based algorithm calculates the corresponding prediction value according to different PUs, which results in improved performance.

Mode Usage Probability
Since our newly added modes compete with the original VVC intra prediction modes, we calculated the usage probability of our new modes, as shown in Equation (11). Five sequences were used to conduct this experiment:

η = N_mode / N_total, (11)

where η is the usage probability, and N_mode and N_total represent the number of times a single mode and all modes are used, respectively. If η is large, our modes are chosen more often than other modes, indicating superior RD performance. The usage probabilities of mode 67 (the mode using fusion-based adaptive template matching) and mode 68 (the mode utilizing fusion-based linear prediction) in the luminance component are given in Tables 3 and 4, respectively. Both mode 67 and mode 68 were selected to some extent. Specifically, mode 67 was often chosen because of its good performance; its maximum usage probability reached 10.50% for "Johnny" (1280 × 720). The performance of mode 68 was not as appealing as that of mode 67: since the TMP algorithm exploits the correlation between blocks while mode 68 exploits the linear relationship between the reference pixels and the pixels to be predicted, TMP generally works better. However, for some blocks, mode 67 could not strike a good balance between complexity and performance, while mode 68 could achieve better results. Hence, mode 68 also contributed to video compression; for "BQMall" (832 × 480), the usage probability of mode 68 reached 0.28%. These results illustrate that our proposed algorithm has good compression efficiency.
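Equation (11) amounts to a one-line computation; the following sketch assumes a hypothetical `mode_counts` histogram collected from the encoder.

```python
def using_probability(mode_counts, mode):
    """Usage probability from Equation (11): eta = N_mode / N_total, here
    returned as a percentage. `mode_counts` is an assumed mapping from
    mode number to the number of times that mode was chosen."""
    n_total = sum(mode_counts.values())
    return 100.0 * mode_counts.get(mode, 0) / n_total
```

For example, if mode 67 were chosen 21 times out of 200 total mode decisions, its usage probability would be 10.5%.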
To show the usage probability more intuitively, we plotted the number of times each luma mode was used for the sequences "BasketballPass" (416 × 240) at QP 32 and "BQTerrace" (1920 × 1080) at QP 27, as shown in Figure 8. The X-axis is the mode number, and the Y-axis is the number of times each mode was used. We drew two dotted lines at the usage counts of mode 67 and mode 68, respectively, to show the comparison more intuitively. Here, the red dotted line represents the count of mode 67, and the green one represents the count of mode 68. If a blue "+" is below the red dotted line, the corresponding mode was used fewer times than mode 67; correspondingly, if a blue "+" is below the green dotted line, it was used fewer times than mode 68. In Figure 8b, there is no red dotted line because mode 67 was used more than 1200 times and could not be explicitly shown in the figure. Clearly, the usage count of mode 67 was much higher than that of most traditional VVC modes. Although the usage probability of mode 68 was only 0.29% ("BasketballPass" at QP 32) and 0.22% ("BQTerrace" at QP 27), its usage count was still higher than that of some of the original VVC intra prediction modes. Table 5 shows the total usage probability of mode 67 and mode 68 in the chroma component.
For the chroma blocks, the optimal mode can be mode 67 or mode 68 only if that of the corresponding luminance block is mode 67 or mode 68, according to the VVC prediction process. Therefore, the total usage probability of our proposed modes in the chroma component was lower than in the luminance component. Since correlations between different blocks and different modes also exist in the chroma component, our algorithm was still effective there. In addition, we present the mode subjective graphs of some sequences at QP 27; Figures 9-11 show the respective results. The red rectangular boxes mark the blocks that utilized our proposed algorithm to conduct intra prediction.
As can be seen, in some areas with similar textures, our modes were more competitive. This is because the traditional VVC intra prediction modes are more suitable for blocks with a single texture direction or simple areas, and our proposed algorithm can partly compensate for this shortcoming. These results further verify the good performance of our proposed fusion-based algorithm.
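To illustrate why the fusion-based template matching mode is competitive in areas with repeated textures, the sketch below shows the core idea only: the k candidate blocks whose L-shaped templates best match the target template (by SAD) are fused with weights inversely proportional to their template errors. The block size, template width, number of fused candidates, and the restricted search range (candidates strictly above the target) are simplifying assumptions, not the exact VTM integration.

```python
import numpy as np

def template_sad(recon, ty, tx, cy, cx, bs, tw):
    """SAD between the L-shaped templates (top rows and left columns of
    width tw) of the target block at (ty, tx) and a candidate at (cy, cx)."""
    top_t = recon[ty - tw:ty, tx:tx + bs]
    left_t = recon[ty:ty + bs, tx - tw:tx]
    top_c = recon[cy - tw:cy, cx:cx + bs]
    left_c = recon[cy:cy + bs, cx - tw:cx]
    return np.abs(top_t - top_c).sum() + np.abs(left_t - left_c).sum()

def fused_template_matching(recon, ty, tx, bs=4, tw=1, k=3, eps=1.0):
    """Predict the bs x bs block at (ty, tx) by fusing the k candidates
    with the smallest template SAD; each candidate is weighted inversely
    to its template error, so closer template matches contribute more."""
    cands = []
    # Search only candidates lying entirely above the target block,
    # so every candidate pixel is already reconstructed (causal).
    for cy in range(tw, ty - bs + 1):
        for cx in range(tw, recon.shape[1] - bs + 1):
            sad = template_sad(recon, ty, tx, cy, cx, bs, tw)
            cands.append((sad, cy, cx))
    cands.sort(key=lambda c: c[0])
    top = cands[:k]
    w = np.array([1.0 / (sad + eps) for sad, _, _ in top])
    w /= w.sum()  # normalize the fusion weights
    pred = np.zeros((bs, bs))
    for wi, (_, cy, cx) in zip(w, top):
        pred += wi * recon[cy:cy + bs, cx:cx + bs]
    return pred
```

On a periodic texture, the best-matching templates point at blocks identical to the target, so the fused prediction reproduces the repeated pattern, which is exactly the case where single-direction angular modes struggle.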

Figure 11. The mode subjective graphs. Images from the JVET CTC [42]: (a) "BasketballDrive" (1920 × 1080); (b) "BQTerrace" (1920 × 1080).

Conclusions
In this paper, we designed an efficient fusion-based algorithm to enhance the compression performance of VVC intra prediction. First, we presented a fusion-based adaptive template matching method to better predict areas with similar texture. The error between the reference template and the candidate template was used to make the prediction results more precise. Second, we presented a fusion-based linear prediction method to fully utilize the correlation between the reference samples and the pixels to be predicted. This method also compensates for the shortcomings of a single linear prediction. Experiments verified the effectiveness of our proposed algorithm. Compared with VTM 9.1, the average bitrate savings for the Y, Cb, and Cr components reached 0.89%, 0.84%, and 0.90%, respectively. The maximum bitrate savings were up to 2.69%, 2.81%, and 2.81% for Y, Cb, and Cr, respectively. In addition, the comparisons with other works and the usage statistics further verified that our proposed algorithm improves intra prediction performance. For some classes, the bitrate saving was less notable, mainly because VVC itself has already adopted many new techniques to optimize intra prediction, which makes further improvement difficult. In the future, we will try to further improve the performance of intra prediction.
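The fusion of several single linear predictors summarized above can be sketched as follows. This is an illustrative simplification, not the paper's exact formulation: it blends a hypothetical vertical predictor (copy of the top reference row) and horizontal predictor (copy of the left reference column), weighting each inversely to the error that the single predictor produced on a causal template, so the better-fitting model dominates.

```python
import numpy as np

def fused_linear_prediction(top_ref, left_ref, err_v, err_h, eps=1e-3):
    """Fuse a vertical and a horizontal linear predictor.

    top_ref  : reference row above the block (length = block width)
    left_ref : reference column left of the block (length = block height)
    err_v/err_h : errors of each single predictor measured on a causal
    template (in a codec these would come from already-reconstructed
    neighboring pixels); fusion weights are inversely proportional to them.
    """
    top_ref = np.asarray(top_ref, dtype=float)
    left_ref = np.asarray(left_ref, dtype=float)
    w_v = 1.0 / (err_v + eps)
    w_h = 1.0 / (err_h + eps)
    s = w_v + w_h
    w_v, w_h = w_v / s, w_h / s  # normalized fusion weights
    # vertical model: pred[i, j] = top_ref[j]; horizontal: pred[i, j] = left_ref[i]
    pred_v = np.tile(top_ref, (left_ref.size, 1))
    pred_h = np.tile(left_ref[:, None], (1, top_ref.size))
    return w_v * pred_v + w_h * pred_h
```

When one predictor fits the template poorly, its weight shrinks and the fused output approaches the other single prediction, which is how the fusion compensates for the deficiency of any single linear model.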