Analog Video Encoding and Quality Evaluation

The most widespread analog video encoding systems in the literature are based on the 2D and 3D DCT. These systems use both transforms interchangeably, without assessing their suitability. In this paper, we present procedures to compress video using the 2D-DCT and the 3D-DCT, and we evaluate the video quality for different compression levels.


Introduction
Nowadays, the multimedia use of computers and mobile devices keeps increasing, driven by its many applications. In this setting, fast, low-delay transmission of information is essential for applications and users. Video transmission plays a predominant role and poses a major challenge: large amounts of data must be transmitted, and compressing them introduces a high computational load.
In recent years, video transmission and compression have been essentially digital processes. These systems deliver high performance in the vast majority of scenarios, although they present some well-known problems. On the one hand, video compression techniques look for spatial and temporal correlations and require a high computational load. On the other hand, if data cannot be recovered without errors, retransmissions are needed, which increases the delay of the communication link.
An alternative approach to digital systems is to use analog transmissions, which provide low delay and low complexity. Most of the existing works on analog video encoding and transmission propose hybrid analog-digital schemes, where the analog part consists of the discrete cosine transform (DCT). The idea behind using the DCT is that the components at lower frequencies carry the most important visual information; hence, some of the high-frequency coefficients can be discarded without significantly affecting the image quality. Some of the proposed systems use the 2D-DCT [1], whereas others consider the 3D-DCT [2,3]. However, in most of them, the digital part is a key component. Different metrics exist to compare the image quality after transmission: peak signal-to-noise ratio (PSNR) is considered in [1][2][3], whereas structural similarity (SSIM) and signal-to-distortion ratio (SDR) are employed in [3].
In this context, we compare both transformations, the 2D-DCT and the 3D-DCT, on videos with motion and on static scenes. To this end, we establish metrics to evaluate and compare the systems: the compression ratio, related to the transmitted frequencies, and the image quality, measured in terms of PSNR and SSIM.

System Description
We propose two analog schemes for video encoding: one using the 2D-DCT and another one employing the 3D-DCT. The 2D-DCT system encodes each individual video frame using the analog scheme proposed in [4] for still images, although in this case the correlation between frames is not considered. In this system, each frame is divided into 8 × 8 blocks and the DCT transformation is applied to each block. The resulting DCT coefficients are stacked onto a vector following a zigzag pattern. Thus, the resulting vector will be sorted from low to high frequencies. The symbols corresponding to the higher frequencies are discarded, thus compressing the image and reducing its visual quality.
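The per-block processing described above can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II and a JPEG-style zigzag scan; the function names (dct2, zigzag_order, compress_block) and the parameter keep (number of retained coefficients) are hypothetical and are not taken from [4].

```python
import math

N = 8  # block side used in the text

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C: C times a column vector gives its 1D DCT."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

C = dct_matrix()
CT = transpose(C)

def dct2(block):
    """2D DCT of an N x N block: transform the columns, then the rows."""
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    """Inverse 2D DCT (C is orthonormal, so its inverse is its transpose)."""
    return matmul(matmul(CT, coeffs), C)

def zigzag_order(n=N):
    """(row, col) pairs along anti-diagonals, alternating direction (assumed JPEG-style)."""
    def key(p):
        s = p[0] + p[1]
        return (s, p[0] if s % 2 else p[1])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def compress_block(block, keep):
    """Keep the first `keep` zigzag coefficients, zero the rest, and reconstruct."""
    coeffs = dct2(block)
    kept = [[0.0] * N for _ in range(N)]
    for r, c in zigzag_order()[:keep]:
        kept[r][c] = coeffs[r][c]
    return idct2(kept)
```

With keep equal to 64 no coefficient is discarded and the block is recovered up to rounding error, whereas smaller values of keep trade quality for compression.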
Regarding the 3D-DCT system, the entire video is first divided into sequences of 8 frames each. Next, each sequence is divided into blocks of 8 × 8 pixels, so that the whole video is split into cubes of dimension 8 × 8 × 8. We define a symbol as the pixel intensity (luminance in our case), expressed as an integer between 0 and 255. After the 3D-DCT is applied to each cube, a weight is assigned to each transformed symbol according to its frequency, using the 3D pattern defined in [5], to rearrange the symbols into a sequence from lower to higher frequencies. As in the 2D-DCT case, the symbols corresponding to the higher frequencies are discarded, hence compressing the video and reducing its visual quality.
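A minimal sketch of the cube processing follows. The 3D pattern of [5] is not reproduced here; as a stand-in, the coefficients are ordered by the sum of their three frequency indices, which is only an assumption made for illustration. The names dct3, frequency_order and compress_cube are likewise hypothetical.

```python
import math

N = 8  # cube side: 8 frames of 8 x 8 pixels

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

C = dct_matrix()
CT = [list(row) for row in zip(*C)]

def apply_3d(cube, m):
    """Apply matrix m along the time, row and column axes of cube[t][r][c]."""
    idx = range(len(m))
    a = [[[sum(m[k][t] * cube[t][r][c] for t in idx) for c in idx]
          for r in idx] for k in idx]
    b = [[[sum(m[l][r] * a[k][r][c] for r in idx) for c in idx]
          for l in idx] for k in idx]
    return [[[sum(m[q][c] * b[k][l][c] for c in idx) for q in idx]
             for l in idx] for k in idx]

def dct3(cube):
    return apply_3d(cube, C)

def idct3(coeffs):
    return apply_3d(coeffs, CT)

def frequency_order(n=N):
    """Assumed low-to-high ordering by index sum; NOT the pattern defined in [5]."""
    return sorted(((t, r, c) for t in range(n) for r in range(n) for c in range(n)),
                  key=lambda p: (sum(p), p))

def compress_cube(cube, keep):
    """Keep the first `keep` coefficients in the assumed order and reconstruct."""
    coeffs = dct3(cube)
    kept = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for t, r, c in frequency_order()[:keep]:
        kept[t][r][c] = coeffs[t][r][c]
    return idct3(kept)
```

For a static cube, in which the same frame is repeated 8 times, every coefficient with a nonzero temporal index vanishes, which is how the transform exposes temporal redundancy.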
Next, a comparison with the original video sequence is carried out to determine the video quality as a function of the compression factor. The metric is computed by comparing each original frame with its compressed counterpart and averaging the results over all frames. More specifically, in this paper both the SSIM and the PSNR are considered, since they are the most widely used metrics for this type of comparison [1][2][3].
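The frame-by-frame evaluation can be sketched as follows, assuming 8-bit luminance frames stored as lists of rows. The PSNR follows its standard definition. The SSIM shown here is a simplified single-window (global) variant with the usual constants, whereas practical implementations typically average the index over local windows; it is included only as an illustration. All function names are hypothetical.

```python
import math

def mse(ref, test):
    """Mean squared error between two frames of equal size."""
    diffs = [(a - b) ** 2
             for row_r, row_t in zip(ref, test)
             for a, b in zip(row_r, row_t)]
    return sum(diffs) / len(diffs)

def psnr(ref, test, peak=255.0):
    """PSNR in dB; infinite when both frames are identical."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

def global_ssim(ref, test, peak=255.0):
    """SSIM over the whole frame as one window (a simplification)."""
    x = [v for row in ref for v in row]
    y = [v for row in test for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def average_metric(metric, ref_frames, test_frames):
    """Average a per-frame metric over a video sequence."""
    values = [metric(r, t) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)
```

A call such as average_metric(psnr, original_frames, decoded_frames) then returns the sequence-level score, where both arguments are hypothetical lists of frames.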

Results
The results presented here are averaged over three different video sequences. To obtain a fair comparison, we considered different scenarios: resolutions ranging from SD to 1080p, and both static and motion sequences. Figure 1 shows the results obtained from the system simulations. The quality of the video is higher for the 3D-DCT than for the 2D-DCT for both metrics, PSNR and SSIM. Note that in Figure 1a, when the compression factor becomes 1, the PSNR tends to infinity, which corresponds to a reconstructed video identical to the original.

Conclusions
The results show that, for the same video quality, a higher compression level can be achieved when the 3D-DCT is employed instead of the 2D-DCT. This is because the 3D-DCT exploits spatial and temporal correlation at the same time, hence reducing the amount of redundant information in the three dimensions.
In view of the results, we conclude that the 3D-DCT can be considered an improvement over the 2D-DCT for video applications.