Block Compressive Sensing Single-View Video Reconstruction Using Joint Decoding Framework for Low Power Real Time Applications

Abstract: Several real-time visual monitoring applications, such as surveillance, mental state monitoring, driver drowsiness detection and patient care, require equipping wireless sensors with high-quality cameras to form visual sensors. This creates an enormous amount of data that has to be managed and transmitted at the sensor node. Moreover, as the sensor nodes are battery-operated, power utilization is a key concern. One solution is to reduce the amount of data that has to be transmitted using specific compression techniques. The conventional compression standards are based on complex encoders (which require high processing power) and simple decoders, and thus are not suitable for battery-operated applications such as visual sensor networks (VSNs) with primitive hardware. In contrast, compressive sensing (CS), a distributed source coding mechanism, has transformed the standard coding mechanism: it is based on a simple encoder (i.e., transmitting less data with low processing requirements) and a complex decoder, and is considered a better option for VSN applications. In this paper, a CS-based joint decoding (JD) framework using frame prediction (from keyframes) and residual reconstruction for single-view video is proposed. The idea is to exploit the redundancies present in the key and non-key frames to produce side information that refines the quality of the non-key frames. The proposed method consists of two main steps: frame prediction and residual reconstruction. The final reconstruction is performed by adding the residual frame to the predicted frame. The association between correlated frames and compression performance is also analyzed, and various arrangements of the frames have been studied to select the one that produces the best results.
The experimental analysis shows that the proposed JD method performs notably better than the independent block compressive sensing scheme at different subrates for various video sequences with low, moderate and high motion content. The proposed scheme also outperforms the conventional CS video reconstruction schemes at lower subrates. Further, the proposed scheme was quantized and compared with conventional video codecs (DISCOVER, H.263, H.264) at various bitrates to evaluate its efficiency (rate-distortion, encoding, decoding).


Introduction
The transformation of wireless sensor networks (WSNs) into visual sensor networks (VSNs) has opened new horizons in the IT ecosystem, and VSNs have been broadly adopted in various applications. At the same time, the coupling of high-quality cameras with sensors has greatly increased the size of the data that has to be managed and transmitted, which increases the computational burden. Furthermore, as the sensor nodes are battery-operated, power utilization is a key concern. One potential solution to these issues is to reduce the amount of data transmitted using specific image/video compression techniques. In a conventional video capturing arrangement, a compression algorithm is usually applied to the video data once the complete frame set is obtained, to exploit the inter-view and intra-view correlations between the frames. Various approaches have been proposed to exploit these correlations for standard video compression [1,2]. Traditional video compression schemes mostly follow the complex-encoder, simple-decoder paradigm, i.e., the encoder performs far more processing than the decoder. The encoder usually relies on motion estimation and compensation (ME/MC) to exploit the correlations between the frames, which requires high computation. Thus, the conventional schemes are not suited to low power real-time applications such as VSNs (battery-powered nodes with primitive hardware). Most of the conventional schemes need extra computational resources for data processing, leading to additional power requirements, with few trade-offs between the reconstruction quality and computational constraints [1].
The emergence of compressive sensing (CS) [3] has opened new approaches to 3D reconstruction, multi-camera imaging, armed personnel tracing and surveillance, magnetic resonance imaging (MRI) and seismic identification, and to applications such as surveillance, mental state monitoring, driver drowsiness detection and patient care. CS is specifically useful for the transmission of correlated data in low power real-time applications, i.e., visual sensor networks and wireless sensor networks. CS has transformed the idea of conventional data compression, as it is based on the principle of representing a signal using a much smaller sampling rate than the standard Nyquist rate. Hence, CS is a promising solution. Unlike the conventional schemes, it is based on a simple encoder (i.e., transmitting less data) and a complex decoder. The primary motivation for considering CS is that it offers the potential to reduce sensor costs and computational complexity significantly. For example, the well-known single-pixel camera, with only one photosensor, could provide an imaging sensor that is much cheaper to build than one with several million photosensors. Moreover, CS is a form of dimensionality reduction only, and it is not specifically a type of compression [4].
CS makes sense only when the dimensionality reduction that it provides takes place directly within the sensing device's hardware. In other words, the image never exists anywhere in the sensor in its full dimensionality, unlike most of the conventional video coding schemes, in which the acquired image is used in its full dimensionality, increasing the load on the sensor node. However, CS can be used for compression, provided that the dimensionality reduction it offers is coupled with quantization and entropy coding to create a bitstream from the CS measurements.
In this paper, a joint decoding (JD) framework is proposed. The proposed scheme aims to exploit the redundancies among video frames at the decoder using side information. The proposed scheme first decodes the encoded keyframes and non-key frames of a video sequence received at the host workstation. After the frames have been decoded independently, a side information process is initiated. The side information produced is used to enhance the reconstruction quality of the non-key frames. The side information process consists of registration and fusion steps that exploit the inter/intra view correlation between the frames to generate a frame prediction. Next, the difference between the measurements of the predicted frame and the acquired measurements is calculated to minimize the prediction errors. This difference at the measurement level, known as the residual measurements, is decoded to produce the residual frame. The final frame reconstruction is performed by adding the residual to the predicted frame to compensate for the difference.
The rest of the paper is organized as follows: Section 2 presents an overview of the various existing CS-based video reconstruction coding schemes. A detailed explanation of the proposed JD scheme for single view video reconstruction using block CS is provided in Section 3. In Section 4 the experimental results are presented and the conclusions are drawn in Section 5.

Literature Review
Compressive sensing (CS) has emerged as one of the most substantial mechanisms for compressing visual data (video) in recent times. The principle is to sample each video frame independently using CS at the encoder, and then to exploit the correlations within the video frames at the decoder using joint reconstruction schemes. Various video reconstruction coding schemes have been proposed in the literature (dictionary-based coding, 3D-transformation coding, residual-based coding, etc.), each with its own benefits and challenges. In the following, only the state-of-the-art residual coding schemes are discussed, as the proposed model relies on residual-based coding and will be compared with these schemes. Complete explanations of the other coding schemes can be found in [5][6][7][8][9][10][11].
In [12], the modified-CS-residual scheme is presented. The scheme is based on a residual reconstruction approach in which side information helps manage the reconstruction of sparse signals from a minimum number of linear projections. Least mean squares or Kalman filter estimation methods are used to produce the side information. However, this side information can be exposed to errors. The scheme aims to counter the convex relaxation problem related to data constraints and sparsity outside the side information.
A reconstruction scheme known as k-t FOCUSS is presented in [13,14]. It is built on a similar residual reconstruction idea. The scheme first generates side information, assuming n keyframes, using disparity compensation and estimation (DC/DE) predictions and a residual encoding approach. This leads to an ideal sample allocation between the estimation and residual steps. The residuals of the bidirectional DC/DE estimation between the keyframes and each non-key frame are used to attain the reconstruction.
A DC/DE-based reconstruction scheme with residuals is introduced in [15]. The scheme generates the side information by integrating DC/DE-based prediction and is referred to as DC-BCS-SPL. The final reconstruction is aided by calculating the residuals among the side information and the original view.
Also, in [16], a DC/DE-based joint reconstruction scheme is proposed. The proximal gradient method is adopted by the proposed scheme to resolve the optimization problem.
The motion (compensation/estimation)-based scheme is introduced in [17]. The proposed MC/ME scheme is incorporated into the block compressive sensing (BCS-SPL) video restoration method and called MC-BCS-SPL. In this approach, each frame of the video sequence is initially sampled using random block-based CS measurements (at the encoder) and transmitted to the decoder. The received encoded measurements are then decoded and the proposed residual dependent ME/MC approach is applied to generate the final view. The proposed scheme generates video frames alternately along with the motion fields associated with the frames i.e., using one to improve the other iteratively.
In [18], a joint reconstruction algorithm using MC/ME and fusion is presented. The proposed scheme uses the down-sample approach (lowering sampling rate of the views) and then MC/ME and a fusion approach are applied on the down-sampled views to produce a view prediction, which helps generate the final view.
A view estimation and residual reconstruction-based joint reconstruction method is introduced in [19]. The method integrates a MC/ME approach into the reconstruction to generate the side information. The side information then helps in the reconstruction of the final output.
The residual coding schemes discussed above help improve the reconstruction quality of the video frames; however, they are exposed to a few issues such as imprecise estimation or computational complexity. For example, Ref. [12] claims that the proposed scheme is more suitable for video applications and uses least mean squares (LMS) or Kalman filter prediction methods. However, the scheme is exposed to prediction errors. The schemes proposed in [13][14][15][16][17][18][19] are mostly based on DC/MC and DE/ME prediction methods. In such schemes, precise predictions are hard to achieve if basic transformation (translation/affine) models are used. This is because video captured from different view angles might display some distortions that are hard to handle by using fundamental transformations. Thus, such schemes usually use a more complex transformation model to resolve prediction issues.
However, such a solution might lead to an additional computational burden. A summary of CS-based residual coding schemes for single view video reconstruction is presented in Table 1.

Table 1. Summary of various CS-based residual coding schemes for visual reconstruction.

Scheme: Kalman filter (KF)-based prediction
Work [12]. Description: Makes use of side information to handle the sparse signal reconstruction issue (minimum number of linear projections). It also resolves the convex relaxation issue linked with data constraints and sparsity outside the side information.
Issues: It is usually viable for video applications, as it assumes that the sparsity pattern develops progressively from frame to frame. Besides, the prediction makes use of state dynamics and measurement models. The prediction might fail due to incorrect initialization of the model and filter. A tree-structured KF algorithm can be used to address the issue at the cost of higher computational resources.

Scheme: Disparity estimation and compensation (DE/DC)-based prediction
Work [13]. Description: Incorporates disparity estimation and compensation prediction methods into the reconstruction process of BCS-SPL to generate side information aiding the final reconstruction.
Work [14]. Description: Solves the optimization problem by implementing the proximal-gradient method; side information is generated by the DE/DC prediction approach.
Issues: Introduces discontinuities at the block borders (blocking artefacts). Accurate predictions are hard to achieve with fundamental estimation and compensation algorithms, as images/frames captured from different view angles may exhibit some deformations. May result in false edges and ringing effects. Requires more complex estimation and compensation algorithms that result in additional computational burden. Significant improvements at higher subrates as compared to lower subrates, i.e., the scheme does not accurately predict the motion due to the smaller number of measurements at lower subrates, which leads to low-quality initial reconstructions.

Scheme: Motion estimation and compensation (ME/MC)-based prediction
Work [15,16]. Description: Uses motion-based prediction and residual encoding to optimize the sample allocation among the estimated and residual encoding steps.
Work [17]. Description: The MC/ME approach is incorporated with the BCS-SPL reconstruction process for video. The video sequence frames are generated alternately, i.e., one helps to improve the quality of the other iteratively.
Work [18]. Description: The scheme uses a low sample rate, ME/MC and a fusion method to produce a view prediction, which helps in the generation of the final view for multi-view video.
Work [19]. Description: The MC/ME approach is incorporated with the BCS-SPL reconstruction process for multi-view video.

Proposed Joint Decoding (JD) Framework for Video
In this section, the proposed joint decoding (JD) framework for single view video is discussed. The overall architecture of the system is shown in Figure 1. At the encoder side, we consider a visual node S capturing a scene (video); block-based compressive sensing (BCS) is applied to each frame in the video sequence for encoding. The encoded frames are then independently transmitted to the host workstation. At the host workstation (decoder), a sequence of J consecutive frames is received from the visual node S, referred to as a group of pictures (GoP). As the video is continuous, the current GoP is assumed to be followed by another GoP. The GoP comprises a keyframe and J-1 non-key frames, represented as F_K (the primary) and F_NK. The encoding of F_K and F_NK is performed at subrates of M_K and M_NK, respectively, with M_K > M_NK.
At the visual node (encoder), each frame F_x (where x is the frame number) in the video sequence is first partitioned into small blocks of 16 × 16 pixels. Then, the sensing matrix Φ_x of Equation (1) is used to sample each block within the frame, producing a set of measurements Y_x as in Equation (2) [20][21][22]. The set of measurements Y_x produced at the encoder is sent to the host workstation (server) independently.
At the server side (decoder), the received encoded frames Y_x are first decoded independently (frame by frame) by solving a total variation (TV) minimization problem [23,24] until a complete GoP is obtained. TV minimization makes use of the piecewise-smooth nature of the signal to find a solution directly in the feasible space, rather than seeking a sparse solution in a transform domain Ψ. The basic TV minimization function is given in Equation (4). However, CS reconstruction using TV minimization as in Equation (4) is computationally demanding, which limits its use for CS: the TV term is non-differentiable and non-linear, which makes it harder to handle than ℓ1 minimization. To counter such issues, a scheme [24] referred to as TV-AL3 has been presented that combines the conventional augmented Lagrangian (AL) method with variable splitting and the alternating direction method. This approach decreases the computational burden and provides the same output as standard TV.
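The encoder-side sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian sensing matrix, its 1/sqrt(M) scaling, the rounding of the measurement count, and the helper names (make_sensing_matrix, frame_to_blocks, bcs_encode) are all hypothetical choices made for the example.

```python
import numpy as np

def make_sensing_matrix(block_size, subrate, seed=0):
    """Random Gaussian sensing matrix for one B x B block (assumed form).

    The number of rows is M_B = round(subrate * B * B).
    """
    n = block_size * block_size
    m = max(1, int(round(subrate * n)))
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

def frame_to_blocks(frame, block_size):
    """Split an (H, W) frame into vectorized blocks, one block per column."""
    h, w = frame.shape
    b = block_size
    assert h % b == 0 and w % b == 0, "frame must tile exactly into blocks"
    # Index order after the reshape/swap is (block row, block col, row, col).
    blocks = frame.reshape(h // b, b, w // b, b).swapaxes(1, 2)
    return blocks.reshape(-1, b * b).T

def bcs_encode(frame, phi, block_size):
    """Apply the same sensing matrix to every block: Y = Phi * blocks."""
    return phi @ frame_to_blocks(frame, block_size)

# A synthetic CIF-sized frame (352 x 288) stands in for real video data.
frame = np.random.default_rng(1).random((288, 352))
phi_key = make_sensing_matrix(16, 0.5)      # keyframe subrate
phi_nonkey = make_sensing_matrix(16, 0.1)   # one of the non-key subrates
y_key = bcs_encode(frame, phi_key, 16)
y_nonkey = bcs_encode(frame, phi_nonkey, 16)
print(y_key.shape, y_nonkey.shape)
```

Each column of the output holds the measurements of one 16 × 16 block, so the encoder only ever performs one small matrix product per block, which is the property that makes the scheme attractive for low power nodes.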
After the complete GoP is obtained, the proposed joint decoding (JD) method is applied to exploit the inter- and intra-view correlations among the frames. As shown in Figure 1, the first frames of the current and next GoPs serve as the keyframes F_K (reference frames) for the JD to produce side information (prediction, residual) that helps enhance the quality of the (J-1) non-key frames F_NK of the current GoP.
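The GoP arrangement can be made concrete with a small indexing sketch. It assumes that keyframes sit at frame indices that are multiples of J, which matches the description of the first frames of the current and next GoPs acting as references; the function name gop_references and the handling of a truncated final GoP (reporting None) are assumptions for illustration only.

```python
def gop_references(num_frames, gop_size):
    """Map each non-key frame index to its (previous, next) keyframe indices."""
    references = {}
    for index in range(num_frames):
        if index % gop_size == 0:
            continue  # keyframes are decoded independently
        previous_key = (index // gop_size) * gop_size
        next_key = previous_key + gop_size
        # Assumption: if the sequence ends before the next keyframe, report None.
        references[index] = (previous_key, next_key if next_key < num_frames else None)
    return references

refs = gop_references(num_frames=17, gop_size=8)
print(refs[3])   # (0, 8)
print(refs[9])   # (8, 16)
```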

Correlation Estimation of the CS Measurements among Adjacent Frames
A correlation estimation of the CS measurements among adjacent frames is presented in this section. As discussed earlier, adjacent frames within a video sequence have a high inter-frame correlation with each other. Consequently, it can be assumed that the CS measurements of such frames are also highly correlated. However, acquiring CS measurements is entirely different from applying linear transformations, as CS measurements follow a random Gaussian distribution. The correlation between two random entities can be estimated using Pearson's correlation coefficient, which is defined as follows: "the correlation coefficient of two random variables is a measure of their linear dependence". Consider the CS measurements of two adjacent frames Y_n and Y_n+1 of a video sequence, then their correlation coefficient is estimated as: where ϕ is the number of measurements, µ_Yn and µ_Yn+1 are the means of Y_n and Y_n+1, respectively, and σ_Yn and σ_Yn+1 are the standard deviations of Y_n and Y_n+1, respectively. The evaluation of the CS measurement correlation among frames is performed on several standard grayscale CIF video sequences (Foreman, Coastguard, Container, Hall Monitor, Mother-Daughter). Each video sequence contains 300 frames and is divided into GoPs of size 8. The CS measurements for each frame within a video sequence are obtained at a measurement rate of 0.5. The CS measurement correlation coefficient of each non-key frame is estimated for the various video sequences and shown in Figure 2. The results presented are the average of all the correlations of the non-key frames with respect to the keyframes. It can be noticed that all the frames within the video sequences show a high correlation, i.e., above 0.92. Moreover, for moderate motion content video sequences, the correlation is also higher.
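As an illustration of this estimate, the sketch below computes Pearson's coefficient between two measurement vectors from the quantities named above (ϕ, the means and the standard deviations). The 1/(ϕ − 1) normalization with sample standard deviations is an assumption about the exact form used; the synthetic "frames", the shared sensing matrix and the function name measurement_correlation are likewise illustrative.

```python
import numpy as np

def measurement_correlation(y_a, y_b):
    """Pearson correlation of two CS measurement vectors.

    Uses standardized values and a 1/(phi - 1) normalization (assumed form),
    where phi is the number of measurements.
    """
    a = np.asarray(y_a, dtype=float).ravel()
    b = np.asarray(y_b, dtype=float).ravel()
    phi = a.size
    z_a = (a - a.mean()) / a.std(ddof=1)
    z_b = (b - b.mean()) / b.std(ddof=1)
    return float(np.sum(z_a * z_b) / (phi - 1))

rng = np.random.default_rng(0)
block_n = rng.random(256)                                # block of frame n
block_n1 = block_n + 0.05 * rng.standard_normal(256)     # slightly changed block of frame n+1
sensing = rng.standard_normal((128, 256))                # subrate 0.5, shared by both frames
rho = measurement_correlation(sensing @ block_n, sensing @ block_n1)
print(round(rho, 3))
```

Because both frames are sampled with the same linear operator, a small change between the frames produces only a small change between the measurement vectors, which is why the measurement-domain correlation stays high.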

Joint Decoding Framework
This subsection discusses in detail the decoding process of the proposed JD scheme. The proposed JD consists of three significant steps, as presented in Figure 3. The descriptions of each step are provided in the following sections.

Frame Estimation
This step proposes a frame estimation technique using registration and fusion methods. The idea is to use the correlations between the frames to estimate the J-1 non-key frames (F'_NK) in the GoP from the keyframes (F'_K). The proposed method exploits the inter- and intra-view correlations between the frames and produces a set of predicted non-key frames.


Registration Approach
The frame estimation step begins with intensity-based registration of the two independently generated keyframes F_K in the GoP. The aim is to align F_K on the same plane as F_NK, so that the redundancies between them can be exploited. Firstly, a phase correlation method (which finds the gross alignment) is applied to the F_K and F_NK frames to calculate the initial transformation matrix. Next, a translation transformation, rather than an affine one, is used to align F_K with respect to F_NK, producing the transformed keyframe F"_KT. A translation transform is used because an affine transform makes sense when multiple images are not on the same plane and have to be rectified, whereas the frames of a video sequence are usually on the same plane, so a translation transform is sufficient.
A similarity metric and an optimization function are then applied to the transformed frame F"_KT to assess the registration precision and generate the final registered image F"_K, as shown in Equation (6). The similarity metric is mutual information (MI), and the optimizer is the one-plus-one evolutionary (OE) optimizer. The evolutionary optimizer is chosen because the frames of a video sequence are usually already closely aligned, a situation that suits the OE optimizer better than alternatives such as gradient descent (GD). The optimizer is one of the vital parameters of the registration process, as it defines how the attained similarity metric M is exploited to produce the final result. Once both keyframes F"_K are registered, a wavelet-based fusion process is applied to them to produce the predicted frame F_P.
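The gross-alignment stage of the registration can be illustrated with a short phase correlation sketch. It covers only the initial translation estimate; the mutual information metric and the evolutionary optimizer refinement are not reproduced here. The function name, the integer-pixel resolution and the use of circular shifts on a synthetic frame are assumptions made to keep the example self-contained.

```python
import numpy as np

def phase_correlation_shift(reference, moving):
    """Estimate the integer (dy, dx) translation that aligns `moving` to `reference`.

    The returned shift satisfies np.roll(moving, shift, axis=(0, 1)) ~= reference
    when the two frames differ by a circular translation.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(0)
key_frame = rng.random((64, 64))                          # stand-in for the target frame
shifted = np.roll(key_frame, (-3, 5), axis=(0, 1))        # translated copy
shift = phase_correlation_shift(key_frame, shifted)
aligned = np.roll(shifted, shift, axis=(0, 1))
print(shift)   # (3, -5)
```

Normalizing the cross-power spectrum to unit magnitude leaves a pure phase ramp, whose inverse transform is a sharp peak at the displacement; this is what makes the method a cheap and robust starting point for a translation-only model.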

Fusion Approach
In the fusion process, the registered keyframes F"_K are decomposed up to three levels using a symlet 4-tap filter into their respective approximation (A) and detail (D) coefficient maps. Then, point-to-point operations are performed to fuse the A and D coefficients of the two decomposition maps. The coefficients A and D are set as shown in Equations (7) and (8), respectively. The average of the A and D coefficients located at the same coordinates in the two decomposition maps is calculated, and the average value obtained serves as the output for the fused map (FM). Once the fused map is generated, an inverse transformation is applied to generate the predicted frames F_P. The proposed frame estimation method thus predicts the object motion and produces the predicted frames F_P. The chosen set of registration and fusion parameters gave the best results among the parameter sets tested on the various video sequences.
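The fusion rule can be sketched as follows. Note the substitutions: a single-level Haar transform written in plain NumPy stands in for the three-level symlet decomposition, and the coefficients are averaged as signed values. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(img):
    """One-level 2D Haar transform: approximation A and three detail bands."""
    low = (img[0::2, :] + img[1::2, :]) / SQRT2     # low-pass along rows
    high = (img[0::2, :] - img[1::2, :]) / SQRT2    # high-pass along rows
    a = (low[:, 0::2] + low[:, 1::2]) / SQRT2
    d1 = (low[:, 0::2] - low[:, 1::2]) / SQRT2
    d2 = (high[:, 0::2] + high[:, 1::2]) / SQRT2
    d3 = (high[:, 0::2] - high[:, 1::2]) / SQRT2
    return a, (d1, d2, d3)

def haar_idwt2(a, details):
    """Inverse of haar_dwt2."""
    d1, d2, d3 = details
    low = np.empty((a.shape[0], a.shape[1] * 2))
    high = np.empty_like(low)
    low[:, 0::2], low[:, 1::2] = (a + d1) / SQRT2, (a - d1) / SQRT2
    high[:, 0::2], high[:, 1::2] = (d2 + d3) / SQRT2, (d2 - d3) / SQRT2
    img = np.empty((low.shape[0] * 2, low.shape[1]))
    img[0::2, :], img[1::2, :] = (low + high) / SQRT2, (low - high) / SQRT2
    return img

def fuse_by_averaging(key_a, key_b):
    """Average A and D coefficients point by point, then invert the transform."""
    a1, d1 = haar_dwt2(key_a)
    a2, d2 = haar_dwt2(key_b)
    fused_a = (a1 + a2) / 2.0
    fused_d = tuple((x + y) / 2.0 for x, y in zip(d1, d2))
    return haar_idwt2(fused_a, fused_d)

rng = np.random.default_rng(0)
key_0, key_j = rng.random((32, 32)), rng.random((32, 32))   # stand-ins for registered keyframes
predicted = fuse_by_averaging(key_0, key_j)
```

With signed averaging of every band, the linearity of the wavelet transform means the fused frame equals the pixel-wise mean of the two registered keyframes; rules that treat the A and D bands differently, or that operate on magnitudes, would not reduce to this.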

Residual Reconstruction
In this step, the predicted frames F_P are projected onto the measurement basis, Y_P = Φ_x F_P. Once the predicted measurements Y_P are generated, they are subtracted from the received frame measurements Y_x to produce the residual measurements Y_r = Y_x − Y_P, as in Equation (10). The residual measurements Y_r are then decoded using the BCS-TV-AL3 method to generate the residual frames F_r.

Final Frame Reconstruction
The final reconstructed frames F"_NK within the GoP are generated by adding the F_r and F_P frames. It is a standard point-to-point addition, F"_NK = F_P + F_r, as expressed in Equation (11). By doing so, consistency in terms of frame measurements (Y) is achieved, i.e., the measurements computed for F"_NK are approximately equal to the measurements Y_NK. After the keyframes F'_K (F_0 and F_J, from Y_0 and Y_J) are reconstructed using BCS-TV-AL3, they are used as the reference frames for reconstructing the non-key frames F'_NK between them.
The proposed scheme produces the non-key frame F"_1 from Y_1, F_0 and F_J; similarly, F"_2 is generated from Y_2, F_0 and F_J. The method continues in the same way for all the remaining non-key frames. The quality of the reconstructed frames is expected to drop as the distance between the keyframes and the non-key frame increases. Thus, the quality of the reconstruction may decrease as the GoP size (J) increases. A complete GoP (J = 8) reconstruction for the News video sequence is shown in Figure 4. The red dotted circles show that as the difference between the keyframes and the non-key frame increases, the reconstruction quality decreases.
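The residual and final reconstruction steps can be traced numerically on a single block. In this sketch a minimum-norm (pseudo-inverse) solver is swapped in for BCS-TV-AL3 purely to keep the example short, and the prediction is simulated by perturbing the true block; both are assumptions, as are all variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 64                                   # 16 x 16 block at subrate 0.25
phi = rng.standard_normal((m, n)) / np.sqrt(m)   # block sensing matrix (assumed Gaussian)

true_block = rng.random(n)                                    # unknown non-key block
predicted_block = true_block + 0.05 * rng.standard_normal(n)  # stand-in for F_P

y_x = phi @ true_block          # measurements received from the encoder
y_p = phi @ predicted_block     # prediction projected onto the measurement basis
y_r = y_x - y_p                 # residual measurements

def recover_min_norm(phi, y):
    """Placeholder decoder (minimum-norm solution); the paper uses BCS-TV-AL3."""
    return np.linalg.pinv(phi) @ y

f_r = recover_min_norm(phi, y_r)            # residual block
f_final = predicted_block + f_r             # final reconstruction: prediction + residual
f_independent = recover_min_norm(phi, y_x)  # decoding without side information

err_joint = np.linalg.norm(f_final - true_block)
err_independent = np.linalg.norm(f_independent - true_block)
print(err_joint < err_independent)
```

Two properties are visible here: the final block reproduces the received measurements, and the decoder only has to recover the small residual rather than the full block, which is easier at low subrates when the prediction is good.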


Setup
In this subsection, the validation of the proposed JD scheme combined with TV-AL3 (i.e., JD-TV) in terms of performance and efficiency is presented. A standard set of grayscale CIF [25] and HD resolution [26] video sequences with sizes of 352 × 288 and 1024 × 678 is selected for evaluation. The video sequences are selected according to parameters such as the number of frames and the motion content type (slow, moderate, fast). The details of the video sequences (number of frames, motion type) are presented in Table 2. The classification of a video sequence as low, moderate or high content is based on its spatial detail plus camera and object movement.
The experimental setup applies the proposed JD-TV with various GoP sizes (J = 3, 5 and 8) to validate its efficiency at various levels. The assessment is performed by observing the peak signal to noise ratio (PSNR) at various sampling rates (subrates). The subrate is the ratio of the number of CS measurements to the number of pixels in a block and is given as S = M_B/N, where M_B is the number of CS measurements per block and N = B × B (B = block size). The structural similarity index (SSIM) is also recorded, as it is considered more consistent with human visual perception than PSNR. All PSNR and SSIM values presented are the mean of five independent experiments. The values are averaged because Φ is random, and thus the image quality may vary. A block size of 16 × 16 is used rather than a larger one (32 × 32, 64 × 64), as a larger block size produces more samples and requires additional time and energy to transmit. A larger block size provides better reconstruction quality, but at the expense of complexity. The selection of block size is therefore a tradeoff between reconstruction quality and computational complexity. In this regard, it is not feasible for a battery-powered device to always encode and transmit the captured images at a larger block size. In addition, most of the research works focused on low power applications have adopted a block size of 16 × 16 or 32 × 32, as it provides better image quality with less computational complexity. All keyframes are encoded at a fixed subrate of 0.5, while the non-key frames within a GoP are encoded at various lower subrates (0.05, 0.1, 0.15, 0.2, 0.25, 0.3).
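To make the subrate arithmetic and the quality metric concrete, the sketch below computes the number of measurements per 16 × 16 block for each subrate used in the experiments and defines PSNR for 8-bit frames. Rounding M_B to the nearest integer and the peak value of 255 are assumptions; the function names are illustrative.

```python
import numpy as np

def measurements_per_block(subrate, block_size=16):
    """M_B = S * N with N = B * B, rounded to the nearest integer (assumed)."""
    return int(round(subrate * block_size * block_size))

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal to noise ratio in dB for frames with the given peak value."""
    diff = np.asarray(reference, dtype=float) - np.asarray(reconstruction, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

for subrate in (0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5):
    print(subrate, measurements_per_block(subrate))
```

At the keyframe subrate of 0.5, each 256-pixel block is represented by 128 measurements, while at the lowest non-key subrate of 0.05 only 13 measurements per block are transmitted.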

Effect of GoP Size on the Performance of Proposed JD-TV
In this subsection, the proposed JD-TV is assessed and compared with independent BCS-TV-AL3 using various GoP sizes. Table 3 shows the impact of three different GoP sizes (3, 5, 8) on the performance of the proposed JD at various subrates for several video sequences.
The results shown are the average values of the non-key frames obtained for each complete video sequence. The results show that the proposed scheme performs well when compared with independent BCS-TV-AL3. For low, moderate and high content types, the improvement over independent BCS-TV-AL3 is 3 dB to 7 dB, 2.5 dB to 5 dB and 2 dB to 4 dB, respectively, on average for all GoP sizes. The gain for reconstructed low content video sequences is better than for moderate and high content, as the correlation between the frames is higher than in moderate and high content videos. The better correlation results in more precise frame estimation and residual reconstruction. It is also observed that the larger the GoP size, the lower the PSNR gain. This is because the principle of the proposed scheme is to reconstruct the non-key frames by exploiting their correlation with the keyframes, and the non-key frames closer to a keyframe are more correlated with it than those further away. For GoP = 3 the average gain is approximately 3 dB to 6 dB, while for GoP = 8 approximately 2 dB to 4 dB is achieved.
Additionally, as the subrate increases, the gain decreases. Because the keyframes F_K are encoded at a higher subrate than the non-key frames F_NK, they produce a larger set of measurements that compensates for the smaller set of F_NK measurements. This, in turn, decreases the estimation errors of F_NK that arise from the smaller measurement set and yields a better reconstruction of F_NK.
The reconstructed frames' visual quality is also tested by using SSIM metric on the same video sequences used for PSNR. The SSIM graphs of various video sequences with GoP sizes of 3, 5, and 8 at different subrates are presented in Figure 5.
The SSIM shows a trend similar to that of PSNR in terms of gain. Moreover, the results indicate an enhancement in visual quality and a notable increase over the independent scheme. The proposed scheme was also tried at larger GoP sizes (16, 32); however, the frame reconstruction was not substantial enough to be worth reporting. In a VSN, frequent transmission of keyframes, as with GoP sizes 3 and 5, is discouraged because of limited battery life, since it increases the computational load at the encoder (the low power node). Thus, GoP size 8 is considered the balance point among the GoP sizes and is adopted in the subsequent experiments.
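For reference, the SSIM index can be illustrated with the sketch below. It is a simplification under stated assumptions: the standard metric averages the index over local windows, whereas this version evaluates a single window covering the whole frame, and the function name `global_ssim` is hypothetical. The constants follow the usual choice C1 = (0.01 L)^2 and C2 = (0.03 L)^2 with L = 255.

```python
def global_ssim(x, y, peak=255.0):
    """Single-window SSIM between two equally sized frames (nested lists)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * peak) ** 2  # stabilises the luminance term
    c2 = (0.03 * peak) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    frame = [[0, 255], [255, 0]]
    inverted = [[255, 0], [0, 255]]
    print(global_ssim(frame, frame))     # identical frames give an index of 1
    print(global_ssim(frame, inverted))  # anti-correlated frames give a negative index
```

A windowed implementation from an image-processing library would be the better choice for reproducing published numbers; the sketch only shows which statistics the index compares.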

Subrate
This subsection presents the impact of the F_K subrate used in the proposed JD-TV on the F_NK reconstruction. A GoP size of 8 is adopted, as it was the balance point in the previous experiment, rather than 16 or 32. The evaluation is carried out on two arrangements. The results in Table 4 indicate that the F_K subrate has a substantial effect on the F_NK reconstruction quality. In the first setup, where M_K = M_NK, the reconstruction quality improves by 2 dB-3.5 dB on average compared with the traditional scheme. However, when M_K > M_NK, the gain increases significantly (3.5 dB-5 dB) compared with JD-TV (M_K = M_NK) and BCS-TV. The reason is that, when F_K and F_NK are encoded at the same subrate, the reconstructed F_K does not contain enough information to considerably assist the reconstruction of F_NK.
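To make the two arrangements concrete, the sketch below relates a subrate to the number of measurements kept per block and to the average subrate over one GoP. It assumes one keyframe per GoP and a 16 × 16 block; the subrate values 0.5 and 0.1 and the function names are illustrative and not taken from Table 4.

```python
def measurements_per_block(subrate, block_size=16):
    """Number of CS measurements kept for one block_size x block_size block."""
    return round(subrate * block_size * block_size)


def average_subrate(key_subrate, non_key_subrate, gop_size):
    """Mean subrate over one GoP: one keyframe plus gop_size - 1 non-key frames."""
    return (key_subrate + (gop_size - 1) * non_key_subrate) / gop_size


if __name__ == "__main__":
    # Arrangement M_K > M_NK with illustrative subrates.
    print(measurements_per_block(0.5), measurements_per_block(0.1))
    for gop in (3, 5, 8):
        print(gop, average_subrate(0.5, 0.1, gop))
```

With these illustrative values the average subrate falls as the GoP grows, which is the transmission-side reason for preferring GoP size 8 over 3 or 5 on a battery-powered node.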

Proposed JD Framework and HD Video Sequences
In this section, the proposed JD framework is tested on HD video sequences to evaluate its effectiveness at other video resolutions. The selected HD video sequences contain both low and moderate content types. A block size of 16 × 16 and a GoP size of 8 are adopted.
The PSNR and SSIM results presented in Tables 5 and 6, respectively, show that the proposed JD framework also performs well for HD video sequences compared with the conventional BCS framework. A trend similar to that of the CIF video sequences is noticeable in the JD reconstructions. For the low motion video (Lovebird), the gain in terms of PSNR and SSIM is more significant than for the moderate motion videos (Newspaper, Book arrival).

Visual Result Comparison
Visual quality analysis is a vital part of evaluating reconstructed compressed images. In this subsection, the visual quality of the frames reconstructed using the proposed JD is analyzed. The proposed framework performs better in terms of PSNR and SSIM at lower subrates. However, it is also important to verify that the frames produced at lower subrates are visually acceptable. Thus, we perform the analysis on three video sequences with low, moderate and high motion contents. Figure 6 presents the visual results for a randomly selected center frame of a GoP from each reconstructed video sequence (similar visual quality was observed for the center frames of all GoPs) using JD-TV and BCS-TV-AL3. The results show that, at lower subrates, frames reconstructed by the proposed method have better visual quality than those of the conventional method. In addition, for the low motion content video sequence (Mother-Daughter), the proposed JD-TV shows higher visual quality (due to precise frame prediction and residual reconstruction) than for the moderate (News) and high motion content video sequences.

It should also be noted that at a lower subrate (0.05), a few distortions are visible (red dotted circle) because of insufficient motion estimation information. Further, for video sequences with fast-moving objects (Mobile Calendar), JD-TV is vulnerable to a few distortions, as highlighted by the red dotted circle.

Proposed JD-TV v/s Standard CS Video Compression Schemes
To analyze the performance of the proposed JD-TV systematically, it is compared with conventional CS-based schemes, i.e., MS-residual [12], k-t FOCUSS [13], and MC-BCS-SPL [17], as described in Section 2. The results presented in Table 7 (gain of the proposed and conventional schemes with respect to independent BCS) are averages over all frames, obtained with a block size of 16 × 16 and a GoP of 8. The results in Table 7 show that, at lower subrates, the proposed JD provides a considerable gain over conventional CS-based schemes for video sequences with various motion content. Further, it can be observed that the gain decreases as the subrate increases, which is due to the impact of the F_K subrate discussed earlier. This is consistent with the focus of the proposed scheme, which is to provide better reconstruction at lower subrates.

Comparison of Proposed JD-TV with Conventional Video Compression Schemes
As discussed earlier, CS is a type of dimensionality reduction in which the signal never exists in the sensor at its full dimensionality. CS is based on a simple-encoder, complex-decoder paradigm, in contrast to conventional video codecs. To use CS as a compression mechanism, quantization and entropy coding must be coupled with CS to generate bits from the CS measurements.
In this section, the proposed JD-TV is coupled with scalar quantization and adaptive differential pulse code modulation (SQ-ADPCM), referred to as SQ-ADPCM-JD, to compare it with state-of-the-art video codecs, i.e., DISCOVER [27], H.264 (intra, I-P-P) [28] and H.263 (intra, I-P-P) [29]. The results are shown in the accompanying figure. They show that the CS codec coupled with SQ-ADPCM performs better than the standard video codecs at various bitrates. For example, compared with H.263 (intra) and H.264 (intra), the CS codec performs better for all video sequences at various bitrates. For moderate content type videos (Foreman, Coastguard), the CS codec performs notably better than H.263 (I-P-P) at various bitrates, and the proposed SQ-ADPCM-JD scheme performs better than DISCOVER and H.264 (I-P-P) at lower bitrates.
As mentioned earlier, the proposed JD-TV uses keyframes with a larger set of measurements (higher subrate) to improve the non-key frames, which have a smaller set of measurements (lower subrate). Thus, the larger measurement set of the keyframes covers the correlated, smaller measurement set of the non-key frames. This helps to decrease the non-key frame prediction errors that arise from the smaller measurement set and yields a better prediction of the non-key frames. Further, the DISCOVER and H.264 (I-P-P) schemes use a feedback channel to improve the reconstruction of the keyframes.
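The step from CS measurements to bits can be illustrated with a plain DPCM loop around a uniform scalar quantizer. This is only a sketch under strong assumptions: the step size is fixed rather than adaptive, the predictor is simply the previously reconstructed measurement, entropy coding is omitted, and the function names are ours. The SQ-ADPCM design actually used is not specified in this section.

```python
def dpcm_encode(measurements, step):
    """Quantize the difference between each measurement and the previously
    reconstructed one; returns the integer quantization indices."""
    indices = []
    prediction = 0.0
    for y in measurements:
        q = round((y - prediction) / step)  # uniform scalar quantizer
        indices.append(q)
        prediction += q * step              # mirror the decoder's reconstruction
    return indices


def dpcm_decode(indices, step):
    """Rebuild the measurements by accumulating the dequantized differences."""
    recon = []
    prediction = 0.0
    for q in indices:
        prediction += q * step
        recon.append(prediction)
    return recon


if __name__ == "__main__":
    y = [10.2, 11.7, 9.4, 9.9]  # hypothetical measurement values
    idx = dpcm_encode(y, step=0.5)
    print(idx)
    print(dpcm_decode(idx, step=0.5))
```

Because the encoder predicts from its own reconstruction rather than from the original values, quantization errors do not accumulate, and each decoded measurement stays within half a step of the original.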

Number of Bits
The proposed JD scheme is developed to benefit low power real-time applications (VSNs) in terms of efficiency, data transmission and power utilization. This subsection examines the bit rate savings of the proposed JD-TV over the independent BCS-TV-AL3 for different video sequences at various reconstruction qualities (PSNR).
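A bit saving at matched reconstruction quality can be expressed as a simple percentage, as in the sketch below. The function name and the bit counts in the example are hypothetical and are not values from Table 8.

```python
def bit_saving_percent(bits_independent, bits_joint):
    """Percentage of bits saved by the joint scheme at a matched PSNR."""
    if bits_independent <= 0:
        raise ValueError("bits_independent must be positive")
    return 100.0 * (bits_independent - bits_joint) / bits_independent


if __name__ == "__main__":
    # Hypothetical example: both schemes reach the same PSNR,
    # one with 1000 bits and the other with 500 bits.
    print(bit_saving_percent(1000, 500))
```

Averaging this percentage over several PSNR operating points would give a single saving figure per sequence.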
The results presented in Table 8 show the reconstruction quality at different numbers of bits for different video sequences. The results obtained can also be calculated as given in [30]. It can be noted that the number of bits required by the conventional BCS and the proposed JD schemes to achieve the same PSNR differs. The proposed scheme uses fewer bits than the independent BCS scheme to achieve the same PSNR: for higher motion content video sequences the average saving is approximately 42%, while for lower content video sequences it is 65%. This is because the proposed JD offers improved reconstruction quality at lower bit rates. Because the proposed scheme provides better reconstruction with fewer bits, the encoder transmits less data, which reduces the load on the encoder. On average, the proposed scheme saves approximately 50% of the bits relative to the independent scheme at different reconstruction qualities.
In this subsection, the encoding and decoding complexity (average compression and reconstruction time) of the proposed JD-TV and of conventional video schemes, i.e., DISCOVER [27] and H.264 (intra) [28], is presented for various video sequences with GoP size 3 and block size 16. The reconstruction complexity of the proposed scheme is also compared with that of other CS-based reconstruction schemes. All the schemes are implemented in MATLAB (R2019a) running on a computer equipped with an Intel® Core™ i7-6700 CPU @ 3.4 GHz and 16 GB RAM.
The results presented in Table 9 show that the average encoding time of the proposed JD-TV, DISCOVER [27], and H.264 (intra) [28] ranges from 6.31 s-7.38 s, 31.05 s-60.08 s, and 62.40 s-125.45 s, respectively. It is also observed that the proposed JD-TV requires more encoding time at higher subrates than at lower subrates. In contrast to conventional video codecs, the proposed JD-TV codec takes less time to encode the video frames: for all types of video sequence (low, medium, high), its average encoding time is significantly lower than that of the DISCOVER and H.264 (intra) codecs. For example, JD-TV needs 6.94 s to encode the Foreman video sequence, roughly 5 times less than DISCOVER (36.40 s) and roughly 11 times less than H.264 (intra) (78.50 s). A similar trend can be observed for the other video sequences. This supports the view that the encoders of conventional video schemes carry a heavy computational burden and are not suitable for low power real-time applications.
We also observed the decoding complexity (average reconstruction time) of the proposed JD-TV, DISCOVER, and H.264 (intra). The results presented in Table 10 show that the proposed JD-TV requires more reconstruction time than the conventional codecs, i.e., its decoder has a higher computational complexity than those of DISCOVER and H.264 (intra). This is consistent with CS-based codecs being built on a simple encoder and a complex decoder, unlike conventional codecs (DISCOVER and H.264 (intra)). It should also be noted that the results reflect a trade-off between encoding/decoding complexity and rate-distortion performance, so they are only loosely comparable. Additionally, the average reconstruction time of the proposed JD-TV and other CS reconstruction schemes, i.e., MS-residual [12], k-t FOCUSS [13], and MC-BCS-SPL [17], is shown in Figure 8.
Overall, the proposed JD-TV executes about 2-3 times faster than MC-BCS-SPL and k-t FOCUSS, and it also outperforms MS-residual. The reason is that the proposed scheme uses the less complex BCS-TV-AL3 for the initial reconstruction and a simplified process for predicting the non-key frames.
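The encoding-time ratios implied by the Foreman times reported above (6.94 s, 36.40 s and 78.50 s) can be checked with a few lines of arithmetic. The helper name `time_ratio` is ours; the three timings are the ones quoted in the text.

```python
def time_ratio(reference_seconds, other_seconds):
    """How many times longer the other codec takes than the reference codec."""
    return other_seconds / reference_seconds


if __name__ == "__main__":
    foreman_encoding_seconds = {
        "JD-TV": 6.94,
        "DISCOVER": 36.40,
        "H.264 (intra)": 78.50,
    }
    base = foreman_encoding_seconds["JD-TV"]
    for name, seconds in foreman_encoding_seconds.items():
        print(f"{name}: {time_ratio(base, seconds):.2f}x the JD-TV encoding time")
```

The same function applies to the decoding times, where the ratio is expected to fall below 1 for the conventional codecs because the CS decoder is the slower one.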

Conclusions
In this paper, a joint decoding (JD) framework is proposed that exploits the redundancies present between video frames at the decoder and reduces the computational burden on the encoder. The proposed scheme uses an efficient registration and fusion method to generate side information that helps in the reconstruction of the final frame sequences. The simulation results cover different arrangements with different subrates and GoP sizes (3, 5, 8). They show the effect of the GoP size on the reconstruction quality, i.e., the smallest GoP size (3) provides better reconstruction quality than the larger ones. The comprehensive experimental analysis shows that the proposed JD-TV performs notably better than the independent BCS-TV-AL3 scheme at different subrates for various video sequences with low, moderate and high motion contents. In addition, the proposed scheme outperforms the conventional CS video reconstruction schemes at lower subrates. Further, when compared with conventional video codecs (DISCOVER, H.263, H.264), the proposed framework shows notable performance at various bitrates. Finally, the coding efficiency of the proposed and conventional video codecs, in terms of encoding and decoding time, is also compared.
The developed scheme can be used in real-time transport safety applications such as driver warning systems for motorbikes or cars, or driver drowsiness detection. Similarly, it can be applied to real-time visual patient monitoring, MRI, patient care and remote sensing.