Resource Optimization for 3D Video SoftCast with Joint Texture/Depth Power Allocation

Abstract: During wireless video transmission, channel conditions can vary drastically. When the channel fails to support the transmission bit rate, the video quality degrades sharply. A pseudo-analog transmission system such as SoftCast relies on linear operations to achieve a linear quality transition over a wide range of channel conditions. When transmitting 3D videos over SoftCast, the following issues arise: (1) assigning the transmission power to texture and depth maps to obtain the optimal overall quality and (2) handling 3D video data traffic by dropping and re-allocating resources. This paper solves the pseudo-analog transmission resource allocation problem and improves the results by applying the optimal joint power allocation. First, the minimum and the target distortion optimization problems are formulated in terms of a power–bandwidth pair versus distortion. Then, a minimum distortion optimization algorithm iteratively computes all the possible resource allocations to find the optimal allocation based on the minimum distortion. Next, the three-dimensional target distortion problem is divided into two subproblems. In the power-distortion problem, to obtain a target distortion, the algorithm exhaustively solves the closed form of the power resource under a predefined upper-bound bandwidth. For the bandwidth-distortion problem, reaching a target distortion requires solving iteratively for the bandwidth resource closed form, given a predefined power. The proposed resource control scheme shows an improvement in transmission efficiency and resource utilization. At low power usage, the proposed method could achieve a PSNR gain of up to 1.5 dB over SoftCast and even a 1.789 dB gain over a distortion-resource algorithm, using less than 1.4% of the bandwidth.


Introduction
Owing to the immense growth of industrial and entertainment applications and the rising need for an immersive visual experience, three-dimensional video (3DV) has become a favored choice for many consumers. The 3D scene provides an arbitrary view of the actual scene. Adopting multi-view video plus depth (MVD) [1] to represent 3D video content efficiently has become more common. At the same time, 3D-video-based applications on mobile terminals have gained consumer attention due to the improvement in wireless communication (e.g., 5G and LTE [2]) and the existence of 3D-enabled laptops and naked-eye 3D mobile devices. However, some technical issues remain. Firstly, large amounts of computing resources are required to process a 3D video signal, as the amount of video data grows proportionally to the number of cameras and the frame depth, which imposes grave challenges for resource-constrained mobile terminals. Secondly, the unpredictable channel condition of the wireless network poses a higher requirement for 3D video transmission than for traditional 2D video. Hence, any proposed efficient wireless transmission method must address these specific issues.
In a 3D video transmission system, the transmitter uses several cameras to record texture videos and depth map frames of the 3D scene. After the receiver selects the preferred virtual viewpoint, the transmitter captures the frames of the adjacent cameras near the selected virtual points. These frames are considered reference viewpoint texture videos and depth maps for that virtual viewpoint. The captured data are source and channel encoded before transmission to the receiver. The demanded virtual viewpoint at the receiver is synthesized from the decoded texture video and depth map frames [3][4][5] via a technique named depth-image-based rendering (DIBR). Therefore, the virtual view quality is determined by the received texture video quality and depth map quality.
Conventional video transmission over wireless networks relies on Shannon's theorem [6,7], in which source coding is separated from channel coding. Based on this strategy, optimal performance can be obtained provided that the channel condition and capacity are known before source and channel coding, in order to select the proper transmission rate adaptively [8]. However, this strategy does not suit practical wireless communication, where the channel condition could vary drastically and even the receiver is ignorant of the exact channel-state information. The video quality degrades abruptly due to channel capacity mismatch when the channel SNR falls below the acceptable threshold. Even when the channel SNR is high, the reconstruction quality will not improve, as the quantization factor in the digital video compression already determines the upper bound. This phenomenon is known as the cliff effect [9].
SoftCast, an uncoded video transmission technique, has been proposed to overcome conventional digital video transmission problems [10][11][12]. The transmitted signal is transformed into a series of real-value coefficients linearly with the pixel value in the video signal. By skipping quantization and entropy coding, such a transmission becomes lossy in nature. More importantly, the channel noise is directly mapped to the video signal reconstruction error, which makes the reconstruction quality commensurate with the channel SNR of the receiver. Thus, SoftCast tackles the cliff effect. However, when considering SoftCast for transmitting 3DV, several challenges appear.
Firstly, transmitting a large-sized video (i.e., MVD) over a wireless medium consumes substantial amounts of limited resources (power and bandwidth), mostly wasted in carrying reference viewpoints with quite similar content. Secondly, unlike 2D videos, every scene in 3DV consists of texture and depth map frames. Both must be transmitted to the receiver to synthesize the virtual view, which is affected nonlinearly by the received texture videos and the depth maps. Intuitively, assigning more power to texture videos than depth maps will cause an inevitable geometric distortion in virtual views, whereas less power allocation to texture videos would incur additional distortions in reference views. Hence, under limited resources and joint texture/depth power allocation, the problem becomes one of achieving the optimal reconstruction quality. In another case, given a predefined quality at the receiver end, the problem is one of achieving the minimal resource (bandwidth and power) consumption.
This paper proposes incorporating a joint texture-depth power allocation and resource optimization for 3D video transmission, as shown in Figure 1. The optimal power allocation ratio (PAR) between texture and depth is estimated, assuming equal power among all reference viewpoints. Then, the intricate resource control problem of 3D video transmission is considered from the point of view of two optimization problems: minimum distortion and target distortion. Resource allocation algorithms are designed to find the optimal solutions to the formulated problems. The main contributions of this work are as follows:

• The optimal solutions for minimum and target distortion problems are obtained in terms of texture and depth using proposed resource allocation algorithms, and the trade-off relationship between power and bandwidth usage is analyzed.

• The proposed resource control scheme for 3D transmission integrates the optimal PAR with resource allocation algorithms, ensuring balanced power allocation between texture and depth to improve the transmission results. The simulation results demonstrate a significant reduction in resource usage in terms of texture and depth, while achieving a satisfactory or desired video quality.

The remainder of the paper is organized as follows. SoftCast is reviewed in Section 2. The minimum distortion optimization is presented in Section 3. Section 4 discusses the target distortion optimization. Sections 5 and 6 show the simulation results and conclude the paper.

SoftCast Video Transmission
The energy efficiency of any network shows the extent to which the network can reduce distortion cost efficiently. In [13,14], two examples of improving the network energy efficiency in two different networks are given. Video transmission over wireless network channels is challenging due to the time-varying channel characteristics and power constraints. In such systems, the objective of bit allocation between source and channel is to minimize the power consumption while maintaining satisfactory video quality, and this is also defined as the energy efficiency of the network. The SoftCast transmission scheme is an end-to-end architecture for wireless video transmission that replaces the complicated bit allocation in digital video transmission with a power allocation method. It differs from digital video transmission in the video encoding mechanism, assuring error resilience and transmission. Unlike conventional digital video transmission, the SoftCast approach depends on analog code to achieve linearity and a compression-protection trade-off with an optimal power allocation. In addition to the excellent video quality reception, which is linearly dependent on the channel condition, SoftCast offers scalability that is realized by broadcasting only a single signal.
To improve the SoftCast efficiency, Fan et al. [15] first replaced 3D-DCT with motion-compensated temporal filtering (MCTF) and then proposed DCast [16], a distributed source coding that exploits the inter-frame redundancy. In [17,18], adaptive chunk division was incorporated to achieve optimal transmission power usage, and in [19], the bandwidth and computation requirement were reduced. Some other studies suggested a hybrid digital-analog framework, incorporating analog transmission alongside digital transmission to take advantage of both systems, achieving a balance between robust adaptation capability and high coding efficiency [15,20-24]. These systems can be categorized based on the type of information to be digitized and how the analog and digital parts share the channel. With the increase in the wireless traffic burden on wireless networks, efficient network resource exploitation becomes necessary. Based on the Shannon capacity [6], the network capacity is enhanced via bandwidth increase and frequency reuse. In HDA-SIM [25], the bandwidth of digital traffic is treated as a hidden resource, where the analog modulated data are superimposed over existing digital traffic. He et al. [26] proposed MCast, a linear video transmission system that retransmits data across multiple time slots and multiple channels to exploit the time and frequency resources. These authors also proposed MUcast [27] to solve the problem of resource allocation to realize efficient multi-user video transmission.
To improve the network's energy efficiency in pseudo-analog 3D video transmission, Yang et al. [28] suggested an uncoded wireless depth map transmission scheme using mean-removed block-based DCT to improve decorrelation and view-synthesis distortion, and thereby enhance energy allocation efficiency. Power allocation scaling is performed on rearranged chunks that include inter-block DCT coefficients. Luo et al. [29] considered the view-synthesis distortion in the power-distortion optimization problem to achieve optimal reconstruction quality in the reference and virtual views. The optimization problem was then solved in a closed form of texture/depth power allocation.
In general, all the studies mentioned followed the minimum distortion optimization approach for a pseudo-analog system. Only a few studies addressed the target distortion. In [30], Liu et al. proposed prediction models to solve the target distortion optimization problem for single-view video. The curve-fitting-based resource control algorithm allocates the constrained bandwidth and power resources to obtain the predefined quality. Zhang et al. [31] developed resource allocation algorithms for 3D video. In distortion-resource optimization, the DR algorithm exhaustively searches all possible discrete power and channel resources. For resource-distortion optimization, the optimization problem is divided into power and channel optimization subproblems. Then, fixing one resource at a time for a desired distortion, the RD algorithm finds the optimal solution by exhaustive search. Nonetheless, it combines texture and depth using 5D-DCT without considering their distinct characteristics. Unlike 2D video pseudo-analog transmission systems, which resort to discarding chunks with the smallest variation to meet the constrained resource requirement without significantly degrading quality, 3D video pseudo-analog transmission systems can face some complications. Due to the involvement of synthesis distortion, the source data that can be discarded from both texture and depth frames must significantly degrade neither the virtual view quality nor the reference view quality, rather than only the latter. Hence, this work aims to perform resource allocation while jointly balancing the power assigned to texture and depth maps, avoiding geometric distortion in virtual views and additional distortions in reference views, and thus improving resource allocation in 3D video transmission.

SoftCast Power Allocation
In a conventional wireless video transmission scheme, the real-value coefficients are transformed into a bitstream for transmission, and channel coding is adopted to add parity bits to cope with drastic channel noise. This strategy destroys the numeric properties within the video data and leads to the cliff effect. SoftCast avoids this by transmitting the real-value coefficients directly without bitstream transformation. In SoftCast, a novel error protection scheme was developed by scaling the magnitude of the transmitted coefficient. For a specific conventional 2D video sequence, consider applying 3D-DCT to every GOP. The resulting stream of DCT coefficients is divided into $N$ chunks $x_1, x_2, \ldots, x_N$ of size $h \times w$, hence $x_i[j]$, $j = 1, \ldots, (h \times w)$. The amount of information in each chunk is captured by its entropy (i.e., average energy) $\lambda_i$. Each chunk $x_i$ is scaled up to $y_i[j] = g_i \cdot x_i[j]$, $g_i > 1$, at the sender. The values $y$ are then transmitted over the wireless channel. The channel noise is taken to be AWGN with zero mean and variance $\sigma_n^2$, and the receiver recovers the coefficients from the noisy received signal $\hat{y}_i[j]$ with a linear least-squares estimator (LLSE).
Consider how SNR changes the LLSE. At high SNR [11], the reconstructed signal $\hat{x}_i[j]$ is estimated by simply scaling down the received signal, $\hat{x}_i[j] = \hat{y}_i[j]/g_i$. For the scale up/down scheme, the reconstruction error is only $\sigma_n^2/g_i^2$ compared to $\sigma_n^2$ for the direct transmission error. Therefore, this scheme remarkably enhances the transmission error protection when $g_i$ is relatively large. Since the power budget for any transmission system is a constrained resource, scaling up some signal samples with more power means leaving other signal samples with less power. Therefore, power allocation is conducted to determine the optimal scaling factors for minimizing the total reconstruction distortion for the entire transmitted sequence. In [11,18,32], the optimal factor for scaling chunk $i$ is:

$$g_i = \lambda_i^{-1/4} \sqrt{\frac{P}{\sum_{j=1}^{N} \sqrt{\lambda_j}}}$$

where $P$ is the total power budget. The total reconstruction distortion under optimal power allocation is

$$D = \frac{\sigma_n^2}{P} \left( \sum_{i=1}^{N} \sqrt{\lambda_i} \right)^2$$

From (2), it becomes clear that for traditional 2D video, the reconstruction distortion $D$ is inversely related to the total power $P$. The value of $D$ can be computed once $P$ is known. However, for 3D video, the power budget $P$ is shared by texture videos and depth maps, which together determine the overall quality of the received 3D video. Hence, to optimally allocate the total power between texture videos and depth maps, the texture/depth power allocation ratio is estimated, considering the texture/depth distortion trade-off [29].
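The scaling rule and the resulting distortion can be checked numerically. The following is a minimal sketch, assuming the standard high-SNR SoftCast expressions $g_i = \lambda_i^{-1/4}\sqrt{P/\sum_j \sqrt{\lambda_j}}$ and $D = \sigma_n^2 (\sum_i \sqrt{\lambda_i})^2 / P$; the chunk energies and the budget are hypothetical values chosen only for illustration.

```python
import math

def softcast_gains(variances, total_power):
    """Scaling factors g_i = lambda_i^(-1/4) * sqrt(P / sum_j sqrt(lambda_j)).

    Assumes every chunk energy is strictly positive.
    """
    root_sum = sum(math.sqrt(v) for v in variances)
    return [v ** -0.25 * math.sqrt(total_power / root_sum) for v in variances]

def softcast_distortion(variances, total_power, noise_var=1.0):
    """Total high-SNR distortion sigma_n^2 * (sum_i sqrt(lambda_i))^2 / P."""
    root_sum = sum(math.sqrt(v) for v in variances)
    return noise_var * root_sum ** 2 / total_power

# Hypothetical chunk energies and power budget (not taken from the paper).
variances = [400.0, 100.0, 25.0, 4.0]
P = 1000.0

gains = softcast_gains(variances, P)
# Transmitted power of chunk i is g_i^2 * lambda_i; the sum should equal P.
used_power = sum(g * g * v for g, v in zip(gains, variances))
# Per-chunk error after scaling down at the receiver is sigma_n^2 / g_i^2.
per_chunk_error = [1.0 / (g * g) for g in gains]
```

Summing `per_chunk_error` reproduces `softcast_distortion(variances, P)`, and doubling `P` halves the distortion, which is the inverse relation between $D$ and $P$ stated in the text.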

Resource Allocation Optimization Problem
Optimization problems for the power-bandwidth pair versus distortion address resource constraint challenges in 3D video pseudo-analog transmission. For example, the objective of minimum distortion optimization is to minimize the overall distortion for a given resource usage:

$$\min D \quad \text{s.t.} \quad R \le R_{available}$$

where $D$ is the distortion, $R$ is the resource usage, and $R_{available}$ represents the available power and bandwidth resources. The target distortion optimization minimizes the resource usage for a given distortion:

$$\min R \quad \text{s.t.} \quad D \le D_{exp}$$

where $D_{exp}$ is the expected distortion. Note that pseudo-analog systems transform video data into independent chunks, so each chunk contributes independent distortion to the overall video quality, and the distortion of a chunk caused by a transmission error does not affect any other chunk. Hence, discarding chunks to fit the bandwidth does not prevent each receiver from obtaining a video quality proportional to its channel conditions. Due to the heavy data traffic involved in 3D video transmission, it is crucial to optimally select chunks for transmission and reuse the saved power efficiently. In summary, the problem becomes: (a) the optimal allocation of the total power between texture videos and depth maps and (b) the joint optimal resource (power and bandwidth) allocation for the entire signal sequence to obtain satisfactory video quality.

Investigation and Motivation
In pseudo-analog transmission, every scaled DCT chunk is given its own slot in the transmission channel. An advantage of the compacting nature of DCT is that low-variance coefficients dominate the high spatial frequencies. Chunks with the lowest variance are considered the least important chunks, denoted LP (low-priority), because they contribute least to the video quality. Therefore, discarding these chunks to meet the constrained resource requirement will not lead to a significant quality degradation. The power saved from chunk dropping is re-allocated to other more important HP (high-priority) chunks for efficient resource utilization, particularly in stringent power-constraint conditions. Therefore, we have a joint problem of power allocation and chunk selection for 3D video transmission.

Problem Formulation
Given a 3D video sequence, a 3D-DCT is used to decorrelate the texture and depth frames of the reference viewpoints in one GOP, which results in $N$ chunks of DCT coefficients with variance indicated by $\lambda_1, \lambda_2, \ldots, \lambda_N$. Without loss of generality, assume $\lambda_i \ge \lambda_j$ for all $i < j$. Let $K$ be the available bandwidth resources (e.g., frequency or time slots). Let $k = [k_1, k_2, \ldots, k_N]$ be a binary allocation vector for each GOP. The chunk selection is indicated as follows: chunk $i$ is discarded when $k_i = 0$ and transmitted over a bandwidth resource slot when $k_i = 1$. The expected MSE of the $i$th chunk is expressed through a Euclidean norm. The optimal values of power allocation and bandwidth allocation are found by minimizing the transmission distortion under constrained power and bandwidth resources. The problem is formulated following [33] in terms of the signal-to-noise power ratio of the $i$th chunk for a given power allocation, where $K$ should be in the range $[0, N]$ and $\mathrm{SNR}_{chk}$ is the SNR budget for each chunk.
As systems run short on bandwidth, with $K < N$, to minimize distortion the system will resort to transmitting a number $K$ of HP chunks and discarding the remaining chunks.
After the MSE is re-expressed, the problem becomes one of finding the optimal power allocation $\rho_i^*$ and bandwidth $K^*$. Based on the second-order derivative of the objective function in (12), the Lagrange multiplier method is used to solve (12), after defining $\mu > 0$ and the Lagrange function $J$. By setting $\partial J / \partial \rho_i = 0$ and $\partial J / \partial \mu = 0$, the optimal power allocation value $\rho_i^*$ is derived in closed form. This confirms the chunk scaling in (2).
From (12), the total power budget $P$ is inversely proportional to the reconstruction distortion. Increasing the power budget increases $\rho_i^*$ and therefore decreases the distortion. Nevertheless, the bandwidth resource $K$ in (17) is the upper bound of the summation term. Its discrete nature complicates the effort to attain the optimal bandwidth $K^*$ in closed form.
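As an illustration of the Lagrangian argument, the following worked derivation uses one formulation that is consistent with the surrounding definitions. It is a sketch under stated assumptions, not necessarily the exact formulation of [33]: the per-chunk SNR is taken as $\rho_i = g_i^2 \lambda_i / \sigma_n^2$, the high-SNR chunk error is taken as $\lambda_i / \rho_i$, and $P_{tot}$ is an assumed symbol for the total SNR budget shared by the $K$ transmitted chunks.

```latex
% Assumed objective: K high-priority chunks are sent, the rest are discarded.
\mathrm{MSE}(\rho, K) = \frac{1}{N}\left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i}
    + \sum_{i=K+1}^{N} \lambda_i \right),
\qquad \text{s.t.} \quad \sum_{i=1}^{K} \rho_i \le P_{tot}

% The objective is convex in each \rho_i > 0:
\frac{\partial^2\, \mathrm{MSE}}{\partial \rho_i^2} = \frac{2\lambda_i}{N \rho_i^{3}} > 0

% Lagrange function with multiplier \mu > 0:
J = \frac{1}{N}\left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i}
    + \sum_{i=K+1}^{N} \lambda_i \right)
    + \mu \left( \sum_{i=1}^{K} \rho_i - P_{tot} \right)

% Stationarity and the active constraint:
\frac{\partial J}{\partial \rho_i} = -\frac{\lambda_i}{N \rho_i^{2}} + \mu = 0,
\qquad
\frac{\partial J}{\partial \mu} = \sum_{i=1}^{K} \rho_i - P_{tot} = 0

% Closed-form allocation, proportional to \sqrt{\lambda_i}:
\rho_i^{*} = \frac{\sqrt{\lambda_i}}{\sum_{j=1}^{K} \sqrt{\lambda_j}}\, P_{tot}
```

Under these assumptions $g_i^2 = \rho_i^* \sigma_n^2 / \lambda_i \propto \lambda_i^{-1/2}$, which matches the SoftCast chunk scaling, and $K$ appears only as the upper limit of the sum, which is why it has to be searched rather than solved for.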

Problem Solution
We propose using a greedy search algorithm to find the optimal bandwidth $K^*$, given that the transmitter has complete knowledge of the chunk variance $\lambda_i$ and the available transmission power $P$. The transmitter exhaustively searches all the possible discrete bandwidth resources to reach the number of chunks that leads to distortion minimization. With the optimal chunk selection, the amount of traffic is reduced without critical performance degradation, and users' experience improves as the latency is greatly shortened. Meanwhile, other users in the network may utilize the saved bandwidth resources.
In Algorithm 1, after computing the average power of the chunks in each GOP, we calculate all the possible power and bandwidth allocations throughout the GOP in an exhaustive manner. Then, at line 11, the maximum PSNR is chosen to determine the optimal bandwidth usage $K^*$, the corresponding optimal chunk selection $k_i^*$, and the optimal power allocation $\rho_i^*$. In some cases, the performance curve tends to flatten after a certain range, as shown in Figure 2, indicating that an excessive number of channels are used to accomplish a marginal improvement in PSNR, which is clearly inefficient. Therefore, a control parameter $\tau$ in the range $[0, 1]$ is introduced in line 14 to find a suboptimal solution. The bandwidth resource usage can be significantly decreased by slightly sacrificing the PSNR performance. Note that, as 3D-DCT decorrelation transformation is implemented in this scheme, the greedy search in each GOP must be conducted on each reference viewpoint's texture and depth map separately. When conducting minimum distortion optimization under the optimal PAR, the power allocation $\mathrm{SNR}_{chk}$ in (17) changes, depending on whether it is assigned to a texture or a depth map. Hence, a different optimal bandwidth usage $K^*$ is chosen, leading to selection of another set of optimal chunks $k_i^*$ and optimal power allocations $\rho_i^*$.
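The search described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: it assumes a high-SNR distortion model in which the $K$ largest-variance chunks share a total power budget with SoftCast-style allocation and the remaining chunks are discarded. The function names, the example variances, and the budget are hypothetical.

```python
import math

def mse_for_k(ordered, k, total_power, noise_var=1.0):
    """Assumed model: send the first k chunks, discard the rest.

    `ordered` must be sorted by decreasing variance.
    """
    n = len(ordered)
    sent = sum(math.sqrt(v) for v in ordered[:k])
    dropped = sum(ordered[k:])
    return (noise_var * sent ** 2 / total_power + dropped) / n

def psnr(mse, peak=255.0):
    """PSNR in dB for 8-bit content."""
    return 10.0 * math.log10(peak ** 2 / mse)

def min_distortion_search(variances, total_power, tau=1.0, noise_var=1.0):
    """Exhaustive search over K = 1..N.

    Returns (K, PSNR) for the smallest K whose PSNR reaches tau * max PSNR;
    tau = 1 gives the PSNR-optimal K. Assumes the PSNR values are positive.
    """
    ordered = sorted(variances, reverse=True)
    curve = [psnr(mse_for_k(ordered, k, total_power, noise_var))
             for k in range(1, len(ordered) + 1)]
    best = max(curve)
    for k, value in enumerate(curve, start=1):
        if value >= tau * best:
            return k, value

# Hypothetical chunk variances and power budget (not taken from the paper).
variances = [400.0, 100.0, 25.0, 4.0, 1.0, 0.25]
k_opt, psnr_opt = min_distortion_search(variances, total_power=50.0)
k_sub, psnr_sub = min_distortion_search(variances, total_power=50.0, tau=0.98)
```

With these example numbers the PSNR-optimal choice uses fewer slots than are available, and relaxing the target with `tau=0.98` frees one more slot at a small PSNR cost, which mirrors the flat-tail behavior discussed in the text.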

Investigation and Motivation
Each GOP may have a different compressibility level based on the video sequence content (e.g., rate of HP chunks and LP chunks). Given a limited power and bandwidth usage for a video sequence, some GOPs will be more distorted than others under the minimum distortion optimization. Thus, the distortion fluctuations over GOPs reflect a poor viewing experience (even though the overall PSNR might be maximized). As the channel SNR increases, the quality distortion decreases, given the same limiting constraints for that constrained optimization. Nevertheless, this change in quality is visually indistinguishable by the human eye if the PSNR value is high. These observations show that maintaining a relatively more stable distortion over GOPs is desirable. Thus, for high SNR channels, instead of allowing the PSNR to increase constantly, we keep it at a certain high value. At the same time, other users can use the saved constrained resources. Hence, the problem becomes one of reaching a balanced combination of power allocation and chunk selection for a target distortion.

Problem Formulation
Since the target distortion optimization comprises bandwidth, power, and distortion, it is considered a three-dimensional problem. As will be demonstrated later, bandwidth and power constraints are exchangeable. Consequently, the optimization problem (5) is decomposed into two two-dimensional subproblems, where one constraint is fixed in each subproblem. On the one hand, we formulate the minimization of power resource usage given a distortion constraint MSE and a bandwidth resource usage $K$ as a power distortion optimization problem. On the other hand, we formulate the minimization of bandwidth resource usage given a target distortion MSE and a power budget $\mathrm{SNR}_{chk}$ as a bandwidth distortion optimization problem.

Problem Solution
Since the power scaling factor $g_i$ is continuous, while the bandwidth usage variable $K$ is discrete, these two subproblems are considered mixed-integer nonlinear programming (MINLP) problems that are NP-hard to solve optimally. Therefore, greedy search algorithms are proposed to approximate the resource allocation for power and bandwidth distortion problems.

The Power Distortion Optimization
The minimal power use $\mathrm{SNR}_{chk}$ in (19) is found by searching exhaustively all the feasible $r \in [1, K]$. For each fixed bandwidth resource usage $r$, the subproblem is solved with the Lagrange multiplier method: a Lagrange multiplier $\mu > 0$ is defined and, after setting $\partial J / \partial \rho_i = 0$, $i = 0, 1, 2, \ldots, r$, and $\partial J / \partial \mu = 0$, the optimal power allocation value $\rho_i^*$ is derived in closed form. The algorithm procedures for power distortion optimization demonstrated in Algorithm 2 are performed on a per-GOP basis. Throughout the exhaustive search, the optimal power values lie in three regions. At first, $\rho_i \le 0$, as the number of channels is still insufficient. Subsequently, the distortion caused by discarded chunks gradually lessens, and when $\mathrm{MSE} - \frac{1}{N} \sum_{i=r+1}^{N} \lambda_i > 0$, the optimal power will start to comply with the constraint (23). This feasible solution continues for a certain range of $r$. Finally, beyond this range, up to $r = K$, we have $\rho_i \le 0$. Therefore, it is better to stop the exhaustive search iteration, as in lines 9 to 11. In line 7, the objective function of problem (19) is calculated in a closed form. Finally, among all the $\mathrm{SNR}_{chk}$ values, the one with the minimum value is chosen, and the associated power allocation $\rho_i^*$ is adopted.

Algorithm 2: Power Distortion Optimization
...
7: Calculate optimal power allocation $\rho^c_{i,r}$ via (26)
8: Calculate $\mathrm{SNR}^c_{chk,r}$ via (21)
9: ...

The Bandwidth Distortion Optimization
We exhaustively search the values of $K$ in ascending order for the minimal bandwidth use $K$. The subproblem is solved for each $K$ using the Lagrange multiplier technique, which yields the optimal power $\rho_i^*$. In Algorithm 3, under a given distortion MSE and power budget $\mathrm{SNR}_{chk}$, the corresponding $K$ is infeasible if the objective value (27) is greater than the expected distortion. The search continues by increasing $r$ by 1 and solving the problem (27) until the search reaches a feasible $K$. It is also possible that an exhaustive search for a particular GOP does not reach a feasible $K$, which means the expected distortion requirement is too high for a specific $\mathrm{SNR}_{chk}$ or the power budget $\mathrm{SNR}_{chk}$ is insufficient to improve transmission up to a particular quality. In both cases, all $N$ channels are used. Hence, more resources must be allocated. Here, it is suggested that the power budget is increased with a 0.01 step size until a value is reached that will lead to a feasible $K$.
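The power-distortion search described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 2: it assumes the same high-SNR model as before, under which the least total power that meets a target MSE with $r$ transmitted chunks is $P_r = \sigma_n^2 (\sum_{i \le r} \sqrt{\lambda_i})^2 / (N \cdot \mathrm{MSE} - \sum_{i > r} \lambda_i)$, feasible only when the denominator is positive. The function name and the example values are hypothetical.

```python
import math

def min_power_for_target(variances, target_mse, max_slots, noise_var=1.0):
    """Search r = 1..max_slots and return (r, power) with the least power.

    Returns None when no r can meet the target, i.e. when the discarded
    chunks alone already exceed the distortion budget for every r.
    """
    ordered = sorted(variances, reverse=True)
    n = len(ordered)
    best = None
    for r in range(1, min(max_slots, n) + 1):
        # Distortion budget left after accounting for discarded chunks.
        residual = n * target_mse - sum(ordered[r:])
        if residual <= 0:
            continue  # infeasible for this r
        sent = sum(math.sqrt(v) for v in ordered[:r])
        power = noise_var * sent ** 2 / residual
        if best is None or power < best[1]:
            best = (r, power)
    return best

# Hypothetical chunk variances and target (not taken from the paper).
variances = [400.0, 100.0, 25.0, 4.0, 1.0, 0.25]
result = min_power_for_target(variances, target_mse=5.0, max_slots=6)
limited = min_power_for_target(variances, target_mse=5.0, max_slots=3)
infeasible = min_power_for_target(variances, target_mse=5.0, max_slots=2)
```

Collecting the `(r, power)` pair for every feasible `r`, instead of only the minimum, would trace the set of power and bandwidth combinations that reach the same target quality.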

Algorithm 3: Bandwidth Distortion Optimization
Compute chunk variance $\lambda^t_i$ and ...
7: Calculate optimal power allocation $\rho^c_{i,r}$ via (29)
8: Calculate $\mathrm{MSE}^c_r$ via (27)
9: end
10: ..., for all $i$
11: end
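The bandwidth-distortion search, including the suggested fallback of raising the power budget in 0.01 steps, can be sketched as follows. This is a minimal illustration under the same assumed high-SNR model, not the paper's Algorithm 3. Treating the power budget and the step as linear (not dB) quantities, and capping the search with `max_power`, are assumptions made for the example.

```python
import math

def mse_for_k(ordered, k, total_power, noise_var=1.0):
    """Assumed model: send the first k chunks (sorted by decreasing variance)."""
    n = len(ordered)
    sent = sum(math.sqrt(v) for v in ordered[:k])
    return (noise_var * sent ** 2 / total_power + sum(ordered[k:])) / n

def min_bandwidth_for_target(variances, target_mse, total_power,
                             power_step=0.01, max_power=1e6, noise_var=1.0):
    """Return (K, power) for the smallest feasible K.

    K is searched in ascending order. If no K meets the target, the power
    budget is raised by `power_step` and the search is repeated, up to
    `max_power`; None is returned if the cap is reached.
    """
    ordered = sorted(variances, reverse=True)
    steps = 0
    while True:
        # Recompute from the step count to avoid accumulating float error.
        power = total_power + steps * power_step
        if power > max_power:
            return None
        for k in range(1, len(ordered) + 1):
            if mse_for_k(ordered, k, power, noise_var) <= target_mse:
                return k, power
        steps += 1

# Hypothetical chunk variances and target (not taken from the paper).
variances = [400.0, 100.0, 25.0, 4.0, 1.0, 0.25]
enough = min_bandwidth_for_target(variances, target_mse=5.0, total_power=50.0)
raised = min_bandwidth_for_target(variances, target_mse=5.0, total_power=47.0)
```

In the second call no $K$ is feasible at the initial budget, so the budget is stepped up until the target becomes reachable.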

Power and Bandwidth Trade-Off
To achieve a desirable video quality, either the power consumption is optimized under a constrained bandwidth usage or the bandwidth usage is optimized under a limited power budget. Each point of the resultant trade-off curve between power and bandwidth represents a possible power and bandwidth usage combination that attains the same video quality. In multi-user systems, each viewer has a different power and bandwidth resource budget, where a joint optimization that relies on choosing a proper power and bandwidth usage combination can be implemented to conserve resources.

Simulation Results
Several simulation experiments were conducted to assess the performance of the proposed uncoded 3D video wireless transmission.
Test Sequence: For the multi-view plus depth video datasets, different standard reference 3D video sequences were considered: Kendo, Balloons [34], Newspaper provided by GIST, South Korea, and PoznanStreet, PoznanHall2 [35], Dancer, gtFly provided by Nokia, Finland, and Shark by NICT, Japan. The standard video sequences Dancer, gtFly, and Shark are computer-generated 3D scenes with ground-truth depth maps, whereas the content of the other 3D videos consists of captured real 3D scenes. The tested sequence configurations are listed in Table 1.
Parameter settings: The GOP size was set to eight frames. Depending on the sequence resolution in Table 1, each frame was divided into 16 × 16 = 256 chunks. As mentioned previously, virtual viewpoint synthesis requires transmitting the texture and depth frames of the two adjacent cameras near the selected virtual point. Hence, one GOP consists of 2 × 2 × 8 × 256 = 8192 chunks. The 3D-HEVC Test Model (HTM) v16.3 software [36] was used to synthesize the virtual viewpoints. MATLAB was used to conduct the simulation experiments, where the wireless transmission parameters were based on the 802.11a/g standard. In addition, the wireless transmission experiments were investigated over AWGN-based channels.
Metric: For video quality assessment, although the proposed algorithms were designed based on the objective performance metric PSNR, the perceptual metric SSIM still revealed important performance evaluation aspects.

Performance Evaluation for PAR
The optimal PAR was estimated based on the joint texture/depth power allocation method [29], in which $M$ is the number of reference viewpoints transmitted in a 3D video SoftCast transmission, $L$ is the number of virtual viewpoints to be synthesized at the receiver, $t$ and $d$ represent the original texture video and depth map, respectively, and $\alpha_i$ and $\beta_i$ represent the parameters of the distortion model of both the transmitted reference views and the synthesized virtual views. To reach the global optimal PAR, a full search was conducted iteratively, with the search step set to 0.05 [29]. Although PAR is not theoretically a discrete variable, simulations showed that the 0.05 step was small enough to indicate any changes. The estimated PAR and the global optimal PAR results for the simulation video sequences are listed in Table 2.
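The full search with a 0.05 step can be sketched as a plain grid search. This is a minimal illustration only: the distortion function below is a hypothetical stand-in, not the $\alpha_i$/$\beta_i$ model of [29], and the ratio is assumed to be defined as texture power divided by depth power with an arbitrary upper limit of 5.

```python
def full_search_par(distortion, total_power, step=0.05, max_ratio=5.0):
    """Grid search over the texture:depth ratio r = P_texture / P_depth.

    `distortion(p_texture, p_depth)` is supplied by the caller. Returns the
    ratio with the smallest distortion and that distortion value.
    """
    best_ratio, best_value = None, float("inf")
    for i in range(1, int(round(max_ratio / step)) + 1):
        ratio = i * step
        p_texture = total_power * ratio / (1.0 + ratio)
        p_depth = total_power - p_texture
        value = distortion(p_texture, p_depth)
        if value < best_value:
            best_ratio, best_value = ratio, value
    return best_ratio, best_value

def toy_distortion(p_texture, p_depth):
    """Hypothetical placeholder model: each term falls inversely with power."""
    return 4.0 / p_texture + 1.0 / p_depth

ratio, value = full_search_par(toy_distortion, total_power=10.0)
```

For this toy model the analytic optimum is $r = \sqrt{4/1} = 2$, which lies on the 0.05 grid, so the search recovers it exactly; for a real distortion model the grid result is only accurate to within one step.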

Minimum Distortion Optimization Performance
We investigated the highest PSNR attained given the available resource (i.e., power and bandwidth). As discussed earlier, we let the maximum number of available channel slots $N$ for one GOP be 8192, where each channel slot transmits only one chunk. As assumed, each GOP transmits a separate texture and depth map for each reference view. Thus the available channel slots $N$ for each GOP are split equally, $N_{sub} = \frac{N}{2 \times \text{No. of reference views}}$, and $N_{sub}$ becomes the number of available channel slots for each texture and depth map. Furthermore, the noise variance was fixed at 1, and the transmission power budget $P$ was varied for each GOP. (Since the noise level is assumed to be fixed, in the following we no longer differentiate between $P$ and $\mathrm{SNR}_{chk}$.)
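The slot counts used in the experiments follow directly from the stated settings, as this short check shows. The variable names are illustrative; the numbers are the ones given in the text.

```python
frames_per_gop = 8            # GOP size
chunks_per_frame = 16 * 16    # each frame is divided into 16 x 16 chunks
reference_views = 2           # two adjacent cameras
components = 2                # texture and depth map

# Total chunks (and channel slots N) in one GOP.
chunks_per_gop = reference_views * components * frames_per_gop * chunks_per_frame

# Slots available to each texture or depth stream of a reference view.
slots_per_stream = chunks_per_gop // (components * reference_views)
```

This gives $N = 8192$ and $N_{sub} = 2048$, the bandwidth figure against which the usage percentages are measured.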
Figure 2 shows the minimum distortion optimization results for the Kendo and Poz-nanStreet sequences.Note that the average results for the two transmitted reference views were used.As shown, for a given bandwidth resource usage N sub , the quality commonly increased with the power budget SNR chk , as was also proved in (12).Nevertheless, for a given power budget SNR chk , the quality did not necessarily increase with the bandwidth usage, mainly under low power budgets.For instance, in the texture frames, when the transmission power P was 5 dB, the PSNR max points for Kendo and PoznanStreet were achieved when N sub = 233 and N sub = 303, respectively.Achieving the highest PSNR did not always correspond to the maximum bandwidth usage because, in a pseudo-analog system, different chunks are not equally important, even though each chunk consumes one channel.Hence, more power allocation to HP chunks and less to LP chunks can improve PSNR performance under a limited power budget.The minimum distortion optimization curves generally have a flat tail, which implies that above a certain PSNR level, increasing the power budget is more efficient than improving bandwidth use.For instance, for the texture frames in the Kendo sequence, under power budget SNR chk = 10 dB and bandwidth use N sub = 559, the PSNR was 39 dB.The PSNR value did not improve if more bandwidth was assigned.On the other hand, a power increment from 10 dB to 15 dB resulted in a PSNR improvement of 4.3 dB, and a further increment from 10 dB to 20 dB resulted in a PSNR improvement of 7.8 dB.The previous calculations were all carried out under the consideration of an equal texture/depth power allocation ratio or PAR 1:1.The power allocation between the texture and depth map will be rearranged when performing minimum distortion optimization under the optimal PAR mentioned in Table 2. 
Figure 2 demonstrates the change in behavior after applying the optimal PAR. As more power is assigned to the reference viewpoints' texture, the received quality of the texture frames is enhanced, and the bandwidth allocation increases at the expense of the depth map. In many cases, the bandwidth decrease in the depth map is greater than the bandwidth increase in the texture. The overall 3D video quality between two points (1848, 2048) along the bandwidth is drawn to show the improvement in the proposed resource control scheme.
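As a rough illustration of what a texture/depth PAR means in practice, the sketch below divides one shared power budget according to a ratio. It assumes the ratio is applied to the linear-scale power and that texture and depth draw from a single budget; the function name `split_power` and the example ratio are hypothetical, and the paper's actual rule is the one defined through its Equation (17) and Table 2.

```python
import math

def split_power(total_power_db, texture_ratio, depth_ratio):
    """Split a shared budget (in dB) between texture and depth by a PAR.

    Assumption: the ratio divides the linear-scale power, so the two
    shares always add up to the original budget.
    Returns (texture_power_db, depth_power_db).
    """
    total_linear = 10 ** (total_power_db / 10)
    texture_linear = total_linear * texture_ratio / (texture_ratio + depth_ratio)
    depth_linear = total_linear - texture_linear
    return 10 * math.log10(texture_linear), 10 * math.log10(depth_linear)

# Equal split (PAR 1:1) versus a hypothetical texture-favoring ratio.
print(split_power(10.0, 1, 1))
print(split_power(10.0, 3, 1))
```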
The average performance for the Kendo sequence is plotted in Figure 3 to illustrate the bandwidth usage saving. Both the reference viewpoints' quality and the five viewpoints' (two reference plus three virtual) overall quality are considered. As a comparable baseline, the pioneering uncoded video transmission system SoftCast was applied to transmit each reference viewpoint texture and depth map separately, but with equal power allocation between texture and depth map, or PAR 1:1. As shown, the conventional SoftCast employs the entire bandwidth and power resources to obtain a good PSNR. However, with less bandwidth usage, the minimum distortion optimization algorithm can obtain a comparable PSNR even before applying the PAR. When the estimated PAR is applied, it alters the total transmission budget $\mathrm{SNR}_{\mathrm{chk}}$ in (17) according to the optimal ratio, affecting $\rho_i^*$ and $K^*$ for that reference viewpoint texture and depth map and eventually leading to a PSNR enhancement.

For instance, in Figure 3d, when $\mathrm{SNR}_{\mathrm{chk}} = 0$ dB, for the overall quality of the 3D video in SoftCast, the bandwidth usage for each reference viewpoint texture and depth map was 2048, and the achieved PSNR was 32.379 dB. However, the minimum distortion algorithm achieved a PSNR of 33.428 dB with only 88/2048 = 4.3% and 56/2048 = 2.7% of the texture and depth bandwidth usage. After applying the optimal PAR, the performance improved to 34.358 dB with a bandwidth usage of 6% and 1.4%. Furthermore, due to the flat tail of the minimum distortion curve, sacrificing a slight margin of the PSNR (e.g., $\tau$ = 98% of the highest PSNR) reduced the bandwidth usage further, from 124 to 39 chunks for texture and from 30 to 8 chunks for the depth map. These figures show that the proposed resource control scheme saves more bandwidth at low $\mathrm{SNR}_{\mathrm{chk}}$. Hence, it is more suitable when the transmission power budget is limited or under inferior channel conditions.
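The effect of the tolerance $\tau$ on a flat-tailed curve can be sketched as follows: given the PSNR obtained for each candidate chunk count, pick the smallest count whose PSNR reaches $\tau$ times the maximum. The function name and the sample curve are illustrative assumptions; the values are not the paper's measurements.

```python
def smallest_bandwidth_within_tolerance(psnr_by_chunks, tau=0.98):
    """Return the smallest chunk count K (1-based) whose PSNR is at least
    tau * max(PSNR), where psnr_by_chunks[K - 1] is the PSNR with K chunks."""
    target = tau * max(psnr_by_chunks)
    for k, psnr in enumerate(psnr_by_chunks, start=1):
        if psnr >= target:
            return k
    return len(psnr_by_chunks)  # only reached for degenerate inputs

# Made-up curve with a flat tail: most of the gain arrives early.
curve = [20.0, 30.0, 33.0, 33.9, 34.2, 34.3, 34.358]
print(smallest_bandwidth_within_tolerance(curve, tau=0.98))
print(smallest_bandwidth_within_tolerance(curve, tau=1.0))
```

With a flat tail, relaxing $\tau$ slightly below 1 can remove a large share of the retained chunks at a small PSNR cost, which is the behavior described above.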
Table 3 lists the bandwidth usage for different 3D video sequences. The distortion-resource and resource-distortion algorithms [31] were modified to ensure a fair comparison. Therefore, instead of transmitting three reference viewpoints, only two were transmitted, and the number of frames in each GOP was increased from four to eight. Since our proposed resource control scheme and the algorithms in [31] exploit different correlation methods, i.e., 3D-DCT and 5D-DCT, the adopted chunk size differs for each scheme, leading to a different number of chunks. Hence, the percentage of the total bandwidth used in each case is calculated to compare the bandwidth resource usage. The proposed method markedly decreases the bandwidth usage while achieving satisfactory results.

Assume the sender delivers about 30 GOPs to users whose demanded quality is 35 dB, given the bandwidth constraint $n = 8192$ chunks per GOP. Figure 4 depicts the power allocation for the consecutive GOPs and the total bandwidth usage for the Kendo and PoznanStreet sequence reference viewpoints. The power allocation for the depth map video is lower than for texture because of the depth map's distinct characteristics (i.e., the 3D scene's geometric information, which comes in the form of homogeneous regions separated by sharp edges). The same is true for bandwidth usage. In some GOPs, depth maps may occupy less than 0.005% of the available bandwidth, as in the PoznanStreet sequence. The power allocation curves for both texture and depth map frames are relatively smooth, and the total bandwidth usage is kept at a comparatively low level. This resource allocation assists in maintaining the 3D video quality at the predefined PSNR value. Thus, the content is delivered with a favorable viewing experience at the user end, and other users can also benefit from the saved resources.
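The power-distortion step, solving for the power that meets a target distortion under an upper-bound bandwidth, can be sketched under an assumed simplified model. Here the distortion with $K$ retained chunks is taken as $D(K) = \sigma^2 (\sum_{i \le K} \sqrt{\lambda_i})^2 / P + \sum_{i > K} \lambda_i$, which inverts to a closed-form $P$ for each $K$. The model, the function name, and the synthetic variances are assumptions for illustration and do not reproduce the paper's Algorithm 2.

```python
import math

def min_power_for_target(chunk_variances, target_distortion, max_chunks,
                         noise_var=1.0):
    """Search K = 1..max_chunks and return (K, power) with the smallest power
    that meets target_distortion, or None if the target is unreachable.

    Assumed model: D(K) = noise_var * S_K**2 / P + residual_K, so
        P = noise_var * S_K**2 / (target_distortion - residual_K)
    is valid only when the discarded energy residual_K is below the target.
    """
    lam = sorted(chunk_variances, reverse=True)
    total_energy = sum(lam)
    sqrt_sum = 0.0
    kept_energy = 0.0
    best = None
    for k, variance in enumerate(lam[:max_chunks], start=1):
        sqrt_sum += math.sqrt(variance)
        kept_energy += variance
        residual = total_energy - kept_energy
        if residual >= target_distortion:
            continue  # discarded chunks alone already exceed the target
        power = noise_var * sqrt_sum ** 2 / (target_distortion - residual)
        if best is None or power < best[1]:
            best = (k, power)
    return best

variances = [1000.0 * 0.9 ** i for i in range(200)]
print(min_power_for_target(variances, target_distortion=500.0, max_chunks=200))
print(min_power_for_target(variances, target_distortion=500.0, max_chunks=5))
```

The second call returns `None` because, with only five chunks allowed, the discarded energy alone exceeds the target, so no amount of power can reach it in this model.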

Bandwidth Distortion Optimization Performance
Similarly, assume the sender delivers about 30 GOPs to users whose demanded quality is 35 dB, under a transmission power constraint $\mathrm{SNR}_{\mathrm{chk}} = 10$ dB. Figure 5 shows the bandwidth allocation curve for reference viewpoints for different video sequences. Compared with the large chunk number $n = 8192$ in one GOP, the total bandwidth usage fluctuates at a low and steady level. The bandwidth usage for the depth map can be less than three chunks in some GOPs. Therefore, it is practical to allocate slightly more bandwidth resources (i.e., 4%) to avoid running Algorithm 3 several times, which reduces the computational cost.

Figure 6 presents the bandwidth usage saving obtained while achieving the target PSNRs. The conventional SoftCast exhausts all the available resources to achieve a good PSNR, unaware of the user's predefined distortion. However, the target distortion algorithm approaches the user's predefined PSNR with a lower bandwidth usage. The performance is slightly enhanced when the optimal PAR is applied. Based on Algorithms 2 and 3, the optimal PAR does not influence the power and bandwidth allocation. Nonetheless, because the optimal PAR adjusts the power assigned jointly to texture and depth map frames in the pseudo-analog transmission, as shown in Figure 1, it can still influence the overall quality of the 3D video at the receiving end.
For example, in Figure 6c, when the power budget $\mathrm{SNR}_{\mathrm{chk}} = 10$ dB and the target PSNR = 40 dB, in SoftCast, the bandwidth usage for the texture and depth map videos with a single reference viewpoint is $N_{\mathrm{sub}}$ each, and the achieved overall quality of the 3D video is 42.26 dB. However, we achieved a PSNR of 39.30 dB with the proposed target distortion algorithm using only $167/N_{\mathrm{sub}}$ = 8.15% and $42/N_{\mathrm{sub}}$ = 2% of the original bandwidth use for the texture and depth map. When the optimal PAR was applied, the proposed resource control scheme achieved a PSNR of 39.66 dB, close to the desired quality, while maintaining the same bandwidth usage.

The bandwidth usage saving while achieving a target PSNR under a power budget $\mathrm{SNR}_{\mathrm{chk}} = 20$ dB is shown in Figure 7. As mentioned earlier, the bandwidth resource usage percentage is used for the comparison between our proposed scheme and the resource-distortion algorithm [31]. In this figure, the total bandwidth usage percentage is split equally between the texture and depth map. The proposed resource control scheme achieves PSNR results closer to the target quality than the resource-distortion algorithm, with less bandwidth usage. For example, in Figure 7b, the target PSNR = 45 dB. For the resource-distortion algorithm, the bandwidth usage to achieve 43.379 dB was 149/512 = 7.25% for texture and the same for the depth map. In contrast, the proposed resource control scheme achieved a PSNR of 43.556 dB with only 558/2048 = 6.8% and 350/2048 = 4.25% bandwidth usage for the texture and depth map, respectively.
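The bandwidth-distortion step, finding the least bandwidth that meets a target distortion under a fixed power, can be sketched as an incremental search over the number of retained chunks. The distortion model and all names below are illustrative assumptions (a simplified SoftCast-style expression), not the paper's Algorithm 3.

```python
import math

def model_distortion(sorted_variances, k, total_power, noise_var=1.0):
    """Assumed distortion when only the k highest-energy chunks are sent."""
    sqrt_sum = sum(math.sqrt(v) for v in sorted_variances[:k])
    residual = sum(sorted_variances[k:])
    return noise_var * sqrt_sum ** 2 / total_power + residual

def min_chunks_for_target(chunk_variances, total_power, target_distortion,
                          noise_var=1.0):
    """Return the smallest k whose modeled distortion meets the target,
    or None when the target cannot be met at this power."""
    lam = sorted(chunk_variances, reverse=True)
    for k in range(1, len(lam) + 1):
        if model_distortion(lam, k, total_power, noise_var) <= target_distortion:
            return k
    return None

variances = [1000.0 * 0.9 ** i for i in range(200)]
print(min_chunks_for_target(variances, total_power=1e6, target_distortion=500.0))
print(min_chunks_for_target(variances, total_power=1.0, target_distortion=500.0))
```

The helper recomputes the sums for every `k` to keep the sketch short; running sums would make the whole search linear in the number of chunks.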

Power and Bandwidth Trade-Off
A trade-off exists between power and bandwidth usage for each expected video quality, as the minimal resource solution is generally not unique. However, the valid ranges for power and bandwidth for a given distortion are constrained. The trade-off curves shown in Figure 8 correspond to the average texture and depth maps. The curve boundaries vary with different predefined distortions. In general, for lower target PSNR values, the required power and bandwidth usages are comparatively minimal. The power resource is high near the $N_{\mathrm{sub}}$ lower boundary. At this point, a slight increase in bandwidth dramatically reduces the power. Nonetheless, allocating more bandwidth after a certain threshold does not reduce the power further. Hence, it is suggested that a minimum power and bandwidth usage pair is adopted close to the curve's turning point, where a slight change in one factor noticeably affects the other.
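One simple way to locate a turning point on such a trade-off curve is the maximum-distance-to-chord heuristic: normalize both axes and pick the sample farthest from the straight line joining the curve's endpoints. This heuristic is a generic technique chosen here for illustration, not a method described in the paper, and the sample values are made up.

```python
import math

def knee_index(bandwidth, power_db):
    """Index of the sample farthest from the chord joining the first and
    last points, after scaling both axes to [0, 1].

    Assumes bandwidth is strictly increasing and power_db is not constant.
    """
    x0, x1 = bandwidth[0], bandwidth[-1]
    y_min, y_max = min(power_db), max(power_db)
    nx = [(x - x0) / (x1 - x0) for x in bandwidth]
    ny = [(y - y_min) / (y_max - y_min) for y in power_db]
    dx, dy = nx[-1] - nx[0], ny[-1] - ny[0]
    norm = math.hypot(dx, dy)
    distances = [abs(dy * (px - nx[0]) - dx * (py - ny[0])) / norm
                 for px, py in zip(nx, ny)]
    return max(range(len(distances)), key=distances.__getitem__)

# Made-up trade-off samples: power falls steeply, then flattens.
chunks = [10, 20, 30, 40, 50, 100, 200, 400]
power = [30.0, 20.0, 14.0, 11.0, 10.0, 9.8, 9.7, 9.6]
i = knee_index(chunks, power)
print(chunks[i], power[i])
```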

The Resource Control Scheme Performance Comparison
The performance of the proposed scheme was tested regarding the quality of the synthesized virtual views and compared with two benchmark schemes: SoftCast and the distortion-resource algorithm [31].
Figure 9 illustrates the synthesis quality at virtual viewpoint 2 of the 3D video for the three schemes, along with the synthesized frames. Under a channel bandwidth constraint of 8192, the SoftCast scheme exploits all the available bandwidth resources and allocates power to each chunk. In contrast, the other schemes tend to retain HP chunks and discard LP chunks, which significantly saves bandwidth usage. For instance, when the power usage was 10 dB, the PSNR and SSIM of the proposed resource control scheme were close to the values obtained by SoftCast with only 7.28% bandwidth usage, which is nearly half of the bandwidth saved by the distortion-resource algorithm. As the power usage decreased to SNR = 0 dB, the proposed scheme achieved even better performance. With only 1.1535% bandwidth usage, the proposed scheme gained 1.21 dB in PSNR over SoftCast and improved the SSIM from 0.8244 to 0.9317.

Figure 10 shows the influence of constrained resources on pseudo-analog transmission. Assuming different bandwidth resources along with different power budgets reveals the pseudo-analog transmission behavior. The impact of discarding LP chunks and retaining HP chunks at lower SNR is different from that at higher SNR. For each given power budget, several bandwidth constraints are suggested. Under each bandwidth resource, SoftCast consumes the available bandwidth resource for the transmission process. Although the available bandwidth resource also bounds the proposed resource control scheme, it achieves $\mathrm{PSNR}_{\max}$. To save more bandwidth, the proposed scheme was set to achieve $\tau \cdot \mathrm{PSNR}_{\max}$. The dashed line represents the resulting video quality for SoftCast transmission utilizing the whole bandwidth. Figure 10a shows the same result as Figure 9 but for only one virtual viewpoint.

Complexity Analysis
The proposed resource control scheme complexity is divided into two parts.
In the joint power allocation method, the complexity is dominated by the preprocessing step. The complexity of the full search method is directly proportional to the PAR search range. If a search range of $R$ is assumed with a search step of 0.05, the full search preprocesses $20 \times R$ times, comprising $20 \times R \times M$ distortion computations of the reference viewpoints and $20 \times R \times L$ virtual view syntheses. However, the estimation method preprocesses only 6 times, comprising $6 \times 2$ distortion computations of the leftmost and rightmost reference views and $6 \times L$ virtual view syntheses. Moreover, only the first frame of each sequence is preprocessed in the proposed estimation method, whereas the whole sequence is preprocessed in the full search method. This difference becomes more significant as the number of frames in the sequence grows. Thus, the complexity of the estimation method is 96.25% lower than that of the full search method, making its cost negligible.
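The run counts above can be checked with a short calculation. The search range $R = 8$ used below is an assumption, chosen only because it is consistent with the quoted 96.25% reduction when preprocessing runs alone are counted; the text does not state the value of $R$.

```python
def full_search_runs(search_range, step=0.05):
    """Number of PAR candidates visited by a full search over the range."""
    return round(search_range / step)

ASSUMED_RANGE = 8      # hypothetical R, not stated in the text
ESTIMATION_RUNS = 6    # preprocessing runs of the estimation method

runs_full = full_search_runs(ASSUMED_RANGE)            # 20 * R candidates
reduction = 100 * (1 - ESTIMATION_RUNS / runs_full)    # saving in percent
print(runs_full, reduction)
```

Counting frames as well would increase the saving further, since the estimation method preprocesses only the first frame of each sequence.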
In the greedy search algorithms (Algorithms 1-3), all possible chunk ($n$) or channel ($r$) values are traversed iteratively, and for each $n$ or $r$, all the operations involved are linear. Thus, Algorithms 1-3 all have $O(N)$ complexity, which is negligible. The energy sorting of the $N$ chunks in each GOP is the major source of complexity, at $O(N \log N)$. However, instead of strictly sorting all the chunks according to their energy distribution, the sorting step can be avoided by ordering the chunks in a zigzag scan (as in JPEG image compression). Furthermore, the complexity can be reduced by calculating all the optimal values for the first GOP only and then applying the result to the remaining GOPs. This is possible if the video content of consecutive frames does not change greatly.
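A zigzag ordering can be generated in time linear in the number of chunks, without comparing energies. The sketch below produces the JPEG-style zigzag order for a two-dimensional grid of chunk positions; treating the chunks as a single 2D grid is a simplifying assumption, and the function name is illustrative.

```python
def zigzag_order(rows, cols):
    """Return (row, col) positions of a rows x cols grid in JPEG-style
    zigzag order, starting at the low-frequency corner (0, 0)."""
    order = []
    for s in range(rows + cols - 1):          # s = row + col indexes a diagonal
        first_row = max(0, s - cols + 1)
        last_row = min(rows - 1, s)
        diagonal = [(r, s - r) for r in range(first_row, last_row + 1)]
        # Even diagonals are walked upward (row decreasing), odd ones downward.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

print(zigzag_order(3, 3))
```

Because DCT energy typically concentrates near the low-frequency corner, such a fixed scan approximates a descending energy order, which is why it can stand in for an explicit sort.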

Conclusions
This paper presented a resource control scheme for pseudo-analog transmission based on the integration of joint power allocation with resource optimization for multi-view plus depth video transmission. While the joint texture/depth power allocation maximized the transmission efficiency, the resource allocation algorithms handled the data traffic burden caused by multi-view video content. The estimated optimal PAR between texture and depth performed similarly to the full search method but with negligible complexity. Following the SoftCast perspective, a minimum distortion algorithm was used to achieve the best viewing quality under a resource constraint. In contrast, the target distortion algorithm minimizes the resource usage for a predefined distortion requirement, which results in receiving videos of consistent viewing quality. As the minimal resource solution is generally not unique, we analyzed the power and bandwidth usage trade-off. The reported results verified the efficiency of the proposed scheme for reference and virtual viewpoints.

During 3D video streaming, if the 3D scene changes, the video content characteristics in the subsequent frames may differ. Hence, the model parameters of the joint power allocation estimation obtained from the first frames become inaccurate for other frames whose content has considerably different characteristics. Adding scene change detection allows the system to update the model parameters automatically whenever the video content changes dramatically. In future work, the inter-view correlation could be exploited to optimize the transmission performance using 4D-DCT.

Figure 10.Performance comparison of different schemes under multiple SNR chk and bandwidth values: (a) Kendo; (b) PoznanStreet.

Table 3. Bandwidth usage comparison for different 3D video sequences.