Cross-Layer Optimization-Based Asymmetric Medical Video Transmission in IoT Systems

Abstract: At present, Internet of Things (IoT) networks are attracting much attention because they enable emerging opportunities and applications. In IoT networks, asymmetric and symmetric studies on medical and biomedical video transmissions have become an interesting topic in both academic and industrial communities. In particular, the transmission process is asymmetric: the otherwise symmetric video-encoding and -decoding processes become asymmetric (affected by modulation and demodulation) once a transmission error occurs. Under such an asymmetric condition, the quality of service (QoS) of the video transmission is affected by many different factors across the physical (PHY-), medium access control (MAC-), and application (APP-) layers. To address this, we propose a cross-layer optimization-based strategy for asymmetric medical video transmission in IoT systems. The proposed strategy jointly utilizes the video-coding structure in the APP-layer, the power control and channel allocation in the MAC-layer, and the modulation and coding schemes in the PHY-layer. To obtain the optimum configuration efficiently, the proposed strategy is formulated as a problem that is proved to be quasi-convex. Consequently, the proposed strategy not only outperforms the classical algorithms in terms of resource utilization but also improves the video quality in resource-limited networks.


Introduction
The emergence of the Internet of Things (IoT), which connects millions of people and billions of machines, is driving a radical paradigm shift from traditional communications toward ultra-reliable, low-latency communications (URLLC) [1]. As wireless communication technologies evolve, medical video transmission, a very important URLLC application, plays a vital role in medical image analysis and doctors' diagnoses [2]. However, the radio resource in IoT networks is very limited, and it varies dynamically with many factors, e.g., channel gains and the number of access users. As a result, the radio resource must be utilized efficiently for medical video transmissions.
Without transmission errors, the video transmission is symmetrical, e.g., the video encoding-decoding process, the modulation-demodulation process, and the multiplexing-demultiplexing process are all symmetrical. However, once an error occurs, the video recovery process becomes asymmetric [3].
Traditional video transmissions are symmetric: the encoding-decoding process of the video itself and the modulation-demodulation process of the communication system are symmetric when the transmission is error-free [4]. In the scope of medical video transmission, asymmetric quality adaptation has proved to be an effective method for maintaining the perceived quality while reducing the required transmission bandwidth. As a result, asymmetric studies on medical and biomedical video transmissions in IoT networks have become an interesting topic in both academic and industrial communities.
To circumvent the aforementioned issues, the Joint Video Team (JVT) proposed the H.264 video-coding standard, which provides scalability in frame rate, resolution, and image fidelity [5]. Since such a standard only works in the application (APP-) layer, it cannot adapt dynamically to the network resources and conditions [6], especially to time-varying wireless channels. To improve on this, the authors of [7-9] jointly optimize the rate and power to maximize the video quality. However, such works only focus on optimizing the transmission in the physical (PHY-) layer, regardless of the video-encoding structure. In [10,11], the authors optimize the video quality according to the video frame-error-rate (FER), which is caused by either the transmission errors in the PHY-layer or the video-coding structures in the APP-layer. To further improve the optimization efficiency, the joint consideration of the medium access control (MAC-) and APP-layers is an interesting and important topic.
By adopting cross-layer optimization, refs. [12-14] propose cross-layer methods for scalable video streaming in wireless communication systems. The radio resources are allocated based on the output of the rate distortion (RD) function in the APP-layer, and the video data are scheduled with the video layer as the controllable unit. However, the RD function cannot reflect the structure of scalable video coding (SVC). Additionally, a video layer is dropped when a bit error occurs, resulting in wasted resources. Ref. [15] investigates energy-optimized wireless video transmission by jointly employing the APP- and PHY-layers. The optimum configuration of that strategy is obtained by exhaustive search, resulting in high computational complexity. In [16], the authors configure the video transmission in the PHY-, MAC-, and APP-layers individually. In [17,18], the authors optimize the video quality by jointly utilizing the information of the APP- and MAC-layers, regardless of the PHY-layer.
To solve the aforementioned issues, considering the characteristics of asymmetric medical video transmission, this paper proposes an efficient strategy that controls the video transmission across the PHY-, MAC-, and APP-layers. The major contributions of this paper are presented as follows.
(i) In the PHY-layer, the proposed strategy selects the modulation and coding schemes (MCS) optimally according to the dynamic channel state information (CSI). Hence, our strategy efficiently utilizes the spectrum resources to transmit the video streams.
(ii) In the MAC-layer, the video frame information from the APP-layer is employed as the control unit, replacing the conventional video layer. Additionally, we allocate the MAC-layer resources according to the video data priority. These improvements give the proposed strategy higher adaptability than the related works [7-16].
(iii) By characterizing the video transmission scheme in IoT networks, the proposed strategy is formulated as a quasi-convex problem. Thus, the strategy can reach its optimum efficiently.
The remainder of this paper is organized as follows. Section 2 introduces the system model. In Section 3, we propose a novel video transmission strategy and optimize it mathematically. In Section 4, the simulation results are presented to validate our studies. Finally, Section 5 concludes this paper.

System Model
The system structure is illustrated in Figure 1. As shown, the video data are encoded/decoded by the H.264 encoder/decoder in the APP-layer. These data are then multiplexed and transmitted in the MAC- and PHY-layers. If the cross-layer information is not considered as feedback, the whole model is symmetrical. However, once an error occurs in the link (which is very common in practical transmission), the model becomes asymmetric. Therefore, our strategy uses three types of feedback to solve the asymmetry problem in medical video transmission.
In the APP-layer, the H.264/SVC encoder allows a video sequence to be encoded into $L$ scalable video layers, comprising one base layer and $L - 1$ enhancement layers. All enhancement layers rely on the base layer, and each higher enhancement layer relies on the lower enhancement layers. The data of each video layer consist of a sequence of frames. The frames are partitioned into multiple groups of pictures (GoPs), each of which consists of an I-frame and $M - 1$ P-frames. Each frame is further encoded into multiple network abstraction layer units (NALUs) to support the different video layers [6,19]. Let $l_n^m$ denote the length of the $m$-th NALU in video layer $n$. Additionally, H.264/SVC coarse grain scalability (CGS) is employed, as in [16,20].

The data from the same video layer form a video stream that requires a certain service rate. Since the service rates differ across the video layers, the video transmission needs to handle such asymmetric rates to guarantee the quality of service (QoS) of the various users. Let the set of all service rates in the different video layers be expressed by (1).
In the MAC-layer, orthogonal frequency division multiple access (OFDMA) is employed as the transmission mechanism. Consider $K$ independent and identically distributed (i.i.d.) subchannels. Let $h_k$, $B_k$, and $E_k$ denote the channel gain, bandwidth, and transmit power of channel $k$, respectively. On the one hand, in a typical LTE (long-term evolution) system, a resource block (RB) consists of 12 subcarriers. On the other hand, the video data are commonly encoded into several video layers. Thus, we assume that $K \geq L$ holds.
In the PHY-layer, quadrature amplitude modulation (QAM) is used, and the modulation order of channel $k$ is $M_k$. The wireless channels experience i.i.d. block Rayleigh fading, which means that the fading channel gain remains constant during a frame. This paper assumes perfect channel estimation, which means that the channel state information (CSI) is known at the transmitter. Additionally, additive white Gaussian noise (AWGN) is considered at the receiver.
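As an illustration of this channel model, the following minimal sketch draws one block of Rayleigh-fading gains and computes the resulting per-subchannel SNR. It assumes that the gain entering the SNR is the power gain $|h_k|^2$ with unit mean; the function names and the numerical values of the transmit power and noise power are hypothetical and are not taken from the paper.

```python
import math
import random

def rayleigh_block_gains(num_channels, rng):
    """Draw one power gain |h_k|^2 per subchannel for a single block.

    Each h_k is a circularly symmetric complex Gaussian with unit variance,
    so |h_k| is Rayleigh distributed and |h_k|^2 has mean 1 (assumption).
    The gains stay fixed for the whole block (block fading).
    """
    gains = []
    for _ in range(num_channels):
        re = rng.gauss(0.0, math.sqrt(0.5))
        im = rng.gauss(0.0, math.sqrt(0.5))
        gains.append(re * re + im * im)
    return gains

def received_snr(power, gain, noise_power):
    """Linear-scale SNR at the receiver on one subchannel under AWGN."""
    return power * gain / noise_power

rng = random.Random(0)
gains = rayleigh_block_gains(12, rng)                  # K = 12 subchannels
snrs = [received_snr(1.0, g, 0.1) for g in gains]      # hypothetical E_k and noise power
```

With perfect channel estimation, the transmitter would see exactly the list `gains` for the current block and could use it for the allocation decisions described later.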

Video Transmission Strategy
In this section, we first formulate the studied asymmetric problem and then propose our video transmission strategy mathematically.

Problem Formulation
In the conventional strategies, since practical video transmissions are asymmetric, the objective is to transmit as many video layers as possible under the constraints of the transmit power and service rate [4-11]. In other words, the more streams that are correctly received in order, the better the video quality achieved at the receiver. Accordingly, the traditional problem [6,8], denoted by (2), maximizes $N$ under these constraints, where $N$ represents the maximum number of video layers that can be received, and $E_{\max}$ is the maximal transmit power. Evidently, there are four major issues in the conventional strategies:
1. Since the video layer is employed as the controllable data unit in the MAC-layer, the entire corresponding video layer is dropped once an error occurs in the decoder; this asymmetric behavior results in a huge waste of resources [9-11].
2. Shannon's capacity is commonly used to calculate the transmission rate $r_k$ of the $k$-th stream from the bandwidth, transmit power, channel gain, and noise power $\sigma^2$. Such an asymmetric rate estimation neglects the impact of the modulation, and hence it cannot reflect the practical situation [5,21].
3. Additionally, although the video data from the base layer and the enhancement layers have different priorities during decoding, they are allocated to the channels without jointly considering the unequal error protection (UEP) and the encoding structure [12,15].
To the best of our knowledge, no existing strategy addresses these issues simultaneously with low computational complexity.
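The gap described in issue 2 can be made concrete with a small sketch. It compares the continuous Shannon estimate with the discrete bit rates offered by a few QAM orders. The sketch assumes that the channel gain enters as a power gain, that uncoded $M$-QAM carries $\log_2 M$ bits per symbol at one symbol per second per hertz, and that the candidate orders are 4, 16, and 64; none of these choices is taken from the paper.

```python
import math

def shannon_rate(bandwidth_hz, power, gain, noise_power):
    """Shannon capacity in bit/s: B * log2(1 + E * g / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + power * gain / noise_power)

def qam_bit_rate(bandwidth_hz, order):
    """Bit rate of uncoded M-QAM, assuming one symbol per second per hertz."""
    return bandwidth_hz * math.log2(order)

def largest_qam_order(bandwidth_hz, power, gain, noise_power, orders=(4, 16, 64)):
    """Largest candidate order whose bit rate does not exceed the Shannon estimate."""
    bound = shannon_rate(bandwidth_hz, power, gain, noise_power)
    feasible = [m for m in orders if qam_bit_rate(bandwidth_hz, m) <= bound]
    return max(feasible) if feasible else None

# One 15 kHz subcarrier at a linear SNR of 20 (about 13 dB).
estimate = shannon_rate(15e3, 1.0, 1.0, 0.05)    # about 65.9 kbit/s
discrete = qam_bit_rate(15e3, 16)                # exactly 60 kbit/s
```

The Shannon estimate varies continuously with the SNR, whereas the modulation-based rate can only take a few discrete values, which is why a rate model tied to the modulation order is closer to what the PHY-layer can actually deliver.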

Strategy Design and Optimization for Asymmetric Video Transmissions
To address the aforementioned asymmetric issues, we propose to jointly utilize the video structure in the APP-layer, the channel allocation in the MAC-layer, and the power control and modulation schemes in the PHY-layer. Accordingly, the structure of the proposed strategy has two feedback flows, illustrated in Figure 1 with the broken-arrow lines: (i) by utilizing the video-encoding information, the optimization controller determines the channel selection and NALU allocation; (ii) according to the CSI, channel selection, and NALU allocation jointly, the optimization controller determines the modulation order and the related transmit power of each subchannel. The detailed improvements are explained as follows.
For issue-1, this paper adopts the NALU as the controllable data unit in the MAC-layer, instead of the video layer used in the conventional strategies [9-11]. Our objective is to maximize the average number of decodable NALUs in a GoP, denoted by $S$, rather than the number of decodable video layers $N$ as in the conventional strategies.
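The benefit of the finer control unit can be illustrated with a toy count. The sketch below assumes a simplified dependency rule that is not taken from the paper: a NALU is decodable only if it is received, the lower layer of the same frame is decodable, and the previous frame of the same layer is decodable. Both function names are hypothetical.

```python
def decodable_nalus(received):
    """Count decodable NALUs in one GoP with the NALU as control unit.

    received[m][n] is True when the NALU of frame m, layer n arrived intact.
    Assumed dependency (a simplification): a NALU needs the same frame's
    lower layer and the previous frame of the same layer to be decodable.
    """
    frames = len(received)
    layers = len(received[0])
    ok = [[False] * layers for _ in range(frames)]
    for m in range(frames):
        for n in range(layers):
            lower_ok = n == 0 or ok[m][n - 1]
            prev_ok = m == 0 or ok[m - 1][n]
            ok[m][n] = received[m][n] and lower_ok and prev_ok
    return sum(sum(row) for row in ok)

def decodable_with_layer_dropping(received):
    """Count NALUs kept when a whole layer is dropped after any error in it."""
    frames = len(received)
    layers = len(received[0])
    kept = 0
    for n in range(layers):
        if all(received[m][n] for m in range(frames)):
            kept += frames
        else:
            break  # higher layers depend on this one, so they are dropped too
    return kept

# 4 frames, 3 layers; one enhancement-layer NALU is lost in the third frame.
gop = [[True, True, True] for _ in range(4)]
gop[2][1] = False
print(decodable_nalus(gop), decodable_with_layer_dropping(gop))  # prints "8 4"
```

In this toy case, a single lost NALU removes both enhancement layers entirely under layer-level control, whereas NALU-level control still recovers the enhancement data of the first two frames.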
To solve issue-2, the bit rate $c_k$, which depends on the modulation scheme employed in the PHY-layer, replaces the Shannon rate $r_k$. The bit rate and the modulation order $M_k$ are related through (4) and (5), where $\lceil \cdot \rceil$ returns the smallest integer greater than or equal to its argument.
As for issue-3, our strategy utilizes the asymmetric encoding structure in the APP-layer to schedule the NALUs onto matched frequency channels in the MAC-layer. In particular, NALUs with identical priority are allocated to channels with similar CSI, and NALUs with higher priority are scheduled onto channels with better CSI [5]. Accordingly, a dedicated number of channels is allocated to the NALUs of each video layer $n$.
For issue-4, the encoding structure in the APP-layer and the BER in the PHY-layer are used to obtain the average number of decodable NALUs in a GoP, $S$, for the optimal power allocation. Hence, $S$ reflects both the encoding structure and the BER. Lemma 1 characterizes $S$.
Lemma 1. With the CGS encoder, the average number of decodable NALUs in a GoP, $S$, can be expressed through the probabilities that the individual NALUs are decoded correctly.
Proof of Lemma 1. Given QAM with order $M_k$, the related BER $\lambda_k$ can be approximated as in [7]. With regard to the BER, let $P_n^m$ be the probability that the $m$-th NALU in video layer $n$ is decoded correctly. Evidently, $P_n^m$ is a function of $\lambda_k$. According to the encoding structure, an expression for $P_n^m$ is first obtained for $n = 1$; it applies to any combination of $m$ and an arbitrary modulation scheme, and these combinations are listed completely in [7]. Similarly, an expression is obtained for $n \neq 1$, i.e., $n \in [2, N]$. Together, the two cases give $P_n^m(\lambda_k)$. When each GoP contains $M$ frames and each frame contains $L$ video layers, $S$ follows from these probabilities.
Based on these improvements, and in contrast to (2), the optimization problem of our strategy is formulated as (13), where $K_0 = 1$. Consequently, by combining (4)-(13), the problem is transformed into an optimization problem that depends only on the transmit power, given by (14). To obtain the optimal transmit power of the multiple subchannels in (14), the following corollary is derived.
Corollary 1. $S$ is a quasi-convex function of $E_1, E_2, \cdots, E_K$.

Proof. See Appendix A.
By Corollary 1, the optimum of the proposed strategy can be obtained with low computational complexity. Consequently, the proposed strategy not only jointly utilizes the asymmetric information from the APP-, MAC-, and PHY-layers but also reaches its optimum efficiently.
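For readers less familiar with quasi-convexity, the defining inequality $f(\theta x + (1-\theta) y) \leq \max\{f(x), f(y)\}$ for $\theta \in [0, 1]$ can be probed numerically. The following sketch samples random segments in a two-dimensional power box and searches for a violation. The objective `toy_objective` is a hypothetical stand-in, not the objective of (14), and a sampling test can only refute quasi-convexity, never prove it.

```python
import math
import random

def violates_quasi_convexity(f, lo, hi, trials=2000, seed=0, tol=1e-12):
    """Search random segments in the box [lo, hi]^2 for a violation of
    f(t*x + (1-t)*y) <= max(f(x), f(y)). Returns True if one is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        y = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        t = rng.random()
        z = (t * x[0] + (1 - t) * y[0], t * x[1] + (1 - t) * y[1])
        if f(*z) > max(f(*x), f(*y)) + tol:
            return True
    return False

def toy_objective(e1, e2):
    """Hypothetical stand-in objective: quasi-convex but not convex."""
    return -math.exp(-((e1 - 1.0) ** 2 + (e2 - 2.0) ** 2))

def concave_bowl(e1, e2):
    """Counterexample: its sublevel sets are not convex."""
    return -(e1 * e1 + e2 * e2)

print(violates_quasi_convexity(toy_objective, 0.0, 4.0))   # prints "False"
print(violates_quasi_convexity(concave_bowl, -1.0, 1.0))   # prints "True"
```

Because every sublevel set of a quasi-convex function is convex, simple search procedures do not get trapped between separated low regions, which is what keeps the computational cost low compared with exhaustive search.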
The overall proposed cross-layer optimization strategy, which combines all of the mentioned algorithms, is described in Table 1. As shown, our strategy consists of two major procedures. First, according to the required transmission rate $c_k$, the proposed strategy determines the modulation order $M_k$ from information such as the coding rate $R_k$ of the APP-layer and the channel bandwidth $B_k$ of the PHY-layer by (4) and (5). Second, the number of successfully transmitted blocks $S$ is determined from the relationship between $M_k$, $c_k$, and $E_k$ by (14). Through these two steps, we determine the relationship between $E_k$ and $S$, where $S$ captures both the scalable video structure of the APP-layer and the BER of the transmission system. While reducing the asymmetric BER $\lambda_k$ of the transmission system, we make $S$ as large as possible; in other words, we make the received video quality as high as possible. In addition, since Problem (14) is quasi-convex, the optimum of the proposed strategy can be obtained with low computational complexity [22].
Table 1. Proposed cross-layer optimization strategy.
Optimal solution set: $\{c_n^*, E_k^*\}$
1. For each modulation scheme
3. Do power allocation and obtain $E_k^*$ (temp);
5. Obtain the minimum $-S$ by (13) and (14);
Next, we use simulation experiments to verify the soundness of our proposal.

Simulation Results
In this section, we evaluate the proposed video transmission strategy through Matlab simulations. For the asymmetric system model, the simulation configuration is as follows: the cell radius is $D = 250$ m; the white noise density is −174 dBm/Hz; the path loss is evaluated by $128.1 + 37.6 \log_{10}(d)$ dB, where $d$ denotes the distance between the transmitter and receiver in meters; and there are 12 subcarriers with a bandwidth of 15 kHz per subcarrier.
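The link budget implied by this configuration can be sketched as follows. The noise density, subcarrier count, subcarrier bandwidth, and path-loss expression are taken from the setup above, whereas the transmit power of 23 dBm is a hypothetical value used only for illustration. The distance is passed to the path-loss expression exactly as written; since the same expression appears in common 3GPP macro-cell models with the distance given in kilometers, the unit should be checked against the intended setup before the numbers are reused.

```python
import math

NOISE_DENSITY_DBM_HZ = -174.0   # white noise density from the simulation setup
SUBCARRIER_BW_HZ = 15e3         # bandwidth per subcarrier
NUM_SUBCARRIERS = 12

def noise_power_dbm(bandwidth_hz):
    """Thermal noise power over the given bandwidth, in dBm."""
    return NOISE_DENSITY_DBM_HZ + 10.0 * math.log10(bandwidth_hz)

def path_loss_db(d):
    """Path loss 128.1 + 37.6 * log10(d); d is in the unit the model expects."""
    return 128.1 + 37.6 * math.log10(d)

def received_snr_db(tx_power_dbm, d, bandwidth_hz):
    """Average received SNR in dB, ignoring fading and antenna gains."""
    return tx_power_dbm - path_loss_db(d) - noise_power_dbm(bandwidth_hz)

total_bw = NUM_SUBCARRIERS * SUBCARRIER_BW_HZ     # 180 kHz
noise = noise_power_dbm(total_bw)                 # about -121.4 dBm
snr = received_snr_db(23.0, 1.0, total_bw)        # hypothetical 23 dBm at d = 1
```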
Two widely employed video sequences are used: Flower and Foreman [23]. They are encoded with H.264/SVC coarse grain scalability (CGS) by the Joint Scalable Video Model version 9.19 at 30 fps. Additionally, the GoP size is 16, the frame pattern is IPPPP, and the duration is 1 s. The quantization parameters (QPs) of Flower are 40-38-36, and those of Foreman are 41-38-36. In addition, the average Y-PSNR is adopted to quantify the received video quality. To meet the service rate requirement of each SVC video layer, the modulation schemes of SVC video layers 1 (base layer), 2 (enhancement layer 1), and 3 (enhancement layer 2) are set to QPSK, 8-PSK, and 8-PSK, respectively. The frame rates of the three SVC video layers are listed in Table 2.
Firstly, to validate Corollary 1, we take one frame and two video layers as an example. Figure 2 shows the results obtained by traversing all cases, where $-S^*$ is the minimum value obtained by the proposed algorithm. Evidently, $-S$ is a quasi-convex function of $E_1$ and $E_2$, which is consistent with Corollary 1 and confirms that our problem is quasi-convex. It should be noted that other algorithms, e.g., the genetic algorithm, could also find the optimal solution, but we exploit the quasi-convexity to obtain the optimal solution with low computational complexity.
For comparison, the strategy proposed in [16], which schedules the video data without considering the data priorities, is used as the baseline method. As can be seen from Figures 3 and 4, under the different asymmetric SNRs (in decibels), our proposed strategy always outperforms the baseline scheme for both the Flower and Foreman video sequences with different numbers of video layers. In particular, at an SNR of 10 dB with three video layers, the proposed strategy achieves a performance gain of up to 15 dB. This is because the proposed strategy efficiently and jointly utilizes the information from the APP-, MAC-, and PHY-layers. As shown, our strategy and the baseline perform similarly when the SNR is high. This is because, when the received SNR is high, the BER of the modulation approaches zero, resulting in no video data loss. We emphasize that such a high-SNR situation is an ideal case for practical wireless communication systems.
Consequently, from Figures 3 and 4, it is evident that the proposed strategy achieves better video transmission performance than the related works.
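Since the results above are reported as average Y-PSNR, a minimal sketch of this metric may be helpful. It uses the standard definition $10 \log_{10}(255^2 / \mathrm{MSE})$ on 8-bit luma samples; the frame representation (nested lists of integers) and the function names are hypothetical, and frames that are reproduced perfectly would have to be handled separately because their PSNR is infinite.

```python
import math

def y_psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized luma frames (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            total += (r - d) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf          # identical frames
    return 10.0 * math.log10(peak * peak / mse)

def average_y_psnr(ref_frames, dec_frames):
    """Mean of the per-frame Y-PSNR values over a sequence."""
    values = [y_psnr(r, d) for r, d in zip(ref_frames, dec_frames)]
    return sum(values) / len(values)

ref = [[100, 100], [100, 100]]
dec = [[101, 99], [100, 100]]    # two samples off by one, so MSE = 0.5
value = y_psnr(ref, dec)         # about 51.14 dB
```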

Conclusions
In this paper, considering the asymmetric video streams and the asymmetric nature of the transmission process, we propose a cross-layer optimization-based strategy for asymmetric medical video transmission in IoT systems. The proposed strategy uses three types of feedback and jointly utilizes the video-coding structure in the APP-layer, the power control and channel allocation in the MAC-layer, and the modulation and coding schemes in the PHY-layer. While reducing the BER of the transmission system, we make the received video quality as high as possible. To obtain the optimum configuration efficiently, the proposed strategy is formulated as a problem that is proved to be quasi-convex. Consequently, the proposed strategy not only outperforms the classical algorithms in terms of resource utilization but also greatly improves the video quality in resource-limited networks.