Energy Efficiency Maximization of Two-Time-Slot and Three-Time-Slot Two-Way Relay-Assisted Device-to-Device Underlaying Cellular Networks

The continuous development of fifth-generation (5G) communication and the Internet of Things (IoT) calls for more advanced systems that can satisfy the growing wireless data rate demand of future equipment. Device-to-Device (D2D) communication, whose performance is evaluated in terms of overall throughput, energy efficiency (EE), and spectral efficiency (SE), is considered a promising solution to this problem. This paper therefore aims to improve the performance of D2D communication underlaying cellular networks operating on multiple bands by maximizing the EE in the uplink. Using stochastic geometry, closed-form expressions are derived for the successful transmission probability (STP), the total average transmission rate (TATR), and the total average energy efficiency (TAEE) of cellular and D2D users in different time-slot settings. Two settings are investigated and compared: one-hop, direct D2D communication in two time slots (2TS), and multi-hop, indirect D2D communication in three time slots (3TS), in which an additional D2D user acts as a two-way relay to assist the communication. Moreover, an optimization problem is formulated to calculate the maximum TAEE of the D2D users and the optimum transmission power of both the cellular and D2D users. In this optimization problem, which is proven to be non-convex, the Quality of Service (QoS) is ensured by taking the STP on every link into account. The proposed approach, referred to as relay-assisted D2D communication, delivers a notably better QoS at lower transmission power for communication among distant D2D users.


Related Work
Several studies have addressed the EE of D2D communications underlaying cellular networks. The EE metric is important for such wireless networks because it indicates how efficiently energy is utilized. For instance, to improve the system performance, [26] studied the EE of D2D communications where channel reuse was removed. D2D communications deployed in heterogeneous networks achieve a better EE than a full small-cell deployment and are hence a greener solution for cellular network deployment. Likewise, the EE of D2D systems with cellular network deployment was analyzed in [17,27]. Specifically, the authors of [27] studied the trade-off between the EE and the delay in D2D systems under stochastic traffic arrivals and time-varying channel conditions. In [28], a D2D system in which every user is self-interested and individually attempts to maximize its own EE was investigated. Therein, the EE and SE were studied under constraints on the SE and on the maximum transmission power. In [29], extended radio resource management algorithms were integrated into one-way relay-assisted D2D communications to balance the SE and EE under mode selection and resource allocation constraints. In the context of one-way D2D communications, the authors of [16,30] studied the maximum achievable transmission capacity of the system while guaranteeing the outage probability (OP) of both the cellular and D2D links. In [31], network users were modelled with stochastic geometry on multiple bands and the EE of the D2D users was maximized. However, that study considered one-way D2D communication only and neglected the influence of the transmission power of the cellular users.
In contrast to the above papers, this study focuses on maximizing and comparing the EE, henceforth referred to interchangeably as the TAEE, of two-way D2D communication underlaying cellular networks in the 2TS and 3TS scenarios.

Main Contributions
Based on stochastic geometry, this paper investigates an effective solution to maximize the EE of the cellular network with two-way D2D multi-hop communication, and derives the optimum power on multiple bands. The users are randomly distributed in space following a homogeneous Poisson point process (PPP). The D2D and cellular users in each band are distributed with different densities, and the number of users participating in the transmission process differs as well. Because of resource fluctuation and the evolving network conditions of 5G, studies of the multi-hop scenario are considered more practical than one-hop ones, and they also allow the dynamic behaviour of the system to be observed. Moreover, to cope with the variation of the wireless channel parameters and thereby enhance the network EE, D2D communication is studied in 2TS and 3TS modes, each with its own adjustable transmission power. Finally, because multiple bands are available for multiple hops, the D2D users, cellular users, and other devices do not interfere with each other. This considerably reduces the complexity of interference management and improves the D2D communication performance.
The main contributions of this paper are listed below:
• Firstly, closed-form expressions for the STP, the TATR, and the TAEE of the cellular users and D2D users in the 2TS and 3TS modes of the two-way D2D communication network are derived.
• Secondly, an optimization problem is proposed that maximizes the EE of the D2D users while ensuring the QoS of the cellular and D2D users. The problem is solved with a Derivative-Based (DB) algorithm, formulated from the fact that the multi-hop D2D users are subject to transmission power and OP constraints. The problem is proven to be non-convex and is separated into two sub-problems. Solving the first sub-problem provides the maximized EE of the D2D users along with the optimum transmission power of the direct (one-hop) D2D user in 2TS mode. In the second sub-problem, an objective function is formulated to calculate the maximized EE and the optimum transmission power of the D2D users in 3TS mode, where a two-way relay assists the communication (multi-hop). The objective function is a sum of sub-functions, so the optimal result for the second sub-problem is obtained by maximizing every sub-function and summing the results.
• Finally, this study is compared with the studies in [31,32] under a similar system model and assumptions. The simulation results show that the DB algorithm provides a near-optimal solution in 2TS and an outstanding performance in 3TS compared with the conventional Branch and Bound (BB) algorithm [33].
Section 2 describes the scenario and the system model. Section 3 presents the analysis and the problem formulation on multiple bands, in particular the EE optimization problem. Section 4 describes in detail the proposed DB algorithm used to solve this optimization problem. Section 5 presents the simulation results, and Section 6 concludes the paper.

System Model
A. Scenario Description: In this study, D2D communication underlaying a general cellular network is considered. The uplink frequency resources of the cellular network are shared with the D2D communication, and the resource allocation of the whole network is handled by a base station (BS). The network and channel models are described below. Unlike [34], where the power allocation problem was studied on a single band, this paper considers power allocation on multiple bands. The cellular network operates in a spectrum that is split into $K$ bands, indexed by $i = 1, 2, \ldots, K$. Figure 1 depicts the two transmission modes considered in this study. Both are based on a cellular network with cellular links between the cellular users and the BS receiver. When D2D users exchange information with one another in the multi-band scenario, they operate in one of two modes:
• Two-time-slot (2TS) mode: direct, one-hop communication.
• Three-time-slot (3TS) mode: indirect, multi-hop communication via an intermediate D2D user that works as a two-way relay to assist the signal transmission.

B. Network Models:
The following four assumptions are made based on stochastic geometry. Assumption 1. On a two-dimensional plane, the cellular users in the $i$-th band are randomly distributed according to a homogeneous PPP $\Phi_{c,i}$ with density $\delta_{c,i}$. The transmission power of the cellular users in the $i$-th band is denoted by $P_{c,i}$. The total transmission power of all the cellular users, according to [35], is calculated as $P_c = \sum_{i=1}^{K} P_{c,i}$.

Assumption 2.
On the same plane, in the same manner as the cellular users, the D2D users are distributed according to a homogeneous PPP $\Phi_{d,i}$ with density $\delta_{d,i}$. Every D2D transmitter is paired with a D2D receiver at a distance $R_{d,i}$ and is assigned a Rayleigh fading coefficient $f_{d,i}$. Similarly, the transmission power of the D2D users in the $i$-th band is denoted by $P_{d,i}$, and $P_d = \sum_{i=1}^{K} P_{d,i}$.

Assumption 3.
According to Palm theory [36], the statistics of the PPP are not influenced by the presence of a typical receiver at the origin. A typical receiver is therefore placed at the origin of the plane. It plays the role of both a BS for cellular uplink transmission and a typical D2D receiver for D2D communication, so the BS is studied in combination with the D2D users. Moreover, the uplink and downlink channels are assumed to be reciprocal and invariant over consecutive equal time slots.

Assumption 4.
If two D2D users, $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$, are far apart, another D2D user placed between them is needed to assist their signal transmission. This device works as a two-way relay, denoted as $\mathrm{D2D}_r$, and the transmission power of the relay nodes (RNs) equals the D2D power in the $i$-th band, $P_{r,i} = P_{d,i}$. Communication between $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$ is realized with the decode-and-forward (DF) protocol with the assistance of $\mathrm{D2D}_r$. The three D2D users are distributed on the plane following the stationary PPP $\Phi_{r,i}$ with density $\delta_{r,i}$. Furthermore, the distances from $\mathrm{D2D}_r$ to $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$, denoted respectively as $R^{ar}_{d,i}$ and $R^{rb}_{d,i}$, are constrained to be strictly smaller than $R_{d,i}$, the distance between $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$. To relate the distances, a fraction $\alpha$ is introduced so that $R^{ar}_{d,i} = R_{d,i}/(1+\alpha)$.

In what follows, the performance metrics of the communication on multiple bands are calculated. Eventually, the EE maximization, treated as an optimization problem, is formulated.
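The spatial model in Assumptions 1 and 2 can be illustrated with a short simulation. The sketch below draws one realization of a homogeneous PPP on a square window centred on the typical receiver. The function names, the square window, and the numeric values are illustrative assumptions and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def poisson_count(mean: float, rng: random.Random) -> int:
    """Draw a Poisson(mean) integer by counting unit-rate arrivals in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def sample_ppp(density: float, side: float, rng: random.Random) -> List[Point]:
    """Sample a homogeneous PPP with the given density on a side x side square.

    The square is centred on the origin, where the typical receiver sits.
    The number of points is Poisson(density * area); given that number,
    the points are independent and uniform over the window.
    """
    n_points = poisson_count(density * side * side, rng)
    half = side / 2.0
    return [(rng.uniform(-half, half), rng.uniform(-half, half))
            for _ in range(n_points)]


def sample_band(delta_c: float, delta_d: float, side: float,
                rng: random.Random) -> Dict[str, List[Point]]:
    """One realization of the cellular and D2D user positions in a single band."""
    return {
        "cellular": sample_ppp(delta_c, side, rng),
        "d2d": sample_ppp(delta_d, side, rng),
    }
```

Calling `sample_band` once per band with that band's densities gives independent user layouts for the $K$ bands, which is the kind of input a Monte Carlo check of the closed-form results would need.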

The Signal to Noise Plus Interference Ratio
This subsection presents the performance analysis of the cellular transmission and the D2D communication in the two-way relay-assisted setup. In particular, because the D2D pair and the two-way relay user share common cellular spectrum resources, the D2D communication is investigated in two modes, 2TS and 3TS.

A.1. D2D Pair between the Direct Links and Cellular Links
In this mode, the D2D pair, $\mathrm{UED}_a$ and $\mathrm{UED}_b$, is investigated. The two users exchange information via a direct one-hop D2D link in at most two time slots, without a two-way relay assisting the D2D transmission in the overlapping region.
According to [37], the received signal at the D2D receiver in the $i$-th band is the sum of the desired D2D signal, the interference caused by the cellular users, the interference caused by the other D2D users, and the thermal noise. The signal from the D2D user is $I_{d,00,i} = P_{d,i} R_{d,00,i}^{-m} f_{d,00}$, where $f_{d,00}$ is the Rayleigh fading channel coefficient and $R_{d,00,i}$ is the distance from the D2D transmitter to its corresponding D2D receiver in the $i$-th band. In the interference caused by the cellular users, $f_{c,j0}$ is the Rayleigh fading channel coefficient and $R_{c,j0,i}$ is the distance from the $j$-th cellular user to the typical D2D receiver in the $i$-th band. The interference caused by the D2D users, $I_{id,d0,i}$, is a sum over the interfering D2D users, in which $f_{d,l0}$ is the Rayleigh fading channel coefficient and $R_{d,l0,i}$ is the distance from the $l$-th D2D transmitter to its corresponding receiver in the $i$-th band. Finally, $N_0$ is the thermal noise term.

Due to the broadcast nature of the wireless medium, a typical receiver suffers interference from both the cellular and the D2D communication in the same network. Because the D2D communication uses the same uplink frequency resources as the cellular system, interference is an important parameter in this study. The network is therefore treated as interference-limited, with negligible thermal noise. In this scenario, the D2D pair communicates in 2TS mode. Accordingly, the SINR reduces to the signal-to-interference ratio (SIR) of the typical D2D receiver in the $i$-th band, that is, the ratio of the desired signal $I_{d,00,i}$ to the sum of the cellular and D2D interference.

A.2. Cellular Transmissions of the Typical BS
As mentioned above, both the cellular and the D2D transmissions cause interference at the typical receiver. The signal received at the typical BS from its cellular user in the $i$-th band is thus the sum of the desired cellular signal, the interference from the other cellular users, the interference from the D2D users, and the thermal noise, where $I_{c,00,i} = P_{c,i} R_{c,00,i}^{-m} f_{c,00}$ is the cellular signal at the typical BS. As in (2), the two interference terms are the interference from the cellular users to the typical BS and the interference from the D2D users to the typical BS, respectively, in the $i$-th band. The SIR of the typical cellular receiver in the $i$-th band is obtained in the same manner as in (3), as the ratio of $I_{c,00,i}$ to the total interference.

B. Three Time Slot Communications Mode
This subsection studies 3TS communication, in which a relay-assisted D2D link, modelled with stochastic geometry, helps a pair of other D2D users exchange information using a physical-layer network coding scheme [38]. The main focus is on the EE of the 3TS transmission, given that the UER uses the DF protocol.
In the two-way 3TS communication mode, a two-way relay D2D user is placed between the two D2D users to assist the data transmission. In the first and second time slots, $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$ respectively transmit their information to the two-way relay $\mathrm{D2D}_r$ with power $P_{d,i}$. In the third time slot, $\mathrm{D2D}_r$ decodes the information received from the two sources and mixes the two information flows with a network coding technique, for example the XOR operation. The network-coded information is then broadcast back to $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$, which completes the task of the relay. Each of the two D2D users can then extract the information it needs from the returned mixed flow.
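The XOR mixing in the third time slot can be illustrated in a few lines. This is a minimal sketch of XOR network coding on byte strings, assuming both packets have equal length (padding is left out). The function names are hypothetical and the paper does not prescribe an implementation.

```python
def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(left) != len(right):
        raise ValueError("packets must have equal length (pad before coding)")
    return bytes(x ^ y for x, y in zip(left, right))


def relay_encode(packet_a: bytes, packet_b: bytes) -> bytes:
    """Third time slot at the relay: mix the two decoded packets into one."""
    return xor_bytes(packet_a, packet_b)


def recover_peer_packet(own_packet: bytes, coded_packet: bytes) -> bytes:
    """At a D2D user: remove its own packet from the broadcast to get the peer's."""
    return xor_bytes(own_packet, coded_packet)
```

Because each user already knows its own packet, a single broadcast from the relay serves both directions, which is why the exchange completes in three time slots.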
The signals received by $\mathrm{D2D}_r$ in the first and second time slots are expressed as $y_{ar,i} = I_{ar,00,i} + I_{ir,c0,i} + I_{ir,d0,i} + I_{ir,r0,i} + N_0$ and $y_{br,i} = I_{br,00,i} + I_{ir,c0,i} + I_{ir,d0,i} + I_{ir,r0,i} + N_0$, where $I_{ar,00,i}$ and $I_{br,00,i}$ are the signals from the two typical D2D users and $I_{ir,r0,i}$ is the interference from the two-way relays assisting the D2D users to the typical receiver.

Within the third time slot, the transmission power $P_{d,i}$ of the two-way relay is split by a factor $\beta$ $(0 < \beta < 1)$ to broadcast the mixed signal to the two D2D users: $\beta P_{d,i}$ is allocated to the transmission from $\mathrm{D2D}_r$ to $\mathrm{D2D}_a$, and $(1 - \beta) P_{d,i}$ to the transmission from $\mathrm{D2D}_r$ to $\mathrm{D2D}_b$. Hence, the received signals at $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$ are respectively $y_{ra,i} = I_{ra,00,i} + I_{ir,c0,i} + I_{ir,d0,i} + I_{ir,r0,i} + N_0$ and $y_{rb,i} = I_{rb,00,i} + I_{ir,c0,i} + I_{ir,d0,i} + I_{ir,r0,i} + N_0$, where $I_{ra,00,i} = \beta P_{d,i} \left( \frac{R_{d,00,i}}{1+\alpha} \right)^{-m} f_{ra,00}$ and $I_{rb,00,i}$ is the corresponding signal from the relay to $\mathrm{D2D}_b$.

As before, the SIR in the $i$-th band of the typical D2D receivers $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$, served via $\mathrm{D2D}_r$ with the DF protocol, is obtained from these signals. With regard to Assumption 3, the instantaneous SIR in the first, second, and third time slots can be calculated in the same way. The derivation of the STP of the typical D2D receiver is given in the next subsection.

The Successful Transmission Probability of Typical Receivers
In this subsection, the STP of the typical receivers is derived. A transmission is considered successful if the transmitter delivers the packet to its receivers within the third time slot and the SIR recorded at the receivers is not lower than the threshold value of the D2D communication. Otherwise, negative feedback is sent and the packet that failed to be delivered is put back at the head of the queue to wait for another transmission round.
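The success rule and the retransmission behaviour described above can be sketched as a head-of-line queue. The function name and the use of a `deque` are illustrative assumptions. Only the rule itself (success when the SIR is not lower than the threshold, otherwise the packet stays at the head of the queue) comes from the text.

```python
from collections import deque
from typing import Deque, Optional


def transmit_round(queue: Deque[str], sir: float, threshold: float) -> Optional[str]:
    """Attempt to send the head-of-line packet in one transmission round.

    Returns the delivered packet when the measured SIR is not lower than the
    threshold (the packet is then removed from the queue). Returns None on an
    empty queue or on failure, in which case the packet stays at the head of
    the queue for the next round.
    """
    if not queue:
        return None
    if sir >= threshold:
        return queue.popleft()
    return None
```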

A.1. D2D Pair Communication
Proposition 1 below gives the STP of a typical D2D receiver when it completes one round of communication within two time slots.

Proposition 1. Let $\zeta_{d,i}$ denote the SIR threshold of the D2D communication in the $i$-th band. From (2), the STP of the typical D2D pair receivers in the $i$-th band satisfies (18).

Proof. As stated in the network model section, $f_{d,00}$ is independently and exponentially distributed with unit mean. Combined with (2), this gives the STP of the typical D2D receiver in the $i$-th band as in (19). Since the study is conducted on two-dimensional planes and $\gamma^{2TS}_{d,i}$ follows from the independent exponential distributions of the D2D and cellular channel gains, (19) can be rewritten as (20). From the definition of the Laplace transform and the stochastic geometry results in [35], (21) and (22) are obtained, where $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$ is the gamma function. Substituting (21) and (22) into (20) yields the result in (18), which completes the proof of Proposition 1.

Remark 2. Proposition 1 reveals how several key network parameters affect the STP of the typical D2D receiver. As the threshold $\zeta_{d,i}$ in the $i$-th band increases, the condition $\gamma_{d,i} \ge \zeta_{d,i}$ becomes harder to satisfy, so the STP decreases. The STP also decreases as $R_{d,00,i}$ increases, because the signal is attenuated more severely over a longer distance. Conversely, the lower the density of the D2D users $\delta_{d,i}$ or of the cellular users $\delta_{c,i}$, the higher the STP, which can be exploited to alleviate the interference caused by different users. Moreover, a decrease in $P_{d,i}$ lowers the STP, and so does an increase in $\zeta_{d,i}$, $R_{d,00,i}$, $\delta_{c,i}$, $\delta_{d,i}$, or $P_{c,i}$. Changing these parameters as stated either weakens the desired D2D signal or strengthens the interference at the D2D receiver, which causes the STP to decline.
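The trends in Remark 2 can be checked numerically. The sketch below uses the standard closed-form success probability for a receiver surrounded by Poisson fields of interferers under Rayleigh fading with path-loss exponent $m > 2$. It is assumed here, and not confirmed by the text above, that (18) has this form. The function name and the numeric values in the usage note are illustrative.

```python
import math


def stp_d2d(zeta: float, distance: float, m: float,
            delta_d: float, delta_c: float, p_d: float, p_c: float) -> float:
    """Success probability of a D2D link under Poisson interference.

    Assumed model (standard stochastic-geometry result, not copied from the
    paper): interferers form two independent homogeneous PPPs with densities
    delta_d (D2D, power p_d) and delta_c (cellular, power p_c), all links
    see unit-mean Rayleigh fading, and thermal noise is neglected.
    """
    if m <= 2.0:
        raise ValueError("the closed form requires a path-loss exponent m > 2")
    k = 2.0 / m
    theta = (math.pi * distance ** 2 * zeta ** k
             * math.gamma(1.0 + k) * math.gamma(1.0 - k))
    return math.exp(-theta * (delta_d + delta_c * (p_c / p_d) ** k))
```

For example, evaluating `stp_d2d(1.0, 10.0, 4.0, 1e-4, 1e-4, 0.1, 0.2)` and then raising the threshold, the distance, either density, or the cellular power one at a time lowers the returned value, in line with Remark 2.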

A.2. Typical Base Station
The STP of the typical BS in the $i$-th band is derived in the same way as that of the typical D2D pair and is presented in Proposition 2 below.

Proposition 2. The STP of the typical BS in the $i$-th band is obtained analogously to (18), where $\zeta_{c,i}$ denotes the SIR threshold of cellular transmission.

Proof. The proof is similar to that of Proposition 1.

Remark 3.
Proposition 2 shows that the STP of the typical BS depends on the key network parameters. As stated previously, the transmission power of the D2D communication causes interference to the network. Thus, an increase in $P_{d,i}$, $\zeta_{c,i}$, $R_{d,00,i}$, $\delta_{c,i}$, or $\delta_{d,i}$, or a decrease in $P_{c,i}$, results in a lower STP. Furthermore, the more power is used for the cellular transmission, the higher $\gamma_{c,i}$ and hence the higher the STP.

B. Three Time Slot Communication Mode
Proposition 3. The STPs of the typical D2D receivers $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$, served through $\mathrm{D2D}_r$ with the DF protocol, are given by (24) and (25), respectively.

Proof. Using Proposition 1 together with (12) and (16), the STP in the first time slot is formulated, where $\Theta_{ar,i} = \pi\zeta\ldots$ Likewise, from (13) and (17), the STP in the second time slot is formulated, where $\Theta_{br,i} = \pi\zeta\ldots$ In the third time slot, based on (14)-(17), the STPs at $\mathrm{D2D}_a$ and $\mathrm{D2D}_b$ are formulated in the same manner.

As mentioned above, in 3TS mode one cycle of signal transmission consists of three links that carry the flows of information: $\mathrm{D2D}_a \to \mathrm{D2D}_r$, $\mathrm{D2D}_b \to \mathrm{D2D}_r$, and then $\mathrm{D2D}_r \to \mathrm{D2D}_a$ and $\mathrm{D2D}_b$. At the start of every cycle, the D2D user attempts to send some packets from its queue. To decide whether the packets were delivered to $\mathrm{D2D}_r$ successfully, the SIR at $\mathrm{D2D}_r$ is compared with the SIR threshold of the overall D2D transmission. If the SIR at $\mathrm{D2D}_r$ is greater than the threshold, an acknowledgement (ACK) message is forwarded to the D2D user to indicate that the transmission was successful, and those packets are removed from the queue. Otherwise, the packets remain at the head of the queue and wait for another transmission round.
Hence, the end-to-end STP with the DF protocol at $\mathrm{UED}_a$ in the 3TS mode is obtained as stated in Proposition 3, and the end-to-end STP at $\mathrm{UED}_b$ follows similarly. This completes the proof of Proposition 3.

Performance Energy Efficiency
In this section, the D2D communication performance in the $K$ bands is studied in the 2TS and 3TS modes. One factor to consider is the influence of the cellular system, since the D2D users and the two-way relay assisting the communication share the same cellular spectrum resources. So far, the STP of the typical D2D receivers has been derived, which is an important indicator for assessing the D2D network performance. The following subsection investigates two further indicators: the total average transmission rate (TATR) and the total average energy efficiency (TAEE) of D2D communication.

Total Average Transmission Rate and Total Average Energy Efficiency
In this subsection, the exact expression of the TATR is derived, the TAEE of D2D communication on multiple bands is calculated, and the TAEE optimization problem is then formulated.

A. Two Time Slot Communication
According to Proposition 1, let $\mathrm{TATR}^{2TS}_{d,i}$ denote the TATR of a D2D pair operating in 2TS mode on the $i$-th band in one round of communication. It is given as in [39], where $W_i$ is the bandwidth of the $i$-th band. Following [40], the TAEE is defined as the TATR divided by the total power consumption, which is the product $\delta_{d,i} P_{d,i}$. Accordingly, the TAEE of D2D communication in the $K$ bands is formulated as the sum of these per-band ratios, as in (33).

B. Three Time Slot Communication
Given the STP in (24) of the typical D2D receiver served through the two-way relay, the average (data) transmission rate (ATR) at $\mathrm{D2D}_a$ in the $i$-th band is formulated as (34). Similarly, from (25), the ATR at $\mathrm{D2D}_b$ in the $i$-th band is formulated as (35). Using (34) and (35), the TATR of the typical D2D receiver through the assisting relay with the DF protocol in 3TS mode is obtained, and the TAEE of the typical D2D receiver in 3TS mode in the $K$ bands follows correspondingly. The TAEEs of D2D communication in the 2TS and 3TS modes are the objective functions of the optimization problems in this paper. Their formulations are presented in the following section.
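The TAEE definition used for the 2TS case (the per-band TATR divided by the power consumption $\delta_{d,i} P_{d,i}$, summed over the $K$ bands) can be sketched as follows. The per-band TATR values are taken as inputs, since their closed form is not reproduced here. The function name and the example numbers are illustrative assumptions.

```python
from typing import Sequence


def total_average_ee(tatr: Sequence[float],
                     density: Sequence[float],
                     power: Sequence[float]) -> float:
    """Sum over bands of TATR_i / (delta_i * P_i).

    tatr[i]    : total average transmission rate in band i
    density[i] : D2D user density delta_{d,i} in band i
    power[i]   : D2D transmission power P_{d,i} in band i
    """
    if not (len(tatr) == len(density) == len(power)):
        raise ValueError("all inputs must list one value per band")
    if any(d <= 0.0 or p <= 0.0 for d, p in zip(density, power)):
        raise ValueError("densities and powers must be positive")
    return sum(rate / (d * p) for rate, d, p in zip(tatr, density, power))
```

With two bands, `total_average_ee([2.0, 3.0], [0.5, 0.25], [4.0, 2.0])` adds 2.0 / (0.5 * 4.0) and 3.0 / (0.25 * 2.0). Because each band contributes its own term, the objective separates across bands once the shared power budget is set aside, which is the property the DB algorithm relies on.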

Optimization Problem
This section formulates the TAEE optimization problem under the network constraints. Note that the EE of individual users is affected by the interference introduced by the cellular and D2D transmissions.

Optimization of the Total Average Energy Efficiency in 2TS
In this subsection, an optimization problem is formulated to maximize the TAEE of the D2D users. Based on (33) and applied to the $K$ bands, it is written as (38). The optimal resource allocation scheme maximizes $\mathrm{TAEE}^{2TS}_{d}$ in (38) over $P_{d,i}$ under the following constraints: (i) the total transmission power of the D2D users over all the bands must not exceed the threshold $P_d$; (ii) the power of the D2D users in each band is greater than or equal to zero and does not exceed the allowable threshold $P_{d,i,max}$; (iii) and (iv) the OP of the cellular users and of the D2D users is upper-bounded by the thresholds $\varepsilon_{c,i}$ and $\varepsilon_{d,i}$, respectively, in order to maintain the QoS for the network users.
With Proposition 1 and Proposition 2, constraints (ii), (iii), and (iv) in (38) are combined into a feasible region for $P_{d,i}$. Hence, (38) can be transformed to (40), in which $P_{d,i}$ is restricted to $P^{2TS}_{d,i,down} \le P_{d,i} \le P^{2TS}_{d,i,up}$.

Optimization of the Total Average Energy Efficiency in 3TS
Similar to the previous subsection, the optimal transmission power of the two-way relay-assisted D2D communication with the DF protocol in 3TS mode is calculated in each band $i$ of the $K$ bands. Using Proposition 3 and (39), the feasible range of $P^{3TS}_{d,i}$ is obtained. However, since the objective function TAEE in (40) and (41) has been proven to be non-convex, these problems cannot be solved by convex optimization theory. Moreover, unlike previous optimization problems in which only a single band is considered [28,32,34,41,42], the present problem spans multiple bands. A Derivative-Based (DB) algorithm is therefore proposed to solve it and is presented in the next subsection.

Derivative-Based Algorithm
This subsection presents the DB algorithm proposed to solve the non-convex TAEE optimization problem. In the existing literature, the BB algorithm [33] has been adopted to solve a number of challenging optimization problems in wireless communication networks [42,43]. Considering the limits of the BB algorithm, however, a DB algorithm is established here instead to solve (40) and (41). The method exploits the fact that the objective function is a sum of several sub-functions and maximizes those sub-functions. The D2D pair in 2TS mode is treated first.

A. Two Time Slot Communication
As seen in (33), $\mathrm{TAEE}^{2TS}_{d}$ is a sum of $K$ sub-functions. Without the total power constraint in (40), the powers $P_{d,i}$ become independent of one another. Consequently, the maximum $\mathrm{TAEE}^{2TS}_{d}$ is obtained when every $EE_{d,i}$ attains its maximum on the feasible region. Maximizing the EE of one band at a time is clearly more practical than solving (40) directly.
Thereby, the global maximizer $P_{d,i,max}$ of the $EE_{d,i}$ of D2D communication in the $i$-th band is calculated within the feasible range $[P^{2TS}_{d,i,down}, P^{2TS}_{d,i,up}]$. Accordingly, $P_{d,i} = P_{d,i,max}$ is set for the bands $i = 1, 2, \ldots, K$. The $P_{d,i}$ are then adjusted so that $\sum_{i=1}^{K} P_{d,i} = P_d$ is satisfied with the lowest EE reduction. Based on this principle, the DB algorithm is proposed and is described in detail below. First, $P_{d,i,max}$ is calculated in the following Theorem 1.
Proof. From (40), the derivative in (43) is obtained. In (43), the first and second terms are greater than zero, so only the remaining term needs to be examined. In the third case, provided that $P_{d,i,down} \ge \left( \frac{2B^{2TS}_i}{m} \right)^{m/2}$, $f_i(P_{d,i})$ decreases monotonically on the feasible region. Hence, $f_i(P_{d,i})$ reaches its maximum at $P_{d,i,max} = P_{d,i,down}$. Considering all the cases in (42), Theorem 1 holds.
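The case analysis behind Theorem 1 can be sketched as a clamp of the stationary point to the feasible interval. The per-band shape $f_i(P) = e^{-B P^{-2/m}}/P$ used below is an assumption: it is chosen because its derivative vanishes at $(2B/m)^{m/2}$, the quantity that appears in the proof, and constant factors are dropped. The function names are hypothetical.

```python
import math


def ee_band(p: float, b: float, m: float) -> float:
    """Assumed per-band EE shape f(P) = exp(-b * P**(-2/m)) / P (constants dropped)."""
    return math.exp(-b * p ** (-2.0 / m)) / p


def band_optimum(b: float, m: float, p_down: float, p_up: float) -> float:
    """Maximizer of the assumed f on [p_down, p_up].

    f increases up to the stationary point (2b/m)**(m/2) and decreases after
    it, so the maximizer is that point clamped to the feasible interval:
    - interval entirely above the stationary point -> p_down (f decreasing)
    - interval entirely below the stationary point -> p_up (f increasing)
    - otherwise                                     -> the stationary point
    """
    if not 0.0 < p_down <= p_up:
        raise ValueError("need 0 < p_down <= p_up")
    stationary = (2.0 * b / m) ** (m / 2.0)
    return min(max(stationary, p_down), p_up)
```

With $b = 2$ and $m = 4$ the stationary point is 1, so `band_optimum(2.0, 4.0, 1.5, 3.0)` returns the lower bound, matching the monotonically decreasing case treated in the proof.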
As stated in Theorem 1, $P_{d,i,max}$ is determined by the relationship between $\left( \frac{2B^{2TS}_i}{m} \right)^{m/2}$ and the feasible region $[P^{2TS}_{d,i,down}, P^{2TS}_{d,i,up}]$ of $f_i(P_{d,i})$. Henceforth, the current value of $P_{d,i}$ is denoted as $P_{d,i,index}$. After $P_{d,i,max}$ has been assigned to $P_{d,i}$, the value of $P_{d,i}$ is adjusted in steps, where the adjustment step of $P_{d,i}$ is denoted as $f$ and is controlled by the parameter $n$. Because the adjustment moves $P_{d,i}$ away from the global maximum point of $f_i(P_{d,i})$ on the feasible region in (40), $P_{d,i}$ deviates from $P_{d,i,max}$. Referring to Taylor's theorem in [43], $f_i(P_{d,i})$ is approximated at $P_{d,i} = P_{d,i,index}$ by the first-order Taylor polynomial $f_i(P_{d,i}) \approx f_i(P_{d,i,index}) + f_i'(P_{d,i,index}) (P_{d,i} - P_{d,i,index})$. If $P_{d,i}$ is adjusted from $P_{d,i,index}$ to $P_{d,i,index} - f$, then $EE_d$ is reduced by approximately $f \cdot f_i'(P_{d,i,index})$. The reduction is thus determined predominantly by $f_i'(P_{d,i,index})$, as stated in (45). Therefore, $f_i'(P_{d,i,index})$ is calculated for $i = 1, 2, \ldots, K$, and $P_{d,j}$ is adjusted from $P_{d,j,index}$ to $P_{d,j,index} - f$, where $j$ satisfies $f_j'(P_{d,j,index}) \le f_i'(P_{d,i,index})$ for all $i = 1, 2, \ldots, K$.
This process is repeated at least $n$ times until the stopping condition is met, at which point the near-optimal solution to (40) is obtained. Based on the above analysis, the DB algorithm is formulated in Algorithm 1, which solves (40). Its key steps are as follows:
Initialize the tolerance $\varepsilon \leftarrow 10^{-3}$ that controls the loop; $f$ is the adjustment step of $P_{d,i}$, and $n$ is the parameter that controls $f$.
Calculate $P_{d,i,max}$ based on Theorem 1;
Set $P_{d,i} = P_{d,i,max}$;
Calculate $der_i = f_i'(P_{d,i})$;
...
Update $der_j = f_j'(P_{d,j})$;
end
end
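The adjustment loop described above can be sketched in Python as a greedy procedure. Starting from the per-band maximizers, it repeatedly lowers the power of the band whose EE derivative is smallest, since by the first-order Taylor argument that band loses the least EE per step. The stopping rule (total power within a tolerance of the budget), the fixed step size, and the assumed EE shape $e^{-B P^{-2/m}}/P$ are assumptions made for illustration. The complete loop of Algorithm 1 is not reproduced in the text, so this is not claimed to be the authors' exact procedure.

```python
import math
from typing import Callable, List, Sequence


def ee_derivative(b: float, m: float) -> Callable[[float], float]:
    """Derivative of the assumed per-band EE shape f(P) = exp(-b * P**(-2/m)) / P."""
    a = 2.0 / m

    def derivative(p: float) -> float:
        return math.exp(-b * p ** (-a)) * (a * b * p ** (-a) - 1.0) / (p * p)

    return derivative


def db_allocate(p_start: Sequence[float],
                p_down: Sequence[float],
                derivatives: Sequence[Callable[[float], float]],
                p_total: float,
                step: float,
                tol: float = 1e-3) -> List[float]:
    """Greedy derivative-based power adjustment (illustrative sketch).

    p_start     : per-band maximizers (for example from Theorem 1)
    p_down      : lower ends of the per-band feasible regions
    derivatives : derivatives[i](p) returns the EE derivative of band i at p
    p_total     : total power budget over all bands
    step        : fixed reduction applied to one band per iteration
    """
    powers = list(p_start)
    blocked = [False] * len(powers)
    while sum(powers) - p_total > tol:
        # Blocked bands get +inf so that they are never selected again.
        der = [math.inf if blocked[i] else derivatives[i](powers[i])
               for i in range(len(powers))]
        j = min(range(len(powers)), key=lambda i: der[i])
        if math.isinf(der[j]):
            raise ValueError("budget cannot be met inside the feasible regions")
        if powers[j] - step < p_down[j]:
            blocked[j] = True  # a further step would leave the feasible region
            continue
        powers[j] -= step
    return powers
```

Choosing the smallest derivative at each step tends to equalize the marginal EE loss across bands. Each iteration either reduces one power by `step` or blocks a band, so the loop always terminates.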

B. Three Time Slot Communication
Following the same approach as in Algorithm 1, the DB algorithm that solves the non-convex TAEE optimization problem (41) of the two-way relay-assisted D2D communication with the DF protocol in 3TS mode in the $K$ bands is shown in Algorithm 2.

Algorithm 2: The DB algorithm for 3TS
Initialize the tolerance $\varepsilon \leftarrow 10^{-3}$ that controls the loop;
Calculate $P_{d,i,max}$ based on Theorem 1 for $f_{a,i}(P_{d,i})$ and $f_{b,i}(P_{d,i})$;
Set $P_{d,i} = P_{d,i,max}$;
Calculate $der_{a,i} = f_{a,i}'(P_{d,i})$, and $der_{b,i} = f_{b,i}'(P_{d,i})$;
Set $j = \arg\min \{der_{a,i}\}$, and $l = \arg\min \{der_{b,i}\}$;
if $der_{a,j} \le der_{b,l}$ then
if $P_{d,j} - f > P^{3TS}_{d,j,up}$ or $P_{d,j} - f < P^{3TS}_{d,j,down}$ then
$der_{a,j} = +\infty$;
else
Update $der_{a,j} = f_{a,j}'(P_{d,j})$;
...

Numerical Results
In this section, the TAEE performance of the D2D communication in the 2TS and 3TS cases is studied with the help of the DB algorithm in the $K$ bands. Table 1 lists the primary simulation parameters [32]. Figures 2 and 3 plot the TAEE in all bands versus the density of D2D users, for several transmission powers of the cellular users, as calculated from the DB algorithm in 2TS and 3TS mode, respectively. The density setting is $[\delta_{d,1}, \delta_{d,2}, \cdots, \delta_{d,5}] = \delta_{d,ref} \times [10, 1, 10, 10, 10]$. The TAEE obtained from the DB algorithm nearly coincides with the optimal solution, whereas there is a considerable gap between the performance of the DB algorithm and that of the conventional BB algorithm [33]. It can be concluded that the DB algorithm yields near-optimal results and significantly outperforms the conventional BB algorithm.
The two figures under consideration share the same common pattern being that after rising to a certain peak, the TAEE gradually declines as the δ d,re f continues to increase further. This is because as the density of the users is small, the interference they cause through spectrum sharing is insignificant. This remains true up to a certain checkpoint. Hence, from the beginning, as δ d,re f rises together with the ATR, the rise of the interference stays neglect-able resulting in higher and higher TAEE. Nevertheless, as δ d,re f reaches its peak and continues to grow, the interference turns to be significant as there is a higher demand for energy consumption to coordinate the interference which causes the decline of the TAEE. To the direction of the black arrow indicating the P c,i , the lowest curve corresponds to the highest P c,i and the highest curve corresponds to the lowest P c,i . Accordingly, it can be observed that the TAEE curve tends to shift downward as the transmission power of the cellular users P c,i increases. In fact, the rise in P c,i results in more severe interference caused by the cellular transmission to the D2D communication. This rises as well the power demand for coordinating the interference thus causing the TAEE in overall to decrease.    It can be observed that TAEE performance in 3TS mode is remarkably better than in 2TS mode. This is because, in 2TS mode, the D2D direct link is utilized for signal transmission without two-way relay D2D users assisting the transmission with distance extension, Algorithms 1 and 2, resulting in lower performance. On the other hand, in two-way relay-assisted case, with small δ d,re f , the TAEE performance of the D2D network is highly boosted with the two-way relay's assistance. In particular, the TAEE rises sharply as δ d,re f increases from 1 to 3. The TAEE continues to increase with a lower rate in association with the increase of the D2D user density. 
The TAEE in 2TS and 3TS mode reaches its maximum value at δ_{d,ref} equal to 5 and 6, respectively. Then, as the D2D user density increases further and causes excessive interference to the D2D network, the TAEE performance gradually shrinks.
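The rise-then-fall shape described above can be reproduced with a deliberately simple model. The sketch below is not the closed-form TAEE derived in this paper. It assumes a single band, an interference-limited network whose interferers form a Poisson point process with equal transmission power and Rayleigh fading (for which the success probability has a well-known exponential form), and a hypothetical fixed power overhead. All function names and parameter values are illustrative assumptions.

```python
import math


def success_probability(density, link_distance, sinr_threshold, path_loss_exp=4.0):
    """Success probability of one link in a Poisson field of equal-power
    interferers with Rayleigh fading and no noise (textbook result, assumed here)."""
    delta = 2.0 / path_loss_exp
    # Gamma(1 + delta) * Gamma(1 - delta) equals pi * delta / sin(pi * delta).
    spatial_factor = math.pi * delta / math.sin(math.pi * delta)
    exponent = (density * math.pi * link_distance ** 2
                * sinr_threshold ** delta * spatial_factor)
    return math.exp(-exponent)


def toy_energy_efficiency(density, link_distance=0.1, sinr_threshold=1.0,
                          tx_power=0.1, fixed_power=1.0):
    """Total successful rate divided by total consumed power (toy model).

    tx_power is the per-user transmission power and fixed_power is a
    hypothetical density-independent overhead; both are assumptions.
    """
    rate_per_link = math.log2(1.0 + sinr_threshold)
    stp = success_probability(density, link_distance, sinr_threshold)
    total_rate = density * stp * rate_per_link      # grows, then is choked by interference
    total_power = density * tx_power + fixed_power  # grows linearly with density
    return total_rate / total_power


def peak_density(densities):
    """Return the density in the given grid that maximizes the toy efficiency."""
    return max(densities, key=toy_energy_efficiency)
```

In this toy model the efficiency increases at low density, where the added rate outweighs the added interference, and decreases once the exponential loss in success probability dominates. The location of the peak depends entirely on the assumed parameters and is not meant to match the values reported above.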
Figures 4 and 5 plot the TAEE versus the cellular user density δ_{c,ref} in 2TS and 3TS mode. The higher the cellular user density, the lower the TAEE. This is because more cellular users introduce higher interference to the D2D users. The D2D users thus have to consume more power to maintain the QoS, which leads to the exponential decline of the TAEE. It can be observed that the DB algorithm performs better than the conventional BB algorithm. Nevertheless, at a very high density level, there is little difference between them, since the TAEE at this level cannot be enhanced further. Moreover, the TAEE is plotted for different values of R_{d,ref}, the distance between two D2D users. As R_{d,ref} becomes larger, the TAEE shifts further downward, since maintaining the QoS over a longer distance requires more power. Moving to the next figure, it can be seen that the TAEE performance in 3TS mode is similar to that in 2TS mode but at an overall higher level. This further emphasizes the role of the communication mode in optimizing the transmission power of the cellular users. Besides, the two-way TAEE in 2TS and 3TS mode is compared with the one-way direct link TAEE in [31]. The TAEE is calculated by first solving for the transmission power of the D2D users as in (40) and (41). The transmission power of the cellular users, assumed constant at P_{c,i} = 325 mW, and the power of the devices are then fed into the DB algorithm to obtain the TAEE. As in 2TS mode, an increase in δ_{c,ref} is associated with an increase in the interference from the cellular users, causing the TAEE to decline exponentially. The higher overall TAEE achieved in 3TS mode also helps to enhance the D2D network in all bands. At low cellular user densities, the TAEE calculated from Algorithm 2 is considerably higher because the transmission power is set to its optimum value.
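The qualitative claims above, namely that a higher cellular user density or a longer D2D distance forces the D2D users to spend more power to keep the QoS, can be illustrated with a small worked example. The sketch below is a simplified stand-in, not expressions (40) and (41) of this paper: it assumes one band, no noise, Rayleigh fading, and independent Poisson fields of cellular and D2D interferers, for which the D2D success probability has a standard exponential form. The function names, the STP target and the densities are hypothetical; only the 325 mW cellular power is taken from the text.

```python
import math


def _spatial_factor(path_loss_exp):
    """Gamma(1 + d) * Gamma(1 - d) with d = 2 / path_loss_exp, via the reflection formula."""
    delta = 2.0 / path_loss_exp
    return delta, math.pi * delta / math.sin(math.pi * delta)


def d2d_success_probability(d2d_power, cell_power, cell_density, d2d_density,
                            link_distance, sinr_threshold=1.0, path_loss_exp=4.0):
    """STP of a D2D link under cellular and D2D interference (assumed toy model)."""
    delta, kappa = _spatial_factor(path_loss_exp)
    effective_density = d2d_density + cell_density * (cell_power / d2d_power) ** delta
    return math.exp(-math.pi * link_distance ** 2 * sinr_threshold ** delta
                    * kappa * effective_density)


def min_d2d_power(cell_density, d2d_density, link_distance, cell_power,
                  sinr_threshold=1.0, stp_target=0.9, path_loss_exp=4.0):
    """Smallest D2D power meeting the STP target, or math.inf if none exists."""
    delta, kappa = _spatial_factor(path_loss_exp)
    # Largest effective interferer density that still satisfies the STP target.
    budget = -math.log(stp_target) / (math.pi * link_distance ** 2
                                      * sinr_threshold ** delta * kappa)
    headroom = budget - d2d_density  # share of the budget left for cellular interference
    if headroom <= 0:
        return math.inf  # D2D self-interference alone already violates the target
    return cell_power * (cell_density / headroom) ** (1.0 / delta)
```

For example, `min_d2d_power(1.0, 1.0, 0.05, 0.325)` and `min_d2d_power(2.0, 1.0, 0.05, 0.325)` differ by a factor of four under the assumed path-loss exponent of 4, and increasing `link_distance` raises the required power until the target becomes infeasible. Since the efficiency is a rate divided by a power, a higher required power lowers it, which is consistent with the downward shift of the curves described above.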

Conclusions
This paper presents a study on maximizing the TAEE of D2D communication underlaying cellular networks in the one-hop 2TS and multi-hop 3TS communication modes. First, using stochastic geometry theory, closed-form expressions for the STP and the TATR of both cellular and D2D users on multiple bands are derived. The optimization problem of maximizing the TAEE is then formulated, with the STP taken as the QoS of the cellular and D2D users in the multi-hop two-way relay-assisted D2D network. After the TAEE optimization problem is proven to be non-convex, a DB algorithm is proposed that exploits the fact that the objective function is a sum of several sub-functions and optimizes these sub-functions collectively to reach a near-optimal solution for the overall objective. The simulation results reveal that the proposed scheme performs better than the schemes currently known in the literature. Besides, a remarkable EE enhancement is achieved by jointly optimizing the transmission power of the cellular and D2D users. Developers can consider applying this method to the D2D communication of future 5G wireless networks. Furthermore, the simulation results show that three factors cause interference to the overall D2D network with different intensity: the D2D direct link distance in 2TS mode, the one-way communication of the cellular and D2D pair, and the D2D two-way relay user density in 3TS mode. For future study, band selection and power allocation can be considered for minimizing the cross-tier interference between the cellular and D2D users.