Energy-Efficient Optimization for Energy-Harvesting-Enabled mmWave-UAV Heterogeneous Networks

Energy Harvesting (EH) is a promising paradigm for 5G heterogeneous communication. EH-enabled Device-to-Device (D2D) communication can assist devices in overcoming the disadvantage of limited battery capacity and improving the Energy Efficiency (EE) by performing EH from ambient wireless signals. Although numerous research works have been conducted on EH-based D2D communication scenarios, the feature of EH-based D2D communication underlying Air-to-Ground (A2G) millimeter-Wave (mmWave) networks has not been fully studied. In this paper, we considered a scenario where multiple Unmanned Aerial Vehicles (UAVs) are deployed to provide energy for D2D Users (DUs) and data transmission for Cellular Users (CUs). We aimed to improve the network EE of EH-enabled D2D communications while reducing the time complexity of beam alignment for mmWave-enabled D2D Users (DUs). We considered a scenario where multiple EH-enabled DUs and CUs coexist, sharing the full mmWave frequency band and adopting high-directive beams for transmitting. To improve the network EE, we propose a joint beamwidth selection, power control, and EH time ratio optimization algorithm for DUs based on alternating optimization. We iteratively optimized one of the three variables, fixing the other two. During each iteration, we first used a game-theoretic approach to adjust the beamwidths of DUs to achieve the sub-optimal EE. Then, the problem with regard to power optimization was solved by the Dinkelbach method and Successive Convex Approximation (SCA). Finally, we performed the optimization of the EH time ratio using linear fractional programming to further increase the EE. By performing extensive simulation experiments, we validated the convergence and effectiveness of our algorithm. The results showed that our proposed algorithm outperformed the fixed beamwidth and fixed power strategy and could closely approach the performance of exhaustive search, particle swarm optimization, and the genetic algorithm, but with a much reduced time complexity.


Introduction
Recently, UAV-assisted communication-network-supporting energy transfer has gained significant attention. With the advantages of flexible deployment and low-cost deployment, UAVs can quickly establish A2G links and transmit information and energy to ground users [1,2]. UAVs have been widely used in emergency communications to quickly restore ground equipment communications in disaster-stricken areas or improve the quality of service for degraded users in overloaded ground areas. UAV Base Stations (UAV-BSs) equipped with large-scale antenna arrays can provide directional transmission of information and energy, thereby avoiding interference between UAVs caused by strong Line-of-Sight (LoS) paths.
As predicted by Cisco, there will be 14.7 billion Machine-to-Machine (M2M) connections and on average 1.8 connections for each member of the global population by 2023 [3]. M2M communication or D2D communication is playing a more important role in the mmWave D2D link, it is of great significance to obtain the optimal/sub-optimal beamwidth in a short time to improve the network performance.
Besides, EH emerges as a promising technology to increase the EE of users relying on wireless charging. In traditional low-frequency scenarios, wireless charging has not shown great performance due to the low received signal strength. In 5G mmWave links, multi-antenna system can be employed at the transmitter to achieve effective RF energy harvesting through the enhanced transmission gain brought by beamforming. The authors in [12][13][14] verified the feasibility of the combination of EH and mmWave technology. Although the optimization of the network EE in EH-enabled low-frequency scenarios has been widely studied, there still is a lack of an effective method to improve the network EE in mmWave scenarios. Hence, in this paper, we applied EH to mmWave D2D links and aimed to enhance the battery life of the links by increasing the network EE.
When a D2D link selects a large transmitting and receiving beamwidth, the beamsweeping process is carried out in a large-angle manner, which will reduce the beam alignment time and increase the time for EH and DT. However, a large transmitting beamwidth will possibly degrade the performance of other links, and a large receiving beam will suffer more serious interference. On the contrary, reducing the transmitting and receiving beamwidth is beneficial to mitigate potential interference, but the time of the BA phase will be greatly prolonged because a huge number of pilot signals are needed to align the beams, which will shorten the time for energy harvesting and data transmission and degrade the system performance.
Moreover, for D2D links, the power and EH time ratio have a significant impact on the network EE. The transmit power not only affects the user's achievable rate, but also the rate of other D2D links and the overall energy consumption. Therefore, the transmit power for DUs will have a complex impact on the network EE and needs to be optimized. Similarly, the network EE is also closely related to the EH time ratio. Fixing the BA time ratio and increasing the EH time ratio can increase the energy harvested, but this will also reduce the effective transmission time, and vice versa. Moreover, the performance of CUs should be protected by controlling the interference from DUs to CUs.
In this case, to improve the network EE, the three coupling variables, beamwidth, transmit power, and the EH time ratio, need to be considered comprehensively. In this paper, we propose a joint optimization of beamwidth selection, power control, and the EH time ratio to iteratively optimize the variables while fixing the other two. First, a low-complexity beamwidth selection scheme was designed to allow users to find the optimal/sub-optimal beamwidth in a short time. Specifically, DUs first form coalitions to choose different beamwidths. Then, the coalitions are continuously updated along the direction of improving the network utility (i.e., EE) until the final Nash equilibrium is reached. Next, a power control method was designed to suppress the interference and improve the EE. The power levels of D2D transmitters were optimized using the Dinkelbach method and SCA to obtain the sub-optimal power for DUs. Finally, the optimization of the EH time ratio was executed to further improve the EE. The EH time ratio was optimized by solving the convex linear fractional programming. The network EE was iteratively updated until reaching the convergence point.
In conclusion, the main contributions of this paper can be summarized as follows: • Different from existing research, we considered a downlink scenario of EH-enabled D2D underlying a UAV-assisted mmWave network and took into consideration the complex interference issue. We built an efficient framework for improving the EE of D2D links by jointly optimizing the beamwidth, transmit power, and EH time ratio of the D2D links using alternating optimization, while guaranteeing the rate requirements of CUs and DUs. The proposed algorithm has low computation complexity and is applicable to large-scale mmWave networks; • We constructed a coalition game model to solve the beamwidth selection problem for D2D users, which can explore the potential beamwidth combinations of DUs and converge to the beamwidth structure with sub-optimal utility. A low-complexity beamwidth selection algorithm was proposed to adjust the beamwidths of DUs for increasing the network EE; • The transmit power of D2D links was optimized to further improve the system EE.
The original non-convex problem with regard to power optimization was solved by non-fractional programming and successive convex approximation. Specifically, we first transformed the non-convex fractional programming into a non-fractional problem using the Dinkelbach method. Next, we eliminated the non-convexity in the formula by approximating the non-convex part with its first-order Taylor expansion. Along with the power, we also optimized the EH time ratio. The problem with regard to the EH time ratio was proven to be a convex linear fractional programming, and hence, the optimal solution is readily obtained; • We validated the convergence and effectiveness of the proposed algorithm by performing numerical experiments with different settings for the network parameters. The proposed algorithm can converge to the optimal EE after finite iterations and effectively improve the network EE compared with the Fixed Beamwidth and Fixed Power (FBFP) scheme. Our proposed algorithm can achieve performance close to ES and PSO, but with much reduced complexity. The factors that potentially affect the network EE, such as the number of DUs, the number of CUs, the maximal power for DUs, the minimum harvested energy, and the minimum rate for CUs, are also thoroughly discussed and analyzed.
The rest of this paper is organized as follows: The related works are given in Section 2. In Section 3, the EH-enabled D2D underlying the UAV network model and problem formulation are presented in detail. In Section 4, the alternating optimization method is employed to jointly optimize the beamwidth, transmit power, and EH time ratio to obtain the sub-optimal EE with a low time complexity. The simulation results are shown and discussed thoroughly in Section 5. Finally, the conclusions are drawn in Section 6.

Related Works
Under the mmWave network architecture, energy harvesting is capable of providing reliable and sustainable energy coverage and shows better performance than conventional low-frequency scenarios [12][13][14]. The authors of [15,16] investigated the performance of energy harvesting in UAV-assisted mmWave technology. The authors of [15] derived the energy and SINR coverage probability under the UAV mmWave network enabled by energy harvesting. In [16], the authors analyzed the energy and SINR coverage probability in a hybrid network where sub-6G transmission and mmWave transmission coexist. In [17], the trajectories of two UAVs for data gathering and energy transferring were optimized using the deep reinforcement learning approach to reduce the system energy consumption and improve the timeliness of service for users. The trajectory optimization was performed in [18] to minimize the energy consumption of the UAV and guarantee the user rate. In [19], the authors considered the effect of beam alignment error in an EH-enabled mmWave network and derived the energy coverage probability by assuming a non-linear EH process.
Moreover, energy harvesting can provide new possibilities for improving the performance of D2D communication [20], which has been the focus of academia [21][22][23][24][25][26]. In [21], the authors considered a non-linear energy harvesting model and classified the users into the EH group and the non-EH group based on the minimum harvested power threshold. Then, the resource allocation and power control were iteratively optimized to maximize the sum-EE using non-fractional programming and the Lagrange method. The authors of [22] proposed a power control and time scheduling algorithm to improve the capacity of the system under a time splitting architecture, where each BS was equipped with a single antenna, and they further reformed the algorithm and applied it to the scenario where the BSs were equipped with multiple antennas in [23]. The authors in [24] used a stochastic model to derive the ergodic capacity of EH-enabled D2D communication and proposed an effective mode selection method to improve the system EE. The outage issue of EH-based D2D communication has also been widely studied [25,26]. In [25], the outage probability of D2D links was analyzed considering the spectrum sharing between DUs and CUs. A power control scheme for DUs while ensuring the outage probability for CUs below the threshold was proposed in [26].
In mmWave networks, another issue that attracts the attention of academia is the optimization of the beamwidth. To achieve the tradeoff between alleviating interference and reducing beam alignment complexity, there have been many research works on beamwidth selection in mmWave networks [9][10][11][27][28][29][30][31][32][33]. In [11], we proposed an coalition-game-based beamwidth selection algorithm for mmWave-enabled D2D links. In [27], the authors proposed two joint beamwidth selection and scheduling schemes. The authors exploited the method of interference estimation rather than precise calculation to measure the beam-level interference among the users, which achieved much improved throughput compared with existing standards [34]. In [28], the authors revealed the relationship between latency and overhead in the beam alignment process, which indicated that multi-beam simultaneous scan can provide the best tradeoff between latency and overhead. A recent research work [29] used the geometry model to characterize the beamwidth selection and obtained the near-optimal solution for the beamwidths of users in mmWave networks. In [30], the authors considered a simple full-duplex mmWave wireless network architecture and exploited the numerical solution to obtain the optimal beamwidth for the users to maximize the network energy efficiency. In D2D communication, the beamwidth design for devices is also a research hotspot. In [9,10], the authors considered a single D2D pair in a mmWave network without considering the inter-user interference. The throughput and EE were maximized in [9,10], respectively, by optimizing the beamwidth and EH ratio of the D2D pair. To overcome the complexity of ES, PSO has also been applied in beamwidth optimization of V2V communication [31] and D2D communication [32,33] to obtain the sub-optimal solution.

Network Topology
As shown in Figure 1, we considered an EH-enabled D2D underlying the mmWave network consisting of multiple mmWave UAV-BSs (UBSs). M cellular users and N D2D users were randomly distributed in the considered area. Each D2D link was composed of a D2D Transmitter (DT) and D2D Receiver (DR). There exist two kinds of communication links in the network: A2G links and D2D links. A2G links denote the transmission from the UAV to ground users, including cellular users and the D2D transmitter. D2D links denote the direct transmission from the DT to the DR. It was assumed that both the A2G links and D2D links adopt narrow beams for transmission. We denote the set of CUs and DUs as C = {C 1 , C 2 , . . . , C M } and D = {D 1 , D 2 , . . . , D N }, respectively. For the nth D2D link, we denote its DT and DR as D T n and D R n . In our scenario, both A2G links and D2D links reuse the full mmWave frequency band and adopt narrow beams for transmission. The UAVs are connected to the ground gateway to transmit the backhauling data. In addition, we assumed that data collection and algorithm execution were performed at the Operation And Management (OAM) in the gateway. We assumed that the coverage of the UAV is determined based on the path loss threshold [35], and the association between UAVs and users was established based on minimum path loss criterion. In addition, we ignored the movement of UAVs and users and assumed all the UAVs hovered at a fixed horizontal position and fixed altitude. However, our algorithm can also be utilized in a dynamic and fast-changing environment since the time frame can be approximately divided into multiple snapshots, in which our system model and algorithm can be applied. In our paper, we did not focus on the energy consumption of the UAVs, since they have a larger battery capacity and can be charged by a charging station. In our scenario, we incorporated EH technology into the system and assumed that each DT can harvest energy from the ambient RF signals to prolong the transmission time. We assumed that the A2G links, including signal and interfering signals, are utilized by the DTs to harvest energy and restore the energy of the battery. As shown in Figure 2, we adopted the EH-BA-DT protocol for D2D links, where the total transmission time for D n with duration T is divided into three phases: (1) energy harvest phase with a duration of τ e T: DTs harvest the energy from the ambient environment and store it in the battery; (2) beam alignment phase with a duration of τ ba n T, during which the DUs perform beamwidth alignment for the transmitter and receiver; (3) data transmission phase with a duration of (1 − τ e − τ ba n )T: after EH and BA, the DT starts transmitting data to its associated DR, where τ e is the unified time ratio of EH for all the DUs, and τ ba n is the time ratio of BA for D n , respectively.

Alignment Delay
In our model, we only considered the optimization of beamwidths for D2D links, assuming that the beamwidths of A2G links have been determined and vary with time. According to [7], to avoid the high time complexity of narrow beam search in the whole angle range, a sector-level alignment with a large angle sweeping was first performed. Then, the beam-level alignment with a much more refined beam search was conducted in the aligned sector, which obtained the best matched beam pair at the transmitting and receiving side. Hence, the beam alignment time ratio for D2D link D T n → D R n is: where θ t n,n and θ r n,n denote the sector-level beamwidth at D T n and D R n and T p denotes the pilot transmission time. For analytical tractability, we assumed the beamwidths for D T n and D R n were equal: φ n = φ t n,n = φ r n,n , which is beneficial to speed up the beam alignment process. In addition, we assumed that θ = θ t n,n = θ r n,n , ∀n ∈ N , which means the sectorlevel beamwidths for all the DUs are also equal. Let BW denote the feasible beamwidth set, which contains all the feasible beamwidths that each D2D pair can select, then the range of BW is denoted as follows: where ceil(x) denotes the ceiling function, BW i is an element in the feasible beamwidth set BW, and φ min is the minimum beamwidth for D2D pairs.

Channel Modeling
We assumed that both D2D links and A2G links had full access to the whole mmWave bandwidth and the interference can be avoided to a great extent thanks to high directional beams. However, once the beams of interferers and receivers are accidentally aligned, severe interference is introduced. Moreover, there also exists sidelobe interference in the network, which is trivial, but not negligible.
We adopted the sectored antenna model presented in [36] to calculate the antenna gain of mmWave links, where the antenna gain of the beam-steered transmitter and receivers is constant for all the angles in the mainlobe, and the antenna gain in the sidelobe was also regarded as a small constant 0 < z 1. Let ϕ t a,b and ϕ r a,b denote the alignment error angle at the transmitter and receiver, and the transmitting beam gain at transmitter a is calculated as: where φ t a,b is the mainlobe beamwidth selected by transmitter a. Similarly, the receiving beam gain at receiver b is calculated as (4), where φ r a,b is the receiving beamwidth at receiver b: For A2G links, the channel gain between the UAV m and receiver k (CU or DT) is calculated as g c U m ,k = 10 −PL Um,k /10 , where PL U m ,k is the path loss between UAV U m and user k, which follow the free-space path loss model due to the high probability of the LoS path: where d 0 is the channel gain of reference distance 1m and d U m ,m is the distance between user m and UBS U m . Hence, the total channel gain between U m and k is: For ground links, the channel gain of link a (DT) → b (DR or CU) is calculated as g c a,b = 10 −PL a,b /10 , where PL a,b denotes the path loss of link a → b, which is modeled as [36]: where d a,b is the horizontal distance from the transmitter a to the receiver b of each link. λ and A denote the wavelength and the path loss exponent. χ SF is the shadow fading factor, which follows a Gaussian distribution χ SF ∼ N (0, σ 2 SF ). Hence, the total link gain from a → b is calculated as the product of the transmitting gain, channel gain, and receiving gain: For cellular user C m , the interference comes from the other UBSs and D2D transmitters in the network, and its Signal-to-Interference-plus-Noise Ratio (SINR) is calculated as: where P UBS m is the transmit power of UBS m , which is the serving UBS of C m . I m C = ∑ i∈M\U m ∑ j∈C i P UBS i h UBS i,j ,m denotes the interference from other UBSs in the network, where C i is the set of serving CUs of UAV i. h UBS i,j ,m is the channel gain between the transmit beam of UAV i for the j-th user and the receive beam of D2D receiver m.
For D2D pair D n , its SINR can be calculated as: where I n C = ∑ i∈M ∑ j∈C i P UBS i h UBS i,j ,D r n and I n D = ∑ i∈N \n p i h i,n denote the interference from cellular links and other D2D transmitters, respectively.
For EH-enabled DTs, assuming the energy harvesting efficiency is γ, the harvested energy at D T n is calculated as: Then, the total consumed power during the transmission time T for D n is: where P cir is the circuit power consumption at D t n and D r n . For simplicity, we omitted the power consumption for DUs in the BA phase as it is trivial when compared to the amount of harvested energy and consumed energy for transmitting data.
According to the Shannon formula, the achievable rate of CUs and DUs is expressed as (13) and (14), respectively: Under these assumptions, we can formulate our target problem. Without loss of generality, we express the network EE as the ratio of the sum of the user rate and the sum of the energy consumed by DUs. Let φ = {φ n , ∀n ∈ N } and p = {p n , ∀n ∈ N } denote the beamwidth vector and power vector of D2D links, then the problem of maximizing the EE of D2D links by optimizing φ, p, and τ e is formulated as: where C1 and C2 guarantee the minimum rate requirements for DUs and CUs, respectively. C3 indicates the feasible range for the transmitting and receiving beamwidths of the D2D links. C4 is the power constraint for the DUs. C5 and C6 ensure that the time ratios for beam alignment, energy harvesting, and data transmission are all positive. C7 indicates that the harvested energy for each DU should exceed the minimum amount to activate the EH process.

Proposed Algorithm
The problem (15) is difficult to solve and computationally hard, especially when a huge number of D2D links and CUs exist in the network. In this section, we resorted to the alternating optimization, also known as the block coordinate descent method, to alternately optimize one of the variables given the other two. To be specific, we first optimized the beamwidths φ, fixing p and τ e . Next, we applied the Dinkelbach method and successive convex approximation to obtain the sub-optimal solution of the transmit power p, fixing the beamwidth φ and time ratio τ e . Finally, we optimized τ e by solving the standard linear fractional programming, given the beamwidth φ and transmit power p.

Coalition-Game-Based Beamwidth Selection Algorithm
Given transmit power p and EH time ratio τ e , the problem (15) can be reduced to the following problem: s.t. C1, C2, C3, C5 and C7 However, due to the huge number of feasible beamwidths for DUs, Problem (16) is still hard to solve. Although Exhaustive Search (ES) can solve the problem optimally, its time complexity is unbearable; thus, its practicability is limited. When the number of DU grows, the time complexity of ES grows exponentially, which incurs unacceptable overhead. Hence, we resorted to a coalition game to obtain the sub-optimal solution of Problem (16), which is time efficient and can achieve performance close to ES.
To solve the original problem, we formulated a coalition game G = {P, X , U}, where P = N denotes the player set formed by DUs, X is the strategy space, which contains all the strategies that players can adopt, and U is the transferable utility. In the proposed game, multiple players (i.e., D2D pairs) forming a coalition can be regarded as choosing the corresponding beamwidth; thus, a coalition structure is established, which also corresponds to a beamwidth strategy. The number of coalitions is the same as the number of feasible beamwidths, i.e., |BW |, and each coalition consists of a group of players that select the same beamwidth. The ultimate goal of the coalition game is to find the Nash-stable coalition structure with the optimal/sub-optimal system utility.
Let F = {F 1 , F 2 , . . . , F |BW | } denote the coalition structure that all the players form and F c ∈ F be a coalition that a group of players forms, which indicates these DUs select the same beamwidth. The coalitions are non-overlapping, and the entire coalition structure should contain all the DUs, which means that F i ∩ F j = ∅, for any i = j, and ∪ i∈{1,2,...,|BW |} Aiming at improving the EE while guaranteeing the performance of the CU, the utility under structure F is calculated as the overall EE of the D2D links: It can be seen from (17) that our ultimate goal was to maximize the overall EE of the D2D links rather than focusing on the individual utility. In addition, any beamwidth structure that fails to satisfy the constraints in (16) will be given a utility of negative infinity as a penalty.
Definition 1 (preference order ). For player D n ∈ N , the preference order n is defined as a complete, reflexive, and transitive binary relation over all the coalitions that D n can possibly join. For player D n ∈ N , given two coalition structures F and F and two coalitions F c ∈ F and F c ∈ F , F c n F c indicates that D n prefers being a member of F c to form structure F than being a member of another coalition F c to form structure F , i.e., D n prefers selecting beamwidth F c rather than F c to improve the network EE. The switch rule that determines the preference order for the players is defined as: This switch rule (18) demonstrates that: to improve the network EE, player D n prefers being a member of coalition F c than being a member of F c .
Definition 2 (switch operation ). Given a coalition structure F = {F 1 , F 2 , . . . , F c . . . , F c . . . , F |BW | }, if player D n chooses to leave its current coalition F c and switch to another coalition F c , the coalition structure will be updated: At the initialization stage of the alternating optimization, the coalition structure is initialized satisfying the constraints in Problem (16). In the following iterations, the coalition structure is first set according to the optimized beamwidth solution of the last iteration. Then, the players are randomly chosen to perform the switch operation. If the switch rule (18) is strictly satisfied, the selected player will leave F c and join F c to form a new structure. After continuous switch operations performed by players, the coalition structure will be updated and finally converge to the Nash-stable structure, which implies that there is no player who has the incentive to change its beamwidth (i.e., coalition) and form a new coalition structure; thereby, the system utility can no longer be improved. The detailed illustration of Coalition-Game (CG)-based beamwidth selection algorithm is shown in Algorithm 1.
The proposed coalition game can converge after a finite number of iterations. The reason lies in that the number of feasible beamwidths that players can select and the number of players (D2D pairs) are both finite. Therefore, the cardinality of the beamwidth strategy space is also finite. In other words, the number of structures that players can form is limited. Since each switch operation performed by players will possibly visit a new coalition structure, we can reach the conclusion that the switch process will terminate and the final coalition structure will be ultimately reached. Next, we prove the stability of our proposed CG-based beamwidth selection algorithm. Algorithm 1 Coalition-game-based beamwidth selection algorithm for D2D pairs during each iteration 1: Initialize (φ, p, τ e ) with the output of the last iteration, and initialize F ini = {F 1 , F 2 , . . .} as indicated by φ. 2: Set current coalition structure F cur = F ini . Require: 3: Randomly select a player D n ∈ N , and denote its current coalition as F c . 4: D n randomly chooses another coalition F c ∈ F cur , and denote the temporary structure after D n leaves F c and switches to F c as: Update the coalition structure as: F cur = F temp . 10: else 11: D n remains in its current coalition F c . Ensure: The final Nash-stable coalition structure is reached. 12: Output The optimal beamwidths φ and (p, τ e ) to the next block. Theorem 1. The final coalition structure F f in in the proposed algorithm is a Nash-stable coalition structure.
Proof. The final coalition structure is a Nash-stable structure if the system utility can no longer be improved by any player changing its beamwidth, i.e., F To prove the Nash-stability of F f in , we resorted to a contradiction: if the final formed structure F f in is not stable, which is equivalent to that there is at least a player D n ∈ D, who is in coalition F f in c , who will switch to another coalition F f in c , due to (F f in c ∪ D n ) n F f in c . In this case, a new coalition structure is formed, which is contrary to our assumption that F f in is the final coalition structure. So far, the proof that F f in is Nash-stable is complete.

Power Optimization for D2D Links
Given beamwidth φ and EH time ratio τ e , (15) can be written as the problem with regard to power optimization:

s.t. C1, C2, C4 and C7
It can be seen that Problem (19) is a fractional programming, which is non-convex and hard to solve. Thus, we applied the Dinkelbach method [37] to transform Problem (19) into a non-fractional problem. Given η, let R D = ∑ n∈N R n and P con,D = ∑ n∈N P con n , then the original problem (19) can be equivalently transformed into the following problem: s.t. C1, C2, C4 and C7 Theorem 2 ([37]). The optimal η * can be obtained if and only if R D * − η * P con,D * = 0 where: R D * and P con,D * are the optimal value of R D and P con,D , respectively, when η is maximized.
The proof of Theorem 2 can be referred to [37], and we omit it here. It can be seen from Theorem 2 that we can obtain the optimal solution of Problem (19) by solving the equivalent Problem (20).
However, Problem (20) is still not convex due to the existence of R D = ∑ n∈N R D n . Next, to eliminate the non-convexity of Problem (20), we first decomposed R D n in the objective function into the subtractive form: R D n = c n log 2 (p n g n,n + ∑ j∈N ,j =n where c n = 1 − τ ba n − τ e n denotes the time ratio for data transmission. It can be seen that R D n and R D n are both concave over p; however, the subtraction of two concave functions is not convexity preserving [38]. Hence, given a local point p l = (p l 1 , p l 2 , · · · , p l N ), we can exploit the property that a concave function is upper-bounded by its first-order Taylor Expansion. For each n ∈ N : On this basis, we can approximate R D n as follows: The non-convexity in R D n was eliminated by approximating it as R D n,lb , and R D in the objective function of (20) can also be approximated by R D lb = ∑ n∈N R D n,lb , which provides a lower bound of R D .
After approximating each R D n in R D , C1 can be transformed into the following concave form: Further, the constraint C2 in (15) can be rewritten as the following form: Evidently, C2 is concave over p, and the non-convexity of (20) has been eliminated so far. To obtain the sub-optimal solution, we can iteratively optimize R D lb by solving the following approximated problem: s.t. C1, C2, C4 and C7 Problem (27) is a convex optimization of p and can be readily solved by the interior point method [39]. The detailed procedure for optimizing p is summarized in Algorithm 2.
Algorithm 2 SCA-based power optimization for D2D links during each iteration 1: Initialize (φ, p, τ e ) with the output of Algorithm 1. Set the current iteration l = 0 and the initial local point p l = p. 2: while The improvement of the objective function in (27) is higher than a predefined threshold 1 do.

3:
Solve Problem (27) for a given η using the interior point method and obtain the optimal solution p l * .
6: Output the optimal power strategy p l * and (φ, τ e ) to the next block.

Time Scheduling Optimization
Given φ and p, the optimization of τ e is formulated as: Next, we can equivalently solve the following linear programming using the method in [40]: (EH min /γT(p n h n,n + I n C ))t C8 : Cή + Dt = 1 C9 :ή ≥ 0, t ≥ 0 whereή = τ e Cτ e +D and t = 1 Cτ e +D . Problem (30) is obviously convex over (ή, t) and can be solved by the interior point method [39], and finally, the optimal value for τ e is calculated as: τ e =ή/t.

Alternating Optimization and Convergence Analysis
After the solution to each subproblem has been obtained, we give an overall algorithm for solving Problem (15). The original variables φ, p, τ e can be divided into three blocks and alternately optimized. During each iteration, we iteratively optimized one of the variables by keeping the other two fixed, and the optimized variable would be delivered as the input of the next block. The detailed process for the alternating optimization is illustrated in Algorithm 3. (30) 1: Initialize iteration index l = 0 and the (φ l , p l , τ l e ) satisfying the constraints in (15). 2: while The improvement of η is higher than a predefined threshold 2 do.
Next, we give the convergence analysis of the proposed algorithm. Let η l = η(φ l , p l , τ l e ) denote the objective value after the lth iteration. First, in the (l + 1)th iteration, there exist η(φ l , p l , τ l e ) ≤ η(φ l+1 , p l , τ l e ) after performing beamwidth optimization. The reason lies in that φ l is the input of Algorithm 1 fixing (p l , τ l e ), and the switch rule of Algorithm 1 guarantees the non-decreasing property of the objective value. Second, we have η(φ l+1 , p l , τ l e ) ≤ η(φ l+1 , p l+1 , τ l e ) due to the p l being the input local point, and the optimization of Problem (27) ensures that p l+1 can achieve the non-decreasing objective value. Third, η(φ l+1 , p l+1 , τ l e ) ≤ η(φ l+1 , p l+1 , τ l+1 e ) holds since Problem (30) is solved optimally. Finally, we can conclude that η(φ l , p l , τ l e ) ≤ η(φ l+1 , p l+1 , τ l+1 e ), which means the objective value of problem (15) is non-decreasing after each iteration. Since the three variables of (15) are bounded by the constraints and the upper bound of the Problem (15) exists, we can conclude that the convergence of the proposed algorithm is guaranteed.

Complexity Analysis
First, we assumed the proposed coalition game in the beam alignment phase needs N 1 iterations to converge. During each iteration, 4N times of calculation are needed for DUs to calculate the throughput, harvested energy, power consumption, and beam alignment ratio, respectively, and M times of the calculation are needed for CUs to verify the minimum rate requirement. Hence, the complexity of beam alignment phase is O (N 1 (4N + M)). In the power optimization phase, the complexity to solve Problem (27) using the interior point method [38] is O(N 3 ). Hence, the complexity for power optimization is O (N 2 N 3 ), where N 2 is the number of iterations for Algorithm 2. Similarly, in the EH time ratio optimization phase, the complexity is O(1) using the interior point method, due to (ή, t) being the only two variable to be optimized. Hence, the total complexity of our algorithm is mainly related to the beam alignment phase and the power optimization phase, which can be calculated as: O(N OA (N 1 (4N + M) + N 2 N 3 )), where N OA is the number of iteration for the overall algorithm. It should be pointed out that the practical running time of the algorithm depends on the settings of specific network parameters and the convergence threshold; hence, we further evaluate the time complexity in the next section.

Numerical Results
In this section, we present our numerical results to verify the convergence and effectiveness of the proposed algorithm. We considered a heterogeneous mmWave air-to-ground network scenario, where 10 UBSs were deployed above a circular area with a radius of 200 m [21]. The height of the UAVs was fixed at 100 m [41]. Multiple CUs and EH-enabled D2D pairs were randomly distributed within the target area. It is worthwhile to note that we assumed the power for each BS was fixed to 46 dBm, and we also set the maximum distance for each D2D link D T n − D R n to be 50 m. The minimum beamwidth for DU-φ min was set to 10 • . The convergence criteria 1 and 2 were set to 10 −3 . The other simulation parameters are shown in Table 1.
To verify the effectiveness of our algorithm, we compared the performance of the proposed algorithm with the following algorithms: • ES: exhaustive search, which traverses each possible beamwidth combination of DUs in the beamwidth selection phase. ES can obtain the optimal solution to beamwidth selection, but incurs an unbearable time complexity; • PSO [31]: the particle swarm optimization algorithm, which forms multiple feasible beamwidth solutions to continuously evolve along the direction of increasing the utility until reaching the local optimal/global optimal solution. The parameter in the execution of PSO was the same as in [31]; • GA [42]: the genetic algorithm, which is also a population-based method. The GA evolves to the optimal/sub-optimal solution using the operations of mutation and crossover. In the simulation, the parameters including the population size and the probability for mutation and crossover were set the same as in [42]; • FBFP: the fixed beamwidth and fixed power strategy, in which the beamwidths and transmit power of all the D2D pairs are both fixed to a constant value and not varied with time; • Reference [10]: Joint Optimization of the EH Time Ratio and Beamwidth (JOETRB). However, the interference between users was ignored, and the power optimization was not considered.
It is worth noting that ES, PSO, and the GA were only used in the beamwidth selection phase in our benchmark schemes. In the phase for optimizing p and τ e of the two algorithms, our proposed power optimization and time scheduling algorithm were adopted. The convergence analysis of ES, PSO, and the GA combined with the proposed power optimization and time scheduling was similar to our proposed algorithm; hence, we omit it here. It should also be noted that all the following tables and curves are the average result of 100 independent experiments. Circuitry power [21], P cir 20 dBm Maximum transmit power for DU [21], P max D

dBm
Energy harvesting efficiency [23], γ, 0.5 Minimum rate for CU [21], R C min , 2 bit/s/Hz Minimum rate for DU [21], R D min , 1 bit/s/Hz In Figure 3, fixing M = 10 and the threshold of convergence 2 to 10 −3 , we show the convergence behavior of the proposed algorithm. Under N = 10, the proposed algorithm can converged within seven iterations. When the number of DU was increased to two, the number of iteration to converge was slightly increased to nine. This indicates that our proposed algorithm can converge within a small number of iterations and the fast convergence speed of the proposed algorithm is verified. In Table 2, we list the time complexity of the posed CG, PSO, GA and ES, where S PSO and S GA denote the population size for PSO and Ga, N PSO and N GA denote the maximum number of iterations to converge for PSO and GA. It can be seen that the proposed CG greatly reduced the complexity compared with ES, which had exponential complexity. Generally speaking, the execution of PSO and the GA required a large population and number iterations to converge. Therefore, the complexity of the proposed CG also outperformed PSO and the GA.  In Figure 4, we study the MATLAB software running time of the beam alignment phase of the algorithms under different θ and N, to compare the time complexity of the proposed algorithm with ES and PSO. Obviously, the total time for the convergence of CG under different settings was much shorter than PSO and ES. For all the algorithms, when θ increased from 45 • to 60 • , the required time to converge increased due to the expansion of the search range for the beamwidth. The figure also shows that when the number of DUs increased from 10 to 20, the time to converge for PSO under θ = 60 • increased from 26.44 s to 82.54 s, the time to converge for the GA increased from 29.36 s to 87.94 s, while for CG, under θ = 60 • , the convergence time only increased from 8.29 s to 20.63 s. Among the three algorithms, ES showed the worst-case scenario. As every candidate combination of beamwidths for the DUs needed to be evaluated, the total running time for ES was unacceptably long. Moreover, our proposed algorithm could obtain better performance than PSO. As a great number of candidate solutions needed to be evaluated and updated, the total time for PSO and the GA to converge was far over CG. Hence, the superiority of the CG-based beamwidth selection algorithm with regard to time complexity was verified.
The proposed algorithm can obtain the beamwidths for the DUs in a short time, thus effectively increasing the transmission time. Next, we evaluated the EE performance comparing the proposed algorithm with the benchmark schemes. To find the parameters that may affect the network EE, we compared the network EE versus the number of DUs N, the number of CUs M, the maximum power for DU P max D , the minimum harvested power EH min , and the minimum rate for CU R C min , respectively.
In Figure 5, we fix M = 10 and plot the network EE versus the number of D2D links, comparing the proposed algorithm with the benchmark schemes under different N. It can be seen that the proposed algorithm can achieve better performance than FBFP and a performance near PSO and ES. The beamwidth selection and power control in our algorithm were designed to improve the network EE, and the EH time ratio optimization could further increase the EE. Another observation is that the network EE decreased with N. The reason lied in that although the narrow beams were adopted for the DUs, more D2D transmitters would potentially increase the interference between the DUs, thus degrading the transmission rate of the DUs, which degrades the network EE. A similar observation can be found by comparing the EE of the JOETRB [10] and FBFP algorithms under different N. However, it can be seen that the EE for the JOETRB [10] and FBFP sharply decreased when N exceeded 15, while our proposed algorithm still maintained a good level thanks to the dynamic adjustment of the beamwidths, power levels, and EH time ratio for the DUs. In Figure 6, we plot the network EE versus different numbers of CUs to compare the performance of the proposed algorithm with ES and FBFP under different M. It can be seen from the figure that the energy efficiency of the proposed algorithm and the benchmark schemes decreased with the increase of M. The reason was that the growing of M led to the increased interference from the CU, i.e., I n C . At the same time, in order to meet the minimum rate constraint for the CUs, the DUs had to strictly control the transmit power, which degraded the transmission rate for the DUs. Although the growth in the number of CUs provided more energy for the DUs to harvest, it also led to the decrease of the user rate, which significantly impacted the EE. Therefore, in a heterogeneous network where CUs and DUs coexist and share the spectrum resources, the EE performance of the DUs will be degraded when the number of CUs increases. It can be seen from the figure that the performance of the proposed algorithm still approached PSO, the GA, and ES and outperformed the JOETRB [10] and FBFP strategies, which further verified the effectiveness of the proposed algorithm. In Figure 7, we change the maximum transmission power of the DU and plot the curve of EE. With the increasing of P max D , some users in the network would increase their power to improve the user rate. Therefore, the EE first increased with P max D . However, due to the existence of a minimum rate for the CU R C min , the EE did not always increase, but was upper bounded by a certain value. When P max D increased from 19 dBm to 23 dBm, the network EE increased. However, the EE began to fall when P max D was further increased to 25 dBm. In addition, we also set the fixed power under the JOETRB [10] and FBFP to P max D to find the relationship between the EE and P max D under FBFP, and a similar finding could be obtained. Hence, we can reach the conclusion that the network EE was closely related to the transmit power of the DUs and first increased, then decreased with P max D . In Figure 8, we evaluate the effect of EH min on the network EE. As EH min increased, the network EE gradually increased, which can be seen from the curves of the proposed algorithm, PSO, and ES. However, with the further growing of EH min , the network EE no longer had unlimited growth. The reason was that the increased EH min required the DUs to harvest more energy, so the DUs were more inclined to increase the power and increase the EH time ratio τ e . However, the EE will not monotonically increase with the EH time ratio, because with more energy harvested, less energy will be consumed and the user rate will decrease at the same time due to the reduction of the effective transmission time. Moreover, due to the existence of the R C min , the power of the DUs was also limited. It can be seen that the proposed algorithm could achieve a performance close to PSO, the GA, and ES.
In Figure 9, we plot the network EE under different R C min . As R C min increased, the EE of all three algorithms decreased. The reason was that when R C min increased, the DUs had to decrease their transmission power to reduce the interference to the CUs, so as to meet the rate requirement of the CUs. By comparing the performance of the three algorithms, a similar conclusion can be drawn that the proposed algorithm can achieve a close performance to PSO, the GA, and ES.

Conclusions
EH-enabled D2D communication has shown great potential to be applied in the 5G network and beyond. In this paper, we investigated the energy efficiency of D2D users in a mmWave A2G network consisting of CUs and DUs served by UAV-BSs. We aimed to maximize the network energy efficiency, while guaranteeing the rate requirements of the CUs and DUs. The problem was formulated as the joint optimization of the beamwidth, transmit power, and EH time ratio of the D2D users. Alternating optimization was adopted to iteratively optimize one of the variables, fixing the other two. Firstly, a non-cooperative coalition game model was established to adjust the beamwidths of the DUs. Next, to tackle the non-convexity in the subproblem of power control, we exploited the methods of Dinkelbach and successive convex approximation. Finally, the EH time ratio optimization was performed by using linear fractional programming. The simulation results showed that our proposed algorithm could achieve better performance compared to the scheme with a fixed beamwidth and power and achieve a performance close to PSO, the GA and ES, but greatly reduce the time complexity. Moreover, the convergence of our proposed algorithm was also validated.
Author Contributions: Conceptualization, J.Z. and G.C.; data curation, investigation, methodology, resources, and software, J.Z.; formal analysis and supervision, G.C. and W.G.; visualization and writing, original draft, J.Z.; validation and writing, review and editing, G.C. and W.G. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: Data, models, or code that support the findings of this study are available from the authors upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: