Joint Relay Selection and Resource Allocation for Energy-Efficient D2D Cooperative Communications Using Matching Theory

Abstract: Device-to-device (D2D) cooperative relay can improve network coverage and throughput by assisting users with inferior channel conditions to implement multi-hop transmissions. Due to the limited battery capacity of handheld equipment, energy efficiency is an important issue to be optimized. Considering the two-hop D2D relay communication scenario, this paper focuses on how to maximize the energy efficiency while guaranteeing the quality of service (QoS) requirements of both cellular and D2D links by jointly optimizing relay selection, spectrum allocation and power control. Since the four-dimensional matching involved in the joint optimization problem is NP-hard, a pricing-based two-stage matching algorithm is proposed to reduce dimensionality and provide a tractable solution. In the first stage, the spectrum resources reused by relay-to-receiver links are determined by a two-dimensional matching. Then, a three-dimensional matching is conducted to match users, relays and the spectrum resources reused by transmitter-to-relay links. During preference establishment in the second stage, the optimal transmit power is solved to guarantee that the D2D link achieves the maximum energy efficiency. Simulation results show that the proposed algorithm not only performs well in terms of energy efficiency, but also increases the average number of served users compared to the case without any relay.


Background and Motivation
As the demand for wireless data services grows dramatically, revolutionary technologies that focus on improving network coverage and throughput within the limited system resources have been constantly emerging. Device-to-device (D2D) communication, which enables user equipment (UEs) to communicate with each other through direct links without relaying by the base station (BS) [1], is considered to be one of the key technologies in the future 5G wireless system [2]. In the underlay mode, D2D links share the same spectrum resources with traditional cellular UEs (CUEs), providing significant increases in spectrum efficiency and system throughput [1,3]. There are two basic transmission types in D2D communication: single-hop transmission, in which data are transmitted from the transmitter to the receiver directly when the channel condition between them meets the quality of service (QoS) requirement; and multi-hop transmission, in which some devices act as relay stations, data are transmitted from the transmitter to the receiver hop by hop, and the route must be opportunistically optimized to guarantee the performance [4]. In general, because of the high channel quality of short-range D2D links, the D2D UEs can achieve a high single-hop transmission rate [5,6]. However, the channel condition between two potential D2D UEs may be unable to support a direct single-hop link. For this reason, D2D cooperative relay has naturally become an important application that can assist two UEs with an inferior direct link to communicate [7][8][9].
Though applying underlay mode D2D communication in cellular networks can bring benefits such as improved spectrum efficiency and communication coverage, the UEs can no longer neglect the co-channel interference caused by spectrum resource reuse [10,11]. Since the underlay mode D2D communication brings new problems and challenges to the cellular networks, one of the crucial issues is the intelligent resource allocation strategy for both D2D UEs and CUEs to guarantee their QoS [12][13][14]. Due to the low cost and the few modifications to current infrastructures, the D2D relay may work more efficiently than fixed relays [15]. Nonetheless, the co-channel interference caused by spectrum resource sharing also exists in D2D cooperative relay transmissions, just as in the single-hop transmission. As the relay communication requires spectrum resources for two hops, the resource management becomes more complicated than in the single-hop case. Moreover, relay selection needs to be optimized jointly with the spectrum allocation and power control for both the first hop and the second hop links in order to maximize the system performance. There have already been some works that focus on relay selection [16,17] and resource allocation [18][19][20][21][22] to optimize the performance of D2D relay communications. In [20], the joint resource allocation and relay selection problem was formulated as a binary integer nonlinear programming problem, and the authors in [21] also considered the joint optimization and proposed a two-stage method to maximize system throughput. However, few of the above works have covered the power control problem, which is closely related to energy efficiency (EE). As the booming wireless services consume a large amount of energy from UE batteries and UEs are handheld equipment with limited battery capacity, one key issue is to reduce the energy consumption for both D2D transmitters (TXs) and D2D relay stations (RSs). An iterative Hungarian method was proposed to solve the relay and spectrum allocation problem with optimal transmit power at each node in [23]. Nevertheless, EE was not considered in that work, and the cellular spectrum resources are allowed to be reused by only one of the two hops in the D2D relay link.
Therefore, we put emphasis on energy-efficient resource management in D2D cooperative relay communications. The problem is formulated as the joint relay selection, spectrum allocation and power control optimization for two-hop D2D relay transmission underlay cellular networks. Both the first hop from D2D TX to D2D RS and the second hop from D2D RS to the D2D receiver (RX) are allowed to reuse the spectrum of CUEs to improve spectrum efficiency. Our objective is to find an approach to determine the optimal relay, spectrum and transmit power of TXs and RSs, which maximizes the EE of D2D links, while guaranteeing the QoS of all of the links.
The problem is difficult to solve, as it not only includes continuous and discrete variables, but also involves four-dimensional resource allocation indicators [24]. Solving the problem with four-dimensional variables directly is a challenge, since it is infeasible to obtain a solution in polynomial time for this NP-hard problem [25]. Besides, the transmit power of D2D TXs and RSs must be controlled to guarantee the QoS of UEs and achieve the maximum EE simultaneously, which also increases the complexity of the problem. Compared with our previously proposed works [11,12,26,27], this work focuses on multi-hop D2D links, which were not investigated before, and the optimization problem in this work is a four-dimensional matching among D2D pairs, relay selections and resource blocks (RBs) reused in the first and second hop of two-hop D2D links, which is much more complicated than the previous works. Since the problem involved in this work is a joint optimization problem with multidimensional variables, it cannot be solved by using the previously proposed algorithms directly. To give a tractable solution of this four-dimensional matching problem, we propose an iterative four-dimensional matching algorithm based on the previously proposed three-dimensional matching algorithm.

Contributions
Noting that D2D TX-RX pairs (TRs), RSs, the first hop spectrum and the second hop spectrum should be matched effectively, we solve the mixed integer nonlinear programming problem using matching theory [28], which considers two-sided preferences in its traditional form and provides a low-complexity solution. However, the proposed joint optimization involves four-dimensional variables and cannot be solved by a traditional matching algorithm. Therefore, we first decompose the problem into a two-dimensional matching and a three-dimensional matching with a low dimension reduction error and then solve the two matching problems based on a pricing mechanism. The power control process is coupled with the three-dimensional matching, and the optimal transmit power maximizes the total EE of the two-hop link. The auction algorithm has been used in some previous works to solve the resource allocation problems under the D2D scenario [29,30]. In [29], the authors proposed a two-phase auction-based resource allocation algorithm for underlay D2D communication. Additionally, in [30], the authors proposed a two-step distributed algorithm of channel allocation and mode selection for mobile stations with the objective of capacity maximization. Compared to these works, we consider a much more complicated problem with multidimensional variables that cannot be solved by the above-mentioned auction algorithm, which includes only two dimensions of variables, i.e., bidders and bidding items. The only similarity is that we use a pricing strategy when conflicts appear in the matching algorithm, borrowing the idea of rising prices from auctions.
Remark 1. The Bertsekas auction algorithm and Munkres' Hungarian algorithm are not considered in this work, because the joint optimization involved here is a high-complexity problem with multidimensional variables, which cannot be solved directly by a traditional matching algorithm or auction algorithm. Therefore, we decompose the original problem into a two-dimensional matching and a three-dimensional matching with a low dimension reduction error and then solve the two matching problems based on a pricing mechanism. Although the employed pricing-based matching algorithm is similar to the auction algorithm, either of them would have to be modified in the same way to solve the joint optimization problem, and this modification is the focus of this work. The innovation of this work is not only the proposed pricing-based matching algorithm, but also the idea of problem simplification.
The main contributions of this paper are summarized as follows:

• We design a two-hop D2D relay strategy for the transmitter-receiver pair with an inferior direct link to improve system performance. Specifically, we propose a joint relay selection, power optimization and spectrum allocation problem, which involves the matching between D2D TRs and relay stations and the matching between two-hop D2D links and the respective spectrum resources, and which can be formulated as a four-dimensional matching to maximize the EE of D2D UEs.

• The joint relay selection and resource allocation problem is intractable and belongs to the class of NP-hard problems; thus, we simplify the problem into one two-dimensional matching between D2D TRs and the second hop spectrum resources and one three-dimensional matching among D2D TRs, relays and the first hop spectrum resources. By solving these two matching problems, we can achieve a sub-optimal solution, which can approach the performance of the exhaustive optimal algorithm with a much lower complexity. The two-dimensional matching problem is solved by using a pricing strategy directly, which decides the winner when more than one D2D TR proposes to the same spectrum resource for the second hop. For the three-dimensional matching, we transform it into a two-sided matching, in which the preference lists of D2D TRs on one side over the combinations of RSs and spectrum resources for the first hop on the other side are established based on the achievable maximum EE. Therefore, the three-dimensional matching can use the pricing strategy to achieve a sub-optimal solution. In the three-dimensional matching process, we also consider the power optimization for both D2D TXs and RSs to maximize the EE of D2D links, as well as to avoid excessive interference to CUEs.

• The properties of the proposed three-dimensional matching and two-dimensional matching algorithms, such as optimality, stability, convergence and complexity, are analyzed theoretically.
In the simulation, we compare the proposed matching algorithm with the exhaustive optimal and random matching algorithms in terms of the achieved EE for D2D UEs under different system configurations. Numerical results show that our proposed scheme can achieve a considerable performance gain, and the average percentage of served users is substantially improved compared with that without any relay.
The rest of this paper is organized as follows. In Section 2, we provide a brief review of the related works. In Section 3, we describe the system model. The formulation of the four-dimensional matching problem is introduced in Section 4. In Section 5, we present the proposed pricing-based matching algorithm, and the simulation results are shown in Section 7 with related discussions. Finally, we conclude the paper in Section 8.

Related Works
The purpose of this paper is to solve the joint relay selection, channel assignment and power optimization problem by using the pricing-based matching algorithm. In [31][32][33][34], the proper relay stations were selected to improve the performance of the multi-hop D2D network while simultaneously considering the co-channel interference. In [35], the authors presented a theoretical framework for multi-hop D2D communications, which enables modeling of interference and performance analysis. In [36], the authors proposed a novel routing protocol for underwater sensor networks (UWSNs), which was named the location-aware routing protocol (LARP) and could improve packet delivery ratio and reduce normalized routing overhead. In [37], the authors modeled the problems of relay selection and channel allocation as a classical weighted bipartite graph matching problem and solved it by using the Hungarian algorithm. In [38], a novel quantum particle swarm optimization-based relay selection scheme was proposed to maximize the system throughput of the cooperative relay networks and reduce the computational complexity compared with the exhaustive search. The above works mainly focus on how to maximize the throughput of the system through relay selection in the presence of interference. In particular, the channel allocation problem was also considered in [37].
In [39][40][41][42][43], the resource allocation problem was solved simultaneously with the relay selection problem. In [20], the joint resource allocation and relay selection problem was formulated as a binary integer non-linear programming problem, and the optimal solution of the random linear network coding aided D2D communications was obtained by introducing the concept of D2D cluster. In [44], an algorithm was proposed to maximize the throughput of a D2D pair by finding the optimum multi-hop routes and the corresponding transmit powers. To achieve simultaneous downlink transmission for multiple users, in [45], the authors proposed a novel MAC scheme based on an opportunistic channel-aware scheduling policy. In [46], the authors employed the Lyapunov optimization theory to decompose the dynamic communication mode selection problem and provide a lower bound of the system throughput and proposed an algorithm to achieve approximate maximization. In [47], the joint relay selection and resource allocation problem was solved in a multi-user, multi-carrier and multi-cellular network, thus increasing the throughput of users at the cell edge. As the co-channel interference exists in the underlay mode of the D2D communication network, the resource allocation problem must be solved under the interference constraints to guarantee the quality of communications [48,49]. In [12], a three-dimensional matching algorithm was applied to maximize the weighted channel rate of the system, which explored the information from both the social layer and the physical layer. However, the EE performance was neglected.
To extend the lifetime of the rapidly-drained battery of user equipment, when matching D2D users with relays, it is necessary to consider the EE [26,27,50]. In [51], the selection of the energy harvesting relay was studied for D2D communications in a public safety environment. In [23], the iterative Hungarian method was devised to solve the associated relay selection and resource assignment problems, and it was shown that the optimal power allocation problem can be solved in closed form. In [52], the authors proved that threshold-based power control policies function efficiently, which would benefit the limited battery of users' handheld equipment.
However, most of the above works have not considered the EE of D2D links during the optimization process. Although the EE issue has been considered in [26,51], the multi-hop D2D communication was neglected, and the application scenario is different from our study. With energy efficiency in mind, the purpose of the original four-dimensional matching among D2D pairs, relay selections and resource blocks (RBs) reused in the first and second hop of two-hop D2D links is to maximize the EE of D2D links.
In [11], the authors formulated the joint resource allocation problem as a one-to-one matching problem under two-sided preferences and employed the Gale-Shapley (GS) algorithm to match D2D pairs with cellular UEs. In [12], a three-dimensional iterative matching algorithm was proposed to maximize the sum rate of D2D pairs weighted by the intensity of social relationships. The work in [26] focused on the energy-efficient context-aware resource allocation problem. The authors transformed the NP-hard optimization problem into a one-to-one matching problem under two-sided preferences and proposed an energy-efficient matching algorithm. In [27], a two-stage EE optimization problem, which consists of a joint spectrum and power allocation problem in the first stage and a context-aware D2D peer selection problem in the second stage, was investigated. The authors proposed an iterative power allocation algorithm to optimize EE under a specific match and proposed an iterative matching algorithm to solve the combinatorial problem. The differences between previous works and this work are summarized as follows. First of all, we focus on the multi-hop D2D scenario, which was not investigated in [11,12,26,27]. Second, the optimization problem involves a four-dimensional matching among D2D pairs, relay selections and resource blocks (RBs) reused in the first and second hop of two-hop D2D links, which is much more complicated than the problems studied in [11,12,26,27]. Last, but not least, the solutions proposed in previous works cannot be directly applied to this work. To provide a tractable solution, we first decompose the original problem into a two-dimensional matching between D2D TRs and RBs for the second hop and a three-dimensional matching problem among D2D TRs, RSs and RBs for the first hop. Then, we employ the proposed pricing-based iterative two-dimensional matching and pricing-based iterative three-dimensional matching algorithms to solve them. In summary, the system model, problem formulation and proposed solution of this work are completely different from our previous works in [11,12,26,27].

System Model
We consider a D2D relay-assisted cellular network, which consists of a BS, CUEs, D2D TX-RX pairs (TRs) and idle users acting as D2D RSs. Each CUE is allocated an orthogonal uplink resource block (RB), and each RB can be reused by at most one D2D link. In this paper, we only consider the static scenario, and the moving speeds of users are not studied. In practice, fast user mobility has a significant impact on the network topology, mode selection and channel gains. This is a new problem and is left for future study. Nevertheless, our work is still valid for users with walking speeds. The reason is that, since the optimization is performed in a slot by slot fashion, the position variations caused by the low mobility of users during the channel coherence time are negligible. For instance, assuming that users move at an average speed of 1.5 m/s and that the time slot over which the proposed algorithm optimizes is 1 ms, a user moves 0.0015 m within one time slot, which is negligible compared to the distance between the transmitters and the receivers.
We focus on the D2D TRs that suffer from bad channel qualities and cannot establish direct single-hop links. To improve network coverage and throughput, D2D communication is operated in a two-hop fashion via D2D RSs, as shown in Figure 1. That is, a D2D TX first sends the signal to a D2D RS, and then, the RS forwards the signal to the D2D RX. In each hop, the RB of a CUE is reused by a D2D link. We also assume that each D2D RS can serve at most one D2D TR, and each D2D TR can utilize only one RS.
RSs and CUEs (i.e., RBs) are denoted by the sets $\mathcal{R} = \{1, \cdots, r, \cdots, R\}$ and $\mathcal{C}$, respectively. We assume that D2D TX $m_t$, D2D RS $r$ and D2D RX $m_r$ form a two-hop D2D link. The RBs reused by the first hop and second hop D2D links are denoted as $c_f$ and $c_s$, respectively, which also represent the $c_f$-th and $c_s$-th CUEs, $\forall c_f, c_s \in \mathcal{C}$. The meanings of all variables used are given in Table 1. For the channel model, we use Rayleigh fading to model the small-scale fading and employ the free space propagation path-loss to model the large-scale fading [53,54]. The free space propagation path-loss model employed in this work is an ideal model, but it does not impact the solution structure of the proposed algorithm.
In other words, the proposed algorithm can be adapted to other, more complicated path-loss models. The signal to interference plus noise ratio (SINR) received by RS $r$ can be expressed as: Here, $P_{m_t}$ and $P_{c_f}$ are the transmit power of D2D TX $m_t$ and CUE $c_f$, respectively. $h_{m_t r}$ and $h_{c_f r}$ denote the channel responses of the first hop D2D link and the interference link from CUE $c_f$, respectively. $d_{m_t r}$ is the transmission distance between D2D TX $m_t$ and RS $r$, while $d_{c_f r}$ is the distance between CUE $c_f$ and RS $r$. $\alpha$ is the path-loss exponent corresponding to the large-scale fading of the transmission channel. $h_{s,m_t r}$ and $h_{s,c_f r}$ are the small-scale fading (Rayleigh) channel coefficients, which obey the complex Gaussian distribution $\mathcal{CN}(0, 1)$. $N_0$ represents the one-sided power spectral density of the additive white Gaussian noise (AWGN).
The SINR received by D2D RX $m_r$ is given by: where $P_r$ and $P_{c_s}$ are the transmit power of D2D RS $r$ and CUE $c_s$, respectively. $h_{r m_r}$ and $h_{c_s m_r}$ represent the channel responses of the second hop D2D link and the interference link from CUE $c_s$, respectively. $d_{r m_r}$ is the transmission distance between RS $r$ and D2D RX $m_r$, and $d_{c_s m_r}$ is the distance between CUE $c_s$ and D2D RX $m_r$. $h_{s,r m_r}$ and $h_{s,c_s m_r}$ are the corresponding small-scale fading channel coefficients.
Lemma 1. Accordingly, the effective SINR of the two-hop D2D link from TX $m_t$ to RX $m_r$ through RS $r$ by reusing RBs of CUE $c_f$ and CUE $c_s$ can be calculated by [23]: The proof of Lemma 1 can be found in Appendix A.
For the cellular transmissions, the SINR received by the BS corresponding to the CUE whose RB is reused by the first hop D2D link, i.e., $c_f$, is given by: and the SINR received by the BS corresponding to the CUE whose RB is reused by the second hop D2D link, i.e., $c_s$, is given by: Here, $h_{c_f B}$, $h_{m_t B}$, $h_{c_s B}$ and $h_{rB}$ are the channel responses of the cellular link from CUE $c_f$, the interference link from TX $m_t$, the cellular link from CUE $c_s$ and the interference link from RS $r$, respectively. $d_{c_f B}$, $d_{m_t B}$, $d_{c_s B}$ and $d_{rB}$ are the corresponding transmission distances. $h_{s,c_f B}$, $h_{s,m_t B}$, $h_{s,c_s B}$ and $h_{s,rB}$ are the corresponding small-scale fading channel coefficients.
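As an illustration only, the SINR quantities described in this section can be evaluated numerically along the following lines. The sketch assumes that each channel response acts as a power gain of the form $d^{-\alpha}|h_s|^2$ and that the noise enters the denominator as a single noise power; both are assumptions made here, and the function names `channel_gain` and `sinr` are hypothetical rather than taken from the paper.

```python
import random


def channel_gain(distance, alpha, rng):
    """Assumed power gain: path loss d^(-alpha) times |h_s|^2, with h_s ~ CN(0, 1)."""
    # Real and imaginary parts of h_s each have variance 1/2.
    re = rng.gauss(0.0, 1.0) / 2 ** 0.5
    im = rng.gauss(0.0, 1.0) / 2 ** 0.5
    return distance ** (-alpha) * (re * re + im * im)


def sinr(p_signal, g_signal, p_interf, g_interf, noise_power):
    """Received signal power over co-channel interference plus noise."""
    return (p_signal * g_signal) / (p_interf * g_interf + noise_power)


# First hop, TX m_t -> RS r, reusing the RB of CUE c_f (illustrative values only).
rng = random.Random(0)
g_tx_rs = channel_gain(20.0, 3.0, rng)    # D2D TX to RS
g_cue_rs = channel_gain(150.0, 3.0, rng)  # interfering CUE to RS
gamma_first_hop = sinr(0.1, g_tx_rs, 0.2, g_cue_rs, 1e-13)
```

The same `sinr` helper applies to the second hop and to the two cellular links by swapping in the corresponding transmit powers and gains.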

Problem Formulation
The purpose of this work is to improve network coverage by D2D cooperative relay communications with the aim of maximizing the total EE of the two-hop D2D TR pairs, while simultaneously satisfying the QoS requirements of both D2D and cellular links. Relay selection, spectrum allocation and power control are optimization variables or methodologies rather than the optimization objectives.
To achieve the EE maximization, we need to design an approach to jointly optimize relay selection, spectrum allocation and power control for the two-hop links. We employ a four-dimensional variable $X = \{X_{m,r,c_f,c_s}\}$ to denote the allocation of relay and spectrum resources, where $m \in \mathcal{M} = \{1, 2, \cdots, M\}$ and $X_{m,r,c_f,c_s} \in \{0, 1\}$. If $X_{m,r,c_f,c_s} = 1$, the signal transmitted from TX $m_t$ to RX $m_r$ is relayed by RS $r$, and meanwhile, RB $c_f$ and RB $c_s$ are reused by the first hop and the second hop D2D links, respectively. We define the EE as the ratio of the spectrum efficiency (SE) to the total power consumption. The SE is the number of bits that can be transmitted per second per Hz; it is derived by dividing the total throughput by the system bandwidth, which is 5 MHz in the simulations. Therefore, when $X_{m,r,c_f,c_s} = 1$, the EE of the two-hop D2D link can be obtained by:
$$EE_{m,r,c_f,c_s} = \frac{\log_2(1 + \gamma_{m_t,m_r})}{P_{m_t} + P_r + 2P_{cir}}, \quad (6)$$
and substituting (3) into (6), we have: where $P_{cir}$ is the circuit power of D2D TXs or RSs. The numerator of (6), i.e., $\log_2(1 + \gamma_{m_t,m_r})$, represents the SE, which is calculated based on the effective SINR of the two-hop D2D link, defined by (3). The denominator of (6), i.e., $P_{m_t} + P_r + 2P_{cir}$, represents the total power consumption, which consists of the transmission power of the D2D TX, the transmission power of the RS and the circuit power. Based on the above derivation, we jointly design the binary allocation variable $\{X_{m,r,c_f,c_s}\}$ and the continuous power variables $P_{m_t}$ and $P_r$ to maximize the total EE of the two-hop D2D links, thus formulating a mixed integer nonlinear programming (MINLP) problem as: max C1: $P_{\min,m_t} \le P_{m_t} \le P_{\max,m_t}, \forall m_t \in \mathcal{M}_T$, $P_{\min,r} \le P_r \le P_{\max,r}, \forall r \in \mathcal{R}$. C1 gives the transmit power constraints of D2D TXs and RSs. $P_{\min,m_t}$ and $P_{\max,m_t}$ represent the lower and upper bounds on the transmit power for D2D TX $m_t$. $P_{\min,r}$ and $P_{\max,r}$ represent the lower and upper bounds on the transmit power for RS $r$. The specific values of $P_{\min,m_t}$, $P_{\max,m_t}$, $P_{\min,r}$ and $P_{\max,r}$ are analyzed and determined in Section 5. The inequalities in C3 ensure that there is a one-to-one correspondence among the D2D TR pair, the selected RS, the RB reused by the first hop D2D link and the RB reused by the second hop link. C4 guarantees the QoS requirements of links. $\gamma^D_{\min}$ and $\gamma^C_{\min}$ denote the SINR thresholds of D2D links and cellular links, respectively.
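The EE definition and the QoS thresholds described above can be sketched numerically as follows. This is a minimal illustration, assuming the effective two-hop SINR has already been computed elsewhere; the function names `energy_efficiency` and `meets_qos` are hypothetical, and the QoS check only mirrors the stated role of C4 rather than its exact form.

```python
import math


def energy_efficiency(gamma_eff, p_tx, p_rs, p_cir):
    """EE of a two-hop link: SE divided by total power (transmit plus circuit)."""
    se = math.log2(1.0 + gamma_eff)          # spectrum efficiency in bit/s/Hz
    total_power = p_tx + p_rs + 2.0 * p_cir  # TX and RS each consume circuit power
    return se / total_power


def meets_qos(gamma_d2d, gamma_cue_f, gamma_cue_s, d2d_min, cue_min):
    """Assumed QoS test: every link must reach its SINR threshold."""
    return (gamma_d2d >= d2d_min
            and gamma_cue_f >= cue_min
            and gamma_cue_s >= cue_min)


# Illustrative values: effective SINR of 3 (linear), 0.1 W per hop, 0.15 W circuit power.
ee_example = energy_efficiency(3.0, 0.1, 0.1, 0.15)
```

With these illustrative numbers the SE is 2 bit/s/Hz and the total power is 0.5 W, so `ee_example` is 4 bit/s/Hz per watt.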

Energy-Efficient Resource Matching Algorithm for D2D Relay Communications
As the mixed integer programming problem (8) is an NP-hard problem that cannot be solved in polynomial time, we investigate an approach based on matching theory to provide a sub-optimal solution. However, the MINLP problem with a four-dimensional variable $X$ leads to a four-dimensional matching, which is highly complicated. The problem defined in (8) is a mixed integer nonlinear program because it involves integer optimization variables for relay selection and spectrum allocation and continuous optimization variables for power allocation. By observing (1) and (2), we notice that in the first hop, both the signal power and the interference power received by the RS depend on the result of relay selection, while in the second hop, only the signal power received by the RX depends on the result of relay selection. We therefore start from the optimization of RB selection in the second hop link, which is independent of relay selection. First of all, each two-hop D2D link is matched with an RB for its second hop D2D transmission so as to minimize its received co-channel interference, and then, it is matched with an RS and another RB for its first hop D2D transmission so as to maximize the EE of the two-hop D2D link. Thus, the original four-dimensional matching can be approximately decomposed into a two-dimensional matching between D2D TRs and RBs for the second hop and a three-dimensional matching among D2D TRs, RSs and RBs for the first hop. Note that we ignore the effect of dimensionality reduction on CUEs for simplification. For each matching problem, we first introduce how to establish the preference list and then propose a pricing-based iterative matching algorithm.

Two-Dimensional Matching
We start from the two-dimensional matching problem in the second hop D2D link, which involves a two-sided matching with $M$ D2D TRs on one side and $C$ RBs on the other side. We have the following definition:
Definition 1. A matching $X$ is a one-to-one correspondence from the set $\mathcal{M} \cup \mathcal{C}$ onto itself, i.e., $X : \mathcal{M} \cup \mathcal{C} \to \mathcal{M} \cup \mathcal{C}$, such that $X(m) = c_s$ means that D2D TR $m$ is matched with the RB $c_s$ and $X(m) = m$ means that $m$ remains single. $X(m)$ is regarded as the matching partner of $m$.
We consider a matching $X$ in which individuals $m$ and $c_s$ prefer each other to their matching partners, but are not matched with each other under $X$, namely $c_s \succ_m X(m)$ and $m \succ_{c_s} X(c_s)$. Thus, $m$ and $c_s$ form a blocking pair for the matching $X$, that is, $(m, c_s)$ blocks the matching. We say that matching $X$ is not stable because both $m$ and $c_s$ have the desire to disrupt the matching $X$ in order to be matched with each other.
Definition 2. The matching $X$ is stable when there exists no blocking pair.
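The notion of a blocking pair can be checked mechanically. The sketch below is illustrative only: it assumes that both sides have numeric preference values (higher is better) and that an unmatched individual prefers any partner to staying single; the name `find_blocking_pairs` and the input layout are hypothetical.

```python
def find_blocking_pairs(match, tr_value, rb_value):
    """Return all (TR, RB) pairs that block the given matching.

    match:    dict TR -> matched RB, or None if the TR is single
    tr_value: tr_value[m][c] is the value TR m assigns to RB c
    rb_value: rb_value[c][m] is the value RB c assigns to TR m
    """
    rb_partner = {c: m for m, c in match.items() if c is not None}
    blocking = []
    for m, row in tr_value.items():
        current_c = match.get(m)
        for c in row:
            if c == current_c:
                continue
            # m prefers c to its current partner (or is single).
            m_prefers = current_c is None or row[c] > row[current_c]
            # c prefers m to its current partner (or is unused).
            current_m = rb_partner.get(c)
            c_prefers = current_m is None or rb_value[c][m] > rb_value[c][current_m]
            if m_prefers and c_prefers:
                blocking.append((m, c))
    return blocking


tr_value = {"m1": {"c1": 3, "c2": 1}, "m2": {"c1": 2, "c2": 1}}
rb_value = {"c1": {"m1": 3, "m2": 2}, "c2": {"m1": 1, "m2": 1}}
unstable = find_blocking_pairs({"m1": "c2", "m2": "c1"}, tr_value, rb_value)
stable = find_blocking_pairs({"m1": "c1", "m2": "c2"}, tr_value, rb_value)
```

A matching is stable in the sense of Definition 2 exactly when the returned list is empty.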

Preference Establishment
In the two-dimensional matching problem, D2D TRs on one side propose to establish pairs with RBs on the other side based on their preference lists. As mentioned above, the RB selection for the second hop D2D link is independent of relay selection. Thus, regardless of which RS is selected, D2D TR $m$ prefers to be matched with the RB that maximizes the reciprocal of the interference power received by RX $m_r$, denoted $V_{m,c_s}$, which is defined as the preference value of TR $m$ on RB $c_s$. By temporarily pairing each D2D TR with each RB, the reciprocal of the interference corresponding to each TR-RB pair can be obtained.
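A preference list of this kind can be built as sketched below. The sketch assumes the interference power seen by each RX from each candidate CUE is already known and strictly positive; the function name `second_hop_preferences` and the dictionary layout are hypothetical.

```python
def second_hop_preferences(interference):
    """Build preference values and ordered preference lists for the second hop.

    interference[m][c]: interference power received by the RX of TR m
                        when it reuses the RB of CUE c (must be > 0).
    Returns (values, lists), where values[m][c] is the reciprocal of the
    interference and lists[m] orders RBs from most to least preferred.
    """
    values = {m: {c: 1.0 / p for c, p in row.items()}
              for m, row in interference.items()}
    lists = {m: sorted(row, key=row.get, reverse=True)
             for m, row in values.items()}
    return values, lists


values, lists = second_hop_preferences({"m1": {"c1": 0.5, "c2": 0.25}})
```

Here RB `c2` causes less interference to `m1` than `c1`, so it appears first in the preference list of `m1`.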

Matching Algorithm
In the matching process, D2D TRs can propose to their most preferred RBs based on the established preference lists. However, it is possible that more than one D2D TR proposes to the same RB. To resolve such conflicts, we design a pricing strategy that uses the concept of price to represent the matching cost for D2D TRs. The prices of RBs are virtual, without any physical significance, and are set to zero at the beginning. Let $F = \{F_1, \cdots, F_{c_s}, \cdots, F_C\}$, $\forall c_s \in \mathcal{C}$, denote the price set of RBs for the second hop D2D links. The matching algorithm proceeds iteratively. In each iteration, the actual preference of D2D TR $m$ on $c_s$ is updated as $V_{m,c_s} - F_{c_s}$, and the preference list $\hat{O}_m$ is updated accordingly. Each D2D TR proposes to its most preferred RB in $\hat{O}_m$. If any RB receives more than one request, there are conflicts in the matching process. Then, the conflicting RBs raise their prices with step $s$ until they receive no more than one request. The algorithm ends if there are no new requests from D2D TRs. We summarize the two-dimensional matching algorithm in Algorithm 1 and provide an illustration of the preference establishment process and the derived one-to-one matching result for the second hop D2D link in Figure 2; the flowchart is given in Figure 3.

Algorithm 1 The Two-Dimensional Matching Algorithm
1: Input: M, C, s.

6: while ∃ X(m) = ∅ do
7:   for m ∈ M do
8:     D2D TR m proposes to its most preferred RB in updated Ô_m.
9:   end for
10:  Find the RBs that have received more than one request and put them into Ω.
11:  if Ω = ∅ then
12:    Match TRs with the requested RBs directly.
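The pricing idea can be illustrated with the following sketch. It is not a transcription of Algorithm 1, whose listing is incomplete here; it is a simplified sequential variant, closer to a textbook ascending-price auction, in which TRs propose one at a time and a proposal to an already held RB counts as a conflict that raises the RB's price by the step and releases the previous holder. It assumes there are at least as many RBs as TRs and that every TR has a value for every RB; the name `pricing_match` is hypothetical.

```python
def pricing_match(values, step):
    """Match TRs to RBs using virtual prices to resolve conflicts.

    values[m][c]: preference value of TR m for RB c (higher is better).
    step:         positive price increment applied on each conflict.
    Returns (match, price) with match[m] = RB assigned to TR m.
    """
    rbs = sorted({c for row in values.values() for c in row})
    if len(rbs) < len(values):
        raise ValueError("sketch assumes at least as many RBs as TRs")
    price = {c: 0.0 for c in rbs}  # virtual prices start at zero
    holder = {}                    # RB -> TR currently holding it
    unmatched = list(values)
    while unmatched:
        m = unmatched.pop()
        # TR m proposes to the RB with the largest price-adjusted value.
        best = max(values[m], key=lambda c: values[m][c] - price[c])
        if best in holder:
            # Conflict: the RB raises its price and the old holder is released.
            price[best] += step
            unmatched.append(holder[best])
        holder[best] = m
    return {m: c for c, m in holder.items()}, price


match, price = pricing_match(
    {"m1": {"c1": 5.0, "c2": 1.0}, "m2": {"c1": 4.0, "c2": 3.0}}, 0.5)
```

In this example both TRs initially prefer `c1`; its price rises until `m2` finds `c2` more attractive, leaving `m1` on `c1`. Because an RB that has never been requested keeps a price of zero, prices stay bounded and the loop terminates under the stated assumptions.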

Three-Dimensional Matching Algorithm
Given the RB allocation of the second hop, the joint optimization problem (8) is equivalent to a three-dimensional matching problem involving the matching among D2D TRs, RSs and RBs for the first hop. To simplify the problem, we combine one RS and one RB (CUE) as a one-dimensional unit RC. Owing to the existence of $R$ RSs and $C$ RBs, there are $R \times C$ different combinations, denoted as $\mathcal{RC} = \{RC_{r,c_f}\}_{r=1,c_f=1}^{r=R,c_f=C}$. Thus, the three-dimensional matching problem is reduced to a two-sided matching with $M$ D2D TRs on one side and $R \times C$ RC units on the other side, which can also be solved by the proposed pricing-based matching approach. We give the definition below:
Definition 3. A matching $X$ is a one-to-one correspondence, denoted as $X : \mathcal{M} \cup \mathcal{RC} \to \mathcal{M} \cup \mathcal{RC} \cup \emptyset$, such that $X(m) = RC_{r,c_f}$ means that D2D TR $m$ is matched with unit $RC_{r,c_f}$ and $X(m) = \emptyset$ means that $m$ remains single. $X(m)$ is the matching partner of $m$.
Since the matching in the first hop D2D link is a one-to-one three-dimensional matching among D2D TRs, RSs and RBs, if X (i) = RC r,c f , for ∀i ∈ M − {i}, we have X (i ) = {RC − RC r,c f } ∪ {∅}.Similar to the definition of the two-dimensional stable matching, the three-dimensional matching is stable if there exists no blocking pair.That is, there exists no TX and RC unit that have not been matched with each other under X and that prefer each other more than their current matching partners.
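The blocking-pair condition can be checked mechanically. The sketch below is illustrative and rests on assumptions that are not stated in the text: both sides rank a pair by one shared utility table `value` (for instance the achievable EE), an unmatched agent prefers any partner to staying single, and the extra constraint that two RC units may share an RS or an RB is ignored. The name `find_blocking_pairs` is hypothetical.

```python
def find_blocking_pairs(value, match):
    """Return all blocking pairs (m, u) of a one-to-one matching.

    value[m][u]: utility that TR m and unit u both assign to being paired
                 (shared-utility assumption).
    match:       dict mapping a TR index to the index of its matched unit.
    """
    holder = {u: m for m, u in match.items()}  # unit -> TR matched to it
    blocking = []
    for m, row in enumerate(value):
        # Utility of TR m under the current matching (-inf if single).
        current_m = row[match[m]] if m in match else float("-inf")
        for u, v in enumerate(row):
            if match.get(m) == u:
                continue
            # Utility of unit u under the current matching (-inf if single).
            current_u = value[holder[u]][u] if u in holder else float("-inf")
            # (m, u) blocks the matching if both strictly prefer each other.
            if v > current_m and v > current_u:
                blocking.append((m, u))
    return blocking


value = [[5, 1], [4, 3]]
print(find_blocking_pairs(value, {0: 0, 1: 1}))  # [] : stable
print(find_blocking_pairs(value, {0: 1, 1: 0}))  # [(0, 0)] : TR 0 and unit 0 block
```

A matching is stable in the sense used above exactly when this function returns an empty list.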

Preference Establishment
In the three-dimensional matching, D2D TRs on one side propose to establish pairs with RC units on the other side based on their preference lists. As the ultimate optimization goal is to maximize the EE of D2D links, the preference value of a D2D TR on an RC unit is obtained by solving a power control problem that optimizes the transmit powers P_{m_t} and P_r. When calculating the preference of the same D2D TR towards another RC unit, the formulated power control problem is different from the previous one, and the derived optimum power control strategy is not the same. Hence, the optimal power control strategy cannot be determined separately from the matching problem, because it depends on the specific matching assumption. To establish the preference lists, each D2D TR is temporarily paired with each RC unit to obtain the optimum EE and the corresponding transmit power, subject to the QoS requirements of both D2D and cellular links. When TR m has been matched with RB c_s for the second hop, the EE of TR m can be rewritten as ẼE_{m,r,c_f,c_s}. The preference values of TR m on the RC units, each first calculated as the achievable EE of the two-hop link and then sorted in descending order, form the set Ẽ_m = {Ẽ_{m,1}, Ẽ_{m,2}, ..., Ẽ_{m,R×C}}, which is the preference list of TR m. For a specific matching X(m) = RC_{r,c_f}, the maximum achievable EE of the two-hop link can be obtained by solving the following power control problem:

Since (7) is concave with respect to P_{m_t} and P_r, the transmit power (P̂_{m_t}, P̂_r) that maximizes the EE can be obtained by:

The upper and lower bounds in the transmit power constraints are calculated as:

The upper bounds are derived based on the QoS requirements of CUEs, and the lower bounds are derived based on the QoS requirements of D2D links. P_max denotes the maximum transmit power allowed for any mobile equipment. A and B in (14) and (15) are given by, respectively,

Finally, the optimum powers of D2D TX m_t and RS r can be expressed as P*_{m_t} = max{min{P̂_{m_t}, P_{max,m_t}}, P_{min,m_t}} and P*_r = max{min{P̂_r, P_{max,r}}, P_{min,r}}, respectively. Note that if the calculated upper bound is lower than the lower bound, i.e., P_{max,m_t} < P_{min,m_t} or P_{max,r} < P_{min,r}, then TR m cannot be matched with unit RC_{r,c_f}.
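The final projection step and the construction of the preference list can be illustrated as follows. This is a minimal sketch under stated assumptions: the unconstrained maximizer and the two bounds are taken as given inputs, because their closed forms are not reproduced here, and the names `clip_power` and `build_preference_list` as well as all numeric values are hypothetical.

```python
def clip_power(p_hat, p_min, p_max):
    """Project the EE-maximizing power onto [p_min, p_max].

    Implements P* = max(min(p_hat, p_max), p_min). Returns None when the
    upper bound is below the lower bound, i.e. the pairing is infeasible.
    """
    if p_max < p_min:
        return None
    return max(min(p_hat, p_max), p_min)


def build_preference_list(ee_values):
    """Sort RC units by achievable EE in descending order.

    ee_values maps an RC unit (r, c_f) to its achievable EE, or to None
    when the QoS constraints cannot be met for that unit.
    """
    feasible = {unit: ee for unit, ee in ee_values.items() if ee is not None}
    return sorted(feasible, key=feasible.get, reverse=True)


# Illustrative numbers only (powers in watts, EE in arbitrary units).
print(clip_power(0.50, 0.10, 0.20))   # 0.2  : capped by the CUE QoS bound
print(clip_power(0.05, 0.10, 0.20))   # 0.1  : lifted to the D2D QoS bound
print(clip_power(0.15, 0.30, 0.20))   # None : empty feasible interval
print(build_preference_list({(0, 0): 1.2, (0, 1): None, (1, 0): 3.4}))
```

Units whose feasible interval is empty are simply left out of the preference list, which matches the rule that such a TR cannot be matched with that RC unit.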

Matching
Similar to the two-dimensional matching problem, a conflict occurs if more than one D2D TR proposes to the same RS or RB. The units that contain the conflicting RSs and RBs are denoted as Ω, and the proposed pricing strategy can also be used to resolve the conflict.
Let FR = {FR_1, ..., FR_R} and FC = {FC_1, ..., FC_C} denote the prices of RSs and RBs, respectively. The price of RC unit RC_{r,c_f}, denoted as FRC_{r,c_f}, is the sum of the prices of RS r and RB c_f, and these unit prices form the price set of all of the RC units. In each iteration, the actual preference of D2D TR m towards unit RC_{r,c_f} is updated as ẼE_{m,r,c_f,c_s} − FRC_{r,c_f}, and the preference list Õ_m is updated accordingly. Any D2D TR m that has not been matched with any RC unit proposes to its most preferred RC unit in Õ_m. If both the RS and the RB in one RC unit receive only one request, the unit is directly matched with the D2D TR that sends the request. If Ω ≠ ∅, the conflicting elements in Ω raise their prices with the price step s until there is only one request left. The iterative process ends when there are no new requests from D2D TRs.
We summarize the three-dimensional matching algorithm in Algorithm 2. An illustration of the establishment of the preference lists and of the expected stable matching is shown in Figure 4; the flowchart is given in Figure 5.

Algorithm 2
1: Input: M, R, C, s.
6: while ∃ X(m) = ∅ do
7:   for m ∈ M do
8:     D2D TR m proposes to its most preferred RC unit in the updated Õ_m.
9:   end for
10:  Count the RSs and RBs that have received requests and put the conflicting elements that have received more than one request into Ω.
11:  if Ω = ∅ then
12:    Match the RC unit with its requesting D2D pair directly.
13:  end if
14:  if Ω ≠ ∅ then
15:    for RC_{r,c_f} ∈ RC do
16:      if RS r and RB c_f receive requests from more than one D2D pair then
17:        RS r and RB c_f in Ω increase their prices FR_r and FC_{c_f} with the price step s, and the D2D TRs update the preference list Õ_m. After this process, the RC unit that includes RS r and RB c_f is matched with the last remaining D2D TR m that proposed to them, denoted by X(m) = RC_{r,c_f}. Delete RC unit RC_{r,c_f} from Ω.
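To make the role of the separate RS and RB prices concrete, the sketch below implements a sequential ascending-price variant of the idea. It is not Algorithm 2 itself: TRs propose one at a time, a proposer evicts the current holders of the RS and RB it requests, and each contested element raises its price by the step s. The function name `rc_auction`, the input layout `ee[m][r][c]`, the use of `None` for infeasible units, and the rule that a TR stays single when no unit has a non-negative net preference are assumptions made for illustration.

```python
from collections import deque


def rc_auction(ee, step=0.1):
    """Match TRs to (RS, RB) units with separate RS and RB prices.

    ee[m][r][c]: achievable EE of TR m with RS r and first-hop RB c,
                 or None if the QoS constraints cannot be met.
    Returns (match, price_rs, price_rb); match maps a TR to its (r, c).
    """
    num_rs, num_rb = len(ee[0]), len(ee[0][0])
    price_rs = [0.0] * num_rs
    price_rb = [0.0] * num_rb
    rs_holder, rb_holder = {}, {}       # element index -> TR holding it
    match = {}
    queue = deque(range(len(ee)))       # single TRs waiting to propose

    def net(m, r, c):
        # Preference after subtracting the unit price FR_r + FC_c.
        return ee[m][r][c] - price_rs[r] - price_rb[c]

    while queue:
        m = queue.popleft()
        units = [(r, c) for r in range(num_rs) for c in range(num_rb)
                 if ee[m][r][c] is not None and net(m, r, c) >= 0]
        if not units:
            continue                    # assumed rule: TR m stays single
        r, c = max(units, key=lambda u: net(m, *u))
        # Contested elements raise their price; their holders are evicted.
        losers = set()
        for holder, price, elem in ((rs_holder, price_rs, r),
                                    (rb_holder, price_rb, c)):
            if elem in holder:
                price[elem] += step
                losers.add(holder[elem])
        for loser in sorted(losers):
            lost_r, lost_c = match.pop(loser)
            del rs_holder[lost_r]
            del rb_holder[lost_c]
            queue.append(loser)         # the evicted TR proposes again later
        rs_holder[r], rb_holder[c], match[m] = m, m, (r, c)
    return match, price_rs, price_rb


ee = [[[5.0, 1.0], [1.0, 1.0]],
      [[4.0, 1.0], [1.0, 3.5]]]
match, price_rs, price_rb = rc_auction(ee, step=0.1)
print(match)  # TR 0 -> (0, 0), TR 1 -> (1, 1)
```

The sequential form is chosen here because every eviction raises at least one price and a TR never proposes to a unit with negative net preference, so the loop terminates after finitely many proposals; a simultaneous version would need an explicit tie-breaking rule for TRs with identical preferences.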

Implementation
In a real system, the BS plays the central role in implementing the proposed algorithm. At the beginning, the transmitters send packets with detection signals, and the BS then obtains the CSI feedback from each terminal, i.e., the D2D and cellular users. After that, the four-dimensional matching process is conducted at the BS. Finally, after the matching process ends, the control signals are forwarded to each node. There is no signaling interaction among the network nodes during the matching process.

Properties of the Matching Algorithm
In this section, the properties of the three-dimensional and the two-dimensional matching algorithms, including convergence, stability, optimality and complexity, are analyzed in detail.
The analyses of the convergence, stability and optimality of the three-dimensional matching algorithm are given as follows:

Theorem 1. The matching process would end with finite iterations.

Theorem 2. The proposed Algorithm 2 can converge to a two-sided stable matching X in finite iterations.

Theorem 3. The solution of three-dimensional matching X is weak Pareto optimal for D2D TRs on combinations of RSs and RBs.

The proofs of Theorems 1-3 can be found in Appendices B-D, respectively. The analysis of the optimality of the two-dimensional matching algorithm is given as follows:

Theorem 4. For any set of prices when the matching process ends, the result of the matching process has the maximum total utility of all of the possible assignments of RBs to D2D TRs.

The proof of Theorem 4 can be found in Appendix E. The proofs of the convergence and stability of the two-dimensional matching are similar to those of Theorems 1 and 2, respectively; see Appendices B and C. Next, we discuss the complexity of the proposed algorithm.
For the three-dimensional matching algorithm, each D2D TR first calculates the achievable EE for each RC unit in the process of preference establishment, so the computational complexity for any TR i ∈ M to obtain the preferences is O(RC). The preference list is derived by sorting the preference values for each TR, with a computational complexity of O(RC log(RC)). In Algorithm 2, the complexity of each price raising process, in which TRs that remain single propose to their most preferred RC units, is M_loop, where M_loop is the number of iterations required by the price raising process to resolve the conflict, i.e., the assignment of the conflicting elements is finished within M_loop iterations, when Ω = ∅. We have M_loop = 1 when Ω = ∅, i.e., when no conflict occurs. Then, the computational complexity of the matching

For the two-dimensional matching algorithm, each D2D TR calculates the interference for each RB, so for any TR i ∈ M, the computational complexity to obtain the preference is O(C). The complexity of each price raising process, in which TRs that remain single propose to their most preferred RBs, is M_loop, where M_loop is the number of iterations required by the price raising process to resolve the conflict. Then, the computational complexity of the matching process is O(

Simulation Results and Discussion
In this section, the performance of the proposed pricing-based iterative matching algorithm is validated through simulations. Table 2 presents the simulation parameters. We employ MATLAB as our simulation environment. We consider a single cellular network with a radius of R = 300 m, which contains M D2D TRs, R RSs and C CUEs. The CUEs are randomly distributed around the BS, and the D2D TRs and the RSs are deployed in a circular hot spot area with a radius of r = 30 m. Monte Carlo simulation is used, and in general, we set the number of simulation runs to 2000. We compare the proposed algorithm with three benchmark algorithms, i.e., the exhaustive search with power control, matching with fixed power and random allocation with fixed power. The exhaustive search algorithm, which examines all of the possible combinations to find the optimum solution, serves as an upper performance benchmark. In the matching with fixed power algorithm, the power allocation problem is not considered, and the transmit power of any D2D TX or RS is always fixed at the maximum transmit power P_max.

Remark 2. The random matching with fixed power algorithm also employs the maximum transmit power. In addition, the D2D TRs, RSs and RBs are matched in a random way, which is used to indicate the lower performance benchmark. This is a conventional algorithm for the lower bound that is also utilized in many works [12,27,30].
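One Monte Carlo drop of the layout described above can be generated as in the following sketch. The simulations in the paper were run in MATLAB; this Python version is only an illustration. The function names, the placement of the hot spot centre (drawn uniformly so that the whole hot spot lies inside the cell) and the choice to place both the D2D TX and the D2D RX of each pair inside the hot spot are assumptions, since the text does not specify them.

```python
import math
import random


def random_point_in_disc(radius, center, rng):
    """Draw a point uniformly over the area of a disc."""
    rho = radius * math.sqrt(rng.random())   # sqrt gives uniform area density
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + rho * math.cos(theta),
            center[1] + rho * math.sin(theta))


def drop_users(num_cue, num_pairs, num_rs,
               cell_radius=300.0, spot_radius=30.0, seed=None):
    """Generate one random user drop with the BS at the origin."""
    rng = random.Random(seed)
    bs = (0.0, 0.0)
    # Assumption: the hot spot centre is random but the spot stays in the cell.
    spot = random_point_in_disc(cell_radius - spot_radius, bs, rng)

    def in_spot(count):
        return [random_point_in_disc(spot_radius, spot, rng) for _ in range(count)]

    return {
        "spot_center": spot,
        "cue": [random_point_in_disc(cell_radius, bs, rng) for _ in range(num_cue)],
        "d2d_tx": in_spot(num_pairs),
        "d2d_rx": in_spot(num_pairs),
        "rs": in_spot(num_rs),
    }


layout = drop_users(num_cue=8, num_pairs=4, num_rs=4, seed=1)
print(len(layout["cue"]), len(layout["d2d_tx"]), len(layout["rs"]))  # 8 4 4
```

Repeating `drop_users` with different seeds and averaging the resulting EE over the runs corresponds to the Monte Carlo procedure mentioned in the text.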

Table 2. Simulation parameters.
Step size s: 0.1

For the optimum exhaustive matching algorithm, the computational complexity increases rapidly with the number of users, which makes it infeasible to simulate the optimum exhaustive matching algorithm with a large number of users. For this reason, we use 2 to 6 D2D TR pairs as an example to indicate the performance gap between the proposed algorithm and the optimum one.
Figure 6 shows a snapshot of user locations with M = R = 4 and C = 8. Figure 7 shows the overall EE performance of all of the D2D pairs versus the number of D2D TR pairs. The performance gap between the proposed algorithm and the optimum exhaustive matching algorithm is smaller than the gaps of the other algorithms, i.e., matching with fixed power and random allocation with fixed power. For instance, when M = 6, the proposed algorithm achieves more than 80% of the optimum performance and outperforms matching with fixed power and random allocation with fixed power by 68.75% and 145.45%, respectively. Besides, the computational complexity of the proposed matching algorithm is an order of magnitude lower than that of the exhaustive algorithm. The EE of D2D pairs under matching with fixed power is worse than that with power control. This shows that the performance gain achieved by transmitting at the maximum power in an interference-limited environment cannot compensate for the corresponding EE loss. The random matching with fixed power performs the worst for the following two reasons. Firstly, the relay selection and spectrum allocation are not jointly optimized, and power control is not taken into consideration. Secondly, some potential D2D TR pairs are blocked when the QoS requirements of either D2D TR pairs or CUEs are not satisfied under the random matching.

Figure 8 shows the number of iterations required to reach a stable matching. Simulation results show that it takes only three iterations and four iterations for the proposed algorithm to converge when M = 4 and M = 5, respectively. As the number of D2D pairs increases, it takes more iterations for the proposed algorithm to converge. The reason is that both the number of matching candidates and the number of matching conflicts increase along with the number of D2D TR pairs. Thus, it takes more iterations for the proposed algorithm to find a suitable RC unit and reach a stable matching.
Figure 9 shows the percentage of served users, which is defined as the number of D2D pairs served successfully by relay stations divided by the total number of D2D pairs. We compare the percentage of served users under two scenarios, i.e., with RSs and without RSs. Matching without RSs means that the D2D TX and the D2D RX communicate directly and reuse only one RB. The D2D TX in matching without RSs adopts the optimal transmit power obtained by solving the power allocation problem to maximize the EE. If the QoS requirements of both D2D TR pairs and CUEs cannot be satisfied simultaneously, the D2D TR pairs are blocked and not served. Simulation results show that the proposed algorithm outperforms matching with power control but without relays, and random allocation with fixed power, by 18.57% and 72.92%, respectively, when the SINR threshold of D2D links is 28 dB. Figure 9 also shows that the average percentage of served users degrades as the SINR threshold increases. The reason is that it becomes more difficult to find suitable RBs and RSs for the D2D pairs that satisfy the QoS requirements of both D2D pairs and CUEs simultaneously.

Figures 10 and 11 show the influence of the number of D2D RSs and of CUEs on the average energy efficiency of D2D pairs. We compare the proposed algorithm with two heuristic algorithms, i.e., matching with fixed power and random allocation with fixed power. In Figure 10, when R = 10, the proposed algorithm outperforms matching with fixed power and random allocation with fixed power by 129.82% and 567.84%, respectively. Increasing the number of RSs significantly improves the performance of the proposed algorithm. For instance, when the number of D2D RSs is increased from 6 to 10, the performance of the proposed algorithm is improved by 29.58%, while the performances of the other two heuristic algorithms are only improved by 24.00% and 2.58%. In Figure 11, when C = 16, the proposed algorithm outperforms matching with fixed power and random allocation with fixed power by 134.62% and 767.21%, respectively. Similar to Figure 10, increasing the number of CUEs also markedly improves the performance of the proposed algorithm. For instance, when the number of CUEs is increased from 12 to 16, the performance of the proposed algorithm is improved by 47.08%, while the performances of the other two heuristic algorithms are only improved by 39.49% and 2.37%. Thus, the proposed algorithm benefits more from an increasing number of RSs and CUEs than the other two heuristic algorithms.
We also consider another single cellular network with a radius of R = 100 m, in which the D2D TRs and the RSs are deployed in a circular hot spot area with a radius of r = 10 m. In this setting, the change of simulation parameters does not affect the performance results of the proposed algorithm. In other words, the proposed algorithm can be adapted to other scenarios, including an indoor scenario.

Conclusions
In this paper, we considered the energy-efficient resource management issue in D2D cooperative relay communications and proposed a pricing-based matching approach to jointly optimize relay selection, spectrum allocation and power control. First of all, we formulated the joint optimization problem as an NP-hard four-dimensional matching problem. Secondly, to provide a tractable solution, we proposed a low-complexity two-stage optimization approach, which decomposes the original joint problem into a two-dimensional matching between D2D RXs and RBs for the second hop and a three-dimensional matching among D2D TRs, RSs and RBs for the first hop. Thirdly, for the matching problem in each stage, we proposed a pricing-based iterative matching algorithm to maximize the EE while guaranteeing the QoS requirements of both D2D TRs and CUEs. Finally, the proposed algorithm was compared with some heuristic algorithms through simulations. It was demonstrated that the proposed algorithm can not only dramatically improve the energy efficiency performance, but also significantly enhance the network coverage with low computational complexity.

Figure 1. System model of D2D cooperative relay communications underlying the cellular network.
Symbol          Description
c_f             The cellular user whose resource blocks are reused in first hop D2D link
c_s             The cellular user whose resource blocks are reused in second hop D2D link
P_mt            Transmit power of D2D transmitter
P_r             Transmit power of relay station
P_cf            Transmit power of cellular user c_f
P_cs            Transmit power of cellular user c_s
d_mtr           Transmission distance between D2D transmitter and relay station
d_cfr           Transmission distance between cellular user and relay station
h_mtr           Channel response of the first hop D2D link
h_cfr           Channel response of the interference link from cellular user c_f
α               Path-loss exponent
h_s,mtr         Small-scale fading (Rayleigh) channel coefficients between D2D transmitter and relay station
h_s,cfr         Small-scale fading (Rayleigh) channel coefficients between cellular user and relay station
N_0             One-sided power spectral density of the additive white Gaussian noise
γ_r             SINR received by relay station r
γ_mr            SINR received by D2D receiver m_r
γ_mr,mr         SINR of the two-hop D2D link
γ_cf            SINR of the cellular user whose resource blocks are reused by the first hop D2D link
γ_cs            SINR of the cellular user whose resource blocks are reused by the second hop D2D link
EE_m,r,cf,cs    Energy efficiency of the two-hop D2D link

It is assumed that there are M D2D TRs, R D2D RSs and C CUEs in the system. D2D TXs and RXs are denoted by the sets

Figure 2. An illustration of the two-dimensional matching in the second hop D2D link.

Figure 3. Flowchart of the two-dimensional matching algorithm.

Figure 4. An illustration of the three-dimensional matching in the first hop D2D link.

Figure 5. Flowchart of the three-dimensional matching algorithm.

Figure 6. A snapshot of user locations for a single cellular network with C CUEs, M D2D pairs and R RSs (C = 8, M = 4, R = 4; the cell radius is 300 m, and the size of the hot spot is 30 m).

Figure 7. Energy efficiency of D2D pairs vs. the number of D2D pairs.

Figure 8. Energy efficiency of D2D pairs vs. the number of matching iterations.

Figure 9. Average percentage of served users vs. the threshold of SINR.

Figure 10. Energy efficiency of D2D pairs vs. the number of RSs.

Figure 11.Energy efficiency of D2D pairs vs. the number of D2D CUEs.