Deep Learning-Based Power Control Scheme for Perfect Fairness in Device-to-Device Communication Systems

Abstract: Proximity-based device-to-device (D2D) communication enables Internet of Things, public safety, and data offloading services. Because of these advantages, D2D communication has been applied to wireless communication networks. In wireless networks using D2D communication, co-channel interference among proximity links causes data rate shortage and coverage limitation. To resolve these problems, deep learning-based transmit power control schemes have been presented for network-assisted D2D communication systems. These schemes have focused on enhancing spectral efficiency and energy efficiency in the presence of interference. However, data-rate fairness may be a key performance metric in D2D communications, because devices in proximity can expect a fair quality of service. Hence, in this paper, we propose a transmit power control scheme that uses a deep-learning algorithm based on a convolutional neural network (CNN) and takes data-rate fairness into account in network-assisted D2D communication systems, where the wireless channels are modelled by path loss and Nakagami fading. In the proposed scheme, batch normalization (BN) is introduced to further enhance the spectral efficiency of the conventional deep-learning transmit power control scheme. In addition, a loss function for the deep-learning optimization is defined to consider both data-rate fairness and spectral efficiency. Through simulation, we show that the proposed scheme achieves perfect fairness while improving the spectral efficiency over the conventional schemes. These improvements in fairness and spectral efficiency are also shown to hold for different Nakagami fading conditions and for different sizes of the area containing the devices.


Introduction
Recently, deep learning techniques have been widely integrated into wireless communication systems (WCSs) to solve radio resource management problems and improve system performance with low complexity. In [1], the spectrum shortage problem induced by the rapid growth of data traffic was addressed by a recurrent neural network (RNN)-based learning technique; in particular, that work addressed resource allocation for uplink-downlink decoupling in heterogeneous small cell networks (SCNs) that incorporate long-term evolution in the unlicensed band. In [2], a deep learning system for heterogeneous network traffic control was proposed to handle significant traffic growth, and the simulation results showed that it considerably improves signaling overhead, throughput, and delay. In [3], a convolutional neural network (CNN)-based learning algorithm was used for network traffic control and attained significantly lower average delay and packet loss rate. In [4], an auto-encoder architecture based on a deep neural network (DNN) was proposed for peak-to-average power ratio (PAPR) reduction in orthogonal frequency division multiplexing systems, and the DNN-based scheme was shown to improve both the PAPR and the bit error rate. In [5], DNN-based sparse code multiple access (SCMA) was proposed for mapping data to codebooks and decoding the received signal, where the DNN is trained to minimize the bit error rate. It was also noted that the DNN-based decoder requires lower computational complexity than the conventional SCMA decoder.
Device-to-device (D2D) communication has attracted considerable attention as one of the techniques for meeting the 5G requirements on data rate, latency, and spectrum utilization. D2D communication enables devices in close proximity to communicate over a direct link without going through the main network infrastructure. Thus, D2D communication techniques enable Internet of Things, public safety, and data offloading services. Because of these advantages, D2D communication has been introduced into wireless sensor networks to realize various applications, such as smart cities, smart homes, smart grids, healthcare, military, and Internet of Things applications [6]. However, applying D2D communication to wireless sensor networks raises technical problems, such as device discovery, mode selection, data security, and interference mitigation [7]. In particular, co-channel interference among proximity links causes data rate shortage and coverage limitation. In D2D WCSs, radio resources such as transmit power, frequency bandwidth, and time slots can be managed to improve the data rate and the coverage. Radio resource management becomes even more important when devices are densely deployed, because co-channel interference becomes more severe. Most research on D2D WCSs has focused on transmit power control for radio resource management, because devices typically share frequency and time resources in D2D communication [6][7][8][9]. In addition, network-assisted D2D WCSs, in which a base station or control center obtains information such as the D2D channels and manages the radio resources for D2D communications, have been considered for further performance enhancement [9][10][11][12].
For network-assisted D2D WCSs, a deep learning-based transmit power control scheme was proposed in [13,14], where it was shown to provide better spectral efficiency than the weighted minimum mean square error (WMMSE) scheme. The WMMSE scheme was proposed for distributed sum-utility maximization in a multiple-input multiple-output (MIMO) interfering broadcast channel [15], where the transmit weights are iteratively updated to maximize the sum utility. Such an iterative weight-update algorithm resembles deep learning, in that deep learning optimizes a loss function by updating the weights of a neural network. However, the deep learning scheme has lower run-time computational complexity than WMMSE, because it uses a neural network that has already been trained on previous channel realizations.
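To make the iterative weight update concrete, the following is a minimal sketch of the well-known scalar (single-antenna) specialization of WMMSE for N interfering transceiver pairs with a sum-rate objective. It is an illustration under assumptions, not the exact baseline configuration used in this paper: the function name `wmmse_power_control`, the full-power initialization, the fixed iteration count, and the convention `H[i, j]` = channel power from transmitter i to receiver j are our own choices.

```python
import numpy as np


def wmmse_power_control(H, p_max, noise_power, n_iter=100):
    """Scalar WMMSE sum-rate power control for N single-antenna pairs.

    H[i, j] is the channel power from transmitter i to receiver j; pair k
    uses transmitter k and receiver k. Returns the powers p_k = v_k**2.
    """
    a_kk = np.sqrt(np.diag(H))               # direct-link amplitudes
    v = np.full(H.shape[0], np.sqrt(p_max))  # start from full power
    for _ in range(n_iter):
        # Received power at receiver k: sum_j H[j, k] v_j^2 + noise
        rx_power = H.T @ v ** 2 + noise_power
        u = a_kk * v / rx_power              # MMSE receiver coefficients
        w = 1.0 / (1.0 - u * a_kk * v)       # MSE weights (1 / MSE)
        # Transmitter update, clipped to the power constraint; the
        # denominator is sum_j w_j u_j^2 H[k, j] (leakage of transmitter k).
        denom = np.maximum(H @ (w * u ** 2), 1e-300)
        v = np.clip(w * u * a_kk / denom, 0.0, np.sqrt(p_max))
    return v ** 2
```

Each iteration touches all N² cross-links, so the cost grows with N² per iteration times the number of iterations; a trained network replaces this loop with a single forward pass.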
However, the power control scheme in [13,14] may have limited spectral efficiency because deep learning can converge to a local optimum. Hence, in this paper, we present a CNN with the batch normalization (BN) scheme [16] to mitigate this problem, which achieves better spectral efficiency than the conventional scheme. In addition, the deep learning-based power control schemes in [13,14] focus on improving spectral efficiency and energy efficiency; consequently, their data-rate fairness is quite poor, as shown in Section 4 of this paper. Therefore, we propose a fairness-based deep learning algorithm using a CNN with BN. Data-rate fairness among devices may be a key performance metric in D2D communications, because sensors or devices located in a certain area and using the same WCS may expect a fair quality of service. Moreover, to the best of our knowledge, deep learning-based transmit power control for D2D WCSs that considers fairness has not yet been investigated.
The contributions of this paper are summarized as follows:
1. We propose a fairness-based transmit power control scheme using a CNN with BN for network-assisted D2D WCSs, where BN is applied to the CNN structure to reduce the learning time and enhance the spectral efficiency.
2. We define a loss function for the deep-learning optimization that maximizes both spectral efficiency and data-rate fairness, where the priority between the two is adjusted by a loss function parameter.
3. We present an appropriate value of the loss function parameter that maximizes the spectral efficiency while achieving perfect data-rate fairness.
4. We provide simulation results for spectral efficiency, fairness, and computational time in the presence of co-channel interference from devices and cellular users, assuming independent wireless channels with path loss and Nakagami-m fading.
5. From the simulation results, we verify that the proposed scheme outperforms the conventional deep-learning scheme in terms of both spectral efficiency and fairness, and outperforms the WMMSE scheme in spectral efficiency while matching its perfect fairness.
The remainder of this paper is organized as follows. Section 2 describes the system model and problem formulation. Section 3 presents the proposed CNN structure and deep learning method. Section 4 presents the spectral efficiency and fairness performance of the proposed scheme to verify its superiority. Finally, Section 5 concludes the paper.

System Model and Problem Formulation
We consider a network-assisted D2D WCS in which the users are uniformly distributed over a D × D area, as shown in Figure 1. The D2D receivers also experience interference from U cellular users. It is assumed that there are N single-antenna transceiver pairs, each comprising one transmitter and one receiver, and that all transmissions take place simultaneously over the same frequency band. Accordingly, each transmitter interferes with the other transceiver pairs; i.e., receiver $i \in J$ receives data from transmitter $i \in I$ and interference from the other transmitters $k \in I \setminus \{i\}$, where $I = \{1, 2, \dots, N\}$ and $J = \{1, 2, \dots, N\}$ denote the sets of transmitters and receivers, respectively. The path loss $G_{i,j}$ between transmitter i and receiver j is expressed as

$$G_{i,j} = \beta\, d_{i,j}^{-\alpha}, \tag{1}$$

where β, α, and $d_{i,j}$ are the path loss coefficient, the path loss exponent, and the distance between transmitter i and receiver j, respectively. Using Equation (1), the wireless channel power between transmitter i and receiver j is modelled as

$$h_{i,j} = G_{i,j}\, |g_{i,j}|^{2}, \tag{2}$$

where $g_{i,j}$ is the Nakagami-m fading channel coefficient [17]. The Nakagami factor m reflects the relative strength of the line-of-sight (LOS) and non-line-of-sight components: m = 1 corresponds to Rayleigh fading with no LOS component, and the LOS component becomes stronger as m increases. Using Equation (2), the spectral efficiency (SE) of the i-th transceiver pair, which is used as part of the loss function for deep learning, is expressed as

$$\mathrm{SE}_i = \log_2\!\left(1 + \frac{P_i\, h_{i,i}}{\sum_{k \in I \setminus \{i\}} P_k\, h_{k,i} + \sum_{u=1}^{U} P_c\, z_{u,i} + N_0 W}\right), \tag{3}$$

where $P_i$, $N_0$, and W are the transmit power of transmitter i, the noise spectral density, and the bandwidth, respectively, with $0 \le P_i \le P_{\max}$, where $P_{\max}$ is the maximum transmit power of a transmitter. Moreover, $P_c$ denotes the transmit power of each cellular user, and $z_{u,i}$ is the wireless channel power between cellular user u and receiver i, obtained using Equation (2).
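The channel and SE model can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, all powers are assumed to be in linear units, and the matrix convention `H[i, j]` = channel power from transmitter i to receiver j (and `Z[u, i]` for cellular user u) is our own choice.

```python
import numpy as np


def path_loss(d, beta=10 ** -3.453, alpha=3.8):
    """Equation (1): G = beta * d**(-alpha); defaults taken from Section 4."""
    return beta * np.asarray(d, dtype=float) ** (-alpha)


def channel_power(G, g_power):
    """Equation (2): h = G * |g|^2, where g_power holds |g|^2 samples."""
    return G * g_power


def spectral_efficiency(H, p, noise_power, Z=None, p_c=0.0):
    """Equation (3): per-pair SE in bit/s/Hz.

    H[i, j]: channel power from D2D transmitter i to receiver j (h_{i,j}).
    Z[u, i]: channel power from cellular user u to receiver i (z_{u,i}).
    p: transmit powers P_i; noise_power: N0 * W. All powers in linear units.
    """
    p = np.asarray(p, dtype=float)
    signal = np.diag(H) * p                    # P_i h_{i,i}
    d2d_interf = H.T @ p - signal              # sum over k != i of P_k h_{k,i}
    cell_interf = 0.0 if Z is None else p_c * Z.sum(axis=0)  # sum_u P_c z_{u,i}
    return np.log2(1.0 + signal / (d2d_interf + cell_interf + noise_power))
```

Using `H.T @ p` computes the total received power at every receiver at once; subtracting the desired signal leaves exactly the D2D interference term of Equation (3).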
To measure the data-rate fairness among the transceiver pairs, the index of fairness (IF) is defined using Equation (3) as [18]

$$\mathrm{IF} = \frac{\left(\sum_{i=1}^{N} \overline{\mathrm{SE}}_i\right)^{2}}{N \sum_{i=1}^{N} \overline{\mathrm{SE}}_i^{\,2}}, \tag{4}$$

where $\overline{\mathrm{SE}}_i$ represents the average SE of the i-th transceiver pair and $0 \le \mathrm{IF} \le 1$. A larger IF indicates better fairness, and perfect fairness is achieved when IF = 1, i.e., when all pairs have the same average SE. To take fairness into account in transmit power control, we use both SE and IF in Equations (3) and (4) to find the optimal transmit power of each transmitter. Specifically, we determine the transmit powers that maximize the sum SE over all transceiver pairs while achieving the maximum IF. The optimization problem is formulated as

$$\max_{P_1, \dots, P_N} \ \sum_{i=1}^{N} \mathrm{SE}_i \quad \text{s.t.} \quad \mathrm{IF} \ge \mathrm{IF}_{\mathrm{thr}}, \quad 0 \le P_i \le P_{\max}, \ \forall i, \tag{5}$$

where $\mathrm{IF}_{\mathrm{thr}}$ is the target IF, and $\mathrm{IF}_{\mathrm{thr}} = 1$ corresponds to perfect fairness. The optimization problem in Equation (5) is non-convex and therefore hard to solve efficiently.
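Assuming the IF in Equation (4) is Jain's fairness index computed on the per-pair average SE, it can be evaluated as below. The function name and the input layout (realizations along the first axis) are illustrative assumptions.

```python
import numpy as np


def fairness_index(se_samples):
    """Jain's fairness index of the per-pair average SE (assumed form of Eq. (4)).

    se_samples: array of shape (num_realizations, N) or (N,) with per-pair SE.
    The SE of each pair is first averaged over realizations (the SE-bar_i terms).
    """
    avg_se = np.mean(np.atleast_2d(np.asarray(se_samples, dtype=float)), axis=0)
    return avg_se.sum() ** 2 / (avg_se.size * np.sum(avg_se ** 2))
```

For example, equal average SE for all pairs gives IF = 1, while a single pair taking all of the rate gives the minimum value 1/N (0.25 for four pairs). An all-zero input is undefined (0/0).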

Proposed Convolutional Neural Network Structure and Learning Process
To enhance the SE performance, we apply the BN process to the CNN structure. BN normalizes each node in the CNN, adjusting the input or output values of each layer's activation function to an appropriate range. With BN, regularization techniques such as dropout can be replaced, and a higher learning rate can be used. Thus, BN can speed up learning and reduce the dependence on the initial weight values [16]. Figure 2 shows the proposed CNN structure with the BN process. As seen in Figure 2, the first step of the proposed scheme is to convert the channel powers to dB scale and normalize them; the resulting normalized channel powers are denoted $\hat{h}_{i,j}$ (Equation (6)). Next, the BN process in Figure 2 normalizes the data using the mean and standard deviation of each mini-batch. Figure 3 shows the convolutional operations without BN and with BN; as the figure illustrates, the output is normalized by adding two processes for BN to a basic convolutional operation. Using $\hat{h}_{i,j}$ in Equation (6), the output of the BN process for transmitter i and receiver j, $Y^{\mathrm{BN}}_{i,j}$, is obtained as

$$\tilde{h}_{i,j} = \frac{\hat{h}_{i,j} - \mu}{\sqrt{\sigma^{2} + \epsilon}}, \tag{7}$$

and

$$Y^{\mathrm{BN}}_{i,j} = \gamma_{L_k}\, \tilde{h}_{i,j} + \eta_{L_k}, \tag{8}$$

where μ, σ, and ε are the mini-batch mean, the mini-batch standard deviation, and a small constant that prevents the denominator from becoming zero, respectively. In particular, the trainable parameters $\gamma_{L_k}$ and $\eta_{L_k}$ determine the scale and shift of the normalized values, respectively, where $L_k$ is the index of layer k ∈ {1, 2, ..., 8}. The output of the BN process is the input of the rectified linear unit (ReLU) layer in Figure 2, whose output is

$$Y^{\mathrm{ReLU}}_{i,j} = \max\!\left(Y^{\mathrm{BN}}_{i,j},\, 0\right). \tag{9}$$

By performing the max(·, 0) operation, the ReLU layer removes negative values and provides nonlinearity to the CNN [19].
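A minimal NumPy sketch of the BN and ReLU steps in training mode is given below. Following the usual convention for convolutional layers, it assumes that statistics are computed per channel over the mini-batch and all N × N positions; the function names and the default ε are illustrative choices, not values from this paper. At inference time, BN implementations normally replace the mini-batch statistics with running averages collected during training, which this sketch omits.

```python
import numpy as np


def batch_norm(x, gamma, eta, eps=1e-5):
    """Training-mode BN for a mini-batch of feature maps.

    x has shape (batch, N, N, channels); the mean and variance of each channel
    are taken over the batch and all N x N positions. gamma (scale) and eta
    (shift) are the trainable parameters, one per channel or a scalar.
    """
    mu = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_tilde = (x - mu) / np.sqrt(var + eps)   # zero mean, (almost) unit variance
    return gamma * x_tilde + eta              # learned scale and shift


def relu(x):
    """Elementwise max(x, 0)."""
    return np.maximum(x, 0.0)
```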
The output of the ReLU layer enters the convolution layer, which uses a 3 × 3 weight matrix $W^{C}$ [19]. In this paper, the depth of the convolution layer is assumed to be eight, as in [13]. The output of the convolution layer is obtained as

$$Y^{\mathrm{conv}}_{i,j} = \sum_{m=1}^{3} \sum_{n=1}^{3} W^{C}_{m,n}\, Y^{\mathrm{ReLU}}_{i+m-2,\, j+n-2}, \tag{10}$$

where $W^{C}_{m,n}$ is the (m, n)-th element of $W^{C}$, and entries of $Y^{\mathrm{ReLU}}$ with indices outside {1, ..., N} are zero. The convolution layer uses a stride of 1 and zero padding, so the output has the same size as the input. A CNN is advantageous over a fully connected neural network when the input is two-dimensional, as is the N × N channel matrix here, because the convolution operates on two-dimensional neighborhoods of the input rather than on a flattened one-dimensional vector. Note that each of Equations (8)-(10) is applied to all of the normalized channel powers, i.e., $\hat{h}_{i,j}$ for i, j = 1, 2, ..., N. Denoting the output matrix of the convolution layer by $Y^{\mathrm{conv}}$, its size is N × N × 8.
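The "same-size" 3 × 3 convolution with stride 1 and zero padding can be sketched for a single input and output channel as follows. The function name is hypothetical, and, as in most deep learning libraries, the kernel is not flipped (strictly, this is cross-correlation).

```python
import numpy as np


def conv3x3_same(x, w):
    """3 x 3 convolution with stride 1 and one ring of zero padding.

    x: (N, N) input map, w: (3, 3) kernel. The output has the same size as x.
    The kernel is not flipped (cross-correlation, as in deep learning libraries).
    """
    n_rows, n_cols = x.shape
    xp = np.pad(x, 1)                      # zero padding on every side
    out = np.zeros((n_rows, n_cols))
    for m in range(3):
        for n in range(3):
            # out[i, j] += w[m, n] * x[i + m - 1, j + n - 1] (zero outside x)
            out += w[m, n] * xp[m:m + n_rows, n:n + n_cols]
    return out
```

With a depth of eight, each of the eight output channels would sum such terms over all input channels, each with its own kernel; the single-channel version keeps the indexing easy to check.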
Because the convolution part consists of eight layers, as shown in Figure 2, the process in Equations (8)-(10) is performed eight times in series. The output of the eight-layer convolution part then passes through the BN process again. Denoting the output of this BN process by $Y^{\mathrm{conv}}_{O}$, the N × N × 8 matrix $Y^{\mathrm{conv}}_{O}$ is reshaped into the 1 × 8N² vector $x^{\mathrm{FC}}$, which is the input of the fully connected (FC) part in Figure 2. Using the input vector $x^{\mathrm{FC}}$, the output of the FC part is

$$y^{\mathrm{FC}} = x^{\mathrm{FC}}\, W^{\mathrm{FC}} + b^{\mathrm{FC}}, \tag{11}$$

where $W^{\mathrm{FC}}$ and $b^{\mathrm{FC}}$ are the weight matrix and bias vector of the FC part, of sizes 8N² × N and 1 × N, respectively. Consequently, the output vector $y^{\mathrm{FC}}$ has size 1 × N. Finally, $y^{\mathrm{FC}}$ enters the sigmoid part in Figure 2, whose output is

$$y^{\mathrm{sig}}_{i} = \frac{1}{1 + e^{-y^{\mathrm{FC}}_{i}}}, \tag{12}$$

where $y^{\mathrm{sig}}_{i}$ lies between 0 and 1. It is therefore multiplied by $P_{\max}$ so that the transmit power lies between 0 and $P_{\max}$; i.e., the proposed transmit power of transmitter i is

$$P_i = P_{\max}\, y^{\mathrm{sig}}_{i}. \tag{13}$$

Using this CNN structure, we solve the optimization problem in Equation (5) with adaptive moment estimation (Adam) [20], an off-the-shelf stochastic gradient algorithm. The weights and biases $W^{\mathrm{FC}}$ and $b^{\mathrm{FC}}$ are initialized from a normal distribution. The deep learning optimization minimizes the loss function, defined using Equation (5) as

$$\mathcal{L} = -\lambda \sum_{i=1}^{N} \mathrm{SE}_i - (1 - \lambda)\, \mathrm{IF}, \tag{14}$$

where λ, with 0 ≤ λ ≤ 1, is the weight that determines whether maximum SE or perfect IF is prioritized in the CNN-based learning scheme.
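The output stage (flatten, FC, sigmoid, scaling by $P_{\max}$) and a λ-weighted loss can be sketched as a NumPy forward pass. Training would additionally need automatic differentiation and an Adam optimizer, which are omitted here. The function names, the row-major flattening order, and the loss form -(λ·ΣSE + (1 - λ)·IF) are assumptions consistent with the description of λ, not code from this paper.

```python
import numpy as np


def power_head(y_conv_o, W_fc, b_fc, p_max):
    """Flatten -> FC -> sigmoid -> scale by P_max.

    y_conv_o: (N, N, 8) output of the final BN process.
    W_fc: (8N^2, N) weight matrix, b_fc: (1, N) bias vector.
    Returns the N transmit powers, each in [0, p_max].
    """
    x_fc = y_conv_o.reshape(1, -1)          # 1 x 8N^2 input vector (row-major)
    y_fc = x_fc @ W_fc + b_fc               # 1 x N
    y_sig = 1.0 / (1.0 + np.exp(-y_fc))     # sigmoid, values in [0, 1]
    return p_max * y_sig.ravel()


def weighted_loss(se, fairness, lam):
    """Assumed form of the lambda-weighted loss: -(lam * sum(SE) + (1 - lam) * IF).

    lam = 1 ignores fairness; lam = 0 optimizes fairness only.
    """
    return -(lam * np.sum(se) + (1.0 - lam) * fairness)
```

With all-zero FC weights and biases, every sigmoid output is 0.5, so each transmitter starts at half of $P_{\max}$; the sigmoid guarantees the power constraint of Equation (5) without any explicit projection.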

[Figure 3 panels: Convolution without BN; Convolution with BN.]

In particular, when λ is close to 1, the transmit powers are determined without considering fairness, whereas when λ is close to 0, perfect fairness is predominantly pursued and SE maximization is largely neglected. To achieve the maximum IF, the trained model must attain IF = 1. We therefore identify the values of λ, denoted λ*, for which the trained model achieves the maximum IF, and choose the largest λ* so as to maximize the sum SE while achieving perfect fairness.
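The selection rule for λ* can be written as a small helper. This is a hypothetical sketch: the dictionary of IF values would come from training and evaluating one model per candidate λ, and the tolerance is our addition, since a computed IF rarely equals 1 exactly in floating point.

```python
def select_lambda(if_by_lambda, if_thr=1.0, tol=1e-6):
    """Return the largest lambda whose trained model meets IF >= if_thr - tol.

    if_by_lambda maps each candidate lambda to the IF measured for the model
    trained with it. Returns None if no candidate meets the threshold.
    """
    feasible = [lam for lam, if_val in if_by_lambda.items() if if_val >= if_thr - tol]
    return max(feasible) if feasible else None
```

Taking the largest feasible λ keeps as much weight as possible on the SE term while still meeting the fairness target.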
The proposed CNN-based learning scheme is trained according to the above procedure. The trained model then derives the transmit powers in real time, taking the channels as input. Note that the proposed scheme can provide appropriate transmit powers even for channel realizations not seen during training; i.e., the well-trained weights and biases of the CNN, including $W^{\mathrm{FC}}$ and $b^{\mathrm{FC}}$, can be reused for various channels.

Performance Evaluation
In this section, we present simulation results comparing the conventional WMMSE [15] and deep power control (DPC) [13] schemes with the proposed DPC scheme in terms of SE and IF. We also present IF results for various values of λ in the loss function in Equation (14) to find the maximum λ that achieves the maximum IF in the trained model.
For the simulations, we assume D = 40, N = 10, $P_{\max}$ = 43 dBm, $\beta = 10^{-3.453}$, α = 3.8, and W = 10 MHz [21]. In addition, 100,000 and 20,000 channel samples are generated for the training and test data, respectively. The Nakagami-m fading channel power between transmitter i and receiver j is generated as $|g_{i,j}|^2 = \sum_{n=1}^{m} (R_n^2 + I_n^2)$, where $R_n$ and $I_n$ are independent Gaussian random variables with zero mean and unit variance [17], and the channel samples of all links are generated independently. The batch size is 500 and the learning rate is 0.00005.

Figure 4 shows the average SE of the conventional WMMSE and DPC schemes and the proposed DPC scheme with λ = 1 for various area sizes in Rayleigh fading channels (i.e., m = 1). Here, the proposed scheme with λ = 1 focuses only on SE maximization without considering IF, and the conventional DPC scheme for SE maximization [13] is used. As shown in the figure, the proposed scheme achieves substantially better SE than the conventional WMMSE and DPC schemes for all area sizes. These results verify that the BN process applied in the proposed scheme is effective in attaining a significant SE improvement. The figure also shows that the SE of all power control schemes degrades as the area size increases, owing to path loss. Figure 5 shows the IF results of the conventional WMMSE and DPC schemes and the proposed DPC scheme with λ = 1 for various area sizes in Rayleigh fading channels, where fairness is considered in neither the conventional nor the proposed DPC scheme. In the figure, the conventional DPC scheme has the poorest IF, whereas the WMMSE scheme achieves perfect fairness, i.e., IF = 1. The proposed DPC scheme with λ = 1 obtains better IF than the conventional DPC scheme because of the BN process.
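The fading-power generation described above can be sketched as follows for integer m. The function name is hypothetical; the formula is exactly the stated sum of 2m squared standard Gaussian variables, so each sample is chi-square distributed with 2m degrees of freedom and has mean 2m. Some works divide by 2m to obtain unit-mean power; that normalization is not part of the stated formula and is not applied here.

```python
import numpy as np


def nakagami_power(m, size, rng):
    """Samples of |g|^2 = sum_{n=1}^{m} (R_n^2 + I_n^2) for integer m >= 1.

    R_n and I_n are i.i.d. zero-mean, unit-variance Gaussian variables, so each
    sample is chi-square with 2m degrees of freedom (mean 2m); m = 1 gives
    exponentially distributed power, i.e., Rayleigh fading.
    """
    shape = (m,) + tuple(size)
    r = rng.standard_normal(shape)
    i = rng.standard_normal(shape)
    return np.sum(r ** 2 + i ** 2, axis=0)
```

For instance, `nakagami_power(1, (100_000, 10, 10), np.random.default_rng())` yields independent fading powers for every link of a Rayleigh training set of the size used here; multiplying elementwise by each sample's path-loss matrix gives the channel powers of Equation (2).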
The figure also shows that the IF results of all schemes do not change with the area size. Notably, the proposed DPC scheme outperforms the conventional DPC scheme in terms of both SE and IF. However, the proposed scheme is inferior to the WMMSE scheme in terms of fairness and thus requires an appropriate λ in the loss function to achieve perfect fairness. Figure 6 shows the IF results of the proposed DPC scheme versus λ in the loss function in Equation (14) for Rayleigh fading channels when $\mathrm{IF}_{\mathrm{thr}} = 1$. As λ decreases, the IF approaches 1, and the maximum value of λ that achieves the maximum IF (i.e., IF = 1) is 0.0001. Finding this maximum value of λ is important because, when λ is close to 0, SE maximization is largely neglected, as described in Section 3. Conversely, the IF moves away from 1 as λ increases. Figures 7 and 8 show the SE and IF results, respectively, of the proposed DPC scheme with λ = 1 and λ = 0.0001 in Rayleigh fading channels. In Figure 7, the proposed DPC scheme with λ = 0.0001, which attains the maximum IF, has lower SE than the proposed DPC scheme with λ = 1 for all area sizes. However, in Figure 8, the proposed DPC scheme with λ = 0.0001 has higher IF than that with λ = 1 for all area sizes and achieves perfect fairness. Figures 7 and 8 thus show that the proposed DPC scheme with λ = 0.0001 improves fairness at the cost of SE.

Performance of the Proposed DPC for Different Channel Environments in the Absence of Cellular Users
The proposed DPC scheme is trained with the channel parameters set at the beginning of Section 4. In this section, we use the trained DPC scheme to obtain performance results for different channel environments, i.e., different path loss exponents and Nakagami factors. Figures 9-12 also present the optimal results. Figures 9 and 10 show the SE and IF results, respectively, of the conventional schemes and the proposed scheme with λ = 0.0001 for α = 5 in Rayleigh fading channels, where the proposed DPC scheme is trained with α = 3.8. Note that a larger path loss exponent α corresponds to a harsher wireless channel condition. In Figure 9, the conventional DPC scheme for SE maximization has worse SE than the WMMSE scheme, and the proposed DPC scheme provides better SE than both conventional schemes. In Figure 10, both the proposed DPC scheme and the WMMSE scheme achieve perfect fairness, consistent with the results in Figures 5 and 8, whereas the conventional DPC scheme has the worst IF, approximately 0.1. Figures 11 and 12 show the SE and IF results, respectively, of the conventional schemes and the proposed scheme with λ = 0.0001 for α = 3.8 in Nakagami-m fading channels with m = 3, where the proposed DPC scheme is trained in Rayleigh fading channels with α = 3.8. A larger Nakagami factor m means that the line-of-sight component is stronger relative to the non-line-of-sight component. In Figure 11, compared with the results for Rayleigh fading channels, the WMMSE scheme has slightly worse SE, but both DPC schemes have better SE. In Figure 12, the proposed DPC scheme and the WMMSE scheme still achieve perfect IF, whereas the conventional DPC scheme has the worst fairness.
The simulation results in Figures 9-12 verify that the proposed DPC scheme, trained under one channel condition, works effectively even in different channel environments. In addition, Figures 9 and 11 show a non-negligible performance gap between the proposed DPC scheme and the optimal solution, and this gap does not change with the area size or the channel environment. This suggests that the SE of the proposed DPC scheme could be further improved toward the optimal result.

Performance of the Proposed DPC for Different Channel Environments in the Presence of Cellular Users
In this section, we evaluate the impact of interference from cellular users on the SE of the proposed DPC scheme. Figures 13 and 14 show the SE results of the proposed scheme for α = 5 in Rayleigh fading channels and for α = 3.8 in Nakagami-m fading channels with m = 3, respectively, when the cellular users are located inside the D2D communication area and $P_c$ = 23 dBm. These figures show that the SE is significantly degraded by interference from the cellular users, even when only a single cellular user is present, and that the degradation becomes more severe as the D2D communication area becomes smaller, because the interference power from the cellular users becomes stronger. In addition, for U = 1, 2, and 5, the SE for Nakagami fading channels is worse than that for Rayleigh fading channels, because the interference power from the cellular users is stronger in Nakagami fading channels. Figures 15 and 16 show the SE results of the proposed scheme for α = 5 in Rayleigh fading channels and for α = 3.8 in Nakagami-m fading channels with m = 3, respectively, when the cellular users are located outside the D2D communication area and $P_c$ = 23 dBm, where every cellular user is at a distance of 1.5D from the center of the D2D communication area. In these figures, the SE degrades as U increases, and, as in Figures 13 and 14, the loss is larger for Nakagami fading channels than for Rayleigh fading channels. In Figure 15, however, the loss is relatively small when D = 40, because the cellular users are then farther from the D2D communication area than in the other simulated cases.
Note that perfect IF is achieved in all simulation cases; thus, the IF results are not shown in this section.

Computational Complexity of the Proposed DPC Scheme
Training the proposed DPC scheme takes a long time. However, once trained, the scheme can compute the transmit powers in real time faster than conventional iterative schemes such as WMMSE. Figure 17 shows the computational time required to obtain the transmit powers with the WMMSE and proposed DPC schemes versus the number of transceiver pairs, N. As N increases, the computational time of the WMMSE scheme increases, whereas that of the proposed DPC scheme increases only slightly.

Conclusions
In this paper, we proposed a CNN-based transmit power control scheme for network-assisted D2D WCSs, in which the BN process was introduced into the conventional CNN model to reduce the learning time and improve SE. In addition, the loss function for deep-learning optimization was defined to take IF into account, and we presented an appropriate value of λ in the loss function that achieves perfect IF while maximizing SE. Simulation results for SE, IF, and computational time were provided assuming independent wireless channels with path loss and Nakagami-m fading. The results verified that the proposed scheme with an appropriate λ outperforms the conventional DPC and WMMSE schemes in terms of SE, matches the perfect fairness of the WMMSE scheme, and far exceeds the fairness of the conventional DPC scheme. It was also shown that the proposed scheme, trained under one channel condition, achieves better SE than the conventional schemes and perfect IF even under different channel conditions. Moreover, the proposed learning scheme can significantly reduce the computational time, especially when the number of transceivers is large.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: