A Deep Learning Based Transmission Algorithm for Mobile Device-to-Device Networks

Abstract: Recently, device-to-device (D2D) communications have been attracting substantial attention because they can greatly improve coverage, spectral efficiency, and energy efficiency, compared to conventional cellular communications. They are also indispensable for the mobile caching network, which is an emerging technology for next-generation mobile networks. We investigate a cellular overlay D2D network where a dedicated radio resource is allocated for D2D communications to remove cross-interference with cellular communications and all D2D devices share the dedicated radio resource to improve the spectral efficiency. More specifically, we study a problem of radio resource management for D2D networks, which is one of the most challenging problems in D2D networks, and we also propose a new transmission algorithm for D2D networks based on deep learning with a convolutional neural network (CNN). A CNN is formulated to yield a binary vector indicating whether to allow each D2D pair to transmit data. In order to train the CNN and verify the trained CNN, we obtain data samples from a suboptimal algorithm. Our numerical results show that the accuracies of the proposed deep learning based transmission algorithm reach about 85%∼95% in spite of its simple structure due to the limitation in computing power.


Introduction
According to a recent study, it was predicted that the total amount of Internet traffic will increase threefold over five years from 2017 to 2022 and that mobile Internet traffic will increase sevenfold for the same time period [1]. More specifically, video traffic will be growing more steeply than other types of traffic. Video traffic, which accounted for about 75% of total Internet traffic in 2017, will account for about 82% in 2022. Another interesting study showed that the most popular 50 videos account for almost 80% of the total amount of views for YouTube [2]. Thus, the mobile caching network has been attracting much attention as a new approach to cope effectively with the explosively growing mobile Internet traffic [3][4][5][6][7].
A probabilistic caching scheme with a low complexity for minimizing the caching failure probability was proposed [3]. It was shown that the density of successful reception can be maximized by optimally placing files on caching servers according to varying channel conditions [4]. The concept of collaboration in mobile caching was proposed in [5,6]. The collaboration distance was optimized in [5], and a tradeoff between collaboration distance and interference was investigated in [6]. A joint non-convex problem for resource scheduling and power allocation in a wireless caching network was formulated, and an algorithm was designed based on two decomposed convex problems [7]. Contrary to conventional approaches such as multiple antennas and heterogeneous networks to improve the spectral efficiency of cellular networks, the mobile caching network can dramatically reduce the traffic load, especially for core and backhaul networks, and is based on D2D communication. D2D communication can shorten the distance between transmitters and receivers compared to conventional cellular communications. The shortened distance between end points can reduce the end-to-end latency and power consumption and can enhance data rates. Motivated by these potentials, many previous studies have investigated mobile D2D communications [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].
In this paper, we investigate the problem of radio resource management for D2D networks, which is one of the most challenging problems in such networks. We consider a cellular overlay D2D network where there is no mutual interference between D2D links and cellular links. To improve the spectral efficiency, all D2D devices are allowed to share the given radio spectrum. Thus, we need to choose an optimal set of D2D links to transmit data, considering the interference among D2D devices. Most previous studies adopted heuristic or mathematical approaches to resource management for D2D networks. These approaches incur their computational cost again at every scheduling decision, so the accumulated complexity can become tremendous. We propose a new algorithm based on deep learning, which has been widely used in various fields due to its potential. In particular, many studies have demonstrated that deep learning can be successfully exploited in the communication field [23][24][25][26][27][28]. We formulate a CNN to design an algorithm that chooses D2D links to transmit data. Even though the CNN based algorithm may incur non-negligible complexity, this cost arises only in the learning process, and no further training complexity is required in the scheduling process. In addition, the complexity of supervised deep learning is bounded and predictable compared to deep reinforcement learning (DRL). We also design a sub-optimal scheme to obtain the data samples required to train the designed CNN and verify the trained CNN. Ninety percent of the obtained data are used to train the designed CNN, and the remaining 10% are used to verify the trained CNN. An early terminating learning scheme with an adaptive learning rate is used to avoid over-fitting of the CNN with limited computing power. We analyze the performance of the CNN in terms of accuracy and average sum rate and compare them with those of the sub-optimal scheme.
Our numerical results confirm that the CNN has accuracies of 85%∼95%, which indicates that the CNN can yield 85%∼95% identical scheduling results to the sub-optimal scheme. The rest of this paper is organized as follows. A detailed discussion of relevant studies about D2D communications is provided in Section 2. In Section 3, our D2D network model and wireless channel model are described. In Section 4, a sub-optimal scheme to obtain data samples to train and verify our neural network is described and a deep learning based scheduling scheme is also proposed. The performance of the proposed scheme is analyzed in terms of accuracy and average sum rate in Section 5. Finally, the conclusions of this paper are drawn in Section 6.

Related Work
Interesting algorithms to select optimal communication modes for D2D devices were studied for cellular aided D2D networks [8][9][10]. It was assumed in [8] that mobile devices can select a communication mode among a mode using dedicated resources, a mode reusing cellular resources, and a conventional cellular mode, while a D2D mode using dedicated resources was not considered in [9]. An adaptive mode selection of potential D2D devices was formulated as a follower evolutionary game, and an evolutionary stable strategy was considered to be the solution [10]. The authors investigated how to manage or mitigate the cross-interference between cellular and D2D communications in cellular underlay D2D networks [11][12][13]. In cellular underlay D2D networks, D2D devices share the same radio resource with cellular mobile devices. The authors of [11] showed that D2D devices can avoid the harmful interference from cellular networks if they decode signaling messages broadcast by cellular base stations (BSs) and take advantage of the information for radio resource management embedded in the decoded signaling messages. A hybrid mechanism based on fractional frequency reuse (FFR) scheme and an almost blank sub-frame (ABS) scheme was proposed to reduce the interference caused by cellular networks to D2D networks [12], which is expected to be very effective especially in cell edge areas. In [13], it was shown that a game theory based formulation for resource block allocations can have at least one Nash equilibrium point, and a distributed power control scheme was also proposed to minimize the cross-interference between cellular and D2D networks.
In addition, various problems of radio resource management for D2D networks have been widely investigated [14][15][16][17][18][19][20][21]. A channel aware scheduling scheme for D2D links was proposed in [14]. It was shown in [15] that if a BS can acquire the perfect information of channel gains of all communication links, multi-user diversity (MUD) gain can optimize the performance of D2D networks without significantly deteriorating the performance of cellular networks. Contrary to [14,15], the authors of [16] attempted to reduce the interference caused by D2D networks to cellular networks, and they proposed a simple heuristic algorithm because of the tremendous complexity of an optimal algorithm. The interference relationships among D2D and cellular communication links were modeled as an interference graph and a joint resource allocation scheme, yielding a near-optimal solution with low computational complexity [17]. An efficient bandwidth allocation scheme to maximize the utility of both D2D users and cellular users was proposed in [18]. A distributed algorithm with low complexity was also proposed because the original allocation problem was NP-hard. The convergence of the proposed distributed algorithm was proven in a static environment. The authors of [19] investigated how to form spectrum sharing partners between D2D links and cellular links optimally in cellular underlay D2D networks considering the cross-interference. Centralized and distributed algorithms for cellular overlay D2D networks were proposed in [20]. It was shown that a distributed algorithm can significantly reduce the signaling overhead with a marginal loss in performance, compared to a centralized algorithm. A two phase resource sharing algorithm was designed in such a way that its computational complexity could be adapted according to the network condition. 
In the first phase, the initial set of candidate channels is adaptively determined, and Lagrangian dual decomposition is used to determine the optimal power for D2D devices, maximizing the network sum rate in the second phase [21]. Finally, the authors in [22] proposed a novel peer-to-peer (P2P) protocol based on D2D communication, which combines the conventional application layer P2P protocol and the routing and scheduling schemes in lower layers.
On the other hand, an overview of the state-of-the-art deep learning architectures and algorithms relevant to the network traffic control systems was provided [23]. Deep neural network (DNN) based channel estimation and signal detection in orthogonal frequency-division multiplexing (OFDM) was studied [24]. A DNN enabled millimeter wave massive multiple-input multiple-output framework for effective hybrid precoding was also proposed [25]. A radio resource allocation algorithm for cognitive satellite communications was proposed by leveraging multi-objective deep reinforcement learning (DRL) and artificial neural network ensembles [26]. An energy efficient DRL based algorithm for unmanned aerial vehicle (UAV) control was proposed [27]. Despite recent improvements, DNNs tend to be easily over-fitted, while DRL faces several challenges. For DRL, a policy must be inferred by trial-and-error interaction with the environment, and agents must deal with long range time dependencies, which is known as the credit assignment problem [28].

Network Model
We investigate a cellular overlay D2D communication network with $2N$ mobile devices, as illustrated in Figure 1. There is no mutual interference between D2D links and cellular links because a dedicated radio resource is allocated for D2D devices, while all D2D devices share the radio spectrum for higher spectral efficiency. If a mobile device wishes to receive data, it must be associated with another mobile device storing the data. The association process is beyond the scope of this paper. We have $N$ associations in Figure 1 because each mobile device is already associated with another mobile device. Although all $N$ pairs are allowed to share the radio spectrum, the overall performance of the network can be enhanced by optimally choosing D2D pairs among the $N$ pairs because of the interference among them. In this paper, we thus focus on how to choose an optimal set of D2D pairs among the $N$ pairs. $h_{ji}$ denotes the channel coefficient between transmitter $i$ and receiver $j$, where $1 \le i \le N$ and $1 \le j \le N$. We assume a semi-static Rayleigh fading channel model. Thus, $h_{ji} \sim \mathcal{CN}(0, 1)$ for all $i, j$; that is, each coefficient follows a complex Gaussian distribution with zero mean and unit variance. All channel coefficients are independent and identically distributed (i.i.d.). All channels between transmitters and receivers are assumed to be reciprocal because a time division duplexing (TDD) scheme is considered, and thus, $h_{ji} = h_{ij}$ for all $i, j$. Because the channels are semi-static, $h_{ji}$ is static during one frame period and varies randomly from frame to frame. If $i = j$, $h_{ji}$ denotes the gain of the channel for the $i$th pair's data transmission. Otherwise, it denotes the gain of an interference channel. The transmission power of all mobile devices is identical and denoted by $P_t$. We consider a greedy source model as the traffic demand model, where all transmitters have infinite packets to transmit.
A greedy source model is a simple packet data model that is effective in analyzing the maximum throughput or data rate without guaranteeing any quality-of-service. In this paper, we focus on verifying the feasibility of deep learning in D2D networks and investigating the accuracy of the deep learning based algorithms. More practical spatial and mobility models, such as stochastic geometry and Manhattan models, are not considered for simplicity; they can be considered in our future work once the feasibility of deep learning has been verified in D2D networks. If we define $T$ and $U$ as a transmission set consisting of the pairs of devices that are allowed to transmit data and a universal set consisting of all $N$ pairs, respectively, then $T \subseteq U = \{1, 2, \cdots, N\}$. For a given $T$, the received signal-to-interference plus noise power ratio (SINR) for the $i$th pair in $T$, $\gamma_i$, can be calculated as:
$$\gamma_i = \frac{P_t |h_{ii}|^2}{P_n + \sum_{j \in T, j \neq i} P_t |h_{ij}|^2}, \tag{1}$$
where $P_n$ denotes a Gaussian thermal noise power. If the numerator and denominator of (1) are both divided by $P_n$, (1) can be rewritten as:
$$\gamma_i = \frac{\Gamma |h_{ii}|^2}{1 + \sum_{j \in T, j \neq i} \Gamma |h_{ij}|^2}, \tag{2}$$
where $\Gamma$ is defined as $P_t / P_n$ and denotes the transmit signal-to-noise power ratio (SNR). Then, the sum rate for the given $T$, $R(T)$, can be easily calculated as:
$$R(T) = \sum_{i \in T} \log_2 (1 + \gamma_i). \tag{3}$$
We can find an optimal set of pairs, $T^*$, to maximize the sum rate as follows:
$$T^* = \arg\max_{T \subseteq U} \sum_{i \in T} \log_2 (1 + \gamma_i). \tag{4}$$
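As a concrete illustration of the channel model and of the SINR and sum-rate definitions above, the following Python sketch draws reciprocal Rayleigh coefficients and evaluates the sum rate of a candidate transmission set. It is a minimal sketch under stated assumptions and not the authors' code: the function names (`rayleigh_channels`, `sinr`, `sum_rate`) are hypothetical, pairs are indexed from zero, and `snr` is the linear (not dB) value of the transmit SNR.

```python
import math
import random


def rayleigh_channels(n, rng):
    """Draw an n x n matrix of CN(0, 1) coefficients with h[j][i] == h[i][j]."""
    h = [[0j] * n for _ in range(n)]
    sigma = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    for i in range(n):
        for j in range(i, n):
            coeff = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            h[i][j] = h[j][i] = coeff  # reciprocity under TDD
    return h


def sinr(h, pairs, i, snr):
    """SINR of pair i when every pair in `pairs` transmits (noise normalized to 1)."""
    interference = sum(snr * abs(h[i][j]) ** 2 for j in pairs if j != i)
    return snr * abs(h[i][i]) ** 2 / (1.0 + interference)


def sum_rate(h, pairs, snr):
    """Sum of log2(1 + SINR) over the transmitting pairs."""
    return sum(math.log2(1.0 + sinr(h, pairs, i, snr)) for i in pairs)


# Toy example: two pairs with unit gains on every link and an SNR of 0 dB.
toy = [[1, 1], [1, 1]]
print(sum_rate(toy, [0], 1.0))     # one pair alone
print(sum_rate(toy, [0, 1], 1.0))  # both pairs interfere with each other
```

In the toy example, admitting the second pair halves each SINR, which is the trade-off that the choice of the transmission set has to balance.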

Proposed Deep Learning Based Scheme
Supervised deep learning algorithms continuously train neural networks to minimize the error of the output of the neural networks and the target solution. An extensive amount of data is thus required for training. In this paper, we repeated the training of our neural network toward optimal solutions given in (4) and obtained data for the repeated trainings from extensive channel realizations. In addition, we should verify whether the algorithms have been over-fitted by using extra channel realizations different from those used for training. If the whole channel gains are available, we can find the optimal combination given in (4) based on the brute-force searching algorithm. However, the brute-force searching algorithm will cause a tremendous computational complexity, especially as N increases. Thus, we formulated a sub-optimal scheme as an alternative to obtain data samples required to train our deep learning algorithm.
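The cost of the brute-force search mentioned above can be seen in a short Python sketch that enumerates every non-empty subset of the N pairs and keeps the one with the largest sum rate. This is only an illustration under assumptions: `sum_rate` and `brute_force_schedule` are hypothetical helper names, indices are zero-based, noise power is normalized to one, and `snr` is the linear transmit SNR.

```python
import itertools
import math


def sum_rate(h, pairs, snr):
    """Sum of log2(1 + SINR) over the transmitting pairs (noise normalized to 1)."""
    total = 0.0
    for i in pairs:
        interference = sum(snr * abs(h[i][j]) ** 2 for j in pairs if j != i)
        total += math.log2(1.0 + snr * abs(h[i][i]) ** 2 / (1.0 + interference))
    return total


def brute_force_schedule(h, snr):
    """Exhaustively search all 2^N - 1 non-empty subsets for the best sum rate."""
    n = len(h)
    best_set, best_rate = (), 0.0
    for size in range(1, n + 1):
        for subset in itertools.combinations(range(n), size):
            rate = sum_rate(h, subset, snr)
            if rate > best_rate:  # ties keep the subset found first
                best_set, best_rate = subset, rate
    return best_set, best_rate


# Strong cross links: scheduling a single pair is better than scheduling both.
print(brute_force_schedule([[1, 10], [10, 1]], 1.0))
```

The number of evaluated subsets doubles with each additional pair, which is why an exhaustive search quickly becomes impractical for generating a large number of training samples.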

A Sub-Optimal Scheme to Obtain Data Samples for Training
The main concept of the sub-optimal scheme was proposed in our previous study [20]. It was shown that the sub-optimal scheme can achieve sum rates comparable to the brute-force searching scheme with an extremely low computational complexity. The sub-optimal scheme is described in Algorithm 1. In this paper, we used the sub-optimal scheme to obtain data samples instead of an optimal scheme merely because of the complexity of the optimal scheme. However, using the sub-optimal scheme does not cause any change in the proposed algorithm, nor does it limit the contributions of this paper. For given $N$ pairs, the brute-force scheme requires a maximum of $2^N$ iterations, while the sub-optimal scheme only requires a maximum of $N$ iterations. In the sub-optimal scheme, the $N$ pairs of mobile devices are sorted according to their channel gains in descending order, ignoring interference channels. The sorted pairs are re-indexed by $\hat{i}$, $1 \le \hat{i} \le N$. Thus, the sorted pairs satisfy:
$$|h_{\hat{1}\hat{1}}|^2 \ge |h_{\hat{2}\hat{2}}|^2 \ge \cdots \ge |h_{\hat{N}\hat{N}}|^2. \tag{5}$$
Algorithm 1 A sub-optimal algorithm to obtain training samples.
  Sort $|h_{ii}|^2$ in descending order
  Initialize: $T = \emptyset$ and $R_0 = 0$
  for $k = 1$ to $N$ do
    for $i = 1$ to $k$ do
      Calculate the SINR for the $\hat{i}$th pair, $\gamma_{\hat{i}}$
    end for
    Calculate $R_k = \sum_{\hat{i}=\hat{1}}^{\hat{k}} \log_2 (1 + \gamma_{\hat{i}})$
    if $R_{k-1} \le R_k$ then
      $T = T \cup \{\hat{k}\}$
    else
      Terminate
    end if
  end for
In the $k$th ($1 \le k \le N$) iteration, the sub-optimal scheme calculates $R_k = \sum_{\hat{i}=\hat{1}}^{\hat{k}} \log_2 (1 + \gamma_{\hat{i}})$, which is the sum rate when the $k$ pairs $\hat{1}$ through $\hat{k}$ transmit data simultaneously, and compares it with $R_{k-1}$. If the calculated sum rate is greater than or equal to the sum rate obtained in the previous iteration, i.e., $R_{k-1} \le R_k$, the pair $\hat{k}$ is allowed to transmit data and is added to $T$. Thus, $T$ is updated by $T = T \cup \{\hat{k}\}$, and the algorithm moves on to the next iteration. Otherwise, the algorithm is terminated.
Finally, once the algorithm either terminates early or completes all $N$ iterations, the pairs included in the transmission set $T$ are allowed to transmit data simultaneously.
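The greedy procedure described above can be sketched in Python as follows. The sketch assumes zero-based pair indices, noise power normalized to one, and a linear transmit SNR `snr`; the names `sum_rate` and `suboptimal_schedule` are hypothetical and are not taken from the paper or from [20].

```python
import math


def sum_rate(h, pairs, snr):
    """Sum of log2(1 + SINR) over the transmitting pairs (noise normalized to 1)."""
    total = 0.0
    for i in pairs:
        interference = sum(snr * abs(h[i][j]) ** 2 for j in pairs if j != i)
        total += math.log2(1.0 + snr * abs(h[i][i]) ** 2 / (1.0 + interference))
    return total


def suboptimal_schedule(h, snr):
    """Greedily admit pairs in descending order of direct channel gain."""
    n = len(h)
    # Sort pairs by |h_ii|^2 in descending order, ignoring interference channels.
    order = sorted(range(n), key=lambda i: abs(h[i][i]) ** 2, reverse=True)
    selected, previous_rate = [], 0.0
    for pair in order:
        candidate = selected + [pair]
        rate = sum_rate(h, candidate, snr)
        if rate >= previous_rate:  # admit the pair if the sum rate does not drop
            selected, previous_rate = candidate, rate
        else:
            break  # terminate at the first pair that lowers the sum rate
    # Binary vector in the original pair order, usable as a training label.
    decisions = [1 if i in selected else 0 for i in range(n)]
    return decisions, previous_rate


h = [[1, 10, 0], [10, 2, 0], [0, 0, 0.5]]
print(suboptimal_schedule(h, 1.0))
```

In this example the pair with the strongest direct channel is admitted first, the second candidate is rejected because of the strong cross link, and the procedure stops without examining the third pair, so at most N sum-rate evaluations are needed.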

A Proposed Scheme Based on Convolutional Neural Networks
The architecture of our CNN for deep learning is shown in Figure 2 and consists of two hidden convolution layers. The first convolution layer consists of 256 convolution filters with an $N \times N$ input matrix. The input matrix consists of channel coefficients and is denoted by $[h_{ji}]_{1 \le j \le N, 1 \le i \le N}$. Each convolution filter is initialized by the Xavier normal initializer [29]. The width and height of the output of a convolution filter can both be calculated by:
$$O = \frac{N - K + 2P}{S} + 1, \tag{6}$$
where $O$ is the width and height of the output of a convolution filter, $N$ is the input size, $K$ is the kernel (filter) size, $P$ is the number of paddings, and $S$ is the stride. In the first convolutional layer, the kernel size of each convolution filter is $5 \times 5$ with a stride of one, and no zeros are padded; thus, $K = 5$, $S = 1$, and $P = 0$. Based on (6), the height and width of the output of our first convolutional layer, $O_1$, is given by:
$$O_1 = \frac{N - 5 + 0}{1} + 1 = N - 4. \tag{7}$$
Each convolution filter is activated by a rectified linear unit (ReLU) function, which returns the element-wise $\max(x, 0)$ for a given input $x$. The output of each convolution filter is followed by a $2 \times 2$ max pooling layer, which performs down-sampling along the spatial dimensions by applying a max filter to non-overlapping sub-regions. For each region covered by the filter, the maximum value of that region is output. Thus, each element of the output matrix is the maximum value of a region in the original input. If $O_1$ is odd, the $2 \times 2$ max pool is applied only to the $(O_1 - 1) \times (O_1 - 1)$ matrix that excludes the last column and row. Otherwise, it is applied to the $O_1 \times O_1$ matrix. Thus, the width and height of the output of the $2 \times 2$ max pooling layer is given by $\lfloor O_1 / 2 \rfloor$, which can be calculated as:
$$\left\lfloor \frac{O_1}{2} \right\rfloor = \left\lfloor \frac{N - 4}{2} \right\rfloor \tag{8}$$
if $O_1$ is replaced by (7). The final output size of the first layer is $\lfloor (N - 4)/2 \rfloor \times \lfloor (N - 4)/2 \rfloor \times 256$.
The second convolution layer consists of 512 convolution filters. Each filter is also initialized by the Xavier normal initializer, with $K = 2$, $S = 1$, and $P = 0$. The input size of the second convolutional layer is the output size of the first max pooling layer, which is given in (8). If $N$ is replaced by $\lfloor O_1 / 2 \rfloor$ in (6), then the width and height of the output of each convolution filter in the second layer, $O_2$, are given as:
$$O_2 = \left\lfloor \frac{N - 4}{2} \right\rfloor - 2 + 1 = \left\lfloor \frac{N - 6}{2} \right\rfloor. \tag{9}$$
As in the first convolution layer, each convolution filter is activated by a ReLU function, and the output of each filter is down-sampled by a $2 \times 2$ max pooling layer. The width and height of the output of the $2 \times 2$ max pooling layer is given by $\lfloor O_2 / 2 \rfloor$, which can be calculated as:
$$\left\lfloor \frac{O_2}{2} \right\rfloor = \left\lfloor \frac{\lfloor (N - 6)/2 \rfloor}{2} \right\rfloor = \left\lfloor \frac{N - 6}{4} \right\rfloor, \tag{10}$$
where the second equality is valid because:
$$\left\lfloor \frac{\lfloor x \rfloor}{n} \right\rfloor = \left\lfloor \frac{x}{n} \right\rfloor \tag{11}$$
for any real number $x$ and any positive integer $n$ [30]. The output size of the second max pooling layer is $\lfloor (N - 6)/4 \rfloor \times \lfloor (N - 6)/4 \rfloor \times 512$. The outputs of the max pooling layer are dropped out with a probability $p = 0.2$ to prevent the neural network from over-fitting. Thus, randomly selected neurons are ignored with a probability of 0.2 during training.
The outputs are flattened to a one-dimensional array with the size of $512 \lfloor (N - 6)/4 \rfloor^2 \times 1$ and are reduced to $1000 \times 1$ by a fully connected layer, which has a ReLU as an activation function. The $1000 \times 1$ array goes through another drop-out layer with $p = 0.5$. It is then reduced to an $N \times 1$ array by another fully connected layer. Finally, the output of this fully connected layer is activated by the sigmoid function. The sigmoid function, defined by $S(x) = \frac{1}{1 + e^{-x}}$ for a given input $x$, can be interpreted as a probability in many applications because $0 < S(x) < 1$. The output activated by the sigmoid function is denoted by $P$, and the $i$th element of $P$, $P[i]$, can be interpreted as the probability that the $i$th D2D pair is allowed to transmit data. Our scheduler determines whether each D2D pair $i$ is allowed to transmit data based on the corresponding $P[i]$. Thus, the binary indicator $B[i]$, which specifies whether the $i$th D2D pair is allowed to transmit data, is obtained by applying a decision threshold to $P[i]$. Our proposed neural network is repeatedly trained to enhance the performance of scheduling by reducing the error between $B$ and the result obtained by the sub-optimal scheme.
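The layer sizes derived above can be checked numerically with a small Python sketch. It only traces tensor shapes and does not build or train a network; the helper names (`conv_out`, `pool_out`, `cnn_shapes`) are hypothetical, and integer division is assumed for the pooling step, which matches dropping the last row and column when the input size is odd.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output width/height of a convolution: (N - K + 2P) / S + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def pool_out(size, window=2):
    """Output width/height of non-overlapping max pooling."""
    return size // window


def cnn_shapes(n):
    """Trace the shapes through the two convolution blocks and the dense layers."""
    o1 = conv_out(n, kernel=5)   # first convolution, 256 filters of size 5 x 5
    p1 = pool_out(o1)            # first 2 x 2 max pooling
    o2 = conv_out(p1, kernel=2)  # second convolution, 512 filters of size 2 x 2
    p2 = pool_out(o2)            # second 2 x 2 max pooling
    return {
        "conv1": (o1, o1, 256),
        "pool1": (p1, p1, 256),
        "conv2": (o2, o2, 512),
        "pool2": (p2, p2, 512),
        "flatten": 512 * p2 * p2,
        "dense1": 1000,
        "output": n,
    }


for n in (10, 20):
    print(n, cnn_shapes(n))
```

For N = 10 the second pooling layer yields a single 1 × 1 map per filter, so the flattened vector has 512 elements; for N = 20 it has 512 · 3 · 3 = 4608 elements.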

Numerical Results
In this section, we analyze the performance of the proposed CNN based scheme by using Python and TensorFlow. We obtained 100,000 data samples from the sub-optimal scheme, of which 90,000 samples were used to train the neural network proposed in Section 4, and the remaining 10,000 samples were used to verify whether the trained neural network was well fitted by testing its scheduling accuracy. The batch size was set to 100, and the maximum number of epochs was 100. We thus needed 900 iterations for each epoch to train our neural network. Over-fitting is always a challenging problem for neural networks. Although large learning rates increase the learning speed of neural networks, they can easily cause over-fitting. On the contrary, small learning rates that can prevent neural networks from over-fitting slow down the learning, and tremendous computing power is thus required to train neural networks. In this paper, we used an early terminating learning scheme with an adaptively decreasing learning rate. The learning rate for the $i$th epoch, $r(i)$, is given by (13), where $r_{\mathrm{init}}$ and $d$ denote an initial learning rate and a non-negative number that controls the decaying speed, respectively. If $d = 0$, the learning rate is constant for all epochs. Figure 3 shows the learning rates given in (13) for $r_{\mathrm{init}} = 10^{-3}$ and $d \in \{0.1, 0.2, \cdots, 1.0\}$. We began to train the neural network with $r_{\mathrm{init}}$, which was relatively large, to speed up the training. We then decreased the learning rate gradually as $i$ increased to prevent over-fitting. Thus, $r(i)$ in Figure 3 decreases as $i$ increases, and it decreases more sharply as $d$ increases. In this paper, we used $r_{\mathrm{init}} = 10^{-3}$ and $d = 1$ for our neural network. In addition, our training procedure was automatically terminated if there was no improvement for three epochs, which reduced the training time.
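The early terminating rule described above (stop when no improvement is seen for three epochs) can be sketched in plain Python. This is a simplified illustration, not the training code used for the reported results: the function name `early_stopping_epoch` is hypothetical, and it is an assumption that the monitored quantity is an accuracy where a strictly larger value counts as an improvement.

```python
def early_stopping_epoch(epoch_accuracies, patience=3):
    """Return (stopping epoch, best accuracy) for a sequence of per-epoch accuracies."""
    best, stale = float("-inf"), 0
    for epoch, accuracy in enumerate(epoch_accuracies, start=1):
        if accuracy > best:
            best, stale = accuracy, 0  # improvement: reset the counter
        else:
            stale += 1                 # no improvement in this epoch
            if stale >= patience:
                return epoch, best     # terminate training early
    return len(epoch_accuracies), best


# 90,000 training samples with a batch size of 100 give 900 iterations per epoch.
iterations_per_epoch = 90_000 // 100
print(iterations_per_epoch)

history = [0.80, 0.85, 0.86, 0.86, 0.85, 0.86, 0.90]
print(early_stopping_epoch(history))
```

In the example, training stops at the sixth epoch because the accuracy has not exceeded 0.86 for three consecutive epochs, so the later value of 0.90 is never reached; this is the usual trade-off of a short patience window.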
Figure 4 shows the accuracy of the proposed neural network. SNR was set to 0 dB, 10 dB, or 20 dB, and N was set to 10 or 20. The accuracy was measured by comparing B obtained from the neural network with that of the sub-optimal scheme. It was clearly shown that the accuracies were enhanced as the number of epochs for training increased regardless of SNR and N. The trainings were terminated at different epochs due to the early terminating learning scheme. For SNR = 0 dB, when N = 10 and N = 20, the trainings were terminated at the 37th and 35th epochs, respectively. For SNR = 10 dB, when N = 10 and N = 20, the trainings were terminated at the 20th and 26th epochs, respectively. For SNR = 20 dB, when N = 10 and N = 20, the trainings were terminated at the 29th and 26th epochs, respectively. It was shown that the accuracy of the proposed neural network based scheduling improved as N or SNR increased. There were many more training data than test data. Thus, it took more time to stabilize the accuracy for the training data. It was shown that as the epoch increased, the accuracy for the training data became higher than for the test data. It was also confirmed that no over-fitting was observed.   Figure 5 shows the average sum rates for both the training samples and test samples, obtained by the proposed neural network and the sub-optimal scheme, respectively. SNR ∈ {0 dB, 4 dB, · · · , 20 dB}, and N was set to 10 or 20. As shown in Figure 2, the number of convolution layers, filters per layer, and filter size that were used in this paper were all restricted due to the limitation in computing power. Thus, all average sum rates of the neural network were lower than those of the sub-optimal scheme for both the training samples and test samples, regardless of SNR and N. Fortunately, however, the difference of average sum rates between the neural network and the sub-optimal scheme decreased as SNR increased. 
For the neural network, no significant difference of the average sum rates between the training samples and the test samples was observed, which showed that over-fitting was efficiently prevented thanks to the learning scheme with adaptive learning rates.
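The accuracy reported in this section compares the binary decisions of the neural network with those of the sub-optimal scheme. One way to compute such a measure is sketched below in Python. Two points are assumptions of this sketch rather than statements from the paper: a decision threshold of 0.5 on the sigmoid outputs, and an element-wise accuracy counted over all pairs and samples. The names `to_binary` and `decision_accuracy` are hypothetical.

```python
def to_binary(probabilities, threshold=0.5):
    """Map sigmoid outputs P[i] to decisions B[i] (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def decision_accuracy(predicted, target):
    """Fraction of per-pair decisions that match the target decisions."""
    matches = sum(
        int(b == t)
        for sample_pred, sample_target in zip(predicted, target)
        for b, t in zip(sample_pred, sample_target)
    )
    total = sum(len(sample_target) for sample_target in target)
    return matches / total


# Two samples with N = 3 pairs each; the targets come from the sub-optimal scheme.
predicted = [to_binary([0.9, 0.2, 0.7]), to_binary([0.4, 0.6, 0.1])]
target = [[1, 0, 0], [0, 1, 0]]
print(decision_accuracy(predicted, target))
```

Here five of the six decisions agree with the target. A stricter variant would count a sample as correct only when the whole binary vector matches.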

Conclusions
In this paper, we investigated D2D communication networks, which are attractive for offloading mobile Internet traffic from core networks and can significantly enhance the quality of communications and spectral efficiency by reducing end-to-end communication ranges between transmitters and receivers, compared to mobile cellular communication networks. The performance of D2D communication networks was closely related to how transmissions of D2D pairs are scheduled. In this paper, we adopted a new approach to schedule transmissions in D2D communication networks efficiently using supervised learning based on a CNN. The CNN consisted of two convolution layers and a fully connected layer. We used a sub-optimal scheme instead of an optimal scheme to obtain samples for supervised learning because an optimal scheme achieving the maximal performance requires a tremendous computational complexity. Ninety percent of the obtained samples were used to train the neural network to achieve the same scheduling results as the sub-optimal scheme, while the remaining 10% of the obtained samples were used to test whether the trained neural network was over-fitted. To overcome our limitation in computing power, we adopted an early terminating learning scheme with an adaptive learning rate where the training procedure was automatically terminated if no improvement was observed for three epochs, and a learning rate that began with quite a large value exponentially decreased as the epoch increased. Our extensive numerical results showed that the neural network could yield about 85%∼95% accuracies, which indicated that 85%∼95% of scheduling decisions from the neural network were identical to the scheduling decisions from the sub-optimal scheme, which was the target algorithm. 
Especially when SNR = 20 dB and N = 20, the accuracy of the neural network approached about 97%, and the average sum rate of the trained neural network was also about 97% of the sub-optimal scheme for both the training samples and the test samples.