Throughput Maximization Using Deep Complex Networks for Industrial Internet of Things

The high-density Industrial Internet of Things must support dense device access and massive data transmission. This requires multiple-input multiple-output (MIMO) cognitive radio systems to sustain high throughput. In such a system, spectral efficiency (SE) optimization based on dynamic power allocation is an effective way to enhance network throughput, because variations in channel quality significantly affect SE. Deep learning methods have demonstrated the ability to efficiently handle the non-convexity of resource allocation problems caused by multipath propagation and inter-user interference. However, current real-valued deep-learning-based power allocation methods fail to exploit the representational capacity of complex-valued data, because they split the complex-valued channel data into two separate parts: the real and the imaginary components. In this paper, we propose a complex-valued power allocation network (AttCVNN) with cross-channel and in-channel attention mechanisms to improve model performance. The former captures the relationship between the cognitive users and the primary user, i.e., inter-network users, while the latter captures the relationships among the cognitive users, i.e., intra-network users. Comparison experiments indicate that the proposed AttCVNN notably outperforms the equal power allocation method (EPM) as well as the real-valued and complex-valued fully connected networks (FNN and CVFNN), and that it converges faster during training than the real-valued attention-based convolutional neural network (AttCNN).


Introduction
The high-density Industrial Internet of Things [1][2][3] must support the access of many devices and massive data transmission, especially in fields such as augmented reality and wide-area connectivity for fleet maintenance [4,5]. This requires multi-antenna technology together with network optimization strategies such as radio resource management. Massive multiple-input multiple-output (MIMO) technology enables users to be multiplexed in the spatial domain by transmitting their signals as beams. However, reflections in the wireless channel cause inter-user interference, which turns the resource allocation problem into a non-convex formulation that is hard to solve. (Non-convexity here means that the objective function has many local maxima, so finding the global optimum generally requires an exhaustive search. In such cases, systematic mathematical approaches such as the interior point method [6] are computationally too expensive for real-time communications.) In such a system, the quality of the power allocation plan significantly affects the spectral efficiency (SE). (SE is the data rate normalized by the system bandwidth, i.e., how many bits per second can be transmitted in 1 Hz of bandwidth; its upper limit is given by the Shannon capacity.) This motivates us to design a highly efficient power allocation plan that optimizes the SE and thereby improves network throughput. Moreover, as the network scale grows and radio resources expand, improving SE and fairness becomes a crucial requirement for maintaining high throughput and low access latency.
Cognitive radio (CR) combined with multiple-input multiple-output (MIMO) systems is a promising candidate for the industrial domain [7], since CR aims to minimize conflict and interference between heterogeneous users. This enables their peaceful coexistence as well as higher area throughput. Furthermore, in the industrial domain, the regulatory authorities of some countries manage and coordinate the coexistence of heterogeneous industrial networks manually. For example, the German Federal Network Agency (BNetzA) has designated the 3.7–3.8 GHz band for industrial wireless networks [8] and imposes strict application procedures for granting licenses to stakeholders. Such coexistence management could be automated with CR technology. Therefore, the combination of massive MIMO and CR is well suited to meeting the requirements of high throughput and massive connectivity.
Regarding power optimization, systematic mathematical approaches such as interior point methods [6] are computationally expensive, because every centering iteration requires computing a Newton step. Moreover, the solution quality of these methods depends strongly on the initial guess within the domain of the objective function. Heuristic algorithms are also widely used for such problems. Reference [9] used a modified lion algorithm (LA) for power allocation, and the ant lion optimizer (ALO) employed in [10] achieved good performance in fault location for power system state estimation. However, their many iterations impose a heavy computational burden. In current real implementations, existing massive MIMO techniques (e.g., those specified in [11]) sidestep the non-convexity by assigning power equally among users, which is clearly suboptimal but time-efficient.
Recently, machine learning has become a popular research direction for addressing wireless and networking problems [12], such as deep reinforcement learning for traffic puncturing [13], adversarial networks for adaptive antenna diagram generation [14], energy harvesting tactics [15], and mmWave channel estimation [16], among others. For SE optimization, Reference [17] implemented a deep neural network (DNN). The authors of [7,18] used a fully connected neural network (FNN) to estimate the power allocation that maximizes SE, and Lee et al. [19] proposed a convolutional neural network for power control. However, their method cannot strictly enforce the constraints, and the FNN suffers from unfair power allocation. Hence, Sun et al. [20] proposed an attention-based deep convolutional neural network, which also has lower time and storage complexity. However, all of these methods use real-valued neural networks to process complex-valued channel data, generally treating each complex-valued input as two separate real-valued parts. Consequently, they do not fully exploit the representational capacity of complex-valued data. Furthermore, real-valued neural networks handle non-circular complex-valued datasets poorly, providing lower accuracy and more overfitting than their complex-valued counterparts [21]. (In signal processing, complex-valued channel data are commonly assumed to be circular; this is a stochastic simplification that does not always hold in reality.)
Complex-valued neural networks (CVNNs) can address this problem. Chiheb et al. [22] proposed several key components for complex-valued deep neural networks. Reference [23] proposed complex non-parametric activation functions for CVNNs. Reference [21] implemented a TensorFlow-based Python library that enables the training and deployment of CVNNs. Yihong et al. [24] generalized meta-learning and the attention mechanism to the complex domain for signal recognition. Reference [25] proposed a sparse CVNN to acquire downlink channel state information in frequency division duplexing massive MIMO systems.
To the best of our knowledge, no existing work has applied complex-valued neural networks to power allocation for maximizing SE. Therefore, this paper proposes a complex-valued power allocation network with a complex attention mechanism (AttCVNN) for this task. Note that this contribution focuses on the neural network design, not on beamforming. Our contributions are summarized as follows:

1. We propose a complex-valued convolutional neural network with a complex attention mechanism (AttCVNN) to perform per-antenna power allocation in massive MIMO systems.

2. We implement two complex-valued attention mechanisms in our model: the complex cross-channel attention network, which captures the relationship between the cognitive users and the primary user, and the complex in-channel attention network, which captures the relationships among the cognitive users.

3. We implement four power allocation benchmarks to demonstrate the superiority of our model: the equal power allocation method (EPM), the real-valued fully connected network (FNN) [7], the complex-valued fully connected network (CVFNN), and the real-valued attention-based convolutional network (AttCNN) [20].

System Model
We assume the system model illustrated in Figure 1, in which a cognitive radio base station (CB) with $N$ antennas coexists with a single-antenna primary radio base station (PB). The CB communicates with $K$ cognitive users (CUs) via $\mathbf{h}_k \in \mathbb{C}^{1\times N}$, $k \in [1, K]$, and interferes with the primary user (PU) via $\mathbf{h}_0 \in \mathbb{C}^{1\times N}$. The PB communicates with a single PU via $g_0$ and interferes with CU $k$ through $g_k$, $k \in [1, K]$. Based on this model, our goal is to optimize the SE of the CB through a low-complexity power assignment design, which is crucial for massive connectivity applications. We formulate the optimization problem as maximizing the sum of the SEs of all CUs subject to two constraints. C1 keeps the sum power consumed by the CUs within the power budget of the CB, $P_T$. C2 keeps the actual interference at the PU, $I_{CB}$, below the interference limit $I_{th}$. $P_{PB}$ denotes the power budget of the PB. The SE optimization problem is then formulated as follows.
where $\sigma^2$ denotes the variance of the additive white Gaussian noise (i.e., the noise power), $\|g_k\|^2 P_{PB}$ is the interference from the PB to CU $k$, and $\|\cdot\|$ denotes the 2-norm. $\mathbf{P} \in \mathbb{R}^{K\times N}$ is the power allocation solution, which collects the power of the $K$ CUs distributed spatially over the $N$ transmit antennas.
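To make the roles of $\mathbf{P}$, $\sigma^2$, $\|g_k\|^2 P_{PB}$, C1, and C2 concrete, the following Python sketch evaluates a sum SE and checks both constraints. The SINR it uses is a simplified, hypothetical model chosen only for illustration: CU $k$ receives its own allocation through the gains $|h_{k,i}|^2$, the power allocated to the other CUs leaks to CU $k$ through the same gains, and the PU interference is $\sum_{k,i} |h_{0,i}|^2 P_{k,i}$. It is not necessarily the exact form of the problem in (1), and the function names are illustrative.

```python
import math

def _received_power(gains, row):
    """Power delivered through per-antenna gains for one allocation row."""
    return sum(a * p for a, p in zip(gains, row))

def sum_spectral_efficiency(P, H, g, P_PB, noise_var):
    """Sum SE in bps/Hz under a simplified, hypothetical SINR model.

    P[k][i]: power of CU k on CB antenna i (real, non-negative)
    H[k][i]: complex channel from CB antenna i to CU k (the row h_k)
    g[k]:    complex channel from the PB to CU k
    """
    K = len(P)
    total = 0.0
    for k in range(K):
        gains = [abs(h) ** 2 for h in H[k]]
        signal = _received_power(gains, P[k])
        # Assumption: power allocated to the other CUs leaks to CU k.
        inter_user = sum(_received_power(gains, P[j]) for j in range(K) if j != k)
        denom = inter_user + abs(g[k]) ** 2 * P_PB + noise_var
        total += math.log2(1.0 + signal / denom)
    return total

def check_constraints(P, h0, P_T, I_th):
    """Return (C1 satisfied, C2 satisfied) for the allocation P."""
    c1 = sum(sum(row) for row in P) <= P_T          # sum-power budget of the CB
    gains0 = [abs(h) ** 2 for h in h0]
    i_cb = sum(_received_power(gains0, row) for row in P)  # interference at the PU
    return c1, i_cb <= I_th

P = [[0.5, 0.5]]  # one CU, two antennas
print(sum_spectral_efficiency(P, H=[[1 + 0j, 1j]], g=[0j], P_PB=1.0, noise_var=1.0))  # 1.0
print(check_constraints(P, h0=[1j, 1 + 0j], P_T=1.0, I_th=0.5))  # (True, False)
```

In the usage example, the CU receives unit power against unit noise, so its SE is $\log_2 2 = 1$ bps/Hz, while the same allocation violates a PU interference limit of 0.5.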

Mathematical Basis for Complex-Valued Network
Unlike a real-valued neural network, a complex-valued neural network must be able to process complex-valued inputs. It therefore contains complex-valued layers, such as complex dense, complex convolution, complex dropout, and complex batch normalization layers, and it must also support complex-valued activation functions.

Complex Convolution
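Consider a complex-valued convolution layer with a complex-valued kernel $\mathbf{K} = \mathbf{K}_r + j\mathbf{K}_i$ and a complex-valued input matrix $\mathbf{X} = \mathbf{X}_r + j\mathbf{X}_i$. The complex convolution of the two can be defined as

$$\mathbf{Y}_{out} = \mathbf{K} * \mathbf{X} = (\mathbf{K}_r * \mathbf{X}_r - \mathbf{K}_i * \mathbf{X}_i) + j(\mathbf{K}_r * \mathbf{X}_i + \mathbf{K}_i * \mathbf{X}_r),$$

where $\mathbf{Y}_{out}$ denotes the output matrix, $\mathbf{K}_r$, $\mathbf{K}_i$, $\mathbf{X}_r$, and $\mathbf{X}_i$ are real-valued matrices, and $*$ denotes real-valued convolution.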
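Because convolution is bilinear, a complex convolution can be assembled from four real convolutions: $(\mathbf{K}_r * \mathbf{X}_r - \mathbf{K}_i * \mathbf{X}_i) + j(\mathbf{K}_r * \mathbf{X}_i + \mathbf{K}_i * \mathbf{X}_r)$. The following pure-Python sketch illustrates this in 1D and checks the result against native complex arithmetic. It uses full linear convolution for simplicity. Deep learning frameworks usually implement cross-correlation with padding and strides instead, but the same four-term decomposition applies. The helper names are illustrative.

```python
def conv1d(a, b):
    """Full 1D linear convolution; works for real or complex sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def complex_conv1d(K_r, K_i, X_r, X_i):
    """Real and imaginary parts of K * X using only real convolutions."""
    rr, ii = conv1d(K_r, X_r), conv1d(K_i, X_i)
    ri, ir = conv1d(K_r, X_i), conv1d(K_i, X_r)
    Y_r = [a - b for a, b in zip(rr, ii)]  # K_r*X_r - K_i*X_i
    Y_i = [a + b for a, b in zip(ri, ir)]  # K_r*X_i + K_i*X_r
    return Y_r, Y_i

K = [1 + 2j, 3 - 1j]
X = [2 - 1j, 1j, 1 + 1j]
Y_r, Y_i = complex_conv1d([k.real for k in K], [k.imag for k in K],
                          [x.real for x in X], [x.imag for x in X])
direct = conv1d(K, X)  # native complex arithmetic, for comparison
print(all(abs(complex(r, i) - d) < 1e-12 for r, i, d in zip(Y_r, Y_i, direct)))  # True
```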

Complex Dense
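For a complex-valued dense layer with complex-valued weight matrix $\mathbf{W} = \mathbf{W}_r + j\mathbf{W}_i$ and complex-valued bias vector $\mathbf{b} = \mathbf{b}_r + j\mathbf{b}_i$, the output vector $\mathbf{y}_{out}$ can be calculated as

$$\mathbf{y}_{out} = \mathbf{W}\mathbf{x} + \mathbf{b} = (\mathbf{W}_r\mathbf{x}_r - \mathbf{W}_i\mathbf{x}_i + \mathbf{b}_r) + j(\mathbf{W}_r\mathbf{x}_i + \mathbf{W}_i\mathbf{x}_r + \mathbf{b}_i),$$

where $\mathbf{x} = \mathbf{x}_r + j\mathbf{x}_i$ denotes the input vector.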
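A complex dense layer can be evaluated with real arithmetic only, using $\mathbf{y}_{out} = (\mathbf{W}_r\mathbf{x}_r - \mathbf{W}_i\mathbf{x}_i + \mathbf{b}_r) + j(\mathbf{W}_r\mathbf{x}_i + \mathbf{W}_i\mathbf{x}_r + \mathbf{b}_i)$. This is how a complex layer can be emulated on real-valued hardware. The following is a minimal sketch with nested lists; the names and the small example are illustrative.

```python
def matvec(M, v):
    """Real matrix-vector product with nested lists."""
    return [sum(m * e for m, e in zip(row, v)) for row in M]

def complex_dense(W_r, W_i, b_r, b_i, x_r, x_i):
    """y = W x + b for complex W, b, x, computed with real arithmetic only."""
    y_r = [p - q + bias for p, q, bias in zip(matvec(W_r, x_r), matvec(W_i, x_i), b_r)]
    y_i = [p + q + bias for p, q, bias in zip(matvec(W_r, x_i), matvec(W_i, x_r), b_i)]
    return y_r, y_i

W = [[1 + 1j, 2 - 1j], [1j, 3 + 0j]]
b = [1 - 1j, 2j]
x = [1 + 2j, -1 + 1j]
y_r, y_i = complex_dense(
    [[w.real for w in row] for row in W], [[w.imag for w in row] for row in W],
    [e.real for e in b], [e.imag for e in b],
    [e.real for e in x], [e.imag for e in x],
)
print(y_r, y_i)  # [-1.0, -5.0] [5.0, 6.0]
```

The result matches the direct complex computation $\mathbf{W}\mathbf{x} + \mathbf{b} = [-1 + 5j,\ -5 + 6j]$.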

Complex-Valued Activation Functions
A complex-valued activation function is needed to apply a nonlinear transformation to complex tensors. Many such functions have been proposed. For a complex variable $z = x + jy$, they fall into two types: Type A processes the real part and the imaginary part separately, while Type B operates in the magnitude and phase domain.
The following complex-valued activation functions are used in our network: CReLU, RSigmoid, and RSoftmax. With $z = x + jy$:

• CReLU applies ReLU to the real and the imaginary parts of $z$ separately: $\mathrm{CReLU}(z) = \mathrm{ReLU}(x) + j\,\mathrm{ReLU}(y)$;
• RSigmoid applies Sigmoid to the magnitude of $z$: $\mathrm{RSigmoid}(z) = 1/(1 + e^{-|z|})$, where $|z|$ denotes the magnitude of $z$;
• RSoftmax applies Softmax to the magnitudes of the elements $z_i$ of a vector $\mathbf{z}$: $\mathrm{RSoftmax}(\mathbf{z})_i = e^{|z_i|} / \sum_m e^{|z_m|}$.

Note that CReLU outputs a complex number, whereas RSigmoid and RSoftmax produce real-valued outputs, because in our model the latter two generate the real-valued output power $p$. As shown in Section 3.4, a complex-valued activation function does not need to satisfy the Cauchy–Riemann equations, so a complex-valued neural network that uses the above activation functions can be trained properly in the complex domain.
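The following Python sketch shows the three activations as described: CReLU acts on the real and imaginary parts separately (Type A), while RSigmoid and RSoftmax act on magnitudes and return real values. Subtracting the maximum inside RSoftmax is a standard numerical-stability trick and does not change the result. The function names mirror the text, but the code is an illustrative reimplementation, not the authors' code.

```python
import math

def crelu(z):
    """Type A: ReLU applied to the real and imaginary parts separately."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

def rsigmoid(z):
    """Sigmoid of |z|; since |z| >= 0, the real output lies in [0.5, 1)."""
    return 1.0 / (1.0 + math.exp(-abs(z)))

def rsoftmax(zs):
    """Softmax over the magnitudes |z_i|; real weights that sum to 1."""
    mags = [abs(z) for z in zs]
    top = max(mags)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in mags]
    total = sum(exps)
    return [e / total for e in exps]

print(crelu(-1 + 2j))          # 2j
print(rsigmoid(0j))            # 0.5
print(rsoftmax([3 + 4j, -5]))  # [0.5, 0.5]
```

Both inputs in the last call have magnitude 5, so RSoftmax assigns them equal weight even though their phases differ. The phase is discarded by design.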

Complex Backpropagation
Before backpropagation, a loss function must be defined so that the gradient with respect to each network parameter can be computed. Although the loss function takes complex numbers as input, its output must be real-valued, because complex numbers are not ordered and therefore cannot be compared or minimized. However, a real-valued function of a complex variable cannot be holomorphic unless it is constant. The ordinary complex derivative therefore does not exist, and another way of differentiating the loss is needed. Using Wirtinger calculus [26], we can compute complex gradients of non-holomorphic functions.
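The main idea is to regard a complex function $f(z)$ as a function of both $z$ and $z^*$, denoted as $f(z, z^*)$, where $z^* = x - jy$ is the complex conjugate of $z = x + jy$. If $f$ is real-differentiable, then $f(z, z^*)$ is analytic with respect to $z$ when $z^*$ is treated as a constant, and vice versa [27]. Thus, we can define the following partial derivatives:

$$\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - j\frac{\partial f}{\partial y}\right), \qquad \frac{\partial f}{\partial z^*} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + j\frac{\partial f}{\partial y}\right).$$

The complex gradient of $f$ can be defined through these two partial derivatives [28]. For a real-valued $f$, $(\partial f/\partial z)^* = \partial f/\partial z^*$, and

$$\nabla_z f = \frac{\partial f}{\partial x} + j\frac{\partial f}{\partial y} = 2\frac{\partial f}{\partial z^*}.$$

For a real-valued loss function $\mathcal{L}$ composed with another complex function $g(z) = r(z) + js(z)$, where $r$ and $s$ are real-valued, the chain rule gives

$$\nabla_z \mathcal{L} = \frac{\partial \mathcal{L}}{\partial r}\left(\frac{\partial r}{\partial x} + j\frac{\partial r}{\partial y}\right) + \frac{\partial \mathcal{L}}{\partial s}\left(\frac{\partial s}{\partial x} + j\frac{\partial s}{\partial y}\right).$$

Therefore, the complex-valued neural network can be trained using the equations above.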
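As a numerical illustration of Wirtinger calculus, the following sketch estimates $\partial f/\partial z = \tfrac{1}{2}(\partial f/\partial x - j\,\partial f/\partial y)$ and $\partial f/\partial z^* = \tfrac{1}{2}(\partial f/\partial x + j\,\partial f/\partial y)$ with central finite differences. It then minimizes the real-valued, non-holomorphic loss $|z - t|^2$ by stepping along $-2\,\partial f/\partial z^*$. The loss, target $t$, step size, and helper names are illustrative assumptions. A real network would use automatic differentiation rather than finite differences.

```python
def wirtinger(f, z, h=1e-6):
    """Numerical Wirtinger derivatives (df/dz, df/dz*) via central differences."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)

def loss(z, target=1 - 2j):
    """Real-valued, hence non-holomorphic, loss |z - target|^2."""
    return abs(z - target) ** 2

# Analytically, df/dz* = z - target and df/dz = conj(z - target).
d_z, d_zc = wirtinger(loss, 3 + 1j)
print(d_z, d_zc)  # approximately (2-3j) and (2+3j)

z = 3 + 1j
for _ in range(100):
    _, d_zc = wirtinger(loss, z)
    z -= 0.1 * (2 * d_zc)  # gradient step, with grad = 2 * df/dz*
print(z)  # close to the target (1-2j)
```

Each step shrinks the error $z - t$ by a factor of $0.8$. This shows that stepping along the conjugate Wirtinger derivative is the correct descent direction for a real-valued loss.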

Attention-Based Complex Neural Network
We propose a complex-valued convolutional neural network with an attention mechanism, the AttCVNN, for the SE optimization problem above. The AttCVNN takes complex-valued channel data directly as input and uses complex-valued layers as its building blocks. It employs two complex attention mechanisms, the complex cross-channel attention network and the complex in-channel attention network, to improve model performance. As shown in Figure 2, the AttCVNN consists of a data processing network and three sub-networks; multiplying the outputs of the sub-networks yields the allocated power for each CB user on each antenna. To support complex inputs, the AttCVNN not only extends each layer to the complex domain but also implements complex-valued attention layers, namely ComplexH0Att and ComplexHkAtt.

Figure 2. The structure of the complex-valued power allocation neural network (AttCVNN).

AttCVNN
The input data are the channel coefficients, denoted as

Complex-Valued Attention
The attention mechanism mimics human cognitive attention and is widely used in computer vision, natural language processing, and other fields of deep learning. It generates a weight matrix from the input data that strengthens some parts of the input while weakening others, so that the network concentrates on small but crucial details of the data.
To employ this technique in our network, we extend it to complex-valued data. Given an input matrix $\mathbf{X}$, the matrices $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ are computed by linear transformations, which are generally implemented as fully connected layers. The real-valued attention can be written as [29]

$$\mathrm{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{Softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V},$$

where $\mathrm{Softmax}(\cdot)$ takes the matrix product $\mathbf{Q}\mathbf{K}^T$ as input and acts on each of its rows. The scaling factor $d_k$ is the dimension of the key vectors, i.e., the length of each row of $\mathbf{K}$. For a complex-valued matrix $\mathbf{Z}$, a complex linear transformation yields the complex-valued matrices $\mathbf{Q}_z$, $\mathbf{K}_z$, and $\mathbf{V}_z$, and RSoftmax is introduced to map the complex-valued matrix $\mathbf{Q}_z\mathbf{K}_z^T$ to the real domain. The complex-valued attention can then be written as

$$\mathrm{ComplexAttention}(\mathbf{Q}_z, \mathbf{K}_z, \mathbf{V}_z) = \mathrm{RSoftmax}\left(\frac{\mathbf{Q}_z\mathbf{K}_z^T}{\sqrt{d_k}}\right)\mathbf{V}_z,$$

where $\mathrm{RSoftmax}(\cdot)$ takes a complex-valued matrix as input and generates a real-valued weight matrix, as defined in Section 3.3.
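The following is a minimal pure-Python sketch of complex attention. It assumes three things: $\mathbf{Q}_z$, $\mathbf{K}_z$, and $\mathbf{V}_z$ have already been produced by the complex linear transformations (omitted here); RSoftmax is applied row by row, as Softmax is in the real case; and the $\sqrt{d_k}$ scaling of real attention is kept. The plain transpose $\mathbf{K}_z^T$ is used, as in the text, rather than the conjugate transpose. Function names are illustrative.

```python
import math

def matmul(A, B):
    """Matrix product of nested lists (real or complex entries)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rsoftmax(row):
    """Softmax over magnitudes; returns real weights summing to 1."""
    mags = [abs(v) for v in row]
    top = max(mags)
    exps = [math.exp(m - top) for m in mags]
    total = sum(exps)
    return [e / total for e in exps]

def complex_attention(Q, K, V):
    """RSoftmax(Q K^T / sqrt(d_k)) V, applied row by row (assumed form)."""
    d_k = len(K[0])                       # length of each key vector
    K_T = [list(col) for col in zip(*K)]  # plain transpose, no conjugation
    scores = matmul(Q, K_T)               # complex attention scores
    weights = [rsoftmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)             # real weights mix complex values

Q = [[1 + 0j]]
K = [[1 + 0j], [1j]]
V = [[2 + 2j], [4 + 0j]]
print(complex_attention(Q, K, V))  # [[(3+1j)]]
```

Here both scores have magnitude 1, so the output is the plain average of the two value rows. Because the weights are real, the attention can rescale the complex values but cannot rotate their phases.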

Complex Cross-Channel Attention Network
The complex cross-channel attention network, ComplexH0Att, is designed to pay more attention to $\mathbf{h}_0$, since $\mathbf{h}_0$ is strongly related to C2 but does not appear in the loss function. As shown in Figure 3, the inputs $\mathbf{h}_0$ and $\mathbf{H}_b$ are fed into a complex dense layer and a complex Conv1D layer, respectively. The product of the two outputs is passed through a complex Softmax operation, and the result is multiplied back onto $\mathbf{H}_b$ to form the new $\mathbf{H}_b$. Here, the complex dense layer is a fully connected layer, and the complex Conv1D layer is a 1D convolutional layer.

Complex In-Channel Attention Network
The complex in-channel attention network, ComplexHkAtt, focuses on the relationships among the channels $\mathbf{h}_k$, because the definition of SE shows that the channel gain relationships among users also influence the SE. The input $\mathbf{H}_b$ is fed into three separate complex Conv2D layers to generate $\mathbf{Q}_{H_b}$, $\mathbf{K}_{H_b}$, and $\mathbf{V}_{H_b}$. The product of $\mathbf{Q}_{H_b}$ and $\mathbf{K}_{H_b}$ is fed into RSoftmax, and the new $\mathbf{H}_b$ is obtained as the product of the RSoftmax output and $\mathbf{V}_{H_b}$.

Power Allocation
The AttCVNN takes the channel gain matrix $\mathbf{H}$ as input, and this matrix is separated into two parts, $\mathbf{h}_0$ and $\mathbf{H}_b$. $\mathbf{H}_b$ is preprocessed by a complex dense layer before its relationship with $\mathbf{h}_0$ is computed, and then both parts are fed into ComplexH0Att to generate the new $\mathbf{H}_b$. The rest of the network is then split into three branches. Each branch takes $\mathbf{H}_b$ as input and has a ComplexHkAtt as its first layer. The last layers are activation functions that map the complex-valued outputs to real values, so that the outputs represent meaningful physical quantities. Finally, the network produces $N_1$, $N_2$, $N_3$, and $N_4$ through RSoftmax, RSigmoid, RSoftmax, and RSigmoid, respectively. Considering the output ranges of these four operations, $N_1$, $N_2$, $N_3$, and $N_4$ can be represented as follows, where $\lambda_k$ denotes a user's minimum power, which is added to the power assigned by the network to obtain the total power $P_k$ of user $k$. Hence, the power allocated to CU $k$ on the $i$-th antenna can be obtained as follows. Then, we build the loss function to optimize the neural network parameters as follows.

Assessment Metric and System Configuration
The evaluation metric in this article is the SE defined in (1). This metric corresponds to the optimization objective and is the major requirement in augmented reality and machine vision scenarios and applications.
We define a channel model based on [30] that accounts for path loss and Rayleigh fading. In the model configuration, the path loss exponent is set to 2.5. The distance between the CUs/PU and the CB/PB is treated as a random variable uniformly distributed in [10, 200]. The dataset consists of channel blocks: the training set contains 1000 channel matrices $\mathbf{H}$, and the test set is 10% of the size of the training set, where $\mathbf{H} \in \mathbb{C}^{10\times 100}$ and $\mathbf{H}_b \in \mathbb{C}^{9\times 99}$. Note that $K$ was set to 9 in this contribution, as the purpose was only to prove the concept. Then, 100 Monte Carlo realizations were performed, and the resulting simulation curves were averaged. Noise is generated as a complex Gaussian random variable with zero mean and variance $\sigma^2 = 1 \times 10^{-9}$, where $\sigma^2$ collects thermal and ambient noise. The neural network parameters are configured as follows: epoch = 150, batch size = 100, and learning rate = 1.5 × 10^…

The following five schemes are compared:

1. The EPM treats each CB user equally; its allocated power is calculated as follows, where $P_{k,i} = P_T/N$.

2. The FNN is the real-valued fully connected power allocation network proposed in [7].

3. The CVFNN uses complex dense layers as its building blocks. The input data are fed directly into three consecutive complex dense layers. The output is then flattened and fed into four complex dense layers with the complex activation functions RSoftmax, RSigmoid, RSoftmax, and RSigmoid, respectively, to generate the final result.

4. The AttCNN is the real-valued attention-based power allocation network proposed in [20].

5. The AttCVNN, defined in Section 4, implements complex-valued layers and the complex-valued attention mechanism. Equations (13) and (14) are used to calculate its allocated power.

Throughout the experiments, the SE curves of the AttCVNN and the AttCNN eventually converge to similar values, but the AttCVNN converges faster than the AttCNN. In Figure 4a,b, the SE curves of the AttCVNN reach their steady-state values of 5 bps/Hz and 7 bps/Hz, respectively, at epoch 20, whereas the AttCNN needs around 35 epochs. Figure 5a,b illustrate a similar convergence behavior at a higher INR. The higher INR relaxes constraint C2 and allows the SE to reach larger values, i.e., 6 bps/Hz and 8 bps/Hz, respectively.
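The channel and noise generation in the configuration above (path loss exponent 2.5, distances uniform in [10, 200], Rayleigh fading, complex Gaussian noise with $\sigma^2 = 10^{-9}$) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' generator. The large-scale gain is taken as $d^{-2.5}$ with no reference-distance constant. Each receiver's distance is drawn once and shared by all antennas. The matrix is sized 10 × 100 to match $\mathbf{H}$, and treating each row as one receiver is also an assumption. Function names are hypothetical.

```python
import math
import random

def generate_channel(rows, cols, path_loss_exp=2.5, d_min=10.0, d_max=200.0, rng=None):
    """rows x cols complex channel: Rayleigh fading scaled by distance path loss."""
    rng = rng or random.Random()
    H = []
    for _ in range(rows):
        d = rng.uniform(d_min, d_max)   # distance ~ U[d_min, d_max]
        beta = d ** (-path_loss_exp)    # assumed large-scale gain d^(-alpha)
        s = math.sqrt(beta / 2)         # per-component std of CN(0, beta)
        H.append([complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(cols)])
    return H

def complex_awgn(n, noise_var=1e-9, rng=None):
    """n samples of zero-mean circularly symmetric complex Gaussian noise."""
    rng = rng or random.Random()
    s = math.sqrt(noise_var / 2)
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]

rng = random.Random(0)
H = generate_channel(10, 100, rng=rng)
noise = complex_awgn(100_000, rng=rng)
print(len(H), len(H[0]))                              # 10 100
print(sum(abs(v) ** 2 for v in noise) / len(noise))  # close to 1e-9
```

Splitting the variance equally between the real and imaginary parts makes the fading circularly symmetric, so $|h|$ is Rayleigh distributed and $\mathbb{E}|h|^2 = d^{-2.5}$.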

Training Performance for AttCVNN and AttCNN
The comparison results show that the proposed AttCVNN has a faster convergence rate than the AttCNN in the training stage, which is an advantage in real-time communications.
In terms of model design, our model has a structure similar to that of the AttCNN.
Regarding layer size, the complex-valued implementation doubles the number of real-valued parameters per layer, which is the price paid for faster convergence.
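The doubling follows from simple counting: a complex weight or bias stores two real numbers, so a complex layer with the same numbers of inputs, outputs, and channels as a real layer has twice as many real parameters. The following sketch shows this count. The layer sizes are hypothetical and chosen only for illustration; they are not the AttCVNN's actual configuration.

```python
def dense_params(n_in, n_out, complex_valued=False):
    """Trainable real numbers in a dense layer (weights plus biases)."""
    count = n_in * n_out + n_out
    # A complex weight or bias stores two real numbers (real and imaginary part).
    return 2 * count if complex_valued else count

def conv2d_params(c_in, c_out, kh, kw, complex_valued=False):
    """Trainable real numbers in a 2D convolution layer (kernels plus biases)."""
    count = c_in * c_out * kh * kw + c_out
    return 2 * count if complex_valued else count

# Hypothetical layer sizes, for illustration only.
print(dense_params(99, 64), dense_params(99, 64, complex_valued=True))            # 6400 12800
print(conv2d_params(1, 8, 3, 3), conv2d_params(1, 8, 3, 3, complex_valued=True))  # 80 160
```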

Power Allocation Performance
We conducted two sets of comparative experiments with the AttCVNN, EPM, FNN, CVFNN, and AttCNN to compare their power allocation performance, varying $\mathrm{SNR}_{CB}$ and the INR from 20 to 50 dB and comparing the resulting SE. We assume $\sigma^2 = 1 \times 10^{-9}$ and $\lambda_k = 0$. Figure 6 shows the SE against $\mathrm{SNR}_{CB}$ for different INRs. In Figure 6a, we set $\mathrm{SNR}_{PB} = 60$ dB and INR = 20 dB. The EPM has the lowest SE because it allocates power equally among the CB users without adapting the allocation to the constraints and limitations, so the entire power budget is distributed equally among the CB users. The SE improves with the FNN, CVFNN, AttCNN, and AttCVNN, which use the channel knowledge $\mathbf{H}$ as input to optimize the power assigned to the CB users. As $\mathrm{SNR}_{CB}$ increases, the SE increases monotonically. Furthermore, the proposed AttCVNN always outperforms the EPM, FNN, and CVFNN. The gap between the AttCVNN and the EPM becomes evident at $\mathrm{SNR}_{CB} = 40$ dB and reaches almost 0.7 bps/Hz at $\mathrm{SNR}_{CB} = 50$ dB. The AttCVNN and AttCNN have almost identical performance for $\mathrm{SNR}_{CB}$ between 20 and 40 dB. However, the proposed AttCVNN is superior in terms of convergence rate, as the previous experiments revealed.

SE against $\mathrm{SNR}_{CB}$
In Figure 6b, the INR is set to a higher value of 30 dB. All curves increase monotonically, and the AttCVNN still performs best. The EPM begins to diverge from the AttCVNN at $\mathrm{SNR}_{CB} = 35$ dB, and the gap reaches almost 0.7 bps/Hz at $\mathrm{SNR}_{CB} = 50$ dB. The AttCVNN outperforms the FNN and CVFNN by 0.5 bps/Hz. As in Figure 6a, the AttCVNN and AttCNN follow very similar SE trends for $\mathrm{SNR}_{CB} < 40$ dB. At $\mathrm{SNR}_{CB} = 45$ dB, the AttCVNN outperforms the AttCNN with an $\mathrm{SNR}_{CB}$ gain of 2 dB, which corresponds to a 37% reduction in the transmit power enabled by our approach.

Figure 7 presents the SE versus the INR in the range from 0 to 70 dB. The transmit power of the CB was set to $P_T = 10$ mW, which is equivalent to $\mathrm{SNR}_{CB} = 70$ dB. Figure 7a shows the SE performance for $\mathrm{SNR}_{PB} = 60$ dB. The SE of the CR network becomes high at large INR, which corresponds to a more relaxed upper bound in constraint C2. In other words, the SE of the CR network increases monotonically as the interference threshold is relaxed. Notably, the proposed AttCVNN consistently outperforms the FNN and the CVFNN. It also outperforms the EPM by a remarkable margin, e.g., 0.571 bps/Hz at INR = 0 dB, increasing to 4.905 bps/Hz at INR = 50 dB. This indicates a significant potential gain from our proposal when the primary network is idle. In Figure 7b, the same experiment is conducted for $\mathrm{SNR}_{PB} = 50$ dB. The SE is higher because the lower $\mathrm{SNR}_{PB}$ induces less interference at the CB users. Note that in Figure 7 the proposed AttCVNN does not achieve a remarkable SE gain over the FNN and CVFNN, but it attains a significant horizontal (INR) gain of 5 dB. This implies the superiority of the AttCVNN under tighter interference conditions.
Therefore, the AttCVNN and AttCNN differ not in SE performance but in convergence rate, which favors the proposed AttCVNN.
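The two dB conversions quoted in this section can be checked with a few lines of Python. The 37% figure follows from converting the 2 dB SNR gain to a linear power ratio: $1 - 10^{-0.2} \approx 0.369$. The 70 dB figure assumes that $\mathrm{SNR}_{CB}$ is defined as $P_T/\sigma^2$ with $\sigma^2 = 10^{-9}$ expressed in watts. That definition and unit are assumptions, since the text states neither, but they make the two numbers consistent. Function names are illustrative.

```python
import math

def db_to_linear(db):
    """Convert a power ratio in dB to a linear factor."""
    return 10 ** (db / 10)

def power_reduction_from_gain(gain_db):
    """Fractional transmit-power saving implied by an SNR gain of gain_db."""
    return 1 - 1 / db_to_linear(gain_db)

def snr_db(p_tx_watts, noise_var_watts):
    """SNR in dB, assuming SNR = P_T / sigma^2 (hypothetical definition)."""
    return 10 * math.log10(p_tx_watts / noise_var_watts)

print(f"{power_reduction_from_gain(2.0):.1%}")  # 36.9%
print(f"{snr_db(10e-3, 1e-9):.1f} dB")          # 70.0 dB (P_T = 10 mW, sigma^2 = 1e-9 W)
```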

Discussion
All of the above experiments reveal the potential of the proposed model compared with the existing benchmarks. All of the neural-network-based methods substantially improve on the EPM scheme. The EPM employs no optimization and simply allocates power equally among users without considering the interference among them. The FNN and CVFNN schemes perform reasonably well, but they have a large number of parameters, which leads to severe overfitting and limits further performance improvement. The AttCVNN and AttCNN use convolutional layers to reduce the number of parameters and thereby prevent overfitting, and the introduction of the attention mechanism significantly improves their performance. In addition, the complex-valued implementation speeds up training, which is a major advantage in real-time communications.

Computational Complexity
In practice, the time complexity of neural network models is generally measured by the number of floating-point operations (FLOPs). Under the experimental configuration, our model requires 17.92 million FLOPs. For comparison, MobileNetV3-Small [31], which was designed for mobile phone CPUs, requires 59 million FLOPs. Since our model needs less than one third of that amount, it can support industrial applications with low computational cost, especially on more powerful processors.
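To illustrate where such counts come from, the following sketch counts FLOPs for a dense layer under one common convention. A real multiply-accumulate (MAC) costs 2 FLOPs. A complex MAC costs 8: 4 real multiplications and 2 additions for the product, plus 2 additions to accumulate. This convention and the layer size are assumptions for illustration only. The sketch does not reproduce the 17.92 million figure, which depends on the full architecture.

```python
def dense_flops(n_in, n_out, complex_valued=False):
    """FLOPs of y = W x + b, starting each accumulator from the bias.

    Each output needs n_in multiply-accumulates (MACs). A real MAC is
    1 multiplication + 1 addition = 2 FLOPs; a complex MAC is 4 real
    multiplications + 2 additions for the product, plus 2 additions to
    accumulate, = 8 FLOPs.
    """
    flops_per_mac = 8 if complex_valued else 2
    return n_in * n_out * flops_per_mac

# Hypothetical layer size, for illustration only.
print(dense_flops(99, 64), dense_flops(99, 64, complex_valued=True))  # 12672 50688
print(f"{17.92 / 59:.2f}")  # ratio of the two reported FLOP counts: 0.30
```

Under this convention, a complex layer costs four times the FLOPs of a real layer of the same size, which is worth keeping in mind when comparing complex- and real-valued models.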

Conclusions
This paper proposed a novel attention-based complex-valued power allocation network, the AttCVNN, to optimize power allocation, in which complex in-channel and cross-channel attention networks were implemented. We performed comparative experiments by varying $\mathrm{SNR}_{CB}$ and the INR. Compared with the designed benchmarks (EPM, FNN, CVFNN, and AttCNN), the proposed AttCVNN notably outperforms the EPM, the FNN, and the CVFNN in terms of SE. It also converges faster than the AttCNN in the training phase, which is a major advantage in real-time communications. The AttCVNN is a promising method for enhancing throughput via radio resource management and optimization in Industry 5.0 IoT scenarios.