Joint Optimization of Massive MIMO System Resources Based on Service QoS

Abstract: To address the low throughput and energy efficiency caused by the mutual restriction of energy efficiency and spectral efficiency in massive MIMO systems, and the fact that resource allocation usually ignores user service QoS and the upper and lower rate limits, a joint resource optimization method based on user service QoS guarantees is proposed. The method first performs user scheduling according to service delay and channel state under equal power allocation and calculates the current system capacity. It then redistributes power subject to the transmit antenna power and service QoS constraints, corrects the system capacity, and establishes the objective function for the joint optimization of spectral efficiency and energy efficiency. An algorithm combining deep learning and Q-learning is used to solve the problem. Simulations show that the proposed joint optimization method controls the timeout of user data packets more finely and, at the same time, obtains higher energy efficiency.


Introduction
Multiple-input multiple-output (MIMO) technology has matured after years of development and has become one of the key technologies of intelligent communication [1]; it enables communication systems to obtain higher transmission rates, system capacity, and spectral efficiency [2]. In wireless communication, different types of services have different quality of service (QoS) requirements for latency and rate, so resource allocation must take the user service as its premise. Because spectrum resources are limited and high-rate capacity is in demand, spectral efficiency has long been widely studied as a traditional performance index [3]. At the same time, with the development of green communication, spectral efficiency is no longer pursued blindly; energy efficiency has emerged as an optimization index, and improved energy efficiency means that the energy consumption of the system can be reduced [4]. In a real environment, the RF chain corresponding to each antenna in a MIMO system consumes a certain amount of power. In a traditional MIMO system the number of antennas is small, so this power consumption can usually be ignored. Massive MIMO systems, however, are equipped with a large number of antennas, and the resulting circuit power consumption can no longer be ignored [5]. As the number of antennas increases, the spectral efficiency of the system continues to increase, whereas the energy efficiency increases up to a certain point and then begins to decline. The two restrict each other [6], and it is difficult to optimize both at the same time. Therefore, for massive MIMO systems, the joint optimization of spectral efficiency and energy efficiency is still worth exploring.
Many scholars, both domestically and internationally, have conducted research on this topic. The work in [7] proposes a power allocation method for massive MIMO systems based on the max-min fairness criterion, which maximizes the worst signal-to-noise ratio among all users and ensures average performance for the users, but it does not consider the type of service and does not meet the users' QoS requirements. The work in [8] studies the power allocation problem of massive MIMO systems and proposes a power allocation method that uses the asymptotically concave form of the system sum rate; the sum rate increases with the number of antennas, but the index of spectral efficiency is ignored. The work in [9] proposes a beam allocation and power optimization scheme, which is solved by expressing beam allocation and power optimization as a multivariate mixed-integer nonlinear programming problem. This scheme has research value but does not consider the user's QoS index. The work in [10] considers the QoS delay requirements of user services and the fairness of occupying wireless channels, and proposes a power allocation strategy based on user expectations and pre-allocation to improve user satisfaction and fairness between users, but the influence of channel state information is not considered. The work in [11] uses power allocation to obtain optimal energy efficiency, but it assumes by default that the number of users satisfies the antenna restriction, which is not in line with how users access the system in practice. The work in [12] obtains optimal energy efficiency through power allocation but does not limit the transmit power of the base station antennas, so the power allocation loses practical significance. The work in [13] proposes a joint optimization design method for antenna selection and power allocation in massive MIMO systems. The work in [14] optimizes energy efficiency under spectral efficiency constraints. The work in [15] proposes an energy efficiency optimization algorithm for massive MIMO systems that takes the transmit power and the number of antennas as decision variables and solves the problem with an improved particle swarm optimization algorithm; it has certain advantages but does not consider the number of users or the user service. The work in [16] takes the transmit power and the number of transmitting antennas as decision variables to formulate the joint optimization of spectral efficiency and energy efficiency and then solves it with the NSGA-II algorithm, but it does not consider the user side, and few comparative experiments are reported.
With the continuous development of deep learning, neural networks have been applied to resource allocation, electromagnetics, and antennas. As described in [17], neural networks have been used for communication resource allocation; a well-trained network achieves performance very close to that of traditional mathematical algorithms at low computational complexity. The work in [18] points out that neural networks have made a breakthrough in antennas used for environmental sensing. The work in [19] uses deep neural networks for resource allocation among multiple users in MIMO systems: first, the objective function is optimized with a multi-objective sine cosine algorithm; second, the demand level of each user is identified, and a deep neural network algorithm is used to solve the problem, which improves system performance to some extent. However, the number of users is assumed by default to be smaller than the antenna limit, and there is no user scheduling, which is not in line with the actual situation.
According to the above analysis, little of the literature has studied the joint optimization of energy efficiency and spectral efficiency under user service QoS guarantees. The research in this paper is therefore carried out in two steps. First, user scheduling is carried out while guaranteeing the users' QoS delay requirements, and the system capacity is maximized under equal power allocation. Then, the power of the scheduled users is redistributed and the system energy efficiency is optimized on the basis of the refined QoS rate requirements, subject to the system capacity after power adjustment being no lower than the capacity obtained in the first step; this establishes a joint optimization problem. Finally, the Deep Q-Learning Network (DQN) algorithm is used to solve the problem.
The main contributions of this article are as follows: (1) Before resource allocation, users are scheduled based on their service latency and channel state information to improve the satisfaction of different users. (2) Based on the refined QoS rate requirements, the system energy efficiency is optimized while ensuring that the system capacity after power adjustment is not lower than the capacity obtained in the first scheduling step, which establishes the joint optimization problem. (3) The DQN algorithm is used to solve the problem and improve system performance.

Problem Modeling
First, a multi-user massive MIMO system model is established, and block diagonalization precoding is used to make the multi-user system equivalent to a set of single-user systems, which eliminates the interference from other users [20]. Next, under equal power allocation, user scheduling is carried out based on service QoS delay requirements and channel state, and the system capacity is calculated. Finally, subject to the upper and lower limits on transmit power and QoS rate, power is reallocated among the selected users to optimize the system's energy efficiency, the system capacity from the scheduling stage is corrected, and the objective function for the joint optimization of spectral efficiency and energy efficiency is established to achieve a compromise between the two.

System Model
This paper considers the downlink of a multi-user massive MIMO system. The base station has $K_T$ transmitting antennas and serves $M_0$ users, the mth user has $k_m$ receiving antennas, and the base station can support M users communicating at the same time in each scheduling time slot. The system model is shown in Figure 1.
In a massive MIMO system, all users are allowed to reuse the same time-frequency resources in order to improve spectral efficiency. As a result, each user receives signals intended for other users in addition to its own, which causes inter-user interference. At the transmitter end of the downlink, precoding is therefore generally used to preprocess the transmitted signal in order to increase the signal-to-noise ratio, thereby raising the data transmission rate and improving the performance of the whole system. In this paper, block diagonalization precoding is used to decompose the downlink channel matrix of the multi-user MIMO system into block diagonal form, which is equivalent to multiple single-user MIMO systems that do not interfere with each other. The equivalent channel model is shown in Figure 2.
Assuming that the channel state information is known at the base station transmitter, let $x_m \in \mathbb{C}^{k_m \times 1}$ be the transmit signal vector of the mth user and $y_m \in \mathbb{C}^{k_m \times 1}$ the received signal vector of the mth user. Then:
$$y_m = H_m D_m x_m + H_m \sum_{j=1, j \neq m}^{M} D_j x_j + n_m \tag{1}$$
where $H_m D_m x_m$ is the signal required by the mth user, $H_m \sum_{j=1, j \neq m}^{M} D_j x_j$ is the interference from other users, and $n_m \in \mathbb{C}^{k_m \times 1}$ is the additive white Gaussian noise in the mth user channel. $H_m \in \mathbb{C}^{k_m \times K_T}$ is the complex Gaussian random channel matrix of the mth user, and $D_m \in \mathbb{C}^{K_T \times k_m}$ is the precoding matrix of the mth user.
Block diagonalization finds the precoding matrices $D_j$ such that the interference from other users is zero. For the mth user, the matrix formed from the channel matrices of the other users is:
$$\hat{H}_m = \left[ H_1^T, \ldots, H_{m-1}^T, H_{m+1}^T, \ldots, H_M^T \right]^T$$
Applying singular value decomposition to $\hat{H}_m$ yields:
$$\hat{H}_m = \hat{U}_m \left[ \hat{\Lambda}_m, \; 0 \right] \left[ \hat{V}_{m,1}, \; \hat{V}_{m,0} \right]^H \tag{3}$$
where $\hat{U}_m$ is a unitary matrix of order $\sum_{j \neq m} k_j$, $\hat{\Lambda}_m$ is the diagonal matrix of non-zero singular values, $\hat{V}_{m,1}$ is composed of the right singular vectors corresponding to the non-zero singular values, and $\hat{V}_{m,0}$ is composed of the right singular vectors corresponding to the zero singular values. According to the unitary matrix property $\hat{U}_m^H \hat{U}_m = I$, Equation (3) can be written as:
$$\hat{U}_m^H \hat{H}_m = \left[ \hat{\Lambda}_m, \; 0 \right] \left[ \hat{V}_{m,1}, \; \hat{V}_{m,0} \right]^H \tag{6}$$
Multiplying both sides of (6) on the right by $\hat{V}_{m,0}$ gives:
$$\hat{U}_m^H \hat{H}_m \hat{V}_{m,0} = 0, \quad \text{that is,} \quad \hat{H}_m \hat{V}_{m,0} = 0 \tag{7}$$
According to (7), for the mth user, $\hat{V}_{m,0}$ eliminates the interference of the other users. For the system of equations to be solvable, $K_T - \sum_{j \neq m} k_j \geq k_m$, $m = 1, \ldots, M$, must be satisfied. This is the constraint that the block diagonalization method imposes on the user scheduling scheme; that is, it limits the maximum number M of simultaneously communicating users. Further, let $\bar{H}_m = H_m \hat{V}_{m,0}$ and perform singular value decomposition to obtain:
$$\bar{H}_m = U_m \left[ \Lambda_m, \; 0 \right] \left[ V_{m,1}, \; V_{m,0} \right]^H \tag{8}$$
where $\Lambda_m$ is a diagonal matrix composed of the $k_m$ non-zero singular values and $V_{m,1}$ is composed of the right singular vectors corresponding to the $k_m$ non-zero singular values of $\bar{H}_m$. Take the block diagonalization precoding matrix as $D_m = \hat{V}_{m,0} V_{m,1}$ and substitute $D_m$ into Equation (1) to obtain:
$$y_m = H_m \hat{V}_{m,0} V_{m,1} x_m + n_m = \bar{H}_m V_{m,1} x_m + n_m \tag{9}$$
where $\bar{H}_m$ is the equivalent channel matrix. Substituting Equation (8) into (9) yields:
$$y_m = U_m \Lambda_m x_m + n_m$$
Multiplying both sides on the left by $U_m^H$ gives:
$$\tilde{y}_m = U_m^H y_m = \Lambda_m x_m + \tilde{n}_m$$
where $\tilde{n}_m = U_m^H n_m$ and $\Lambda_m$ is a diagonal matrix whose diagonal elements are non-zero and whose other elements are all zero. Let the diagonal elements of $\Lambda_m$ be $\lambda_{m,k}$, $k = 1, \ldots, k_m$. Block diagonalization precoding thus turns the multi-user channel into multiple independent single-user channels, each of which is in turn equivalent to multiple parallel channels. The data rate $R_m$ of the mth user after bandwidth normalization can then be expressed as:
$$R_m = \sum_{k=1}^{k_m} \log_2 \left( 1 + \frac{p_{m,k} \lambda_{m,k}^2}{\sigma^2} \right)$$
where $p_{m,k}$ is the signal power of the mth user on the kth parallel channel, the diagonal element $\lambda_{m,k}$ of $\Lambda_m$ is the channel fading coefficient, and $\sigma^2$ is the power of the additive white Gaussian noise.
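As a minimal sketch of the per-user rate over parallel channels described above, the following Python function sums log2(1 + p·λ²/σ²) over the parallel channels of one user. It assumes that the squared singular value λ² acts as the channel power gain; the function name, argument names, and numbers are illustrative only and do not come from the paper.

```python
import math


def user_rate(powers, singular_values, noise_power):
    """Bandwidth-normalized rate of one user over its parallel channels (bit/s/Hz).

    Assumption: the squared singular value is used as the channel power gain.
    """
    return sum(
        math.log2(1.0 + p * lam ** 2 / noise_power)
        for p, lam in zip(powers, singular_values)
    )


# Hypothetical user with k_m = 2 parallel channels.
rate = user_rate(powers=[1.0, 3.0], singular_values=[1.0, 1.0], noise_power=1.0)
print(rate)  # log2(2) + log2(4) = 3.0
```

With unit gains and unit noise power, the two channels contribute 1 and 2 bit/s/Hz, so a stronger allocation on one channel raises the rate only logarithmically.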

User Scheduling
In practical applications, because user arrivals are bursty, the number of users accessing the system can exceed the limit imposed by the number of antennas at the base station. User scheduling is therefore required before resource allocation: M users are selected in each scheduling time slot to maximize system throughput while guaranteeing the QoS requirements of the user services. This article considers four types of user services: conversational, streaming, interactive, and background. The conversational class focuses on real-time experience, and its most critical QoS indicator is delay; when the delay is severe, the session cannot proceed normally. In the streaming class, data are transmitted in one direction only and no interaction between two users is required, so the service has certain real-time requirements, but they are not as strict as those of the conversational class. Compared with these two classes, the delay requirements of the interactive class are not high. The background class only cares about whether the data are transmitted correctly and has almost no delay requirement. Therefore, this article takes delay as the QoS metric in the user scheduling stage and specifies the delay requirement as the maximum time that data wait in the queue. Table 1 shows the rate and delay requirements of the four services.
Because the number of antennas is not sufficient to serve all users at once, user scheduling is needed to meet the delay requirements of the services described above. Assume that each user uses only one service in a given time slot. In the user scheduling stage, both the user's service delay and channel state are considered. Let $W_{m,z}$ be the number of time slots that user m has waited for its service z, let $n_z$ be the maximum number of waiting time slots, and let t be the length of one scheduling cycle; the delay requirement is then expressed through the maximum number of waiting cycles as $d_z = n_z \cdot t$. When scheduling, the user services whose $W_{m,z}$ is about to reach or has exceeded $n_z$ are scheduled first; if all users meeting this condition have been admitted and antennas are still left, the channel state information of the remaining users is considered. The user scheduling process is shown in Figure 3. The specific procedure is as follows:
Step 1: Initialize the user sets: set the unselected set to $N = \{1, 2, \ldots, M_0\}$ and the selected set to $Y = \varnothing$.
Step 2: Determine the number of waiting time slots $W_{m,z}$ for each service; if $W_{m,z} \geq n_z$, select user m. Update the user sets: the selected set becomes $Y = \{m : W_{m,z} \geq n_z\}$ and the unselected set becomes $N = N - Y$.
Steps 3 to 5: If the number of selected users has reached the antenna limit, scheduling ends. Otherwise, users are selected from the unselected set N according to their channel state information, the user sets are updated after each selection, and the selection is repeated until the antenna limit is reached, which gives the final user set.
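The scheduling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the channel state is summarized by a single hypothetical scalar `channel_gain`, users that have waited longest past their limit are served first when more users are overdue than can be admitted, and all class, field, and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class User:
    uid: int
    waited_slots: int      # W_{m,z}: slots the user's packet has waited
    max_wait_slots: int    # n_z: maximum waiting slots of the user's service
    channel_gain: float    # hypothetical scalar summary of the channel state


def schedule_users(users, max_users):
    """Select up to max_users user ids: delay-critical users first, then best channels."""
    # Step 2: users whose waiting time has reached or exceeded the limit.
    urgent = sorted(
        (u for u in users if u.waited_slots >= u.max_wait_slots),
        key=lambda u: u.waited_slots - u.max_wait_slots,
        reverse=True,  # assumption: most overdue users are served first
    )
    selected = urgent[:max_users]
    chosen = {u.uid for u in selected}
    # Steps 3 to 5: fill the remaining places by channel state.
    remaining = sorted(
        (u for u in users if u.uid not in chosen),
        key=lambda u: u.channel_gain,
        reverse=True,
    )
    selected.extend(remaining[: max_users - len(selected)])
    return [u.uid for u in selected]


users = [
    User(uid=1, waited_slots=5, max_wait_slots=4, channel_gain=0.2),
    User(uid=2, waited_slots=0, max_wait_slots=4, channel_gain=0.9),
    User(uid=3, waited_slots=1, max_wait_slots=100, channel_gain=0.5),
    User(uid=4, waited_slots=4, max_wait_slots=4, channel_gain=0.1),
]
print(schedule_users(users, max_users=3))  # [1, 4, 2]
```

Users 1 and 4 are admitted despite weak channels because their packets have reached the delay limit; the last place goes to user 2, which has the best channel among the rest.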

Joint Optimization Function Establishment
User scheduling guarantees the delay requirements of the user services, but it is performed under equal power allocation. It is therefore necessary to redistribute power and improve on the system capacity obtained in the scheduling phase. To keep each service running normally, a lower rate limit $R_{m0}$ is set for user m; to avoid wasting resources, the rate of user m should not exceed an upper limit $R_{m1}$. The rate $R_m$ of user m is therefore constrained as follows:
$$R_{m0} \leq R_m \leq R_{m1}$$
The total rate of all selected users is:
$$R(p_{m,k}) = \sum_{m=1}^{M} R_m = \sum_{m=1}^{M} \sum_{k=1}^{k_m} \log_2 \left( 1 + \frac{p_{m,k} \lambda_{m,k}^2}{\sigma^2} \right)$$
The optimization objective is not only to maximize the throughput of the scheduled user set but also to treat energy efficiency as an important indicator. Assuming that $P_0$ is the upper limit of the transmit power of the ith antenna, the transmit power $P_i^{TX}$ of each antenna is constrained as follows:
$$0 \leq P_i^{TX} \leq P_0, \quad i = 1, \ldots, K_T$$
The total power consumption of the base station can then be expressed as:
$$P_{total} = \frac{1}{e} \sum_{i=1}^{K_T} P_i^{TX} + P_c$$
where e is the efficiency of the base station power amplifier and $P_c$ is the power consumption of the circuit components, which is a fixed value. The energy efficiency EE is defined as:
$$EE = \frac{R(p_{m,k})}{P_{total}}$$
Therefore, the optimization problem considered in this article is:
$$\max_{p_{m,k}} \left\{ R(p_{m,k}), \; EE \right\} \quad \text{s.t.} \quad R_{m0} \leq R_m \leq R_{m1}, \quad 0 \leq P_i^{TX} \leq P_0$$
Maximizing $R(p_{m,k})$ drives the power consumption up, and the higher the power consumption, the worse the energy efficiency EE. The two objectives restrict each other and are difficult to optimize at the same time. In addition, power redistribution is only meaningful if the total capacity $R(p_{m,k})$ after redistribution is no lower than the total capacity under equal allocation, denoted here by $R_{eq}$. Therefore, this article uses the main objective method to transform the problem, with EE as the main optimization objective and $R(p_{m,k})$ as a constraint:
$$\max_{p_{m,k}} EE \quad \text{s.t.} \quad R(p_{m,k}) \geq R_{eq}, \quad R_{m0} \leq R_m \leq R_{m1}, \quad 0 \leq P_i^{TX} \leq P_0$$
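To make the quantities in this formulation concrete, the sketch below evaluates the sum rate, the energy efficiency, and the constraints for one candidate allocation. It assumes that the total consumed power is the sum of the antenna transmit powers divided by the amplifier efficiency plus the fixed circuit power, and that `gains` holds channel power gains per parallel channel; the mapping from per-stream powers to per-antenna powers is not modeled, so antenna powers are passed in directly. All names and numbers are illustrative.

```python
import math


def total_rate(powers, gains, noise_power):
    """Sum rate over all users; powers[m][k] and gains[m][k] are per parallel channel."""
    return sum(
        math.log2(1.0 + p * g / noise_power)
        for user_p, user_g in zip(powers, gains)
        for p, g in zip(user_p, user_g)
    )


def energy_efficiency(rate, antenna_powers, amp_efficiency, circuit_power):
    """Rate divided by total consumed power (assumed amplifier model, see lead-in)."""
    total_power = sum(antenna_powers) / amp_efficiency + circuit_power
    return rate / total_power


def is_feasible(user_rates, rate_bounds, antenna_powers, p0, baseline_rate):
    """Check rate limits, per-antenna power limits, and the capacity constraint."""
    rates_ok = all(lo <= r <= hi for r, (lo, hi) in zip(user_rates, rate_bounds))
    power_ok = all(0.0 <= p <= p0 for p in antenna_powers)
    capacity_ok = sum(user_rates) >= baseline_rate
    return rates_ok and power_ok and capacity_ok


# Hypothetical single-user example with two parallel channels and four antennas.
rate = total_rate(powers=[[1.0, 3.0]], gains=[[1.0, 1.0]], noise_power=1.0)
ee = energy_efficiency(rate, antenna_powers=[1.0] * 4, amp_efficiency=0.5, circuit_power=2.0)
feasible = is_feasible([rate], [(2.0, 4.0)], [1.0] * 4, p0=1.5, baseline_rate=2.5)
print(rate, ee, feasible)
```

A solver for the transformed problem would search over power allocations, discard those for which `is_feasible` is false, and keep the one with the largest `energy_efficiency`.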

Solving the Combination Optimization Problem Based on DQN Algorithm
The joint optimization problem proposed above is an NP-hard, non-convex optimization problem, and solving it with traditional methods is complicated. For this decision-making problem, this article therefore uses the DQN model from deep Q-learning, in which the Q-value function is approximated by a fully connected deep neural network. In the resource allocation above, each user is defined as an agent. At time t, the user observes the current state of the environment $x_t \in X$, uses the ε-greedy strategy to choose an action $y_t$ from the allowable action set A, obtains a reward $r_{t+1}$, and then observes the state $x_{t+1}$ at the next moment.
State set: the state is defined in terms of the maximum waiting cycles of the users, and the state corresponding to the t-th transmission time interval of the learning process is denoted $x_t$.
Action set: actions are defined as selecting users and allocating power, and the action corresponding to the t-th transmission time interval of the learning process is recorded as $y_t = \{a^k(t), p^k(t)\}$. Here, $a^k_{l,m}(t)$ is a binary variable whose value is determined by $p^k_{l,m}(t)$:
$$a^k_{l,m}(t) = \begin{cases} 1, & p^k_{l,m}(t) > 0 \\ 0, & \text{otherwise} \end{cases}$$
To reduce the action set, the action is simplified as follows:
$$y_t = \{p^k(t)\}$$
Instantaneous reward: the instantaneous reward is defined as the energy efficiency, and the instantaneous reward for executing action $y_t$ in state $x_t$ is recorded as:
$$r_{t+1} = EE$$
Cumulative reward: the cumulative reward for executing action $y_t$ in state $x_t$ is defined as the state-action value function $Q(x_t, y_t)$ and is updated incrementally:
$$Q(x_t, y_t) \leftarrow Q(x_t, y_t) + \alpha \left[ r_{t+1} + \beta \max_{y} Q(x_{t+1}, y) - Q(x_t, y_t) \right]$$
where $Q(x_t, y_t)$ is the current value function of action $y_t$ executed in state $x_t$ at time t, and $\max_{y} Q(x_{t+1}, y)$ is the maximum value function over the actions y available in state $x_{t+1}$ at time t + 1. α is the learning rate, usually taken as a very small value, and $\beta \in (0, 1)$ is the discount factor applied to future rewards. The target value function of executing action $y_t$ in state $x_t$ is the sum of the reward and the discounted maximum Q value of the next state:
$$Q_{target}(x_t, y_t; \theta') = r_{t+1} + \beta \max_{y} Q(x_{t+1}, y; \theta')$$
The DQN model adopts a dual network structure, which records the current Q value and the target Q value separately. The purpose of training the neural network is to reduce the difference between the current Q value and the target Q value by minimizing the loss function, which is defined as follows:
$$loss = \left( Q_{target}(x_t, y_t; \theta') - Q(x_t, y_t; \theta) \right)^2$$
The DQN-based solution model is shown in Figure 4. After each action selection, the agent stores the state, the action, the reward obtained, and the state at the next moment in the experience pool. When the experience pool is full, the network starts to update. The reward and the next state are used to calculate the target Q value, the loss function is computed from the current Q value and the target Q value, and the process is repeated until convergence. The pseudocode for solving the above optimization problem with DQN is given in Algorithm 1; in it, the target Q-value network parameters are copied from the current network every T iterations, and the state is advanced from $x_t$ to $x_{t+1}$ at the end of each step.
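The training loop described here can be sketched in a few lines of Python. This is a heavily simplified stand-in, not the paper's Algorithm 1: two small tables replace the current and target neural networks, the environment `toy_step` is hypothetical (its reward merely stands in for energy efficiency), and the step count, pool size, batch size, and learning rate are toy values chosen for illustration.

```python
import random
from collections import deque


def td_target(reward, next_q_values, beta):
    """Target Q value: reward plus discounted maximum Q value of the next state."""
    return reward + beta * max(next_q_values)


def squared_loss(target, current):
    """Squared difference between the target and the current Q value."""
    return (target - current) ** 2


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def toy_step(state, action):
    """Hypothetical environment: two states, reward equal to the action index."""
    return float(action), (state + 1) % 2


def train(steps=500, pool_size=50, batch_size=8, sync_every=20,
          alpha=0.1, beta=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(2)]     # current Q values (stand-in for the network)
    q_target = [row[:] for row in q]       # target Q values (stand-in for the target network)
    pool = deque(maxlen=pool_size)         # experience pool
    state = 0
    for t in range(steps):
        epsilon = max(0.1, 0.8 - 0.7 * t / steps)  # anneal exploration from 0.8 toward 0.1
        action = epsilon_greedy(q[state], epsilon, rng)
        reward, next_state = toy_step(state, action)
        pool.append((state, action, reward, next_state))
        if len(pool) == pool_size:         # update only once the pool is full
            for s, a, r, s2 in rng.sample(list(pool), batch_size):
                target = td_target(r, q_target[s2], beta)
                q[s][a] += alpha * (target - q[s][a])  # move toward target, shrinking the loss
        if (t + 1) % sync_every == 0:
            q_target = [row[:] for row in q]  # copy current values into the target table
        state = next_state
    return q


q_values = train()
print(q_values)
```

In a full DQN the table update would be replaced by a gradient step on `squared_loss` with respect to the network parameters; keeping a separate, periodically synchronized target is what stabilizes that step.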

Feature Extraction and Analysis
The network parameters and the deep learning algorithm parameter values for this experiment are shown in Tables 2 and 3. The neural network used for training is a fully connected neural network with two hidden layers, and the activation function used by each neuron is the rectified linear unit (ReLU).

Parameter Name            Parameter Value
Exploring probability ε   0.8~0.1
Learning rate α           0.001
Discount factor γ         0.9
Experience pool size      2000

Analysis of Simulation Results
To avoid repeated experiments, this section only discusses two types of service: conversational (session-based) and background. Three algorithms are selected for comparison. Comparison algorithm a uses a greedy algorithm to schedule and select users based on their channel state, aiming to maximize system capacity, and then optimizes energy efficiency on the basis of this scheduling; it does not consider the users' QoS delay requirements. Reference [11] assumes by default that the number of antennas is sufficient for the number of accessing users and performs no user scheduling; if the number of connected users exceeds the set number of antennas, random scheduling may be carried out, and some users may not be able to access their services, so the users' QoS requirements are not met. Comparison algorithm b additionally limits the QoS rate of users on this basis. Comparison algorithm c is the algorithm of [16], which jointly optimizes spectral efficiency and energy efficiency and solves the problem with the NSGA-II algorithm without taking the QoS of the user service into account.
In each scheduling slot, the number of waiting slots of a user's data packet increases by 1. User satisfaction is measured by the number of users whose data packets have timed out: the fewer the timeout users, the higher the user satisfaction. With a total of 20 users, and with session-based and background-based services used in both extreme and even proportions, the user satisfaction of the algorithms in the different situations is shown in Figure 5. Figure 5a shows that when 19 users use background services and 1 user uses a session-based service, the user data packets almost never time out under any of the algorithms, because background services have no strict delay requirement; whether the algorithm considers delay requirements during user scheduling therefore has little impact on the results. Figure 5b shows that when session-based and background-based users each make up half of the users, whether the algorithm considers delay requirements has a significant impact on the results, because session-based services have strict delay requirements. The algorithm proposed in this article minimizes the number of timeout packets and guarantees the users' service delay requirements. Algorithms a, b, and c do not consider the service delay requirements during user scheduling, which leads to a significant increase in the number of timeout packets, so the requirements of users of session-based services cannot be met. As shown in Figure 5c, when 19 users use session-based services and 1 user uses a background service, the number of timeout users increases under all algorithms. However, compared with the other three algorithms, the scheduling scheme proposed in this paper still has fewer timeout users from around the 10th time slot onward. In summary, the user scheduling scheme proposed in this paper alleviates packet timeouts and improves user satisfaction.
In order to demonstrate the advantages of the proposed method in the joint optimization of energy efficiency and system capacity, two algorithms, d and e, were added to comparison algorithms a, b, and c. Comparison algorithm d is the algorithm used in reference [11], which optimizes energy efficiency but considers neither the user's service delay requirements nor the upper and lower rate limits. Comparison algorithm e considers only the throughput indicator under the same scheduling scheme, without considering the energy efficiency indicator of green communication.
Let the signal-to-noise ratio be calculated as $\mathrm{SNR} = P_0/\sigma^2$. In the experiment, the signal-to-noise ratio is changed by changing the value of $\sigma^2$. In order to verify that the method proposed in this article can achieve high energy efficiency, the energy efficiency of the different algorithms is compared under different signal-to-noise ratios, as shown in Figure 6.
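As a small illustration of how the SNR can be swept by changing only the noise variance, the following Python sketch inverts SNR = P_0/σ². Expressing the target SNR in decibels, the function name, and the sample values are assumptions; the paper does not state the scale or the range used.

```python
def noise_variance_for_snr(p0, snr_db):
    """Return the noise variance sigma^2 that gives the target SNR.

    Uses SNR = P0 / sigma^2, with the target given in dB (an assumption).
    """
    snr_linear = 10 ** (snr_db / 10)
    return p0 / snr_linear


# Hypothetical sweep: fixed power P0 = 1.0, SNR from 0 dB to 20 dB.
sweep = {snr_db: noise_variance_for_snr(1.0, snr_db) for snr_db in (0, 10, 20)}
```

Keeping P_0 fixed and varying σ² leaves the transmit power budget unchanged across the sweep.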
From Figure 6, it can be seen that as the signal-to-noise ratio (SNR) increases, the energy efficiency of all algorithms increases accordingly. It should be noted that the energy efficiency of algorithm e grows only slowly once it has reached a certain level. This is because algorithm e aims solely to improve system throughput, which leads to higher energy consumption; its advantage lies in optimizing throughput alone.
From the graph, it can also be observed that the algorithm proposed in this article and comparison algorithms a, b, c, and d show a similar growth rate as the signal-to-noise ratio increases. This is because these algorithms are all optimized for energy efficiency. Among them, the proposed method still holds a slight advantage, indicating that it can improve throughput without sacrificing energy efficiency.
In order to further verify the advantages of the proposed algorithm in terms of system capacity compared to the other algorithms mentioned above, the algorithms were compared as the number of users in the system increased, as shown in Figure 7.
From Figure 7, it can be observed that when the number of users is small, the throughput obtained by all of the comparison algorithms increases rapidly, with no difference between them, as the total number of users increases. When the total number of users exceeds the maximum number of users that the system is configured to serve simultaneously, the throughput obtained by all of the algorithms stops increasing. As the total number of users continues to increase, the system throughput begins to fluctuate. Note that algorithm e has a small fluctuation range and the highest system throughput, because this algorithm aims only to improve system throughput. Its higher throughput compared to the other algorithms is therefore to be expected, but it ignores the energy efficiency indicator. Among the remaining algorithms, the proposed algorithm and algorithm c show a relatively small throughput fluctuation range and relatively high throughput. In summary, Figures 6 and 7, taken together from the perspectives of energy efficiency and throughput, demonstrate that the algorithm proposed in this paper can effectively balance these two objectives and achieve a relatively optimal combination of the two. Although algorithm c can also effectively improve energy efficiency and throughput, Figure 5 shows that its packet timeout situation is severe and its user satisfaction is low. Therefore, overall, the algorithm proposed in this paper performs well.

Conclusions
Aiming at the problem of low throughput and energy efficiency in large-scale MIMO systems, which is caused by the mutual constraints between energy efficiency and spectral efficiency and by resource allocation that considers neither user service QoS nor the upper and lower rate limits, a joint optimization method for spectrum and energy resources under QoS guarantees is proposed. This method is divided into two steps. First, greedy algorithms are used to schedule users based on their latency requirements.

Figure 1. Downlink channel model of multi-user MIMO system.

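Decomposing $\hat{H}_m$ by singular value decomposition yields the following:

$$\hat{H}_m = U_m \Sigma_m \left[ V_m^{(1)} \;\; V_m^{(0)} \right]^H$$

where $V_m^{(1)}$ is composed of the right singular vectors corresponding to the $r_{\hat{H}_m}$ non-zero singular values of $\hat{H}_m$, and $V_m^{(0)}$ is composed of the right singular vectors corresponding to the $K_T - K_R + k_m$ zero singular values.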

the selected collection to $Y = \varnothing$. Step 2: Determine the number of waiting time slots for each service. If the number of selected users exceeds the antenna limit, the procedure ends. Otherwise, iterate through the remaining user collection $N$. For each user $s$ in $N$, define $Y_s = Y + \{s\}$ and calculate the capacity of set $Y_s$:
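The greedy selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: `toy_capacity` is a hypothetical stand-in for the paper's capacity expression (equal power split, one scalar channel gain per user), the rule of stopping when no candidate increases capacity is an assumption, and the waiting-slot weighting from Step 2 is omitted.

```python
import math


def toy_capacity(selected, gains, total_power=1.0, noise=1.0):
    """Hypothetical sum capacity of a user set under equal power split."""
    if not selected:
        return 0.0
    p = total_power / len(selected)
    return sum(math.log2(1.0 + p * gains[u] / noise) for u in selected)


def greedy_select(gains, max_users):
    """Greedily grow the selected set Y from the remaining user set N."""
    selected = []              # Y starts as the empty set
    remaining = set(gains)     # N: users not yet selected
    while remaining and len(selected) < max_users:
        # For each candidate s, evaluate the capacity of Y_s = Y + {s}.
        best = max(remaining,
                   key=lambda s: toy_capacity(selected + [s], gains))
        gain_with_best = toy_capacity(selected + [best], gains)
        # Assumption: stop early if the best candidate does not help.
        if gain_with_best <= toy_capacity(selected, gains):
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical channel gains for three users and an antenna limit of two.
chosen = greedy_select({"u1": 10.0, "u2": 1.0, "u3": 5.0}, max_users=2)
```

With these example gains the strongest user is picked first, and a second user is added only because the combined capacity under the equal split is still higher than serving the first user alone.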
as the sum of the reward and the discounted maximum Q value in the next state:

Figure 4. DQN algorithm diagram.

The pseudocode for solving the above optimization problem with DQN is as follows (Algorithm 1):

Figure 5. Comparison of data packet timeouts of different algorithms. (a) Packet timeout when the number of session users is 1 and the number of background users is 19; (b) Packet timeout when the number of session users is 10 and the number of background users is 10; (c) Packet timeout when the number of session users is 19 and the number of background users is 1.


Figure 6. Comparison of energy efficiency under different signal-to-noise ratios.


Figure 7. Comparison of system throughput obtained by different algorithms.


Algorithm 1: DQN
Initialize experience playback pool D with capacity N;
Initialize parameter θ of the current Q-value network Q(x, y; θ);
Initialize parameter θ′ of the target Q-value network max Q(x′, y′; θ′), i.e., θ → θ′;
for episode = 1, M do
    Randomly select initial state x_1;
    for t = 1, T do
        if random < ε then
            Following the ε-greedy strategy, randomly select action y_t (this branch is taken with probability ε);
        else
            Select action y_t = arg max_y Q(x_t, y; θ);
        end if
        Execute action y_t, observe reward r_t and the next state x_{t+1};
        Store (x_t, y_t, r_t, x_{t+1}) in experience playback pool D;
        Extract a batch of sample data from D to train the current Q-value network;
        Using the loss function, update parameter θ through the gradient back propagation of the neural network;
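The control flow of Algorithm 1 can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation: a linear Q-function over one-hot states (effectively one weight per state-action pair) stands in for the deep network, the environment `toy_step` and all hyperparameter values are hypothetical, and the periodic copy of θ into the target parameters is the standard DQN step assumed here because the listing above is cut off before it.

```python
import random


def train_dqn_sketch(n_states, n_actions, step, episodes=200, horizon=20,
                     gamma=0.9, eps=0.1, lr=0.1, pool_size=500, batch=16,
                     sync_every=50, seed=0):
    """Replay-based Q-learning loop with a separate target parameter copy."""
    rng = random.Random(seed)
    # theta: current Q "network"; theta_target: target network copy.
    theta = [[0.0] * n_actions for _ in range(n_states)]
    theta_target = [row[:] for row in theta]
    pool = []                               # experience playback pool D
    steps_done = 0
    for _ in range(episodes):
        x = rng.randrange(n_states)         # random initial state x_1
        for _ in range(horizon):
            if rng.random() < eps:          # explore with probability eps
                y = rng.randrange(n_actions)
            else:                           # exploit: arg max_y Q(x, y; theta)
                y = max(range(n_actions), key=lambda a: theta[x][a])
            r, x_next = step(x, y)
            pool.append((x, y, r, x_next))  # store the transition in D
            if len(pool) > pool_size:
                pool.pop(0)                 # drop the oldest transition
            for sx, sy, sr, sn in rng.sample(pool, min(batch, len(pool))):
                # Target: reward plus discounted max target-network Q value.
                target = sr + gamma * max(theta_target[sn])
                # Gradient step on 0.5 * (target - Q)^2 for this one weight.
                theta[sx][sy] += lr * (target - theta[sx][sy])
            steps_done += 1
            if steps_done % sync_every == 0:    # assumed target sync
                theta_target = [row[:] for row in theta]
            x = x_next
    return theta


def toy_step(state, action):
    """Hypothetical environment: action 1 pays reward 1, states alternate."""
    return (1.0 if action == 1 else 0.0), (state + 1) % 2


q_values = train_dqn_sketch(n_states=2, n_actions=2, step=toy_step)
```

In the paper's setting the state, action, and reward would come from the joint optimization problem, and the weight table would be replaced by the neural network whose parameters Table 3 configures.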

Table 2. Wireless network parameter values.

Table 3. Parameter values of DQN algorithm.