Maximizing Channel Capacity of 3D MIMO System via Antenna Downtilt Angle Adaptation Using a Q-Learning Algorithm

Abstract: 3D MIMO adds a vertical dimension, set through the antenna downtilt angle, so that signals are transmitted in a more accurate direction and system capacity improves. In this paper, we verify the effect of the antenna downtilt angle on channel capacity by simulating four fixed downtilt angles, 90, 96, 99, and 102 degrees. The distance between the mobile station (MS) and the base station (BS) is 250 m, and the antenna heights at the BS and MS are 25 m and 1.5 m, respectively. The simulation results show that a downtilt angle of 96 degrees yields a larger channel capacity than the other three. In addition, we propose an adaptive method that applies the Q-learning algorithm to optimize the antenna downtilt angle and thereby maximize system capacity. The proposed method is evaluated with three discount rates (0.9, 0.5, and 0.1) and four propagation distances for 20 × 1 and 60 × 4 MIMO configurations. We show that, with a discount rate of 0.9, the adaptively optimized downtilt angle differs from the ideal optimal angle by only about 1%, and its channel capacity reaches more than 99.72% of the ideal optimum.


Introduction
In mobile communications, 5G (fifth-generation mobile communication) followed 4G LTE (Long Term Evolution) and LTE-Advanced, with standards announced and commercial operation beginning in 2020. One of its key technologies is 3D MIMO (Multiple Input Multiple Output) [1,2]. Among the major 5G technologies, 3D MIMO can also be applied to existing 4G networks to improve system performance. 3D MIMO uses a technique similar to phased array radar (PAR) to greatly improve spectral efficiency through more accurate beamforming and multiplexing [3]. Compared with 4G, 3D MIMO improves system capacity, coverage, interference resistance, and transmission rate, and it is a key technology in the 5G systems currently being developed around the world [4].
At the same time, new antenna technologies and network architectures have pushed MIMO toward higher dimensions, giving rise to 3D MIMO based on array antennas. Compared with traditional MIMO, 3D MIMO adds a vertical dimension to the beam. By combining horizontal and vertical beams, the representation of signal transmission in space comes closer to reality. The three-dimensional spatial characteristics of the antenna can be adjusted through the antenna downtilt angle to realize 3D beamforming, which can effectively increase signal strength and reduce interference.
Beamforming is a critical technology in 3D MIMO. Its advantages are mainly reflected in expanded coverage, improved transmission quality at the cell edge, and interference resistance. Traditional beamforming steers only in the horizontal (azimuth) direction, so it cannot accurately serve different users, and communication quality suffers [5]. 3D beamforming addresses this problem by combining beams in the horizontal and vertical dimensions, which allows accurate beamforming toward different users, improves spectral efficiency and channel capacity, and effectively improves communication quality [5].
A large number of parameters must be optimized in 3D beamforming to achieve the maximum performance of a 3D MIMO system, which has drawn considerable attention to machine learning (ML). Machine learning is increasingly used to develop methods that adaptively extract patterns from environmental measurements and performance indicators, and these patterns can be used to optimize parameter settings for a variety of problems [6][7][8].
Because vertical-dimension beamforming is important in 3D MIMO systems, this paper investigates the effect of the antenna downtilt angle on channel capacity in 3D MIMO systems and applies reinforcement learning to it. The Q-learning algorithm [9,10] lets the antenna downtilt angle be adjusted adaptively, so that the 3D MIMO system automatically selects the optimal downtilt angle for the position of the MS and maximizes the channel capacity of the communication system. This work first examines the advantages and disadvantages of fixing a specific downtilt angle with respect to channel capacity. Following [11], four fixed downtilt angles, 90, 96, 99, and 102 degrees, which must be determined in advance, are used in this study. A well-chosen fixed angle can significantly improve beamforming and channel performance; however, a fixed angle cannot be adjusted to the optimal value while the MS is moving.
Second, we investigate the proposed method, which uses a reinforcement learning method, the Q-learning algorithm, to maximize the channel capacity by optimizing the antenna downtilt angle for a single MS at different distances. In the Q-learning algorithm, rewards are assigned according to the change in channel capacity as the antenna downtilt angle varies. The Q-table is built through iterative learning; each row of the Q-table corresponds to a specific state, that is, a particular antenna downtilt angle and its channel capacity. Finally, the optimal antenna downtilt angle that maximizes the channel capacity is selected from the Q-table. Two antenna configurations, 20 × 1 and 60 × 4, are considered to verify the performance of the proposed method at different transmission distances. We use MATLAB to simulate and verify the performance of the proposed method.
The remainder of this paper is organized as follows: Section 2 explains the system model and the Q-learning algorithm. Section 3 presents the method of using the Q-learning algorithm to adaptively optimize the antenna downtilt angle for maximum channel capacity. Section 4 presents simulation results that evaluate the performance of the proposed method. Finally, conclusions are given.
delay, amplitude, angle of arrival (AoA), angle of departure (AoD), and flakes. It is highly accurate and close to the physical environment [18][19][20][21]. In the existing 2D channel models of 3GPP SCM and WINNER, propagation paths are described only by the azimuth angle in the horizontal dimension, and all of these 2D models use a fixed antenna downtilt angle of π/2. However, for the 3D channel model shown in Figure 1, Refs. [4,11] point out that the vertical component of the emission vector carries a large part of the energy, so describing propagation paths by the horizontal azimuth alone does not reflect the actual situation. In addition, when the antenna downtilt angle is fixed, the elevation degree of freedom of the channel is not exploited. Dynamically adjusting the antenna downtilt angle at the base station (BS) transmitter provides more options for transmitting 3D beamformed signals to the mobile station (MS) and can significantly improve system performance. Therefore, extending the existing 2D channel model to a 3D channel requires considering the elevation angle of each propagation path and introducing it into the channel model as a parameter rather than assuming it to be fixed [22][23][24][25].
According to the antenna configuration in [5], the element [H]_su of the effective 3D channel matrix between the sth antenna of the base station (BS) transmitter and the uth antenna of the mobile station (MS) receiver is given by Equation (1), where φ_n and θ_n are the azimuth and elevation angles of the nth transmit path, respectively; ϕ_n and ϑ_n are the azimuth and elevation angles of the nth receive path, respectively (their relationship is shown in Figure 1); θ_tilt is the antenna downtilt angle; α_n is the random amplitude of the nth path; g_t(φ_n, θ_n, θ_tilt) and g_r(ϕ_n, ϑ_n) are the radiation patterns of the transmit and receive antennas, respectively; d_t and d_r are the antenna element spacings at the transmitter and at the receiver, respectively; and k is the wave number. The array responses of the transmit and receive antennas are given by Equations (2) and (3), respectively [11], where s and u index the sth transmit antenna and the uth receive antenna. To make the antenna downtilt angle concrete in practical scenarios, the International Telecommunication Union (ITU) approximates the radiation pattern of each antenna with a narrow beam, given by Equation (4) [13], whose horizontal and vertical components are given by Equations (5) and (6). In Equation (5), φ_3dB is the horizontal 3 dB (half-power) beamwidth; in Equation (6), θ_3dB is the vertical 3 dB beamwidth. The receive antenna pattern g_r(ϕ_n, ϑ_n) has no directional gain, so its gain is set to 0 dB. Because the antenna pattern in the 3D channel model contains the parameter θ_tilt, the value of θ_tilt affects channel performance. The goal of this study is to adjust the antenna downtilt angle dynamically to maximize channel capacity.
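Equations (4) to (6) are not reproduced in this text, so the following is only a sketch of how a downtilt-dependent transmit pattern of this kind behaves. It assumes the ITU-R M.2135-style form (a parabolic roll-off in dB around boresight, capped by attenuation limits) and assumes the values φ_3dB = 70°, θ_3dB = 15°, and 20 dB limits, which have not been checked against Table 1. Elevation angles are measured so that 90° is horizontal, as for θ_tilt in this paper.

```python
import math

# Assumed ITU-R M.2135-style parameters (not confirmed against Table 1).
PHI_3DB = 70.0    # horizontal 3 dB beamwidth, degrees
THETA_3DB = 15.0  # vertical 3 dB beamwidth, degrees
A_M = 20.0        # maximum (front-to-back) attenuation, dB
SLA_V = 20.0      # vertical side-lobe attenuation limit, dB


def horizontal_gain_db(phi_deg):
    """Horizontal pattern: parabolic roll-off in dB, capped at A_M."""
    return -min(12.0 * (phi_deg / PHI_3DB) ** 2, A_M)


def vertical_gain_db(theta_deg, tilt_deg):
    """Vertical pattern centred on the downtilt angle (90 deg = horizontal)."""
    return -min(12.0 * ((theta_deg - tilt_deg) / THETA_3DB) ** 2, SLA_V)


def bs_gain_db(phi_deg, theta_deg, tilt_deg):
    """Combined transmit pattern g_t(phi, theta, theta_tilt) in dB."""
    return -min(-(horizontal_gain_db(phi_deg) + vertical_gain_db(theta_deg, tilt_deg)), A_M)


def db_to_linear(gain_db):
    """Convert a power gain from dB to a linear factor."""
    return 10.0 ** (gain_db / 10.0)


# Same departing path (boresight azimuth, elevation 96 deg) under two downtilts.
for tilt in (96.0, 102.0):
    g = bs_gain_db(0.0, 96.0, tilt)
    print(f"tilt {tilt:5.1f} deg: gain {g:6.2f} dB ({db_to_linear(g):.3f} linear)")
```

A path leaving exactly along the tilted boresight sees no vertical attenuation, while a 6° mismatch costs about 1.9 dB under these assumed beamwidths; this is the mechanism through which θ_tilt enters Equation (1).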

Maximum Channel Entropy
The channel model released in commercial standards [13] is more accurate and defines a more rigorous 3D channel model. Because it is a commercial standard, however, its theoretical analysis is rather complex. First, multiple propagation paths introduce many random variables (RVs) into the model. Second, besides their number, several of these random variables, the AoD and AoA parameters, are nonlinearly correlated. To facilitate theoretical analysis, an equivalent and more concise channel model must be constructed. Ref. [26] therefore applies the principle of maximum entropy to Equation (1) to bridge the gap between the commercial standards and theory, deriving an equivalent channel model that avoids the large number of random variables while remaining usable in realistic environments.
Ref. [4] shows that the model generated by the maximum entropy principle agrees with the actual environment. In addition, inspired by Bayesian parameter estimation based on the maximum entropy principle, the authors of Ref. [26] use this framework as a theoretical basis to construct a 2D MIMO channel model consistent with the characteristics of the actual environment.
The channel model proposed in the commercial standards is a geometry-based stochastic model (GBSM). This model assumes only one reflection (single-bounce scattering) between the transmitter and the receiver [11]. In addition, the positions of the scatterers, the base station (BS), and the mobile station (MS) are assumed to be fixed. The modeler can therefore calculate (1) the AoDs, (2) the AoAs, (3) the azimuth angle φ_n of the nth transmit path at the BS, (4) the elevation angle θ_n of the nth transmit path at the BS, (5) the azimuth angle ϕ_n of the nth receive path at the MS, (6) the elevation angle ϑ_n of the nth receive path at the MS, and so on. In Ref. [11], all channel model parameters except the amplitude, including the AoDs and AoAs, are assumed to be fixed in the modeling stage. The only random variable in the channel model is then the path amplitude α, which is Gaussian distributed. Hence, a channel model that matches the simulated environment can be generated.
It is proved in [11] that when the number of propagation paths, the AoDs and AoAs, and the transmit and receive powers of the propagation paths are all known, the maximum entropy channel model can be expressed as Equation (7), where P_Tx and P_Rx represent the transmit and receive powers, respectively, Ω is the mask matrix that captures the path gains, Ψ and Φ are the antenna array responses, and "∘" denotes the Hadamard product. A further assumption for maximizing entropy is that the elements of the unknown random matrix G are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance [11].
So that the channel model of Equation (1) can be written in the structure of Equation (7), let A and B be the N_BS × N and N_MS × N matrices given by Equations (8) and (10), respectively. The array responses and the antenna pattern expressions are included in A and B. P_Tx and P_Rx in Equation (7) are identity matrices because the transmit and receive powers are absorbed into the antenna patterns. In addition, because of the single-bounce scattering model, the mask matrix Ω is diagonal. Letting Ψ and Φ be represented by B and A, respectively, the solution of the maximum entropy problem for the N_MS × N_BS single-bounce 3D MIMO channel model of Equation (1) can be expressed as Equation (12), where α is a vector of length N whose elements are i.i.d. Gaussian random variables with zero mean and unit variance.
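To make the matrix dimensions concrete, here is a small sketch that assumes Equation (12) has the single-bounce form H = B diag(α) Aᵀ, which is what a diagonal mask Ω with Ψ = B and Φ = A suggests; whether Aᵀ or Aᴴ appears depends on Equation (7), which is not reproduced here. The array response `ula_response` is a hypothetical half-wavelength uniform linear array, the path angles are illustrative, and the pattern gains are set to 1 for brevity, so this shows only the structure, not the paper's Equations (8) and (10).

```python
import cmath
import math
import random


def ula_response(n_elems, angle_deg):
    """Hypothetical half-wavelength ULA phase response exp(j*pi*m*cos(angle))."""
    c = math.cos(math.radians(angle_deg))
    return [cmath.exp(1j * math.pi * m * c) for m in range(n_elems)]


def columns_to_matrix(cols):
    """Stack per-path column vectors into a (rows x paths) list-of-lists matrix."""
    return [list(row) for row in zip(*cols)]


def assemble_channel(A, B, alpha):
    """Assumed form H = B diag(alpha) A^T, i.e. H[u][s] = sum_n B[u][n] alpha[n] A[s][n]."""
    n_bs, n_ms, n_paths = len(A), len(B), len(alpha)
    return [[sum(B[u][n] * alpha[n] * A[s][n] for n in range(n_paths))
             for s in range(n_bs)]
            for u in range(n_ms)]


rng = random.Random(7)
N_BS, N_MS, N_PATHS = 4, 2, 3
tx_angles = [96.0, 100.0, 92.0]   # illustrative fixed departure elevations (deg)
rx_angles = [70.0, 85.0, 110.0]   # illustrative fixed arrival elevations (deg)
A = columns_to_matrix([ula_response(N_BS, t) for t in tx_angles])   # N_BS x N
B = columns_to_matrix([ula_response(N_MS, r) for r in rx_angles])   # N_MS x N
# Only the path amplitudes are random: zero-mean, unit-variance complex Gaussian.
alpha = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(N_PATHS)]
H = assemble_channel(A, B, alpha)   # N_MS x N_BS
print([[round(abs(h), 3) for h in row] for row in H])
```

Each path contributes a rank-one term, so H has rank at most N; with a single path the sketch produces a rank-one channel.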

Calculation of 3D MIMO Channel Capacity
Equation (1) gives the elements of the matrix in Equation (12), so Equation (1) can be represented by Equation (13), and Equation (12) can then be expressed as Equation (14). The parameters of the channel model are defined as follows.

Elevation Angle between Transmitting and Receiving Path (AoDs, AoAs)
Following Ref. [11], which is close to the commercial communication standards, we use the angle distributions and antenna patterns specified in those standards for the analysis. The elevation angles of the transmit and receive paths are drawn from the Laplace density for the elevation angle given in [11], as expressed in Equation (15), where θ_0 is the mean of the elevation angles of departure (EAoDs) or the elevation angles of arrival (EAoAs), and σ is the elevation angular spread. The azimuth angles of departure (AAoDs) of the transmit paths and the azimuth angles of arrival (AAoAs) of the receive paths follow the von Mises-Fisher (VMF) distribution in its two-dimensional (von Mises) form:

$$f(\varphi) = \frac{e^{k\cos(\varphi-\mu)}}{2\pi I_0(k)}, \qquad (16)$$

where I_n(k) is the modified Bessel function of the first kind of order n (here n = 0), and µ is the mean of the AAoDs or AAoAs. The parameter k is the concentration (scattering density) parameter: the larger k is, the more the path azimuths are concentrated around the mean azimuth; the smaller k is, the more they are scattered. Because the channel is assumed to be single-bounce scattering, the angles are generated only once and then treated as fixed in the channel.
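A sampling sketch for the two angle models. It assumes Equation (15) is the Laplace density whose standard deviation is σ (so its scale is σ/√2); that is an assumption, since the equation is not reproduced here. Azimuths use Python's standard `random.vonmisesvariate(mu, kappa)`, which works in radians. The mean and spread values in the example are illustrative only.

```python
import math
import random


def sample_elevations(theta0_deg, sigma_deg, n, rng):
    """Laplace-distributed elevations (deg) with mean theta0 and std sigma (assumed).

    The Laplace scale is b = sigma / sqrt(2); a Laplace(0, b) variate is the
    difference of two independent exponential variates with mean b.
    """
    b = sigma_deg / math.sqrt(2.0)
    return [theta0_deg + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
            for _ in range(n)]


def sample_azimuths(mu_deg, kappa, n, rng):
    """Von Mises-distributed azimuths (deg), wrapped to [-180, 180)."""
    mu = math.radians(mu_deg) % (2.0 * math.pi)
    out = []
    for _ in range(n):
        phi = math.degrees(rng.vonmisesvariate(mu, kappa))  # radians in [0, 2*pi)
        out.append((phi + 180.0) % 360.0 - 180.0)
    return out


rng = random.Random(2024)
# Angles are drawn once per path and then held fixed (single-bounce assumption).
eaod = sample_elevations(95.37, 10.0, 5, rng)   # illustrative theta0 and sigma
aaod = sample_azimuths(0.0, 5.0, 5, rng)        # illustrative mu and k
print([round(t, 2) for t in eaod])
print([round(p, 2) for p in aaod])
```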

Antenna Downtilt Angle and Elevation Angle of Line of Sight
In Equation (15), the mean elevation angle θ_0 of the channel model is set to the elevation angle of the line of sight θ_LoS [11] between the base station (BS) and the mobile station (MS). Let Δh be the height difference between the BS and the MS, and let x and y be the horizontal distance differences between the BS and the MS along the x and y axes. The line-of-sight elevation angle θ_LoS can then be defined as

$$\theta_{\mathrm{LoS}} = 90^\circ + \arctan\!\left(\frac{\Delta h}{\sqrt{x^2+y^2}}\right), \qquad (17)$$

so that θ_LoS = 90° corresponds to the horizontal direction. The antenna downtilt angle θ_tilt is then defined as shown in Figure 2.
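A sketch of the line-of-sight elevation computation, assuming θ_LoS = 90° + arctan(Δh/√(x² + y²)) with 90° as horizontal and the heights given in the abstract (25 m at the BS, 1.5 m at the MS, so Δh = 23.5 m). Placing the MS on the x axis at horizontal distance D, this reproduces the 103.22°, 96.7015°, and 95.37° values quoted for 100, 200, and 250 m in the simulation section, and gives about 98.904° at 150 m.

```python
import math

H_BS = 25.0  # BS antenna height in m (from the abstract)
H_MS = 1.5   # MS antenna height in m (from the abstract)


def theta_los_deg(x, y, h_bs=H_BS, h_ms=H_MS):
    """LoS elevation angle in degrees, measured so that 90 deg is horizontal."""
    delta_h = h_bs - h_ms
    return 90.0 + math.degrees(math.atan2(delta_h, math.hypot(x, y)))


for d in (100.0, 150.0, 200.0, 250.0):
    print(f"D = {d:5.0f} m -> theta_LoS = {theta_los_deg(d, 0.0):.4f} deg")
```

The angle shrinks toward 90° as the MS moves away, which is why the best fixed downtilt depends on distance.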

Channel Capacity
The MIMO channel changes randomly, so the channel matrix H is a random matrix, which means the MIMO channel capacity is also random and time-varying. Assuming that the channel matrix is a stationary random process, the MIMO channel capacity can be calculated as the time average over the signal transmission on the channel. The MIMO channel capacity is then given by Equation (18), where the λ_i are obtained from the singular value decomposition of the channel matrix H, which can be written as

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{H}, \qquad (19)$$

where U and V are unitary matrices (UU^H = I, VV^H = I), and Λ is a real diagonal matrix whose only nonzero entries are the diagonal values λ_0 ≥ λ_1 ≥ ··· ≥ λ_{n_min−1}, with n_min = min(n_t, n_r).
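Equation (18) itself is not reproduced here. Assuming the standard equal-power form C = log2 det(I + (ρ/n_t) H Hᴴ), which equals Σ_i log2(1 + (ρ/n_t) λ_i²) when the λ_i are the singular values in Equation (19), the following pure-Python sketch evaluates it with a Cholesky factorization. The transmit SNR ρ appears as the hypothetical parameter `snr_linear`.

```python
import math


def log2_det_hpd(M):
    """log2 det(M) for a Hermitian positive-definite matrix, via Cholesky M = L L^H."""
    n = len(M)
    L = [[0j] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                d = s.real  # diagonal of a Hermitian matrix is real
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = complex(math.sqrt(d), 0.0)
                total += math.log2(d)  # det(M) = prod(L_ii^2) = prod(d)
            else:
                L[i][j] = s / L[j][j]
    return total


def mimo_capacity(H, snr_linear):
    """Equal-power capacity log2 det(I + (snr/n_t) H H^H) in bit/s/Hz; H is n_r x n_t."""
    n_r, n_t = len(H), len(H[0])
    gram = [[sum(H[i][k] * H[j][k].conjugate() for k in range(n_t)) for j in range(n_r)]
            for i in range(n_r)]
    M = [[(1.0 if i == j else 0.0) + snr_linear / n_t * gram[i][j] for j in range(n_r)]
         for i in range(n_r)]
    return log2_det_hpd(M)


# Diagonal example: singular values 2 and 1, transmit SNR 2 (linear),
# so the capacity is log2(1 + 4) + log2(1 + 1).
print(mimo_capacity([[2, 0], [0, 1]], 2.0))
```

The log-det form avoids computing the singular values explicitly; for a diagonal H the two forms can be checked against each other by hand.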

The Effect of Antenna Downtilt Angle on Channel Capacity
The 3D channel model proposed in [11] is constructed on the condition that the channel parameters are known, and is derived from the system-level 2D random channel model that appears in standards such as 3GPP, ITU, and WINNER. Ref. [11] constructs the channel matrix of 3D MIMO under the condition of known AoDs and AoAs, which can accurately represent the channel model. The simulation results in [11] show that choosing an accurate antenna downtilt angle at the base station (BS, transmitter) can greatly improve the performance, so the antenna downtilt angle has a critical impact on the channel capacity.

Distributions of EAoDs and EAoAs
In Equation (15), the path elevation angles follow a Laplace distribution, and the angular spread of the paths increases as σ increases; the smaller σ is, the more the path elevation angles are concentrated around the mean elevation angle. Figure 3 shows the cumulative distribution function (CDF) for a mean elevation angle of zero degrees (θ_0 = 0°) and σ = 5°, 10°, 15°, and 20°. As Figure 3 shows, the larger σ is, the more dispersed the angles are; conversely, the smaller σ is, the more they are concentrated around the mean angle θ_0 = 0°.
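Under the same assumed Laplace form (standard deviation σ, scale σ/√2), the CDF has a closed form. This sketch computes the probability that a path elevation falls within ±5° of θ_0 = 0° for the four σ values used in Figure 3; the ±5° window is an arbitrary choice that simply quantifies the dispersion trend.

```python
import math


def laplace_cdf(theta, theta0, sigma):
    """CDF of a Laplace law with mean theta0 and standard deviation sigma (assumed form)."""
    z = math.sqrt(2.0) * (theta - theta0) / sigma
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def prob_within(half_width, sigma, theta0=0.0):
    """P(|theta - theta0| <= half_width) under the assumed Laplace law."""
    upper = laplace_cdf(theta0 + half_width, theta0, sigma)
    lower = laplace_cdf(theta0 - half_width, theta0, sigma)
    return upper - lower


for sigma in (5.0, 10.0, 15.0, 20.0):
    print(f"sigma = {sigma:4.1f} deg: P(|theta| <= 5 deg) = {prob_within(5.0, sigma):.3f}")
```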

Distributions of AAoDs and AAoAs
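In Equation (16), the von Mises (VM) distribution is used, which is a special case of the von Mises-Fisher (VMF) distribution [27][28][29]. The general von Mises-Fisher density of a p-dimensional unit vector x with mean direction μ and concentration k is [29]

$$f_p(\mathbf{x};\boldsymbol{\mu},k) = C_p(k)\, e^{k\boldsymbol{\mu}^{T}\mathbf{x}}, \qquad (20)$$

in which

$$C_p(k) = \frac{k^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(k)}. \qquad (21)$$

When p = 2, C_2(k) reduces to 1/(2πI_0(k)), and for unit vectors at angles φ and µ the exponent kμᵀx becomes k cos(φ − µ) (Equations (22) and (23)); substituting these into Equation (20) yields the von Mises density of Equation (16).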
In Equation (16), the mean azimuth angle of both the transmit and receive paths is assumed to be zero degrees (µ = 0°). The larger k is, the more the path azimuths are concentrated around the mean azimuth µ = 0°. Consequently, a larger k improves the beamforming capability and hence the overall channel capacity. The probability density function of the path azimuth is shown in Figure 4.
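A sketch of the von Mises density of Equation (16), taken here in its standard form exp(k cos(φ − µ))/(2π I_0(k)) with µ = 0°. I_0 is computed from its power series so that only the standard library is needed. It illustrates the trend described above: the density at the mean rises with k while the density 60° away falls.

```python
import math


def bessel_i0(k, terms=60):
    """Modified Bessel function of the first kind, order 0, by its power series."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (k / 2.0) ** 2 / (m * m)  # (k/2)^(2m) / (m!)^2, built incrementally
        total += term
    return total


def von_mises_pdf(phi_deg, mu_deg, kappa):
    """Von Mises density per radian, evaluated at an angle given in degrees."""
    d = math.radians(phi_deg - mu_deg)
    return math.exp(kappa * math.cos(d)) / (2.0 * math.pi * bessel_i0(kappa))


for kappa in (0.5, 2.0, 8.0):
    peak = von_mises_pdf(0.0, 0.0, kappa)
    off = von_mises_pdf(60.0, 0.0, kappa)
    print(f"k = {kappa:3.1f}: pdf(0 deg) = {peak:.3f}, pdf(60 deg) = {off:.3f}")
```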

Simulation Parameters with Different Antenna Downtilt Angles
The simulation in this section is configured to reflect a realistic environment, with parameters set as shown in Table 1. The simulation results in Figure 5 show that the channel capacity is largest when θ_tilt = 96° and smallest when θ_tilt = 102°; θ_tilt = 90° corresponds to the 2D MIMO case. In terms of the average channel capacity of the system, at a CDF value of 0.5 the difference between 96° and 102° is about 0.471 bps/Hz. Furthermore, the channel capacity is largest when the antenna downtilt angle θ_tilt is close to the line-of-sight elevation angle θ_LoS, which is about 95.37° at D = 250 m. In other words, a small difference in the antenna downtilt angle has a significant impact on the channel capacity. These results were obtained with four fixed downtilt angles; how to select the optimal downtilt angle adaptively in a wireless communication system is a question worth investigating in depth.
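As a rough geometric check on why 96° performs best at D = 250 m, this sketch evaluates only the vertical pattern attenuation along the LoS ray for the four fixed tilts. It uses an assumed vertical 3 dB beamwidth of 15°, an assumed 20 dB cap, and the assumed θ_LoS = 90° + arctan(Δh/D). It is not the multipath capacity simulation behind Figure 5, but it gives the same extremes (96° best, 102° worst).

```python
import math

THETA_3DB = 15.0  # assumed vertical 3 dB beamwidth, degrees
SLA_V = 20.0      # assumed vertical side-lobe attenuation limit, dB


def theta_los_deg(d, h_bs=25.0, h_ms=1.5):
    """Assumed LoS elevation (90 deg = horizontal) for horizontal distance d in m."""
    return 90.0 + math.degrees(math.atan2(h_bs - h_ms, d))


def vertical_gain_db(theta_deg, tilt_deg):
    """Vertical pattern attenuation relative to boresight, in dB."""
    return -min(12.0 * ((theta_deg - tilt_deg) / THETA_3DB) ** 2, SLA_V)


los = theta_los_deg(250.0)  # about 95.37 deg
gains = {tilt: vertical_gain_db(los, tilt) for tilt in (90.0, 96.0, 99.0, 102.0)}
for tilt, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"tilt {tilt:5.1f} deg: gain along LoS {g:6.2f} dB")
```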

Q-Learning Algorithm
Q-learning is a reinforcement learning technique in machine learning. Its goal is to learn a policy that tells an agent which action to take under which circumstances. It does not require a model of the environment and can handle stochastic transitions and rewards without modification. For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state; given infinite exploration time and a partly random policy, it can identify an optimal action-selection policy. The Q function that Q-learning learns represents the value of taking a given action in a given state.
Q-learning is an off-policy reinforcement learning algorithm: its update uses the value of the highest-valued action in the next state, not the value of the action the agent actually takes next. Q-learning is based on value iteration, and its update rule is

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[R(s_t,a_t) + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right], \qquad (24)$$

where Q(s_t, a_t) is the value-function (Q) value of taking action a_t in state s_t; R(s_t, a_t) is the reward obtained by taking action a_t in the current state s_t; and α, with 0 ≤ α ≤ 1, is the learning rate, which governs the convergence rate: the larger α is, the faster the convergence. In a fully deterministic environment, a learning rate of α = 1 is optimal. The discount rate γ, usually a constant less than 1, determines the importance of future rewards. If γ = 0, the agent is short-sighted and considers only the immediate reward. The smaller the discount rate, the more the agent focuses on current and near-term rewards; the larger the discount rate, the more foresight the agent has and the more weight it gives to future rewards. The term max_a Q(s_{t+1}, a) is the largest Q value among all possible actions a in the next state s_{t+1}; it is used for the update regardless of the action actually taken in the next state, whose value would be Q(s_{t+1}, a_{t+1}). Figure 6 shows the Q-learning algorithm.
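A minimal sketch of a single update of Equation (24), using a dictionary as a sparse Q-table. The function and argument names are hypothetical. Taking the `max` over the next state's actions, rather than the action actually taken next, is what makes the update off-policy.

```python
def q_update(Q, s, a, s_next, reward, alpha, gamma, actions):
    """Apply Eq. (24) once and return the new Q(s, a).

    Q       -- dict mapping (state, action) to a value; missing entries count as 0
    actions -- callable giving the actions available in a state
    """
    # Off-policy target: best value available in s_next, whatever action is taken next.
    best_next = max((Q.get((s_next, b), 0.0) for b in actions(s_next)), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(s, a)]


def adjacent(n):
    """Hypothetical action set: move to the neighbouring state on either side."""
    return [n - 1, n + 1]


# With alpha = 1 the update reduces to R + gamma * max_a Q(s_next, a) = 1 + 0.9 * 2.
Q = {(2, 1): 0.5, (2, 3): 2.0}
print(q_update(Q, 1, 2, 2, reward=1.0, alpha=1.0, gamma=0.9, actions=adjacent))
```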

Adaptive Optimization of Antenna Downtilt Angle Based on the Q-Learning Algorithm
As the results in Section 2.4 show, the antenna downtilt angle strongly affects the channel capacity of 3D MIMO systems. A fixed downtilt angle, even one that is ideal for a given position, is of limited use in a real environment where the MS moves; the 3D MIMO system therefore needs to find the optimal antenna downtilt angle adaptively, which a fixed-parameter method cannot do. This section explains how the Q-learning algorithm helps the base station (BS) antenna search automatically for the best downtilt angle, so that the channel capacity is maximized through reinforcement learning.

Simulation Procedure with the Q-Learning Algorithm
In Equation (24), α is the learning rate, or step size, which determines the extent to which newly acquired information overrides old information. A value of α = 0 makes the agent learn nothing (it relies entirely on previous knowledge), while α = 1 makes the agent consider only the latest information (it ignores previous knowledge in order to explore future possibilities). In a fully deterministic environment, a learning rate of α = 1 is optimal, and Equation (24) then reduces to

$$Q(s_t,a_t) = R(s_t,a_t) + \gamma \max_{a} Q(s_{t+1},a). \qquad (25)$$

The discount rate γ determines the importance of future rewards. When γ approaches 0, the agent considers only the current reward; when γ approaches 1, the agent strives for long-term high rewards. If γ = 1, the learned values may diverge during iteration. This study examines the optimized antenna downtilt angle and the resulting channel capacity for γ = 0.9, 0.5, and 0.1.
Q-learning updates the entries of a two-dimensional array (action space × state space), similar to dynamic programming, and it cannot estimate states it has not seen. All states must therefore be enumerated and assigned reward values before the Q-table can be evaluated, so the reward matrix R must be constructed first. In this study, the antenna is assumed to move 0.5 degrees at a time. When learning starts, the antenna is set to a random angle θ_n, which is state n with channel capacity c_n. Its next action is to move either forward or backward by 0.5 degrees, chosen at random; it cannot jump to any other angle. If the next action is to move forward by 0.5 degrees, the next state n + 1 is the angle θ_n + 0.5 with channel capacity c_{n+1}, and the reward R satisfies c_n + R = c_{n+1}, that is, R = c_{n+1} − c_n. If it moves back by 0.5 degrees, the next state n − 1 is the angle θ_n − 0.5 with channel capacity c_{n−1}, and c_n + R = c_{n−1}, that is, R = c_{n−1} − c_n. If the next state is the current state (the angle does not change), the reward R is 0; the reward for any angle the antenna cannot reach in one step is also set to 0.
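The reward rule above can be written as a small function that fills the reward matrix R from a list of capacities c_n. Entries for non-adjacent moves and for staying put remain 0, as described. The function name and the example capacities are hypothetical.

```python
def build_reward_matrix(capacities):
    """R[n][m] = c_m - c_n for adjacent angles m = n +/- 1; all other entries stay 0."""
    size = len(capacities)
    R = [[0.0] * size for _ in range(size)]
    for n in range(size):
        for m in (n - 1, n + 1):
            if 0 <= m < size:  # the antenna cannot step outside the search range
                R[n][m] = capacities[m] - capacities[n]
    return R


for row in build_reward_matrix([8.0, 8.3, 8.2]):   # hypothetical capacities
    print([round(r, 2) for r in row])
```

Only the two off-diagonal bands of R are nonzero, which matches the constraint that the antenna moves by a single 0.5° step at a time.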
The Q matrix is initialized to the zero matrix, which means the environment is unknown. Learning is then carried out through iterative updates so that the expected total reward over all successive steps is maximized; this yields the final Q-table, from which the angle with the best Q value, the optimal antenna downtilt angle, is found. In other words, an angle whose channel capacity is greater than the capacities at both the previous and the next angle gives the maximum channel capacity, and that angle is the optimal antenna downtilt angle.
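The stopping criterion above, an angle whose capacity exceeds that of both neighbours, can be sketched as follows. Treating a missing neighbour at the edge of the search range as −∞ is our assumption, made so that a maximum at 90° or 125° is still detected; `index_to_angle` assumes the 0.5° grid starting at 90° described in the procedure.

```python
def local_maxima(capacities):
    """Indices n whose capacity is strictly greater than at both neighbouring angles."""
    n = len(capacities)
    out = []
    for i, c in enumerate(capacities):
        left = capacities[i - 1] if i > 0 else float("-inf")       # assumed edge handling
        right = capacities[i + 1] if i < n - 1 else float("-inf")
        if c > left and c > right:
            out.append(i)
    return out


def index_to_angle(i, start_deg=90.0, step_deg=0.5):
    """Map a grid index to a downtilt angle in degrees (assumed 0.5-degree grid)."""
    return start_deg + step_deg * i


peaks = local_maxima([7.9, 8.0, 8.1, 8.05, 7.9])   # hypothetical capacities
print([index_to_angle(i) for i in peaks])
```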
The detailed operation procedure is as follows:
1. To enumerate all states before setting the reward values, the search range is 90° to 125° with an interval of 0.5°, so a total of 71 angles are searched.
2. Calculate the channel capacity at each angle.
3. Set the parameters. α = 1: the learning rate controls the convergence rate; to find the best antenna downtilt angle in a short time, α is set to 1. γ = 0.9: the larger the discount rate, the more the agent attends to long-term training results, so γ is set to 0.9. With these parameters, Equation (25) becomes

$$Q(s_t,a_t) = R(s_t,a_t) + 0.9 \max_{a} Q(s_{t+1},a). \qquad (26)$$

4. Because the environment is initially unknown and no iteration has yet been performed, the initial Q matrix is a 71 × 71 zero matrix.
5. Build the reward matrix R, where c_n is the channel capacity at the nth angle. The 71 rows of R correspond to the 1st to 71st angles, called the states s_t, and the 71 columns correspond to the actions a_t; an action moves from the nth angle (the row's state) to the (n + 1)th or the (n − 1)th angle, since the angle can only move to one of these two neighbors. Hence only the entries between two adjacent angles are nonzero (the circled entries of R). For example, when n is the second angle, the square-circled entry is the reward for moving from the second angle to the first, and the oval-circled entry is the reward for moving from the second angle to the third; the remaining entries of R are generated in the same way.
6. Iterative process: randomly select an angle (state s_t), perform an action a_t, obtain the reward R(s_t, a_t), and move from angle s_t to the next angle (state s_{t+1}).
7. Among all possible actions a in state s_{t+1}, multiply the maximum Q value by γ and update according to Equation (26); the Q-table is obtained after the iterations.
8. The angle (state) corresponding to the optimal Q value in the Q matrix is the optimal antenna downtilt angle.
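The procedure can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: the capacities come from a hypothetical concave curve `toy_capacity` peaking at 95.37° (the θ_LoS value for D = 250 m) in place of the simulated channel, only moves to adjacent angles are modeled, and the optimal angle is read out by following the highest-Q move from a starting angle until that move would lower the capacity. That read-out is one possible interpretation of the final selection step, combined with the rule that the optimum exceeds both neighbouring angles.

```python
import random

ANGLES = [90.0 + 0.5 * i for i in range(71)]  # search grid: 90.0 ... 125.0 deg, 0.5-deg steps


def toy_capacity(theta_deg, peak_deg=95.37):
    """Hypothetical concave capacity curve in bit/s/Hz (stand-in for simulated c_n)."""
    return 8.0 - 0.05 * (theta_deg - peak_deg) ** 2


def neighbours(n, size):
    """Angles reachable in one 0.5-degree move from index n."""
    return [m for m in (n - 1, n + 1) if 0 <= m < size]


def learn_q(capacities, gamma=0.9, alpha=1.0, iterations=100_000, seed=0):
    """Zero-initialised Q, capacity-difference rewards, random exploration."""
    size = len(capacities)
    rng = random.Random(seed)
    R = {(n, m): capacities[m] - capacities[n]
         for n in range(size) for m in neighbours(n, size)}
    Q = dict.fromkeys(R, 0.0)
    for _ in range(iterations):
        s = rng.randrange(size)               # random angle (state)
        a = rng.choice(neighbours(s, size))   # random move; the action is the next state
        best_next = max(Q[(a, b)] for b in neighbours(a, size))
        Q[(s, a)] += alpha * (R[(s, a)] + gamma * best_next - Q[(s, a)])
    return Q, R


def greedy_search(Q, R, size, start=0, max_steps=500):
    """Follow the highest-Q move until that move would lower the capacity."""
    s = start
    for _ in range(max_steps):
        a = max(neighbours(s, size), key=lambda m: Q[(s, m)])
        if R[(s, a)] <= 0.0:  # best move would not raise capacity: treat s as the peak
            return s
        s = a
    return s


caps = [toy_capacity(t) for t in ANGLES]
Q, R = learn_q(caps, gamma=0.9)
best = greedy_search(Q, R, len(ANGLES), start=0)
print(f"selected downtilt {ANGLES[best]:.1f} deg, capacity {caps[best]:.4f} bit/s/Hz")
```

With α = 1 the update is exactly Equation (26) when γ = 0.9; on this toy curve the search lands on the grid point nearest the peak. Changing `gamma` to 0.5 or 0.1 reproduces the paper's experimental knob, although the toy curve says nothing about which value performs best on the real channel.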

Ideal Optimal Channel Capacity Based on the Ideal Condition θ_tilt = θ_LoS
In this section, we use MATLAB software (Portola Valley, CA, USA), developed by MathWorks, to execute the Q-learning procedure and find the ideal optimal channel capacity under the ideal-condition assumption θ_tilt = θ_LoS.
To evaluate the proposed method, we consider two antenna configurations, 20 × 1 and 60 × 4, whose simulation parameters are listed in Tables 2 and 3, respectively.
For the 20 × 1 configuration, Figure 7 and Table 4 show the following: at D = 100 m (θ_tilt = θ_LoS = 103.22°), the maximum channel capacity is 13.09 bps/Hz and the receiver SNR is 39.6 dB; at D = 150 m (θ_tilt = θ_LoS = 98.904°), the maximum channel capacity is 10.89 bps/Hz and the receiver SNR is 33 dB; at D = 200 m (θ_tilt = θ_LoS = 96.7015°), the maximum channel capacity is 9.331 bps/Hz and the receiver SNR is 28.3 dB; and at D = 250 m (θ_tilt = θ_LoS = 95.37°), the maximum channel capacity is 8.124 bps/Hz and the receiver SNR is 24.6 dB. The longer the transmission distance, the lower the channel capacity because of path attenuation.
For the 60 × 4 configuration, Figure 8 and Table 5 show that at D = 100 m (θ_tilt = θ_LoS = 103.22°), the maximum channel capacity is 51.80 bps/Hz and the receiver SNR is 157.6 dB; at D = 150 m (θ_tilt = θ_LoS = 98.904°), it is 43.03 bps/Hz with a receiver SNR of 131 dB; at D = 200 m (θ_tilt = θ_LoS = 96.7015°), it is 36.78 bps/Hz with a receiver SNR of 112.3 dB; and at D = 250 m (θ_tilt = θ_LoS = 95.37°), it is 31.95 bps/Hz with a receiver SNR of 7.8 dB. Again, the channel capacity decreases with transmission distance because of path attenuation.
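The text does not state how the receiver SNR relates to the capacity. As an arithmetic cross-check only, assume the reported SNR for the 20 × 1 link is the mean post-combining SNR of a single stream. Jensen's inequality then gives E[log2(1 + SNR)] ≤ log2(1 + E[SNR]), so each reported capacity should sit slightly below log2(1 + SNR). The sketch checks this for the four 20 × 1 values listed above.

```python
import math

# (distance in m, reported capacity in bit/s/Hz, reported receiver SNR in dB), 20 x 1 case
REPORTED_20X1 = [(100, 13.09, 39.6), (150, 10.89, 33.0), (200, 9.331, 28.3), (250, 8.124, 24.6)]

for d, cap, snr_db in REPORTED_20X1:
    ceiling = math.log2(1.0 + 10.0 ** (snr_db / 10.0))  # single-stream bound at the mean SNR
    print(f"D = {d} m: log2(1 + SNR) = {ceiling:.3f}, reported C = {cap:.3f}, "
          f"gap = {ceiling - cap:.3f}")
```

Each gap comes out positive and below 0.1 bit/s/Hz, which is consistent with that reading; it does not confirm how the SNR was actually defined in the simulations.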

Results and Discussion
This section presents and discusses the performance of the proposed method in realistic conditions, without the ideal-condition assumption. In the following simulations, the learning rate is 1, and the effect of different discount rates of the Q-learning algorithm is examined.
The simulation results for the 20 × 1 and 60 × 4 configurations, in which the Q-learning algorithm adaptively adjusts the antenna downtilt angle to optimize channel capacity under realistic conditions, are as follows. Figure 9 shows that, with a discount rate of 0.9, the optimized antenna downtilt angle is closest to the ideal downtilt angle and the channel capacity is largest. As Table 6 shows, with a discount rate of 0.9 the error between the optimized and the ideal optimal downtilt angles is 0.22%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.92% of the ideal optimal channel capacity.
Figure 10 shows that, with a discount rate of 0.9, the optimized downtilt angle is again closest to the ideal downtilt angle and the channel capacity is largest. As Table 7 shows, the error between the optimized and the ideal optimal downtilt angles is then 0.1%, the smallest among the three discount rates, and the optimized channel capacity reaches 100% of the ideal optimal channel capacity.
Figure 11 shows the same behavior: with a discount rate of 0.9, the optimized downtilt angle is closest to the ideal optimal angle and the channel capacity is largest. As Table 8 shows, the angle error is then 0.31%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.81% of the ideal optimal channel capacity.
Figure 12 shows that, when the discount rate is 0.9, the optimized antenna downtilt angle is closest to the optimal ideal antenna downtilt angle, and the channel capacity is the largest. The comparison in Table 9 shows that, with a discount rate of 0.9, the error between the optimized and the optimal ideal antenna downtilt angle is 0.39%, the smallest among the three discount rates. Moreover, the optimized channel capacity reaches 99.82% of the optimal ideal channel capacity.

The simulation results in Figure 13 show that, when the discount rate is 0.9, the antenna downtilt angle found after optimization is closest to the optimal ideal antenna downtilt angle, and the channel capacity is the largest. The comparison in Table 10 shows that, with a discount rate of 0.9, the error between the optimized and the optimal ideal antenna downtilt angle is 0.22%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.90% of the optimal ideal channel capacity.

The simulation results in Figure 14 show that, when the discount rate is 0.9, the antenna downtilt angle found after optimization is closest to the optimal ideal antenna downtilt angle, and the channel capacity is the largest. The comparison in Table 11 shows that, with a discount rate of 0.9, the error between the optimized and the optimal ideal antenna downtilt angle is 0.90%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.81% of the optimal ideal channel capacity.

The simulation results in Figure 15 show that, when the discount rate is 0.9, the antenna downtilt angle found after optimization is closest to the optimal ideal antenna downtilt angle, and the channel capacity is the largest. The comparison in Table 12 shows that, with a discount rate of 0.9, the error between the optimized and the optimal ideal antenna downtilt angle is 0.31%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.81% of the optimal ideal channel capacity.

The simulation results in Figure 16 show that, when the discount rate is 0.9, the antenna downtilt angle found after optimization is closest to the optimal ideal antenna downtilt angle, and the channel capacity is the largest. The comparison in Table 13 shows that, with a discount rate of 0.9, the error between the optimized and the optimal ideal antenna downtilt angle is 0.66%, the smallest among the three discount rates, and the optimized channel capacity reaches 99.75% of the optimal ideal channel capacity.
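The two figures of merit quoted for Tables 6 to 13 can be computed as in the sketch below. It assumes definitions that this section does not state explicitly: the angle error is taken as the absolute difference between the optimized and ideal downtilt angles relative to the ideal angle, in percent, and the capacity figure as the ratio of optimized to ideal channel capacity, in percent. The function names and the example inputs are hypothetical and are not values from the tables.

```python
def angle_error_percent(theta_opt_deg: float, theta_ideal_deg: float) -> float:
    """Relative downtilt error in percent (assumed definition)."""
    return abs(theta_opt_deg - theta_ideal_deg) / theta_ideal_deg * 100.0


def capacity_ratio_percent(c_opt: float, c_ideal: float) -> float:
    """Optimized channel capacity as a percentage of the ideal optimum (assumed definition)."""
    return c_opt / c_ideal * 100.0


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not taken from the paper.
    print(f"{angle_error_percent(96.3, 96.0):.4f} %")
    print(f"{capacity_ratio_percent(19.96, 20.0):.2f} %")
```

Under this assumed definition, an error of 1% on a downtilt near 96 degrees corresponds to a deviation of roughly one degree.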

Conclusions
This article verified the effect of the antenna downtilt angle on channel capacity: even a small deviation from the optimum angle decreases the channel capacity significantly. In addition, this paper proposed an adaptive optimization method that applies the Q-learning algorithm to adaptively optimize the antenna downtilt angle and maximize system capacity in realistic circumstances. Simulation results show that, when the discount rate of the Q-learning algorithm is 0.9, the proposed method can adaptively adjust the antenna downtilt angle to achieve more than 99.72% of the optimal channel capacity obtained under the ideal-condition assumption.