GPU Acceleration and Game Engines for Wireless Channel Estimation in Millimeter Waves

: In this article, the intended purpose is to show an innovative technique for estimating the MIMO channel at millimeter wave bands, candidates for mobile 5G technology, by using hardware acceleration, game engines and heuristic algorithms applied to optical ray launching techniques. To verify the performance of the ray launching tool, the normalized Power Delay Proﬁle (PDP) was simulated. The channel was analyzed using the mean square delay error (RMS), the average value of the delay (MD) and the basic propagation loss (PL). The results obtained in computational precision and time were compared with those of a traditional ray tracing tool simulation programmed in MATLAB and with the measurements made in the 57 to 66 GHz range in a specialized laboratory. The results show that the presented technique becomes e ﬃ ciently proﬁtable from a small number of simulated events (reﬂections, di ﬀ ractions).


Introduction
Overcrowding in access to multimedia information such as online videogames and streaming services forces the next generation of communications to present technological advances quickly [1,2]. That is, increase the efficiency of the wireless channel through transmission in the shortest time and with an acceptable quality to the greatest number of possible users. Nowadays, rates between wireless LAN technologies and mobile networks are comparable, forcing a coexistence to avoid interference [3]. Therefore, they tend to integrate and become the same access technology.
5G mobile networks must face the challenges of energy consumption and high rates [4]. In compliance with the formulation in Shannon's capacity, there are two ways to improve service quality: Increasing signal strength or increasing bandwidth.
The best way to improve energy consumption and, at the same time, increase the signal strength is through the implementation of femtocells [5]. Smaller coverage areas between the transmitter and the receiver improve signal strength and interference to provide better quality of service. In this paradigm, the difficulty of interference can be resolved with the use of a radio spectrum available at high frequencies and cognitive radio [6].
By the other way, spectral efficiency improves by increasing the bandwidth of the channel [7]. To solve the need bandwidth, mobile technologies have been increasing their operating frequencies as it evolves. The next mobile generation hopes to use millimeter waves or terahertz to improve the quality of service.
A relevant factor is the implementation of multiple antennas for the simultaneous transmission of information, using spatial multiplexing or diversity [8]. By means of a lower wavelength given by the millimeter or terahertz waves, it is possible to integrate a greater number of antennas and make them smaller. The MIMO and MIMO massive sets will support emerging technologies called next-generation mobile networks (NGMN).
To evaluate technological implementations, it is necessary to estimate the behavior of the radio channel through precise models such as deterministic models [9]. With the estimation of the radio channel at the physical layer level, support is provided for the choice of digital modulations, media access techniques and the encodings that will be used.
Radio channel estimation use stochastic or deterministic models. First models use a probabilistic distribution function and disregard some channel effects due to the antennas or the type and number of scatterers in the scenario to be evaluated. Deterministic models are adequate in different environments with scatterers of considerable size, given their exact and verifiable predictions [10]. Computational methods have had an enormous influence on the prediction of wireless channel parameters, with the optical beam technique being the most accurate [11]. Until now, the technique of ray tracing (RT) based on the image method [9,12,13] and the technique of ray launching (RL), which uses the brute force algorithm, have been implemented with greater efficiency and precision [14][15][16].
However, computationally, the first technique is limited to a simplified scenario due to the number of secondary sources (images) that are produced by the multiple reflection and diffraction phenomena. With respect to the second technique, its accuracy is based on the firing resolution angle. The rays produced initially and those generated as a secondary source, a product of the Huygens-Fresnel principle, exponentially increase the number of paths to analyze. For ray launching to be a practical, accurate and computationally efficient strategy, it is necessary to resort to the use of game engines and hardware acceleration, but such studies have been limited to the estimation of the wireless channels of a transmitter and a receiver (SISO, single input single output) in decimeter waves.
The above mentioned indicates that it is necessary to develop efficient and practical acceleration techniques that allow a reduction in the simulation time while maintaining accuracy. Each ray launched or traced is independent of the rest of the rays and, therefore, it is possible for a processor or processor unit to calculate the path and the interactions of one ray at the same time that another processor does the same with another ray. In this way, if several processors are available, the task of launching or tracing beams can be parallelized reaching a high efficiency without losing precision.
In Section 2 we present the materials and methods used to develop the software in the game engine using ray launching and the way in which the measurement campaign was carried out. In Section 3 the obtained results are described and compared with another optical ray strategy developed in MATLAB. In Section 4 we discuss some aspects of how this application would be in other environments. Finally, in Section 5 the conclusions are presented.

Materials and Methods
Next, we describe both the geometrical model of the stage and the physical model of wave propagation using game engines and hardware acceleration. The estimation of channel parameters depends, to a large extent, on these representations.

Description of the Test Scenario
The accuracy of the results of the ray launcher is intimately linked with the description of the environment or 3D scenario for both indoor and outdoor environments. The representation of the scatterers as well as their constitutive parameters must be included in a high-resolution model, and the representation must include both the electromagnetic characteristics of the materials and the dimensions of each one of the scenario elements. Figure 1 shows a photograph (right) of the laboratory at the Polytechnic University of Cartagena (UPCT), Spain, used for measurements and its 3D computational model (left). The scenario has dimensions of 6.40 m x 4.44 m x 2.60 m. Five classes of materials present in the analysis scenario were identified. The floor and columns consist of concrete, glass windows, wooden furniture with metal frames and a plaster ceiling. On the left, the game engine representation is shown, where the red sphere represents the transmitter position and the blue sphere represents the receiver position (setup of 5 transmitting antennas) for a total of 180 MIMO channels.   The initial values used for the relative permittivity and the conductivity of the scenario materials were provided by the UPCT advisors (mentioned in the Acknowledgments) and are shown in Table  1.    The initial values used for the relative permittivity and the conductivity of the scenario materials were provided by the UPCT advisors (mentioned in the Acknowledgments) and are shown in Table  1. The initial values used for the relative permittivity and the conductivity of the scenario materials were provided by the UPCT advisors (mentioned in the Acknowledgments) and are shown in Table 1. In the literature, the nominal values of relative permittivity and conductivity for wood, concrete and glass can be found for specific materials and at low frequencies, far from the measurement frequencies. For this reason, measured values were taken in [12]. Figure 3 shows a scheme of the system used for the measurements. The frequency response is measured using a vector network analyzer. Then, the inverse Fourier transform is applied to obtain the complex impulse response (CIR) in the time domain. Once all CIRs are collected for one location the PDP is obtained as usual by averaging the CIRs. The range of measurements was between 57 GHz and 66 GHz with 4096 frequency points. The measurement system used consists of two groups of identical omnidirectional antennas in virtual arrays (Q-pair QOM55-65 VRA) with vertical polarization and gain ranging from 4 to 5.1 dBi along the bandwidth. In each array, the antenna elements, both in reception and transmission, are separated equidistantly 2 mm, which in terms of wavelengths gives a ratio of λ/2. In addition, the antennas have an omnidirectional radiation pattern in the H plane and a medium power beam centered at 40 • , 28 • and 21 • for the frequencies of 55 GHz, 60 GHz and 65 GHz, respectively.  In the literature, the nominal values of relative permittivity and conductivity for wood, concrete and glass can be found for specific materials and at low frequencies, far from the measurement frequencies. For this reason, measured values were taken in [12]. Figure 3 shows a scheme of the system used for the measurements. The frequency response is measured using a vector network analyzer. Then, the inverse Fourier transform is applied to obtain the complex impulse response (CIR) in the time domain. Once all CIRs are collected for one location the PDP is obtained as usual by averaging the CIRs. The range of measurements was between 57 GHz and 66 GHz with 4096 frequency points. The measurement system used consists of two groups of identical omnidirectional antennas in virtual arrays (Q-pair QOM55-65 VRA) with vertical polarization and gain ranging from 4 to 5.1 dBi along the bandwidth. In each array, the antenna elements, both in reception and transmission, are separated equidistantly 2 mm, which in terms of wavelengths gives a ratio of 2 λ . In addition, the antennas have an omnidirectional radiation pattern in the H plane and a medium power beam centered at 40°, 28° and 21° for the frequencies of 55 GHz, 60 GHz and 65 GHz, respectively. The antennas are supported on metal structures 1.44 m high for the transmitter and 1.54 m for the receiver. They are connected with a coaxial cable to the Rohde & Schwartz ZVA67 vector network analyzer whose transmission power was -10 dBm. To compensate for the attenuation effects of the coaxial cables, amplifiers (HXI HLNA-465) with maximum gains of 25 dB were used, whose effects were measured and adjusted. In order to obtain precision, a software was created for positioning with servomotors and operated from outside the environment to maintain the channel stationary. With an intermediate frequency of 10 Hz, a dynamic range of more than 100 dB was obtained.
The 3D model of the scenario was obtained using plane geometries to represent all the objects. Each of the physical geometries of the objects present in a laboratory can be modeled by combining some basic primitive geometries, which altogether achieve great geometric resolution. However, in the selected scenario, the environment and the present objects do not have great geometric complexity and are composed of flat surfaces with their respective properties of dimension, thickness, permittivity and conductivity. Likewise, to identify the impact of the propagation wave on the edges, The antennas are supported on metal structures 1.44 m high for the transmitter and 1.54 m for the receiver. They are connected with a coaxial cable to the Rohde & Schwartz ZVA67 vector network analyzer whose transmission power was -10 dBm. To compensate for the attenuation effects of the coaxial cables, amplifiers (HXI HLNA-465) with maximum gains of 25 dB were used, whose effects were measured and adjusted. In order to obtain precision, a software was created for positioning with servomotors and operated from outside the environment to maintain the channel stationary. With an intermediate frequency of 10 Hz, a dynamic range of more than 100 dB was obtained.
The 3D model of the scenario was obtained using plane geometries to represent all the objects. Each of the physical geometries of the objects present in a laboratory can be modeled by combining some basic primitive geometries, which altogether achieve great geometric resolution. However, in the selected scenario, the environment and the present objects do not have great geometric complexity and are composed of flat surfaces with their respective properties of dimension, thickness, permittivity and conductivity. Likewise, to identify the impact of the propagation wave on the edges, a cylinder shape was used. Furthermore, to represent the measurement points and to identify the rays that impact on them, the geometry of the reception sphere was used. The open source multiplatform tool Blender 2.74 was used to develop 3D modeling of the internal environment, which uses Python as an internal programming language. In a first phase, the data are selected to create the environment according to the measurements of the realistic UPCT scenario, obtaining the points of the XY coordinates with their respective height for each element that is included in the study environment.
In Blender each object is included by adding a cube-type mesh in the scene for the walls, floor, ceiling, windows and other elements of the environment. Additionally, each material is represented as textures that are incorporated in the model, by means of the texturing technique mapped in 2D coordinates (UV), which allows to establish a correspondence between the vertices of an object and the coordinates of a texture.
Assigning an image to each polygon according to the material in the model of the environment, to improve rendering, you get great 3D resolution providing realism to the scenario. Finally, before exporting the files, the transformations of scale, units, rotation and location of the scene applied in the 3D environment of Blender must be considered. This is important in order to apply the changes of the scene with the characteristics of the model as material, meshes and associated textures, to later import the model in the JME (Java Monkey Engine). Subsequently, the 3D scenario in Blender format is exported to a format compatible with the game engines.

Channel Modeling
The brute force method employed uses the theory of the optical beam that allows to separate each propagation mechanism (reflection, diffraction and dispersion) with the advantage of applying optical geometry. It is based on the SBR (Shooting and Bouncing Ray) algorithm. First, rays are launched from the transmitting antenna, and then the ray trajectories are traced, allowing us to see if it hits any object in the environment or if it reaches the receiving antenna. When the beam hits a flat surface, a reflected beam is generated and another transmitted by the interface, but if the beam hits an edge, a large amount of diffracted rays are generated and each of them is drawn until it hits another object and so on until it is discarded as having less power than the sensitivity of the receiver, or successfully impacting the receiver. Finally, the ray characteristics associated with the propagation are calculated.
The ray launching method applies the Fermat principle and the local field principle to model rays. Fermat's principle states that a ray follows the shortest path from the source to the field point, while the principle of the local field states that when the rays strike a surface, they experience reflection, refraction and diffraction, and at high frequencies, it only depends on the electrical and geometric properties of the scatterers.
Each ray is associated with a complex and vectorial electric field of amplitude, which is computed taking into account the field emitted in the transmitter, the losses in the free space, the reflections, and the diffractions and diffusions experienced by the ray in which the appropriate models are used. The losses in the free space are represented by the Friss equation, and the reflections are represented by applying the Fresnel reflection coefficients, while for diffraction the vector field is multiplied by the relevant diffraction coefficients and obtained from the UTD theory, using a modification of Luebber's coefficients.

Ray Propagation
The transmitter is modeled as a source point. To determine all the possible rays that can be launched from the transmitter and reach the receiver it is necessary to consider all possible angles of the transmitter's output and arrival at the receiver. The rays must be launched from the transmitter at an elevation angle and an azimuth angle relative to the fixed coordinate system and following a launch method.
The ray tracer calculates both the temporal dimension and the frequency. For purposes of the comparisons with the ray tracer of the UPCT and with the measurements, its operation is used in the where E rx is the electric field at the point of reception and E tx the electric field at the point of transmission. For this last situation the electric field is assumed as a phasor with a zero phase of reference. The power in the receiver is calculated by [17]: For all trajectories without line of sight (NLoS, Non-Line of Sight) you can write of (2) and (3): if the received power is considered at a specific distance In terms of the parameter it is possible to rewrite the reception power as

Launch Algorithm
For ray launching, the regular icosahedron method was implemented [18], where it is inscribed in a unitary sphere. This platonic solid, containing 20 polygonal faces and 12 vertices, allows the rays to be launched at angles that pass through the inner vertices and edges. The main advantage of this technique is that, using the tube beam, it does not generate overlap between the wave fronts as shown in Figure 4.
where rx E is the electric field at the point of reception and tx E the electric field at the point of transmission. For this last situation the electric field is assumed as a phasor with a zero phase of reference. The power in the receiver is calculated by [17]: (3) For all trajectories without line of sight (NLoS, Non-Line of Sight) you can write of (2) and (3): if the received power is considered at a specific distance In terms of the parameter it is possible to rewrite the reception power as

Launch Algorithm
For ray launching, the regular icosahedron method was implemented [18], where it is inscribed in a unitary sphere. This platonic solid, containing 20 polygonal faces and 12 vertices, allows the rays to be launched at angles that pass through the inner vertices and edges. The main advantage of this technique is that, using the tube beam, it does not generate overlap between the wave fronts as shown in Figure 4.  The direction of propagation is not necessarily normal for the face of the icosahedron compared to that the tube. The tessellation strategy is used to determine the direction of propagation [17,18]. This is to divide the icosahedron into a mosaic in order to determine the direction of the tube ray. When making this division, the wave fronts modify their size, but the relationship in shape and distance in the vicinity of the wave fronts is maintained. The number of rays launched by the transmitting source is given by [19]: where N is the frequency of the division. The radial separation is a measure of the angular resolution of the scattering of the rays for ray launching [19] and was used to calculate the angular resolution of separation between the wave front neighborhood:

Reception Algorithm
A highly efficient method to determine reception is by variable spheres [20][21][22][23]. This method consists of examining whether a launched ray reaches a sphere located in the receiver and centered on a point with a radius of where d is the length of the ray. If the launched ray has contact with the sphere, it can be received. In this method, each ray is verified with the generation of a sphere, which makes its computational implementation inefficient for thousands of launched rays. Therefore, the technique of variable spheres is modified to a constant sphere, which is described in [19]. The sphere now has a radius defined by Equation (9), calculating the distance of the ray from the maximum expected length according to the propagation in the free space at that distance. Once the losses are calculated by the ray trajectory, the sensitivity of the receiver is compared to discard the ray. If the power of the ray is below the sensitivity of the receiver, the ray is not taken into account. This receiving sphere contains all the spheres of variable reception for possible expected rays.

Hardware Acceleration
In programming using game engines, if the computer architecture offers multiple processors, it can allow the programmer to parallelize the game so that it can use all the resources and have improvements in performance. In Java, there is a Remote Method Invocation (RMI) to maximize the efficiency of multi-core processors. RMI allows the programmer to distribute the processes in which remote objects are invoked.
Multiple Java virtual machines (JVMs) were used on a single computer to parallelize the tasks and thus be able to use the multi-core processor as a system of distributed processes.
As a result of the pre-processing of rays, the list of stored candidate rays is taken and processed by the RMI server, executing a game engine for each nucleus and delivering a batch of candidate rays from the list. The RMI server stores in memory the list of the initial addresses of the rays to be launched from the transmitter, and then sends them sequentially and in order.
The game engine, upon receiving the address list of the rays to be launched from the transmitter, generates the rays with their initial parameters to be launched. Then it launches the generated rays into the environment, and they are processed in the GPU. In this stage, the GPU is efficiently used when processing a large number of rays in parallel and without displaying them on a monitor or screen. In the next rendering cycle, the GPU returns the list of processed rays to the CPU so that the game engine processes reception, reflection and diffraction, and generates the new list of rays to be launched in the next cycle. This process is repeated iteratively while the number of events is within the initial parameters. The game engine receives from the graphics card a list of the rays that were processed by the GPU. The algorithm reads sequentially from the list of candidate rays, one by one, and identifies for each ray if it impacted with an object and determines the object with which it impacted (sphere, cylinder or plane). Depending on the impact object, it validates the impact and calculates its multipath parameters and verifies that the power level is above the threshold to continue or discard the ray.
In case of reflection or diffraction, it generates the new rays with their parameters to store them in a new list of rays to be launched from the impact position. Then it checks if there are more rays to be processed, and if it does, it reads the next ray in the list and processes it, otherwise, it ends the cycle for the rays sent by the GPU. Therefore, once all the rays are launched from a transmitter, it is possible to calculate the electric field at any reference point within the scenario.

Results
For this experiment, a maximum of two diffraction and a maximum of three consecutive interactions between reflections and diffraction were defined. The geometric separation resolution for the SBR is 0.13 as it is an indoor scenario and thus obtain a reasonable number of discrete rays in the analyzed scenario. Figure 5 shows the result for the tracer, the launcher and the measurements for Position 3. In the graph you can see the differences between the two models and the measurements, being in this case very similar the results between the tracer and the launcher. The comparison has also been carried out quantitatively. Thus, Table 2 shows the measured and estimated channel parameters; in particular, the mean square value of the delay (RMS), the average value of the delay (MD) and the basic propagation loss (PL) have been used.  In Figure 6 the results for Point 18 are shown, and, in this case, the result of the launcher is a little more like the real measurements than in the case of the tracer. In Figure 6 the results for Point 18 are shown, and, in this case, the result of the launcher is a little more like the real measurements than in the case of the tracer.
Statistics in Table 2 show very small differences between the two systems; the largest differences are found at Position 7 in which the ray tracer obtained a greater accuracy than the ray launcher in estimating the mean value while the launcher was more accurate in estimating the RMS. There are also some cases in which the percentage of accuracy is the same, for example, in the mean value of Position 18, and yet there is an appreciable difference between the results of both techniques. Finally, comparison for Position 7 is observed in Figure 7; the accuracy is similar, although the tracer shows a group of replicas between 6 and 8 m that the launcher only partially represents. The precision is again similar without clearly distinguishing which of the two methods is more precise.   Table 2 show very small differences between the two systems; the largest differences are found at Position 7 in which the ray tracer obtained a greater accuracy than the ray launcher in estimating the mean value while the launcher was more accurate in estimating the RMS. There are also some cases in which the percentage of accuracy is the same, for example, in the mean value of Position 18, and yet there is an appreciable difference between the results of both techniques.

Statistics in
Finally, comparison for Position 7 is observed in Figure 7; the accuracy is similar, although the tracer shows a group of replicas between 6 and 8 m that the launcher only partially represents. The precision is again similar without clearly distinguishing which of the two methods is more precise. In Figure 6 the results for Point 18 are shown, and, in this case, the result of the launcher is a little more like the real measurements than in the case of the tracer.
Statistics in Table 2 show very small differences between the two systems; the largest differences are found at Position 7 in which the ray tracer obtained a greater accuracy than the ray launcher in estimating the mean value while the launcher was more accurate in estimating the RMS. There are also some cases in which the percentage of accuracy is the same, for example, in the mean value of Position 18, and yet there is an appreciable difference between the results of both techniques. Finally, comparison for Position 7 is observed in Figure 7; the accuracy is similar, although the tracer shows a group of replicas between 6 and 8 m that the launcher only partially represents. The precision is again similar without clearly distinguishing which of the two methods is more precise.

Discussion
The objective of programming the ray launcher in GPU-type processors is the acceleration of the estimation of the wireless channel. The ray launcher takes 36 h to estimate the path of all rays considered in the room. Once this simulation has been carried out, the 180 transfer functions in frequency for each channel can be obtained immediately and therefore the PDP for any position; so it can be said that the launcher takes 36 h to simulate the 20 measured locations, of which we have shown results for four of them. The ray tracing tool programmed in MATLAB takes around 13 h to simulate the 180 frequency transfer functions that allow finding the PDP of one position; that is, it is only necessary to perform the simulation of three locations to make the ray launcher programmed in the GPU profitable and therefore more computationally efficient with the same precision features.
The ray launcher is capable of including rays reflected of up to a third order; first order diffractions, up to two reflections after a diffraction, and up to two reflections with a subsequent diffraction. As seen in the results, this number of components is sufficient to accurately estimate the behavior of the wireless channel in the studied environment. A large number of contributions from the dispersers improve the electric fields received and, therefore, the prediction.
Next, another propagation environment is analyzed. High-density outdoor environments of buildings with low heights, determined by the downlink channel, predominantly reflected rays [24,25]. Diffraction components increase generating more rays. The combinations between reflection and diffraction events increase the number of rays and, therefore, the calculation time. However, there are external factors that decrease processing. Outdoors, the rays travel a greater distance, and this leads to only a few rays reaching the power above the sensitivity threshold of the receiver. Therefore, simulations in open environments can take the same time to interior simulations with good precision, such as that obtained in [26].

Conclusions
In this work, it has been shown that the programming of a ray launcher in GPU processors allows for the acceleration of the wireless channel estimation thanks to the parallelization of the ray-launching task. The accuracy of the launcher is high and has been compared with that achieved by a ray tracer. The differences between the simulations made with the ray launcher and the measurements may be due to the fact that, although the modeling of the environment was done more precisely in the launcher, there may be errors in locating objects of a few centimeters, which would affect the results at the frequencies used (millimeter waves). In future work the ray launcher will be compared with commercial ray launching tools to test their efficiency.