Feature Importances: A Tool to Explain Radio Propagation and Reduce Model Complexity

Abstract: Machine learning models have been widely deployed to tackle the problem of radio propagation. In addition to helping in the estimation of path loss, they can also be used to better understand the details of various propagation scenarios. Our current work exploits the inherent ranking of feature importances provided by XGBoost and Random Forest as a means of indicating the contribution of the underlying propagation mechanisms. A comparison between two different transmitter antenna heights, revealing the associated propagation profiles, is made. Feature selection is then implemented, leading to models with reduced complexity, and consequently reduced training and response times, based on the previously calculated importances.


Introduction
Radio coverage, received power levels, and path loss are parameters of crucial importance when designing mobile communication networks. The need for fast and accurate predictions has driven the deployment of many path loss prediction models.
Artificial intelligence and machine learning have provided reliable solutions for modeling radio propagation. A wide variety of models have been deployed [1][2][3][4][5] and optimized [6,7] to produce trustworthy path loss estimates. However, these models are mostly considered to be black boxes. That is, their internal functionality and decision-making process are considered to be opaque. This opacity limits how much the models can help, leaving a significant part of their potential unused.
Our current work examines the possibility of using two machine learning models (namely XGBoost and Random Forest) as both predictors and explainers. That is, in addition to, and in parallel with, the path loss predictions, we attempt to use the models to better explain and understand the underlying propagation mechanisms.
In this paper, two propagation scenarios for two different base station antenna heights are examined. The predictions are explained in conjunction with the respective propagation mechanisms. These mechanisms are indicated using machine learning models, specifically with their internal mechanism of ranking feature importances. It should be mentioned that the work in [8] uses Random Forest's inherent ranking of feature importances to explain path loss between two unmanned aerial vehicles.
As stated in [9], an increased number of features does not necessarily lead to a model with better performance. This is why it has been suggested [9] that methodologies be developed to guide the procedure of feature selection. A step in that direction is to use the ranked feature importances to produce models with fewer inputs, and to describe the resulting trade-off between prediction performance and model complexity.
The contributions of our work can be summarized as follows. First, insight into the model's behavior is gained through the association between the changes in feature importances and the emergence of different radio propagation mechanisms. Second, simpler and faster models are deployed through a feature selection procedure based on the ranked importances.
The remainder of this paper is structured as follows: Section 2 describes the physical mechanisms that emerge according to the transmitter's height, and Section 3 analyzes the details of the machine learning models. The results are presented in Section 4 and discussed in Section 5.

Propagation Mechanisms According to the Transmitter's Height
For a given built-up environment, the arising propagation mechanisms are dictated by the transmitter's height. More specifically, when the transmitter is placed well below the rooftop level, the received field (for NLOS conditions) is characterized by multiple reflected rays from building walls, in addition to diffractions from perpendicular building corners [10,11]. By comparison, when the transmitter is placed well above the rooftops, electromagnetic waves propagate above them until they are diffracted down to the receiver [11,12]. For the intermediate case, when the transmitter is placed at a height near the building rooftops, a mixture of propagation mechanisms takes place. That is, neither propagation inside streets nor over-rooftop diffraction can be neglected [13]. This situation is difficult to simulate because many propagation mechanisms have to be taken into account.
From a machine learning model's perspective, simulation of the above-mentioned mechanisms proceeds with feature engineering [14]: that is, the inputs of the model should describe those parameters of the built-up environment that have the greatest influence on the dominant propagation mechanisms. When the transmitter is placed well above the rooftop height, the inputs should describe the mechanism of over-rooftop diffraction [15]; this is why the position of the tallest building along the Line of Sight path near the receiver is expected to determine the level of the received power. However, when the transmitter height is reduced, the contribution of over-rooftop diffraction also decreases, steadily giving rise to the mechanism of multiple reflections and the creation of an urban street canyon. A feature that can be coupled with this mechanism is the number of buildings that obstruct the LOS path, since these buildings produce the aforementioned reflections.

Features and the Associated Propagation Mechanisms
The propagation environment and the 23 features that describe it were introduced in our previous research [15][16][17]. The area is formed from rectangular buildings, whose dimensions (in addition to the roads' dimensions) are randomly distributed, corresponding to an urban area. All measurements are taken along the roads and at the crossroads of the area.
The 23 features describe the Line of Sight path (Figure 1), the area around the receiver (Figure 2), and the positions of the transmitter and the receiver, in addition to the distances between them (Figure 3).
Looking closer at the group of the 10 features of the Line of Sight path, nine [18] refer to specific segments (three features for each of the three segments created). For a transmitter placed well above the rooftop, the features describing the third segment, and primarily the position of its tallest building, are expected to have the greatest influence on the power received.
The 10th feature of this group describes the whole Line of Sight path. Its value is equal to the number of buildings that obstruct the direct ray (which is equal to five for Figure 1). These buildings act as sources for the mechanism of multiple reflections. Thus, this feature is expected to have a greater impact when the height of the transmitter is reduced.
Features 11 to 18 describe the area around the receiver. As shown in Figure 2, they return the heights of the nearest buildings. If the point in question corresponds to a road, the returned value is equal to 0. Table 1 summarizes and describes all features.
It is apparent that information regarding the map of the area is needed to calculate the values corresponding to each feature. This could either be obtained through specific databases, or through a combination of OpenStreetMap [19] (which provides the footprints of the buildings) and the open source software QGIS (which provides the heights of the buildings).

Table 1. Description of the features.

| Number | Name | Description |
|---|---|---|
| 1 | LOS_1a | The distance (h1) of the top of the tallest building of the first segment from the point at which the building intersects the LOS ray between Tr and R |
| 2 | LOS_1b | The distance d1 of the tallest building of the first segment from the transmitter |
| 3 | LOS_1c | The length, l1, of the tallest building of the first segment |
| 4 | LOS_2a | The distance (h2) of the top of the tallest building of the second segment from the point at which the building intersects the LOS ray between Tr and R |
| 5 | LOS_2b | The distance d2 of the tallest building of the second segment from the transmitter |
| 6 | LOS_2c | The length, l2, of the tallest building of the second segment |
| 7 | LOS_3a | The distance (h3) of the top of the tallest building of the third segment from the point at which the building intersects the LOS ray between Tr and R |
| 8 | LOS_3b | The distance d3 of the tallest building of the third segment from the transmitter |
| 9 | LOS_3c | The length, l3, of the tallest building of the third segment |
| 10 | Buildings | The total number of buildings that interrupt the LOS path |
| 11 | SSI_1 | The height of the building (or the existence of a street) 10 m right of the receiver |
| 12 | SSI_2 | The height of the building (or the existence of a street) 10 m left of the receiver |
| 13 | SSI_3 | The height of the building (or the existence of a street) 10 m above the receiver |
| 14 | SSI_4 | The height of the building (or the existence of a street) 10 m below the receiver |
| 15 | SSI_5 | The height of the building (or the existence of a street) 10 m left and above the receiver |
| 16 | SSI_6 | The height of the building (or the existence of a street) 10 m left and below the receiver |
| 17 | SSI_7 | The height of the building (or the existence of a street) 10 m right and above the receiver |
| 18 | SSI_8 | The height of the building (or the existence of a street) 10 m right and below the receiver |
|  |  | The distance between transmitter and receiver in the xy plane |
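As an illustration of how features 11 to 18 could be extracted, the following minimal sketch samples a rasterized height map at the eight points around the receiver. The grid representation, the 1 m cell size, the orientation ("above" meaning a smaller row index), the reading of the diagonal offsets as 10 m along each axis, and the function name `ssi_features` are all assumptions made for this example; the source does not specify how the values were computed.

```python
from typing import List


def ssi_features(height_map: List[List[float]], row: int, col: int,
                 offset_cells: int = 10) -> List[float]:
    """Return the values of SSI_1..SSI_8 around a receiver cell.

    height_map[r][c] holds the building height in metres, or 0.0 for a road.
    Assumption: one cell is 1 m, rows grow downwards and columns grow to the
    right, so the default offset of 10 cells corresponds to 10 m.
    """
    # (d_row, d_col) in the order of Table 1: right, left, above, below,
    # left-above, left-below, right-above, right-below.
    offsets = [(0, 1), (0, -1), (-1, 0), (1, 0),
               (-1, -1), (1, -1), (-1, 1), (1, 1)]
    n_rows, n_cols = len(height_map), len(height_map[0])
    values = []
    for d_row, d_col in offsets:
        r = row + d_row * offset_cells
        c = col + d_col * offset_cells
        inside = 0 <= r < n_rows and 0 <= c < n_cols
        # Assumption: points that fall outside the map are treated like roads.
        values.append(height_map[r][c] if inside else 0.0)
    return values
```

With real data, the height map would be rasterized from the building footprints and heights of the map sources mentioned above.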

Models Used: XGBoost and Random Forest
XGBoost [20] and Random Forest [21] were implemented to make predictions based on the aforementioned features. Their performance when dealing with features of tabular data format, in addition to their built-in capability of ranking feature importances, were the reasons for choosing them.
Both models are based on the combination of regression trees. That is, both are ensemble methods, whose final predictions are based on the combination of the predictions of single regression trees. A key difference exists between them, however, in the manner in which each ensemble is constructed [22]. XGBoost utilizes the concept of boosting, whereas Random Forest relies on the concept of bagging.
According to bagging, all trees are grown in parallel, with each tree a standalone predictor of the quantity under estimation. The ensemble's estimation is the average of the estimations of all trees. Boosting, by comparison, relies on the sequential growth of trees. Each new tree is grown with respect to the previous tree's errors, and tries to compensate for these errors.
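The difference between the two ensemble strategies can be made concrete with a toy sketch. It uses one-split regression trees (stumps) on one-dimensional data and is not the XGBoost or Random Forest implementation: real libraries grow deeper trees, and XGBoost adds regularization and second-order gradient information. The function names and the `learning_rate` value are illustrative assumptions.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing the RSS."""
    best = None
    for s in sorted(set(xs))[1:]:  # candidate cutpoints
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        m_left, m_right = mean(left), mean(right)
        rss = (sum((y - m_left) ** 2 for y in left)
               + sum((y - m_right) ** 2 for y in right))
        if best is None or rss < best[0]:
            best = (rss, s, m_left, m_right)
    if best is None:  # all x values identical: predict the mean
        m = mean(ys)
        return lambda x: m
    _, s, m_left, m_right = best
    return lambda x: m_left if x < s else m_right


def bagging(xs, ys, n_trees=20, seed=0):
    """Grow independent trees on bootstrap samples and average them."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap sample
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(t(x) for t in trees)


def boosting(xs, ys, n_trees=20, learning_rate=0.5):
    """Grow trees sequentially, each one fitted to the current residuals."""
    base = mean(ys)
    residuals = [y - base for y in ys]
    trees = []
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        residuals = [r - learning_rate * tree(x)
                     for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)
```

In `bagging` the trees never see each other's output, so they could be grown in parallel; in `boosting` each tree depends on the residuals left by all previous trees.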

Relative Feature Importances in Tree-Based Models
Feature selection is a crucial element of the design of every machine learning model [23]. It can be performed based on the ranking of features, according to their contribution to the model's predictions. A number of specific methods which calculate feature importance have been developed. An exhaustive approach towards this goal would involve first training the model according to every possible subgroup of features, and then comparing the estimation results to determine the most efficient subgroup of features. Such an approach would need a vast amount of computer resources and would also be extremely time-consuming.
Another approach, which is implemented in the current study, is one that takes advantage of the built-in ability of most tree-based models to rank features (also denoted as predictors) according to their participation in the process of node splitting during the procedure of building the regression trees.
Each node of a regression tree corresponds to one of the predictors, combined with a cutpoint (or split point) of the aforementioned predictor. The terminal nodes hold the value of the output variable. That is, for every given set of predictors' values, a specific path through the nodes is followed, finally leading to a terminal node where the prediction is contained; that is, the tree segregates the predictor space (which is the set of all possible values for all predictors) to make its predictions. Every added node divides the predictor space further.
Node splitting can be described as a procedure in which [22] the predictor space is divided into two distinct and non-overlapping regions in such a way that the residual sum of squares is minimized. This means that all predictors, in addition to all possible cutpoint values for each predictor, are considered to perform the split. The procedure terminates when a preselected criterion (mostly referring to the tree's maximum depth or the minimum number of training examples corresponding to a single node) is met.
Mathematically speaking, for any predictor $j$ and any cutpoint $s$, the following pair of half-planes is defined:

$$R_1(j,s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j,s) = \{X \mid X_j \geq s\},$$

where the notation $\{X \mid X_j < s\}$ denotes the region of the predictor space in which $X_j$ takes on a value less than $s$. The values of $j$ and $s$ are determined such that the following quantity (residual sum of squares) is minimized:

$$\sum_{i:\, x_i \in R_1(j,s)} \left(y_i - \hat{y}_{R_1}\right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left(y_i - \hat{y}_{R_2}\right)^2,$$

where $\hat{y}_{R_1}$ is the mean response for the training observations $x_i$ in the region $R_1(j,s)$ and $\hat{y}_{R_2}$ is the mean response for the training observations $x_i$ in the region $R_2(j,s)$. The actual output for the $i$th input pattern is denoted $y_i$.
It is therefore straightforward to claim that node splitting improves the tree's predictions. The sum of the improvements over all nodes of a tree for which the splits were made according to a particular feature is the relative importance of that feature. This measure of importance is then easily generalized to the whole ensemble of trees by calculating it for each individual tree and averaging it over the total number of trees.
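The split search and the importance measure described above can be sketched for a single tree as follows. This is a simplified illustration: the function names, the stopping parameters, and the final normalization to a sum of one are assumptions, and library implementations differ in detail (for example, XGBoost offers several importance types).

```python
from statistics import mean


def rss(ys):
    """Residual sum of squares of ys around their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y):
    """Return (gain, j, s) for the split with the largest RSS reduction."""
    parent = rss(y)
    best = (0.0, None, None)
    for j in range(len(X[0])):                       # every predictor
        for s in sorted({row[j] for row in X})[1:]:  # every cutpoint
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            gain = parent - (rss(left) + rss(right))
            if gain > best[0]:
                best = (gain, j, s)
    return best


def grow(X, y, importances, depth=0, max_depth=3, min_samples=2):
    """Split recursively, adding each split's gain to importances[j]."""
    if depth >= max_depth or len(y) < min_samples:
        return
    gain, j, s = best_split(X, y)
    if j is None:  # no split reduces the RSS
        return
    importances[j] += gain
    left = [(row, yi) for row, yi in zip(X, y) if row[j] < s]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] >= s]
    for part in (left, right):
        grow([r for r, _ in part], [v for _, v in part],
             importances, depth + 1, max_depth, min_samples)


def relative_importances(X, y, max_depth=3):
    """Per-feature share of the total RSS reduction in one tree."""
    importances = [0.0] * len(X[0])
    grow(X, y, importances, max_depth=max_depth)
    total = sum(importances)
    return [v / total for v in importances] if total > 0 else importances
```

For an ensemble, the same quantity would be computed for every tree and averaged over the trees.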

Metrics of the Prediction Error
The error measurement metrics reflect the distance between the actual and the predicted values. Two of the most widely used error metrics, with their definitions, are shown in the following equations: where $y_i(p)$ is the actual path loss value, $y_{i,\mathrm{mean}}(p)$ is the mean actual path loss value and $y_o(p)$ is the predicted path loss value. $N_{test}$ is the number of test patterns, while $p$ represents the input according to which the prediction is made.
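A minimal sketch of how such metrics are computed is given below. The mean absolute error (MAE) is the metric reported in the results of this paper. The second function, the coefficient of determination, is included only as an assumption: it is a common metric that uses the mean actual value, but the source text does not confirm that it is the second metric used here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed here, not confirmed by the text)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Both functions take the actual and predicted path loss values of the test patterns as equally long sequences.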

Numerical Results
Two collections of 34,501 simulations, corresponding to two transmitter heights (namely 35 and 30 m) were taken from a specific software application implementing the ray-tracing algorithm [24].
The area under consideration was urban, with randomly distributed dimensions of buildings and roads. The building heights ranged between 5 and 29 m, regardless of the base station's height. We could therefore characterize the case where the transmitter height is equal to 35 m as a scenario in which the base station is placed well above the rooftops (because even the tallest building would be 6 m below the transmitter). This assumption does not fully hold when the transmitter is placed at 30 m, because the tallest buildings are then only 1 m lower than the transmitter. This leads to a situation where the base station is placed at a height close to that of the building rooftops.
Both sets of simulations were split into a training set comprising 80% of the simulations and a testing set comprising the remaining 20%. It is clear that the prediction is more precise in the 35 m case. Moreover, XGBoost leads to better results than Random Forest for both transmitter heights.

Feature Importances When the Transmitter is at 30 m
Figure 4. Feature importances, as estimated when the transmitter is placed at 30 m, via two machine learning models: (a) XGBoost; (b) Random Forest. The feature called Buildings is ranked as the most important.
The three most important features are Buildings, LOS_3b, and Distance. Buildings is the top-ranked feature, in accordance with the fact that the mechanism of multiple reflections is expected to be strong at this transmitter height. It is also worth observing that the importance of LOS_3b (which indicates the presence of over-rooftop diffraction) is significant, particularly when XGBoost is applied.

Feature Importances When the Transmitter is at 35 m
The feature importances for the 35 m case can be found in Figure 5. According to both models, the same set of three features was found to be the most important for this height. However, the inner ranking between the three features changed: LOS_3b was clearly the most important, indicating that over-rooftop diffraction is the primary propagation mechanism for this height of the transmitter. Moreover, the importance of Buildings fell, since the contribution of multiple reflections decreased.

Gradual Addition of Features with Reverse Order of Importance
Feature importances can be used to perform feature selection. That is, a subset of the most important features can be selected to construct a simpler machine learning model. The results of this investigation for both heights are shown in Figure 6. Starting from the most important feature (as calculated from XGBoost) for each height, new features are progressively added in descending order of importance. After the addition of each new feature, the model's performance is evaluated. It is evident that the incorporation of features with the lowest importance does not lead to significant performance improvement.
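The progressive procedure can be sketched as a loop over the ranked features. In this minimal example, `evaluate` is a hypothetical callback that trains a model on the given feature subset and returns its test error (for instance the MAE); the function names and the `tolerance` parameter are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_selection(
    importances: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> List[Tuple[int, float]]:
    """Add features from most to least important and record the error.

    Returns a list of (number_of_features, error) pairs, i.e. the points
    of an error-versus-features curve.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    curve = []
    for k in range(1, len(ranked) + 1):
        curve.append((k, evaluate(ranked[:k])))
    return curve


def smallest_subset(curve: List[Tuple[int, float]],
                    tolerance: float = 0.0) -> int:
    """Smallest feature count whose error is within tolerance of the best."""
    best = min(err for _, err in curve)
    for k, err in curve:
        if err <= best + tolerance:
            return k
    return curve[-1][0]
```

The curve returned by `incremental_selection` corresponds to the kind of plot described for Figure 6, and `smallest_subset` picks the point after which adding features no longer helps.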


Model Reduction
As Figure 6 suggests, the subset of the 18 most important features leads to the same MAE as that obtained when using the full set of 23 features (for the 30 m case). Thus, the application of models with lower complexity can be attempted because fewer features are required. An investigation of the number of trees, the features used, and the MAE value is shown in Figure 7.
It is evident that the full set of features requires more trees to obtain the minimum error. By comparison, the reduced subset produces the same result with lower model complexity. Table 3 presents the training and response times of the models with the lowest MAE values. A computer with an Intel i5-5575R processor and 6 GB of RAM was used to train and test the models.
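Training and response times of the kind reported in such a comparison can be measured with a small timing helper. The sketch below is one possible approach, not the authors' measurement procedure; the helper name `time_call` and the choice of reporting the best of several runs are assumptions.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result
```

A training routine would typically be timed once (`repeats=1`), whereas the response time of a trained model's prediction call can be averaged or minimized over several repeats to reduce the influence of background load.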


Discussion
Feature importance was used as a means to explain the different propagation scenarios that arise when the transmitter is placed at different heights. For both heights, the same set of three features captured more than 75% of the total importance, leaving the remaining 25% (or less) for the remaining 20 features. It is worth mentioning that this behavior is the same according to both machine learning models.
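The share of the total importance captured by the top-ranked features can be computed directly from the importance values. The helper below is a minimal sketch; the function name is an assumption, and no importance values from the paper are reproduced here.

```python
def top_k_share(importances, k):
    """Fraction of the total importance held by the k largest values.

    `importances` maps feature names to non-negative importance values.
    """
    total = sum(importances.values())
    top = sorted(importances.values(), reverse=True)[:k]
    return sum(top) / total
```

Calling `top_k_share(importances, 3)` on the importances of a trained model gives the quantity that is compared against the 75% level above.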
Comparing the inner rankings among the three most important features, it is clear that Buildings outranks the other features when the transmitter is placed at 30 m. The situation clearly changes when the transmitter is elevated to a height of 35 m, in which case LOS_3b becomes the most important feature.
It is therefore straightforward to conclude that when the transmitter is placed at 35 m, the mechanism of over-rooftop diffraction has a large influence, because this mechanism is connected with the feature LOS_3b. By comparison, the mechanism of multiple reflections has a stronger influence when the transmitter is placed at 30 m, as concluded through its connection with the feature Buildings.
A closer look at Figures 4 and 5 helps explain why the prediction for the 30 m case is worse than that for the 35 m case: for the 35 m case, the dominance of LOS_3b is much clearer than the dominance of Buildings for the 30 m case. That is, over-rooftop diffraction is clearly dominant in the 35 m case, whereas the propagation profile of the 30 m case cannot be attributed with the same level of clarity to a single mechanism. This sharper propagation profile makes prediction for the 35 m case easier for the machine learning models, in contrast to the 30 m case, in which the existence of different mechanisms makes the simulation more challenging.
This can also be observed in Figure 6: the error curve drops at a much faster pace in the 35 m case, meaning that a small number of features contains a significant amount of the information needed to model the propagation scenario. However, in the 30 m case the improvement takes place more smoothly, as indicated by the small drops in the prediction error. This is due to the co-existence of different propagation mechanisms for the particular transmitter height.
Figure 7 and Table 3 demonstrate the benefit of eliminating poorly contributing features: less complex models can provide the same prediction performance more quickly, thus making real-time predictions more feasible.

Conclusions and Future Work
Our work highlighted the significance of feature importances, both as a tool to explain radio propagation, and a means to perform feature elimination and reduce model complexity. The emergence of different propagation mechanisms, for two different transmitter heights, was associated with the difference between the values of feature importances. Moreover, model reduction on the basis of the ranked feature importances was performed, leading to substantially faster, although equally accurate, predictions.
The acquisition of more data regarding other transmitter heights, or frequencies, could pave the way to an extension of our work, through the incorporation of the aforementioned parameters as extra features. A single model would then be able to provide results for various propagation scenarios.
Author Contributions: Conceptualization, S.P.S.; methodology, K.S.; software, S.P.S.; writing-original draft preparation, S.P.S. and K.S.; writing-review and editing, S.K.G.; supervision, K.S. and S.K.G. All authors have read and agreed to the published version of the manuscript.