Advancements in the Statistical Study, Modeling, and Simulation of Microwave-Links in Cellular Backhaul Networks

While the effect of rainfall and other environmental phenomena on a link budget in microwave wireless communication has been well studied for network design, it has usually been done for each microwave-link separately. Recently, attenuation measured simultaneously in multiple microwave-links has been used for rainfall mapping over specific areas, and consequently, rain-induced attenuation fields can be constructed. Dedicated algorithms have been designed to relate attenuation in multiple microwave-links to its corresponding rain-field. Their performance depends significantly on the structure of the network. As the topology of cellular microwave networks (CMNs) is region-dependent, a general theory of its effect on performance can only be developed statistically. In this paper we study the statistical nature of CMNs and lay the groundwork for such models based on empirical results.


Introduction
In cellular backhaul networks, cellular microwave-links (CMLs) are used as wireless channels to connect two base stations (BS). Figure 1 portrays a single CML. Each BS is equipped with a transmitting-receiving antenna. Giuli et al. describe in their work [1,2] how such microwave channels can be utilized for rainfall mapping. Capitalizing on the use of CMLs in cellular communications, Messer et al. [3] suggested using commercial backhaul networks for environmental monitoring. They suggested utilizing the commercial network, already set up and functional, thus offering a cheap and opportunistic approach to the problem of precipitation monitoring. In that framework, the precipitation field is modeled as the signal to be reconstructed and the CMLs are modeled as random line projections which sample the signal and serve as data observations. As the microwave propagates along the CML, it accumulates attenuation that is attributed largely to the moisture present in the air, thus providing a form of spatial sampling of the precipitation field. This physical phenomenon is described in [4] and is elaborated on in [1]. In [5] it is shown that in order to yield a reconstruction method for a given sampled field, one must characterize the sampling scheme, i.e., the distribution of CMLs. In order to do so, we studied the design of a backhaul network.
When a cellular microwave network (CMN) is initially architected, the chosen topology of the network's BS, and thus of its CMLs, is based on several considerations. One can divide those considerations into two types, micro and macro.
Figure 1. A single cellular microwave-link (CML) [6].
Micro factors would be those that have a minor impact on the chosen position of the BS. For instance, after deciding to position a BS on a specific street block, one would consider micro factors for positioning that BS on a specific building's roof-top rather than another's. The macro factors, on the other hand, will inform the CMN architect what number of CMLs to deploy in an area, how to spatially distribute them, and how their lengths (i.e., distances between linked BS) should vary. Insights regarding the design of backhaul networks can be found in [7]. We suggest that there is a random factor in the spread of CMLs. This claim, which is the basis of this paper, relies on both the micro and macro factors. However, in this paper we focus on the macro factors as they impact the spatial distribution of CMLs globally. As will be demonstrated, CMLs' spatial distribution can be divided into subsets of distribution categories, each corresponding to a characteristic population density and topography.
As pointed out in [8], CMLs tend to be distributed in clusters. This means that not only is their spatial distribution observed to be non-uniform, it also leaves regions with little or no coverage. We will make the case that these clusters correspond in their locations to population densities. Once supported, this hypothesis provides an intuition for the macro factors of CMLs' distribution and volume.
We analyze CMLs in the context of distribution categories, namely urban (most dense), suburban, and rural (least dense) [9,10]. In [11] one may also find a connection between population density and perspective BS service capacity. Population volumes may be high enough to saturate BS service capacity, thus calling for the allocation of additional BS to share the load. Table 1 shows results of BS density studies.
BS densities project on CML densities. To witness this relationship, Figure 2 presents the four common CML topologies. One can see that for each of these, the number of edges is similar to the number of nodes, as they symbolize the CMLs and BS, respectively.
Table 1. Statistical base station (BS) densities [10]. BS densities depend on population densities. Since cellular microwave-link (CML) topologies are such that the number of CMLs is nearly identical to the number of BS, this table also reflects spatial densities of CMLs.

Region | Area (km²) | BS Amount | BS Density (1/km²)
Figure 2 and Table 1 suggest that the spatial distribution of CMLs is not, and cannot be, homogeneous. However, our study is based on the assumption that any given region can be partitioned into sub-regions, each homogeneous in the sense of CML density, meaning that all the CMLs in a sub-region have their positions drawn from the same uniform distribution. Here and throughout this paper, when referring to a CML's position, it is the position of its midpoint that is considered. Based on [12], we suggest describing CML distribution in any homogeneous region by three characteristics:
1. Spatial density of the CMLs;
2. Orientations of the CMLs;
3. Lengths of the CMLs.
The rest of the paper is organized as follows: in Sections 2 and 3 we study the CMLs' statistical characteristics as listed above and suggest statistical models. Section 2 addresses the CMLs' spatial density, and Section 3 addresses their orientations and lengths. Section 4 suggests a mathematical model for the relationship between CMLs' lengths and their spatial density. Section 5 then combines the statistical derivations from previous sections to suggest a novel computational method for simulating sets of synthetic CMLs. Section 6 then concludes and discusses the results. The geographical regions analyzed here are described in the appendix. Note that as this paper is an extension of [13], its novelty is presented in Sections 4 and 5.

Spatial Distribution
In order to characterize the distributions of CMLs in a given region, we suggest partitioning the region into sub-regions, each with a spatially uniform CML density. Such sub-regions are expected to correspond to the common environmental terms: urban, suburban, and rural. If such a partition into homogeneous regions were not performed and the CMLs were clustered together, their spatial distribution would need to be addressed more meticulously in order to evaluate reconstruction potential, as it would not be simply uniform.
When partitioning a region, it is critical to maintain a nominal area size that is appropriate for capturing relevant rain phenomena. Typical rain clouds over Israel tend to stretch over an area of up to 10 × 10 km² [14]. It is recommended that sub-regions be at least this size. Accordingly, in this paper we examined regions that are indeed 10 × 10 km².
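The partition into fixed-size sub-regions can be illustrated with a minimal sketch. It assumes that CML midpoints are already given in planar kilometre coordinates and that sub-regions are axis-aligned square cells. The function name `density_per_cell` and the sample midpoints are hypothetical and are not taken from the data set used in this paper.

```python
from collections import Counter

def density_per_cell(midpoints_km, cell_km=10.0):
    """Count CML midpoints per square cell and convert to CMLs per km^2.

    midpoints_km: iterable of (x, y) midpoint coordinates in km (assumed planar).
    Returns a dict mapping (column, row) cell indices to spatial density.
    """
    counts = Counter(
        (int(x // cell_km), int(y // cell_km)) for x, y in midpoints_km
    )
    cell_area = cell_km * cell_km
    return {cell: n / cell_area for cell, n in counts.items()}

# Hypothetical midpoints, for illustration only.
midpoints = [(1.0, 2.0), (3.5, 9.9), (12.0, 4.0), (25.0, 31.0), (9.99, 0.0)]
densities = density_per_cell(midpoints)
```

Cells that contain no midpoint are simply absent from the returned dictionary, which mirrors the observation that some regions have little or no coverage.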

Length and Orientation
The locations and volume of CMLs that were discussed in Section 2 characterize CMLs collectively. The attributes of an individual CML, namely its orientation and its length, significantly affect the attenuation measurement as well. Understanding all three factors allows for the 2-D modeling of CMLs with which this paper is concerned. It should be mentioned that if one were to examine the 3-D modeling of CMLs, the CMLs' height variability would be a factor as well.
The study presented here regards CMLs in the state of Israel belonging to a single cellular provider, Cellcom (see Appendix A for further details). All CMLs operate in the same frequency range, the K-band.
Results show that in any type of region studied, the CMLs' orientation takes on any angle with equal probability. This means the direction of CMLs is distributed uniformly and is not at all correlated with the type of population density. Figure 3 portrays this conclusion. Moreover, the orientation is found to be statistically independent of the other factors studied here, the CMLs' length and density.
Sendik [12] studied the distribution of CMLs as well. To do so, a map of Israel was partitioned into four parts based on latitude (as Israel stretches from latitude 29.5° N to 33.29° N). These four parts were used as sub-regions to study CMLs. However, these four regions were heterogeneous in their environmental types. Here, by isolating sub-regions of homogeneous environment types and then characterizing CML statistics by such types, we suggest a contribution to Sendik's work on statistical modeling of CMLs in Israel.
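The uniformity of the orientations can be checked numerically. The sketch below is one possible way to do so and is not the procedure used to produce Figure 3: it assumes each CML is given by its two endpoints in planar coordinates, treats the orientation as undirected (an angle in [0°, 180°)), and uses a Pearson chi-square statistic with 12 equal bins, a choice made here purely for illustration.

```python
import math

def orientation_deg(x1, y1, x2, y2):
    """Undirected orientation of a link, in degrees within [0, 180)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def chi_square_uniform(angles_deg, n_bins=12):
    """Pearson chi-square statistic of the angles against a flat histogram.

    With 12 bins there are 11 degrees of freedom, for which the 5% critical
    value is about 19.68; smaller statistics are consistent with uniformity.
    """
    counts = [0] * n_bins
    width = 180.0 / n_bins
    for a in angles_deg:
        # min() guards against an angle that rounds up to exactly 180.0.
        counts[min(int(a // width), n_bins - 1)] += 1
    expected = len(angles_deg) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

# A perfectly even set of angles gives a statistic of zero.
even_angles = [7.5 + 15.0 * k for k in range(12)] * 10
stat = chi_square_uniform(even_angles)
```

Because a link has no preferred direction, swapping its two endpoints leaves `orientation_deg` unchanged.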
The study of CMLs' lengths yielded very different results from those of the orientations. Unlike orientations, CML lengths are distributed non-uniformly: they are distributed exponentially. Moreover, the lengths' statistical characteristics depend on the type of environment. Figure 4 illustrates both findings. The sample means specified in Figure 4 are used for the fitting of an exponential distribution. For an exponential random variable, e.g., $T \sim \mathrm{Exp}(1/\theta)$, the probability density function is:
$$f_T(t) = \frac{1}{\theta} e^{-t/\theta}, \quad t \ge 0, \qquad (1)$$
where $\theta = E[T]$. This presents a direct tie between the sample mean of the CMLs' length and the exponential fitting. Table 2 presents a data summary for CMLs in Israel used in this study.
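The tie between the sample mean and the exponential fit can be made concrete with a short sketch. For an exponential distribution with mean θ, the maximum-likelihood estimate of θ is the sample mean, so the fit reduces to one average. The lengths below are synthetic draws with a hypothetical mean of 3 km, used only so that the example runs; they are not the measured CML lengths.

```python
import math
import random

def fit_exponential_mean(lengths_km):
    """Estimate theta of Exp(1/theta) as the sample mean (the MLE)."""
    return sum(lengths_km) / len(lengths_km)

def exp_pdf(t, theta):
    """Density of an exponential variable with mean theta."""
    return math.exp(-t / theta) / theta if t >= 0 else 0.0

def exp_cdf(t, theta):
    """Cumulative distribution of an exponential variable with mean theta."""
    return 1.0 - math.exp(-t / theta) if t >= 0 else 0.0

# Synthetic lengths with a hypothetical mean of 3 km (not measured data).
rng = random.Random(0)
true_theta = 3.0
sample = [rng.expovariate(1.0 / true_theta) for _ in range(20000)]
theta_hat = fit_exponential_mean(sample)
```

Note that `random.expovariate` takes the rate 1/θ rather than the mean, which matches the Exp(1/θ) notation used above.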
Table 2. Empirical CML distributions based on environment type. Sub-regions are described in the appendix. Note that region "Tel Aviv", for instance, does not apply solely to the city of Tel Aviv but to a more general region [13].

Modeling the Relationship between Cellular Microwave-Links' (CMLs) Length and Density
The findings in Table 2 suggest that there may be an underlying relationship between the CMLs' density and their mean length for a given region. Following that intuition, an analysis on a larger scale was performed. Given the set of CMLs over Israel, a set of 3626 sub-regions was generated. A moving window with varying dimensions scanned the area of Israel. For every iteration, that is, for every one of the 3626 windows, a subset of CMLs was captured within the window's bounds. Two features were recorded for each such iteration: the CMLs' density and the CMLs' mean length. Thus, 3626 observations of pairs {density, mean length} were obtained. Figure 5 presents the scatter plot of those observations in blue.
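The moving-window scan can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: a CML counts as captured when its midpoint falls inside the window, windows are square, empty windows are skipped, and the window sizes and step are free parameters. The names `window_features` and `scan_region` are hypothetical.

```python
def window_features(cmls, x0, y0, size):
    """Return (density, mean length) for CMLs whose midpoint lies in the window.

    cmls: list of (mid_x_km, mid_y_km, length_km) tuples.
    Returns None when the window captures no CML.
    """
    lengths = [length for (x, y, length) in cmls
               if x0 <= x < x0 + size and y0 <= y < y0 + size]
    if not lengths:
        return None
    return len(lengths) / (size * size), sum(lengths) / len(lengths)

def scan_region(cmls, extent_x, extent_y, sizes, step):
    """Slide square windows of each size over the region and collect observations."""
    observations = []
    for size in sizes:
        # Integer window counts avoid floating-point drift in the window origin.
        nx = int((extent_x - size) // step) + 1
        ny = int((extent_y - size) // step) + 1
        for j in range(ny):
            for i in range(nx):
                feat = window_features(cmls, i * step, j * step, size)
                if feat is not None:
                    observations.append(feat)
    return observations

# Three hypothetical CMLs in a 20 km x 10 km region.
cmls = [(5.0, 5.0, 2.0), (6.0, 6.0, 4.0), (15.0, 5.0, 1.0)]
obs = scan_region(cmls, 20.0, 10.0, sizes=[10.0], step=10.0)
```

Each element of `obs` is one {density, mean length} pair of the kind plotted in Figure 5.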

Approach to Modeling
A non-linear empirical relationship is hinted at in the scatter plot, thus calling for non-linear modeling. Of the wide range of non-linear models available, we were interested only in an analytical mathematical formula and not in a black-box model. Thus, parametric non-linear regression was chosen.

Methodology
The data set of 3626 observations was split randomly into two, holding out 20% (725) of the observations to be designated as the test set. These observations were not used for model selection or for model training. The test set was used to report model performance after the model was chosen and trained. The remaining 80% (2901 observations) were used for training and validation.
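The hold-out split can be reproduced in a few lines. This is a sketch under the assumption that the split is a plain random permutation of observation indices; the seed and the function name `train_test_split` are arbitrary choices made here.

```python
import random

def train_test_split(n_obs, test_fraction=0.2, seed=0):
    """Randomly split observation indices into (train, test) lists."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    n_test = round(n_obs * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split(3626)
```

For 3626 observations, 20% is 725.2, which rounds to the 725 test observations and 2901 training observations quoted above.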

We conducted model selection for a variety of parametric formulas. We constrained the pool of models to those having up to two parameters. The motivation for this constraint was to present a simple model.
Four candidate models were examined. In each of them, $\hat{d}$ is the estimator of $d$, the CMLs' density (CMLs/km²), expressed as an explicit function of $l$, the CMLs' mean length (km), and $a_1$ and $a_2$ are constant coefficients to be optimized via non-linear regression. They are chosen to be those that minimize the mean squared error (MSE). For example, this is the optimization process for the third model, denoted here $\hat{d}_3$:
$$\min_{a_1, a_2} \frac{1}{I} \sum_{i=1}^{I} \left( d_i - \hat{d}_3(l_i; a_1, a_2) \right)^2 \qquad (3)$$
Here, $I$ is the number of training observations and $(l_i, d_i)$ is the $i$-th training pair. We performed 5-fold cross-validation when training each of the four models. The validation MSE of each model was calculated by averaging the MSEs of its five validation iterations. The best model was found to be the third model, which achieved the lowest averaged validation MSE. Its coefficients were derived to be (rounded to two decimal places) $a_1 = 3$, $a_2 = 1.14$, yielding an averaged validation MSE of 0.221 and a test MSE of 0.229.
The regressed model is plotted in Figure 5 in black.
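The selection procedure (least-squares fitting of a two-parameter formula, scored by 5-fold cross-validation) can be sketched as below. The power-law form used here, d = a1 · l^(−a2), is a hypothetical stand-in chosen only so the example runs; it should not be read as one of the four models examined in this paper. The fitting strategy (a grid over a2 with the MSE-optimal a1 in closed form) and the synthetic data are likewise assumptions.

```python
import random

def mse(ls, ds, a1, a2):
    """Mean squared error of the stand-in model d = a1 * l**(-a2)."""
    return sum((d - a1 * l ** (-a2)) ** 2 for l, d in zip(ls, ds)) / len(ls)

def fit_power_law(ls, ds, a2_grid):
    """Least-squares fit of the HYPOTHETICAL model d = a1 * l**(-a2).

    For a fixed a2 the model is linear in a1, so the MSE-optimal a1 has a
    closed form; a2 is then chosen on the grid by lowest training MSE.
    """
    best = None
    for a2 in a2_grid:
        xs = [l ** (-a2) for l in ls]
        a1 = sum(d * x for d, x in zip(ds, xs)) / sum(x * x for x in xs)
        err = mse(ls, ds, a1, a2)
        if best is None or err < best[0]:
            best = (err, a1, a2)
    return best[1], best[2]

def cross_validated_mse(ls, ds, a2_grid, k=5, seed=0):
    """Average validation MSE over k folds."""
    idx = list(range(len(ls)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a1, a2 = fit_power_law([ls[i] for i in train], [ds[i] for i in train], a2_grid)
        scores.append(mse([ls[i] for i in fold], [ds[i] for i in fold], a1, a2))
    return sum(scores) / k

# Synthetic {mean length, density} pairs from a1 = 2, a2 = 1 plus small noise.
rng = random.Random(1)
ls = [rng.uniform(1.0, 10.0) for _ in range(500)]
ds = [2.0 * l ** -1.0 + rng.gauss(0.0, 0.01) for l in ls]
a2_grid = [i / 100 for i in range(50, 201)]
a1_hat, a2_hat = fit_power_law(ls, ds, a2_grid)
cv_mse = cross_validated_mse(ls, ds, a2_grid)
```

Model selection would repeat `cross_validated_mse` for each candidate formula and keep the one with the lowest averaged score, after which the held-out test set is used once to report performance.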

Simulating CMLs for Computational Experiments
Capitalizing on the statistical derivations of previous sections, we now introduce a novel method for synthesizing a data set of CMLs. The motivation for using computer-simulated CMLs rather than real-world CML data rests on these virtues:
1. It allows for a controlled study. Simulated CMLs allow one to account for every attribute they possess.
2. It strengthens the integrity of the results. When deducing an outcome of an experiment, since the simulation of the CMLs is controlled, one can determine a clear set of assumptions under which the outcome holds.
3. It provides statistical robustness of the results. When simulating CMLs, the number of CMLs is not limited, thus allowing one to utilize as many CMLs as necessary for the statistical experiment.
4. It introduces a new experimental feature, sensitivity analysis. The computer simulation allows one to tweak the CMLs' parameters and evaluate their effect.

Modeling a Single CML as a Computational Structure
In [5] we discuss the modeling of rain-field reconstruction at great length. The modeling of CMLs is discussed there only briefly; here we elaborate on it.
The computational structure that represents a single CML is a 2-D array. The dimensions of the array represent the physical area of interest (e.g., a 10 × 10 km² area), each array element represents a pixel, and thus the size of the array is directly derived from the chosen resolution. A pixel that the CML crosses has a positive value equal to the length of the overlap that the CML has with that pixel. A pixel that the CML does not cross is zeroed. Here we use the terms "pixel" and "array element" interchangeably. Figure 6 portrays how an array models a CML.
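The array representation described above can be sketched as a small rasterization routine. It assumes that coordinates are expressed in pixel units, that pixels are unit squares, and that the array is square with n pixels per side. The function name `rasterize_cml` and the approach (cutting the segment at every grid-line crossing and assigning each piece to the pixel containing its midpoint) are illustrative choices, not necessarily the implementation used in [5].

```python
import math

def rasterize_cml(x1, y1, x2, y2, n):
    """Return an n x n array (list of rows) of per-pixel overlap lengths.

    Entry [row][col] holds the length of the segment (x1, y1)-(x2, y2) that
    lies inside the unit pixel [col, col + 1) x [row, row + 1); pixels the
    segment does not cross stay at zero.
    """
    grid = [[0.0] * n for _ in range(n)]
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return grid
    # Parameters t in [0, 1] at which the segment crosses a grid line.
    ts = {0.0, 1.0}
    if dx != 0.0:
        for gx in range(math.ceil(min(x1, x2)), math.floor(max(x1, x2)) + 1):
            ts.add((gx - x1) / dx)
    if dy != 0.0:
        for gy in range(math.ceil(min(y1, y2)), math.floor(max(y1, y2)) + 1):
            ts.add((gy - y1) / dy)
    cuts = sorted(t for t in ts if 0.0 <= t <= 1.0)
    # Each piece between consecutive cuts lies within a single pixel.
    for t0, t1 in zip(cuts, cuts[1:]):
        tm = 0.5 * (t0 + t1)
        col = math.floor(x1 + tm * dx)
        row = math.floor(y1 + tm * dy)
        if 0 <= row < n and 0 <= col < n:
            grid[row][col] += (t1 - t0) * length
    return grid

# A horizontal CML of length 3 pixels crossing four pixels of the first row.
grid = rasterize_cml(0.5, 0.5, 3.5, 0.5, 4)
```

For a CML that lies entirely inside the area, the array entries sum to the CML's length, which gives a convenient sanity check on the representation.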

Key Simulation Factors
Two key factors are to be determined when simulating CMLs. The first is the CMLs' mean length, which is a physical parameter, and the second is the spatial resolution, translating to the number of pixels per area, which is not a physical feature but a computational one. Since a CML is represented as a numerical array, the spatial resolution dictates how many elements will be in that array. The higher the resolution, the more pixels the area is partitioned into.
It may be easy to understand why we do not declare the CMLs' orientation to be a key factor. Since we have established that the orientation is completely random and distributed uniformly, there is no distribution parameter that needs to be pre-set to model it. What is not so straightforward is the reason the spatial density of CMLs is not declared a key factor. The answer is computational modularity. Sets of CMLs are generated such that they allow for all relevant spatial densities to be utilized in the experiments they are meant to serve. All sets are generated with a sufficiently high number of CMLs to allow for the highest spatial density of CMLs. This way, when lower spatial densities are desired, smaller randomly selected subsets of the CMLs can be utilized. For instance, for an area of 10 × 10 km², a set of 1000 CMLs is generated. If the experiment of interest requires a density of 2 CMLs per km², 200 CMLs are randomly selected, and the fact that there is a "sufficiently high" number of CMLs allows for many Monte-Carlo iterations with different subsets of CMLs. Here "sufficiently high" refers to a number so high that it allows for the maximal number of CMLs to be randomly selected out of the set multiple times. The alternative would be to limit the set to having "just enough" CMLs, which allows only one way of selecting the maximal number of CMLs: selecting the entire set. That would not allow for repetitions of the experiment in a Monte-Carlo setting.
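The subset selection described above can be sketched with the standard library alone. The sketch assumes the pool is any list of pre-generated CMLs (plain integers stand in for them here) and that each Monte-Carlo iteration draws its subset independently and without replacement. The function name `monte_carlo_subsets` is hypothetical.

```python
import random

def monte_carlo_subsets(cml_pool, density_per_km2, area_km2, n_iterations, seed=0):
    """Yield one random subset of the pool per Monte-Carlo iteration.

    The subset size is the requested density times the area, e.g.
    2 CMLs/km^2 over 100 km^2 gives 200 CMLs.
    """
    n_select = round(density_per_km2 * area_km2)
    if n_select > len(cml_pool):
        raise ValueError("pool is too small for the requested density")
    rng = random.Random(seed)
    for _ in range(n_iterations):
        yield rng.sample(cml_pool, n_select)

# A pool of 1000 placeholder CMLs over a 10 x 10 km^2 area.
pool = list(range(1000))
subsets = list(monte_carlo_subsets(pool, 2.0, 100.0, n_iterations=5))
```

With a pool of exactly 200 CMLs every iteration would return the same set (in a different order), which is the "just enough" case that rules out Monte-Carlo repetition.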


The CMLs Simulation Algorithm
Pre-set variables (i.e., variables that are constant throughout the run):
1.
2. Spatial Resolution (N): N is the number of pixels the area is partitioned into.
3. Environment Type (Mean CML Length): As established in Section 3, simulating a different environment corresponds to choosing a different mean CML length. Values typically range from 1.5 to 10 km.
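A minimal sketch of how the statistical findings of Sections 2 and 3 could drive such a generator is given below. It is not the algorithm of this paper. It assumes a square area, midpoints drawn uniformly over that area, orientations drawn uniformly in [0, π), and lengths drawn from an exponential distribution with the chosen mean. The function name `simulate_cmls` and its parameters are hypothetical.

```python
import math
import random

def simulate_cmls(n_cmls, area_side_km, mean_length_km, seed=0):
    """Draw synthetic CMLs as pairs of endpoints in km.

    Midpoint: uniform over the square area (homogeneous sub-region).
    Orientation: uniform in [0, pi).
    Length: exponential with the given mean (environment type).
    """
    rng = random.Random(seed)
    cmls = []
    for _ in range(n_cmls):
        mx = rng.uniform(0.0, area_side_km)
        my = rng.uniform(0.0, area_side_km)
        angle = rng.uniform(0.0, math.pi)
        length = rng.expovariate(1.0 / mean_length_km)
        hx = 0.5 * length * math.cos(angle)
        hy = 0.5 * length * math.sin(angle)
        # Endpoints may fall outside the area; only the midpoint is constrained.
        cmls.append(((mx - hx, my - hy), (mx + hx, my + hy)))
    return cmls

# 5000 synthetic CMLs over a 10 x 10 km^2 area with a 3 km mean length.
cmls = simulate_cmls(5000, area_side_km=10.0, mean_length_km=3.0)
lengths = [math.dist(p, q) for p, q in cmls]
mean_length = sum(lengths) / len(lengths)
```

Each generated segment would then be converted to its array form at the chosen spatial resolution N, after rescaling kilometres to pixel units.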

Figure 2. Typical cellular microwave-link (CML) topologies. All four are such that the number of CMLs is approximately identical to the number of BS [6].

Figure 3. The distribution of CML angles. (a) all Israel, (b) top of northern Israel as rural, (c) Hasharon as suburban, (d) Tel Aviv as urban [6].

Figure 4. Length distributions for various environments. (a) all Israel, (b) top of northern Israel as rural, (c) Hasharon as suburban, (d) Tel Aviv as urban [6].

Figure 5. The empirical relationship between the CMLs' density and mean length in a given region. The black curve was derived through non-linear regression [13].

Figure 6. A discrete model of a CML. (a) a single CML of length 4.8 pixels, (b) its discrete representation.