Wireless Signal Propagation Prediction Based on Computer Vision Sensing Technology for Forestry Security Monitoring

In this paper, Computer Vision (CV) sensing technology based on Convolutional Neural Network (CNN) is introduced to process topographic maps for predicting wireless signal propagation models, which are applied in the field of forestry security monitoring. In this way, the terrain-related radio propagation characteristic including diffraction loss and shadow fading correlation distance can be predicted or extracted accurately and efficiently. Two data sets are generated for the two prediction tasks, respectively, and are used to train the CNN. To enhance the efficiency for the CNN to predict diffraction losses, multiple output values for different locations on the map are obtained in parallel by the CNN to greatly boost the calculation speed. The proposed scheme achieved a good performance in terms of prediction accuracy and efficiency. For the diffraction loss prediction task, 50% of the normalized prediction error was less than 0.518%, and 95% of the normalized prediction error was less than 8.238%. For the correlation distance extraction task, 50% of the normalized prediction error was less than 1.747%, and 95% of the normalized prediction error was less than 6.423%. Moreover, diffraction losses at 100 positions were predicted simultaneously in one run of CNN under the settings in this paper, for which the processing time of one map is about 6.28 ms, and the average processing time of one location point can be as low as 62.8 us. This paper shows that our proposed CV sensing technology is more efficient in processing geographic information in the target area. Combining a convolutional neural network to realize the close coupling of a prediction model and geographic information, it improves the efficiency and accuracy of prediction.


Introduction
Sensing technology plays an increasingly important role in many fields, especially in the field of public security. Among them, sensing technology is widely used in forestry safety monitoring systems. With the continuous development of afforestation, the area of forest land has been increasing year by year, and fire prevention has become the focus of forestry safety work. Forest fires have the characteristics of being sudden, random, and can cause huge losses in a short time. For this reason, video acquisition and coding are carried out at the monitoring points in important forestry areas by placing cameras, and the monitoring system server of the monitoring center is connected through wireless channels, while the video monitoring system software is used for centralized display and unified management. In the construction process of forestry security monitoring system, ensuring that the cameras are always in the signal coverage range of the central base station and other wireless communication facilities is the basis of guaranteeing the video transmission To address the issues above with solutions different from typical methods of ANNs, this paper introduces the ANNs from Computer Vision (CV) that can process images or maps to detect and identify ground objects or terrain patterns, with reasonable modification and improvement, to design two basic CV-based propagation (parameters) prediction schemes. The proposed schemes seem preliminary, but the models underlying them are fundamental, which utilizes map data with terrain information as inputs, and process the propagation related factors represented by the map similarly to the way CV does in object detection or tracking. One proposed scheme maps these factors to a propagation/diffraction loss model, which could also be extended to a more generalized path loss prediction model beyond diffraction effect for irregular terrains with larger area. Another proposed scheme extracts the stochastic properties of shadow fading such as correlation distance, from topographic maps by CV.
The aim of the study is to establish a framework and method for CV-based terrainrelated propagation model parameters prediction, including diffraction loss, shadow fading correlation distance, etc., as well as to verify its validation and efficiency.
There are some related works on applying CV in wireless communications [19][20][21][22][23], including feature extraction of signals' time-frequency spectra, e.g., to generate a fast-fading channel model with the same stochastic properties as a real wireless channel by constructing images from the channel's time-frequency responses with Generative Adversarial Networks [24]; or fast fading model establishment based on Simultaneous Localization And Mapping (SLAM) [25] in which CV is employed to detect objects and reconstruct the 3D spatial model to derive the multi-path properties. But these studies are not considering the large scale propagation models over complex terrains, which differ from the schemes of this paper. To be specific, the main contributions are as follows:

•
Our proposed schemes directly process the maps with terrain information by Convolutional Neural Network (CNN) to obtain large-scale propagation/diffraction loss or shadow fading parameters. A framework including data set generation, network structure, training, and metrics evaluation has been constructed to research into the combination of CV and terrain-related radio propagation. • A direct link is established by a CNN between topographic map to propagation/ diffraction loss for a pair of transmitter and receiver on the map. Furthermore, the pathloss between a transmitter and multiple receivers can be predicted in one batch by taking advantage of the multiple parallel outputs defined for the CNN, which greatly enhances the computation efficiency and lays the foundation for extending the scheme to predict the pathloss from a transmitter to a coverage area in a very fast way.

•
The quantitative relation is found between the terrain fluctuation pattern to correlation distance of shadow fading through a CNN model that can process a map of a coverage area, and the results would help configure radio access networks, e.g., to optimize handover performance.
These two schemes and use cases validate and demonstrate the great potential of CV in this research direction. Even though the paper launches the study from a quite basic propagation model in a limited coverage area, it can help establish a general CNN-based prediction model for further improvement via fine-tuning or transfer learning, as well as better understand the essence of the one-step prediction schemes and gain a deep insight about the most important factors involved, which deals with exploitation of the advantage of CNN in detection of 2-D and even 3-D patterns over a reasonable balance between prediction accuracy and computation efficiency.
ANN demonstrates great potential in predicting wireless physical signal security propagation characteristics also in other technical areas. Generally speaking, the employment of ANN for radio channel prediction is limited to learning, tracking, and predicting the variation of path loss or fading based on some drive test results or theoretical models, which does not directly process and exploit the terrain profile or ground object distributions that affect radio propagation. The complex terrain environment in forest areas has great influence on wireless signal propagation. The traditional forecasting methods are insufficient in predicting efficiency or coupling degree with terrain. Computer vision sensing technology is more efficient in processing the geographic information of the target area. Combined with the wireless signal coverage prediction based on CNN, the prediction model can be closely coupled with the geographic information, and the efficiency, accuracy, and adaptability of the prediction can be improved, which provides an important support for the construction of forestry safety monitoring system.
The remainder of the paper is organized as follows, Section 2 introduces the CNNbased diffraction loss prediction scheme, Section 3 describes the correlation distance extraction scheme for shadow fading based on CNN, Section 4 provides simulation results with analysis, Section 5 discusses some potential future works, and Section 6 concludes the paper.

Diffraction Loss Prediction Based on CNN
We get started from diffraction loss prediction by studying how to adopt CV sensing technology to participate in the establishment of a radio propagation model according to the terrain profile or blocking obstacles information shown in a map, and it is believed that the proposed scheme could be easily extended to scenarios with larger coverage areas and more complicated terrain properties as long as the data set is large and accurate enough. So it can provide more adaptive ability for the prediction of wireless signal propagation range.
Diffraction is an important and complex mechanism in electro-magnetic wave propagation, especially for signals with relatively longer wavelengths such as ultra-short waves, which allow the radio signal to transmit around the obstacles and form a signal coverage behind the obstacles. In Huygen' theorem [26], diffraction is modelled by treating the points at the obstacles' edge as new sources of wavefront that bring the waves to the shadow of the obstacles. In an irregular or hilly terrain, the line of sight path will be blocked by the mountains, thus, the propagation relies heavily on diffraction, and the losses are mainly caused by diffraction. It should be noted that the diffraction loss is related to signal wavelength, sizes of the obstacles, and their geometric distribution, as well as the electrical properties of the materials involved, which is often combined with reflections forming multi-path propagation. So the key to applying CV sensing technology to predict diffraction loss is, how to properly map those factors mentioned above in the 3D space to the overall loss value, especially in a single step, by necessary adaptation of traditional CNNs for processing topographic maps.

Prediction Method
CNN is proposed to be applied for diffraction loss prediction, and the well trained CNN will process a topographic map to generate the loss value between the transmitter and receiver marked on the map. When the altitude profile of an irregular terrain is used for path loss prediction, ray-tracing based method or ITU-R P.1546 model [27] requires some form of 3D reconstruction for the particular scenario before the loss value could be computed, which involves two steps of operation with considerable complexity for calculation of the output just between two positions, being not suitable for efficient predictions between one position to multiple positions or an area as in radio network planning process. Ray-tracing has higher accuracy, but it seems unable to identify and filter the main obstacles' blockage effect with high efficiency to find the rays that make the most contribution for propagation, which makes it hard to reduce the complexity. But when CNN is adopted, it can exploit its advantage to extract the objects' properties such as shapes and boundaries, which represent the terrain profiles directly affecting propagation loss, so we can train the CNN to extract these properties, then map them to a pathloss value, for which a complicated non-linear relation may function behind it. Besides the simple one-step operation of CNN from map to loss, it is proposed to use CNN to predict multiple loss values in parallel along the path from transmitter to receiver to further enhance the efficiency. Figure 1 presents these two methods. Though performing multiple predictions at one time would require a more complicated CNN structure, the CNN's inherent feature that can support multiple output values would balance this negative effect, and with the aid of parallel computing power of GPU, the proposed scheme will significantly increase the prediction efficiency while keeping a good accuracy level.
can train the CNN to extract these properties, then map them to a pathloss value, for which a complicated non-linear relation may function behind it. Besides the simple onestep operation of CNN from map to loss, it is proposed to use CNN to predict multiple loss values in parallel along the path from transmitter to receiver to further enhance the efficiency. Figure 1 presents these two methods. Though performing multiple predictions at one time would require a more complicated CNN structure, the CNN's inherent feature that can support multiple output values would balance this negative effect, and with the aid of parallel computing power of GPU, the proposed scheme will significantly increase the prediction efficiency while keeping a good accuracy level.

Data Set Generation
Data set is generated according to Figure 2. First, a mountain peak with random altitude is generated, and its terrain profile information is recorded in an image, where each pixel with a certain grey level corresponds to the altitude value at that location. Then a traditional diffraction model is used for each picture to calculate the diffraction losses as the labels for CNN to process in the next step. It should be noted that, if field measurement results are available, they can also be employed as labels for training the CNN. This process will be repeated until the data set of maps is large enough. In this paper, 1000 maps are generated for training, 200 maps for testing, and 100 maps for validation. The data set with n = 1300 maps is represented by

=
; For the n-th map, with Y X × altitude pixels and NL diffraction losses recorded.  Figure 3 shows an example of the topographic map in the data set with 3D form, which is generated in a way specified as follows. A mountain is modeled in a map with size of X = 512 by Y = 512 pixels, and the peak amplitude is uniformly distributed in the range of (500 m, 2000 m); the peak is located in the center along the y-axis, but located in a random position along the x-axis. One pixel in the map represents 5 m. This mountain simulates the terrain of a typical mountain forest farm. The coordinate of the transmitter is at (0, 256, 10) in pixels. To obtain multiple diffraction loss values in the map in parallel,

Data Set Generation
Data set is generated according to Figure 2. First, a mountain peak with random altitude is generated, and its terrain profile information is recorded in an image, where each pixel with a certain grey level corresponds to the altitude value at that location. Then a traditional diffraction model is used for each picture to calculate the diffraction losses as the labels for CNN to process in the next step. It should be noted that, if field measurement results are available, they can also be employed as labels for training the CNN. This process will be repeated until the data set of maps is large enough. In this paper, 1000 maps are generated for training, 200 maps for testing, and 100 maps for validation. The data set with n = 1300 maps is represented by D L = {{M 1 , L 1 }, {M 2 , L 2 }, . . . , {M N , L N }}; For the n-th map, M n ∈ R X×Y , L n ∈ R N L with X × Y altitude pixels and N L diffraction losses recorded.
can train the CNN to extract these properties, then map them to a pathloss value, for which a complicated non-linear relation may function behind it. Besides the simple onestep operation of CNN from map to loss, it is proposed to use CNN to predict multiple loss values in parallel along the path from transmitter to receiver to further enhance the efficiency. Figure 1 presents these two methods. Though performing multiple predictions at one time would require a more complicated CNN structure, the CNN's inherent feature that can support multiple output values would balance this negative effect, and with the aid of parallel computing power of GPU, the proposed scheme will significantly increase the prediction efficiency while keeping a good accuracy level.

Data Set Generation
Data set is generated according to Figure 2. First, a mountain peak with random altitude is generated, and its terrain profile information is recorded in an image, where each pixel with a certain grey level corresponds to the altitude value at that location. Then a traditional diffraction model is used for each picture to calculate the diffraction losses as the labels for CNN to process in the next step. It should be noted that, if field measurement results are available, they can also be employed as labels for training the CNN. This process will be repeated until the data set of maps is large enough. In this paper, 1000 maps are generated for training, 200 maps for testing, and 100 maps for validation. The data set with n = 1300 maps is represented by

=
; For the n-th map, altitude pixels and NL diffraction losses recorded.  Figure 3 shows an example of the topographic map in the data set with 3D form, which is generated in a way specified as follows. A mountain is modeled in a map with size of X = 512 by Y = 512 pixels, and the peak amplitude is uniformly distributed in the range of (500 m, 2000 m); the peak is located in the center along the y-axis, but located in a random position along the x-axis. One pixel in the map represents 5 m. This mountain simulates the terrain of a typical mountain forest farm. The coordinate of the transmitter is at (0, 256, 10) in pixels. To obtain multiple diffraction loss values in the map in parallel,  Figure 3 shows an example of the topographic map in the data set with 3D form, which is generated in a way specified as follows. A mountain is modeled in a map with size of X = 512 by Y = 512 pixels, and the peak amplitude is uniformly distributed in the range of (500 m, 2000 m); the peak is located in the center along the y-axis, but located in a random position along the x-axis. One pixel in the map represents 5 m. This mountain simulates the terrain of a typical mountain forest farm. The coordinate of the transmitter is at (0, 256, 10) in pixels. To obtain multiple diffraction loss values in the map in parallel, multiple labels will be calculated correspondingly, where the assumed multiple receivers are deployed along the line between position (413, 156, 10) and position (512, 156, 10), with a space of one pixel between the adjacent two of them, as indicated by the short red line in Figure 3, i.e., 100 labels will be needed for the CNN to output N L = 100 loss values simultaneously. multiple labels will be calculated correspondingly, where the assumed multiple receivers are deployed along the line between position (413, 156, 10) and position (512, 156, 10), with a space of one pixel between the adjacent two of them, as indicated by the short red line in Figure 3, i.e., 100 labels will be needed for the CNN to output NL = 100 loss values simultaneously. When the topographic maps are generated, a typical single-edge diffraction model is used to calculate multiple labels of the loss values. This paper proposes a generalized prediction framework using CV to focus on the validation, efficiency, and generalization ability of the proposed model. Therefore, this paper trains CNN network with a general diffraction loss model, so as to predict the diffraction loss under most topographic conditions. When this model is applied to a practical system with unique terrain-related wireless propagation characteristics, as consequences of unique vegetation and soil, field collected data can be used to learn new terrain-related features through fine-tuning or transfer learning on the basis of the proposed generalized CNN.
In this paper, the diffraction loss model is determined by a single parameter v that combines all the geometric parameters involved, and calculated as follows, where λ = c/f is wavelength with c as light of speed and f as the radio frequency; geometrical parameters h, d1, d2 are illustrated as in Figure 4. h represents the height above the connection of transmitter and receiver and d1 and d2 are the distances between transmitter and the knife edge and receiver and the knife edge, respectively. A single parameter v combining all the geometric parameters is first derived by Equation (5); then Equations (3) and (4) give the sine term and cosine term of Fresnel integral respectively; the linear When the topographic maps are generated, a typical single-edge diffraction model is used to calculate multiple labels of the loss values. This paper proposes a generalized prediction framework using CV to focus on the validation, efficiency, and generalization ability of the proposed model. Therefore, this paper trains CNN network with a general diffraction loss model, so as to predict the diffraction loss under most topographic conditions. When this model is applied to a practical system with unique terrain-related wireless propagation characteristics, as consequences of unique vegetation and soil, field collected data can be used to learn new terrain-related features through fine-tuning or transfer learning on the basis of the proposed generalized CNN.
In this paper, the diffraction loss model is determined by a single parameter v that combines all the geometric parameters involved, and calculated as follows, where where λ = c/f is wavelength with c as light of speed and f as the radio frequency; geometrical parameters h, d 1 , d 2 are illustrated as in Figure 4. h represents the height above the connection of transmitter and receiver and d 1 and d 2 are the distances between transmitter and the knife edge and receiver and the knife edge, respectively. A single parameter v combining all the geometric parameters is first derived by Equation (5); then Equations (3) and (4) give the sine term and cosine term of Fresnel integral respectively; the linear value of diffraction loss is calculated by Equation (2) with the two terms of Fresnel integral; finally the dB value of diffraction loss is derived by Equation (1). value of diffraction loss is calculated by Equation (2) with the two terms of Fresnel integral; finally the dB value of diffraction loss is derived by Equation (1).

CNN Structure and Performance Metrics
A CNN structure similar to VGG (Visual Geometry Group) is adopted as shown in Figure 5. A typical VGG structure consists of several groups of convolutional layers followed with a pooling layer as well as several fully connected layers at the end of the network. The convolutional-and-pooling layer groups are capable of 2D image information extraction such as object edges detection and terrain parameter extraction; the fully connected layers are able to construct complex mapping functions, for instance, from terrain parameters to terrain-related diffraction loss. The ReLu activation function is adopted after each convolutional layer and fully connected layer, which is also known as ramp function. Considering that the geometric parameter extraction of the terrain profile would not be a challenging task due to the simple layout of the single edge obstacle, we employ a network structure different from typical VGG, i.e., a 5 × 5 convolution core is used in convolutional layer with relatively less channels. As this prediction task poses a high precision requirement for the loss model, several neural networks with different numbers of neurons are investigated in the full connection layer. The purpose of designing these networks is to verify whether more neurons would have the ability to better fit the mapping from geometric parameters to diffraction loss, so as to guide further network design in similar tasks using ANN. The input is single-channel topographic maps in the size of 512 × 512 pixels. In Figure 5, "5 × 5 conv, 4" means a convolutional layer with 5 × 5 convolution core and 4 channels; "2 × 2 pool" represents a pooling layer with a 2 × 2 max pooling core; "FC 1000" stands for a fully connected layer with 1000 neurons. Moreover, after each pooling layer, the length and width of the map are halved, for instance, from "512 × 512" to "256 × 256" after the first pooling layer. The output is a series of predicted diffraction losses, with the total number NL = 100. The loss function is defined as mean square error (MSE) between the predicted loss value and the corresponding label value, both in dB, expressed by ( ) where i is the index of a predicted loss for a receiver position among all the NL locations, i l is the label's value, and i lˆ is the predicted loss value.

CNN Structure and Performance Metrics
A CNN structure similar to VGG (Visual Geometry Group) is adopted as shown in Figure 5. A typical VGG structure consists of several groups of convolutional layers followed with a pooling layer as well as several fully connected layers at the end of the network. The convolutional-and-pooling layer groups are capable of 2D image information extraction such as object edges detection and terrain parameter extraction; the fully connected layers are able to construct complex mapping functions, for instance, from terrain parameters to terrain-related diffraction loss. The ReLu activation function is adopted after each convolutional layer and fully connected layer, which is also known as ramp function. Considering that the geometric parameter extraction of the terrain profile would not be a challenging task due to the simple layout of the single edge obstacle, we employ a network structure different from typical VGG, i.e., a 5 × 5 convolution core is used in convolutional layer with relatively less channels. As this prediction task poses a high precision requirement for the loss model, several neural networks with different numbers of neurons are investigated in the full connection layer. The purpose of designing these networks is to verify whether more neurons would have the ability to better fit the mapping from geometric parameters to diffraction loss, so as to guide further network design in similar tasks using ANN. The input is single-channel topographic maps in the size of 512 × 512 pixels. In Figure 5, "5 × 5 conv, 4" means a convolutional layer with 5 × 5 convolution core and 4 channels; "2 × 2 pool" represents a pooling layer with a 2 × 2 max pooling core; "FC 1000" stands for a fully connected layer with 1000 neurons. Moreover, after each pooling layer, the length and width of the map are halved, for instance, from "512 × 512" to "256 × 256" after the first pooling layer. The output is a series of predicted diffraction losses, with the total number N L = 100. The loss function is defined as mean square error (MSE) between the predicted loss value and the corresponding label value, both in dB, expressed by where i is the index of a predicted loss for a receiver position among all the N L locations, l i is the label's value, andl i is the predicted loss value. Two performance metrics are used to evaluate the accuracy of the neural network after training: the distribution of absolute errors and the distribution of normalized errors obtained by test data set, respectively expressed as:

Prediction Method
The correlation distance of shadow fading is important for coverage analysis or optimizing of forest district wireless network structure. The prediction method about it is similar to that for diffraction loss, which is carried out by detecting and extracting the fluctuation properties of the terrain profile in a map. For instance, when a VGG network is used to process the topographic maps with grey-scale, some existing channel model such as ITU-R P.1546 or other deterministic models can be used as the reference to produce shadow fading values as labels, or field test results can be adopted as labels, to train the neural network for obtaining shadow fading parameters, i.e., correlation distance in this paper. However, if our main purpose is to validate the proposed CV sensing technology solution from a theoretical perspective and to gain some insight, we can make certain simplifications on the generation of the data set, in other words, shadow fading random variables will be generated for each pixel on the data set maps that follow certain statistical property, as these values are corresponding directly to the correlation distance of shadow Two performance metrics are used to evaluate the accuracy of the neural network after training: the distribution of absolute errors and the distribution of normalized errors obtained by test data set, respectively expressed as: normalized error(%) = l i − l i l i (8)

Prediction Method
The correlation distance of shadow fading is important for coverage analysis or optimizing of forest district wireless network structure. The prediction method about it is similar to that for diffraction loss, which is carried out by detecting and extracting the fluctuation properties of the terrain profile in a map. For instance, when a VGG network is used to process the topographic maps with grey-scale, some existing channel model such as ITU-R P.1546 or other deterministic models can be used as the reference to produce shadow fading values as labels, or field test results can be adopted as labels, to train the neural network for obtaining shadow fading parameters, i.e., correlation distance in this paper. However, if our main purpose is to validate the proposed CV sensing technology solution from a theoretical perspective and to gain some insight, we can make certain simplifications on the generation of the data set, in other words, shadow fading random variables will be generated for each pixel on the data set maps that follow certain statistical property, as these values are corresponding directly to the correlation distance of shadow fading caused by the fluctuating terrain profile or the objects on the ground. Accordingly, the maps of shadow fading will be used as the input to CNN, and the correlation distance of the shadow fading will serve as the output predicted by the CNN in just one step, as similar to Section 2.

Data Set Generation
The basic procedure here is similar to the one in Section 2.2, which generates a large number of maps with pixels being random shadow fading values. The correlation distance for the shadow fading values in one map is assumed to be fixed, but could be different from map to map. Then, the maps will be used as train set, test set, and validation set. The data set also includes 1000 maps for training, 200 for testing, and 100 for validation (n = 1300), represented by D SF = {{SF corr,1 , d 1 }, {SF corr,2 , d 2 }, . . . , {SF corr,N , d N }}; For the n-th map, SF corr,n ∈ R X ×Y , d n ∈ R with X × Y shadow fading pixels and a correlation distance recorded. To distinguish the coordinate symbols from Section 2, we use X , Y , x', and y in this section.
The correlated shadow fading values are obtained according to [28]. An independent lognormally distributed random variable SF uncorr,n,x ,y ∼ N 0,σ 2 is first generated for each grid in the map. In order to introduce correlation, the generated uncorrelated shadow fading maps are processed as follows: where where F 2D (•) is 2D discrete Fourier transformation, calculated as: H(u, v) is the 2D discrete Fourier transformation of X k ; S W (u, v) is the 2D Fourier transformation of a 2D correlation function, represented as below: where L denotes the correlation distance of shadow fading and • 2 is the L2-norm operator. Finally, the correlated shadow fading map will be computed by 2D inverse discrete Fourier transformation of S Z (u, v): and the 2D inverse discrete Fourier transformation is as follows: In order to train the neural network to extract the correlation distance of shadow fading, we assume the correlation distance follows a uniform distribution in a certain range for the whole data set, and each map in the set has a unique correlation distance, but shares the same standard deviation for simplicity. An example of the shadow fading map is shown as in Figure 6. Sensors 2021, 21, x FOR PEER REVIEW 10 of 17  Figure 7 displays the VGG network structures under consideration, where net D, net E, and net F are based on VGG11, VGG13, and VGG 16 [29,30], respectively. The input map has 224 × 224 ( ' ' Y X × ) pixels of shadow fading values, which will lead to a corresponding correlation distance as output. Different from the diffraction loss prediction task, the main challenge of correlation distance extraction is image pattern recognition rather than complex mapping fitting. Therefore, VGG11, VGG13, and VGG16 with different convolution layer depths are used to verify the impact of convolution layer depth on the performance of correlation distance pattern recognition. We also expect that this research will provide reference and guidance for further similar research.

CNN Structure and Performance Metrics
Similar to Section 2.3, the MSE loss function is adopted as the criteria for training CNN. The distribution of absolute and normalized errors in test data set are used to evaluate the accuracy performance, respectively calculated as:  Figure 7 displays the VGG network structures under consideration, where net D, net E, and net F are based on VGG11, VGG13, and VGG 16 [29,30], respectively. The input map has 224 × 224 (X × Y ) pixels of shadow fading values, which will lead to a corresponding correlation distance as output. Different from the diffraction loss prediction task, the main challenge of correlation distance extraction is image pattern recognition rather than complex mapping fitting. Therefore, VGG11, VGG13, and VGG16 with different convolution layer depths are used to verify the impact of convolution layer depth on the performance of correlation distance pattern recognition. We also expect that this research will provide reference and guidance for further similar research.

CNN Structure and Performance Metrics
Similar to Section 2.3, the MSE loss function is adopted as the criteria for training CNN. The distribution of absolute and normalized errors in test data set are used to evaluate the accuracy performance, respectively calculated as:

Simulation Results
The training and evaluation processes for the CNNs are based on the data sets generated as described in Sections 2.2 and 3.2, respectively. For each task, in total, 1000 maps are generated as train set, 200 maps for test set, and 100 maps for validation set. Back propagation method [31] is adopted for CNN training in this paper, and the main hyperparameters for CNN training are shown in Table 1. The adopted training hyperparameters are determined after artificial attempts to ensure that the CNNs for the two tasks would have good performance after training. The second task uses a larger batch size (the number of pictures fed to the network during a parameter update process in back-propagation method) to learn general pattern characteristics of the images, while the first task uses a smaller batch size to fit the mapping from terrain to diffraction loss more accurately. The adoption of exponential decay learning rate can dynamically reduce the learning rate in the training process, so as to reduce the risk of over fitting and further improve the network performance. ReLu activation function and Adam optimizer are used to improve the training efficiency. Under the assumptions of the above training parameters, the two types of tasks converge at 200 and 100 epochs (the whole training data set is fed to the network once in a training epoch), respectively. The radio signal frequency is set to 600 MHz, which is capable of wide coverage in 5G frequency bands. The data sets are generated with Matlab; CNN is established, trained, and tested using Python with Tensorflow framework.

Simulation Results
The training and evaluation processes for the CNNs are based on the data sets generated as described in Sections 2.2 and 3.2, respectively. For each task, in total, 1000 maps are generated as train set, 200 maps for test set, and 100 maps for validation set. Back propagation method [31] is adopted for CNN training in this paper, and the main hyperparameters for CNN training are shown in Table 1. The adopted training hyperparameters are determined after artificial attempts to ensure that the CNNs for the two tasks would have good performance after training. The second task uses a larger batch size (the number of pictures fed to the network during a parameter update process in back-propagation method) to learn general pattern characteristics of the images, while the first task uses a smaller batch size to fit the mapping from terrain to diffraction loss more accurately. The adoption of exponential decay learning rate can dynamically reduce the learning rate in the training process, so as to reduce the risk of over fitting and further improve the network performance. ReLu activation function and Adam optimizer are used to improve the training efficiency. Under the assumptions of the above training parameters, the two types of tasks converge at 200 and 100 epochs (the whole training data set is fed to the network once in a training epoch), respectively. The radio signal frequency is set to 600 MHz, which is capable of wide coverage in 5G frequency bands. The data sets are generated with Matlab; CNN is established, trained, and tested using Python with Tensorflow framework.  Figure 8 gives the loss variations during the training iterations. Figure 9 compares the errors' distributions for the three types of CNN structures after training, presented by cumulative distribution function (CDF) curves, and their performances are summarized in Table 2. It can be seen that, with the increasing number of neurons in the networks from net A to net C, the prediction accuracy is enhanced significantly, with a slightly prolonged processing time. Net C has the best error performance, while net A has the shortest processing time. The results in Table 2 were tested on the hardware platform of GTX1080Ti, with the powerful parallel processing capability of GPU and large memory. Even the computation workload of net B can be as twice as that of net A, but their processing times are almost the same. So the appropriate CNN structure should be carefully selected according to the specific error requirement of a pathloss prediction task as well as the computation power available.   Figure 8 gives the loss variations during the training iterations. Figure 9 compares the errors' distributions for the three types of CNN structures after training, presented by cumulative distribution function (CDF) curves, and their performances are summarized in Table 2. It can be seen that, with the increasing number of neurons in the networks from net A to net C, the prediction accuracy is enhanced significantly, with a slightly prolonged processing time. Net C has the best error performance, while net A has the shortest processing time. The results in Table 2 were tested on the hardware platform of GTX1080Ti, with the powerful parallel processing capability of GPU and large memory. Even the computation workload of net B can be as twice as that of net A, but their processing times are almost the same. So the appropriate CNN structure should be carefully selected according to the specific error requirement of a pathloss prediction task as well as the computation power available.   Figure 8 gives the loss variations during the training iterations. Figure 9 compares the errors' distributions for the three types of CNN structures after training, presented by cumulative distribution function (CDF) curves, and their performances are summarized in Table 2. It can be seen that, with the increasing number of neurons in the networks from net A to net C, the prediction accuracy is enhanced significantly, with a slightly prolonged processing time. Net C has the best error performance, while net A has the shortest processing time. The results in Table 2 were tested on the hardware platform of GTX1080Ti, with the powerful parallel processing capability of GPU and large memory. Even the computation workload of net B can be as twice as that of net A, but their processing times are almost the same. So the appropriate CNN structure should be carefully selected according to the specific error requirement of a pathloss prediction task as well as the computation power available.  To highlight the advantage of the proposed method, we also did some experiments about predicting just one loss value at a time with other conditions unchanged, which means that the CNN is simplified to output just one value at a time. In this case, the normalized error for 95% (percentile) samples is below 4%, and the processing time ranges from 6.15 to 6.3 ms. It can be observed that, the processing time per image or per map for the CNN with n parallel outputs is almost the same as that for the CNN with only one output, which can be translated to a nearly n (=100) times of efficiency enhancement for obtaining each pathloss value on one pixel. Meanwhile, the gain in computation efficiency comes at some cost of reduced accuracy, but it seems to be a reasonable and acceptable tradeoff, for example, regarding net C, its normalized error for 95% samples is below 8.238%, just a little higher than the error around 4% for the CNN with one output.

Results of Shadow Fading Correlation Distance Extraction
To generate the data set of shadow fading maps, the standard deviation is fixed at 2dB, and correlation distances are uniformly distributed in the range (5,15) in the pixels. The size of the map is 224 × 224 pixels, and one pixel can correspond to different distances in the real world with varying scaling levels. The loss variation during the training process is shown in Figure 10, and the error performances are compared in Figure 11. To highlight the advantage of the proposed method, we also did some experiments about predicting just one loss value at a time with other conditions unchanged, which means that the CNN is simplified to output just one value at a time. In this case, the normalized error for 95% (percentile) samples is below 4%, and the processing time ranges from 6.15 to 6.3 ms. It can be observed that, the processing time per image or per map for the CNN with n parallel outputs is almost the same as that for the CNN with only one output, which can be translated to a nearly n (=100) times of efficiency enhancement for obtaining each pathloss value on one pixel. Meanwhile, the gain in computation efficiency comes at some cost of reduced accuracy, but it seems to be a reasonable and acceptable tradeoff, for example, regarding net C, its normalized error for 95% samples is below 8.238%, just a little higher than the error around 4% for the CNN with one output.

Results of Shadow Fading Correlation Distance Extraction
To generate the data set of shadow fading maps, the standard deviation is fixed at 2dB, and correlation distances are uniformly distributed in the range (5,15) in the pixels. The size of the map is 224 × 224 pixels, and one pixel can correspond to different distances in the real world with varying scaling levels. The loss variation during the training process is shown in Figure 10, and the error performances are compared in Figure 11.  It can be seen from the results that, for extracting the correlation distance from shadow fading maps, the neural networks differ in convergence speed during training and in prediction error. Table 3 summarizes the error performances. For all the three networks, the 95% of samples have an error lower than 7%, and net F (VGG16) has the lowest median error; yet for other percentile values, net D indicates better performance. This is because Net F is able to learn more complex characteristics at the cost of losing universality, which might be attributed to the better fitting accuracy from improved depth of convolution layer; on the contrary, Net D is more capable from the perspective of universality. From this perspective, further study may be needed to optimize the CNN structure. The correlation distance is deeply connected with the image patterns of the shadow fading map, which enables the CV sensing technology to take its advantage in object detection. But compared with the normalized error results of diffraction loss prediction, correlation distance prediction has a generally wider error distribution, which could also be explained as that, object detection would be more difficult on a random shadow fading map than on a topographic map with a single-edge-shaped mountain.
According to the discussion above, a preliminary observation could be made: for a terrain profile with significant 2D or 3D shape features, or for a terrain with less random properties, CV sensing technology would have more advantage to predict propagation loss-related parameters thanks to its inherent capability of object detection. For a real map that contains terrain or building information, though it looks 'random', the natural ground objects like rivers, mountains or valleys, etc., have some 2D or 3D features to be recognized or detected by CNN, not to mention the man-made buildings and roads with unique visual features. So we believe that the proposed basic scheme of using CV sensing technology to large and even small scale radio channel modeling would have great potential to predict the propagation range of wireless signals in more complex scenarios or larger geographical areas. Additionally, it has great significance to support the construction of a large area forest safety monitoring system. It can be seen from the results that, for extracting the correlation distance from shadow fading maps, the neural networks differ in convergence speed during training and in prediction error. Table 3 summarizes the error performances. For all the three networks, the 95% of samples have an error lower than 7%, and net F (VGG16) has the lowest median error; yet for other percentile values, net D indicates better performance. This is because Net F is able to learn more complex characteristics at the cost of losing universality, which might be attributed to the better fitting accuracy from improved depth of convolution layer; on the contrary, Net D is more capable from the perspective of universality. From this perspective, further study may be needed to optimize the CNN structure. The correlation distance is deeply connected with the image patterns of the shadow fading map, which enables the CV sensing technology to take its advantage in object detection. But compared with the normalized error results of diffraction loss prediction, correlation distance prediction has a generally wider error distribution, which could also be explained as that, object detection would be more difficult on a random shadow fading map than on a topographic map with a single-edge-shaped mountain.
According to the discussion above, a preliminary observation could be made: for a terrain profile with significant 2D or 3D shape features, or for a terrain with less random properties, CV sensing technology would have more advantage to predict propagation loss-related parameters thanks to its inherent capability of object detection. For a real map that contains terrain or building information, though it looks 'random', the natural ground objects like rivers, mountains or valleys, etc., have some 2D or 3D features to be recognized or detected by CNN, not to mention the man-made buildings and roads with unique visual features. So we believe that the proposed basic scheme of using CV sensing technology to large and even small scale radio channel modeling would have great potential to predict the propagation range of wireless signals in more complex scenarios or larger geographical areas. Additionally, it has great significance to support the construction of a large area forest safety monitoring system.

Future Work
Based on the proposed modelling method and schemes, more research could be carried out in the future.
First, the proposed schemes in this paper can be extended to a more complicated scenario with longer distance between transmitter and receiver, e.g., for a propagation path with multiple mountains. We think that the prediction performance by CV sensing technology is mainly affected by the size of the data set and the accuracy of the labels. There would be two types of solutions to accomplish this. One is to introduce a more complicated convolution layer design to directly process maps of a larger area, with some pre-process of filtering to reduce the randomness of the large scale maps. The other is to divide the longer propagation path into several smaller segments, then let the less complicated CV sensing network to process these segments before the pathloss values in each segment could be accumulated to the overall loss value. For these two types of solutions, the former is expected to have lower error, but higher complexity, while the latter may have lower complexity but higher error, and how to optimize them or to search for other solutions needs further investigation, especially with the support of the field test results.
Second, the CV sensing network structures should be studied for their applications in a more practical and more complicated propagation scenarios. The neural networks in this paper can only process scenarios with fixed pre-configured parameters such as frequency, transmitter-receiver distance, and locations in grey scale maps, and are not able to be flexibly adjusted for varying parameter settings, or to exploit the multi-dimension information from colorful images. For example the current scheme may adopt a three channel image input with different colors to model the impact of air refraction or absorption. To flexibly support more diversified evaluation requirements, more inputs can be added to the neural networks, e.g., some inputs can be introduced at the beginning part of the fully connected layer to control the classification of operation for different frequency bands, or new inputs can be added for controlling map scales and modelling normalized locations for transmitters or receivers, etc. The data sets should also be extended correspondingly. All the potential issues involved would bring about more research directions in the future. The physical security propagation prediction method of wireless signal will also have stronger adaptability and accuracy in these research processes.

Conclusions
It is proposed in this paper that CNN structures can be applied in a novel way in CV sensing technology to process maps with terrain profiles or shadow fading information, so that the related pathloss model or statistical properties of shadow fading can be obtained directly, which can be used in a forest safety monitoring system. First, CNNs with multiple outputs are used to predict a batch of diffraction losses at multiple locations in a map with very high efficiency. The data set is generated in a form of maps containing random edgeshaped mountains, labelled by diffraction losses, calculated based on a typical mathematic diffraction model. Then, the CNNs are trained and tested to predict diffraction losses, with their structures being studied in terms of accuracy and efficiency. It is also discussed how to extend this type of CNNs to operate in more complicated propagation scenarios with longer distances. Second, CNNs are also adopted to extract shadow fading parameters such as correlation distances. Shadow fading maps are generated with fixed standard deviation and randomly configured correlation distances to create the data set. Then, the structures and performances of the trained CNNs are analyzed. Finally, through extensive comparisons and analysis, certain basic principles or potential guidelines are discussed for designing and applying CV sensing technology based on CNNs for a wireless physical signal security propagation model, followed by some related future works being provided. At the same time, this paper provides a theoretical support for the forestry security monitoring technology of wireless signal propagation prediction.