A Deep Learning Model of Radio Wave Propagation for Precision Agriculture and Sensor System in Greenhouses

Abstract: The production of crops in greenhouses will help meet the food demand of the world's population in the coming decades. Precision agriculture is an important tool for this purpose, supported, among other things, by wireless sensor network (WSN) technology for monitoring agronomic parameters. Prior planning of the deployment of WSN nodes is therefore relevant, because their coverage decreases when radio waves are attenuated by the foliage of the plantation. The method proposed in this study applies deep learning to develop an empirical model of radio wave attenuation through vegetation that includes the height of, and the distance between, the transceivers of the WSN nodes. In cross-validation the model reaches an R² of 0.966, while its generalized R² on field measurements is 0.920, which supports the reliability of the empirical model.


Introduction
The increase in demand for crops and food production is associated with the growth of the world population, which, according to data from the Food and Agriculture Organization (FAO) of the United Nations, currently stands at 7.7 billion and is projected to reach 9.4 billion in 2030 and 10.1 billion in 2050, when the world will need 70% more food, 42% more arable land and 120% more water for food-related purposes [1][2][3][4]. Traditional outdoor agriculture cannot satisfy this demand, and the limited agricultural land is being further reduced by civil works construction. Protected cultivation in greenhouses, which increases the number of harvests, is therefore an attractive solution. Better yet, greenhouses transformed into smart greenhouses through information technology and sensors can contribute to increasing agricultural production [5].
Among the technological advances of Industry 4.0, cloud computing and the IoT (Internet of Things) contribute to making traditional systems smart [6][7][8]. An example of this process is smart farming (SF), which improves productivity and reduces the excess inputs used on crops [9]. Within the IoT concept, the role of wireless sensor networks (WSN) is paramount [10,11], because several IoT applications rely on wireless data transmission that allows sensor/actuator nodes to communicate with each other through a wireless network connection. This role is further strengthened within the mMTC (massive machine-type communications) scenario of 5G [12][13][14][15].
WSN sensors record variable data in crop fields and transfer them wirelessly to the base station for agricultural decision-making and monitoring [16]. Properly planning the number and arrangement of wireless nodes within a greenhouse is a major challenge. Because maximum wireless coverage is the objective, a model is needed that determines the attenuation curves of the radio signal inside the greenhouse.
Several empirical models of radio wave attenuation, such as Weissberger's or the ITU-R model, show significant error rates when compared with results obtained in greenhouse field tests, because their equations ignore antenna height [17,18]. Efforts have been made to improve the predictions through novel models that introduce antenna height as a variable, because crop foliage has a different density at different heights. Among these, we highlight some that employ linear and polynomial regressions [19][20][21]. However, the best prediction was obtained with the regularized non-linear regression in [22].
There are several reasons why deep learning models may be useful, even in cases where there is a small amount of data available. First, deep learning models are particularly well-suited for tasks that involve learning from complex high-dimensional data. These types of tasks can be challenging to model using traditional machine learning approaches, but deep learning models are able to learn useful features and patterns directly from the data. Second, deep learning models are able to learn hierarchical representations of the data, with different layers of the model learning to represent different levels of abstraction. This allows the model to learn complex relationships in the data and make more accurate predictions. Third, deep learning models are able to handle large amounts of noise and variability in the data, which can be especially useful in real-world applications where data is often messy and incomplete.
This research aims to improve prediction by means of deep learning, a sub-field of machine learning, which is itself a branch of artificial intelligence. The goal is to find a new empirical model of attenuation and to contrast it with the previous model (regularised regression) to determine whether it predicts more accurately. To the best of our knowledge, based on the literature reviewed, this is the first time that deep learning has been used to develop an empirical propagation model for application to any greenhouse plantation.

Background
Based on the paradigms of Industry 4.0 (the Fourth Industrial Revolution), precision agriculture (PA, the Third Agricultural Revolution) evolved into Agriculture 4.0 (A4.0), also called smart farming (SF) [23]. It integrates information and communication technologies (ICT) into traditional farming practices to monitor a wide range of agricultural parameters that improve crop yields [24]. Both terms (SF and A4.0), related to digital agriculture (DA), are driving change in sustainability, efficiency, productivity, and food security. This novel paradigm is based on technologies such as IoT, artificial intelligence, big data, cloud computing, and other related smart systems and devices for crop and farm management [25][26][27].
Within this technological scenario, the wireless sensor networks (WSN) provide a local crop monitoring system that enables appropriate decisions to be made in a controlled production system affected by climate change [28,29]. Through wireless data transmission, WSN supports the collection of information in agriculture due to their low cost, minimal power consumption, self-organizing capability, wide area coverage by multi-hop links, and deployment in environments changed by plant growth, with limited power grid [27], contributing to improved agricultural productivity in an environmentally sustainable way [30,31]. The types of sensors for agriculture are set according to the characteristics of each plantation [32,33].
The Received Signal Strength Indicator (RSSI) reveals power values in radio wave propagation. The environment, crop growth, and antenna heights determine RSSI values [34,35]. The models used to predict the RSSI between two transceivers are called propagation models [36].
The Friis model of free space propagation is used to obtain the line-of-sight (LOS) path loss incurred in a free space environment from a transmitter to a receiver. It expresses the ratio of the received power to the transmitted power in terms of the effective areas of the receiving (Rx) and transmitting (Tx) antennas [37][38][39][40][41][42][43][44].
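For reference, the Friis relation described above can be written out as a short worked equation. The symbols are assumptions introduced here for illustration and do not come from the paper: P_r and P_t are the received and transmitted powers, A_r and A_t the effective antenna areas, G_r and G_t the antenna gains, d the separation, and λ_0 the wavelength (written with a subscript to avoid confusion with the regularization parameter λ used later).

```latex
\frac{P_r}{P_t} = \frac{A_r A_t}{d^{2}\lambda_0^{2}},
\qquad
A = \frac{G\,\lambda_0^{2}}{4\pi}
\;\Longrightarrow\;
\frac{P_r}{P_t} = G_t G_r \left(\frac{\lambda_0}{4\pi d}\right)^{2}

% Free-space path loss in dB for unit-gain antennas (G_t = G_r = 1):
PL_{fs}\,[\mathrm{dB}] = -10\log_{10}\frac{P_r}{P_t}
                       = 20\log_{10}\!\left(\frac{4\pi d}{\lambda_0}\right)
```

As a rough check, at 2.4 GHz the wavelength is about 0.125 m, so a separation of 1 m gives a free-space loss of roughly 40 dB, and each doubling of the distance adds about 6 dB.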
In greenhouses, vegetation affects radio-wave propagation, which takes place under non-line-of-sight (NLOS) conditions. Signals at microwave (1-30 GHz) [45] and millimeter-wave (30-300 GHz) frequencies [14] experience scattering and absorption caused by randomly distributed leaves and branches [46]. The total path loss is formulated by adding the free-space loss PL_fs to the vegetation loss PL_veg predicted by the different vegetation models [19,47,48].
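The combination of free-space and vegetation losses can be sketched as follows. This is a minimal illustration, not the model developed in this paper: it assumes unit-gain antennas and uses Weissberger's modified exponential decay model, one of the classical models mentioned in the introduction, as the vegetation term. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between unit-gain antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def weissberger_db(foliage_depth_m, freq_ghz):
    """Weissberger's modified exponential decay model (foliage depth up to 400 m)."""
    if not 0.0 <= foliage_depth_m <= 400.0:
        raise ValueError("foliage depth must lie in [0, 400] m")
    if foliage_depth_m <= 14.0:
        return 0.45 * freq_ghz ** 0.284 * foliage_depth_m
    return 1.33 * freq_ghz ** 0.284 * foliage_depth_m ** 0.588


def total_path_loss_db(distance_m, foliage_depth_m, freq_hz):
    """Total loss = free-space loss + vegetation loss (both in dB)."""
    return fspl_db(distance_m, freq_hz) + weissberger_db(foliage_depth_m, freq_hz / 1e9)


if __name__ == "__main__":
    # Hypothetical link: 10 m apart, 5 m of foliage in between, 2.4 GHz.
    print(round(total_path_loss_db(10.0, 5.0, 2.4e9), 2))
```

Note that neither term depends on antenna height, which is the limitation that motivates the height-aware models discussed in this work.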
The second category, empirical path loss models, was chosen for the present study because of the simplicity of their equations, notably those listed in [21,22], which are based on the EDM (exponential decay model). However, their estimates show a considerable margin of error compared with field measurements, which prompted us to focus our work on improving them.
Among empirical models, the authors developed an empirical multi-parametric equation model based on non-linear regularised regressions using experimental RSSI measurements obtained in field tests in four greenhouses. In that study, the evaluation of the model with 5th-degree polynomials yielded 0.948 for R² and 0.946 for adjusted R² (20-parameter solution), and 0.942 for R² and 0.940 for adjusted R² when the equation was reduced to 15 parameters by applying cross-validation [22].
The attenuation of the radio wave inside the greenhouse depends on the signal frequency, the antenna height, and the distance between antennas, and it exhibits non-linear behavior. An interesting approach is therefore to take advantage of machine learning (ML) [49] to find the relationship between these non-independent variables. ML builds a model automatically by deducing meaningful characteristics (known as features) from the dataset, with feature extraction being the most critical step in model generation [50]. The non-linear features of the input data then establish interactions and relationships with the output predictor variables [51]. Analogously, humans use a model of the world as a simulator in the brain, obtained by learning from large amounts of data collected by the senses as they interact with the surrounding environment [52].
ML collects input and output data to subsequently predict future values [53][54][55]. Machine learning algorithms are commonly implemented with ANNs (artificial neural networks) [56][57][58][59][60]. Based on their architecture, ANNs can be classified into CNNs (convolutional neural networks) [61][62][63] and RNNs (recurrent neural networks) [62,64]. DL (deep learning) is a subfield of ML [65,66], and ANNs are its core algorithms. If the depth, or number of layers, of the ANN is greater than three, it ceases to be a simple ANN and becomes a DL algorithm [67], called a deep neural network (DNN), which allows it to successfully interpret more complex non-linear inputs [68][69][70][71]. Although we found no research using ML to estimate radio propagation loss in the presence of vegetation, there are some related works on radio propagation. One DNN-based approach employs a CNN for radio propagation loss estimation using spatial information, such as building occupancy maps, as input data [72]. Another predicts path loss in rural areas in the 3.7 GHz band by combining different ML models: the base learning stage uses ANN, DT (decision trees), SVR (support vector regression), kNN (k-nearest neighbors), and GLM (generalized linear model), with a custom DNN with three hidden layers as meta-learner [73]. Bogdándy et al. [74] used the log of WiFi RSSI values as input data to determine the indoor positioning of nodes with an ANN. In addition, [75] used ML to obtain an ANN-based model that predicts radio propagation loss characteristics inside tunnels.

Source of Data
All data were collected by Cama-Pinto et al. [21,22]. The experiment was performed in greenhouses located in Almería, southeastern Spain [76][77][78][79], whose vegetable and fruit production is exported mainly to the EU [80][81][82][83][84][85][86][87]. RSSI data are from trials in four greenhouse fields during February 2020, each with an area of 10,000 m², in the Almería localities of La Cañada, Retamar, El Alquian, and Níjar, together with greenhouse test data from La Cañada in 2018. A total of 345 data points were collected. Each experiment was repeated 10 times in 2020 and 60 times in 2018, and the data used were the averages of the repetitions. The outline of the measurement system hardware configuration is detailed by the authors in [88].
As shown in Figure 1, during the measurement phase the antennas of the Tx node and the sink node (Rx) were placed at the same height, and the signal reached the receiver attenuated after passing through the tomato plant walls (1 m thick). The procedure was as follows: (1) Both the Tx node and the sink are located at the same height above the ground. Every 5 min the Rx node records the signal from the Tx node, which arrives attenuated, and the measurement is repeated 10 times. The distance between the nodes is then increased by adding one more tomato wall between them, and the previous procedure is repeated. As more tomato plant walls are added, the separation reaches a point where there is no communication, which ends this stage. (2) The Tx and Rx nodes are moved two meters next to the tomato wall, into the side corridor, and step 1 is repeated.
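The averaging of repeated readings into one data point per configuration can be sketched as below. This is only an illustration under stated assumptions: the tuple layout, the function name `average_rssi`, and the sample values are hypothetical, and the arithmetic mean is taken directly on the dB-scale readings, which may differ from how the original dataset was prepared.

```python
from collections import defaultdict
from statistics import mean


def average_rssi(readings):
    """Average repeated RSSI readings per (distance, height) configuration.

    readings: iterable of (distance_m, height_m, rssi_dbm) tuples, with
    several repetitions for each (distance_m, height_m) pair.
    """
    groups = defaultdict(list)
    for distance_m, height_m, rssi_dbm in readings:
        groups[(distance_m, height_m)].append(rssi_dbm)
    # One averaged value per configuration, sorted for reproducible output.
    return {key: mean(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    # Placeholder readings for illustration only, not measured data.
    sample = [(1.0, 0.5, -60), (1.0, 0.5, -62), (2.0, 0.5, -70)]
    print(average_rssi(sample))
```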

The schematic of the top view of the deployment of the Tx and Rx nodes inside the greenhouse is shown in Figure 2.

A Deep Learning Model of Radio Wave Propagation
A novel deep learning model based on a binary feedforward neural network is proposed in this work. The model includes an encoding layer and a decoding layer. The encoding layer converts the distance and the height at which the attenuation is to be known into binary. Since the range of the data is limited, the number of bits needed for the integer part and the decimal part is small. The distance varies from 1 to 35 m, and the height varies from 30 cm to 200 cm. The distance is encoded with 14 bits, 7 for the integer part and 7 for the decimal part. The height is encoded with 11 bits, 4 for the integer part and the rest for the decimal part, so the two real numbers, each with two decimals, use 25 bits in total. The decoding layer converts the binary output into a real number with an accuracy of 3 decimal places using 17 bits, 7 for the integer part and 10 for the decimal part. The neural network itself is composed of 7 layers: the first and the last are the input and output layers, respectively, and the remaining five are hidden. Figure 3 shows the structure of the deep neural network. The activation function of the perceptrons is the sigmoid function. The input layer has 25 perceptrons corresponding to the 25 input bits, while the output layer has 17 perceptrons corresponding to the 17 output bits.
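One way to realise the bit layout described above is sketched here. The paper does not spell out the exact bit convention, so this sketch assumes that the integer part and the decimal digits are each stored as unsigned binary integers, most significant bit first, that height is expressed in meters, and that the attenuation is encoded as a non-negative magnitude. All function names are hypothetical.

```python
def to_bits(value, n_bits):
    """Unsigned integer -> list of n_bits bits, most significant bit first."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return [(value >> i) & 1 for i in reversed(range(n_bits))]


def from_bits(bits):
    """List of bits (most significant first) -> unsigned integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def encode_fixed(x, int_bits, frac_bits, decimals):
    """Encode x as [integer part | decimal digits], each as a binary integer."""
    scaled = round(x * 10 ** decimals)
    int_part, frac_part = divmod(scaled, 10 ** decimals)
    return to_bits(int_part, int_bits) + to_bits(frac_part, frac_bits)


def decode_fixed(bits, int_bits, decimals):
    """Inverse of encode_fixed."""
    return from_bits(bits[:int_bits]) + from_bits(bits[int_bits:]) / 10 ** decimals


def encode_input(distance_m, height_m):
    """25 input bits: 7+7 for the distance, 4+7 for the height (two decimals)."""
    return encode_fixed(distance_m, 7, 7, 2) + encode_fixed(height_m, 4, 7, 2)


def decode_output(bits):
    """17 output bits: 7 for the integer part, 10 for three decimals."""
    return decode_fixed(bits, 7, 3)


if __name__ == "__main__":
    x_bits = encode_input(12.5, 0.5)
    print(len(x_bits), x_bits)
```

With two decimal digits the decimal part never exceeds 99, which fits in 7 bits, and with three decimal digits it never exceeds 999, which fits in 10 bits. This is consistent with the bit counts stated in the text.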

The distance and height values compose the vector X, while the attenuation values, estimated or measured in dB, compose the vector Y. The representation of the two vectors X and Y is shown in Figure 4. The parametric adjustment of the deep neural network is performed by minimizing the following cost function,

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log\left(h_\theta(x^{(i)})_k\right) + \left(1 - y_k^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})_k\right) \right] \tag{1}$$

where m is the number of experiments performed in the greenhouse, each of which gives a signal attenuation for a given distance and height, and K is the total number of bits in the output layer. The logistic function is defined as,

$$h_\theta(x) = g\left(\theta^{T} x\right) \tag{2}$$

where g is the sigmoid function,

$$g(z) = \frac{1}{1 + e^{-z}} \tag{3}$$

To avoid deviations and overfitting of the cost function parameters of Equation (1), the regularization function called Tikhonov regularization [118,119] is added as follows,

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log\left(h_\theta(x^{(i)})_k\right) + \left(1 - y_k^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})_k\right) \right] + \frac{\lambda}{2m}\sum_{n=1}^{N-1}\sum_{s=1}^{S_n}\sum_{j=1}^{J_n}\left(\theta_{sj}^{(n)}\right)^{2} \tag{4}$$

The network parameters are represented by θ, N is the number of layers, and J_n and S_n bound the sums over the connections at the n-th layer: S_n is the number of units that receive connections and J_n is the number of incoming connections of each unit. λ is the regularising term and establishes the weight that the parameters should have in the cost function, avoiding overfitting and variability in the parameterized functions. Solving this optimization problem requires the gradients in each direction, which can be calculated using the backpropagation algorithm (see Algorithm 1).
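A regularized cost of this kind can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a cross-entropy data term summed over the K output bits, sigmoid activations in every layer, an L2 (Tikhonov) penalty weighted by λ/(2m), and bias weights excluded from the penalty. The data layout, where each weight matrix is a list of rows `[bias, w_1, ..., w_J]`, is hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, thetas):
    """Forward pass; thetas is a list of weight matrices, one per layer
    transition, each row being [bias, w_1, ..., w_J] for one unit."""
    a = list(x)
    for theta in thetas:
        a_with_bias = [1.0] + a
        a = [sigmoid(sum(w * v for w, v in zip(row, a_with_bias))) for row in theta]
    return a


def regularized_cost(X, Y, thetas, lam):
    """Cross-entropy over all output bits plus an L2 penalty on the weights."""
    m = len(X)
    eps = 1e-12  # keeps log() finite when an output saturates at 0 or 1
    data_term = 0.0
    for x, y in zip(X, Y):
        h = forward(x, thetas)
        for h_k, y_k in zip(h, y):
            h_k = min(max(h_k, eps), 1.0 - eps)
            data_term -= y_k * math.log(h_k) + (1.0 - y_k) * math.log(1.0 - h_k)
    # Assumption: the bias weight (first entry of each row) is not penalized.
    penalty = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return data_term / m + lam / (2.0 * m) * penalty


if __name__ == "__main__":
    # Tiny placeholder network: 2 input bits -> 3 output bits, all weights zero.
    thetas = [[[0.0, 0.0, 0.0] for _ in range(3)]]
    print(regularized_cost([[0, 1]], [[1, 0, 1]], thetas, lam=0.001))
```

With all weights at zero every output equals 0.5, so the cost reduces to K·ln 2 per example, which is a convenient sanity check before training.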
Once the gradients have been calculated, the Polak-Ribière method [120] is used to compute the conjugate gradients that give the search direction. The line search is approximated using quadratic polynomial functions, and its stopping criterion is given by the Wolfe-Powell conditions [121,122].

Algorithm 1. Backpropagation.
1. Training set {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}
2. For the entire training set:
3. Initialize the gradient accumulators ∆
4. Compute forward propagation
5. Compute the regularized cost function J(θ)
6. Set a^(1) = x^(i)
7. Perform forward propagation to compute a^(n) for n = 2, 3, ..., N
8. Using y^(i), compute the output error and backpropagate it to obtain the gradients

The total number of parameters conditions both the training time and the density of perceptrons in the neural network. A study is made of the number of parameters for a given value of λ. The optimal number of perceptrons of the neural network is 15,575. With this architecture, the value of λ is optimized by testing the values 0.1, 0.01, 0.001, and 0.0001, with the results shown in Figure 6.
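The Polak-Ribière search direction can be illustrated with a compact nonlinear conjugate gradient loop. This sketch makes two simplifying assumptions that differ from the procedure described above: it replaces the polynomial line search and the Wolfe-Powell conditions with plain Armijo backtracking, and it uses the non-negative variant of the Polak-Ribière coefficient with a restart whenever the direction is not a descent direction. The function name and the test problem are hypothetical.

```python
def polak_ribiere_cg(f, grad, x0, max_iter=1000, tol=1e-8):
    """Minimize f from x0 using Polak-Ribiere conjugate gradients."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # first direction: steepest descent
    for _ in range(max_iter):
        gg = sum(gi * gi for gi in g)
        if gg < tol * tol:                     # gradient norm below tolerance
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                       # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -gg
        fx, t, x_new = f(x), 1.0, None
        while t > 1e-12:                       # Armijo backtracking (simplified)
            cand = [xi + t * di for xi, di in zip(x, d)]
            if f(cand) <= fx + 1e-4 * t * slope:
                x_new = cand
                break
            t *= 0.5
        if x_new is None:                      # no acceptable step was found
            break
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero (PR+ variant).
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    # Placeholder problem: a convex quadratic with its minimum at (1, -2).
    f = lambda v: (v[0] - 1.0) ** 2 + 10.0 * (v[1] + 2.0) ** 2
    grad = lambda v: [2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)]
    print(polak_ribiere_cg(f, grad, [0.0, 0.0]))
```

In a network-training setting, `f` would be the regularized cost and `grad` the gradient delivered by backpropagation, both taken over the flattened parameter vector.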
The result suggests the best value of λ. The root mean square error (RMSE) remains constant when λ is near 0.001, so the optimal value is set to 0.001. The value of λ is used in the backpropagation algorithm to avoid bias and overfitting [118,119]. Figure 7 shows the loss function versus the number of epochs. A total of 10,000 epochs are needed to reach a loss value of 0.0332.
Figure 8 shows the solution obtained for the proposed deep learning model. Figure 8a is the 3D view of the output of the neural network of Figure 3, where the values measured in the greenhouse appear as blue dots. The x-axis and y-axis are distance (d) and height (h), respectively, in meters. The z-axis gives the values of the deep neural network L_foliage(d, h) evaluated for the distance and height data. Figure 8b shows the residual values between the measured data and those calculated with the deep neural network.
The cross-validation parameters reveal the quality of the new model: R² and Q² were 0.966 and 0.957, respectively, and the RMSECV was 1.98. The deep neural network was also validated by permutation testing.

Results
The values for the evaluation of the multi-parametric optimised function are presented in Table 1. The deep learning model of radio wave attenuation by vegetation in a tomato greenhouse developed in this research improved on the predictions of other models, with an R² of 0.966. Likewise, the adjusted R² in the two scenarios was very close, at around 0.964. The MSE and RMSE values were close, which means the deep learning model has a good fit. The AIC and SBC are of similar magnitude, which suggests that no information was lost when applying either model. The deep learning model was compared with other measurements taken in field tests. These measurements were made in 2020 and 2018, all of them in greenhouses producing the same type of tomato (tinkwino), at the peak of production and leafiness (February). The values are summarized in Table 2 for all greenhouses and reveal that the deep learning model worked well. The generalized error of the model was evaluated with real values for its fit. Table 2 shows that R² was between 0.908 and 0.935. The generalized R², taken as the mean, was 0.920. RMSE was between 2.88 and 3.33, and the generalized RMSE, taken as the mean, was 3.08. This error was larger than that obtained with the model fit values, which is to be expected.
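The fit statistics quoted above can be computed with the standard definitions sketched below. This is a generic illustration rather than the exact evaluation script of the study: the function name is hypothetical, `n_params` stands for the number of predictors used in the adjusted R² correction, and the sample numbers are placeholders rather than measured attenuations.

```python
import math


def regression_metrics(y_true, y_pred, n_params):
    """Return R^2, adjusted R^2 and RMSE for paired observations."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # The adjusted R^2 penalizes the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    rmse = math.sqrt(ss_res / n)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(regression_metrics(measured, predicted, n_params=1))
```

Applying the same function separately to each validation greenhouse and averaging the resulting R² and RMSE values mirrors the way the generalized figures are described in the text.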

Discussion
These results demonstrate the applicability of the novel approach. In addition, field tests established that the highest coverage between the Rx and Tx nodes occurred when the nodes' antennas were 0.5 m above the ground. The proposed deep learning model describes the attenuation of radio waves passing through vegetation in the 2.4 GHz band, which is widely used by the IEEE 802.15.4 standard. The statistical quality of the deep learning model and of the regularised regression with 20 parameters is compared in Table 3, with the former giving better values. As this research has shown, the main problem in predicting attenuation lies in its highly non-linear analytical function. To obtain an expression adjusted to this non-linearity, a fifth-degree polynomial with all possible combinations of the two variables had to be added to the analytical expression usually used. With a non-linear adjustment and a novel dimensionality reduction on a regularized cost function, the generalized accuracy of that regression model is 0.906.
When a high degree of accuracy is required, deep learning is well suited to handling highly non-linear analytical functions. The technique has evolved alongside advances in computation and is currently used in a multitude of systems precisely because of its potential to increase prediction accuracy.
The important feature of this manuscript is that it is the first application of a deep learning model to the prediction of electromagnetic attenuation inside a tomato greenhouse, a case for which there is no model in the literature. It also lays the foundation for further experiments with various crops, so that, on this architecture, the deep learning model will be able to make attenuation predictions for any greenhouse with any crop. The deep learning model improves the generalized accuracy by 2 points and reduces the mean square error by one-third.

Conclusions
Experimental measurements of radio wave attenuation inside a tomato greenhouse have been carried out using a wireless sensor network in the 2400 MHz frequency band. When the recorded attenuation values were compared with other empirical attenuation models, significant errors were found. An accurate attenuation model is useful for planning the distribution and deployment of WSN nodes in precision agriculture.
To enhance the predictions of these empirical models, we developed our own deep learning neural network. We ensure that there is no over-fitting or bias in the prediction of the neural network parameters, since an L-curve method is applied. The deep learning model, based on the EDM, has the simplicity of using only the variables of antenna height and node distance compared to other models, with height affecting the results the most. The evaluation of the deep learning model demonstrates reliability, obtaining 0.966 for R² and 0.964 for adjusted R², while the generalized values were 0.920 for R² and 0.912 for adjusted R². The methodology presented will serve to generate a deep learning model of the attenuation behavior of wireless signals when passing through vegetation. The model is composed of the exponential attenuation and the compensation function associated with the environment where the radio wave is transmitted. In summary, deep learning models are able to learn from a small amount of labeled data and make use of large amounts of unlabeled data, which can be especially useful in cases where it is difficult or expensive to obtain a large amount of labeled data. In general, deep learning models have a number of advantages that make them well suited to tasks involving complex, high-dimensional data, as has been shown here for signal attenuation within a tomato-growing greenhouse. We understand that this is one more step towards the modernization of agriculture, as automation will involve wireless communication inside the greenhouse. Future research will focus on evaluating or refining the model taking into consideration crop growth.
Funding: This research received support from the AUIP (Iberoamerican University Association for Postgraduate Studies), by the Spanish Ministry of Science, Innovation, and Universities under the programme "Proyectos de I+D de Generacion de Conocimiento" of the national programme for the generation of scientific and technological knowledge and strengthening of the R+D+I system through grant number PGC2018-098813-B-C33 and by UAL-FEDER 2020, Ref. UAL2020-TIC-A2080.