On the Classification of a Greenhouse Environment for a Rose Crop Based on AI-Based Surrogate Models

Precise microclimate control under dynamic climate changes in greenhouses calls for a simple, robust, reliable, and intelligent model. Accordingly, the objective of this investigation was to develop a method that can accurately define the most suitable greenhouse environment for an optimal yield of roses. Herein, an optimal and highly accurate BO-DNN surrogate model was developed (based on 300 experimental data points) for a quick and reliable classification of the rose yield environment, incorporating some of the most influential variables into its architecture: soil humidity, air temperature and humidity, CO2 concentration, and light intensity (lux). Initially, two BO techniques (GP and GBRT) were used to tune the hyper-parameters (such as learning rate, batch size, number of dense layers, number of dense nodes, number of input nodes, activation function, etc.). After that, an optimal and simple combination of the hyper-parameters was selected to develop a DNN algorithm based on 300 data points, which was further used to classify the rose yield environment into four classes: soil without water, correct environment, too hot, and very cold. The very high accuracy of the proposed surrogate model (0.98) originated from the introduction of the most vital soil and meteorological parameters as the inputs of the model. The proposed method can help in identifying intelligent greenhouse environments for efficient crop yields.


Introduction
Climate change throughout the globe is affecting agricultural production due to increasing temperatures, fluctuating precipitation patterns, and rising carbon dioxide concentrations in the atmosphere. Under these changing environmental conditions, greenhouse crop cultivation is preferred over open-field growing. Cultivating crops in a greenhouse prolongs the agricultural growing season, protects yields against weather variations, offers a reliable growing ecosystem, and thus maximizes productivity. It is therefore essential to adopt precision agriculture techniques in order to maintain the ideal environmental parameters such as humidity, carbon dioxide, and temperature, along with soil moisture and nutrients, in accordance with the crop growth cycle [1,2]. Exposure to uneven environmental factors produces stress, disease, or even crop loss, resulting in substantial financial losses to growers [3]. Greenhouse weather control mechanisms need to consider multivariate and non-linear systems with variables that depend greatly on the external environment and the design of the greenhouse [4,5], so the greenhouse cannot be controlled independently of them. Thus, developing a precise climate model of a greenhouse is an essential approach to control these dynamic climate changes and attain proficient climate management.
Greenhouse environment models can be developed on either the physical laws driving ecological cycles, or the interpretation of data obtained from such processes. With the development of high-performance computational systems, several analytical models [6][7][8] have been developed. Yet, this methodology may produce inconsistent outcomes when applied to true environmental conditions due to the complexity of these models and the frequent need to calculate and approximate unmeasurable parameters, for example, water vapor pressure, biological factors, rate of photosynthesis, soil heat flux density, and other factors [9]. In contrast, owing to the advancement of computational strategies, deep learning prediction models based on big data [10] are being progressively applied to several fields. ANN models are powerful predictive tools [11,12] because of their capability to model systems without making assumptions [13] and to evaluate nonlinear systems. The most significant benefit of deep learning models over other classes of nonlinear models is that ANN models can approximate a large class of functions with a high level of precision [14]. Compared to existing physical models [6,7], this approach delivers swift and reliable results for precision agriculture applications, namely, the climate estimation of greenhouses [15], the growth of plants, and the detection of stress.
For the generation and collection of data, smart greenhouses are equipped with IoT devices, wireless sensor networks (WSNs), and actuators [16]. Sensors sense the atmosphere in the greenhouse and measure temperature, light intensity, humidity, CO2 levels, pressure, etc. If any irregularity is detected in the environmental conditions of the greenhouse, the ANN-based central control station directs actuators to execute the required actions such as watering the crops, increasing or decreasing the light intensity, opening and closing windows, etc.
In addition, how the different parameters of the greenhouse climate vary relative to the requirements of a particular crop at its various development phases needs more consideration. As rose plants are susceptible to large variations in temperature, light, and humidity, the cultivation of greenhouse roses in geographical areas whose environmental conditions are not sufficiently close to the base prerequisites will entail added risks and production costs [17]. Relying exclusively on the parameter measurement data from sensors is insufficient to obtain solid harvests in the greenhouse. Having a deep learning model for forecasting the future air parameters will assist in maintaining the climate [18]. For instance, having the predicted values of temperature, CO2, and humidity assists in maintaining the flower size and a high yield, and can prevent the growth of pests that harm the rose plants. Additionally, predicting greenhouse climate changes will help in the event of sensor breakdown and will reduce the energy utilization in the greenhouse [3].

Aims and Motivation
Roses are among the most highly marketed flowers globally and have ruled the flower market since the 1990s owing to their year-long availability and the ever-increasing demand from the beauty product and decoration industries. Roses are a functional food product similar to barley and other crops [19]. Natural environmental conditions are not always optimal for meeting crop requirements and the growing demand [20,21]. Extreme weather conditions such as exposure to direct sun, hail, and biotic and abiotic stresses can critically damage the product quality and yield [22]. Therefore, greenhouses are increasingly being used, since they can adjust the interior environmental parameters through artificial lights, aeration, and heating and ventilation systems [23]. Thus, crop growing cycles can be designed based on market demands. The environmental parameters required for the appropriate growth of roses are relative humidity, CO2 concentration, soil humidity, air temperature, light intensity, and the electrical conductivity of soil (see Figure 1).
Several analytical models exist for interpreting the data collected from wireless sensor networks or IoT devices, but these models may produce inconsistent outcomes when applied to true environmental conditions due to their high complexity and the frequent need to calculate and approximate unmeasurable parameters. Based on the aforementioned discussion, it is crucial to develop a method, particularly an AI-based one [24], that can accurately define the most suitable greenhouse environment for rose yield production. This is because AI-based methods have gained a lot of success in agriculture during recent years in relation to crop yield production, detection, precision agriculture, and so on [25][26][27][28][29]. To the best of the authors' knowledge, only a single study is available in the literature regarding the use of AI in the rose greenhouse environment; it presents ANN and ANFIS methods to forecast the risk level for pests in the rose greenhouse [30]. Other than this, no study is available in the literature on this subject. The present study is the first of its kind in classifying the greenhouse environment for rose crops based on AI-based surrogate models. The proposed models are deep neural networks based on the optimal set of hyper-parameters defined by the Bayesian optimization scheme. AI-based surrogate models can be a reliable, simple, and robust solution. For instance, Bayesian optimization (BO) techniques such as the Gaussian process (GP) and Gradient boosting (GBRT) can be employed to provide optimal hyper-parameters to be integrated with deep neural networks (DNN).
In line with this, the objective of this study was to develop an optimal and highly accurate BO-DNN surrogate model (based on 300 experimental data points) for a quick and reliable classification of the rose yield environment, incorporating some of the most influential variables into its architecture: soil humidity, air temperature and humidity, CO2 concentration, and light intensity (lux). The rose yield environments (outputs) are classified into four classes: soil without water, correct environment, too hot, and very cold. Initially, two BO techniques (GP and GBRT) were used to tune the hyper-parameters (such as learning rate, batch size, number of dense layers, number of dense nodes, number of input nodes, activation function, etc.). The most accurate set of hyper-parameters was selected to build the DNN model based on 300 data points, which was further used to classify the rose yield environment.
The very high accuracy of the proposed surrogate model originates from the introduction of the most vital soil and meteorological parameters as the inputs of the model.

Data Collection
A total of 300 experimental data points from various sensors regarding soil humidity, light intensity, temperature, air humidity, and CO2 concentration for four different classes of greenhouse rose yield environments were taken from the open literature [31]. The data were acquired by an autonomous robot equipped with sensors for soil humidity, light intensity, temperature, air humidity, and CO2 concentration. Table 1 shows that a wide range of experimental data have been included in this study to discuss greenhouse rose yield environments.

Data Visualization
The experimental data have been visualized in terms of heat maps, correlation charts, pair plots, and violin plots. The heat map and correlation chart represent the relationship between the input and output features, while the data distribution has been visualized by pair plots, violin plots, and a distplot. In addition, the data density for each class has been shown. A heat map showing the correlation between the input and output variables is given in Figure 2. The dependency of the output parameter on the various input variables can be visualized by using a correlation chart, as provided in Figure 3. The data distribution of the input and output parameters including the soil humidity, air temperature and humidity, CO2 concentration, lux (light intensity), and class (output) is represented by a pair plot (see Figure 4). A clearer picture of the experimental data distribution of the various input features with respect to the only output parameter, class, is highlighted in the violin plots (see Figure 5). Herein, the experimental data were distributed into four different classes (namely class 0, class 1, class 2, and class 3). The total number of data points for each class is illustrated in Figure 6.
The density of each input parameter's acquired data is presented by a distplot (see Figure 7), which illustrates the data distribution of each parameter in terms of its density distribution.
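The correlation values behind a heat map such as the one described above can be computed with the Pearson coefficient. The following is a minimal sketch using only the Python standard library; the column names, the synthetic values, and the rule used to generate the class label are assumptions made for illustration and are not the study's measurements.

```python
import math
import random

# Hypothetical column names for the five sensor inputs.
FEATURES = ["soil_humidity", "air_temperature", "air_humidity", "co2", "lux"]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Synthetic stand-in data (NOT the study's data): 300 rows, values in [0, 1].
rng = random.Random(0)
columns = {name: [rng.uniform(0.0, 1.0) for _ in range(300)] for name in FEATURES}
# Assumed label rule, used only so that 'class' correlates with one input.
columns["class"] = [min(3, int(v * 4)) for v in columns["soil_humidity"]]

corr = correlation_matrix(columns)
for name in FEATURES:
    print(f"{name:16s} vs class: {corr[(name, 'class')]:+.2f}")
```

Each entry of `corr` corresponds to one cell of the heat map; plotting is left out so that the sketch has no third-party dependencies.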

Bayesian Optimization Integrated with a Deep Neural Network Algorithm
Algorithms of two different Bayesian optimization schemes, namely, Gaussian process regression (GPR) and Gradient boosting regression trees (GBRT), integrated with the deep neural network are illustrated in Figure 8.
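In general terms, and as a sketch rather than the exact formulation used in this study, hyper-parameter tuning by Bayesian optimization can be written as follows. The symbols are introduced here for illustration only: $\theta$ is a hyper-parameter configuration, $\Theta$ the search space, $L(\theta)$ the validation loss of the DNN trained with $\theta$, $\mathcal{D}_t$ the set of configurations evaluated after $t$ calls, and $\alpha$ an acquisition function computed from a surrogate model (GPR or GBRT) fitted to $\mathcal{D}_t$.

```latex
\begin{align}
  \theta^{*} &= \arg\min_{\theta \in \Theta} L(\theta), \\
  \theta_{t+1} &= \arg\max_{\theta \in \Theta} \alpha\left(\theta \mid \mathcal{D}_t\right), \\
  \mathcal{D}_{t+1} &= \mathcal{D}_t \cup \left\{ \left(\theta_{t+1},\, L(\theta_{t+1})\right) \right\}.
\end{align}
```

The two schemes differ only in the surrogate that is refitted to $\mathcal{D}_t$ at each call; the outer loop is the same.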


Results and Discussion
In this section, the range of the considered hyper-parameters is first provided, followed by the tuning processes of the two Bayesian optimization schemes (GP and GBRT). The way in which maximum convergence was achieved is then explained, and the optimal combination of the hyper-parameters is chosen. The chosen hyper-parameters are employed to develop a deep neural network model, which is then used to classify the greenhouse environments for rose yields. The classification accuracy of the developed model is presented in terms of a confusion matrix and an accuracy table. Finally, the individual impact of each input variable on the model's classification accuracy is evaluated in the sensitivity analysis section. More discussions are presented in the subsequent sections.

Optimization of the Hyper-Parameters
The considered hyper-parameters were tuned by using two different Bayesian optimization schemes (GP and GBRT). The selected hyper-parameters include the learning rate, Adam decay, input nodes, dense layers, dense nodes, batch size, and activation function. The range of all the investigated hyper-parameters for the tuning process is given in Table 2. The hyper-parameter tuning processes of the GBRT and GPR algorithms over these ranges are depicted in Figures 9 and 10, respectively. In these figures, the blue and orange regions represent strong and weak dependence on the variable, respectively, while the asterisk marks the optimal point. For further analysis, the GPR algorithm was considered. Detailed information on the finally selected architecture of the optimal model (GPR) is tabulated in Table 3. From Figure 10, it can be observed that the 'tanh' activation function provided optimal results compared to Softmax, sigmoid, and ReLU. A comparison of the suitability of these activation functions is provided in Figure 11.
Sustainability 2021, 13, x FOR PEER REVIEW 12 of 18
Convergence plots for both optimization schemes (GP and GBRT) provide a clear picture of the way the error was minimized. Initial convergence was reached very quickly. The convergence rate is affected by the number of input parameters and the amount of training data, and in this study the model was evaluated on 300 experimental data points with five input parameters. For instance, Figure 12 shows that the convergence error for both the GP and GBRT algorithms was minimized at the fourth call.
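A convergence plot of this kind typically shows the best objective value found so far after each call to the optimizer. The sketch below computes such a trace with the Python standard library; the per-call error values are hypothetical and were chosen only so that the minimum first appears at the fourth call, and treating the objective as a validation error is an assumption.

```python
from itertools import accumulate

def convergence_trace(objective_values):
    """Best (lowest) objective seen after each call, i.e. the running minimum."""
    return list(accumulate(objective_values, min))

def first_call_reaching_best(objective_values):
    """1-based index of the first call that attains the overall minimum."""
    best = min(objective_values)
    return objective_values.index(best) + 1

# Hypothetical per-call errors (NOT values from the study).
errors = [0.40, 0.31, 0.12, 0.02, 0.05, 0.09, 0.02, 0.04]

trace = convergence_trace(errors)
print(trace)
print("minimum first reached at call", first_call_reaching_best(errors))
```

Later calls that do not improve on the best value leave the trace flat, which is why such plots level off once the optimizer has found a good configuration.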

Training and Developing the Deep Neural Network
The experimental data were divided into training (80%) and testing (20%) datasets. The training and validation losses of the developed model are depicted in Figure 13. The maximum number of iterations was set to 80. Both losses had been minimized by the 36th iteration, so the training process was stopped at that point. This shows that the training process was computationally economical and quick. Figure 14 illustrates the classification performance of the developed model for each class of the environment. The developed model accurately classified 59 out of 60 test environments across the various classes, which shows how well the model performs for different greenhouse environments within the tested range. The classification accuracy of the selected surrogate model is presented in Table 4.
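The split and the stopping behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the training code of the study: the shuffling seed, the unstratified split, the patience-based stopping rule, and the function names are all hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def early_stop_epoch(val_losses, patience=5):
    """1-based epoch at which training stops, or the last epoch if it never does.

    Assumed rule: stop once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_losses)

rows = list(range(300))  # placeholder indices standing in for the 300 records
train, test = train_test_split(rows)
print(len(train), len(test))
```

With 300 records and a 20% test fraction, the split yields 240 training and 60 testing samples, which matches the 60 test environments reported for the confusion matrix.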
The performance of the developed model is reported in terms of precision, recall, and F1-score. Precision is the fraction of relevant instances among the retrieved instances, and recall is the fraction of relevant instances that were retrieved; both are therefore based on relevance. The values of precision and recall in Table 4 show that the proposed model had a high classification efficiency for the rose greenhouse environment. The F1-score in Table 4 likewise reflects the high precision and recall of the optimal surrogate model. In addition, the overall accuracy of the model along with the macro and weighted averages are reported. The final model could perform the classification task with an overall accuracy of 0.98.
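To make these metrics concrete, the sketch below builds a confusion matrix and derives per-class precision, recall, and F1-score together with the overall accuracy. The label vectors are hypothetical: they assume 15 test samples per class and a single misclassification, chosen only to reproduce the reported 59 correct predictions out of 60; the actual per-class counts are those in Figure 14 and Table 4.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_report(matrix):
    """Precision, recall and F1 for each class, plus the overall accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {}
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report, correct / total

# Hypothetical test labels (NOT the study's predictions): 60 samples, one error.
y_true = [0] * 15 + [1] * 15 + [2] * 15 + [3] * 15
y_pred = list(y_true)
y_pred[0] = 1  # a single class-0 sample misclassified as class 1

matrix = confusion_matrix(y_true, y_pred, 4)
report, accuracy = per_class_report(matrix)
print(f"accuracy = {accuracy:.2f}")
```

Here the accuracy is 59/60, which rounds to 0.98. The single error lowers the recall of class 0 and the precision of class 1 while leaving the other two classes untouched.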


Sensitivity Analysis
The individual impact of each input variable on the model's output (i.e., classification of the greenhouse rose yield environments) is portrayed using the SHAP library. More particularly, the ways in which the various input features such as soil humidity, temperature, air humidity, light intensity, and CO2 concentration affected the model's classification are shown in Figure 15. It can be clearly seen that the sensitivity to the different features was not the same for the various classes. However, some of the factors were influential for all the classes. Overall, the most influential factor was the soil humidity, followed by the temperature. Regarding class 1 (correct environment) specifically, the feature with the most impact was air humidity, followed by soil humidity, temperature, light intensity, and CO2 concentration.
A tabulated performance comparison of the various developed models is given in Table 5. In the original model, all five input variables were considered for classification, while in each of the remaining models a single variable was dropped and the other four variables were used to classify the rose yield environment. Dropping any single variable had no noticeable impact on the classification accuracy of the models. All of the developed models were able to perform the classification task with an overall accuracy of 0.98.
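The drop-one-variable comparison can be sketched as a simple ablation loop. The example below is only an illustration of the procedure: it uses a nearest-centroid classifier as a stand-in for the DNN and synthetic, well-separated clusters instead of the experimental data, so the accuracies it prints say nothing about the study's results. All names in it are hypothetical.

```python
import random

FEATURES = ["soil_humidity", "temperature", "air_humidity", "lux", "co2"]

def make_synthetic(n_per_class=75, seed=0):
    """Four Gaussian clusters in five dimensions; NOT the study's data."""
    rng = random.Random(seed)
    data = []
    for label in range(4):
        # Each class centre differs from every other centre in every feature.
        centre = [float((label + shift) % 4) for shift in range(5)]
        for _ in range(n_per_class):
            data.append(([rng.gauss(c, 0.2) for c in centre], label))
    rng.shuffle(data)
    return data

def fit_centroids(rows, keep):
    """Mean feature vector per class, using only the feature indices in keep."""
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(keep))
        for j, i in enumerate(keep):
            acc[j] += x[i]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x, keep):
    """Class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x[i] - c[j]) ** 2 for j, i in enumerate(keep))
    return min(centroids, key=lambda y: dist(centroids[y]))

def accuracy(train, test, keep):
    """Fit on train and score on test with the selected feature indices."""
    centroids = fit_centroids(train, keep)
    hits = sum(predict(centroids, x, keep) == y for x, y in test)
    return hits / len(test)

data = make_synthetic()
train, test = data[:240], data[240:]  # 80/20 split of 300 synthetic rows
all_idx = list(range(len(FEATURES)))

results = {"all features": accuracy(train, test, all_idx)}
for drop, name in enumerate(FEATURES):
    keep = [i for i in all_idx if i != drop]
    results[f"without {name}"] = accuracy(train, test, keep)

for model, acc in results.items():
    print(f"{model:24s} {acc:.2f}")
```

When the classes remain separable in the four retained features, as they do in this synthetic setup, removing any one input leaves the accuracy essentially unchanged, which is the pattern that Table 5 reports for the real data.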

Conclusions
In the current study, surrogate models were developed that can accurately define the most suitable environment in greenhouses for rose yield production. In this regard, Bayesian optimization (BO) techniques such as the Gaussian process (GP) and Gradient boosting (GBRT) were employed to provide optimal hyper-parameters to be integrated with deep neural networks (DNN).
- The optimal set of hyper-parameters includes the learning rate (0.000416), the number of hidden layers (10), the number of neurons in each hidden layer (265), the activation function (tanh), batch size (36), Adam decay (0.007963), and number of iterations (80).
- An optimal and highly accurate BO-DNN surrogate model (based on 300 experimental data points) was developed for a quick and reliable classification of the rose yield environment, incorporating the most influential variables into its architecture: soil humidity, air temperature and humidity, CO2 concentration, and light intensity (lux).
- The proposed surrogate models can accurately classify the rose yield environments into four classes: soil without water, correct environment, too hot, and very cold.
- The developed model can classify different rose yield environments with an overall accuracy of 0.98. The very high accuracy of the proposed surrogate models originates from the inclusion of the most influential parameters as the inputs of the model.
- This study provides an easy, quick, reliable, and intelligent method to identify and perform corrective measures to improve the quality of the roses. With the proposed method, greenhouse environments can be evaluated and selected for an efficient crop yield of roses as well as vegetables and fruits.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data are contained within the article.