1. Introduction
The global population’s persistent growth and the reduction in available arable land have led to the rapid expansion of agricultural greenhouses. The most critical factors that affect the growth conditions of greenhouse plants include indoor air temperature, humidity, soil temperature, light intensity, and carbon dioxide concentration [1]. However, predicting the internal conditions of a greenhouse accurately can be challenging, as they are dependent on various external factors [2,3]. Under unusual circumstances, the natural environment may not be suitable for optimal crop growth as parameters like temperature, relative humidity, photosynthetically active radiation (PAR) level, carbon dioxide level, etc., affect plant development [3].
Greenhouses are artificially controlled enclosed spaces where the indoor climate is regulated by the structure and cover, with support from Heating, Ventilation, Air-Conditioning, and Dehumidification (HVACD) and lighting systems. The greenhouse cover is a crucial structural component that allows the useful light spectrum (between 400 and 700 nm) to pass through for photosynthetic activities. All greenhouses absorb solar energy, but solar greenhouses are designed to store some of the heat for use at night or on cloudy days in addition to absorbing solar energy during daylight hours [4].
The advancement of automation and artificial intelligence has led to a significant increase in the use of smart greenhouses. These greenhouses are equipped with tools and systems that aim to enhance the quantity and quality of the products while minimizing energy consumption [5]. The primary task of these devices is to use appropriate control algorithms to intelligently manage the indoor climatic conditions, including humidity, temperature, CO2, and lighting, with the aim of reducing and optimizing energy consumption [6,7].
Modern greenhouses measure, display, and control various parameters that affect the growth of greenhouse products, such as environmental temperature and humidity, light intensity and duration, carbon dioxide level, soil temperature, and other factors. These systems are based on complex control algorithms and installed with many sensors both inside and outside the greenhouse to stabilize the greenhouse conditions in an optimal state according to the momentary values of these parameters [8]. However, increasing the use of sensors can lead to higher initial costs for the greenhouse and ultimately to higher prices for the harvested products.
On the other hand, growers’ awareness of the upcoming conditions during the day can lead to quicker reactions and better management of energy resources in the greenhouse [9]. Therefore, many studies have been conducted since the early 20th century to model the greenhouse energy loads [10,11], as well as indoor parameters such as temperature [12], humidity [13], light intensity [14], CO2 [15], etc. The basis for all these research studies is the initial modeling of the greenhouse conditions based on external variables such as temperature, humidity, wind speed, radiation level, etc. [16,17,18].
Agricultural systems like greenhouses are very complex and dynamic systems, which makes physics-based modeling difficult. While dynamic models have been increasingly used for predicting the internal conditions of agricultural greenhouses, they come with certain disadvantages. One of the main drawbacks of using dynamic models is that they require a significant amount of input data, which can be challenging to obtain in real-world scenarios. This is especially true for systems that involve complex, nonlinear interactions between different variables, such as temperature, humidity, light intensity, and air flow [6]. Another limitation of dynamic models is that they are highly dependent on the accuracy of the input data. Any errors or uncertainties in the input data can significantly impact the accuracy of the model’s predictions. Additionally, dynamic models require a high level of expertise in both modeling and agricultural sciences to develop and apply effectively [19]. Artificial Intelligence (AI) has become increasingly popular in agricultural studies due to its ability to model complex variables, which is essential in accurately predicting greenhouse climatic parameters and loads. The accurate prediction of microclimate parameters like temperature, humidity, and light intensity plays a crucial role in optimizing crop yield and quality while minimizing energy consumption and environmental impact [3,13,19].
Machine learning (ML) techniques have gained popularity in predicting greenhouse microclimate variables due to their ability to handle high-dimensional and noisy data, learn from historical data, and adapt to changing conditions, making them suitable for dynamic greenhouse environments [20,21]. Among the most common ML approaches used in greenhouse microclimate prediction are Artificial Neural Networks (ANNs) and Support Vector Regression (SVR). ANNs are inspired by the structure and function of the human brain and consist of multiple layers of interconnected nodes that can recognize patterns in the data. ANNs have been successfully applied to predict various greenhouse microclimate parameters, such as air temperature, relative humidity, and PAR level [22,23]. SVR is a type of supervised learning algorithm that can handle both linear and nonlinear data, and has been used to predict greenhouse parameters such as air temperature, relative humidity, and soil moisture content [24,25,26].
Other ML techniques such as Decision Trees (DT), Random Forests (RF), and Gaussian Process Regression (GPR) have also been applied to greenhouse microclimate prediction with promising results [27]. DTs can handle both categorical and continuous data and can be used to predict discrete outputs such as crop yield classes or continuous outputs such as temperature and humidity. RFs are ensembles of decision trees that can improve prediction accuracy by combining the outputs of multiple trees. GPR is a probabilistic model that returns an uncertainty estimate together with each prediction [28].
In conclusion, AI and ML techniques have become essential in accurately predicting greenhouse microclimate parameters. Accurate predictions can help optimize crop yield and quality while reducing energy consumption and environmental impact.
Figure 1 describes the modeling approaches used in greenhouse control and management. Further research can explore ways to optimize these models for reducing initial costs and energy consumption while minimizing the environmental impact of greenhouse production.
Although ML methods have shown promising results in predicting greenhouse microclimate parameters, there are still some challenges that need to be addressed. One of the main challenges is the availability of high-quality data and the robustness of the models. ML models require a large amount of high-quality data to train and validate the models, but collecting and preprocessing data from greenhouse environments is often time-consuming and challenging. Another challenge is the interpretability of the ML models, which are often considered black boxes, making it difficult to understand how the models make predictions. Therefore, developing interpretable ML models that can provide insights into the underlying relationships between microclimate parameters and crop growth is essential [23,29].
To ensure reliable and accurate predictions in smart greenhouses, ANN models must be optimized for robustness across a range of environmental conditions and input variables. Recent studies have reviewed the use of AI for predicting various environmental and other variables in greenhouses, as summarized in Table 1.
Scope, Innovations and Structure
Accurately predicting greenhouse indoor climates is crucial for optimizing crop yield and quality while minimizing energy consumption. The literature reviewed in this study emphasizes the importance of investigating accurate methods for predicting the indoor climate of greenhouses. ANN models offer a more data-driven approach that can capture the non-linear relationships between input variables, such as light, humidity, and temperature. To address this issue, this research aims to investigate the potential of several AI-based models, including Artificial Neural Network with Radial Basis Function (ANN-RBF), Support Vector Machine (SVM), and Gaussian Process Regression (GPR) to estimate the indoor air temperature of an even-span polycarbonate greenhouse. The methodology employed in this study is outlined in Section 2 of the paper, which includes the study area, data collection process, and the AI methods used to predict the indoor climate of the experimental greenhouse. Section 3 reports the scientific findings of the study. The results of the RBF, SVM, and GPR model analyses are presented and compared with other similar studies. The discussion section of the paper presents suggestions for using this method in future greenhouse applications, including developing a smart control system for greenhouses. This would enable the real-time monitoring and control of the indoor climate, leading to more efficient energy use and increased crop yield. In the final part of the paper, conclusions and recommendations are presented based on the results of the study. The ultimate goal of this research and its future development is to enable smart control systems of greenhouses, leading to long-term reduction in energy losses.
3. Results and Discussion
This study aimed to predict the indoor air temperature of an even-span polycarbonate greenhouse using three different machine learning models, namely RBF, SVM, and GPR. The dataset comprised observations from an even-span polycarbonate greenhouse, with four factors used as inputs: Outside Solar Radiation (Iout) (Wm−2), Outside Air Temperature (Tout) (°C), Outside Air Humidity (Rhout) (%), and Outside Wind Speed (Wout).
3.1. Climate Condition of the Study Area
Figure 5 shows the climate variations in the studied area, which can impact crop growth and productivity. The summer months in this region are characterized by high temperatures that can exceed 50 °C, which can limit the growth of some crops when grown outside. While greenhouse cultivation can provide optimal conditions for plant growth, it also requires energy-intensive cooling systems to maintain suitable temperatures for most of the time in the study area. Additionally, the high levels of solar radiation in the region can be beneficial for plant growth; however, they can also lead to heat stress and damage to crops if not adequately managed. The low wind speeds in this region can be advantageous for greenhouse cultivation, as they create a stable environment for plant growth and reduce the risk of physical damage to the greenhouse structure. However, high humidity levels during the summer and winter months can increase the spread of plant diseases, which can be a significant challenge for greenhouse cultivation.
3.2. Selection of the Best-Performing Model
This section presents the performance of several models in predicting the indoor air temperature of the greenhouse, with the best-performing model selected. Table 2 shows the statistical metrics employed to assess the accuracy of the models in the training, test, and overall phases, which include MAPE, RMSE, TSSE, and EF. The results indicate that the RBF model outperformed the other models in predicting the greenhouse indoor air temperature, achieving a MAPE index ranging from 1.19 to 1.30%. The RBF model showed lower MAPE values in the training and test phases than the GPR and SVM models. On the other hand, the GPR model also demonstrated good accuracy for predicting the indoor air temperature, while the SVM model did not perform well in this study. Based on the results, the RBF model was selected for further analysis and development in the remaining parts of the study. It should be noted that the use of this model is crucial for accurately predicting the greenhouse indoor air temperature and ensuring the optimal growth of crops.
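The four metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes that TSSE denotes the total sum of squared errors and that EF denotes model efficiency in the Nash–Sutcliffe form (one minus the ratio of TSSE to the total sum of squares of the observations). The function names and the four sample temperatures are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def tsse(actual, predicted):
    """Total sum of squared errors (assumed meaning of TSSE)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def ef(actual, predicted):
    """Model efficiency (assumed Nash-Sutcliffe form): 1 is a perfect fit."""
    mean_actual = sum(actual) / len(actual)
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - tsse(actual, predicted) / total_ss

# Hypothetical indoor temperatures in degrees Celsius, for illustration only.
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 30.0, 36.0]

print(mape(actual, predicted), rmse(actual, predicted),
      tsse(actual, predicted), ef(actual, predicted))
```

For these four sample points the MAPE is about 2.96%, the RMSE about 0.87 °C, the TSSE 3.0, and the EF 0.976. Lower MAPE, RMSE, and TSSE and an EF closer to one indicate a better fit.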
Ali and Hassanein [40] developed a Recurrent Neural Network (RNN) model with long short-term memory to predict environmental parameters in greenhouses, specifically for tomato production. The model exhibited high accuracy and demonstrated its ability to predict future temperatures, achieving an RMSE value of 0.7. Similarly, Petrakis and Kavga [13] implemented neural network models to forecast microclimates in greenhouses located in Greece. The results indicated maximum errors of 0.88 K and 2.84% for modeled temperature and relative humidity, respectively, while the coefficients of determination were both 0.99 for these parameters.
3.3. Input Parameters Optimization
In this step, a sensitivity analysis was performed to develop and improve the selected RBF model by considering the effects of the outside environment’s temperature, humidity, radiation, and wind speed on the input data. The sensitivity analysis can provide valuable insights into the behavior of the system and help identify the critical variables that affect the model’s accuracy. By incorporating the effects of these variables into the model, the accuracy of the predictions can be enhanced, leading to improved performance and efficiency of predictions. The variables were evaluated individually and then as a group to determine their impact on the accuracy of the RBF model. Table 3 presents the results of the sensitivity analysis.
The findings reveal that including all input variables as datasets for RBF model training leads to greater accuracy. This finding suggests that all the input variables play a critical role in predicting the greenhouse indoor air temperature and that the RBF model’s performance can be improved by considering all the variables simultaneously.
According to the sensitivity analysis results presented in Table 3, all four input variables, namely outside temperature, humidity, solar radiation, and wind speed, were selected as the input dataset for training the RBF model in the subsequent steps of the study. Considering all four variables simultaneously is expected to give more accurate forecasts of the indoor air temperature of the greenhouse than any reduced set of inputs.
3.4. Optimization of Dataset Sizes
The size of the dataset used to train the RBF network can have a significant impact on the accuracy and performance of the model. Generally, larger datasets lead to more accurate predictions and can help avoid overfitting, which occurs when the model memorizes the training data and performs poorly on new data. However, using a large dataset can also increase the computational complexity and training time of the model, which can be a limiting factor in some applications. In contrast, using a small dataset can result in under-fitting, where the model fails to capture the complex relationships between the input and output variables.
To determine the optimal dataset size for training the RBF network, a sensitivity analysis can be performed by training the model with different dataset sizes and evaluating its performance using various statistical metrics. This approach can help identify the minimum dataset size required for accurate predictions, while also avoiding overfitting and excessive computational complexity.
This section evaluates the impact of dataset size on the accuracy of the RBF model by varying the size of the dataset and analyzing the resulting changes in the model’s accuracy (Table 4). The results indicate that the model achieves the highest accuracy in predicting the indoor air temperature of the greenhouse when trained with 80% of the total dataset. The MAPE index for predicting the indoor air temperature over all phases was 1.33%, demonstrating that the RBF model can accurately predict the output. However, a commonly used default of 60% of the total data for network training does not always give the best results, as observed in this study. To ensure the highest accuracy of the RBF model, the dataset size for training the model was fixed at 80% in the subsequent analyses. This study emphasizes the importance of selecting the optimal dataset size to achieve the best results and avoid overfitting or under-fitting.
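A dataset-size sweep of this kind can be organized as below. This is a sketch under assumptions: the text does not say how samples were assigned to the training and test sets, so a seeded random shuffle is assumed, and `split_indices`, `sweep_train_fractions`, and the `evaluate` callback are hypothetical names. In practice `evaluate` would train the model on the training indices and return an error metric on the test indices.

```python
import random

def split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices (seeded) and split them into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

def sweep_train_fractions(n_samples, fractions, evaluate):
    """Call evaluate(train_idx, test_idx) for each training fraction and collect results."""
    scores = {}
    for fraction in fractions:
        train_idx, test_idx = split_indices(n_samples, fraction)
        scores[fraction] = evaluate(train_idx, test_idx)
    return scores

# Placeholder evaluation that only reports the split sizes.
sizes = sweep_train_fractions(
    1000, [0.6, 0.7, 0.8, 0.9], lambda train, test: (len(train), len(test))
)
print(sizes)
```

Reusing the same seed for every fraction keeps the shuffled order fixed, so each larger training set contains the smaller ones and differences in the scores reflect dataset size rather than a different random draw.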
3.5. Selection of Best Training Algorithm for RBF Model
The selection of the best training algorithm for the RBF neural network model can have a significant impact on the accuracy and performance. Different training algorithms can vary in their convergence speed, computational complexity, and ability to avoid overfitting. One commonly used training algorithm for RBF neural networks is the backpropagation algorithm, which involves iteratively adjusting the weights and biases of the network to minimize the error between the predicted and actual outputs. While the backpropagation algorithm can be effective in training RBF models, it may also suffer from slow convergence and the risk of getting trapped in local minima. Other training algorithms, such as the Levenberg–Marquardt algorithm, can offer faster convergence and better generalization performance by adjusting the learning rate based on the curvature of the error surface. The Quasi–Newton algorithm can also be effective in training RBF models by approximating the second derivative of the error function and adjusting the weights and biases accordingly.
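For reference, the Levenberg–Marquardt weight update is usually written as follows. The notation is assumed here rather than taken from the paper: $\mathbf{w}$ is the vector of network weights and biases, $\mathbf{e}$ the vector of residuals over the training samples, $\mathbf{J} = \partial \mathbf{e} / \partial \mathbf{w}$ its Jacobian, and $\mu$ the damping parameter.

```latex
\Delta \mathbf{w} = -\left( \mathbf{J}^{\top} \mathbf{J} + \mu \mathbf{I} \right)^{-1} \mathbf{J}^{\top} \mathbf{e}
```

When $\mu$ is small the step approaches the Gauss–Newton step, which uses the curvature approximation $\mathbf{J}^{\top}\mathbf{J}$. When $\mu$ is large it approaches a short gradient-descent step. The damping parameter is typically decreased after a successful step and increased after a step that raises the error.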
In this study, the performance of 13 different training algorithms for the RBF neural network model was evaluated and compared (Table 5). The results indicate that the Levenberg–Marquardt algorithm (trainlm) achieved the lowest MAPE, RMSE, and TSSE, as well as the highest EF at the total phase, indicating superior accuracy and performance compared to the other algorithms evaluated. The Levenberg–Marquardt algorithm is a popular training algorithm for RBF neural networks due to its ability to converge more quickly than other algorithms such as the backpropagation algorithm, while also being less prone to overfitting than more complex algorithms like the Bayesian regularization algorithm. The use of the Levenberg–Marquardt algorithm in the next analysis is expected to lead to the improved accuracy and performance of the RBF neural network model for predicting greenhouse indoor air temperature, particularly when training on large datasets. Castañeda-Miranda and Castaño [41] utilized a Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) to predict greenhouse air temperature. The ANN was trained with a Levenberg–Marquardt backpropagation algorithm, with input parameters consisting of outside air temperature and relative humidity, global solar radiation, wind speed, and indoor relative humidity. The study reported a temperature forecast with 95% confidence, achieving a coefficient of determination of 0.96 in winter and 0.95 in summer. Yue et al. [42] proposed an improved Levenberg–Marquardt Radial Basis Function Neural Network (LM-RBF) model to forecast greenhouse air temperature and humidity. Their model achieved a maximum relative error of less than 0.5%.
3.6. Optimization of Hidden Layer Neurons
The hidden layer of the RBF neural network model plays a critical role in transforming the input data into a new space that is more suitable for linear separation. Unlike linear models, the RBF model can handle nonlinear patterns in the input data by transforming them into a higher-dimensional space through the hidden layer. The number of neurons in the hidden layer is an important parameter that affects the model’s ability to capture complex relationships between the input and output variables. Cover’s theorem on the separability of patterns suggests that nonlinear patterns in the input data can be transformed into a higher-dimensional space where they are more likely to be linearly separable. So, the number of neurons in the hidden layer should be greater than the number of input neurons to increase the dimensionality of the transformed space and improve the model’s ability to capture nonlinear relationships. Also, the optimal number of neurons in the hidden layer depends on the complexity of the input and output data, as well as the degree of nonlinearity in the relationships between them. A small number of neurons in the hidden layer can lead to under-fitting, where the RBF model is too simple and unable to capture the complex relationships between the input and output variables. On the other hand, a large number of neurons in the hidden layer can result in overfitting, where the RBF model memorizes the training data and performs poorly on new data. To determine the optimal number of neurons in the hidden layer, a sensitivity analysis can be performed by training the RBF model with different numbers of neurons and evaluating its performance using various statistical metrics. This approach can help identify the number of neurons that gives the best balance between overfitting and under-fitting and therefore the best accuracy and efficiency.
This study examines the impact of the number of neurons in the hidden layer on the accuracy and performance of the RBF neural network model in predicting the indoor air temperature of the greenhouse. The number of neurons in the hidden layer was varied from 3 to 35, and the model’s performance was evaluated based on the lowest error and highest accuracy (Figure 6). The findings demonstrate that the optimal number of neurons in the hidden layer was 33. With the number of neurons in the hidden layer fixed at this value, the MAPE decreased to 1.3%, indicating an improvement in the model’s accuracy and performance. Francik and Kurpaska [34] developed a three-layer Perceptron neural network with 10 neurons in the hidden layer, utilizing temperature, wind speed, solar radiation, and forecast time as input parameters to forecast temperature changes in a heated foil tunnel. The study achieved the lowest RMSE value (3.7 °C) for the testing dataset.
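A sweep over the hidden-layer size can be sketched as follows. This is a simplified illustration and not the training procedure used in the study: the data are synthetic, the hidden neurons use Gaussian basis functions whose centers are drawn at random from the training samples, and the output weights are obtained by linear least squares rather than by an iterative algorithm such as Levenberg–Marquardt. All function names are hypothetical.

```python
import numpy as np

def rbf_design(X, centers, spread):
    """Gaussian activation of every hidden neuron (columns) for every sample (rows)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * spread ** 2))

def fit_rbf(X, y, n_hidden, spread, rng):
    """Pick centers from the training samples and solve the output layer by least squares."""
    centers = X[rng.choice(len(X), size=n_hidden, replace=False)]
    H = np.column_stack([rbf_design(X, centers, spread), np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, weights

def predict_rbf(X, centers, weights, spread):
    """Evaluate the fitted network: hidden activations times output weights plus bias."""
    H = np.column_stack([rbf_design(X, centers, spread), np.ones(len(X))])
    return H @ weights

rng = np.random.default_rng(1)
# Synthetic, scaled stand-ins for four outdoor inputs and a nonlinear target.
X = rng.uniform(0.0, 1.0, size=(500, 4))
y = (25 + 10 * np.sin(3 * X[:, 0]) + 5 * X[:, 1] ** 2
     - 3 * X[:, 2] * X[:, 3] + rng.normal(0, 0.2, 500))
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

test_rmse = {}
for n_hidden in range(3, 36, 4):  # 3, 7, ..., 35 hidden neurons
    centers, weights = fit_rbf(X_train, y_train, n_hidden, spread=0.5, rng=rng)
    pred = predict_rbf(X_test, centers, weights, spread=0.5)
    test_rmse[n_hidden] = float(np.sqrt(np.mean((pred - y_test) ** 2)))

best_n = min(test_rmse, key=test_rmse.get)
print(test_rmse, best_n)
```

Selecting the size with the lowest error on held-out data, rather than on the training data, is what guards against choosing a network that has merely memorized the training set.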
3.7. Effect of Spread Factor on the Efficiency of RBF Model
The spread factor is a crucial parameter in the RBF model that can significantly impact the efficiency and accuracy of the model. The spread factor determines the width of the RBF kernel function, which affects the degree of overlap between the RBF functions and the spatial distribution of the input data.
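One common way to write the role of the spread factor is the Gaussian basis function below. The notation is an assumption made for illustration: $\mathbf{c}_j$ is the center of hidden neuron $j$, $\sigma$ the spread, $w_j$ the output weights, and $b$ the output bias. Software packages may scale the spread differently.

```latex
\varphi_j(\mathbf{x}) = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{c}_j \rVert^{2}}{2\sigma^{2}} \right),
\qquad
\hat{y}(\mathbf{x}) = \sum_{j=1}^{N} w_j \, \varphi_j(\mathbf{x}) + b
```

A small $\sigma$ makes each neuron respond only to inputs very close to its center, which favors fitting local detail. A large $\sigma$ makes neighboring neurons overlap strongly and produces a smoother response.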
To determine the optimal spread factor for the RBF model, a sensitivity analysis can be performed by training the model with different spread factors and evaluating its performance using various statistical metrics. This approach can help identify the spread factor that gives the best balance between overfitting and under-fitting and therefore the best accuracy and efficiency. In general, the optimal spread factor depends on the characteristics of the input data and the complexity of the relationships between the input and output variables. For complex and highly nonlinear systems, a smaller spread factor may be more appropriate, while simpler systems may require a larger spread factor. In this study, the spread factor was varied from 0.1 to 1, and for each value, the MAPE, RMSE, TSSE, and EF factors were computed (Figure 7). The results indicate that selecting a spread factor of 0.2 considerably increases the accuracy. This finding highlights the importance of selecting an optimal spread factor to achieve the best results in RBF modeling. The MAPE and RMSE at the training and test phases are 1.30% and 0.91 °C, respectively. Figure 8 shows the actual versus predicted data from the RBF model, together with the 45-degree line. It can be concluded that the RBF model predicts the indoor air temperature of the greenhouse with high accuracy and can be used for climate control in smart greenhouses.
In one study, a hybrid artificial neural network (ANN) was utilized to predict freshwater production in seawater greenhouses [43]. The study demonstrated that the ANN method is highly accurate, with negligible differences between actual and predicted data. In another study, machine learning algorithms were employed to predict indoor air temperature in Moroccan agriculture greenhouses [33]. The results showed that all predictive models performed well, with an R² value greater than 0.9.
Table 6 shows the statistical properties of the data utilized in the training, test, and overall stages of the selected RBF structure for predicting the indoor air temperature in the greenhouse. The outcomes demonstrate that the differences between the minimum, maximum, variance, and skewness of the actual and predicted data are negligible for practical purposes.
The best model results are obtained when the linear relationship between the actual and predicted values has the highest coefficient of determination, an intercept close to zero, and a slope close to one. In this study, the RBF model exhibited a strong correlation coefficient in the training and testing phases, with regression relationships having the smallest intercept and a slope close to one. Hence, this model is considered the best for prediction. To further evaluate the RBF model, various statistical tests were conducted in this study. The tests analyzed the mean, variance, and statistical distribution of the actual values and the values predicted by the RBF model in the training, testing, and overall stages. The null hypothesis for each test is the equality of the mean, variance, and statistical distribution of both data series.
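In notation assumed here for illustration, with subscript $a$ for the actual series and $p$ for the predicted series, $\mu$ the mean, $\sigma^{2}$ the variance, and $F$ the cumulative distribution function, the three null hypotheses can be written as:

```latex
\begin{aligned}
H_0^{(\text{mean})} &: \mu_a = \mu_p \\
H_0^{(\text{variance})} &: \sigma_a^{2} = \sigma_p^{2} \\
H_0^{(\text{distribution})} &: F_a(x) = F_p(x) \quad \text{for all } x
\end{aligned}
```

In each case the alternative hypothesis is that the two quantities differ.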
At a 5% significance level (95% confidence level), each hypothesis was tested using the p-value. If the calculated p-value for a stage exceeds 0.05, the null hypothesis cannot be rejected. To compare the mean, variance, and statistical distribution, t-tests, F-tests, and Kolmogorov–Smirnov tests were employed, respectively. Table 7 shows the p-values computed for all three stages (training, test, and overall).
The results demonstrate that the mean, variance, and statistical distribution values of the data obtained from the RBF model exhibit no significant difference from the actual values, indicating that this model can be utilized with high reliability.
4. Conclusions
Machine learning (ML) techniques have become increasingly important in modeling complex systems. ML enables more accurate and reliable predictions by leveraging large datasets and capturing complex relationships between input variables and output targets. In the context of predicting indoor air temperature, ML models can account for various factors, such as outdoor temperature, humidity, solar radiation, and occupant behavior, resulting in more comprehensive and holistic predictions. These predictions can be beneficial for optimizing energy consumption, improving indoor comfort and air quality, and reducing greenhouse gas emissions. Furthermore, ML models can adapt to changing conditions and learn from experience, making them ideal for predicting indoor air temperature in dynamic environments.
The primary objective of this study was to develop accurate ML models for predicting indoor air temperature in an even-span polycarbonate greenhouse using RBF, SVM, and GPR models. The results of the study are presented as follows:
The comparison of the three models revealed that the RBF model was the most effective in accurately predicting greenhouse temperature. The RBF model achieved the lowest RMSE values during the training and test phases, at 0.80 °C and 0.91 °C, respectively.
The evaluation of the RBF model’s performance showed that the dataset size, value of the spread factor, number of neurons in the hidden layer, and type of training algorithm significantly impacted the output.
Accurate temperature prediction is crucial for achieving the goal of smart greenhouse operation, and the high accuracy and reliability of the RBF model make it a valuable tool for optimizing greenhouse management, improving time management, and increasing crop yields. The performance results of this study indicate that integrating artificial neural network (ANN) models into the control system can assist farmers in building smart greenhouses.