1. Summary
Greenhouse agriculture is gaining greater relevance due to increasing food demands and the need for sustainability, and technology has emerged as a fundamental ally in this context [1]. The integration of automated systems in greenhouses represents an innovative solution that promises to improve crop efficiency through the precise management of environmental conditions [2,3].
Automation in greenhouses, supported by sensors and microcontrollers, enables continuous data collection that contributes to the efficient distribution of resources and improvements in crop yields [4,5]. Cultivating stringless blue lake beans in a microcontroller-controlled environment made it possible to study the interactions between automated greenhouses and short-cycle crops. The data were collected in two greenhouse environments managed with a microcontroller in Bogotá, Colombia, located at 2559 m above sea level.
The first environment was a greenhouse in which beans were cultivated, while the second was a greenhouse operating without any crop. Both environments were equipped with a set of sensors and actuators for their operation: sensors for internal relative humidity, internal temperature, ground humidity, light intensity, CO2 concentration, and luminosity, as well as actuator systems for ventilation, irrigation, and heating.
With these data, the aim is to contribute to developing intelligent crop management strategies, focusing on maximizing efficiency and productivity in automated greenhouses. By detailing the behavior of the environment in each scenario, this study contributes to the state of the art in precision agriculture, providing a solid foundation for future research and practical applications. This study is expected to offer a basis for implementing and adjusting technologies in precision agriculture, especially in urban contexts where space and resources could be limited.
2. Data Description
Two microcontroller-managed greenhouse environments are presented: one with the cultivation of stringless blue lake beans (greenhouse_dataset_cc.csv) with 99,957 records, and another operating without cultivation (greenhouse_dataset_sc.csv) with 118,233 records. The measurements from the system’s sensors and actuators were recorded every minute over a three-month period, from 23 May to 11 September 2021.
2.1. Geographical and Seasonal Environment
The greenhouses are located in Bogotá, Colombia, at 2559 m above sea level. Bogotá is situated in the Eastern Andean range of Colombia, in a flat savanna surrounded by mountains. It experiences a highland subtropical climate characterized by moderate temperatures throughout the year, ranging between 7 °C and 19 °C, with an approximate annual average temperature of 14 °C [6].
The precipitation pattern in Bogotá is influenced by the trade winds and the Intertropical Convergence Zone (ITCZ). The city experiences two rainy seasons (April–May and October–November) and two dry seasons (December–March and June–September). The average annual precipitation is around 800 mm, with irregular distribution throughout the year [7].
2.2. Plant Selection
The stringless blue lake bean variety was selected for cultivation in the greenhouse environment. This crop has a short growth cycle, so harvests are obtained within short periods of time, in this case within a maximum of three months. This allowed a shorter time window for data collection while still covering the behavior of the controlled environment throughout the entire life cycle of the crop.
In selecting this crop, the uncontrolled environmental characteristics of the geographical area were considered, such as the direct light requirement, which for this variety amounts to 6 to 8 h of direct sunlight per day. Additionally, the selection ensured scenarios in which the system actuators would be activated throughout the life cycle of the crop. Therefore, a variety was chosen that is not overly sensitive to climatic changes and is relatively easy to cultivate [8], prioritizing the characterization of the greenhouse environment.
2.3. Range of Values
created_at: Date of creation of the record.
hum: Internal relative humidity, range of 0 to 100%.
temp: Internal temperature, temperature range of −40 °C to 80 °C.
light_intensity: UV index of 0 to 11 for the UVA (315 nm to 400 nm) and UVB (280 nm to 315 nm) bands.
luminosity: Luminosity, range of 188 to 88,000 Lux.
ground_humidity_per: Ground humidity, range of 1 to 1023 V transformed from 0 to 100%.
co2_ppm: CO2 concentration, range from 0 to 10,000 ppm.
act_fan: Ventilation activation (0 off, 1 on).
act_solenoid_valve: Irrigation system activation (0 off, 1 on).
act_heating: Activation of the heating system (0 off, 1 on).
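The ranges listed above can serve as a first sanity check after loading either file. The following is a minimal sketch rather than the authors' procedure: the `RANGES` table and the `out_of_range_counts` helper are hypothetical names introduced here, the upper bound of 100 for `hum` is an assumption (relative humidity taken as a percentage), and the sample rows are synthetic.

```python
import pandas as pd

# Documented sensor ranges as (lower, upper). The upper bound for "hum" is an
# assumption: relative humidity is taken to be a percentage.
RANGES = {
    "hum": (0, 100),
    "temp": (-40, 80),
    "light_intensity": (0, 11),
    "luminosity": (188, 88_000),
    "ground_humidity_per": (0, 100),
    "co2_ppm": (0, 10_000),
}


def out_of_range_counts(df: pd.DataFrame) -> pd.Series:
    """Count non-null readings outside the documented range, per column."""
    counts = {}
    for col, (low, high) in RANGES.items():
        if col in df.columns:
            values = df[col].dropna()
            counts[col] = int(((values < low) | (values > high)).sum())
    return pd.Series(counts, dtype="int64")


# Synthetic rows standing in for a few records of either CSV file.
sample = pd.DataFrame({
    "hum": [55.0, 101.0, None],
    "temp": [18.2, 19.0, 95.0],
    "co2_ppm": [420.0, 500.0, 650.0],
})
print(out_of_range_counts(sample))
```

With the real files, the same helper could be applied to the result of `pd.read_csv("greenhouse_dataset_cc.csv")`; null readings are skipped rather than counted as violations.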
2.4. Descriptive Statistics
Descriptive statistics are presented for the two environments managed by microcontrollers: the count of values, the mean, the standard deviation, the minimum and maximum values, and the quartiles.
Table 1 shows the descriptive statistics for the environment with cultivation of stringless blue lake beans, for each of the nine variables analyzed. Similarly, Table 2 shows the descriptive statistics for the environment without cultivation.
When comparing the statistics of the data from both environments, it was found that:
The average values of humidity, temperature, light intensity, ground humidity, and CO2 concentration were higher in the dataset with cultivation.
Ground humidity was the sensed variable whose average differed most between the two environments.
Actuators were only activated in the cultivated environment, and even there they were off most of the time.
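As an illustration of how such a comparison could be reproduced, the sketch below uses `DataFrame.describe` and a difference of column means. The `mean_difference` helper and the two small frames are hypothetical stand-ins, not the published data.

```python
import pandas as pd

SENSOR_COLS = ["hum", "temp", "light_intensity", "luminosity",
               "ground_humidity_per", "co2_ppm"]


def mean_difference(cc: pd.DataFrame, sc: pd.DataFrame) -> pd.Series:
    """Mean with cultivation minus mean without, for shared sensor columns."""
    cols = [c for c in SENSOR_COLS if c in cc.columns and c in sc.columns]
    return cc[cols].mean() - sc[cols].mean()


# Synthetic stand-ins for the datasets with (cc) and without (sc) cultivation.
cc = pd.DataFrame({"hum": [80.0, 90.0], "ground_humidity_per": [50.0, 60.0]})
sc = pd.DataFrame({"hum": [70.0, 70.0], "ground_humidity_per": [5.0, 15.0]})

print(cc.describe())        # count, mean, std, min, quartiles, max
diff = mean_difference(cc, sc)
print(diff.abs().idxmax())  # variable with the largest gap between means
```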
2.5. Null Values
Table 3 describes the null values in the datasets, which were caused by network interference during data communication, communication channel congestion in the microcontroller, or power outages in the environments. In addition to the null values, Table 3 records the missing values for the time period, that is, minutes with no measurement in the environments with and without cultivation, counted after removing duplicate date records.
Null values also occurred for the actuators in the environment without cultivation: although those actuators remained in the off state, they still reported data and were therefore subject to the same communication interference and the other causes mentioned above.
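The distinction between null values (a record exists but a field is empty) and missing values (no record for that minute) can be made concrete with a small sketch. The `gap_report` helper is a hypothetical name, the one-minute grid is taken from the stated sampling interval, and the rows are synthetic.

```python
import pandas as pd


def gap_report(df: pd.DataFrame, freq: str = "min") -> dict:
    """Summarise nulls, duplicate timestamps and minutes with no record."""
    ts = pd.to_datetime(df["created_at"]).dt.floor(freq)
    unique_ts = ts.drop_duplicates()
    # Every minute that should exist between the first and last record.
    expected = pd.date_range(unique_ts.min(), unique_ts.max(), freq=freq)
    return {
        "nulls_per_column": df.drop(columns="created_at").isna().sum().to_dict(),
        "duplicate_timestamps": int(ts.duplicated().sum()),
        "missing_timestamps": len(expected) - len(unique_ts),
    }


# Synthetic records: one duplicated minute, one null, two absent minutes.
records = pd.DataFrame({
    "created_at": ["2021-05-23 00:00:00", "2021-05-23 00:01:00",
                   "2021-05-23 00:01:00", "2021-05-23 00:04:00"],
    "hum": [60.0, None, 61.0, 62.0],
})
report = gap_report(records)
print(report)
```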
2.6. Outlier Data
The description of outlier data for the two datasets, for the recorded climatic variables, is presented in Figure 1 for the dataset with cultivation and Figure 2 for the dataset without cultivation, with the following findings:
hum: The environment with cultivation presented more outlier data.
temp: The distribution in the environment without cultivation was more centralized.
light_intensity: In the environment without cultivation, the outlier data were much further from the median compared to the environment with cultivation.
luminosity: The distribution of outlier data was similar in both environments.
ground_humidity_per: The environment without cultivation presented a greater number of outlier values than the environment with cultivation.
co2_ppm: The environment without cultivation presented a greater number of outlier values, but they were closer to the median.
The outliers recorded were mostly within the possible range of values that occur in the geographical location where the greenhouses are located, so their appearance represented a distribution that was affected by environmental changes characteristic of the region.
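For readers who want to list the outlying records rather than only plot them, a minimal sketch follows. It assumes the conventional box plot rule with a whisker factor of 1.5 (the matplotlib default used by the pandas box plot); the `iqr_outliers` helper and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values lying more than k * IQR outside the quartiles."""
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Synthetic CO2 readings with one spike and one null value.
co2 = pd.Series([400, 410, 405, 415, 420, 2500, None], name="co2_ppm")
flagged = iqr_outliers(co2)
print(flagged)
```

Flagged values can then be compared against the physically plausible range for the location before deciding whether to keep them.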
2.7. Data Dispersion
Data dispersion over time is shown in Figure 3, Figure 4 and Figure 5, along with the state of the ventilation, irrigation, and heating actuators, respectively, for the environment with cultivation, providing a perspective on how the activation or deactivation of the actuators influenced the behavior of the system variables over time.
Figure 3, Figure 4 and Figure 5 are each subdivided into panels for humidity (a), temperature (b), light intensity (c), luminosity (d), ground humidity (e), and CO2 (f). These graphs provide a representation of the interaction between the actuators and the system variables over time, facilitating the identification of patterns, trends, and possible causal relationships in the collected data.
In Figure 6, dispersion diagrams are used to explore the relationship between the system variables over time. As observed and related in Table 3, there were time periods without measurements or with missing values. Notably, the time periods with data allowed for the characterization of the behavior of humidity in Figure 6a, temperature in Figure 6b, light intensity in Figure 6c, luminosity in Figure 6d, ground humidity in Figure 6e, and CO2 in Figure 6f.
In the datasets, a greater similarity is evident in the distributions of the temperature and luminosity variables, with differences in the dispersion of all variables and in the presence of limit values.
2.8. Normality
The results of applying the Shapiro–Wilk test to each of the fields in the datasets are shown in Table 4.
Taking a statistic close to 1 as the criterion, the results in Table 4 indicate that temperature and ground humidity in the cultivation environment, and relative humidity and ambient temperature in the non-cultivation environment, are the variables for which the null hypothesis that the sample comes from a normal distribution is least likely to be rejected.
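A sketch of a per-column Shapiro–Wilk test with `scipy.stats.shapiro` is given below. The SciPy documentation notes that for more than 5000 observations the W statistic is accurate but the p-value may not be, so this sketch draws a random subsample; that subsampling step, the `shapiro_by_column` helper, and the synthetic data are assumptions of this example, not part of the described analysis.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shapiro_by_column(df, columns, max_n=5000, seed=0):
    """Shapiro-Wilk W and p-value per column, nulls excluded."""
    rows = {}
    for col in columns:
        values = df[col].dropna()
        if len(values) > max_n:
            # Assumption of this sketch: subsample very long series.
            values = values.sample(n=max_n, random_state=seed)
        w, p = stats.shapiro(values.to_numpy())
        rows[col] = {"W": w, "p_value": p, "n": len(values)}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic data: one roughly normal column, one strongly skewed column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "temp": rng.normal(18, 3, 500),
    "co2_ppm": rng.exponential(400, 500),
})
result = shapiro_by_column(demo, ["temp", "co2_ppm"])
print(result)
```

A W value closer to 1 indicates a sample more compatible with normality, which is the criterion used when reading Table 4.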
2.9. Symmetry and Kurtosis
Skewness and kurtosis values were calculated for the variables in both the cultivation and non-cultivation environments, as shown in Table 5.
As shown in Table 5, the light intensity and CO2 concentration variables and the irrigation and heating actuators had high kurtosis values for the environment with cultivation, as did light intensity and CO2 concentration in the environment without cultivation. This indicates more extreme outlier values than in a normal distribution, with longer tails. On the other hand, the temperature and ground humidity variables had skewness values closer to zero for the environment with cultivation, as did the relative humidity and ambient temperature variables in the environment without cultivation, which corresponds to the findings of the normality test.
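The two shape measures can be computed with `scipy.stats.skew` and `scipy.stats.kurtosis`. In the sketch below the default Fisher definition of kurtosis is used, under which a normal distribution scores 0; whether Table 5 uses that definition is an assumption, and the `shape_stats` helper and the data are illustrative only.

```python
import pandas as pd
from scipy import stats


def shape_stats(df, columns):
    """Skewness and excess kurtosis per column, nulls excluded."""
    rows = {}
    for col in columns:
        values = df[col].dropna().to_numpy()
        rows[col] = {
            "skewness": stats.skew(values),
            "kurtosis": stats.kurtosis(values),  # Fisher: normal -> 0
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic columns: one symmetric, one with a long right tail.
demo = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "right_tailed": [1.0, 1.0, 1.0, 2.0, 20.0],
})
shape = shape_stats(demo, ["symmetric", "right_tailed"])
print(shape)
```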
A visualization of the skewness from the frequency plots for the environments is shown in Figure 7. The environment with cultivation is shown in blue, with humidity in Figure 7a, temperature in Figure 7c, light intensity in Figure 7e, luminosity in Figure 7g, ground humidity in Figure 7i, and CO2 in Figure 7k. The environment without cultivation is shown in purple, with humidity in Figure 7b, temperature in Figure 7d, light intensity in Figure 7f, luminosity in Figure 7h, ground humidity in Figure 7j, and CO2 in Figure 7l.
Figure 7 shows that the presence of cultivation significantly affected the distribution of ambient and ground humidity. The humidity data for the environment with cultivation are mostly distributed around values close to 80 or 100, unlike the environment without cultivation, where the values are concentrated around 70 in an approximately normal distribution. Similarly, the ground humidity in the environment with cultivation is distributed within a range of 10 to 90, while in the environment without cultivation it is concentrated in two distinct groups: one primarily between 0 and 10, and another between 55 and 65.
2.10. Correlation Matrix
The correlation of variables in the environment with cultivation is shown in Table 6, and that of the environment without cultivation in Table 7. The presence of cultivation increased the correlation values among most environmental variables, namely for 10 of the correlations. In the matrix without cultivation (Table 7), the correlations are generally weaker, indicating that cultivation affected the interactions among environmental variables.
By comparing Table 6 and Table 7 and subtracting the absolute values of their correlations, excluding the diagonals, we find that the environment without cultivation shows higher correlation values in only 5 of the 15 correlations. These cases involve the correlations of luminosity with humidity, temperature, and ground humidity; humidity with ground humidity; and CO2 concentration with light intensity.
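The comparison described above (difference of absolute Pearson correlations over the off-diagonal pairs) can be sketched as follows. With six sensed variables there are 6 × 5 / 2 = 15 distinct pairs, which matches the count in the text. The `stronger_without_cultivation` helper and the three-column frames are hypothetical.

```python
import numpy as np
import pandas as pd


def stronger_without_cultivation(cc, sc, columns):
    """Variable pairs whose |Pearson r| is larger without cultivation."""
    r_cc = cc[columns].corr(method="pearson").abs()
    r_sc = sc[columns].corr(method="pearson").abs()
    diff = r_sc - r_cc
    # Keep each pair once: upper triangle, diagonal excluded.
    upper = np.triu(np.ones(diff.shape, dtype=bool), k=1)
    pairs = diff.where(upper).stack()
    return pairs[pairs > 0]


# Synthetic stand-ins for the datasets with (cc) and without (sc) cultivation.
cc = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "c": [2, 1, 4, 3]})
sc = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 2, 4], "c": [1, 2, 3, 4]})
stronger = stronger_without_cultivation(cc, sc, ["a", "b", "c"])
print(stronger)
```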
2.11. Effect of Actuators
Table 8 shows how the ventilation, irrigation, and heating actuators affected the environmental variables in the environment with cultivation, reporting the mean and standard deviation of each variable for each combination of actuators in their off (0) or on (1) states. The combination of all actuators being on is not recorded in Table 8 because this only occurred in one record of the dataset.
Table 8 presents changes in the variables resulting from the activation of the actuators, highlighting some representative cases for each variable:
Humidity: A maximum average variation of 17.96% with the activation of ventilation and heating.
Temperature: A maximum average variation of 4.96 °C with the activation of the irrigation system and heating.
Light Intensity: A maximum average variation of 0.03 UV index units with the activation of the irrigation system and heating. It should be noted that this value is influenced by changes in natural light.
Luminosity: A maximum average variation of 447.3 lux with the activation of the irrigation system and heating. Similarly, this value is affected by changes in natural light.
Ground humidity: A minimum average variation of 0.69% and 0.59% with the activation of the ventilation–heating and ventilation–irrigation systems, respectively.
CO2 concentration: A maximum average variation of 1541.43 ppm with the activation of the irrigation and heating systems.
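The variations listed above are differences between group means, with the all-off combination as the reference. A minimal sketch of that calculation, assuming the column names from Section 2.3, is shown below; the `actuator_effect` helper and the four synthetic records are illustrative only.

```python
import pandas as pd

ACTUATORS = ["act_fan", "act_solenoid_valve", "act_heating"]


def actuator_effect(df, variables):
    """Mean of each variable per actuator combination, minus the all-off mean."""
    grouped = df.groupby(ACTUATORS)[variables].mean()
    baseline = grouped.loc[(0, 0, 0)]  # reference: every actuator off
    return grouped - baseline


# Synthetic records: two with everything off, two with fan and heating on.
records = pd.DataFrame({
    "act_fan": [0, 0, 1, 1],
    "act_solenoid_valve": [0, 0, 0, 0],
    "act_heating": [0, 0, 1, 1],
    "hum": [60.0, 62.0, 80.0, 78.0],
    "temp": [18.0, 18.0, 22.0, 24.0],
})
effect = actuator_effect(records, ["hum", "temp"])
print(effect)
```

Because natural light and time of day also change between groups, such differences describe association with actuator state rather than an isolated causal effect.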
3. Methods
The recording of values was conducted from 23 May to 11 September 2021, at one-minute intervals through data transmission over the Internet. Two identical greenhouse environments were used, each equipped with a set of sensors and actuators for operation. Among the sensors were an internal relative humidity sensor, an internal temperature sensor, a ground humidity sensor, a light intensity sensor, a CO2 concentration sensor, and a luminosity sensor. Additionally, the actuators included a ventilation system, a solenoid valve irrigation system, and a heating system.
These devices allowed monitoring and recording of key environmental variables such as relative humidity, temperature, ground humidity, light intensity, CO2 concentration, and luminosity. Additionally, the actuators provided the capability to control factors such as ventilation, irrigation, and heating within the greenhouses. The actuators were only activated in the environment in which stringless blue lake beans were cultivated; in the other environment they remained off.
Both environments communicated with the network through wireless connection. They were placed in parallel positions under equal lighting, ventilation, and natural heating conditions, reflecting the climatic conditions of Bogotá D.C. Similarly, both had fresh soil for cultivation at the start of data recording.
For the data description, the following activities were conducted:
Loading of the two datasets.
Descriptive statistics of each dataset.
Identification of null and missing values over the time period.
Identification of outliers using a boxplot, where quartile values were represented along with a line at the median, considering outliers as those values lying more than 1.5 × IQR below the first quartile Q1 or above the third quartile Q3, where IQR = Q3 − Q1 [9].
Data visualization with a dispersion diagram.
Shapiro–Wilk normality test, excluding null values [10].
Calculation of skewness using Pearson’s method, excluding null values [11].
Calculation of kurtosis using Fisher–Pearson coefficient, excluding null values [12].
Visualization of skewness from frequency plots.
Calculation of correlations among sensed variables using Pearson’s method [13].
Calculation of the effect of actuators on variables in the environment with cultivation, grouping the average statistic for each combination of actuators and using the combination with all actuators off as reference.
4. User Notes
Some considerations are presented about use of the dataset, taking advantage of its characteristics:
Time structure: This dataset presents a time series structure with a granularity of one minute. It is recommended to use time series analysis techniques to explore patterns, trends, and seasonalities.
Treatment of missing data: The presence of null values requires consideration when using the entire dataset. It is suggested to evaluate methods for the treatment of missing values or, if required, to use sections of the dataset in which null values are scarce. If a section of the dataset is used, it is recommended to either remove records with null values or replace them using interpolation based on adjacent values or nearest neighbor imputation algorithms, such as KNN. Regression algorithms are not recommended, as the variables may exhibit dynamic, non-linear, and distributed parameter characteristics. Additionally, the dataset allows the application and testing of models that can adapt to the presence of missing values; in this case, no imputation is required.
Outliers: To address outliers, it is recommended to apply an elimination method, retaining only those that fall within the range of possible values for the analyzed region of Bogotá, Colombia. Retention should be based on whether the values have either similar values or close neighbors in the scatter diagram.
Non-normal distributions: The results of the Shapiro–Wilk test indicated that several variables did not follow a normal distribution. This aspect should be considered when selecting appropriate statistical methods, possibly opting for non-parametric approaches when necessary.
Correlation analysis: The correlation matrices revealed complex interactions between variables, suggesting the need for modeling approaches that can capture these interdependencies.
Impact of actuators: An effect of the actuators on the environmental variables was evident. This aspect deserves detailed analysis, potentially through intervention models or interrupted time series analysis.
Geographic and climatic contextualization: The data were collected in specific geographical and climatic conditions of Bogotá, Colombia, that is, the Neotropical zone. It is recommended to consider these factors when interpreting the results or when comparing with studies in different contexts.
Comparison of environments: The differences observed between environments with and without cultivation offer opportunities for comparative analysis. It is suggested to explore statistical methods to quantify and characterize these differences. Some of the differences described in the text are evident in the descriptive statistics, such as the average ground humidity; the differentiation in the presence of outliers in the climatic variables; the distribution and normality values, which were influenced by the actuators in the environment with cultivation; and the stronger correlations observed in the environment with cultivation.
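As one possible starting point for the time structure and missing data notes above, the sketch below places the records on a regular one-minute grid and fills short gaps by time-based interpolation from adjacent values. The `fill_short_gaps` helper, the `max_gap` threshold of 5 minutes, and the sample rows are assumptions made for illustration; they are not recommendations from the dataset authors.

```python
import pandas as pd


def fill_short_gaps(df, columns, max_gap=5):
    """Regularise to one-minute steps and interpolate short interior gaps."""
    out = df.copy()
    out["created_at"] = pd.to_datetime(out["created_at"]).dt.floor("min")
    out = (out.drop_duplicates(subset="created_at")
              .set_index("created_at")
              .sort_index()
              .asfreq("min"))  # minutes with no record become rows of NaN
    # Fill at most max_gap consecutive NaNs per gap; a longer gap is only
    # partially filled, and leading or trailing NaNs are left untouched.
    out[columns] = out[columns].interpolate(
        method="time", limit=max_gap, limit_area="inside")
    return out


# Synthetic records with two absent minutes between 00:01 and 00:04.
records = pd.DataFrame({
    "created_at": ["2021-05-23 00:00:00", "2021-05-23 00:01:00",
                   "2021-05-23 00:04:00"],
    "temp": [10.0, 11.0, 14.0],
})
filled = fill_short_gaps(records, ["temp"])
print(filled)
```

Actuator columns are binary states, so they are better left out of `columns` and handled separately, for example by forward filling or by discarding the affected minutes.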
Author Contributions
Conceptualization, S.-C.V.-A. and J.B.-V.; methodology, S.-C.V.-A.; software, O.-M.G.-C.; validation, S.-C.V.-A., O.-M.G.-C., and A.R.-P.; formal analysis, D.-D.L.-L.; investigation, S.-C.V.-A. and D.-D.L.-L.; resources, J.B.-V.; data curation, A.R.-P.; writing—original draft preparation, O.-M.G.-C. and A.R.-P.; writing—review and editing, S.-C.V.-A.; visualization, D.-D.L.-L.; supervision, J.B.-V.; project administration, S.-C.V.-A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Vatistas, C.; Avgoustaki, D.D.; Bartzanas, T. A Systematic Literature Review on Controlled-Environment Agriculture: How Vertical Farms and Greenhouses Can Influence the Sustainability and Footprint of Urban Microclimate with Local Food Production. Atmosphere 2022, 13, 1258.
2. Maraveas, C.; Karavas, C.S.; Loukatos, D.; Bartzanas, T.; Arvanitis, K.G.; Symeonaki, E. Agricultural Greenhouses: Resource Management Technologies and Perspectives for Zero Greenhouse Gas Emissions. Agriculture 2023, 13, 1464.
3. Koukounaras, A. Advanced Greenhouse Horticulture: New Technologies and Cultivation Practices. Horticulturae 2021, 7, 1.
4. Zhao, X.; Han, Y.; Lewlomphaisarl, U.; Wang, H.; Hua, J.; Wang, X.; Kang, M. Parallel Control of Greenhouse Climate with a Transferable Prediction Model. IEEE J. Radio Freq. Identif. 2022, 6, 857–861.
5. Ullah, I.; Fayaz, M.; Aman, M.; Kim, D. Toward Autonomous Farming—A Novel Scheme Based on Learning to Prediction and Optimization for Smart Greenhouse Environment Control. IEEE Internet Things J. 2022, 9, 25300–25323.
6. Instituto de Hidrología, Meteorología y Estudios Ambientales. Características Climatológicas de Ciudades Principales y Municipios Turísticos. 2024. Available online: http://www.ideam.gov.co/documents/21021/418894/Características+de+Ciudades+Principales+y+Municipios+Turísticos.pdf/c3ca90c8-1072-434a-a235-91baee8c73fc (accessed on 21 June 2024).
7. Instituto de Hidrología, Meteorología y Estudios Ambientales; Fondo de previsión y Atención de Emergencias. Estudio de la Caracterización Climática de Bogotá y Cuenca Alta del río Tunjuelo. 2024. Available online: http://www.ideam.gov.co/documents/21021/21135/CARACTERIZACION+CLIMATICA+BOGOTA.pdf/d7e42ed8-a6ef-4a62-b38f-f36f58db29aa (accessed on 21 June 2024).
8. Ferry-Morse. Bean, Blue Lake Stringless Pole Organic Seeds. 2024. Available online: https://ferrymorse.com/products/bean-blue-lake-stringless-pole-organic-seeds (accessed on 16 August 2024).
9. NumFOCUS Inc. DataFrame Boxplot. 2024. Available online: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html (accessed on 21 June 2024).
10. The SciPy Community. SciPy Shapiro-Wilk. 2024. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html (accessed on 21 June 2024).
11. The SciPy Community. SciPy Skew. 2024. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html (accessed on 21 June 2024).
12. The SciPy Community. SciPy Kurtosis. 2024. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html (accessed on 21 June 2024).
13. NumFOCUS Inc. DataFrame Correlation. 2024. Available online: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html (accessed on 21 June 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).