Development of Growth Estimation Algorithms for Hydroponic Bell Peppers Using Recurrent Neural Networks

Abstract: As smart farms are adopted in agricultural fields, the use of big data is becoming increasingly important. To manage smart farms efficiently, the relationships between crop growth and environmental conditions need to be analyzed, and various artificial intelligence algorithms can serve as useful tools for quantifying these relationships. The objective of this study was to develop and validate an algorithm based on a recurrent neural network (RNN) that interprets the crop growth rate response to environmental factors, and to evaluate its accuracy against a process-based model (PBM). The algorithms were trained with data from three growth periods and were used to estimate the crop growth rate. The algorithm used eight environmental variables, days after transplanting, and two crop growth characteristics as input variables and produced weekly crop growth rates as output. The RNN-based crop growth rate estimation algorithm was validated using data collected from a commercial greenhouse, and the CropGro-bell pepper model was applied to compare and evaluate the accuracy of the developed algorithm. The training accuracies ranged from 0.75 to 0.81 across all growth periods. The validation results confirmed that the accuracy was reliable in the commercial greenhouse, and the accuracy of the developed algorithm was higher than that of the PBM. The developed algorithm can contribute to crop growth estimation with a limited amount of data.


Introduction
Smart farming applies new digital technologies such as remote sensing [1], cloud computing [2], and the Internet of Things (IoT) [3] to agricultural fields. These technologies generate large amounts of data at unprecedented rates [4,5], and with this growing volume of data, many studies have attempted to utilize big data [6]. To use big data from agricultural fields efficiently, it is necessary to quantitatively analyze the complex, diverse, and unpredictable relationships between crop growth and environmental factors [7]. Many studies have used process-based models (PBMs) to analyze these relationships [8]. A PBM consists of many modules that express various crop physiological processes (e.g., photosynthesis, respiration, biomass assimilation, biomass distribution, and stress response). Because a PBM aims to include all biochemical functions, various modules are involved in interlinked calculations even for a single variable, and many parameters must be calibrated [9,10]. In addition, accurate partitioning of biomass among crop organs is essential for accurate simulation, because a PBM estimates biomass production through its distribution to each organ [11,12]. For these reasons, PBMs are limited in their ability to directly utilize big data that are accumulated automatically.
An artificial neural network (ANN) provides a way of analyzing complex, non-linear, and multidimensional big datasets [13] and can abstract quantitative relationships from raw data [14]. ANNs have been widely used in agricultural studies to analyze the biochemical and physiological characteristics of various crops [15][16][17].
Among ANN algorithms, the recurrent neural network (RNN) is the most promising for analyzing chronological data and has shown better accuracy than earlier algorithms [18,19]. RNNs can take as input data spanning long time periods, and the length of the output sequence is theoretically unlimited [20]. With these advantages, RNNs have been adapted for agricultural purposes and have shown higher accuracy than other algorithms [21][22][23]. These previous studies demonstrated the adaptability of RNNs to environmental data, but the RNNs were not trained to relate the environment directly to crop growth. Because the crop growth response to the environment is determined by changes in environmental factors over time, an RNN should be well suited to estimating the crop growth response to cumulative environmental changes. This study aims to develop an RNN algorithm to assess crop growth in response to various environmental factors in hydroponically grown bell peppers. In addition, the developed algorithm was validated by comparing it with an existing PBM.

Crop Growth Conditions for Algorithm Training
Data for algorithm training were collected in a Venlo-type greenhouse on the experimental farm of Seoul National University, Suwon, Korea (latitude, 37.3° N; longitude, 127.0° E) from 1 February to 1 June 2018 (growth period 1) and from 1 December 2018 to 1 April 2019 (growth period 2), and in a Venlo-type greenhouse on the experimental farm of Nong Woo Bio, Ansung, Korea (latitude, 37.0° N; longitude, 127.0° E) from 1 September to 1 December 2018 (growth period 3). Data collection for validation is described in Section 2.4. The roof and sidewall vents were opened automatically when the daytime temperature exceeded 26 °C. Bell pepper (Capsicum annuum L. 'Sirocco') seedlings raised on rockwool cubes (Grodan delta, Grodan, Roermond, The Netherlands) in a seedling chamber were used 40 days after sowing. After two weeks of acclimatization to the irrigation system at an electrical conductivity (EC) of 2.0 dS/m, seedlings with 5-6 nodes were transplanted into 0.9 × 0.15 × 0.07 m (L × W × H) rockwool slabs (Grotop GT Master Dry, Grodan, Roermond, The Netherlands) placed on gutters at a plant density of 3.3 plants/m². The EC and pH of the nutrient solution were maintained at 2.6-3.0 dS/m and 5.5-6.5, respectively. The plants were pruned to two main stems, which were trellised vertically in a 'V' canopy system [24].

Data Collection and Preprocessing
Greenhouse environmental data, such as solar radiation, temperature, and relative humidity, were measured using a pyranometer (SP-110, Apogee Instruments, Logan, UT, USA), a temperature sensor (CS220, Campbell Scientific, Logan, UT, USA), and a relative humidity sensor (PCMini70, Gilwoo Trading, Seoul, Korea), respectively. The moisture content of the substrate was measured using a multiple frequency domain reflectometry (FDR) sensor (WT1000B, Mi-Rae Sensor, Seoul, Korea) placed in the middle of the substrate. Crop growth data, such as leaf area (LA) and fresh weight, were also collected. LA was calculated by substituting the weekly measured leaf length (L), leaf width (W), and node number (N) into Equation (1) [25].
The fresh weight measurement system (Figure 1) [26] was designed to support the whole crop cultivation system, and the weight of the whole system was measured using a tensile-type load cell. The weight of water in the substrate was then corrected using the FDR sensor readings to derive the crop fresh weight alone. The crop fresh weight was measured at dawn (03:00-05:00), when the relative water content of the crop was stable. With this method, daily changes in fresh weight were collected continuously. All environmental and growth data were normalized to the range 0-1. The total data size was 59,168.
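A minimal sketch of the fresh-weight derivation and 0-1 scaling described above. It assumes that the FDR sensor reports volumetric water content (m³/m³) uniformly across the slab and that the load cell reads the whole system weight; `crop_fresh_weight`, `dawn_mean`, `min_max`, and the `tare_kg` argument are hypothetical names, and the paper does not specify how the substrate water correction was actually computed.

```python
import numpy as np

WATER_DENSITY_KG_PER_M3 = 1000.0
SLAB_VOLUME_M3 = 0.9 * 0.15 * 0.07  # rockwool slab L x W x H from the text


def crop_fresh_weight(total_kg, vwc, tare_kg, n_slabs=1):
    """Subtract a fixed tare and the substrate water weight from the load-cell reading.

    Assumption: vwc is volumetric water content (m3/m3), uniform in the slab.
    tare_kg is a hypothetical dry weight of slabs, gutter, and hardware.
    """
    water_kg = np.asarray(vwc, float) * SLAB_VOLUME_M3 * n_slabs * WATER_DENSITY_KG_PER_M3
    return np.asarray(total_kg, float) - tare_kg - water_kg


def dawn_mean(hours, values, start=3.0, end=5.0):
    """Average the readings logged between 03:00 and 05:00 (assumed aggregation)."""
    hours = np.asarray(hours, float)
    mask = (hours >= start) & (hours < end)
    return float(np.mean(np.asarray(values, float)[mask]))


def min_max(x):
    """Scale a variable to the range 0-1; assumes the variable is not constant."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())


# Usage with made-up readings: five hourly samples from 02:00 to 06:00.
hours = [2, 3, 4, 5, 6]
fw = crop_fresh_weight([20.4, 20.0, 20.0, 20.2, 20.6], vwc=0.5, tare_kg=10.0)
print(dawn_mean(hours, fw))  # mean crop weight from the 03:00 and 04:00 samples
print(min_max([2.0, 4.0, 6.0]))
```

Averaging the dawn window, rather than taking a single reading, is a design choice made here to damp load-cell noise; the study only states that weight was measured at dawn.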

Recurrent Neural Network Application
An RNN was used to analyze the crop growth rate response to various environmental factors. Among RNN variants, long short-term memory (LSTM) mitigates the vanishing gradient problem of conventional RNNs [20] by means of a memory cell controlled by several gates. In this study, the input and output activation functions were set to hyperbolic tangent functions, and the gate activation functions were set to sigmoid functions.
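The LSTM cell described above can be sketched in NumPy as follows. This is a textbook LSTM step, not the TensorFlow implementation used in the study; the gate ordering, weight shapes, the nine-feature input, and the linear output layer in the usage example are assumptions for illustration. It shows where the sigmoid gate activations and the tanh input/output activations enter, and how the additive cell update carries information across time steps.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate blocks ordered i, f, g, o (assumed)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: sigmoid
    f = sigmoid(z[H:2 * H])          # forget gate: sigmoid
    g = np.tanh(z[2 * H:3 * H])      # cell input activation: tanh
    o = sigmoid(z[3 * H:4 * H])      # output gate: sigmoid
    c_t = f * c_prev + i * g         # additive memory update
    h_t = o * np.tanh(c_t)           # cell output activation: tanh
    return h_t, c_t


def lstm_forward(X, W, U, b):
    """Run the cell over a (T, D) sequence and return the last hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h


# Usage: 7 daily steps, 9 hypothetical input features, 64 hidden nodes as in the text.
rng = np.random.default_rng(0)
T, D, H = 7, 9, 64
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h_last = lstm_forward(rng.random((T, D)), W, U, b)
w_out = rng.normal(scale=0.1, size=H)  # hypothetical linear output layer
growth_rate_scaled = float(w_out @ h_last)
```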
Environmental factors, crop growth factors, and days after transplanting (DAT) were used as input data, and the crop growth rate was used as output data. The environmental factors consisted of air temperature, relative humidity, solar radiation intensity, CO2 concentration, and substrate moisture content. The crop growth factors consisted of leaf area index (LAI) and fresh weight. The crop growth rate was calculated as the weekly change in fresh weight. The time step of the LSTM was set to seven days. The Adam optimizer (AdamOptimizer in TensorFlow) was used for training [27], and the hyperparameters of the LSTM and the optimizer were tuned empirically for the regression problem (Table 1). Because the core of an LSTM is itself a neural network, the algorithm contains a hidden layer. As the LSTM unrolls over its time steps, deep stacking of layers was not considered necessary, and a single hidden layer with 64 nodes was used. Seventy percent of the total data were randomly selected for training, and the remaining data were used to test the accuracy of the training results. A 5-fold cross-validation was conducted so that all cultivation data were included [28]. The mean square error (MSE) was used as the cost function to be minimized, and the cost for the test data was also monitored during training. To avoid overfitting, the trained model with the lowest test MSE was selected as the best model. TensorFlow (v. 1.12.0, Google, Menlo Park, CA, USA) was used for computation and model construction.
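The data preparation and model selection described above can be sketched as follows. The sketch assumes daily input rows and a target equal to the calibrated fresh-weight change over the seven days that follow each seven-day input window, which is one reading of the timing discussed in the results (inputs from 9-16 January, estimates for 16-23 January). The function names are hypothetical, the split is a plain random 70/30 split, and `select_best` simply returns the checkpoint with the lowest test MSE; the exact fold construction used in the study is not reproduced.

```python
import numpy as np


def make_windows(features, fresh_weight, steps=7):
    """Stack 7-day input windows with the next-week fresh-weight gain as target.

    features: (N, D) daily inputs; fresh_weight: (N,) calibrated daily fresh weight.
    Returns X of shape (N - 2*steps + 1, steps, D) and y of matching length.
    """
    features = np.asarray(features, float)
    fw = np.asarray(fresh_weight, float)
    X, y = [], []
    for s in range(len(fw) - 2 * steps + 1):
        end = s + steps - 1                    # last day inside the input window
        X.append(features[s:s + steps])
        y.append(fw[end + steps] - fw[end])    # growth over the following week
    return np.stack(X), np.array(y)


def random_split(n, train_frac=0.7, seed=0):
    """Random 70/30 split of sample indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]


def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_pred, float) - np.asarray(y_true, float)) ** 2))


def select_best(test_mse_history):
    """Index of the epoch/checkpoint with the lowest test MSE."""
    return int(np.argmin(test_mse_history))


# Usage with synthetic data: 30 days, 1 feature, fresh weight rising 2 units per day.
days = np.arange(30)
X, y = make_windows(days[:, None], 2.0 * days)
train_idx, test_idx = random_split(len(y))
```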

Crop Growth Conditions for Validation
Validation was conducted in Venlo-type glasshouses on a commercial farm in Jinju, Korea (latitude, 35.1° N; longitude, 128.0° E) from 1 August 2017 to 1 June 2018. Environmental control and crop management in the greenhouse were similar to those used in the cultivation for algorithm development. The fresh weight of each organ was measured manually every week. Equation (2) was used to compensate for the fresh fruit weight (FFT) of fruits dropped during each week.
where LF and WF are the fruit length and fruit diameter, respectively.

Evaluation of the Growth Rate Algorithm
A PBM (CropGro-Bell Pepper) was used to evaluate the accuracy of the algorithm. CropGro is a PBM platform that has been applied to various crops, such as soybean, peanut, dry bean, faba bean, mucuna, chickpea, cowpea, velvet bean, cotton, pasture, and tomato [29,30]. CropGro-Bell Pepper is a variant of the soybean model that reflects the genotype and ecotype of bell pepper. The model parameters (Tables 2 and 3) were derived from the growth survey and calibrated with the GLUE coefficient estimator [31]. To validate the RNN algorithm, the weekly crop growth rate was estimated from the environmental and crop growth factors collected during the growth period, and the crop fresh weight was calculated by integrating the estimated growth rate. The Decision Support System for Agrotechnology Transfer (DSSAT) v4.7 [32,33] was used to simulate the bell pepper growth rate and to validate the PBM. Because the PBM calculates crop growth on a dry-matter basis, the dry weight of each organ (leaf, stem, root, and fruit) was converted to fresh weight using the dry-fresh weight ratio at each growth stage. Accuracy was evaluated by comparing the crop fresh weights estimated by the RNN algorithm and the PBM with the measured fresh weight.
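Integrating the estimated weekly growth rates into a fresh-weight trajectory can be sketched as below. Because the inputs and outputs were normalized to 0-1, the sketch assumes that the network output must first be mapped back to physical units using the minimum and maximum of the training targets. `denormalize` and `integrate_growth` are hypothetical helpers, not functions from DSSAT or the study, and the numbers in the usage example are made up.

```python
import numpy as np


def denormalize(y_scaled, y_min, y_max):
    """Invert 0-1 min-max scaling using the training-target minimum and maximum."""
    return np.asarray(y_scaled, float) * (y_max - y_min) + y_min


def integrate_growth(initial_fw, weekly_rates):
    """Fresh weight at weeks 0, 1, ..., n from an initial weight and n weekly growth rates."""
    rates = np.asarray(weekly_rates, float)
    return initial_fw + np.concatenate(([0.0], np.cumsum(rates)))


# Usage: scaled outputs for three weeks, assumed target range -50 to 350 g per week.
rates = denormalize([0.5, 0.625, 0.75], y_min=-50.0, y_max=350.0)
trajectory = integrate_growth(100.0, rates)
```

A negative minimum is kept in the assumed range because weekly growth can drop below zero, for example after the stem injury reported in the results.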

Variable Collection for Algorithm Training
Environmental data in the greenhouse were measured during three growth periods (1 September 2018-1 December 2018 for growth period 1; 1 December 2018-1 April 2019 for growth period 2; and 1 February 2018-1 June 2018 for growth period 3), as shown in Figure 2. The fruit yield and fruit abortion, measured as fresh weight, are shown in Figure 3. In growth period 3, the fruits were not harvested until the end of data collection. Because fruit harvest caused drastic reductions in fresh weight, the RNN algorithm estimated negative crop growth rates. To compensate for this effect, a calibrated fresh weight was calculated by adding the weight of harvested or aborted fruits to the current fresh weight [35] (Figure 4). The crop growth rate was calculated from weekly changes in the calibrated weight and used for the algorithm. The LAI and the calibrated crop fresh weight exhibited sigmoidal growth patterns with DAT (Figure 4). A drastic decrease in fresh weight occurred during growth period 2 (Figure 4c).
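The calibration above amounts to adding back the cumulative weight of fruits removed from the plant before taking weekly differences. A minimal sketch, assuming daily series in which `removed_fruit` holds the fresh weight harvested or aborted on each day (zero otherwise); the function names are illustrative.

```python
import numpy as np


def calibrated_fresh_weight(fresh_weight, removed_fruit):
    """Measured fresh weight plus the cumulative weight of harvested or aborted fruit."""
    return np.asarray(fresh_weight, float) + np.cumsum(np.asarray(removed_fruit, float))


def weekly_growth_rate(weight, steps=7):
    """Change in weight over `steps` days: w[t] - w[t - steps]."""
    w = np.asarray(weight, float)
    return w[steps:] - w[:-steps]


# Usage: a 0.4 kg harvest on day 2 makes the raw weight drop, but not the calibrated one.
raw = [1.0, 1.2, 0.9, 1.1]
removed = [0.0, 0.0, 0.4, 0.0]
calibrated = calibrated_fresh_weight(raw, removed)
daily_change = weekly_growth_rate(calibrated, steps=1)  # steps=1 only to keep the example short
```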

Crop Growth Rate Estimation of the Algorithm
The crop growth rates estimated by the trained RNN algorithm closely followed the actual growth rates derived from the calibrated fresh weight (Figure 5). The d-statistic, an index of estimation accuracy, was 0.727, 0.806, and 0.748 for growth periods 1, 2, and 3, respectively. These results indicate that the RNN algorithm could reliably estimate the actual crop growth rate, and they are similar to those of previous studies that estimated crop growth rates using ANNs [36][37][38]. Whereas the accuracy in those studies was obtained by training ANNs with large amounts of data from large-scale farms, this study used data from a single greenhouse; the accuracy was nonetheless high, which was attributed to the ability of the RNN to interpret data chronologically. During growth period 2, the crop growth rate dropped sharply for about two weeks because the crop stem was broken on 16 January 2019 (Figure 6). The physical injury increased maintenance respiration [39,40], reducing the crop growth rate over several days. The RNN specializes in interpreting data with time intervals between input and output variables [41], and the RNN algorithm estimated the current crop growth rate from the input variables of the past week. This time interval between input and output variables affected the estimation accuracy during the physical injury and the subsequent recovery. The actual crop growth rate decreased rapidly after 16 January and recovered several days later. However, the RNN algorithm overestimated the growth rate during 16-23 January because it used input variables from 9-16 January. An RNN, which estimates the current state (output variables) from past data (input variables), has the advantage of capturing the cumulative effect of environmental factors but cannot react immediately to unexpected changes in conditions.
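The d-statistic is not defined in this section. Assuming it is Willmott's index of agreement, which is commonly reported in crop model evaluation, it can be computed as below; it ranges from 0 (no agreement) to 1 (perfect agreement). `willmott_d` is an illustrative name.

```python
import numpy as np


def willmott_d(observed, predicted):
    """Willmott's index of agreement: 1 - sum((P-O)^2) / sum((|P-Obar| + |O-Obar|)^2)."""
    o = np.asarray(observed, float)
    p = np.asarray(predicted, float)
    o_bar = o.mean()
    numerator = np.sum((p - o) ** 2)
    denominator = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - numerator / denominator)


# Usage with made-up weekly growth rates.
print(willmott_d([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 1.0 (perfect agreement)
print(willmott_d([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # -> 0.0 (always predicting the mean)
```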

Calibration and Simulation of the PBM
The CropGro-bell pepper model was calibrated using data from the cultivations for algorithm training (Tables 4 and 5). The automatically calculated parameters were then re-calibrated to match the results of a previous study [42]. The PBM divides crop growth into seven stages, and no further growth is estimated after the crop reaches stage R7. Additionally, when the accumulated photothermal days exceed a particular value (e.g., leaves no longer expand beyond 200 days after the first flower [42]), the model assumes that the growing point no longer develops, so no new nodes appear [43]. However, because bell peppers grow continuously until the end of the growth period under greenhouse conditions, the effects of these parameters should be minimized. Boote et al. [44] revised SD-PM (the time from first seed (R3) to physiological maturity (R7)) and FL-VS (the time from the first flower to the last leaf on the main stem) to effectively infinite values for greenhouse-grown tomatoes so that the model represents their indeterminate growth curve. Accordingly, SD-PM and FL-VS were set to 330 in this study, because the growth and development of bell peppers continued until the end of the measurement period. The calibrated PBM was able to simulate organ-specific crop growth (Figure 7). In the early growth stage, the root growth rate was initially fast and then decreased, whereas the growth rates of stems and leaves continued to increase. From the onset of fruit development, fruit growth was the major component of whole-crop growth, which reflects the much higher sink strength of fruits compared with other bell pepper organs [45]. The dry-fresh weight ratio was highest in fruits and lowest in roots (data not shown). In the other organs, the ratio did not change with growth stage, except in stems, for which it was highest in the initial growth stage and constant from the middle growth stage onward.
As the PBM calculated the crop growth rate based on dry weight, the fresh weight was computed with the dry-fresh weight ratio measured at every growth stage.
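The dry-to-fresh conversion can be sketched as below. The text does not state whether the ratio is dry/fresh or fresh/dry; this sketch assumes a dry-matter fraction (dry weight divided by fresh weight) looked up per organ and growth stage. The function name, organ names, stage label, and values in the usage example are placeholders, not the measured ratios.

```python
def organ_fresh_weights(dry_weights, stage, dry_matter_fraction):
    """Convert per-organ dry weight to fresh weight: fresh = dry / (dry/fresh fraction).

    dry_weights: {organ: dry weight simulated by the PBM}
    dry_matter_fraction: {organ: {stage: dry/fresh ratio}}, measured per growth stage
    Returns the per-organ fresh weights and their sum (whole-crop fresh weight).
    """
    fresh = {organ: dw / dry_matter_fraction[organ][stage] for organ, dw in dry_weights.items()}
    return fresh, sum(fresh.values())


# Usage with placeholder numbers (not the study's measurements).
fractions = {"leaf": {"R1": 0.125}, "stem": {"R1": 0.25}, "fruit": {"R1": 0.0625}}
fresh, total = organ_fresh_weights({"leaf": 1.0, "stem": 0.5, "fruit": 0.25}, "R1", fractions)
```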

Validation and Evaluation of the Algorithm
The ranges of the environmental variables collected during validation are shown in Figure 8. In the validation, the RNN algorithm estimated the crop growth rate with reasonable accuracy (Figure 9), although the accuracy was lower than in the training test. The RNN algorithm estimated fresh weight more accurately than the PBM (Figure 10), with a lower RMSE and a higher d-statistic. Although the amount of data used to train the RNN was insufficient for full optimization, the developed RNN algorithm was more accurate than the PBM. By using daily crop growth data, the RNN algorithm could closely reflect environmental conditions. In addition, the RNN algorithm estimated crop growth over a typical bell pepper cultivation period (autumn-spring for validation), even though it was trained on data collected during separate growth periods (winter-spring for periods 1 and 2 and autumn-winter for period 3). These results suggest that data collected under various seasonal conditions can be used to train the RNN algorithm without requiring data from a complete cultivation cycle.
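The RMSE comparison between the two models can be sketched as follows. `rmse` and `rank_models` are illustrative helpers, and the fresh-weight values in the usage example are made up rather than taken from Figure 10.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted fresh weight."""
    o = np.asarray(observed, float)
    p = np.asarray(predicted, float)
    return float(np.sqrt(np.mean((p - o) ** 2)))


def rank_models(observed, predictions):
    """Return (name, RMSE) pairs sorted from lowest to highest RMSE."""
    scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Usage with made-up weekly fresh weights (kg per plant).
observed = [0.5, 0.9, 1.4, 2.0]
ranking = rank_models(observed, {"RNN": [0.6, 0.9, 1.3, 2.1], "PBM": [0.4, 1.1, 1.7, 2.4]})
```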

Advantages and Limitations
Owing to the demand for modeling and quantitative techniques for the big data collected from smart farms [46], research on estimating crop growth using machine learning-based algorithms is being attempted [16,47,48]. In this study, the crop growth rate in a single greenhouse could be estimated with reliable accuracy using an RNN. The developed algorithm improved the estimation accuracy compared with previous machine learning algorithms [36][37][38] as well as existing PBMs.
These approaches to crop growth estimation also have limitations. In PBMs, the model parameters are determined to reflect the physiological characteristics of the crop. In general, the parameter estimation process may cause specific coefficients to fall outside their normal range; however, these parameters can be manually calibrated to more realistic values. Although many studies have shown that ANNs have superior predictive power compared with conventional approaches, they provide little explanation of the relative influence of the independent variables on the prediction. In addition, the RNN algorithm does not provide information on the growth of specific crop organs, such as leaves, stems, and fruits. Existing PBMs estimate total biomass by obtaining the biomass distribution to each organ [11,12] and can estimate a target organ [10]. Therefore, for practical application of the RNN algorithm, further research on continuously estimating the growth rate of each organ is needed.

Conclusions
RNN-based algorithms were developed to estimate the crop growth response to environmental factors. The RNN algorithm used six environmental variables, two crop growth factors, and DAT as input variables and produced a weekly crop growth rate with reliable accuracy. In the validation, the RNN algorithm estimated crop growth more accurately than the conventional PBM. The RNN algorithm can be used to analyze the relationship between crop growth and environmental factors and to estimate crop growth with a limited amount of data. The algorithm developed in this study enables the quantitative analysis of crop growth in response to the greenhouse environment, and we expect that it can be useful for designing optimal environmental control technologies for greenhouses.