Growth Analysis of Plant Factory-Grown Lettuce by Deep Neural Networks Based on Automated Feature Extraction

The mechanisms of lettuce growth in plant factories with artificial light (PFALs) are well known, and the crop is therefore generally used as a model in horticultural science. Deep learning has also been applied to PFAL data several times. Despite its numerous advantages, the performance of deep learning models is commonly evaluated based only on their accuracy. Therefore, the objective of this study was to train deep neural networks and analyze the deeper abstraction of the trained models. In total, 443 images of three lettuce cultivars were used for model training, and several deep learning algorithms were compared against multivariate linear regression. Except for linear regression, all models showed adequate accuracy for the given task, and the convolutional neural network (ConvNet) model showed the highest accuracy. Based on the color mapping and distribution of the two-dimensional t-distributed stochastic neighbor embedding (t-SNE) results, ConvNet effectively perceived the differences among the lettuce cultivars under analysis. The extension of target domain knowledge with complex models and sufficient data, as with the multitask-learning ConvNet here, is possible. Therefore, deep learning algorithms should be investigated from the perspective of feature extraction.


Introduction
Lettuce is a relatively easy-to-grow leafy vegetable that has been thoroughly characterized physiologically, such that it is generally used as a model crop in horticultural science and is commercially cultivated in plant factories with artificial lighting (PFALs) worldwide [1]. Various advanced technologies have been tested based on the technology-intensive characteristics of PFALs [2][3][4]. Thus, our extensive knowledge of the growth mechanisms of lettuce in PFALs facilitates the analysis of the influence of new technologies.
As an advanced technology, deep learning has been introduced several times [5] and has shown state-of-the-art performance in many fields [6][7][8][9]. Deep learning can conduct high-level abstraction based on automated feature extraction. Furthermore, deep learning algorithms are highly adaptable, which enables the reuse of developed models, such that they can be easily adapted to other fields of knowledge.
Recently, the application of deep learning in agriculture has increased [9][10][11]. In particular, diverse deep learning models have been applied to horticultural data generated in controlled environments, such as greenhouses and PFALs, and the models are also applicable to open-field horticulture. However, most reports in agricultural science and engineering have focused on technical improvements in model accuracy, while other advantages of deep learning, such as automated feature extraction and high adaptability, have been neglected. The regression and classification of limited data are relatively easy tasks for machine learning techniques; therefore, adequate training might reduce the need for unnecessarily complicated models [12,13]. High accuracy for agricultural tasks can easily be achieved with canonical models that have long been in use. Deep learning models have different advantages compared with existing models; therefore, evaluating the performance of deep learning models based on accuracy alone wastes the analytical potential of the technology.
Automated feature extraction and high-level abstraction not only provide high performance but also enable the development of novel analytical methods [14]. Achieving high performance for a given task with a complicated input and structure requires sophisticated optimization, which facilitates the extraction of unprecedented information that is difficult to obtain with existing methodologies, such as multivariate regression. This potential has led to the development of analytical methods for neural networks, even though deep learning models are black-box models [15][16][17]. An adequate analysis of a trained neural network might thus provide more value for formulating horticultural hypotheses. Therefore, the objective of this study was to train deep neural networks and analyze the deeper abstraction of the trained models. The target growth variables were the fresh weight (FW), dry weight (DW), number of leaves (NL), leaf area (LA), and soil plant-analysis development (SPAD) value. The trained models were analyzed with two-dimensional t-distributed stochastic neighbor embedding (t-SNE), which is often used for dimension reduction to understand entangled representation spaces; the methodology can thus be used to interpret the behavior of deep learning algorithms. This study revealed that it is possible to extend target domain knowledge with complex models and sufficient data, as with a convolutional neural network (ConvNet) with multitask learning. Therefore, deep learning algorithms should be investigated from the perspective of feature extraction.

Plant Materials and Growth Conditions
Seeds of the lettuce (Lactuca sativa L.) cultivars Corbana, Caipira, and Fairy (Enza Zaden, Enkhuizen, The Netherlands) were sown in sponge cubes (35 × 35 × 30 mm, L × W × H) and placed on 60-hole trays in a PFAL module. During the first three days, lighting was continuously provided by warm-white light-emitting diodes (LEDs) at an intensity of 230 µmol m⁻² s⁻¹, and the ambient temperature and relative humidity were set to 24 °C and 90%, respectively. The germination sponge cubes were first sub-irrigated with distilled water; during the following 11 days, Sonneveld nutrient solution [18] was supplied, with the electrical conductivity (EC) and pH maintained at 0.8 dS m⁻¹ and 5.8, respectively. The seedlings of each cultivar were then transplanted to three cultivation shelves with semi-deep-flow-technique (semi-DFT) water channels (3540 × 80 × 58 mm, L × W × H) covered with perforated tops, which allowed individual plants to grow 11 cm apart from each other. After transplanting, the environmental conditions were changed as follows: a light period of 16 h, an ambient temperature of 21 °C, and a relative humidity of 80%. The cultivation space was divided into nine sections, and 32 plants per section were cultivated for four weeks (Figure 1). After transplanting, the EC and pH of the Sonneveld nutrient solution were changed to 1.2 dS m⁻¹ and 5.8, respectively.

Plant Growth and Image Data Collection
Three independent growth experiments were conducted, from which growth data were collected at different intervals to acquire unbiased data (Figure 2). Shoot fresh weight was measured using a digital precision scale (SI-234; Denver Instruments, Denver, CO, USA), and leaf area was measured using a leaf area meter (Li-3100C; Li-COR, Lincoln, NE, USA). The shoots were dried in a forced-air drying oven (VS-1202D3, Vision, Daejeon, Korea) at 70 °C for more than 72 h to determine the dry weight. Soil plant-analysis development (SPAD), which represents the chlorophyll content of leaves, was measured using a SPAD meter (SPAD-502, Konica Minolta, Tokyo, Japan). A blackout box (560 × 560 × 560 mm, L × W × H) equipped with white LEDs was used to obtain top-view images of the crops (Figure 3A). Each plant was placed in the center of the box, and images were taken at a height of 50 cm using a smartphone (Galaxy S20, Samsung Electronics Inc., Suwon, Korea). The image size was 3024 × 3024 pixels with RGB indices (Figure 3B).

Model Structure
In this study, we adopted convolutional neural networks (ConvNets) to design the structure of the model, as the specific task was to convert lettuce images into growth parameters. ConvNets consist of several convolution layers, and the convolution process effectively abstracts the input into the desired output [5]. In particular, the ConvNet is a state-of-the-art algorithm and has shown the highest performance in computer vision [10,19].
Additionally, some variations of ConvNet were included for comparison. Multitask learning helps a model generalize data relations [20,21]; because the target parameters in this study were distinctive, multitask learning was introduced.
The feedforward neural network (FFNN) and long short-term memory (LSTM) algorithms were used as comparable deep learning models because they are representative deep neural network algorithms. Further, multivariate linear regression (LinReg) was conducted as a conventional baseline. The model structures and hyperparameters were empirically optimized (Tables 1 and 2).
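To make the shared-trunk design concrete, the following is a minimal Keras sketch of a multitask ConvNet of this kind; the layer counts, widths, and head names are illustrative assumptions, as the actual structures and hyperparameters follow Tables 1 and 2.

```python
import tensorflow as tf

def build_multitask_convnet(input_shape=(128, 128, 3), n_targets=5):
    """Multitask ConvNet sketch: a shared convolutional trunk with one
    regression head per growth parameter (FW, DW, NL, LA, SPAD).
    All sizes are placeholders, not the values from Tables 1 and 2."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):  # shared feature extractor
        x = tf.keras.layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    # Terminal hidden layer; its activations are reused later for the t-SNE analysis.
    x = tf.keras.layers.Dense(128, activation="relu", name="terminal_hidden")(x)
    # One sigmoid head per target, matching the 0-1 normalized outputs.
    heads = [tf.keras.layers.Dense(1, activation="sigmoid", name=f"target_{i}")(x)
             for i in range(n_targets)]
    model = tf.keras.Model(inputs, heads)
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model
```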

Data Preprocessing
A total of 443 images were collected, and the images were resized and augmented to serve as model inputs. The original images were resized to 128 × 128 pixels, and the input images were augmented by flipping, shifting, and rotating the original images. The processed images were fed directly to the ConvNets and were rearranged for training the FFNN and LSTM (Table 1). For linear regression, the values of the red, green, and blue channels of each image were summed. The growth-related data were normalized to 0-1, and the input and output data were combined into datasets.
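A minimal sketch of this preprocessing using TensorFlow image operations is shown below; the 0-1 pixel scaling and the specific shift and rotation amounts are illustrative assumptions, as the paper does not specify them.

```python
import numpy as np
import tensorflow as tf

def preprocess(image, flip=False, shift=(0, 0), quarter_turns=0):
    """Resize an RGB image to 128 x 128 and apply the augmentations
    described above; parameter values are illustrative assumptions."""
    img = tf.image.resize(image, (128, 128)) / 255.0  # resize; 0-1 scaling assumed
    if flip:
        img = tf.image.flip_left_right(img)
    img = tf.roll(img, shift=shift, axis=(0, 1))      # pixel shift
    img = tf.image.rot90(img, k=quarter_turns)        # rotation in 90-degree steps
    return img

# Rearranged variants for the non-convolutional models (placeholder image):
raw = np.random.randint(0, 256, (3024, 3024, 3)).astype("float32")
img = preprocess(raw)
ffnn_input = tf.reshape(img, [-1])              # flattened 128*128*3 vector
lstm_input = tf.reshape(img, [128, 128 * 3])    # rows as timesteps (assumed)
linreg_input = tf.reduce_sum(img, axis=(0, 1))  # summed R, G, B channel values
```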

Model Training, Validation, and Evaluation
All models were trained to minimize the mean squared error (MSE). Five-fold cross-validation was conducted to ensure model robustness given the limited amount of data. For each fold, the datasets were divided into training and validation sets at a ratio of 8:2. Models were evaluated based on R² and the root mean square error (RMSE). After model training, the trained ConvNet with multitask learning was examined using two-dimensional t-distributed stochastic neighbor embedding (t-SNE), which is often used to explore the black-box nature of deep learning models [16].
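A sketch of this procedure is given below, assuming `images` and `targets` (the 0-1 normalized growth parameters) are NumPy arrays and reusing the `build_multitask_convnet` sketch shown earlier; the fold seed and epoch count are arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score, mean_squared_error

def cross_validate(images, targets, build_fn, epochs=100):
    """Five-fold CV: each fold trains on 80% of the data and validates
    on the remaining 20%, matching the 8:2 split described above."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                    random_state=0).split(images):
        model = build_fn()
        # The multitask model expects one target array per output head.
        y_train = [targets[train_idx, i] for i in range(targets.shape[1])]
        model.fit(images[train_idx], y_train, epochs=epochs, verbose=0)
        preds = np.hstack(model.predict(images[val_idx], verbose=0))
        r2 = r2_score(targets[val_idx], preds)  # averaged over the five targets
        rmse = np.sqrt(mean_squared_error(targets[val_idx], preds))
        scores.append((r2, rmse))
    return scores  # one (R2, RMSE) pair per fold
```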

Computation
The Adam optimizer was used for model training [22], and TensorFlow (v. 2.10.0, Google Inc., Mountain View, CA, USA) was used for the deep learning computations [23]. A Linux server with a GPU delivering 35.58 TFLOPS (RTX 3090, NVIDIA, Santa Clara, CA, USA) was used for all computations.

Model Accuracy
Except for linear regression, all models showed adequate accuracy for the given task (Table 3). Specifically, the deep learning models yielded R² values higher than 0.7, whereas linear regression had relatively lower accuracy for the prediction of all growth-related parameters except the leaf area. Furthermore, the ConvNets showed the highest accuracy among the trained models, and multitask learning did not remarkably improve the accuracy of the ConvNet architecture. Overall, the models exhibited only subtle differences. It has been proven theoretically that a neural network can approximate any function using only two hidden layers [24]; computing simple regression tasks is therefore not difficult even for shallow deep learning models, although the inputs were images, which are complicated data for conventional models.
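For illustration, a shallow two-hidden-layer FFNN of the kind this argument refers to can be written in a few lines; the widths and activations below are placeholders rather than the values from Table 1.

```python
import tensorflow as tf

# A two-hidden-layer FFNN mapping a flattened RGB image to the five
# growth parameters; all sizes here are illustrative placeholders.
shallow_ffnn = tf.keras.Sequential([
    tf.keras.Input(shape=(128 * 128 * 3,)),          # flattened image vector
    tf.keras.layers.Dense(256, activation="tanh"),   # hidden layer 1
    tf.keras.layers.Dense(64, activation="tanh"),    # hidden layer 2
    tf.keras.layers.Dense(5, activation="sigmoid"),  # FW, DW, NL, LA, SPAD
])
shallow_ffnn.compile(optimizer="adam", loss="mse")
```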
Conventional models, such as linear regression, can also be optimized to improve accuracy. In our study, the five target growth parameters were regressed simultaneously in the linear regression procedure for model comparison, but a reduced number of outputs achieved a higher R² [25]. Because the linear regression procedure is not complicated, five individual regressions were easily possible. Therefore, various types of regression, such as Bayesian regression and support vector machines, can be used to achieve the highest accuracy [26,27].
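A minimal scikit-learn sketch of these baseline variants is shown below, with placeholder data standing in for the summed RGB channel values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, BayesianRidge

# Placeholder data: 443 samples of summed (R, G, B) values and five
# normalized growth parameters; real inputs would replace these.
rng = np.random.default_rng(0)
X = rng.random((443, 3))
Y = rng.random((443, 5))  # FW, DW, NL, LA, SPAD

joint = LinearRegression().fit(X, Y)                                 # all five targets at once
individual = [LinearRegression().fit(X, Y[:, i]) for i in range(5)]  # one model per target
bayesian = [BayesianRidge().fit(X, Y[:, i]) for i in range(5)]       # Bayesian variant
```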
The trained models in this study were not fully optimized for each deep learning algorithm. In general, sophisticated parameter optimization is required to train deep learning models on big data [5]. However, in this study, it was difficult to determine the prediction limit of each model because of the limited amount of data. Thus, hyperparameters and structures should be determined carefully for better model optimization. That is, deep learning models can also show higher accuracy depending on the empirical skills of users.
Therefore, the exact performance of the deep learning models could not be determined based on the prediction accuracy obtained from approximately 400 images. Because complex deep learning algorithms require large amounts of data for model robustness, an adequate methodology should be adopted for the given task, considering the available data and computing power. The models could easily be improved with some technical parameter fitting and model optimization. For example, a lower learning rate and a longer training time could allow a model to converge more precisely; sensitive activation functions, such as the hyperbolic tangent and sigmoid, combined with a shallow neural network structure could also be helpful for regression tasks; a thorough exploration of the model parameters could achieve the highest performance for the given data; and a parameter optimization methodology, such as Bayesian optimization, could reduce the time cost of exploring the hyperparameters.
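As one example of such a search, a Bayesian-optimization sketch with KerasTuner might look as follows; the search space and model structure are illustrative assumptions, not the configuration used in this study.

```python
import keras_tuner as kt
import tensorflow as tf

def build_tunable_model(hp):
    """Searchable model: the learning rate, hidden width, and activation
    are tuned; the ranges below are illustrative assumptions."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(hp.Choice("filters", [32, 64]), 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32),
                              activation=hp.Choice("act", ["relu", "tanh"])),
        tf.keras.layers.Dense(5, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(
                      hp.Float("lr", 1e-4, 1e-2, sampling="log")),
                  loss="mse")
    return model

tuner = kt.BayesianOptimization(build_tunable_model,
                                objective="val_loss", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=50)
```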
However, this does not mean that the ConvNet failed to recognize the target properly. The flipping, shifting, and rotating of the original images would strengthen the relation between the plants in the image and the target growth factors. That is, ConvNet models can effectively determine what plants are in the image through transformation-based training.
The two ConvNet models achieved the same average accuracy, although they differed slightly for individual targets. The leaf area was the easiest target for the trained models, whereas the NL prediction was particularly less accurate for the deep learning models, which otherwise showed similar overall accuracy. Most models tended to underestimate the five targeted variables (Figure 4); in particular, the models underestimated the number of leaves. As all of the models generally underestimated the target variables towards the latter part of the cultivation period, it can be concluded that the top-view images do not contain enough information about growth during the second half of the cultivation period. Since this phenomenon has been previously reported, it may be due to a limitation of the top view itself [28][29][30]. Therefore, images capturing multiple perspectives at the same time should be considered, even for rosette leafy vegetables. In this study, well-trained models such as ConvNet showed adequate robustness, but it would be helpful to include additional images or environmental and growth data to further assess the precision of the estimations.

t-SNE Analysis
Output values from the terminal hidden layers of the trained ConvNet model with multitask learning were extracted and analyzed using t-SNE. In particular, the best-trained ConvNet distinctively recognized the distribution of Corbana (Figure 5). According to the color mapping and distribution of the t-SNE results, ConvNet effectively differentiated between the three cultivars tested, which were distinguishable by their appearance according to the number of leaves and SPAD. In practice, Corbana showed distinctive growth compared with the other cultivars [31]. Analysis based on t-SNE could be expanded to more complicated interpretations according to the data and task. This study revealed that it is possible to extend target domain knowledge with complex models and sufficient data, as demonstrated here by the ConvNet with multitask learning. Therefore, deep learning algorithms should be investigated from the perspective of feature extraction.
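A minimal sketch of this readout, assuming `model` is the trained multitask ConvNet from the earlier sketch (with its `terminal_hidden` layer) and `images` is the preprocessed input array; the perplexity and random seed are arbitrary.

```python
import tensorflow as tf
from sklearn.manifold import TSNE

# Read out the terminal hidden layer of the trained multitask ConvNet
# and embed the activations in two dimensions with t-SNE.
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=model.get_layer("terminal_hidden").output)
features = feature_extractor.predict(images, verbose=0)   # (n_samples, n_units)
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)   # (n_samples, 2)
# `embedding` can then be scatter-plotted and colored by cultivar or by
# each growth parameter, as in Figure 5.
```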

Conclusions
In this study, the plant growth characteristics of three lettuce cultivars grown in a PFAL, including the fresh weight, dry weight, number of leaves, leaf area, and SPAD, were estimated using deep neural networks based on top-view images. The trained ConvNet models showed the highest accuracy, and the ConvNet with multitask learning was further examined using t-SNE. The 2D distribution from the t-SNE showed that the trained ConvNet model recognized the differences between the cultivars based on the raw images. Because the number of input images in this study was not very large, model accuracy can be improved in the future with the intensive optimization of model parameters. Therefore, evaluating the performance of deep learning models based on accuracy alone is not recommended. Our t-SNE results demonstrated the potential to analyze the automated feature extraction of deep learning algorithms, a strategy that can support the scientific discovery of target domain knowledge in horticultural science.