Using a Hybrid Neural Network Model DCNN–LSTM for Image-Based Nitrogen Nutrition Diagnosis in Muskmelon

Abstract: In precision agriculture, the nitrogen level is critically important for establishing the phenotype, quality and yield of crops, and future production cannot be sustained without appropriate nitrogen fertilizer application. A convenient, real-time technology for crop nitrogen nutrition diagnosis is therefore a prerequisite for an efficient and reasonable nitrogen-fertilizer management system. With the development of research on plant phenotyping and artificial intelligence in agriculture, deep learning has demonstrated great potential for nondestructive nitrogen nutrition diagnosis in plants through automation and high throughput at a low cost. To build a nitrogen nutrient-diagnosis model, muskmelons were cultivated under different nitrogen levels in a greenhouse. Digital images of canopy leaves and the environmental factors (light and temperature) were tracked and analyzed throughout the growth period, and the nitrogen concentrations of the plants were measured. We constructed and trained machine-learning and deep-learning models based on the traditional backpropagation neural network (BPNN), the convolutional neural network (CNN), the deep convolutional neural network (DCNN) and long short-term memory (LSTM) for the nitrogen nutrition diagnosis of muskmelon. The adjusted determination coefficient (R²) and mean square error (MSE) between predicted and measured nitrogen concentrations were adopted to evaluate the models' accuracy: R² = 0.567 and MSE = 0.429 for the BPNN model; R² = 0.376 and MSE = 0.628 for the CNN model; R² = 0.686 and MSE = 0.355 for the DCNN model; and R² = 0.904 and MSE = 0.123 for the hybrid DCNN–LSTM model. The DCNN–LSTM model therefore showed the highest accuracy in predicting the nitrogen content of muskmelon. Our findings lay a basis for a convenient, precise nitrogen nutrition diagnosis in muskmelon.


Introduction
The netted muskmelon (Cucumis melo L. var. reticulatus Naud.) is a delicious and nutritious fruit grown worldwide. Nitrogen is one of the critical environmental factors affecting the growth of muskmelon: both the external phenotype and internal activity are significantly affected by nitrogen [1][2][3]. Appropriate nitrogen levels aid the accumulation of nitrogen and fruit biomass production in crops [4,5]. However, farmers often overuse nitrogen in muskmelon, which reduces the quality and yield of the fruit. At the same time, the overuse of nitrogen causes serious environmental problems, such as contamination of water resources, nitrogen leaching losses and emission of greenhouse gases [6,7]. Therefore, an efficient, real-time nitrogen nutrition diagnosis technology is necessary to achieve rational nitrogen application in crops.
Traditionally, crop nitrogen nutrition status is judged artificially from plant phenotypical traits or determined by chemical analysis. In contrast, deep learning can automatically extract the most important features from large and/or complex image datasets that are difficult for humans to process. For that reason, deep-learning models have achieved significant progress in determination accuracy across different fields when compared with other machine-learning and/or deep-learning methods [39]. Lin et al. [40] used a CNN semantic-segmentation method to identify powdery mildew in cucumber; the model identified the powdery mildew with an intersection over union of 72.11%, a dice accuracy of 83.45% and an average pixel accuracy of 96.08%. CNN is mostly used for visual plant-disease problems, but comparisons of CNN and DCNN in plant nutrition are rare in the literature.
A deep CNN is mostly used for spatial data, while a deep recurrent neural network is built for sequential data modeling, such as time series [41,42]. RNNs are widely used in speech recognition, machine translation, emotion analysis and picture description. The input can be a series of text, speech, time steps, etc., in which each element depends on the previous ones. The time steps at the same state (input, output and hidden) share one weight matrix, which greatly reduces the number of parameters to be learned in the model. A DCNN model was used to accurately identify nutrient deficiencies (nitrogen (N), potassium (K) and calcium (Ca)) in tomato leaves at the fruiting phase [43]. A DCNN with the AlexNet architecture showed the highest accuracy (92.1%) on an image dataset of five different vegetables (mushrooms, pumpkin, broccoli, cucumber and cauliflower) [44], compared with a backpropagation neural network (BPNN) (78%) and a support vector machine (SVM) classifier (80.5%). A DCNN model was also used to identify cucumber diseases with an average pixel accuracy of 93.4% [24].
However, a standard deep RNN may not be well suited to long-range dependencies in time-series modeling. In such cases, long short-term memory (LSTM) has been reported as an effective and popular method. LSTM is a variant of the RNN that can learn long-term dependence and is the most widely used RNN type. Compared with the traditional RNN, LSTM introduces controllable self-cycling, which makes it more suitable for processing and predicting important events with relatively long intervals or delays in time series, and it solves problems such as gradient disappearance and gradient explosion caused by backpropagation through time during training [45]. Jiang et al. [46] used LSTM to predict corn yield from soil and weather data and reported promising results. Wheat production forecasts have also been predicted accurately with an LSTM model [47]. LSTM showed promising results for plant-growth variation and yield forecasting in tomato and for Ficus benjamina stem growth in controlled environments [48]. LSTM has a great ability to disclose phenological properties, while DCNN has a great ability to extract spatial features [49]. However, little attention has been directed to using DCNN and LSTM for nitrogen nutrition diagnosis in muskmelon.
This study aimed to predict the nitrogen content of greenhouse netted muskmelon accurately in real time and to guide nitrogen-application decisions for muskmelon in the greenhouse using machine-learning or deep-learning approaches. Based on leaf images and measured nitrogen values, nitrogen nutrition diagnosis models were established and optimized. The step-by-step nitrogen nutrition diagnosis by the different deep-learning models is shown in Figure 1.

Materials and Methods
A thick-skinned netted muskmelon variety, Wanglu, was used as the material in this study. This experiment was carried out in a Venlo glass greenhouse (31°11′ N, 121°36′ E), C-2 block at Shanghai Jiao Tong University, from March 2018 to June 2018.
The seedlings of the muskmelon Wanglu variety were grown in a seedling tray and transplanted at the three-leaf stage into pots containing a substrate of vermiculite and peat moss (1:1 v/v). Each pot contained two plants (Figure 2). The growing substrate had a pH of 6.77 and contained available nitrogen at 332 mg/kg, available phosphorus at 124 mg/kg and available potassium at 118 mg/kg. Two weeks after transplanting, the muskmelon plants were treated with four different nitrogen applications with three replications: T1, 2.7 g nitrogen/pot; T2, 5.4 g nitrogen/pot; T3, 8.1 g nitrogen/pot; and T4, 10.8 g nitrogen/pot. The amounts of phosphorus and potassium added were 5.2 and 9.0 g, respectively, in all pots. The sources of the N, P and K fertilizers were calcium nitrate, potassium nitrate, magnesium nitrate and potassium dihydrogen phosphate. The total N fertilizer was split over six growing stages of muskmelon, namely pre-planting (10%), seedling stage (5%), vine elongation stage (10%), initial fruit stage (35%), fruit expanding stage (35%) and mature stage (5%). Except for the pre-planting application (10%), all nitrogen fertilizer was applied with drip irrigation.

In this experiment, three fruiting vines were initially kept at the 10th–16th fruiting nodes; later, only one well-shaped large fruit and a single main vine were kept, and all redundant side vines were removed at the 20–22-leaf stage. Hand-pollination was performed promptly to ensure fruit set.
The flowchart of this study is shown in Figure 3. Four different nitrogen treatments were applied to muskmelon and, after harvesting, the nitrogen concentration in muskmelon was determined. For this purpose, we collected digital images of four fully expanded apical leaves throughout the whole growth period of muskmelon and measured the nitrogen concentration of the plants. Based on the leaf images and measured nitrogen values, nitrogen nutrition diagnosis models were established and optimized by using machine-learning or deep-learning approaches.
In the machine-learning-based nitrogen nutrition diagnosis models, PlantCV, an open-source software package for image analysis, was used to extract phenotypical features from the plant images. Furthermore, ANOVA and principal component analysis (PCA) were performed to screen the extracted feature parameters and reduce the dimensionality, after which three principal components were chosen. Dataset 1 was obtained by combining the three principal components with the nitrogen-concentration data and was randomly divided into a training subset (80%) and a test subset (20%).
A backpropagation neural network (BPNN) model was built and trained on Dataset 1. The BPNN is not a deep network; it has only one hidden layer. The number of hidden neurons was calculated with Empirical Formula (1) and, after testing, set to 12. We trained the BPNN model for 50 epochs, 100 epochs, 200 epochs, etc., and found a decline in accuracy after 100 epochs of training, so training was stopped there.
We also established a nitrogen nutrition diagnosis model based on deep learning approaches ( Figure 3). First, the original dataset of images was processed through data augmentation, normalization, annotation, stitching, etc. Secondly, Dataset 2 was created by combining the processed dataset of images with nitrogen concentrations data, and then it was randomly divided into a training subset (80%), validation subset (10%) and test subset (10%).
Thirdly, convolutional neural network (CNN) and deep convolutional neural network (DCNN) models were built and trained on Dataset 2. Dataset 3 was created by combining the processed image dataset with the nitrogen-concentration data and the dataset of meteorological factors; it was likewise randomly divided into a training subset (80%), validation subset (10%) and test subset (10%). A hybrid DCNN-LSTM model was finally built and evaluated with Dataset 3 as input data.
Finally, the prediction accuracy of all the models was compared to choose the best one among them.

Measurement of Nitrogen Concentration in Plants
For nitrogen measurement, plant samples were collected a total of thirteen times throughout the experimental period, at different growth stages. The first sampling took place on the 5th day after the seedling-stage nitrogen application (5%) in the pots. Each time, one plant was collected from each pot; in total, 156 plant samples were collected from the seedling stage to the fruit-maturity stage at one-week intervals. After removal of the aboveground parts, digital images of the leaves were collected; only the four fully expanded leaves at the apical part of each plant were used for digital image analysis. Plant nitrogen concentration was measured on a mixture of all the leaves of a plant: the leaves were first subjected to a 30-min enzyme-deactivation treatment at 105 °C, then dried at 80 °C to constant weight and finally ground to pass a 100-mesh sieve. Nitrogen concentration was ultimately determined with a Vario EL III/Isoprime elemental analyzer (Hanau, Germany) [50].

Leaf Image Acquisition
Images of the upper leaf surfaces were taken with a single-lens reflex (SLR) camera (Canon EOS 5D Mark II, Japan) in a closed box of 60 cm × 60 cm × 60 cm. The camera was set to M mode, zero exposure compensation, a 1/320 s shutter speed, a 60 mm focal length and ISO 200. The photo box was evenly illuminated by fixed light-emitting diode (LED) panels on the top two sides, driven by a controlled LED power supply with a 60 W maximum. Astral lamp panels (38 cm × 38 cm) were fixed to hold the leaves, and the box opened at the top.
Finally, 624 digital images were taken from the 156 plant samples, one image from each of the four canopy leaves of a plant.

Collecting Meteorological Data of Greenhouse
After transplanting the seedlings to the pots, environmental factors in the greenhouse, such as temperature and photosynthetically active radiation, were monitored every 5 min by two portable automatic weather stations (HOBO-U30, Onset, Bourne, MA, USA).

Establishment of Machine Learning (ML) Model Extraction of Phenotypical Features
Phenotypic feature extraction converts the visual characteristics of images into mathematical forms that can be recognized, processed and analyzed by a computer. In this study, the image-analysis software PlantCV 3.2.0 was used for high-throughput plant phenotyping. PlantCV 3.2.0 is a modular open-source framework written in Python [51].
Two steps were included in the visual digital image-processing pipeline of PlantCV: segmentation (detection or isolation of objects) and analysis (analysis of the segmented objects). Taking a muskmelon plant as an example, we show the image-processing procedure in Figure 4. The procedure was as follows: (1) recognize the digital images; (2) convert the color space from red-green-blue (RGB) to hue-saturation-value (HSV) and extract the saturation channel to obtain the saturation threshold level; (3) remove image noise with a median-filtering algorithm; (4) convert the color space from RGB to LAB and extract the blue channel to obtain the blue-threshold-level image; (5) segment the original image into the targeted region and object of interest based on the thresholds of the saturation and blue-yellow images; (6) analyze morphological features; (7) extract color indexes based on the color histogram and pseudo-colored image; (8) extract netting indexes based on the gray-level co-occurrence matrix; and (9) output the extracted phenotypic parameters. Furthermore, a color histogram and pseudo-colored image of the fully expanded leaf are presented in Figure 5.
Figure 4. 1st–4th fully expanded leaf; 1, convert the image from RGB to HSV and extract the saturation channel; 2, threshold the saturation image; 3, "median_blur"; 4, convert RGB to LAB and extract the blue channel; 5, threshold the blue image; 6, join the thresholded saturation and blue-yellow images; 7, convert RGB to LAB and extract the green-magenta channels; 8, blue-yellow channels; 9, threshold the green-magenta images; 10, threshold the blue images; 11, join the thresholded saturation; 12, join the blue-yellow images; 13, decide which objects to keep; 14, apply the mask; 15, find shape properties and output the shape image; and 16, shape properties relative to the user boundary line.
Thirty-one phenotypical parameters were extracted from each image and numbered 1 to 31 (Table 1), including 9 color parameters across 3 color spaces (RGB, LAB and HSV), 16 morphological parameters based on a contour-tracking method and 6 netting-characteristic parameters based on the grey-level co-occurrence matrix. All parameters were means of the four canopy leaves of a plant.

Phenotypical Feature Parameters Screening
One-way ANOVA: One-way ANOVA was performed to analyze the relationship between the extracted feature parameters and plant nitrogen concentration (Figure 6). The results highlighted three color indexes (1 blue, 7 hue and 8 saturation) and eight morphological feature indexes (13 perimeter, 17 center-of-mass-x, 18 center-of-mass-y, 19 hull-vertices, 20 ellipse-center-x, 21 ellipse-center-y, 24 ellipse-angle and 25 ellipse-eccentricity) that were not associated with plant nitrogen concentration (p > 0.01). In contrast, the other 20 feature parameters were significantly correlated with nitrogen concentration (p < 0.01) and were therefore chosen for the construction of the nitrogen nutrition diagnosis models.
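The screening step above can be sketched with SciPy's `f_oneway`. The array names and the random placeholder data below are illustrative stand-ins, not the authors' dataset:

```python
# Sketch of the one-way ANOVA screening: keep each of the 31 phenotypical
# parameters only if it differs significantly (p < 0.01) across the four
# nitrogen treatments. `features` and `groups` are hypothetical placeholders.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
features = rng.normal(size=(156, 31))   # placeholder for the 31 extracted parameters
groups = np.repeat([0, 1, 2, 3], 39)    # four nitrogen treatments, T1-T4

kept = []
for j in range(features.shape[1]):
    samples = [features[groups == g, j] for g in range(4)]
    _, p = f_oneway(*samples)
    if p < 0.01:                        # significance threshold used in the paper
        kept.append(j)
```

With the real data this loop would retain the 20 significant parameters; with random noise it keeps almost none, which is exactly the behavior the test exploits.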
Principal component analysis (PCA): PCA was further performed to reduce the dimensionality of the 20 screened feature parameters (Table 2). First, the sampling adequacy of the data was checked with the Kaiser-Meyer-Olkin (KMO) test [53] and Bartlett's test of sphericity [54], using SPSS Statistics version 22.0. The image data proved adequate (KMO = 0.797), and both the correlations and partial correlations between the parameters were significant (p = 0). Then, PCA was performed to select the principal components with eigenvalues greater than 1; only three components qualified (Figure 7). The scatter plots show projections onto the top three PCs of the image-based dataset. The component scores (points) are colored according to the phenotypical features, and the component loading vectors (lines) of all features are superimposed in proportion to their contributions. The contribution rates of PC1, PC2 and PC3 were 51.277%, 27.290% and 11.158%, respectively, for a total contribution rate of 89.725%. The three principal components could thus be used as input variables for the nutrient-diagnosis models, reducing the input dimensionality from 20 to 3.
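A minimal scikit-learn sketch of this dimension-reduction step, assuming a standardized 156 × 20 feature matrix; the data here are random placeholders, and the eigenvalue-greater-than-1 (Kaiser) rule stands in for the SPSS workflow described above:

```python
# Sketch of the PCA reduction from 20 screened parameters to the components
# with eigenvalue > 1. `X` is a hypothetical placeholder for the real data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(156, 20))              # placeholder: 156 plants x 20 parameters

X_std = StandardScaler().fit_transform(X)   # standardize before PCA
pca = PCA().fit(X_std)
# Keep components whose eigenvalue exceeds 1 (Kaiser criterion).
n_keep = int(np.sum(pca.explained_variance_ > 1))
scores = PCA(n_components=n_keep).fit_transform(X_std)
```

On the real dataset this selection yields the three components reported above; the `scores` array then serves as the model input.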


Establishment of Backpropagation Neural Network (BPNN)
The backpropagation neural network (BPNN) was trained as a two-layer feed-forward neural network using the backpropagation algorithm [55]. It is one of the most widely used and most mature machine-learning models. The architecture consisted of three parts: an input layer, a hidden layer and an output layer. The three principal components were used as the input layer, and the nitrogen concentration of the corresponding plant was used as the output. Thus, the number of input nodes was set to 3, the number of output nodes to 1, and the number of hidden neurons was calculated according to Empirical Formula (1):

l = √(n + m) + a  (1)

where n, l and m represent the numbers of input-layer, hidden-layer and output-layer nodes, respectively, and a is a constant in the range 0-10. A random 80% of the total dataset (124 plants) was used as the training dataset, and the other 20% (32 plants) as the test dataset. The BPNN was implemented in MATLAB R2016a; after a series of tests to debug its parameters, we normalized the input data with the mapminmax() function, selected logsig() as the activation function and adopted a variable learning rate in the learning algorithm. The maximum learning rate was 0.2, the minimum learning rate was 0.02 and the momentum learning rate was 0.02 (codes in Supplementary Materials S1).
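The paper built the BPNN in MATLAB; the Python sketch below only illustrates the hidden-layer sizing, assuming the empirical formula takes the common form l = √(n + m) + a (which yields the paper's 12 hidden neurons for n = 3, m = 1, a = 10), together with an analogous single-hidden-layer regressor. `MLPRegressor` and the placeholder data are stand-ins, not the authors' implementation:

```python
# Hidden-layer sizing from the empirical formula, plus a rough scikit-learn
# analogue of the single-hidden-layer BPNN. Data are random placeholders.
import math
import numpy as np
from sklearn.neural_network import MLPRegressor

n_in, n_out, a = 3, 1, 10                    # 3 PCs in, nitrogen concentration out
n_hidden = int(math.sqrt(n_in + n_out)) + a  # l = sqrt(n + m) + a  ->  12

rng = np.random.default_rng(2)
X = rng.normal(size=(124, 3))                # placeholder training PCs (80% split)
y = rng.normal(size=124)                     # placeholder nitrogen concentrations
model = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                     activation="logistic",  # rough counterpart of MATLAB's logsig
                     max_iter=100)           # training stopped near 100 epochs
model.fit(X, y)
```

The variable learning rate and mapminmax normalization of the MATLAB version have no exact scikit-learn equivalents and are omitted here.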

Establishment of Deep Learning Models
In deep learning, the models were established with the Python 3.6.5 programming language and Keras 2.1.2. Keras 2.1.2 [56] is a high-level neural network API written in Python and capable of running on top of TensorFlow 1.6.0 [57].

Image Preprocessing
First, each leaf image was annotated with the corresponding leaf nitrogen concentration. Then the 624 original leaf images were augmented by rotating each original image at 5 random angles, yielding 3744 images in total for analysis. After splicing the images of the four leaves of a plant together, 936 new images were used as the input dataset of the neural network, with the image resolution changing from 128 × 128 to 256 × 256. The measured nitrogen concentration remained the output dataset. The input and output datasets were randomly divided into a training subset (80%), cross-validation subset (10%) and test subset (10%).
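The augmentation and splicing steps might look like the following Pillow sketch; file handling is omitted, and the solid-color placeholder image stands in for a real leaf photograph:

```python
# Sketch of the preprocessing: rotate each leaf image at 5 random angles
# (augmentation), then stitch the four leaves of one plant into a single
# 2x2 composite, raising the resolution from 128x128 to 256x256.
import random
from PIL import Image

def augment(img, n_angles=5):
    """Return n randomly rotated copies of the leaf image."""
    return [img.rotate(random.uniform(0, 360)) for _ in range(n_angles)]

def splice(leaves):
    """Stitch four 128x128 leaf images into one 256x256 plant image."""
    canvas = Image.new("RGB", (256, 256))
    for i, leaf in enumerate(leaves):
        canvas.paste(leaf.resize((128, 128)), ((i % 2) * 128, (i // 2) * 128))
    return canvas

leaf = Image.new("RGB", (128, 128), "green")  # stand-in for a real leaf image
rotated = augment(leaf)
plant = splice([leaf] * 4)
```

Applying these two functions to 624 originals gives 3744 augmented leaf images and 936 spliced plant images, consistent with the counts above.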

Data Preprocessing of Environmental Factors
The growth rate of plants is mainly determined by the relative thermal effectiveness (RTE) of temperature and by photosynthetically active radiation (PAR). The growth and development of netted muskmelon in the greenhouse is a dynamic process that changes with time. If planting days are used to predict the growth and development of the crop at a certain time node, the influence of temperature and light as environmental factors at the specific plant location cannot be ignored. In addition, meteorological and environmental data also affect the phenotype (such as the color) of plant leaves as the accumulated thermal effectiveness and PAR gradually increase under the cultivation conditions of this study. For the RGB color space, the blue value changed smoothly, and the red and green values both first increased, then decreased and then increased again, ranked as green > red > blue (Figure 8A). For the LAB color space, the green-magenta value hardly changed, the blue-yellow value showed a slight increase, decrease and renewed increase, and the lightness value fluctuated widely, ranked as blue-yellow > green-magenta > lightness (Figure 8B). For the HSV color space, the hue value showed almost no change, while the saturation and value channels both first increased, then decreased and then increased again, ranked as saturation > value > hue (Figure 8C). Therefore, this study used the light-temperature index radiant thermal product (TEP) instead of planting days as the time-series variable and combined it with the image data to predict the growth and development stage of greenhouse netted muskmelon and the plants' nitrogen concentrations.
We measured the cumulative radiant heat product of the plant at each sampling time [58] and annotated it into the corresponding canopy leaf images as the environmental input variable of the neural network.
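As a rough illustration only, accumulating a radiant-thermal product from the 5-min weather-station records could look like the sketch below. The actual RTE definition follows reference [58]; the `rte` function and the threshold temperatures here are crude placeholder assumptions:

```python
# Highly simplified sketch of accumulating a radiant-thermal product (TEP)
# from 5-minute temperature and PAR records. The RTE function is a crude
# placeholder, not the definition used in the paper.
import numpy as np

def rte(temp_c, t_min=10.0, t_opt=28.0):
    """Placeholder relative thermal effectiveness: 0 at/below t_min, 1 at t_opt."""
    return np.clip((temp_c - t_min) / (t_opt - t_min), 0.0, 1.0)

temps = np.array([18.0, 22.0, 30.0])    # degC, one value per 5-min interval
par = np.array([400.0, 650.0, 900.0])   # PAR per interval (placeholder units)
interval_h = 5 / 60                     # interval length in hours

# Cumulative radiant-thermal product up to each sampling time.
tep = np.cumsum(rte(temps) * par * interval_h)
```

The cumulative value at each sampling date would then be annotated onto the corresponding canopy-leaf images, as described above.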

Establishment of CNN Model
In deep learning, a convolutional neural network (CNN) is a class of feed-forward neural networks most commonly applied to analyze visual images. A CNN employs the convolution operation in place of general matrix multiplication in at least one of its layers. A CNN generally consists of an input layer, an output layer and multiple hidden layers, such as convolutional layers, pooling layers and fully connected layers. The pooling layers are connected with all the neurons of the convolutional layers [59].
We set two convolutional layers, two pooling layers and two fully connected layers in the CNN model, using LeNet as the backbone. In the convolutional layers, the kernel size was 5 and padding was set to "same". In the pooling layers, the pool size and strides were both (2, 2); rectified linear units (ReLU) were used as the activation function, Adam() as the optimizer, and the batch size was kept at 12 (codes in Supplementary Materials S2). The input and output volumes of every layer are presented in Figure 9. R² and MSE were used for model evaluation.
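The described architecture can be sketched in Keras as follows. The filter and dense-unit counts are assumptions (the text fixes only the layer counts, kernel size, padding, pooling, activation and optimizer), and the LeNet-style sizes used here are illustrative:

```python
# Keras sketch of the LeNet-style CNN: two conv layers (kernel 5, "same"
# padding), two (2, 2) max-pooling layers and two fully connected layers,
# trained as a regressor for nitrogen concentration. Filter counts assumed.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),   # spliced four-leaf plant image
    layers.Conv2D(6, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(1),                     # regression output: nitrogen concentration
])
model.compile(optimizer="adam", loss="mse")
```

Training would then use `model.fit` with a batch size of 12 on Dataset 2.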

Establishment of DCNN Model
Based on the CNN architecture, three convolutional layers, three pooling layers and three fully connected layers were used to build a deep convolutional neural network (DCNN) model. Different filters with different parameters were placed in the convolutional layers. After a series of convolution, pooling and activation operations in the network, the features of the input images were detected and learned. Feature maps of essential areas of the image from each of the middle layers are presented in Figure 10. As the network depth increased, the extracted features became more filtered and gave more precise feature parameters. The higher activation layers carried more targeted information: valuable information was enlarged and refined, while irrelevant information was filtered out (codes in Supplementary Materials S3). In the CNN, a too-deep network yielded lower prediction accuracy, excessive computation time and over-fitting, whereas the DCNN achieved higher prediction accuracy in less computation time without over-fitting. The DCNN parameters were otherwise set the same as in the CNN model.
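A Keras sketch of the deeper variant, keeping the CNN settings (kernel 5, "same" padding, ReLU, Adam) but with three convolutional, three pooling and three fully connected layers; the filter and unit counts are again illustrative assumptions:

```python
# Keras sketch of the DCNN: three conv layers, three pooling layers and
# three fully connected layers, otherwise the same settings as the CNN.
# Filter and dense-unit counts are assumed, not stated in the paper.
from tensorflow.keras import layers, models

dcnn = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(16, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                     # regression output: nitrogen concentration
])
dcnn.compile(optimizer="adam", loss="mse")
```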

Establishment of DCNN-LSTM Model
A recurrent neural network (RNN) is a class of neural networks that processes input data as a series. The architecture (Figure 11) is flexible and is used in speech recognition, machine translation, emotion analysis and picture description. The input can be a series of text, speech, time steps, etc., in which each element depends on previous elements. The time steps at the same state (input, output and hidden) share one weight matrix, which greatly reduces the number of parameters to be learned in the model.


Establishment of DCNN-LSTM Model
A recurrent neural network (RNN) is a class of neural networks that processes input data as a series. The architecture (Figure 11) is flexible and is used in speech recognition, machine translation, emotion analysis and picture description. The input can be a series of text, speech, time steps, etc., in which each element depends on the previous ones. In Figure 11, x(t) and y(t) are the input and output at each step, respectively; the right side of the figure shows the unrolled RNN architecture; and u, v and w are the weight matrices corresponding to the input, output and hidden states, respectively. All time steps of the same state (input, output and hidden) share one weight matrix, which greatly reduces the number of parameters to be learned in the model.
Long short-term memory (LSTM) is a variant of RNN that can learn long-term dependence and is the most widely used type of RNN. Compared with the traditional RNN, LSTM introduces controllable self-cycling, which makes it more suitable for processing and predicting important events with relatively long intervals or delays in a time series. The network also alleviates the gradient vanishing and gradient explosion problems caused by backpropagation through time during training [60]. A schematic view of LSTM is shown in Figure 12.
Figure 12. Schematic view of the long short-term memory (LSTM) model: c in the upper left represents the internal memory of the unit; h in the lower left represents the hidden state; i, f and o denote the input gate, forget gate and output gate, respectively; these three gates are calculated with the same equation using different parameter matrices and determine how x(t), h(t − 1) and the current data are passed to the next step; and g represents the internal hidden state.
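Written out, the gates in Figure 12 follow the standard LSTM formulation (σ is the logistic sigmoid, ⊙ denotes element-wise multiplication, and W, U and b are the per-gate weight matrices and biases):

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)}\\
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)}\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)}\\
g_t &= \tanh\!\left(W_g x_t + U_g h_{t-1} + b_g\right) && \text{(internal hidden state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t && \text{(internal memory)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The first three equations share the same form and differ only in their parameter matrices, which is exactly the property noted in the Figure 12 caption.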
A hybrid neural network model based on DCNN and LSTM was built (codes in Supplementary Materials S4) and is shown in Figure 13. The DCNN part kept the structure obtained by increasing the number of hidden layers of the CNN model. The LSTM part contained three layers, with stateful = False. To form the hybrid network, the two fully connected output layers of the DCNN and LSTM parts were combined, with ReLU and linear functions as the activation functions in the model. The leaf-image dataset was fed into the DCNN part and the TEP dataset into the LSTM part of the hybrid model, and the nitrogen concentrations of the muskmelon plants were then predicted. The adjusted R2 and MSE were used for model evaluation.
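The two-branch structure described above can be sketched as follows, again assuming a Keras-style API (the source's `stateful = False` is a Keras LSTM argument). The layer widths, the image size and the shape of the TEP input are illustrative assumptions, not the paper's exact settings:

```python
# Illustrative sketch of the DCNN-LSTM hybrid: an image branch (DCNN) and
# a TEP time-series branch (three stateless LSTM layers) each end in a
# fully connected layer; the two are concatenated and passed to a linear
# output unit that predicts nitrogen concentration. All sizes are
# placeholders (the study's codes are in Supplementary Materials S4).
from tensorflow import keras
from tensorflow.keras import layers

def build_dcnn_lstm(image_shape=(128, 128, 3), tep_steps=10, tep_features=2):
    # DCNN branch: canopy leaf images
    img_in = keras.Input(shape=image_shape, name="leaf_image")
    x = img_in
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)      # fully connected DCNN output

    # LSTM branch: TEP environmental series, three stacked stateless layers
    tep_in = keras.Input(shape=(tep_steps, tep_features), name="tep")
    y = layers.LSTM(32, return_sequences=True, stateful=False)(tep_in)
    y = layers.LSTM(32, return_sequences=True, stateful=False)(y)
    y = layers.LSTM(32, stateful=False)(y)
    y = layers.Dense(64, activation="relu")(y)      # fully connected LSTM output

    # Merge the two branches and regress nitrogen concentration
    merged = layers.concatenate([x, y])
    out = layers.Dense(1, activation="linear", name="nitrogen")(merged)
    return keras.Model([img_in, tep_in], out, name="dcnn_lstm")
```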

Evaluation of Models
The adjusted determination coefficient (R2) and the mean square error (MSE) between predicted and measured values were used for model evaluation. In general, a higher R2 and a lower MSE indicate a more accurate model.
The calculation formulas are as follows:

R2 = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)²,  MSE = (1/n) Σᵢ(yᵢ − ŷᵢ)²,

where yᵢ represents the measured value, ŷᵢ the predicted value, ȳ the mean of the measured values and n the sample number.
For all models (Figure 14B), the prediction accuracy improved to some extent as iterative training increased, but beyond a certain number of iterations the accuracy no longer increased significantly and even decreased, while more iterations required more calculation time. For the deep learning models, the model loss was very high in the initial iterations. As the number of training iterations increased, the training-set loss dropped sharply at first and then leveled off. The test-set loss also decreased gradually before flattening out, following a trend similar to that of the training set; in general, the test set showed slightly higher loss than the training set.
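As a check, the two metrics can be computed with a few lines of plain Python. Note that this sketch computes the plain R²; the adjusted R² reported in the paper additionally corrects for the number of predictors.

```python
def mse(y_true, y_pred):
    """Mean square error between measured and predicted values."""
    n = len(y_true)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives MSE = 0 and R² = 1; predictions no better than the mean of the measured values give R² ≈ 0.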


Discussion
In the present study, we collected digital images of canopy leaves and meteorological data during the whole growth period of muskmelon in the greenhouse. Using these data, plant nitrogen nutrition diagnosis models were built based on machine learning or deep learning. The first model, BPNN (R2 = 0.567, MSE = 0.429), was constructed by adopting machine-vision technology to extract and process the phenotypic features of leaf images. A CNN nitrogen nutrition diagnosis model (R2 = 0.376, MSE = 0.628) was then constructed; for this model, the original leaf images were preprocessed and fed directly into the network. By increasing the depth of the CNN, we built a DCNN (R2 = 0.686, MSE = 0.355) for nitrogen nutrition diagnosis. Finally, based on the DCNN, a hybrid model, DCNN-LSTM, was constructed, which achieved R2 = 0.904 and MSE = 0.123. For this model, TEP was used as the time variable instead of a raw time series.
Many emerging technologies have been applied to crop nitrogen nutrition diagnosis [8]. Based on spectral information or digital images, the nitrogen nutrition status of rice [61,62], wheat [17,63] and corn [64] has been predicted. These studies statistically analyze the relationships among reflectance spectra, phenotypes, plant growth and physiological characteristics [15,65], simplifying the deduction process and improving calculation efficiency and accuracy by using numerical optimization algorithms, PCA, neural networks, etc. The same idea was adopted for model construction in this study. The distribution of nitrogen at different canopy heights is not uniform [66], and the correlations between nitrogen concentration and spectral and fluorescence characteristics also differ across vertical heights [67]. Hu et al. [49] reported that the SPAD values of the three apical leaves of melon showed the highest correlation with leaf nitrogen content and were suitable for nitrogen nutrition diagnosis, indicating that it is feasible to predict the nitrogen content of the whole plant from the canopy leaves. Padilla et al. [35] predicted the nitrogen nutrition index (NNI) of muskmelon from the canopy reflectance characteristics measured by an optical sensor, and also determined the flavonol and chlorophyll contents of the leaves to evaluate nitrogen status. NNI is the ratio of the actual nitrogen concentration in the above-ground part of the crop to the critical nitrogen concentration at the corresponding biomass, and it is one of the basic methods for judging a crop's nitrogen surplus or deficit [52,68]. Measuring the actual nitrogen concentration in the crop is the premise of calculating NNI, whereas the research above could predict the nitrogen concentration in plants directly.
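The NNI calculation described above can be illustrated with a short sketch. The critical concentration is usually taken from a critical nitrogen dilution curve Nc = a·W⁻ᵇ; the parameter values a and b below are placeholders for illustration, not values fitted in this study.

```python
# Hypothetical NNI illustration: ratio of measured shoot N concentration
# to the critical N concentration for the current biomass.
def critical_n(biomass_t_ha, a=4.4, b=0.33):
    """Critical N concentration (%) from a dilution curve Nc = a * W**-b.
    a and b are placeholder parameters, not values from this study."""
    return a * biomass_t_ha ** -b

def nni(measured_n_pct, biomass_t_ha):
    """NNI > 1 suggests nitrogen surplus; NNI < 1 suggests deficiency."""
    return measured_n_pct / critical_n(biomass_t_ha)
```

For example, a crop at 1 t/ha biomass whose measured shoot nitrogen equals the critical value gives NNI = 1, i.e. optimal nitrogen status under these placeholder parameters.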
Compared with traditional machine-learning-based models, the virtue of deep-learning-based models is that they avoid manual handcrafting of features and the problem of inconsistent criteria in parameter selection [69]. Machine-learning-based models are a reliable technology for selecting parameters and reducing dimensions, thereby decreasing the number of neural network nodes; nevertheless, this reduces the input-data information and in turn lowers the accuracy of the predicted values. Deep-learning-based models overcome this disadvantage by feeding the original image information directly into the model; in this way, the adequate original information improves the accuracy of the model output.
Deep learning approaches are booming in the plant community, which demonstrates their great potential in agriculture. They have been widely used in species identification [70], pest detection [71] and yield prediction [72] of horticultural crops. CNN is the most commonly used deep-learning-based technology, in which the plant features extracted by the deep neural network are better than artificially designed ones; this is confirmed by the better performance of the deep learning models. In terms of prediction accuracy, the hybrid model DCNN-LSTM was the best among the four models in our study, followed by DCNN, BPNN and CNN. BPNN is a machine-learning-based model, yet it showed higher prediction accuracy than the deep-learning-based CNN model. This indicates that machine-learning-based models are not necessarily inferior to deep-learning-based models in prediction accuracy: if a machine learning approach is combined with proper parameters, trained with adequate data and loses little information, high prediction accuracy can be obtained. Deep learning techniques, however, are less costly and more efficient, producing output in less time than machine learning pipelines with hand-crafted parameters.
Similarly, DCNN-LSTM was the best deep-learning-based model, followed by DCNN, with CNN at the bottom. DCNN-LSTM was the most reliable and applicable of the three models and showed the highest prediction accuracy by combining real-time leaf images with environmental factors; the model can be improved and applied in other fields of agriculture. Previously, Schmidhuber [19] combined CNN and LSTM to predict soybean yield, with histograms of whole images as the input dataset. Ghazaryan et al. [73] estimated crop yield by using multi-source satellite image series and deep learning, with the CNN-LSTM model giving the most accurate results. Namin et al. [30] improved plant classification accuracy by feeding time series of digital images of various Arabidopsis genotypes into a CNN-LSTM model. Haryono et al. [74] used CNN-LSTM methods to identify and authenticate herbal leaves with an accuracy of 94.96%. Baek et al. [75] combined CNN and LSTM networks to simulate water quality, including total nitrogen, total phosphorus and total organic carbon, and concluded that the proposed CNN-LSTM approach could accurately simulate water level and water quality. Sun et al. [76] used a deep CNN-LSTM model to predict soybean yield at the county level; the results indicated that the proposed model outperformed pure CNN or LSTM models in both end-of-season and in-season prediction. Recent experiments in this area suggest that CNN can explore more phenotype features while LSTM can reveal phenotypic dynamics over time, so deep CNN and LSTM both play an important role in crop nitrogen prediction. Accumulated environmental data can be used to study the relationship between phenotype changes and nitrogen concentration during the crop growth process.
In our study, TEP values were used in place of the time series, and the prediction accuracy of the LSTM part was thereby improved, indicating that TEP values are a good substitute for a raw time series. Thus, the constructed nitrogen nutrition diagnosis models provide a timely and accurate way of predicting nitrogen status for nitrogen nutrition management in muskmelon production.
However, this study had some limitations: the image data in the experiment were not adequate. In the future, increasing the sample size, the number of melon varieties and the range of cultivation environments could help to establish a more applicable, reliable and stable model.

Conclusions
In conclusion, this study provides knowledge for the diagnosis of nitrogen nutrition in greenhouse muskmelon by using machine-learning-based and deep-learning-based models. A hybrid model, DCNN-LSTM, which combines real-time digital images with meteorological factors, shows the highest accuracy (R2 = 0.904, MSE = 0.123) in predicting plant nitrogen concentration in greenhouse muskmelon production. These findings indicate the great potential of deep learning technology in crop nutrition diagnosis and provide a technique and reference for real-time, convenient, accurate and nondestructive nitrogen nutrition diagnosis in greenhouse muskmelon production. The study lays a foundation for the intelligent monitoring of nitrogen nutrition in plants.