Predicting Plant Growth from Time-Series Data Using Deep Learning

Phenotyping involves the quantitative assessment of the anatomical, biochemical, and physiological plant traits. Natural plant growth cycles can be extremely slow, hindering the experimental processes of phenotyping. Deep learning offers a great deal of support for automating and addressing key plant phenotyping research issues. Machine learning-based high-throughput phenotyping is a potential solution to the phenotyping bottleneck, promising to accelerate the experimental cycles within phenomic research. This research presents a study of the potential of deep networks to predict the expected growth of plants by generating segmentation masks of root and shoot systems into the future. We adapt an existing generative adversarial predictive network into this new domain. The results show an efficient plant leaf and root segmentation network that provides predictive segmentation of what a leaf and root system will look like at a future time, based on time-series data of plant growth. We present benchmark results on two public datasets of Arabidopsis (A. thaliana) and Brassica rapa (Komatsuna) plants. The experimental results show strong performance and the ability of the proposed method to match expert annotation. The proposed method is highly adaptable and can be trained (via transfer learning/domain adaptation) on different plant species and mutations.


Introduction
Plant phenotyping is defined by Li et al. [1] as the assessment of complex plant traits such as growth, resistance, architecture, physiology, and ecology, and the essential measurement of individual quantitative parameters. Historically, plant traits have been measured manually in phenotyping research. This limits throughput and restricts comprehensive analysis, a problem referred to as the phenotyping bottleneck [2]. Image-based phenotyping has been proposed as a solution to this bottleneck, as it has shown great potential in increasing the scale, throughput, efficiency, and speed of phenomic research. It is now commonly argued that deep learning methods for image segmentation, feature extraction, and data analysis are key to progress in image-based high-throughput plant phenotyping [3].
Traditional image processing and simulation methods have been widely used in plant phenotyping, and more recent machine learning-based systems have shown high accuracy. Computer vision and machine learning methods have performed particularly well in static plant analysis, demonstrating that they are able to learn complicated plant growth patterns [4,5]. Recently, several deep learning-based approaches have appeared that measure different plant traits efficiently and power genetic discovery [3,6,7]. These approaches have shown improved performance compared with traditional image-based phenotyping approaches, and biologists increasingly rely on them to capture complex features and structures of plants both above and below the ground. Machine learning methods for classification and segmentation tasks, particularly Convolutional Neural Networks (CNNs), are gaining popularity. CNNs have been used in a variety of plant phenotyping tasks, from segmentation and classification [8] to disease detection [9,10], and from plant stress assessment [11] to plant productivity analysis [5].
The success of machine learning methods on static image analysis has yet to be fully explored on time-series data. Temporal analysis of plant growth has traditionally been performed through modeling [12]. Several tools and approaches exist for modeling and simulating plant growth [13][14][15]. Most of these tools are based on deterministic factors to map plant growth and offer deterministic 2D/3D plant simulations [16,17]. Predicting plant growth is essential for greenhouse growers, biologists, and plant scientists to analyze future growth patterns [18]. It helps to understand how a specific plant species behaves under different environmental conditions and under biotic and abiotic stresses. Many static analysis systems can be extended to temporal data by simply running them on a per-frame basis. This may be effective, but the slow growth rate of plants will still limit experimental cycles, which for many species may last months or years.
Predicting plant growth into the future holds the potential to speed up experimental plant cycles by predicting phenotypic traits prior to their measurement, allowing experiments to finish sooner and more efficiently.
Unconventional tasks such as mapping and predicting plant growth are challenging [19]. Generative Adversarial Networks (GANs) proposed by Goodfellow et al. [20] are one of the emerging machine learning methods that may offer a solution. GAN-based approaches have been used to generate realistic images [21,22], and to forecast future frames [23] in other domains.
Here, we develop a system to capture plant traits from images of shoots and roots while also predicting future plant traits based on current growth patterns. Such a system has the potential to measure plant systems accurately while also speeding up the experimental cycle by shortening the time required to grow and measure plants. We adapt and train a FutureGAN network [24] to learn past plant growth trends and forecast future plant growth patterns based on those trends. We apply our approach to Komatsuna plant leaves, using the leaves' annotated segmentation masks to train the GAN. We then perform an additional study on an Arabidopsis thaliana root dataset. Image noise is a key concern when predicting real plant growth into the future, and real images require processing to derive phenotypic information. We therefore focus the proposed method on predicting segmentation masks, which are more closely aligned with real annotations and can be used to derive standard phenotypic measures common in the field. We quantitatively evaluate whether the resulting frames are sufficiently accurate, finding a high degree of consistency and similarity between the predicted leaf frames and the ground truth, with an average structural similarity index measure (SSIM) above 94.60%. This work is designed as a natural extension of RootNav 2.0 [6] to temporal sequences, incorporating a GAN that maps plant growth patterns and forecasts plant growth. The proposed system produces plausible plant development predictions, in principle allowing experiments like this to conclude early and speeding up the experimental cycle.
The key contributions in this work can be described as follows.
• Novel Dataset and Innovative Preprocessing: Most studies in remote sensing-based plant phenotyping focus on only a few available plant datasets. We acquired the Arabidopsis dataset and used the latest machine learning software to annotate it. The annotation was fully automatic and produces image- and XML-based annotations that are interoperable between different software tools. In this way, the recorded plant data (especially roots) can be used for future analysis and experiments.
• Innovative Deep Learning Models: The field of remote sensing constantly innovates through the development and application of new machine learning models. GANs are among the least explored machine learning methods for plant phenotyping. The key objective of this research is to showcase the strength and versatility of GANs and to use them to enhance phenotyping productivity. The higher-resolution output produced by the adopted and improved progressively growing GAN architecture is also one of the key contributions of the proposed method.
• Comprehensive Results: The proposed system is designed to incorporate and use both spatial and temporal plant data to forecast growth. It offers accurate and efficient predictive segmentation of plant data (root/leaf). Accurate prediction of future plant growth could substantially reduce the time required to conduct growth experiments, with plants requiring less growing time and new experimental cycles beginning sooner.
• Generalization and Reproducibility: The proposed system (written in Python with PyTorch) is freely available on GitHub. It is a robust machine learning-based system that may be reapplied to any dataset with only minor modifications. We demonstrate this by applying the system to two very different datasets, of plant shoots and roots respectively.

Background
The growing demand for high-yielding crops and biofuel feedstock has been accelerated by the severe impact of an increasing population and changing climate. This presents a global threat to food security and agricultural productivity [25]. According to the Food and Agriculture Organization (FAO), an essential component in meeting this demand is high-throughput phenotyping. Phenotyping involves the quantitative assessment of the anatomical, biochemical, and physiological plant traits [26]. Emerging technologies within plant phenomics are being designed to expedite the experimental procedures necessary to examine and interpret plant functions and their environmental interactions. The recent integration of automated methods, together with advances in data acquisition technology and data quality, allows researchers to improve phenotyping capabilities in more resource-conserving ways. In particular, deep learning-based methodologies have demonstrated high precision and accuracy in phenomic research [27][28][29], continually improving upon the state of the art. Deep learning refers to a group of statistical machine learning techniques used to learn feature hierarchies [30]; such methods show outstanding potential for noninvasive studies such as image-based phenotyping.
Deep learning-based approaches have proved easily adaptable to changing environments and have improved the robustness of image-based phenotyping [31]. Researchers have demonstrated that deep learning-based methods can efficiently capture complex features of plants both above and below the ground [6,32,33]. This is helping to speed up experimental cycles, which is essential to optimize plant productivity and, in turn, improve food security and crop resistance to environmental stresses. High-quality above- and below-ground image data are limited, so it is important to use in vitro resources to optimize the technologies before applying them in the field [7]. A number of emerging works have explored the use of both spatial and temporal features of plant growth to produce more useful and informative phenotypic information. Namin et al. [34] used a multi-model CNN with LSTMs to predict plant phenotypes and genotypes. This approach offers accession classification, which is useful in automating plant production and care. Sakurai et al. [35] used LSTMs with an encoder-decoder model to forecast the growth of plant leaves; however, this approach has not been applied widely across multiple datasets. Another machine learning application was presented by Giuffrida et al. [22], in which GANs are used to alleviate the lack of training data (a common problem in biological research) in plant phenotyping. Their method generates synthetic Arabidopsis plants that can be used to train CNNs and improve detection and segmentation quality. Within plant phenotyping, the most common use of GANs to date has been data augmentation, growing the size of datasets using artificially generated or transformed images. The synthetic images generated by GANs have proved more effective than traditional data augmentation methods and transformations (e.g., rotation, scale, and translation).
The diversity and strength of GANs are evident in [36,37], where innovative methods were used to generate synthetic plants, as well as in unsupervised image translation to improve plant disease recognition [38] and in domain adaptation that transfers knowledge learned from annotated plants to other species or imaging modalities [39].
Inspired by these successful applications of GANs, we use a multi-model architecture to generate synthetic future frames of plants by learning spatial and temporal feature maps. Such a system has the potential to benefit plant biologists by providing key phenotypic information as quickly as possible. Faster experimentation will only become more necessary in the future, as high-throughput phenotyping systems are designed to keep pace with modern genotyping technologies.

Dataset and Preprocessing
Experiments were conducted on two datasets of varying complexity: the Brassica rapa plant leaf dataset [40] and the Arabidopsis thaliana root dataset [41]. Each dataset was split into training and test sets in an 80%/20% ratio.
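The training/test counts reported for each dataset (for example, four training plants and one test plant for Komatsuna) suggest that the split is made over whole plants rather than individual images, which keeps every frame of a growth sequence in one subset. The following is a minimal Python sketch of such a plant-level split; the function name `split_by_plant`, the shuffling and the seed are illustrative assumptions, not the authors' code.

```python
import random

def split_by_plant(plant_ids, train_fraction=0.8, seed=0):
    """Split plant (or plate) identifiers into train and test groups.

    Splitting whole plants, not single images, keeps all frames of a growth
    sequence together, so test sequences are never seen during training.
    """
    ids = sorted(plant_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Five Komatsuna plants -> four for training, one for testing.
train_ids, test_ids = split_by_plant([f"plant_{i}" for i in range(5)])
print(len(train_ids), len(test_ids))  # 4 1
```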

Brassica rapa Var. Perviridis (Komatsuna)
Brassica rapa (Komatsuna) is a Japanese mustard spinach leaf vegetable. We used the publicly available RGB-D dataset of Komatsuna [40]. The plants were grown by Uchiyama et al. [40] using a commercial hydroponic culture toolkit. Seeds were sown in urethane foam cubes, which were later placed into the toolkit holes. Growth took place in a controlled environment in which conditions were kept constant for all plants.
The lighting was set to approximately 2400 lux, the temperature to 28 °C, and the humidity to approximately 30%. Lighting was kept on for 24 h a day to accelerate plant growth. Images of five plants were captured at 640 × 480 pixel resolution at four-hour intervals for ten days using an Intel RealSense SR300 camera. Ground-truth leaf regions were annotated manually using tools proposed by Minervini et al. [42]. The Komatsuna leaf segmentation masks were made available in RGB-D colour channel format (Figure 1). The training dataset contains four plants (480 images), while the test dataset is based on one plant (120 images).

Arabidopsis thaliana
The root dataset was derived from seeds of Arabidopsis (A. thaliana). According to Wilson et al. [41], the seeds were surface-sterilized by incubation in 5% (v/v) sodium hypochlorite for 5 min, washed three times in sterile water, and sown on vertical 125 × 125 mm square Petri plates. The growth medium was 60 mL of ½-strength Murashige and Skoog medium (Sigma) solidified with 1% (w/v) agar. After two days at 4 °C, the Petri plates were transferred to controlled-environment chambers with a constant temperature of 23 °C and a continuous photon flux density of 150 µmol m⁻² s⁻¹. Images of the individual root systems were acquired using the near-infrared imaging method proposed by Wells et al. [43], in which each plate was imaged at regular intervals over several days. The plants were grown for seven days in a controlled environment equipped with the imaging equipment. Each plate contains five plants, rotated by 90° and imaged every 30 min by the automated image acquisition system. Machine vision cameras (Stingray F-504B, Allied Vision Technologies GmbH) were used to acquire images of the vertically oriented plates. The dataset comprises 58 plates, each containing five plants. Different genetic accessions of Arabidopsis thaliana, such as Col-0, Ler, and Cvi, were grown; however, in this work we focused on the general task of segmentation rather than accession identification. This dataset does not include segmentation masks, so we produced initial masks using RootNav 2.0 [6] (further details are given in the next section). Figure 2 shows an example of the raw and processed images from the datasets, displaying the feature masks used in the study. The training dataset contains 47 plates (2502 images), while the test dataset is based on 11 plates (694 images).

Preprocessing
The proposed architecture is trained on the ground-truth feature maps of a given dataset and predicts future growth patterns. Dataset annotation is often time-consuming in biological research because of the complex and occluded nature of plants. If annotations are available, we can proceed directly to GAN training. For example, the Komatsuna dataset comes with well-defined segmentation masks, so no additional preprocessing is required. The Arabidopsis dataset, however, did not include annotated segmentation masks, so we used RootNav 2.0 [6] for this task. The software uses a deep neural network to segment root features with high accuracy, and the segmentation also distinguishes between first- and second-order roots. RootNav 2.0 exports the identified roots as segmentation masks and the root topology as a series of polylines stored in the Root System Markup Language (RSML) format [44]. RSML files provide an easily interpretable way of using root system architecture (RSA) data with different modeling and data analysis tools. Incorporating this format is useful because it can be applied universally across root phenotyping studies and their repositories, and it can preserve the time-series information across image sequences, a critical factor in forecasting growth from prior data. Finally, RootNav 2.0 outputs a color map containing the key recovered root material. An example of the output is shown in Figure 2, depicting the raw input image and the segmentation mask outputs ready to be used to train the GAN.
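To illustrate how root topology stored as RSML polylines can be consumed downstream, here is a minimal Python sketch using only the standard library. It assumes the common RSML layout of nested `<root>` elements, each holding `<geometry><polyline><point x=… y=…/>`, with lateral roots nested inside their parent; the document below is hand-written with made-up coordinates, not real RootNav 2.0 output, and the helper names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written minimal RSML fragment (hypothetical values, not RootNav 2.0 output).
RSML = """<rsml>
  <scene>
    <plant>
      <root ID="r1">
        <geometry><polyline>
          <point x="0" y="0"/><point x="0" y="30"/><point x="0" y="70"/>
        </polyline></geometry>
        <root ID="r1.1">
          <geometry><polyline>
            <point x="0" y="30"/><point x="30" y="70"/>
          </polyline></geometry>
        </root>
      </root>
    </plant>
  </scene>
</rsml>"""

def polyline_length(root_el):
    """Sum of Euclidean segment lengths along one root's own polyline."""
    pts = [(float(p.get("x")), float(p.get("y")))
           for p in root_el.findall("./geometry/polyline/point")]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def root_lengths(xml_text):
    """Return {root ID: (order, length)}; order 1 = root attached to the plant."""
    result = {}

    def visit(el, order):
        for child in el.findall("root"):          # direct child roots only
            result[child.get("ID")] = (order, polyline_length(child))
            visit(child, order + 1)               # laterals are one order higher

    for plant in ET.fromstring(xml_text).iter("plant"):
        visit(plant, 1)
    return result

print(root_lengths(RSML))  # {'r1': (1, 70.0), 'r1.1': (2, 50.0)}
```

Because parent-child nesting is explicit, root order falls out of the tree depth, which is why the format is convenient for recovering first- and second-order roots.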

Network Design and Implementation
Proposed Architecture
We use a Generative Adversarial Network (GAN) to forecast future image frames. GANs are typically built from convolutional neural networks and consist of two networks that learn together: the generator and the discriminator. The two networks compete against each other (hence "adversarial") with the aim of producing new (synthetic) examples of data that can pass as real data. The model is trained in an unsupervised, adversarial setting: the generator produces data samples based on the real training data, and the discriminator attempts to classify each sample as real (belonging to the real set) or fake (generated) by emitting a probability value [45]. Training follows the minimax objective

$$\min_G \max_D \; \mathbb{E}_{x \sim P_r}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$

where x is a real data sample, z is the generator input, and P_r and P_z denote their respective distributions. In this work, the generator learns the plant's growth patterns (leaf and root) and predicts them by producing synthetic future frames, thereby forecasting the plant's future growth. The proposed approach is inspired by FutureGAN [24], which itself builds on the progressively growing GAN (PGGAN) [46] and addresses stability issues commonly encountered during GAN training [47]. The generator is trained to produce high-definition forecast frames, while the discriminator is trained to distinguish real from synthetic (forecast) frames; this pressure drives the generator to reduce the differences between the two. The proposed GAN model is developed using the PyTorch [48] framework (Figure 3). The discriminator resembles the encoder of the generator, without the feature vector normalization layers. A mini-batch standard deviation layer is inserted in the final layers to increase variation. The final fully connected layer, followed by an activation function, outputs a score indicating how likely the input is to come from the real data distribution.
According to this score, the generator updates its weights, eventually producing a sequence of plausible future frames that the discriminator can no longer distinguish from ground-truth frames. The LeakyReLU [49] activation function is used at each layer because it is an effective nonlinearity that helps address vanishing gradients and neuron deactivation. Each layer also incorporates a weight scaling element, which stabilizes training by equalizing the learning speed of all weight parameters. Each resolution stage of the generator consists of 3D convolution and pixel-wise normalization layers, and transposed convolution (deconvolution) layers are added throughout the middle block and discriminator. Transposed convolution is a useful upsampling tool that takes a feature map as input and restores the output to the dimensions of the original image. The model architecture is summarized in Table 1, which describes the layer types and the downsampling and upsampling stages at each resolution.

Table 1. The complete GAN architecture; layer types are described for each resolution stage from input to output. "Conv" refers to equalized convolution layers, "Conv Trans" refers to transposed convolution layers, and pixel-wise normalization is also implemented at every layer.

Stage | Layer | Activation | Output Dimensions | Kernel Size | Stride | Padding
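The per-layer building blocks named in the architecture description (LeakyReLU, weight scaling for an equalized learning rate, pixel-wise normalization, and the discriminator's mini-batch standard deviation layer) can be sketched with NumPy as below. This is an illustrative re-implementation of the standard PGGAN-style operations, not the authors' PyTorch code; the LeakyReLU slope of 0.2, the epsilon values and the `(out, in, kD, kH, kW)` weight layout (as used by 3D convolutions in PyTorch) are assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Negative inputs keep a small slope, so their gradient never becomes exactly zero.
    return np.where(x > 0, x, slope * x)

def pixel_norm(x, eps=1e-8):
    # Pixel-wise feature normalization over the channel axis (axis 1):
    # every spatio-temporal position ends up with unit mean squared activation.
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)

def equalized_scale(weight_shape):
    # Equalized learning rate: weights are stored as N(0, 1) samples and multiplied
    # at run time by He's constant sqrt(2 / fan_in), so all layers learn at a similar speed.
    fan_in = int(np.prod(weight_shape[1:]))  # in_channels * kernel volume
    return np.sqrt(2.0 / fan_in)

def minibatch_stddev(x, eps=1e-8):
    # Standard deviation of every feature across the batch, averaged to one scalar and
    # appended as an extra constant channel (with a batch of one it is ~0).
    mean_std = np.sqrt(np.var(x, axis=0) + eps).mean()
    extra = np.full((x.shape[0], 1) + x.shape[2:], mean_std)
    return np.concatenate([x, extra], axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 8, 6, 4, 4))   # (batch, channels, frames, height, width)
out = minibatch_stddev(pixel_norm(leaky_relu(feats)))
print(out.shape)                           # (2, 9, 6, 4, 4)
```

In a PyTorch model these would typically be small `nn.Module` wrappers around the same arithmetic, with the equalized scale applied to the weights inside each convolution's forward pass.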

Experimental Design
This GAN module is trained to predict six realistic future growth frames of plants (x_{t+1}, …, x_{t+6}) at once from six past frames (x_{t−5}, …, x_t). Segmentation masks are used in place of the raw images in order to generate reliable and readily quantifiable output. The procedure is summarized in Figure 4. Experiments were carried out using the Adam optimizer with β1 = 0.0 and β2 = 0.99, a batch size of 1, and a learning rate of 0.001. The GANs were trained on the plant segmentation masks for 500 epochs. Training follows the procedure described by Karras et al. [46], in which the generator is grown progressively: training starts with low-resolution images, and layers are gradually added to the network to increase the image resolution. This scheme shortens training time and improves the generator's stability [46,50]. The generator and discriminator are initialized to take 4 × 4 input frames, and new layers are gradually added to both networks, each doubling the generated frame resolution. Each layer initially has 512 feature maps, and the number of feature maps is halved for new layers from the 64 × 64 stage onwards (Table 1). Layers operating on the higher-resolution frames act as a residual module to overcome the vanishing gradient problem [51]. During each resolution transition, frames are interpolated to match the resolution of the current networks.
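Training on "six past frames, six future frames" requires cutting each plant's time-ordered mask sequence into (input, target) pairs. The Python sketch below shows one way to do this with a sliding window; the stride of 1 (overlapping windows) and the function name `make_windows` are assumptions, since the sampling scheme is not specified here. The same helper covers the root experiments by setting `t_out=3`.

```python
def make_windows(frames, t_in=6, t_out=6, stride=1):
    """Build (past, future) training pairs from one plant's time-ordered frames.

    Each pair holds t_in consecutive input frames x_{t-t_in+1}, ..., x_t and
    the t_out frames that follow them, x_{t+1}, ..., x_{t+t_out}.
    """
    pairs = []
    for start in range(0, len(frames) - t_in - t_out + 1, stride):
        past = frames[start:start + t_in]
        future = frames[start + t_in:start + t_in + t_out]
        pairs.append((past, future))
    return pairs

seq = list(range(20))                 # stand-ins for 20 segmentation masks
pairs = make_windows(seq)             # leaf setting: 6 in, 6 out
print(len(pairs), pairs[0])           # 9 ([0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11])
print(len(make_windows(seq, t_out=3)))  # 12 (root setting: 6 in, 3 out)
```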

Evaluation
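During training and testing, the loss was monitored to ensure that no overfitting or underfitting occurred. Loss is expressed as a negative log-likelihood, i.e., a summation of error over the training, validation, or test set, and the aim is to minimize the loss function with respect to the parameter values. The Wasserstein GAN with gradient penalty (WGAN-GP) loss function [52] was used to optimize both the generator and the discriminator:

$$L_G = -\mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right] \tag{2}$$

$$L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim P_r}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] + \epsilon\, \mathbb{E}_{x \sim P_r}\left[D(x)^2\right] \tag{3}$$

where L_G and L_D denote the generator and discriminator loss functions, respectively. The real data distribution is denoted by P_r and the distribution of generated sequences by P_g; x̂ is sampled uniformly along straight lines between real and generated sequences; λ is the gradient penalty coefficient [24]; and ε is the epsilon penalty coefficient, added to prevent the discriminator output from drifting [24]. The generator output sequence x̃ is defined as x̃ = G(z) = (x̃_{t+1}, …, x̃_{t+t_out}), and the discriminator input x is defined as x = (x_{t−t_in+1}, …, x_{t+t_out}).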
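The WGAN-GP objective with an epsilon drift term can be made concrete with a small NumPy sketch. To avoid needing automatic differentiation, it uses a hypothetical toy critic D(x) = tanh(x · w), whose input gradient has the closed form (1 − tanh²(x · w)) w; the real discriminator is a convolutional network. The defaults λ = 10 and ε = 0.001 are the values commonly used with WGAN-GP and PGGAN and are assumed here, not taken from this work.

```python
import numpy as np

def critic(x, w):
    # Toy critic D(x) = tanh(x . w); stands in for the real convolutional discriminator.
    return np.tanh(x @ w)

def critic_grad(x, w):
    # Analytic input gradient of the toy critic: (1 - tanh(x . w)^2) * w, one row per sample.
    return (1.0 - np.tanh(x @ w) ** 2)[:, None] * w[None, :]

def wgan_gp_losses(real, fake, w, lam=10.0, eps_drift=1e-3, rng=None):
    """Return (L_D, L_G) for flattened real/generated sequences of shape (batch, features)."""
    rng = np.random.default_rng() if rng is None else rng
    d_real, d_fake = critic(real, w), critic(fake, w)
    # x_hat: random points on straight lines between real and generated samples.
    alpha = rng.uniform(size=(real.shape[0], 1))
    x_hat = alpha * real + (1.0 - alpha) * fake
    grad_norm = np.linalg.norm(critic_grad(x_hat, w), axis=1)
    gradient_penalty = lam * np.mean((grad_norm - 1.0) ** 2)
    drift_penalty = eps_drift * np.mean(d_real ** 2)   # keeps critic scores near zero
    loss_d = np.mean(d_fake) - np.mean(d_real) + gradient_penalty + drift_penalty
    loss_g = -np.mean(d_fake)
    return loss_d, loss_g

rng = np.random.default_rng(0)
real, fake = rng.normal(size=(4, 12)), rng.normal(size=(4, 12))
loss_d, loss_g = wgan_gp_losses(real, fake, rng.normal(size=12), rng=rng)
```

In a PyTorch implementation, the gradient at x̂ would instead come from `torch.autograd.grad(..., create_graph=True)` so that the penalty itself can be backpropagated.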
The mean Intersection-over-Union (mIoU) is used to evaluate segmentation performance. IoU measures the overlap between the predicted segmentation mask and the ground-truth image: the area of overlap (pixels assigned to the same class in both images) divided by the area of their union. This proportion is determined for each class, the primary and lateral root classes, and averaged to obtain the mIoU (higher is better). The metric quantifies semantic segmentation performance and reveals misclassifications relative to the ground-truth data. Training and test accuracy are recorded after the learning procedure to reveal the error rate with respect to the true targets.
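A minimal NumPy sketch of per-class IoU and mIoU on label maps follows. The label encoding (0 = background, 1 = primary root, 2 = lateral root) and the choice to skip classes absent from both masks are illustrative assumptions.

```python
import numpy as np

def per_class_iou(pred, target, classes):
    """IoU = |pred ∩ target| / |pred ∪ target|, computed separately for each class."""
    ious = {}
    for c in classes:
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union:                      # a class absent from both masks is skipped
            ious[c] = np.logical_and(p, t).sum() / union
    return ious

def mean_iou(pred, target, classes):
    ious = per_class_iou(pred, target, classes)
    return sum(ious.values()) / len(ious)

# Hypothetical 3x3 label maps: 0 = background, 1 = primary root, 2 = lateral root.
target = np.array([[0, 1, 1],
                   [0, 2, 2],
                   [0, 0, 0]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2],
                 [0, 0, 2]])
print(round(mean_iou(pred, target, [0, 1, 2]), 3))  # 0.611 (= (4/6 + 1/2 + 2/3) / 3)
```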
We used two further evaluation metrics to assess the proposed system's performance: the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [53]. PSNR is a popular image quality metric derived from the mean squared error (MSE), which makes it easy to understand and calculate. MSE is the average squared difference between the predicted images and their ground-truth images; the smaller the MSE, the better the prediction. PSNR is sensitive to noise degradation in the image, and the higher the PSNR, the better the quality of the predicted image. To corroborate these results, we also calculated SSIM, which is based on an image distortion model that correlates with the human visual system. SSIM values between the original and predicted images range from 0 to 1, where 1 indicates a perfect match; values from 0.90 to 0.99 are usually considered good-quality predictions. The MSE (Equation (4)), the PSNR (Equation (5)) and the SSIM (Equation (6)) are computed as:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[f(i,j) - g(i,j)\right]^2 \tag{4}$$

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \tag{5}$$

$$\mathrm{SSIM}(f,g) = \frac{(2\mu_f \mu_g + c_1)(2\sigma_{fg} + c_2)}{(\mu_f^2 + \mu_g^2 + c_1)(\sigma_f^2 + \sigma_g^2 + c_2)} \tag{6}$$

In Equation (4), the squared difference between each pixel of the two images, f(i, j) (ground truth) and g(i, j) (generated), is summed and divided by the total number of pixels mn of the m × n image. In Equation (5), MAX is the maximum possible pixel value (255 for 8-bit images). By measuring the differences in pixel values, we can quantify how closely the generated images resemble their ground-truth counterparts. In Equation (6), µ_f and µ_g are the averages of f and g, σ_f² and σ_g² their variances, σ_fg their covariance, and c_1 and c_2 small constants that stabilize the division. The FutureGAN model's leaf growth predictions were also evaluated by measuring the percentage change in leaf area between the first ground-truth image and the latest generated frame, as a proxy for the rate of biomass increase. We calculate biomass growth using a Hausdorff distance [54] style algorithm, computing the difference between the initial input images and the final forecast images.
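Equations (4) to (6) translate directly into NumPy. In the sketch below, `ssim_global` is a simplification that evaluates Equation (6) once over the whole image; standard SSIM implementations (for example scikit-image's `structural_similarity`) instead average it over local Gaussian windows, so values will differ. The constants c1 = (0.01 MAX)² and c2 = (0.03 MAX)² follow the usual SSIM convention and are assumed here.

```python
import numpy as np

def mse(f, g):
    # Equation (4): mean squared pixel difference over the m x n image.
    f, g = f.astype(np.float64), g.astype(np.float64)
    return np.mean((f - g) ** 2)

def psnr(f, g, max_val=255.0):
    # Equation (5); identical images have zero error and infinite PSNR.
    err = mse(f, g)
    return float("inf") if err == 0 else 10 * np.log10(max_val ** 2 / err)

def ssim_global(f, g, max_val=255.0):
    # Equation (6) evaluated once over the whole image (simplified, no local windows).
    f, g = f.astype(np.float64), g.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_f, mu_g = f.mean(), g.mean()
    var_f, var_g = f.var(), g.var()
    cov = ((f - mu_f) * (g - mu_g)).mean()
    return ((2 * mu_f * mu_g + c1) * (2 * cov + c2)) / (
        (mu_f ** 2 + mu_g ** 2 + c1) * (var_f + var_g + c2))

# Ground truth: a 2x2 white block; prediction misses one of its pixels.
f = np.zeros((4, 4), dtype=np.uint8)
f[1:3, 1:3] = 255
g = f.copy()
g[1, 1] = 0
print(psnr(f, g))  # 10 * log10(16), about 12.04 dB
```

The images are cast to float before subtracting because `uint8` arithmetic would otherwise wrap around.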
The final forecast images were also compared with the ground truth to confirm the validity of the predicted growth. Individual RGB-colored leaves/roots are measured, and the overall biomass increase for the whole rosette is obtained by calculating the change in the number of colored pixels. This measure allows us to evaluate the rate of leaf/root growth between any two points in the sequence using the increase in leaf area. We can then explore the consistency of the predictions across the leaf/root datasets and the biological significance of the growth forecasts. The RGB color values used in the original annotations (for example, in [40]) and in the network output were compared in the HSV color space to ensure an accurate comparison.
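A simple way to measure area per colored leaf while tolerating brightness shifts introduced by the generator is to compare pixels by hue in HSV space. The Python sketch below, using the standard `colorsys` module, assumes a black background and groups pixels into 30-degree hue bins; the bin width, the brightness threshold and the helper names are illustrative assumptions, and this is plain pixel counting rather than the Hausdorff-style algorithm mentioned above.

```python
import colorsys

def leaf_areas(rgb_image, min_value=0.1):
    """Count coloured pixels per hue bin; each annotated leaf has its own colour."""
    counts = {}
    for row in rgb_image:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if v <= min_value:                        # near-black pixels are background
                continue
            hue_bin = int((h * 360 + 15) // 30) % 12  # 30-degree bins centred on 0, 30, ..., 330
            counts[hue_bin] = counts.get(hue_bin, 0) + 1
    return counts

def growth_percent(before, after):
    """Percentage increase in total coloured (plant) pixels."""
    a0, a1 = sum(before.values()), sum(after.values())
    return 100.0 * (a1 - a0) / a0

K, R, G, DR = (0, 0, 0), (255, 0, 0), (0, 255, 0), (200, 0, 0)
first = [[R, K, K], [K, G, K]]
later = [[R, DR, K], [K, G, G]]   # a darker red from the generator still maps to the red leaf
print(leaf_areas(first), leaf_areas(later))                  # {0: 1, 4: 1} {0: 2, 4: 2}
print(growth_percent(leaf_areas(first), leaf_areas(later)))  # 100.0
```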

Komatsuna Leaf Dataset
In the experimental study, the annotated segmentation masks of the Komatsuna plants were first used to verify the effectiveness of the proposed GAN architecture. The test dataset contains a complete growth sequence of the plant. The GAN architecture was trained to forecast six realistic-looking future growth frames of leaves at once based on six past input frames (see Supplementary Materials S1). Figure 5 shows a qualitative comparison between the predicted frames and the ground truth. To assess whether the GAN forecasts were biologically accurate and meaningful, rather than merely plausible to a human observer, the average percentage of leaf growth over given sequences was calculated by measuring the increase in pixel count of each predicted frame at each time step relative to a baseline, the last frame of the input sequence. The average growth proportions of the predicted frames from time step t + 1 to t + 6 were 6.62%, 9.73%, 16.58%, 24.15%, 31.78%, and 36.75%, respectively, so each forecast frame showed an increase over the previous one. The average biomass increase of the ground truth was also calculated, revealing a strong positive correlation between the leaf growth rates of the forecast frame sequences and the ground truth (r = 0.812, Figure 6). This consistency in growth rate indicates that the architecture learned the pattern of leaf growth and can accurately forecast future growth in leaf image sequences. To further compare the generated future frames with the known ground truth, we used two evaluation metrics, PSNR and SSIM, which represent the similarity between the predicted frames and the ground truth. The average results at each time step are listed in Table 2. The SSIM values at all time steps exceed 94.60%, signifying a high degree of similarity between the predicted leaf frames and the ground truth.
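The two quantities behind this growth analysis, the growth of each forecast frame relative to the last input frame and the Pearson correlation between predicted and ground-truth growth, can be sketched in plain Python as follows. The pixel areas in the usage example are made-up numbers for illustration, not data from this study; Python 3.10+ also provides `statistics.correlation` for the same coefficient.

```python
import math

def growth_vs_baseline(baseline_area, frame_areas):
    """Percentage increase of each forecast frame's plant area over the last input frame."""
    return [100.0 * (a - baseline_area) / baseline_area for a in frame_areas]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pixel areas for frames t+1 ... t+6 (illustrative values only).
baseline = 1000                      # plant pixels in the last input frame x_t
predicted = growth_vs_baseline(baseline, [1060, 1100, 1170, 1240, 1320, 1370])
observed = growth_vs_baseline(baseline, [1050, 1120, 1160, 1250, 1300, 1390])
print(predicted)                     # [6.0, 10.0, 17.0, 24.0, 32.0, 37.0]
print(pearson_r(predicted, observed))
```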
Note that the color difference between the predicted frames and the ground truth was ignored when calculating the accuracy metrics, in order to concentrate on evaluating the GAN architecture's ability to predict plant growth in terms of developmental indicators, including size and shape, rather than the altered pixel colors produced by the generator. In other words, Figure 7b was processed into Figure 7c before Figures 7a and 7c were compared. These metrics are not robust to such color changes, but small changes in contrast and color on these segmentation masks would not affect the quantification of key plant characteristics and phenotypes. The PSNR and SSIM metrics are also illustrated in Figure 8: both typically decrease gradually as the time step increases, which may be caused by prediction deviations accumulating over time.

Arabidopsis thaliana Root Dataset
The second dataset, of root images, was used to analyze the model's performance on more complex RSA images. The system was retrained to generate three future segmentation masks (t + 1 to t + 3) at once from six input segmentation masks (see Supplementary Materials S2). Figure 9 shows samples of the roots' predicted future growth frames alongside their input frames. As before, the root growth rates of both the predicted frames and the ground truth were measured; for simplicity, the total pixel count across the five plants in each frame was measured rather than that of each root. The average growth proportions of the predicted frames between neighboring time steps, from t + 1 to t + 2 and from t + 2 to t + 3, were 45.66% and 36.50%, respectively. The root growth of the predicted frames was in line with that of the ground truth, indicating that growth is accurately forecast in terms of root size increase (see Supplementary Materials S3). Predicted frames beyond time step t + 3 were not evaluated, because performance begins to degrade beyond this point, in contrast to the structurally simpler Komatsuna dataset. The root material comprises fine detail within high-resolution images, and increasing the resolution further is not practical due to memory constraints. The accuracy of the forecast frames was further evaluated using IoU. Because of class imbalance in the segmentation masks (caused by the dominant background class), larger PSNR and SSIM values do not necessarily reflect higher prediction accuracy. IoU was therefore used as an alternative metric that better evaluates the predicted root frames; a higher IoU indicates that the predicted plant growth is closer to the ground truth.
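For the root masks, growth is measured between neighbouring forecast frames using the total root-pixel count of all plants on a plate, rather than against a fixed baseline. A minimal Python sketch of that computation follows; the helper names and the pixel totals are hypothetical.

```python
def total_root_pixels(mask):
    """Count root pixels (value 1) in a binary mask covering all five plants on a plate."""
    return sum(sum(row) for row in mask)

def stepwise_growth(totals):
    """Percentage growth between neighbouring frames, e.g. t+1 -> t+2 and t+2 -> t+3."""
    return [100.0 * (b - a) / a for a, b in zip(totals, totals[1:])]

# Hypothetical root-pixel totals for forecast frames t+1, t+2, t+3 (not results from this study).
print(stepwise_growth([200, 300, 390]))   # [50.0, 30.0]
```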
Before calculating IoU, the color difference between the generated frames and the ground truth was also removed by converting the background pixels of both the predicted images and the ground truth to black and their root pixels to white, as shown in Figure 10. The IoU values were then calculated using the processed Figure 10b. The mIoU results at each time step from t + 1 to t + 3 are listed in Table 3, where IoU_bg, IoU_root, and mIoU_root are the IoU of the background of the root images, the IoU of the roots, and the mean IoU of the root images, respectively. These quantitative results indicate that the GAN can also scale to more complex RSA images.
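The black/white conversion and the three quantities reported for the root frames (IoU of the background, IoU of the roots, and their mean) can be sketched with NumPy as below. Treating any pixel whose brightest channel exceeds 30 as root is an assumed threshold, and `binarize`/`iou_bg_root` are hypothetical helper names.

```python
import numpy as np

def binarize(rgb, threshold=30):
    """Map an RGB mask to 0 (background, black) or 1 (root, white), ignoring hue."""
    return (np.asarray(rgb).max(axis=-1) > threshold).astype(np.uint8)

def iou_bg_root(pred_rgb, gt_rgb):
    """Return (IoU_bg, IoU_root, mIoU_root) for a predicted and a ground-truth mask."""
    p, g = binarize(pred_rgb), binarize(gt_rgb)
    ious = []
    for cls in (0, 1):                                  # background, then root
        inter = np.logical_and(p == cls, g == cls).sum()
        union = np.logical_or(p == cls, g == cls).sum()
        ious.append(inter / union if union else 1.0)
    return ious[0], ious[1], (ious[0] + ious[1]) / 2

# A purple generated root and a white ground-truth root binarize identically.
gt = np.zeros((2, 3, 3), dtype=np.uint8)
gt[0, :2] = 255
pred = np.zeros((2, 3, 3), dtype=np.uint8)
pred[0, 1:] = (180, 60, 200)
print(iou_bg_root(pred, gt))   # IoU_bg = 3/5, IoU_root = 1/3, mIoU = their mean
```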

Discussion
According to [55], there is a need for efficient models to characterize and measure plant growth and function. In roots, most methods are based on mathematical representations of root length and growth. Our proposed method combines the spatial and temporal characteristics of root systems to provide robust prediction of root growth. By incorporating this network into the existing RootNav 2.0 pipeline, the predicted root masks can be analyzed using the RootNav 2.0 viewer, which provides a variety of common phenotypic measurements useful to plant scientists. The extracted root system may be saved in RSML format, facilitating the sharing of root architectures between different tools, experiments, software, and research groups. RSML can record root topology (parent-child relationships), morphological characteristics (positions in space and time, length), and virtually any additional data required to describe individual root segments (e.g., color, diameter, and age) [44].
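To illustrate how RSML can capture root topology through nesting, the sketch below serializes a toy root system with Python's standard `xml.etree.ElementTree`. It is a simplified assumption of the format: lateral roots are written as `root` elements inside their parent `root`, and each root's geometry is a polyline of points, while most of the metadata, namespaces, and per-root function annotations defined by the RSML specification are omitted. The `Root` class and the helper functions are hypothetical and are not part of RootNav 2.0.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Root:
    """Hypothetical in-memory root: an id, a 2D polyline, and lateral children."""
    root_id: str
    points: list[tuple[float, float]]
    children: list["Root"] = field(default_factory=list)


def root_to_xml(parent_el: ET.Element, root: Root) -> None:
    """Append a <root> element; lateral roots nest inside their parent."""
    el = ET.SubElement(parent_el, "root", id=root.root_id)
    polyline = ET.SubElement(ET.SubElement(el, "geometry"), "polyline")
    for x, y in root.points:
        ET.SubElement(polyline, "point", x=str(x), y=str(y))
    for child in root.children:
        root_to_xml(el, child)  # parent-child topology is encoded by nesting


def plant_to_rsml(plant_id: str, primaries: list[Root]) -> str:
    """Serialize one plant to a simplified RSML-like string (metadata abbreviated)."""
    rsml = ET.Element("rsml")
    meta = ET.SubElement(rsml, "metadata")
    ET.SubElement(meta, "version").text = "1"
    ET.SubElement(meta, "unit").text = "pixel"
    plant = ET.SubElement(ET.SubElement(rsml, "scene"), "plant", id=plant_id)
    for r in primaries:
        root_to_xml(plant, r)
    return ET.tostring(rsml, encoding="unicode")


# Toy plant: one primary root with a single lateral root.
lateral = Root("1.1.1", [(12.0, 40.0), (25.0, 52.0)])
primary = Root("1.1", [(10.0, 0.0), (11.0, 40.0), (12.0, 90.0)], [lateral])
xml_text = plant_to_rsml("1", [primary])
print(xml_text)
```

Because topology is carried by the element hierarchy, a reader can recover parent-child relationships simply by walking the tree.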
While the approach presented here has proven effective in predicting shoot and root growth, limitations remain that may be addressed in future work. For example, unannotated datasets must first be processed by RootNav 2.0 or a similar package to extract usable training data; although this step is automatic, it can be time-consuming on large datasets. Another critical limitation, shared with many GAN-based approaches, is image resolution. Low-resolution images could be a bottleneck for the proposed method, especially for root images, where roots are small and occluded.
The findings reveal a clear need for large, high-quality image datasets that provide a good pool of training and test data, as is evident from the differences between the leaf and root dataset results. Future studies will further explore the potential of deep learning for this task using alternative datasets of varying quality and format, and will repeat experiments to increase the reliability of the findings and broaden research into image forecasting methods. In particular, accurate annotation and semantic segmentation of images were found to play a critical role in predicting fine detail in root systems. The promising results of software such as RootNav 2.0, when combined with deep learning methods, highlight the importance of high-quality image preprocessing in these studies, a key factor in advancing high-throughput phenotyping.
Nevertheless, the results presented here show that machine learning, and GANs in particular, can perform complex forecasting of plant growth. These approaches can be retrained and adapted to other domains, potentially benefiting a wide range of researchers.

Conclusions
This study proposes a plant phenotyping and growth prediction system that can generate future growth frames of both plant leaves and roots from images. We have shown that this GAN-based deep learning approach is capable of predicting multiple frames of growth into the future. In shoots, the system generates accurate segmentation masks of leaves as they grow over time. In roots, the system provides a natural extension to the existing RootNav 2.0 approach by adding temporal sequence analysis. The GAN-based approach was trained to predict subsequent frames from six previous growth images. The findings reveal a high degree of consistency and similarity between the predicted frames and the ground truth, with an average SSIM higher than 94.60% for shoots and an average mIoU higher than 76.89% for roots. We focused on predicting segmentation masks rather than the original images, both to reduce the complexity of the prediction problem and to allow the output to be quantified directly using standard phenotyping metrics. The study shows the potential to provide the plant phenotyping community with an efficient tool that can perform high-throughput phenotyping and predict future plant growth. This could speed up biologists' experimental cycles by reducing the time required to grow, image, and measure plants.