Article

End-to-End Deep Learning Approach to Automated Phenotyping of Greenhouse-Grown Plant Shoots

Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, Germany
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1117; https://doi.org/10.3390/agronomy15051117
Submission received: 22 March 2025 / Revised: 24 April 2025 / Accepted: 28 April 2025 / Published: 30 April 2025
(This article belongs to the Special Issue Novel Approaches to Phenotyping in Plant Research)

Abstract

High-throughput image analysis is a key tool for the efficient assessment of quantitative plant phenotypes. A typical approach to the computation of quantitative plant traits from image data consists of two major steps: (i) image segmentation followed by (ii) calculation of quantitative traits of the segmented plant structures. Despite substantial advancements in deep learning-based segmentation techniques, minor artifacts of image segmentation cannot be completely avoided. For several commonly used traits, including plant width, height, and convex hull, even small inaccuracies in image segmentation can lead to large errors. Ad hoc approaches to cleaning 'small noisy structures' are, in general, data-dependent and may lead to a substantial loss of relevant small plant structures and, consequently, falsified phenotypic traits. Here, we present a straightforward end-to-end approach to the direct computation of phenotypic traits from image data using a deep learning regression model. Our experimental results show that image-to-trait regression models outperform a conventional segmentation-based approach for a number of commonly sought traits of plant morphology and health, including shoot area, linear dimensions, and color fingerprints. Although regression models do not produce an explicit segmentation, visualization of activation layer maps can still be used as a blueprint for model explainability. While end-to-end models have a number of limitations compared to more complex network architectures, they remain of interest for phenotyping scenarios with fixed optical setups (such as high-throughput greenhouse screenings), where the accuracy of routine trait predictions, and not necessarily generalizability, is the primary goal.

1. Introduction

High-throughput image analysis plays a critical role in plant phenotyping, where accurate measurement of plant traits is essential for the quantitative assessment of environmental or genetic effects on plant morphology, health, and function [1,2,3,4,5,6,7]. Two prominent methodologies in this field are deep learning-based image segmentation and direct end-to-end (i.e., image-to-number) approaches.
The typical approach to the derivation of plant traits from image data consists of two major steps: (i) finding targeted plant structures in image data (image segmentation or object detection) followed by (ii) calculation of quantitative features of the segmented/detected structures. Deep learning-based segmentation of plant images has been widely applied to detect whole plants or plant parts. Prominent models such as U-net [8], DeepLab [9], R-CNN [10], and Mask R-CNN [11] have frequently been adopted for this purpose. These models classify each pixel or image region into specific categories, enabling precise delineation of plant structures. Deep learning models learn hierarchical feature representations and can incorporate multi-scale feature extraction, improving segmentation quality in scenarios where plants have to be distinguished from complex background structures and/or where plant parts overlap or are closely situated. In such tasks, deep learning approaches exhibit robustness against variations in lighting and background conditions, making them suitable for diverse environmental settings.
However, image segmentation deep learning models are typically trained to minimize a mathematical accuracy measure of segmentation and not the error of the phenotypic traits derived from the segmented/detected structures. As a result, minor inaccuracies in image segmentation inevitably propagate into the prediction results of these models. Some integrative phenotypic traits (for example, total projection area, volume, or shoot color traits) are not sensitive to minor inaccuracies in image segmentation. Others, in particular widely sought metric traits such as plant width, height, and convex hull, are highly sensitive even to minor errors in image segmentation [6,12]. Cleaning segmented images using simple criteria (such as the size and/or shape of noisy background structures) is often not feasible, as fragmented plant (foreground) and noisy background structures can be very similar in their size and/or shape.
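To illustrate this sensitivity, the following minimal MATLAB sketch (mask size, blob position, and the location of the stray pixel are purely hypothetical) compares the width and convex hull area of a clean binary shoot mask with the same mask containing a single mislabeled background pixel:

```matlab
% Synthetic 256 x 256 binary mask with a compact "plant" blob
mask = false(256, 256);
mask(100:160, 110:150) = true;            % clean plant region

noisy = mask;
noisy(10, 250) = true;                    % a single mislabeled background pixel

% Horizontal extent of all foreground pixels (plant width in pixels)
maskWidth = @(m) max(find(any(m, 1))) - min(find(any(m, 1))) + 1;
% Convex hull area in pixels (requires the Image Processing Toolbox)
hullArea  = @(m) nnz(bwconvhull(m));

fprintf('Width:       clean = %d px, noisy = %d px\n', maskWidth(mask), maskWidth(noisy));
fprintf('Convex hull: clean = %d px, noisy = %d px\n', hullArea(mask), hullArea(noisy));
% The plant area changes by a single pixel, whereas width and convex hull
% area are inflated drastically by the isolated artifact.
```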
An alternative to image segmentation consists of training end-to-end deep learning models to predict plant traits directly from image data [13,14,15]. End-to-end deep learning models are regression models that map raw image data onto numbers (i.e., phenotypic traits), omitting the intermediate segmentation step. This approach integrates feature extraction and trait inference into a single model. By eliminating the segmentation step, these models reduce processing time and computational overhead, which is particularly beneficial for high-throughput phenotyping applications. The end-to-end approach also simplifies the workflow by avoiding the time-consuming generation of ground-truth segmentation data and by integrating multiple processing steps into one single model, which substantially reduces the overall complexity of the data processing pipeline. Since end-to-end models learn relevant features directly from images, they are potentially more accurate than segmentation models that are trained on image similarity measures rather than on final plant traits. On the other hand, end-to-end models lack explicit segmentation, which makes it difficult to interpret how specific traits are derived from the image data. Direct trait derivation may also overlook fine-grained details that could be captured by dedicated segmentation algorithms, potentially affecting accuracy in complex phenotyping tasks. Although end-to-end models may require less preprocessing, they demand more careful tuning and extensive training data to achieve an optimal level of accuracy. Principal limitations of end-to-end deep learning, particularly with regard to scalability, have been reported in previous works [16].
In plant phenotyping, both approaches have been applied with notable success. Numerous studies have shown that segmentation models such as U-net can accurately segment various parts of plants (e.g., leaves, roots) from images taken under different conditions [6,17,18,19]. In contrast, end-to-end models have previously been applied mainly to object detection and counting tasks. Ubbens et al. [20] and Pound et al. [21] presented end-to-end approaches to leaf and spike counting directly from plant images. Olenskyj et al. [22] demonstrated the application of end-to-end regression to yield estimation from images of field-grown plants. However, the overall body of literature on the application of end-to-end deep learning to quantitative trait estimation from plant images remains relatively sparse.
In this study, we investigate the performance of end-to-end vs. image segmentation approaches for the prediction of widespread phenotypic traits, including plant area, height, width, convex hull, and color features, and compare the two approaches in the context of high-throughput phenotyping of greenhouse-grown plant shoots.

2. Methods

2.1. Image Data Acquisition and Preprocessing

Visible light images of Arabidopsis, maize, and barley shoots were acquired from greenhouse phenotyping experiments using the LemnaTec-Scanalyzer3D plant phenotyping platform (LemnaTec GmbH, Aachen, Germany); see example images in Figure 1 (top row). The images were acquired once a day throughout the duration of each experiment, which typically ranged from 2 to 3 months. The three greenhouse phenotyping installations differ in camera resolution, background illumination, and photochamber setup.
For the training and validation of deep learning models, a set of 1476 images with accurate annotation of fore- and background regions (ground-truth data) was used. Examples of binary-segmented shoot images are shown in Figure 1 (bottom row). Binary segmentation of plant shoots was performed using the in-house software tool kmSeg [23], which allows efficient annotation of image regions by assigning precalculated k-means color classes to the background or plant shoot categories, significantly reducing the time required for manual segmentation. Occasionally, this semi-automated segmentation requires additional cleaning steps to remove noise. In total, segmentation of a typical 4–8 megapixel image takes between 1 and 5 min, depending on the color composition and structural complexity of a given plant shoot image.
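For illustration, the k-means-based color annotation underlying kmSeg can be approximated by the following MATLAB sketch (this is not the actual kmSeg implementation; the image file name, number of color classes, and the hard-coded class-to-foreground assignment are assumptions made for demonstration):

```matlab
% Cluster the pixel colors of an RGB shoot image into k color classes
rgb    = im2double(imread('shoot.png'));       % hypothetical example image
pixels = reshape(rgb, [], 3);                  % N x 3 list of RGB values

k = 8;                                         % assumed number of color classes
[classIdx, ~] = kmeans(pixels, k, 'MaxIter', 200, 'Replicates', 3);
classMap = reshape(classIdx, size(rgb, 1), size(rgb, 2));

% In kmSeg, the user interactively assigns each color class to 'plant' or
% 'background'; here the assignment is hard-coded purely for illustration.
plantClasses = [2 5];                          % hypothetical user selection
binaryMask   = ismember(classMap, plantClasses);

imshowpair(rgb, binaryMask, 'montage');        % inspect the resulting segmentation
```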

2.2. Image Segmentation Using a Pre-Trained U-Net Model

For the task of image segmentation, a pre-trained U-net model from [6] was used. This model was trained on 256 × 256 patches cropped from the original RGB and ground-truth segmented images. U-net is based on a U-shaped encoder–decoder architecture that hierarchically extracts features while retaining spatial information through skip connections. The model was trained using the Dice loss (the Dice coefficient is equivalent to the F1 score) to optimize the overlap between predicted and ground-truth segmented image regions. The training dataset included images from three LemnaTec phenotyping facilities covering Arabidopsis (small), barley (middle-size), and maize (large) plants that were imaged at different growth stages and lighting conditions to ensure the robustness of model performance. The output of the segmentation model is a binary image with a value of zero assigned to background pixels and one assigned to foreground (plant) pixels.
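For reference, the Dice loss used for segmentation training can be written compactly as follows (a generic soft-Dice formulation, not the exact loss code used for the model in [6]):

```matlab
function loss = diceLoss(Y, T)
% Soft Dice loss for binary segmentation.
% Y : predicted foreground probabilities (H x W x 1 x B array or dlarray)
% T : ground-truth binary masks of the same size
    epsilon      = 1e-6;                               % avoids division by zero
    intersection = sum(Y .* T, [1 2 3]);
    cardinality  = sum(Y, [1 2 3]) + sum(T, [1 2 3]);
    dice = (2 * intersection + epsilon) ./ (cardinality + epsilon);
    loss = mean(1 - dice);                             % 1 - Dice, averaged over the batch
end
```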

2.3. Phenotypic Plant Traits

End-to-end regression models were trained to predict nine basic traits of plant morphology and health, with one dedicated model per trait and imaging modality. Table 1 gives an overview of the nine traits used in this study as target variables for end-to-end model training. For the validation of deep learning model predictions, the above nine traits were computed for all 1476 ground-truth segmented images.
The plant area and the convex hull are indicators of plant growth vigor, while height and width provide morphological information about the plant. Color traits (such as average red, green, and blue intensities) reflect the health status and nutrient content of the plant.
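The nine traits can be computed from a binary shoot mask and the corresponding RGB image roughly as follows (a minimal sketch assuming the Image Processing and Statistics Toolboxes; the percentile convention used for H_99/W_99 and other details of the trait-extraction routines applied in this study may differ):

```matlab
function traits = shootTraits(rgb, mask)
% rgb  : H x W x 3 image, mask : H x W logical foreground mask
    [rows, cols] = find(mask);

    traits.Area   = nnz(mask);                     % plant pixel count
    traits.CHull  = nnz(bwconvhull(mask));         % convex hull pixel count
    traits.Height = max(rows) - min(rows) + 1;     % vertical extent
    traits.Width  = max(cols) - min(cols) + 1;     % horizontal extent

    % Robust extents based on the 99th/1st percentiles of the pixel distributions
    % (assumed interpretation of the H_99/W_99 traits)
    traits.H_99 = prctile(rows, 99) - prctile(rows, 1);
    traits.W_99 = prctile(cols, 99) - prctile(cols, 1);

    % Average color of plant pixels
    r = rgb(:, :, 1); g = rgb(:, :, 2); b = rgb(:, :, 3);
    traits.Red   = mean(r(mask));
    traits.Green = mean(g(mask));
    traits.Blue  = mean(b(mask));
end
```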

2.4. Plant Trait Derivation Using End-to-End Model

Unlike the image segmentation approach, an end-to-end model was trained to directly predict a single number (i.e., a trait value) for each image, bypassing the intermediate segmentation step. Figure 2 visualizes the general data flow for image analysis and evaluation of model performance. Accordingly, the accuracy of the end-to-end (i.e., image-to-trait) model was assessed using the ground-truth trait values and not the Dice (i.e., F1) coefficient. Our end-to-end approach is based on a conventional CNN architecture comprising six hierarchical convolution layers with 8, 16, 32, 64, 128, and 256 filters, followed by two fully connected layers producing a single trait value in the final model output; see Figure 3. Table 2 gives a summary of the options used for training our e2e models, which were implemented under MATLAB R2024a (The MathWorks Inc., Natick, MA, USA). For all other options not mentioned in Table 2, the default values predefined by the MATLAB R2024a environment were used.
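A minimal MATLAB sketch of such an image-to-trait network and its training call is shown below (the pooling scheme, the ReLU between the fully connected layers, the input normalization, and the data variables XTrain/YTrain/XVal/YVal are assumptions; the actual implementation may differ from this illustration):

```matlab
% Six Conv2D/ReLU blocks (8 ... 256 filters) followed by FC layers of sizes 256 and 1
filters = [8 16 32 64 128 256];
layers  = imageInputLayer([256 256 3], 'Normalization', 'rescale-zero-one');
for f = filters
    layers = [layers
              convolution2dLayer(3, f, 'Padding', 'same')
              reluLayer
              maxPooling2dLayer(2, 'Stride', 2)];     % assumed downsampling scheme
end
layers = [layers
          fullyConnectedLayer(256)
          reluLayer
          fullyConnectedLayer(1)];                    % single trait value as output

options = trainingOptions('adam', ...
    'InitialLearnRate', 0.005, ...
    'MiniBatchSize', 32, ...
    'MaxEpochs', 100, ...
    'Metrics', 'rmse', ...
    'ValidationData', {XVal, YVal}, ...               % validation images and trait values
    'ValidationFrequency', 35, ...
    'ValidationPatience', 5, ...
    'Plots', 'training-progress');

% XTrain: 256 x 256 x 3 x N image array, YTrain: N x 1 vector of ground-truth trait values
net = trainnet(XTrain, YTrain, layers, 'mse', options);
```

With ValidationFrequency and ValidationPatience set as above, training typically terminates well before the MaxEpochs limit (see Section 3.1).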
End-to-end regression models were trained to predict one single plant trait (out of a total of nine) for each particular plant type (Arabidopsis, barley, maize) and camera view (top, side). Since Arabidopsis shoots are barely visible in the side view, 9 × 5 = 45 end-to-end models were trained to predict nine traits from five different plant imaging modalities (Arabidopsis top view, barley and maize top/side views).

2.5. Performance Measures

To validate the performance of the segmentation and end-to-end models in predicting the plant traits, the following four measures were computed vs. the ground truth using the MATLAB R2024a corr and measerr functions (see the sketch after this list):
  • R² coefficient of the linear correlation between predicted and ground-truth traits for all images.
  • Mean Square Error (MSE).
  • Maximum Error (MAXERR).
  • Ratio of Squared Norms (L2RAT).
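These measures can be obtained, for example, as follows (a sketch assuming column vectors yTrue and yPred of ground-truth and model-predicted trait values; measerr is part of the Wavelet Toolbox):

```matlab
% yTrue, yPred : column vectors of ground-truth and model-predicted trait values
R  = corr(yTrue, yPred);                 % Pearson correlation coefficient
R2 = R^2;                                % squared correlation reported as R²

% measerr returns PSNR, MSE, maximum absolute error, and the ratio of squared norms
[~, MSE, MAXERR, L2RAT] = measerr(yTrue, yPred);

fprintf('R^2 = %.3f, MSE = %.3g, MAXERR = %.3g, L2RAT = %.3g\n', R2, MSE, MAXERR, L2RAT);
```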

2.6. Computational Implementation

The implementation and training of end-to-end models was performed under MATLAB R2024a using an Nvidia RTX 4090 graphics card with 32 GB GRAM. Details of the training and implementation of our U-net segmentation model can be found elsewhere [6]. The calculation of phenotypic traits from ground-truth and U-net-segmented images was performed using the same routines. In the case of end-to-end models, traits are obtained directly as model outputs, bypassing the segmentation step.

3. Results

This section presents the experimental results including training, validation, and comparison of the end-to-end trait prediction models vs. the U-net-based image segmentation approach.

3.1. End-to-End Model Generation

For the comparison of the performance of end-to-end trait prediction vs. U-net segmentation models, 1476 images of a uniform size of 256 × 256 pixels were used. Out of the total of 1476 images of different plants and camera views, 1178 images (80%) were used for model training and testing, while the remaining 'unseen' 298 images (20%) were used for validation of model predictions. In turn, the set of 1178 images was randomly split for each of the five imaging modalities (i.e., Arabidopsis top-view images and barley and maize top- and side-view images) into training and testing subsets according to an 80:20 ratio. Table 3 gives an overview of the training, testing, and validation sets for the five imaging modalities. For each modality, dedicated end-to-end models were trained to predict a single plant trait out of the nine listed in Table 1 using the same training/testing sets of randomly selected images, resulting in a total of 5 × 9 = 45 e2e models. Likewise, five U-net segmentation models dedicated to the same plant imaging modalities were used for binary image segmentation followed by conventional calculation of plant traits.
Figure 4 shows examples of the training and validation RMSE losses of the end-to-end plant projection area prediction model over 10,000 iterations. Since the learning curves exhibit rapid convergence, an automated criterion for early stopping was introduced by setting the ValidationFrequency = 35 and ValidationPatience = 5 parameters of the training options of the MATLAB R2024a trainnet function. With these automated stopping criteria, the typical number of epochs amounted to up to 100, corresponding to a maximum of approximately 1000 iterations, indicating that the model had reached a stable state.
Training of each single-trait regression model on the set of 1178 256 × 256 RGB images took about 30 min on a PC workstation equipped with an Nvidia RTX 4090 graphics card and 22 h in total for all 45 trait-predicting models.

3.2. Comparison and Validation of Regression and Segmentation Models vs. Ground Truth

The validation of all 45 end-to-end trait-predicting models vs. the U-net-based segmentation approach against ground-truth trait values on 'unseen' images from the validation set is summarized in Table 4. As can be seen, many end-to-end models, especially for barley and maize shoots, outperform the segmentation models in terms of the accuracy of final trait prediction with respect to a number of performance measures, including the correlation between ground-truth and model-predicted traits (R²) and their differences (MSE, MAXERR, L2RAT). Moreover, significant deviations from ground-truth data were found predominantly for the U-net segmentation approach and for some particular plant traits; see the 't-test' column of Table 4. The details of the t-test assessments of the differences between ground-truth plant traits and the results of e2e and U-net model predictions are given in Supplementary Table S1. In particular, for the barley side-view images, the U-net segmentation approach underperforms the end-to-end predictions quite significantly. This result is not surprising, because segmentation of thin barley leaves in optically more inhomogeneous (i.e., low-contrast) side-view images is known to be a more difficult task and is thus associated with more pronounced inaccuracies compared to other imaging modalities. An exemplary comparison of end-to-end and U-net segmentation model performance in predicting the maize shoot projection area and the convex hull area is shown in Figure 5. The comparison of these traits for all species and views from the validation set can be found in Supplementary Figure S1. As one can see, the projection area and convex hull area predictions obtained with the U-net segmentation approach show stronger deviations from ground-truth values compared to the predictions of the end-to-end model.
Although the projection area is a relatively robust trait that does not change much under marginal inaccuracies of image segmentation, the convex hull can be strongly affected by just a few incorrectly labeled pixels. The correlation plots in Figure S1 clearly show that the U-net predictions of the convex hull area suffer from this limitation, especially in the case of barley and maize images, whereas the segmentation-free end-to-end models exhibit a more robust performance.

3.3. End-to-End Model Explainability

Since the end-to-end model predictions do not include a segmentation of the targeted structures, indirect hints for the explainability of the e2e performance can be derived from the visualization of early activation maps. Figure 6 shows an example of the Grad-CAM activation map of early convolutional layers of the 'area' trait-predicting model when applied to a top-view image of an Arabidopsis shoot. As one can see, this activation map resembles a segmentation of the plant region. A more detailed analysis shows that activation maps of single network layers cannot, in general, be seen as an accurate substitute for image segmentation maps. However, their visualizations provide some evidence that the intrinsic image processing in the end-to-end network implicitly relies on a form of foreground detection effectively similar to image segmentation.
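Such maps can be visualized, for example, by querying the output of an early ReLU layer of a trained e2e network (a sketch assuming a trained dlnetwork net, an input image img, and the layer name 'relu_1', all of which are illustrative assumptions):

```matlab
% img : 256 x 256 x 3 input image, net : trained e2e dlnetwork (e.g., the 'area' model)
X = dlarray(single(img), 'SSCB');                   % spatial, spatial, channel, batch

% Activations of the first ReLU layer (layer name is an assumption)
act = predict(net, X, 'Outputs', 'relu_1');
act = extractdata(act);

% Average over the channel dimension and display as a heat map
actMap = rescale(mean(act, 3));
figure; imagesc(imresize(actMap, size(img, [1 2]))); axis image off; colormap jet;
```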

3.4. Software Performance and Implementation

With regard to efficiency, the end-to-end models are about 10 times smaller in size (on average, 2 MB) and approximately 10 times faster (on average, 0.05 s) than the U-net segmentation models when predicting plant traits from the same set of 256 × 256 images. All 45 Arabidopsis, barley, and maize trait-predicting models described above were compiled and made available as freely distributed Windows-executable command-line tools along with example images and usage guidelines from https://ag-ba.ipk-gatersleben.de/e2e.html (accessed on 23 November 2024). In view of the large variability of optical scenes and the relatively small size of the training datasets, the pre-trained end-to-end models primarily serve demonstration purposes and can be expected to achieve meaningful prediction of phenotypic traits only on images of similar plants captured in optical scenes similar to those presented in this feasibility study.

4. Discussion

The critical dependence of trait prediction on the accuracy of image segmentation is a principal bottleneck of all conventional (i.e., segmentation-based) image analysis pipelines, including advanced deep learning approaches. Typically, image segmentation CNN models are trained to minimize loss measures of image segmentation and not the loss of the final biological traits. Consequently, even accurately trained segmentation models cannot prevent errors in trait estimation, especially for traits that are highly sensitive to small inaccuracies in image segmentation, such as the linear dimensions of plant regions (i.e., plant height, width, convex hull, etc.). From the viewpoint of biological applications, image segmentation represents an intermediate step toward the derivation of quantitative phenotypic traits. Unlike segmentation approaches, end-to-end regression models are trained to minimize the inaccuracy of the final trait predictions and can therefore be expected to overcome the limitations of segmentation-based plant phenotyping. The results of our feasibility study confirm this assumption, demonstrating superior end-to-end model performance in predicting different traits across different plant types and camera views. The slightly, but not significantly, lower performance of end-to-end models on Arabidopsis images compared to U-net segmentation can be attributed to the higher variability of the Arabidopsis dataset, including strong differences in plant size between tiny juvenile and larger adult shoots. However, images of Arabidopsis plants acquired against a highly contrasting blue mat can obviously be segmented more accurately than barley and maize images, which exhibit a lower contrast between plant and background regions. In the case of the lower-contrast barley and maize images, where image segmentation becomes less accurate, the end-to-end approach exhibits significantly more robust performance. By eliminating the segmentation step, end-to-end models reduce training and processing times, which also makes them more efficient, especially for time-consuming high-throughput, high-content applications. On the other hand, our end-to-end models were trained on images of particular optical setups and are thus not generalizable. However, in many applications, where routine plant screening is performed with the same optical setup, the limited generalizability is not a major problem.
Unlike image segmentation, which can also be performed on smaller image tiles that are subsequently stitched back to the original image size, end-to-end methods have to be applied to full-sized images to reasonably predict whole-plant traits. Consequently, images of a relatively small size were used to train the end-to-end models in this feasibility study. However, as long as the target traits can be resolved at a lower image resolution, e2e models can still be applied to downscaled images, followed by subsequent upscaling of the morphological trait values.
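For example, if images are downscaled by a factor s before prediction, areal traits scale with s² and linear traits with s, while color traits are essentially resolution-independent; a minimal sketch of such trait upscaling (the scale factor and trait values are purely illustrative):

```matlab
s = 4;                                    % assumed downscaling factor (e.g., 1024 px -> 256 px)

% Traits predicted on the downscaled image (hypothetical values, in px^2 / px)
predLow.Area   = 2.5e3;
predLow.CHull  = 6.1e3;
predLow.Height = 1.2e2;
predLow.Width  = 9.0e1;

% Upscaling to the original image resolution
predFull.Area   = predLow.Area   * s^2;   % areal traits scale quadratically
predFull.CHull  = predLow.CHull  * s^2;
predFull.Height = predLow.Height * s;     % linear traits scale linearly
predFull.Width  = predLow.Width  * s;
% Average color traits (Red/Green/Blue) need no rescaling.
```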
In the present study, 45 e2e trait-predicting models (i.e., nine traits for each of five imaging modalities) were trained and validated using the same five sets of randomly selected images. A multifold (e.g., N = 5) evaluation would certainly improve the statistical validity of the observed results but was not carried out to avoid the computational expense of training 5 × 45 = 225 individual e2e models. However, multifold evaluation should be considered in future studies of the robustness and generalizability of particular e2e trait-predicting models.

5. Conclusions

Our investigations show that e2e models exhibit distinctive advantages in the direct prediction of plant traits from images acquired in greenhouse facilities with fixed optical setups. Although end-to-end models have a number of principal limitations compared to more advanced architectures of recent image segmentation networks, they can still be of interest for routine application in phenotyping scenarios where the accuracy of trait prediction, and not necessarily generalizability to different optical scenes and contrasting plant types, is primarily required.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/agronomy15051117/s1: Table S1: Summary of t-tests of the significance of differences between ground-truth plant traits and the traits predicted by the e2e models or computed using the U-net segmentation approach, across all images from the validation set. Figure S1: Comparison of end-to-end (e2e) and U-net segmentation model performance in the prediction of area and convex hull area (in pixels²) of Arabidopsis, barley, and maize plants.

Author Contributions

E.G. conceptualized the study, developed computational tools, performed data analysis, prepared figures and tables, and wrote the manuscript. N.N. developed computational tools and read the manuscript. K.N. performed the biological experiments and read the manuscript. T.A. co-supervised the study and read the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The manuscript is accompanied by Supplementary Information and software/data repositories.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lobet, G. Image analysis in plant sciences: Publish then perish. Trends Plant Sci. 2013, 18, 422–431. [Google Scholar] [CrossRef] [PubMed]
  2. Minervini, M.; Scharr, H.; Tsaftaris, S.A. Image analysis: The new bottleneck in plant phenotyping. IEEE Signal Process. Mag. 2015, 32, 126–131. [Google Scholar] [CrossRef]
  3. Fahlgren, N.; Gehan, M.A.; Baxter, I. A comparison of machine learning methods for leaf segmentation in plant phenotyping. Front. Plant Sci. 2015, 5, 567. [Google Scholar]
  4. Tsaftaris, S.A.; Minervini, M.; Scharr, H. Machine learning for plant phenotyping needs image processing. Trends Plant Sci. 2016, 21, 989–991. [Google Scholar] [CrossRef] [PubMed]
  5. Pound, M.P.; Atkinson, J.A.; Townsend, A.J.; Wilson, M.H.; Griffiths, M.; Jackson, A.S.; Bulat, A.; Tzimiropoulos, G.; Wells, D.M.; Pridmore, T.P.; et al. Deep machine learning provides state-of-the-art performance in image-based plant phenotyping. GigaScience 2017, 6, 1–10. [Google Scholar] [CrossRef] [PubMed]
  6. Narisetti, N.; Henke, M.; Neumann, K.; Stolzenburg, F.; Altmann, T.; Gladilin, E. Deep Learning Based Greenhouse Image Segmentation and Shoot Phenotyping (DeepShoot). Front. Plant Sci. 2022, 13, 906410. [Google Scholar] [CrossRef] [PubMed]
  7. Okyere, F.G.; Cudjoe, D.; Sadeghi-Tehran, P.; Virlet, N.; Riche, A.B.; Castle, M.; Greche, L.; Mohareb, F.; Simms, D.; Mhada, M.; et al. Machine Learning Methods for Automatic Segmentation of Images of Field- and Glasshouse-Based Plants for High-Throughput Phenotyping. Plants 2023, 12, 2035. [Google Scholar] [CrossRef] [PubMed]
  8. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  9. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef] [PubMed]
  11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  12. Hüther, P.; Schandry, N.; Jandrasits, K.; Bezrukov, I.; Becker, C. ARADEEPOPSIS, an Automated Workflow for Top-View Plant Phenomics using Semantic Segmentation of Leaf States. Plant Cell 2020, 32, 3674–3688. [Google Scholar] [CrossRef] [PubMed]
  13. Giuffrida, M.V.; Dobrescu, A.; Doerner, P.; Tsaftaris, S.A. Leaf Counting Without Annotations Using Adversarial Unsupervised Domain Adaptation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 2590–2599. [Google Scholar] [CrossRef]
  14. Khoroshevsky, F.; Zhou, K.; Bar-Hillel, A.; Hadar, O.; Rachmilevitch, S.; Ephrath, J.E.; Lazarovitch, N.; Edan, Y. A CNN-based framework for estimation of root length, diameter, and color from in situ minirhizotron images. Comput. Electron. Agric. 2024, 227, 109457. [Google Scholar] [CrossRef]
  15. Cheng, Y.; Ren, N.; Hu, A.; Zhou, L.; Qi, C.; Zhang, S.; Wu, Q. An Improved 2D Pose Estimation Algorithm for Extracting Phenotypic Parameters of Tomato Plants in Complex Backgrounds. Remote Sens. 2024, 16, 4385. [Google Scholar] [CrossRef]
  16. Glasmachers, T. Limits of End-to-End Learning. In Proceedings of the Ninth Asian Conference on Machine Learning, Seoul, Republic of Korea; 15–17 November 2017; Proceedings of Machine Learning Research; Zhang, M.L., Noh, Y.K., Eds.; Yonsei University: Seoul, Republic of Korea, 2017; Volume 77, pp. 17–32. [Google Scholar]
  17. Narisetti, N.; Henke, M.; Seiler, C.; Junker, A.; Ostermann, J.; Altmann, T.; Gladilin, E. Fully-automated root image analysis (faRIA). Sci. Rep. 2021, 11, 16047. [Google Scholar] [CrossRef] [PubMed]
  18. Li, Y.; Huang, Y.; Wang, M.; Zhao, Y. An improved U-Net-based in situ root system phenotype segmentation method for plants. Front. Plant Sci. 2023, 14, 1115713. [Google Scholar] [CrossRef] [PubMed]
  19. Yi, X.; Wang, J.; Wu, P.; Wang, G.; Mo, L.; Lou, X.; Liang, H.; Huang, H.; Lin, E.; Maponde, B.; et al. AC-UNet: An improved UNet-based method for stem and leaf segmentation in Betula luminifera. Front. Plant Sci. 2023, 14, 1268098. [Google Scholar] [CrossRef] [PubMed]
  20. Ubbens, J.R.; Cieslak, M.; Prusinkiewicz, P.; Stavness, I. The use of plant models in deep learning: An application to leaf counting in rosette plants. Plant Methods 2018, 14, 6. [Google Scholar] [CrossRef] [PubMed]
  21. Pound, M.P.; Atkinson, J.A.; Wells, D.M.; Pridmore, T.P.; French, A.P. Deep learning for multi-task plant phenotyping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2055–2063. [Google Scholar]
  22. Olenskyj, A.G.; Sams, B.S.; Fei, Z.; Singh, V.; Raja, P.V.; Bornhorst, G.M.; Earles, J.M. End-to-end deep learning for directly estimating grape yield from ground-based imagery. Comput. Electron. Agric. 2022, 198, 107081. [Google Scholar] [CrossRef]
  23. Henke, M.; Neumann, K.; Altmann, T.; Gladilin, E. Semi-Automated Ground Truth Segmentation and Phenotyping of Plant Structures Using k-Means Clustering of Eigen-Colors (kmSeg). Agriculture 2021, 11, 1098. [Google Scholar] [CrossRef]
Figure 1. Examples of greenhouse image data including original and ground-truth segmented images. From left to right column: Arabidopsis (top view), maize (side view), maize (top view), barley (side view), barley (top view).
Figure 2. Visualization of data flow and comparative analysis of end-to-end regression vs. U-net segmentation models vs. ground-truth plant traits.
Figure 3. Layer architecture of the end-to-end (image-to-trait) regression model including six Conv2D/relu blocks followed by fully connected layers of sizes 256 and 1, respectively.
Figure 4. Example of training and validation RMSE losses of the end-to-end ‘projection area’ prediction model for a total of 10,000 iterations. Blue and red lines indicate original and smoothed performance curves, respectively. Due to the early saturation of the training curves, automated criteria for early stopping were introduced, resulting in a much lower number of approximately 1000 iterations.
Figure 5. Exemplary comparison of maize shoot projection area and convex hull area using U-net vs. end-to-end trait prediction models. From left to right, from top to bottom: original side-view image of a maize shoot, segmented image of the maize shoot (white) with the boundary of the convex hull (green line), scatter plots of maize shoot projection area (bottom, left) and convex hull area (bottom, right) computed using the U-net segmentation vs. end-to-end trait prediction models. As one can see, the predictions of the segmentation-based approach undergo stronger deviations from the ground-truth values compared to the end-to-end predicted trait values.
Figure 6. Example of the activation map of the end-to-end plant area-predicting model: (a) original image, (b) activation map of the first relu network layer.
Table 1. Overview of phenotypic plant traits used for end-to-end model training and validation.
Trait Name   Description
Area         The pixel count of the plant region
C.hull       The pixel count of the convex hull of the plant region
Height       Vertical dimension of the plant region
Width        Horizontal dimension of the plant region
H_99         99th percentile of the vertical distribution of plant pixels
W_99         99th percentile of the horizontal distribution of plant pixels
Red          Average red color of plant pixels
Green        Average green color of plant pixels
Blue         Average blue color of plant pixels
Table 2. Summary of e2e training options.
Option                  Value
Optimizer               Adam
Batch size              32
Initial learn rate      0.005
Metrics                 RMSE
Validation patience     5
Validation frequency    35
Max. number of epochs   100
Table 3. Overview of model training, testing, and ’unseen’ validation sets of randomly selected Arabidopsis (A), barley (B), and maize (M) top- and side-view images.
Image Modality   # Training   # Testing   # Validation
A, top           210          53          57
B, top           170          43          59
B, side          248          62          77
M, top           209          52          72
M, side          105          26          33
Table 4. Summary of the performance of end-to-end (e2e) regression and U-net segmentation models vs. ground truth for the estimation of traits from Arabidopsis top-view (A, top), barley top- and side-view (B, top/side), and maize top- and side-view (M, top/side) images using the following metrics: R² coefficient of linear correlation vs. ground-truth traits (i.e., traits derived from ground-truth segmented images), Mean Square Error (MSE), Maximum Error (MAXERR), and the ratio of the L2 norms of model-predicted traits to ground-truth trait values (L2RAT). For each measure, a higher correlation coefficient R² or a lower MSE, MAXERR, or deviation of L2RAT from 1 indicates superior model performance. In the right ‘t-test vs. gt’ columns, the star symbol (*) indicates a significant difference between ground-truth and model-predicted trait values at the significance level of p < 0.05.
Image Type | Plant Trait | R² (U-Net) | R² (e2e) | MSE (U-Net) | MSE (e2e) | MAXERR (U-Net) | MAXERR (e2e) | L2RAT (U-Net) | L2RAT (e2e) | t-test (U-Net) | t-test (e2e)
A, top | Area | 9.8e-1 | 9.9e-1 | 1.9e6 | 4.1e5 | 5.0e3 | 1.8e3 | 9.3e-1 | 1.0 | - | -
A, top | C.hull | 9.8e-1 | 9.5e-1 | 5.6e6 | 4.8e6 | 9.5e3 | 9.2e3 | 9.6e-1 | 1.0 | - | -
A, top | Height | 9.9e-1 | 9.5e-1 | 1.3e1 | 1.5e2 | 1.4e1 | 4.2e1 | 9.9e-1 | 9.4e-1 | - | -
A, top | Width | 9.6e-1 | 8.6e-1 | 1.9e1 | 1.5e2 | 2.4e1 | 3.8e1 | 9.8e-1 | 9.6e-1 | - | -
A, top | H_99 | 9.9e-1 | 8.0e-1 | 3.0 | 8.1e1 | 1.2e1 | 2.4e1 | 1.0 | 9.7e-1 | - | -
A, top | W_99 | 1.0 | 8.8e-1 | 6.0e-1 | 7.4e1 | 4.0 | 2.3e1 | 1.0 | 9.8e-1 | - | -
A, top | Red | 1.0 | 9.3e-1 | 0.0 | 8.0e-4 | 2.0e-2 | 8.7e-2 | 1.0 | 9.5e-1 | - | -
A, top | Green | 9.9e-1 | 9.8e-1 | 1.0e-4 | 8.0e-4 | 3.3e-2 | 8.5e-2 | 1.0 | 9.5e-1 | - | -
A, top | Blue | 9.9e-1 | 9.7e-1 | 0.0 | 3.0e-4 | 2.0e-2 | 5.7e-2 | 9.9e-1 | 9.4e-1 | - | -
B, top | Area | 9.5e-1 | 9.2e-1 | 1.8e6 | 1.7e6 | 5.3e3 | 6.4e3 | 9.3e-1 | 7.9e-1 | - | -
B, top | C.hull | 9.0e-1 | 9.3e-1 | 3.7e7 | 1.8e7 | 1.5e4 | 1.3e4 | 8.7e-1 | 9.2e-1 | - | -
B, top | Height | 9.1e-1 | 7.8e-1 | 1.5e3 | 5.0e2 | 9.8e1 | 5.6e1 | 9.0e-1 | 9.6e-1 | - | -
B, top | Width | 8.9e-1 | 7.8e-1 | 1.4e3 | 3.4e2 | 8.8e1 | 5.9e1 | 7.4e-1 | 9.5e-1 | * | -
B, top | H_99 | 8.9e-1 | 9.0e-1 | 1.7e2 | 8.7e1 | 4.0e1 | 1.9e1 | 9.3e-1 | 9.7e-1 | - | -
B, top | W_99 | 9.4e-1 | 8.0e-1 | 1.8e2 | 2.4e2 | 3.7e1 | 3.5e1 | 9.6e-1 | 9.3e-1 | - | -
B, top | Red | 8.6e-1 | 8.2e-1 | 1.5e-3 | 8.0e-4 | 1.6e-1 | 1.0e-1 | 7.9e-1 | 1.0 | * | -
B, top | Green | 9.0e-1 | 8.5e-1 | 1.2e-3 | 1.4e-3 | 1.1e-1 | 2.0e-1 | 8.8e-1 | 1.0 | - | -
B, top | Blue | 8.3e-1 | 9.1e-1 | 2.1e-3 | 7.0e-4 | 1.8e-1 | 1.3e-1 | 6.8e-1 | 9.8e-1 | * | -
B, side | Area | 9.4e-1 | 9.9e-1 | 5.6e6 | 1.1e5 | 6.1e3 | 1.1e3 | 4.8e-1 | 9.9e-1 | * | -
B, side | C.hull | 9.1e-1 | 9.7e-1 | 5.6e7 | 6.4e6 | 1.9e4 | 9.6e3 | 6.2e-1 | 9.8e-1 | * | -
B, side | Height | 9.3e-1 | 9.5e-1 | 8.7e2 | 3.5e2 | 8.9e1 | 5.0e1 | 8.6e-1 | 1.1 | - | -
B, side | Width | 6.8e-1 | 6.4e-1 | 8.7e2 | 1.3e2 | 6.7e1 | 2.6e1 | 8.1e-1 | 1.0 | * | -
B, side | H_99 | 7.4e-1 | 6.7e-1 | 7.2e1 | 4.4e1 | 3.4e1 | 2.3e1 | 9.7e-1 | 9.8e-1 | - | -
B, side | W_99 | 9.7e-1 | 9.1e-1 | 8.5e1 | 1.5e2 | 2.5e1 | 5.5e1 | 9.5e-1 | 1.0 | * | *
B, side | Red | 6.7e-1 | 9.1e-1 | 4.4e-3 | 5.0e-4 | 1.3e-1 | 6.3e-2 | 6.5e-1 | 1.0 | * | -
B, side | Green | 6.7e-1 | 9.1e-1 | 3.3e-3 | 4.0e-4 | 1.3e-1 | 5.8e-2 | 7.2e-1 | 9.7e-1 | * | -
B, side | Blue | 5.1e-1 | 8.7e-1 | 4.6e-3 | 4.0e-4 | 1.5e-1 | 6.5e-2 | 5.4e-1 | 9.9e-1 | * | -
M, top | Area | 9.8e-1 | 9.7e-1 | 1.0e6 | 5.0e5 | 2.8e3 | 2.6e3 | 6.8e-1 | 8.9e-1 | - | -
M, top | C.hull | 8.3e-1 | 8.2e-1 | 5.8e7 | 2.8e7 | 2.4e4 | 1.5e4 | 7.0e-1 | 9.1e-1 | - | -
M, top | Height | 9.4e-1 | 8.9e-1 | 4.4e2 | 8.7e2 | 5.9e1 | 9.2e1 | 9.3e-1 | 9.8e-1 | - | -
M, top | Width | 8.6e-1 | 9.0e-1 | 1.1e3 | 2.8e2 | 1.2e2 | 5.8e1 | 8.2e-1 | 9.8e-1 | - | -
M, top | H_99 | 8.2e-1 | 8.6e-1 | 6.4e2 | 2.0e2 | 8.3e1 | 4.0e1 | 8.8e-1 | 1.0 | - | -
M, top | W_99 | 9.7e-1 | 8.0e-1 | 3.7e1 | 3.4e2 | 1.6e1 | 6.6e1 | 9.8e-1 | 1.0 | * | -
M, top | Red | 9.5e-1 | 7.9e-1 | 1.9e-3 | 2.7e-3 | 2.1e-1 | 1.8e-1 | 8.8e-1 | 9.4e-1 | - | -
M, top | Green | 9.8e-1 | 9.1e-1 | 1.3e-3 | 2.0e-3 | 2.1e-1 | 2.0e-1 | 9.4e-1 | 9.3e-1 | - | -
M, top | Blue | 9.6e-1 | 9.1e-1 | 2.5e-3 | 2.3e-3 | 2.5e-1 | 2.8e-1 | 8.7e-1 | 9.5e-1 | - | -
M, side | Area | 9.5e-1 | 9.9e-1 | 7.8e5 | 9.2e4 | 1.9e3 | 6.8e2 | 5.4e-1 | 1.1 | - | -
M, side | C.hull | 8.9e-1 | 9.5e-1 | 1.1e8 | 1.2e7 | 2.5e4 | 9.5e3 | 4.4e-1 | 9.7e-1 | * | -
M, side | Height | 9.1e-1 | 9.2e-1 | 1.3e3 | 3.8e2 | 7.7e1 | 4.9e1 | 8.9e-1 | 9.8e-1 | - | -
M, side | Width | 9.4e-1 | 9.7e-1 | 1.7e3 | 2.4e2 | 1.2e2 | 3.1e1 | 6.7e-1 | 1.0 | - | -
M, side | H_99 | 6.2e-2 | 5.2e-1 | 8.4e2 | 4.3e1 | 7.0e1 | 1.3e1 | 9.4e-1 | 9.9e-1 | - | -
M, side | W_99 | 7.9e-1 | 9.6e-1 | 9.8e2 | 1.7e2 | 9.8e1 | 3.8e1 | 1.1 | 1.1 | - | -
M, side | Red | 8.2e-1 | 8.3e-1 | 2.0e-3 | 1.4e-3 | 1.4e-1 | 9.4e-2 | 8.5e-1 | 1.1 | - | -
M, side | Green | 8.4e-1 | 7.2e-1 | 8.0e-4 | 9.0e-4 | 1.0e-1 | 8.1e-2 | 9.2e-1 | 9.9e-1 | - | -
M, side | Blue | 3.4e-1 | 8.6e-1 | 2.7e-3 | 6.0e-4 | 1.3e-1 | 7.1e-2 | 7.8e-1 | 1.0 | * | -