1. Introduction
Artificial intelligence (AI) and precision agriculture (PA) are revolutionizing how crops are monitored, managed, and improved [1]. By integrating sensor data, digital tools, and machine learning, these technologies enable site-specific management, reduce labor demands, and optimize resource use [2,3]. Within this context, AI-powered image analysis and automation are transforming how plant traits are assessed—making data collection faster, more accurate, and scalable [4]. These innovations support digital phenotyping efforts that enhance research efficiency and field-level decision-making, particularly in agricultural systems [5]. As agriculture continues to adopt AI-driven tools, open-source and accessible platforms are essential to ensure widespread implementation and long-term impact [6,7].
Leaf area is a fundamental indicator of plant growth and function closely tied to photosynthesis, transpiration, and biomass accumulation [8,9]. In citrus production, the precise measurement of leaf area is critical for evaluating canopy performance, plant health, and physiological status [10]. It also enables predictions of crop responses to environmental stress and supports the optimization of cultural practices such as pruning, irrigation, and fertilization [11,12]. Moreover, leaf area provides valuable insight for monitoring tree productivity and health across diverse orchard conditions, facilitating more targeted and sustainable management decisions [13]. Despite their importance, traditional leaf area measurement methods remain labor-intensive or costly, limiting their practical application in large-scale or high-throughput studies [14]. While semi-automated, open-source tools such as ImageJ have made leaf area measurement more accessible, accurate, repeatable, and cost-effective [15,16], several limitations remain. These include the potential underestimation of area, sensitivity to image quality and sample preparation, and the need for manual steps that introduce user-dependent variability and limit scalability for high-throughput applications [17,18].
To enhance throughput and reproducibility in plant phenotyping, recent studies have developed automated, open-source tools for estimating leaf area from digital images. Huang et al. [19] presented a Python-based workflow using OpenCV that integrated image preprocessing, segmentation, and contour detection to extract morphological traits from Populus simonii leaves with high accuracy (r² > 0.97). However, their method depended on a bulky Scan1200 imaging device requiring a fixed power supply. Likewise, Easlon and Bloom [20] developed Easy Leaf Area, which rapidly distinguished leaves from the background using pixel color ratios and eliminated manual scale measurements, enabling much faster analysis than ImageJ (<5 s vs. ~3 min per leaf). Yet, Easy Leaf Area was optimized for Arabidopsis rosettes and can be less accurate in species with overlapping leaves, angled foliage, or non-green coloration; its calibration for other crops often requires additional coding adjustments. Jiang et al. [21] introduced a Python package using GANs to reconstruct damaged leaves and quantify herbivory with high accuracy (RMSE = 1.6%), but the method struggled with heavily damaged leaves and irregular damage types and required clean backgrounds without overlapping foliage. In field applications, Lee et al. [22] demonstrated a portable RGB-D system capable of non-destructive leaf area estimation outdoors, yet its accuracy decreased with dense canopies, high light intensity, or occluded leaves, requiring artificial shading or additional filters to maintain reliability.
Beyond open-source research pipelines, several commercial platforms exist for leaf area measurement. The CI-203 Handheld Laser Leaf Area Meter (CID Bio-Science, Camas, WA, USA) uses a sweeping laser and optical sensor for direct measurement but is a costly, dedicated instrument. The WinDIAS system (Delta-T Devices, Cambridge, UK) offers configurations from scanners to high-throughput conveyor units, yet all require licensed software and specialized hardware. Similarly, WinFOLIA (Regent Instruments Inc., Quebec City, QC, Canada) provides detailed morphological analysis but depends on licensed software and calibrated scanners. More recently, Petiole Pro (Petiole Ltd., Nottingham, UK) has introduced a smartphone-based solution that improves accessibility but remains a closed, company-dependent application that restricts algorithmic transparency. Collectively, these systems demonstrate commercial capabilities, but their proprietary nature, high costs, and limited customizability constrain broader adoption.
In this paper, we present a fully automated, open-source, Python-based tool for quantifying citrus leaf area from scanned images that requires only a standard flatbed scanner and computer—equipment readily available in most laboratories—offering a transparent, cost-free, and adaptable alternative to commercial systems. The tool integrates three key innovations: (1) multi-mask HSV segmentation to capture heterogeneous leaf coloration and improve robustness under variable imaging conditions; (2) contour-hierarchy filtering to ensure the accurate measurement of leaves with irregular shapes or partial damage from insects and diseases by excluding internal contours; and (3) a batch-calibration system with open-source flexibility that enables rapid, scalable processing across hundreds of images while allowing users to tailor masks, kernels, and thresholds for different crops and experimental contexts. Based on these innovations, we hypothesize that (i) multi-mask segmentation will reduce threshold-related failures compared with ImageJ, (ii) contour-hierarchy filtering will improve accuracy in cases of irregularly shaped or damaged leaves, and (iii) batch calibration combined with customizable parameters will significantly decrease analysis time while ensuring accuracy and reproducibility across diverse imaging conditions. Our specific objective was to rigorously evaluate the performance of the Python-based tool by statistically assessing its accuracy, precision, and measurement agreement relative to ImageJ across genetically diverse citrus cultivars. Additionally, to promote broader adoption and reproducible research practices, we provide open access to the code and documentation in a publicly accessible repository. By making this resource available, we aim to empower researchers and agronomists with a reliable, efficient, and adaptable method for precise leaf-area quantification, thereby advancing agricultural research.
2. Materials and Methods
2.1. Plant Material Collection
Leaf samples were collected on 28 May 2025 from 11 citrus cultivars (Citrus spp.) growing in a citrus orchard in Valdosta, Georgia (lat. 30.8228520° N, long. 83.2366239° W). These cultivars included several mandarins and their hybrids [Citrus reticulata; ‘USDA 88-2’, ‘Cleopatra’, ‘Early Pride’, ‘Fairchild’, ‘Gold Nugget’, ‘Sugar Belle’, ‘Tango’], two satsumas [Citrus unshiu; ‘Owari’ and ‘Orange Frost’], a sweet orange [Citrus sinensis; ‘Early Valencia-2’ (EV-2)], and a trifoliate rootstock [Poncirus trifoliata × Citrus reticulata; ‘US-942’]. For each cultivar, a composite sample of leaves was collected from field-grown trees. Leaves were randomly selected across different canopy levels and standardized to a similar developmental age (4–6 months old). Immediately after collection, the leaves were placed in labeled plastic bags and stored in a cooler with ice packs to preserve leaf turgor and prevent dehydration. Samples were transported to the laboratory and scanned within the same day to ensure tissue integrity and minimize physical distortion during imaging.
2.2. Image Acquisition
The adaxial side of each leaf was scanned using a high-resolution flatbed scanner (Perfection V850 Pro; Epson America, Inc., Los Alamitos, CA, USA) at 300 dpi. Leaves were grouped and scanned by cultivars. Due to differences in leaf size and shape across cultivars, leaves were arranged in varying non-overlapping patterns to maximize the use of the scanner bed so that as many leaves as possible were included in each scan. All leaves were placed flat against the scanner’s white background to ensure high contrast and consistent image quality. A separate image of a ruler marked in centimeters was scanned under the same settings, named scale.jpg, and used as the universal reference for both ImageJ and Python-based measurements. All images were saved in JPEG format and organized in a centralized directory for batch analysis.
2.3. Image Analysis Tools
ImageJ (version 1.54) was used as the manual reference method [23]. The scanned ruler image was opened first, and a 1 cm line was drawn using the straight-line tool. The known distance was set to 1.0 cm with a pixel aspect ratio of 1.0, units were set to centimeters, and the scale was applied globally. Each image—containing multiple leaves—was then converted to 8-bit grayscale (Image > Type > 8-bit). Leaf segmentation was performed using the Adjust Threshold tool in the Default method with B&W display, with the options “Dark background” enabled and “Don’t reset range” checked. The “Limit to threshold” option was enabled in Set Measurements so that only thresholded regions were measured. Individual leaves were highlighted using the wand tool and recorded with the “Measure” function, which reported leaf areas in cm² rounded to the nearest thousandth. All images were processed by the same operator to ensure consistency and minimize user-induced variability.
The Python-based image analysis tool was developed using OpenCV and NumPy in Python (version 3.11.9, 64-bit) and executed in Microsoft Visual Studio as the integrated development environment (IDE). OpenCV was implemented for color space conversion, morphological operations, and contour extraction [24], while NumPy facilitated efficient array processing and numerical computations [25]. Before batch processing begins, the script automatically opens the scanned scale image and prompts the user to click two points exactly 1 cm apart to calibrate the pixel-to-centimeter ratio. This calibration is then applied globally to all leaf images. The pipeline utilizes HSV color space segmentation with three distinct color masks (green, brown, and yellow) to comprehensively isolate leaves of varying health states. Morphological opening and closing (with a 5 × 5 kernel, 2 iterations each) are applied to remove noise and refine the segmented masks. Contour detection (using RETR_TREE) with hierarchy filtering (selecting only parent contours, where hierarchy = −1) is then used to extract individual leaves. Contours are smoothed using the Douglas–Peucker algorithm [26] (ε = 0.0025 × perimeter) and filtered by a minimum area threshold of 0.10 cm² to exclude small debris and artifacts. The tool calculates total, average, and individual leaf areas per image and exports two CSV files (a summary and a detailed individual leaf dataset), along with annotated images where each detected leaf is outlined and numbered. The default parameter settings and recommended tunable ranges for HSV segmentation, morphological operations, and contour filtering are provided in Table 1.
A step-by-step comparison of the Python and ImageJ workflows is shown in Figure 1. All analyses were performed on a Windows 11 Pro (64-bit; Microsoft Corporation, Redmond, WA, USA) system equipped with a 12th Gen Intel® Core™ i5-1245U CPU (1.60 GHz; Intel Corporation, Santa Clara, CA, USA), 16 GB RAM, and Intel® Iris® Xe Graphics (Intel Corporation, Santa Clara, CA, USA).
2.4. Statistical Analysis
All statistical analyses and visualizations were performed using RStudio software (version 2025.05.0 Build 496) with the ggplot2, ggpubr, dplyr, and car packages [27]. To evaluate agreement between the Python-based tool and ImageJ, a cultivar-specific and combined statistical workflow was applied.
For each citrus cultivar, the difference between Python and ImageJ leaf area measurements (Python − ImageJ) was calculated. The Shapiro–Wilk test was used to assess the normality of these differences, as it is widely regarded as the most powerful normality test for the small-to-medium sample sizes typical of the individual cultivar datasets. The standard deviation (SD) of these differences was also calculated to check for computationally negligible variance. If the assumption of normality was met (p > 0.05) and the SD of the differences was greater than a machine tolerance threshold (.Machine$double.eps^0.5, approximately 1.5 × 10⁻⁸), a paired t-test was performed; otherwise, a Wilcoxon signed-rank test was used. This additional check prevented numerical precision errors in the t-test when differences were nearly identical. To account for multiple hypothesis testing across cultivars, the Benjamini–Hochberg procedure was applied to the resulting p-values to control the false discovery rate (FDR); adjusted p-values were reported. This correction was applied only to the cultivar-level comparisons, as the combined analysis involved a single paired test.
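The test-selection logic above can be made concrete with a short sketch. The analysis was performed in R; here it is re-expressed in Python with SciPy purely for illustration, and the function names are ours. The Benjamini–Hochberg adjustment is implemented directly to show the step-up procedure.

```python
# Sketch of the per-cultivar test-selection logic (assumed re-expression in
# Python; the paper's analysis was done in R). Function names are ours.
import numpy as np
from scipy import stats

EPS = np.finfo(float).eps ** 0.5  # ~1.5e-8, mirrors .Machine$double.eps^0.5

def paired_test(python_vals, imagej_vals):
    """Paired t-test if differences look normal, else Wilcoxon signed-rank."""
    d = np.asarray(python_vals) - np.asarray(imagej_vals)
    if d.std(ddof=1) <= EPS:
        return np.nan  # differences numerically identical: no test needed
    if stats.shapiro(d).pvalue > 0.05:
        return stats.ttest_rel(python_vals, imagej_vals).pvalue
    return stats.wilcoxon(python_vals, imagej_vals).pvalue

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment across cultivar-level p-values."""
    p = np.asarray(pvals, float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downward
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adj, 0, 1)
    return out
```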
To further assess agreement, linear regression was performed for each cultivar, and the slope and intercept of the regression line were reported along with their 95% confidence intervals (CIs). A hypothesis test was conducted to determine whether the slope significantly differed from 1, indicating potential systematic bias.
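The slope test against 1 can be sketched as follows. This is an assumed Python re-expression (the analysis was done in R); it uses the standard t-statistic (slope − 1)/SE with n − 2 degrees of freedom.

```python
# Testing whether the regression slope differs from 1 (proportional bias):
# an illustrative sketch; the paper's analysis was performed in R.
import numpy as np
from scipy import stats

def slope_vs_one(imagej, python_vals):
    """Return (slope, 95% CI for slope, p-value for H0: slope = 1)."""
    res = stats.linregress(imagej, python_vals)
    df = len(imagej) - 2
    t = (res.slope - 1.0) / res.stderr            # t-statistic under H0
    p = 2 * stats.t.sf(abs(t), df=df)             # two-sided p-value
    tcrit = stats.t.ppf(0.975, df=df)
    ci = (res.slope - tcrit * res.stderr, res.slope + tcrit * res.stderr)
    return res.slope, ci, p
```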
Agreement was also evaluated using Bland–Altman analysis, which included calculation of the mean bias (average difference between methods) and the 95% limits of agreement (LoA), defined as follows:
LoA = Bias ± 1.96 × SD,
where Bias was the mean of the differences and SD was their standard deviation. These metrics were visualized using Bland–Altman plots to identify patterns of deviation across leaf sizes.
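The Bland–Altman computation reduces to a few lines; a minimal sketch (function name is ours):

```python
# Minimal Bland-Altman computation: bias = mean difference,
# LoA = bias +/- 1.96 x SD of the differences.
import numpy as np

def bland_altman(python_vals, imagej_vals):
    d = np.asarray(python_vals) - np.asarray(imagej_vals)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```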
Additional descriptive metrics included the coefficient of variation (CV) for each method, calculated as follows:
CV (%) = (SD / Mean) × 100,
and the mean percent error, calculated as follows:
Percent error (%) = [(Python − ImageJ) / ImageJ] × 100.
Percent error calculations excluded any leaves with zero or missing ImageJ values to avoid division errors. These metrics quantified relative variability and bias, respectively, and were used to compare consistency and accuracy across cultivars.
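Both descriptive metrics, including the exclusion of zero or missing ImageJ values, can be sketched directly from the definitions above (function names are ours):

```python
# CV and mean percent error as defined above; leaves with zero or missing
# ImageJ values are excluded from percent error, mirroring the text.
import numpy as np

def cv_percent(values):
    v = np.asarray(values, float)
    return v.std(ddof=1) / v.mean() * 100

def mean_percent_error(python_vals, imagej_vals):
    p = np.asarray(python_vals, float)
    i = np.asarray(imagej_vals, float)
    ok = np.isfinite(i) & (i != 0)   # drop zero/missing reference values
    return np.mean((p[ok] - i[ok]) / i[ok] * 100)
```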
To complement the cultivar-level analysis, a combined dataset including all 412 observations was analyzed using the same statistical procedures, except that the Kolmogorov–Smirnov (K–S) test was used for normality assessment, as it is considered more appropriate for large sample sizes. Scatter plots were generated to visualize the relationship between Python and ImageJ measurements, including regression lines and confidence intervals. Bland–Altman plots were used to assess agreement and identify outliers or systematic deviations. A histogram of the differences was also generated to visually assess the distribution and normality of the data, with a normal curve overlaid for comparison.
3. Results
3.1. Validation Metrics Comparing Python and ImageJ
Validation metrics across 11 citrus cultivars demonstrated that the Python-based analyzer closely mirrors ImageJ in estimating individual leaf area (Table 2). Statistically significant differences between the two methods were observed in seven cultivars: ‘USDA 88-2’, ‘Cleopatra’, ‘Early Pride’, ‘Orange Frost’, ‘US-942’, ‘Sugar Belle’, and ‘Tango’ (raw p < 0.05). However, after applying the Benjamini–Hochberg correction to account for multiple comparisons, only five cultivars retained significance (‘Cleopatra’, ‘Orange Frost’, ‘US-942’, ‘Sugar Belle’, and ‘Tango’), indicating that most observed differences were modest and potentially due to random variation. Even among cultivars with significant differences, the absolute mean offsets were small, ranging from −0.14 cm² (‘Orange Frost’) to +0.06 cm² (‘US-942’), and all standard deviations of the differences were ≤0.33 cm². This suggests minimal bias between methods.
The coefficient of variation (CV) varied across cultivars, with ‘Early Valencia-2’ showing the highest variability (≈57%) and ‘US-942’ the lowest (≈17–18%). Importantly, the difference in CV between Python and ImageJ within each cultivar was consistently small (<0.8 percentage points), indicating comparable precision. The mean percent error remained within ±0.5% for 10 cultivars. The only exception was ‘US-942’, where the Python tool overestimated leaf area by 2.43%. Across all cultivars, the standard deviation of percent error did not exceed 1.05%, further supporting the consistency of the Python-based measurements.
A combined analysis of all 412 observations also showed strong agreement between methods, with a mean percent error of 0.16% and a CV of approximately 56% for both tools. Although the paired test for the combined dataset was statistically significant (p = 0.0012), the small bias (−0.04 cm²) and narrow limits of agreement (−0.35 to 0.28 cm²) suggest that the Python tool performs reliably across diverse leaf morphologies.
Linear regression and Bland–Altman analyses further quantified the relationship and agreement between the Python and ImageJ leaf area measurements across citrus cultivars (Table 3). The coefficient of determination (R²) exceeded 0.997 for all cultivars, with most values above 0.999, confirming an exceptionally strong linear relationship between the two methods.
The regression slope was not statistically different from 1.0—the value indicating perfect proportionality—in six of the eleven cultivars (‘USDA 88-2’, ‘Early Valencia-2’, ‘Orange Frost’, ‘US-942’, ‘Sugar Belle’, and ‘Tango’). In the remaining five cultivars, the slope was marginally but significantly less than 1.0 (‘Cleopatra’, ‘Early Pride’, ‘Fairchild’, and ‘Gold Nugget’) or greater than 1.0 (‘Owari’), suggesting subtle proportional biases in these specific cases. The intercept of the regression was not significantly different from zero for most cultivars, indicating that no substantial fixed bias was introduced by the Python tool.
Bland–Altman analysis revealed that the mean bias (Python − ImageJ) was minimal across all cultivars, ranging from −0.14 cm² (‘Orange Frost’) to +0.06 cm² (‘US-942’). The 95% limits of agreement were narrow, demonstrating high precision. For instance, the limits of agreement for the combined dataset (−0.35 to 0.28 cm²) indicate that for most leaves, the Python tool’s measurement will be within approximately ±0.3 cm² of the ImageJ value. The one exception was ‘US-942’, which showed a consistent positive bias, with all differences falling within a tight, positive range (0.02 to 0.11 cm²).
3.2. Scatter-Plot Agreement Between Python and ImageJ
The scatter-plot analysis (Figure 2) confirmed a striking, cultivar-wide concordance between the Python analyzer and ImageJ. In all 11 cultivars, individual leaf-area values clustered tightly along the 1:1 line. Pearson correlation coefficients were uniformly 1.00 (p < 2.2 × 10⁻¹⁶), indicating virtually perfect linear agreement across the entire range of leaf sizes, from the smallest ‘US-942’ blades (~2–4 cm²) to the largest ‘EV-2’ and ‘Tango’ leaves (>70 cm²). No cultivar exhibited visible systematic bias or heteroscedastic spread, and only a single leaf from ‘Early Pride’ deviated modestly from parity. This exceptional agreement was further validated by the combined scatter plot of all 412 leaves, which showed a near-perfect linear fit (R² = 0.9999) and tight clustering of points around the regression line across the full spectrum of leaf sizes.
3.3. Bland–Altman Agreement Analysis
The Bland–Altman plots (Figure 3) confirmed strong agreement between the Python analyzer and ImageJ across all 11 citrus cultivars and the combined dataset. The mean bias, represented by the solid red line, remained close to zero, suggesting minimal systematic error between the two methods. While most data points fell within the 95% limits of agreement (dashed gray lines), a small number of outliers were observed in each cultivar, ranging from one to four points. These deviations were expected due to the randomized sampling approach, which occasionally included leaves that were broken, curled, or otherwise atypical. Importantly, the outliers were isolated and did not follow a consistent pattern across the range of leaf sizes, suggesting no proportional error or heteroscedasticity. Cultivars such as ‘Fairchild’, ‘Sugar Belle’, ‘Tango’, and ‘US-942’ exhibited particularly tight agreement, while others, such as ‘Cleopatra’ and ‘Orange Frost’, showed slightly wider dispersion. In the combined plot of all cultivars, the mean bias remained near zero, and the majority of data points were contained within the 95% limits of agreement, further confirming the strong overall agreement. Examples of processed leaf images, including those contributing to observed outliers, can be found in the Supplementary Materials (Figures S1–S11).
3.4. Distribution of Differences
The distribution of the differences between the two methods for the combined dataset, including all cultivars, is shown in a histogram (Figure 4). This visualization confirms that the differences were normally distributed, with the data points forming a bell-shaped curve that closely followed the overlaid normal distribution line. The histogram was centered near a mean bias of −0.04 cm², reinforcing the minimal systematic difference between the two tools. The tight clustering of the histogram bars around the mean, with the majority of values falling between the 95% limits of agreement [−0.35, 0.28], visually confirmed the high degree of precision and consistency of the Python tool when compared to ImageJ.
3.5. Processing Time Efficiency
To evaluate processing efficiency, we benchmarked both ImageJ and the Python-based tool using the same dataset of 48 scanned images containing a total of 412 citrus leaves. Each flatbed scan included approximately 7–9 leaves, with the exception of scans for the trifoliate rootstock ‘US-942’, where 29 and 18 leaves were imaged per scan due to their smaller leaf size. Image capture required ~3 min per scan, which included arranging the leaves on the scanner bed. This was consistent with previous reports that high-resolution scans may take 1–2 min per image and an additional few minutes to process each image [28].
The subsequent analysis revealed substantial differences between methods. Davidson [29] suggested that ImageJ analysis requires ~3 min per leaf. While this estimate is broadly representative, actual performance is affected by user efficiency and workflow setup. In our study, ImageJ processing was accelerated by using multiple monitors to open up to eight images simultaneously. Even with this optimized workflow, analyzing all 412 leaves (48 scans) required an average of 3 h and 12 min (11,520 s), including threshold adjustment, manual leaf selection, and careful data entry to ensure accuracy. Moreover, variability in leaf morphology across citrus cultivars increased the manual correction burden. This manual process is also inherently prone to operator error, such as misclicking during leaf selection, skipping leaves, or transcribing data incorrectly.
In contrast, the Python-based tool, run on a standard desktop computer, completed the analysis of all 48 scans (412 leaves) in just 7 s following a single initial calibration step to set the pixel-to-centimeter scale. Output files (CSV tables and annotated images) were generated automatically, eliminating manual data transcription and the associated risk of human error. This demonstrated a >1600-fold reduction in processing time compared with ImageJ, highlighting the tool’s scalability, reproducibility, and significantly higher reliability while minimizing operator workload.
3.6. Improved Performance Under Challenging Imaging Conditions
In addition to quantitative validation, two qualitative examples illustrate the key limitations of ImageJ that our Python-based tool overcomes. In the first scenario (Figure 5), an ‘Owari’ leaf located at the edge of the scanned image could not be accurately segmented by ImageJ using the wand tool. The tool selected the image border, preventing accurate leaf area measurement. In contrast, the Python tool successfully detected and measured all leaves, including those touching the image border.
The second scenario (Figure 6) highlights challenges in thresholding due to leaf brightness and color. A ‘Tango’ leaf with low contrast against the background required multiple manual adjustments in ImageJ to isolate the leaf, increasing both processing time and potential user bias. The Python tool, however, handled the same image automatically, detecting all leaf contours—including the problematic one—without manual input.
3.7. The Impact of Leaf Morphology on Area Measurement Accuracy
The analysis revealed that leaf morphology, particularly complex and highly segmented shapes, can influence the agreement between measurement methods (Figure 7). This was most apparent in the trifoliate cultivar ‘US-942’, which exhibited a consistent, small positive bias (+2.43% mean error, Python vs. ImageJ). This discrepancy is not an error of the Python tool in isolation but arises from the cumulative effects of different image processing pipelines applied to challenging structures. The manual ImageJ protocol, which served as our reference standard, can be susceptible to minor overestimation during the thresholding step. The application of a uniform threshold value to convert a grayscale image to a binary mask may inadvertently include pixels at the leaf boundary with subtle color variations or low-contrast shadows, slightly inflating the area measurement. This effect is amplified in leaves with intricate contours, such as the three leaflets of trifoliate citrus (‘US-942’).
Conversely, the Python tool’s automated contour-based approach introduces a different potential for bias. The algorithm’s use of contour smoothing via the Douglas–Peucker algorithm, while essential for creating robust, continuous boundaries, can lead to a slight expansion of the leaf perimeter. This smoothing effect is most pronounced on leaves with complex geometries, such as the sharp angles and deep lobes characteristic of trifoliate leaves, resulting in a consistent slight overestimation relative to the ImageJ standard. Thus, the observed bias for ‘US-942’ is best interpreted as a systematic difference between two measurement methodologies, each with unique sensitivities to extreme leaf morphology, rather than a standalone error.
4. Discussion
This study demonstrated that the fully automated Python-based leaf area measurement tool achieves near-perfect agreement with ImageJ across a diverse set of citrus cultivars. Across all comparisons, biases were minimal, and correlation coefficients approached unity, confirming that the Python tool is as reliable as ImageJ in quantifying individual leaf areas. Although a few cultivars showed statistically significant differences after multiple-comparison correction, the absolute mean offsets were ≤0.14 cm², well below biologically meaningful thresholds. This distinction highlights that statistical significance did not translate into practical significance, and the tool can be confidently applied across diverse leaf types. Linear regression and Bland–Altman analyses further supported these findings. Slopes were not statistically different from 1.0 in more than half the cultivars, and where deviations occurred, they were minor and cultivar-specific. The narrow 95% limits of agreement (approximately ±0.3 cm²) demonstrated that the Python tool consistently tracks ImageJ within a very small error margin. Importantly, scatter plots confirmed that this agreement held true across the full range of leaf sizes—from the smallest ‘US-942’ to the largest ‘EV-2’ and ‘Tango’—indicating that performance is robust regardless of scale.
Notably, the tool outperformed ImageJ under conditions where manual segmentation is error-prone—such as leaves touching the image border or exhibiting low color contrast—highlighting its robustness and consistency across variable imaging conditions. This strength was especially clear in the ‘Owari’ border-leaf and ‘Tango’ low-contrast examples, where ImageJ failed to isolate the target leaves without time-intensive manual adjustment but the Python tool completed detection seamlessly. These case studies illustrate its advantage under realistic imaging variability.
Previous studies have noted that ImageJ, while widely adopted, is sensitive to user-dependent thresholding and segmentation, particularly under inconsistent lighting or overlapping plant structures, which may compromise reproducibility [30]. Our tool addresses these limitations by implementing HSV-based color masking and automated contour detection, eliminating the need for manual parameter tuning. Similar to the findings of Zhang et al. [31], who demonstrated the efficiency of HSV color models for accurate, non-destructive leaf feature extraction, our design improves accuracy while minimizing user intervention. Furthermore, in alignment with the broader digital phenotyping objectives outlined by Kim et al. [32], the automation of our workflow enhances throughput and objectivity, making it suitable for high-throughput applications.
Processing efficiency is another critical advantage revealed in this study. Whereas optimized ImageJ workflows still required over three hours to process 412 leaves, the Python tool completed the same dataset in 7 s—a >1600-fold speed improvement. This dramatic gain in efficiency not only reduces operator workload and error but also makes large-scale or multi-season phenotyping studies feasible, where manual approaches would be prohibitively time-consuming. While the tool is highly reproducible under standard conditions, a few potential sources of user error remain. As with ImageJ, incorrect calibration, such as selecting the wrong 1 cm reference during pixel-to-length conversion, can yield inaccurate area measurements. Additionally, users unfamiliar with running Python scripts in an IDE may encounter challenges, although comprehensive documentation has been provided to mitigate this issue. Once calibrated and understood, the tool performs reliably and consistently across large batches of images. Cultivar-specific differences, such as the small positive bias in trifoliate ‘US-942’, underscore that both ImageJ and the Python approach introduce subtle, method-specific sensitivities. In this case, ImageJ thresholding tended to slightly overestimate boundary pixels while Python’s contour smoothing slightly expanded leaf perimeters. These findings should be interpreted as methodological trade-offs rather than tool-specific errors, and they highlight opportunities for future refinement, such as adaptive contour smoothing tailored to leaf morphology.
Although leaf overlap is a potential challenge in image-based phenotyping, no overlapping leaves were present in the image sets analyzed in this study. To proactively enhance the tool’s robustness for future applications where human error might result in overlapping leaves, a watershed segmentation algorithm could be incorporated into subsequent versions. This would help automatically separate touching or overlapping objects, improving accuracy in complex image sets [33]. Although this version is optimized for scanned images with high-contrast backgrounds, future development could also extend functionality to field-acquired images, integrate machine learning for species recognition or damage classification, and offer a graphical user interface to further lower the barrier to use.
Summary and Applicability
This tool is ideally suited for high-throughput leaf area quantification. Its design prioritizes accessibility, requiring only a flatbed scanner, a ruler for calibration, and a standard computer, thereby eliminating the need for expensive or proprietary hardware and software. For researchers intending to apply this tool, we emphasize a clear operational framework to ensure success. The tool was validated under specific conditions—using leaves scanned on a pure white background at a resolution of 300 dpi—and this protocol is strongly recommended for obtaining reliable results. Users should be aware that performance can be compromised by several factors, including low contrast between the leaf and background, the presence of shadows or glare, and overlapping leaves. To mitigate potential errors, successful application depends on two critical best practices: first, meticulous calibration using the included 1 cm reference; and second, a routine visual quality check of the output binary masks to quickly identify and address any segmentation errors. As an open-source platform, this tool provides a robust foundation for automated, reproducible phenotyping, and we encourage community adoption and development to expand its applicability to a wider range of imaging conditions and plant species.
The successful implementation of this automated tool aligns with the core objectives of AI-driven precision agriculture by enabling the rapid, reproducible, and scalable quantification of plant traits. By reducing manual input and standardizing image analysis processes, the tool significantly improves data consistency and throughput—both of which are essential for integrating phenotypic data into artificial intelligence models and decision-support frameworks [
34]. Its open-source design not only fosters transparency and reproducibility but also encourages collaboration and future development. This foundation supports potential integration with machine learning pipelines for trait prediction, stress detection, or crop performance modeling. As digital agriculture continues to advance, tools such as this offer scalable, field-adaptable solutions for real-time, high-resolution monitoring, thereby enhancing site-specific crop management and informed decision-making.
5. Conclusions
This study validated a fully automated, open-source Python-based tool for citrus leaf area quantification that requires only a flatbed scanner and a standard computer, making it accessible to most laboratories without the need for specialized hardware. By integrating three innovations—multi-mask HSV segmentation, contour-hierarchy filtering, and batch calibration—the tool achieved near-perfect agreement with ImageJ across 11 diverse citrus cultivars. These design elements collectively reduced threshold-related failures, improved accuracy for irregular or damaged leaves, and provided scalable batch processing that drastically decreased analysis time. Our findings confirmed the hypotheses outlined at the outset: (i) multi-mask segmentation minimized failures compared with ImageJ, particularly in low-contrast or border-leaf cases; (ii) contour-hierarchy filtering improved robustness when handling irregular leaf morphologies; and (iii) automated calibration enabled reproducible, high-throughput analysis, with a >1600-fold reduction in processing time relative to ImageJ. Together, these results demonstrate that the Python tool not only matches ImageJ in accuracy and precision but also surpasses it in efficiency, reproducibility, and robustness under challenging imaging conditions.
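The multi-mask segmentation principle can be illustrated with a toy example: two HSV range masks, one targeting green tissue and one targeting brown or necrotic tissue, are OR-combined so that damaged regions a single green threshold would drop are still counted as leaf. The tiny synthetic image and the threshold values below are illustrative assumptions, not the tool's published ranges, and the companion contour-hierarchy filtering step is omitted for brevity:

```python
import numpy as np

# Synthetic HSV image (OpenCV-style ranges: H in [0, 180), S and V in
# [0, 255]) with white background, green tissue, and brown damage.
hsv = np.zeros((4, 6, 3), dtype=np.uint8)
hsv[..., 1] = 10
hsv[..., 2] = 250                 # start as white background (low S, high V)
hsv[1, 1:3] = (60, 180, 120)      # healthy green tissue
hsv[2, 1:3] = (15, 150, 90)       # brown, damaged tissue

def in_range(img, lo, hi):
    """numpy analogue of cv2.inRange: True where lo <= px <= hi per channel."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((img >= lo) & (img <= hi), axis=-1)

# Mask 1 targets green hues; mask 2 targets brown/yellow hues
# (threshold values are illustrative, not the tool's).
green = in_range(hsv, (35, 40, 40), (85, 255, 255))
brown = in_range(hsv, (5, 40, 40), (34, 255, 255))

# The union of the per-range masks recovers the whole leaf, including
# damaged tissue that the green mask alone would exclude.
leaf = green | brown
```

This union-of-masks design is what reduces threshold-related failures: no single HSV window has to cover both healthy and discolored tissue at once.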
Beyond citrus, the tool represents a practical and adaptable solution for high-throughput leaf area measurement in other crops and experimental contexts using flatbed scanners. The current method is optimized for these controlled conditions; extrapolation to field-acquired images would require additional validation and algorithmic adjustments to handle variable lighting, complex backgrounds, and leaf overlap. Its open-source distribution ensures transparency, reproducibility, and flexibility for user customization, supporting broader adoption in digital phenotyping. Future development may extend functionality to field-acquired images, integrate machine learning for trait recognition, and incorporate user-friendly interfaces. In its current form, however, this tool already provides the plant science community with a reliable, low-cost, and scalable resource for rapid, reproducible leaf-area quantification, advancing the goals of precision agriculture and digital crop monitoring.