1. Introduction
In fruit tree cultivation, the total leaf area is closely related to photosynthetic activity, physiological functions, tree health, fruit development, yield formation, and pruning strategies [1,2,3]. In grape production, an appropriate balance between leaf area and fruit load is crucial for sugar accumulation, skin coloration, and fruit ripening and is therefore one of the key indicators for orchard management [4,5]. This is particularly important in pergola-trained grapevines, where the canopy exhibits a complex three-dimensional architecture with abundant and highly overlapping leaves, making the monitoring of total leaf area even more essential [6,7].
Commercial optical instruments such as the LAI–2000 (Plant Canopy Analyzer, LI-COR, Inc., Lincoln, NE, USA) primarily estimate the leaf area index (LAI) from light transmittance, typically based on the Beer–Lambert law [8] and the hemispherical integration method improved by Miller (1967) for estimating effective LAI [9,10,11,12]. However, these devices often rely on wavelength filtering to infer canopy cover, and previous studies have shown that their performance deteriorates when leaf layers increase or when occlusions occur, leading to overestimation or underestimation of LAI [13,14]. In pergola-trained vineyards, the discontinuous canopy structure, row gaps, and geometric shading from trellis components can introduce systematic bias into optical LAI measurements (e.g., LAI–2200C Plant Canopy Analyzer, LI-COR, Inc., Lincoln, NE, USA). When sensors face along the row direction or when the field of view contains substantial inter-row gaps, LAI tends to be markedly underestimated [15,16]. Furthermore, the inherent orientation of shoots and the fixed position of trellis wires can alter leaf distribution and incoming light angles, modifying internal light environments and further reducing the reliability of optical instruments in pergola systems [17]. The three-dimensional arrangement of shoots, leaves, and trellis wires also causes large variability in incident light angles, shading patterns, and background contrast, resulting in unstable performance of such optical devices [18]. Traditional destructive methods, such as manual defoliation or mechanical leaf scanning, are labor-intensive and cannot be repeatedly applied to the same tree for time-series observations.
With recent advances in computer vision and three-dimensional reconstruction, RGB imagery combined with Structure-from-Motion (SfM) algorithms has emerged as a promising non-destructive observation technique for plant phenotyping [19,20,21,22,23]. SfM reconstructs high-density point clouds from multi-view images, capturing canopy geometry and spatial distribution patterns, which can be used for canopy volume estimation, fruit localization, and leaf area analysis. Because this method requires only a consumer-grade camera and no specialized optical sensors, it is particularly suitable for resource-limited environments and mobile platforms [24]. For fruit crops that cannot be observed from above because of bird nets and overhead structures, such as grapes, kiwifruit, and other trellis-based systems, SfM offers a flexible alternative with strong potential for routine monitoring.
However, most existing SfM-based LAI estimation approaches rely on canopy height models (CHM) or canopy envelope metrics (e.g., alpha shapes or triangulated irregular networks) to approximate plant volume [21,25,26,27,28]. While these methods perform well in vertically trained canopies, such as Vertical Shoot Positioning (VSP) grapevines or tomatoes, they introduce substantial errors when applied to pergola-trained trees because of large voids between the canopy and the ground, which amplify DSM–DTM (Digital Surface Model–Digital Terrain Model) inconsistencies.
Despite the ability of SfM to generate dense point clouds, many current SfM-based studies still depend on two-dimensional or envelope-based metrics that capture only the external silhouette of the canopy, rather than its internal structural density [24]. In pergola-trained grapevines, which are characterized by horizontally spread canopies with overlapping leaves and abundant internal empty spaces, canopy envelope volume often shows weak correlation with actual leaf area and may be biased by illumination direction and camera viewpoints [29]. In contrast, voxel-based models, which quantify the occupancy of point clouds within a three-dimensional grid, can directly represent the spatial density of foliage elements [30,31,32,33]. The relationship between voxel count and canopy structural attributes is therefore more closely aligned with physiological leaf quantity and can capture features that optical LAI instruments and 2D geometric proxies cannot [34]. For example, voxel metrics can detect layering, leaf arrangement, and internal porosity, features essential for describing three-dimensional canopy structure [35,36,37]. Thus, voxel modeling holds strong potential for leaf area estimation in pergola systems [38,39]; however, robust workflows for classification, filtering, and point cloud selection remain underdeveloped and require further validation [40,41].
In plant classification, traditional color space thresholding methods (e.g., HSV, LAB) can distinguish leaf regions under certain conditions, but their performance becomes unstable in the presence of strong shadows, specular reflection, or complex backgrounds. In particular, subtle differences between shoots and leaves often cannot be captured purely through color-based classification, requiring additional structural descriptors for correction [30]. To address this issue, Meyer and Neto (2008) proposed the Excess Green minus Excess Red (ExGR) index, which enhances the contrast between vegetation and background in RGB images and exhibits a certain degree of illumination robustness [42]. Consequently, ExGR has been widely adopted for vegetation image classification in recent years. For example, Bodor-Pesti et al. (2023) evaluated the relationship between 31 RGB indices and chlorophyll content in grapevine leaves, showing that ExGR was among the most strongly correlated indices (R² > 0.9), demonstrating its stability even under varying illumination conditions [43]. Poblete-Echeverría et al. (2017) similarly reported that simple indices such as 2G_RBi and G% achieved classification accuracies exceeding 98% using Otsu’s method in high-resolution UAV imagery, outperforming both K-means and deep learning approaches [44].
In three-dimensional point clouds, integrating ExGR with spatial density thresholds to remove non-leaf points, followed by voxelization, could establish a structural correspondence between total leaf area and volumetric occupancy. However, most ExGR-related studies are based on top-view imagery or environments with uniform illumination, where classification accuracy depends heavily on green reflectance. When imagery is captured from different angles (e.g., side-view, oblique-view, or backlit conditions), the illumination distribution and leaf reflectance vary substantially [29], raising concerns about ExGR stability in multi-view SfM reconstruction. Additionally, SfM point clouds are affected by reconstruction errors, background occlusions, and structural noise from trellis components, posts, and fruits [24]. Therefore, developing a robust leaf-point filtering pipeline that combines ExGR with spatial density criteria and is suitable for multi-view SfM reconstruction represents a critical research gap.
Taken together, leaf area monitoring in pergola-trained grapevines faces challenges such as unstable optical LAI estimates, the high cost of destructive measurements, and the limitations of traditional SfM envelope models in capturing three-dimensional foliage density. This study integrates multi-view SfM reconstruction, ExGR-based color enhancement, and voxel occupancy analysis to develop a non-destructive leaf area estimation framework tailored for trellis-based fruit trees. Through destructive defoliation validation and multi-temporal observations, we quantify and compare the responses of voxel metrics, digital hemispherical photography (DHP), and the LAI–2000, and clarify the physiological significance of voxel-based structural indicators. The outcomes of this research contribute to establishing a three-dimensional canopy information foundation for pergola-trained fruit trees and serve as an essential dataset for smart orchard leaf mass management and digital twin modeling.
2. Materials and Methods
2.1. Experimental Site and Marker Design
This study was conducted in the vineyard of the Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization (NARO), Japan. The experimental fields (coordinates: 36.050180, 140.094943 and 36.049741, 140.095376) adopt a Japanese-style H-shaped pergola system with short-cane pruning. The canopy structure is characterized by two horizontally extended cordons of approximately 15 m on each side, forming a wide and horizontally spread leaf curtain. This architecture differs substantially from the commonly used VSP systems in Europe and the United States; leaves are highly overlapping, light penetration within the canopy is low, and the intertwined shoots and leaves create a complex three-dimensional structure. Such conditions pose challenges for image-based segmentation, optical LAI estimation, and Structure-from-Motion (SfM) point cloud reconstruction. In addition, bird nets installed during the fruit maturation period (May to mid-September) further modify the illumination conditions.
The surveyed cultivars included Pione, Aki Queen, and Shine Muscat, all grown under uniform soil conditions and management practices. During winter pruning, only the two main cordons extending along the north–south direction were retained, while all one-year-old shoots were removed. The pergola height was maintained at 170 ± 10 cm. The target trees varied by year: in 2022, three Aki Queen (T1–T3) and three Pione (T4–T6) vines were examined; in 2023, an additional Shine Muscat (T7) vine was added (Figure 1A).
In 2024, three Pione vines (T4–T6) exhibiting larger leaf masses were selected as the primary observation subjects. For these vines, measurements were conducted separately on the east-facing canopy sections, hereafter denoted as T4E–T6E, to enable comparative analysis using LAI–2000, digital hemispherical photography (DHP), SfM, non-destructive leaf measurement and destructive leaf measurement. In terms of cultivation, the 2022 vines followed a relatively natural, less-regulated management style, whereas from 2023 onward, all management tasks were carried out by an experienced grower. This resulted in natural vigor differences across years and enabled comparisons of canopy responses under different management regimes. Furthermore, newly elongated shoots were tied to the horizontal trellis and moderately tip-pruned between 22 and 25 May each year to control final shoot length.
The image acquisition strategy also differed by year. In 2022 and 2023, handheld photography was used, including chest-height images (150 cm) facing the main cordons and low-angle images taken at approximately 30 cm above ground (Figure 1B). Both east- and west-facing canopy images were captured to evaluate how multi-angle imaging influences SfM model quality and foliage representation. This cross-angle dataset also provided an important reference for multi-year comparisons.
To ensure spatial accuracy in SfM reconstruction, forty waterproof paper markers were uniformly placed along the pergola height (Figure 1A). These markers were used by Agisoft Metashape Professional 2.1.1 (Agisoft LLC, St. Petersburg, Russia) for automated detection, image alignment, scale calibration, and error correction. For each session, the height, spacing, and distance from major branches of all markers were recorded. This design ensured that images captured on different dates and from varying viewpoints could be consistently aligned to a common three-dimensional spatial reference, thereby improving the reliability of the reconstructed models.
2.2. Non-Destructive Observations
A GoPro Hero10 (GoPro, Inc., San Mateo, CA, USA) camera (4K, 30 FPS) was used to record canopy imagery throughout the study (Figure 2. P1). To ensure consistent brightness across different observation dates, exposure was calibrated before each session using a fixed grayscale reference panel, and the exposure settings were locked to minimize the influence of natural light variation on image brightness and color. The imaging frequency varied slightly by year: in 2022 and 2023, images were collected two to three times per week from early May until 20 June prior to fruit bagging; in 2024, image acquisition was conducted one to two times per month from May to September to capture a wider range of phenological changes. With the exception of the destructive sampling on 24 September 2024, all imaging was carried out under stable illumination between 11:00 and 14:00, after which leaf measurements were performed.
In 2024, to more clearly capture temporal changes in leaf morphology, a fixed sampling area was designated for each vine. Within a 1 m² region demarcated by waterproof markers, leaf traits were manually measured by a single trained operator using a steel ruler. Leaf length (excluding petiole) and maximum leaf width of all leaves on each shoot were recorded to minimize observer variation and improve cross-date reproducibility.
The 2024 imagery was acquired using a camera mounted on an autonomous ground vehicle (AGV). The AGV used in this study was the SUPPOT prototype manufactured by Somic Transformation Co., Shizuoka, Japan, equipped with remote operation capability and three preset constant-speed modes. The vehicle featured four-wheel drive and a basic suspension system, enabling stable movement along the relatively flat inter-row paths of the vineyard. The camera was fixed at a height of 60 cm above the ground, with the lens oriented vertically upward. This configuration ensured consistent imaging geometry across dates and reduced angle and distance inconsistencies commonly associated with handheld photography.
All imagery was captured under natural light without supplemental illumination, preserving the inherent effects of pergola canopy lighting conditions on leaf curtain structure. Video footage was converted to still images by extracting one frame every three frames (5312 × 2988 resolution). During post-processing, frames containing non-target vines or poor image quality were removed to avoid cross-tree mixing that could impair SfM reconstruction accuracy. Only valid image sets corresponding to a single vine were retained to ensure the accuracy and consistency of subsequent three-dimensional modeling and voxel-based analysis.
2.3. Three-Dimensional Reconstruction and Voxelization
2.3.1. SfM Reconstruction
Structure-from-Motion (SfM) reconstruction was performed using Agisoft Metashape Professional v2.1.1 (Agisoft LLC, St. Petersburg, Russia) to generate dense three-dimensional point clouds for each observation zone (Figure 2. P2). After importing the images, camera parameters were initialized, and the software automatically detected the paper-based waterproof markers embedded in the scenes. These markers served as reference points for image correspondence, camera calibration, and subsequent scale normalization of the reconstructed model. The reconstruction workflow included feature detection, image matching (photo alignment), dense point cloud generation, and model scaling.
To minimize interference from ground surfaces, trellis structures, and neighboring vines, only the point cloud regions corresponding to the target vine were retained. Background elements such as soil, bird nets, or non-target vegetation were removed to ensure that the reconstructed 3D data accurately represented the canopy structure of each individual vine.
Following dense reconstruction, a classification based on the Excess Green minus Excess Red (ExGR) index [42] was applied to each point cloud (threshold = ExGR − 80), enabling differentiation among leaves, shoots, and non-plant structures. This classification effectively filtered out background soil, bird nets, and most metallic trellis components, retaining primarily the foliage canopy, major branches, and a small portion of black trellis wires present in the RGB imagery.
The classified point clouds were exported in .pcd format for point cloud alignment and subsequent voxelization. This workflow ensured high vegetation specificity and improved reconstruction accuracy and stability under the complex canopy conditions of pergola-trained grapevines.
2.3.2. Point Cloud Alignment, Classification, and Denoising
After reconstruction, point clouds from different observation dates and viewing angles were aligned and cleaned using CloudCompare (CloudCompare Project, Grenoble, France). A multi-stage alignment workflow was employed (Figure 2. P3). First, an Iterative Closest Point (ICP) algorithm was applied without altering the scale of the original models. To improve computational efficiency and prevent the amplification of local errors, approximately 10⁵ representative points were sampled from each point cloud during the initial alignment stage. A convergence threshold of a root-mean-square (RMS) error below 10⁻⁷ was adopted to ensure that repeated captures of the same vine achieved stable and realistic overlap.
Following the initial registration, a second, fine-scale alignment was performed using the main branches of the vine as stable reference structures. These branches were chosen because their geometry remained largely unchanged across dates and was not significantly affected by wind movement or defoliation treatments. A second ICP refinement further improved the spatial congruence of multi-date and multi-view point clouds, thereby enhancing the accuracy of subsequent voxelization analyses.
Once alignment was completed, the point clouds were classified and denoised to remove non-canopy elements. An Excess Green minus Excess Red (ExGR) index threshold of 0 was applied to distinguish foliage from woody organs and trellis structures. The thresholding effectively removed brown or whitish materials such as shoots, trellis metal components, and ground reflections, preserving the green vegetation points and enabling clear isolation of leaf points from the background.
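The color filter described above can be illustrated with a minimal Python sketch that computes ExGR for each point and keeps points above a threshold of 0. It assumes the commonly cited definitions ExG = 2g − r − b and ExR = 1.4r − g on chromatic coordinates normalized so that r + g + b = 1; the text does not state which color scaling was used, and the function names `exgr` and `keep_leaf_points` as well as the (x, y, z, R, G, B) tuple layout are hypothetical.

```python
from typing import Iterable, List, Sequence


def exgr(red: float, green: float, blue: float) -> float:
    """Excess Green minus Excess Red on normalized chromatic coordinates."""
    total = red + green + blue
    if total == 0:
        return 0.0  # assumption: treat pure black as neutral
    r, g, b = red / total, green / total, blue / total
    excess_green = 2.0 * g - r - b
    excess_red = 1.4 * r - g  # assumed ExR definition
    return excess_green - excess_red


def keep_leaf_points(points: Iterable[Sequence[float]],
                     threshold: float = 0.0) -> List[Sequence[float]]:
    """Keep (x, y, z, R, G, B) points whose ExGR exceeds the threshold."""
    return [p for p in points if exgr(p[3], p[4], p[5]) > threshold]


# Hypothetical example points, not measured data.
cloud = [
    (0.10, 0.20, 1.70, 40, 160, 40),    # green leaf
    (0.12, 0.21, 1.69, 120, 80, 40),    # brown shoot
    (0.30, 0.25, 1.71, 128, 128, 128),  # gray trellis wire
]
leaf_points = keep_leaf_points(cloud)
print(len(leaf_points))  # 1
```

With this normalization, neutral gray and brown colors yield negative ExGR, so a threshold of 0 retains only points whose green component clearly dominates.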
2.3.3. Voxelization
Although multiple waterproof marker–based ground control points (GCPs) were deployed in the vineyard to improve the absolute spatial accuracy of SfM reconstruction, the resulting GCP errors exceeded acceptable limits. Therefore, all analyses were performed within the relative coordinate framework of the aligned point clouds. Because all models of a given vine were processed using the same coordinate system and a fixed voxel resolution of 1 cm³, high comparability between time points and viewing angles was maintained (Figure 2. P4).
Voxelization was performed using a custom Python 3.9 (Python Software Foundation, Wilmington, DE, USA) script that converted the aligned point clouds into voxel occupancy grids. Within the relative coordinate system, the three-dimensional space was discretized into a regular grid of 1 cm³ voxels. A voxel containing at least one point was counted as occupied, transforming irregular point clouds into a unified quantitative representation of canopy structure. As all models were voxelized using the same axis orientation and voxel resolution, voxel occupancy served as a robust comparative indicator of canopy density and foliage structure across dates and treatments.
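The occupancy count can be sketched in a few lines of standard-library Python. This is not the custom script used in the study; it is a minimal illustration that assumes point coordinates and the voxel edge length share one unit (centimeters here), and the name `occupied_voxels` is hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def occupied_voxels(points: Iterable[Tuple[float, float, float]],
                    voxel_size: float = 1.0) -> Set[Voxel]:
    """Return the set of grid cells that contain at least one point.

    Coordinates and voxel_size must share one unit (cm assumed here).
    """
    cells: Set[Voxel] = set()
    for x, y, z in points:
        cells.add((math.floor(x / voxel_size),
                   math.floor(y / voxel_size),
                   math.floor(z / voxel_size)))
    return cells


# Hypothetical coordinates in cm, not measured data.
points_cm = [
    (0.1, 0.1, 0.1),
    (0.9, 0.2, 0.3),   # same voxel as the first point
    (1.1, 0.1, 0.1),   # neighboring voxel along x
    (-0.1, 0.0, 0.0),  # floor keeps negative coordinates on the grid
]
voxel_count = len(occupied_voxels(points_cm))
print(voxel_count)  # 3
```

Because a voxel is counted once regardless of how many points fall inside it, the count is less sensitive to local differences in point density than a raw point count would be.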
The resulting voxel counts were then compared against destructively measured leaf area, fresh and dry biomass, optical LAI indices obtained from LAI–2000 and DHP, and leaf area estimates derived from the temporal leaf length model. These comparisons enabled a systematic evaluation of voxel occupancy as an indicator of leaf quantity in pergola-trained grapevine canopies.
2.4. Digital Hemispherical Photography (DHP)
Canopy optical LAI was measured using Digital Hemispherical Photography (DHP) (Figure 2. P5-b). Hemispherical images were acquired with a Google Pixel 7 smartphone (Google Inc., Mountain View, CA, USA) equipped with a 180° fisheye conversion lens (KRP-180fy, KenkoTokina, Tokyo, Japan) (Figure 3A). DHP imaging was carried out immediately after SfM image acquisition to ensure temporal synchronization between optical LAI measurements and three-dimensional point cloud data.
After image import, the effective circular region of the fisheye lens (hemispherical domain) was automatically detected using the Hough Circle Transform. In cases where automatic detection failed, manual adjustment was performed to determine the optimal region of interest (Figure 3C). ExGR values were then computed for all pixels within the circular domain (Figure 3B). Otsu’s automated thresholding method was applied to segment vegetation versus non-vegetation (sky) pixels based on ExGR (Figure 3C,D), rather than on brightness as in traditional processing (Figure 3F).
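Otsu's method selects the threshold that maximizes the between-class variance of a histogram. The sketch below applies it to a flat list of ExGR values using only the standard library; the bin count of 256 and the name `otsu_threshold` are assumptions for illustration, and an image-processing library would normally be used in practice.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Return the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # a constant image has no meaningful split
    width = (hi - lo) / bins
    hist: List[int] = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


# Hypothetical bimodal ExGR values: sky (low) and foliage (high).
exgr_values = [-0.5] * 50 + [-0.4] * 50 + [0.9] * 50 + [1.0] * 50
threshold = otsu_threshold(exgr_values)
vegetation_pixels = sum(v > threshold for v in exgr_values)
print(vegetation_pixels)  # 100
```

Pixels above the threshold are treated as vegetation and the remainder as sky, which yields the binary mask from which gap fractions are computed.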
Following segmentation, each pixel was mapped to its corresponding zenith angle (θ) based on the geometric model of the fisheye lens (Figure 3E). Using the detected circle center and radius, pixels outside the hemispherical region were excluded. The zenith angle range (θ = 0–90°) was discretized into rings with a fixed angular increment (Δθ = 10° in this study). For each zenith ring, the gap fraction (Pgap) was calculated as the proportion of non-vegetation pixels relative to the total number of valid pixels within the ring.
Effective LAI was then computed using the discrete form of the Miller (1967) equation [9]:
$$\mathrm{LAI} = 2 \sum_{i} -\ln\left(P_{\mathrm{gap}}(\theta_i)\right) \cos\theta_i \, \sin\theta_i \, \Delta\theta_i$$
where $P_{\mathrm{gap}}(\theta_i)$ is the gap fraction of the $i$th zenith ring, $\theta_i$ is the ring’s center angle, and $\Delta\theta_i$ is the ring width. LAI was calculated over multiple zenith angle ranges (0–30°, 0–60°, and 0–90°) to evaluate the sensitivity of the results to zenith angle selection. All calculations employed the same discrete Miller formulation.
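The discrete Miller summation can be sketched as follows, assuming rings of equal width that start at the zenith and angles converted to radians. The lower bound `min_gap`, which avoids taking the logarithm of zero in fully closed rings, is an assumption not described in the text, and the name `miller_lai` is hypothetical.

```python
import math
from typing import Sequence


def miller_lai(gap_fractions: Sequence[float],
               ring_width_deg: float = 10.0,
               min_gap: float = 1e-6) -> float:
    """Effective LAI from ring gap fractions (rings start at zenith)."""
    dtheta = math.radians(ring_width_deg)
    total = 0.0
    for i, p_gap in enumerate(gap_fractions):
        theta = (i + 0.5) * dtheta    # ring center angle in radians
        p_gap = max(p_gap, min_gap)   # assumed guard against log(0)
        total += -math.log(p_gap) * math.cos(theta) * math.sin(theta) * dtheta
    return 2.0 * total


# Synthetic check: a canopy with spherical leaf angle distribution and
# LAI = 1 has P_gap(theta) = exp(-0.5 / cos(theta)).
gaps = [math.exp(-0.5 / math.cos(math.radians(c))) for c in range(5, 90, 10)]
lai_0_90 = miller_lai(gaps)      # all nine rings, 0-90 degrees
lai_0_30 = miller_lai(gaps[:3])  # first three rings, 0-30 degrees
```

Passing only the first three or six ring values restricts the summation to the 0–30° or 0–60° range. For the synthetic gap fractions, the full-range result is close to the nominal LAI of 1, with the small difference arising from the 10° discretization.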
The DHP-derived LAI indices (0–30°, 0–60°, and 0–90°) were compared with voxel occupancy, LAI–2000 measurements, and destructively measured leaf area.
2.5. Destructive Observations and LAI Measurements
To validate the accuracy of the three-dimensional point cloud-based leaf area estimation, a destructive defoliation experiment was conducted on 24 September 2024. For each sample vine, the canopy was divided into three 1 m² sampling zones using waterproof markers. Two zones were subjected to defoliation treatments, while the remaining zone was left intact as a control, enabling the creation of multi-level, quantifiable leaf mass differences within the same vine (Figure 2. P5-b). Five defoliation intensities were applied (0%, 25%, 50%, 75%, and 100%), and leaves were removed from each shoot in order from the shoot tip toward the base (Figure 1C).
Prior to and following each defoliation treatment, a full set of images was captured for subsequent SfM reconstruction. All removed leaves were photographed on the same day against a white background, and leaf length, leaf width, and individual leaf area were measured using ImageJ v1.54g (National Institutes of Health, Bethesda, MD, USA). Leaf count and fresh weight were recorded, and the samples were then oven-dried at 80 °C for one week to determine dry weight. These measurements provided ground-truth datasets of leaf area, leaf count, fresh weight, and dry weight, which were used to evaluate the voxel model’s leaf quantity estimation performance.
Optical LAI measurements were also conducted before and after each defoliation event. The LAI–2000 Plant Canopy Analyzer was operated according to the manufacturer’s protocol, with four measurements taken above and below the canopy, followed by averaging. Hemispherical canopy images were acquired using digital hemispherical photography (DHP), with the DHP camera positioned at the same locations as the lower LAI–2000 measurements, directly below the designated markers at a height of 90 cm, to ensure spatial consistency between optical methods. Details of DHP processing and LAI calculation are provided in Section 2.4 (Figure 1B).
Together, the destructive leaf datasets and optical LAI indices form the validation baseline for the voxel-based model, enabling a systematic comparison of different leaf area estimation approaches under the complex canopy architecture of pergola-trained grapevines.
2.6. Leaf Area Estimation Model Development
Leaf images obtained from the destructive sampling were analyzed using ImageJ to quantify leaf length, maximum leaf width, and individual leaf area. These measurements were used to develop regression-based leaf area estimation models (Figure 2. P5-b). This study evaluated several single-variable model structures, including linear, polynomial, and exponential transformations, using either leaf length or leaf width as the predictor variable. Model performance was assessed based on the coefficient of determination (R²), residual sum of squares (RSS), and the corrected Akaike Information Criterion (AICc). The highest-performing model was selected as the fundamental equation for non-destructive leaf area estimation in subsequent time-series analyses.
2.7. Statistical Analysis
To evaluate the relationships between three-dimensional voxel-based indicators (voxel occupancy) and multiple leaf-related variables, correlation analysis and regression modeling were used as the primary statistical approaches (Figure 2. P6). Model performance indices were further applied to compare the effectiveness of different leaf area estimation methods.
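First, destructively measured leaf area (A), fresh weight (FW), and dry weight (DW) were used as ground-truth references. Pearson’s correlation coefficients (r) were calculated between voxel counts and each leaf variable using the following formula:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$
where $x_i$ and $y_i$ are the leaf variable and the voxel count of the $i$th observation, and $\bar{x}$ and $\bar{y}$ are their sample means.
The corresponding p-values from the t-test were reported to determine whether linear correlations were statistically significant.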
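As a worked illustration, Pearson's r and the associated t statistic with n − 2 degrees of freedom can be computed as follows. The sketch stops at the t statistic because converting it to a p-value requires the t distribution, which is not in the Python standard library; the function names and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: no linear correlation (df = n - 2, |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values, not measured data.
leaf_variable = [1.0, 2.0, 3.0, 4.0, 5.0]
voxel_counts = [1.0, 3.0, 2.0, 5.0, 4.0]
r = pearson_r(leaf_variable, voxel_counts)
t = t_statistic(r, len(leaf_variable))
print(round(r, 3), round(t, 3))  # 0.8 2.309
```

The t statistic is then compared with the t distribution with n − 2 degrees of freedom to obtain the two-sided p-value.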
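For the destructive defoliation experiment (0–100%), changes in voxel counts were fitted using linear and quadratic regression models to evaluate the sensitivity of voxel occupancy to leaf reduction. The linear model was defined as
$$y = \beta_0 + \beta_1 x + \varepsilon$$
and the quadratic model as
$$y = \beta_0 + \beta_1 x + \beta_2 x^{2} + \varepsilon$$
where $y$ is the voxel count, $x$ is the explanatory variable describing leaf quantity, $\beta_0$, $\beta_1$, and $\beta_2$ are regression coefficients, and $\varepsilon$ is the residual error.
Model performance was assessed using the coefficient of determination (R²), residual sum of squares (RSS), and Akaike Information Criterion (AIC) along with its small-sample correction (AICc):
$$\mathrm{AIC} = n \ln\left(\frac{\mathrm{RSS}}{n}\right) + 2k, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$$
where $n$ is the sample size and $k$ is the number of model parameters. AICc is particularly important for datasets with limited sample size to prevent overfitting.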
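The information criteria can be computed directly from RSS, n, and k. The sketch below assumes the least-squares form AIC = n ln(RSS/n) + 2k, which is a common choice when models are compared by RSS but is an assumption here; whether k counts the error variance is left to the caller, and the function names and example numbers are hypothetical.

```python
import math


def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def aicc(rss: float, n: int, k: int) -> float:
    """Small-sample corrected AIC; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic_least_squares(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison of a linear (k = 2) and quadratic (k = 3) fit.
n_obs = 10
linear_score = aicc(rss=10.0, n=n_obs, k=2)
quadratic_score = aicc(rss=9.5, n=n_obs, k=3)
preferred = "linear" if linear_score < quadratic_score else "quadratic"
print(preferred)  # linear
```

In this example the quadratic fit reduces RSS only slightly, so the correction term outweighs the gain and the linear model is preferred. With very few observations the denominator n − k − 1 approaches zero and the penalty grows quickly.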
Similarly, leaf area estimation models derived from leaf morphology, including linear, quadratic polynomial, and exponential forms, were evaluated using RSS, AIC, and AICc to identify the best-performing model. The selected model was then applied to time-series data to estimate seasonal canopy leaf area development across years.
All statistical analyses were performed at a significance level of $\alpha = 0.05$. Results were visualized using scatter plots, regression curves, confidence intervals, and residual diagnostics, which are presented in Section 3.
For each regression model, the standard error (SE) of the slope was computed based on the residual variance and the variance of the independent variable. Ninety-five percent confidence intervals (95% CI) for regression slopes were derived using t-based inference with degrees of freedom equal to n − 2.
To evaluate predictive accuracy against destructively measured reference data, the root mean square error (RMSE) was calculated as the square root of the mean squared difference between predicted and measured values. RMSE was reported only for comparisons involving destructive measurements, as it represents an absolute error metric with the same unit as the response variable.
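The slope standard error, its confidence interval, and RMSE can be sketched as follows for a simple linear fit. The critical t value is passed in by the caller (for example 3.182 for a 95% interval with 3 degrees of freedom) because the standard library has no t distribution; all function names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float],
               y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, slope standard error) by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    residual_variance = rss / (n - 2)  # df = n - 2
    return slope, intercept, math.sqrt(residual_variance / sxx)


def slope_ci(slope: float, se: float, t_crit: float) -> Tuple[float, float]:
    """Confidence interval for the slope given a critical t value."""
    return slope - t_crit * se, slope + t_crit * se


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error in the unit of the response variable."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


# Hypothetical values, not measured data.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [1.0, 3.0, 2.0, 5.0, 4.0]
slope, intercept, se = linear_fit(x_obs, y_obs)
ci_low, ci_high = slope_ci(slope, se, t_crit=3.182)  # df = 3, 95%
error = rmse([intercept + slope * v for v in x_obs], y_obs)
```

For these example values the slope is 0.8 with a standard error of about 0.35, so the 95% interval includes zero, which illustrates how wide slope intervals become with only five observations.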
4. Discussion
4.1. Essence of the 3D Voxel Indicator and Its Structural Relevance in the Canopy
Results from both the destructive defoliation experiment and multi-temporal analysis demonstrate that three-dimensional voxel occupancy provides a stable representation of leaf area dynamics in pergola-trained grape canopies. Among individual vines, the coefficient of determination (R²) between voxel counts and measured leaf area ranged from 0.77 to 0.95, showing consistent responses during both artificial defoliation and natural seasonal development. This stability arises from the fact that voxel occupancy does not represent surface area but instead captures the degree to which foliage occupies three-dimensional space. When leaf overlap increases, leaf inclination angles change, or internal canopy thickness becomes denser, the spatial volume occupied by leaves increases accordingly. Thus, voxel metrics serve as a structural indicator of canopy density rather than a two-dimensional area measurement.
However, substantial variation emerged when voxel–leaf area relationships were compared across different vines. When the three vines were pooled, the overall correlation between voxel counts and leaf area dropped to moderate levels (R² = 0.52), indicating that reconstruction conditions were not fully consistent across plants. Point cloud density in SfM is influenced by camera viewpoint, illumination direction, leaf reflectance, shoot occlusion, and background brightness. These factors differed systematically among vines and altered the proportionality between voxel occupancy and leaf area, preventing a fixed voxel–leaf area ratio from being applied across plants. Our findings suggest that voxels are highly effective for within-plant temporal analyses, but cross-plant comparisons of absolute leaf area require additional calibration, such as using shoot volume or partial destructive sampling, to correct for differences in illumination and reconstruction conditions.
Moreover, the point cloud integration workflow used the main branches of each vine as the reference for ICP alignment and for constructing the local coordinate system. Because the orientation, curvature, and spatial relationship of these branches relative to the pergola varied among vines, the resulting local coordinate systems differed in their spatial orientation. These differences represent more than simple shifts or rotations; they influence the projection of the canopy into voxel space. When voxelization is performed under vine-specific local coordinate systems, voxel counts are affected by geometric differences in branch architecture, causing identical leaf masses to yield different voxel densities across vines. This effect is particularly pronounced in multi-layered pergola canopies, where the orientation of cordons and leaf curtain expansion is highly plant-specific. The reduced correlation observed when combining data across vines may therefore reflect variations introduced by these local coordinate discrepancies. Improving cross-plant comparability will require reconstruction under a unified global coordinate reference or the incorporation of structural normalization factors such as branch skeletons or principal axis vectors to reduce the influence of branch geometry on voxel outcomes.
Notably, voxel occupancy showed almost no correlation with leaf count, but exhibited stable, though non-linear, relationships with fresh and dry biomass. This indicates that voxel occupancy represents neither leaf number nor biomass directly, but rather the spatial occupation of foliage within the canopy. In pergola grapevines—where foliage is strongly layered and leaves frequently obscure each other—SfM primarily detects surface texture rather than individual leaf units. Consequently, voxels more closely represent leaf area density (LAD) or geometric features associated with light interception capacity. Their physiological meaning therefore differs from traditional LAI, which is area-based. These findings highlight that voxels are not merely a proxy for leaf quantity but constitute a structural descriptor that directly reflects canopy architecture and potential photosynthetic function.
4.2. Role, Benefits, and Limitations of ExGR in 3D Point Cloud Classification and Canopy Reconstruction
In the 3D reconstruction of pergola-trained grapevines, the Excess Green minus Excess Red (ExGR) index serves as a critical step for ensuring the purity of leaf point clouds. Without ExGR classification, non-leaf structures such as trellis wires, shoots, shaded background regions, and bird nets are frequently incorporated into the SfM-generated point clouds. These structural artifacts are subsequently misinterpreted as canopy occupancy during voxelization, disrupting the correspondence between voxel changes and actual leaf area variations. This issue is particularly problematic at intermediate defoliation levels (25% and 50%), where geometric changes caused by leaf removal can be masked by structural noise, resulting in a weakened linear relationship between voxel counts and measured leaf area. Thus, the primary contribution of ExGR in this study lies in improving leaf-point purity so that voxel dynamics reliably reflect real canopy structural changes.
The effectiveness of ExGR originates from its ability to accentuate the green chromatic signal of vegetation relative to the background, thereby maintaining distinguishability between foliage and non-vegetation across varying illumination conditions. Pergola canopies exhibit pronounced multi-layered foliage and strong heterogeneity in illumination due to shading by the trellis structure. As a result, leaves display substantial brightness variation under different viewing angles and lighting directions. Traditional RGB thresholding or HSV-based classification often fails in backlit or penumbral conditions, whereas ExGR maintains relatively stable discrimination between foliage and non-vegetation, preserving geometric continuity of leaf point clouds across defoliation stages.
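For reference, the commonly used formulation of the index can be sketched as follows, with ExG = 2g − r − b and ExR = 1.4r − g computed on chromatic coordinates (r, g, b normalized to sum to one). The zero threshold, the function name `exgr_mask`, and the example colors are assumptions for illustration and do not reproduce the settings used in this study.

```python
import numpy as np

def exgr_mask(rgb, threshold=0.0):
    """Return True where ExGR exceeds the threshold (assumed default: 0).

    rgb: array of shape (..., 3) holding per-point or per-pixel colors.
    """
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0                    # avoid division by zero for black points
    r, g, b = np.moveaxis(rgb / total, -1, 0)  # chromatic coordinates
    exg = 2.0 * g - r - b                      # Excess Green
    exr = 1.4 * r - g                          # Excess Red
    return (exg - exr) > threshold

# Hypothetical point colors: green leaf, grey trellis wire, brown shoot.
colors = np.array([[40, 160, 40],
                   [120, 120, 120],
                   [150, 90, 60]])
leaf_points = exgr_mask(colors)
print(leaf_points)
```

Because the index works on chromatic coordinates rather than raw intensities, a uniformly darker or brighter version of the same color gives the same ExGR value, which is one reason the index tolerates moderate brightness variation.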
However, the performance of ExGR is still constrained by the illumination conditions characteristic of pergola systems. In this study, imagery was primarily captured from below or from lateral viewpoints. Because the underside of leaves reflects less green light than the adaxial surface, ExGR values for backlit leaves can decrease to levels similar to shoots or trellis wires, causing false classification as non-vegetation. Under overexposed or strong side-lighting conditions, leaf surfaces may exhibit bright specular reflections, shifting their color distribution outside typical vegetation signatures and leading to misclassification. Conversely, shoots or trellis wires may occasionally produce bright or green-tinged reflections, causing temporary misclassification as foliage. Such errors interact with SfM texture-related gaps, occasionally resulting in localized voxel overestimation or underestimation in high-leaf-density or backlit regions.
Importantly, most existing applications of ExGR are based on top-view or UAV imagery, where illumination is generally uniform and originates from above, providing leaves with consistent green reflectance characteristics. In contrast, the present study relies heavily on sub-canopy or horizontal-side viewpoints, where leaves undergo backlighting, oblique reflections, and semi-translucent light transmission. These complex optical phenomena create a classification environment fundamentally different from that of previous studies. Thus, our findings not only demonstrate the practical utility of ExGR under challenging pergola conditions but also reveal its limitations in non–top-view, multi-angle imaging scenarios—an area seldom discussed in the current literature.
Overall, incorporating ExGR enabled consistent 3D canopy estimation across defoliation stages and was essential for ensuring that voxel occupancy accurately reflected changes in leaf mass. Nonetheless, because ExGR performance is still influenced by illumination direction, leaf reflectance properties, and imaging angle, future improvements may involve integrating multi-angle image fusion, depth cameras, or deep-learning-based semantic segmentation to further enhance classification robustness. Such advancements would allow voxel metrics to maintain high accuracy even under rapidly changing illumination conditions typical of pergola-trained grapevine environments.
4.3. Multi-Level Error Sources and Error Propagation
SfM reconstruction in pergola-trained grapevines is prone to multiple types of errors arising across optical, imaging, reconstruction, classification, and voxelization stages. Our results show that voxel errors stem from the cumulative interaction of several factors rather than a single dominant source.
At the imaging level, illumination direction and leaf inclination angle were the primary contributors to error. The horizontal canopy and branching structure of pergola grapes frequently place leaves in backlit or penumbral conditions. These regions suffer from insufficient surface texture, preventing SfM from generating stable feature points and producing sheet-like gaps in the point cloud. After voxelization, these gaps manifest as local voids, resulting in voxel underestimation. In addition, object boundary regions are often misinterpreted by SfM as feature-rich edges, leading to the incorporation of background textures. Without ExGR filtering, these misclassified boundary points are incorrectly counted as canopy occupancy.
During point cloud classification, some backlit leaves with weakened reflectance were occasionally removed by ExGR thresholding, causing voxel underestimation. At the voxelization stage, error accumulation becomes most pronounced. Because voxel occupancy is a binary process—defined solely by the presence or absence of points—minor noise can cause a voxel to be fully occupied, whereas small texture losses can cause a voxel to be treated as empty. This binary sensitivity is especially critical in the low- to mid-leaf-mass stages, where discrimination among defoliation levels heavily depends on the precision of point cloud classification.
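The binary sensitivity described above can be made concrete with a small sketch. The point coordinates, the unit voxel size, and the helper name `occupied_voxels` are hypothetical and serve only to show how a single point can switch a voxel between empty and occupied.

```python
import numpy as np

def occupied_voxels(points, voxel_size):
    """Number of voxels containing at least one point (hypothetical helper)."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return len(np.unique(idx, axis=0))

# Three leaf points: two fall in voxel (0, 0, 0), one in voxel (1, 0, 0).
leaf = np.array([[0.2, 0.2, 0.2],
                 [0.6, 0.4, 0.3],
                 [1.4, 0.5, 0.2]])

base = occupied_voxels(leaf, 1.0)

# One stray noise point (e.g. a misclassified wire) occupies a whole extra voxel.
noise = np.array([[3.5, 0.5, 0.5]])
with_noise = occupied_voxels(np.vstack([leaf, noise]), 1.0)

# Losing the single point in voxel (1, 0, 0), e.g. through a texture gap,
# empties that voxel entirely.
with_gap = occupied_voxels(leaf[:2], 1.0)

print(base, with_noise, with_gap)
```

Voxels supported by many points are robust to the loss of a few of them, whereas sparsely sampled voxels flip state with a single point, which is why the effect matters most at low to intermediate leaf mass.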
The destructive experiment clearly revealed that voxel errors increased at certain stages, primarily due to illumination-driven failures in classification and reconstruction. Nevertheless, voxel counts still exhibited a stable linear decline across the entire defoliation sequence, demonstrating that voxel metrics—despite multi-level error influences—retain strong internal consistency for structural canopy quantification. In contrast, LAI–2000 and DHP errors are structural rather than statistical. Their inaccuracies originate from the breakdown of light transmission paths within the canopy and therefore do not preserve the physical leaf-area relationships required for correction through post-processing.
Beyond these structure-driven error sources, additional measurement-related offsets must be considered when interpreting regression relationships derived from destructive sampling.
The presence of positive intercepts in the linear regressions between leaf area and biomass (Table 6) does not indicate a physically meaningful condition at zero leaf area. Fresh and dry weights were measured using paper bags, which introduce a constant mass offset (approximately 25 ± 1 g for fresh weight and 20 ± 1 g for dry weight). In addition, leaf area estimation based on ImageJ and ExGR segmentation may systematically underestimate projected area due to incomplete leaf flattening, self-overlap, and shadowed regions during image acquisition. As the regression models were fitted within an observed range that does not include zero-leaf conditions, the intercept should therefore be interpreted as a measurement-related offset rather than a biological parameter, and regression interpretations are accordingly restricted to the observed canopy development range.
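The effect of a constant container mass on the fitted intercept can be shown with a short numerical sketch. The leaf-area values and the mass-per-area coefficient below are synthetic and purely illustrative; only the 25 g bag mass is taken from the text, and the sketch assumes one bag per weighing with biomass as the response variable.

```python
import numpy as np

# Synthetic, illustrative values (not measurements from this study).
leaf_area = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # m^2, hypothetical
mass_per_area = 120.0                             # g per m^2, hypothetical
true_mass = mass_per_area * leaf_area             # passes through the origin

bag_tare = 25.0                                   # g, approximate fresh-weight bag mass
weighed_mass = true_mass + bag_tare               # what the balance records

# Ordinary least-squares line: weighed_mass = slope * leaf_area + intercept
slope, intercept = np.polyfit(leaf_area, weighed_mass, 1)
print(round(slope, 3), round(intercept, 3))
```

Under these assumptions the slope is unchanged and the bag mass reappears in full as the intercept, so a positive intercept of this size says nothing about biomass at zero leaf area.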
For the regression models in Table 7, positive intercepts were consistently observed across voxel occupancy, DHP-derived LAI, and LAI–2000 measurements. This reflects the finite and non-isolated field observation window: because each measurement integrates spatially and optically over surrounding canopy elements, the signal does not approach zero even when local leaf area is minimal.
4.4. Comparison with Existing Literature and Positioning of This Study
Most SfM- or optically based LAI studies to date have focused on vertically trained crops or fruit trees with relatively simple architectures. In these systems, canopy shading is limited, illumination is more uniform, and geometric heterogeneity of foliage is lower. Under such conditions, LAI–2000, DHP, and SfM-derived canopy volume metrics (e.g., alpha shapes, convex hulls, height models) can still achieve acceptable accuracy. In contrast, pergola-trained grapevines exhibit highly complex three-dimensional canopies, multilayered leaf curtains, dense distributions of shoots and trellis wires, and strongly directional illumination—conditions under which many methods reported in previous studies become ineffective.
Earlier research typically relied on canopy envelope measures to describe plant size, yet these metrics are unable to represent true internal foliage density. By characterizing canopy occupancy through voxels, this study demonstrated how multilayered foliage in pergola canopies impacts SfM reconstruction and quantified the actual voxel–leaf area relationship using destructive measurements, an evaluation step largely absent from the existing literature. Few studies combine destructive leaf data, long-term temporal monitoring, detailed point cloud classification, and direct comparison against optical LAI measurements. Our work provides a methodological framework that addresses core challenges in 3D observation of pergola-based fruit crops.
Although the limitations of optical LAI under pergola conditions have been mentioned in some reports, few studies have used destructive evidence to demonstrate that LAI methods fail to reflect actual leaf area. Our results reinforce and extend existing findings by providing empirical proof that irregular light transmission paths in pergola canopies fundamentally break the physical assumptions underlying optical LAI estimation, whereas voxel-based structural metrics offer a more appropriate representation of canopy quantity.
Taken together, this study not only compares the performance of multiple observation methods in pergola grapevines but also establishes a voxel-centered workflow—integrating ExGR classification and SfM reconstruction—and validates it through destructive measurements. The findings fill a critical gap in current canopy research for trellis-trained fruit trees and offer a practical pathway toward robust 3D leaf area estimation in complex canopy systems.
5. Conclusions
This study integrated SfM reconstruction, ExGR-based color classification, and voxelization analysis to establish a three-dimensional leaf canopy quantification workflow tailored for pergola-trained grapevines, validated using destructive defoliation data. The results demonstrated that voxel occupancy reliably captured leaf area dynamics within individual vines, showing strong linear relationships with destructively measured leaf area under complex canopy conditions. These findings confirm voxel metrics as a robust indicator for within-plant temporal monitoring. However, cross-plant analyses revealed a marked decline in linearity, indicating proportional shifts caused by differences in illumination, SfM point cloud density, and vine-specific local coordinate systems. Under the current workflow, voxel occupancy is therefore not suitable as an absolute cross-plant leaf area indicator.
Destructive measurements further showed that voxel counts exhibited consistent nonlinear relationships with fresh and dry biomass, but were largely unrelated to leaf count, indicating that voxel occupancy fundamentally represents three-dimensional canopy occupation and leaf area density (LAD) rather than leaf number. The study also confirmed the essential role of ExGR in purifying point clouds under pergola conditions, enabling voxel metrics to accurately reflect canopy changes; however, its performance remained affected by backlighting, side-lighting, and sub-canopy viewpoints, revealing classification behaviors distinct from those described in top-view imagery literature.
Optical LAI methods (LAI–2000 and DHP) were found to be structurally invalid in pergola canopies due to highly irregular light transmission pathways dominated by trellis geometry and lateral illumination. Using destructive reference measurements, this study quantitatively demonstrated that their outputs do not reflect actual leaf area, providing empirical evidence that traditional LAI tools are unsuitable for trellis-based crops.
Overall, this work establishes a reliable 3D canopy quantification framework for pergola grapevines, clarifies the capabilities and limitations of voxel-based and optical LAI indicators in complex canopies, and provides a foundation for applying voxel metrics in smart orchard management, light interception analysis, and digital twin development. Future advances integrating multi-angle imagery, depth sensing, or semantic segmentation may further improve cross-plant comparability and enhance the robustness of 3D canopy measurements.