LV-3DGS: A High-Quality Reconstruction Method Based on 3D Gaussian Splatting for Precise Phenotypic Measurement of Leafy Vegetables

Yang, Xuejun; Zhong, Jinbiao; Lin, Kaiyan; Wu, Junhui; Chen, Jie; Zhu, Huajun

doi:10.3390/agriculture16101111



Xuejun Yang ¹,², Jinbiao Zhong ¹,², Kaiyan Lin ¹,²,*, Junhui Wu ¹,², Jie Chen ¹,² and Huajun Zhu ¹,²

¹ Modern Agricultural Science and Engineering Institute, Tongji University, Shanghai 201804, China
² College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
* Author to whom correspondence should be addressed.

Agriculture 2026, 16(10), 1111; https://doi.org/10.3390/agriculture16101111

Submission received: 29 March 2026 / Revised: 30 April 2026 / Accepted: 16 May 2026 / Published: 19 May 2026

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)


Abstract

High-precision plant phenotyping requires efficient 3D reconstruction methods with high geometric quality. 3D Gaussian Splatting (3DGS) has recently emerged as a promising approach for real-time 3D reconstruction, achieving impressive visual quality. However, in crop environments dominated by monochromatic and low-texture regions, existing 3DGS methods often produce ambiguous geometry and fail to recover geometrically consistent 3D surfaces. To address these limitations, we propose LV-3DGS (Leafy Vegetables-3DGS), an optimized 3DGS-based framework tailored to the reconstruction of leafy vegetable scenes. First, a blurred reconstruction module is introduced to mitigate reconstruction artifacts caused by camera motion blur during multi-view image acquisition. Second, we propose a planar optimization strategy and design both local and global geometric consistency regularizations, thereby improving surface reconstruction quality and geometric accuracy. Third, based on an analysis of individual Gaussian contributions, a contribution-based pruning strategy is developed to selectively remove inaccurate geometric components, achieving accurate scene geometry while reducing memory consumption and improving rendering efficiency. In addition, a quantitative geometric evaluation method is proposed for assessing reconstruction quality. Experimental results demonstrate that the proposed method achieves the highest accuracy among the tested baselines, with SSIM, PSNR, and LPIPS reaching 0.94, 34.53 dB, and 0.11, respectively. Moreover, the geometric consistency (GC) metric is 0.317 cm. Finally, phenotypic parameters are measured from the reconstructed leafy vegetable point clouds. Compared with ground-truth measurements, the proposed approach yields coefficients of determination ($R^{2}$) of 0.9959, 0.9651, and 0.9895 for plant height, leaf number, and leaf area, respectively. These results significantly outperform those of several existing phenotyping methods, providing a new methodology and technical solution for high-precision, low-cost, and high-throughput crop phenotyping.

Keywords:

computer vision; 3D reconstruction; deep learning; 3D Gaussian splatting; leafy vegetables; phenotypic measurement

1. Introduction

Modern agriculture is facing increasingly severe food security challenges. As a representative form of controlled-environment agriculture, plant factories play a critical role in enabling precision agriculture, where intelligent technologies are essential for sustainable and efficient production [1]. The core of precise process management and yield prediction in plant factories lies in the real-time, non-destructive acquisition of key phenotypic traits that accurately reflect crop growth status, such as plant height, leaf number, and leaf area. However, traditional phenotyping approaches, including manual measurement and destructive sampling, suffer from low efficiency, strong subjectivity, and disruption of plant growth continuity, making them inadequate for the high-frequency and high-throughput monitoring demands of plant factories [2]. Advances in computer vision have spurred the development of vision-based phenotyping platforms [3], which can be broadly classified into 2D image processing methods and 3D reconstruction techniques [4,5]. However, the complex structural patterns and severe mutual occlusion of leafy vegetables under intensive cultivation conditions significantly limit the effectiveness of 2D imaging approaches by introducing self-occlusion, loss of depth information, and perspective distortion. As a result, projection-based 2D methods can only capture basic traits (e.g., leaf length and projected area) and often yield inaccurate measurements for complex phenotypes [6]. In contrast, 3D reconstruction enables non-invasive spatial analysis and provides more comprehensive phenotypic information [7]. Therefore, accurate 3D crop modeling is crucial for continuous phenotypic monitoring throughout the production process [8].

Active 3D data acquisition methods, such as laser scanning and depth cameras, still face notable limitations. Laser scanners are prohibitively expensive [9], while depth sensors typically suffer from low point cloud quality and limited robustness to illumination variations [3]. More critically, both approaches perform poorly when reconstructing small-scale targets such as leafy vegetables, often failing to capture fine structural details. Deep learning–based 3D reconstruction methods provide a promising alternative to address these challenges. In recent years, neural rendering techniques, represented by Neural Radiance Fields (NeRF) [10] and 3D Gaussian Splatting (3DGS) [11], have revolutionized 3D reconstruction and novel view synthesis. These methods reconstruct realistic 3D digital models of real-world scenes using only multi-view 2D images and camera poses, without requiring explicit 3D or depth supervision, thereby significantly improving reconstruction efficiency and accuracy. They have been widely applied in fields such as 3D surface extraction, human avatar modeling, large-scale urban scene representation, and view synthesis. Mildenhall et al. [10] first introduced NeRF, which represents a scene as a continuous volumetric function parameterized by a neural network mapping spatial coordinates and viewing directions to color and density values. Hu et al. [12] demonstrated the feasibility of NeRF-based methods for measuring plant phenotypic parameters, including leaf morphology, plant height, and canopy structure, in complex agricultural environments. Despite these advances, NeRF-based methods [13,14,15,16,17] generally require substantial computational resources during optimization. Recently, 3DGS has emerged as a novel approach for 3D reconstruction and rendering. 
By explicitly modeling scenes with a set of anisotropic Gaussian primitives and adopting a splatting-based rasterization strategy, 3DGS achieves real-time rendering and much faster training than NeRF, effectively addressing the high computational cost of NeRF while enabling high-quality 3D reconstruction. Unlike implicit neural representations, 3DGS relies on explicit, interpretable geometric primitives, offering a favorable balance between reconstruction accuracy, rendering efficiency, and model interpretability. Chen et al. [18] proposed an improved 3DGS-based framework for high-quality orchard reconstruction, achieving accurate multi-scale reconstruction of peach orchards. Shen et al. [19] leveraged 3DGS to address leaf overlap and incomplete structural information in complex outdoor environments, enabling accurate 3D reconstruction and biomass estimation of oilseed rape.

Nevertheless, the application of 3DGS in agricultural scenarios remains limited. Several studies [20,21] have shown that the unordered and irregular nature of Gaussian primitives makes it difficult for standard 3DGS to accurately model real scene surfaces. Moreover, optimizing 3DGS solely with image reconstruction objectives often leads to local minima, resulting in inaccurate depth estimation and poor geometric fidelity. To alleviate geometric ambiguity, 2DGS [22] and PGSR [23] flatten 3D volumes into sets of view-oriented planar Gaussian ellipses, which provides inspiration for addressing geometric uncertainty in 3D Gaussian representations. Yu et al. [24] introduced Gaussian Opacity Fields (GOF) to facilitate geometry extraction. For leafy vegetable phenotyping, however, geometric reconstruction accuracy is a critical requirement, and existing 3DGS-based methods still struggle to generate high-precision depth maps and maintain multi-view geometric consistency. This leads to severe depth artifacts on leafy vegetables with complex surfaces and intricate geometries, which significantly hinders downstream phenotyping and related agricultural applications. Therefore, an efficient, high-quality geometric reconstruction framework tailored specifically to leafy vegetable scenes is urgently needed.

To address these challenges, this study proposes an improved 3DGS-based framework designed to enhance 3D reconstruction performance in leafy vegetable scenarios and enable precise phenotypic measurement. Specifically, multi-view image data of various leafy vegetables are captured using RGB cameras. The blurred reconstruction module, planar optimization strategy, and Gaussian pruning strategy are introduced and integrated into the reconstruction pipeline. Based on this framework, reconstruction and phenotypic measurement experiments are conducted in real cultivation environments. The main contributions of this work are summarized as follows:

(1) Motion blur reconstruction. A blurred reconstruction method is proposed based on improvements to the original 3DGS model. By estimating the camera motion trajectory and sampling sub-frames along the approximated motion path, sharp novel views can be rendered. This effectively addresses reconstruction artifacts caused by motion blur in images captured in real agricultural production environments.

(2) Planar optimization strategy. To address the difficulty of reconstructing realistic leaf surfaces and geometries with conventional 3DGS, we propose: (i) a prior depth-guided initialization to bootstrap geometry in low-texture regions; (ii) Gaussian flattening to enforce the planar prior of leaves; (iii) a normal-constrained rendering module for geometrically accurate rasterization; and (iv) a median depth optimization to robustly handle severe occlusions. These strategies jointly enhance surface reconstruction fidelity and reduce geometric errors.

(3) Gaussian pruning strategy. Based on the analysis of individual Gaussian contributions, we introduce a contribution-based pruning strategy that selectively removes inaccurate structures and learns Gaussian primitives with precise geometry, achieving accurate 3D reconstruction while reducing memory consumption and improving rendering efficiency.

(4) Geometric regularization and quantitative evaluation metrics. We propose local geometric consistency constraints between rendered normals and depth maps, as well as global geometric consistency across multiple views. Furthermore, a quantitative geometric evaluation metric is introduced based on global geometric consistency to assess the geometric quality of the reconstruction results.

The remainder of this paper is organized as follows. Section 2 describes the data acquisition and dataset construction process. Section 3 presents the proposed LV-3DGS model and the associated phenotypic measurement methodology. Section 4 evaluates the reconstruction performance of LV-3DGS through comparative and ablation experiments and reports phenotypic measurement results based on reconstructed leafy vegetables. Section 5 concludes the paper, discusses current limitations, and outlines future research directions.

2. Materials

Data Acquisition and Processing

In this study, experimental datasets were collected in real vertical farming facilities. Large-scale image datasets of leafy vegetables cultivated under vertical farming conditions were acquired in April and July 2025 at the Jiading Ecological Park of Tongji University and the Bright Port Vertical Agriculture Research Center in Shanghai. We used the same equipment as in our previous work [1] to collect image data of leafy vegetables in the actual production environment. The operator manually captured RGB images from multiple viewpoints and spatial positions around each leafy vegetable. Approximately 60 images were collected for each scene. A total of 48 scenes covering different leafy vegetable species (e.g., lettuce and Clitoria ternatea), sizes, and growth stages were collected. The dataset also includes scenes with varying degrees of motion blur, which were intentionally retained to evaluate the effectiveness of the proposed blurred reconstruction module. For the division of training and test sets, this study followed the protocol recommended by Mip-NeRF360, in which every eighth image was used for testing, and the remaining images were used for training.
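The Mip-NeRF360 hold-out protocol can be sketched in a few lines of Python. This is a minimal sketch, assuming that images are ordered by file name and that the test set consists of the images whose 0-based index is divisible by 8 (the convention commonly used in public 3DGS code); the function name `split_every_nth` and the file names are hypothetical.

```python
def split_every_nth(image_names, hold=8):
    """Split an ordered image list into train/test sets (every `hold`-th image is test).

    Assumption: images are ordered by file name and index 0 belongs to the test set.
    """
    ordered = sorted(image_names)
    test = [name for i, name in enumerate(ordered) if i % hold == 0]
    train = [name for i, name in enumerate(ordered) if i % hold != 0]
    return train, test


# Usage: one scene with 60 images, roughly the per-scene count described above.
names = [f"IMG_{i:04d}.jpg" for i in range(60)]
train, test = split_every_nth(names)
print(len(train), len(test))  # 52 8
```

With about 60 images per scene, this leaves 8 test views and 52 training views per scene.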

To meet the training requirements of the 3DGS model, the raw images were processed using the open-source COLMAP pipeline. COLMAP is a general-purpose Structure-from-Motion (SfM) [25] and Multi-View Stereo (MVS) [26] framework that supports the reconstruction of both ordered and unordered image collections. Specifically, feature extraction and feature matching were first performed on the raw images, followed by sparse reconstruction to estimate camera poses; feature points with large reprojection errors were discarded during optimization. The resulting sparse point cloud and camera poses were then used to initialize the 3D Gaussians for subsequent reconstruction.

To obtain ground-truth phenotypic traits and evaluate the effectiveness and accuracy of the proposed reconstruction algorithm, phenotypic parameters were measured manually after image acquisition. Plant height was measured from the ground with a caliper, and leaf number and leaf area were obtained by destructive sampling (the leaves were cut off and measured). To reduce measurement errors, plant height and leaf area of each leafy vegetable were measured from multiple viewpoints, and the final phenotypic values were obtained by averaging the repeated measurements.

3. Methods

High-quality reconstruction of the 3D morphology of leafy vegetables is essential for phenotypic growth monitoring and yield analysis, and also provides a fundamental basis for exploring the feasibility of digital twin technologies in agricultural scenarios. However, existing NeRF-based and 3DGS-based models struggle to achieve high-quality reconstruction in leafy vegetable scenes due to inherent challenges such as monochromatic appearance, highly similar surface textures, and severe inter-plant and inter-leaf occlusions. In addition, motion blur introduced during image acquisition significantly degrades rendering quality, further limiting the applicability of these models in real-world agricultural environments. An overview of the proposed LV-3DGS framework is illustrated in Figure 1. The integrated blurred reconstruction module addresses reconstruction under motion-blurred conditions and is described in Section 3.1. The proposed high-quality surface reconstruction strategy, consisting of Prior Depth-Guided Initialization (PDGI), Gaussian Flattening, Normal Constraint (NC), and Median Depth Rendering (MDR), is presented in Section 3.2. The Gaussian Pruning (GP) strategy selectively removes redundant Gaussians to obtain accurate scene geometry while reducing memory consumption and improving computational efficiency, as detailed in Section 3.3.

3.1. Blurred Reconstruction Module

The clarity of scene reconstruction is critical for accurate phenotypic analysis. Multi-view image data captured in real agricultural environments are often affected by camera motion blur, which significantly degrades the quality of 3D reconstruction [27]. Although some motion-blurred images can be filtered out during preprocessing with blur detection methods (e.g., Fast Fourier Transform (FFT)-based techniques), such filtering does not fundamentally resolve the problem; it only reduces the amount of low-quality data. The original 3DGS framework is designed to reconstruct 3D scenes from clean input images and, to the best of our knowledge, does not explicitly address optimization from motion-blurred inputs. To improve data usability and adapt to natural acquisition conditions, we propose a blurred reconstruction module that can be seamlessly integrated into the existing 3DGS framework. This module synthesizes sharp views by estimating the camera motion trajectory and rendering approximate sub-frames along the estimated motion path, thereby reducing reconstruction artifacts. Furthermore, it prevents the generation of inaccurate Gaussian primitives caused by unreliable camera poses during the early stages of training.

From a physical perspective, camera motion blur arises from the temporal integration of irradiance over the exposure duration during unintended camera motion, such as hand shake or jitter [28]. During the shutter interval, the camera cannot maintain a stable pose, so the accumulated sharp sub-frame images appear blurred. A blurred image $B$ can therefore be modeled as the time-averaged irradiance $I$ observed from camera pose $P_{τ}$ over the exposure interval $τ \in [τ_{0}, τ_{c}]$, as defined in Equation (1):

B = \frac{1}{τ_{c} - τ_{0}} \int_{τ_{0}}^{τ_{c}} I(P_{τ}) \, dτ \approx \frac{1}{N} \sum_{i=1}^{N} I(P_{τ_{i}})

(1)

where $I(P_{τ})$ denotes the sharp image captured at pose $P_{τ}$, and $P_{τ_{i}}$ represents the $i$-th sub-frame pose sampled during the exposure time. The integral is approximated by uniformly dividing the exposure duration into $N$ sub-frames and averaging their irradiance contributions. In practice, we set $N = 12$, which provides a sufficiently accurate approximation of motion blur at a reasonable computational cost; further increasing $N$ yields only marginal improvements in reconstruction quality but significantly increases rendering time. Initial camera trajectories and poses are obtained from COLMAP, and the $N$ sub-frames are uniformly sampled over the normalized time interval $[0, 1]$.
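The discrete approximation in Equation (1) amounts to averaging $N$ sharp renders taken at evenly spaced times within the exposure. The Python sketch below assumes the $N$ timestamps include both endpoints of $[0, 1]$ and replaces the Gaussian rasterizer with a hypothetical `render(tau)` callback that returns a flat list of linear irradiance values; it illustrates the averaging only and is not the authors' implementation.

```python
def subframe_times(n=12):
    """Uniform sub-frame timestamps on the normalized exposure interval [0, 1].

    Assumption: both endpoints are included, i.e. tau_i = (i - 1) / (N - 1).
    """
    if n == 1:
        return [0.5]
    return [i / (n - 1) for i in range(n)]


def synthesize_blur(render, n=12):
    """Discrete form of Eq. (1): average the sharp renders at N sub-frame times.

    `render(tau)` is a hypothetical callback returning a sharp image (flat list of
    linear irradiance values) for the camera pose at normalized time tau.
    """
    frames = [render(t) for t in subframe_times(n)]
    num_pixels = len(frames[0])
    return [sum(f[p] for f in frames) / n for p in range(num_pixels)]


# Toy renderer: two pixels whose irradiance changes linearly during the exposure.
blurred = synthesize_blur(lambda tau: [tau, 1.0 - tau])
print(blurred)  # both values approximately 0.5
```

Because the blur is a plain average, doubling $N$ doubles the number of rasterization passes per training image, which is the cost trade-off behind the choice of $N = 12$.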

As illustrated in Figure 2, sub-frame sampling is performed along the estimated camera motion trajectory. Following ExBluRF [29] and DeBlur-GS [30], we parameterize the rigid camera motion using Bézier curves in the Lie algebra $\mathfrak{se}(3)$ of the rigid-motion group SE(3). For each sub-frame pose $P_{τ_{i}}$, a sub-frame alignment parameter $V_{τ_{i}}$ is introduced to refine the pose along the estimated trajectory, yielding an optimized camera pose $\hat{P}(V_{τ_{i}})$ that better approximates the latent camera pose at time $τ_{i}$. Specifically, the alignment parameters $V_{τ_{i}}$ are initialized as identity transformations (zero vectors in $\mathfrak{se}(3)$), based on the assumption of locally smooth camera motion between adjacent frames. The blurred image can thus be expressed as Equations (2) and (3):

B \approx \frac{1}{N} \sum_{i=1}^{N} I(P_{τ_{i}}) \simeq \frac{1}{N} \sum_{i=1}^{N} I(\hat{P}(V_{τ_{i}}))

(2)

\hat{P}(V_{τ_{i}}) = P_{τ_{i}} V_{τ_{i}}

(3)
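To make Equation (3) concrete, the sketch below evaluates a Bézier curve on 6-D twist vectors with De Casteljau's algorithm, maps the result to SE(3) with the standard exponential map, and right-multiplies by the alignment transform. It assumes, as one plausible reading of the ExBluRF-style parameterization, that the trajectory is expressed relative to the COLMAP reference pose, $P_{τ} = P_{ref} \exp(\text{Bézier}(τ))$, and that twists are ordered as (translation, rotation); all function names are hypothetical.

```python
import math


def _matmul(a, b):
    """Plain-Python matrix product (matrices as lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def se3_exp(xi):
    """Exponential map se(3) -> SE(3); xi = (rho_x, rho_y, rho_z, w_x, w_y, w_z)."""
    rho, w = xi[:3], xi[3:]
    theta = math.sqrt(sum(x * x for x in w))
    W = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]  # skew(w)
    W2 = _matmul(W, W)
    if theta < 1e-8:  # small-angle limits of the series coefficients
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    R = [[eye[i][j] + a * W[i][j] + b * W2[i][j] for j in range(3)] for i in range(3)]  # Rodrigues
    V = [[eye[i][j] + b * W[i][j] + c * W2[i][j] for j in range(3)] for i in range(3)]  # left Jacobian
    t = [sum(V[i][k] * rho[k] for k in range(3)) for i in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def bezier(controls, tau):
    """De Casteljau evaluation of a Bezier curve whose control points are 6-D twists."""
    pts = [list(c) for c in controls]
    while len(pts) > 1:
        pts = [[(1.0 - tau) * p + tau * q for p, q in zip(a, b)]
               for a, b in zip(pts, pts[1:])]
    return pts[0]


def subframe_pose(p_ref, controls, tau, v_tau=(0.0,) * 6):
    """Eq. (3): P_hat = P_tau * V_tau, with P_tau = P_ref * exp(Bezier(tau)).

    v_tau is the alignment twist; the zero twist gives V_tau = identity, matching
    the initialization described in the text.
    """
    p_tau = _matmul(p_ref, se3_exp(bezier(controls, tau)))
    return _matmul(p_tau, se3_exp(v_tau))
```

In a full pipeline each blurred image would carry its own control points and $N$ alignment twists, all optimized jointly with the Gaussians by gradient descent; plain Python is used here only to keep the sketch dependency-free.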

The images rendered at the $N$ corrected sub-frame poses via Gaussian splatting rasterization are averaged to synthesize the motion-blurred image. Given a set of $M$ blurred input images $\{B_{τ}\}_{τ=1}^{M}$, the optimization objective is to estimate the alignment parameters $\{V_{τ}\}_{τ=1}^{M}$ that best describe the underlying camera motion trajectories while producing a sharp scene representation. This is achieved by minimizing the L1 (Manhattan) distance between the synthesized blurred images and the observed blurred inputs.

Following prior work [27,29,31], a gamma correction function is applied to the synthesized blurred views to accurately model the camera imaging process. Specifically, $γ(x) = x^{1/2.2}$ is used to convert irradiance to image intensity, together with a nonlinear response function to approximate the physical image formation process.
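A minimal sketch of the resulting photometric objective follows, assuming the irradiance of the $N$ sub-frame renders is averaged before the gamma curve is applied and that the unspecified camera response function is the identity. Plain Python lists stand in for images, and `blur_l1_loss` is a hypothetical name; the loss is averaged over pixels, which only rescales the Manhattan distance.

```python
def gamma_correct(x, g=2.2):
    """gamma(x) = x ** (1 / 2.2): maps linear irradiance in [0, 1] to intensity."""
    return max(x, 0.0) ** (1.0 / g)


def blur_l1_loss(subframe_irradiance, observed):
    """Mean L1 distance between a synthesized blurred view and a captured blurred image.

    subframe_irradiance: N sharp renders, each a flat list of linear irradiance values.
    observed: captured blurred image in [0, 1] intensity space.
    Assumption: the camera response curve is the identity, so only gamma is applied.
    """
    n = len(subframe_irradiance)
    num_pixels = len(observed)
    # Average irradiance over sub-frames first (Eq. (2)), then apply the gamma curve.
    synthesized = [gamma_correct(sum(f[p] for f in subframe_irradiance) / n)
                   for p in range(num_pixels)]
    return sum(abs(s - o) for s, o in zip(synthesized, observed)) / num_pixels


frames = [[0.25, 1.0], [0.25, 0.0]]  # two sub-frames, two pixels
target = [gamma_correct(0.25), gamma_correct(0.5)]
print(blur_l1_loss(frames, target))  # 0.0
```

Averaging before the gamma curve matters: blur is formed in linear irradiance, so averaging gamma-encoded intensities would over-darken bright streaks.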

3.2. The Planar Optimization Strategy

In this section, we first address the issue of point cloud sparsity in weakly textured regions of leafy vegetable scenes when using conventional 3DGS by introducing a Prior Depth-Guided Initialization (PDGI) module. Next, we discuss how 3D Gaussian primitives can be transformed into planar representations. Based on this planar formulation, we propose a Normal-Constrained (NC) planar Gaussian rendering method, which jointly renders plane-to-camera distances and surface normals. The rendered depth values are further constrained by surface normals and converted into depth maps, thereby improving geometric accuracy. Finally, to handle severe occlusions commonly observed in leafy vegetable scenes, we introduce a Median Depth Rendering (MDR) strategy to improve the robustness of depth estimation in 3DGS.

3.2.1. Prior Depth-Guided Initialization (PDGI)

Inspired by PlanarGS [32], we observe that SfM-based initialization in 3DGS depends heavily on feature extraction results. In scenes dominated by similar textures, such as leafy vegetables, this dependency often leads to sparse point clouds over large regions. To alleviate this issue, we back-project prior depth information into 3D space to densify the point cloud in texture-similar areas. Specifically, a pretrained monocular depth estimation model (Depth Anything [33]) is first employed to predict depth maps. As illustrated in Figure 3, for each pixel $p_{0}$ in the depth map, its four neighboring pixels $p_{1}, p_{2}, p_{3}, p_{4}$ (radius = 1) are sampled under a local planar assumption and back-projected into 3D space, from which the distance between the local plane through $p_{0}$ and the camera is estimated. The normal $n(p_{0})$ of the local plane at pixel $p_{0}$ is computed from the back-projected neighbors as Equation (4):

n(p_{0}) = \frac{(p_{1} - p_{3}) \times (p_{2} - p_{4})}{\lVert (p_{1} - p_{3}) \times (p_{2} - p_{4}) \rVert}

(4)

δ(p_{0}) = d(p_{0}) \, n(p_{0})^{T} K^{-1} \tilde{p}_{0}

(5)

Based on the local plane normal $n(p_{0})$ and the depth value $d(p_{0})$ in the depth map, the distance $δ(p_{0})$ from the local plane to the camera can be computed using Equation (5), where $\tilde{p}_{0}$ is the homogeneous coordinate of $p_{0}$ and $K$ is the camera intrinsic matrix, so that $d(p_{0}) K^{-1} \tilde{p}_{0}$ is the back-projected 3D point of $p_{0}$.
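Equations (4) and (5) can be checked on a synthetic depth map. The sketch below assumes a pinhole camera with intrinsics (fx, fy, cx, cy), a depth map stored as rows indexed by $v$, and a neighbor layout ($p_{1}$ right, $p_{2}$ below, $p_{3}$ left, $p_{4}$ above) that is our guess at Figure 3; the helper names are hypothetical. With this layout, the cross product points away from the camera for a surface seen from the front, so $δ$ comes out positive.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection X = d * K^{-1} [u, v, 1]^T (camera frame)."""
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def local_plane(depth_map, u, v, fx, fy, cx, cy):
    """Eqs. (4)-(5) at pixel p0 = (u, v); depth_map is indexed as depth_map[v][u].

    Assumed radius-1 neighbour layout: p1 = right, p2 = below, p3 = left, p4 = above.
    """
    def point(du, dv):
        return backproject(u + du, v + dv, depth_map[v + dv][u + du], fx, fy, cx, cy)

    p1, p2, p3, p4 = point(1, 0), point(0, 1), point(-1, 0), point(0, -1)
    c = _cross(_sub(p1, p3), _sub(p2, p4))                          # Eq. (4), unnormalized
    length = math.sqrt(sum(x * x for x in c))
    n = [x / length for x in c]
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]                       # K^{-1} p~0
    delta = depth_map[v][u] * sum(a * b for a, b in zip(n, ray))    # Eq. (5)
    return n, delta


# Synthetic check: a tilted plane with unit normal (0, 0.6, 0.8) at distance 2.
fx = fy = 100.0
cx = cy = 50.0
plane = [[2.0 / (0.6 * (v - 50) / 100.0 + 0.8) for u in range(101)] for v in range(101)]
n, delta = local_plane(plane, 20, 70, fx, fy, cx, cy)
print(n, delta)  # n close to (0, 0.6, 0.8), delta close to 2.0
```

Every pixel of the same plane yields the same $δ$, which is what lets the back-projected prior depth be turned into consistent planar seed points.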

3.2.2. 3D Gaussian Flattening

Accurate geometric reconstruction and high-quality rendering require Gaussian primitives to closely approximate the true surface geometry of the target scene. Leafy vegetables are characterized by multi-leaf structures with approximately planar surfaces, making faithful surface representation particularly important for reconstruction accuracy. Inspired by prior work such as 2DGS [22] and PGSR [23], we observe that representing surfaces using 3D Gaussian ellipsoids often leads to geometric ambiguity and blurred surface reconstructions that deviate from true geometry. In contrast, planar Gaussian primitives provide a better approximation of local planar structures and enable direct rendering of depth and surface normals. Therefore, we flatten 3D Gaussians into 2D planar Gaussians to more accurately represent the geometric surfaces of leafy vegetables. In the geometric reconstruction process, each Gaussian ellipsoid is flattened into a plane so that it more closely aligns with the surface of real leaf-like objects, thereby reducing depth and normal blurring.

In 3DGS, the covariance matrix $Σ_{i} = R_{i} S_{i} S_{i}^{T} R_{i}^{T}$ describes the shape of the $i$-th Gaussian ellipsoid, where the rotation matrix $R_{i}$ encodes the orientation of the ellipsoid's principal axes and the diagonal scaling matrix $S_{i} = diag(s_{1}, s_{2}, s_{3})$ contains the scaling factors along each axis. By compressing the scaling factor along one axis, the Gaussian ellipsoid can be flattened into a planar structure. Specifically, we identify the minimum scaling factor of $S_{i}$ and compress the Gaussian ellipsoid along the corresponding axis direction [34]. We adopt the adaptive compression strategy in Equation (6), which updates the minimum scale to $s_{\min}^{'}$:

\begin{cases} s_{\min} = \min(s_{1}, s_{2}, s_{3}) \\ s_{mid} = mid(s_{1}, s_{2}, s_{3}) \\ s_{\max} = \max(s_{1}, s_{2}, s_{3}) \\ s_{\min}^{'} = s_{\min} \cdot c_{i} \\ c_{i} = clip\left(\frac{s_{\min}}{\frac{1}{2}(s_{mid} + s_{\max})}, c_{\min}, c_{\max}\right) \end{cases}

(6)

where $c_{i}$ is the adaptive compression coefficient. In our implementation, we set $c_{\min} = 0.001$ and $c_{\max} = 0.1$. The clip function keeps $c_{i}$ within the range $[c_{\min}, c_{\max}]$, so that the Gaussian is flattened into a near-planar structure while preserving a small but non-zero thickness.
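The adaptive rule in Equation (6) is short to state in code. This Python sketch works on the raw (not log-parameterized) scales of a single Gaussian and uses the bounds $c_{\min} = 0.001$ and $c_{\max} = 0.1$ given above; `flatten_scales` is a hypothetical name.

```python
def flatten_scales(scales, c_min=0.001, c_max=0.1):
    """Adaptive flattening of Eq. (6) for one Gaussian with scales (s1, s2, s3).

    Only the smallest scale is shrunk; the two in-plane scales are kept unchanged.
    """
    s_min, s_mid, s_max = sorted(scales)
    c = s_min / (0.5 * (s_mid + s_max))   # ratio of thickness to in-plane size
    c = min(max(c, c_min), c_max)         # clip(c, c_min, c_max)
    k = scales.index(s_min)               # axis to compress
    out = list(scales)
    out[k] = s_min * c
    return out


print(flatten_scales([1.0, 2.0, 0.5]))  # [1.0, 2.0, 0.05]
```

Since $c_{i} \le c_{\max} = 0.1$, every update shrinks the thinnest axis by at least a factor of ten, while the lower bound $c_{\min}$ keeps the thickness from collapsing to exactly zero.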

This operation flattens the ellipsoid into a planar Gaussian that best approximates the local leaf surface geometry. The direction of the shortest axis, i.e., the column of $R_{i}$ associated with the minimum scaling factor, is then defined as the normal $n_{i}$ of the planar Gaussian. The orientation of $n_{i}$ is determined by the camera viewing direction: the angle between the viewing direction and the normal is constrained to be greater than $90^{\circ}$, so that the normal faces the camera.
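A sketch of the normal selection and orientation rule follows, assuming the rotation matrix is stored as a list of rows (so its columns are the principal axes) and that `view_dir` points from the camera towards the Gaussian; `plane_normal` is a hypothetical helper.

```python
def plane_normal(rotation, scales, view_dir):
    """Normal of a flattened Gaussian, oriented to face the camera.

    rotation: 3x3 matrix given as rows; its columns are the principal axes.
    scales: the three scaling factors; the smallest one marks the normal axis.
    view_dir: direction from the camera towards the Gaussian (same frame as rotation).
    """
    k = min(range(3), key=lambda j: scales[j])  # index of the shortest axis
    n = [rotation[i][k] for i in range(3)]      # k-th column of R_i
    # Require angle(view_dir, n) > 90 degrees, i.e. dot(view_dir, n) < 0.
    if sum(a * b for a, b in zip(view_dir, n)) > 0:
        n = [-x for x in n]
    return n


# Axis-aligned Gaussian flattened along z, seen by a camera looking along +z:
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
n = plane_normal(identity, [1.0, 1.0, 0.01], [0.0, 0.0, 1.0])  # flipped to (0, 0, -1)
```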

3.2.3. Normal-Constrained (NC) Planar Gaussian Rendering

Unlike prior surface reconstruction methods [21,22,23,24], which focus primarily on appearance modeling, we propose a geometry-driven, normal-constrained planar Gaussian rendering module for accurate surface reconstruction in leafy scenes. Given planar Gaussian primitives, we first render a surface normal map $\hat{N}$ for the current viewpoint by $α$-blending the Gaussian normals, which are transformed into the camera frame using the rotation matrix $R$ from the camera coordinate system to the global coordinate system. $\hat{N}$ serves as a geometric descriptor of local surface orientation rather than a purely rendering attribute, and is defined as:

\hat{N} = \sum_{i \in N_{G}} R^{T} n_{i} α_{i} \prod_{j=1}^{i-1} (1 - α_{j})

(7)

where $α_{i}$ is the opacity value of the $i$-th Gaussian and $N_{G}$ denotes the depth-sorted set of Gaussians that the ray passes through.

Beyond surface normals, accurate depth recovery is crucial for enforcing geometric consistency in thin, heavily occluded leafy structures. Unlike the original 3DGS [11], which uses the distance to the Gaussian center as depth, we explicitly distinguish the planar distance from the true ray depth. As illustrated in Figure 4, the camera viewing direction $v$ is not necessarily aligned with the normal $n_{i}$ of every planar Gaussian. Therefore, the planar distance $δ_{i}$ is not equivalent to the depth $d_{i}$; the two differ by a factor determined by the angle between $v$ and $n_{i}$. For each planar Gaussian, the distance $δ_{i}$ from the camera center $u$ to the plane is computed by projecting the vector from $u$ to the Gaussian center $O_{c}$ onto the normal direction $n_{i}$:

δ_{i} = \left(R^{T} (O_{c} - u)\right)^{T} R^{T} n_{i}

(8)

The rendered planar distance map $\hat{Δ}$ is obtained via $α$-blending and is defined as:

\hat{Δ} = \sum_{i \in N_{G}} δ_{i} α_{i} \prod_{j=1}^{i-1} (1 - α_{j})

(9)
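For a single pixel, Equations (7) to (9) reduce to one front-to-back loop that carries the running transmittance. The sketch below assumes each Gaussian's normal and center have already been rotated into the camera frame ($R^{T} n_{i}$ and $R^{T}(O_{c} - u)$), so the camera center sits at the origin; the names are hypothetical, and the loop is a per-ray illustration rather than a tile-based GPU rasterizer.

```python
def plane_distance(center_cam, normal_cam):
    """Eq. (8) in camera coordinates: delta_i = (R^T (O_c - u))^T (R^T n_i)."""
    return sum(c * n for c, n in zip(center_cam, normal_cam))


def composite_normal_and_distance(gaussians):
    """Eqs. (7) and (9) for one pixel ray.

    gaussians: (alpha_i, normal_cam, center_cam) tuples sorted front to back, where
    normal_cam = R^T n_i and center_cam = R^T (O_c - u) are already in the camera frame.
    """
    normal = [0.0, 0.0, 0.0]
    distance = 0.0
    transmittance = 1.0  # running product of (1 - alpha_j) over j < i
    for alpha, n_cam, c_cam in gaussians:
        w = alpha * transmittance  # blending weight of the i-th Gaussian
        normal = [acc + w * x for acc, x in zip(normal, n_cam)]
        distance += w * plane_distance(c_cam, n_cam)
        transmittance *= 1.0 - alpha
    return normal, distance


# Two half-transparent Gaussians on the plane z = 2 with camera-facing normals (0, 0, -1).
ray = [(0.5, [0.0, 0.0, -1.0], [0.3, -0.1, 2.0]),
       (0.5, [0.0, 0.0, -1.0], [0.0, 0.2, 2.0])]
print(composite_normal_and_distance(ray))  # ([0.0, 0.0, -0.75], -1.5)
```

With camera-facing normals the planar distance is negative; this sign cancels in the depth conversion of Equation (10).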

Inspired by PGSR [23], we finally render a depth map for geometric optimization in leafy vegetable reconstruction. The rendered distance and normal maps not only enable precise geometric depth computation but also provide supervisory signals for the subsequent local and global consistency optimization, which is critical for handling severe self-occlusions and thin-layered leaves. The final rendered depth map $\hat{D}$ is derived from the planar distance map $\hat{Δ}$ and the normal map $\hat{N}$ as:

\hat{D}(p) = \frac{\hat{Δ}(p)}{\hat{N}(p)^{T} K^{-1} \tilde{p}}

(10)

where $p = [u, v]^{T}$ denotes a 2D pixel location on the image plane, $\tilde{p}$ is its homogeneous coordinate, and $K$ is the camera intrinsic matrix. $K^{-1} \tilde{p}$ is the direction of the ray from the camera's optical center through pixel $p$, which can be regarded as the camera viewing direction $v$.
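Equation (10) is a per-pixel division. A minimal Python sketch for a pinhole camera follows, with `depth_from_plane` and its argument order as illustrative assumptions. Because $K^{-1}\tilde{p}$ has a unit z-component, the result is the z-depth along the optical axis.

```python
def depth_from_plane(delta_hat, normal_hat, u, v, fx, fy, cx, cy):
    """Eq. (10): D(p) = Delta(p) / (N(p)^T K^{-1} p~) for a pinhole camera."""
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]  # K^{-1} [u, v, 1]^T, unit z-component
    denom = sum(n * r for n, r in zip(normal_hat, ray))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is (nearly) parallel to the rendered plane")
    return delta_hat / denom


# Blended values for a pixel on the plane z = 2 with a camera-facing normal.
print(depth_from_plane(-1.5, [0.0, 0.0, -0.75], 50, 50, 100.0, 100.0, 50.0, 50.0))  # 2.0
```

In this example the accumulated weight of 0.75 scales both $\hat{Δ}$ and $\hat{N}$ and cancels in the ratio, so a partially transparent pixel still recovers the correct plane depth.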

3.2.4. Median Depth Rendering (MDR)

Standard 3DGS employs mean depth rendering: it accumulates over all the Gaussians that a ray passes through, sorted front to back in ascending order of depth, and computes their weighted mean depth:

d(x) = \sum_{i=1}^{N_{G}} w_{i} d_{i}

(11)

where $d_{i}$ and $w_{i}$ denote the depth from the camera plane to the $i$-th Gaussian and the contribution weight of the $i$-th Gaussian, respectively. Although effective in general scenarios, this strategy produces unstable depth estimates in discontinuous regions (e.g., overlapping leaves and cavities), which are common in densely planted leafy vegetable scenes and cause sharp depth variations over short spatial distances. To enhance robustness, inspired by 2DGS [22], we adopt median depth rendering [1]. When the accumulated alpha $\sum_{i} w_{i}$ along a ray does not reach 0.5, 2DGS uses the depth of the last Gaussian. However, in dense leafy vegetable scenes, such cases are widespread and typically correspond to occluded regions. We therefore assign such pixels half of the default maximum depth used in 3DGS, which better reflects the invisibility of occluded points. For all other pixels, we accumulate the weights of the sorted Gaussians along the pixel ray and select the depth at which the cumulative weight is closest to 0.5 as the pixel depth estimate:

d(x) = d_{m}, \quad m = \arg\min_{l \in N_{G}} \left| \sum_{i=1}^{l} w_{i} - 0.5 \right|

(12)
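Mean and median depth can be compared on a single ray with the plain-Python sketch below. `max_depth` stands for the default maximum depth of the 3DGS implementation, whose value is not given here, so it is a hypothetical parameter; weights and depths are assumed to be sorted front to back.

```python
def mean_depth(weights, depths):
    """Eq. (11): alpha-weighted (mean) depth along one ray."""
    return sum(w * d for w, d in zip(weights, depths))


def median_depth(weights, depths, max_depth):
    """Eq. (12) with the occlusion fallback described above.

    weights/depths are sorted front to back. If the accumulated weight never
    reaches 0.5, the pixel is treated as occluded and gets 0.5 * max_depth.
    """
    if sum(weights) < 0.5:
        return 0.5 * max_depth
    best_l, best_gap, acc = 0, float("inf"), 0.0
    for l, w in enumerate(weights):
        acc += w                      # cumulative weight up to Gaussian l
        gap = abs(acc - 0.5)
        if gap < best_gap:
            best_l, best_gap = l, gap
    return depths[best_l]


w = [0.2, 0.2, 0.3, 0.1]
d = [1.0, 2.0, 3.0, 4.0]
print(mean_depth(w, d), median_depth(w, d, max_depth=100.0))  # mean about 1.9, median 2.0
```

Here the mean (about 1.9) is pulled toward the camera because the weights sum to only 0.8, and it does not correspond to any Gaussian, whereas the median returns an actual surface depth (2.0).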

3.3. The Gaussian Pruning (GP) Optimization Strategy

Pruning is a crucial technique in 3DGS. Although 3DGS achieves significantly faster reconstruction than NeRF-based methods, it suffers from high memory consumption and a large number of redundant Gaussian primitives. Training all Gaussians indiscriminately may cause the model to overlook fine-grained scene geometry, leading to degraded geometric accuracy. In real leafy vegetable growth scenarios, spatial structures are highly interwoven and folded. An appropriate pruning strategy can selectively remove inaccurate or redundant Gaussians while preserving essential geometric structures, thereby improving reconstruction accuracy while reducing memory usage and training time.

The core of Gaussian pruning lies in accurately evaluating the contribution of each Gaussian primitive. In the original 3DGS framework, a Gaussian's contribution is implicitly measured by its opacity. This strategy tends to preserve Gaussians with high opacity while discarding those with low opacity. However, previous work [35] has shown that although high-opacity Gaussians often contribute significantly to image rendering, they have limited capacity to represent complex geometric structures. This limitation can result in blurred artifacts in high-frequency regions, substantially degrading perceptual quality and geometric fidelity. Consequently, opacity-based pruning may mistakenly remove geometrically important Gaussians while retaining floating artifacts with high opacity. To address this issue, we propose a more precise contribution metric and adopt a progressive pruning strategy. In the original 3DGS formulation, during $α$-blending, the blending weight $wb_{i}$ represents the contribution of a Gaussian to a pixel and is defined as:

wb_{i} = α_{i} t_{i} = α_{i} \prod_{j=1}^{i-1} (1 - α_{j})

(13)

where

t_{i}

denotes the transmittance. The overall contribution

C_{k}

of a Gaussian to the k-th rendered image can be computed as the sum of blending weights over all pixels

P_{k}

, and is defined as:

C_{k} = \sum_{p \in P_{k}} α_{i} (p) \prod_{j = 1}^{i (p) - 1} (1 - α_{j})

(14)

where $i(p)$ denotes the depth-sorted index of the Gaussian along the ray through pixel p. This formulation inherently favors large Gaussians that cover many pixels, while assigning very low scores to small Gaussians. However, large Gaussians have limited ability to represent fine geometric details and are often difficult to optimize. To make the metric more sensitive to geometric structure, we normalize the contribution by the number of projected pixels $|P_{k}|$ and introduce a hyperparameter $\gamma$ that balances opacity against transmittance. The final contribution metric is defined as:

$$ C_{k} = \frac{1}{|P_{k}|} \sum_{p \in P_{k}} \left(\alpha_{i}(p)\right)^{\gamma} \left(\prod_{j=1}^{i(p)-1} (1 - \alpha_{j})\right)^{1-\gamma} \tag{15} $$

As $\gamma$ increases, the influence of transmittance diminishes. In particular, when $\gamma = 1$, the transmittance term vanishes and the formulation degenerates to the original opacity-based pruning strategy used in 3DGS. Moreover, $\gamma$ controls the contribution bias: Gaussians that deviate from the true surface typically exhibit high transmittance when they lie outside the surface and low transmittance when they lie inside it. By adjusting $\gamma$, the model achieves bidirectional bias control, dynamically balancing the retention of internal and external Gaussians. This mechanism enables adaptive pruning tailored to the geometric characteristics of the scene, ultimately leading to more accurate geometric reconstruction.
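To make Equation (15) concrete, the following NumPy sketch scores every Gaussian in a single view. It assumes, purely for illustration, that the rasterizer has already produced, for each pixel, the front-to-back list of `(gaussian_id, alpha)` pairs along its ray; the function name `view_contribution` and this input layout are hypothetical and are not taken from the authors' implementation. The default `gamma=0.25` follows the discussion of Table 4.

```python
import numpy as np

def view_contribution(pixel_lists, num_gaussians, gamma=0.25):
    """Per-Gaussian score C_k for one view, following Eq. (15).

    pixel_lists: one entry per pixel, each a list of (gaussian_id, alpha)
        pairs sorted front to back along that pixel's ray (assumed input).
    Returns an array of length num_gaussians; unseen Gaussians score 0.
    """
    score_sum = np.zeros(num_gaussians)
    pixel_count = np.zeros(num_gaussians)  # |P_k| for each Gaussian
    for ray in pixel_lists:
        transmittance = 1.0  # product of (1 - alpha_j) over earlier Gaussians
        for gid, alpha in ray:
            score_sum[gid] += alpha ** gamma * transmittance ** (1.0 - gamma)
            pixel_count[gid] += 1
            transmittance *= 1.0 - alpha
    # Normalise by the number of pixels each Gaussian projects onto
    return np.divide(score_sum, pixel_count,
                     out=np.zeros(num_gaussians), where=pixel_count > 0)
```

Setting `gamma=1.0` reproduces the pixel-averaged opacity (the transmittance factor becomes 1), whereas `gamma=0.0` scores each Gaussian purely by the transmittance that reaches it, matching the two limiting cases described above.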

In multi-view reconstruction, the contribution of a Gaussian across views typically follows a long-tail distribution. A Gaussian usually contributes significantly to only a limited number of views (approximately 15–30% [35]), namely those in which strong geometric cues such as sharp edges and clear occlusion relationships are present. Contributions from other views are often diminished by motion blur, viewpoint redundancy, or sensor noise. Therefore, we compute the overall contribution C as the average contribution over a small set of high-contribution views:

$$ C = \frac{1}{|V|} \sum_{k \in V} C_{k} \tag{16} $$

where V denotes the set of views in which the Gaussian has its highest contributions; in practice, we select the top five views. This choice provides a robust trade-off between accuracy and computational efficiency: using fewer views leads to unstable estimates due to noise, while including more views introduces low-contribution observations that dilute the geometric signal. During training, pruning is performed progressively at predefined iteration intervals. At each pruning step, we evaluate the overall contribution of each Gaussian across the training set and remove a fixed proportion of the Gaussians with the lowest contribution scores.
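A minimal sketch of Equation (16) and one progressive pruning step, assuming the per-view scores $C_k$ of all Gaussians have been stacked into a matrix of shape (number of Gaussians, number of views). The top-five averaging and the 10% pruning ratio applied every 1000 iterations follow the settings reported in Section 4.1; the helper names, including the schedule function, are hypothetical.

```python
import numpy as np

def overall_contribution(per_view, top_k=5):
    """Eq. (16): mean of each Gaussian's top_k per-view scores C_k.

    per_view: array of shape (num_gaussians, num_views).
    """
    k = min(top_k, per_view.shape[1])
    top = np.sort(per_view, axis=1)[:, -k:]  # ascending sort: last k are largest
    return top.mean(axis=1)

def prune_mask(per_view, prune_ratio=0.10, top_k=5):
    """Boolean keep-mask that drops the lowest-scoring prune_ratio of Gaussians."""
    scores = overall_contribution(per_view, top_k)
    n_prune = int(prune_ratio * len(scores))
    keep = np.ones(len(scores), dtype=bool)
    keep[np.argsort(scores)[:n_prune]] = False  # no-op when n_prune == 0
    return keep

def is_pruning_step(iteration, interval=1000):
    """Hypothetical schedule helper: prune every `interval` iterations."""
    return iteration > 0 and iteration % interval == 0
```

Because each step removes 10% of the Gaussians present at that moment, the reduction compounds over successive pruning steps.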

3.4. Regularization Functions for Model Training

3.4.1. Image Reconstruction Loss

Following the original 3DGS framework, we compute the image reconstruction loss $L_{1}$ as the Manhattan ($\ell_{1}$) distance between the rendered RGB image $\hat{I}$ and the corresponding ground truth image $I$, averaged over the N image pixels:

$$ L_{1} = \frac{1}{N} \sum_{p \in N} \left\| \hat{I}(p) - I(p) \right\|_{1} \tag{17} $$

Optimizing 3DGS solely based on image reconstruction loss can easily lead to geometric ambiguities and local geometric overfitting. To mitigate this issue, we introduce both local and global geometric consistency regularization terms, encouraging Gaussians to better conform to the true scene geometry.

3.4.2. Local Geometric Consistency Loss

Under the local planar assumption, a pixel and its neighboring pixels (within a radius of one pixel) can be approximated as lying on a local plane. During training, the model renders both a depth map $\hat{D}$ and a normal map $\hat{N}$. For each pixel, four neighboring pixels are sampled, and a local plane normal is estimated from their rendered depth values. Repeating this process over the entire image yields a normal map $\hat{N}_{o}$ derived from the depth map. We then minimize the difference between the rendered normal map and this depth-derived normal map to enforce consistency between depth and normal geometry:

$$ L_{2} = \frac{1}{N} \sum_{p \in N} \left\| \hat{N}_{o}(p) - \hat{N}(p) \right\|_{1} \tag{18} $$
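The depth-to-normal step behind Equation (18) can be sketched as follows. This is an illustrative NumPy version, not the authors' CUDA rasterizer: it assumes a z-depth map, pinhole intrinsics K, and normals obtained from the cross product of central differences over the four neighbours (left/right, up/down). Border pixels are skipped, and the normal orientation depends on the cross-product order, so the sign may need flipping to match a renderer's convention.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) z-depth map to camera-space points of shape (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    return (pix @ np.linalg.inv(K).T) * depth[..., None]  # Z * K^{-1} [u, v, 1]^T

def normals_from_depth(depth, K):
    """Local-plane normal of each interior pixel from its four neighbours."""
    pts = backproject(depth, K)
    dx = pts[1:-1, 2:] - pts[1:-1, :-2]  # right neighbour minus left neighbour
    dy = pts[2:, 1:-1] - pts[:-2, 1:-1]  # lower neighbour minus upper neighbour
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)  # (H-2, W-2, 3)

def local_consistency_loss(depth, rendered_normals, K):
    """Eq. (18)-style loss: mean L1 distance between the two normal maps."""
    n_depth = normals_from_depth(depth, K)
    n_rend = rendered_normals[1:-1, 1:-1]  # drop the border to match shapes
    return float(np.abs(n_depth - n_rend).sum(axis=-1).mean())
```

For a fronto-parallel plane (constant depth) the depth-derived normal is (0, 0, 1) in this convention, so a rendered normal map that agrees gives a loss of zero.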

3.4.3. Global Geometric Consistency Loss

While local geometric regularization enforces consistency between depth and normals within a single view, the irregular and discrete nature of Gaussian optimization may still lead to inconsistencies across views. We therefore introduce a global geometric consistency constraint to enforce cross-view geometric alignment. Inspired by stereo matching and optical flow [1], we require that depth values rendered from different views correspond to the same 3D locations. As illustrated in Figure 5, for a pixel $p_{1}$ (in homogeneous coordinates) in view $I_{1}$ with rendered depth $Z_{1}(u, v)$, its corresponding world coordinate $p_{1}^{w}$ is computed as:

$$ p_{1}^{w} = P_{1} \left( Z_{1}(u, v) \, K^{-1} p_{1} \right) \tag{19} $$

where K is the camera intrinsic matrix and $P_{1}$ denotes the camera pose (camera-to-world transformation) of view $I_{1}$. Projecting $p_{1}^{w}$ into view $I_{2}$ yields the corresponding pixel $p_{1}^{'}$:

$$ p_{1}^{'} = \frac{K \, P_{2}^{-1} p_{1}^{w}}{Z_{1}^{'}(u, v)} \tag{20} $$

where $P_{2}$ is the camera pose (camera-to-world transformation) of view $I_{2}$, and $Z_{1}^{'}(u, v)$ denotes the depth of the projected point $p_{1}^{'}$ in view $I_{2}$. Mapping $p_{1}^{'}$, located at $(u^{'}, v^{'})$, back to world coordinates using the depth $Z_{2}(u^{'}, v^{'})$ rendered in view $I_{2}$ yields $p_{1}^{'w}$:

$$ p_{1}^{'w} = P_{2} \left( Z_{2}(u^{'}, v^{'}) \, K^{-1} p_{1}^{'} \right) \tag{21} $$

If the rendered depths are geometrically consistent, $p_{1}^{'w}$ should coincide with $p_{1}^{w}$. We therefore minimize the discrepancy between the projected depth $Z_{1}^{'}(u, v)$ and the rendered depth $Z_{2}(u^{'}, v^{'})$ to enforce global geometric consistency:

$$ L_{3} = \frac{1}{N} \sum_{p \in N} \left\| Z_{1}^{'}(u, v) - Z_{2}(u^{'}, v^{'}) \right\|_{1} \tag{22} $$
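The cross-view reprojection check can be sketched for whole depth maps as below, using the standard pinhole model (back-projection with $K^{-1}$, projection with K). Assumptions to note: both views share the intrinsics K, poses are 4×4 camera-to-world matrices, depths are z-depths, and view 2's depth is sampled at the nearest pixel rather than interpolated. Mapping back to world coordinates is not needed to obtain the depth residual, so that step is omitted; `gc_error` is a hypothetical aggregate and not necessarily how the reported GC values were averaged.

```python
import numpy as np

def reprojection_depth_error(depth1, depth2, K, P1, P2):
    """Per-pixel |Z1' - Z2(u', v')| from view 1 into view 2 (NaN where invalid).

    depth1, depth2: (H, W) z-depth maps rendered in views 1 and 2.
    K: (3, 3) intrinsics shared by both views (assumption).
    P1, P2: (4, 4) camera-to-world poses (assumed convention).
    """
    h, w = depth1.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # Back-project view-1 pixels to camera-1 points, then move them to world space
    cam1 = (pix @ np.linalg.inv(K).T) * depth1.reshape(-1, 1)
    world = cam1 @ P1[:3, :3].T + P1[:3, 3]
    # Transform into camera 2; the z component is the projected depth Z1'
    w2c = np.linalg.inv(P2)
    cam2 = world @ w2c[:3, :3].T + w2c[:3, 3]
    z1p = cam2[:, 2]
    # Perspective projection with K, rounded to the nearest pixel (u', v')
    proj = cam2 @ K.T
    z_safe = np.where(z1p > 1e-8, z1p, 1.0)
    u2 = np.round(proj[:, 0] / z_safe).astype(int)
    v2 = np.round(proj[:, 1] / z_safe).astype(int)
    valid = (z1p > 1e-8) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    err = np.full(h * w, np.nan)
    # Depth residual: projected depth vs. depth rendered by view 2 at (u', v')
    err[valid] = np.abs(z1p[valid] - depth2[v2[valid], u2[valid]])
    return err.reshape(h, w)

def gc_error(depth1, depth2, K, P1, P2):
    """Mean residual over valid pixels (hypothetical L3/GC-style aggregate)."""
    return float(np.nanmean(reprojection_depth_error(depth1, depth2, K, P1, P2)))
```

When the second camera is moved 0.5 units toward a plane at depth 2, a correctly rendered depth of 1.5 in view 2 yields zero residual, while pixels that project outside view 2 are marked invalid.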

3.4.4. Blur Reconstruction Loss

When the optional blurred reconstruction module is enabled, we additionally introduce a blur reconstruction loss. After convergence, the reconstructed blurred observation B is compared with the input image I:

$$ L_{4} = \frac{1}{N} \sum_{p \in N} \left\| B(p) - I(p) \right\|_{1} \tag{23} $$

The final total loss function of the proposed LV-3DGS framework is defined as a weighted combination of all loss terms:

$$ L_{total} = \lambda_{1} L_{1} + \lambda_{2} L_{2} + \lambda_{3} L_{3} + \lambda_{4} L_{4} \tag{24} $$

3.5. Leafy Vegetable Phenotyping

Because the noise around leaf edges is sparse compared with the dense point cloud of the leafy vegetable itself, a statistical outlier removal (SOR) filter is applied to eliminate outliers whose local point density differs significantly from that of their neighbors. Following our previous work [1], the cleaned point cloud is then used for phenotypic measurements.
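The SOR filter can be sketched in a few lines of NumPy. The neighbour count `k` and the `std_ratio` threshold below are illustrative defaults, since the SOR parameters are not reported here; libraries such as Open3D provide an equivalent `remove_statistical_outlier(nb_neighbors, std_ratio)` method on point clouds.

```python
import numpy as np

def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Keep points whose mean distance to their k nearest neighbours is at most
    (global mean + std_ratio * global std) of that statistic.

    Uses brute-force O(N^2) distances, which is fine for a small sketch;
    a KD-tree is the usual choice for full-size plant point clouds.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]  # column 0 is the zero self-distance
    mean_d = knn.mean(axis=1)
    threshold = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= threshold]
```

An isolated point far from a dense cluster has a mean neighbour distance many standard deviations above the average, so it is removed while the cluster is kept intact.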

(1) Height: Under the natural growth conditions of leafy vegetables, a best-fit plane is estimated based on the normal vector at the root and used as the XY plane of a Cartesian coordinate system. The direction perpendicular to this XY plane and pointing upward from the root is taken as the Z-axis, and the leafy vegetable point cloud is transformed into this corrected coordinate system. The height H of the leafy vegetable is then the vertical distance between its lowest and highest points along the Z-axis.

(2) Number of Leaves: Unlike many other plants, leafy vegetables have a complex internal structure in which stems and leaves overlap, making it difficult to count leaves by extracting a skeleton. However, 3D point cloud models provide comprehensive spatial structure information, so different leaves can be clustered based on the positional relationships and density differences between points, as shown in Figure 6a. We perform conditional Euclidean clustering on the complete point cloud of the plant, using a KD-Tree as the neighbor search structure. Starting from a seed point, points within a threshold distance are assigned to the same cluster, while points beyond that distance are not; this process groups nearby points into the same cluster. Each independent cluster is identified as a leaf, so the number of leaves equals the number of clusters.

(3) Surface Area: Delaunay triangulation is used to reconstruct the 3D mesh of the leaves and stems, as shown in Figure 6b. The triangles must satisfy two conditions: first, no points lie within the smallest enclosing sphere of each triangle; second, each triangle edge is shorter than a given threshold, which prevents connecting discontinuous surfaces. The area of each triangle is calculated using Heron's formula, and the areas of all triangles within the outermost convex hull are summed to obtain the surface area of each leaf and stem, as given in Equations (25) and (26):

$$ S_{i} = \sqrt{p_{i} (p_{i} - a_{i}) (p_{i} - b_{i}) (p_{i} - c_{i})} \tag{25} $$

$$ S = \sum_{i=1}^{n} S_{i} \tag{26} $$

where $p_{i}$ is the semi-perimeter (half the perimeter) of the i-th triangle, $a_{i}$, $b_{i}$, and $c_{i}$ are its side lengths, and n is the total number of triangles. The total surface area S is obtained by summing the areas of all triangles within the convex hull.
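The three measurements above can be sketched in NumPy under stated assumptions. `base_pts` (points sampled at the root or substrate level, used to fit the base plane) is a hypothetical input; leaf counting uses plain Euclidean clustering with a brute-force radius search instead of the conditional variant with a KD-Tree described in (2); and the mesh triangles are assumed to come from a separate Delaunay step that is not shown. The thresholds `tol` and `max_edge` are illustrative, not the authors' values.

```python
import numpy as np
from collections import deque

# (1) Height: fit a base plane, use its normal as the Z-axis, H = z_max - z_min
def plant_height(plant_pts, base_pts):
    c = base_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(base_pts - c)
    n = vt[-1]  # singular vector of the smallest singular value = plane normal
    if np.dot(plant_pts.mean(axis=0) - c, n) < 0:
        n = -n  # orient Z upward, from the base towards the plant
    z = (plant_pts - c) @ n
    return float(z.max() - z.min())

# (2) Leaf count: Euclidean clustering (region growing with a distance threshold)
def count_clusters(points, tol, min_size=1):
    labels = np.full(len(points), -1)
    n_clusters = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # brute-force radius search; a KD-tree would be used on real clouds
            for j in np.flatnonzero(np.linalg.norm(points - points[i], axis=1) < tol):
                if labels[j] == -1:
                    labels[j] = n_clusters
                    queue.append(j)
        n_clusters += 1
    sizes = np.bincount(labels, minlength=n_clusters)
    return labels, int(np.sum(sizes >= min_size))

# (3) Surface area: Heron's formula per triangle (Eq. 25), summed (Eq. 26)
def surface_area(vertices, triangles, max_edge=np.inf):
    v = vertices[np.asarray(triangles)]                        # (n, 3, 3)
    edges = np.linalg.norm(v - np.roll(v, 1, axis=1), axis=2)  # (n, 3) side lengths
    edges = edges[edges.max(axis=1) < max_edge]                # drop long-edge triangles
    a, b, c = edges.T
    s = 0.5 * (a + b + c)                                      # semi-perimeter p_i
    return float(np.sqrt(np.clip(s * (s - a) * (s - b) * (s - c), 0.0, None)).sum())
```

The height is invariant to how the plant is tilted in the scanner frame because it is measured along the fitted base normal; the `max_edge` filter mirrors the edge-length condition that keeps discontinuous surfaces from being bridged.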

4. Results and Discussion

This section evaluates the performance of the proposed method. First, the implementation details of the experiments, including the evaluation metrics, are introduced. Second, the proposed motion blur reconstruction module is evaluated. Next, the optimized LV-3DGS model is compared with other mainstream models, and its accuracy and performance are analyzed. We then conduct ablation experiments on both hyperparameters and modules. Finally, we evaluate and compare the phenotyping performance of our method. The evaluation combines quantitative metrics with qualitative assessments.

4.1. Experimental Environment and Evaluation Indicators

All experiments were conducted on Ubuntu 20.04 with PyTorch 1.12.1 and CUDA 11.8. We extend the differentiable Gaussian splatting rasterizer to support depth, pose, and cumulative opacity in both forward and backward propagation. The model is optimized by stochastic gradient descent using the Adam optimizer with a learning rate of 0.01, and was trained for 30,000 iterations on an NVIDIA RTX 4090 GPU (24 GB). Pruning is performed every 1000 iterations, and at each pruning step the 10% of Gaussians with the lowest contribution scores are removed. The optimal hyperparameters of the model are selected through the hyperparameter experiments described below. To assess the reconstruction quality of the proposed LV-3DGS model, we employ the following evaluation metrics:

(1): Image fidelity metrics:

Peak Signal-to-Noise Ratio (PSNR) measures the distortion of the rendered image; higher values indicate better rendering quality. It is calculated as shown in Equations (27) and (28):

$$ MSE = \frac{1}{hw} \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} \left| I(i, j) - K(i, j) \right|^{2} \tag{27} $$

$$ PSNR = 10 \cdot \log_{10}\left(\frac{MAX^{2}}{MSE}\right) = 20 \cdot \log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right) \tag{28} $$

where h and w are the height and width of the image, I and K are the ground truth image and the rendered image, respectively, MAX is the maximum possible pixel value of the image, and MSE is the mean squared error.
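As a quick check of Equations (27) and (28), PSNR can be computed as below; this is a generic sketch assuming floating-point images scaled to [0, MAX], not the evaluation script used for the reported results.

```python
import numpy as np

def psnr(gt, pred, max_val=1.0):
    """PSNR = 10 log10(MAX^2 / MSE) = 20 log10(MAX / sqrt(MSE)), in dB."""
    mse = np.mean((np.asarray(gt, dtype=float) - np.asarray(pred, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For intuition, with images scaled to [0, 1], a single image with a PSNR of 34.53 dB has an MSE of $10^{-3.453} \approx 3.5 \times 10^{-4}$, i.e., an RMS error of about 0.019, or roughly 4.8 intensity levels on an 8-bit scale.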

The Structural Similarity Index Measure (SSIM) measures the similarity of local luminance, contrast, and structure (such as edges and textures) between two images and is defined in Equation (29):

$$ SSIM(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})} \tag{29} $$

where $\mu_{x}$ and $\mu_{y}$ are the local window means of x and y, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are the corresponding variances, and $\sigma_{xy}$ is their covariance. The constants $c_{1} = (K_{1} L)^{2}$ and $c_{2} = (K_{2} L)^{2}$ stabilize the division, where L is the dynamic range of the pixel values, $K_{1} = 0.01$, and $K_{2} = 0.03$.
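Standard SSIM evaluates Equation (29) in local windows (11×11 Gaussian-weighted in the original formulation) and averages the resulting map. The sketch below evaluates it once over the whole image only to show the arithmetic; reported numbers should come from an established implementation such as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, K1=0.01, K2=0.03):
    """Eq. (29) evaluated once over the whole image (a single window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (K1 * data_range) ** 2
    c2 = (K2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)
```

Identical images give exactly 1, and an intensity-inverted image gives a negative value because the covariance term changes sign.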

Learned Perceptual Image Patch Similarity (LPIPS) captures more complex image features and perceptual differences than PSNR and SSIM. It compares the rendered image with the ground truth image using features from a deep network (with VGG as the backbone) and yields a score ranging from 0 to 1. Lower LPIPS values indicate that the two images are more similar, i.e., better rendering quality.

(2): Geometric accuracy:

Geometric Consistency (GC) is based on the global geometric consistency measure described in Section 3.4.3. It quantifies the multi-view consistency of the rendered depth values and serves as a proxy indicator of the geometric accuracy of the reconstructed model. Its unit is centimeters.

(3): Computational efficiency metrics:

Training time. We report the average training time across all scenes for 30,000 iterations.

The performance of the phenotypic measurements is evaluated using the coefficient of determination ($R^{2}$) and the root mean square error (RMSE). $R^{2}$ quantifies how well the computed phenotypic values agree with the ground truth values (the proportion of ground truth variance explained by the estimates) and is calculated as Equation (30). RMSE assesses the overall accuracy of the phenotyping results in the units of each trait and is calculated as Equation (31).

$$ R^{2} = 1 - \frac{\sum_{i=1}^{n} (gt_{i} - pre_{i})^{2}}{\sum_{i=1}^{n} (gt_{i} - \overline{gt})^{2}} \tag{30} $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (pre_{i} - gt_{i})^{2}} \tag{31} $$

where $pre_{i}$ and $gt_{i}$ represent the predicted and ground truth phenotype values of the i-th sample, respectively, $\overline{gt}$ is the average of $gt_{i}$, and n is the number of samples.
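Equations (30) and (31) can be computed directly as below; this is a generic NumPy sketch, and the function names are illustrative rather than the authors' code.

```python
import numpy as np

def r2_score(gt, pred):
    """Eq. (30): coefficient of determination of the estimates against ground truth."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((gt - pred) ** 2)
    ss_tot = np.sum((gt - gt.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(gt, pred):
    """Eq. (31): root mean square error over the n samples."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - gt) ** 2)))
```

Unlike a squared Pearson correlation, this $R^{2}$ penalizes systematic bias: predictions offset by a constant +1 from ground truth values of 1, 2, 3, 4 remain perfectly correlated, yet $R^{2}$ drops to 0.2.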

4.2. Evaluation of 3D Rendering Performance in Motion-Blurred Scenes

We evaluated the performance of the proposed blurred reconstruction module by comparing it with Deblur-NeRF [27] and a representative 2D image deblurring approach combined with 3DGS. Deblur-NeRF jointly optimizes neural radiance field reconstruction and pixel-wise blur kernel estimation. For the 2D deblurring baseline, we employed Restormer [36] to independently deblur the input images before feeding them into the standard 3DGS pipeline for scene reconstruction. To ensure a fair comparison with the ground truth images, the Gaussian scene parameters were fixed during evaluation, and only a global transformation was optimized to estimate the appropriate camera pose alignment.

Table 1 presents the quantitative evaluation results of blur reconstruction and scene rendering on the leafy vegetable dataset under different levels of self-collected motion blur. The results demonstrate that the proposed blur reconstruction module integrated into the 3DGS framework consistently outperformed the comparison methods. Preprocessing with Restormer showed limited performance gains, likely due to its isolation from the 3D reconstruction process. Without integrating scene geometry during deblurring, the 2D-only approach may introduce inconsistencies that degrade the quality of the reconstructed Gaussians. Deblur-NeRF was able to reconstruct 3D scenes with reasonable consistency by jointly estimating spatially varying blur kernels during training. However, its modeling of motion blur relies on image-space convolution and MLP-based point spread function estimation, without explicitly incorporating camera motion or scene occlusion information. Consequently, we observed that Deblur-NeRF required longer training time and larger blur kernels to handle severe motion blur, and the reconstructed scenes occasionally exhibited residual blur or discontinuities across views. In contrast, the proposed method explicitly models camera motion trajectories and samples clear sub-frames during training. This strategy avoids generating inaccurate Gaussians at incorrect spatial locations and enables faster convergence while producing sharper rendering results. Qualitative comparisons in Figure 7 further confirm that our method yields clearer textures and more consistent geometry than the competing approaches under motion-blurred conditions.

4.3. Comparison of Training and Rendering Efficiency Across Different Models

To ensure a fair comparison, all baseline and comparison methods were trained under identical experimental settings, except for the specific architectural or algorithmic modifications introduced by each method. We compared the proposed LV-3DGS with several advanced novel view synthesis and surface reconstruction approaches, including NeRF [10], Neuralangelo [37], the baseline 3DGS [11], and recent Gaussian-based surface reconstruction methods such as SuGaR [21], GOF [24], 2DGS [22], and PGSR [23]. The evaluation focused on reconstruction quality, geometric accuracy, and computational efficiency in real crop production scenarios.

Neuralangelo extends traditional NeRF by combining multi-resolution 3D hash grid representations with neural surface rendering. SuGaR, GOF, 2DGS, and PGSR represent scenes using planar Gaussian primitives, with differences in geometric constraints and optimization strategies. GOF constructs a Gaussian opacity field and extracts geometry via level-set estimation, while 2DGS and PGSR introduce depth consistency constraints to improve surface reconstruction quality.

As shown in Table 2, the proposed LV-3DGS achieved superior performance across various leafy vegetable scenes. In terms of training efficiency, LV-3DGS achieved the shortest average training time, improving efficiency by 11.67% compared with the baseline 3DGS, which can be attributed to the proposed contribution-based Gaussian pruning strategy; all other comparison methods required longer training times than 3DGS. In terms of image quality, benefiting from its surface reconstruction optimizations, LV-3DGS achieves the highest reconstruction quality among all compared models in leafy vegetable scenes. Specifically, compared with the baseline 3DGS, PSNR and SSIM increase by 2.70% and 3.23%, respectively, while LPIPS decreases by 6.50%. A paired t-test confirmed that these improvements over 3DGS are statistically significant (PSNR: $p = 0.0175$; SSIM: $p = 0.0134$; LPIPS: $p = 0.0131$). Compared with PGSR, the current state-of-the-art surface reconstruction method, PSNR and SSIM increase by 0.92% and 1.82%, respectively, and LPIPS decreases by 5.57%; these differences are also statistically significant (PSNR: $p = 0.0182$; SSIM: $p = 0.0173$; LPIPS: $p = 0.0167$). In terms of geometric accuracy, LV-3DGS achieves the smallest geometric error, reducing GC by 1.566 cm relative to the baseline 3DGS, a statistically significant reduction ($p = 0.0093$). These results demonstrate that LV-3DGS achieves competitive rendering quality and geometric accuracy while also improving training efficiency.

Figure 8 compares the rendering results of Neuralangelo, 3DGS, PGSR, and LV-3DGS in different leafy vegetable scenes. LV-3DGS reconstructs fine details with higher quality: leaf veins are sharply reconstructed, stem textures are clearly stratified, and boundaries between vegetation and background remain well defined. Neuralangelo captures the overall structure but produces blurred edges and background artifacts. PGSR, which optimizes planar 2D Gaussians, renders smoother leaf surfaces with more detail, but its geometric structure is less accurate and some veins still have unclear outlines, which reduces the realism of the reconstruction. The improvements of LV-3DGS are attributed to the combined effect of planar optimization, contribution-based pruning, and local and global geometric consistency regularization, which together steer the optimization toward more accurate surface geometry.

In addition to the general-purpose methods compared above, several 3DGS-based approaches have been applied successfully in agriculture, achieving promising results in large-scale scenes such as orchards and farmland [18,19]. However, these methods primarily target large-scene reconstruction, whereas our goal is high-quality reconstruction and phenotypic monitoring of leafy vegetables in controlled-environment plant factories.

Overall, these results demonstrate that LV-3DGS achieves high-quality reconstruction, improved geometric accuracy, and enhanced training efficiency across a wide range of leafy vegetable scenes, highlighting its practical potential for 3D reconstruction in controlled-environment agriculture.

4.4. Performance Comparison of Different Network Structures

4.4.1. Hyperparameter Optimization

In this section, we conducted ablation experiments to optimize the hyperparameters of the proposed loss function and Gaussian pruning strategy: the image reconstruction loss weight $\lambda_{1}$, the local geometric consistency loss weight $\lambda_{2}$, the global geometric consistency loss weight $\lambda_{3}$, and the exponent $\gamma$ that balances opacity and transmittance in the contribution metric. The blur reconstruction loss weight $\lambda_{4}$ was set to zero in this section because the blurred reconstruction module was evaluated independently in Section 4.2.

The local geometric regularization term constrains geometric consistency within a single view, providing good initial geometric accuracy without relying on multi-view information. The global geometric regularization term constrains geometric consistency across multiple views, improving overall reconstruction accuracy. As shown in Table 3, both local and global geometric consistency are crucial for improving the reconstruction accuracy of the model. Control experiments with either term disabled ($\lambda_{2} = 0$ or $\lambda_{3} = 0$) yield increased GC values (0.997 cm and 1.253 cm, respectively) and degraded rendering metrics, confirming that their combination is essential. Based on these experiments, $\lambda_{1} : \lambda_{2} : \lambda_{3} = 1.0 : 1.0 : 1.2$ was selected for the final loss function.

Table 4 reports the impact of different $\gamma$ values in the Gaussian pruning strategy. Smaller $\gamma$ values yield better performance, which aligns with our theoretical analysis: a lower $\gamma$ gives more weight to transmittance, enabling the pruning metric to identify and remove low-opacity Gaussians that are not consistently visible along rays. When $\gamma = 0.25$, the transmittance term dominates and Gaussians in thin or occluded regions are retained, which improves reconstruction. As $\gamma$ increases to 0.75, the transmittance term is suppressed and the contribution metric approaches the original opacity-based pruning criterion; performance then degrades because high-opacity floating artifacts are preserved. This further validates the effectiveness of the proposed contribution-based pruning strategy compared with the default opacity-based approach.

4.4.2. Effectiveness of Different Modules

Using the hyperparameter configuration selected in Section 4.4.1, we validated the contribution of each proposed module through module-level ablation experiments. Starting from the baseline 3DGS model, the following variants were independently trained and evaluated: 3DGS+PDGI (Prior Depth-Guided Initialization), 3DGS+Flattening+NC (3D Gaussian Flattening and Normal Constraint), 3DGS+PDGI+Flattening+NC, 3DGS+MDR (Median Depth Rendering), 3DGS+GP (Gaussian Pruning), and LV-3DGS (all proposed modules). The results are shown in Table 5. The PDGI module effectively fills in missing points in the leafy vegetable texture regions of the point cloud, thereby improving the perception of leaf texture details during training: PSNR and SSIM increase by 1.29% and 1.64%, respectively, LPIPS decreases by 1.19%, and GC decreases by 0.309 cm. The Flattening and NC modules fit the leafy vegetable surface with planar Gaussians and constrain the rendered depth through normals, reducing geometric errors: PSNR and SSIM increase by 2.05% and 1.72%, respectively, LPIPS decreases by 1.14%, and GC decreases by 1.284 cm. Combining PDGI with the Flattening and NC modules yields a further improvement. Unlike the mean depth estimation in the baseline 3DGS, MDR uses the median of the depth contributions to handle occlusions and surface discontinuities robustly, alleviating erroneous depth estimates in regions with surface discontinuities or incomplete reconstruction (for example, overlapping leaves).
The GP module achieves precise geometric representation by deleting redundant Gaussians based on the contribution-based pruning strategy, and the GC value decreases by 1.548 cm with a significant improvement in training efficiency. In summary, the 3D Gaussian model with the proposed modules shows better reconstruction quality in leafy vegetable scenes, reduces the perceptual difference between the reconstructed image and the GT image, and reduces geometric errors.

4.5. The Results of Leafy Vegetable Phenotypic Calculation and Regression

To validate the effectiveness of the proposed method, plant height, leaf number, and leaf surface area were estimated for all reconstructed leafy vegetable scenes using the phenotypic measurement pipeline described in Section 3.5. Figure 9 compares the phenotypic values estimated from the reconstructed 3D models with the corresponding manual measurements. A paired t-test ($p > 0.05$) indicates that, for all three traits, the differences between the estimated and manually measured values are not statistically significant. In addition, we compared the phenotypic estimation performance of the proposed method with results reported in related studies, as summarized in Table 6. Specifically, the coefficient of determination ($R^{2}$) for plant height reached 0.9959 with a root mean square error (RMSE) of 0.33 cm. For leaf number, $R^{2}$ was 0.9651 with an RMSE of 0.85. For leaf surface area, $R^{2}$ was 0.9895 with an RMSE of 14.78 cm². These results demonstrate strong agreement between the estimated traits and the manual measurements, indicating that the proposed LV-3DGS framework provides reliable phenotypic estimation. Although some reported methods achieved slightly higher accuracy in specific scenarios, for example multi-view stereo approaches applied to corn stems ($R^{2} = 0.998$) [38] and lettuce ($R^{2} = 0.979$) [39], such methods typically rely on extensive point cloud post-processing and manual intervention. In contrast, the proposed approach achieves competitive accuracy with a higher level of automation and computational efficiency. Furthermore, compared with our previous binocular vision method [1], the multi-view approach provides richer scene information. These results suggest that high-quality 3D reconstruction based on LV-3DGS can serve as a robust and efficient foundation for crop phenotypic measurement.

5. Conclusions

This study proposed the LV-3DGS framework to address the limitations of conventional 3DGS methods in reconstructing leafy vegetable scenes, which are characterized by low texture, uniform color distribution, and complex surface geometry, in controlled agricultural environments. By integrating planar Gaussian surface modeling, contribution-aware Gaussian pruning, and local and global geometric consistency regularization, the proposed method significantly improves both reconstruction fidelity and geometric accuracy across diverse leafy vegetable scenarios in real plant factory settings. Unlike previous works that focus on planar Gaussian representation or blur correction alone, LV-3DGS is designed specifically for high-quality reconstruction and phenotypic measurement of leafy vegetables: it explicitly models camera motion during multi-view acquisition to address motion blur, optimizes the Gaussian representation according to the spatial structural characteristics of leafy vegetables, and establishes a pruning strategy based on an analysis of Gaussian contributions to improve geometric accuracy and computational efficiency. Experimental results demonstrate that LV-3DGS achieves superior rendering quality and geometric precision compared with NeRF, Neuralangelo, 3DGS, SuGaR, GOF, 2DGS, and PGSR. The proposed framework attains an average SSIM of 0.94, PSNR of 34.53 dB, LPIPS of 0.11, and a geometric consistency error of 0.317 cm, while maintaining high training efficiency with an average training time of approximately 10 min. Furthermore, the motion-blurred reconstruction module effectively mitigates artifacts caused by camera motion during multi-view image acquisition, improving data utilization efficiency and reconstruction robustness. Based on the reconstructed 3D models, plant height, leaf number, and leaf surface area were accurately estimated, achieving $R^{2}$ values of 0.9959, 0.9651, and 0.9895, with corresponding RMSE values of 0.33 cm, 0.85, and 14.78 cm², respectively. These results confirm that phenotypic extraction based on LV-3DGS enables accurate and efficient computation of key plant traits, providing a practical solution for precision agriculture and high-throughput crop phenotyping. It is important to note that these metrics were obtained in a controlled indoor vertical farm with static artificial lighting, a uniform background, and limited occlusions.

Despite its promising performance, this study still has several limitations. The current validation remains confined to a single operational domain: indoor vertical farming with fixed artificial lighting and static backgrounds. Open-field scenarios, which involve larger spatial scales, increased environmental complexity, and more diverse crop architectures, have not yet been explored. In such scenarios, reconstruction would face higher computational demands, because more Gaussians are needed to represent expansive scenes, as well as robustness challenges arising from uncontrolled illumination changes and wind-induced plant motion. Extending LV-3DGS to these conditions may require additional components, such as lighting-invariant feature embeddings to handle variable illumination and explicit handling of wind-induced non-rigid deformation in the blur reconstruction module. Future work will focus on improving model scalability and computational efficiency to support large-scale agricultural applications. At the model architecture level, several inherent assumptions warrant further discussion. First, flattening 3D Gaussians into planar primitives, while effective for broad leaf surfaces, may under-represent regions with high curvature or sharp creases, where the piecewise planar approximation introduces discretization error. Second, the reliance on $\alpha$-blending for normal and depth rendering can bias geometric estimates toward high-opacity primitives in occluded or semi-transparent regions, potentially causing surface bleeding artifacts. Finally, the PDGI module inherits the scale ambiguity of monocular depth predictors, although we observe that multi-view geometric optimization partially attenuates such errors during training. Future extensions could explore adapting the Gaussian graph structure to the target's spatial morphology, unbiased depth estimation strategies, and multi-view deep fusion to address these structural limitations. Furthermore, extreme leaf occlusion may lead to incomplete geometry where multi-view coverage is insufficient, and specular highlights on the leaf blade surface can cause surface artifacts. Regarding initialization, COLMAP is used only to obtain initial camera poses; its sparse point cloud may be noisy in low-texture regions, but multi-view optimization and the depth prior effectively mitigate this limitation. In terms of geometric evaluation, the difficulty of acquiring high-fidelity 3D ground truth for delicate leafy vegetables means that our validation relies on a custom Geometric Consistency (GC) metric. Importantly, GC reflects multi-view alignment consistency rather than true physical accuracy, and standard benchmarks such as Chamfer Distance or point-to-surface error are currently absent. We therefore explicitly acknowledge the lack of independent geometric validation against an objective external standard as a primary limitation of this study. Additionally, natural illumination variability poses challenges to data acquisition quality. Although this study incorporated monocular depth estimation as a supplementary data source, the reliance on vision-based data remains a limiting factor.
Future research will investigate multi-sensor data fusion strategies that integrate complementary information from cameras, LiDAR, GPS, and IMU sensors to further enhance reconstruction accuracy and robustness. Ultimately, future efforts will aim to extend the proposed framework to a broader range of crop species and deploy it in real-world agricultural production systems, enabling automated, large-scale, and high-precision plant phenotypic analysis.
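
The α-blending bias discussed above can be illustrated on a single camera ray. The sketch below is a minimal example, assuming Gaussians already sorted front to back with known per-ray opacities; it contrasts the opacity-weighted (α-blended) depth with a median-depth rule, taken here as the depth at which transmittance first drops below 0.5. That median definition is a common convention and an assumption; it is not necessarily the exact rule used by the MDR module.

```python
from typing import List, Sequence


def composite_weights(alphas: Sequence[float]) -> List[float]:
    """Front-to-back compositing weights w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    weights: List[float] = []
    transmittance = 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def alpha_blended_depth(depths: Sequence[float], alphas: Sequence[float]) -> float:
    """Opacity-weighted mean depth, normalised by the accumulated opacity."""
    weights = composite_weights(alphas)
    accumulated = sum(weights)
    if accumulated == 0.0:
        return float("nan")  # the ray hits nothing opaque
    return sum(w * d for w, d in zip(weights, depths)) / accumulated


def median_depth(depths: Sequence[float], alphas: Sequence[float]) -> float:
    """Depth of the first primitive at which transmittance falls below 0.5 (assumed rule)."""
    transmittance = 1.0
    for depth, alpha in zip(depths, alphas):
        transmittance *= 1.0 - alpha
        if transmittance < 0.5:
            return depth
    return float("nan")  # accumulated opacity never reaches 0.5


# A semi-transparent leaf edge at depth 1.0 in front of an opaque leaf at depth 2.0.
depths, alphas = [1.0, 2.0], [0.3, 0.9]
print(alpha_blended_depth(depths, alphas))  # about 1.677: between the two surfaces
print(median_depth(depths, alphas))         # 2.0: an actual primitive
```

The blended estimate lands at a depth where no leaf exists, which is one mechanism behind surface bleeding; a median rule avoids this particular failure but can switch abruptly between surfaces when opacities change slightly.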

Author Contributions

Conceptualization, J.W., J.C., and H.Z.; data curation, X.Y. and J.Z.; formal analysis, J.Z.; software, J.Z.; visualization, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, X.Y. and K.L.; resources, K.L.; supervision, K.L.; project administration, K.L.; funding acquisition, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2023 Shanghai “Science and Technology Innovation Action Plan” in the Agricultural Science and Technology Field (No. 23N21900400).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study consist of self-collected image data of leafy vegetable plants acquired by the authors. The dataset used for 3D reconstruction and phenotypic analysis is available from the corresponding author upon reasonable request.

Acknowledgments

The authors appreciate the funding organization for its financial support. The authors would also like to thank all the authors cited in this article and the anonymous reviewers for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Yang, X.; Zhong, J.; Lin, K.; Wu, J.; Chen, J.; Si, H. Research on binocular stereo vision phenotyping measurement for leafy vegetable based on 3DGS supervision. Smart Agric. Technol. 2025, 12, 101460.
2. Sachithra, V.; Subhashini, L. How artificial intelligence uses to achieve the agriculture sustainability: Systematic review. Artif. Intell. Agric. 2023, 8, 46–59.
3. Yao, J.; Gong, Y.; Xia, Z.; Nie, P.; Xu, H.; Zhang, H.; Chen, Y.; Li, X.; Li, Z.; Li, Y. Facility of tomato plant organ segmentation and phenotypic trait extraction via deep learning. Comput. Electron. Agric. 2025, 231, 109957.
4. Sandhu, J.; Zhu, F.; Paul, P.; Gao, T.; Dhatt, B.K.; Ge, Y.; Staswick, P.; Yu, H.; Walia, H. PI-Plat: A high-resolution image-based 3D reconstruction method to estimate growth dynamics of rice inflorescence traits. Plant Methods 2019, 15, 162.
5. Thapa, S.; Zhu, F.; Walia, H.; Yu, H.; Ge, Y. A novel LiDAR-based instrument for high-throughput, 3D measurement of morphological traits in maize and sorghum. Sensors 2018, 18, 1187.
6. Sari, Y.A.; Gofuku, A. Measuring food volume from RGB-Depth image with point cloud conversion method using geometrical approach and robust ellipsoid fitting algorithm. J. Food Eng. 2023, 358, 111656.
7. Zhao, C.; Zhang, Y.; Du, J.; Guo, X.; Wen, W.; Gu, S.; Wang, J.; Fan, J. Crop phenomics: Current status and perspectives. Front. Plant Sci. 2019, 10, 714.
8. Masoudi, M.; Golzarian, M.R.; Lawson, S.S.; Rahimi, M.; Islam, S.M.S.; Khodabakhshian, R. Improving 3D reconstruction for accurate measurement of appearance characteristics in shiny fruits using post-harvest particle film: A case study on tomatoes. Comput. Electron. Agric. 2024, 224, 109141.
9. Zhang, Q.; Chen, Z.; Zhou, Z.; Wang, L.; Liao, Q.; Yang, C.; Yang, J. 3D terrestrial LiDAR for obtaining phenotypic information of cigar tobacco plants. Comput. Electron. Agric. 2024, 226, 109424.
10. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020.
11. Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139.
12. Hu, K.; Ying, W.; Pan, Y.; Kang, H.; Chen, C. High-fidelity 3D reconstruction of plants using Neural Radiance Fields. Comput. Electron. Agric. 2024, 220, 108848.
13. Gao, C.; Saraf, A.; Kopf, J.; Huang, J.B. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5712–5721.
14. Li, Z.; Niklaus, S.; Snavely, N.; Wang, O. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6498–6508.
15. Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. NeRF in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7210–7219.
16. Pumarola, A.; Corona, E.; Pons-Moll, G.; Moreno-Noguer, F. D-NeRF: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10318–10327.
17. Xian, W.; Huang, J.B.; Kopf, J.; Kim, C. Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9421–9431.
18. Chen, Y.; Xiao, K.; Gao, G.; Zhang, F. High-fidelity 3D reconstruction of peach orchards using a 3DGS-Ag model. Comput. Electron. Agric. 2025, 234, 110225.
19. Shen, Y.; Zhou, H.; Yang, X.; Lu, X.; Guo, Z.; Jiang, L.; He, Y.; Cen, H. Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model. Comput. Electron. Agric. 2025, 235, 110320.
20. Jiang, Y.; Tu, J.; Liu, Y.; Gao, X.; Long, X.; Wang, W.; Ma, Y. GaussianShader: 3D Gaussian splatting with shading functions for reflective surfaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5322–5332.
21. Guédon, A.; Lepetit, V. SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 5354–5363.
22. Huang, B.; Yu, Z.; Chen, A.; Geiger, A.; Gao, S. 2D Gaussian splatting for geometrically accurate radiance fields. In Proceedings of the ACM SIGGRAPH 2024 Conference Papers, Denver, CO, USA, 27 July–1 August 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–11.
23. Chen, D.; Li, H.; Ye, W.; Wang, Y.; Xie, W.; Zhai, S.; Wang, N.; Liu, H.; Bao, H.; Zhang, G. PGSR: Planar-based Gaussian splatting for efficient and high-fidelity surface reconstruction. IEEE Trans. Vis. Comput. Graph. 2024, 31, 6100–6111.
24. Yu, Z.; Sattler, T.; Geiger, A. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Trans. Graph. 2024, 43, 271.
25. Schönberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
26. Schönberger, J.L.; Zheng, E.; Frahm, J.M.; Pollefeys, M. Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 501–518.
27. Ma, L.; Li, X.; Liao, J.; Zhang, Q.; Wang, X.; Wang, J.; Sander, P.V. Deblur-NeRF: Neural radiance fields from blurry images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12861–12870.
28. Park, H.; Lee, K.M. Joint estimation of camera pose, depth, deblurring, and super-resolution from a blurred image sequence. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4613–4621.
29. Lee, D.; Oh, J.; Rim, J.; Cho, S.; Lee, K.M. ExBluRF: Efficient radiance fields for extreme motion blurred images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 17639–17648.
30. Chen, W.; Liu, L. Deblur-GS: 3D Gaussian splatting from camera motion blurred images. Proc. ACM Comput. Graph. Interact. Tech. 2024, 7, 18.
31. Rim, J.; Lee, H.; Won, J.; Cho, S. Real-world blur dataset for learning and benchmarking deblurring algorithms. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 184–201.
32. Jin, X.; Jin, R.; Li, B.; Zou, D.; Yu, W. PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors. In Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, USA, 2–7 December 2025; Available online: https://openreview.net/forum?id=38GF07Tmtr (accessed on 5 January 2026).
33. Yang, L.; Kang, B.; Huang, Z.; Xu, X.; Feng, J.; Zhao, H. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 10371–10381.
34. Chen, H.; Li, C.; Lee, G.H. NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance. arXiv 2023, arXiv:2312.00846.
35. Fan, L.; Yang, Y.; Li, M.; Li, H.; Zhang, Z. Trim 3D Gaussian Splatting for Accurate Geometry Representation. arXiv 2024, arXiv:2406.07499.
36. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
37. Li, Z.; Müller, T.; Evans, A.; Taylor, R.H.; Unberath, M.; Liu, M.Y.; Lin, C.H. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8456–8465.
38. Wu, S.; Wen, W.; Wang, Y.; Fan, J.; Wang, C.; Gou, W.; Guo, X. MVS-Pheno: A portable and low-cost phenotyping platform for maize shoots using multiview stereo 3D reconstruction. Plant Phenomics 2020, 251, 73–88.
39. Ge, X.; Wu, S.; Wen, W.; Shen, F.; Xiao, P.; Lu, X.; Liu, H.; Zhang, M.; Guo, X. LettuceP3D: A tool for analysing 3D phenotypes of individual lettuce plants. Biosyst. Eng. 2025, 251, 73–88.
40. Yang, Z.; Han, Y. A low-cost 3D phenotype measurement method of leafy vegetables using video recordings from smartphones. Sensors 2020, 20, 6068.
41. Bloch, V.; Shapiguzov, A.; Kotilainen, T.; Pastell, M. A method for phenotyping lettuce volume and structure from 3D images. Plant Methods 2025, 21, 27.
42. Yang, S.; Zheng, L.; Gao, W.; Wang, B.; Hao, X.; Mi, J.; Wang, M. An efficient processing approach for colored point cloud-based high-throughput seedling phenotyping. Remote Sens. 2020, 12, 1540.
43. Chen, Q.; Huang, S.; Liu, S.; Zhong, M.; Zhang, G.; Song, L.; Zhang, X.; Zhang, J.; Wu, K.; Ye, Z.; et al. Multi-view 3D reconstruction of seedling using 2D image contour. Biosyst. Eng. 2024, 243, 130–147.

Figure 1. LV-3DGS overview.

Figure 2. The framework of the blurred reconstruction module. ① Motion Trajectory Estimation: a Bézier curve is parameterized from the initial camera poses and further refined during training based on the optimized sub-frame poses. ② Temporal Sampling: sub-frame camera poses are uniformly sampled along the estimated motion trajectory. ③ Operation Flow: the sampled sub-frame poses are used to render images via Gaussian splatting during training. ④ Gradient Flow: the loss $L_{Recon}$ between the reconstructed and input images is back-propagated to optimize both the sub-frame alignment parameters $V_{i}$ and the motion trajectory parameters. Steps ① to ④ are performed iteratively within each training batch.
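
The trajectory model in Figure 2 can be sketched as follows. This is a minimal illustration, assuming that only camera positions are interpolated (rotations, which would normally be interpolated on SO(3) or SE(3), are omitted) and that a blurred frame is the time average of sharp sub-frame renders over the exposure, the usual physical model of camera motion blur. The function names and the number of control points are illustrative choices, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bezier_point(control: Sequence[Vec3], t: float) -> Vec3:
    """Evaluate a Bézier curve of any degree at t in [0, 1] via de Casteljau's algorithm."""
    points: List[Vec3] = [tuple(p) for p in control]
    while len(points) > 1:
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points[:-1], points[1:])
        ]
    return points[0]


def sample_subframe_positions(control: Sequence[Vec3], num_subframes: int) -> List[Vec3]:
    """Uniform temporal sampling of camera positions across the exposure interval."""
    if num_subframes < 1:
        raise ValueError("num_subframes must be positive")
    if num_subframes == 1:
        return [bezier_point(control, 0.5)]
    return [bezier_point(control, k / (num_subframes - 1)) for k in range(num_subframes)]


def synthesize_blurred(subframe_images: Sequence[Sequence[float]]) -> List[float]:
    """Blurred frame as the per-pixel mean of sharp sub-frame renders (flattened images)."""
    n = len(subframe_images)
    return [sum(pixel) / n for pixel in zip(*subframe_images)]


# Four collinear, evenly spaced control points give constant-velocity motion along x.
control = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
for position in sample_subframe_positions(control, 4):
    print(position)
```

In a full pipeline, each sampled pose would be rendered by the splatting rasterizer, the renders averaged as in `synthesize_blurred`, and the loss against the captured blurry image back-propagated to the control points; that rendering step lies outside this sketch.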

Figure 3. Schematic diagram of the local plane assumption. The red dot represents the current point, and the yellow dots represent its local neighboring points.

Figure 4. Schematic diagram of the normal constraint, where $u$ denotes the camera center, $O_{c}$ the Gaussian center, $n_{i}$ the normal vector, $δ_{i}$ the distance, $d_{i}$ the depth, and $v$ the camera viewing direction.
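
The quantities in Figure 4 are related by the planar-Gaussian geometry used in planar 3DGS variants such as PGSR [23]: with the camera center at the origin of camera coordinates, a flattened Gaussian with unit normal $n_{i}$ and center $μ$ defines a plane at distance $δ_{i} = n_{i} \cdot μ$, and a pixel back-projected to the ray $r = K^{-1}[x, y, 1]^{T}$ meets that plane at depth $d_{i} = δ_{i} / (n_{i} \cdot r)$. Whether LV-3DGS uses exactly this parameterization is an assumption; the sketch below only illustrates the relation, with pinhole intrinsics named fx, fy, cx, cy for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def plane_distance(normal: Vec3, gaussian_center: Vec3, camera_center: Vec3) -> float:
    """Signed distance delta = n . (mu - o) from the camera center o to the Gaussian plane."""
    offset = tuple(m - o for m, o in zip(gaussian_center, camera_center))
    return dot(normal, offset)


def pixel_ray(px: float, py: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project a pixel to a camera-space ray normalised to z = 1, i.e. K^-1 [px, py, 1]^T."""
    return ((px - cx) / fx, (py - cy) / fy, 1.0)


def depth_from_plane(normal: Vec3, delta: float, ray: Vec3) -> float:
    """z-depth at which a z = 1 ray meets the plane n . X = delta (camera coordinates)."""
    denom = dot(normal, ray)
    if abs(denom) < 1e-8:
        raise ValueError("ray is (nearly) parallel to the Gaussian plane")
    return delta / denom


# A Gaussian plane tilted about the y-axis, seen from a camera at the origin.
normal = (0.6, 0.0, 0.8)  # unit length
delta = plane_distance(normal, (0.0, 0.0, 2.0), (0.0, 0.0, 0.0))  # 1.6
ray = pixel_ray(570.0, 240.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)  # (0.5, 0.0, 1.0)
print(depth_from_plane(normal, delta, ray))  # about 1.4545
```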

Figure 5. Schematic diagram of global geometric consistency.

Figure 6. Schematic diagram of phenotypic measurement. (a): the result after clustering, (b): the result after triangulation.
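
Once a leaf cluster has been triangulated (Figure 6b), its surface area follows from summing triangle areas, $A = \sum_{t} \frac{1}{2}\lVert (b_{t} - a_{t}) \times (c_{t} - a_{t}) \rVert$. The sketch below is a minimal example, assuming the mesh is given as vertex coordinates in centimetres plus index triples, and assuming plant height is the vertical extent of the point cloud along an axis already aligned with gravity; the paper's exact height definition and meshing pipeline may differ.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Half the norm of the cross product of two edge vectors."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ac = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    return 0.5 * math.sqrt(sum(x * x for x in cross(ab, ac)))


def mesh_area(vertices: Sequence[Vec3], faces: Sequence[Tuple[int, int, int]]) -> float:
    """Total surface area (cm^2 if vertices are in cm) of a triangle mesh."""
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


def plant_height(points: Sequence[Vec3], up_axis: int = 2) -> float:
    """Vertical extent of the point cloud; assumes up_axis is aligned with gravity."""
    heights = [p[up_axis] for p in points]
    return max(heights) - min(heights)


# A 2 cm x 5 cm rectangular leaf patch whose long side rises 4 cm, split into two triangles.
verts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 3.0, 4.0), (0.0, 3.0, 4.0)]
faces = [(0, 1, 2), (0, 2, 3)]
print(mesh_area(verts, faces))  # 10.0
print(plant_height(verts))      # 4.0
```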

Figure 7. The visualization result of the deblurring module.

Figure 8. The visualization results of different models on the dataset. The area bordered in red is enlarged on the right.

Figure 9. Comparison between manually measured ground-truth values and values calculated from the 3D model. CI denotes the confidence interval and SD the standard deviation.

Table 1. Quantitative comparison of different motion blur reconstruction models. The Blur value, computed using the FFT, indicates the degree of motion blur: the smaller the value, the more blurred the scene. Arrows (↑ or ↓) indicate the direction of better performance.

Scene	Blur = 9.562			Blur = 7.065			Blur = 5.314
Method	PSNR ↑	SSIM ↑	LPIPS ↓	PSNR ↑	SSIM ↑	LPIPS ↓	PSNR ↑	SSIM ↑	LPIPS ↓
3DGS+Restormer	23.063	0.723	0.326	22.561	0.702	0.335	21.548	0.684	0.384
Deblur-NeRF	26.145	0.826	0.261	26.174	0.813	0.269	25.145	0.783	0.287
Ours	28.471	0.889	0.213	28.554	0.886	0.226	27.298	0.874	0.242
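
The caption of Table 1 states only that the Blur value is computed with the FFT and that smaller values mean stronger blur. One widely used FFT-based measure with that behaviour is sketched below as an assumption, not as the authors' exact formula: low frequencies are removed from the centred spectrum, the image is reconstructed from the remaining high frequencies, and the mean log-magnitude of that residual is reported. The radius parameter and the log scaling are illustrative choices, so absolute values will not match those in Table 1.

```python
import numpy as np


def fft_blur_score(gray: np.ndarray, radius: int = 30) -> float:
    """FFT high-frequency score; lower values indicate a blurrier image (assumed definition)."""
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    if not 0 < radius < min(cy, cx):
        raise ValueError("radius must be positive and smaller than half the image size")
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Zero a square of low frequencies around the centred DC component.
    spectrum[cy - radius:cy + radius, cx - radius:cx + radius] = 0
    residual = np.fft.ifft2(np.fft.ifftshift(spectrum))
    magnitude = 20.0 * np.log(np.abs(residual) + 1e-8)
    return float(np.mean(magnitude))


rng = np.random.default_rng(0)
sharp = rng.random((128, 128))
# Horizontal 9-tap box filter as a stand-in for camera motion blur.
blurred = sum(np.roll(sharp, s, axis=1) for s in range(-4, 5)) / 9.0
print(fft_blur_score(sharp, radius=16), fft_blur_score(blurred, radius=16))
```

The blurred image keeps less high-frequency energy, so its score is lower, matching the "smaller means more blurred" convention of Table 1.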

Table 2. Rendering quality metrics of different models on the dataset. Training time is given as hours.minutes.seconds (h.m.s). Arrows (↑ or ↓) indicate the direction of better performance.

Method	PSNR ↑	SSIM ↑	LPIPS ↓	GC ↓	Train Time (h.m.s) ↓
NeRF	27.485	0.841	0.289	3.487	03.45.18
Neuralangelo	28.761	0.875	0.256	2.032	13.25.41
3DGS	31.836	0.907	0.181	1.883	00.11.34
SuGaR	29.517	0.884	0.245	1.374	01.35.03
GOF	31.145	0.907	0.164	0.512	02.13.23
2DGS	31.049	0.909	0.175	0.745	00.20.36
PGSR	33.615	0.921	0.171	0.583	00.45.08
LV-3DGS (ours)	34.533	0.941	0.115	0.317	00.10.20

Table 3. The evaluation metrics for different loss hyperparameter configurations. Arrows (↑ or ↓) indicate the direction of better performance.

$λ_{1}$ : $λ_{2}$ : $λ_{3}$	PSNR ↑	SSIM ↑	LPIPS ↓	GC ↓
1.0:1.0:1.0	33.418	0.933	0.135	0.514
1.0:1.0:0.0	32.204	0.917	0.149	1.253
1.0:0.0:1.0	32.316	0.921	0.143	0.997
1.0:1.0:0.8	33.164	0.931	0.138	0.664
1.0:1.0:1.2	34.533	0.940	0.115	0.317
1.0:1.0:1.4	34.159	0.938	0.125	0.597
1.0:0.8:1.2	34.195	0.934	0.119	0.446
1.0:0.6:1.2	33.887	0.925	0.138	0.751

Table 4. The evaluation metrics for different pruning hyperparameter configurations. Arrows (↑ or ↓) indicate the direction of better performance.

Value of $γ$	PSNR ↑	SSIM ↑	LPIPS ↓	GC ↓
$γ = 0.75$	33.015	0.931	0.129	0.532
$γ = 0.50$	33.157	0.935	0.123	0.416
$γ = 0.25$	34.533	0.940	0.115	0.317
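
Table 4 varies the pruning hyperparameter γ of the contribution-based pruning strategy without restating its definition. The sketch below is purely illustrative and rests on a hypothetical reading: each Gaussian's contribution is its blending weight $α_{k}T_{k}$ summed over all pixels it covers, and γ is treated as the fraction of lowest-contribution Gaussians to remove. Both the contribution definition and the quantile rule are assumptions for illustration, not the method of Section 3.3.

```python
from typing import List, Sequence, Tuple

# For each pixel: the Gaussians it sees, front to back, as (gaussian_id, alpha).
PixelHits = Sequence[Tuple[int, float]]


def accumulate_contributions(num_gaussians: int, pixels: Sequence[PixelHits]) -> List[float]:
    """Sum each Gaussian's blending weight alpha * T over every pixel it touches."""
    contribution = [0.0] * num_gaussians
    for hits in pixels:
        transmittance = 1.0
        for gaussian_id, alpha in hits:
            contribution[gaussian_id] += alpha * transmittance
            transmittance *= 1.0 - alpha
    return contribution


def keep_mask(contribution: Sequence[float], gamma: float) -> List[bool]:
    """Hypothetical rule: drop Gaussians below the gamma-quantile of contribution."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    ranked = sorted(contribution)
    threshold = ranked[int(gamma * len(ranked))]
    return [c >= threshold for c in contribution]


pixels = [
    [(0, 0.5), (1, 0.5)],  # Gaussian 1 sits behind Gaussian 0
    [(0, 0.9), (2, 0.5)],  # Gaussian 2 is almost fully occluded
    [(3, 0.1)],            # Gaussian 3 is faint
]
contribution = accumulate_contributions(4, pixels)
print(contribution)                   # roughly [1.4, 0.25, 0.05, 0.1]
print(keep_mask(contribution, 0.25))  # [True, True, False, True]
```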

Table 5. The evaluation metrics for different methods. (m.s) represent minutes and seconds, respectively. Arrows (↑ or ↓) indicate the direction of better performance.

Method	PSNR ↑	SSIM ↑	LPIPS ↓	GC ↓	Train Time (m.s) ↓
baseline (3DGS)	31.836	0.907	0.181	1.883	11.34
3DGS + PDGI	33.124	0.924	0.169	1.574	13.45
3DGS + Flattening + NC	33.887	0.925	0.169	0.599	43.03
3DGS + PDGI + Flattening + NC	34.559	0.943	0.124	0.507	48.31
3DGS + MDR	33.251	0.925	0.179	1.556	11.31
3DGS + GP	32.245	0.917	0.190	0.335	08.12
LV-3DGS	34.533	0.940	0.115	0.317	10.20

Table 6. Comparison of the proposed method with previous studies on phenotypic measurement. (Height is in cm and Area in cm². T and S indicate that the Area indicator refers to the total leaf surface area and the surface area of a single leaf, respectively.)

Crop Type	Indicator	Method for 3D Information Acquisition	$R^{2}$	RMSE
Leafy vegetables [40]	Height	Multi-view stereo	0.937	1.82
Leafy vegetables [40]	Number of leaves	Multi-view stereo	0.730	1.57
Lettuce [41]	Height	RGBD	0.820	6.20
Cucumber seedling [42]	Height	Kinect-V2	0.982	2.30
Cucumber seedling [42]	Area	Kinect-V2	0.892	84.24 (T)
Lettuce [39]	Height	Multi-view stereo	0.961	0.75
Lettuce [39]	Number of leaves	Multi-view stereo	0.979	0.40
Lettuce [39]	Area	Multi-view stereo	0.856	1.89 (S)
Maize shoots [38]	Height	Multi-view stereo	0.998	2.96
Maize shoots [38]	Area	Multi-view stereo	0.930	75.03 (T)
Seedling [43]	Height	LiDAR	0.980	2.30
Seedling [43]	Area	LiDAR	0.660	1.05 (S)
Leafy vegetables [1]	Height	Binocular vision	0.994	0.38
Leafy vegetables [1]	Number of leaves	Binocular vision	0.957	0.96
Leafy vegetables [1]	Area	Binocular vision	0.977	21.33 (T)
Leafy vegetables [Ours]	Height	3D Gaussian Splatting	0.9959	0.33
Leafy vegetables [Ours]	Number of leaves	3D Gaussian Splatting	0.9651	0.85
Leafy vegetables [Ours]	Area	3D Gaussian Splatting	0.9895	14.78 (T)
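
The $R^{2}$ and RMSE values in Table 6 compare model-derived measurements with manual ground truth. Because the table does not state which form of $R^{2}$ is used, the sketch below shows the two usual ones: the coefficient of determination against the identity line $y = x$, and the squared Pearson correlation, which equals the $R^{2}$ of a least-squares line fitted to the measurements. RMSE carries the unit of the indicator (cm for height, cm² for area). The sample values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between manual and model-derived values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def r2_identity(truth: Sequence[float], pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot, with residuals taken against y = x."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def r2_pearson(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation (R^2 of a least-squares regression line)."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_t * var_p)


manual = [10.0, 12.5, 15.0, 18.0]  # illustrative plant heights in cm
model = [10.2, 12.3, 15.4, 17.7]
print(rmse(manual, model), r2_identity(manual, model), r2_pearson(manual, model))
```

A systematic scale or offset error (for example, model = 2 × manual + 1) leaves the Pearson-based value at 1 while the identity-based value drops sharply, so stating which definition is used matters when comparing against the studies listed in Table 6.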

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, X.; Zhong, J.; Lin, K.; Wu, J.; Chen, J.; Zhu, H. LV-3DGS: A High-Quality Reconstruction Method Based on 3D Gaussian Splatting for Precise Phenotypic Measurement of Leafy Vegetables. Agriculture 2026, 16, 1111. https://doi.org/10.3390/agriculture16101111
