Article

NeRF-RE: An Improved Neural Radiance Field Model Based on Object Removal and Efficient Reconstruction

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
*
Author to whom correspondence should be addressed.
Information 2025, 16(8), 654; https://doi.org/10.3390/info16080654
Submission received: 17 June 2025 / Revised: 29 July 2025 / Accepted: 29 July 2025 / Published: 31 July 2025
(This article belongs to the Special Issue Extended Reality and Its Applications)

Abstract

High-quality green gardens can markedly enhance the quality of life and mental well-being of their users. However, health and lifestyle constraints make it difficult for many people to visit urban gardens in person, and traditional methods struggle to offer the high-fidelity experiences they need. This study introduces a 3D scene reconstruction and rendering strategy based on implicit neural representation: the efficient and removable neural radiance fields model (NeRF-RE). Building on neural radiance fields (NeRF), the model incorporates a multi-resolution hash grid and a proposal network to improve training efficiency and modeling accuracy, while integrating a segment-anything model to safeguard public privacy. The crabapple tree, extensively used in urban garden design across temperate regions of the Northern Hemisphere, serves as the case study: a dataset comprising 660 images of crabapple trees exhibiting three distinct geometric forms was collected to assess the NeRF-RE model's performance. The results demonstrate that the 'harvest gold' crabapple scene had the highest reconstruction accuracy, with a PSNR, LPIPS and SSIM of 24.80 dB, 0.34 and 0.74, respectively. Compared with the Mip-NeRF 360 model, NeRF-RE not only achieved an up to 21-fold increase in training efficiency for the three types of crabapple trees, but was also less sensitive to dataset size in terms of reconstruction accuracy. By reconstructing real scenes with high fidelity for virtual reality, this work both enables people to enjoy the beauty of natural gardens at home and contributes to the publicity and promotion of urban landscapes.

1. Introduction

The immersive capabilities of virtual reality (VR) technology can effectively regulate emotions, simulate natural interactions, and reduce psychological stress [1,2]. Clinically, nature-integrated VR provides a viable alternative to analgesics [3]. However, current urban landscape virtualization depends on conventional 3D techniques [4] with inherent limitations—low fidelity, high costs, and limited applicability [5]—that hinder public accessibility. This necessitates the development of boundary-free, cost-effective, high-precision 3D reconstruction of urban gardens. Such an advance would bring nature immersion to more people, enhancing mental health, alleviating physical discomfort, and informing scientific urban planning.
Traditional 3D reconstruction methods for unbounded tree scenes—using point clouds or photogrammetry—fail to meet societal demands due to high costs and low fidelity. Point cloud methods (e.g., LiDAR, laser scanning) suffer from expensive equipment, uneven density, and noise [6,7]. Image-based techniques are compromised by branch occlusion and texture repetition [8,9]. To overcome these limitations, recent studies have leveraged implicit neural representations to achieve high-fidelity, low-cost reconstruction of scenes such as plants, furniture, and buildings [10,11], demonstrating significant advantages [12,13].
Neural radiance fields (NeRF), an implicit 3D representation model, enables the reconstruction of urban ornamental plants against complex backgrounds. Unlike explicit representations (e.g., grids, point clouds) that suffer from rendering artifacts such as aliasing [11], NeRF encodes continuous scene geometry and color via multilayer perceptrons (MLPs), achieving high-fidelity reconstruction [14,15]. Nevertheless, NeRF-based urban garden reconstruction remains understudied. Key challenges include balancing geometric precision with visual realism, optimizing training efficiency, and ensuring accurate texture mapping [5]. Recent work addresses these challenges through improved sampling techniques [16,17] and the integration of object recognition models such as SAM [18].
This study proposes NeRF-RE, an efficient and removable neural radiance fields model built on the NeRF framework, enabling low-cost, high-fidelity 3D reconstruction of unbounded scenes in complex natural environments. Focusing on Malus spp. (a key temperate-zone ornamental genus), we collected images of three geometric variants at the China National Botanical Garden. The objectives are as follows:
(1)
Comparing NeRF-RE's robustness with that of traditional methods in urban crabapple reconstruction.
(2)
Analyzing how the geometric variation of ornamental crabapples affects unbounded scene reconstruction.
(3)
Quantifying the effect of RGB image quantity on reconstruction quality.

2. Materials and Methods

2.1. Neural Radiance Fields Model

The neural radiance fields (NeRF) model was initially introduced by Mildenhall et al. for novel view synthesis [14]. NeRF's superior ability to represent 3D information has made it widely used for novel view synthesis and 3D reconstruction. A NeRF model represents a three-dimensional scene as a radiance field approximated by a neural network; the radiance field describes the color and volume density at each point in the scene for each viewing direction:
$$F : (\mathbf{X}, \theta, \phi) \rightarrow (\mathbf{c}, \sigma) \quad (1)$$

$$C(\mathbf{r}) = \int_{t_1}^{t_2} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt \quad (2)$$

where $\mathbf{X} = (x, y, z)$ denotes the coordinates in the scene; $\theta$ and $\phi$ are the azimuth and polar viewing angles; $\mathbf{c} = (r, g, b)$ is the color; $\sigma$ is the volume density; $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is the camera ray; $\sigma(\mathbf{r}(t))$ and $\mathbf{c}(\mathbf{r}(t), \mathbf{d})$ are the volume density and color of the camera ray at the point $\mathbf{r}(t)$ viewed along direction $\mathbf{d}$; $\mathbf{o}$ is the camera position; and $dt$ is the differential distance of the ray in each integration step.
$T(t)$ denotes the cumulative transmittance, i.e., the probability that the ray travels from $t_1$ to $t$ without being blocked:

$$T(t) = \exp\left(-\int_{t_1}^{t} \sigma(\mathbf{r}(u))\, du\right) \quad (3)$$
Novel view synthesis renders the color $C(\mathbf{r})$ of the camera ray passing through each pixel of the target image. A stratified sampling approach divides each ray into N equally spaced bins and draws one uniform sample from each bin, so Equation (2) can be approximated as follows:
$$\hat{C}(\mathbf{r}) = \sum_{k=1}^{K} \hat{T}(t_k)\, \big(1 - \exp(-\sigma(\mathbf{X}_k)\,\delta_k)\big)\, \mathbf{c}(\mathbf{X}_k, \mathbf{d}) \quad (4)$$

$$\hat{T}(t_k) = \exp\left(-\sum_{k'=1}^{k-1} \sigma(\mathbf{X}_{k'})\,\delta_{k'}\right) \quad (5)$$

where $\delta_k$ is the distance from sample $k$ to sample $k+1$, and $\mathbf{X}_k$ is the $k$-th sampling point along the ray.
The expected depth along a ray can be calculated using the cumulative transmittance:

$$d(\mathbf{r}) = \int_{t_1}^{t_2} T(t)\, \sigma(\mathbf{r}(t))\, t\, dt \quad (6)$$
For each pixel, a squared-error photometric loss is used to optimize the MLP parameters:

$$\mathcal{L} = \sum_{\mathbf{r} \in R} \left\| \hat{C}(\mathbf{r}) - C_{gt}(\mathbf{r}) \right\|_2^2 \quad (7)$$

where $C_{gt}(\mathbf{r})$ is the ground-truth color of the training image pixel associated with ray $\mathbf{r}$, and $R$ is the set of rays associated with the image to be synthesized.
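For readers who prefer code to notation, the discrete quadrature of Equations (4) and (5) and the photometric loss of Equation (7) can be written in a few lines of PyTorch. The sketch below is illustrative only; the function names and tensor layout are our assumptions rather than the authors' released implementation.

```python
# Minimal sketch of the discrete volume-rendering quadrature (Eqs. 4-5) and the
# photometric loss (Eq. 7), assuming per-ray samples have already been drawn.
import torch

def render_rays(sigma, rgb, deltas):
    """Composite per-sample densities and colors into per-ray colors.

    sigma:  (R, K)    volume densities at the K samples of each of R rays
    rgb:    (R, K, 3) predicted colors at those samples
    deltas: (R, K)    distances between consecutive samples (delta_k)
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)                  # opacity of each segment
    # Cumulative transmittance T_k = prod_{j<k} (1 - alpha_j), Eq. (5)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
    )[:, :-1]
    weights = trans * alpha                                   # per-sample contribution
    color = (weights.unsqueeze(-1) * rgb).sum(dim=1)          # Eq. (4): estimated C(r)
    return color, weights

def photometric_loss(pred_rgb, gt_rgb):
    """Squared-error photometric loss of Eq. (7), averaged over a batch of rays."""
    return ((pred_rgb - gt_rgb) ** 2).sum(dim=-1).mean()
```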

2.2. The NeRF-RE Model

Addressing the limitations of traditional 3D reconstruction (labor-intensive workflows and sparse point cloud artifacts), we develop NeRF-RE, a NeRF-based high-fidelity model that enables low-cost reconstruction of borderless scenes from multi-view images. Its variant NeRF-E omits the object removal capability. The workflow (Figure 1) comprises the following steps:
1. Data processing: In this study, we captured three types of crabapple trees (Malus 'harvest gold', Malus halliana Koehne, and Malus 'sparkler') in the Crabapple Garden of the China National Botanical Garden in April 2024, using a Honor smartphone (Honor Terminal Co., Ltd., Shenzhen, China) to collect 360° imagery (Figure 2).
Figure 2. Three different geometric forms of crabapple trees.
Each type of crabapple tree is represented by 220 RGB images, which include elements such as trees, roads, pavilions, visitors, and staff members. To further enhance scene reconstruction accuracy, we also collected 80 sets of label information closely associated with the scene elements. These labels improve the recognition and segmentation of scene components, while the image data provide the essential color and density information for scene reconstruction. Finally, we divided the image data and label data into training and validation sets at an 8:2 ratio, yielding 176 training images and 44 test images per scene. The characteristics of the three scenes are as follows.
Malus ‘harvest gold’ tree (less occluded type): characterized by fewer branches, abundant leaves, and a dwarf stature, it is considered a small vegetation type.
Malus halliana koehne tree (medium occluded type): featuring a variety of trunks and branches, mutual branch shielding, and a large size, it serves as an ideal example of medium-density plant geometry.
Malus ‘sparkler’ tree (dense occlusion type): with numerous trunks and branches, and a highly complex geometric topology of disordered distribution, it presents a dense occlusion. The sparse leaves and dense petals obscure each other’s features, creating a highly complex and dense environment that challenges the distinction of individual components.
The dataset encompasses images of visitors, staff, and carried items collected during the data acquisition process. To protect public privacy and maintain the integrity of the scene, this study introduces the segment anything model (SAM) [19] and the large mask inpainting (LaMa) model [20]. Specifically, the coordinates of interfering targets within the scene are extracted using the ResNet50 model [21] and the feature pyramid network [22]. Subsequently, the prompt encoder analyzes and processes this coordinate information. Next, the vision transformer (ViT)-based image encoder and transformer-based mask decoder execute classification and segmentation of the interfering targets. Lastly, the LaMa model fills in and restores the areas where removed objects had caused disturbances.
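A minimal sketch of this removal step is given below, assuming the official segment-anything package. The upstream ResNet50 + FPN detector is abstracted as a list of target coordinates, and OpenCV inpainting is substituted for LaMa purely so the example runs end to end; the actual pipeline uses the LaMa model.

```python
# Illustrative privacy-preserving removal step (not the authors' published code).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def remove_interfering_targets(image, target_points, sam_checkpoint):
    """image: HxWx3 uint8 RGB; target_points: (N, 2) pixel coordinates of people/objects
    supplied upstream by a detector (ResNet50 + FPN in the paper)."""
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)

    union_mask = np.zeros(image.shape[:2], dtype=bool)
    for point in target_points:
        # Prompt SAM with one detected coordinate; keep the highest-scoring mask.
        masks, scores, _ = predictor.predict(
            point_coords=point[None, :].astype(np.float32),
            point_labels=np.array([1]),
            multimask_output=True,
        )
        union_mask |= masks[int(scores.argmax())]

    # Stand-in inpainting: the paper uses LaMa; classical OpenCV inpainting is used
    # here only so the sketch is runnable without the LaMa weights.
    mask_u8 = union_mask.astype(np.uint8) * 255
    return cv2.inpaint(image, mask_u8, 5, cv2.INPAINT_TELEA)
```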
2. Scene generation: Reconstructing unbounded tree scenes requires accurate rendering of both focal targets and complex surroundings, which substantially reduces training efficiency. Our solution integrates the following two innovations:
(1)
Instant-NGP’s occupancy grid rapidly identifies empty areas via multi-resolution hashing of ray origin/direction data, skipping non-essential regions through binary density assessment;
(2)
Mip-NeRF 360's proposal network refines transmittance estimates via importance sampling. Using probability density functions (PDFs), it samples critical regions via inverse transform sampling, optimizing color/density predictions along rays (a minimal sketch of this sampling step follows the list).
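The sketch below illustrates the inverse-transform sampling used by such a proposal stage: coarse weights along a ray are normalized into a PDF, a CDF is built, and new samples are drawn where the weights concentrate. Function and variable names are our assumptions, not code from Mip-NeRF 360 or NeRF-RE; in Mip-NeRF 360 the proposal weights themselves come from a small MLP trained online, which is omitted here.

```python
# Inverse-transform sampling of new ray samples from proposal weights (PyTorch sketch).
import torch

def sample_pdf(bins, weights, n_samples):
    """bins: (R, K+1) bin edges along each ray; weights: (R, K) proposal weights."""
    pdf = (weights + 1e-5) / (weights + 1e-5).sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)          # (R, K+1)

    u = torch.rand(weights.shape[0], n_samples, device=weights.device)    # uniform draws
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    cdf_lo, cdf_hi = torch.gather(cdf, 1, idx - 1), torch.gather(cdf, 1, idx)
    bin_lo, bin_hi = torch.gather(bins, 1, idx - 1), torch.gather(bins, 1, idx)

    t = (u - cdf_lo) / torch.clamp(cdf_hi - cdf_lo, min=1e-5)             # position in bin
    return bin_lo + t * (bin_hi - bin_lo)                                 # new sample depths
```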
Neural network training and computations in this study were conducted on the Ubuntu platform (Ubuntu 20.04). Graphics processing and deep learning tasks were performed using an NVIDIA A100 GPU (A100 80GB PCIe) within the PyTorch framework (Torch 2.0.1). This high-performance configuration ensures the efficient execution of data-intensive and neural network operations.

2.3. Model Evaluation

In this study, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) are used to evaluate model reconstruction results.
PSNR is a full-reference quality assessment index that measures the pixel-level difference between reconstructed images and ground-truth images:
$$PSNR(I) = 10 \log_{10}\!\left(\frac{MAX_I^2}{MSE(I)}\right) \quad (8)$$

where $MAX_I$ is the maximum possible pixel value in the image and $MSE(I)$ is the pixel-level mean squared error computed over all color channels.
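For example, with images scaled to [0, 1] (so $MAX_I = 1$), Equation (8) reduces to the following few lines; the function and argument names are ours.

```python
# Direct implementation of Eq. (8) for float images in [0, 1].
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_value: float = 1.0) -> float:
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)  # over all channels
    return 10.0 * np.log10(max_value ** 2 / mse)
```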
SSIM is a full-reference quality evaluation index. Image distortion is modeled as a combination of three factors: luminance, contrast, and structure. It uses the mean as an estimate of luminance, the standard deviation as an estimate of contrast, and the covariance as a measure of structural similarity:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \quad (9)$$

$$\mu_x = \sum_i w_i x_i \quad (10)$$

$$\sigma_x = \Big(\sum_i w_i (x_i - \mu_x)^2\Big)^{1/2} \quad (11)$$

$$\sigma_{xy} = \sum_i w_i (x_i - \mu_x)(y_i - \mu_y) \quad (12)$$

where $C_i = (K_i L)^2$, $L$ is the dynamic range of the pixel values, $K_1 = 0.01$, $K_2 = 0.03$, and $x_i$, $y_i$ are pixels sampled from the reference image and the evaluation image, respectively; $w_i$ are the weights of the sampling window.
LPIPS is a full-reference quality assessment index that uses learned convolutional features. The score is calculated as a weighted pixel-level MSE over multi-layer feature maps:
$$LPIPS(x, y) = \sum_{l} \frac{1}{H_l W_l} \sum_{h, w} \left\| w_l \odot \big( \hat{x}^l_{hw} - \hat{y}^l_{hw} \big) \right\|_2^2 \quad (13)$$

where $\hat{x}^l_{hw}$ and $\hat{y}^l_{hw}$ are the features of the reference and evaluation images at pixel width $w$, pixel height $h$, and layer $l$; $H_l$ and $W_l$ are the height and width of the feature map at layer $l$; and $w_l$ is the learned channel-wise weight at layer $l$.
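In practice, SSIM and LPIPS are usually computed with existing libraries rather than re-implemented. The sketch below uses scikit-image and the lpips package; this choice of libraries is our assumption and is not stated in the paper.

```python
# Typical library-based computation of SSIM and LPIPS (assumed tooling, not the paper's).
import lpips
import numpy as np
import torch
from skimage.metrics import structural_similarity

def ssim_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: HxWx3 float images in [0, 1]."""
    return structural_similarity(pred, gt, channel_axis=-1, data_range=1.0)

def lpips_score(pred: np.ndarray, gt: np.ndarray, net: str = "vgg") -> float:
    """LPIPS expects NCHW tensors scaled to [-1, 1]."""
    loss_fn = lpips.LPIPS(net=net)
    to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1).unsqueeze(0).float() * 2 - 1
    with torch.no_grad():
        return float(loss_fn(to_tensor(pred), to_tensor(gt)))
```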

3. Results

3.1. Performance Evaluation of Scene Reconstruction and Rendering

As Table 1 demonstrates, the proposed NeRF-E and NeRF-RE models significantly outperform Mip-NeRF 360 across all three crabapple tree scenes (Malus 'harvest gold', Malus halliana Koehne, and Malus 'sparkler') in terms of both reconstruction quality and training efficiency. The key findings can be summarized as follows. First, regarding reconstruction quality, both proposed models achieved consistently higher PSNR (>21.12 dB vs. <21.03 dB), lower LPIPS (<0.39 vs. >0.45), and higher SSIM (>0.58 vs. <0.50) across all species compared with Mip-NeRF 360. Peak performance occurred with Malus 'harvest gold', where the models reached a PSNR of 24.80 dB and an SSIM of 0.74. Second, regarding computational efficiency, training times were reduced by 95.3% (0.51–0.59 h vs. 10.91–10.93 h), an approximately 21× acceleration over Mip-NeRF 360. Furthermore, NeRF-E and NeRF-RE exhibited minimal performance variance (ΔPSNR ≤ 0.18 dB across species), strongly suggesting stable scene generalization capabilities. Table 2 compares the structure of the proposed models with that of other related models.
As demonstrated in Figure 3, variations in input image count exhibited minimal impact on NeRF-RE’s rendering quality while degrading Mip-NeRF 360’s performance. Comparative analysis across all three crabapple tree types revealed NeRF-RE’s consistent superiority, as shown by the following: (1) for PSNR (Figure 3a), it exceeded 20 dB in 93% of cases (vs. Mip-NeRF 360’s 47%) with minimum values of 23.45 dB, 21.46 dB, and 19.29 dB; (2) regarding LPIPS (Figure 3b), 87% of NeRF-RE renders scored <0.4 (indicating higher perceptual quality) versus Mip-NeRF 360’s consistent >0.4 results, while NeRF-RE’s maximum LPIPS values remained at 0.37, 0.39, and 0.44; and (3) in SSIM evaluations (Figure 3c), NeRF-RE surpassed 0.5 in 93% of instances (vs. Mip-NeRF 360’s <30%) and maintained minimum SSIM values of 0.69, 0.61, and 0.45. These results collectively confirm NeRF-RE’s robustness to input image variation across all quality metrics.

3.2. Recognition and Segmentation Results of Different Conditions

The NeRF-RE model demonstrates robust multi-object recognition and segmentation capabilities across varied environments, as quantified in Table 3. Analysis reveals three core competencies: (1) precise segmentation of human subjects (visitors/staff) including carried objects and tools; (2) effective identification of targets under partial-to-dense occlusion; and (3) simultaneous processing of single or multiple targets. Performance under challenging conditions confirms exceptional resilience; in Malus ‘harvest gold’ scenarios with multiple partially occluded targets (Figure 4), the model achieved 76–99% human recognition accuracy and 75% small cart identification. While maintaining 70% accuracy for single occluded humans in the Malus halliana koehne tree scene (Figure 4), it delivered 87% accuracy in locally dense occlusion scenarios within the Malus ‘sparkler’ tree scene. These findings collectively establish the model’s robustness in complex scenes characterized by variable target density and partial visibility.

3.3. Rendering Results for Different Scenes

As depicted in Figure 5, the NeRF-RE model demonstrated superior rendering performance compared to the Mip-NeRF 360 model across the three types of crabapple tree scenes. The Mip-NeRF 360 rendering of the Malus halliana Koehne tree scene exhibited significant branch damage. In addition, its renderings of the Malus 'harvest gold' and Malus 'sparkler' tree scenes showed small areas of broken branches, leaves, and petals, as well as blurring. In contrast, the NeRF-RE model maintained stable, high-definition performance across the three crabapple tree scenes, with no damage observed to branches or leaves. In the normal map results, the Mip-NeRF 360 model over-emphasized the simulation of the surroundings of the crabapple trees, particularly in the Malus halliana Koehne and Malus 'harvest gold' scenes, whereas the NeRF-RE model concentrated more on simulating the details of the crabapple trees themselves.

4. Discussion

4.1. The Number of Images Exerts a Minor Impact on the Performance of the NeRF-RE Model When Rendering Boundless Scenes

As shown in Figure 3, the number of images did not significantly affect the NeRF-RE model's reconstruction of the unbounded crabapple tree scenes. The Mip-NeRF 360 model requires dense viewpoints and multi-angle camera poses to achieve high-fidelity scene rendering, which may stem from limitations inherent in the original NeRF architecture [23,24,25,26]. However, insufficient input images lead to inadequate feature extraction in Mip-NeRF 360. This manifests visually as object floating and background blurring artifacts in the rendered results (Figure 5). Notably, training the Mip-NeRF 360 model with 60 images of the Malus 'sparkler' tree resulted in a rendered output with PSNR, LPIPS, and SSIM values of 16.67 dB, 0.59, and 0.22, respectively, which is unsuitable for real-world applications. Conversely, the number of images had little impact on the NeRF-RE model's reconstruction and rendering of the three crabapple tree scenes with varying levels of complexity. This improvement may stem from the integration of an identification and segmentation model [19,20,21,22] in NeRF-RE, which enables precise extraction and segmentation of target features within scenes, effectively enhancing the model's capability to extract image feature information (Table 2). Furthermore, the object removal and inpainting module reduces the presence of irrelevant scene elements, thereby decreasing the complexity of scene reconstruction. Finally, the multi-resolution hash grid and proposal network establish a feature pyramid network [22] with similar multi-scale characteristics. This pyramid architecture efficiently captures cross-scale contextual information, improves the model's training efficiency [27], and significantly enhances its capacity to localize and represent tree features within scenes. Relative to the Mip-NeRF 360 model, the NeRF-RE model demonstrates markedly superior rendering performance under data-limited conditions. This outcome underscores the enhanced practicability and robustness of the NeRF-RE model.

4.2. Significant Impact of Tree Geometric Complexity on Boundless Scenes Reconstruction

Increased tree occlusion significantly impacts the reconstruction of unbounded scenes. As depicted in Figure 5, the densely complex Malus 'sparkler' tree scene exhibits the poorest reconstruction and rendering quality, whereas the unobstructed Malus 'harvest gold' tree scene shows the best performance. This may stem from the limitations of the Mip-NeRF 360 model, which requires an extended training time [16] to improve the reconstruction and rendering of the three unbounded scenes. Moreover, the Malus 'sparkler' tree, characterized by its intricate branches, abundant flowers, and high occlusion, leads to inadequate feature extraction by the model [10,11]. Consequently, the rendered background is significantly defocused, with branches and flowers appearing severely fragmented or missing (Figure 5). In contrast, the NeRF-RE model leverages a multi-resolution hash grid to bypass regions that do not contribute to the final rendering [28]. The proposal network facilitates deep extraction of regions with significant contributions [29]. The SAM and LaMa models are employed to eliminate redundant targets, thereby reducing the complexity of the scene's internal elements. Ultimately, the NeRF-RE model markedly enhances training efficiency, reducing training time by up to 21-fold, while ensuring the accurate reconstruction and rendering of unbounded crabapple tree scenes. This approach maximizes the integrity and fidelity of unbounded crabapple tree scenes while ensuring visitor and staff privacy.

5. Conclusions

Traditional virtual garden scenes suffer from high costs, low reconstruction efficiency, and poor fidelity, culminating in a suboptimal urban garden experience. To overcome these challenges, this study proposes an innovative solution, the NeRF-RE model, based on implicit neural representation. The model incorporates a multi-resolution hash grid and a proposal network to improve training efficiency and modeling accuracy, while also optimizing scene targets. Experimental results demonstrate that the NeRF-RE model achieves realistic rendering across virtual scenes ranging from unobstructed to medium-density and densely complex environments. Furthermore, the NeRF-RE model surpasses the baseline Mip-NeRF 360 model in PSNR, SSIM, and LPIPS metrics. Compared to Mip-NeRF 360, NeRF-RE reduces training time by up to 21-fold, from 10.91 h to 0.51 h. Ultimately, the boundary-free virtual scenes reconstructed by the NeRF-RE model facilitate people's immersive experience of the natural environment, aid in mental stress adjustment, and support physical and mental health maintenance. This research holds significant implications for the advancement of urban landscape promotion, design, and planning.

Author Contributions

Z.L.: conceptualization, methodology, writing—original draft. Y.H.: writing—review and editing. Q.M.: investigation. S.D.: investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Chan, S.H.M.; Qiu, L.; Esposito, G.; Mai, K.P. Vertical greenery buffers against stress: Evidence from psychophysiological responses in virtual reality. Landsc. Urban Plan. 2021, 213, 104127. [Google Scholar] [CrossRef]
  2. Mattila, O.; Korhonen, A.; Pöyry, E.; Hauru, K.; Holopainen, J.; Parvinen, P. Restoration in a virtual reality forest environment. Comput. Hum. Behav. 2020, 107, 106295. [Google Scholar] [CrossRef]
  3. Tanja-Dijkstra, K.; Pahl, S.; White, M.P.; Auvray, M.; Stone, R.J.; Andrade, J.; Mills, I.; Moles, D.R. The soothing sea: A virtual coastal walk can reduce experienced and recollected pain. Environ. Behav. 2018, 50, 599–625. [Google Scholar] [CrossRef] [PubMed]
  4. Cao, Z.; Wang, Y.; Zheng, W.; Yin, L.; Tang, Y.; Miao, W.; Liu, S.; Yang, B. The algorithm of stereo vision and shape from shading based on endoscope imaging. Biomed. Signal Process. Control. 2022, 76, 103658. [Google Scholar] [CrossRef]
  5. Li, Y.; Kan, J. CGAN-Based Forest Scene 3D Reconstruction from a Single Image. Forests 2024, 15, 194. [Google Scholar] [CrossRef]
  6. Liu, Y.; Guo, J.; Benes, B.; Deussen, O.; Zhang, X.; Huang, H. TreePartNet: Neural decomposition of point clouds for 3D tree reconstruction. ACM Trans. Graph. TOG 2021, 40, 232:1–232:16. [Google Scholar] [CrossRef]
  7. Yan, X.; Chai, G.; Han, X.; Lei, L.; Wang, G.; Jia, X.; Zhang, X. SA-Pmnet: Utilizing Close-Range Photogrammetry Combined with Image Enhancement and Self-Attention Mechanisms for 3D Reconstruction of Forests. Remote Sens. 2024, 16, 416. [Google Scholar] [CrossRef]
  8. Li, X.; Zhou, X.; Xu, S. Individual Tree Reconstruction Based on Circular Truncated Cones From Portable LiDAR Scanner Data. IEEE Geosci. Remote Sens. Lett. 2022, 20, 3229065. [Google Scholar] [CrossRef]
  9. Xu, C.; Wu, C.; Qu, D.; Xu, F.; Sun, H.; Song, J. Accurate and efficient stereo matching by log-angle and pyramid-tree. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 4007–4019. [Google Scholar] [CrossRef]
  10. Hu, K.; Ying, W.; Pan, Y.; Kang, H.; Chen, C. High-fidelity 3D reconstruction of plants using Neural Radiance Fields. Comput. Electron. Agric. 2024, 220, 108848. [Google Scholar] [CrossRef]
  11. Zhang, J.; Wang, X.; Ni, X.; Dong, F.; Tang, L.; Sun, J.; Wang, Y. Neural radiance fields for multi-scale constraint-free 3D reconstruction and rendering in orchard scenes. Comput. Electron. Agric. 2024, 217, 108629. [Google Scholar] [CrossRef]
  12. Ito, S.; Miura, K.; Ito, K.; Aoki, T. Neural radiance field-inspired depth map refinement for accurate multi-view stereo. J. Imaging 2024, 10, 68. [Google Scholar] [CrossRef] [PubMed]
  13. Croce, V.; Billi, D.; Caroti, G.; Piemonte, A.; De Luca, L.; Véron, P. Comparative assessment of neural radiance fields and photogrammetry in digital heritage: Impact of varying image conditions on 3D reconstruction. Remote Sens. 2024, 16, 301. [Google Scholar] [CrossRef]
  14. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  15. Li, Z.; Zhu, J. Point-Based Neural Scene Rendering for Street Views. IEEE Trans. Intell. Veh. 2024, 9, 2740–2752. [Google Scholar] [CrossRef]
  16. Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5470–5479. [Google Scholar]
  17. Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. TOG 2022, 41, 1–15. [Google Scholar] [CrossRef]
  18. Weder, S.; Garcia-Hernando, G.; Monszpart, A.; Pollefeys, M.; Brostow, G.J.; Firman, M.; Vicente, S. Removing objects from neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16528–16538. [Google Scholar]
  19. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4015–4026. [Google Scholar]
  20. Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.P.; Lempitsky, V. Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2149–2159. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  22. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  23. Reiser, C.; Szeliski, R.; Verbin, D.; Srinivasan, P.; Mildenhall, B.; Geiger, A.; Barron, J.T.; Hedman, P. Merf: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Trans. Graph. TOG 2023, 42, 1–12. [Google Scholar] [CrossRef]
  24. Remondino, F.; Karami, A.; Yan, Z.; Mazzacca, G.; Rigon, S.; Qin, R. A critical analysis of nerf-based 3d reconstruction. Remote Sens. 2023, 15, 3585. [Google Scholar] [CrossRef]
  25. Zhang, X.; Srinivasan, P.P.; Deng, B.; Debevec, P.; Freeman, W.T.; Barron, J.T. Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. TOG 2021, 40, 1–18. [Google Scholar] [CrossRef]
  26. Xiong, H.; Muttukuru, S.; Upadhyay, R.; Chari, P.; Kadambi, A. Sparsegs: Real-time 360° sparse view synthesis using gaussian splatting. arXiv 2023, arXiv:2312.00206. [Google Scholar]
  27. Kerr, J.; Kim, C.M.; Goldberg, K.; Kanazawa, A.; Tancik, M. Lerf: Language embedded radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 19729–19739. [Google Scholar]
  28. Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 19697–19705. [Google Scholar]
  29. Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5855–5864. [Google Scholar]
Figure 1. The efficient and removable neural radiance field boundary-free scene reconstruction model.
Figure 3. The evaluation of model rendering results based on the number of images in different scenes. (HG) Malus 'harvest gold' tree; (HK) Malus halliana Koehne tree; (S) Malus 'sparkler' tree.
Figure 4. The rendering results of three kinds of crabapple flower trees based on NeRF-RE and NeRF-E models.
Figure 5. The result of the reconstructed rendering of the crabapple tree scene based on NeRF-E, NeRF-RE, and Mip-NeRF 360 models.
Table 1. Performance evaluation of the reconstruction and rendering results.

Scene                               Model          PSNR (dB)↑   LPIPS↓   SSIM↑   Time (h)
Malus 'harvest gold' tree scene     NeRF-E         24.68        0.34     0.74    0.51
                                    NeRF-RE        24.80        0.34     0.74    0.51
                                    Mip-NeRF 360   21.03        0.46     0.50    10.91
Malus halliana Koehne tree scene    NeRF-E         23.82        0.35     0.70    0.56
                                    NeRF-RE        23.58        0.35     0.70    0.56
                                    Mip-NeRF 360   20.23        0.51     0.41    10.92
Malus 'sparkler' tree scene         NeRF-E         21.12        0.39     0.58    0.59
                                    NeRF-RE        21.16        0.39     0.58    0.59
                                    Mip-NeRF 360   19.27        0.46     0.45    10.93
Table 2. Comparative analysis of model differences.

Model               Recognition and Segmentation   Removal and Filling   MLP   Proposal Network   Multi-Resolution Hash Grid
NeRF-Ag [11]
DS-NeRF [12]
NeRF [13,14]
Mip-NeRF 360 [16]
NeRF-E
NeRF-RE
Table 3. The recognition accuracy results of the NeRF-RE model under different conditions.

Scene                                Type                                     Accuracy
Malus 'harvest gold' tree scene      Multiple targets + occlusion             People: 76–99%; small cart: 75%
Malus halliana Koehne tree scene     Single target + occlusion                People: 70%
Malus 'sparkler' tree scene          Single target + local dense occlusion    People: 87%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

