Open Access Article

Peer-Review Record

DLG-GS: Dynamic Lighting-Aware Real-Time 3D Gaussian Splatting for Weak-Texture Tunnel Scenes

Remote Sens. 2026, 18(11), 1705; https://doi.org/10.3390/rs18111705

by Jun Li¹, Shuo Wang¹, Ronghao Yang²,*, Shuai Shi¹ and Zhenlong Liu²

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 11 March 2026 / Revised: 22 May 2026 / Accepted: 22 May 2026 / Published: 25 May 2026

(This article belongs to the Special Issue 3D Scene Perception and Reconstruction of Remote Sensing Imagery)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript addresses two problems of existing 3D Gaussian Splatting methods: appearance inconsistency caused by illumination changes, and geometric instability in weakly textured regions. It proposes DLG-GS, a real-time 3D reconstruction framework designed for dynamic lighting and low-texture scenarios. The main contributions are two core modules built on 3D Gaussian Splatting, a Dynamic Lighting-Aware Appearance Modeling module (DLAAM) and a Voxel-Depth Joint Constraint module (VDJC). The two modules are integrated into a unified framework that jointly optimizes appearance and geometry, thereby enhancing robustness in complex scenes while maintaining real-time rendering capability.

This manuscript conducts extensive experiments on a self-collected tunnel dataset as well as multiple public benchmark datasets. Both quantitative and qualitative results demonstrate that the proposed method outperforms existing approaches in terms of reconstruction accuracy and visual stability. Overall, the work is solid, with clear innovation and systematic experimental validation, showing strong academic value and promising engineering application potential. It is recommended for acceptance after minor revisions.

Main Comments and Suggestions:

Inconsistent terminology and formatting issues: The DLAAM module is referred to inconsistently across different sections—sometimes as “DLAAM module” and sometimes simply as “DLAAM.” It is recommended to unify the terminology.
Unclear figure annotations: In Section 2.3, “Figure 4(c)” is referenced multiple times. Although Figure 4 includes (a), (b), and (c), it is recommended to add clearer subfigure borders and labels within the figure. For Figure 2, it is suggested to arrange View A and View B in separate rows for better clarity.
Suggestions for the experimental section: In Table 3 (ablation study), it is recommended to explicitly label entries as “Only DLAAM” and “Only VDJC” to make the individual contributions of each module clearer. Additionally, it is suggested to include the standard deviation of training time in Tables 1 and 2 to better reflect experimental stability.
Conclusion section is somewhat brief: The conclusion is currently relatively short. It is recommended to expand it by including further discussion on future research directions, such as extending the method to more low-texture scenarios or deploying it on mobile devices.
Careful verification of formulas is needed: All formulas should be carefully checked, with attention paid to correct notation and clear explanations of symbols.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript proposes DLG-GS, a 3D Gaussian Splatting framework designed to improve scene reconstruction under dynamic lighting and weak-texture conditions, with a particular focus on tunnel environments. The method combines two main components, namely a dynamic lighting-adaptive appearance module (DLAAM), intended to decouple intrinsic appearance from transient illumination, and a voxel–depth joint constraint (VDJC), intended to stabilize geometry in poorly textured regions by combining voxel priors with monocular depth cues. The main strengths are the clear practical motivation, the attempt to treat appearance and geometry as coupled failure modes rather than isolated problems, and the effort to evaluate the method against several recent 3DGS baselines while preserving real-time rendering performance.

My main concern is that the empirical basis of the paper is too limited to support the claims being made. The tunnel evaluation relies on only three proprietary scenarios, and the public-benchmark analysis is reported on selected scenes from Tanks and Temples, LLFF, and Photo Tourism, without enough detail to judge how representative those selections are. This does not provide a sufficiently broad foundation for claims of robustness, generalizability, or state-of-the-art performance. The train/test protocol further weakens the case. The test set is generated by holding out every eighth frame from the same acquisition sequence, while the data are collected at 50 cm intervals using a rigid multi-camera rig. As a result, training and test images remain strongly correlated in geometry, trajectory, and lighting regime. For a paper whose central claim is robustness to dynamic illumination, this is a relatively weak test, and it is difficult to see it as a convincing demonstration of generalization beyond closely related conditions.

A more serious issue concerns the mismatch between the paper’s stated application value and the way it is validated. The manuscript invokes physical plausibility, structural health assessment, and even crack-width reliability, yet the quantitative evaluation is limited to PSNR, SSIM, and LPIPS. These are image-quality metrics and do not directly validate geometric accuracy. There is no independent geometric ground truth and no depth RMSE, which is the most problematic omission; nor is there any point-cloud or mesh accuracy assessment, normal consistency check, or tunnel-specific geometric metric that could support the inspection-oriented claims. In other words, the manuscript presents a geometry-centered motivation but substantiates it almost entirely with rendering-centered evidence. For a methods paper at this level, that is a substantial weakness.

The comparative framework is also not entirely convincing. While the set of baselines appears broad, the paper does not document the comparison protocol with enough precision to dispel concerns about fairness, and some results are difficult to interpret with confidence. In particular, the very poor WildGaussians performance on the tunnel dataset raises the possibility that some methods may not have been equally adapted or tuned for the experimental setting. This matters because the paper’s broader claim rests heavily on outperforming state-of-the-art alternatives. The ablation analysis is informative, but it does not fully support the stronger narrative of a consistently synergistic appearance–geometry solution. The appearance module appears to drive the more stable gains, whereas the depth module contributes more modest and scene-dependent improvements. Moreover, the full model is not uniformly best across all metrics, which weakens the argument that the two components jointly produce a clear and consistent advantage.

A further limitation lies in reproducibility and presentation. Several implementation details remain underspecified, the monocular depth component is not described with enough operational clarity, and the treatment of scale and alignment is insufficiently explained. In addition, parts of the equations and loss definitions are difficult to parse in their present form. This may appear secondary, but for a technically dense methods paper it is not. A reader should be able to understand exactly what is being optimized and how the individual terms interact, and at present that is not always the case. The discussion also tends to overextend the implications of the results. The manuscript itself acknowledges only marginal gains on LLFF, sensitivity to specular highlights, and merely comparable behavior to NexusSplats on Photo Tourism, yet the abstract and conclusion are framed in broader terms. I therefore find the paper more convincing as a promising tunnel-oriented extension of existing 3DGS ideas than as a broadly demonstrated advance for dynamic and weakly textured scenes in general.

Specific Comments:

Lines 33–36 and 146–148. The manuscript claims that DLG-GS outperforms state-of-the-art methods and provides an effective solution for robust reconstruction in dynamic and weakly textured environments. Given the narrow experimental basis and the limited independence of the test protocol, this statement should be moderated.

Lines 162–170. The paper explicitly links reconstruction quality to structural health assessment and crack-width reliability, but the Results section does not include any inspection-relevant validation. This gap between motivation and evaluation should be addressed, or the application claims should be toned down.

Lines 225–233. The VDJC module is said to preserve tunnel circularity and track planarity and to produce more accurate and robust geometry than purely depth-regularized baselines. These are strong geometry claims, but no geometry-specific metric is reported in Section 4.

Lines 269–270. Please specify the exact monocular depth estimator used, whether it is pretrained or fine-tuned, and how its scale and uncertainty are handled. At present this part is too vague for a methods paper.

Equations (1)–(5), lines 275–310. These equations are difficult to parse in their current form, and several symbols are not introduced with enough clarity. The loss formulation should be rewritten more carefully so that the reader can reproduce the method.

Equations (17)–(19), lines 439–455. The optimization section is currently under-explained. In particular, the role of the dropout regularization and its interaction with the main color/depth losses need a clearer justification.

Lines 492–499 and Figure 9. The evaluation uses only three tunnel scenes, and the test split is every eighth frame from the same sequential acquisition. This should be discussed more explicitly as a limitation, because it is not a demanding independence test for dynamic-lighting robustness.

Table 2 and lines 554–560. The paper states that the public-benchmark results support general applicability, but it also admits that improvements there are much smaller than in the tunnel case. This section should be framed more cautiously.

Lines 731–747. The discussion of LLFF and Photo Tourism substantially weakens the stronger generalization claims made earlier. The paper admits only marginal gains on LLFF, sensitivity to specular highlights, and merely comparable performance to NexusSplats on Photo Tourism, where NexusSplats may even produce fewer artifacts. These problems should be reflected in the abstract and conclusion.

Lines 753–759 and Table 3. The interpretation of the ablation appears somewhat stronger than warranted. If the authors wish to argue for synergy between DLAAM and VDJC, they should discuss more explicitly why the full model does not dominate uniformly across all metrics, especially for Tunnel_3 LPIPS.

Lines 760–769. The limitations section is welcome, but it is incomplete. The lack of geometry-specific validation and the restricted tunnel evaluation should also be acknowledged here, not only specular sensitivity and training-time overhead.

Comments on the Quality of English Language

The English is sufficient.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

See the attached file.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The authors are highly likely to have used AI for editing. The text contains many long sentences, making it difficult to read.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Although the authors have revised the manuscript in response to the previous feedback, my concerns have not been fully addressed, especially with regard to writing conventions, which are detailed below.

The paper appears to have been written in Microsoft Word. In-text citations do not hyperlink to the reference list, indicating that cross-references were not properly configured. There are at least eight equation-related formatting problems, and the manuscript contains what appears to be a corrupted mathematical symbol (rendered as a box). Many equations exhibit inconsistent formatting and are misaligned with the body text. It is strongly recommended that the manuscript be rewritten in LaTeX.

These remaining issues still make the manuscript difficult to read. However, the final decision rests with the editorial board.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf