Research on Rapid 3D Model Reconstruction Based on 3D Gaussian Splatting for Power Scenarios
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents an application of 3D Gaussian Splatting for infrastructure inspection, demonstrating its advantages over traditional photogrammetry. However, there are some methodological inconsistencies.
In the Abstract and Introduction, you state that you acquire RGB images, polarization data, and lightweight depth information. However, in Section 2.1 (Data Acquisition), you only describe the use of a Sony 7RM3A RGB camera, and in Section 2.2, you describe the standard 3DGS pipeline starting from SfM point clouds. It is unclear where and how the polarization data was used in the rendering process.
Author Response
Dear Reviewers,
Thank you very much for the time you devoted to reviewing the manuscript and for your encouraging comments on its merits. We also appreciate your clear and detailed feedback and hope that our explanations fully address your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.
To facilitate this discussion, we first retype your comments in italic font and then present our responses to the comments.
As you will see in the revised manuscript and in the following document, major revisions are shown in blue text, and the former version is shown in red text.
Comment 1:This manuscript demonstrates the application of 3D Gaussian Splatting in infrastructure inspection, highlighting its advantages over traditional photogrammetry. However, some methodological inconsistencies are observed.
Response 1:We sincerely appreciate your question. In response to additional concerns raised by several reviewers, we have conducted comprehensive revisions to the manuscript to enhance methodological consistency across all chapters.
Based on the SfM-derived sparse point cloud and camera poses, a 3D Gaussian Splatting pipeline is adopted as the core reconstruction framework, with adaptations tailored to transmission tower structures and UAV-based inspection scenarios.
Comment 2:In the abstract and introduction, you mentioned obtaining RGB images, polarization data, and lightweight depth information. However, Section 2.1 (Data Acquisition) only describes the use of the Sony 7RM3A RGB camera, while Section 2.2 outlines the standard 3DGS pipeline starting from SfM point clouds. The specific application and method of polarization data in the rendering process remain unclear.
Response 2:We thank the reviewer for pointing out this important issue regarding methodological clarity and consistency. We acknowledge that the original manuscript did not sufficiently clarify the role of polarization data within the proposed reconstruction pipeline. In the revised manuscript, we have carefully revised the Abstract and Introduction to explicitly clarify that RGB images are the primary input used for SfM initialization and 3D Gaussian Splatting reconstruction, while polarization data and lightweight depth cues were collected as auxiliary information during data acquisition and for exploratory analysis, rather than being directly integrated into the current 3DGS rendering and optimization process. To avoid ambiguity, we have removed any statements that may imply direct use of polarization data in the reconstruction pipeline and added a clear explanation in Section 2 that the present framework focuses on an RGB-based workflow. Furthermore, the potential of polarization information for improving robustness under challenging lighting conditions is now discussed as a future research direction.
Although polarization data were synchronously captured during UAV flights, the current reconstruction pipeline relies primarily on RGB images, and polarization information is not directly involved in the SfM or 3DGS optimization stages.
Future work will explore the integration of polarization cues into the 3DGS framework to enhance robustness under challenging illumination conditions and improve material-aware reconstruction.
Before revision:
First, a multi-view data acquisition scheme combining "unmanned aerial vehicle + oblique photogrammetry" was designed to capture RGB images, polarization data, and lightweight depth information.
After revision:
First, a multi-view data acquisition scheme combining "unmanned aerial vehicle + oblique photogrammetry" was designed to capture RGB images; the RGB images acquired by Unmanned Aerial Vehicle (UAV) platforms are used as the primary input for 3D reconstruction.
Before revision:
Specifically, by optimizing the structured circumferential flight path planning of UAVs, we simultaneously acquire high-resolution RGB images, polarimetric data, and lightweight depth data.
After revision:
Specifically, by optimizing the structured circumferential flight path planning of UAVs, we simultaneously acquire high-resolution RGB images. In addition to conventional RGB imagery, polarization information has been reported to provide complementary cues for material perception and illumination analysis. However, the present study focuses on an RGB-based reconstruction pipeline, and the integration of polarization cues is left for future investigation.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The present manuscript, titled “Research on Rapid 3D Model Reconstruction Based on 3D Gaussian Splatting for Power Scenarios”, presents a 3D Gaussian Splatting (3DGS)-based framework for power tower reconstruction. The study integrates UAV-based oblique photogrammetry with Gaussian model initialization, differentiable rendering, and adaptive density control to improve reconstruction efficiency and detail preservation. The work is technically sound and shows promising results. I recommend Major revision to address the following questions to improve the manuscript's novelty:
- Several abbreviations are used without being defined at first mention. Define all abbreviations upon first use.
- If a term appears only once, there’s no need to use an abbreviation. The authors should write it out in full.
- The authors should list and define all abbreviations used in each abstract and figure caption separately from the main manuscript. This will help readers understand the figures more easily.
- The Introduction is clear but too long. It should be shortened and better organized. The authors should clearly present the problem, the research gap, recent related work, and their proposed method and main results.
- The authors should clearly explain how this work is different from and advances beyond the previously published studies listed below, particularly in the context of 3D model reconstruction based on 3D Gaussian Splatting.
- EOGS: Event Only 3D gaussian splatting for 3D reconstruction
- GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation; DOI: 10.1007/978-3-031-72633-0_1
- The authors should add a separate “Conclusion and Future Work” section and move the relevant content there for better organization. In addition, the second, third, and fourth section titles have similar meanings and should be revised to avoid repetition.
- The manuscript should clarify the UAV flight parameters (altitude, camera angle, and speed) and environmental conditions, and briefly explain how they affect image overlap and SfM reconstruction.
- The authors should clearly compare the proposed method with traditional photogrammetry in terms of reconstruction time, accuracy, and cost, and include this comparison in the Conclusion section.
- Section 3 is explained well. However, the section should start with one or two introductory lines explaining how tower shape, structure, and surrounding environment affect 3D reconstruction.
- The proposed system is mainly based on RGB images. How do changes in environmental lighting conditions (such as strong sunlight, clouds, snow, or fog) affect RGB values and reconstruction performance? The authors should comment on the robustness of the method under such conditions.
- Add a short paragraph in the Introduction explaining why an RGB camera-based approach is suitable for other applications.
Author Response
Dear Reviewers,
Thank you very much for the time you devoted to reviewing the manuscript and for your encouraging comments on its merits. We also appreciate your clear and detailed feedback and hope that our explanations fully address your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.
To facilitate this discussion, we first retype your comments in italic font and then present our responses to the comments.
As you will see in the revised manuscript and in the following document, major revisions are shown in blue text, and the former version is shown in red text.
Comment 1:The manuscript uses several abbreviations without defining them at first mention. All abbreviations should be defined upon first use.
Response 1:Thank you very much for bringing up this issue. According to your suggestion, some modifications have been made in the corresponding positions in the text:
We have defined the following abbreviations at first use, according to your suggestion: Structure from Motion (SfM), Light Detection and Ranging (LiDAR), and Dynamic Graph CNN (DGCNN).
Comment 2:If a term appears only once, an abbreviation is not required. The authors should write it out in full.
Response 2:Thank you very much for bringing up this issue. According to your suggestion, some modifications have been made in the corresponding positions in the text:
We have written out GPS+IMU and V2X in full according to your suggestion: Global Positioning System + Inertial Measurement Unit, and Vehicle-to-Everything.
Comment 3:The authors should list and define all abbreviations used in each abstract and figure caption separately from the main text. This will help readers better understand the figures.
Response 3:We thank the reviewer for this helpful suggestion and fully agree that clear definition of abbreviations is important for readability. In the revised manuscript, all abbreviations appearing in the abstract, main text, and figure captions have been explicitly defined at their first occurrence, with full terms provided to ensure clarity for readers. Considering that the manuscript already introduces each abbreviation in a self-contained manner and follows the journal’s formatting guidelines, we have chosen to define abbreviations in context rather than in a separate list. We believe this approach maintains readability while avoiding redundancy. Nevertheless, we have carefully reviewed the manuscript to ensure that no abbreviation appears without a clear and explicit definition.
Comment 4:The Introduction is clear but excessively long. It should be shortened and better organized. The authors should clearly present the problem, the research gap, recent related work, and the proposed method and main results.
Response 4:
Before revision:
Power towers serve as the backbone infrastructure of power transmission networks, and their safe operation and maintenance rely heavily on high-precision 3D models. According to statistics from the National Energy Administration, China’s total power transmission line mileage exceeded 1.2 million kilometers in 2023, with the number of power towers surpassing 5 million [1]. With the advancement of smart grid construction, traditional oblique photogrammetry follows a “geometry-first, texture-appended” workflow. Its serial pipeline (aerotriangulation, dense matching, mesh generation, texture mapping) is computationally intensive and time-consuming. During meshing, simplification and smoothing of geometric surfaces can damage or blur complex structures such as power lines and tree leaves. Additionally, texture mapping relying on mesh UV unwrapping often causes stretching, seams, and ghosting in intricate regions—exacerbated by light-shadow variations—which frequently leads to inconsistent textures. Consequently, the final model often suffers from defects like holes and distortions, and its enormous polygon count imposes a heavy burden on real-time rendering and network transmission, making it difficult to meet the demands of smart grids [2]. 3DGS adopts an “appearance-first, implicit representation” pipeline, using millions of learnable 3D Gaussian ellipsoids as basic units. Each unit inherently carries attributes such as position, color, and transparency; during rendering, these units are projected and fused onto a 2D screen. Unlike traditional methods, 3DGS requires no continuous surface, enabling flexible fitting of complex structures and perfect representation of high-frequency details (e.g., tree leaves, power lines). By embedding color within units, it completely avoids texture stretching and seam issues, achieving photorealistic rendering near training viewpoints. 
Moreover, 3DGS is optimized with GPU acceleration, enabling faster training than conventional pipelines and supporting real-time rendering—at over 100 frames per second (FPS) on modern graphics cards—to facilitate real-time browsing of ultra-large-scale scenes. As the core of digital twin model construction, 3D reconstruction plays a pivotal role in all stages of power tower design, construction, and operation & maintenance. Digital twin power transmission lines leverage 1:1 3D models of towers and lines to enable real-time equipment monitoring, fault simulation/deduction, and inspection route planning. However, efficient and high-precision 3D reconstruction of power towers remains the core bottleneck for the practical deployment of this technology. Therefore, how to reconstruct power towers with both high efficiency and precision is currently the most pressing challenge.
After revision:
Power towers serve as the backbone infrastructure of power transmission networks, and their safe operation and maintenance rely heavily on high-precision 3D models. According to statistics from the National Energy Administration, China’s total power transmission line mileage exceeded 1.2 million kilometers in 2023, with the number of power towers surpassing 5 million [1]. With the rapid expansion of transmission networks, traditional manual inspection and coarse geometric modeling methods can no longer meet the requirements of efficiency, accuracy, and scalability in large-scale power grid management. During meshing, simplification and smoothing of geometric surfaces can damage or blur complex structures such as power lines and tree leaves. Additionally, texture mapping relying on mesh UV unwrapping often causes stretching, seams, and ghosting in intricate regions (exacerbated by light-shadow variations), which frequently leads to inconsistent textures. Consequently, the final model often suffers from defects like holes and distortions, and its enormous polygon count imposes a heavy burden on real-time rendering and network transmission, making it difficult to meet the demands of smart grids [2].
Before revision:
First, high operational risk: It requires close-proximity data acquisition around power towers using laser scanners mounted on aerial work platforms or UAVs. Such operations are prone to interference from strong electric fields and challenging in complex terrains (e.g., mountainous areas, river crossings), leading to elevated accident rates. Second, severe data loss: Slender structures like angle steels and power lines are easily missed due to insufficient LiDAR point cloud density, while reflection from metal tower materials causes point cloud voids. Third, high cost and low efficiency: LiDAR scanners have unit prices exceeding 500,000 CNY, and data acquisition plus point cloud processing for a single power tower take 4–6 hours—rendering it impractical for large-scale transmission line modeling.
After revision:
High equipment costs, complex data processing workflows, and significant operational risks—particularly in environments with severe electromagnetic interference or complex terrain. Additionally, when capturing slender metal components such as transmission lines, LiDAR point clouds often encounter issues of data sparsity and reflection loss.
Before revision:
This paper proposes a 3D model reconstruction method for power towers based on 3DGS. As an emerging neural rendering technique, 3DGS has rapidly gained traction across diverse domains—including cultural heritage preservation, city-scale digital twins, professional field visualization, robot perception, autonomous driving, and graphics rendering—since its introduction by Kerbl [20] in 2023. This widespread adoption stems from its core strengths: efficient real-time rendering, flexible scene representation, and differentiable optimization. Notable applications include: A joint team from Tsinghua University and the Beijing Institute of Technology applied 3DGS to the digitalization of cultural heritage in Vehicle-to-Everything scenarios. Using decomposed Gaussian splatting, they separated static backgrounds (e.g., ancient buildings) from dynamic elements (e.g., pedestrians, vehicles), enabling the generation of large-scale collaborative cultural heritage datasets. The Intelligent Perception Team at Jinan University proposed the Robust and Efficient 3DGS method, targeting 3DGS reconstruction for large-scale urban scenes (e.g., city streets, building complexes) [21]. Via techniques such as intelligent visibility partitioning, dynamic resource allocation per partition, and fine-grained appearance transformation modeling, they achieved high reconstruction quality while maintaining real-time rendering. NVIDIA’s Omniverse NuRec 3DGS library leverages RTX ray tracing capabilities to rapidly convert raw data from sensors (e.g., LiDAR, RGB-D cameras) into high-precision 3D Gaussian models. It also generates synthetic data (e.g., urban scenes under varying weather and lighting conditions) for training autonomous driving perception algorithms.
After revision:
Introduced by Kerbl et al. [20] in 2023, 3DGS has emerged as a groundbreaking explicit neural rendering framework. By employing learnable 3D Gaussian elements to represent scenes and supporting efficient differentiable rasterization, it offers a promising alternative to traditional methods. Compared to implicit neural representations, 3DGS achieves significant acceleration in training and real-time rendering performance while preserving high-frequency geometric details and visual features. Notable applications include: A joint team from Tsinghua University and the Beijing Institute of Technology applied 3DGS to the digitalization of cultural heritage in Vehicle-to-Everything scenarios. Using decomposed Gaussian splatting, they separated static backgrounds (e.g., ancient buildings) from dynamic elements (e.g., pedestrians, vehicles), enabling the generation of large-scale collaborative cultural heritage datasets. The Intelligent Perception Team at Jinan University proposed the Robust and Efficient 3DGS method, targeting 3DGS reconstruction for large-scale urban scenes (e.g., city streets, building complexes) [21].
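For readers less familiar with the rendering step mentioned above, in which Gaussian units carrying color and transparency are projected and fused on the 2D screen, the following is a minimal sketch of front-to-back alpha blending for a single pixel. It is an illustration only and not the implementation used in the manuscript: the function name `composite`, the tuple layout, and the early-stop threshold are hypothetical, and each `alpha` is assumed to be the already-evaluated contribution of one projected Gaussian at that pixel.

```python
def composite(splats, stop_threshold=1e-4):
    """Blend the splats covering one pixel, nearest first.

    splats: list of (depth, (r, g, b), alpha) tuples (hypothetical layout).
    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
        if transmittance < stop_threshold:  # assumed early-stop value
            break
    return tuple(color), transmittance


# A half-transparent red splat in front of an opaque blue one.
pixel, remaining = composite([
    (2.0, (0.0, 0.0, 1.0), 1.0),
    (1.0, (1.0, 0.0, 0.0), 0.5),
])
```

Because every step is a sum or product of the per-splat attributes, the blended color is differentiable with respect to them, which is what allows the Gaussian parameters to be optimized against the input images.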
Comment 5:The authors should clearly explain how this work differs from and advances beyond the previously published studies listed below, particularly in the context of 3D model reconstruction based on 3D Gaussian Splatting.
EOGS: Event Only 3D gaussian splatting for 3D reconstruction
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation; DOI: 10.1007/978-3-031-72633-0_1
Response 5:We thank the reviewer for pointing out these closely related studies. We agree that it is important to clearly distinguish the contributions of the present work from existing 3D Gaussian Splatting–based reconstruction approaches. EOGS focuses on event-based 3D Gaussian Splatting, where event camera streams are used as the primary input for reconstruction. Its main contribution lies in extending Gaussian Splatting to asynchronous, high-temporal-resolution event data, targeting dynamic scenes and low-latency perception. In contrast, our work is based on RGB images acquired by UAV platforms and addresses large-scale outdoor infrastructure inspection, where data acquisition conditions, scene scale, and reconstruction objectives differ substantially from event-driven settings. GRM proposes a generalized and large-scale Gaussian reconstruction model aimed at improving reconstruction and generation efficiency across diverse scenes. While GRM emphasizes model generalization and scalable reconstruction, the present study focuses on domain-specific adaptation of 3D Gaussian Splatting for transmission tower inspection. Our contributions lie in tailoring the reconstruction pipeline to complex tower geometries, aerial imaging viewpoints, and inspection-oriented requirements, rather than proposing a generalized reconstruction architecture. To clarify these distinctions, we have added a dedicated discussion in the revised manuscript that positions our work relative to EOGS and GRM in terms of input modality, application domain, scene scale, and research objectives. We believe this clarification highlights the complementary nature of our work within the broader 3D Gaussian Splatting literature.
Comment 6:The author should add a separate 'Conclusion and Future Work' section and relocate the relevant content to this section for better organization. Additionally, the titles of the second, third, and fourth sections are semantically similar and should be revised to avoid redundancy.
Response 6:Thank you very much for bringing up this issue. According to your suggestion, some modifications have been made in the corresponding positions in the text:
We have renamed the sections "Results of SfM Point Cloud Initialization", "Results of Gaussian Ellipsoid Set Generation", and "Evaluation and Analysis of Reconstruction Performance" to "SfM Point Cloud Initialization", "Generate Gaussian ellipsoid set", and "Experimental Setup and Performance Evaluation", respectively.
Comment 7:The manuscript should clarify the drone flight parameters (height, camera angle and speed) and environmental conditions, and briefly explain how they affect image overlap and SfM reconstruction.
Response 7:Thank you very much for bringing up this valuable suggestion. We have added a detailed description of the UAV flight parameters and environmental conditions in the revised manuscript, along with a brief explanation of their impact on image overlap and SfM-based reconstruction.
Before revision:
For the acquisition protocol, we adopted multi-height 360° circumferential shooting organized into three hierarchical layers: Tower body circumferential scanning: Covers the main structural framework of the power tower; Tower head focused capture: Targets fine-grained components (e.g., bolted connections, insulator clamps); Power line extension mapping: Captures the geometry of overhead conductors and guy wires. Critically, we maintained an adjacent image overlap rate of ≥80%, which not only achieves complete, dead-angle-free coverage of the entire tower and its surroundings but also guarantees the consistency and integrity of both global (whole-tower) and local (component-level) data.
After revision:
During data acquisition, the UAV was operated at a relatively stable flight altitude to ensure sufficient coverage of the transmission tower while maintaining adequate image resolution. The onboard camera was oriented with an oblique viewing angle to capture both the vertical structure and lateral details of the tower, thereby reducing occlusions and improving multi-view visibility of slender components. The flight speed was controlled to avoid motion blur and to maintain consistent image overlap between consecutive frames.
During data collection, our drones operated at 50-150 meters altitude with a constant speed of 6 meters per second. Utilizing multi-height 360-degree circular scanning technology, we established a three-tiered system: tower body scanning to cover the main structure of transmission towers; top-focused acquisition to precisely capture fine components like bolted connections and insulator clamps; and transmission line extension mapping to obtain the geometric configuration of overhead conductors and guy wires. Notably, we maintained an adjacent image overlap rate of ≥80%, ensuring not only comprehensive coverage of the tower body and surrounding areas without blind spots, but also maintaining consistency and integrity between global (whole-tower) and local (component-level) data.
These flight parameters were selected to achieve high image overlap in both along-track and cross-track directions, which is critical for robust feature matching and accurate camera pose estimation in the Structure from Motion (SfM) process. Sufficient overlap increases the number of shared visual features across views, thereby enhancing the stability of bundle adjustment and reducing reconstruction drift.
Data acquisition was conducted under relatively favorable environmental conditions, including stable illumination and low wind speed, to minimize image degradation and platform vibration. Such conditions help ensure image sharpness and consistent appearance across views, which further contributes to reliable SfM reconstruction and stable initialization for subsequent 3D Gaussian Splatting optimization. All the flight days we selected were perfectly clear and sunny.
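The relationship between the flight parameters above and image overlap can be sketched with a simple pinhole-camera calculation. This is a rough illustration under stated assumptions, not the authors' flight-planning procedure: the function names are hypothetical, the 24 mm sensor side and 35 mm focal length are assumed values (the response does not state the lens), and the camera is treated as viewing the scene head-on at a fixed distance, which ignores the oblique angle and circular path.

```python
def footprint_m(distance_m, sensor_mm, focal_mm):
    """Scene extent covered by one image side at a given distance (pinhole model)."""
    return distance_m * sensor_mm / focal_mm


def forward_overlap(speed_mps, interval_s, distance_m, sensor_mm, focal_mm):
    """Fraction of the footprint shared by two consecutive images."""
    baseline_m = speed_mps * interval_s  # distance flown between two shots
    shared = 1.0 - baseline_m / footprint_m(distance_m, sensor_mm, focal_mm)
    return max(0.0, shared)


def max_interval_s(speed_mps, min_overlap, distance_m, sensor_mm, focal_mm):
    """Longest capture interval that still keeps the requested overlap."""
    allowed_baseline_m = (1.0 - min_overlap) * footprint_m(distance_m, sensor_mm, focal_mm)
    return allowed_baseline_m / speed_mps


# From the response: 6 m/s flight speed, 50 m lower altitude bound, overlap >= 80%.
# Assumed for illustration: 24 mm sensor side along track, 35 mm focal length.
interval = max_interval_s(6.0, 0.80, 50.0, 24.0, 35.0)
overlap_at_1s = forward_overlap(6.0, 1.0, 50.0, 24.0, 35.0)
```

Under this model the footprint grows linearly with distance, so the shortest working distance sets the tightest limit on the capture interval; a slower flight speed or a shorter interval raises the overlap and therefore the number of features shared between adjacent views for SfM matching.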
Comment 8:The authors should clearly compare the differences between the proposed method and traditional photogrammetry in terms of reconstruction time, accuracy and cost, and include this comparison in the conclusion.
Response 8:We thank the reviewer for this important suggestion. In response, we have explicitly incorporated a comparative discussion between the proposed 3D Gaussian Splatting–based method and traditional photogrammetry in terms of reconstruction time, reconstruction quality (accuracy), and practical deployment cost.
Before revision:
This study presents a systematic investigation into a 3D model reconstruction method for power towers based on 3DGS , targeting the key limitations of traditional reconstruction techniques in efficiency, accuracy, and visual realism. Through a methodology integrating theoretical exploration, algorithm design, and experimental validation, we draw the following conclusions: The proposed 3DGS-based reconstruction framework significantly outperforms traditional photogrammetric approaches in both reconstruction accuracy and visual fidelity. Quantitative experiments demonstrate that the framework excels across multiple critical performance metrics: it not only efficiently recovers the overall structure of power towers but also accurately reproduces fine details (e.g., angle steels, cables). Specifically, it achieves state-of-the-art performance in terms of modeling detail completeness, structural geometric accuracy, and reconstruction time efficiency. In terms of practicality and scalability, 3DGS leverages its explicit scene representation and differentiable rasterization capabilities to deliver substantial gains in training speed and rendering efficiency. Our method reduces single-tower reconstruction time from several hours (required by traditional photogrammetry) to dozens of minutes. While maintaining photorealistic rendering quality, it provides a robust technical foundation for real-time interactive viewing and engineering deployment of power tower models. This breakthrough greatly enhances the practical utility and widespread applicability of 3D reconstruction technology in power grid operations.
After revision:
This study presents a systematic investigation of a 3D model reconstruction framework for power transmission towers based on 3DGS, with a focus on addressing practical limitations of traditional reconstruction pipelines in terms of efficiency, structural detail representation, and deployment suitability. Through a comprehensive methodology combining theoretical analysis, algorithm design, and experimental validation, several conclusions can be drawn.
Compared with traditional photogrammetry-based reconstruction pipelines, the proposed 3DGS-based approach demonstrates clear advantages in reconstruction efficiency and structural representation under UAV-based inspection scenarios. Conventional photogrammetry typically relies on multi-stage processing, including dense image matching, mesh reconstruction, and texture mapping, which often leads to long processing times and complex post-processing workflows. In contrast, the proposed method employs an explicit Gaussian representation coupled with differentiable rendering, enabling faster optimization and near real-time rendering once training is completed.
In terms of reconstruction quality, traditional photogrammetry often encounters difficulties in preserving fine structural details of transmission towers, particularly slender components such as angle steels and cables, and may suffer from geometric artifacts and texture seams. By jointly optimizing geometry and appearance within a unified framework, the proposed method achieves more consistent structural completeness and visually coherent reconstructions, which are well suited for inspection-oriented applications. From a practical perspective, the simplified reconstruction pipeline reduces operational complexity and computational overhead, making the approach more suitable for large-scale and repeated inspections when combined with UAV-based data acquisition.
Rather than aiming to provide an exhaustive benchmark across all neural rendering paradigms, this study emphasizes practical applicability, geometric detail preservation, and deployment efficiency in real-world infrastructure inspection scenarios. From this perspective, the proposed 3DGS-based framework can be regarded as a practical and effective alternative to traditional photogrammetry for transmission tower reconstruction, while remaining complementary to neural radiance field–based approaches that prioritize photorealistic view synthesis.
Comment 9:Section 3 is explained in detail. However, this section should begin with a brief introduction to explain how the shape, structure, and surrounding environment of the tower affect 3D reconstruction.
Response 9:We thank the reviewer for this helpful suggestion. In the revised manuscript, we have added a brief introductory paragraph at the beginning of Section 3 to clarify how the geometric characteristics of transmission towers and their surrounding environments influence 3D reconstruction.
The complex lattice structure, slender components, and surrounding environmental conditions of transmission towers pose significant challenges for accurate image-based 3D reconstruction. These factors motivate the need for a reconstruction framework capable of handling fine structures and occlusions effectively.
Comment 10:The system primarily operates on RGB images. How do variations in ambient lighting conditions (e.g., intense sunlight, clouds, snow, or fog) affect RGB values and reconstruction performance? The authors should evaluate the robustness of this method under such conditions.
Response 10:Thank you very much for bringing up this valuable suggestion. In the revised manuscript, we have added a dedicated discussion on the influence of environmental lighting conditions on RGB-based reconstruction performance.
The proposed framework primarily relies on RGB images acquired by UAV platforms, and its performance is therefore influenced by environmental lighting conditions. Variations in illumination, such as strong direct sunlight or heavy cloud cover, can lead to changes in RGB intensity distribution, shadows, and specular reflections on metallic tower components. These effects may reduce feature consistency across views and introduce photometric inconsistencies that affect both SfM-based pose estimation and subsequent 3D Gaussian optimization.
Under moderate illumination variations, the proposed 3DGS-based representation exhibits a certain degree of robustness, as it jointly optimizes geometry and appearance across multiple views and can tolerate limited photometric differences. However, extreme conditions such as dense fog, snow, or severe visibility degradation may significantly alter RGB observations and reduce reconstruction stability, which is a common limitation shared by most RGB-only reconstruction pipelines.
Addressing these challenges remains an important direction for future work. Potential improvements include incorporating illumination-aware or photometric-invariant features, integrating additional sensing modalities such as depth or LiDAR data, and adopting adaptive data acquisition strategies to enhance robustness under adverse environmental conditions.
Comment 11:Include a brief introduction explaining why the RGB camera-based approach is suitable for other applications.
Response 11:We thank the reviewer for this helpful suggestion. In the revised manuscript, we have added a brief paragraph in the Introduction to clarify why RGB-camera-based reconstruction methods are suitable for a wide range of applications beyond transmission tower inspection.
From a sustainability perspective, RGB-camera-based solutions offer a cost-effective and energy-efficient alternative for large-scale visual sensing and 3D modeling. Their compatibility with existing UAV platforms and minimal hardware overhead make them particularly suitable for repeated inspections and long-term monitoring applications across different domains.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
1-The experiments focus on only three power towers from a single region (Luoning), which restricts the generalizability of the findings. Variations in tower design, environmental conditions, and terrain are not adequately represented. Can the authors elaborate in the discussion?
2-The proposed method is tailored for static power tower scenarios. Dynamic factors such as cable vibrations, wind-induced oscillations, or environmental changes (e.g., fog, rain) are not addressed. Can the authors elaborate in the discussion?
3-The approach assumes optimal UAV imaging conditions (≥80% overlap, clear weather). In practical deployments, adverse conditions can degrade image quality and pose estimation, which is not discussed in depth.
4-The comparison is limited to oblique photogrammetry. Other advanced neural rendering methods (e.g., NeRF variants, Instant-NGP) are mentioned in the introduction but not included in experimental benchmarking. A detailed comparison should be provided.
5-While the manuscript claims efficiency improvements, it does not provide detailed metrics on GPU memory usage, inference latency, or scalability for large-scale deployments.
6-There is no sensitivity analysis regarding input data quality.
7-The dataset is not publicly available, and the paper provides limited details on parameter settings for 3DGS optimization.
Author Response
Dear Reviewers,
Thank you very much for your time involved in reviewing the manuscript and your very encouraging comments on the merits. We also appreciate your clear and detailed feedback and hope that the explanation has fully addressed all of your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.
To facilitate this discussion, we first retype your comments in italic font and then present our responses to the comments.
As you will see in the revised manuscript and in this document, major revisions are shown in blue text, and the former version is shown in red text.
Comment 1:The experiment was exclusively conducted on three power towers within the same region (Luoning), which limited the generalizability of the findings. Variations in tower design, environmental conditions, and topography were not adequately addressed. Could the authors elaborate on these aspects in the discussion section?
Response 1:Thank you for your valuable feedback. We fully agree with this observation. In the revised manuscript, we have explicitly discussed this limitation in the Discussion section.
The primary objective of this study is to validate the feasibility and effectiveness of the proposed framework under real-world inspection conditions, rather than conducting a statistically comprehensive evaluation across different regions. Although the selected transmission towers are located within the same region, they exhibit significant differences in structural components, spatial layouts, and surrounding terrain complexity. More importantly, the proposed method does not rely on prior knowledge of specific regions or handcrafted assumptions related to particular tower designs, which supports its potential applicability to other regions and other types of towers. Future work will extend the experimental validation to more diverse environments.
Comment 2:This method is specifically designed for static power tower scenarios. However, dynamic factors such as cable vibration, wind-induced oscillations, or environmental changes (e.g., fog, rain) are not addressed. Could the authors elaborate on these aspects in the discussion section?
Response 2:Thank you very much for bringing up this issue. In the revised manuscript, we have added a detailed discussion clarifying this assumption and its practical relevance.
In addition, the proposed method is primarily developed for quasi-static transmission tower inspection scenarios. The core assumption is that the main structural components of the tower remain static during the data acquisition process. Dynamic factors such as cable vibration, wind-induced oscillation, and short-term environmental changes (e.g., fog or rainfall) are not explicitly modeled in the current framework. This assumption is consistent with most practical inspection workflows, where data collection is typically scheduled under relatively stable weather conditions, and structural analysis focuses on static components of the transmission tower. Under such conditions, the proposed geometry-aware representation and structural modeling strategy can effectively capture the spatial configuration of the tower. Nevertheless, it is recognized that dynamic motion and adverse environmental factors may introduce reconstruction inconsistencies and degrade segmentation performance. Addressing such challenges would require incorporating temporal information, motion-aware modeling, or multi-frame data fusion strategies. Extending the proposed method to handle dynamic scenes and complex environmental conditions will be an important direction for future work.
Comment 3:This method assumes optimal imaging conditions for the drone (≥80% overlap, clear weather). In practical deployment, adverse conditions may degrade image quality and pose estimation, though this aspect is not discussed in depth.
Response 3:We thank the reviewer for highlighting the assumptions regarding UAV imaging conditions and their implications for practical deployment. According to your suggestion, we have added a dedicated discussion addressing this limitation in the revised manuscript.
Furthermore, the proposed method implicitly assumes relatively favorable UAV imaging conditions, including sufficient image overlap (typically ≥80%) and stable weather, which are commonly required to ensure reliable camera pose estimation and high-quality 3D reconstruction. In practical deployment, adverse conditions such as strong illumination changes, fog, rain, or wind may degrade image quality and increase pose estimation uncertainty. Under such conditions, reduced image overlap or inaccurate camera poses may propagate errors into the reconstructed geometry and affect the subsequent structural analysis and segmentation results. While these factors are not explicitly addressed in the current framework, they represent common challenges in UAV-based inspection systems rather than limitations unique to the proposed method. Future work will focus on enhancing robustness under challenging imaging conditions by incorporating uncertainty-aware pose optimization, multi-view consistency constraints, robust feature representations, and potential multi-sensor fusion strategies. These extensions are expected to improve the practical applicability of the proposed method in real-world inspection scenarios.
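To make the overlap assumption above concrete, the following minimal sketch relates image overlap to the spacing between consecutive exposures. It assumes a simple pinhole camera facing a surface at a constant stand-off distance, with the sensor side aligned to the direction of travel. The function names and the example values are illustrative assumptions, not the flight parameters used in the manuscript.

```python
def image_footprint_m(distance_m, focal_mm, sensor_mm):
    """Length of scene covered by one sensor side (pinhole model, assumed)."""
    if distance_m <= 0 or focal_mm <= 0 or sensor_mm <= 0:
        raise ValueError("distance, focal length and sensor size must be positive")
    return distance_m * sensor_mm / focal_mm


def forward_overlap(distance_m, focal_mm, sensor_mm, baseline_m):
    """Fraction of the footprint shared by two consecutive exposures."""
    if baseline_m < 0:
        raise ValueError("baseline must be non-negative")
    footprint = image_footprint_m(distance_m, focal_mm, sensor_mm)
    # No shared area once the camera has moved by a full footprint or more.
    return max(0.0, 1.0 - baseline_m / footprint)


def max_baseline_m(distance_m, focal_mm, sensor_mm, min_overlap=0.8):
    """Largest exposure spacing that still meets the required overlap."""
    if not 0.0 <= min_overlap < 1.0:
        raise ValueError("min_overlap must be in [0, 1)")
    footprint = image_footprint_m(distance_m, focal_mm, sensor_mm)
    return footprint * (1.0 - min_overlap)
```

With hypothetical values of a 24 mm sensor side, a 35 mm focal length, and a 35 m stand-off distance, the footprint is 24 m, so an overlap of at least 80% requires exposures spaced no more than about 4.8 m apart. Wind drift or skipped frames that enlarge this spacing lower the overlap directly.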
Comment 4:The comparison is limited to oblique photogrammetry. Other advanced neural rendering methods (such as NeRF variants and Instant-NGP) are mentioned in the introduction but not included in the experimental benchmarking. A detailed comparison should be provided.
Response 4:We sincerely thank the reviewer for this thoughtful comment and fully acknowledge the importance of recent neural rendering approaches, such as NeRF variants and Instant-NGP, in multi-view 3D reconstruction. We do not exclude the relevance or mainstream status of neural radiance field–based methods in this domain, and their potential applicability to multi-view reconstruction is well recognized. In this study, however, the choice of comparison baselines was guided by a principled consideration of reconstruction objectives and representational characteristics, rather than an intention to avoid additional benchmarks. Traditional photogrammetry-based point cloud reconstruction and 3D Gaussian Splatting share a common strength in explicit geometric representation and fine structural detail preservation, which is particularly critical for transmission tower inspection scenarios involving slender components, lattice structures, and sharp geometric discontinuities. While NeRF-based methods are highly effective in view synthesis and achieve impressive photorealistic rendering, their implicit volumetric representations may exhibit limitations in faithfully preserving fine-scale geometric details, especially in large-scale outdoor scenes with complex, repetitive structures. Given that the primary goal of this work is to evaluate the capability of 3D Gaussian Splatting to capture and represent detailed structural geometry, we consider traditional photogrammetry to be a more appropriate and meaningful baseline for comparison in this context. Therefore, this study focuses on comparing 3D Gaussian Splatting with conventional photogrammetric reconstruction, highlighting their respective strengths and differences in geometric detail representation and reconstruction efficiency. This comparison also serves as a foundation for our ongoing and future research, where neural radiance field–based approaches may be systematically investigated as complementary solutions.
To clarify this rationale, we have revised the manuscript to explicitly discuss the relationship between 3D Gaussian Splatting, traditional photogrammetry, and NeRF-based methods, and to explain the motivation behind the selected experimental comparisons.
Before revision:
In recent years, neural rendering techniques—represented by Neural Radiance Fields (NeRF) [15-16]—have achieved high-quality scene reconstruction via implicit modeling. The core strength of neural rendering lies in its ability to learn from real-world data and generate realistic, controllable digital content, blurring the boundaries between traditional graphics pipelines and computer vision. It boasts powerful "inverse graphics" capabilities: a leap from 2D to 3D can be made with just a small number of multi-view 2D images, yielding high-fidelity 3D models. This has significant application value in fields such as cultural relic digitization, e-commerce, and street view mapping. Additionally, in building modeling, neural rendering better handles complex structures like hollow parts and weakly textured surfaces, enabling automated high-precision modeling. Tian et al. [17] proposed an attention-based NeRF model for power towers, improving sampling efficiency by focusing on key regions of the tower head. This reduced modeling time from 8 hours to 2 hours, though cable details remained blurry. Lu et al. [18] fused infrared and visible light images to train NeRF, achieving integrated "geometry + temperature" reconstruction to support thermal fault diagnosis—yet incorporating infrared images increased data acquisition costs.
Despite its broad prospects, neural rendering still faces severe challenges when transitioning from the lab to large-scale industrial applications. Computational resource and efficiency bottlenecks—specifically slow training and inference speeds—are a major hurdle. Early NeRF models required hours or even days to train a scene and seconds to minutes to render a high-resolution image, making them unfit for real-time interactive applications. For power tower reconstruction, original NeRF reconstructs scene radiance fields via voxel sampling: while it offers high texture fidelity, modeling a single tower takes several hours, resulting in poor real-time performance. Instant-NGP [19] shortens modeling time to 1 hour using hash grid acceleration, but slender structures (e.g., cables, angle steels) still appear blurry due to insufficient sampling density.
After revision:
In recent years, neural rendering techniques—represented by Neural Radiance Fields (NeRF) [15–16]—have achieved high-quality scene reconstruction through implicit modeling. The core strength of neural rendering lies in its ability to learn continuous scene representations from real-world data and to generate realistic, controllable novel views, bridging traditional computer graphics pipelines and computer vision. Owing to these characteristics, NeRF-based methods have demonstrated strong performance in applications such as cultural heritage digitization, e-commerce visualization, and large-scale street view modeling.
In the context of power infrastructure modeling, neural rendering has also attracted increasing attention. Tian et al. [17] proposed an attention-based NeRF framework for power towers, improving sampling efficiency by focusing on key regions of the tower head and reducing modeling time from 8 hours to 2 hours, although fine cable structures remained blurry. Lu et al. [18] further integrated infrared and visible images into NeRF training to achieve joint geometry–temperature reconstruction for thermal fault diagnosis, at the cost of increased data acquisition complexity and sensor requirements.
Despite these advances, neural rendering methods primarily emphasize photorealistic view synthesis through implicit volumetric representations, rather than explicit geometric modeling. This distinction becomes particularly relevant for transmission tower inspection, where the accurate representation of slender components, lattice structures, and sharp geometric discontinuities is critical. Moreover, the deployment of NeRF-based methods in large-scale outdoor inspection scenarios remains constrained by computational efficiency and practical considerations. Early NeRF models require hours to days for per-scene optimization, and even accelerated approaches such as Instant-NGP [19], while significantly reducing training time, still exhibit limitations in faithfully capturing fine structural details of thin elements such as cables and angle steels due to sampling sparsity. However, it should be noted that while neural rendering excels at photorealistic view synthesis, the primary focus of this study lies in explicit geometric representation and structural detail preservation for inspection-oriented applications.
Comment 5:While the manuscript claims to enhance efficiency, it fails to provide detailed metrics on GPU memory usage, inference latency, or scalability for large-scale deployment.
Response 5:We thank the reviewer for pointing out the need for a more detailed discussion regarding computational efficiency and scalability. We acknowledge that the current manuscript does not provide explicit system-level metrics such as GPU memory consumption, inference latency, or large-scale deployment benchmarks. In this study, efficiency improvements are primarily discussed in terms of reconstruction time rather than hardware-specific performance evaluation.
In terms of modeling efficiency, 3DGS demonstrates significant advantages, with an average reconstruction time of approximately 20 minutes per transmission tower. Compared to traditional oblique photogrammetry methods, this represents a reduction of over 50%, greatly enhancing the efficiency of 3D reconstruction and providing robust technical support for the rapid digitalization of power facilities.
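The two figures quoted above (about 20 minutes per tower and a reduction of over 50%) jointly imply a lower bound on the photogrammetry baseline time. The short sketch below only makes that arithmetic explicit. The function names are illustrative, and no baseline time beyond what follows from the two quoted figures is assumed.

```python
def implied_baseline_minutes(new_minutes, reduction):
    """Baseline time implied by a new time and a fractional reduction.

    reduction = (baseline - new) / baseline, so baseline = new / (1 - reduction).
    """
    if new_minutes <= 0:
        raise ValueError("new_minutes must be positive")
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return new_minutes / (1.0 - reduction)


def speedup_factor(reduction):
    """Speed-up factor equivalent to a fractional time reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# A reduction of exactly 50% at 20 minutes corresponds to a 40-minute baseline,
# so "over 50%" means the baseline took longer than 40 minutes per tower,
# which is a speed-up of more than 2x.
baseline_lower_bound = implied_baseline_minutes(20.0, 0.5)
```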
Comment 6:No sensitivity analysis was performed on the quality of the input data.
Response 6:We thank the reviewer for this valuable comment. In the revised manuscript, we have added a qualitative sensitivity analysis to discuss how variations in input data quality affect the proposed RGB-based 3D Gaussian Splatting reconstruction framework.
The performance of the proposed reconstruction framework is inherently influenced by the quality of the input RGB images. Several key data quality factors play a critical role in determining reconstruction stability and accuracy. Image resolution directly affects the level of geometric detail that can be preserved, particularly for slender structural components of transmission towers. Insufficient resolution may lead to incomplete or noisy Gaussian representations in fine-scale regions.
Image overlap and viewpoint diversity are especially critical for the SfM initialization stage. Reduced overlap or limited viewing angles can degrade feature matching robustness and lead to inaccurate camera pose estimation, which subsequently affects the convergence and quality of the 3D Gaussian Splatting optimization. Similarly, image blur caused by motion or defocus reduces feature repeatability and photometric consistency, further impacting reconstruction stability.
Despite these sensitivities, the proposed 3DGS-based framework exhibits a certain degree of tolerance to moderate variations in input data quality due to its multi-view optimization strategy and continuous scene representation. Nevertheless, extreme degradation in data quality, such as severely insufficient overlap or poor image sharpness, remains challenging for the current RGB-based pipeline. A systematic quantitative sensitivity analysis under controlled data degradation conditions will be explored in future work.
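The resolution sensitivity described above can be estimated before a flight from the ground sample distance (GSD), that is, the scene length covered by one pixel. The sketch below is a minimal illustration under a pinhole model with the camera facing the structure. The function names and all numeric values are hypothetical and are not taken from the manuscript.

```python
def ground_sample_distance_mm(distance_m, focal_mm, pixel_pitch_um):
    """Scene length covered by one pixel, in millimetres (pinhole model)."""
    if distance_m <= 0 or focal_mm <= 0 or pixel_pitch_um <= 0:
        raise ValueError("all inputs must be positive")
    distance_mm = distance_m * 1000.0
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    return distance_mm * pixel_pitch_mm / focal_mm


def pixels_across(member_width_mm, distance_m, focal_mm, pixel_pitch_um):
    """Number of pixels spanning a structural member of the given width."""
    gsd = ground_sample_distance_mm(distance_m, focal_mm, pixel_pitch_um)
    return member_width_mm / gsd


# Hypothetical example: a 40 mm wide angle-steel member imaged from 10 m
# with a 50 mm lens and 5 micrometre pixels.
example_pixels = pixels_across(40.0, 10.0, 50.0, 5.0)
```

In this hypothetical example the GSD is about 1 mm, so the member spans roughly 40 pixels. Doubling the stand-off distance halves that count, which is the mechanism by which slender components become under-sampled.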
Comment 7:The dataset is not publicly available, and the paper provides limited details on the optimization parameter settings for 3DGS.
Response 7:Thank you for your valuable feedback. We have considered your question and are providing the following response.
Our data was captured by our team using drones in Luoning for the State Grid project on power towers. As the project is still ongoing, we cannot disclose the data publicly. Once completed, we will release the data transparently. We appreciate your understanding.
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The changes made by the authors, including the clarification of abbreviations, the improvement of the introduction, and the expanded discussion of RGB analysis, environmental factors, and other relevant aspects, are very impressive and have significantly improved the quality of the manuscript. The explanation of the RGB-based 3D model performance is now much clearer and more useful for the engineering community.
I am satisfied with the revised version of the manuscript and recommend its acceptance for publication in Sustainability Journal.
Author Response
Dear Reviewers,
Thank you very much for your time involved in reviewing the manuscript and your very encouraging comments on the merits. We also appreciate your clear and detailed feedback and hope that the explanation has fully addressed all of your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.
To facilitate this discussion, we first retype your comments in italic font and then present our responses to the comments.
As you will see in the revised manuscript and in this document, major revisions are shown in blue text, and the former version is shown in red text.
Comment 1:The changes made by the authors, including the clarification of abbreviations, the improvement of the introduction, and the expanded discussion of RGB analysis, environmental factors, and other relevant aspects, are very impressive and have significantly improved the quality of the manuscript. The explanation of the RGB-based 3D model performance is now much clearer and more useful for the engineering community.
Response 1:We sincerely thank the reviewer for the positive and encouraging feedback. We are pleased that the revisions addressing abbreviation clarification, introduction refinement, and expanded discussions on RGB-based analysis and environmental factors have effectively improved the clarity and technical value of the manuscript. We greatly appreciate the reviewer’s recognition of the improved presentation and engineering relevance of the RGB-based 3D modeling framework. These constructive comments have been highly motivating and helpful in strengthening the overall quality of our work.
Comment 2:I am satisfied with the revised version of the manuscript and recommend its acceptance for publication in Sustainability Journal.
Response 2:We sincerely thank the reviewer for the positive evaluation and recommendation. We greatly appreciate the reviewer’s time and effort in assessing the revised manuscript, and we are pleased that the revisions have met the expectations and standards of Sustainability.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
1-The discussion now explicitly mentions dynamic factors (e.g., wind-induced oscillations, fog, rain) and imaging assumptions (≥80% overlap, clear weather). However, these remain theoretical considerations. No empirical tests or simulations under adverse conditions were conducted, leaving practical robustness unverified. If more tests are not applicable, please add this topic to the discussion.
2-The revision adds reconstruction time (~20 minutes per tower) but omits GPU memory usage, inference latency, and scalability benchmarks. These metrics are critical for assessing deployment feasibility in large-scale industrial settings.
3-The manuscript provides extensive algorithmic detail but offers limited discussion on integration into operational workflows.
4-While Figures 7–9 illustrate visual differences, the evaluation relies heavily on qualitative observations. Additional metrics (e.g., reconstruction completeness percentage, occlusion handling scores) would provide stronger evidence.
5-Recommended references to be added:
a-Gaussian Splatting for Automated Video-to-3D Building Energy Modeling
b-Empowering the grid: A comprehensive review of artificial intelligence techniques in smart grids
c-Performance Evaluation and Optimization of 3D Gaussian Splatting in Indoor Scene Generation and Rendering
6-Future Work Section Could Be More Specific
Author Response
Dear Reviewers,
Thank you very much for your time involved in reviewing the manuscript and your very encouraging comments on the merits. We also appreciate your clear and detailed feedback and hope that the explanation has fully addressed all of your concerns. In the remainder of this letter, we discuss each of your comments individually along with our corresponding responses.
To facilitate this discussion, we first retype your comments in italic font and then present our responses to the comments.
As you will see in the revised manuscript and in this document, major revisions are shown in blue text, and the former version is shown in red text.
Comment 1:The discussion now explicitly mentions dynamic factors (e.g., wind-induced oscillations, fog, rain) and imaging assumptions (≥80% overlap, clear weather). However, these remain theoretical considerations. No empirical tests or simulations under adverse conditions were conducted, leaving practical robustness unverified. If more tests are not applicable, please add this topic to the discussion.
Response 1:We thank the reviewer for this insightful comment. We fully agree that empirical evaluation under adverse environmental conditions would provide valuable evidence regarding the robustness of the proposed method. In the current study, our primary objective is to investigate the feasibility and effectiveness of a 3D Gaussian Splatting–based reconstruction framework for power transmission towers under typical inspection conditions, which are generally planned and executed under relatively stable weather and imaging environments in real-world power grid operations. We have revised the manuscript to clearly frame these factors as limitations of the current study and to outline future research directions, including motion-aware modeling, uncertainty-aware pose optimization, and multi-sensor fusion, which could improve robustness under challenging environmental conditions. We believe this clarification appropriately delineates the scope of the present work while maintaining methodological rigor.
The present study focuses on validating the effectiveness of 3DGS for power tower reconstruction under typical UAV-based inspection conditions, which are commonly scheduled under relatively stable weather and imaging environments in practical power grid operations.
Before revision:
To address these gaps, future work should incorporate multi-sensor fusion strategies to enhance the robustness of data collection and the method’s adaptability to complex, dynamic scenes.
After revision:
Dynamic factors such as cable vibration, wind-induced oscillation, and short-term environmental changes are not explicitly modeled in the current framework. These factors are therefore discussed at a conceptual level, and their quantitative impact on reconstruction accuracy remains to be systematically evaluated in future work.
Comment 2:The revision adds reconstruction time (~20 minutes per tower) but omits GPU memory usage, inference latency, and scalability benchmarks. These metrics are critical for assessing deployment feasibility in large-scale industrial settings.
Response 2:We thank the reviewer for this important comment. We agree that GPU memory usage, inference latency, and large-scale scalability are highly relevant metrics for system-level deployment in industrial scenarios. In the current study, however, our primary focus is on evaluating the reconstruction efficiency and geometric representation capability of 3D Gaussian Splatting for power transmission tower modeling, rather than on low-level hardware optimization or large-scale system benchmarking. Accordingly, the reported reconstruction time per tower is intended to provide a practical reference for workflow-level efficiency, rather than a comprehensive performance profiling of GPU resource consumption. Detailed measurements of GPU memory usage and inference latency were not systematically recorded, as the proposed framework was not specifically optimized or stress-tested for large-scale parallel deployment in this stage of the research. To address this limitation, we have revised the Discussion and Future Work section to explicitly acknowledge that GPU-level efficiency metrics and scalability benchmarks are important considerations for industrial deployment, and that these aspects will be investigated in future work, including memory-efficient Gaussian representations, batch-level optimization, and distributed reconstruction strategies. We believe this clarification more accurately defines the scope of the present study while avoiding overinterpretation of the reported results.
Although reconstruction time is reported to provide a practical reference for workflow efficiency, the present study does not conduct a detailed analysis of GPU memory consumption, inference latency, or large-scale scalability. These system-level performance metrics are critical for industrial deployment but are beyond the primary scope of this work, which focuses on reconstruction quality and inspection-oriented applicability. Future research will investigate memory-efficient Gaussian representations, parallel processing strategies, and scalability benchmarking to support large-scale deployment scenarios.
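As an illustration of how the inference-latency metric mentioned above could be collected in future work, the sketch below times repeated calls to a rendering routine. The callable `render_fn` is a hypothetical stand-in for whatever renders one view of a trained model. It is not part of the authors' implementation, and the sketch does not measure GPU memory.

```python
import statistics
import time


def measure_latency_ms(render_fn, warmup=3, runs=10):
    """Time repeated calls to render_fn and summarise them in milliseconds.

    render_fn is a hypothetical zero-argument callable. GPU work is usually
    launched asynchronously, so the callable must block until the frame is
    finished; otherwise only the launch overhead is timed.
    """
    if runs < 1 or warmup < 0:
        raise ValueError("runs must be >= 1 and warmup must be >= 0")
    # Warm-up calls absorb one-off costs such as caching and lazy setup.
    for _ in range(warmup):
        render_fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        render_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Reporting the median together with the maximum, rather than a single timing, makes occasional stalls visible, which matters when per-view latency is used to judge interactive inspection.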
Comment 3:The manuscript provides extensive algorithmic detail but offers limited discussion on integration into operational workflows.
Response 3:We thank the reviewer for this valuable comment. We agree that clarifying how the proposed method fits into practical operational workflows is important for real-world applicability. In the revised manuscript, we have expanded the discussion to explicitly describe how the proposed 3D Gaussian Splatting–based reconstruction framework can be integrated into a typical UAV-based power transmission tower inspection workflow.
From an operational perspective, the proposed framework can be integrated into existing UAV-based power tower inspection workflows with minimal modification. After standard UAV data acquisition under planned inspection conditions, the captured RGB images can be processed using conventional SfM pipelines for camera pose estimation. The resulting poses and images are then directly used for 3DGS optimization, replacing the dense reconstruction, meshing, and texture mapping stages commonly required in traditional photogrammetry. The reconstructed 3D models can be readily visualized and inspected, supporting tasks such as structural assessment, condition documentation, and digital asset management.
The proposed method is designed as a reconstruction module that can be seamlessly embedded into standard UAV-based inspection workflows, following conventional data acquisition and camera pose estimation steps.
Comment 4:While Figures 7–9 illustrate visual differences, the evaluation relies heavily on qualitative observations. Additional metrics (e.g., reconstruction completeness percentage, occlusion handling scores) would provide stronger evidence.
Response 4:We thank the reviewer for this constructive suggestion. We agree that quantitative metrics can provide complementary evidence for evaluating reconstruction performance. However, in the context of real-world transmission tower reconstruction, defining reliable and objective quantitative indicators such as reconstruction completeness or occlusion handling remains challenging due to the absence of accurate ground-truth 3D models and the highly complex, slender, and repetitive structural characteristics of the towers. In the present study, the evaluation therefore emphasizes qualitative visual comparison, which is commonly adopted in infrastructure inspection scenarios to assess structural completeness, continuity of slender components, and visual interpretability for inspection purposes. Rather than introducing potentially subjective or proxy quantitative metrics without reliable ground truth, we focus on visual evidence that directly reflects inspection-oriented requirements. To address the reviewer’s concern, we have expanded the Discussion section to more explicitly explain the rationale behind the qualitative evaluation and to discuss potential quantitative metrics, such as completeness ratios and occlusion-aware scores, as important directions for future work when suitable ground-truth data or standardized benchmarks become available. We believe this clarification better reflects the practical constraints of real-world inspection scenarios while maintaining methodological rigor.
It should be noted that the evaluation in this study primarily relies on qualitative visual comparison. For real-world transmission tower scenes, obtaining accurate ground-truth 3D models for defining objective metrics such as reconstruction completeness or occlusion handling scores is non-trivial. Given the complex geometry and slender structural elements of transmission towers, qualitative assessment remains a practical and widely adopted approach for judging structural integrity and inspection suitability. Nevertheless, the development of standardized quantitative metrics and benchmark datasets is an important direction for future work.
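For reference, one common way to define the completeness ratio mentioned above is the fraction of reference points that have a reconstructed point within a distance tolerance. The sketch below assumes that a reference point set is available, which, as noted, is not the case for the towers in this study. The function name, the tolerance, and the brute-force search are illustrative choices rather than the authors' evaluation protocol.

```python
import math


def completeness_ratio(reference_pts, reconstructed_pts, tol):
    """Fraction of reference points within tol of any reconstructed point.

    Points are (x, y, z) tuples in the same units as tol. The brute-force
    search is O(N * M) and is only meant for small illustrative inputs.
    """
    if not reference_pts:
        raise ValueError("reference_pts must not be empty")
    if tol < 0:
        raise ValueError("tol must be non-negative")
    covered = 0
    for ref in reference_pts:
        if any(math.dist(ref, rec) <= tol for rec in reconstructed_pts):
            covered += 1
    return covered / len(reference_pts)


# Toy example: four reference points along a member, two of them recovered.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.01), (1.0, 0.02, 0.0)]
toy_ratio = completeness_ratio(reference, reconstructed, tol=0.05)
```

For point clouds of realistic size, the inner search would normally be replaced by a spatial index such as a k-d tree. The definition of the ratio stays the same.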
Comment 5:Recommended references to be added:
a-Gaussian Splatting for Automated Video-to-3D Building Energy Modeling
b-Empowering the grid: A comprehensive review of artificial intelligence techniques in smart grids
c-Performance Evaluation and Optimization of 3D Gaussian Splatting in Indoor Scene Generation and Rendering
Response 5:We thank the reviewer for these valuable suggestions. We agree that the recommended references are highly relevant and help better position our work within the broader context of 3D Gaussian Splatting applications and intelligent infrastructure systems. In the revised manuscript, we have incorporated these references at appropriate locations.
From the perspective of power system operation and maintenance, artificial intelligence techniques have been widely recognized as key enablers of smart grid development. Comprehensive reviews on AI-driven smart grid technologies emphasize the growing demand for intelligent perception, inspection, and decision-support systems, motivating the need for efficient and accurate 3D reconstruction methods for power infrastructure [3].
Recent studies have demonstrated the applicability of 3DGS beyond conventional scene reconstruction. For example, automated video-to-3D building energy modeling based on Gaussian Splatting has shown promising potential for large-scale built environment analysis, indicating that explicit Gaussian representations can support not only geometric reconstruction but also downstream infrastructure-related applications [22].
Previous studies have also investigated the performance characteristics and optimization strategies of 3DGS in indoor scene generation and rendering. These works provide valuable insights into efficiency–quality trade-offs, which complement our analysis in outdoor transmission tower scenarios [26].
Elkholy, Marwa, et al. Empowering the grid: A comprehensive review of artificial intelligence techniques in smart grids. 2024 International Telecommunications Conference (ITC-Egypt). IEEE, 2024.
Chowdhury, Soumyadeep, et al. Gaussian Splatting for Automated Video-to-3D Building Energy Modeling. Available at SSRN 5149707 (2025).
Fang, X., Zhang, Y., Tan, H., et al. Performance Evaluation and Optimization of 3D Gaussian Splatting in Indoor Scene Generation and Rendering. ISPRS International Journal of Geo-Information, 14(1), 21 (2025).
Comment 6:Future Work Section Could Be More Specific
Response 6:We appreciate this valuable suggestion. We acknowledge that a more detailed and structured description of future research would help clarify the potential directions for expanding this study. In the revised manuscript, we have expanded the section on Future Work.
Before revision:
Future work will explore the integration of polarization cues into the 3DGS framework to enhance robustness under challenging illumination conditions and improve material-aware reconstruction.
Overall, this study demonstrates that 3DGS provides an effective and scalable foundation for high-quality 3D reconstruction of power transmission towers, and future extensions will focus on enhancing robustness, generalization, and applicability in complex real-world environments.
After revision:
Future research will further explore how the proposed reconstruction framework can be integrated into digital twin-based asset management systems to support the long-term sustainable operation of transmission infrastructure. In parallel, efforts will be made from multiple perspectives to enhance the applicability of the 3DGS framework. First, although this study is based on static tower scenarios and ideal imaging conditions, extending the method to address dynamic factors such as wind-induced oscillations, fog, rain, and other adverse weather conditions remains a critical research direction. This may require the adoption of motion-aware modeling or robustness-enhancing optimization strategies.
Second, future studies will explore the incorporation of quantitative evaluation metrics, such as reconstruction completeness and occlusion-aware scores, once reliable ground-truth models or standardized benchmark datasets for transmission towers become available. Such metrics would enable more comprehensive and objective performance assessment.
Third, improving scalability and deployment efficiency for large-scale inspection scenarios will be investigated, including memory-efficient optimization strategies and incremental reconstruction pipelines suitable for continuous inspection tasks.
Finally, the integration of multimodal data, such as depth cues or polarization information, will be explored to further improve reconstruction robustness and inspection reliability under challenging environmental conditions.

