1. Introduction
Power towers serve as the backbone infrastructure of power transmission networks, and their safe operation and maintenance rely heavily on high-precision 3D models. According to statistics from the National Energy Administration, China’s total power transmission line mileage exceeded 1.2 million kilometers in 2023, with the number of power towers surpassing 5 million [1]. With the rapid expansion of transmission networks, traditional manual inspection and coarse geometric modeling methods can no longer meet the requirements of efficiency, accuracy, and scalability in large-scale power grid management. In conventional photogrammetric modeling, the simplification and smoothing applied during meshing can damage or blur complex structures such as power lines and tree leaves. Additionally, texture mapping that relies on mesh UV unwrapping often causes stretching, seams, and ghosting in intricate regions—exacerbated by light-shadow variations—which frequently leads to inconsistent textures. Consequently, the final model often suffers from defects such as holes and distortions, and its enormous polygon count imposes a heavy burden on real-time rendering and network transmission, making it difficult to meet the demands of smart grids [2]. From the perspective of power system operation and maintenance, artificial intelligence techniques have been widely recognized as key enablers of smart grid development. Comprehensive reviews on AI-driven smart grid technologies emphasize the growing demand for intelligent perception, inspection, and decision-support systems, motivating the need for efficient and accurate 3D reconstruction methods for power infrastructure [3].
Currently, the 3D reconstruction technologies for power towers primarily encompass the following three methods, as presented in Table 1:
Reconstruction methods based on oblique photogrammetry generate 3D models from multi-view RGB images captured by UAVs [4,5]. They offer advantages such as low cost and wide coverage, but still face insurmountable challenges in power tower reconstruction:
First, they have insufficient detail resolution: limited by UAV flight altitude (usually ≥50 m) and camera resolution, fine details—including the edges of angle steels on the tower body and the strand structure of power lines—cannot be restored. Second, they have poor geometric accuracy: relying on Structure from Motion (SfM) to estimate camera poses [6], weakly textured areas of power towers (e.g., smooth metal components) easily induce pose drift, causing the overall model error to often exceed 10 mm. Third, they handle occlusion poorly: the mutual occlusion rate between insulators and conductors at the tower top exceeds 40%, and oblique photogrammetry fails to infer the geometry of occluded regions, resulting in incomplete models. Additionally, this method requires a relatively large number of input images.
To address these limitations, research has focused on improving accuracy and detail. Flight parameter optimization: Zhong et al. [7] determined via orthogonal experiments that a UAV flight altitude of 30 m and a sidelap rate of 80% yield optimal model accuracy (Mean Absolute Error, MAE = 5.2 mm); however, these parameters are only effective for plain areas and degrade in mountainous terrain due to increased occlusion. Enhanced multi-view stereo (MVS) matching: He et al. [8] proposed an edge-constrained MVS algorithm to improve the matching robustness of angle steel edges, reducing edge error by 20%; nevertheless, millimeter-level details remain unrecoverable. Multi-modal data fusion: Zhang et al. [9] fused oblique photogrammetry images with lightweight radar data, using radar depth information to assist SfM pose estimation and lowering the overall MAE to 4.1 mm; however, integrating radar data increases equipment complexity. Overall, oblique photogrammetry cannot resolve the “resolution-flight altitude” tradeoff, and the reconstruction of thin structures (e.g., fine cables) and heavily occluded areas remains a persistent bottleneck.
LiDAR-based reconstruction methods acquire point clouds via laser ranging and can achieve millimeter-level precision, making them one of the mainstream technologies for power tower modeling [10,11,12]. However, this technique faces three major pain points:
High equipment costs, complex data processing workflows, and significant operational risks—particularly in environments with severe electromagnetic interference or complex terrain. Additionally, when capturing slender metal components such as transmission lines, LiDAR point clouds often encounter issues of data sparsity and reflection loss.
To address these issues, research has primarily focused on data preprocessing and modeling optimization. Point cloud denoising and completion: Li et al. [13] proposed an improved RANSAC-based power tower point cloud segmentation algorithm, which removes noise points using the linear features of angle steels and achieves a segmentation accuracy of 89%. Xing et al. [14] tackled point cloud voids by leveraging tower material symmetry to complete missing regions, with a completion error of <3 mm; however, this method is only applicable to regular symmetric structures. UAV LiDAR scanning: Chen et al. [15] used tethered UAVs carrying LiDAR scanners to enable autonomous inspection and point cloud acquisition via integrated Global Positioning System (GPS) + Inertial Measurement Unit (IMU) positioning, enhancing operational safety; nevertheless, equipment costs exceed 1 million CNY, hindering large-scale deployment. The core limitations of LiDAR-based methods persist: insufficient point cloud density for slender structures, data loss from metal reflection, and unacceptably high costs and operational risks.
In recent years, neural rendering techniques—represented by Neural Radiance Fields (NeRF) [16,17]—have achieved high-quality scene reconstruction through implicit modeling. The core strength of neural rendering lies in its ability to learn continuous scene representations from real-world data and to generate realistic, controllable novel views, bridging traditional computer graphics pipelines and computer vision. Owing to these characteristics, NeRF-based methods have demonstrated strong performance in applications such as cultural heritage digitization, e-commerce visualization, and large-scale street view modeling.
In the context of power infrastructure modeling, neural rendering has also attracted increasing attention. Tian et al. [18] proposed an attention-based NeRF framework for power towers, improving sampling efficiency by focusing on key regions of the tower head and reducing modeling time from 8 h to 2 h, although fine cable structures remained blurry. Lu et al. [19] further integrated infrared and visible images into NeRF training to achieve joint geometry–temperature reconstruction for thermal fault diagnosis, at the cost of increased data acquisition complexity and sensor requirements.
Despite these advances, neural rendering methods primarily emphasize photorealistic view synthesis through implicit volumetric representations rather than explicit geometric modeling. This distinction becomes particularly relevant for transmission tower inspection, where the accurate representation of slender components, lattice structures, and sharp geometric discontinuities is critical. Moreover, the deployment of NeRF-based methods in large-scale outdoor inspection scenarios remains constrained by computational efficiency and practical considerations. Early NeRF models require hours to days for per-scene optimization, and even accelerated approaches such as Instant-NGP [20], while significantly reducing training time, still struggle to faithfully capture fine structural details of thin elements such as cables and angle steels due to sampling sparsity. In short, while neural rendering excels at photorealistic view synthesis, the primary focus of this study lies in explicit geometric representation and structural detail preservation for inspection-oriented applications.
The aforementioned 3D reconstruction methods suffer from three persistent issues that this work seeks to address:
- (1)
Pose estimation drift frequently occurs in weakly textured areas, leading to substantial model errors; meanwhile, these methods fail to reliably infer geometry in highly occluded regions, resulting in incomplete reconstructions.
- (2)
Techniques requiring close-proximity data acquisition are susceptible to interference from strong electric fields and entail high operational risks in complex terrains (e.g., mountainous areas, river crossings); furthermore, expensive equipment and time-consuming, labor-intensive acquisition and processing workflows limit their scalability for large-scale applications.
- (3)
Slow training and rendering speeds hinder compliance with real-time interactive application requirements, constituting a major barrier to transitioning these methods from laboratory research to industrial deployment.
Introduced by Kerbl et al. [21] in 2023, 3DGS is a groundbreaking explicit neural rendering framework. By representing scenes with learnable 3D Gaussian primitives and supporting efficient differentiable rasterization, it offers a promising alternative to traditional methods. Compared with implicit neural representations, 3DGS achieves significantly faster training and real-time rendering while preserving high-frequency geometric details and visual features. Recent studies have demonstrated the applicability of 3DGS beyond conventional scene reconstruction. For example, automated video-to-3D building energy modeling based on Gaussian Splatting has shown promising potential for large-scale built environment analysis, indicating that explicit Gaussian representations can support not only geometric reconstruction but also downstream infrastructure-related applications [22]. Notable applications include the following: a joint team from Tsinghua University and the Beijing Institute of Technology applied 3DGS to the digitalization of cultural heritage in Vehicle-to-Everything scenarios; using decomposed Gaussian splatting, they separated static backgrounds (e.g., ancient buildings) from dynamic elements (e.g., pedestrians, vehicles), enabling the generation of large-scale collaborative cultural heritage datasets. The Intelligent Perception Team at Jinan University proposed a robust and efficient 3DGS method targeting the reconstruction of large-scale urban scenes (e.g., city streets, building complexes) [23].
From a sustainability perspective, efficient and reliable inspection of power transmission infrastructure plays a critical role in supporting long-term asset management, energy security, and environmental responsibility. Transmission towers are widely distributed and typically located in complex terrains, making frequent manual inspections time-consuming, energy-intensive, and costly. Inefficient inspection workflows not only increase operational expenditures but also lead to unnecessary resource consumption and carbon emissions associated with repeated field surveys.
Digital reconstruction and visualization technologies provide a promising pathway toward more sustainable infrastructure management by enabling condition assessment, preventive maintenance, and lifecycle-oriented decision-making in a virtual environment. In particular, high-efficiency 3D modeling techniques can significantly reduce on-site inspection frequency, improve fault detection accuracy, and support data-driven maintenance strategies, thereby extending the service life of power transmission assets. Within this context, developing fast, accurate, and scalable 3D reconstruction methods based on lightweight data acquisition platforms, such as UAV-based RGB imaging, is of great significance for sustainable power grid operation. By improving reconstruction efficiency and reducing computational and operational overhead, such methods contribute to resource-efficient infrastructure monitoring and support the broader goals of sustainable development in smart grid systems.
Despite the potential of 3DGS for efficient real-time rendering and high realism in static scene reconstruction, its adaptability to large-scale, unstructured open scenes—such as power grid environments—remains underexplored. To address this gap, this paper proposes a collaborative acquisition and reconstruction scheme that integrates 3DGS with UAV oblique photogrammetry, incorporating 3DGS’s efficient scene representation capabilities into the 3D modeling workflow of complex open scenes such as power grids. Specifically, by optimizing structured circumferential UAV flight paths, we acquire high-resolution RGB images. In addition to conventional RGB imagery, polarization information has been reported to provide complementary cues for material perception and illumination analysis; however, the present study focuses on an RGB-based reconstruction pipeline, and the integration of polarization cues is left for future investigation. This approach enables high-precision, full-coverage data capture of the power tower’s main structure, key components (e.g., tower material connection nodes, insulator strings), and surrounding environment. Furthermore, we leverage 3DGS to perform joint 3D scene reconstruction of the acquired data, aiming to resolve the longstanding issues of insufficient reconstruction accuracy and weak dynamic adaptability that plague traditional methods in open scenes [24]. Compared with traditional photogrammetry, the proposed method demonstrates clear advantages in efficiency and reconstruction quality.
2. Research Methodology
3DGS achieves continuous and seamless representation of complex scenes by “splatting” discrete point clouds into 3D space in the form of Gaussian distributions [25], and it can synthesize realistic scene views from arbitrary viewpoints. This splatting and fusion mechanism based on Gaussian point clouds not only constructs highly visually realistic 3D environments but also exhibits strong independence from lighting variations, making it suitable for various real-world applications. For the 3D model reconstruction of power towers, this study adopts a systematic technical workflow of “UAV data acquisition → data preprocessing → 3DGS reconstruction,” as illustrated in Figure 1.
- (1)
Multi-view Image Data Acquisition: A UAV equipped with an oblique photogrammetry system is used to perform multi-angle (vertical + oblique) image acquisition over the target scene, yielding a raw image dataset covering the study area. Oblique photogrammetry compensates for the texture blind spots of vertical observations by enriching viewing angles, providing multi-view geometric constraints and high-redundancy texture information for subsequent 3D reconstruction.
- (2)
Data Preprocessing: Filtered images undergo feature extraction and matching. Using the Structure from Motion technique, precise camera poses and a sparse point cloud are recovered, providing reliable geometric initial values and spatial position information for the 3D Gaussian representation. Subsequently, Bundle Adjustment is applied to optimize overall consistency, ensuring high-quality input for subsequent 3DGS processing.
- (3)
Three-dimensional Gaussian Reconstruction: Using the sparse point cloud generated by SfM as the geometric carrier, a 3D Gaussian ellipsoid is initialized for each 3D point within it. Through multi-round iterative optimization involving differentiable rendering and adaptive density control, attributes such as position, covariance, color, and transparency are gradually refined. This process ultimately produces a high-fidelity, structurally complete 3D point cloud model of the power tower.
2.1. UAV Data Acquisition and Preprocessing
In response to the inherent challenges of power towers—including high-altitude operation complexity, slender structural geometry, and severe self-occlusion—this paper proposes a collaborative acquisition scheme integrating UAV aerial photography and oblique photogrammetry. A full-frame RGB camera (Sony 7RM3A, 42 million effective pixels) equipped with a 16–28 mm wide-angle lens was selected as the core sensor to ensure high-fidelity imaging capability.
During data acquisition, the UAV was operated at a relatively stable flight altitude to ensure sufficient coverage of the transmission tower while maintaining adequate image resolution. The onboard camera was oriented with an oblique viewing angle to capture both the vertical structure and lateral details of the tower, thereby reducing occlusions and improving multi-view visibility of slender components. The flight speed was controlled to avoid motion blur and to maintain consistent image overlap between consecutive frames.
During data collection, the UAV operated at an altitude of 50–150 m at a constant speed of 6 m/s. Utilizing multi-height, 360-degree circular scanning, we established a three-tiered acquisition scheme: tower body scanning to cover the main structure of the transmission tower; top-focused acquisition to precisely capture fine components such as bolted connections and insulator clamps; and transmission line extension mapping to capture the geometric configuration of overhead conductors and guy wires. Notably, we maintained an adjacent image overlap rate of ≥80%, ensuring not only blind-spot-free coverage of the tower body and surrounding areas but also consistency and integrity between global (whole-tower) and local (component-level) data.
These flight parameters were selected to achieve high image overlap in both along-track and cross-track directions, which is critical for robust feature matching and accurate camera pose estimation in the Structure from Motion (SfM) process. Sufficient overlap increases the number of shared visual features across views, thereby enhancing the stability of bundle adjustment and reducing reconstruction drift.
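To make the relationship between flight altitude, focal length, and attainable ground detail concrete, the following sketch computes the ground sample distance (GSD) and the along-track exposure spacing implied by a given overlap. The sensor dimensions and pixel counts are assumed values for a full-frame 42 MP camera, not measurements reported in this study.

```python
# Illustrative GSD arithmetic for the flight parameters above. Sensor
# dimensions and pixel counts are assumed values for a full-frame 42 MP
# camera, not measurements reported in this study.

def gsd_m_per_px(altitude_m, focal_mm, sensor_width_mm=35.9, image_width_px=7952):
    """Ground sample distance (meters per pixel) for a nadir view."""
    return altitude_m * sensor_width_mm / (focal_mm * image_width_px)

def exposure_spacing_m(altitude_m, focal_mm, overlap=0.8, sensor_height_mm=24.0):
    """Maximum along-track distance between exposures for a given overlap."""
    footprint_m = altitude_m * sensor_height_mm / focal_mm  # ground footprint
    return footprint_m * (1.0 - overlap)

if __name__ == "__main__":
    for h in (50, 100, 150):
        print(f"altitude {h:>3} m: GSD@16mm = {gsd_m_per_px(h, 16) * 1000:.1f} mm/px, "
              f"spacing@80% overlap = {exposure_spacing_m(h, 16):.1f} m")
```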
Data acquisition was conducted under relatively favorable environmental conditions, including stable illumination and low wind speed, to minimize image degradation and platform vibration; all flights took place on clear, sunny days. Such conditions help ensure image sharpness and consistent appearance across views, which further contributes to reliable SfM reconstruction and stable initialization for subsequent 3D Gaussian Splatting optimization.
Although polarization data were synchronously captured during UAV flights, the current reconstruction pipeline relies primarily on RGB images, and polarization information is not directly involved in the SfM or 3DGS optimization stages.
The inputs are multi-view sequential images of power towers captured by UAVs. Images containing both the tower structure and power lines—with an overlap rate exceeding 80%—are filtered to ensure sufficient parallax and full coverage of all angles and details of the target. After processing these images with Structure from Motion, two key outputs are generated: a .ply point cloud file containing tens of thousands of 3D points with XYZ coordinates and RGB values, which represents the sparse geometry and appearance of the power tower, and a parameter file (cameras.json) that records the intrinsic parameters (image size, focal length) and extrinsic parameters (position, orientation) of each image. The latter provides a precise imaging-geometry basis for subsequent dense reconstruction and novel view synthesis.
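As an illustration, the two outputs can be inspected with a short script such as the following; the file paths are hypothetical, and the cameras.json field names are assumed to follow the reference 3DGS implementation.

```python
# Minimal sketch for inspecting the two SfM outputs described above.
# File paths are hypothetical; the cameras.json field names are assumed to
# follow the reference 3DGS implementation and should be verified.
import json
from plyfile import PlyData  # pip install plyfile

ply = PlyData.read("sparse_points.ply")
vertices = ply["vertex"]
print(f"{vertices.count} sparse points")
print("first point XYZ:", vertices["x"][0], vertices["y"][0], vertices["z"][0],
      "RGB:", vertices["red"][0], vertices["green"][0], vertices["blue"][0])

with open("cameras.json") as f:
    cameras = json.load(f)
for cam in cameras[:3]:
    # intrinsics (image size, focal length) and extrinsics (position, rotation)
    print(cam.get("img_name"), cam.get("width"), cam.get("height"),
          cam.get("fx"), cam.get("fy"), cam.get("position"))
```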
Preprocessing, a critical step in 3D reconstruction, directly governs the accuracy of both SfM and the final 3D model. The workflow commences with rigorous screening of multi-view images to ensure they are sharp, exhibit consistent exposure, and maintain an overlap rate exceeding 60%, a prerequisite for reliable feature matching. Subsequently, key points and their feature descriptors are extracted using algorithms such as the Scale-Invariant Feature Transform (SIFT). Image correspondences are then established via either exhaustive matching or a vocabulary-tree approach. During the sparse reconstruction phase, geometric verification with Random Sample Consensus (RANSAC) eliminates false matches; incremental reconstruction initializes from the optimal image pair and gradually expands the point cloud; and bundle adjustment jointly optimizes 3D point positions and camera parameters. The final outputs, the camera intrinsic/extrinsic parameters and the sparse point cloud, deliver a precise geometric foundation for subsequent dense modeling.
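A condensed sketch of this preprocessing pipeline using pycolmap (COLMAP’s Python bindings) is given below; the function names reflect the pycolmap interface as we understand it and should be checked against the installed version, and the paths are placeholders.

```python
# Condensed sketch of the preprocessing pipeline using pycolmap; verify
# function names against the installed pycolmap version.
from pathlib import Path
import pycolmap

image_dir = Path("images")          # screened multi-view images
work_dir = Path("colmap_out")
work_dir.mkdir(exist_ok=True)
database = work_dir / "database.db"

pycolmap.extract_features(database, image_dir)   # SIFT keypoints + descriptors
pycolmap.match_exhaustive(database)              # exhaustive pairwise matching
# Incremental mapping: RANSAC-verified two-view geometry, triangulation,
# and bundle adjustment, returning one or more sparse reconstructions.
maps = pycolmap.incremental_mapping(database, image_dir, work_dir)
reconstruction = maps[0]
print(f"{reconstruction.num_reg_images()} registered images, "
      f"{reconstruction.num_points3D()} sparse points")
reconstruction.export_PLY(str(work_dir / "sparse_points.ply"))
```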
2.2. 3DGS Scene Reconstruction
Based on the SfM-derived sparse point cloud and camera poses, a 3D Gaussian Splatting pipeline is adopted as the core reconstruction framework, with adaptations tailored to transmission tower structures and UAV-based inspection scenarios.
The overall processing flow and core architecture of 3DGS comprise six key modules: SfM Point Cloud Initialization, 3D Gaussian Ellipsoid Set Initialization, 3D Ellipsoid Parameter Projection, Rasterized Image Rendering, Loss Calculation, and Adaptive Density Control. Together they form a typical multi-iteration, differentiable process of forward rendering and backward optimization.
Specifically, first, the SfM technique recovers a sparse 3D point cloud and the precise camera poses from all input views, providing an initial geometric estimate of the scene. Subsequently, these initial 3D points are converted into a set of learnable Gaussian ellipsoid primitives: each point is expanded into an independent 3D Gaussian distribution, endowed with attribute parameters such as spatial position, covariance, color, and opacity. These Gaussian distributions then enter a differentiable rasterization rendering pipeline, where their parameters are continuously optimized via gradient descent to minimize the discrepancy between synthetic and real images, ultimately reconstructing the original scene’s geometry and appearance with high fidelity. The 3DGS modeling pipeline is illustrated in Figure 2.
- (1)
SfM Point Cloud Initialization
The primary step in 3D geometric scene reconstruction is recovering the initial sparse geometric structure from multi-view 2D images. This study employs the Structure from Motion (SfM) technique, which achieves accurate recovery of the scene’s sparse 3D point cloud and corresponding camera poses by performing cross-view feature association and solving geometric constraints on sequential images. As a classic multi-view geometry-based 3D reconstruction method, SfM’s core pipeline can be decomposed into the following key stages:
① Feature Extraction and Cross-View Matching
First, for the input multi-view image sequences (acquired via UAV oblique photogrammetry in this study, comprising vertical + oblique images), we employ the SuperPoint deep learning feature extractor and SuperGlue graph neural network matcher to extract local image features and establish cross-view feature correspondences. Compared to traditional SIFT/SURF algorithms, this combination significantly enhances the robustness of feature matching in low-texture areas—such as the metal surfaces of power towers—while effectively reducing the interference of false matches in subsequent geometric reconstruction.
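As a hedged illustration, this SuperPoint + SuperGlue combination can be driven through the open-source hloc toolbox roughly as follows; the configuration keys and entry points are assumptions based on hloc’s documented interface and may differ between versions.

```python
# Hedged sketch of SuperPoint + SuperGlue matching via the open-source hloc
# toolbox; the configuration keys and entry points below are assumptions
# based on hloc's documented interface and may differ between versions.
from pathlib import Path
from hloc import extract_features, match_features, pairs_from_exhaustive

images = Path("images")             # UAV image sequence (vertical + oblique)
outputs = Path("hloc_out")

feature_conf = extract_features.confs["superpoint_aachen"]  # SuperPoint preset
matcher_conf = match_features.confs["superglue"]            # SuperGlue preset

feature_path = extract_features.main(feature_conf, images, outputs)
pairs_path = outputs / "pairs-exhaustive.txt"
pairs_from_exhaustive.main(pairs_path, features=feature_path)
match_path = match_features.main(matcher_conf, pairs_path,
                                 feature_conf["output"], outputs)
```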
② Triangulation and Initial Geometry Recovery
Based on the successfully matched feature point pairs, the relative pose (rotation matrix R and translation vector t) between two views is estimated from the fundamental matrix (equivalently, the essential matrix once the camera intrinsics are applied). The triangulation principle is then applied to recover the 3D coordinates of corresponding feature points, forming the initial sparse point cloud. During this stage, RANSAC is employed to filter out mismatched point pairs, ensuring the geometric reliability of the initial point cloud.
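A minimal OpenCV sketch of this stage, covering RANSAC-based essential-matrix estimation, pose recovery, and triangulation, is shown below; pts1, pts2, and the intrinsic matrix K are assumed to come from the preceding matching stage.

```python
# Illustrative two-view pose recovery and triangulation with OpenCV; pts1
# and pts2 are matched pixel coordinates from stage 1 and K is the camera
# intrinsic matrix (both assumed available).
import cv2
import numpy as np

def two_view_reconstruct(pts1, pts2, K):
    # RANSAC-based essential-matrix estimation rejects mismatched pairs
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
    keep = inlier_mask.ravel() == 1
    # Decompose E into the relative rotation R and translation t (up to scale)
    _, R, t, _ = cv2.recoverPose(E, pts1[keep], pts2[keep], K)
    # Triangulate the inlier correspondences into an initial sparse cloud
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1[keep].T, pts2[keep].T)
    return (pts4d[:3] / pts4d[3]).T, R, t   # (N, 3) points and relative pose
```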
③ Tool Implementation and Output
The above SfM pipeline is implemented using the open-source COLMAP framework. COLMAP integrates efficient modules for feature extraction, cross-view matching, and sparse Bundle Adjustment optimization, and supports parallel computation on large-scale image data. It outputs high-precision sparse point clouds (stored in .ply format, containing 3D coordinates and RGB color information) and camera extrinsic parameter files.
The sparse point cloud output by SfM provides critical initial geometric anchors for the subsequent 3D Gaussian representation: the initial positions of Gaussian ellipsoids can be directly anchored to the coordinates of the SfM point cloud, inheriting its spatial topological relationships; meanwhile, the camera pose parameters, which underlie the projection matrix, support the differentiable rendering and backward optimization processes that map 3D Gaussians to 2D images. The accuracy of this initial geometric information directly dictates the convergence speed of 3D Gaussian reconstruction and the global consistency of the final model, acting as the geometric bedrock for subsequent optimization of the Gaussian ellipsoid parameters.
- (2)
Initialization of the 3D Gaussian Ellipsoid Set
Based on the sparse point cloud reconstructed by SfM, each 3D point is initialized as a 3D Gaussian ellipsoid with explicit geometric attributes. Each Gaussian ellipsoid is defined by its mean μ, 3 × 3 covariance matrix Σ (shape and orientation), opacity α, and Spherical Harmonics coefficients for view-dependent appearance. This step converts the discrete point cloud into a continuous, differentiable explicit 3D representation, laying the foundation for subsequent parameter optimization.
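The sketch below illustrates this conversion under common 3DGS conventions; the specific heuristics (an isotropic scale from nearest-neighbor distances, a low initial opacity, and degree-0 spherical harmonics derived from RGB) follow the reference implementation in spirit, but the exact constants are assumptions.

```python
# Minimal sketch of initializing Gaussians from SfM points; the heuristics
# follow common 3DGS practice, but the constants are assumptions.
import numpy as np
from scipy.spatial import cKDTree

def init_gaussians(xyz, rgb):
    """xyz: (N, 3) SfM points; rgb: (N, 3) colors in [0, 1]."""
    dists, _ = cKDTree(xyz).query(xyz, k=4)        # self + 3 nearest neighbors
    mean_nn = dists[:, 1:].mean(axis=1)            # average neighbor distance
    n = len(xyz)
    return {
        "mean": xyz.copy(),                                       # position mu
        "log_scale": np.repeat(np.log(np.clip(mean_nn, 1e-7, None))[:, None],
                               3, axis=1),                        # isotropic Sigma
        "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),        # identity quaternion
        "opacity_logit": np.full((n, 1), np.log(0.1 / 0.9)),      # alpha ~ 0.1
        "sh_dc": (rgb - 0.5) / 0.2820948,                         # degree-0 SH from RGB
    }
```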
- (3)
Projection of 3D Ellipsoid Parameters
Projecting 3D Gaussian ellipsoids onto 2D images relies on the camera imaging model and is implemented through an affine approximation of the perspective projection: first, Gaussian ellipsoids are transformed into the camera coordinate system using the camera extrinsic parameters; then, the 2D covariance on the image plane is computed via the projection Jacobian matrix, which determines the influence range and shape of each Gaussian in screen space.
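Concretely, the screen-space covariance follows the EWA-splatting approximation Σ′ = J W Σ Wᵀ Jᵀ, where W is the world-to-camera rotation and J is the Jacobian of the perspective projection linearized at the Gaussian mean. A minimal sketch is given below, with fx and fy denoting focal lengths in pixels.

```python
# Sketch of projecting one 3D Gaussian covariance to the image plane using
# the EWA-splatting approximation adopted by 3DGS.
import numpy as np

def project_covariance(cov3d, t_cam, W, fx, fy):
    """cov3d: 3x3 world-space covariance; t_cam: Gaussian mean in camera
    coordinates; W: 3x3 world-to-camera rotation. Returns 2x2 covariance."""
    tx, ty, tz = t_cam
    # Jacobian of the perspective projection, linearized at the mean
    J = np.array([[fx / tz, 0.0,     -fx * tx / tz**2],
                  [0.0,     fy / tz, -fy * ty / tz**2]])
    cov_cam = W @ cov3d @ W.T        # rotate the covariance into the camera frame
    return J @ cov_cam @ J.T         # 2x2 screen-space covariance
```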
- (4)
Rasterized Image Rendering
3DGS employs a differentiable rasterization method to synthesize images using Gaussian ellipsoids as rendering primitives. Unlike traditional triangle rasterization, this approach utilizes a tile-based rendering pipeline to efficiently collect all Gaussians influencing the current pixel, then performs alpha blending in depth order to compute pixel colors. This differentiable rendering process enables optimization of Gaussian ellipsoid attribute parameters via gradient descent, driven by the discrepancy between synthetic and real images.
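For a single pixel, this compositing rule is C = Σ_i c_i α_i T_i with transmittance T_i = Π_{j<i} (1 − α_j). The sketch below illustrates it, assuming the Gaussians are already sorted front to back and each opacity has been pre-weighted by the 2D Gaussian falloff at the pixel.

```python
# Minimal front-to-back alpha compositing for one pixel; `colors` (N, 3)
# and `alphas` (N,) belong to depth-sorted Gaussians covering this pixel.
import numpy as np

def composite_pixel(colors, alphas):
    pixel = np.zeros(3)
    transmittance = 1.0                  # fraction of light still unblocked
    for color, alpha in zip(colors, alphas):
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:         # early termination, as in 3DGS
            break
    return pixel
```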
- (5)
Loss Calculation
The 3DGS algorithm adopts a hybrid loss function that combines pixel-wise differences and structural similarity to measure the discrepancy between rendered images and ground-truth input images. Its mathematical expression is given in Equation (1):

L = (1 − λ) · L1 + λ · LD-SSIM  (1)

In the formula, L denotes the total loss value; L1 is the L1 loss term, which computes the absolute error in pixel intensity between the rendered image and the ground-truth image, emphasizing color fidelity and precise pixel-level matching; LD-SSIM is the loss term based on the Structural Similarity Index Measure (SSIM), which measures the perceptual similarity of two images in structural information, brightness, and contrast, helping preserve the visual naturalness and structural integrity of the reconstructed result; and λ is a weight coefficient (ranging from 0 to 1) that adjusts the proportions of the L1 and LD-SSIM terms in the total loss, thereby balancing the influence of the different error metrics during optimization.
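A hedged PyTorch sketch of Equation (1) is given below; the SSIM implementation is borrowed from torchmetrics for brevity (the reference 3DGS code ships its own differentiable SSIM), and λ = 0.2 follows the original 3DGS paper rather than a setting tuned in this study.

```python
# Hedged PyTorch sketch of Equation (1); SSIM comes from torchmetrics for
# brevity, and lambda = 0.2 follows the original 3DGS paper.
import torch
from torchmetrics.functional import structural_similarity_index_measure as ssim

def hybrid_loss(rendered, gt, lam=0.2):
    """rendered, gt: (B, 3, H, W) images in [0, 1]."""
    l1 = torch.abs(rendered - gt).mean()      # pixel-wise color fidelity
    d_ssim = 1.0 - ssim(rendered, gt)         # structural dissimilarity term
    return (1.0 - lam) * l1 + lam * d_ssim
```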
- (6)
Adaptive Density Control
After loss calculation, gradients propagate backward through the differentiable rendering pipeline to iteratively optimize parameters of 3D Gaussian ellipsoid primitives (position, covariance, opacity, color, etc.). Regions with insufficient reconstruction quality become optimization priorities due to their large gradient magnitudes. The mechanism addresses two typical under-optimization scenarios accordingly:
① Under-reconstruction: Current Gaussian ellipsoid primitives fail to effectively represent geometric details (e.g., fine structures) due to insufficient quantity or uneven distribution—manifested as large gradients. To mitigate this, the mechanism clones existing primitives and creates new Gaussian distributions at corresponding spatial locations, enhancing the representation of undersampled details.
② Over-reconstruction: A region may be covered by Gaussians, but details are lost because the primitives are oversized or overly coarse. The mechanism resolves this by splitting these primitives into smaller, finer distributions, which improves local resolution and preserves subtle geometric features.
This dynamic adjustment of Gaussian density and distribution significantly boosts the efficiency of expressing complex details and the accuracy of final reconstruction.
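For intuition, the clone-and-split decisions described above can be sketched as follows; the gradient and size thresholds are illustrative assumptions in the spirit of the reference implementation rather than its exact values.

```python
# Conceptual sketch of adaptive density control; the gradient and size
# thresholds are illustrative assumptions, not the reference values.
import numpy as np

def densify(means, scales, grads, grad_thresh=2e-4, size_thresh=0.01):
    """means/scales: (N, 3); grads: (N,) view-space positional gradient norms."""
    hot = grads > grad_thresh                      # poorly reconstructed regions
    small = hot & (scales.max(axis=1) <= size_thresh)
    large = hot & (scales.max(axis=1) > size_thresh)

    # ① Under-reconstruction: clone small Gaussians in place
    new_means, new_scales = [means, means[small]], [scales, scales[small]]

    # ② Over-reconstruction: split large Gaussians into two smaller ones,
    # offset along a random direction and shrunk (a full implementation
    # would also remove the original large Gaussians)
    offset = np.random.randn(large.sum(), 3) * scales[large]
    new_means += [means[large] + offset, means[large] - offset]
    new_scales += [scales[large] / 1.6, scales[large] / 1.6]

    return np.vstack(new_means), np.vstack(new_scales)
```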
3. Experiments and Analysis
The proposed method is designed as a reconstruction module that can be seamlessly embedded into standard UAV-based inspection workflows, following conventional data acquisition and camera pose estimation steps.
The complex lattice structure, slender components, and surrounding environmental conditions of transmission towers pose significant challenges for accurate image-based 3D reconstruction. These factors motivate the need for a reconstruction framework capable of handling fine structures and occlusions effectively.
To validate the effectiveness and applicability of the proposed 3DGS-based reconstruction method in practical power infrastructure modeling, this study selected three power towers with diverse structural types from the Luoning region as experimental subjects and performed high-fidelity 3D reconstruction on each. These towers exhibit distinct structural forms, effectively representing common tower types in actual power grids, and were sequentially labeled A, B, and C. Their geographical locations are shown in Figure 3, which indicates the spatial position of each tower and its on-site environment. This map provides geographical context for subsequent data acquisition and analysis of the reconstruction results.
As shown in Figure 4, Tower A is a symmetrical metal tower frame of the double-circuit type, with a circular base and high geometric regularity. Its symmetrical structure provides stable “repeating patterns” for cross-view feature matching in SfM, facilitating the extraction of high-confidence feature points (e.g., tower frame nodes, antenna brackets) and reducing the false matching rate. The foreground consists of reddish-brown farmland, while the background comprises vast green vegetation, creating high-contrast “red-green-silver-gray” color tones. The ridge-and-furrow textures of the farmland and the vein details of the vegetation supply dense local features, strengthening the geometric constraints of the SfM sparse point cloud. Additionally, the continuous texture of the green vegetation assists the depth estimation of camera poses, further improving point cloud accuracy.
Tower B features a metal conical tower body with multiple sets of antennas or equipment mounted at the top. Its auxiliary structures (e.g., antenna brackets, bolts) are rich in fine details. Such small-scale components pose challenges to close-range feature matching in SfM but simultaneously provide geometric anchors for the “fine-grained attribute modeling” of 3DGS. The scene comprises reddish-brown terraced fields in the foreground and green forests in the distance, forming three distinct layers of depth: near, middle, and far. The ridge lines of the terraced fields (geometric constraints) and the contours of the forest (texture gradients) collectively enhance the scene’s depth perception, which aids SfM triangulation in recovering a more precise sparse point cloud. Additionally, the continuous green tone of the forest reduces background interference, improving the efficiency of extracting subject features from the tower.
Tower C is a metal tower frame with a concrete base at the bottom, featuring a simple yet robust structure. The tower body lacks complex auxiliary equipment, with geometric features concentrated on tower frame nodes and extended power lines (radiating in four directions). Such a “low-redundancy” structure reduces the complexity of feature matching in SfM but poses higher requirements for feature extraction from “weakly textured regions” (e.g., the smooth tower body).
3.1. Experimental Environment
The algorithmic work of this study was implemented in Python 3.13 within the PyCharm Community Edition 2024.3.6 integrated development environment. Detailed configurations of the experimental environment are provided in Table 2.
3.2. SfM Point Cloud Initialization
In the 3DGS pipeline, the sparse point cloud obtained from SfM plays a crucial and foundational role. The sparse point cloud provides the initial positions of key feature points in the scene—such as corners and edge intersections of power towers—in 3D space. This serves as accurate geometric anchors for subsequent Gaussian distribution placement, avoiding random initialization of the model from scratch and significantly accelerating the convergence process.
Furthermore, 3DGS directly converts each 3D point in the sparse point cloud into the center of a 3D Gaussian ellipsoid. The initial covariance matrix of each Gaussian ellipsoid is typically derived from the distribution of neighboring points around it or set to an isotropic tiny ellipsoid, establishing a reasonable starting point for subsequent optimization.
Additionally, the sparse point cloud guides the adaptive density control process. During subsequent training, 3DGS performs cloning or splitting operations in under-reconstructed and over-reconstructed regions based on gradient information. The initial distribution provided by the sparse point cloud acts as a high-quality “seed” distribution for this process, enabling adaptive density control to more efficiently and accurately add Gaussian primitives in detail-required regions—rather than making ineffective attempts in blank areas.
Figure 5 presents the sparse point clouds generated for the three power towers.
3.3. Gaussian Ellipsoid Set Generation
The 3D Gaussian ellipsoid set serves as the central component of 3DGS. It is not merely a scene representation method but also an intelligent, optimizable, renderable, and editable representation model. Functioning as an explicit, differentiable representation of the scene, 3DGS avoids relying on implicit neural networks (e.g., NeRF) and instead explicitly models the entire scene using tens of thousands to millions of explicit, parameterized Gaussian ellipsoids.
Each Gaussian ellipsoid acts as an independent entity with well-defined attributes, including position, color, size, and orientation. This property enables direct analysis and editing of the scene—for example, moving, deleting, or modifying individual ellipsoids. Moreover, the attributes of each ellipsoid are differentiable parameters, meaning the system can optimize these parameters via gradient descent to progressively make rendered images resemble real photographs. The efficient rendering capability of 3DGS relies entirely on the characteristics of the Gaussian ellipsoid set.
Figure 6 presents the Gaussian ellipsoid sets generated for the three power towers.
3.4. Experimental Setup and Performance Evaluation
To objectively evaluate the reconstruction performance of different 3D methods for power tower modeling, this study selected two typical technical routes for comparative experiments: one is the mature oblique photogrammetry modeling technique, which represents the traditional photogrammetry pipeline based on SfM and MVS; the other is the 3DGS-based neural rendering reconstruction method proposed in this paper. The experiments used the same UAV-acquired power tower image dataset to reconstruct the three structurally diverse towers, followed by systematic performance comparison and analysis of the resulting models.
In the performance evaluation system for 3D power tower reconstruction, modeling completeness, real-time performance, and modeling detail are three interrelated and crucial core dimensions. Together, they form a multi-level comprehensive evaluation framework: Modeling completeness focuses on the accurate restoration of macrostructures and the integrity of components, serving as the foundation for model usability; real-time performance emphasizes training and rendering efficiency, determining the technology’s application potential in practical engineering; modeling detail centers on the fine characterization of micro-features, reflecting the model’s fidelity and depth of practical value.
These three dimensions comprehensively assess the practicality and technical level of reconstruction results from the perspectives of macrostructure, engineering efficiency, and micro-precision.
To visually demonstrate the reconstruction effects, the three images in Figure 7 present the fine-grained modeling results of the 3DGS method for key local structures of the power towers.
Figure 7A demonstrates excellent preservation of detailed components such as angle steel and power lines. In Figure 7B, the truss-type tower frame (characterized by intersecting members and a multi-layer grid structure) and the concrete base (with a rectangular outline and surface texture) exhibit clear edge contours and spatial topological relationships under the aerial view. Figure 7C shows that the 3DGS model achieves high-fidelity characterization of the complex truss-type steel structure of the power tower, with the spatial arrangement, intersection angles, and node connections of each member highly consistent with the geometric features of the actual structure.
Figure 8 and Figure 9 present, in 3D visualization form, the overall reconstruction results of the three power towers obtained with 3DGS and oblique photogrammetry, respectively. They clearly reflect the differences between the two methods in geometric completeness, detail representation capability, and visual realism.
To quantitatively and qualitatively assess the performance disparities among different 3D reconstruction methodologies for power facilities, this study employs visual comparative analysis as a primary evaluation tool. This approach enables intuitive identification of strengths and limitations in model fidelity, structural integrity, and detail preservation across varying reconstruction techniques.
The three sets of high-resolution close-up views presented in Figure 7 offer granular insights into the 3DGS model’s superiority in recovering fine-grained geometric details of critical power tower components:
- Angle steel edges: the model accurately renders the sharp, angular boundaries of angle steel members, preserving their structural rigidity representation, a key requirement for stress analysis in engineering applications.
- Power line continuity: overhead conductors are reconstructed with minimal gaps or discontinuities, maintaining their original spatial alignment and reflecting real-world operational conditions.
- Tower joint complexity: the intricate intersection zones of main chords, diagonal braces, and auxiliary members (e.g., crossarms and brace plates) are faithfully replicated, with local feature matches exhibiting sub-pixel accuracy. These joints, as primary load-bearing units, demand high fidelity to ensure the model’s mechanical plausibility.
Beyond component-level details, Figure 8 and Figure 9 extend the comparison to macro-scale structural performance, contrasting 3DGS outputs with those generated via oblique photogrammetry, a widely used alternative in power line modeling:
- Structural completeness: 3DGS models demonstrate full recovery of all primary components (tower body, base, conductors, and insulators) without omissions, whereas oblique photogrammetry occasionally misses smaller fixtures (e.g., climbing ladders or bird spikes) due to texture homogeneity or occlusion.
- Contour closure: the 3DGS models exhibit tightly sealed overall contours, with no floating or disconnected segments, a critical factor for ensuring usability in digital twin simulations requiring watertight geometry.
- Topological correctness: spatial relationships between components (e.g., conductor attachment points to tower arms, insulator string alignment) are fully preserved in the 3DGS reconstructions, enabling reliable downstream analyses such as clearance verification or fault location.

This systematic visual comparison underscores 3DGS’s capability to balance micro-detail fidelity with macro-structural coherence, a balance often compromised in traditional photogrammetry-based methods. By addressing both component-level precision and systemic structural integrity, 3DGS emerges as a more robust solution for engineering-grade power tower reconstruction. In terms of detail representation, 3DGS leverages anisotropic Gaussian ellipsoids to flexibly adapt to geometric features of varying scales, achieving continuous and precise representation of components ranging from the robust tower body to slender cables, without obvious omissions of details or structural fractures. Compared with the mesh models generated by oblique photogrammetry, 3DGS exhibits higher edge sharpness at local features such as angle steel edges and bolt connection points, effectively avoiding common distortions of traditional methods (e.g., model bloating, detail smoothing). Furthermore, the view-dependent rendering mechanism based on Spherical Harmonics enables the model to present physically consistent specular and shadow variations under different observation angles, significantly enhancing the model’s stereo sense and visual realism.
As quantitatively demonstrated in Table 3 and Table 4, the 3DGS method significantly outperforms oblique photogrammetry across three critical accuracy metrics for 3D scene rendering: SSIM, Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS). These quantitative gains substantiate the accuracy and applicability of the 3D Gaussian model in high-fidelity 3D scene rendering, particularly for geometrically complex industrial structures such as power towers. The superiority of 3DGS stems from its explicit scene representation (compact Gaussian primitives encoding geometry, color, and opacity) and differentiable rasterization pipeline, which jointly minimize rendering artifacts (e.g., blurring, texture distortion) common in traditional photogrammetric methods that rely on sparse feature matching and dense point cloud fusion.
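For reference, all three metrics can be computed on held-out views with standard open-source tools, as in the hedged sketch below; the image format and data ranges are assumptions.

```python
# Hedged sketch of computing SSIM, PSNR, and LPIPS on held-out views;
# skimage provides SSIM/PSNR and the `lpips` package (AlexNet backbone)
# provides LPIPS. Image arrays are assumed to be (H, W, 3) floats in [0, 1].
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

lpips_fn = lpips.LPIPS(net="alex")   # learned perceptual similarity

def evaluate_view(rendered, gt):
    s = structural_similarity(rendered, gt, channel_axis=2, data_range=1.0)
    p = peak_signal_noise_ratio(gt, rendered, data_range=1.0)
    to_tensor = lambda im: (torch.from_numpy(im).permute(2, 0, 1)[None]
                            .float() * 2.0 - 1.0)   # LPIPS expects [-1, 1]
    l = lpips_fn(to_tensor(rendered), to_tensor(gt)).item()
    return {"SSIM": s, "PSNR": p, "LPIPS": l}
```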
Further analysis of the experimental results reveals a notable case study: Power Tower C outperforms the other two test subjects (Power Towers A and B) in both SSIM and PSNR. This discrepancy is attributed to the more comprehensive image dataset collected for Power Tower C. The increased image quantity and angular diversity provide richer geometric constraints for 3DGS’s Gaussian primitive optimization, enabling more accurate recovery of fine-scale structures and reducing ambiguity in texture mapping. This finding empirically validates a key principle in 3D reconstruction: greater image redundancy and coverage directly enhance reconstruction fidelity, particularly for objects with asymmetric geometries or occluded regions.
In terms of modeling efficiency, 3DGS demonstrates significant advantages, with an average reconstruction time of approximately 20 min per transmission tower. Compared to traditional oblique photogrammetry methods, this represents a reduction of over 50%, greatly enhancing the efficiency of 3D reconstruction and providing robust technical support for the rapid digitalization of power facilities.
In summary, the comparative results in Table 3 and Table 4 not only confirm 3DGS’s technical advantage over oblique photogrammetry in rendering accuracy but also highlight the critical role of input data quality (e.g., image quantity, viewpoint distribution) in determining reconstruction outcomes. For power tower digitization projects, this implies that optimizing UAV imaging strategies can maximize the benefits of 3DGS, ultimately delivering more reliable digital twins for engineering applications such as defect detection and load simulation.
4. Results
This study presents a systematic investigation of a 3D model reconstruction framework for power transmission towers based on 3DGS, with a focus on addressing practical limitations of traditional reconstruction pipelines in terms of efficiency, structural detail representation, and deployment suitability. Through a comprehensive methodology combining theoretical analysis, algorithm design, and experimental validation, several conclusions can be drawn.
Compared with traditional photogrammetry-based reconstruction pipelines, the proposed 3DGS-based approach demonstrates clear advantages in reconstruction efficiency and structural representation under UAV-based inspection scenarios. Conventional photogrammetry typically relies on multi-stage processing, including dense image matching, mesh reconstruction, and texture mapping, which often leads to long processing times and complex post-processing workflows. In contrast, the proposed method employs an explicit Gaussian representation coupled with differentiable rendering, enabling faster optimization and near real-time rendering once training is completed. Previous studies have also investigated the performance characteristics and optimization strategies of 3DGS in indoor scene generation and rendering. These works provide valuable insights into efficiency and quality trade-offs, which complement our analysis in outdoor transmission tower scenarios [26].
In terms of reconstruction quality, traditional photogrammetry often encounters difficulties in preserving fine structural details of transmission towers, particularly slender components such as angle steels and cables, and may suffer from geometric artifacts and texture seams. By jointly optimizing geometry and appearance within a unified framework, the proposed method achieves more consistent structural completeness and visually coherent reconstructions, which are well suited for inspection-oriented applications. From a practical perspective, the simplified reconstruction pipeline reduces operational complexity and computational overhead, making the approach more suitable for large-scale and repeated inspections when combined with UAV-based data acquisition.
Rather than aiming to provide an exhaustive benchmark across all neural rendering paradigms, this study emphasizes practical applicability, geometric detail preservation, and deployment efficiency in real-world infrastructure inspection scenarios. From this perspective, the proposed 3DGS-based framework can be regarded as a practical and effective alternative to traditional photogrammetry for transmission tower reconstruction, while remaining complementary to neural radiance field-based approaches that prioritize photorealistic view synthesis. From a sustainability perspective, improving inspection efficiency and reconstruction reliability directly contributes to reducing unnecessary maintenance operations, lowering energy consumption, and extending the service life of power transmission assets.
5. Discussion and Future Work
The present study focuses on validating the effectiveness of 3DGS for power tower reconstruction under typical UAV-based inspection conditions, which are commonly scheduled under relatively stable weather and imaging environments in practical power grid operations.
Although our method demonstrates strong performance in experimental evaluations, it still has certain limitations. First, it is currently tailored to static power tower scenarios and struggles to model cable structures under dynamic effects (e.g., breeze-induced vibrations); in dynamic settings, Gaussian parameters tend to exhibit “blurring effects,” leading to a noticeable decline in reconstruction accuracy. Second, under extreme weather conditions (e.g., heavy rain, dense fog), drone-based data acquisition is hindered, and the resulting image quality degradation easily triggers pose estimation failures. Such dynamic and environmental factors are therefore discussed here at a conceptual level; their quantitative impact on reconstruction accuracy remains to be systematically evaluated in future work.
The primary objective of this study is to validate the feasibility and effectiveness of the proposed framework under real-world inspection conditions, rather than conducting a statistically comprehensive evaluation across different regions. Although the selected transmission towers are located within the same region, they exhibit significant differences in structural components, spatial layouts, and surrounding terrain complexity. More importantly, the proposed method does not rely on prior knowledge of specific regions or handcrafted assumptions related to particular tower designs, which supports its potential applicability to other regions and other types of towers. From an operational perspective, the proposed framework can be integrated into existing UAV-based power tower inspection workflows with minimal modification. After standard UAV data acquisition under planned inspection conditions, the captured RGB images can be processed using conventional SfM pipelines for camera pose estimation. The resulting poses and images are then directly used for 3DGS optimization, replacing the dense reconstruction, meshing, and texture mapping stages commonly required in traditional photogrammetry. The reconstructed 3D models can be readily visualized and inspected, supporting tasks such as structural assessment, condition documentation, and digital asset management. Future work will extend the experimental validation to more diverse environments.
In addition, the proposed method is primarily developed for quasi-static transmission tower inspection scenarios. The core assumption is that the main structural components of the tower remain static during the data acquisition process. Dynamic factors such as cable vibration, wind-induced oscillation, and short-term environmental changes (e.g., fog or rainfall) are not explicitly modeled in the current framework. This assumption is consistent with most practical inspection workflows, where data collection is typically scheduled under relatively stable weather conditions, and structural analysis focuses on static components of the transmission tower. Under such conditions, the proposed geometry-aware representation and structural modeling strategy can effectively capture the spatial configuration of the tower. Nevertheless, it is recognized that dynamic motion and adverse environmental factors may introduce reconstruction inconsistencies and degrade segmentation performance. Addressing such challenges would require incorporating temporal information, motion-aware modeling, or multi-frame data fusion strategies. Extending the proposed method to handle dynamic scenes and complex environmental conditions will be an important direction for future work.
Furthermore, the proposed method implicitly assumes relatively favorable UAV imaging conditions, including sufficient image overlap (typically ≥80%) and stable weather, which are commonly required to ensure reliable camera pose estimation and high-quality 3D reconstruction. In practical deployment, adverse conditions such as strong illumination changes, fog, rain, or wind may degrade image quality and increase pose estimation uncertainty. Under such conditions, reduced image overlap or inaccurate camera poses may propagate errors into the reconstructed geometry and affect the subsequent structural analysis and segmentation results. While these factors are not explicitly addressed in the current framework, they represent common challenges in UAV-based inspection systems rather than limitations unique to the proposed method. Future work will focus on enhancing robustness under challenging imaging conditions by incorporating uncertainty-aware pose optimization, multi-view consistency constraints, robust feature representations, and potential multi-sensor fusion strategies. These extensions are expected to improve the practical applicability of the proposed method in real-world inspection scenarios.
The performance of the proposed reconstruction framework is inherently influenced by the quality of the input RGB images. Several key data quality factors play a critical role in determining reconstruction stability and accuracy. Image resolution directly affects the level of geometric detail that can be preserved, particularly for slender structural components of transmission towers. Insufficient resolution may lead to incomplete or noisy Gaussian representations in fine-scale regions.
Image overlap and viewpoint diversity are especially critical for the SfM initialization stage. Reduced overlap or limited viewing angles can degrade feature matching robustness and lead to inaccurate camera pose estimation, which subsequently affects the convergence and quality of the 3D Gaussian Splatting optimization. Similarly, image blur caused by motion or defocus reduces feature repeatability and photometric consistency, further impacting reconstruction stability.
Despite these sensitivities, the proposed 3DGS-based framework exhibits a certain degree of tolerance to moderate variations in input data quality due to its multi-view optimization strategy and continuous scene representation. Nevertheless, extreme degradation in data quality, such as severely insufficient overlap or poor image sharpness, remains challenging for the current RGB-based pipeline. A systematic quantitative sensitivity analysis under controlled data degradation conditions will be explored in future work.
Although reconstruction time is reported to provide a practical reference for workflow efficiency, the present study does not conduct a detailed analysis of GPU memory consumption, inference latency, or large-scale scalability. These system-level performance metrics are critical for industrial deployment but are beyond the primary scope of this work, which focuses on reconstruction quality and inspection-oriented applicability. Future research will investigate memory-efficient Gaussian representations, parallel processing strategies, and scalability benchmarking to support large-scale deployment scenarios.
It should be noted that the evaluation in this study primarily relies on qualitative visual comparison. For real-world transmission tower scenes, obtaining accurate ground-truth 3D models for defining objective metrics such as reconstruction completeness or occlusion handling scores is non-trivial. Given the complex geometry and slender structural elements of transmission towers, qualitative assessment remains a practical and widely adopted approach for judging structural integrity and inspection suitability. Nevertheless, the development of standardized quantitative metrics and benchmark datasets is an important direction for future work.
Future research will further explore how the proposed reconstruction framework can be integrated into digital twin-based asset management systems to support the long-term sustainable operation of transmission infrastructure. Simultaneously, efforts will be made from multiple perspectives to enhance the applicability of the 3DGS framework. First, although this study is based on static tower scenarios and ideal imaging conditions, extending the method to dynamic factors such as wind-induced oscillations, fog, rain, and other adverse weather conditions remains a critical research direction; this may require motion-aware modeling or robustness-enhancing optimization strategies.
Second, future studies will explore the incorporation of quantitative evaluation metrics, such as reconstruction completeness and occlusion-aware scores, once reliable ground-truth models or standardized benchmark datasets for transmission towers become available. Such metrics would enable more comprehensive and objective performance assessment.
Third, improving scalability and deployment efficiency for large-scale inspection scenarios will be investigated, including memory-efficient optimization strategies and incremental reconstruction pipelines suitable for continuous inspection tasks.
Finally, the integration of multimodal data, such as depth cues or polarization information, will be explored to further improve reconstruction robustness and inspection reliability under challenging environmental conditions.