StitchGS: Towards Seamless and Lightweight Large-Scale 3D Gaussian Splatting

Su, Jinhe; Pan, Shengfang; Zhu, Huanxin; Chen, Siyu; Huang, Yaoming; Zhou, Yixin

doi:10.3390/rs18101460

Open Access Article


by Jinhe Su *, Shengfang Pan, Huanxin Zhu, Siyu Chen, Yaoming Huang and Yixin Zhou

School of Computer Engineering, Jimei University, Xiamen 361021, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1460; https://doi.org/10.3390/rs18101460

Submission received: 9 February 2026 / Revised: 2 April 2026 / Accepted: 6 April 2026 / Published: 7 May 2026

(This article belongs to the Special Issue Advances in 3D Reconstruction Based on Remote Sensing Imagery and Lidar Point Cloud)


Highlights

What are the main findings?

StitchGS combines stochastic interwoven stitching with global consistency refinement to reduce boundary cracks and cross-block appearance discontinuities in large-scale 3D Gaussian Splatting.
Spectral-aware adaptive compression with quantization-aware finetuning achieves 1.7×–4.0× model storage reduction while preserving rendering quality.

What are the implications of the main findings?

Reduced storage, transmission, and loading costs improve the deployment feasibility of large-scale 3DGS models under resource-constrained conditions.
The method offers a practical route for seamless and lightweight 3D reconstruction of city-scale scenes, with potential for digital twin, UAV photogrammetry, and remote sensing applications.

Abstract

While 3D Gaussian Splatting enables real-time rendering of large-scale scenes, its explicit representation leads to near-linear growth in storage requirements as scene scale expands. Furthermore, existing block-based strategies often suffer from geometric discontinuities and storage redundancy. To address these limitations, we present StitchGS, a high-fidelity and lightweight reconstruction scheme tailored for city-scale environments. To mitigate boundary artifacts caused by physical segmentation, we design a Stochastic Interwoven Stitching mechanism. This technique utilizes Oriented Bounding Boxes to define soft transition zones and employs a confidence-driven competition strategy to achieve smooth sub-pixel fusion of primitives within overlapping regions. To alleviate high storage costs, we further introduce a Spectral-Aware Adaptive Compression strategy. By analyzing the energy spectrum distribution of Spherical Harmonics, this method adaptively prunes redundant high-frequency parameters in diffuse regions. Moreover, it incorporates Quantization-Aware Fine-Tuning to balance storage efficiency with visual fidelity. Experiments demonstrate that StitchGS achieves 1.7×–4.0× storage reduction across our benchmarks while maintaining rendering quality competitive with state-of-the-art methods, enabling efficient deployment of large-scale scenes.

Keywords:

3D Gaussian Splatting; large-scale reconstruction; block fusion; model compression

1. Introduction

Remote sensing and UAV photogrammetry are developing rapidly, and city-scale 3D reconstruction is now a key component of digital twins, mapping, and spatial analysis. In practice, the input is usually a set of multi-view images. A Structure-from-Motion (SfM) photogrammetry pipeline estimates camera poses and produces a sparse point cloud [1,2,3,4,5,6]. These outputs provide geometric anchoring and scale cues. In this setting, 3D Gaussian Splatting (3DGS) [7] enables real-time rendering with an explicit, discrete representation and often runs faster than many Neural Radiance Field methods [8,9,10]. However, the explicit model size grows nearly linearly with scene scale [7,11,12,13]. To fit city-scale data on a single GPU, recent work has widely adopted spatial divide-and-conquer training [11,12,13,14]. This strategy improves training scalability and supports square-kilometer scenes [15,16]. Deployment, however, requires more than scalable training. It requires seamless rendering after block merging, as well as compact models for transmission and fast loading.

Current block-based pipelines expose structural issues that limit these goals [11,12,13,17]. In remote sensing and UAV-based mapping, reconstructed city-scale models are often used in interactive digital twin and GIS pipelines, where users continuously navigate across large areas and viewpoints [18,19,20,21]. In this setting, visual integrity is not only a matter of photorealistic rendering, but also a prerequisite for reliable inspection, annotation, and spatial analysis at scale. Therefore, seam artifacts introduced by block-wise training can directly degrade usability, as cracks, flickering, and exposure jumps may be amplified during free-viewpoint exploration. Meanwhile, the gigabyte-level storage of explicit 3DGS representations becomes a practical barrier to model delivery, online streaming, and fast loading on resource-limited clients. These requirements make “seamless merging” and “compact deployment” first-class objectives for city-scale 3D reconstruction in real-world remote sensing applications.

A major challenge is cross-block inconsistency caused by physical segmentation [11,12,13,17]. Most methods apply hard truncation with bounding boxes and optimize each block independently [12,13,14,16]. Without cross-block constraints, primitives near block borders often drift or duplicate [11,12,13]. This produces cracks on continuous surfaces after merging. It also leads to depth inconsistency. Overlapping primitives can cause flickering when the viewpoint changes. Independent training can further introduce color and illumination discontinuities across blocks. Figure 1 shows typical artifacts. The Building scene shows geometric cracks. The Rubble scene shows visible seams. These issues reduce the visual integrity required by city-scale digital twins [15].

Another bottleneck is the cost of storage and transmission [7,11,12,13]. City-scale 3DGS models contain millions of primitives and can occupy gigabytes [11,12,15]. This matters for model delivery, online loading, and deployment on resource-limited devices. A key reason is the strong frequency variance of urban appearance. Large diffuse areas, such as roads and walls, are mostly low-frequency. Fine structures, such as vegetation, signs, and window grids, contain richer high-frequency detail. Standard 3DGS nevertheless assigns the same Spherical Harmonics (SH) capacity to all regions [7,22].

Low-texture diffuse areas then waste high-order SH parameters and store unnecessary high-frequency residuals. Detail-rich areas still need sufficient capacity to preserve high-frequency appearance [23,24,25,26]. Many compact 3DGS methods have been proposed [23,24,25,27]. However, the use of local spectral cues to suppress redundant high-frequency components in diffuse regions while preserving high-frequency details where needed remains underexplored [23,25,26].

Compared with existing methods, StitchGS is more directly oriented toward the post-merging deployment quality of city-scale block-based reconstruction. On the one hand, block-based 3DGS methods represented by BlockGaussian mainly focus on scene partitioning and scalable training, but they still rely heavily on rigid physical splitting during merging, which can leave geometric cracks, depth discontinuities, and cross-block appearance jumps near block boundaries. In contrast, StitchGS explicitly introduces stochastic interwoven stitching and global consistency refinement after block-wise training, turning hard-cut boundaries into smoother soft transition zones and thus improving the continuity of the merged model. On the other hand, compact 3DGS methods such as LightGaussian mainly target generic model compression, whereas StitchGS further considers the practical requirement of merge-then-deploy in city-scale scenes by performing spectral-aware adaptive compression while preserving cross-block continuity. A representative example can be found in the Rubble scene, where StitchGS reduces the model size from 2375.68 MB to 831.42 MB, improves Seam_PSNR from 23.05 dB to 28.41 dB, and decreases BPD_L1 from 0.0536 to 0.0281. Across all benchmarks, the overall storage reduction reaches 1.7×–4.0×. These results show that StitchGS not only significantly reduces storage overhead but also alleviates visible seams and discontinuities near block boundaries, making it more suitable for the transmission, loading, and deployment of city-scale 3D scenes.

To address these challenges, we present StitchGS. Our goal is to obtain a seamless merged model after block training. We also aim to reduce storage and transmission costs without losing important details. For boundary artifacts, we propose Stochastic Interwoven Stitching. It builds a confidence-driven probabilistic competition in overlap regions. Primitives compete within a soft transition band. This avoids hard cuts and reduces seams and cracks. For block-wise appearance bias, we introduce Global Consistency Refinement. It aligns color and illumination across blocks after stitching [11,12,13]. We also integrate quantization-aware fine-tuning to improve robustness under low-precision storage [24,25,26]. For storage redundancy, we introduce Spectral-Aware Adaptive Compression. It analyzes Spherical Harmonics (SH) energy and prunes redundant high-frequency residual parameters in diffuse regions [22,23,25]. We combine this with mixed-precision storage. This reduces model size while preserving high-frequency details [24,25,26].

The main contributions are summarized as follows:

We present StitchGS, a city-scale 3DGS method that targets both boundary discontinuities and storage redundancy.
We propose Stochastic Interwoven Stitching, which performs confidence-driven probabilistic competition in overlap regions to mitigate cross-block seams.
We introduce Spectral-Aware Adaptive Compression, which prunes redundant high-frequency Spherical Harmonics (SH) residuals in diffuse areas and combines mixed-precision storage for compact deployment.

2. Related Work

2.1. Scalable Urban Scene Reconstruction

Large-scale urban scene reconstruction demands algorithms capable of handling massive data with high efficiency and a low memory footprint. Early Neural Radiance Field works, including Block-NeRF [14], Mega-NeRF [16], and Switch-NeRF [28], pioneered spatial decomposition strategies based on geographical location or sparse grids. Others like Grid-NeRF [29] explored combining grid-based features to accelerate training, and related explicit factorization or plane-based representations further improved scalability and efficiency [30,31]. By dynamically loading sub-modules to bypass single-GPU memory limits, these methods successfully extended reconstruction to kilometer-scale areas. However, the high inference latency inherent to implicit representations limits their utility in real-time interactive scenarios [8,9,10]. The emergence of 3D Gaussian Splatting [7], with its rasterization-based real-time rendering capabilities, has prompted researchers to migrate the divide-and-conquer strategy to explicit representations [11,12,13,17,32]. For static city-scale scenes, subsequent research has mainly focused on scalable block-wise reconstruction and deployable compact representation.

To adapt to the explicit nature of 3D Gaussian Splatting, various block-based reconstruction schemes have recently emerged. VastGaussian [12] proposes a progressive partitioning strategy based on camera distribution density, distributing Gaussian primitives across different computing nodes for parallel optimization. CityGaussian [11] adopts a coarse-to-fine strategy, using a global low-fidelity model to guide the training of detailed blocks and introducing Level-of-Detail technologies to accelerate rendering [33]. BlockGaussian [13] further improves the partitioning logic by using sparse point cloud density to guide spatial slicing and introducing auxiliary points to enhance geometric completeness in unobserved regions.

Although these methods address memory overflow, their core limitation lies in an over-reliance on rigid physical segmentation without mechanisms for manifold-consistent fusion after partitioning [11,12,13,17]. Most existing approaches rely on Axis-Aligned Bounding Boxes (AABB) for hard spatial cutting [12,13,14,16]. This rigid partitioning creates a fundamental structural mismatch with complex urban topologies, such as non-orthogonal street layouts or irregular building facades. Consequently, continuous geometric surfaces like walls and pavements are inevitably truncated, leading to visible physical seams and abrupt illumination discontinuities at block interfaces after merging [11,12,13,17]. In contrast to these approaches, StitchGS focuses on restoring geometric continuity at boundaries through Stochastic Interwoven Stitching, mitigating the artifacts caused by rigid partitioning.

2.2. Compactness and Efficiency in 3D Gaussian Splatting

While 3D Gaussian Splatting offers superior rendering speed, the storage requirements of its unstructured Gaussian primitives are significantly higher than those of implicit neural representations [7,8]. This has catalyzed extensive research into Gaussian primitive compression. Current compression methods can be broadly categorized into pruning [23,34,35,36], structured representation [27,32], and quantization [24,25,26]. Specifically, LightGaussian [23] calculates the global importance contribution of primitives to views to prune redundant Gaussians and distills the remaining attributes. Scaffold-GS [27] introduces an anchor mechanism to reduce parameter redundancy by predicting local Gaussian attributes. Recent works like HAC [25] further leverage hash-grid assisted context to achieve extreme compression ratios. Vector-quantization-based schemes also offer an alternative route to compact Gaussian radiance field storage [26,37,38].

However, existing compression schemes are mostly designed for object-level or controlled indoor scenarios and exhibit limitations when applied to complex large-scale urban environments [23,24,25,27]. General methods typically employ a globally uniform compression strategy, ignoring the extreme imbalance in texture frequency distribution across urban surfaces [23,25,35]. In urban scenes, large low-frequency diffuse areas such as road pavements and walls coexist with high-frequency texture details like vegetation and signage. Retaining full-precision high-order Spherical Harmonics coefficients for diffuse regions results in substantial waste of storage and bandwidth, whereas a uniform high compression ratio would irreversibly obliterate fine details in textured areas [23,24,25,26]. To address this issue, we construct a Spectral-Aware Adaptive Compression strategy. By analyzing the energy spectrum distribution of Spherical Harmonics coefficients [7,22], this strategy accurately identifies and prunes redundant parameters in diffuse regions. Combined with Quantization-Aware Fine-Tuning, it effectively alleviates the trade-off between storage efficiency and visual fidelity and enables high-quality reconstruction tailored for city-scale environments.

3. Method

3.1. Overview

StitchGS focuses on balancing high-quality reconstruction of large-scale scenes with lightweight deployment under limited computational resources. Our framework is organized around two primary dimensions: scalable scene construction and resource-efficient representation. In scalable scene construction, we first decouple complex scenes into multiple independent sub-blocks via density-adaptive partitioning [12,13]. To ensure geometric robustness during the optimization of each sub-block, we introduce margin-aware training and manifold-balanced sampling techniques. To address the common hard-cut artifacts in block-based reconstruction, we design a stochastic interwoven stitching mechanism. This mechanism uses a confidence-driven competitive strategy to automatically optimize the distribution of primitives in overlapping regions, thereby suppressing visual seams. Subsequently, global consistency refinement unifies the illumination distribution across the entire scene and, through quantization-aware finetuning, makes the model robust to the low-precision storage used in the compression stage. For resource-efficient representation, we further eliminate information redundancy to reduce the model storage volume. Instead of storing all attributes uniformly, we propose a spectral-aware adaptive compression scheme. This approach identifies texture characteristics in different regions by performing frequency-domain energy spectrum analysis on Spherical Harmonics (SH) coefficients [7,23]. We perform aggressive parameter pruning for low-frequency diffuse regions while employing a mixed-precision storage strategy to preserve high-frequency details [25,26]. This adaptive compression maintains visual fidelity while significantly reducing the model storage size, providing a viable path for the lightweight transmission and distribution of city-scale scenes.

3.2. Scalable Scene Construction

The primary objective of this phase is to construct a geometrically complete and robustly trained city-scale scene within limited memory constraints.

3.2.1. Scene Partitioning and Margin-Aware Training

To fit city-scale scenes into single-GPU memory, we partition the scene into $N$ sub-blocks $\mathcal{B} = \{B_k\}_{k=1}^{N}$ following density-adaptive recursive splitting [12,13]. To avoid boundary under-constraint during independent optimization, we expand the original Oriented Bounding Box (OBB) domain $\Omega_k$ into an augmented training domain $\Omega_k'$ with a margin factor $\lambda_{margin}$, as shown in Figure 2a:

$$\Omega_k' = T_{expand}(\Omega_k; \lambda_{margin}). \tag{1}$$

We assign training views using a visibility-driven criterion based on the sparse SfM points. Let $P_k$ be the sparse points inside $\Omega_k'$, and let $P_v$ be the sparse points observed in view $v$. We include a view if it satisfies either view-level coverage or block-level coverage:

$$V_k = \left\{ v \in V_{total} \;\middle|\; \frac{|P_v \cap P_k|}{|P_v|} \ge \tau_{view} \;\lor\; \frac{|P_v \cap P_k|}{|P_k|} \ge \tau_{block} \right\}. \tag{2}$$

This rule preserves informative long-tail views while preventing blocks from being under-constrained near boundaries.
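To illustrate Eq. (2), the selection rule can be sketched in Python. This is a minimal sketch, assuming the sparse-point memberships $P_v$ and $P_k$ are already available as sets of SfM point ids. The function name `assign_views` and the threshold values in the example are hypothetical, since the thresholds $\tau_{view}$ and $\tau_{block}$ are not reported here.

```python
def assign_views(view_points, block_points, tau_view, tau_block):
    """Select the training views of one block following Eq. (2).

    view_points: dict mapping view id -> set of sparse-point ids observed in the view (P_v)
    block_points: set of sparse-point ids inside the expanded domain Omega_k' (P_k)
    """
    selected = []
    n_block = len(block_points)
    for v, pv in view_points.items():
        if not pv:
            continue
        shared = len(pv & block_points)
        view_cov = shared / len(pv)                         # fraction of the view inside the block
        block_cov = shared / n_block if n_block else 0.0    # fraction of the block seen by the view
        if view_cov >= tau_view or block_cov >= tau_block:
            selected.append(v)
    return selected


views = {"a": {1, 2, 3, 4}, "b": {4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, "c": {20, 21}}
print(assign_views(views, {1, 2, 3, 4, 5, 6}, tau_view=0.5, tau_block=0.3))
```

In the example, view `b` fails the view-level test, because only 30% of its points fall in the block. It passes the block-level test, however, because it sees half of the block's points. This is the long-tail case that the rule is meant to keep.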

3.2.2. Manifold-Balanced Sampling

To mitigate view imbalance, we sample views by enforcing near-uniform coverage on a low-dimensional view manifold. As illustrated in Figure 3, standard uniform sampling leads to overfitting in dense regions; in contrast, our strategy assigns higher sampling weights to sparse views while suppressing redundancy in dense clusters.

For block $B_k$, we parameterize each view by

$$\Phi(v) = (\theta(v), r(v)), \tag{3}$$

where $\theta(v)$ is the azimuth of the camera center around the block center and $r(v)$ is the normalized distance. We discretize $\Phi(v)$ into bins $b(v)$ and perform a low-variance inverse-density schedule:

$$v \sim \mathrm{RoundRobin}(\{b : |b| > 0\}), \quad v \in b. \tag{4}$$

Within each bin, we shuffle views and iterate non-empty bins with a fixed stride to avoid periodic sampling patterns. We optionally mix a small portion of uniform sampling to further stabilize optimization under extreme imbalance.
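The following is a minimal sketch of Eqs. (3) and (4), built on several assumptions. Azimuth is measured in the horizontal x-y plane. $r(v)$ is the camera-to-center distance divided by the largest such distance in the block. Non-empty bins are visited with a stride that is coprime with their count. The bin counts, the stride handling, and the function names are hypothetical choices rather than the authors' implementation, and the optional uniform-sampling mix is omitted.

```python
import math
import random
from collections import defaultdict


def view_bins(cam_centers, block_center, n_theta=8, n_r=4):
    """Map each view index to an (azimuth, distance) bin, following Eq. (3).

    Assumption: azimuth is measured in the x-y plane around the block center, and
    r(v) is the distance normalized by the largest camera distance in the block.
    """
    cx, cy = block_center[0], block_center[1]
    dists = [math.hypot(c[0] - cx, c[1] - cy) for c in cam_centers]
    d_max = max(dists) or 1.0
    bins = defaultdict(list)
    for v, (c, d) in enumerate(zip(cam_centers, dists)):
        theta = math.atan2(c[1] - cy, c[0] - cx)                   # in [-pi, pi]
        t = int((theta + math.pi) / (2 * math.pi) * n_theta) % n_theta
        r = min(int(d / d_max * n_r), n_r - 1)                     # normalized-distance bin
        bins[(t, r)].append(v)
    return bins


def balanced_schedule(bins, n_samples, stride=1, seed=0):
    """Visit non-empty bins round-robin (Eq. 4), drawing shuffled views inside each bin."""
    rng = random.Random(seed)
    keys = sorted(bins)
    if not keys:
        return []
    if math.gcd(stride, len(keys)) != 1:
        stride = 1                      # a stride sharing a factor with the bin count skips bins
    queues = {k: [] for k in keys}
    schedule, idx = [], 0
    for _ in range(n_samples):
        k = keys[idx]
        if not queues[k]:               # refill and reshuffle an exhausted bin
            queues[k] = list(bins[k])
            rng.shuffle(queues[k])
        schedule.append(queues[k].pop())
        idx = (idx + stride) % len(keys)
    return schedule


# Nine cameras clustered on one side of the block, one camera on the opposite side.
cams = [(10.0, 0.1 * i, 0.0) for i in range(9)] + [(-10.0, 0.0, 0.0)]
print(balanced_schedule(view_bins(cams, (0.0, 0.0, 0.0)), n_samples=10))
```

Uniform sampling would draw the isolated view (index 9) about 10% of the time. The round-robin schedule instead alternates between the two non-empty bins and therefore draws that view in half of the steps.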

3.2.3. Stochastic Interwoven Stitching

Independent block training introduces redundant primitives in overlap bands and creates visible seams after concatenation. We propose Stochastic Interwoven Stitching to resolve boundary conflicts by confidence-driven competition. This module corresponds to Figure 2b and is illustrated in Figure 4.

We first classify each primitive by its normalized Oriented Bounding Box (OBB) distance in its parent block, which measures how far the primitive lies from the normalized block boundary, as shown in Figure 4a. For a primitive at position $p$ in block $B_i$, we define

$$d_i(p) = \left\| \mathrm{diag}(e_i)^{-1} R_i^{\top} (p - c_i) \right\|_{\infty}, \tag{5}$$

where $c_i$ is the block center, $R_i$ is the local Oriented Bounding Box (OBB) basis, and $e_i$ is the half-extent. We use three intervals that match Figure 4a:

$$\begin{cases} \text{core}: & d_i(p) \le 1, \\ \text{competitive}: & 1 < d_i(p) \le 1.1, \\ \text{pruned}: & d_i(p) > 1.1. \end{cases} \tag{6}$$

Core primitives are always kept, pruned primitives are discarded, and competitive primitives enter stochastic arbitration.
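The normalized OBB distance of Eq. (5) and the three intervals of Eq. (6) vectorize directly. The sketch below assumes NumPy arrays, with the OBB axes stored as the columns of $R_i$. The function names and the integer label encoding are hypothetical.

```python
import numpy as np


def obb_distance(p, center, R, half_extent):
    """d_i(p) = || diag(e_i)^-1 R_i^T (p - c_i) ||_inf of Eq. (5).

    p: (N, 3) positions; R: (3, 3) with the OBB axes as columns; half_extent: (3,)
    """
    local = (p - center) @ R                    # row-wise R^T (p - c): OBB-local coordinates
    return np.abs(local / half_extent).max(axis=1)


def classify(d, band=1.1):
    """Intervals of Eq. (6): 0 = core, 1 = competitive, 2 = pruned."""
    labels = np.ones(d.shape, dtype=np.int8)    # default: competitive
    labels[d <= 1.0] = 0
    labels[d > band] = 2
    return labels


# An OBB rotated 90 degrees about z, so its local x axis points along world y.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 1.9, 0.0], [1.05, 0.0, 0.0], [0.0, 0.0, 1.5]])
d = obb_distance(pts, np.zeros(3), R, np.array([2.0, 1.0, 1.0]))
print(d, classify(d))
```

The infinity norm makes the unit cube of the normalized frame the block boundary. The competitive band is therefore a 10% shell around the OBB, whatever the block's orientation.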

It should be noted that the OBB distance captures the proximity of a primitive to the block envelope boundary, rather than the exact geometric adjacency on the true scene surface. Therefore, under highly irregular or non-convex topologies, this approximation may lead to transition bands that are either over-expanded or under-covered. Nevertheless, in our block-based reconstruction setting, the OBB distance still provides a stable and computationally efficient approximation of boundary proximity. Its potential inaccuracy is also partially mitigated by the narrow transition-band design, the KDTree-based local competition, and the subsequent global consistency refinement.

For confidence estimation, we use a scale-normalized opacity score that correlates with local reconstruction quality. Let $\alpha_p$ be the opacity and $s_p$ the log-scale vector. We define

$$Q(p) = \frac{\alpha_p}{\mathrm{mean}(\exp(s_p)) + \epsilon}. \tag{7}$$

For a competitive primitive $p$ from block $B_i$, we compare its self score with that of the strongest neighboring competitor, as shown in Figure 4b. We use a linear distance decay in the overlap band,

$$w(d) = \max(0,\, 1.1 - d). \tag{8}$$

The self score is

$$S_{self}(p) = w(d_i(p)) \cdot Q(p), \tag{9}$$

and the competitor score for a block $B_j$ is computed by local consensus using a KDTree, a spatial data structure for nearest-neighbor queries:

$$S_j(p) = w(d_j(p)) \cdot \mathrm{mean}\left(\{ Q(q) \mid q \in N_K(p, B_j) \}\right), \tag{10}$$

where $N_K(p, B_j)$ denotes the $K$ primitives of block $B_j$ nearest to $p$.

We keep $p$ with probability

$$P_{keep}(p) = \sigma\left(\eta \left[ S_{self}(p) - \max_{j \neq i} S_j(p) \right]\right), \tag{11}$$

where $\sigma$ is the sigmoid function and $\eta$ controls boundary sharpness.

Finally, we sample $\xi \sim \mathrm{Bernoulli}(P_{keep}(p))$ to decide retention, as depicted in Figure 4c. This stochastic selection produces an interleaved boundary distribution instead of a hard cut, yielding sub-pixel smooth transitions in the merged model, as shown in Figure 4d.
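The competition of Eqs. (7) to (11) can be sketched as an illustrative NumPy version under the following assumptions. A brute-force nearest-neighbor search, which is only practical for small inputs, stands in for the KDTree. The caller supplies $d_j(p)$ for every competing block $B_j$. All function names are hypothetical.

```python
import numpy as np


def quality(opacity, log_scale, eps=1e-8):
    """Scale-normalized opacity Q(p) of Eq. (7); log_scale has shape (N, 3)."""
    return opacity / (np.exp(log_scale).mean(axis=1) + eps)


def weight(d, band=1.1):
    """Linear distance decay w(d) = max(0, 1.1 - d) of Eq. (8)."""
    return np.maximum(0.0, band - d)


def knn_mean(query, points, values, k):
    """Mean of `values` over the k nearest `points` (brute-force stand-in for a KDTree)."""
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, len(points))
    idx = np.argsort(d2, axis=1)[:, :k]
    return values[idx].mean(axis=1)


def keep_probability(d_self, q_self, competitors, pos, eta, k):
    """P_keep of Eq. (11) for competitive primitives of block B_i.

    d_self, q_self: d_i(p) and Q(p) for the query primitives at positions `pos`.
    competitors: list of (d_j, points_j, q_j), where d_j holds d_j(p) for the same
    query primitives and points_j / q_j are the positions and Q scores of block B_j.
    """
    s_self = weight(d_self) * q_self                                   # Eq. (9)
    s_comp = np.stack([weight(d_j) * knn_mean(pos, pts_j, q_j, k)      # Eq. (10)
                       for d_j, pts_j, q_j in competitors])
    margin = s_self - s_comp.max(axis=0)                               # strongest competitor
    return 1.0 / (1.0 + np.exp(-eta * margin))                         # sigmoid


def stochastic_keep(p_keep, rng):
    """Sample xi ~ Bernoulli(P_keep); True means the primitive is retained."""
    return rng.random(p_keep.shape) < p_keep


pos = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
q_self = quality(np.array([0.9, 0.1]), np.zeros((2, 3)))
comp = [(np.array([1.05, 1.0]), np.array([[0.1, 0.0, 0.0], [5.1, 0.0, 0.0]]), np.array([0.2, 0.8]))]
p_keep = keep_probability(np.array([1.02, 1.08]), q_self, comp, pos, eta=100.0, k=1)
print(p_keep, stochastic_keep(p_keep, np.random.default_rng(0)))
```

Setting `eta=0` turns every competitive primitive into a fair coin flip. A large `eta` approaches a deterministic winner-takes-all cut. This is the sharpness trade-off that the per-scene choice of $\eta$ tunes.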

3.2.4. Global Consistency Refinement

Stitching removes geometric seams, but independently trained blocks may still show global appearance bias. We apply a global refinement step, as shown in Figure 2c. This stage aligns scene-wide appearance and improves robustness to low-precision storage.

Pre-cleaning. Before finetuning, we remove primitives that are visually negligible or numerically unstable. We first filter by an opacity logit threshold and a scale bound:

$$M_1 = \left\{ p \in M \mid \ell_\alpha(p) \ge \tau_\ell \land \|\exp(s(p))\|_\infty \le \tau_\sigma \right\}. \tag{12}$$

Here, $\ell_\alpha(p)$ denotes the opacity logit, i.e., the pre-sigmoid opacity parameter, and $\exp(s(p))$ denotes the physical scale. We then remove rare scale outliers using a percentile-based rule. Let $z(p) = \prod \exp(s(p))$ be the scale-volume proxy. We compute a robust upper bound $\tau_q = \mathrm{Quantile}(z, 0.999)$ over $p \in M_1$ and discard primitives with $z(p) > \tau_q$. If the model still exceeds a memory budget, we apply random downsampling as a safety fallback. We use the same preprocessing rule for all scenes to keep the pipeline consistent.
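The two cleaning steps can be combined into a single boolean mask. This sketch assumes that per-primitive opacity logits and log-scales are stored as NumPy arrays. The threshold values in the usage example are hypothetical, and the random-downsampling fallback is omitted.

```python
import numpy as np


def pre_clean(opacity_logit, log_scale, tau_logit, tau_sigma, q=0.999):
    """Boolean keep-mask from Eq. (12) followed by the percentile rule on z(p)."""
    scale = np.exp(log_scale)                                    # physical scale, (N, 3)
    m1 = (opacity_logit >= tau_logit) & (scale.max(axis=1) <= tau_sigma)   # Eq. (12)
    z = scale.prod(axis=1)                                       # scale-volume proxy z(p)
    keep = m1.copy()
    if m1.any():
        tau_q = np.quantile(z[m1], q)                            # bound computed over M_1 only
        keep &= z <= tau_q
    return keep


n = 1001
logit = np.zeros(n)
logit[0] = -10.0                      # nearly transparent primitive, removed by Eq. (12)
log_scale = np.zeros((n, 3))
log_scale[-1] = np.log(2.0)           # one volume outlier that still passes the scale bound
print(pre_clean(logit, log_scale, tau_logit=-5.0, tau_sigma=5.0).sum())
```

Computing the quantile over $M_1$ rather than over all primitives keeps already-discarded primitives from shifting the bound.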

Geometry-frozen quantization-aware finetuning. We freeze the geometric parameters $\Theta_{geo} = \{\mu, q, s\}$ and optimize only the appearance parameters $\Theta_{app} = \{\alpha, f_{dc}, f_{rest}\}$, so that finetuning does not reintroduce boundary drift after stitching. To prepare for 8-bit storage, we insert a fake-quantization operator on the residual Spherical Harmonics (SH) coefficients. This operator simulates low-bit quantization in the forward pass during finetuning while still propagating gradients:

$$\tilde{f}_{rest} = Q_8(f_{rest}), \tag{13}$$

and minimize the rendering loss over training views with $\Theta_{geo}$ fixed:

$$\min_{\Theta_{app}} \; \mathbb{E}_{v \sim V} \left[ \mathcal{L}\left( R(v; \alpha, f_{dc}, \tilde{f}_{rest}, \bar{\Theta}_{geo}), I_v \right) \right], \tag{14}$$

where $\bar{\Theta}_{geo}$ denotes the frozen geometric parameters, through which no gradient flows, $R$ is the renderer, and $I_v$ is the ground-truth image of view $v$. We backpropagate through $Q_8$ with a straight-through estimator, which passes gradients through the simulated quantization step unchanged, so that appearance optimization remains stable under the subsequent 8-bit quantization.
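To make the straight-through estimator concrete, the toy loop below fits residual coefficients $f$ to a target through the fake-quantization operator. It is a sketch under several assumptions. $Q_8$ is taken to be the same per-dimension affine quantizer as Eq. (18), with the statistics $m$ and $r$ held fixed. A squared error replaces the rendering loss. The function names are hypothetical. Because the true derivative of rounding is zero almost everywhere, plain backpropagation would leave $f$ unchanged. The STE instead copies the incoming gradient through $Q_8$.

```python
import numpy as np


def fake_quant8(x, m, r):
    """Q_8: snap to one of 256 affine levels and dequantize immediately (forward pass)."""
    r_safe = np.where(r > 0, r, 1.0)
    codes = np.clip(np.round(255.0 * (x - m) / r_safe), 0, 255)
    return m + r_safe / 255.0 * codes


def ste_grad(grad_out):
    """Straight-through estimator: treat dQ_8(x)/dx as the identity in the backward pass."""
    return grad_out


def finetune_demo(target, m, r, lr=0.5, steps=50):
    """Toy QAT loop: minimize 0.5 * ||Q_8(f) - target||^2 over f using the STE."""
    f = np.zeros_like(target)
    for _ in range(steps):
        grad_q = fake_quant8(f, m, r) - target      # dL/dQ_8(f)
        f = f - lr * ste_grad(grad_q)               # STE: use dL/dQ_8(f) as dL/df
    return f


target = np.array([0.25, 0.6])
m, r = np.zeros(2), np.ones(2)
f = finetune_demo(target, m, r)
print(fake_quant8(f, m, r))
```

After a few dozen steps, the quantized output lies within one quantization step ($1/255$ of the range) of the target. QAT relies on this behavior before the final 8-bit export.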

3.3. Resource-Efficient Representation

The fine-grained model delivers strong fidelity, but its explicit primitives are costly to store and transmit. We compress the model with Spectral-Aware Adaptive Compression, as shown in Figure 5. This stage has two steps: spectral energy analysis in Figure 5a and mixed-precision storage in Figure 5b.

3.3.1. Spectral Energy Analysis

High-order Spherical Harmonics (SH) residuals often contribute little to diffuse surfaces. We estimate texture activity by a normalized spectral energy score on the residual SH. Let $f_{rest}(p) \in \mathbb{R}^D$ be the residual SH coefficients of primitive $p$, where $D = 45$ for degree-3 SH without the DC term (15 coefficients per color channel). We group coefficients by SH order $l \in \{1, 2, 3\}$ and denote the corresponding index set as $I_l$, with $|I_l| = 3(2l+1)$. We define the order-wise energy

$$E_l(p) = \frac{1}{|I_l|} \left\| f_{rest}^{I_l}(p) \right\|_1, \tag{15}$$

and the aggregated high-frequency energy

$$E_{hf}(p) = \sum_{l=1}^{3} \omega_l E_l(p), \tag{16}$$

where $\omega_l$ controls the relative importance of each order. In our implementation, we weight every residual coefficient uniformly, i.e., $\omega_l = |I_l| / D$, so that $E_{hf}$ reduces to the mean absolute magnitude over all residual dimensions. This uniform weighting yields a simple and stable high-frequency activity score for spectral gating. For the learned residual SH representation used in this work, aggregating without additional priors also favors consistency across scenes. Since the gate is applied to a normalized residual-SH energy score under a fixed degree-3 parameterization, the magnitude range of $E_{hf}$ remains comparable across scenes, which makes a shared threshold across datasets practical.

Given a threshold $\gamma$, we sparsify residuals by a spectral gate

$$\tilde{f}_{rest}(p) = f_{rest}(p) \cdot \mathbb{1}\left[ E_{hf}(p) \ge \gamma \right]. \tag{17}$$

This matches Figure 5a. It removes redundant high-order terms in diffuse regions while keeping view-dependent effects in glossy areas.
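The spectral gate of Eqs. (15) to (17) can be sketched as follows. The sketch assumes that residual coefficients are stored as an array of shape (N, 15, 3), that is, 15 non-DC coefficients per RGB channel ordered by SH degree, as in common 3DGS implementations. The function names are hypothetical, and the default $\gamma = 0.02$ is the value reported in the implementation details.

```python
from math import isqrt

import numpy as np

# SH degree of each of the 15 non-DC coefficients (coefficient j has degree isqrt(j)):
# three of degree 1, five of degree 2, seven of degree 3.
ORDER = np.array([isqrt(j) for j in range(1, 16)])
SIZES = np.array([9.0, 15.0, 21.0])       # |I_l| = 3 * (2l + 1), counted over RGB channels


def high_freq_energy(f_rest, weights=None):
    """E_hf(p) of Eqs. (15)-(16); f_rest has shape (N, 15, 3) = (coefficient, RGB)."""
    n = f_rest.shape[0]
    e = np.stack([np.abs(f_rest[:, ORDER == deg, :]).reshape(n, -1).mean(axis=1)
                  for deg in (1, 2, 3)], axis=1)                   # E_l(p), shape (N, 3)
    if weights is None:
        weights = SIZES / SIZES.sum()     # omega_l = |I_l| / D: mean over all 45 dimensions
    return e @ weights


def spectral_gate(f_rest, gamma=0.02):
    """Eq. (17): zero the residual SH of primitives whose E_hf falls below gamma."""
    mask = high_freq_energy(f_rest) >= gamma
    return f_rest * mask[:, None, None], mask


f = np.zeros((3, 15, 3))
f[1] = 0.01                               # weak residuals: a diffuse primitive
f[2] = 0.5                                # strong residuals: view-dependent appearance
gated, mask = spectral_gate(f)
print(mask)
```

With $\omega_l = |I_l|/D$, the score equals the plain mean absolute residual, so no per-order tuning is needed for the gate to behave consistently across scenes.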

3.3.2. Mixed-Precision Storage

After sparsification, we store attributes with different precision based on their sensitivity. We keep geometry and density in floating point and quantize only the residual SH. In our implementation, we store the positions $x$ in FP32, and $f_{dc}$, opacity, rotation, and scale in FP16. We then compress $\tilde{f}_{rest}$ with 8-bit affine quantization, i.e., a linear quantization scheme with a scale and an offset, and store the dequantization parameters, as in Figure 5b.

We compute per-dimension statistics over all primitives, namely the minimum vector $m$ and the range vector $r$ of $\tilde{f}_{rest}$, and store $m$ and $r$ as the dequantization parameters. Given a primitive $p$, we quantize $\tilde{f}_{rest}(p)$ to an 8-bit code and reconstruct it by the corresponding affine map.

The quantized reconstruction can be written as a projection onto an 8-bit affine lattice:

$$\hat{f}_{rest}(p) = m + \frac{r}{255} \odot \Pi_{[0, 255]}\left( \mathrm{round}\left( 255 \cdot \frac{\tilde{f}_{rest}(p) - m}{r} \right) \right), \tag{18}$$

where the division is element-wise and $\Pi_{[0, 255]}$ denotes element-wise clipping to $[0, 255]$. Sparsification produces many zero residuals, and all zeros in a given dimension map to the same code, so the resulting integer stream has low entropy and compresses well with standard lossless packing.
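Eq. (18) and the lossless packing step can be sketched with NumPy and the standard-library `zlib` module. This minimal sketch assumes that $\tilde{f}_{rest}$ is flattened to shape (N, D). The guard for constant dimensions ($r = 0$) and the choice of `zlib` as the lossless packer are assumptions, since no specific packer is named here.

```python
import zlib

import numpy as np


def quantize8(f):
    """Per-dimension 8-bit affine quantization of Eq. (18); f has shape (N, D)."""
    m = f.min(axis=0)                                    # minimum vector m
    r = f.max(axis=0) - m                                # range vector r
    r_safe = np.where(r > 0, r, 1.0)                     # guard constant dimensions
    codes = np.clip(np.round(255.0 * (f - m) / r_safe), 0, 255).astype(np.uint8)
    return codes, m.astype(np.float32), r_safe.astype(np.float32)


def dequantize8(codes, m, r):
    """Affine reconstruction m + r / 255 * code."""
    return m + (r / 255.0) * codes.astype(np.float32)


rng = np.random.default_rng(0)
f = rng.normal(size=(1000, 45))
f[200:] = 0.0                                            # primitives zeroed by the spectral gate
codes, m, r = quantize8(f)
packed = zlib.compress(codes.tobytes())
print(codes.nbytes, len(packed), np.abs(dequantize8(codes, m, r) - f).max())
```

Because gated-out primitives share one code per dimension, their rows are identical byte patterns, and the packer reduces them to little more than back-references. The reconstruction error per dimension stays within half a quantization step, $r/510$.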

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

We conduct comprehensive evaluations on three widely used benchmarks spanning six scenes. These scenes cover a broad range of modalities, from real-world drone captures to synthetic cityscapes. Specifically, we use the Building and Rubble scenes from the Mill-19 dataset [16]. These high-resolution industrial scenes are particularly challenging for frequency analysis, as they combine fine-scale debris with large, low-texture planar regions. We also include the Residence and Sci-Art scenes from UrbanScene3D [40]. These real-world reconstructions exhibit complex geometry and exposure variations, demanding strong geometric consistency. To further assess performance at the city scale, we evaluate on the Aerial and Street subsets of MatrixCity [15]. This synthetic benchmark provides kilometer-scale ground truth for rigorous evaluation. In this work, Mill-19 [16] and UrbanScene3D [40] serve as real-world evidence, while MatrixCity [15] is used as a complementary city-scale benchmark with controlled ground truth. Following standard protocols [11,12], we adopt the same data partitioning as Mega-NeRF [16] and uniformly downsample all images by 4× to balance computational efficiency with memory constraints.

4.1.2. Implementation Details

We use multi-view images as input and run COLMAP (version 3.9.1) to perform SfM, which provides camera intrinsics, camera poses, and a sparse SfM point cloud [3,41]. All training and evaluation are conducted in the SfM coordinate system, and the scene scale and alignment are determined by the SfM solution. Unless otherwise specified, the SfM point cloud is used only for pose estimation and geometric anchoring, and it is not used as training supervision.

All experiments are conducted on a single NVIDIA A10 GPU (NVIDIA Corporation, Santa Clara, CA, USA). To ensure a fair comparison, BlockGaussian [13] and StitchGS use the same per-block training budget. Each sub-block is optimized for 40,000 iterations. StitchGS uses density-adaptive recursive partitioning. The number of sub-blocks is 7 for Building and 4 for Rubble in Mill-19 [16], 7 for Residence and 7 for Sci-Art in UrbanScene3D [40], and 20 for Aerial and 4 for Street in MatrixCity [15].

Block fusion is performed with stochastic interwoven stitching, using KDTree neighbor queries [39]. The neighborhood size $K$ and the boundary sharpness $\eta$ are set per scene based on repeated preliminary trials, with $K \in \{2, 5, 11\}$ across our experiments. In practice, $K$ controls the spatial support of local competition. A smaller $K$ suits narrow overlap regions with simple boundary structure, because it helps avoid over-smoothing. A larger $K$ suits more complex overlap regions in which repeated primitives near the boundary are common, because it provides stronger local evidence. The sharpness $\eta$ controls the steepness of the retention probability: a smaller $\eta$ gives a smoother transition, while a larger $\eta$ gives a more decisive selection. These settings are empirical values summarized from repeated preliminary trials rather than universally optimal choices. After stitching, global consistency refinement runs for 7000 iterations, with geometric parameters frozen and only appearance parameters optimized.

Quantization-aware finetuning is enabled in the later stage of this refinement. A short warmup is followed by 8-bit fake quantization on residual spherical harmonics coefficients and backpropagation with a straight-through estimator. Before refinement, pre-cleaning removes visually negligible or numerically unstable primitives by filtering low-opacity and extreme-scale cases, and rare scale outliers are removed with a percentile rule by discarding primitives above the 99.9th percentile of a scale volume proxy. For spectral-aware adaptive compression, a fixed threshold $\gamma = 0.02$ is used to gate high-order residual spherical harmonics in diffuse regions. This value is an empirical threshold summarized from repeated preliminary trials. Because the gate operates on the same normalized residual-SH energy definition across datasets, we use it as a shared setting for Mill-19 [16], UrbanScene3D [40], and MatrixCity [15]. The remaining residual coefficients are stored with 8-bit affine quantization together with dequantization statistics. Mixed-precision storage, i.e., storing different attributes with different numeric precision, is applied according to attribute sensitivity. Positions are stored in 32-bit floating point (FP32) to reduce drift under large-scale coordinates. Opacity, rotation, scale, and spherical harmonics DC terms are stored in 16-bit floating point (FP16), while high-order residual spherical harmonics coefficients are stored as 8-bit codes.
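As a back-of-the-envelope check of the mixed-precision layout above, the per-primitive attribute budget can be computed directly. The sketch assumes the standard degree-3 3DGS attribute counts: 3 position, 3 DC, 1 opacity, 4 rotation, 3 scale, and 45 residual SH values. It ignores file headers, the per-dimension dequantization statistics, any extra attributes a particular file format may store, and lossless packing. It is therefore not the end-to-end storage reduction reported in the experiments, which also depends on primitive counts. The names `ATTRS` and `bytes_per_primitive` are hypothetical.

```python
# Attribute counts of a degree-3 3DGS primitive and the byte width of each attribute
# under the mixed-precision layout described above (assumed layout).
ATTRS = {
    "position": (3, 4),   # FP32
    "f_dc": (3, 2),       # FP16
    "opacity": (1, 2),    # FP16
    "rotation": (4, 2),   # FP16 quaternion
    "scale": (3, 2),      # FP16
    "f_rest": (45, 1),    # 8-bit codes
}


def bytes_per_primitive(attrs, uniform_bytes=None):
    """Sum of n_scalars * bytes; `uniform_bytes` overrides every width (e.g. 4 for FP32)."""
    return sum(n * (uniform_bytes if uniform_bytes else b) for n, b in attrs.values())


fp32 = bytes_per_primitive(ATTRS, uniform_bytes=4)   # 59 scalars * 4 B = 236 B
mixed = bytes_per_primitive(ATTRS)                   # 12 + 6 + 2 + 8 + 6 + 45 = 79 B
print(f"{fp32} B -> {mixed} B per primitive ({fp32 / mixed:.2f}x)")   # about 2.99x
```

The residual SH dominates the FP32 budget (180 of 236 bytes), which is why quantizing and gating these coefficients yields most of the per-primitive saving.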

4.2. Comparisons on Rendering Quality

Unless otherwise specified, all quantitative results and rendered visualizations are reported on the final deployable model, obtained by applying spectral-aware adaptive compression and mixed-precision storage after cross-block stitching and global consistency refinement.

4.2.1. UrbanScene3D

Results on UrbanScene3D are reported in Table 1, Figure 6 and Figure 7. On Residence, StitchGS matches BlockGaussian [13] in PSNR, which indicates comparable reconstruction accuracy after compression. SSIM and LPIPS are both worse than those of BlockGaussian, which suggests that compression affects local texture and perceptual details more than global structure. On Sci-Art, the metric gap becomes larger, which suggests higher sensitivity to aggressive pruning in regions with richer view-dependent effects. This behavior is consistent with the spectral gate, since higher-order residuals matter more in glossy or high-frequency regions. Visual comparisons highlight the key advantage of StitchGS at block boundaries. In Figure 6, the baseline shows boundary breaks and cross-block appearance inconsistency in the marked regions, whereas StitchGS produces smoother transitions and a more consistent appearance across blocks. Figure 7 provides a direct view of the merged blocks: BlockGaussian leaves visible seams near the dashed boundaries, while StitchGS turns the boundary into a soft transition zone through stochastic interwoven stitching, which reduces discontinuities after merging. Deployment efficiency on this dataset is reported in Table 2: storage is consistently reduced relative to BlockGaussian, with moderate changes in the point count.

4.2.2. Mill-19

Table 3 reports results on Mill-19 [16]. StitchGS stays competitive on both scenes. In Building, StitchGS improves PSNR over BlockGaussian [13], which is consistent with fewer boundary-induced errors after stitching. Because PSNR mainly reflects reconstruction accuracy, the gain suggests that boundary supervision and overlap arbitration reduce cross-block depth inconsistencies. In Rubble, StitchGS reaches the highest PSNR, which indicates stable fusion under cluttered geometry. SSIM remains close to the strong baselines, which suggests that global structure is preserved after compression. LPIPS is slightly higher than that of the strongest baseline in both scenes, which is consistent with a mild loss of very fine textures under the compact representation. Figure 8 supports these observations: in the highlighted regions, BlockGaussian shows boundary-aligned depth inconsistencies and local texture degradation, whereas StitchGS yields smoother transitions across blocks and more coherent structures that are closer to the ground truth. Deployment efficiency is summarized in Table 4. Relative to BlockGaussian, StitchGS substantially reduces storage on both scenes while keeping the point count at a similar scale.

4.2.3. MatrixCity

Table 5 reports results on MatrixCity-Aerial and MatrixCity-Street [15]. StitchGS stays competitive with implicit baselines, which suggests that explicit 3DGS representations remain effective on kilometer-scale scenes. Compared with BlockGaussian [13], PSNR drops on both splits, and the gap is larger on Street. PSNR penalizes small pixel-level errors, so aggressive pruning and low-bit residual storage can reduce peak numerical accuracy on dense urban details. SSIM and LPIPS remain near the top among methods that report results on both splits, which indicates that the main structures and overall perceptual quality are largely preserved. This suggests that most losses come from high-frequency details rather than large-scale geometry or layout. BlockGaussian* is our re-implementation; it is excluded from ranking and is included to reflect possible training and engineering differences. Figure 9 shows qualitative results for the same compressed model. The baseline shows stronger boundary artifacts in the highlighted regions, whereas StitchGS reduces cross-block discontinuities and keeps clear building contours and street layout. Some fine details remain challenging under aggressive compression. Deployment efficiency is reported in Table 6: storage is reduced substantially on both splits, which eases transmission and loading for large scenes.

4.3. Comparisons on Rendering Efficiency

Table 7 reports the rendering speed comparison on six scenes. Overall, StitchGS achieves an average of 44.18 FPS, about 9% higher than the 40.45 FPS of BlockGaussian [13]. This indicates that our method maintains real-time rendering while substantially reducing storage cost.

On Mill-19 [16], StitchGS runs substantially faster on Building, while it is slightly slower on Rubble. This suggests that the rendering benefit is more evident in scenes with relatively more regular structures, whereas the gain can be weakened when the scene contains denser clutter and more complex boundary regions. On UrbanScene3D [40], StitchGS is slightly faster on Residence but slower on Sci-Art. This indicates that rendering efficiency is influenced not only by compression, but also by scene complexity and the parameter distribution after boundary stitching and global appearance refinement. On MatrixCity [15], StitchGS is consistently faster on both Aerial and Street, with a particularly clear improvement on Street. This shows that the proposed stitching and compression strategy is also effective for large-scale scenes with more regular layouts.

Overall, these results indicate that StitchGS improves the overall deployment efficiency without sacrificing real-time rendering performance.

4.4. Ablation Study

To validate the effectiveness of the individual components in StitchGS, we conduct a comprehensive ablation study on Mill-19 [16] and UrbanScene3D [40]. The quantitative results are summarized in Table 8 and Table 9, and the stepwise visual improvements are illustrated in Figure 10. We start from the Baseline (block-wise concatenation) and progressively enable Stochastic Interwoven Stitching (Merge), Global Consistency Refinement (GCR), and quantized compression (Quant).

4.4.1. Effectiveness of Stochastic Interwoven Stitching (Merge)

We first evaluate the contribution of the proposed stitching mechanism. As shown in Table 8 and Table 9, enabling Merge improves rendering quality in most scenes (e.g., Rubble increases from 25.11 dB to 25.89 dB in PSNR). Meanwhile, we observe that storage may slightly increase at this stage (e.g., Building: 2508.80 MB → 2560.00 MB), which is expected because soft fusion retains primitives in the overlap competition regions to ensure geometric continuity. However, in challenging cases with complex geometry and stronger cross-block appearance biases (e.g., Sci-Art), Merge alone can be insufficient and may lead to a temporary quality drop (23.34 dB → 22.85 dB), because Merge mainly resolves geometric conflicts and boundary redundancy in the overlap regions, but does not directly align the appearance distributions across different sub-blocks. As a result, cleaner boundary geometry can make the block-wise appearance mismatch more visible, motivating an additional global refinement step.

4.4.2. Effectiveness of Global Consistency Refinement (GCR)

To address residual inconsistencies introduced after stitching, we apply the Global Consistency Refinement module by freezing geometry and jointly optimizing scene-wide appearance. This step is complementary to Merge: while Merge improves boundary geometry, GCR restores appearance consistency across sub-blocks. This step proves crucial for recovering performance in complex scenes: for Sci-Art, PSNR rebounds from 22.85 dB to 24.32 dB after applying GCR. Similarly, Residence benefits substantially, improving from 20.80 dB to 22.84 dB. These results confirm that globally aligning illumination and color distributions across sub-blocks effectively corrects local appearance biases that become more visible after merging and improves overall consistency.

4.4.3. Robustness of Quantized Compression (Quant)

Finally, we evaluate the impact of spectral-aware adaptive compression with quantization. Comparing the full-precision model (Merge+GCR) with the final quantized version (Merge+GCR+Quant), we observe a generally small degradation in rendering metrics, although it can be more noticeable in scenes with complex geometry and high-frequency details. For instance, Residence shows only a marginal change in SSIM (0.786 → 0.783), while Sci-Art exhibits a moderate drop. Despite these trade-offs, the storage benefits are substantial: the model size is significantly reduced (e.g., Residence: 2344.96 MB → 747.73 MB, about 3.1×), which shows that our quantization-aware strategy supports lightweight deployment while maintaining overall fidelity.
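To make the storage side of this trade-off concrete, the following back-of-the-envelope calculation counts bytes per primitive under the mixed-precision layout from the implementation details (FP32 positions; FP16 opacity, rotation, scale, and SH DC terms; 8-bit residual SH). It assumes degree-3 spherical harmonics (15 higher-order coefficients per RGB channel, as in standard 3DGS), a quaternion rotation, and an all-FP32 reference; it ignores dequantization statistics, file headers, and changes in primitive count, so it isolates the effect of the numeric format only.

```python
# Scalars per primitive (assumed layout: degree-3 SH, quaternion rotation).
ATTRS = {
    "position": 3,
    "opacity": 1,
    "rotation": 4,
    "scale": 3,
    "sh_dc": 3,     # one DC coefficient per RGB channel
    "sh_rest": 45,  # 15 higher-order coefficients x 3 channels
}
# Bit widths of the mixed-precision layout described in the text.
MIXED_BITS = {"position": 32, "opacity": 16, "rotation": 16,
              "scale": 16, "sh_dc": 16, "sh_rest": 8}
FP32_BITS = {name: 32 for name in ATTRS}


def bytes_per_primitive(bits, keep_rest=True):
    """Bytes for one primitive; keep_rest=False models a primitive whose
    residual SH were pruned by the spectral gate."""
    return sum(count * bits[name] // 8
               for name, count in ATTRS.items()
               if keep_rest or name != "sh_rest")


def expected_bytes(diffuse_fraction):
    """Average bytes per primitive when a fraction of primitives is gated as diffuse."""
    full = bytes_per_primitive(MIXED_BITS)
    gated = bytes_per_primitive(MIXED_BITS, keep_rest=False)
    return diffuse_fraction * gated + (1.0 - diffuse_fraction) * full


print(bytes_per_primitive(FP32_BITS))                    # 236
print(bytes_per_primitive(MIXED_BITS))                   # 79
print(bytes_per_primitive(MIXED_BITS, keep_rest=False))  # 34
```

Under these assumptions, the format alone gives about 236/79 ≈ 3.0× for primitives that keep their residuals and 236/34 ≈ 6.9× for primitives gated as diffuse. Measured model sizes also reflect pre-cleaning and stitching, so they need not match these ratios.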

4.5. Training Behavior of Balanced Sampling

To further examine the effect of balanced sampling, we analyze the training curves on representative blocks, including the loss and PSNR trends. The results show that balanced sampling leads to smoother optimization and slightly better convergence in the middle and later stages.

Figure 11 shows the training curves on Residence Block0. Balanced sampling shows a more stable training behavior overall. In the 10k–20k iteration range, its average PSNR is slightly higher than random sampling (24.46 dB vs. 24.24 dB). At the final 40k iteration, balanced sampling still gives a slightly higher PSNR (27.40 dB vs. 27.19 dB), while the final loss is also slightly lower (0.039 vs. 0.044). These results indicate that balanced sampling helps obtain more stable optimization in the middle and later stages.

Figure 12 shows the training curves on Sci-Art Block0. A similar trend can also be observed, and the difference is more visible in the middle stage. For example, at 15k iterations, the PSNR of balanced sampling reaches 39.17 dB, compared with 37.88 dB for random sampling. At 20k iterations, the two values are 38.29 dB and 37.51 dB, respectively. Together with the training curves, these results show that balanced sampling provides a smoother optimization process and slightly better convergence on this representative block.

Overall, balanced sampling helps reduce training fluctuation caused by uneven view distribution. It makes the optimization process smoother and gives slightly better convergence in representative blocks.
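Figure 3 describes the idea behind balanced sampling: sparse views receive higher weights and dense view clusters lower ones. One hypothetical way to realize such weights is to score each camera by the mean distance to its nearest camera centers and to sample training views in proportion to that score. The function names and the k-nearest-neighbor distance used as a density proxy are assumptions, not the exact manifold-balanced sampling rule.

```python
import numpy as np


def view_weights(cam_centers, k=4):
    """Hypothetical density-aware sampling weights for N >= 2 cameras.
    A camera whose k nearest neighbours are far away (a sparse view)
    receives a larger weight; weights are normalized to sum to 1."""
    n = len(cam_centers)
    k = min(k, n - 1)
    d = np.linalg.norm(cam_centers[:, None, :] - cam_centers[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                        # ignore self-distance
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)  # mean distance to k neighbours
    return knn_dist / knn_dist.sum()


def sample_views(cam_centers, num_iters, k=4, seed=0):
    """Draw one training view index per iteration according to view_weights."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(cam_centers), size=num_iters, p=view_weights(cam_centers, k))
```

Compared with uniform random sampling, views inside a dense cluster are drawn less often, which is one way to counter the fluctuation caused by uneven view distribution.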

4.6. Overhead of Stochastic Interwoven Stitching

Table 10 reports the competitive primitive scale, wall-clock time, and peak VRAM overhead of stochastic interwoven stitching (SIS) on six scenes. In all scenes except the Street scene of MatrixCity [15], the number of competitive primitives reaches the millions, which indicates that SIS indeed operates on large cross-block competitive regions in city-scale reconstruction. Nevertheless, the added peak VRAM stays below 0.22 GB in all cases, so this process does not introduce a noticeable GPU memory burden.

The SIS time remains relatively short on the Building and Rubble scenes of Mill-19 [16] and the Residence and Sci-Art scenes of UrbanScene3D [40], which indicates that the additional overhead remains manageable in real scenes even when the competitive region reaches the million scale. In contrast, the Street scene of MatrixCity [15] has the lowest overhead because its competitive region is much smaller. The Aerial scene of MatrixCity [15], however, shows a much larger SIS time: although its competitive primitive scale is close to that of other large scenes, it contains more blocks, which increases the amount of cross-block querying. This suggests that the SIS time is influenced not only by the size of the competitive region but also by the number of blocks and the resulting cross-block query cost.

Overall, the overhead of SIS is mainly reflected in one-time offline post-processing rather than iterative training cost or online rendering cost during deployment. At the same time, its added peak VRAM remains low across all scenes, indicating that this mechanism has good scalability in the current city-scale block-based reconstruction setting.

5. Seam-Aware Evaluation on Overlap Bands

Block-based training often causes discontinuities near block interfaces. Because the seam region covers only a small fraction of the pixels in each image, full-image metrics can miss these artifacts. We therefore report seam-aware metrics on overlap bands.

5.1. Protocol and Metrics

Overlap band definition. Each block has an Oriented Bounding Box (OBB). We use the normalized OBB distance from Section 3.2 to locate boundary-adjacent regions and define the overlap band as the set of primitives whose normalized OBB distance lies in $(1.0, 1.1]$. We then build a pixel-level seam mask from rendering contributions: for each pixel, we rank the contributing primitives by their compositing weight and keep the top-$k$ contributors, with $k = 2$ in all experiments. If any of the top-$k$ primitives belongs to the overlap band, the pixel is marked as a seam pixel and included in $S$. All seam-aware metrics below are computed on $S$.
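The mask construction can be sketched in NumPy. The sketch assumes that the rasterizer can export, for every pixel, a padded list of contributing primitive IDs together with their compositing weights (padding uses ID `-1` and weight 0); this export format and the function names are hypothetical.

```python
import numpy as np


def overlap_band(obb_dist, lo=1.0, hi=1.1):
    """Primitives whose normalized OBB distance lies in (lo, hi]."""
    return (obb_dist > lo) & (obb_dist <= hi)


def topk_contributors(ids, weights, k=2):
    """ids, weights: (H, W, M) per-pixel contributors (padded with id -1, weight 0).
    Returns (H, W, k) IDs of the k highest-weight contributors."""
    order = np.argsort(-weights, axis=-1)[..., :k]
    return np.take_along_axis(ids, order, axis=-1)


def seam_mask(topk_ids, in_band):
    """A pixel is a seam pixel if any of its top-k contributors is in the overlap band."""
    valid = topk_ids >= 0                      # skip padding entries
    hit = np.zeros(topk_ids.shape, dtype=bool)
    hit[valid] = in_band[topk_ids[valid]]
    return hit.any(axis=-1)
```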
Seam_PSNR. We compute PSNR only on seam-mask pixels. We normalize RGB values to $[0, 1]$ , so the peak value is 1.

$\mathrm{Seam\_PSNR} = 10 \log_{10} \left( \frac{1}{\mathrm{MSE}_{S}} \right), \quad \mathrm{MSE}_{S} = \frac{1}{|S|} \sum_{x \in S} \lVert I(x) - \hat{I}(x) \rVert_{2}^{2},$

(19)

where $S$ is the seam mask, $I$ is the ground-truth image, and $\hat{I}$ is the rendered image. A higher Seam_PSNR indicates better reconstruction fidelity within the seam region.
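Eq. (19) translates directly into a few lines of NumPy. As written, $\mathrm{MSE}_S$ sums the squared error over the three RGB channels instead of averaging it, so a channel-averaged MSE (as used by many PSNR implementations) would give values higher by the constant $10 \log_{10} 3 \approx 4.77$ dB. The sketch follows Eq. (19) literally; the function name is illustrative.

```python
import numpy as np


def seam_psnr(gt, pred, mask):
    """Seam_PSNR of Eq. (19). gt, pred: (H, W, 3) RGB in [0, 1]; mask: (H, W) bool S."""
    diff = gt[mask] - pred[mask]             # (|S|, 3) errors on seam pixels only
    mse_s = (diff ** 2).sum(axis=-1).mean()  # squared L2 over RGB, mean over S
    return 10.0 * np.log10(1.0 / mse_s)      # peak value is 1
```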
BPD_L1. BPD_L1 measures photometric discrepancy across adjacent blocks inside the seam mask. For each seam pixel $x$ , we aggregate the compositing weights per block. We select the two blocks with the largest aggregated weights and denote them as a and b. Let ${\hat{I}}_{a} (x)$ and ${\hat{I}}_{b} (x)$ be the colors rendered using only primitives from block a and block b. We define

$\mathrm{BPD\_L1} = \frac{1}{|S|} \sum_{x \in S} \lVert \hat{I}_{a}(x) - \hat{I}_{b}(x) \rVert_{1},$

(20)

A lower BPD_L1 indicates a smoother photometric transition between adjacent blocks.
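A sketch of Eq. (20), assuming that per-block renders and per-block aggregated compositing weights are available as arrays stacked over all $B$ blocks. This data layout is hypothetical; a practical implementation would render only the blocks that touch the seam. The two dominant blocks are selected independently at every pixel, as in the definition.

```python
import numpy as np


def bpd_l1(block_renders, block_weights, mask):
    """BPD_L1 of Eq. (20).
    block_renders: (B, H, W, 3) colour rendered from each block alone;
    block_weights: (B, H, W) compositing weight aggregated per block;
    mask: (H, W) bool seam mask S. Requires B >= 2."""
    top2 = np.argsort(-block_weights, axis=0)[:2]                     # (2, H, W) blocks a, b
    col = np.take_along_axis(block_renders, top2[..., None], axis=0)  # (2, H, W, 3)
    l1 = np.abs(col[0] - col[1]).sum(axis=-1)                         # per-pixel L1 over RGB
    return l1[mask].mean()
```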

nBGJ. nBGJ is a structure jump proxy on the overlap band from BlockGaussian [13]. Let ${\hat{D}}_{a} (x)$ and ${\hat{D}}_{b} (x)$ be per-block depth maps rendered using only primitives from blocks a and b. We compute a gradient-jump score and normalize it by the average depth gradient magnitude:

$nBGJ = \frac{\frac{1}{| S |} \sum_{x \in S} {∥\nabla {\hat{D}}_{a} (x) - \nabla {\hat{D}}_{b} (x)∥}_{1}}{\frac{1}{| S |} \sum_{x \in S} {∥\nabla \hat{D} (x)∥}_{1} + ϵ},$

(21)

where $\hat{D}$ is the depth rendered from the final fused model. ∇ is the spatial gradient computed by finite differences. We set $ϵ = 10^{- 6}$ for numerical stability. nBGJ is a non-negative structural discontinuity metric. Values closer to zero indicate more consistent depth gradients and smoother structural transitions across adjacent blocks, while larger values indicate more pronounced boundary discontinuities.
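Eq. (21) can be sketched as follows. The finite differences are taken as central differences via `np.gradient`, and the per-pixel gradient L1 norm sums the absolute $x$ and $y$ components. For brevity, the sketch uses a single pair of per-block depth maps for the whole mask; selecting the two dominant blocks per pixel, as in the definition of BPD_L1, is omitted.

```python
import numpy as np


def depth_grad(depth):
    """Finite-difference gradient of an (H, W) depth map, returned as (H, W, 2) = (dx, dy)."""
    gy, gx = np.gradient(depth)  # np.gradient returns d/d(row), d/d(col)
    return np.stack([gx, gy], axis=-1)


def nbgj(depth_a, depth_b, depth_fused, mask, eps=1e-6):
    """nBGJ of Eq. (21): mean gradient jump between blocks a and b on S,
    normalized by the mean gradient magnitude of the fused depth on S."""
    jump = np.abs(depth_grad(depth_a) - depth_grad(depth_b)).sum(axis=-1)
    ref = np.abs(depth_grad(depth_fused)).sum(axis=-1)
    return jump[mask].mean() / (ref[mask].mean() + eps)
```

Identical per-block depth gradients give nBGJ = 0, which matches the interpretation that values closer to zero indicate smoother structural transitions.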
Seam_Coverage. Seam_Coverage reports the pixel ratio of the seam mask:

$\mathrm{Seam\_Coverage} = \frac{|S|}{|\Omega|}.$

(22)

Here $|\Omega|$ is the total number of pixels over all evaluated test images. Seam_Coverage does not directly measure reconstruction quality; instead, it characterizes the scale of the seam region and the difficulty of the evaluation setting.

5.2. Results on Mill-19 and UrbanScene3D

Table 11 and Table 12 report seam-aware results on Mill-19 [16] and UrbanScene3D [40]. We compare BlockGaussian* and the final deployable StitchGS model. Higher Seam_PSNR is better, while lower BPD_L1 and lower nBGJ are better.

On Mill-19 [16], StitchGS improves seam quality on both scenes: Seam_PSNR increases and BPD_L1 decreases in both Building and Rubble. Rubble shows a large Seam_PSNR gain and a clear BPD_L1 drop, which indicates fewer visible seams under cluttered geometry [16]. For nBGJ, Rubble also improves, whereas Building stays close to the baseline with a small regression. This suggests that depth-gradient jumps can be more sensitive than photometric seams.

On UrbanScene3D [40], StitchGS reduces BPD_L1 on both scenes. Seam_PSNR improves on Sci-Art, while Residence stays nearly unchanged, which suggests that seam artifacts in this scene are already mild under this split. Sci-Art shows an nBGJ regression, which highlights a trade-off between photometric fusion and depth-gradient smoothness in some scenes. Seam_Coverage varies across scenes, consistent with their different boundary extents and seam visibility.

5.3. Seed Stability on Overlap-Band Seam Metrics

We further evaluate the variability introduced by stochastic competition in the overlap band on the Building scene. We run the full pipeline three times with different random seeds and report the seam-aware metrics in Table 13 as mean ± std. The results show that the variations of Seam_PSNR, BPD_L1, and nBGJ are all small, indicating that the stochastic arbitration process has little influence on the final seam-aware evaluation. Seam_Coverage remains unchanged across different seeds, which is consistent with the fixed overlap-band range and the stable top-k mask rule.

6. Discussion

The experimental results show that StitchGS mainly addresses two practical issues in city-scale block-based 3D Gaussian Splatting: boundary continuity after block merging and the storage overhead of explicit representations. Unlike existing methods that mainly emphasize scalable block-wise training or generic compression, StitchGS is more oriented toward the merge-then-deploy setting. Stochastic interwoven stitching and global consistency refinement improve visual continuity across block boundaries, while spectral-aware adaptive compression reduces redundant storage with limited impact on rendering quality. As a result, the method is better suited to the transmission, loading, and deployment of city-scale 3D scenes under resource-constrained conditions.

Limitations

Although StitchGS shows promising performance in seamless reconstruction and lightweight deployment for city-scale scenes, it still has several limitations. First, the current method is mainly designed for static scenes. When dynamic factors such as vehicles, pedestrians, or swaying vegetation are present, duplicate primitives or appearance inconsistencies may still occur near block boundaries. Second, the spectral compression module uses a fixed threshold $\gamma$. While this design keeps the pipeline simple and consistent, its adaptability is limited in regions with strong reflections or large differences in texture distribution. In addition, the soft transition zone defined by the Oriented Bounding Box (OBB) distance is only an approximate description of complex non-convex structures. The method also depends to some extent on the quality of the Structure-from-Motion (SfM) initialization and on the size of the overlap region, which may cause stability issues and extra overhead with extremely sparse point clouds or very large overlap regions.

7. Conclusions

This paper presented StitchGS for city-scale 3D Gaussian Splatting. StitchGS targets two structural bottlenecks: boundary discontinuities introduced by block partitioning and storage growth caused by explicit high-dimensional parameters. On the reconstruction side, we turn the overlap band from hard cutting into a competitive soft transition zone. Our confidence-driven stochastic interwoven stitching performs sub-pixel fusion near boundaries. We then freeze geometry and refine appearance globally. This step aligns illumination and color across the full scene and removes block-wise bias from independent training. As a result, we obtain a seamless merged model that is ready for deployment. On the compression side, we revisit redundancy in spherical harmonics from a spectral perspective. We apply energy-spectrum gating to suppress high-frequency residuals in diffuse regions. We further store attributes according to their sensitivity using mixed precision and 8-bit affine quantization. This enables adaptive allocation of capacity: fewer bits in low-texture areas and better fidelity in detailed regions. Overall, StitchGS reduces model size while preserving practical rendering quality. It provides a feasible path for efficient transmission, fast loading, and deployment under limited resources.

Future work will proceed in three directions. First, we will extend StitchGS to dynamic scenes to improve its ability to model time-varying targets and complex motion regions. Second, we will incorporate geometric priors such as LiDAR to strengthen structural constraints and cross-modal representation in large-scale scenes. Third, we will explore a more unified end-to-end optimization strategy to further reduce the error accumulation and parameter dependency introduced by the current staged pipeline.

Author Contributions

J.S. supervised the project, guided the overall research direction, and contributed substantially to the critical review, revision, and polishing of the manuscript. S.P. led the study and completed the majority of the work, including conceptualization, methodology, software development, experiments, data analysis, visualization, and original draft preparation. H.Z. assisted with software implementation and helped validate the experimental settings. S.C. supported the experimental evaluation and improved the presentation of figures and tables. Y.H. contributed to results verification and manuscript refinement. Y.Z. assisted with supplementary experiments and formatting checks. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Xiamen, China, under Grant 3502Z202373036; in part by the Open Competition for Innovative Projects of Xiamen under Grant 3502Z20251012; in part by the National Natural Science Foundation of China under Grant 42371457; and in part by the Natural Science Foundation of Fujian Province, China, under Grant 2025J01345.

Data Availability Statement

The datasets used in this study are publicly available. We use Mill-19 from Mega-NeRF [16], UrbanScene3D [40], and MatrixCity [15]. All data were accessed according to the corresponding dataset licenses. The code related to this work is publicly available at https://github.com/alinj4253-droid/StitchGS (accessed on 2 April 2026).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
2. Szeliski, R. Computer Vision: Algorithms and Applications, 2nd ed.; Springer: Cham, Switzerland, 2022.
3. Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
4. Triggs, B.; McLauchlan, P.F.; Hartley, R.I.; Fitzgibbon, A.W. Bundle Adjustment—A Modern Synthesis. In Proceedings of the Vision Algorithms: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2000; pp. 298–372.
5. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
6. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
7. Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. (TOG) 2023, 42, 139:1–139:14.
8. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun. ACM 2021, 65, 99–106.
9. Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021.
10. Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph. (TOG) 2022, 41, 1–15.
11. Liu, Y.; Luo, C.; Fan, L.; Wang, N.; Peng, J.; Zhang, Z. CityGaussian: Real-Time High-Quality Large-Scale Scene Rendering with Gaussians. In Proceedings of the Computer Vision—ECCV 2024: 18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part XVI; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2024; Volume 15074, pp. 265–282.
12. Lin, J.; Li, Z.; Tang, X.; Liu, J.; Liu, S.; Liu, J.; Lu, Y.; Wu, X.; Xu, S.; Yan, Y.; et al. VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; IEEE: New York, NY, USA, 2024; pp. 5166–5175.
13. Wu, Y.; Qi, Z.; Shi, Z.; Zou, Z. BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting. arXiv 2025, arXiv:2504.09048.
14. Tancik, M.; Casser, V.; Yan, X.; Pradhan, S.; Mildenhall, B.; Srinivasan, P.P.; Barron, J.T.; Kretzschmar, H. Block-NeRF: Scalable Large Scene Neural View Synthesis. In Proceedings of the CVPR, New Orleans, LA, USA, 19–23 June 2022.
15. Li, Y.; Jiang, L.; Xu, L.; Xiangli, Y.; Wang, Z.; Lin, D.; Dai, B. MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond. In Proceedings of the ICCV, Paris, France, 2–3 October 2023.
16. Turki, H.; Ramanan, D.; Satyanarayanan, M. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs. In Proceedings of the CVPR, New Orleans, LA, USA, 19–23 June 2022.
17. Chen, Y.; Lee, G.H. DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus. Adv. Neural Inf. Process. Syst. 2024, 37, 34487–34512.
18. Colomina, I.; Molina, P. Unmanned Aerial Systems for Photogrammetry and Remote Sensing: A Review. ISPRS J. Photogramm. Remote Sens. 2014, 92, 79–97.
19. Westoby, M.J.; Brasington, J.; Glasser, N.F.; Hambrey, M.J.; Reynolds, J.M. Structure-from-Motion Photogrammetry: A Low-Cost, Effective Tool for Geoscience Applications. Geomorphology 2012, 179, 300–314.
20. Nex, F.; Remondino, F. UAV for 3D Mapping Applications: A Review. Appl. Geomat. 2014, 6, 1–15.
21. Biljecki, F.; Stoter, J.; Ledoux, H.; Zlatanova, S.; Çöltekin, A. Applications of 3D City Models: State of the Art Review. ISPRS Int. J. Geo-Inf. 2015, 4, 2842–2889.
22. Green, R. Spherical harmonic lighting: The gritty details. In Proceedings of the Game Developers Conference (GDC), San Jose, CA, USA, 4–8 March 2003.
23. Fan, Z.; Zhong, C.; Cui, Y.; Zhang, Y.; Wang, Z.; Yu, X. LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS. Adv. Neural Inf. Process. Syst. 2024, 37, 140138–140158.
24. Niedermayr, S.; Stumpfegger, J.; Westermann, R. Compressed 3D Gaussian Splatting for Accelerated Effective Rendering. In Proceedings of the CVPR, Seattle, WA, USA, 17–21 June 2024.
25. Chen, Y.; Wu, Q.; Lin, W.; Harandi, M.; Cai, J. HAC: Hash-Grid Assisted Context for 3D Gaussian Splatting Compression. In Proceedings of the ECCV, Milan, Italy, 29 September–4 October 2024.
26. Navaneet, K.L.; Pourahmadi Meibodi, M.; Abbasi Koohpayegani, S.; Pirsiavash, H. Smaller and Faster Gaussian Splatting with Vector Quantization. In Proceedings of the ECCV, Milan, Italy, 29 September–4 October 2024.
27. Lu, T.; Yu, M.; Xu, L.; Xiangli, Y.; Wang, L.; Lin, D.; Dai, B. Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering. In Proceedings of the CVPR, Seattle, WA, USA, 17–21 June 2024.
28. Mi, Z.; Xu, D. Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-Scale Neural Radiance Fields. In Proceedings of the ICLR, Kigali, Rwanda, 1–5 May 2023.
29. Xu, L.; Xiangli, Y.; Peng, S.; Pan, X.; Zhao, N.; Theobalt, C.; Dai, B.; Lin, D. Grid-guided Neural Radiance Fields for Large Urban Scenes. In Proceedings of the CVPR, Vancouver, BC, Canada, 18–22 June 2023.
30. Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. TensoRF: Tensorial Radiance Fields. In Proceedings of the ECCV, Tel Aviv, Israel, 23–27 October 2022.
31. Fridovich-Keil, S.; Meanti, G.; Warburg, F.R.; Recht, B.; Kanazawa, A. K-Planes: Explicit Radiance Fields in Space, Time, and Appearance. In Proceedings of the CVPR, Vancouver, BC, Canada, 18–22 June 2023.
32. Kerbl, B.; Meuleman, A.; Kopanas, G.; Wimmer, M.; Lanvin, A.; Drettakis, G. A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets. ACM Trans. Graph. 2024, 43, 1–15.
33. Luebke, D.; Reddy, M.; Cohen, J.D.; Varshney, A.; Watson, B.; Huebner, R. Level of Detail for 3D Graphics; Morgan Kaufmann: Burlington, MA, USA, 2003.
34. Girish, S.; Gupta, K.; Shrivastava, A. EAGLES: Efficient Accelerated 3D Gaussians with Lightweight Encodings. In Proceedings of the ECCV, Milan, Italy, 29 September–4 October 2024.
35. Lee, J.C.; Rho, D.; Sun, X.; Ko, J.H.; Park, E. Compact 3D Gaussian Representation for Radiance Field. In Proceedings of the CVPR, Seattle, WA, USA, 17–21 June 2024.
36. Cheng, K.; Long, X.; Yang, K.; Yao, Y.; Yin, W.; Ma, Y.; Wang, W.; Chen, X. GaussianPro: 3D Gaussian Splatting with Progressive Propagation. In Proceedings of the ICML, Vienna, Austria, 21–27 July 2024.
37. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep Learning with Limited Numerical Precision. In Proceedings of the ICML, Lille, France, 6–11 July 2015.
38. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149.
39. Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517.
40. Lin, L.; Liu, Y.; Hu, Y.; Yan, X.; Xie, K.; Huang, H. UrbanScene3D: A Large Scale Urban Scene Dataset and Simulator. In Proceedings of the CVPR, New Orleans, LA, USA, 19–23 June 2022.
41. Schönberger, J.L.; Zheng, E.; Frahm, J.M.; Pollefeys, M. Pixelwise View Selection for Unstructured Multi-View Stereo. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016.

Figure 1. Visual comparison of StitchGS and BlockGaussian [13]. Top: the Building scene [16]. Rigid partitioning produces cracks and illumination discontinuities near block borders. Bottom: the Rubble scene [16]. The baseline shows clear seams between blocks. StitchGS reduces boundary artifacts through stochastic interwoven stitching and global consistency refinement. It also reduces storage. In the Building scene, the model size is reduced from 2508 MB to 814 MB, while boundary discontinuities are also visibly reduced.

Figure 2. Overview of the StitchGS training pipeline. (a) Independent sub-blocks are trained with margin-aware supervision and manifold-balanced sampling. (b) Stochastic interwoven stitching resolves boundary conflicts in overlap bands. (c) Global consistency refinement unifies scene-wide appearance with geometry-frozen quantization-aware finetuning, producing a seamless and quantization-robust model.

Figure 3. Schematic of Manifold-Balanced Sampling. The solid box denotes the initial partitioned region, while the dashed box denotes the proportionally expanded training region. We assign higher weights to sparse views (red) and lower weights to dense clusters (blue) to mitigate overfitting.

Figure 4. Schematic of the stochastic interwoven stitching mechanism. The process removes block seams through three cascaded steps, shown in panels (a) to (c); panel (d) shows the merged result. (a) Validity region analysis assigns each primitive to the core, competitive, or pruned region according to its normalized Oriented Bounding Box (OBB) distance. (b) Stochastic competitive stitching queries a KDTree [39] to gather neighboring candidates from adjacent blocks and computes a confidence score from a scale-normalized opacity signal. Each competitive primitive is kept with a sigmoid probability based on its score margin, which avoids hard cuts. (c) The probabilistic soft-fusion workflow illustrates how primitives in the overlap band are gradually resolved by stochastic arbitration, producing an interleaved boundary distribution. The letters a and b next to the arrows indicate intermediate states after applying steps (a) and (b), respectively. (d) Merged block visualization showing the seamless transition after stitching.
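Steps (a) and (b) can be illustrated with a small NumPy sketch. Several details are assumptions made only for illustration: the infinity-norm OBB distance, the thresholds `core_thresh` and `prune_thresh`, the confidence formula `opacity / (1 + max_scale / ref_scale)`, and the sigmoid `temperature` are hypothetical stand-ins for the paper's definitions, and a brute-force neighbour search replaces the KDTree [39] to keep the example dependency-free.

```python
import numpy as np


def normalized_obb_distance(points, center, axes, half_extents):
    """Infinity-norm distance in the OBB frame; 1.0 lies on the box surface.

    `axes` is a 3x3 rotation matrix whose columns are the box axes.
    """
    local = (points - center) @ axes  # coordinates along each box axis
    return np.max(np.abs(local) / half_extents, axis=1)


def classify_regions(d, core_thresh=0.9, prune_thresh=1.1):
    """0 = core, 1 = competitive (overlap band), 2 = pruned (hypothetical thresholds)."""
    return np.where(d <= core_thresh, 0, np.where(d <= prune_thresh, 1, 2))


def confidence(opacity, scales, ref_scale):
    """Hypothetical scale-normalised opacity: opaque, compact primitives score higher."""
    return opacity / (1.0 + scales.max(axis=1) / ref_scale)


def stochastic_keep(xyz_a, score_a, xyz_b, score_b, k=4, temperature=0.1, rng=None):
    """Keep each competitive primitive of block A with probability
    sigmoid((own score - mean score of its k nearest block-B neighbours) / T)."""
    rng = np.random.default_rng() if rng is None else rng
    k = min(k, len(xyz_b))
    # Brute-force k-NN; a KD-tree query would replace this at city scale.
    dist = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    margin = score_a - score_b[nn_idx].mean(axis=1)
    p_keep = 1.0 / (1.0 + np.exp(-margin / temperature))
    keep = rng.random(len(xyz_a)) < p_keep
    return keep, p_keep


pts = np.array([[0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
d = normalized_obb_distance(pts, np.zeros(3), np.eye(3), np.ones(3))
labels = classify_regions(d)  # core, competitive, pruned

xyz_a = np.array([[1.0, 0.0, 0.0]])
xyz_b = np.array([[1.05, 0.0, 0.0], [0.95, 0.0, 0.0]])
score_a = confidence(np.array([0.9]), np.array([[0.01, 0.01, 0.01]]), ref_scale=0.05)
score_b = confidence(np.array([0.3, 0.4]), np.full((2, 3), 0.05), ref_scale=0.05)
keep, p = stochastic_keep(xyz_a, score_a, xyz_b, score_b, k=2, rng=np.random.default_rng(0))
```

Applying the same rule from the other block's side lets both blocks thin out their overlap primitives probabilistically, which is the kind of interleaved boundary that panel (c) depicts. A larger temperature makes the arbitration more random, while a smaller one approaches a hard cut.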

Figure 5. Schematic of the resource-efficient representation. (a) Spectral energy analysis identifies diffuse regions where high-order Spherical Harmonics (SH) residuals can be pruned. (b) Mixed-precision storage: positions are stored in FP32; opacity, rotation, scale, and SH DC terms in FP16; and residual SH coefficients as 8-bit affine codes with their dequantization parameters.
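Both panels can be illustrated with a compact NumPy sketch. It is one plausible reading under explicitly hypothetical choices: degree-3 SH stored as an `(N, 16, 3)` array, a cumulative-energy threshold `keep_ratio` as the pruning rule, quaternion rotations, and per-tensor min/max affine quantization of the residual coefficients. All function and field names are illustrative, not the authors' code.

```python
import numpy as np


def band_energy(sh):
    """Per-band energy of degree-3 SH coefficients, sh shape (N, 16, 3) -> (N, 4).

    Band l occupies coefficient indices l*l .. (l+1)**2 - 1.
    """
    return np.stack(
        [(sh[:, l * l:(l + 1) ** 2, :] ** 2).sum(axis=(1, 2)) for l in range(4)], axis=1
    )


def prune_high_order(sh, keep_ratio=0.99):
    """Hypothetical rule: keep the smallest SH degree that retains `keep_ratio`
    of a primitive's spectral energy and zero the higher bands."""
    e = band_energy(sh)
    cum = np.cumsum(e, axis=1) / (e.sum(axis=1, keepdims=True) + 1e-12)
    degree = np.argmax(cum >= keep_ratio, axis=1)  # first band reaching the ratio
    out = sh.copy()
    for i, deg in enumerate(degree):
        out[i, (deg + 1) ** 2:, :] = 0.0
    return out, degree


def quantize_u8(x):
    """Per-tensor 8-bit affine quantization: x ~= codes * scale + offset."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return codes, np.float32(scale), np.float32(lo)


def dequantize_u8(codes, scale, offset):
    return codes.astype(np.float32) * scale + offset


def pack_mixed_precision(xyz, opacity, rotation, scales, sh):
    """Storage layout following Figure 5(b); field names are hypothetical."""
    return {
        "xyz": xyz.astype(np.float32),
        "opacity": opacity.astype(np.float16),
        "rotation": rotation.astype(np.float16),
        "scale": scales.astype(np.float16),
        "sh_dc": sh[:, 0, :].astype(np.float16),
        "sh_rest": quantize_u8(sh[:, 1:, :]),  # (codes, scale, offset)
    }


def bytes_per_primitive(packed):
    n = packed["xyz"].shape[0]
    fixed = sum(packed[k].nbytes for k in ("xyz", "opacity", "rotation", "scale", "sh_dc"))
    codes = packed["sh_rest"][0].nbytes  # per-tensor scale/offset amortise to ~0
    return (fixed + codes) / n


rng = np.random.default_rng(0)
n = 1000
sh = rng.normal(0, 0.1, (n, 16, 3)).astype(np.float32)
sh[:500, 1:, :] *= 0.01  # first half: nearly diffuse primitives
pruned, degree = prune_high_order(sh)
packed = pack_mixed_precision(rng.normal(size=(n, 3)), rng.random((n, 1)),
                              rng.normal(size=(n, 4)), rng.random((n, 3)), pruned)
codes, scale, offset = packed["sh_rest"]
err = np.abs(dequantize_u8(codes, scale, offset) - pruned[:, 1:, :]).max()
```

With this layout a primitive takes 12 + 2 + 8 + 6 + 6 + 45 = 79 bytes, compared with 59 × 4 = 236 bytes when all 59 attributes are kept in FP32. In this sketch the pruned coefficients are still stored (as codes near zero); a format that records each primitive's retained degree and omits the pruned bands would save further space.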

Figure 6. Qualitative comparisons on UrbanScene3D [40]. Results are shown for Residence and Sci-Art. The red boxes mark areas with block boundary artifacts or fine-detail errors. StitchGS improves boundary continuity and maintains consistent appearance across blocks.

Figure 7. Block merging visualization on UrbanScene3D [40]. We compare BlockGaussian [13] and StitchGS on Residence and Sci-Art. The dashed regions indicate block boundaries. The zoomed views highlight how stitching reduces boundary discontinuities after merging.

Figure 8. Qualitative comparisons on Mill-19 [16]. Results are shown for Building and Rubble. The red boxes highlight typical boundary artifacts and texture degradations. StitchGS reduces block boundary seams while preserving the main structures.

Figure 9. Qualitative results on MatrixCity [15]. The red boxes highlight typical boundary artifacts and texture degradations. StitchGS reduces boundary artifacts and preserves the main structures. Some fine details remain challenging under aggressive compression.

Figure 10. Qualitative results of the ablation study on Mill-19 [16] and UrbanScene3D [40]. From left to right: (a) Baseline (block concatenation), (b) +Merge (Stochastic Interwoven Stitching), (c) +GCR (Global Consistency Refinement), (d) +Quant (our final result with quantization), and (e) Ground Truth (GT).

Figure 11. Training curves on Residence Block0. Balanced sampling gives smoother optimization and slightly better mid-to-late PSNR.

Figure 12. Training curves on Sci-Art Block0. Balanced sampling provides smoother optimization and better mid-stage PSNR.

Table 1. Quantitative comparisons on the UrbanScene3D dataset [40]. Higher values are better for PSNR and SSIM, while lower values are better for LPIPS. BlockGaussian* represents our re-implementation. Best, Second, and Third results are marked.

Method	Residence [40]			Sci-Art [40]
	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
Mega-NeRF [16]	22.08	0.628	0.401	25.60	0.770	0.312
Switch-NeRF [28]	22.57	0.654	0.352	26.51	0.795	0.271
3D-GS [7]	21.44	0.791	0.232	21.21	0.821	0.245
VastGaussian [12]	21.01	0.669	0.261	22.64	0.761	0.261
Hierarchy-GS [32]	19.97	0.705	0.297	18.28	0.590	0.316
CityGaussian [11]	22.00	0.813	0.211	21.39	0.837	0.230
BlockGaussian [13]	22.63	0.821	0.196	24.69	0.848	0.208
BlockGaussian* [13]	20.54	0.754	0.233	23.34	0.809	0.236
StitchGS (Ours) w/o Quant.	22.84	0.786	0.247	24.32	0.837	0.222
StitchGS (Ours)	22.64	0.783	0.251	23.91	0.827	0.236

Table 2. Efficiency comparison on UrbanScene3D [40]. Lower storage is better. Storage is measured in MB, and points are reported in millions (10⁶). The best storage result is shown in bold.

Method	Residence [40]		Sci-Art [40]
	Points (M)	Storage (MB)↓	Points (M)	Storage (MB)↓
BlockGaussian [13]	11.29	2519.04	4.77	831.76
StitchGS (Ours)	9.92	747.73	6.43	484.35

Table 3. Quantitative comparisons on the Mill-19 dataset [16]. Higher values are better for PSNR and SSIM, while lower values are better for LPIPS. BlockGaussian* represents our re-implementation. Best, Second, and Third results are marked.

Method	Building [16]			Rubble [16]
	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
Mega-NeRF [16]	20.92	0.547	0.454	24.06	0.553	0.508
Switch-NeRF [28]	21.54	0.579	0.397	24.31	0.562	0.478
3D-GS [7]	20.23	0.735	0.289	25.24	0.755	0.253
VastGaussian [12]	21.80	0.728	0.225	25.20	0.742	0.264
Hierarchy-GS [32]	21.52	0.723	0.297	24.64	0.755	0.284
CityGaussian [11]	21.55	0.778	0.246	25.77	0.813	0.228
BlockGaussian [13]	21.72	0.762	0.222	26.18	0.816	0.213
BlockGaussian* [13]	21.60	0.748	0.226	25.11	0.795	0.237
StitchGS (Ours) w/o Quant.	22.49	0.761	0.258	26.41	0.799	0.243
StitchGS (Ours)	22.47	0.758	0.259	26.21	0.796	0.245

Table 4. Efficiency comparison on Mill-19 [16]. Lower storage is better. Storage is measured in MB, and points are reported in millions (10⁶). The best storage result is shown in bold.

Method	Building [16]		Rubble [16]
	Points (M)	Storage (MB)↓	Points (M)	Storage (MB)↓
BlockGaussian [13]	13.60	2508.80	10.43	2375.68
StitchGS (Ours)	10.81	814.56	11.04	831.42

Table 5. Quantitative comparisons on the MatrixCity dataset [15]. Higher values are better for PSNR and SSIM, while lower values are better for LPIPS. BlockGaussian* is our re-implementation and is excluded from ranking. Best, Second, and Third results are marked.

Method	MatrixCity-Aerial [15]			MatrixCity-Street [15]
	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
3D-GS [7]	27.83	0.821	0.229	20.92	0.655	0.624
VastGaussian [12]	28.33	0.835	0.220	-	-	-
CityGaussian [11]	27.46	0.865	0.204	-	-	-
BlockGaussian [13]	29.32	0.908	0.112	25.48	0.821	0.272
BlockGaussian* [13]	29.85	0.927	0.083	24.00	0.778	0.293
StitchGS (Ours)	28.37	0.883	0.132	24.14	0.753	0.324

Table 6. Efficiency comparison on the MatrixCity dataset [15]. Lower storage is better. Storage is measured in MB, and points are reported in millions (10⁶). The best storage result is shown in bold.

Method	MatrixCity-Aerial [15]		MatrixCity-Street [15]
	Points (M)	Storage (MB)↓	Points (M)	Storage (MB)↓
BlockGaussian [13]	13.57	3205.12	1.98	468.22
StitchGS (Ours)	10.75	809.93	1.86	139.78
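The storage reduction relative to BlockGaussian can be checked directly from Tables 2, 4, and 6. The short script below divides the two storage columns for each scene; the dictionary simply transcribes the table values.

```python
# Storage in MB (BlockGaussian, StitchGS) taken from Tables 2, 4 and 6.
STORAGE_MB = {
    "Residence": (2519.04, 747.73),
    "Sci-Art": (831.76, 484.35),
    "Building": (2508.80, 814.56),
    "Rubble": (2375.68, 831.42),
    "MatrixCity-Aerial": (3205.12, 809.93),
    "MatrixCity-Street": (468.22, 139.78),
}


def reduction_ratios(table):
    """Baseline storage divided by StitchGS storage, per scene."""
    return {scene: base / ours for scene, (base, ours) in table.items()}


ratios = reduction_ratios(STORAGE_MB)
for scene, r in ratios.items():
    print(f"{scene:>18}: {r:.2f}x")
print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```

The ratios run from 1.72× (Sci-Art) to 3.96× (MatrixCity-Aerial), consistent with the 1.7× to 4.0× range stated in the abstract.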

Table 7. Rendering speed comparison on six scenes. Higher FPS indicates better rendering efficiency. The best FPS result is shown in bold.

Method	Mill-19 [16]		UrbanScene3D [40]		MatrixCity [15]		Average
	Building	Rubble	Residence	Sci-Art	Aerial	Street	
BlockGaussian [13]	19.61	28.48	23.87	54.40	27.74	88.61	40.45
StitchGS (Ours)	35.56	24.87	25.49	28.51	35.23	115.40	44.18
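The Average column is the arithmetic mean of the six per-scene FPS values. The snippet below reproduces it from the table and lists the scenes where StitchGS renders faster:

```python
# Per-scene FPS from Table 7: Building, Rubble, Residence, Sci-Art, Aerial, Street.
SCENES = ["Building", "Rubble", "Residence", "Sci-Art", "Aerial", "Street"]
FPS = {
    "BlockGaussian": [19.61, 28.48, 23.87, 54.40, 27.74, 88.61],
    "StitchGS": [35.56, 24.87, 25.49, 28.51, 35.23, 115.40],
}

averages = {m: sum(v) / len(v) for m, v in FPS.items()}
faster = [s for s, a, b in zip(SCENES, FPS["BlockGaussian"], FPS["StitchGS"]) if b > a]
print({m: round(a, 2) for m, a in averages.items()})
print("StitchGS faster on:", faster)
```

The averages are 40.45 and 44.18 FPS. StitchGS is faster on four of the six scenes and slower on Rubble and Sci-Art, so the higher average does not hold uniformly across scenes.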

Table 8. Ablation study on Mill-19 scenes (Building and Rubble) [16]. Starting from the baseline, we progressively enable Merge, GCR, and Quant. Higher is better for PSNR and SSIM (↑); lower is better for LPIPS and Storage (↓). Storage is measured in MB. Bold denotes the best result for each metric within the same scene.

Scene	Merge	GCR	Quant	PSNR↑	SSIM↑	LPIPS↓	Storage↓
Building [16]				21.60	0.748	0.226	2508.80
	✓			22.00	0.751	0.262	2560.00
	✓	✓		22.49	0.761	0.258	1955.84
	✓	✓	✓	22.47	0.758	0.259	814.56
Rubble [16]				25.11	0.795	0.237	2375.68
	✓			25.89	0.782	0.258	2611.20
	✓	✓		26.41	0.799	0.243	2048.00
	✓	✓	✓	26.21	0.796	0.245	831.42

Table 9. Ablation study on UrbanScene3D scenes (Residence and Sci-Art) [40]. Starting from the baseline, we progressively enable Merge, GCR, and Quant. Higher is better for PSNR and SSIM (↑); lower is better for LPIPS and Storage (↓). Storage is measured in MB. Bold denotes the best result for each metric within the same scene.

Scene	Merge	GCR	Quant	PSNR↑	SSIM↑	LPIPS↓	Storage↓
Residence [40]				20.54	0.754	0.233	2519.04
	✓			20.80	0.712	0.308	3164.16
	✓	✓		22.84	0.786	0.247	2344.96
	✓	✓	✓	22.64	0.783	0.251	747.73
Sci-Art [40]				23.34	0.809	0.236	831.76
	✓			22.85	0.801	0.245	1525.76
	✓	✓		24.32	0.837	0.222	851.82
	✓	✓	✓	23.91	0.827	0.236	484.35
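Isolating the last ablation step shows what quantization contributes on its own. The snippet below compares the Merge + GCR rows with the full Merge + GCR + Quant rows of Tables 8 and 9:

```python
# (PSNR, storage MB) before and after enabling Quant, from Tables 8 and 9.
QUANT_STEP = {
    "Building": ((22.49, 1955.84), (22.47, 814.56)),
    "Rubble": ((26.41, 2048.00), (26.21, 831.42)),
    "Residence": ((22.84, 2344.96), (22.64, 747.73)),
    "Sci-Art": ((24.32, 851.82), (23.91, 484.35)),
}

# Storage reduction factor and PSNR change (dB) contributed by Quant alone.
summary = {
    scene: (mb_b / mb_a, psnr_a - psnr_b)
    for scene, ((psnr_b, mb_b), (psnr_a, mb_a)) in QUANT_STEP.items()
}
for scene, (ratio, dpsnr) in summary.items():
    print(f"{scene:>10}: storage reduced {ratio:.2f}x, PSNR change {dpsnr:+.2f} dB")
```

The Quant step alone shrinks storage by roughly 1.8× to 3.1× across the four scenes, at a PSNR cost of 0.02 to 0.41 dB.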

Table 10. Overhead of stochastic interwoven stitching on six scenes. Comp. Prim. denotes competitive primitives and is reported in millions (M). SIS Time is reported in seconds (s), and Peak VRAM Overhead in gigabytes (GB).

Metric	Mill-19 [16]		UrbanScene3D [40]		MatrixCity [15]
	Building	Rubble	Residence	Sci-Art	Aerial	Street
Blocks	7	4	7	7	20	4
Comp. Prim. (M)	1.4682	1.6212	1.3264	1.8630	1.8075	0.2787
SIS Time (s)	86.2	30.7	50.1	44.5	657.6	4.0
Peak VRAM Overhead (GB)	0.1489	0.2173	0.1571	0.1451	0.0735	0.0334

Table 11. Seam-aware evaluation on Mill-19 [16]. Metrics are computed on the overlap-band seam mask defined in Section 5.1. Higher is better for Seam_PSNR (↑); lower is better for BPD_L1 and nBGJ (↓). Seam_Coverage, the percentage of pixels inside the seam mask, is reported for reference. Best results are in bold.

Method	Building [16]				Rubble [16]
	Seam_PSNR↑	BPD_L1↓	nBGJ↓	Seam_Coverage	Seam_PSNR↑	BPD_L1↓	nBGJ↓	Seam_Coverage
BlockGaussian*	21.39	0.0661	1.0752	17.20%	23.05	0.0536	1.0682	5.51%
StitchGS	22.82	0.0552	1.0795	17.20%	28.41	0.0281	1.0246	5.51%
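Seam_PSNR and the seam mask are defined in Section 5.1. As a hedged illustration only, the sketch below assumes that Seam_PSNR is ordinary PSNR restricted to the pixels inside a boolean seam mask, with images in [0, 1], and computes Seam_Coverage as the percentage of masked pixels. BPD_L1 and nBGJ are not sketched because their definitions are not given here.

```python
import numpy as np


def seam_psnr(pred, gt, mask, max_val=1.0):
    """PSNR over seam-mask pixels only (assumed definition).

    pred, gt: (H, W, 3) float images in [0, max_val]; mask: (H, W) bool.
    """
    diff = (pred - gt)[mask]  # (num_masked_pixels, 3)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def seam_coverage(mask):
    """Percentage of image pixels that fall inside the seam mask."""
    return 100.0 * mask.mean()


gt = np.zeros((4, 4, 3))
pred = gt.copy()
mask = np.zeros((4, 4), dtype=bool)
mask[:, 1] = True  # a one-pixel-wide vertical band
pred[mask] += 0.1  # error only inside the band
pred[0, 3] += 0.5  # error outside the band is ignored
```

Because only masked pixels enter the error, a large artifact outside the overlap band leaves the seam score unchanged, which is what makes such a metric sensitive to block borders specifically.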

Table 12. Seam-aware evaluation on UrbanScene3D [40]. Metrics are computed on the overlap-band seam mask defined in Section 5.1. Higher is better for Seam_PSNR (↑); lower is better for BPD_L1 and nBGJ (↓). Seam_Coverage, the percentage of pixels inside the seam mask, is reported for reference. Best results are in bold.

Method	Residence [40]				Sci-Art [40]
	Seam_PSNR↑	BPD_L1↓	nBGJ↓	Seam_Coverage	Seam_PSNR↑	BPD_L1↓	nBGJ↓	Seam_Coverage
BlockGaussian*	21.12	0.0662	0.9754	17.29%	21.79	0.0664	1.1117	39.61%
StitchGS	21.05	0.0640	0.9758	17.29%	23.79	0.0503	1.1286	39.61%

Table 13. Seed stability of overlap-band seam metrics on the Building scene, reported as mean ± standard deviation over three random seeds. Higher is better for Seam_PSNR (↑); lower is better for BPD_L1 and nBGJ (↓). Seam_Coverage denotes the pixel ratio of the seam mask.

Seed	Seam_PSNR↑	BPD_L1↓	nBGJ↓	Seam_Coverage (%)
0	22.8223	0.0552	1.0795	17.2011
1	22.8312	0.0552	1.0771	17.2011
2	22.8006	0.0554	1.0795	17.2011
Mean ± Std	22.8180 ± 0.0157	0.0553 ± 0.0001	1.0787 ± 0.0014	17.2011 ± 0.0000
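The Mean ± Std row can be reproduced from the three per-seed rows with the Python standard library. The reported spreads match the sample standard deviation (n − 1 denominator) rather than the population one:

```python
import statistics

# Per-seed values from Table 13 (seeds 0, 1, 2).
RUNS = {
    "Seam_PSNR": [22.8223, 22.8312, 22.8006],
    "BPD_L1": [0.0552, 0.0552, 0.0554],
    "nBGJ": [1.0795, 1.0771, 1.0795],
}

for metric, values in RUNS.items():
    mean = statistics.mean(values)
    sample_std = statistics.stdev(values)  # n - 1 denominator
    pop_std = statistics.pstdev(values)    # n denominator, for comparison
    print(f"{metric}: {mean:.4f} ± {sample_std:.4f} (population std {pop_std:.4f})")
```

For Seam_PSNR, for example, the sample standard deviation is 0.0157 (as reported), whereas the population standard deviation would be 0.0129.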

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Su, J.; Pan, S.; Zhu, H.; Chen, S.; Huang, Y.; Zhou, Y. StitchGS: Towards Seamless and Lightweight Large-Scale 3D Gaussian Splatting. Remote Sens. 2026, 18, 1460. https://doi.org/10.3390/rs18101460


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

Article Menu

StitchGS: Towards Seamless and Lightweight Large-Scale 3D Gaussian Splatting

Highlights

Abstract

1. Introduction

2. Related Work

2.1. Scalable Urban Scene Reconstruction

2.2. Compactness and Efficiency in 3D Gaussian Splatting

3. Method

3.1. Overview

3.2. Scalable Scene Construction

3.2.1. Scene Partitioning and Margin-Aware Training

3.2.2. Manifold-Balanced Sampling

3.2.3. Stochastic Interwoven Stitching

3.2.4. Global Consistency Refinement

3.3. Resource-Efficient Representation

3.3.1. Spectral Energy Analysis

3.3.2. Mixed-Precision Storage

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

4.1.2. Implementation Details

4.2. Comparisons on Rendering Quality

4.2.1. UrbanScene3D

4.2.2. Mill-19

4.2.3. MatrixCity

4.3. Comparisons on Rendering Efficiency

4.4. Ablation Study

4.4.1. Effectiveness of Stochastic Interwoven Stitching (Merge)

4.4.2. Effectiveness of Global Consistency Refinement (GCR)

4.4.3. Robustness of Quantized Compression (Quant)

4.5. Training Behavior of Balanced Sampling

4.6. Overhead of Stochastic Interwoven Stitching

5. Seam-Aware Evaluation on Overlap Bands

5.1. Protocol and Metrics

5.2. Results on Mill-19 and UrbanScene3D

5.3. Seed Stability on Overlap-Band Seam Metrics

6. Discussion

Limitations

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References
