1. Introduction
The rapid development of Digital Twin Cities and City Information Modeling (CIM) has made the organization, integration, processing, and interactive visualization of massive urban spatio-temporal and 3D scene data a key issue for urban digital infrastructure [1,2,3]. In Web-based 3D GIS and BIM-GIS visualization, large-scale 3D geospatial and Building Information Modeling (BIM) data impose constraints on transmission, browser-side memory, computing resources, and rendering performance [4,5]. Efficient visualization therefore depends on data organization and scheduling, including LOD-based model selection, index-based entity loading, and indoor/outdoor scene scheduling [5]. However, conventional LOD specifications usually represent 3D city models through predefined geometry-oriented levels, which may limit the expression of application-specific requirements involving semantics, textures, and task-dependent representation choices [6]. Complete preloading may download assets outside the current view and may exceed graphics memory unless less important assets are unloaded in time; thus, full preloading is inefficient for city-scale Web3D applications under limited bandwidth or graphics-memory constraints [7]. These Web-side challenges become more evident for structurally complex BIM models with many subassemblies, which increase data volume and complicate Web-based organization, scheduling, and rendering [5,8].
CityGML established a standardized semantic 3D city model with predefined Levels of Detail (LODs) [9], while subsequent studies and standards extended this concept toward geometry–semantics decoupling, multi-representation modeling, and indoor–outdoor or application-driven LOD specifications [10,11,12,13,14,15]. These representation-oriented works support fine-grained description of complex 3D city models, but focus mainly on model specification rather than runtime state control. They do not explicitly address runtime encoding of representation dimensions, cross-tile semantic routing, or frequent state scheduling. Meanwhile, massive 3D content is commonly delivered through hierarchical standards such as 3D Tiles, which support streaming, hierarchical level-of-detail (HLOD) tile structures, implicit-tiling coordinate addressing, subtree availability, and metadata association [16]. However, if runtime state descriptors are stored in text-based semi-structured formats such as JSON, frequent state queries may incur parsing and field-access overhead [17]. These gaps motivate an object-level runtime state-control layer that coordinates representation dimensions, semantic payloads, and scheduling decisions after tiled spatial selection.
Accordingly, this study addresses two research questions:
RQ1: In city-scale Web3D scenes, how can a multidimensional discrete state framework support separately addressable scheduling of spatial, geometric, component, texture, and semantic information while reducing repeated text-based runtime state parsing through compact encoding under frequent state switching?
RQ2: For continuous reality-based mesh scenes and cross-tile building objects, how should multidimensional geometry, component, texture, and semantic information be organized and routed to support fine-grained on-demand scheduling, reduce redundant loading and memory pressure, mitigate out-of-memory (OOM) risk, and maintain topological consistency and semantic continuity where applicable?
To address these questions, this study proposes a multidimensional state-control architecture linking model representation, object-level data organization, and dynamic scheduling. Based on multidimensional decoupling, the architecture encodes object states into compact bit-field representations and reduces repeated text-based runtime state parsing, aiming to support large-scale state indexing, runtime switching, and incremental updating. Specifically, this study constructs a B-D-C-T-S five-dimensional orthogonal decoupling state model for Web3D building models, abstracting Boundary-based Spatial Proxy (B), Geometric Detail (D), Component Complexity (C), Texture Appearance (T), and Semantic Information (S) into separately addressable and schedulable state dimensions. A 16-bit structured state word is then designed as a compact runtime scheduling descriptor to support fixed-offset state parsing, selective resource activation, and Home Tile semantic routing after tiled spatial selection. By integrating XOR-based differential updating with constraint-aware degradation, the architecture supports incremental state updating, invalid-state suppression, and memory-risk control.
This study does not replace tiled delivery standards such as 3D Tiles or glTF-based Web3D pipelines. Instead, it introduces a lightweight object-level state-control layer integrated with tiled spatial indexing. While current tiled streaming frameworks mainly address spatial hierarchy, visibility-driven loading, and file organization, the proposed B-D-C-T-S framework focuses on multidimensional runtime state representation, fixed-offset state parsing, selective resource activation, and incremental state updating.
The main contributions are as follows:
A B-D-C-T-S state framework organizes Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information as separately addressable scheduling dimensions.
A 16-bit structured state word supports compact state representation, fixed-offset bitwise decoding, XOR-based differencing, and constraint-aware runtime degradation.
A Home Tile semantic routing strategy is introduced to reduce redundant semantic payloads for cross-tile objects by storing complete semantic records in Home Tiles and lightweight pointer records in guest tiles.
The remainder of this paper is organized as follows. Section 2 reviews multidimensional LOD decoupling, tiled Web3D streaming, runtime state-control gaps, and cross-tile semantic organization. Section 3 introduces the B-D-C-T-S five-dimensional decoupling model and data-organization methods. Section 4 describes how the five-dimensional states are encoded into a 16-bit state word and how this encoding supports XOR-based incremental updating and constraint-aware degradation. Section 5 evaluates the proposed mechanisms using a single-building BIM dataset and an urban-scale photogrammetric mesh dataset, covering local-slice measurements, first-screen scheduling tests, and full-volume extrapolated estimates. Section 6 summarizes the findings, discusses the limitations of the current prototype, and outlines future work.
2. Related Work
This section reviews related studies from four perspectives: the evolution of 3D city model representation from fixed LOD to multidimensional decoupling, tiled Web3D delivery mechanisms, runtime state-control gaps in multidimensional scheduling, and cross-tile semantic organization with dynamic representation management.
2.1. 3D City Model Representation from Fixed LOD to Multidimensional Decoupling
Level of Detail (LOD) is a core mechanism for controlling the complexity of 3D city models. In earlier CityGML versions, model representations were organized through five discrete Levels of Detail (LODs) [9]. However, the fixed LOD structure also has clear limitations. Because geometry and semantics are usually bundled within predefined levels, it is difficult to adjust one representational aspect without affecting the others. This leads to relatively coarse control over model content and weak adaptability when different applications require different combinations of geometric, semantic, and contextual information [10,11,12,14,15]. These concerns have encouraged researchers to move beyond a single fixed LOD hierarchy and to explore more decoupled forms of model representation.
Several studies have responded to this problem by extending the original LOD concept. These include enhanced LOD definitions, multi-representation models, indoor–outdoor LOD specifications, and application-driven LOD paradigms [10,11,14,15]. Another related approach separates geometric detail from semantic detail and further distinguishes interior and exterior characteristics [12]. CityGML 3.0 also reflects this shift at the standard level by placing geometric representation in the Core module, so that thematic modules can inherit spatial representations instead of defining their own geometry independently [13]. These studies and standards provide an important basis for the multidimensional representation of complex 3D city models.
Nevertheless, most LOD-related work focuses on defining representational dimensions rather than organizing them as compact, computable, and schedulable runtime states in Web environments. Existing Web3D studies have explored tile-based multi-representation personalization, rule-based scene graph generation, and selective CityGML loading under mobile resource constraints [18,19]. However, they do not explicitly provide a compact state-level mechanism for object-level runtime organization, cross-scale scheduling decisions, or incremental switching of multidimensional representations. Thus, the translation from multidimensional representation to Web3D runtime state organization remains insufficiently addressed.
2.2. Web3D Streaming Standards and Tiled Delivery Mechanisms
Large-scale 3D spatial data in Web environments are commonly delivered through tiled hierarchical structures, as reflected in OGC 3D Tiles. 3D Tiles defines hierarchical spatial data structures and tile formats for streaming massive heterogeneous 3D geospatial content, supporting hierarchical spatial organization, HLOD refinement driven by screen-space error (SSE), and metadata organization [16]. It has also been applied to efficient Web-based visualization of complex BIM models [8]. In such pipelines, glTF serves as an API-neutral runtime asset format for meshes, materials, textures, and binary buffers [20], while Draco compresses 3D meshes and point clouds to improve storage and transmission efficiency [21].
At the delivery level, 3D Tiles has become a mature basis for organizing large-scale Web3D content. Its 1.1 specification uses Implicit Tiling to encode quadtree and octree structures compactly, enabling tile-coordinate addressing and subtree-level availability management [16]. The earlier 1.0 specification introduced Feature Tables and Batch Tables for feature-level property organization; Batch Tables, in particular, associate application-specific attributes with individual features and support declarative styling or other application operations [22,23]. In practical rendering workflows, these structures are combined with tileset JSON or index files, bounding volumes, geometric errors, refinement rules, and tile payloads to guide view-dependent loading [22,24]. The Shape Resource Container (SRC) format addresses a related but different problem by emphasizing progressive transmission of meshes and textures [25]. For BIM-oriented Web3D scenes, semantic lightweighting, scene indexing, and real-time scene management have also been used to reduce rendering and management pressure when complex building models are visualized on the Web [26].
These standards and methods mainly address spatial indexing, tile selection, geometry transmission, metadata association, and resource loading. However, they remain primarily delivery- and visualization-oriented, and do not explicitly define compact object-level state-control mechanisms for independent switching of representation dimensions.
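As a point of reference for the SSE-based refinement mentioned above, the following minimal Python sketch shows the perspective-projection form of screen-space error that is commonly used with geometric-error-based tilesets. It is an illustration rather than an excerpt from the cited specifications; the function names, the camera parameters, and the 16-pixel default threshold are assumptions made for this example.

```python
import math


def screen_space_error(geometric_error: float, distance: float,
                       viewport_height_px: int, fov_y_rad: float) -> float:
    """Project a tile's geometric error (metres) to pixels for a perspective camera."""
    if distance <= 0.0:
        # Camera is on or inside the bounding volume: treat the error as unbounded.
        return math.inf
    return (geometric_error * viewport_height_px) / (
        2.0 * distance * math.tan(fov_y_rad / 2.0))


def should_refine(geometric_error: float, distance: float,
                  viewport_height_px: int, fov_y_rad: float,
                  max_sse_px: float = 16.0) -> bool:
    """Refine (load children) when the projected error exceeds the pixel budget."""
    sse = screen_space_error(geometric_error, distance, viewport_height_px, fov_y_rad)
    return sse > max_sse_px


# Hypothetical tile with a 10 m geometric error, 1000 px viewport, 90 degree vertical FOV.
near = should_refine(10.0, 100.0, 1000, math.pi / 2)   # about 50 px of error
far = should_refine(10.0, 1000.0, 1000, math.pi / 2)   # about 5 px of error
print(near, far)
```

The sketch makes the limitation discussed above concrete: the decision depends only on distance and geometric error, so it selects tiles but says nothing about which component, texture, or semantic resources of an object should be active.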
2.3. Runtime State-Control Gap in Multidimensional Web3D Scheduling
Although tiled streaming provides a robust spatial delivery foundation, object-level state control in multidimensional Web3D scenes remains less explicitly addressed. Representative tiled-scheduling and Web3D lightweighting pipelines mainly rely on camera/view-frustum visibility, geometric error or SSE, tile payload selection, and progressive loading or unloading [24,26], while progressive transmission formats such as SRC focus on mesh and texture delivery [25]. These mechanisms are effective for general visualization, but they do not explicitly coordinate geometry, components, textures, and semantics as compact object-level runtime states.
Runtime state representation is another challenge. If object attributes or state descriptors are encoded as text-based semi-structured metadata such as JSON, frequent switching across many objects may repeatedly trigger parsing, validation, or field-access operations, introducing overheads similar to those reported for JSON processing systems [17,27]. HTTP/2 multiplexing can improve network resource utilization and reduce latency through concurrent exchanges over a single connection [28], but it does not define compact object-level state representations or decide which multidimensional resources should be activated, reused, or skipped.
Multidimensional state transitions also involve cross-resource dependencies. Component loading may depend on the currently selected geometric representation, texture activation may require a corresponding spatial carrier, and semantic retrieval may require routing support when object information is not stored with the currently rendered tile. Existing tiled mechanisms can stream and index massive 3D content efficiently, but they do not provide a compact runtime instruction format for validating, comparing, and incrementally updating multidimensional representation states. This gap motivates the proposed state-word mechanism, which encodes the B-D-C-T-S state, namely Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information, as a fixed-length 16-bit word to support fixed-offset bitwise parsing, constraint validation, XOR-based differencing, and selective resource scheduling after tiled spatial selection.
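To make the idea of fixed-offset parsing and XOR-based differencing concrete, a minimal Python sketch is given below. The bit layout is an assumption made purely for illustration (two bits per dimension, B in the lowest bits, the upper six bits of the 16-bit word left unused); the layout actually adopted by the architecture is the one defined in Section 4. The names `State`, `pack`, `unpack`, and `changed_dimensions` are hypothetical.

```python
from typing import List, NamedTuple

# Assumed layout (illustrative only): two bits per dimension, B in the lowest bits.
FIELDS = ("B", "D", "C", "T", "S")
BITS_PER_FIELD = 2
FIELD_MASK = (1 << BITS_PER_FIELD) - 1  # 0b11, i.e. levels 0..3


class State(NamedTuple):
    B: int
    D: int
    C: int
    T: int
    S: int


def pack(state: State) -> int:
    """Encode a five-tuple into one integer that fits in 16 bits."""
    word = 0
    for index, level in enumerate(state):
        if not 0 <= level <= FIELD_MASK:
            raise ValueError(f"{FIELDS[index]} level out of range: {level}")
        word |= level << (index * BITS_PER_FIELD)
    return word


def unpack(word: int) -> State:
    """Decode each dimension with a fixed shift and mask; no text parsing is involved."""
    return State(*((word >> (i * BITS_PER_FIELD)) & FIELD_MASK
                   for i in range(len(FIELDS))))


def changed_dimensions(old_word: int, new_word: int) -> List[str]:
    """XOR the two words; a non-zero field in the result marks a changed dimension."""
    delta = old_word ^ new_word
    return [name for i, name in enumerate(FIELDS)
            if (delta >> (i * BITS_PER_FIELD)) & FIELD_MASK]


analytical = pack(State(B=2, D=1, C=0, T=1, S=3))
display = pack(State(B=2, D=3, C=2, T=3, S=1))
print(changed_dimensions(analytical, display))  # ['D', 'C', 'T', 'S']
```

In this example the B field is identical in both words, so a scheduler built on such a diff would only need to issue requests for the D, C, T, and S resources when switching between the two states.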
2.4. Cross-Tile Semantic Organization and Dynamic Representation Management in Large-Scale Web3D Scenes
Cross-tile semantic organization remains important because spatial partitioning and hierarchical tiling can reorganize geometry and object-level attributes. For example, IFC-to-3D Tiles conversion can decompose IFC models, convert geometry, generate b3dm content, and incorporate component attributes through JSON records and batch-table structures [29]. Spatial partitioning may split city objects across tile boundaries and compromise building integrity [18]. In BIM-to-3D Tiles workflows, preserving rich semantic attributes can also substantially increase intermediate JSON records and final tile payloads, which may aggravate transmission, memory, and rendering burdens in Web-based visualization [30]. CityGML-derived mobile visualization faces related constraints because large semantic 3D city models must be selectively loaded under limited storage, memory, processing, and network conditions [19]. Although prior studies address Web-based BIM visualization, semantics-guided lightweighting, and IFC-to-3D Tiles conversion [26,29,30], they do not explicitly provide low-redundancy semantic organization with cross-tile routing.
Dynamic Web3D applications, including digital-twin use cases, also require incremental updates across changing object representations. Non-incremental or one-time loading may transmit and process unnecessary model, animation, or LOD data, increasing network payload, loading latency, and client-side computation [31,32]. Geometry compression can reduce mesh or point-cloud payload size [21]; however, it does not decide state transitions among geometry, component, texture, or semantic resources. Large-scale Web3D visualization therefore needs to balance loading latency, decompression cost, rendering performance, multi-level LOD scheduling, and spatial indexing to maintain interactive frame rates [33,34].
Together, existing work advances Web3D representation, tiled streaming, metadata association, compression, and large-scale visualization, but lacks an integrated mechanism for compact multidimensional state expression, low-redundancy cross-tile semantic routing, and runtime consistency control under frequent state switching. This study therefore proposes a B-D-C-T-S state-control architecture that complements tiled Web3D delivery through object-level state encoding, selective resource scheduling, Home Tile semantic routing, and XOR-based incremental updating.
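The Home Tile idea can be sketched in a few lines of Python. The sketch assumes that the Home Tile of each cross-tile object has already been chosen by some preprocessing step (the selection rule is not assumed here) and that guest tiles hold only a pointer to it. The record classes, the `resolve_semantics` helper, and the tile and object identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class FullRecord:
    """Complete semantic record, stored once in the object's Home Tile."""
    attributes: Dict[str, str]


@dataclass
class PointerRecord:
    """Lightweight record stored in guest tiles; it only names the Home Tile."""
    home_tile: str


Record = Union[FullRecord, PointerRecord]
TileStore = Dict[str, Dict[str, Record]]  # tile id -> object id -> record


def resolve_semantics(store: TileStore, tile_id: str, object_id: str) -> Dict[str, str]:
    """Return the object's attributes, following at most one hop to its Home Tile."""
    record = store[tile_id][object_id]
    if isinstance(record, PointerRecord):
        record = store[record.home_tile][object_id]
        if not isinstance(record, FullRecord):
            raise LookupError(f"Home Tile of {object_id} holds no full record")
    return record.attributes


# Hypothetical building that straddles two tiles.
store: TileStore = {
    "tile/0/0": {"bldg-17": FullRecord({"category": "office", "floors": "12"})},
    "tile/0/1": {"bldg-17": PointerRecord(home_tile="tile/0/0")},
}
print(resolve_semantics(store, "tile/0/1", "bldg-17"))
```

Whichever tile the object is picked in, the same attribute set is returned, while the full record is stored only once.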
3. A Five-Dimensional Orthogonal Decoupling State Framework for Building Models
3.1. Formal Definition of the Framework
To support on-demand scheduling and fine-grained organization of 3D building models in web environments, this study defines the B-D-C-T-S framework as a five-dimensional orthogonal decoupling state model. The five dimensions are Boundary-based Spatial Proxy (B), Geometric Detail (D), Component Complexity (C), Texture Appearance (T), and Semantic Information (S). Here, B denotes boundary-related spatial proxies, such as points, bounding boxes, and mesh proxies, rather than a conventional geometric LOD level. By combining these dimensions, building models can be reorganized into separately addressable data units for flexible transmission and runtime scheduling.
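The candidate state space of the framework can be expressed as:
$$\Omega = B \times D \times C \times T \times S$$
where each dimension is discretized into multiple levels according to its specific role in representation and scheduling. A model state can therefore be represented as an ordered five-tuple:
$$\sigma = \langle b, d, c, t, s \rangle$$
where $b \in B$, $d \in D$, $c \in C$, $t \in T$, and $s \in S$.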
Figure 1 illustrates the relative configuration of two example states across the five discrete dimensions; the radial values indicate graded state levels rather than directly comparable continuous quantities. An analytical state such as ⟨2,1,0,1,3⟩ is oriented toward macro-scale computation and semantic analysis, whereas a display state such as ⟨2,3,2,3,1⟩ places greater weight on geometric detail and visual fidelity. The contrast between the two shows that the five dimensions can be configured separately rather than following a single linear progression. Thus, the framework supports analytical and presentation-oriented tasks by composing different dimension levels, rather than by enforcing a single universal fidelity scale.
The meanings of the five dimensions are defined as follows.
B (Boundary-based Spatial Proxy): defines the proxy form through which an object participates in spatial indexing and geometric computation. This dimension determines how the object enters spatial queries, such as view-frustum culling, and coarse geometric operations, such as bounding-box collision detection. It serves as the spatial entry point for runtime scheduling.
D (Geometric Detail): describes the degree to which the host geometry approximates the reference geometry. This dimension controls the complexity of the primary mesh and directly affects visual accuracy as well as the feasibility of geometry-dependent computation.
C (Component Complexity): describes the retention level of attached or embedded components, such as doors, windows, railings, and mechanical equipment. This dimension is separated from the host geometry at the data-organization level and allows components to be loaded or removed selectively for different tasks. However, embedded elements such as doors and windows remain subject to the topological dependency constraints of the host geometry.
T (Texture Appearance): describes the material and texture representation of surfaces. This dimension regulates loading from a single color to multi-channel physically based rendering (PBR) materials, mainly affecting graphics memory usage and requested payload.
S (Semantic Information): describes the depth of association between non-geometric attributes and entities. This dimension controls the loading of information from unique identifiers to complete attribute sets and further to external knowledge bases or business databases, thereby supporting query and analysis.
In this study, orthogonality means that the B, D, C, T, and S dimensions are separable in data organization, state encoding, transmission organization, and request addressing. Each dimension can be configured and stored separately, enabling task-specific state vectors. For example, an analytical state may preserve high geometric detail while omitting textures. Thus, overall model fidelity is reformulated as separately controllable discrete variables.
However, orthogonality does not make every B-D-C-T-S combination valid or executable at runtime. The admissible state space is constrained by retrievable spatial-carrier requirements, topological-support conditions, and rendering-logic and semantic-consistency constraints, as defined in Section 3.3. The framework is therefore a decoupled state-organization model with runtime validity constraints, not an unconstrained Cartesian product of all dimension levels.
3.2. Quantitative and Rule-Based Grading Definitions of Dimensions
Building on the five-dimensional orthogonal framework, this section further defines quantitative grading criteria for each dimension. To meet the engineering requirements of web-based streaming and on-demand rendering, continuous model characteristics are discretized into a complete five-dimensional grading matrix.
Figure 2 illustrates the hierarchical progression of the five dimensions and their corresponding proxy or representation forms.
3.2.1. Dimension B (Boundary-Based Spatial Proxy): Grading Based on Proxy Complexity
The B dimension describes the proxy forms in which an object participates in spatial computations. Based on the complexity of the proxies, four levels are defined:
B0 (Null Proxy): The object is not assigned an active spatial proxy for runtime spatial computation or geometric evaluation. This level is used for logical hiding or non-spatial background records. If object-level semantic or appearance information is activated in a runtime scene, the spatial-entity constraint in Section 3.3 requires a retrievable spatial carrier, namely B > B0.
B1 (Point Proxy): The object is represented by its centroid or geometric center as the calculation node. As it retains only minimal spatial information, this level is highly suitable for large-scale spatial indexing, clustering tasks, and macro-level situational analysis.
B2 (Bounding-Box Proxy): The object is approximated using an axis-aligned bounding box (AABB) or an oriented bounding box (OBB). This level maintains high computational efficiency while preserving the basic spatial extent, making it appropriate for view-frustum culling and coarse collision detection.
B3 (Mesh Proxy): A low-polygon mesh or convex hull is used to more closely fit the object’s boundary. This level supports more detailed shadow analysis, occlusion analysis, and high-precision collision detection.
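The two lightest proxy levels can be illustrated with a short Python sketch. It assumes that an object is available as a list of vertex coordinates, uses the unweighted vertex mean as a simple stand-in for the centroid (B1), and builds an axis-aligned box for B2; the function names are hypothetical and oriented boxes and mesh proxies are not covered.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (minimum corner, maximum corner)


def point_proxy(vertices: Sequence[Vec3]) -> Vec3:
    """B1 sketch: unweighted mean of the vertices (an approximation of the centroid)."""
    if not vertices:
        raise ValueError("at least one vertex is required")
    n = len(vertices)
    return tuple(sum(v[i] for v in vertices) / n for i in range(3))


def aabb_proxy(vertices: Sequence[Vec3]) -> Box:
    """B2 sketch: axis-aligned bounding box enclosing all vertices."""
    if not vertices:
        raise ValueError("at least one vertex is required")
    lo = tuple(min(v[i] for v in vertices) for i in range(3))
    hi = tuple(max(v[i] for v in vertices) for i in range(3))
    return lo, hi


def aabb_overlap(a: Box, b: Box) -> bool:
    """Coarse collision test: boxes overlap when their extents overlap on every axis."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[i] <= b_hi[i] and b_lo[i] <= a_hi[i] for i in range(3))


# Hypothetical object with four vertices.
verts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 4.0, 0.0), (0.0, 4.0, 6.0)]
print(point_proxy(verts))  # (1.0, 2.0, 1.5)
print(aabb_proxy(verts))   # ((0.0, 0.0, 0.0), (2.0, 4.0, 6.0))
```

The overlap test touches only six numbers per object, which is why a B2 proxy is adequate for culling and coarse collision checks, whereas B3 is reserved for tasks that need a closer fit to the boundary.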
3.2.2. Dimension D (Geometric Detail): Grading Based on Geometric Approximation Error
The D dimension measures the difference between a simplified model and the reference geometry using the Hausdorff distance $d_H$. According to approximation accuracy, four levels are defined.
D0 (Base Projection): Only the closed two-dimensional projected footprint is retained. Because height information is absent, the three-dimensional Hausdorff distance is not applicable; instead, a two-dimensional contour-similarity constraint is used.
D1 (Prismatic Aggregation): The building mass is represented by a combination of flat-roofed prisms, while vertical height differences below a prescribed threshold are ignored. This level preserves coarse volumetric characteristics and is suitable for first-screen placeholder rendering and macro-scale massing analysis.
D2 (Generalized Host Geometry): The host geometry is simplified into a medium-precision mesh that preserves the major shape of the building envelope while suppressing local surface fluctuations. The approximation must satisfy
$$d_H(M_{D2}, M_{\mathrm{ref}}) \le \varepsilon_{2}$$
where $M_{\mathrm{ref}}$ denotes the reference geometry and $\varepsilon_{2}$ is the tolerance threshold for medium-precision representation.
D3 (Refined Host Geometry): The host geometry retains high-precision geometric characteristics and is suitable for detailed visual inspection as well as accurate topological support for embedded components. The approximation must satisfy
$$d_H(M_{D3}, M_{\mathrm{ref}}) \le \varepsilon_{3}$$
where $\varepsilon_{3}$ is the tighter tolerance threshold for high-precision representation.
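The grading rule for D2 and D3 can be sketched as follows. The sketch uses a brute-force, vertex-sampled Hausdorff distance, which only approximates the distance between the underlying surfaces and scales quadratically with the number of points; the tolerance parameters `eps_d2` and `eps_d3` and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two sampled point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def geometric_level(simplified: Sequence[Point], reference: Sequence[Point],
                    eps_d2: float, eps_d3: float) -> Optional[str]:
    """Return the highest of D3/D2 whose tolerance is met, or None if neither is."""
    if eps_d3 > eps_d2:
        raise ValueError("the D3 tolerance is expected to be the tighter one")
    d = hausdorff(simplified, reference)
    if d <= eps_d3:
        return "D3"
    if d <= eps_d2:
        return "D2"
    return None


# Hypothetical reference and simplified samples; one vertex is displaced by 0.1 m.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
simplified = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.1)]
print(geometric_level(simplified, reference, eps_d2=0.5, eps_d3=0.05))  # D2
```

A mesh that fails both tolerances would have to be delivered at D1 or D0, for which the three-dimensional Hausdorff criterion is not the governing rule.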
3.2.3. Dimension C (Component Complexity): Grading Based on Visibility and Shape Characteristics
The C dimension regulates the retention of attached or embedded components. Since volume alone may incorrectly discard small but visually important elements, both the characteristic size and the shape factor are introduced.
Let $l$ denote the characteristic size of a component, $k$ denote its shape factor, $l_{\min}$ the minimum visibility threshold, and $k_{s}$ the threshold for identifying slender components.
According to these criteria, four levels are defined.
C0 (No Components): all attached and embedded components are omitted. Only the host geometry is retained.
C1 (Salient Components Only): only components satisfying the minimum visibility condition are retained, namely
$$l \ge l_{\min}$$
This level preserves visually dominant components while filtering out small and non-essential details.
C2 (Salient and Slender Components): in addition to the visible components retained at C1, slender components that are visually important despite their small size are also preserved. A component is identified as a prominent slender element when
$$l < l_{\min} \quad \text{and} \quad k \ge k_{s}$$
This level is suitable for structures such as railings, mullions, or similar elongated elements.
C3 (Full Components): all attached and embedded components are retained. This level is intended for detailed inspection and full-detail presentation.
It should be noted that the C dimension governs the retention of attached or embedded components on the host geometry, whereas the D dimension governs the approximation level of the host geometry itself. The two dimensions are therefore separated at the data-organization level but remain subject to topological dependency constraints at runtime.
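A possible reading of the C-level rules is sketched below in Python. It assumes that a larger shape factor means a more slender component and that the thresholds are supplied by the caller; both the comparison direction and the names `minimum_component_level` and `retained_components` are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def minimum_component_level(size: float, shape_factor: float,
                            min_visible_size: float, slender_threshold: float) -> int:
    """Lowest C level (1..3) at which a component is retained."""
    if size >= min_visible_size:
        return 1  # salient: kept from C1 upwards
    if shape_factor >= slender_threshold:
        return 2  # small but slender: kept from C2 upwards (assumed direction)
    return 3      # everything else appears only at C3


def retained_components(components: Dict[str, Tuple[float, float]], level: int,
                        min_visible_size: float, slender_threshold: float) -> List[str]:
    """Names of the components loaded at C level `level` (0 yields an empty list)."""
    return [name for name, (size, shape) in components.items()
            if level >= minimum_component_level(size, shape,
                                                min_visible_size, slender_threshold)]


# Hypothetical components: name -> (characteristic size in metres, shape factor).
parts = {"door": (2.1, 2.0), "railing": (0.05, 40.0), "handle": (0.1, 1.5)}
for c_level in range(4):
    print(c_level, retained_components(parts, c_level, 0.5, 10.0))
```

With the thresholds used here, the door is loaded from C1, the railing joins at C2 because of its slenderness, and the handle is deferred to C3.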
3.2.4. Dimension T (Texture Appearance): Grading Based on Visual Fidelity and Resource Cost
The T dimension regulates the material and texture representation of surfaces. According to appearance fidelity, four levels are defined.
T0 (No Texture): no texture or material information is loaded, and the object is rendered with a default monochrome shader.
T1 (Procedural Color): only lightweight appearance categories are retained, such as functional or semantic class colors. At this level, semantic attributes from the S dimension may also be used to support procedural color mapping.
T2 (Baked Texture): medium-resolution texture atlases or baked material maps are loaded. This level supports improved visual realism while controlling graphics memory consumption.
T3 (Full PBR Texture): complete multi-channel physically based rendering materials are loaded, including albedo, normal, roughness, metallic, and other associated maps where available. This level prioritizes visual fidelity and is suitable for close-up inspection and presentation-oriented applications.
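The effect of the T level on graphics memory can be estimated with simple arithmetic. The Python sketch below computes the footprint of uncompressed 4-byte-per-texel textures including their full mip chains. The map counts and resolutions chosen for the baked and PBR cases are hypothetical, and GPU texture compression, which would reduce these figures, is deliberately ignored.

```python
def mip_chain_bytes(width: int, height: int, bytes_per_texel: int = 4) -> int:
    """Bytes for an uncompressed texture including every mip level down to 1x1."""
    total = 0
    while True:
        total += width * height * bytes_per_texel
        if width == 1 and height == 1:
            return total
        width, height = max(1, width // 2), max(1, height // 2)


def appearance_bytes(map_count: int, size: int) -> int:
    """Footprint of `map_count` square RGBA8 maps with edge length `size`."""
    return map_count * mip_chain_bytes(size, size)


# Hypothetical packages: one 1024 px baked atlas versus four 2048 px PBR maps.
baked = appearance_bytes(map_count=1, size=1024)
pbr = appearance_bytes(map_count=4, size=2048)
print(f"baked: {baked / 2**20:.1f} MiB, PBR: {pbr / 2**20:.1f} MiB")
```

Under these assumptions a single object moves from roughly 5 MiB to roughly 85 MiB of texture memory when upgraded from the baked to the full PBR package, which illustrates why the T dimension is scheduled separately from geometry.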
3.2.5. Dimension S (Semantic Information): Grading Based on Association Depth
The S dimension regulates the loading depth of semantic information. According to the degree of semantic association, four levels are defined.
S0 (No Semantics): no semantic information is loaded.
S1 (Identifier Only): only a unique identifier is retained, supporting minimal indexing and object-level retrieval.
S2 (Local Attributes): local semantic attributes, such as category, material class, function, and status, are retained, making this level suitable for routine queries and statistical analysis under offline or low-latency conditions.
S3 (Associated Semantics): lightweight pointers, such as Uniform Resource Identifiers (URIs), are used to link external knowledge bases or business databases. This level enables on-demand retrieval of large-scale heterogeneous data during runtime while reducing the initial requested payload.
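The four S levels can be read as a filter over a semantic record. The Python sketch below assumes that the levels are cumulative, so that S3 carries the identifier, the local attributes, and an unresolved URI pointer; this cumulative reading, the record layout, and the name `semantic_payload` are assumptions for illustration only.

```python
from typing import Any, Dict


def semantic_payload(record: Dict[str, Any], level: int) -> Dict[str, Any]:
    """Return the part of `record` delivered at S level `level` (0..3)."""
    if not 0 <= level <= 3:
        raise ValueError(f"S level out of range: {level}")
    if level == 0:
        return {}                                            # S0: nothing is loaded
    payload: Dict[str, Any] = {"id": record["id"]}           # S1: identifier only
    if level >= 2:
        payload["attributes"] = dict(record["attributes"])   # S2: local attributes
    if level == 3:
        payload["external_uri"] = record["external_uri"]     # S3: pointer, resolved on demand
    return payload


# Hypothetical record; the URI is a placeholder, not a real service.
record = {
    "id": "bldg-0042",
    "attributes": {"category": "office", "status": "in use"},
    "external_uri": "https://example.org/assets/bldg-0042",
}
for s_level in range(4):
    print(s_level, semantic_payload(record, s_level))
```

At S3 only the pointer is transmitted; the external data behind it would be fetched when a query actually needs it.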
The grading scheme above discretizes continuous features for implementation purposes. In practical applications, parameters such as the Hausdorff-distance tolerances $\varepsilon_{2}$ and $\varepsilon_{3}$, the minimum visibility threshold $l_{\min}$, and the slenderness threshold $k_{s}$ should be configured according to the target scenario. For city-scale CIM planning scenarios, these thresholds may be relaxed. For building-scale BIM inspection tasks, by contrast, they should be tightened to remain consistent with the required delivery standards.
3.3. Valid State Constraints
Although the five dimensions are defined as organizationally separable, physical rules and rendering logic impose specific dependencies to ensure topological validity and semantic consistency. The following constraints are therefore introduced and are used both to check illegal state combinations before delivery and to govern runtime degradation.
Spatial entity constraint. If semantic or appearance information is activated, the object must still possess a retrievable spatial carrier. Formally, if S > S0 or T > T0, then B > B0 must also hold. This ensures that semantic or material information is not attached to an entity without a locatable spatial representation.
Component topology dependency constraint. Component loading must satisfy the supporting conditions of the host geometry. For attached components, such as billboards or surface-mounted details, a continuous attachment surface is required. The host geometry should therefore reach at least D2; otherwise, floating or intersecting artifacts may occur. For embedded components, such as doors and windows, topological cutting and interface matching are involved. The host geometry should therefore reach D3. Forced loading on low-precision meshes can easily lead to topological mismatch and depth conflict, such as Z-fighting. In simplified runtime validation, this rule can be expressed as C > C0 ⇒ D ≥ D2, with embedded components further requiring D ≥ D3 when topological cutting is involved.
Visual compensation under low geometric precision. When only medium-precision geometry such as D2 is available, but door and window details still need to be presented, high-precision details may be baked into two-dimensional textures in advance, corresponding to T2. In this case, the appearance can be restored through texture mapping while keeping C = C0. This strategy preserves topological validity while balancing visual fidelity and transmission efficiency.
These constraints indicate that the orthogonal state framework does not imply arbitrary state combinations. Rather, valid combinations must satisfy runtime feasibility, topological support, and semantic consistency.
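The first two constraints can be expressed as a small validator, sketched below in Python with levels written as integers 0 to 3. The `degrade` function shows one possible repair policy (drop appearance and semantics when no spatial carrier exists, drop components when the host geometry is too coarse); this policy, like the function names and the `has_embedded` flag, is an assumption for illustration and is not the degradation procedure of the architecture itself.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    B: int
    D: int
    C: int
    T: int
    S: int


def violations(state: State, has_embedded: bool = False) -> List[str]:
    """List the constraints of this section that `state` breaks."""
    found = []
    # Spatial entity constraint: (S > S0 or T > T0) requires B > B0.
    if (state.S > 0 or state.T > 0) and state.B == 0:
        found.append("spatial-entity")
    # Component topology: C > C0 requires D >= D2, or D >= D3 for embedded components.
    required_d = 3 if has_embedded else 2
    if state.C > 0 and state.D < required_d:
        found.append("component-topology")
    return found


def degrade(state: State, has_embedded: bool = False) -> State:
    """Assumed repair policy: lower the dependent dimensions until the state is valid."""
    if state.B == 0:
        state = state._replace(T=0, S=0)
    required_d = 3 if has_embedded else 2
    if state.C > 0 and state.D < required_d:
        state = state._replace(C=0)
    return state


requested = State(B=2, D=1, C=2, T=3, S=1)
print(violations(requested))  # ['component-topology']
print(degrade(requested))     # State(B=2, D=1, C=0, T=3, S=1)
```

Running the validator before delivery and the repair step at runtime mirrors the two uses of the constraints described at the start of this section.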
3.4. Applicability Boundaries and Extensibility of the Framework
The proposed framework is primarily intended for the lightweight organization, transmission, and scheduling of 3D building models in web environments. It is particularly suitable for scenarios requiring multiscale representation, on-demand loading, and dynamic switching among geometric, visual, and semantic states.
For highly continuous reality-based meshes, such as oblique photogrammetric models, the framework can still be applied, but additional preprocessing is required. In such cases, virtual monomerization may be used to generate virtual monomers from continuous surfaces so that the resulting entities can be organized and scheduled under the same five-dimensional framework.
Beyond its immediate scheduling role, the architecture provides a basis for future extension. Developers can adjust grading rules within a single dimension to meet future application requirements or data specifications without changing the overall state-control logic. Because the dimensions remain organizationally separated under the validity constraints defined above, extending or replacing one layer does not necessarily require redesigning the entire scheduling engine. This separation facilitates compatibility with CIM workflows and complex Web3D streaming pipelines.
5. Validation of Core Mechanisms and Performance Analysis
Based on the B-D-C-T-S five-dimensional orthogonal decoupling framework and the bitwise state mechanism proposed above, this section presents mechanism-level evaluations of the method. A controlled server–client test environment was established, and two datasets were used: a single-building model (Dataset A) and an urban-scale photogrammetric mesh scene (Dataset B). The evaluation covers transmission-level separation, state parsing, incremental updating, cross-tile semantic routing, first-screen memory control, and preprocessing overhead.
5.1. Experimental Design and Environment Setup
5.1.1. Experimental Objectives
The experiments were designed to evaluate the proposed B-D-C-T-S state-control architecture from five perspectives: transmission-level separation during staged loading; parsing efficiency and allocation tendency of the 16-bit state word compared with JSON-based records; XOR-based differential updating and runtime degradation under abnormal-state injection; computational behavior of the spatial proxy dimension (B) in macro- and micro-scale tasks; and cross-tile semantic routing, first-screen memory control, and offline preprocessing overhead.
5.1.2. Experimental Environment and Tools
The experimental platform consisted of an Intel Core i7-12700KF CPU, 96 GB of RAM, and an NVIDIA GeForce RTX 3080 GPU. Web-based evaluations were conducted in Google Chrome (64-bit) with WebGL hardware acceleration enabled. Frontend behavior was assessed using browser developer tools, custom JavaScript timers, and operating-system-level process monitoring.
5.1.3. Experimental Datasets
Two datasets were used. Dataset A, a single-building BIM model derived from an official Revit sample, was used for mechanism-level validation of dimensional decoupling, staged loading, and state switching. The original full GLB model was 8.31 MB. It was preprocessed in Blender 4.5 and decoupled into separately addressable geometry, component, and texture-related packages. The first-screen D1 package was operationalized as a coarse geometric proxy using the Decimate modifier with a fixed retained-face ratio of 5%, whereas D3 retained the refined host geometry. Selected fine-detail objects removed from the host geometry were exported as C-Chunks, and extended material and texture resources used to upgrade the resident baseline T state to T3 were exported as T-related chunks.
In this evaluation, the theoretical minimum-visibility and slenderness-ratio thresholds described in Section 3.2.3 were not varied as independent experimental variables. Therefore, the benchmark results reflect this fixed Blender-based decimation and chunk-separation setting rather than a sensitivity analysis over component-culling thresholds. Figure 7 contrasts the D1 coarse model with the reconstructed model after D3, C, and T additions.
Dataset B, a Berlin 3D photogrammetric mesh, was used for scalability-related evaluation under large-scale Web3D conditions. It consisted of a 120 MB uncompressed local slice and a large-area 1.12 GB urban model. The local slice was approximately 69 MB after Draco compression using the built-in glTF exporter in Blender 4.5. The fixed Draco configuration was compression level 6, with quantization bits of 14 for positions, 10 for normals, and 12 for texture coordinates. The uncompressed 120 MB local-slice mesh was used for preprocessing-related measurements, while the 69 MB Draco-compressed payload was retained only as a fixed delivery-size reference and was not used for Draco-parameter sensitivity analysis. The 1.12 GB model was used for first-screen memory stress testing and for evaluating whether state-word-driven D1 tile scheduling could reduce initial memory pressure and potential out-of-memory (OOM) risk. It also provided the full-volume reference size for extrapolated preprocessing estimates, rather than direct full-volume preprocessing measurement.
Figure 8 shows the local slice and the large-scale regional model used in Dataset B.
5.2. Empirical Evaluation of Dimensional Decoupling and On-Demand Extraction
5.2.1. Verification of Transmission-Level Separation
To evaluate transmission-level separation, Dataset A was loaded in a staged sequence of D1 → D3 → C → T. During each stage, browser network logs were monitored to determine whether non-target resources were unnecessarily requested. As shown in Table 1, D-targeted loading triggered no unintended component or texture request. Component activation requested only the C-Chunk without geometry reloading or texture interference, and texture activation requested only the T-Chunk without affecting geometry or component requests. These results indicate that the tested D, C, and T resources remained separated at the request level and could be selectively scheduled in this staged loading case.
5.2.2. Comparison of First-Screen and Staged Requested Payloads
Table 2 compares requested payloads under monolithic loading and staged decoupled loading for Dataset A. In the conventional scheme, the full 8.31 MB GLB file was loaded during the initial screen phase. In the proposed scheme, the D1 coarse model was used for first-screen rendering, reducing the initial requested payload to 0.57 MB, corresponding to a 93.1% reduction relative to the full model. The subsequent D3 geometry-refinement stage required 0.78 MB, or 9.4% of the original model size, because this D3 package contains refined host geometry without texture resources.
Table 2 focuses on the first-screen and geometry-refinement payloads, while component and texture switching are evaluated in later incremental-update tests. These results show that multidimensional decoupling can reduce the first-screen requested payload in this tested setting while shifting non-essential details to later stages.
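The percentages quoted for Table 2 follow directly from the reported payload sizes. The short sketch below only reproduces that arithmetic; the function names are illustrative and are not part of the described system.

```python
FULL_MODEL_MB = 8.31   # monolithic GLB size reported for Dataset A
D1_MB = 0.57           # first-screen coarse proxy payload
D3_MB = 0.78           # geometry-refinement payload

def reduction_percent(staged_mb, full_mb):
    """Payload reduction of a staged request relative to the full model."""
    return (1.0 - staged_mb / full_mb) * 100.0

def share_percent(staged_mb, full_mb):
    """Staged payload expressed as a share of the full model size."""
    return staged_mb / full_mb * 100.0

print(round(reduction_percent(D1_MB, FULL_MODEL_MB), 1))  # 93.1
print(round(share_percent(D3_MB, FULL_MODEL_MB), 1))      # 9.4
```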
5.3. Performance Evaluation of State-Word Scheduling
5.3.1. Experimental Design
To isolate runtime state parsing from rendering and transmission effects, a browser-side CPU micro-benchmark was conducted in Google Chrome. The benchmark measured the time required to parse or decode N object-state records and extract the C field; WebGL rendering, model loading, network transmission, and full browser heap profiling were excluded.
Two schemes were compared. The control group used JSON deserialization followed by C-field lookup, whereas the experimental group used bit-shift and bitmask operations on a 16-bit unsigned integer state word (Uint16). Both schemes checked the same target condition, C = 2. Hit counts and checksums were retained to verify that they identified the same target states.
The sample size N was set to 1000, 10,000, 50,000, and 100,000. For each scale, three warm-up runs were excluded, followed by ten official runs. Each official run repeated the parsing pass 1000 times, and the reported time was normalized to one parsing pass. Mean values were used for reporting, while standard deviations were retained in the benchmark records. Figure 9 summarizes the multi-scale results, and Table 3 reports the representative large-scale N = 100,000 case.
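The measurement protocol described above (excluded warm-up runs, ten official runs, 1000 repeated passes per run, normalization to one pass) can be sketched as a small timing harness. This is a Python illustration of the protocol only, not the browser-side JavaScript benchmark used in the study; the function name `benchmark` is hypothetical, and it is an assumption that each warm-up run repeats the pass as many times as an official run.

```python
import statistics
import time

def benchmark(pass_fn, warmups=3, runs=10, repeats=1000):
    """Return (mean, stdev) of the per-pass time in milliseconds."""
    # Warm-up runs are executed but not recorded.
    for _ in range(warmups):
        for _ in range(repeats):
            pass_fn()
    per_pass_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(repeats):
            pass_fn()
        elapsed_s = time.perf_counter() - start
        # Normalize the run to the duration of a single pass.
        per_pass_ms.append(elapsed_s * 1000.0 / repeats)
    return statistics.mean(per_pass_ms), statistics.stdev(per_pass_ms)

# Example: time a trivial placeholder workload.
mean_ms, stdev_ms = benchmark(lambda: sum(range(100)), repeats=50)
```

Repeating the pass inside each run keeps the measured interval well above the timer resolution, which matters when a single pass takes only microseconds.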
5.3.2. Execution Efficiency Analysis
For N = 100,000, JSON deserialization and C-field lookup required approximately 13.05 ms, occupying approximately 78.27% of a 16.67 ms frame budget at 60 FPS. Fixed-offset bitwise C-field extraction required approximately 0.090 ms, corresponding to approximately 0.54% of the frame period and an approximately 144-fold speedup.
As shown in Figure 9, the state-word scheme consistently required much lower C-field extraction time across all tested object scales. At N = 1000, the state-word scheme remained close to 0.001 ms per parsing pass, compared with approximately 0.12 ms for JSON. At N = 100,000, JSON deserialization and C-field lookup increased to approximately 13.05 ms, whereas fixed-offset state-word extraction remained at approximately 0.090 ms. This result indicates that fixed-offset state-word decoding can substantially reduce CPU-side C-field extraction cost during large-scale object-state filtering.
The performance difference comes from the representation mechanism. JSON requires text deserialization and dynamic property access before the target field can be retrieved. The state-word scheme stores runtime states as fixed-length Uint16 values and extracts the target field through bit-shift and bitmask operations, while preserving the same target-state identification result.
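The two access paths can be contrasted in a minimal sketch. The bit layout used here (C in bits 11 to 9, with B, D, and T at bits 15 to 14, 13 to 12, and 8 to 6) is a hypothetical assumption chosen only because it is consistent with the example state words quoted in Section 5.4; the layout defined by the framework may differ. The JSON key names are likewise illustrative.

```python
import json
from array import array

# Hypothetical field position, not taken from the paper's specification.
C_SHIFT = 9
C_MASK = 0x7

def count_c_json(records, target=2):
    """Control path: deserialize each text record, then look up the C field."""
    hits = 0
    for text in records:
        if json.loads(text)["C"] == target:
            hits += 1
    return hits

def count_c_state_word(words, target=2):
    """Experimental path: fixed-offset shift and mask on a 16-bit integer."""
    hits = 0
    for word in words:
        if ((word >> C_SHIFT) & C_MASK) == target:
            hits += 1
    return hits

def make_records(n):
    """Build the same n states in both encodings with a deterministic C pattern."""
    json_records = []
    words = array("H")  # unsigned 16-bit storage
    for i in range(n):
        c = i % 4
        json_records.append(json.dumps({"B": 2, "D": 1, "C": c, "T": 2, "S": 0}))
        words.append((2 << 14) | (1 << 12) | (c << C_SHIFT) | (2 << 6))
    return json_records, words

json_records, words = make_records(1000)
# Both paths must identify the same target states.
assert count_c_json(json_records) == count_c_state_word(words)
```

Comparing hit counts, as in the final assertion, mirrors the consistency check described in Section 5.3.1 and guards against a fast path that decodes the wrong bits.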
5.3.3. Encoded Representation Size and Allocation Tendency
The two schemes also differ in encoded representation size and allocation tendency. In the benchmark, each JSON state record occupied approximately 49 bytes as a UTF-8 encoded string, whereas each state-word record was stored as a 2-byte Uint16 value. JSON parsing creates temporary JavaScript objects before C-field access, which may increase allocation pressure during high-frequency switching. By contrast, state-word decoding directly extracts the target field from a primitive Uint16 value and reduces temporary object creation during parsing.
This comparison refers to encoded representation size and allocation tendency, not a full browser heap-profile measurement. Complete memory behavior in Web3D loading and rendering workflows is evaluated separately in Section 5.7. Together with the timing results, the benchmark indicates that the state-word mechanism is suitable for high-frequency C-field extraction and lightweight object-state filtering in large-scale interactive Web3D scenes.
5.4. Incremental Updating and State-Constraint Stability Testing
This section evaluates two runtime mechanisms of the proposed framework: XOR-based differential updating for requested-payload reduction and state-constraint checking for intercepting invalid instructions and mapping them to valid executable states.
5.4.1. XOR-Based Differential Incremental Switching
The incremental switching experiment was conducted on Dataset A to test whether the state-word mechanism could identify the changed dimension and request only the required incremental chunk. The tested case simulated a texture-appearance transition from baked texture mode (T2) to full PBR texture mode (T3). The initial state word was 0x9080, and the target state word was 0x90C0:
0x9080 ⊕ 0x90C0 = 0x0040
The resulting differential mask, 0x0040, indicated that the modification occurred only in the T dimension. Therefore, the runtime scheduler did not re-request the full Dataset A package, but requested only the corresponding T3-Chunk as the incremental texture-appearance payload.
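The differencing step can be sketched as follows. The field masks are hypothetical: they assume B in bits 15 to 14, D in bits 13 to 12, C in bits 11 to 9, and T in bits 8 to 6, a layout chosen only because it is consistent with the example words in this section. The chunk naming (`T3`) follows the text, while the helper names are illustrative.

```python
# Hypothetical field layout: name -> (shift, mask before shifting).
FIELDS = {"B": (14, 0x3), "D": (12, 0x3), "C": (9, 0x7), "T": (6, 0x7)}

def changed_dimensions(current, target):
    """XOR the two state words and report which fields contain changed bits."""
    diff = (current ^ target) & 0xFFFF
    changed = [name for name, (shift, mask) in FIELDS.items()
               if diff & (mask << shift)]
    return diff, changed

def chunks_to_request(current, target):
    """Name one incremental chunk per changed dimension, e.g. 'T3'."""
    _, changed = changed_dimensions(current, target)
    return [f"{name}{(target >> FIELDS[name][0]) & FIELDS[name][1]}"
            for name in changed]

diff, changed = changed_dimensions(0x9080, 0x90C0)
print(hex(diff), changed)                  # 0x40 ['T']
print(chunks_to_request(0x9080, 0x90C0))   # ['T3']
```

Because unchanged fields XOR to zero, the scheduler can skip every dimension whose mask does not intersect the differential word.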
As shown in Table 4, the baseline full-package reload required 8514.30 KB of requested payload. By contrast, XOR-based differential scheduling requested only the T3-Chunk, with a measured payload of 3569.23 KB. In this representative T-dimension switching case, the requested payload was reduced to 41.92% of the full-package baseline, corresponding to a 58.1% reduction in redundant transfer. This result indicates that the state-word mechanism can convert an identifiable dimensional state change into a selective chunk request.
5.4.2. State Constraint and Degradation Mechanisms
A controlled abnormal-state injection test was conducted to evaluate whether the runtime state machine could detect and intercept invalid state combinations. In this test, the instruction 0xD400 requested C2 components while only coarse geometry (D1) was resident. This violated the topological dependency rule defined in Section 3.3: when C > C0, the host geometry must satisfy D ≥ D2.
As shown in Figure 10, the runtime state machine identified the instruction as invalid and intercepted the component-loading request before execution. After constraint masking, the component level C was cleared to C0, and the exception flag R was set to 1, producing the valid degraded instruction 0xD001. As a result, the system did not attempt to attach fine components to geometry that was too coarse to support them.
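The masking step for this rule can be sketched as below. The field positions (D in bits 13 to 12, C in bits 11 to 9, R in bit 0) are assumptions chosen to be consistent with the words 0xD400 and 0xD001 quoted above, and the function name is hypothetical; only the single dependency rule discussed here is modeled.

```python
# Hypothetical field layout, consistent with the example words in the text.
D_SHIFT, D_MASK = 12, 0x3
C_SHIFT, C_MASK = 9, 0x7
R_FLAG = 0x0001

def enforce_component_constraint(word):
    """Apply the rule 'C > C0 requires D >= D2'; degrade the word if violated."""
    d_level = (word >> D_SHIFT) & D_MASK
    c_level = (word >> C_SHIFT) & C_MASK
    if c_level > 0 and d_level < 2:
        word &= ~(C_MASK << C_SHIFT) & 0xFFFF  # clear C back to C0
        word |= R_FLAG                         # raise the exception flag R
    return word

print(hex(enforce_component_constraint(0xD400)))  # 0xd001
```

A word that already satisfies the rule passes through unchanged, so the check can be applied to every incoming instruction without altering valid states.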
This test demonstrates that the constraint mechanism can prevent illegal topological state combinations, such as unsupported component attachment caused by invalid runtime instructions. The result should be interpreted as a validation of state-consistency control and safe degradation, rather than as a complete solution to perceptual transition artifacts. Visual discontinuities such as popping during rapid refinement are discussed as a limitation in Section 6.
5.5. Verification of the Spatial Proxy (Dimension B)
This section evaluates the computational behavior of the spatial proxy dimension in the local slice of Dataset B. The purpose is not to show that one B level is universally superior, but to verify whether different B levels support different spatial tasks. B1 is evaluated for macro-scale spatial screening, whereas B3 is evaluated through a B3-like CPU-side workload for local, precise geometric interaction.
5.5.1. Test Setup
Scenario A represents macro-scale tasks such as distance retrieval, spatial filtering, and heat-map-style analysis. The test state was set to B = 1, while the remaining dimensions were not requested in this proxy-only benchmark. The B1 point-proxy dataset was constructed from extracted building centroids and expanded to 10,000 point records within the spatial range of the local scene for stress testing. The resulting Points_B1.bin file occupied approximately 117 KB and contained 10,000 three-dimensional point proxies. On the frontend, Euclidean distance checks were performed over these point proxies.
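A macro-scale screening pass over point proxies can be sketched as below. The binary layout is an assumption: three little-endian 32-bit floats per point gives 120,000 bytes for 10,000 points, or about 117 KB, which agrees with the reported file size but is not confirmed by the text. The function names and the grid-shaped test data are illustrative.

```python
import struct

POINT_FORMAT = "<3f"  # assumed layout: x, y, z as little-endian float32

def pack_points(points):
    """Serialize (x, y, z) tuples into a flat binary buffer."""
    return b"".join(struct.pack(POINT_FORMAT, x, y, z) for x, y, z in points)

def count_within_radius(buffer, center, radius):
    """Count point proxies within a Euclidean radius of a query location."""
    cx, cy, cz = center
    radius_sq = radius * radius  # compare squared distances, no sqrt needed
    hits = 0
    for x, y, z in struct.iter_unpack(POINT_FORMAT, buffer):
        dx, dy, dz = x - cx, y - cy, z - cz
        if dx * dx + dy * dy + dz * dz <= radius_sq:
            hits += 1
    return hits

# Deterministic 100 x 100 grid standing in for 10,000 building centroids.
points = [(float(i % 100), float(i // 100), 0.0) for i in range(10_000)]
buffer = pack_points(points)
print(len(buffer) / 1024)                                 # 117.1875
print(count_within_radius(buffer, (0.0, 0.0, 0.0), 1.0))  # 3
```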
Scenario B represents the CPU-side geometric cost of local precise picking under a B3-like refined mesh proxy condition. The tested condition was treated as B = 3, but the timing did not traverse the full refined mesh. Instead, to isolate CPU-side geometric computation from network transmission and Draco decompression, a deterministic synthetic workload of 50,000 ray-triangle intersection tests was used to approximate the computational burden of B3-like local picking. The result was then used to estimate the overhead that would occur if this B3-like workload were incorrectly applied to all 10,000 objects in a macro-scale screening task.
To avoid timer-resolution artifacts, both benchmarks used repeated inner iterations and reported normalized per-pass time. The B1 benchmark used 5000 inner iterations per run, and the B3 benchmark used 300 inner iterations per run. Each scenario was repeated 20 times after 5 warm-up runs. No target-time calibration, random fallback, or random workload generation was used.
5.5.2. Execution Time Statistics
Table 5 summarizes the comparison between the two proxy modes. In Scenario A, the frontend loaded the 117 KB B1 point-proxy file and completed distance checks over 10,000 point proxies in approximately 0.017 ms per normalized processing pass, reported as ~0.02 ms in Table 5.
In Scenario B, the B3 micro-benchmark required approximately 0.37 ms for one deterministic 50,000-triangle ray-triangle intersection-test workload. If this B3-like workload were naively applied to all 10,000 objects in a macro-scale traversal task, the extrapolated execution time would be approximately 3.7 s. This would cause multi-second main-thread blocking and would therefore be unsuitable for real-time macro-scale screening.
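The text does not name the intersection routine used in the synthetic workload. As an illustration of what one ray-triangle test involves, the sketch below uses the widely known Möller-Trumbore formulation, which is an assumption rather than the benchmarked implementation, and then reproduces the extrapolation arithmetic from the reported 0.37 ms figure.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: True if the ray hits the triangle in front of origin."""
    edge1, edge2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, edge2)
    det = _dot(edge1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    t_vec = _sub(origin, v0)
    u = _dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(t_vec, edge1)
    v = _dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    return _dot(edge2, q) * inv_det > eps  # hit distance must be positive

# Extrapolation from the reported per-workload time to 10,000 objects.
B3_WORKLOAD_MS = 0.37
extrapolated_s = B3_WORKLOAD_MS * 10_000 / 1000.0  # about 3.7 s
```

For example, a ray cast straight down from (0.25, 0.25, 1) hits the triangle with vertices (0, 0, 0), (1, 0, 0), and (0, 1, 0), whereas the same ray started at (2, 2, 1) misses it.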
These results indicate the need for task-adaptive proxy selection rather than showing that B1 replaces B3. B1 point proxies are suitable for large-scale spatial retrieval, filtering, and heat-map-style analysis, where approximate object positions are sufficient. B3-level or B3-like mesh proxies remain necessary for local picking, collision checking, shadow analysis, occlusion analysis, and other precise geometric operations. Therefore, B-dimension grading can help avoid unnecessary mesh traversal in macro-scale tasks while preserving refined geometric proxies for local interaction.
5.6. Verification of Home Tile-Based Semantic Payload Reduction
This section evaluates whether the Home Tile strategy can reduce redundant semantic payloads for cross-tile virtual building proxy objects. The test focuses on metadata-level semantic-payload reduction, not bandwidth, end-to-end network latency, runtime semantic-routing latency, or interface response time.
5.6.1. Experimental Setup and Scene Construction
Because continuous oblique photogrammetric meshes often lack explicit object boundaries, a virtual monomerization step was used to construct a controlled cross-tile semantic-redundancy test. Based on the local slice of Dataset B, Voronoi-like nearest-seed partitioning generated 500 virtual building proxy objects within an 8 × 8 tile grid, resulting in 64 spatial tiles.
Two semantic organization schemes were compared. In S-Full, each involved tile stored a complete semantic record for each associated building proxy, including local attributes and linkable semantic fields. Each full semantic template contained 20 semantic fields, including object identifiers, usage, and height, and was padded to approximately 2.5 KB for controlled payload-size testing. In S-Home, each virtual building proxy was assigned a Home Tile according to the tile containing its proxy centroid, represented by the corresponding seed point. The complete semantic record was stored only in the Home Tile, whereas guest tiles stored a lightweight pointer record containing the object identifier, Home Tile identifier, and URI, with a controlled size of approximately 0.1 KB.
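The difference between the two schemes can be sketched with a small per-tile accounting routine. The record sizes are the nominal values stated above (2.5 KB and 0.1 KB); the object names, tile coordinates, and function name are hypothetical toy inputs and do not reproduce the measured totals.

```python
from collections import defaultdict

FULL_RECORD_KB = 2.5     # nominal size of a complete semantic record
POINTER_RECORD_KB = 0.1  # nominal size of a guest-tile pointer record

def per_tile_payloads(object_tiles, home_tile):
    """Return per-tile semantic payloads (KB) under S-Full and S-Home."""
    s_full = defaultdict(float)
    s_home = defaultdict(float)
    for obj, tiles in object_tiles.items():
        for tile in tiles:
            # S-Full: every involved tile stores the complete record.
            s_full[tile] += FULL_RECORD_KB
            # S-Home: only the Home Tile stores it; guests store a pointer.
            if tile == home_tile[obj]:
                s_home[tile] += FULL_RECORD_KB
            else:
                s_home[tile] += POINTER_RECORD_KB
    return dict(s_full), dict(s_home)

# Toy scene: three proxy objects spanning 2, 1, and 4 tiles.
object_tiles = {
    "b1": [(0, 0), (0, 1)],
    "b2": [(3, 3)],
    "b3": [(1, 1), (1, 2), (2, 1), (2, 2)],
}
home_tile = {"b1": (0, 0), "b2": (3, 3), "b3": (1, 1)}

s_full, s_home = per_tile_payloads(object_tiles, home_tile)
total_full = sum(s_full.values())   # 7 full records
total_home = sum(s_home.values())   # 3 full records + 4 pointer records
```

In this toy case the reduction grows with the number of tiles each object spans, since objects confined to a single tile contribute the same payload under both schemes.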
5.6.2. Analysis of Experimental Results
The experiment monitored semantic payloads across 64 spatial tiles. Table 6 reports five representative tiles selected according to a distribution-based criterion, covering the minimum, lower-quartile-nearest, median/overall-nearest, upper-quartile-nearest, and maximum reduction cases among the monitored tiles. Figure 11 visualizes the same representative tiles.
Across all monitored tiles, S-Full required approximately 2298.46 KB, whereas S-Home required 1285.50 KB. The Home Tile strategy therefore reduced the semantic payload by 1012.96 KB, corresponding to an overall reduction of 44.1%. This result indicates that storing complete attributes only in Home Tiles and replacing guest-tile attributes with lightweight pointer records can reduce semantic duplication in controlled cross-tile object organization.
Overall, the experiment indicates that the Home Tile strategy can reduce redundant semantic payloads in a controlled cross-tile object setting. The result should be interpreted as a metadata-level lightweighting effect rather than as a direct measurement of end-to-end network performance.
5.7. Initial-Screen Memory Testing in a Large-Scale Scenario
This section evaluates the first-screen process-level memory footprint of monolithic loading and state-word-driven D1 tile scheduling using Dataset B. The test focuses on the same initial spatial view and does not compare full-fidelity loading of all B-D-C-T-S layers. In the experimental group, only the D1-level tiles required by the initial view were loaded; detailed components, semantic layers, and high-resolution textures were excluded from the first-screen request.
5.7.1. Experimental Design and Measurement Method
The experiment was conducted in Google Chrome with browser memory diagnostics enabled through Chrome experimental flags. Chrome was restarted before testing, and each loading strategy was evaluated over ten complete loading cycles. The baseline memory usage of the test environment was approximately 28 MB.
Two strategies were compared. The control group directly loaded the single 1.12 GB GLB model of Dataset B. The experimental group used state-word-driven first-screen D1 tile scheduling and loaded 15 D1-level tiles, with a total physical payload of 22.7 MB.
The values in Table 7 denote mean observed process-level stable resident memory over ten loading cycles. They were manually recorded from the operating-system task manager and Chrome Task Manager, rather than from JavaScript heap APIs. Stable resident memory values include the frontend tab process and GPU-related browser process; JavaScript heap values were used only as auxiliary references.
5.7.2. Memory Results and Interpretation
Under monolithic loading, the 1.12 GB GLB file had to be decoded into runtime geometry, texture, and GPU buffer resources. After rendering reached a stable stage, the mean stable resident memory remained at approximately 6749.2 MB. After subtracting the 28 MB baseline, the net stable resident overhead was approximately 6721.2 MB, corresponding to a resident expansion ratio of approximately 5.86× relative to the 1.12 GB source file.
With state-word-driven first-screen D1 tile scheduling, the loaded first-screen physical payload was 22.7 MB. The mean stable resident memory was approximately 88.1 MB. After baseline subtraction, the net stable resident overhead was approximately 60.1 MB, corresponding to a resident expansion ratio of approximately 2.65× relative to the loaded first-screen payload.
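The net overhead and expansion ratios can be reproduced from the reported values. The sketch assumes that the 1.12 GB file is converted at 1 GB = 1024 MB (1146.88 MB), which is the conversion that yields the reported 5.86× figure; the function name is illustrative.

```python
BASELINE_MB = 28.0  # resident memory of the empty test environment

def expansion_ratio(resident_mb, payload_mb, baseline_mb=BASELINE_MB):
    """Net resident overhead divided by the loaded physical payload."""
    return (resident_mb - baseline_mb) / payload_mb

# Monolithic loading: 1.12 GB file, assumed to equal 1.12 * 1024 MB.
monolithic = expansion_ratio(6749.2, 1.12 * 1024)
# State-word-driven D1 tile scheduling: 22.7 MB first-screen payload.
d1_tiles = expansion_ratio(88.1, 22.7)

print(round(monolithic, 2))  # 5.86
print(round(d1_tiles, 2))    # 2.65
```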
These results provide evidence for first-screen memory control enabled by decoupled tile organization and state-word-driven scheduling. They do not show that the complete full-fidelity dataset can be represented with the same resident memory. The reduction mainly results from loading only visible D1-level tiles and excluding nonessential components, semantic layers, and high-resolution textures from the initial request. A conventional tiled loading strategy may also reduce first-screen payload when the same coarse D1 tiles are selected. The contribution of the proposed framework lies in using compact state words to represent selected B-D-C-T-S states and to support deterministic state switching, constraint checking, and incremental updating across dimensions. Therefore, the proposed mechanism complements tiled data organization rather than replacing it.
5.8. Server-Side Offline Preprocessing and State-Word Compilation Overhead
Although the B-D-C-T-S framework is designed to improve frontend runtime scheduling, it requires offline decoupling and state-word compilation before Web delivery. This section evaluates this offline cost using Dataset B.
The benchmark used the 120 MB uncompressed (non-Draco) local-slice source mesh of Dataset B, containing 1,367,441 vertices and 1,694,654 faces. The preprocessing script generated 500 candidate partition seeds and produced 307 valid non-empty sub-meshes. Full-volume preprocessing time for the 1.12 GB baseline model was extrapolated using a scaling factor of 9.3333, derived from the reported file-size ratio between the 1.12 GB full-volume model, treated as 1120 MB for preprocessing extrapolation, and the 120 MB local slice. Therefore, the full-volume values in Table 8 are extrapolated estimates, not direct measurements on the 1.12 GB dataset.
The benchmark was conducted on a workstation with an Intel Core i7-12700KF CPU and 96 GB of RAM. The experiment was repeated five times using a fixed random seed of 42 to keep the geometric workload consistent. Initial model-loading time was recorded separately and excluded from the phase-level preprocessing total.
As shown in Table 8, spatial partitioning and virtual monomerization dominated the preprocessing cost, requiring approximately 2.12 ± 0.02 min locally and yielding a full-volume extrapolated estimate of approximately 19.75 ± 0.20 min. Geometric dimensionality reduction, approximate texture-appearance reassembly, and state-word compilation were comparatively lightweight. The total local-slice preprocessing time was approximately 2.18 ± 0.02 min, and the full-volume extrapolated estimate was approximately 20.35 ± 0.21 min, or 0.34 h.
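The linear file-size extrapolation for the total preprocessing time can be written out as below. The sketch uses the rounded local-slice total (2.18 min), so phase-level values recomputed from rounded inputs may differ slightly from those in Table 8; the function name is illustrative.

```python
FULL_VOLUME_MB = 1120.0  # 1.12 GB treated as 1120 MB for extrapolation
LOCAL_SLICE_MB = 120.0

def extrapolate_minutes(local_minutes):
    """Scale a local-slice time linearly by the file-size ratio."""
    return local_minutes * FULL_VOLUME_MB / LOCAL_SLICE_MB

scale = FULL_VOLUME_MB / LOCAL_SLICE_MB
total_min = extrapolate_minutes(2.18)

print(round(scale, 4))           # 9.3333
print(round(total_min, 2))       # 20.35
print(round(total_min / 60, 2))  # 0.34
```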
This estimate should not be interpreted as a general scaling law. In larger industrial datasets, spatial partitioning, KD-tree construction, grid cutting, sub-mesh separation, virtual monomerization, and texture-related processing may grow non-linearly because of topology complexity, memory locality, and I/O contention. GB- to TB-scale deployment may require parallel or distributed preprocessing support. In many CIM and large-scale Web3D applications, this cost can often be treated mainly as an offline cost because base urban mesh data are usually updated less frequently than frontend interaction states.
5.9. Discussion
The evaluations show that the B-D-C-T-S scheduling mechanism reduced first-screen requested payload in the tested staged-loading case, improved CPU-side C-field extraction, reduced redundant semantic payloads, supported incremental updating, and supported first-screen memory control under the tested conditions. These results should be interpreted as mechanism-level evidence rather than as a replacement for existing Web3D streaming standards.
Existing tiled streaming mechanisms, including OGC 3D Tiles 1.1, implicit tiling, and glTF-based workflows, provide mature support for spatial hierarchy construction, tile availability representation, view-dependent tile selection, and large-scale scene streaming. The proposed 16-bit state word does not replace these mechanisms. Instead, it provides a compact object-level state-control layer after tiled spatial selection. In a practical pipeline, the tiled hierarchy first identifies visible or relevant spatial regions, and the B-D-C-T-S layer then determines which object-level dimensions should be activated, reused, skipped, incrementally updated, or degraded.
This layered interpretation is important for understanding the reported results. The parsing speedup in Section 5.3 should be interpreted as an improvement in C-field extraction and lightweight object-state filtering, not as a full benchmark comparison against 3D Tiles, implicit tiling, or glTF streaming. Similarly, the first-screen memory reduction in Section 5.7 mainly results from loading only the D1-level tiles required by the tested initial view. Therefore, the contribution of the proposed framework lies not in replacing tiled loading but in adding compact, constraint-aware, and multidimensional state control after tiled spatial selection.
At the same time, dimensional decoupling may introduce request-management risks. Separating geometry, components, texture-appearance resources, and semantic records into independently addressable units can increase fine-grained requests when many objects or dimensions are activated simultaneously. Under high-latency or request-intensive conditions, excessive request fragmentation may offset part of the benefit gained from reduced payload size. Request coordination, cache reuse, visible-object prioritization, request merging, HTTP/2 multiplexing, and batching can mitigate these effects, but they cannot eliminate latency or request-management overhead.
The proposed method also involves an offline-online trade-off. Offline preprocessing is required before Web delivery, and the full-volume preprocessing values reported in Section 5.8 were extrapolated rather than directly measured on the complete 1.12 GB dataset. Larger industrial datasets may require parallel or distributed preprocessing support. In addition, constraint-aware degradation can prevent invalid state combinations, but it does not fully eliminate perceptual transition artifacts such as visual popping during rapid refinement.
Overall, the proposed framework should be understood as a compact and constraint-aware object-level state-control layer within tiled Web3D pipelines. Its value lies in supporting selective resource activation and scheduling, Home Tile semantic routing, incremental updating, and first-screen memory control while remaining compatible with existing spatial streaming mechanisms.
6. Summary and Future Work
6.1. Summary
This study addressed large requested payloads, runtime state-parsing overhead, semantic redundancy across tiles, and client-side memory pressure during first-screen loading. It proposed a B-D-C-T-S five-dimensional orthogonal decoupling framework and a 16-bit state-word-driven scheduling mechanism. The framework organizes Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information as separately addressable dimensions, while the state word supports compact representation, fixed-offset bitwise decoding, XOR-based differencing, and constraint-aware degradation.
The controlled experiments verified the core mechanisms. In Dataset A, staged decoupled loading reduced the first-screen requested payload by 93.1% under the tested initial-view setting. For N = 100,000 object-state records, state-word-based C-field extraction achieved an approximately 144-fold speedup over JSON deserialization and C-field lookup. XOR-based differential scheduling requested only the T3-related chunk during a texture-appearance state update. In Dataset B, the Home Tile strategy reduced the total semantic payload by 44.1% across 64 monitored tiles. In the 1.12 GB first-screen memory test, state-word-driven D1 tile scheduling loaded only 22.7 MB of physical payload, with stable resident memory of approximately 88.1 MB and net stable overhead of approximately 60.1 MB. These memory results represent first-screen D1 tile scheduling rather than full-fidelity loading. The preprocessing result was based on a 120 MB local-slice measurement and full-volume extrapolation, yielding an estimated 20.35 ± 0.21 min, or 0.34 h, for the 1.12 GB full-volume model. This estimate should not be interpreted as a general scaling law.
Overall, the proposed method supports object-level state representation, selective resource activation and scheduling, semantic-payload lightweighting, incremental updating, and first-screen memory control. These results support its role as a compact state-control layer within tiled Web3D pipelines.
6.2. Key Innovations
This study introduces three main methodological innovations. First, the B-D-C-T-S framework defines state-level grading rules for Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information. Orthogonality is treated as organizational separability rather than unconstrained executability, because valid states must satisfy spatial-carrier, topological-support, and rendering-logic constraints.
Second, the state-word-based scheduling mechanism represents runtime states with compact 2-byte bit-field instructions. Fixed-offset bitmask operations and XOR-based differencing support fast field extraction, changed-dimension identification, incremental scheduling, and constraint-aware degradation.
Third, the Home Tile strategy reduces semantic redundancy for cross-tile objects through centroid-assigned complete records and lightweight guest-tile pointer records, supporting low-redundancy semantic routing after tiled spatial selection.
6.3. Research Limitations
Several limitations remain. First, the validation focuses on controlled prototype experiments rather than full end-to-end deployment in heterogeneous WebGIS environments. Although the framework complements 3D Tiles- or glTF-based pipelines, this study did not implement a production-level 3D Tiles baseline or a large-scale Cesium-based comparison. The comparison with tiled streaming standards is therefore architectural rather than benchmark-equivalent.
Second, high-latency and low-bandwidth network conditions were not fully evaluated through end-to-end experiments. Fine-grained decoupling can reduce redundant requested payloads, but it may increase discrete chunk requests during rapid navigation. Request coordination, cache reuse, visible-object prioritization, request merging, HTTP/2 multiplexing, and batching can mitigate request fragmentation, but they cannot eliminate latency or request-management overhead. Their effectiveness depends on server configuration, round-trip time, chunk granularity, cache hit rate, and user behavior.
Third, the preprocessing cost remains a limitation. The full-volume preprocessing time is an extrapolated estimate and should not be interpreted as a general scaling law. In larger industrial datasets, spatial partitioning, KD-tree construction, grid cutting, sub-mesh separation, virtual monomerization, and texture-related processing may grow non-linearly because of topology complexity, memory locality, and I/O contention. The current serial pipeline may require parallel or distributed support for GB- to TB-scale deployment.
Fourth, visual continuity and cross-platform validation remain insufficient. Constraint-aware degradation can prevent invalid logical states and unsupported component attachment, but it does not eliminate perceptual discontinuities when D1 proxies are replaced by D3 geometry or high-resolution T3 textures. Mobile devices and low-resource clients also impose stricter limits on memory, GPU resources, thermal control, and browser resource management.
6.4. Future Work
Future work should focus on four directions. First, the state-control layer should be integrated into practical 3D Tiles-, glTF-, or Cesium-based pipelines and compared with conventional tiled streaming under consistent scene coverage, camera paths, cache policies, and device conditions.
Second, network-adaptive scheduling should use runtime indicators such as round-trip time, bandwidth fluctuation, request queue length, cache hit rate, and main-thread workload.
Third, preprocessing should move toward scalable parallel and distributed workflows, including multi-process tiling, cluster-based spatial partitioning, GPU-assisted preprocessing, and Spark/MPI-style processing.
Fourth, graphics-pipeline-level transition mechanisms should improve visual continuity through asynchronous alpha blending, progressive texture fading, temporal smoothing, geomorphing, or WebGL transition shaders. Broader evaluations should also include mobile devices, low-memory clients, different browsers, and real WebGIS network environments.