A 5D Orthogonal Decoupling Framework and 16-Bit State-Word-Driven Scheduling Method for 3D Building Models in WebGIS

Zhang, Tong; Shi, Yunfei; Jiang, Wenjie; Lyu, Chunguang; Shi, Shuangshuang

doi:10.3390/ijgi15050215

by Tong Zhang, Yunfei Shi *, Wenjie Jiang, Chunguang Lyu and Shuangshuang Shi

School of Resources and Environment, Linyi University, Linyi 276000, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(5), 215; https://doi.org/10.3390/ijgi15050215

Submission received: 15 March 2026 / Revised: 8 May 2026 / Accepted: 18 May 2026 / Published: 19 May 2026


Abstract

Large-scale WebGIS visualization of 3D building models is often constrained by large requested payloads, client-side memory pressure, and runtime state-parsing overhead. This study proposes a five-dimensional orthogonal decoupling framework and a 16-bit state-word-driven scheduling method for 3D building models. The Boundary-based Spatial Proxy–Geometric Detail–Component Complexity–Texture Appearance–Semantic Information (B-D-C-T-S) framework organizes model representations into five separately addressable and schedulable dimensions, covering spatial proxies, geometry, components, textures, and semantics. A compact 16-bit structured state word is used to represent runtime states and reduce dependence on repeated text-based state parsing, supporting fixed-offset bitwise decoding, exclusive-OR (XOR)-based differencing, constraint checking, and incremental updating. A centroid-assigned Home Tile strategy is further introduced to reduce redundant semantic payloads for cross-tile objects. The method was evaluated using a single-building BIM model and an urban-scale photogrammetric mesh dataset. Under the tested initial-view setting, staged decoupled loading reduced the first-screen requested payload by 93.1% compared with monolithic loading. State-word-based C-field extraction achieved an approximately 144-fold speedup over JSON deserialization and C-field lookup. The Home Tile strategy reduced the total semantic payload by 44.1% in the semantic-redundancy test. In the 1.12 GB first-screen memory test, state-word-driven D1 tile scheduling loaded only 22.7 MB of physical payload, with stable resident memory of approximately 88.1 MB. These results indicate that the proposed method supports object-level state representation, selective resource activation and scheduling, Home Tile semantic routing, incremental updating, and first-screen memory control within tiled Web3D pipelines.

Keywords: Web3D; city information modeling; multidimensional decoupling; state-word scheduling; bitwise encoding; Home Tile semantic routing; first-screen memory control; 3D building models

1. Introduction

The rapid development of Digital Twin Cities and City Information Modeling (CIM) has made the organization, integration, processing, and interactive visualization of massive urban spatio-temporal and 3D scene data a key issue for urban digital infrastructure [1,2,3]. In Web-based 3D GIS and BIM-GIS visualization, large-scale 3D geospatial and Building Information Modeling (BIM) data impose constraints on transmission, browser-side memory, computing resources, and rendering performance [4,5]. Efficient visualization therefore depends on data organization and scheduling, including LOD-based model selection, index-based entity loading, and indoor/outdoor scene scheduling [5]. However, conventional LOD specifications usually represent 3D city models through predefined geometry-oriented levels, which may limit the expression of application-specific requirements involving semantics, textures, and task-dependent representation choices [6]. Complete preloading may download assets outside the current view and may exceed graphics memory unless less important assets are unloaded in time; thus, full preloading is inefficient for city-scale Web3D applications under limited bandwidth or graphics-memory constraints [7]. These Web-side challenges become more evident for structurally complex BIM models with many subassemblies, which increase data volume and complicate Web-based organization, scheduling, and rendering [5,8].

CityGML established a standardized semantic 3D city model with predefined Levels of Detail (LODs) [9], while subsequent studies and standards extended this concept toward geometry–semantics decoupling, multi-representation modeling, and indoor–outdoor or application-driven LOD specifications [10,11,12,13,14,15]. These representation-oriented works support fine-grained description of complex 3D city models, but focus mainly on model specification rather than runtime state control. They do not explicitly address runtime encoding of representation dimensions, cross-tile semantic routing, or frequent state scheduling. Meanwhile, massive 3D content is commonly delivered through hierarchical standards such as 3D Tiles, which support streaming, HLOD tile structures, implicit-tiling coordinate addressing, subtree availability, and metadata association [16]. However, if runtime state descriptors are stored in text-based semi-structured formats such as JSON, frequent state queries may incur parsing and field-access overhead [17]. These gaps motivate an object-level runtime state-control layer that coordinates representation dimensions, semantic payloads, and scheduling decisions after tiled spatial selection.

Accordingly, this study addresses two research questions:

RQ1: In city-scale Web3D scenes, how can a multidimensional discrete state framework support separately addressable scheduling of spatial, geometric, component, texture, and semantic information while reducing repeated text-based runtime state parsing through compact encoding under frequent state switching?

RQ2: For continuous reality-based mesh scenes and cross-tile building objects, how should multidimensional geometry, component, texture, and semantic information be organized and routed to support fine-grained on-demand scheduling, reduce redundant loading and memory pressure, mitigate Out of Memory (OOM) risk, and maintain topological consistency and semantic continuity where applicable?

To address these questions, this study proposes a multidimensional state-control architecture linking model representation, object-level data organization, and dynamic scheduling. Based on multidimensional decoupling, the architecture encodes object states into compact bit-field representations and reduces repeated text-based runtime state parsing, aiming to support large-scale state indexing, runtime switching, and incremental updating. Specifically, this study constructs a B-D-C-T-S five-dimensional orthogonal decoupling state model for Web3D building models, abstracting Boundary-based Spatial Proxy (B), Geometric Detail (D), Component Complexity (C), Texture Appearance (T), and Semantic Information (S) into separately addressable and schedulable state dimensions. A 16-bit structured state word is then designed as a compact runtime scheduling descriptor to support fixed-offset state parsing, selective resource activation, and Home Tile semantic routing after tiled spatial selection. By integrating XOR-based differential updating with constraint-aware degradation, the architecture supports incremental state updating, invalid-state suppression, and memory-risk control.

This study does not replace tiled delivery standards such as 3D Tiles or glTF-based Web3D pipelines. Instead, it introduces a lightweight object-level state-control layer integrated with tiled spatial indexing. While current tiled streaming frameworks mainly address spatial hierarchy, visibility-driven loading, and file organization, the proposed B-D-C-T-S framework focuses on multidimensional runtime state representation, fixed-offset state parsing, selective resource activation, and incremental state updating.

The main contributions are as follows:

A B-D-C-T-S state framework organizes Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information as separately addressable scheduling dimensions.
A 16-bit structured state word supports compact state representation, fixed-offset bitwise decoding, XOR-based differencing, and constraint-aware runtime degradation.
A Home Tile semantic routing strategy is introduced to reduce redundant semantic payloads for cross-tile objects by storing complete semantic records in Home Tiles and lightweight pointer records in guest tiles.

The remainder of this paper is organized as follows. Section 2 reviews multidimensional LOD decoupling, tiled Web3D streaming, runtime state-control gaps, and cross-tile semantic organization. Section 3 introduces the B-D-C-T-S five-dimensional decoupling model and data-organization methods. Section 4 describes how the five-dimensional states are encoded into a 16-bit state word and how this encoding supports XOR-based incremental updating and constraint-aware degradation. Section 5 evaluates the proposed mechanisms using a single-building BIM dataset and an urban-scale photogrammetric mesh dataset, covering local-slice measurements, first-screen scheduling tests, and full-volume extrapolated estimates. Section 6 summarizes the findings, discusses the limitations of the current prototype, and outlines future work.

2. Related Work

This section reviews related studies from four perspectives: the evolution of 3D city model representation from fixed LOD to multidimensional decoupling, tiled Web3D delivery mechanisms, runtime state-control gaps in multidimensional scheduling, and cross-tile semantic organization with dynamic representation management.

2.1. 3D City Model Representation from Fixed LOD to Multidimensional Decoupling

Level of Detail (LOD) is a core mechanism for controlling the complexity of 3D city models. In earlier CityGML versions, model representations were organized through five discrete LODs [9]. This fixed LOD structure, however, has clear limitations. Because geometry and semantics are usually bundled within predefined levels, it is difficult to adjust one representational aspect without affecting the others. This leads to relatively coarse control over model content and weak adaptability when different applications require different combinations of geometric, semantic, and contextual information [10,11,12,14,15]. These concerns have encouraged researchers to move beyond a single fixed LOD hierarchy and to explore more decoupled forms of model representation.

Several studies have responded to this problem by extending the original LOD concept. These include enhanced LOD definitions, multi-representation models, indoor–outdoor LOD specifications, and application-driven LOD paradigms [10,11,14,15]. Another related approach separates geometric detail from semantic detail and further distinguishes interior and exterior characteristics [12]. CityGML 3.0 also reflects this shift at the standard level by placing geometric representation in the Core module, so that thematic modules can inherit spatial representations instead of defining their own geometry independently [13]. These studies and standards provide an important basis for the multidimensional representation of complex 3D city models.

Nevertheless, most LOD-related work focuses on defining representational dimensions rather than organizing them as compact, computable, and schedulable runtime states in Web environments. Existing Web3D studies have explored tile-based multi-representation personalization, rule-based scene graph generation, and selective CityGML loading under mobile resource constraints [18,19]. However, they do not explicitly provide a compact state-level mechanism for object-level runtime organization, cross-scale scheduling decisions, or incremental switching of multidimensional representations. Thus, the translation from multidimensional representation to Web3D runtime state organization remains insufficiently addressed.

2.2. Web3D Streaming Standards and Tiled Delivery Mechanisms

Large-scale 3D spatial data in Web environments are commonly delivered through tiled hierarchical structures, as reflected in OGC 3D Tiles. 3D Tiles defines hierarchical spatial data structures and tile formats for streaming massive heterogeneous 3D geospatial content, supporting hierarchical spatial organization, HLOD/SSE-based refinement, and metadata organization [16]. It has also been applied to efficient Web-based visualization of complex BIM models [8]. In such pipelines, glTF serves as an API-neutral runtime asset format for meshes, materials, textures, and binary buffers [20], while Draco compresses 3D meshes and point clouds to improve storage and transmission efficiency [21].

At the delivery level, 3D Tiles has become a mature basis for organizing large-scale Web3D content. Its 1.1 specification uses Implicit Tiling to encode quadtree and octree structures compactly, enabling tile-coordinate addressing and subtree-level availability management [16]. The earlier 1.0 specification introduced Feature Tables and Batch Tables for feature-level property organization; Batch Tables, in particular, associate application-specific attributes with individual features and support declarative styling or other application operations [22,23]. In practical rendering workflows, these structures are combined with tileset JSON or index files, bounding volumes, geometric errors, refinement rules, and tile payloads to guide view-dependent loading [22,24]. SRC addresses a related but different problem by emphasizing progressive transmission of meshes and textures [25]. For BIM-oriented Web3D scenes, semantic lightweighting, scene indexing, and real-time scene management have also been used to reduce rendering and management pressure when complex building models are visualized on the Web [26].

These standards and methods mainly address spatial indexing, tile selection, geometry transmission, metadata association, and resource loading. However, they remain primarily delivery- and visualization-oriented, and do not explicitly define compact object-level state-control mechanisms for independent switching of representation dimensions.

2.3. Runtime State-Control Gap in Multidimensional Web3D Scheduling

Although tiled streaming provides a robust spatial delivery foundation, object-level state control in multidimensional Web3D scenes remains less explicitly addressed. Representative tiled-scheduling and Web3D lightweighting pipelines mainly rely on camera/view-frustum visibility, geometric error or SSE, tile payload selection, and progressive loading or unloading [24,26], while progressive transmission formats such as SRC focus on mesh and texture delivery [25]. These mechanisms are effective for general visualization, but they do not explicitly coordinate geometry, components, textures, and semantics as compact object-level runtime states.

Runtime state representation is another challenge. If object attributes or state descriptors are encoded as text-based semi-structured metadata such as JSON, frequent switching across many objects may repeatedly trigger parsing, validation, or field-access operations, introducing overheads similar to those reported for JSON processing systems [17,27]. HTTP/2 multiplexing can improve network resource utilization and reduce latency through concurrent exchanges over a single connection [28], but it does not define compact object-level state representations or decide which multidimensional resources should be activated, reused, or skipped.
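As a minimal illustration of the two access paths contrasted above, the following Python sketch reads the same C-level once from a JSON descriptor and once from a packed integer. The JSON key names, the field position (`C_SHIFT`), and the helper names are assumptions made for this example only; they are not the layout or code used in this study, and the sketch makes no claim about relative timing.

```python
import json

# Hypothetical position of the C field inside a 16-bit word (assumption).
C_SHIFT = 4
C_MASK = 0b11  # two bits are enough for the four levels C0-C3


def c_from_json(text: str) -> int:
    """Text-based path: deserialize the whole descriptor, then look up C."""
    return json.loads(text)["C"]


def c_from_word(word: int) -> int:
    """Bit-field path: one shift and one mask at a fixed offset."""
    return (word >> C_SHIFT) & C_MASK


# The same illustrative state, expressed in both forms.
descriptor = json.dumps({"B": 2, "D": 3, "C": 2, "T": 3, "S": 1})
word = (2 << 8) | (3 << 6) | (2 << 4) | (3 << 2) | 1

print(c_from_json(descriptor), c_from_word(word))
```

The JSON path must parse every key before one field can be read, whereas the bit-field path touches only the bits it needs.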

Multidimensional state transitions also involve cross-resource dependencies. Component loading may depend on the currently selected geometric representation, texture activation may require a corresponding spatial carrier, and semantic retrieval may require routing support when object information is not stored with the currently rendered tile. Existing tiled mechanisms can stream and index massive 3D content efficiently, but they do not provide a compact runtime instruction format for validating, comparing, and incrementally updating multidimensional representation states. This gap motivates the proposed state-word mechanism, which encodes the B-D-C-T-S state, namely Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information, as a fixed-length 16-bit word to support fixed-offset bitwise parsing, constraint validation, XOR-based differencing, and selective resource scheduling after tiled spatial selection.
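XOR-based differencing between two fixed-length state words can be sketched as follows. The actual field layout is specified in Section 4; the offsets used here (two bits per dimension in the low ten bits, with the remaining six bits left unused) and the function names `pack` and `changed_dimensions` are placeholders assumed for illustration.

```python
from typing import Dict, List

# Hypothetical layout: bit offset of each 2-bit field (assumption).
OFFSETS: Dict[str, int] = {"B": 8, "D": 6, "C": 4, "T": 2, "S": 0}
FIELD_MASK = 0b11


def pack(levels: Dict[str, int]) -> int:
    """Pack five level indices (0-3) into one 16-bit integer."""
    word = 0
    for name, offset in OFFSETS.items():
        level = levels[name]
        if not 0 <= level <= FIELD_MASK:
            raise ValueError(f"{name} level out of range: {level}")
        word |= level << offset
    return word & 0xFFFF


def changed_dimensions(old: int, new: int) -> List[str]:
    """XOR the two words; any non-zero field marks a dimension to update."""
    delta = (old ^ new) & 0xFFFF
    return [name for name, offset in OFFSETS.items()
            if (delta >> offset) & FIELD_MASK]


old_word = pack({"B": 2, "D": 1, "C": 0, "T": 1, "S": 3})
new_word = pack({"B": 2, "D": 3, "C": 2, "T": 3, "S": 1})
print(changed_dimensions(old_word, new_word))  # ['D', 'C', 'T', 'S']
```

In this example the B field is identical in both words, so a scheduler following this pattern would leave the spatial proxy untouched and request only the resources of the four changed dimensions.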

2.4. Cross-Tile Semantic Organization and Dynamic Representation Management in Large-Scale Web3D Scenes

Cross-tile semantic organization remains important because spatial partitioning and hierarchical tiling can reorganize geometry and object-level attributes. For example, IFC-to-3D Tiles conversion can decompose IFC models, convert geometry, generate b3dm content, and incorporate component attributes through JSON records and batch-table structures [29]. Spatial partitioning may split city objects across tile boundaries and compromise building integrity [18]. In BIM-to-3D Tiles workflows, preserving rich semantic attributes can also substantially increase intermediate JSON records and final tile payloads, which may aggravate transmission, memory, and rendering burdens in Web-based visualization [30]. CityGML-derived mobile visualization faces related constraints because large semantic 3D city models must be selectively loaded under limited storage, memory, processing, and network conditions [19]. Although prior studies address Web-based BIM visualization, semantics-guided lightweighting, and IFC-to-3D Tiles conversion [26,29,30], they do not explicitly provide low-redundancy semantic organization with cross-tile routing.

Dynamic Web3D applications, including digital-twin use cases, also require incremental updates across changing object representations. Non-incremental or one-time loading may transmit and process unnecessary model, animation, or LOD data, increasing network payload, loading latency, and client-side computation [31,32]. Geometry compression can reduce mesh or point-cloud payload size [21]; however, it does not decide state transitions among geometry, component, texture, or semantic resources. Large-scale Web3D visualization therefore needs to balance loading latency, decompression cost, rendering performance, multi-level LOD scheduling, and spatial indexing to maintain interactive frame rates [33,34].

Together, existing work advances Web3D representation, tiled streaming, metadata association, compression, and large-scale visualization, but lacks an integrated mechanism for compact multidimensional state expression, low-redundancy cross-tile semantic routing, and runtime consistency control under frequent state switching. This study therefore proposes a B-D-C-T-S state-control architecture that complements tiled Web3D delivery through object-level state encoding, selective resource scheduling, Home Tile semantic routing, and XOR-based incremental updating.
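The Home Tile idea can be illustrated with a short Python sketch: the tile containing an object's centroid keeps the complete semantic record, and every other tile touched by the object keeps only a pointer to that tile. The regular square grid, the record fields (`id`, `attributes`, `home`), and the function names are assumptions for this example and do not reproduce the tile scheme or record format of the prototype.

```python
from typing import Any, Dict, Iterable, Tuple

Tile = Tuple[int, int]


def tile_of(x: float, y: float, tile_size: float) -> Tile:
    """Index of the grid tile containing (x, y) on a regular square grid."""
    return (int(x // tile_size), int(y // tile_size))


def route_semantics(
    object_id: str,
    centroid: Tuple[float, float],
    touched_tiles: Iterable[Tile],
    attributes: Dict[str, Any],
    tile_size: float,
) -> Dict[Tile, Dict[str, Any]]:
    """Full record in the Home Tile, lightweight pointer records elsewhere."""
    home = tile_of(centroid[0], centroid[1], tile_size)
    records: Dict[Tile, Dict[str, Any]] = {}
    # The home tile is always included, even if the centroid of a concave
    # footprint falls in a tile the geometry itself does not touch.
    for tile in set(touched_tiles) | {home}:
        if tile == home:
            records[tile] = {"id": object_id, "attributes": attributes}
        else:
            records[tile] = {"id": object_id, "home": home}
    return records


records = route_semantics(
    "building-001",
    centroid=(130.0, 40.0),
    touched_tiles=[(0, 0), (1, 0)],
    attributes={"category": "office", "floors": 12},
    tile_size=100.0,
)
print(records[(0, 0)])  # pointer record stored in the guest tile
```

Under this scheme the attribute set is stored once per object regardless of how many tiles the object crosses; guest tiles carry only the identifier and the address of the Home Tile.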

3. A Five-Dimensional Orthogonal Decoupling State Framework for Building Models

3.1. Formal Definition of the Framework

To support on-demand scheduling and fine-grained organization of 3D building models in web environments, this study defines the B-D-C-T-S framework as a five-dimensional orthogonal decoupling state model. The five dimensions are Boundary-based Spatial Proxy (B), Geometric Detail (D), Component Complexity (C), Texture Appearance (T), and Semantic Information (S). Here, B denotes boundary-related spatial proxies, such as points, bounding boxes, and mesh proxies, rather than a conventional geometric LOD level. By combining these dimensions, building models can be reorganized into separately addressable data units for flexible transmission and runtime scheduling.

The candidate state space of the framework can be expressed as:

$$F = B \times D \times C \times T \times S, \tag{1}$$

where each dimension is discretized into multiple levels according to its specific role in representation and scheduling. A model state can therefore be represented as an ordered five-tuple:

$$x = \langle b_{i}, d_{j}, c_{k}, t_{m}, s_{n} \rangle, \tag{2}$$

where $b_{i} \in B$, $d_{j} \in D$, $c_{k} \in C$, $t_{m} \in T$, and $s_{n} \in S$.

Figure 1 illustrates the relative configuration of two example states across the five discrete dimensions; the radial values indicate graded state levels rather than directly comparable continuous quantities. An analytical state such as ⟨2,1,0,1,3⟩ is oriented toward macro-scale computation and semantic analysis, whereas a display state such as ⟨2,3,2,3,1⟩ places greater weight on geometric detail and visual fidelity. The contrast between the two shows that the five dimensions can be configured separately rather than following a single linear progression. Thus, the framework supports analytical and presentation-oriented tasks by composing different dimension levels, rather than by enforcing a single universal fidelity scale.
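For readers who prefer a computational view, the candidate state space and the two example states can be written down directly. The sketch below assumes the four graded levels (0 to 3) per dimension defined in Section 3.2; the class name `State` and its field names are illustrative choices, not identifiers from the prototype.

```python
from itertools import product
from typing import NamedTuple


class State(NamedTuple):
    """Ordered five-tuple <b, d, c, t, s> of discrete level indices."""
    b: int
    d: int
    c: int
    t: int
    s: int


LEVELS = range(4)  # levels 0-3 in every dimension, as graded in Section 3.2

# Candidate state space F = B x D x C x T x S, before validity constraints.
candidate_space = [State(*combo) for combo in product(LEVELS, repeat=5)]

analytical = State(2, 1, 0, 1, 3)  # analysis-oriented example state
display = State(2, 3, 2, 3, 1)     # presentation-oriented example state

print(len(candidate_space))  # 1024 candidate states (4 ** 5)
```

With four levels per dimension the Cartesian product contains 4^5 = 1024 candidate states; the constraints introduced in Section 3.3 then restrict this set to the admissible states.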

The meanings of the five dimensions are defined as follows.

B (Boundary-based Spatial Proxy): defines the proxy form through which an object participates in spatial indexing and geometric computation. This dimension determines how the object enters spatial queries, such as view-frustum culling, and coarse geometric operations, such as bounding-box collision detection. It serves as the spatial entry point for runtime scheduling.

D (Geometric Detail): describes the degree to which the host geometry approximates the reference geometry. This dimension controls the complexity of the primary mesh and directly affects visual accuracy as well as the feasibility of geometry-dependent computation.

C (Component Complexity): describes the retention level of attached or embedded components, such as doors, windows, railings, and mechanical equipment. This dimension is separated from the host geometry at the data-organization level and allows components to be loaded or removed selectively for different tasks. However, embedded elements such as doors and windows remain subject to the topological dependency constraints of the host geometry.

T (Texture Appearance): describes the material and texture representation of surfaces. This dimension regulates loading from a single color to multi-channel physically based rendering (PBR) materials, mainly affecting graphics memory usage and requested payload.

S (Semantic Information): describes the depth of association between non-geometric attributes and entities. This dimension controls the loading of information from unique identifiers to complete attribute sets and further to external knowledge bases or business databases, thereby supporting query and analysis.

In this study, orthogonality means that the B, D, C, T, and S dimensions are separable in data organization, state encoding, transmission organization, and request addressing. Each dimension can be configured and stored separately, enabling task-specific state vectors. For example, an analytical state may preserve high geometric detail while omitting textures. Thus, overall model fidelity is reformulated as a set of separately controllable discrete variables.

However, orthogonality does not make every B-D-C-T-S combination valid or executable at runtime. The admissible state space is constrained by retrievable spatial-carrier requirements, topological-support conditions, and rendering-logic and semantic-consistency constraints, as defined in Section 3.3. The framework is therefore a decoupled state-organization model with runtime validity constraints, not an unconstrained Cartesian product of all dimension levels.

3.2. Quantitative and Rule-Based Grading Definitions of Dimensions

Building on the five-dimensional orthogonal framework, this section further defines quantitative grading criteria for each dimension. To meet the engineering requirements of web-based streaming and on-demand rendering, continuous model characteristics are discretized into a complete five-dimensional grading matrix. Figure 2 illustrates the hierarchical progression of the five dimensions and their corresponding proxy or representation forms.

3.2.1. Dimension B (Boundary-Based Spatial Proxy): Grading Based on Proxy Complexity

The B-dimension describes the proxy forms in which an object participates in spatial computations. Based on the complexity of the proxies, four levels are defined:

B0 (Null Proxy): The object is not assigned an active spatial proxy for runtime spatial computation or geometric evaluation. This level is used for logical hiding or non-spatial background records. If object-level semantic or appearance information is activated in a runtime scene, the spatial-entity constraint in Section 3.3 requires a retrievable spatial carrier, namely B > B0.

B1 (Point Proxy): The object is represented by its centroid or geometric center as the calculation node. As it retains only minimal spatial information, this level is highly suitable for large-scale spatial indexing, clustering tasks, and macro-level situational analysis.

B2 (Bounding-Box Proxy): The object is approximated using an axis-aligned bounding box (AABB) or an oriented bounding box (OBB). This level maintains high computational efficiency while preserving the basic spatial extent, making it appropriate for view-frustum culling and coarse collision detection.

B3 (Mesh Proxy): A low-polygon mesh or convex hull is used to more closely fit the object’s boundary. This level supports more detailed shadow analysis, occlusion analysis, and high-precision collision detection.
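The two lightest non-null proxies can be derived from a vertex list in a few lines. The following sketch is illustrative: it uses the plain vertex average as the B1 point (one possible reading of "centroid or geometric center") and an axis-aligned box for B2; the function names are hypothetical, and OBB and mesh proxies are omitted.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def point_proxy(vertices: Sequence[Point]) -> Point:
    """B1: a single representative point (here, the vertex average)."""
    n = len(vertices)
    if n == 0:
        raise ValueError("at least one vertex is required")
    return (
        sum(v[0] for v in vertices) / n,
        sum(v[1] for v in vertices) / n,
        sum(v[2] for v in vertices) / n,
    )


def aabb_proxy(vertices: Sequence[Point]) -> Tuple[Point, Point]:
    """B2: axis-aligned bounding box as (min corner, max corner)."""
    if not vertices:
        raise ValueError("at least one vertex is required")
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# A 4 x 2 x 10 box standing in for a simple building mass.
box = [(0, 0, 0), (4, 0, 0), (4, 2, 0), (0, 2, 0),
       (0, 0, 10), (4, 0, 10), (4, 2, 10), (0, 2, 10)]
print(point_proxy(box))  # (2.0, 1.0, 5.0)
print(aabb_proxy(box))   # ((0, 0, 0), (4, 2, 10))
```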

3.2.2. Dimension D (Geometric Detail): Grading Based on Geometric Approximation Error

The D dimension measures the difference between a simplified model and the reference geometry using the Hausdorff distance $d_{H}(\cdot, \cdot)$. According to approximation accuracy, four levels are defined.

D0 (Base Projection): Only the closed two-dimensional projected footprint is retained. Because height information is absent, the three-dimensional Hausdorff distance is not applicable; instead, a two-dimensional contour-similarity constraint is used.

D1 (Prismatic Aggregation): The building mass is represented by a combination of flat-roofed prisms, while vertical height differences below a prescribed threshold are ignored. This level preserves coarse volumetric characteristics and is suitable for first-screen placeholder rendering and macro-scale massing analysis.

D2 (Generalized Host Geometry): The host geometry is simplified into a medium-precision mesh that preserves the major shape of the building envelope while suppressing local surface fluctuations. The approximation must satisfy

$$d_{H}(M_{D2}, M_{r}) \leq \tau_{D2}, \tag{3}$$

where $M_{r}$ denotes the reference geometry and $\tau_{D2}$ is the tolerance threshold for medium-precision representation.

D3 (Refined Host Geometry): The host geometry retains high-precision geometric characteristics and is suitable for detailed visual inspection as well as accurate topological support for embedded components. The approximation must satisfy

$$d_{H}(M_{D3}, M_{r}) \leq \tau_{D3}, \quad \tau_{D3} < \tau_{D2}. \tag{4}$$
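The tolerance tests in Equations (3) and (4) can be approximated on sampled points. The sketch below computes a discrete symmetric Hausdorff distance between two point samples by brute force and maps it to a D level; it is an approximation of the mesh-to-mesh distance rather than the procedure used in this study, and the function names and threshold values are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def directed_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric discrete Hausdorff distance between two point samples."""
    return max(directed_distance(a, b), directed_distance(b, a))


def geometric_grade(simplified: Sequence[Point], reference: Sequence[Point],
                    tau_d2: float, tau_d3: float) -> Optional[str]:
    """Return 'D3' or 'D2' if the tolerance is met, otherwise None."""
    if not tau_d3 < tau_d2:
        raise ValueError("Equation (4) requires tau_d3 < tau_d2")
    d = hausdorff(simplified, reference)
    if d <= tau_d3:
        return "D3"
    if d <= tau_d2:
        return "D2"
    return None  # coarser levels (D0, D1) follow other rules


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
simplified = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.3)]
print(geometric_grade(simplified, reference, tau_d2=0.5, tau_d3=0.1))  # D2
```

The brute-force search is quadratic in the number of samples, so a spatial index would be needed for dense meshes; the sketch is intended only to show how the two thresholds partition the approximation error.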

3.2.3. Dimension C (Component Complexity): Grading Based on Visibility and Shape Characteristics

The C dimension regulates the retention of attached or embedded components. Since volume alone may incorrectly discard small but visually important elements, both the characteristic size and the shape factor are introduced.

Let $l_{c}$ denote the characteristic size of a component, $\phi_{c}$ its shape factor, $\tau_{v}$ the minimum visibility threshold, and $\tau_{s}$ the threshold for identifying slender components.

According to these criteria, four levels are defined.

C0 (No Components): all attached and embedded components are omitted. Only the host geometry is retained.

C1 (Salient Components Only): only components satisfying the minimum visibility condition are retained, namely

$$l_{c} \geq \tau_{v}. \tag{5}$$

This level preserves visually dominant components while filtering out small and non-essential details.

C2 (Salient and Slender Components): in addition to the visible components retained at C1, slender components that are visually important despite their small size are also preserved. A component is identified as a prominent, slender element when

$$l_{c} < \tau_{v} \ \text{and} \ \phi_{c} \geq \tau_{s}. \tag{6}$$

This level is suitable for structures such as railings, mullions, or similar elongated elements.

C3 (Full Components): all attached and embedded components are retained. This level is intended for detailed inspection and full-detail presentation.

It should be noted that the C dimension governs the retention of attached or embedded components on the host geometry, whereas the D dimension governs the approximation level of the host geometry itself. The two dimensions are therefore separated at the data-organization level but remain subject to topological dependency constraints at runtime.
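The retention rules in Equations (5) and (6) amount to assigning each component the lowest C level at which it first appears. A minimal sketch follows; the shape factor is taken as a precomputed input because its formula is not restated here, and the function names and all numeric values are hypothetical.

```python
def minimum_component_level(l_c: float, phi_c: float,
                            tau_v: float, tau_s: float) -> int:
    """Lowest C level (1-3) at which a component is retained."""
    if l_c >= tau_v:
        return 1  # salient component, Equation (5)
    if phi_c >= tau_s:
        return 2  # small but slender component, Equation (6)
    return 3      # retained only when all components are loaded


def is_retained(active_level: int, l_c: float, phi_c: float,
                tau_v: float, tau_s: float) -> bool:
    """True if the component is kept at the currently active C level."""
    return active_level >= minimum_component_level(l_c, phi_c, tau_v, tau_s)


TAU_V, TAU_S = 0.5, 10.0  # illustrative thresholds only

print(minimum_component_level(1.2, 1.5, TAU_V, TAU_S))    # window: 1
print(minimum_component_level(0.05, 40.0, TAU_V, TAU_S))  # railing: 2
print(minimum_component_level(0.02, 1.5, TAU_V, TAU_S))   # small fitting: 3
```

Because C0 retains nothing, `is_retained(0, ...)` is false for every component, which matches the definition of C0 as host geometry only.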

3.2.4. Dimension T (Texture Appearance): Grading Based on Visual Fidelity and Resource Cost

The T dimension regulates the material and texture representation of surfaces. According to appearance fidelity, four levels are defined.

T0 (No Texture): no texture or material information is loaded, and the object is rendered with a default monochrome shader.

T1 (Procedural Color): only lightweight appearance categories are retained, such as functional or semantic class colors. At this level, semantic attributes from the S dimension may also be used to support procedural color mapping.

T2 (Baked Texture): medium-resolution texture atlases or baked material maps are loaded. This level supports improved visual realism while controlling graphics memory consumption.

T3 (Full PBR Texture): complete multi-channel physically based rendering materials are loaded, including albedo, normal, roughness, metallic, and other associated maps where available. This level prioritizes visual fidelity and is suitable for close-up inspection and presentation-oriented applications.

3.2.5. Dimension S (Semantic Information): Grading Based on Association Depth

The S dimension regulates the loading depth of semantic information. According to the degree of semantic association, four levels are defined.

S0 (No Semantics): no semantic information is loaded.

S1 (Identifier Only): only a unique identifier is retained, supporting minimal indexing and object-level retrieval.

S2 (Local Attributes): local semantic attributes, such as category, material class, function, and status, are retained, making this level suitable for routine queries and statistical analysis under offline or low-latency conditions.

S3 (Associated Semantics): lightweight pointers, such as Uniform Resource Identifiers (URIs), are used to link external knowledge bases or business databases. This level enables on-demand retrieval of large-scale heterogeneous data during runtime while reducing the initial requested payload.

The grading scheme above discretizes continuous features for implementation purposes. In practical applications, parameters such as the Hausdorff-distance tolerances $\tau_{D2}$ and $\tau_{D3}$, the minimum visibility threshold $\tau_{v}$, and the slenderness threshold $\tau_{s}$ should be configured according to the target scenario. For city-scale CIM planning scenarios, these thresholds may be relaxed. For building-scale BIM inspection tasks, by contrast, they should be tightened to remain consistent with the required delivery standards.

3.3. Valid State Constraints

Although the five dimensions are defined as organizationally separable, physical rules and rendering logic impose specific dependencies to ensure topological validity and semantic consistency. The following constraints are therefore introduced and are used both to check illegal state combinations before delivery and to govern runtime degradation.

Spatial entity constraint. If semantic or appearance information is activated, the object must still possess a retrievable spatial carrier. Formally, if

t_{m} > T0 or s_{n} > S0,    (7)

then

b_{i} > B0    (8)

must also hold. This ensures that semantic or material information is not attached to an entity without a locatable spatial representation.

Component topology dependency constraint. Component loading must satisfy the supporting conditions of the host geometry. For attached components, such as billboards or surface-mounted details, a continuous attachment surface is required. The host geometry should therefore reach at least D2; otherwise, floating or intersecting artifacts may occur. For embedded components, such as doors and windows, topological cutting and interface matching are involved. The host geometry should therefore reach D3. Forced loading on low-precision meshes can easily lead to topological mismatch and depth conflict, such as Z-fighting. In simplified runtime validation, this rule can be expressed as C > C0 ⇒ D ≥ D2, with embedded components further requiring D ≥ D3 when topological cutting is involved.

Visual compensation under low geometric precision. When only medium-precision geometry such as D2 is available, but door and window details still need to be presented, high-precision details may be baked into two-dimensional textures in advance, corresponding to T2. In this case, the appearance can be restored through texture mapping while keeping C = C0. This strategy preserves topological validity while balancing visual fidelity and transmission efficiency.

These constraints indicate that the orthogonal state framework does not imply arbitrary state combinations. Rather, valid combinations must satisfy runtime feasibility, topological support, and semantic consistency.
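The constraints above can be sketched as a small validity predicate. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `is_valid_state` and the `embedded` flag (which distinguishes embedded from attached components) are hypothetical, and levels are passed as plain integers (for example, D2 is 2).

```python
def is_valid_state(b: int, d: int, c: int, t: int, s: int,
                   embedded: bool = False) -> bool:
    """Check a B-D-C-T-S level combination against the Section 3.3 rules."""
    # Spatial entity constraint: active texture or semantics needs a carrier.
    if (t > 0 or s > 0) and b == 0:
        return False
    # Component topology dependency: components need host geometry >= D2,
    # and embedded components (e.g. doors, windows) need D3.
    if c > 0:
        required_d = 3 if embedded else 2
        if d < required_d:
            return False
    return True


# Visual compensation case: D2 host geometry, baked T2 texture, C kept at C0.
print(is_valid_state(b=1, d=2, c=0, t=2, s=1))                 # True
# Texture requested without any spatial carrier.
print(is_valid_state(b=0, d=1, c=0, t=1, s=0))                 # False
# Embedded component requested on D2 geometry.
print(is_valid_state(b=2, d=2, c=1, t=1, s=1, embedded=True))  # False
```

The predicate only reports validity; how an invalid target is mapped to an executable state is a separate degradation step.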

3.4. Applicability Boundaries and Extensibility of the Framework

The proposed framework is primarily intended for the lightweight organization, transmission, and scheduling of 3D building models in web environments. It is particularly suitable for scenarios requiring multiscale representation, on-demand loading, and dynamic switching among geometric, visual, and semantic states.

For highly continuous reality-based meshes, such as oblique photogrammetric models, the framework can still be applied, but additional preprocessing is required. In such cases, virtual monomerization may be used to generate virtual monomers from continuous surfaces so that the resulting entities can be organized and scheduled under the same five-dimensional framework.

Beyond its immediate scheduling role, the architecture provides a basis for future extension. Developers can adjust grading rules within a single dimension to meet future application requirements or data specifications without changing the overall state-control logic. Because the dimensions remain organizationally separated under the validity constraints defined above, extending or replacing one layer does not necessarily require redesigning the entire scheduling engine. This separation facilitates compatibility with CIM workflows and complex Web3D streaming pipelines.

4. State-Word Encoding, Chunk Routing, and Incremental Updating

4.1. 16-Bit Structured State Word for Multidimensional Scheduling

To transform the 5D orthogonal state framework into a lightweight runtime scheduling descriptor, this study encodes the model state into a compact 16-bit structured state word. The objective is to reduce dependence on repeated text-based runtime state parsing through fixed-width bit fields and bitwise operations, thereby improving state indexing, decoding efficiency, and scheduling responsiveness in large-scale Web environments.

4.1.1. Bit-Field Mapping Strategy

The bit allocation of the state word is illustrated in Figure 3. Specifically, 2 bits are allocated to the B dimension, 2 bits to the D dimension, 3 bits to the C dimension, 3 bits to the T dimension, and 3 bits to the S dimension, with the lowest 3 bits reserved as an extension field (R). This bit-field layout follows a high-order-bit priority principle, placing spatial and geometric information, which is commonly evaluated early in scheduling, in the higher-order bits, while positioning auxiliary and extension information in the lower bits. Although the current implementation defines four levels for C, T, and S, their 3-bit fields reserve values 4–7 for future sub-levels or application-specific extensions; undefined values are treated as invalid unless explicitly registered. Through bitmasks and bit-shift operations, this design supports fast extraction of state fields and reserves limited extension capacity for future CIM applications.

The state word is therefore encoded as:

W = (B ≪ 14) | (D ≪ 12) | (C ≪ 9) | (T ≪ 6) | (S ≪ 3) | R    (9)

where ≪ denotes the bitwise left shift operation, and | represents the bitwise OR operation.

4.1.2. Reserved Bits and Architectural Extensibility

The R field is not one of the five B-D-C-T-S representation dimensions. In the current implementation, the lowest R bit is used as a runtime exception flag, for example, to mark constraint-aware degradation after invalid-state interception. The remaining R-bit patterns are reserved for future extensions, such as overflow flags, special scheduling instructions, or flags for extended semantic handling.

Therefore, the current 16-bit state word should be regarded as a compact baseline encoding scheme for this scheduling architecture. If future CIM applications require greater representational capacity, predefined R-bit patterns can be used to activate extended encoding paths. In this sense, the R field supports extensibility at the encoding level while preserving the current fixed-offset decoding path for the baseline format.

4.1.3. Encoding Example

Consider the state ⟨B3, D3, C2, T3, S2⟩ as an example. According to the predefined bit-field mapping rules, the corresponding binary values for the B, D, C, T, S, and R fields are 11, 11, 010, 011, 010, and 000, respectively. By concatenating these fixed-length fields in sequence, the complete 16-bit binary sequence is obtained:

1111010011010000

Its corresponding hexadecimal value is 0xF4D0. This indicates that complex multidimensional building states can be directly encoded into fixed-length integers, facilitating compact Web-based state indexing and lightweight state representation.
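The packing rule of Eq. (9) and the example above can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the function name `encode_state` and its range checks are illustrative assumptions rather than part of the described system.

```python
def encode_state(b: int, d: int, c: int, t: int, s: int, r: int = 0) -> int:
    """Pack B-D-C-T-S-R levels into a 16-bit word using the Eq. (9) layout."""
    # B and D are 2-bit fields; C, T, S, and R are 3-bit fields.
    for name, value, width in (("B", b, 2), ("D", d, 2), ("C", c, 3),
                               ("T", t, 3), ("S", s, 3), ("R", r, 3)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (b << 14) | (d << 12) | (c << 9) | (t << 6) | (s << 3) | r


word = encode_state(b=3, d=3, c=2, t=3, s=2)
print(f"{word:016b}", hex(word))  # 1111010011010000 0xf4d0
```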

4.1.4. Bitwise Decoding Mechanism

Once a state word is available at runtime, the client-side parsing engine extracts each dimension through right-shift and mask operations. For instance, the texture appearance field (T) is extracted as:

T = (W ≫ 6) & 0x07    (10)

where ≫ denotes the bitwise right shift operation, & denotes the bitwise AND operation, and 0x07 (binary 111) is the 3-bit mask for extracting the T field.

The same O(1) field-extraction logic applies to the B, D, C, and S fields because their bit-field offsets are fixed. Therefore, state decoding avoids string deserialization, dynamic field traversal, and repeated object-property lookup. This makes the state-word mechanism suitable for high-frequency field extraction and lightweight object-state filtering in large-scale interactive Web3D scenarios.
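The fixed-offset extraction described above can be sketched as a lookup of one shift and one mask per field. The table name `FIELDS` and the helper names are hypothetical; the offsets and widths follow the bit layout given in Section 4.1.1.

```python
# (shift, mask) per field, following the fixed bit layout of the state word.
FIELDS = {"B": (14, 0x03), "D": (12, 0x03), "C": (9, 0x07),
          "T": (6, 0x07), "S": (3, 0x07), "R": (0, 0x07)}


def extract(word: int, field: str) -> int:
    """Return one field of a 16-bit state word with a shift and a mask."""
    shift, mask = FIELDS[field]
    return (word >> shift) & mask


def decode_state(word: int) -> dict[str, int]:
    """Decode all six fields; each extraction is independent of the others."""
    return {name: extract(word, name) for name in FIELDS}


print(extract(0xF4D0, "T"))   # 3
print(decode_state(0xF4D0))
# {'B': 3, 'D': 3, 'C': 2, 'T': 3, 'S': 2, 'R': 0}
```

Because every field has a fixed offset, a filter that needs only one dimension (for example C) can call `extract` once per object without decoding the rest of the word.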

4.2. Data Chunk Organization, Semantic Routing, and Runtime Request Coordination

To support runtime scheduling of encoded state words, the B-D-C-T-S framework maps the five state dimensions onto an external tiled spatial index and an internal object-level state-control layer. The tiled index handles spatial partitioning, visibility-driven tile selection, and coarse-grained Web delivery, whereas the internal layer handles state-word decoding, target-state validation, selective resource activation, semantic routing, and incremental updating. Figure 4 shows the two-stage workflow, and Figure 5 illustrates runtime resource routing and Home Tile-based semantic routing.

4.2.1. Data Chunk Organization and Integration with Tiled Delivery Standards

The B-D-C-T-S framework follows a two-layer organization strategy. The first layer is an external tiled spatial index, which can be coupled with existing tiled delivery mechanisms, such as a 3D Tiles-compatible hierarchy with glTF-based payloads. This layer identifies tiles and objects relevant to the current view. The second layer is an internal object-level state-control layer, which operates after tiled spatial selection and determines whether a target state is valid, whether degradation is required, and which resources should be loaded, updated, reused, or skipped.

In this design, the B dimension, namely Boundary-based Spatial Proxy, is linked to but not identical to the external tile-based spatial index. The tiled index narrows the spatial search range, while B-level proxies provide object-level representations for visibility culling, localization, and coarse spatial computation inside selected tiles. The D, C, T, and S dimensions are mapped to separately addressable resources or records: D stores host geometry, C stores attached or embedded components, T stores materials and textures, and S stores semantic attributes or semantic pointers.

During offline preprocessing, the original BIM model or photogrammetric mesh is spatially partitioned, tiled, and decomposed into D/C/T chunks and S-level semantic records or pointers. A 16-bit state word is assigned to each schedulable object to describe its B-D-C-T-S configuration. During runtime, the tiled index first selects visible tiles, after which the internal state-control layer decodes relevant state words, validates constraints, computes state differences when needed, and schedules only the required resources for WebGL reassembly.

The proposed framework is not intended to replace existing tiled delivery standards. Existing tile hierarchies remain responsible for spatial indexing and coarse-grained streaming, whereas the proposed state word controls fine-grained object-level scheduling after tiled spatial selection. Thus, macroscopic semantic analysis can retrieve lightweight B-level proxies and S-level semantic resources, while presentation-oriented viewing can activate refined D-chunks, detailed C-chunks, and high-fidelity T-chunks.

4.2.2. Home Tile Strategy for Cross-Tile Semantic Routing

Low-redundancy semantic organization is important for cross-tile objects in large-scale Web3D scenes. When a building or component spans multiple tiles, conventional spatial partitioning may duplicate detailed semantic metadata in all intersecting tiles, increasing semantic payloads and client-side memory pressure.

To reduce this redundancy, each cross-tile object is assigned one Home Tile that stores its complete semantic payload. Other intersecting tiles, referred to as guest tiles, retain only lightweight semantic pointers. In the current implementation, the Home Tile is selected as the tile containing the object centroid. If the centroid lies on a tile boundary, the tile with the smallest identifier is used as a deterministic tie-breaker. Each semantic pointer stores only the object identifier, Home Tile identifier, and URI or relative path of the semantic payload.
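The centroid rule and its tie-breaker can be sketched as follows. The sketch assumes axis-aligned rectangular tiles in 2D with integer identifiers; the `Tile` class, `select_home_tile`, `make_pointer`, and the example path are illustrative names, not an interface defined by the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    tile_id: int
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        # Closed bounds: a point on a shared edge belongs to both neighbours.
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def select_home_tile(centroid: tuple[float, float], tiles: list[Tile]) -> int:
    """Return the ID of the tile containing the centroid; smallest ID wins ties."""
    x, y = centroid
    candidates = [t.tile_id for t in tiles if t.contains(x, y)]
    if not candidates:
        raise ValueError("centroid lies outside every intersecting tile")
    return min(candidates)


def make_pointer(object_id: str, home_tile_id: int, uri: str) -> dict:
    """Guest-tile record: object ID, Home Tile ID, and payload location only."""
    return {"object_id": object_id, "home_tile": home_tile_id, "uri": uri}


tiles = [Tile(8, 0, 0, 100, 100), Tile(3, 100, 0, 200, 100)]
print(select_home_tile((30.0, 40.0), tiles))    # 8 (interior of tile 8)
print(select_home_tile((100.0, 40.0), tiles))   # 3 (shared edge, smallest ID)
print(make_pointer("bldg-12", 3, "semantics/3/bldg-12.json"))
```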

After state-word decoding, the D, C, and T fields trigger geometry, component, and texture resource routing, whereas the S field controls semantic routing. When S = S1, identifier-level semantics can be embedded in the local object record or host geometry payload without an independent semantic request. When S > S1, an independent semantic branch is activated. If the current tile is the object’s Home Tile, the complete semantic payload is retrieved locally; otherwise, the loading engine follows the semantic pointer and redirects the request to the corresponding Home Tile.
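The S-field branch described above can be written as a small decision function. This is a sketch under assumptions: the action labels and the function name `route_semantics` are hypothetical, and the actual request submission is left out.

```python
def route_semantics(s_level: int, current_tile: int, home_tile: int,
                    payload_uri: str) -> dict:
    """Decide how the S field of a decoded state word is served."""
    if s_level == 0:
        return {"action": "skip"}        # S0: no semantics requested
    if s_level == 1:
        return {"action": "embedded"}    # S1: identifier travels with the object record
    if current_tile == home_tile:
        # S > S1 in the Home Tile: the complete payload is available locally.
        return {"action": "load_local", "uri": payload_uri}
    # S > S1 in a guest tile: follow the pointer to the Home Tile.
    return {"action": "redirect", "tile": home_tile, "uri": payload_uri}


print(route_semantics(1, current_tile=8, home_tile=3, payload_uri="s/3/b12.json"))
print(route_semantics(2, current_tile=3, home_tile=3, payload_uri="s/3/b12.json"))
print(route_semantics(3, current_tile=8, home_tile=3, payload_uri="s/3/b12.json"))
```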

The Home Tile strategy preserves semantic continuity while avoiding full duplication of detailed attributes across fragmented tiles. It does not alter the external spatial tile hierarchy, but operates as an object-level semantic routing mechanism within the internal state-control layer.

4.2.3. Runtime Request Coordination and Chunk Aggregation

The decoupled organization of D/C/T chunks and S-level semantic records enables selective loading, but it may also generate many fine-grained requests when multiple objects or dimensions are activated simultaneously. Under high-latency conditions, excessive request fragmentation may offset part of the benefit gained from reduced payload size. Therefore, after state-word decoding and target-state validation, request coordination is applied before resource requests are submitted.

The scheduler first compares the current and target state words, and only the fields marked by the XOR result are treated as update targets. If the corresponding geometry, component, texture, or semantic resource has already been loaded and its state field is unchanged, the cached copy is reused. Cached resources are indexed by object ID, resource type, state level, and tile ID, allowing unchanged chunks to be retrieved without repeating the full request process.
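A cache keyed by the four attributes named above might look like the following sketch. The class name `ChunkCache`, the `fetch` callback, and the example identifiers are assumptions introduced for illustration; eviction and error handling are omitted.

```python
from typing import Callable

# (object_id, resource_type, state_level, tile_id)
CacheKey = tuple[str, str, int, int]


class ChunkCache:
    """Reuse already-loaded chunks; fetch only on a cache miss."""

    def __init__(self, fetch: Callable[[CacheKey], bytes]):
        self._fetch = fetch
        self._store: dict[CacheKey, bytes] = {}
        self.requests = 0

    def get(self, object_id: str, resource_type: str,
            level: int, tile_id: int) -> bytes:
        key = (object_id, resource_type, level, tile_id)
        if key not in self._store:
            self.requests += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_fetch(key: CacheKey) -> bytes:
    # Stand-in for a network request; returns a label instead of real data.
    return f"{key[1]}{key[2]}".encode()


cache = ChunkCache(fake_fetch)
cache.get("bldg-12", "T", 2, 5)
cache.get("bldg-12", "T", 2, 5)   # unchanged field: served from the cache
cache.get("bldg-12", "T", 3, 5)   # T2 -> T3: a new chunk is requested
print(cache.requests)             # 2
```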

Runtime priority is then assigned according to view relevance. Resources associated with visible tiles and selected objects are requested first, while requests for non-visible objects or degraded states can be postponed or skipped. When multiple requests generated in the same animation frame refer to the same tile, object, or server source, they may be merged into fewer same-origin requests where possible. This does not alter the state model itself, but reduces avoidable fragmentation caused by overly fine-grained chunk separation.
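One possible shape for same-frame request merging is sketched below. It groups pending chunk requests by origin and tile and moves batches for visible tiles to the front. The record fields (`origin`, `tile`, `chunk`, `visible`) and the host name are hypothetical; the paper does not specify the batching format.

```python
from collections import defaultdict


def merge_requests(pending: list[dict]) -> list[dict]:
    """Group same-frame chunk requests by (origin, tile), visible tiles first."""
    groups: dict[tuple[str, int], list[str]] = defaultdict(list)
    visible: dict[tuple[str, int], bool] = {}
    for req in pending:
        key = (req["origin"], req["tile"])
        groups[key].append(req["chunk"])
        visible[key] = visible.get(key, False) or req["visible"]
    batches = [{"origin": o, "tile": t, "chunks": chunks, "visible": visible[(o, t)]}
               for (o, t), chunks in groups.items()]
    # Stable sort: visible batches move to the front, other order is kept.
    batches.sort(key=lambda batch: not batch["visible"])
    return batches


pending = [
    {"origin": "https://tiles.example", "tile": 5, "chunk": "D3", "visible": False},
    {"origin": "https://tiles.example", "tile": 9, "chunk": "T3", "visible": True},
    {"origin": "https://tiles.example", "tile": 9, "chunk": "C2", "visible": True},
]
for batch in merge_requests(pending):
    print(batch["tile"], batch["chunks"], batch["visible"])
# 9 ['T3', 'C2'] True
# 5 ['D3'] False
```

Three fine-grained requests become two batches here; whether the non-visible batch is postponed or dropped is left to the caller.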

HTTP/2 multiplexing and batching are used only as auxiliary mechanisms. They can improve request handling by transmitting concurrent same-origin requests over one connection and reducing repeated headers when server and cache configurations allow it. However, they cannot eliminate network latency or guarantee the same benefit under all navigation patterns. Their effect depends on round-trip time, server behavior, cache hits, chunk size, and user navigation. Therefore, this part of the framework should be understood as request coordination for reducing avoidable fragmentation, not as a complete solution to high-latency network conditions.

4.3. Incremental Updating and Runtime Consistency Assurance

In addition to initial loading, dynamic Web3D and digital-twin applications require frequent runtime state transitions. Under such conditions, avoiding the re-requesting or reloading of unchanged resources and maintaining stable rendering behavior become equally important. This section therefore focuses on XOR-based differential updating and the control logic used to keep runtime states valid and consistent.

4.3.1. XOR-Based Differential State Updating

To avoid reloading unchanged resources during state switching, XOR-based differencing is adopted. Let the current state word be W_{c} and the target state word be W_{t}. The differential word is defined as:

ΔW = W_{c} ⊕ W_{t}    (11)

where ⊕ denotes bitwise XOR.

The resulting differential word records only the bit positions that differ between the current and target states. Changed dimensions are identified by applying the corresponding field masks to ΔW, after which the affected B-level proxy records, D/C/T chunks, or S-level semantic records can be selectively requested. Figure 6 illustrates a simple case in which only the T dimension changes; accordingly, the XOR result isolates the corresponding bit positions in the differential word. In this situation, only the corresponding T-related chunk needs to be updated, whereas the B, D, C, and S fields remain unchanged.
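The masking step can be sketched as follows. The mask values are derived from the bit layout in Section 4.1.1; the names `FIELD_MASKS` and `changed_fields` are illustrative, and the example words are the T2 and T3 states used later in Section 5.4.1.

```python
# In-place field masks for the 16-bit layout (B:2, D:2, C:3, T:3, S:3, R:3).
FIELD_MASKS = {"B": 0xC000, "D": 0x3000, "C": 0x0E00,
               "T": 0x01C0, "S": 0x0038, "R": 0x0007}


def changed_fields(current: int, target: int) -> list[str]:
    """Return the names of the fields whose bits differ between two state words."""
    delta = current ^ target
    return [name for name, mask in FIELD_MASKS.items() if delta & mask]


print(hex(0x9080 ^ 0x90C0))             # 0x40
print(changed_fields(0x9080, 0x90C0))   # ['T']
print(changed_fields(0x9080, 0x9080))   # []
```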

For continuous geometric deformation at the vertex level, the B-D-C-T-S state word generally remains unchanged. Such operations are therefore handled by the rendering or animation layer rather than by the state-scheduling layer. In this way, the state framework remains focused on multidimensional representation transitions rather than low-level mesh animation.

4.3.2. Deferred Reassembly and Runtime Redraw

After the changed dimensions have been identified, the affected chunks are not always reassembled immediately. Reassembly is instead postponed to a later redraw stage, where the target representation is rebuilt only after the required chunks have been prepared. This avoids repeated intermediate reconstruction during rapid successive state changes and also allows dependency conditions to be checked before the redraw is committed. The benefit is particularly clear when several dimensions change at the same time, for example, when geometry, components, and textures need to be updated in coordination.
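The deferral idea can be illustrated with a small coalescing queue: state changes are recorded as they arrive, and the rebuild happens once per redraw, only when the required chunks are ready. The class `DeferredReassembler` and the `is_ready` callback are hypothetical names; this is a sketch of the pattern, not the rendering pipeline itself.

```python
from typing import Callable


class DeferredReassembler:
    """Coalesce rapid state changes and rebuild once per redraw."""

    def __init__(self) -> None:
        self.current: dict[str, int] = {}
        self._pending: dict[str, int] = {}
        self.rebuilds = 0

    def request(self, object_id: str, target_word: int) -> None:
        # Later requests overwrite earlier ones; nothing is rebuilt yet.
        self._pending[object_id] = target_word

    def redraw(self, is_ready: Callable[[str, int], bool]) -> None:
        """Commit pending targets whose chunks are prepared; keep the rest queued."""
        for object_id, target in list(self._pending.items()):
            if target == self.current.get(object_id):
                del self._pending[object_id]      # no net change
            elif is_ready(object_id, target):
                self.current[object_id] = target
                self.rebuilds += 1
                del self._pending[object_id]


r = DeferredReassembler()
for word in (0x9080, 0x90C0, 0xF4D0):   # three rapid successive changes
    r.request("bldg-12", word)
r.redraw(lambda object_id, word: True)
print(r.rebuilds, hex(r.current["bldg-12"]))  # 1 0xf4d0
```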

4.3.3. State Constraint Validation and Safe Degradation

Before being formally submitted for reassembly, the target state is evaluated against the valid state constraints. Typical invalid state conflicts during Web3D navigation include: (1) requesting component-level loading (C > C0) when the precision of the current host geometry is insufficient to provide topological support; (2) requesting texture-level rendering (T > T0) when no retrievable spatial carrier or host geometry is available; and (3) requesting embedded components that require precise topological cutting when the host geometry has not reached the specified precision, such as D3.

Once these invalid transitions are detected, the system intercepts the original instruction and triggers the degradation logic, mapping the target state to a valid executable state supported by the currently available resources. For instance, if embedded components are requested before the host geometry reaches D3, the system clears the component field to C0. When a pre-baked texture representation encoding component appearance is available, T2 can be used as a visual compensation layer instead of loading topologically unsupported components. This strategy helps avoid invalid topological combinations while preserving feasible visual detail under the current resource conditions.

5. Validation of Core Mechanisms and Performance Analysis

Based on the B-D-C-T-S five-dimensional orthogonal decoupling framework and the bitwise state mechanism proposed above, this section presents mechanism-level evaluations of the method. A controlled server–client test environment was established, and two datasets were used: a single-building model (Dataset A) and an urban-scale photogrammetric mesh scene (Dataset B). The evaluation covers transmission-level separation, state parsing, incremental updating, cross-tile semantic routing, first-screen memory control, and preprocessing overhead.

5.1. Experimental Design and Environment Setup

5.1.1. Experimental Objectives

The experiments were designed to evaluate the proposed B-D-C-T-S state-control architecture from five perspectives: transmission-level separation during staged loading; parsing efficiency and allocation tendency of the 16-bit state word compared with JSON-based records; XOR-based differential updating and runtime degradation under abnormal-state injection; computational behavior of the spatial proxy dimension (B) in macro- and micro-scale tasks; and cross-tile semantic routing, first-screen memory control, and offline preprocessing overhead.

5.1.2. Experimental Environment and Tools

The experimental platform consisted of an Intel Core i7-12700KF CPU, 96 GB of RAM, and an NVIDIA GeForce RTX 3080 GPU. Web-based evaluations were conducted in Google Chrome (64-bit) with WebGL hardware acceleration enabled. Frontend behavior was assessed using browser developer tools, custom JavaScript timers, and operating-system-level process monitoring.

5.1.3. Experimental Datasets

Two datasets were used. Dataset A, a single-building BIM model derived from an official Revit sample, was used for mechanism-level validation of dimensional decoupling, staged loading, and state switching. The original full GLB model was 8.31 MB. It was preprocessed in Blender 4.5 and decoupled into separately addressable geometry, component, and texture-related packages. The first-screen D1 package was operationalized as a coarse geometric proxy using the Decimate modifier with a fixed retained-face ratio of 5%, whereas D3 retained the refined host geometry. Selected fine-detail objects removed from the host geometry were exported as C-Chunks, and extended material and texture resources used to upgrade the resident baseline T state to T3 were exported as T-related chunks.

In this evaluation, the theoretical minimum-visibility and slenderness-ratio thresholds described in Section 3.2.3 were not varied as independent experimental variables. Therefore, the benchmark results reflect this fixed Blender-based decimation and chunk-separation setting rather than a sensitivity analysis over component-culling thresholds. Figure 7 contrasts the D1 coarse model with the reconstructed model after D3, C, and T additions.

Dataset B, a Berlin 3D photogrammetric mesh, was used for scalability-related evaluation under large-scale Web3D conditions. It consisted of a 120 MB uncompressed local slice and a large-area 1.12 GB urban model. The local slice was approximately 69 MB after Draco compression using the built-in glTF exporter in Blender 4.5. The fixed Draco configuration was compression level 6, with quantization bits of 14 for positions, 10 for normals, and 12 for texture coordinates. The uncompressed 120 MB local-slice mesh was used for preprocessing-related measurements, while the 69 MB Draco-compressed payload was retained only as a fixed delivery-size reference and was not used for Draco-parameter sensitivity analysis. The 1.12 GB model was used for first-screen memory stress testing and for evaluating whether state-word-driven D1 tile scheduling could reduce initial memory pressure and potential out-of-memory (OOM) risk. It also provided the full-volume reference size for extrapolated preprocessing estimates, rather than a direct full-volume preprocessing measurement. Figure 8 shows the local slice and the large-scale regional model used in Dataset B.

5.2. Empirical Evaluation of Dimensional Decoupling and On-Demand Extraction

5.2.1. Verification of Transmission-Level Separation

To evaluate transmission-level separation, Dataset A was loaded in a staged sequence of D1 → D3 → C → T. During each stage, browser network logs were monitored to determine whether non-target resources were unnecessarily requested. As shown in Table 1, D-targeted loading triggered no unintended component or texture request. Component activation requested only the C-Chunk without geometry reloading or texture interference, and texture activation requested only the T-Chunk without affecting geometry or component requests. These results indicate that the tested D, C, and T resources remained separated at the request level and could be selectively scheduled in this staged loading case.

5.2.2. Comparison of First-Screen and Staged Requested Payloads

Table 2 compares requested payloads under monolithic loading and staged decoupled loading for Dataset A. In the conventional scheme, the full 8.31 MB GLB file was loaded during the initial screen phase. In the proposed scheme, the D1 coarse model was used for first-screen rendering, reducing the initial requested payload to 0.57 MB, corresponding to a 93.1% reduction relative to the full model. The subsequent D3 geometry-refinement stage required 0.78 MB, or 9.4% of the original model size, because this D3 package contains refined host geometry without texture resources. Table 2 focuses on the first-screen and geometry-refinement payloads, while component and texture switching are evaluated in later incremental-update tests. These results show that multidimensional decoupling can reduce the first-screen requested payload in this tested setting while shifting non-essential details to later stages.

5.3. Performance Evaluation of State-Word Scheduling

5.3.1. Experimental Design

To isolate runtime state parsing from rendering and transmission effects, a browser-side CPU micro-benchmark was conducted in Google Chrome. The benchmark measured the time required to parse or decode N object-state records and extract the C field; WebGL rendering, model loading, network transmission, and full browser heap profiling were excluded.

Two schemes were compared. The control group used JSON deserialization followed by C-field lookup, whereas the experimental group used bit-shift and bitmask operations on a 16-bit unsigned integer state word (Uint16). Both schemes checked the same target condition, C = 2. Hit counts and checksums were retained to verify that they identified the same target states.

The sample size N was set to 1000, 10,000, 50,000, and 100,000. For each scale, three warm-up runs were excluded, followed by ten official runs. Each official run repeated the parsing pass 1000 times, and the reported time was normalized to one parsing pass. Mean values were used for reporting, while standard deviations were retained in the benchmark records. Figure 9 summarizes the multi-scale results, and Table 3 reports the representative large-scale N = 100,000 case.
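The structure of such a comparison can be sketched in Python. This is not the browser benchmark used in the study: the JSON record shape, the key names, the random level generator, and the run counts are assumptions, and Python timings will not reproduce the reported Chrome figures. The sketch only shows the two code paths and the hit-count check that confirms they select the same objects.

```python
import json
import random
import timeit
from array import array


def make_records(n: int, seed: int = 0):
    """Build matching state words (16-bit) and JSON strings for n objects."""
    rng = random.Random(seed)
    words, texts = array("H"), []
    for _ in range(n):
        b, d, c, t, s = (rng.randrange(4) for _ in range(5))
        words.append((b << 14) | (d << 12) | (c << 9) | (t << 6) | (s << 3))
        texts.append(json.dumps({"B": b, "D": d, "C": c, "T": t, "S": s}))
    return words, texts


def count_json(texts, target: int = 2) -> int:
    # Control path: deserialize each record, then look up the C field.
    return sum(1 for text in texts if json.loads(text)["C"] == target)


def count_bitwise(words, target: int = 2) -> int:
    # Experimental path: fixed-offset shift and mask on the state word.
    return sum(1 for w in words if ((w >> 9) & 0x07) == target)


words, texts = make_records(10_000)
assert count_json(texts) == count_bitwise(words)  # same target states found

json_time = timeit.timeit(lambda: count_json(texts), number=3) / 3
bit_time = timeit.timeit(lambda: count_bitwise(words), number=3) / 3
print(f"JSON: {json_time * 1e3:.3f} ms/pass, bitwise: {bit_time * 1e3:.3f} ms/pass")
```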

5.3.2. Execution Efficiency Analysis

For N = 100,000, JSON deserialization and C-field lookup required approximately 13.05 ms, occupying approximately 78.27% of a 16.67 ms frame budget at 60 FPS. Fixed-offset bitwise C-field extraction required approximately 0.090 ms, corresponding to approximately 0.54% of the frame period and an approximately 144-fold speedup.

As shown in Figure 9, the state-word scheme consistently required much lower C-field extraction time across all tested object scales. At N = 1000, the state-word scheme remained close to 0.001 ms per parsing pass, compared with approximately 0.12 ms for JSON. At N = 100,000, JSON deserialization and C-field lookup increased to approximately 13.05 ms, whereas fixed-offset state-word extraction remained at approximately 0.090 ms. This result indicates that fixed-offset state-word decoding can substantially reduce CPU-side C-field extraction cost during large-scale object-state filtering.

The performance difference comes from the representation mechanism. JSON requires text deserialization and dynamic property access before the target field can be retrieved. The state-word scheme stores runtime states as fixed-length Uint16 values and extracts the target field through bit-shift and bitmask operations, while preserving the same target-state identification result.

5.3.3. Encoded Representation Size and Allocation Tendency

The two schemes also differ in encoded representation size and allocation tendency. In the benchmark, each JSON state record occupied approximately 49 bytes as a UTF-8 encoded string, whereas each state-word record was stored as a 2-byte Uint16 value. JSON parsing creates temporary JavaScript objects before C-field access, which may increase allocation pressure during high-frequency switching. By contrast, state-word decoding directly extracts the target field from a primitive Uint16 value and reduces temporary object creation during parsing.

This comparison refers to encoded representation size and allocation tendency, not a full browser heap-profile measurement. Complete memory behavior in Web3D loading and rendering workflows is evaluated separately in Section 5.7. Together with the timing results, the benchmark indicates that the state-word mechanism is suitable for high-frequency C-field extraction and lightweight object-state filtering in large-scale interactive Web3D scenes.

5.4. Incremental Updating and State-Constraint Stability Testing

This section evaluates two runtime mechanisms of the proposed framework: XOR-based differential updating for requested-payload reduction and state-constraint checking for intercepting invalid instructions and mapping them to valid executable states.

5.4.1. XOR-Based Differential Incremental Switching

The incremental switching experiment was conducted on Dataset A to test whether the state-word mechanism could identify the changed dimension and request only the required incremental chunk. The tested case simulated a texture-appearance transition from baked texture mode (T2) to full PBR texture mode (T3). The initial state word was 0x9080, and the target state word was 0x90C0:

0x9080 ⊕ 0x90C0 = 0x0040

The resulting differential mask, 0x0040, indicated that the modification occurred only in the T dimension. Therefore, the runtime scheduler did not re-request the full Dataset A package, but requested only the corresponding T3-Chunk as the incremental texture-appearance payload.

As shown in Table 4, the baseline full-package reload required 8514.30 KB of requested payload. By contrast, XOR-based differential scheduling requested only the T3-Chunk, with a measured payload of 3569.23 KB. In this representative T-dimension switching case, the requested payload was reduced to 41.92% of the full-package baseline, corresponding to a 58.1% reduction in redundant transfer. This result indicates that the state-word mechanism can convert an identifiable dimensional state change into a selective chunk request.

5.4.2. State Constraint and Degradation Mechanisms

A controlled abnormal-state injection test was conducted to evaluate whether the runtime state machine could detect and intercept invalid state combinations. In this test, the instruction 0xD400 requested C2 components while only coarse geometry (D1) was resident. This violated the topological dependency rule defined in Section 3.3: when C > C0, the host geometry must satisfy D ≥ D2.

As shown in Figure 10, the runtime state machine identified the instruction as invalid and intercepted the component-loading request before execution. After constraint masking, the component level C was cleared to C0, and the exception flag R was set to 1, producing the valid degraded instruction 0xD001. As a result, the system did not attempt to attach fine components to geometry that was too coarse to support them.
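The masking step in this test can be checked with a short sketch. The constant and function names are hypothetical, and the `embedded` flag is an assumption used to select the stricter D3 requirement; the bit positions follow the state-word layout of Section 4.1.1.

```python
C_SHIFT, C_MASK = 9, 0x07
D_SHIFT, D_MASK = 12, 0x03
R_EXCEPTION = 0x0001  # lowest R bit, used as the runtime exception flag


def degrade(word: int, embedded: bool = False) -> int:
    """Clear C to C0 and raise the exception flag if host geometry is too coarse."""
    d = (word >> D_SHIFT) & D_MASK
    c = (word >> C_SHIFT) & C_MASK
    required_d = 3 if embedded else 2
    if c > 0 and d < required_d:
        word &= ~(C_MASK << C_SHIFT) & 0xFFFF   # constraint masking: C -> C0
        word |= R_EXCEPTION                     # mark constraint-aware degradation
    return word


print(hex(degrade(0xD400)))  # 0xd001 (C2 on D1 is rejected)
print(hex(degrade(0xF4D0)))  # 0xf4d0 (C2 on D3 is valid and left unchanged)
```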

This test demonstrates that the constraint mechanism can prevent illegal topological state combinations, such as unsupported component attachment caused by invalid runtime instructions. The result should be interpreted as a validation of state-consistency control and safe degradation, rather than as a complete solution to perceptual transition artifacts. Visual discontinuities such as popping during rapid refinement are discussed as a limitation in Section 6.

5.5. Verification of the Spatial Proxy (Dimension B)

This section evaluates the computational behavior of the spatial proxy dimension in the local slice of Dataset B. The purpose is not to show that one B level is universally superior, but to verify whether different B levels support different spatial tasks. B1 is evaluated for macro-scale spatial screening, whereas B3 is evaluated through a B3-like CPU-side workload for local, precise geometric interaction.

5.5.1. Test Setup

Scenario A represents macro-scale tasks such as distance retrieval, spatial filtering, and heat-map-style analysis. The test state was set to B = 1, while the remaining dimensions were not requested in this proxy-only benchmark. The B1 point-proxy dataset was constructed from extracted building centroids and expanded to 10,000 point records within the spatial range of the local scene for stress testing. The resulting Points_B1.bin file occupied approximately 117 KB and contained 10,000 three-dimensional point proxies. On the frontend, Euclidean distance checks were performed over these point proxies.
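A B1-style distance check over packed point proxies can be sketched as follows. The binary layout is an assumption: three little-endian float32 values per point, which gives 120,000 bytes (about 117 KB) for 10,000 points and is therefore consistent with the reported file size, but the actual format of Points_B1.bin is not stated. The coordinates are synthetic.

```python
import random
import struct


def pack_points(points) -> bytes:
    """Serialize (x, y, z) proxies as little-endian float32 triples."""
    return b"".join(struct.pack("<3f", *p) for p in points)


def within_radius(blob: bytes, query, radius: float) -> int:
    """Count point proxies whose Euclidean distance to `query` is <= radius."""
    r2 = radius * radius          # compare squared distances, no sqrt needed
    hits = 0
    for x, y, z in struct.iter_unpack("<3f", blob):
        dx, dy, dz = x - query[0], y - query[1], z - query[2]
        if dx * dx + dy * dy + dz * dz <= r2:
            hits += 1
    return hits


rng = random.Random(42)           # fixed seed: synthetic but repeatable data
points = [(rng.uniform(0, 1000), rng.uniform(0, 1000), rng.uniform(0, 50))
          for _ in range(10_000)]
blob = pack_points(points)
print(len(blob), round(len(blob) / 1024, 1))   # 120000 117.2
print(within_radius(blob, (500.0, 500.0, 25.0), 100.0))
```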

Scenario B represents the CPU-side geometric cost of local precise picking under a B3-like refined mesh proxy condition. The tested condition was treated as B = 3, but the full refined mesh was not traversed directly. Instead, to isolate CPU-side geometric computation from network transmission and Draco decompression, a deterministic synthetic ray-triangle intersection workload over 50,000 triangles was used to approximate the computational burden of B3-like local picking. The result was then used to estimate the overhead that would occur if this B3-like workload were incorrectly applied to all 10,000 objects in a macro-scale screening task.

To avoid timer-resolution artifacts, both benchmarks used repeated inner iterations and reported normalized per-pass time. The B1 benchmark used 5000 inner iterations per run, and the B3 benchmark used 300 inner iterations per run. Each scenario was repeated 20 times after 5 warm-up runs. No target-time calibration, random fallback, or random workload generation was used.
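The timing protocol described above (warm-up runs, repeated inner iterations, normalized per-pass time) can be sketched as a small harness. This is an illustration under assumptions: the function names and the placeholder workload are hypothetical, and the article's benchmark ran in the browser rather than in Python.

```python
import statistics
import time


def benchmark(workload, inner_iterations, runs=20, warmups=5):
    """Return per-pass times in ms: each run times `inner_iterations` passes and normalizes."""
    def one_run():
        start = time.perf_counter()
        for _ in range(inner_iterations):
            workload()
        return (time.perf_counter() - start) * 1000.0 / inner_iterations

    for _ in range(warmups):   # warm-up runs are executed but discarded
        one_run()
    return [one_run() for _ in range(runs)]


def sample_workload():
    # Placeholder standing in for a B1 distance pass or a B3-like intersection pass.
    return sum(i * i for i in range(1000))


per_pass_ms = benchmark(sample_workload, inner_iterations=50)
mean_ms = statistics.mean(per_pass_ms)
stdev_ms = statistics.stdev(per_pass_ms)
```

Timing many passes per run and dividing by the iteration count keeps each measured interval well above the timer resolution, which matters when a single pass takes only hundredths of a millisecond.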

5.5.2. Execution Time Statistics

Table 5 summarizes the comparison between the two proxy modes. In Scenario A, the frontend loaded the 117 KB B1 point-proxy file and completed distance checks over 10,000 point proxies in approximately 0.017 ms per normalized processing pass, reported as ~0.02 ms in Table 5.

In Scenario B, the B3 micro-benchmark required approximately 0.37 ms for one deterministic 50,000-triangle ray-triangle intersection-test workload. If this B3-like workload were naively applied to all 10,000 objects in a macro-scale traversal task, the extrapolated execution time would be approximately 3.7 s. This would cause multi-second main-thread blocking and would therefore be unsuitable for real-time macro-scale screening.
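The extrapolation is a simple linear scaling, reproduced below from the reported values. The comparison with a 60 FPS frame budget is an added illustration that reuses the 16.67 ms budget from the Table 3 note; it is not a measured rendering result.

```python
B3_PASS_MS = 0.37        # reported time for one 50,000-triangle workload
N_OBJECTS = 10_000       # objects in the macro-scale screening task
FRAME_BUDGET_MS = 16.67  # 60 FPS frame budget, as used in the Table 3 note

extrapolated_ms = B3_PASS_MS * N_OBJECTS            # about 3700 ms, i.e. about 3.7 s
frames_blocked = extrapolated_ms / FRAME_BUDGET_MS  # about 222 frame budgets
```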

These results indicate the need for task-adaptive proxy selection rather than showing that B1 replaces B3. B1 point proxies are suitable for large-scale spatial retrieval, filtering, and heat-map-style analysis, where approximate object positions are sufficient. B3-level or B3-like mesh proxies remain necessary for local picking, collision checking, shadow analysis, occlusion analysis, and other precise geometric operations. Therefore, B-dimension grading can help avoid unnecessary mesh traversal in macro-scale tasks while preserving refined geometric proxies for local interaction.

5.6. Verification of Home Tile-Based Semantic Payload Reduction

This section evaluates whether the Home Tile strategy can reduce redundant semantic payloads for cross-tile virtual building proxy objects. The test focuses on metadata-level semantic-payload reduction, not bandwidth, end-to-end network latency, runtime semantic-routing latency, or interface response time.

5.6.1. Experimental Setup and Scene Construction

Because continuous oblique photogrammetric meshes often lack explicit object boundaries, a virtual monomerization step was used to construct a controlled cross-tile semantic-redundancy test. Based on the local slice of Dataset B, Voronoi-like nearest-seed partitioning generated 500 virtual building proxy objects within an 8 × 8 grid of 64 spatial tiles.

Two semantic organization schemes were compared. In S-Full, each involved tile stored a complete semantic record for each associated building proxy, including local attributes and linkable semantic fields. Each full semantic template contained 20 semantic fields, including object identifiers, usage, and height, and was padded to approximately 2.5 KB for controlled payload-size testing. In S-Home, each virtual building proxy was assigned a Home Tile according to the tile containing its proxy centroid, represented by the corresponding seed point. The complete semantic record was stored only in the Home Tile, whereas guest tiles stored a lightweight pointer record containing the object identifier, Home Tile identifier, and URI, with a controlled size of approximately 0.1 KB.
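The two organization schemes can be compared with a short sketch. The record sizes (2.5 KB and 0.1 KB) and the 8 × 8 grid follow the setup above; the normalized scene extent, the row-major tile indexing, and the three toy objects are assumptions made only for illustration, so the resulting numbers are not the experimental values.

```python
GRID = 8                         # 8 x 8 tile grid, as in the test setup
FULL_KB, POINTER_KB = 2.5, 0.1   # approximate controlled record sizes


def tile_of(x, y, extent=1.0):
    """Map a point in [0, extent) x [0, extent) to a row-major tile index."""
    col = min(int(x / extent * GRID), GRID - 1)
    row = min(int(y / extent * GRID), GRID - 1)
    return row * GRID + col


def semantic_payloads(objects):
    """objects: list of (centroid_xy, set_of_touched_tiles). Returns (s_full_kb, s_home_kb)."""
    s_full = s_home = 0.0
    for centroid, touched in objects:
        home = tile_of(*centroid)
        tiles = touched | {home}                  # the Home Tile always holds a record
        s_full += FULL_KB * len(tiles)            # S-Full: full record in every tile
        s_home += FULL_KB + POINTER_KB * (len(tiles) - 1)  # S-Home: one full + pointers
    return s_full, s_home


# Toy objects: one inside a single tile, one spanning two tiles, one spanning four.
toy_objects = [
    ((0.06, 0.06), {0}),
    ((0.124, 0.06), {0, 1}),
    ((0.5, 0.5), {27, 28, 35, 36}),
]
s_full_kb, s_home_kb = semantic_payloads(toy_objects)
reduction = 1.0 - s_home_kb / s_full_kb
```

The saving grows with the number of tiles each object touches: an object confined to one tile costs the same under both schemes, whereas every additional guest tile replaces a full record with a pointer record.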

5.6.2. Analysis of Experimental Results

The experiment monitored semantic payloads across 64 spatial tiles. Table 6 reports five representative tiles selected according to a distribution-based criterion, covering the minimum, lower-quartile-nearest, median/overall-nearest, upper-quartile-nearest, and maximum reduction cases among the monitored tiles. Figure 11 visualizes the same representative tiles.

Across all monitored tiles, S-Full required 2298.46 KB, whereas S-Home required 1285.50 KB. The Home Tile strategy therefore reduced the semantic payload by 1012.96 KB, corresponding to an overall reduction of 44.1%. This result indicates that storing complete attributes only in Home Tiles and replacing guest-tile attributes with lightweight pointer records can reduce semantic duplication in controlled cross-tile object organization.

Overall, the experiment indicates that the Home Tile strategy can reduce redundant semantic payloads in a controlled cross-tile object setting. The result should be interpreted as a metadata-level lightweighting effect rather than as a direct measurement of end-to-end network performance.

5.7. Initial-Screen Memory Testing in a Large-Scale Scenario

This section evaluates the first-screen process-level memory footprint of monolithic loading and state-word-driven D1 tile scheduling using Dataset B. The test focuses on the same initial spatial view and does not compare full-fidelity loading of all B-D-C-T-S layers. In the experimental group, only the D1-level tiles required by the initial view were loaded; detailed components, semantic layers, and high-resolution textures were excluded from the first-screen request.

5.7.1. Experimental Design and Measurement Method

The experiment was conducted in Google Chrome with browser memory diagnostics enabled through Chrome experimental flags. Chrome was restarted before testing, and each loading strategy was evaluated over ten complete loading cycles. The baseline memory usage of the test environment was approximately 28 MB.

Two strategies were compared. The control group directly loaded the single 1.12 GB GLB model of Dataset B. The experimental group used state-word-driven first-screen D1 tile scheduling and loaded 15 D1-level tiles, with a total physical payload of 22.7 MB.

The values in Table 7 denote mean observed process-level stable resident memory over ten loading cycles. They were manually recorded from the operating-system task manager and Chrome Task Manager, rather than from JavaScript heap APIs. Stable resident memory values include the frontend tab process and GPU-related browser process; JavaScript heap values were used only as auxiliary references.

5.7.2. Memory Results and Interpretation

Under monolithic loading, the 1.12 GB GLB file had to be decoded into runtime geometry, texture, and GPU buffer resources. After rendering reached a stable stage, the mean stable resident memory remained at approximately 6749.2 MB. After subtracting the 28 MB baseline, the net stable resident overhead was approximately 6721.2 MB, corresponding to a resident expansion ratio of approximately 5.86× relative to the 1.12 GB source file.

With state-word-driven first-screen D1 tile scheduling, the loaded first-screen physical payload was 22.7 MB. The mean stable resident memory was approximately 88.1 MB. After baseline subtraction, the net stable resident overhead was approximately 60.1 MB, corresponding to a resident expansion ratio of approximately 2.65× relative to the loaded first-screen payload.
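Both expansion ratios follow from the same calculation: net resident overhead divided by the loaded payload. The sketch below reproduces them from the reported values. One assumption is made explicit: the 1.12 GB file size is converted with 1 GB = 1024 MB, which yields the reported 5.86×; with 1 GB = 1000 MB the same calculation would give approximately 6.0×.

```python
BASELINE_MB = 28.0  # reported baseline memory of the test environment


def expansion_ratio(resident_mb, payload_mb):
    """Net resident overhead (baseline subtracted) divided by the loaded payload size."""
    return (resident_mb - BASELINE_MB) / payload_mb


# Assumption: 1.12 GB is converted with 1 GB = 1024 MB, which reproduces the reported 5.86x.
monolithic = expansion_ratio(6749.2, 1.12 * 1024)
scheduled = expansion_ratio(88.1, 22.7)
```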

These results provide evidence for first-screen memory control enabled by decoupled tile organization and state-word-driven scheduling. They do not show that the complete full-fidelity dataset can be represented with the same resident memory. The reduction mainly results from loading only visible D1-level tiles and excluding nonessential components, semantic layers, and high-resolution textures from the initial request. A conventional tiled loading strategy may also reduce first-screen payload when the same coarse D1 tiles are selected. The contribution of the proposed framework lies in using compact state words to represent selected B-D-C-T-S states and to support deterministic state switching, constraint checking, and incremental updating across dimensions. Therefore, the proposed mechanism complements tiled data organization rather than replacing it.

5.8. Server-Side Offline Preprocessing and State-Word Compilation Overhead

Although the B-D-C-T-S framework is designed to improve frontend runtime scheduling, it requires offline decoupling and state-word compilation before Web delivery. This section evaluates this offline cost using Dataset B.

The benchmark used the 120 MB no-Draco local-slice source mesh of Dataset B, containing approximately 1,367,441 vertices and 1,694,654 faces. The preprocessing script generated 500 candidate partition seeds and produced 307 valid non-empty sub-meshes. Full-volume preprocessing time for the 1.12 GB baseline model was extrapolated using a scaling factor of 9.3333, derived from the reported file-size ratio between the 1.12 GB full-volume model, treated as 1120 MB for preprocessing extrapolation, and the 120 MB local slice. Therefore, the full-volume values in Table 8 are extrapolated estimates, not direct measurements on the 1.12 GB dataset.

The benchmark was conducted on a workstation with an Intel Core i7-12700KF CPU and 96 GB of RAM. The experiment was repeated five times using a fixed random seed of 42 to keep the geometric workload consistent. Initial model-loading time was recorded separately and excluded from the phase-level preprocessing total.

As shown in Table 8, spatial partitioning and virtual monomerization dominated the preprocessing cost, requiring approximately 2.12 ± 0.02 min locally and yielding a full-volume extrapolated estimate of approximately 19.75 ± 0.20 min. Geometric dimensionality reduction, approximate texture-appearance reassembly, and state-word compilation were comparatively lightweight. The total local-slice preprocessing time was approximately 2.18 ± 0.02 min, and the full-volume extrapolated estimate was approximately 20.35 ± 0.21 min, or 0.34 h.
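The full-volume total is a linear file-size extrapolation and can be recomputed as shown below. Only the total mean is reproduced here: the published local-slice values are rounded, so phase-level means and standard deviations recomputed from them can differ slightly from the entries in Table 8.

```python
FULL_MB = 1120.0   # 1.12 GB treated as 1120 MB, as stated for this extrapolation
SLICE_MB = 120.0   # local-slice source mesh

scale = FULL_MB / SLICE_MB          # about 9.3333

local_total_min = 2.18              # reported local-slice preprocessing total
full_total_min = local_total_min * scale
full_total_hours = full_total_min / 60.0
```

The calculation assumes that cost grows in proportion to file size, which is the linear assumption the authors caution against generalizing.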

This estimate should not be interpreted as a general scaling law. In larger industrial datasets, spatial partitioning, KD-tree construction, grid cutting, sub-mesh separation, virtual monomerization, and texture-related processing may grow non-linearly because of topology complexity, memory locality, and I/O contention. GB- to TB-scale deployment may require parallel or distributed preprocessing support. In many CIM and large-scale Web3D applications, however, this cost can usually be treated as an offline expense, because base urban mesh data are updated far less frequently than frontend interaction states.

5.9. Discussion

The evaluations show that the B-D-C-T-S scheduling mechanism reduced first-screen requested payload in the tested staged-loading case, improved CPU-side C-field extraction, reduced redundant semantic payloads, supported incremental updating, and supported first-screen memory control under the tested conditions. These results should be interpreted as mechanism-level evidence rather than as a replacement for existing Web3D streaming standards.

Existing tiled streaming mechanisms, including OGC 3D Tiles 1.1, implicit tiling, and glTF-based workflows, provide mature support for spatial hierarchy construction, tile availability representation, view-dependent tile selection, and large-scale scene streaming. The proposed 16-bit state word does not replace these mechanisms. Instead, it provides a compact object-level state-control layer after tiled spatial selection. In a practical pipeline, the tiled hierarchy first identifies visible or relevant spatial regions, and the B-D-C-T-S layer then determines which object-level dimensions should be activated, reused, skipped, incrementally updated, or degraded.
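The object-level decision that follows tiled spatial selection can be sketched with a single XOR between the current and target state words. The field layout and the three action names below are hypothetical and serve only to illustrate the idea; the actual bit-field mapping is the one shown in Figure 3.

```python
# Hypothetical layout: field name -> (shift, mask). The real layout is given in Figure 3.
FIELDS = {"B": (14, 0x3), "D": (12, 0x3), "C": (9, 0x7), "T": (6, 0x7), "S": (3, 0x7)}


def pack(**levels):
    """Build a 16-bit word from per-dimension levels, e.g. pack(B=3, D=3, T=1)."""
    word = 0
    for name, level in levels.items():
        shift, mask = FIELDS[name]
        word |= (level & mask) << shift
    return word


def plan_actions(previous, target):
    """Classify each dimension as 'reuse', 'request', or 'release' from the XOR difference."""
    diff = previous ^ target
    actions = {}
    for name, (shift, mask) in FIELDS.items():
        if (diff >> shift) & mask == 0:
            actions[name] = "reuse"      # field unchanged, keep loaded resources
        elif (target >> shift) & mask == 0:
            actions[name] = "release"    # dimension switched off in the target
        else:
            actions[name] = "request"    # only this dimension needs new data
    return actions


previous = pack(B=3, D=3, C=1, T=1)
target = pack(B=3, D=3, C=1, T=3)      # texture-appearance upgrade only
actions = plan_actions(previous, target)
```

In this texture-only switch, every dimension except T is classified for reuse, which corresponds to the behavior reported for the T-dimension switching case, where only the T3-related chunk was requested.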

This layered interpretation is important for understanding the reported results. The parsing speedup in Section 5.3 should be interpreted as an improvement in C-field extraction and lightweight object-state filtering, not as a full benchmark comparison against 3D Tiles, implicit tiling, or glTF streaming. Similarly, the first-screen memory reduction in Section 5.7 mainly results from loading only the D1-level tiles required by the tested initial view. Therefore, the contribution of the proposed framework lies not in replacing tiled loading but in adding compact, constraint-aware, and multidimensional state control after tiled spatial selection.

At the same time, dimensional decoupling may introduce request-management risks. Separating geometry, components, texture-appearance resources, and semantic records into independently addressable units can increase fine-grained requests when many objects or dimensions are activated simultaneously. Under high-latency or request-intensive conditions, excessive request fragmentation may offset part of the benefit gained from reduced payload size. Request coordination, cache reuse, visible-object prioritization, request merging, HTTP/2 multiplexing, and batching can mitigate these effects, but they cannot eliminate latency or request-management overhead.
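One of the mitigation ideas, merging fine-grained requests and skipping cached resources, can be sketched as follows. This is a hypothetical illustration: the request tuple shape, the cache representation, and the per-tile grouping policy are assumptions and are not part of the evaluated prototype.

```python
from collections import defaultdict


def coalesce_requests(requests, cache):
    """Group pending (tile_id, object_id, dimension) requests into one batch per tile.

    Requests already present in `cache` are dropped; duplicates are merged.
    """
    batches = defaultdict(set)
    for tile_id, object_id, dimension in requests:
        key = (object_id, dimension)
        if key not in cache:
            batches[tile_id].add(key)
    return {tile: sorted(keys) for tile, keys in batches.items()}


pending = [
    (5, "b1", "T"), (5, "b2", "T"), (5, "b1", "T"),  # duplicate request for b1
    (6, "b3", "C"),
    (5, "b4", "D"),                                   # already cached
]
cached = {("b4", "D")}
batches = coalesce_requests(pending, cached)
```

Here five fine-grained requests collapse into two batched requests. Such merging reduces the request count but, as noted above, does not remove round-trip latency.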

The proposed method also involves an offline-online trade-off. Offline preprocessing is required before Web delivery, and the full-volume preprocessing values reported in Section 5.8 were extrapolated rather than directly measured on the complete 1.12 GB dataset. Larger industrial datasets may require parallel or distributed preprocessing support. In addition, constraint-aware degradation can prevent invalid state combinations, but it does not fully eliminate perceptual transition artifacts such as visual popping during rapid refinement.

Overall, the proposed framework should be understood as a compact and constraint-aware object-level state-control layer within tiled Web3D pipelines. Its value lies in supporting selective resource activation and scheduling, Home Tile semantic routing, incremental updating, and first-screen memory control while remaining compatible with existing spatial streaming mechanisms.

6. Summary and Future Work

6.1. Summary

This study addressed large requested payloads, runtime state-parsing overhead, semantic redundancy across tiles, and client-side memory pressure during first-screen loading. It proposed a B-D-C-T-S five-dimensional orthogonal decoupling framework and a 16-bit state-word-driven scheduling mechanism. The framework organizes Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information as separately addressable dimensions, while the state word supports compact representation, fixed-offset bitwise decoding, XOR-based differencing, and constraint-aware degradation.

The controlled experiments verified the core mechanisms. In Dataset A, staged decoupled loading reduced the first-screen requested payload by 93.1% under the tested initial-view setting. For N = 100,000 object-state records, state-word-based C-field extraction achieved an approximately 144-fold speedup over JSON deserialization and C-field lookup. XOR-based differential scheduling requested only the T3-related chunk during a texture-appearance state update. In Dataset B, the Home Tile strategy reduced the total semantic payload by 44.1% across 64 monitored tiles. In the 1.12 GB first-screen memory test, state-word-driven D1 tile scheduling loaded only 22.7 MB of physical payload, with stable resident memory of approximately 88.1 MB and net stable overhead of approximately 60.1 MB. These memory results represent first-screen D1 tile scheduling rather than full-fidelity loading. The preprocessing result was based on a 120 MB local-slice measurement and full-volume extrapolation, yielding an estimated 20.35 ± 0.21 min, or 0.34 h, for the 1.12 GB full-volume model. This estimate should not be interpreted as a general scaling law.

Overall, the proposed method supports object-level state representation, selective resource activation and scheduling, semantic-payload lightweighting, incremental updating, and first-screen memory control. These results support its role as a compact state-control layer within tiled Web3D pipelines.

6.2. Key Innovations

This study introduces three main methodological innovations. First, the B-D-C-T-S framework defines state-level grading rules for Boundary-based Spatial Proxy, Geometric Detail, Component Complexity, Texture Appearance, and Semantic Information. Orthogonality is treated as organizational separability rather than unconstrained executability, because valid states must satisfy spatial-carrier, topological-support, and rendering-logic constraints.

Second, the state-word-based scheduling mechanism represents runtime states with compact 2-byte bit-field instructions. Fixed-offset bitmask operations and XOR-based differencing support fast field extraction, changed-dimension identification, incremental scheduling, and constraint-aware degradation.

Third, the Home Tile strategy reduces semantic redundancy for cross-tile objects through centroid-assigned complete records and lightweight guest-tile pointer records, supporting low-redundancy semantic routing after tiled spatial selection.

6.3. Research Limitations

Several limitations remain. First, the validation focuses on controlled prototype experiments rather than full end-to-end deployment in heterogeneous WebGIS environments. Although the framework complements 3D Tiles- or glTF-based pipelines, this study did not implement a production-level 3D Tiles baseline or a large-scale Cesium-based comparison. The comparison with tiled streaming standards is therefore architectural rather than benchmark-equivalent.

Second, high-latency and low-bandwidth network conditions were not fully evaluated through end-to-end experiments. Fine-grained decoupling can reduce redundant requested payloads, but it may increase discrete chunk requests during rapid navigation. Request coordination, cache reuse, visible-object prioritization, request merging, HTTP/2 multiplexing, and batching can mitigate request fragmentation, but they cannot eliminate latency or request-management overhead. Their effectiveness depends on server configuration, round-trip time, chunk granularity, cache hit rate, and user behavior.

Third, the preprocessing cost remains a limitation. The full-volume preprocessing time is an extrapolated estimate and should not be interpreted as a general scaling law. In larger industrial datasets, spatial partitioning, KD-tree construction, grid cutting, sub-mesh separation, virtual monomerization, and texture-related processing may grow non-linearly because of topology complexity, memory locality, and I/O contention. The current serial pipeline may require parallel or distributed support for GB- to TB-scale deployment.

Fourth, visual continuity and cross-platform validation remain insufficient. Constraint-aware degradation can prevent invalid logical states and unsupported component attachment, but it does not eliminate perceptual discontinuities when D1 proxies are replaced by D3 geometry or high-resolution T3 textures. Mobile devices and low-resource clients also impose stricter limits on memory, GPU resources, thermal control, and browser resource management.

6.4. Future Work

Future work should focus on four directions. First, the state-control layer should be integrated into practical 3D Tiles-, glTF-, or Cesium-based pipelines and compared with conventional tiled streaming under consistent scene coverage, camera paths, cache policies, and device conditions.

Second, network-adaptive scheduling should use runtime indicators such as round-trip time, bandwidth fluctuation, request queue length, cache hit rate, and main-thread workload.

Third, preprocessing should move toward scalable parallel and distributed workflows, including multi-process tiling, cluster-based spatial partitioning, GPU-assisted preprocessing, and Spark/MPI-style processing.

Fourth, graphics-pipeline-level transition mechanisms should improve visual continuity through asynchronous alpha blending, progressive texture fading, temporal smoothing, geomorphing, or WebGL transition shaders. Broader evaluations should also include mobile devices, low-memory clients, different browsers, and real WebGIS network environments.

Author Contributions

Conceptualization, Tong Zhang and Yunfei Shi; Methodology, Yunfei Shi, Chunguang Lyu and Shuangshuang Shi; Software, Tong Zhang and Wenjie Jiang; Validation, Tong Zhang, Wenjie Jiang and Yunfei Shi; Data curation, Tong Zhang and Wenjie Jiang; Writing—original draft preparation, Tong Zhang and Wenjie Jiang; Writing—review and editing, Yunfei Shi, Chunguang Lyu and Shuangshuang Shi; Visualization, Tong Zhang and Wenjie Jiang; Supervision, Yunfei Shi, Chunguang Lyu and Shuangshuang Shi; Funding acquisition, Yunfei Shi and Chunguang Lyu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (42171422) and the Shandong Provincial Natural Science Foundation (ZR2022MD071).

Data Availability Statement

The data are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. We also acknowledge Autodesk for providing the official Revit example models used for Dataset A. The Berlin 3D Photogrammetric Meshes (Dataset B) were provided by the Berlin Senate Department for Economics, Energy and Public Enterprises via the Berlin Business Location Center, and are used under the Data licence Germany—attribution—Version 2.0 (dl-de/by-2-0, data modified).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, Y.; Chen, S.; Hwang, K.; Ji, X.; Lei, Z.; Zhu, Y.; Ye, F.; Liu, M. Spatio-temporal data fusion techniques for modeling digital twin city. Geo-Spat. Inf. Sci. 2025, 28, 541–564.
Lin, H.; Guo, R.; Ma, D.; Kuai, X.; Yuan, Z.; Du, Z.; He, B. Digital-twin-based multi-scale simulation supports urban emergency management: A case study of urban epidemic transmission. Int. J. Digit. Earth 2024, 17, 2421950.
Yu, W.; Zhou, X.; Wang, D.; Dong, J. The development and construction of city information modeling (CIM): A survey from data perspective. Appl. Sci. 2025, 15, 4696.
Auer, M.; Zipf, A. 3D WebGIS: From visualization to analysis. An efficient browser-based 3D line-of-sight analysis. ISPRS Int. J. Geo-Inf. 2018, 7, 279.
Chen, Q.; Chen, J.; Huang, W. Visualizing large-scale building information modeling models within indoor and outdoor environments using a semantics-based method. ISPRS Int. J. Geo-Inf. 2021, 10, 756.
Biljecki, F. Level of Detail in 3D City Models. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2017.
Koskela, T.; Pouke, M.; Heikkinen, A.; Alatalo, T.; Alavesa, P.; Ojala, T. DRUMM: Dynamic viewing of large-scale 3D city models on the web. In Proceedings of the 2017 9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), Athens, Greece, 6–8 September 2017; pp. 8–14.
Zhan, W.; Chen, Y.; Chen, J. 3D Tiles-based high-efficiency visualization method for complex BIM models on the Web. ISPRS Int. J. Geo-Inf. 2021, 10, 476.
Gröger, G.; Plümer, L. CityGML—Interoperable semantic 3D city models. ISPRS J. Photogramm. Remote Sens. 2012, 71, 12–33.
Benner, J.; Geiger, A.; Gröger, G.; Häfele, K.-H.; Löwner, M.-O. Enhanced LOD concepts for virtual 3D city models. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, II-2/W1, 51–61.
Löwner, M.O.; Gröger, G.; Benner, J.; Biljecki, F.; Nagel, C. Proposal for a new LOD and multi-representation concept for CityGML. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, IV-2/W1, 3–12.
Löwner, M.-O.; Benner, J.; Gröger, G.; Häfele, K.-H. New concepts for structuring 3D city models—An extended level of detail concept for CityGML buildings. In Computational Science and Its Applications—ICCSA 2013; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7973, pp. 466–480.
Kolbe, T.H.; Kutzner, T.; Smyth, C.S.; Nagel, C.; Roensdorf, C.; Heazel, C. (Eds.) OGC City Geography Markup Language (CityGML) Part 1: Conceptual Model Standard Version 3.0.0; Open Geospatial Consortium: Wayland, MA, USA, 2021; Available online: https://docs.ogc.org/is/20-010/20-010.html (accessed on 1 March 2026).
Tang, L.; Li, L.; Ying, S.; Lei, Y. A full level-of-detail specification for 3D building models combining indoor and outdoor scenes. ISPRS Int. J. Geo-Inf. 2018, 7, 419.
Tang, L.; Ying, S.; Li, L.; Biljecki, F.; Zhu, H.; Zhu, Y.; Yang, F.; Su, F. An application-driven LOD modeling paradigm for 3D building models. ISPRS J. Photogramm. Remote Sens. 2020, 161, 194–207.
Lilley, S.; Cozzi, P.; Getz, G. (Eds.) OGC 3D Tiles 1.1 Specification; OGC Community Standard; OGC Doc. No. 22-025r4; Open Geospatial Consortium: Wayland, MA, USA, 2022; Available online: https://docs.ogc.org/cs/22-025r4/22-025r4.html (accessed on 1 March 2026).
Li, Y.; Katsipoulakis, N.R.; Chandramouli, B.; Goldstein, J.; Kossmann, D. Mison: A fast JSON parser for data analytics. Proc. VLDB Endow. 2017, 10, 1118–1129.
Gaillard, J.; Peytavie, A.; Gesquière, G. Visualisation and personalisation of multi-representations city models. Int. J. Digit. Earth 2020, 13, 627–644.
Blut, C.; Blut, T.; Blankenbach, J. CityGML goes mobile: Application of large 3D CityGML models on smartphones. Int. J. Digit. Earth 2019, 12, 25–42.
Khronos Group. glTF 2.0 Specification, version 2.0.1; Khronos Group: Beaverton, OR, USA, 2021. Available online: https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html (accessed on 30 April 2026).
Google. Draco 3D Data Compression. Available online: https://google.github.io/draco/ (accessed on 30 April 2026).
Cozzi, P.; Lilley, S.; Getz, G. (Eds.) 3D Tiles Specification 1.0; OGC Community Standard; OGC Doc. No. 18-053r2; Open Geospatial Consortium: Wayland, MA, USA, 2019; Available online: https://docs.ogc.org/cs/18-053r2/18-053r2.html (accessed on 1 March 2026).
CesiumGS. Batch Table. 3D Tiles Specification. Available online: https://github.com/CesiumGS/3d-tiles/blob/main/specification/TileFormats/BatchTable/README.adoc (accessed on 30 April 2026).
Song, Z.; Li, J. A dynamic tiles loading and scheduling strategy for massive oblique photogrammetry models. In Proceedings of the 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), Chongqing, China, 27–29 June 2018; pp. 648–652.
Limper, M.; Thöner, M.; Behr, J.; Fellner, D.W. SRC—A streamable format for generalized web-based 3D data transmission. In Proceedings of the 19th International ACM Conference on 3D Web Technologies (Web3D '14), Vancouver, BC, Canada, 8–10 August 2014; pp. 35–43.
Liu, X.; Xie, N.; Tang, K.; Jia, J. Lightweighting for Web3D visualization of large-scale BIM scenes in real-time. Graph. Models 2016, 88, 40–56.
Langdale, G.; Lemire, D. Parsing gigabytes of JSON per second. VLDB J. 2019, 28, 941–960.
Thomson, M.; Benfield, C. RFC 9113: HTTP/2; RFC Editor: Fremont, CA, USA, 2022.
Chen, Y.; Shooraj, E.; Rajabifard, A.; Sabri, S. From IFC to 3D Tiles: An integrated open-source solution for visualising BIMs on Cesium. ISPRS Int. J. Geo-Inf. 2018, 7, 393.
Xu, Z.; Zhang, L.; Li, H.; Lin, Y.-H.; Yin, S. Combining IFC and 3D tiles to create 3D visualization for building information modeling. Autom. Constr. 2020, 109, 102995.
Chen, J.; Li, M.; Li, J. Progressive visualization of complex 3D models over the Internet. Trans. GIS 2016, 20, 887–902.
Li, L.; Qiao, X.; Lu, Q.; Ren, P.; Lin, R. Rendering optimization for mobile Web 3D based on animation data separation and on-demand loading. IEEE Access 2020, 8, 88474–88486.
Lavoué, G.; Chevalier, L.; Dupont, F. Streaming compressed 3D data on the web using JavaScript and WebGL. In Proceedings of the 18th International Conference on 3D Web Technology (Web3D '13), San Sebastian, Spain, 20–22 June 2013; ACM: New York, NY, USA, 2013; pp. 19–27.
Wang, J.; Xu, Z.; Li, Y. A WebGL-based interactive visualization framework for large-scale urban seismic simulations with a dual multi-LOD strategy. Buildings 2025, 15, 2916.

Figure 1. Example configurations of two states in the 5D orthogonal decoupling framework.

Figure 2. Grading matrix of the B-D-C-T-S framework for 3D building models. The illustrated levels correspond to the quantitative and rule-based grading definitions in Section 3.2.

Figure 3. Schematic of the bit-field mapping strategy for the 16-bit structured state word.

Figure 4. Two-stage integration workflow of the proposed B-D-C-T-S state-control layer. Offline preprocessing generates tiled spatial indices, B-level proxy records, D/C/T chunks, S-level semantic records, and 16-bit state-word records; runtime scheduling performs tile selection, object-level state control, selective request coordination, and WebGL reassembly.

Figure 5. State-word-driven resource routing and Home Tile-based semantic routing. D/C/T fields are extracted and validated before resource requests are generated, while invalid target states are degraded before resource requests enter the runtime request queue. The S field controls embedded semantics, Home Tile retrieval, and guest-tile routing.

Figure 6. XOR-based incremental updating mechanism for state switching. The differential word identifies changed fields so that only affected proxy records, chunks, or semantic records are requested.

Figure 7. Visual comparison of staged decoupled reconstruction in Dataset A: (a) D1 coarse model; (b) reconstructed model after overlaying D3, C, and T.

Figure 8. Overview of the experimental data for Dataset B: (a) local slice of the city; (b) large-scale regional model.

Figure 9. Browser-side CPU micro-benchmark of JSON-based C-field extraction and 16-bit state-word C-field extraction across different object scales. Reported times are mean normalized values per parsing pass from repeated runs. Error bars denote standard deviations, and the y-axis uses a logarithmic scale.

Figure 10. Schematic runtime trace of constraint conflict detection and safe degradation. The abnormal instruction 0xD400 requested C2 components under a D1 host-geometry state; the runtime state machine cleared the C field, set the R flag to 1, and generated the valid degraded instruction 0xD001.

Figure 11. Semantic payload comparison for distribution-based representative spatial tiles under S-Full and S-Home organization. S-Full stores complete semantic records in every involved tile, whereas S-Home stores complete records only in Home Tiles and uses lightweight pointer records in guest tiles.

Table 1. Monitoring data of transmission-level separation during staged loading.

Stage	State Change	Requested Chunk	Geometry Reload	Component Request	Texture Request	Interpretation
1	Initial D1	D1-Chunk	No	No	No	D-targeted initial loading
2	D1 → D3	D3-Chunk	No	No	No	Geometry refinement only
3	D3 → D3 + C	C-Chunk	No	No	No	Component-only addition
4	D3 + C → D3 + C + T	T-Chunk	No	No	No	Texture-only addition

Table 2. Comparison of requested payloads under monolithic loading and staged decoupled loading for Dataset A.

Loading Scheme/Phase	Loaded Content	Requested Payload (MB)	Relative to Full Model (%)	Reduction Compared with Full Model (%)	Interpretation
Conventional monolithic loading	Full GLB model	8.31	100.0	0.0	Entire model requested during the initial screen phase.
Proposed staged decoupled loading, initial stage	D1 coarse model	0.57	6.9	93.1	First-screen payload reduced through coarse geometric placeholder rendering.
Proposed staged decoupled loading, geometry-refinement stage	D3 host geometry	0.78	9.4	90.6	Refined host geometry requested separately after texture-separated preprocessing.

Table note: Requested payloads refer to requested resource sizes recorded in the controlled local Web loading environment. They do not represent end-to-end network latency.

Table 3. Comparison of JSON-based and state-word-based C-field extraction performance (N = 100,000).

Metric	Control Group (JSON Mode)	Experimental Group (State Word Mode)	Optimization Effect
C-field parsing/extraction time	~13.05 ms	~0.090 ms	~144× faster
Calculated frame-budget occupancy, 60 FPS	~78.27%	~0.54%	Lower calculated frame-budget occupancy
Encoded state representation size	~49 bytes per UTF-8 JSON record	2 bytes per Uint16 state word	More compact encoded representation
Parsing mechanism	JSON deserialization and C-field lookup	Fixed-offset bitwise C-field extraction	C-field extraction through fixed bit positions

Table note: Times are mean normalized browser-side CPU micro-benchmark results. Each official run was normalized from 1000 repeated passes over N records. Rendering, loading, network transmission, and full heap profiling were excluded. Frame-budget occupancy was calculated from a 16.67 ms frame budget and does not represent measured rendering time. JSON and state-word tests were run sequentially with alternating order.
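
The two extraction paths compared in Table 3, and the frame-budget arithmetic in the table note, can be sketched as below. The JSON record shape and the position of the C field are hypothetical; the sketch only shows that both paths return the same C value and how the occupancy figures follow from the 16.67 ms budget. It is not a reproduction of the browser-side benchmark.

```python
import json
from array import array

C_SHIFT, C_MASK = 9, 0b111  # hypothetical position of the C field


def c_from_json(record: str) -> int:
    """Control path: deserialize the record, then look up the C field."""
    return json.loads(record)["C"]


def c_from_word(word: int) -> int:
    """State-word path: fixed-offset shift and mask, no allocation."""
    return (word >> C_SHIFT) & C_MASK


# Three illustrative 16-bit state words stored as unsigned shorts.
words = array("H", [0xD400, 0xD000, 0xD200])
# Hypothetical JSON records carrying the same C values.
records = [
    json.dumps({"B": 3, "D": 1, "C": c_from_word(w), "T": 0, "S": 0})
    for w in words
]
assert [c_from_json(r) for r in records] == [c_from_word(w) for w in words]

FRAME_BUDGET_MS = 1000 / 60  # 16.67 ms at 60 FPS


def frame_budget_occupancy(elapsed_ms: float) -> float:
    """Share of one 60 FPS frame budget, in percent."""
    return 100.0 * elapsed_ms / FRAME_BUDGET_MS


print(round(frame_budget_occupancy(13.05), 2))   # JSON mode
print(round(frame_budget_occupancy(0.090), 2))   # state-word mode
```

With the rounded times from Table 3 the JSON-mode occupancy comes out near 78.3%, close to the tabulated ~78.27%; the small gap is presumably due to the table using unrounded timings.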

Table 4. Comparison of the requested payload under the representative T-dimension switching case.

Scheme Type	Scheduling Method	Requested Content	Requested Payload (KB)	Relative Payload Proportion
Baseline full-package reload	No differential scheduling	Full Dataset A package, geometry + texture	8514.30	100.00%
Proposed XOR-based scheduling	T-dimension differential request	T3-Chunk, extended texture-appearance payload	3569.23	41.92%

Table note: Payload values refer to requested resource sizes read from the current validation files in the controlled local environment. They do not represent end-to-end network latency or throughput.
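
A minimal sketch of XOR-based differencing for the T-dimension switching case in Table 4 is given below. The field layout is hypothetical and the T0 to T3 transition is only an illustrative choice; the point is that XOR of the old and new state words leaves non-zero bits only in the fields that changed, so only the corresponding chunk needs to be requested.

```python
# Hypothetical field layout: name -> (shift, mask). Not taken from the paper.
FIELDS = {
    "B": (14, 0b11),
    "D": (12, 0b11),
    "C": (9, 0b111),
    "T": (6, 0b111),
    "S": (3, 0b111),
}


def changed_fields(old: int, new: int) -> list[str]:
    """Return the names of the fields whose bits differ between two words."""
    diff = old ^ new
    return [
        name
        for name, (shift, mask) in FIELDS.items()
        if (diff >> shift) & mask
    ]


old_word = 0xD000
new_word = old_word | (3 << 6)   # illustrative switch of the T field to level 3
print(changed_fields(old_word, new_word))  # ['T']

# Relative payload of the differential request, from the Table 4 values.
FULL_PACKAGE_KB = 8514.30
T3_CHUNK_KB = 3569.23
print(round(100 * T3_CHUNK_KB / FULL_PACKAGE_KB, 2))  # 41.92
```

Because unchanged fields XOR to zero, a T-only switch yields a single-entry list and the geometry and component chunks are never re-requested.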

Table 5. Comparison of computational behavior for B1 point proxy and B3-like mesh workload.

Evaluation Metric	B1 Point Proxy	B3-like Mesh Workload	Interpretation
Proxy payload/reference payload	117 KB	69.0 MB compressed mesh reference	Lightweight proxy vs. refined-mesh reference
CPU-side processing time	~0.02 ms/10,000 points	~0.37 ms/50,000-triangle workload	Macro screening vs. local precise picking
Estimated cost for 10,000-object macro traversal	~0.02 ms, measured	~3.7 s, extrapolated	Applying B3-like traversal to macro screening would cause blocking
Applicable tasks	Retrieval, filtering, heat-map-style analysis	Picking, collision, shadow, occlusion	Task-adaptive proxy selection

Table note: CPU-side times are normalized browser-side micro-benchmark results, not end-to-end interaction latency. B1 directly used Points_B1.bin, while B3 timing used a deterministic synthetic 50,000-triangle ray-triangle workload. The 69.0 MB value is only the compressed refined-mesh reference payload and was not used as the direct timing workload. The 10,000-object B3 cost is a naive extrapolated estimate, not a measured full-scene traversal time.

Table 6. Comparison of semantic payloads in distribution-based representative tiles and across all monitored tiles.

Tile ID	Representative Role	Involved Building-Tile Records	S-Full Payload (KB)	S-Home Payload (KB)	Payload Reduction (%)
Tile 4_5	Minimum reduction	20	50.0	42.8	14.4
Tile 2_0	Lower-quartile-nearest	16	40.0	25.6	36.0
Tile 0_7	Median/overall-nearest	13	32.5	18.1	44.3
Tile 6_2	Upper-quartile-nearest	14	35.0	15.8	54.8
Tile 2_1	Maximum reduction	16	40.0	11.2	72.0
Overall, 64 tiles	Aggregate result	919	2298.46	1285.50	44.1

Table note: Representative tiles were selected by a distribution-based criterion covering the minimum, lower-quartile-nearest, median/overall-nearest, upper-quartile-nearest, and maximum reduction cases among the 64 monitored tiles. The “Overall, 64 tiles” row reports all building-tile associations. Because cross-tile objects may involve multiple tiles, the number of building-tile records exceeds the 500 virtual building proxy objects. Aggregate payloads were calculated from generated records; 2.5 KB is an approximate controlled full-record size, so the aggregate S-Full payload may differ slightly from 919 × 2.5 KB.
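
The S-Full and S-Home accounting behind Table 6 can be illustrated with the sketch below. The 2.5 KB full-record size comes from the table note; the 0.1 KB pointer-record size, the regular-grid tile lookup, and the 17/3 split of home and guest records are assumptions made for illustration. Under those assumptions the result happens to match the Tile 4_5 row.

```python
FULL_RECORD_KB = 2.5     # approximate full-record size from the table note
POINTER_RECORD_KB = 0.1  # assumed size of a lightweight pointer record


def home_tile(centroid, origin, tile_size):
    """Assign an object to the tile containing its centroid (regular grid assumed)."""
    col = int((centroid[0] - origin[0]) // tile_size)
    row = int((centroid[1] - origin[1]) // tile_size)
    return col, row


def semantic_payloads(involved_records: int, home_records: int):
    """Return (S-Full KB, S-Home KB, reduction %) for one tile."""
    guest_records = involved_records - home_records
    s_full = involved_records * FULL_RECORD_KB
    s_home = home_records * FULL_RECORD_KB + guest_records * POINTER_RECORD_KB
    return s_full, s_home, 100.0 * (1.0 - s_home / s_full)


# A building whose centroid falls in grid cell (1, 0) is homed there,
# even if its footprint also overlaps neighbouring tiles.
print(home_tile((130.0, 45.0), (0.0, 0.0), 100.0))

# Hypothetical tile with 20 involved records, 17 of them homed in this tile.
s_full, s_home, reduction = semantic_payloads(20, 17)
print(round(s_full, 1), round(s_home, 1), round(reduction, 1))
```

The reduction grows with the share of guest records, which is consistent with the spread between the minimum and maximum reduction tiles in the table.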

Table 7. Process-level stable memory footprint under monolithic loading and tested first-screen D1 tile scheduling.

Loading Strategy	Loaded Physical Payload	Stable Resident Memory	Net Stable Overhead	Resident Expansion Ratio
Traditional monolithic loading	1.12 GB	~6749.2 MB	~6721.2 MB	~5.86×
State-word-driven first-screen D1 tile scheduling	22.7 MB	~88.1 MB	~60.1 MB	~2.65×

Table note: Loaded physical payload refers to the total size of the model files requested for the tested loading strategy before runtime decoding, GPU buffer allocation, and rendering. The state-word-driven group loaded only the 15 D1-level tiles required by the tested first-screen view. Net stable overhead was calculated by subtracting the approximately 28 MB baseline of the browser/process environment from the stable resident memory. The resident expansion ratio was calculated using the reported physical payload values, with the 1.12 GB monolithic payload treated as approximately 1.12 × 1024 MB; the tabulated ratios are consistent with dividing the net stable overhead by the loaded physical payload (6721.2/1146.88 ≈ 5.86 and 60.1/22.7 ≈ 2.65). The results should not be interpreted as full-fidelity loading of the complete 1.12 GB dataset.
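
The derived columns of Table 7 can be recomputed from the reported values as a consistency check. The sketch assumes that the expansion ratio is the net stable overhead divided by the loaded physical payload, which is the reading that reproduces the tabulated ratios; the function name is illustrative.

```python
BASELINE_MB = 28.0  # approximate browser/process baseline from the table note


def net_overhead_and_ratio(payload_mb: float, resident_mb: float):
    """Return (net stable overhead in MB, overhead-to-payload ratio)."""
    net_mb = resident_mb - BASELINE_MB
    return net_mb, net_mb / payload_mb


# Monolithic loading: 1.12 GB treated as 1.12 x 1024 MB, as in the table note.
mono_net, mono_ratio = net_overhead_and_ratio(1.12 * 1024, 6749.2)
# State-word-driven first-screen D1 tile scheduling.
d1_net, d1_ratio = net_overhead_and_ratio(22.7, 88.1)

print(round(mono_net, 1), round(mono_ratio, 2))  # 6721.2 5.86
print(round(d1_net, 1), round(d1_ratio, 2))      # 60.1 2.65
```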

Table 8. Offline preprocessing time and extrapolated full-volume estimates for Dataset B.

Phase	Main Operation	Local 120 MB (min)	Extrapolated 1.12 GB (min)
Spatial partitioning and virtual monomerization, B/S	Seed generation, KD-tree assignment, sub-mesh separation	2.12 ± 0.02	19.75 ± 0.20
Geometric reduction, D	Feature extraction and proxy/LOD processing	0.04 ± 0.00	0.35 ± 0.01
Approximate texture-appearance reassembly, T	UV remapping and appearance reassembly	0.03 ± 0.00	0.24 ± 0.00
State-word compilation	16-bit state-word encoding	<0.01	<0.1
Total	—	2.18 ± 0.02	20.35 ± 0.21

Table note: Values are based on the 120 MB no-Draco local slice. Full-volume estimates were extrapolated using a factor of 9.3333 from unrounded timing records and are not a general scaling law. The 1.12 GB full-volume model was treated as 1120 MB only for this preprocessing extrapolation. Initial loading and full texture-atlas repacking were excluded.
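
The extrapolation factor in the table note follows directly from the two volumes, as the short sketch below shows. Applying it to the rounded per-phase times printed in Table 8 gives values slightly different from the tabulated estimates, which the note attributes to extrapolation from unrounded timing records; the variable names are illustrative.

```python
LOCAL_SLICE_MB = 120.0
FULL_VOLUME_MB = 1120.0   # 1.12 GB treated as 1120 MB for this extrapolation only

factor = FULL_VOLUME_MB / LOCAL_SLICE_MB
print(round(factor, 4))  # 9.3333

# Rounded local timings (min) as printed in Table 8.
local_minutes = {
    "partitioning_and_monomerization": 2.12,
    "geometric_reduction": 0.04,
    "texture_reassembly": 0.03,
}
for phase, minutes in local_minutes.items():
    # Linear scaling by data volume; not a general scaling law.
    print(phase, round(minutes * factor, 2))
```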

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

