IB-PC: An Information Bottleneck Framework for Point Cloud-Based Building Information Modeling

Zhang, Yameng; Xie, Bingxue; Xu, Ting; Bi, Yanqiu; Luo, Zhongbin

doi:10.3390/electronics14224399

by Yameng Zhang ¹, Bingxue Xie ², Ting Xu ³, Yanqiu Bi ⁴,* and Zhongbin Luo ⁵,⁶,*

¹ Taubman College of Architecture and Urban Planning, University of Michigan–Ann Arbor, Ann Arbor, MI 48109, USA
² School of Engineering and Applied Science, University of Virginia, Charlottesville, VA 22903, USA
³ Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
⁴ School of Civil Engineering, Chongqing Jiaotong University, Chongqing 400074, China
⁵ College of Computer Science, Chongqing University, Chongqing 400044, China
⁶ China Merchants Chongqing Communications Research & Design Institute Co., Ltd., Chongqing 400067, China
* Authors to whom correspondence should be addressed.

Electronics 2025, 14(22), 4399; https://doi.org/10.3390/electronics14224399

Submission received: 20 October 2025 / Revised: 4 November 2025 / Accepted: 5 November 2025 / Published: 12 November 2025

(This article belongs to the Special Issue Computer Vision and Image Processing in Machine Learning)


Abstract

Accurate semantic interpretation of 3D point clouds is essential for the digital transformation of architecture, engineering, and construction (AEC). Building Information Modeling (BIM) depends on both geometric precision and semantic consistency, yet raw scans are typically noisy, redundant, and computationally expensive to process. This work presents an Information Bottleneck (IB) formulation that regularizes latent features to preserve only task-relevant information, yielding compact and interpretable representations within point-based neural networks. Our method, named IB-PC (Information Bottleneck for Point Clouds), introduces an IB layer as an auxiliary regularization task alongside supervised prediction, encouraging information compression and improving model robustness. Evaluations are conducted on two representative benchmarks, Semantic3D (outdoor) and TUM RGB-D (indoor), across five criteria: (i) segmentation accuracy, (ii) calibration, (iii) robustness to noise and occlusion, (iv) computational efficiency, and (v) structural fidelity of architectural elements. The IB-regularized model consistently improves mean Intersection over Union (mIoU), overall accuracy, and macro F1, while reducing Expected Calibration Error (ECE) and Negative Log-Likelihood (NLL). The model remains stable under noise, occlusion, and varying point densities, and yields more consistent segmentation of architectural components such as walls, floors, and columns. These improvements are achieved with roughly 30% fewer FLOPs and reduced memory consumption, demonstrating the method’s efficiency and suitability for large-scale scan-to-BIM and digital twin applications.

Keywords: Building Information Modeling; point cloud; Information Bottleneck; model calibration

1. Introduction

Artificial Intelligence (AI) is transforming the architecture, engineering, and construction (AEC) industry by enabling data-driven design, intelligent construction planning, and predictive building management across the entire project lifecycle [1]. Generative AI models, such as generative adversarial networks (GANs) [2] and diffusion-based frameworks [3], rapidly generate structural and spatial alternatives [4], whereas predictive learning models and multi-objective optimization frameworks facilitate the evaluation of retrofit strategies [5] by balancing energy efficiency, economic cost, and environmental impact across urban contexts [6]. Digital twins [7] and Building Information Modeling (BIM) [8,9] together enable the integration of multi-source architectural and operational data into dynamic, computable representations of the built environment, bridging static design models with real-time performance monitoring across the building lifecycle. Together, these developments reflect a data-driven transformation of architecture, emphasizing the need for reliable and transparent computational approaches to support sustainable and efficient built environments [10].

Among the available data modalities for representing the built environment, 3D point clouds [11] have become fundamental tools in both visual computing and architectural documentation [12]. Acquired through terrestrial LiDAR, photogrammetry, or RGB-D scanning, point clouds provide precise geometric measurements of buildings and spaces, forming the basis for as-built modeling, renovation, and scan-to-BIM workflows [13]. However, efficient point cloud analysis remains difficult: dense scans are often redundant and memory-intensive, sensor noise and occlusions introduce distributional variability, and architectural tasks require high semantic precision to delineate boundaries between elements such as walls, doors, and columns. Recent work has therefore explored methods for reducing spatial and temporal redundancy in point cloud representations to enable real-time processing without compromising accuracy [14,15]. These factors highlight the need for representations that are compact to reduce redundancy, robust to noise and occlusion, and semantically consistent in describing architectural elements.

The Information Bottleneck (IB) principle [16,17] provides a rigorous theoretical lens for understanding how learning systems compress and preserve information. In this framework, the input signal $X$, which may contain both task-relevant and irrelevant details, is transformed into a latent representation $Z$, for example through a neural network encoder. The objective is to filter $X$ through this “bottleneck” so that $Z$ becomes a compact summary that retains only the information necessary for predicting the target variable $Y$, such as semantic categories or structural attributes, while discarding nuisance variability. Formally, this is achieved by maximizing the mutual information $I(Z; Y)$ under a constraint on $I(Z; X)$, thereby balancing sufficiency and compression. Intuitively, the IB framework describes how an effective representation emerges when the model learns to preserve what matters for the task and ignore what does not.

The Variational Information Bottleneck (VIB) [18,19] reformulates the intractable IB objective into a variational lower bound that can be optimized within deep neural networks. By introducing stochastic encoders and applying the reparameterization trick, VIB enables end-to-end training via gradient descent. This practical formulation preserves the original goal of the IB principle, maintaining a trade-off between retaining task-relevant information and suppressing input redundancy, while remaining computationally feasible for large-scale learning. It has also been applied to structured representation learning, such as disentangling identity and expression in 3D facial modeling through mutual-information regularization [20], as well as 3D self-supervised representation learning [21].

This perspective aligns naturally with architectural point-cloud analysis. Compact embeddings reduce computational cost and energy usage, enabling large-scale BIM and digital twin workflows. Robust latent features improve stability under noise, occlusion, and uneven sampling, while sufficient representations preserve geometric regularities that define architectural semantics. Thus, the Information Bottleneck (IB) principle provides not only a theoretical foundation but also a practical guideline for constructing efficient and reliable point-cloud networks in the built environment.

This paper presents an IB-regularized framework for 3D point-cloud learning in architectural analysis; we name it IB-PC. We position this work within the broader development of 3D neural architectures, ranging from global PointNet [22] to hierarchical PointNet++ [23] and transformer-based networks [24]. While these models have achieved high accuracy, they often overlook calibration, computational efficiency, and structural fidelity, factors that are critical in architectural applications [25,26]. By integrating IB constraints into point-cloud backbones, the proposed IB-PC framework jointly optimizes accuracy, robustness, and efficiency. Beyond architectural integration, the novelty lies in redefining point-cloud representation learning as an information-theoretic trade-off between compression and generalization, offering a unified and interpretable perspective that guides both model design and evaluation.

The overall objective of this work is to develop a robust and information-efficient learning framework for 3D architectural point-cloud analysis, guided by the Information Bottleneck principle. The main contributions of this study are as follows:

We develop an IB-PC formulation that redefines point-cloud representation learning under an information-theoretic trade-off between compression and sufficiency, providing both theoretical insight and practical regularization for architectural point clouds.
We establish an extended evaluation protocol encompassing not only accuracy but also calibration (ECE, Brier score, NLL), robustness to degradation (noise, occlusion, density variation), efficiency (FLOPs, parameters, latency, memory), and structural fidelity (residuals for planes and cylinders, inlier ratios).
We demonstrate consistent performance improvements on indoor (TUM RGB-D) [27] and outdoor (Semantic3D) [28] datasets, showing that IB-PC models yield more favorable trade-offs than baselines such as random sampling, PCA, and autoencoders.

This paper is organized as follows: Section 2 reviews related work on the Information Bottleneck (IB) principle and its applications. Section 3 presents the problem formulation, defining the research questions and introducing the core concepts of IB. Section 4 describes the proposed method, including preprocessing, encoder, IB layer, and decoder/classifier design. Section 5 outlines the experimental setup, covering evaluation protocols, metrics, baselines, results, and analyses.

Overall, this work introduces the Information Bottleneck as a principled framework for learning compact, robust, and geometrically consistent representations in architectural point-cloud analysis, where efficiency, robustness, and semantic preservation arise naturally from the representation learning process rather than being treated as auxiliary objectives.

2. Related Work

2.1. Information Bottleneck and Its Variational Formulation

The Information Bottleneck (IB) framework [16] formulates representation learning as a trade-off between compression and prediction. It seeks a latent representation Z that retains the information relevant to the target Y while discarding redundancy in the input X. This principle captures how compact yet task-relevant structure can be extracted from high-dimensional observations.

To make the IB principle computationally tractable, Alemi et al. [18] introduced a variational approximation that replaces the intractable mutual information terms with Kullback–Leibler (KL) divergences. Under this formulation, the encoder $q_{\phi}(z \mid x)$ and decoder $p_{\theta}(y \mid z)$ jointly optimize a trade-off between predictive accuracy and compression regularization, leading to the well-known Variational Information Bottleneck (VIB) model. The approach is typically trained end-to-end using the reparameterization trick $z = \mu_{\phi}(x) + \sigma_{\phi}(x) \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$, to enable differentiable sampling and stable gradient estimation.
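For reference, when the encoder is a diagonal Gaussian and the prior is a standard normal (the usual VIB choice, assumed here; $k$ denotes the latent dimension, a symbol introduced only for this note), the KL regularizer has a well-known closed form and needs no sampling:

```latex
\mathrm{KL}\!\left( \mathcal{N}\!\big(\mu, \operatorname{diag}(\sigma^{2})\big) \,\big\|\, \mathcal{N}(0, I) \right)
= \frac{1}{2} \sum_{j=1}^{k} \left( \mu_{j}^{2} + \sigma_{j}^{2} - \log \sigma_{j}^{2} - 1 \right)
```

The expression is zero exactly when $\mu = 0$ and $\sigma = 1$, so the penalty grows as the encoder moves away from the prior.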

2.2. Extensions and Applications

Beyond its original formulation, the Information Bottleneck framework has given rise to various extensions that adapt the mutual information objective to different settings. These include kernel- and divergence-based variants, such as the CS-IB model [29], which replace or generalize the Kullback–Leibler divergence to improve flexibility in continuous and structured domains. Regularization schemes derived from the Information Bottleneck principle have been applied to a variety of data modalities, ranging from visual and geometric representations to language models. These studies consistently report enhanced calibration, robustness to noise, and interpretability of learned representations [30,31].

Information-theoretic approaches have also been employed to enhance interpretability in 3D point-cloud analysis. For instance, InfoCons [32] decomposes point clouds into semantically coherent subsets whose information contribution to model predictions can be quantified, thereby linking geometric structure with explanatory relevance.

2.3. Learning-Based and Compression-Based Architectures

Point-based learning has evolved rapidly from multilayer perceptron (MLP)-based models to transformer architectures [33]. PointNet [22] first enabled direct learning on unordered point sets through shared MLPs and symmetric pooling, ensuring permutation invariance without voxelization. PointNet++ [23] extended this idea by applying PointNet hierarchically on local neighborhoods, capturing fine-grained spatial features across multiple scales. Recent transformer-based models further improved contextual reasoning: Point Transformer [34] employs self-attention to model long-range dependencies, while Point Transformer V3 (PTv3) [35] emphasizes scalability and simplicity through efficient neighbor mapping and large receptive fields. Together, these architectures represent a shift from MLP-based encoders toward attention-driven frameworks, forming a solid foundation for the proposed IB-PC regularization.

In parallel, compression-based baselines evaluate the impact of representation compactness independent of network design. Random Subsampling (RS) uniformly reduces point density, PCA Projection provides linear dimensionality reduction, and Autoencoders (AE) [36] learn non-linear latent embeddings through reconstruction. These baselines bridge geometric simplification and learned compression, offering a complementary perspective to the Information Bottleneck-based formulation adopted in this work.

3. Problem Formulation

The learning objective follows a supervised information-theoretic formulation under the Information Bottleneck (IB) principle. Given a dataset $\{(x_{i}, y_{i})\}_{i=1}^{N}$ with $C$ semantic classes, the goal is to learn a latent representation $Z$ that retains information useful for predicting the semantic labels $Y$ while discarding irrelevant or redundant details in the input $X$.

This principle defines a trade-off between compression and prediction: the representation should be as compact as possible without losing task-relevant information. In practice, mutual information terms $I(X; Z)$ and $I(Z; Y)$ are not computed directly but estimated through a variational approximation, where the relevance to $Y$ is optimized via supervised learning and the redundancy from $X$ is constrained by a Kullback–Leibler (KL) regularizer. The resulting variational loss, which operationalizes this trade-off, is presented in Section 4.3.
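As a brief sketch of how this approximation arises, following the standard VIB argument of [18], both mutual information terms admit tractable bounds. The stochastic encoder $q_{\phi}(z \mid x)$, the decoder $p_{\theta}(y \mid z)$, and the variational marginal $r(z)$ are notation assumed here ahead of Section 4.3:

```latex
I(Z; Y) \;\ge\; \mathbb{E}_{p(x, y)}\, \mathbb{E}_{q_{\phi}(z \mid x)} \big[ \log p_{\theta}(y \mid z) \big] + H(Y),
\qquad
I(X; Z) \;\le\; \mathbb{E}_{p(x)} \big[ \mathrm{KL}\big( q_{\phi}(z \mid x) \,\|\, r(z) \big) \big]
```

Both inequalities follow from the non-negativity of the KL divergence. Because $H(Y)$ does not depend on the model, maximizing the first bound amounts to minimizing a cross-entropy loss, while the second bound is the KL regularizer.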

4. Method

This section introduces the overall architecture and its main components. Figure 1 illustrates the end-to-end pipeline that integrates the Information Bottleneck (IB) principle into 3D point-cloud learning. The framework consists of four key stages: preprocessing of raw point clouds, feature encoding, stochastic compression through the IB layer, and decoding/classification. We refer to the proposed approach as IB-PC, emphasizing its integration of the Information Bottleneck principle into point-cloud learning.

4.1. Preprocessing

Given the irregular and unordered nature of 3D point clouds, a standardized preprocessing pipeline is employed to ensure geometric consistency and computational efficiency:

Normalization: Each point block is centered by subtracting its centroid and scaled to fit within a unit sphere, removing translation and scale bias.
Voxelization: The spatial density is regularized by discretizing the 3D space into voxels, with one representative point per voxel. The voxel size (2–8 cm for indoor scenes, 5–20 cm for outdoor scenes) balances fidelity and efficiency.
Feature Augmentation: In addition to coordinates, RGB color, surface normals (via PCA), and relative height are concatenated to enrich the representation.
Block Sampling: Overlapping blocks of 4k–8k points are randomly extracted to form training batches, ensuring diversity and memory feasibility.

To mitigate class imbalance, especially for small categories such as doors or windows, class-balanced sampling and focal loss weighting are applied.
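The geometric steps above can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: the function names are hypothetical, the voxel representative is simply the first point encountered in each voxel, and block sampling is reduced to a random draw rather than an overlapping spatial crop. Since voxel sizes are given in centimeters, the sketch assumes voxelization is applied to metric coordinates before normalization.

```python
import numpy as np

def normalize_block(points):
    """Center a block on its centroid and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    return centered / max(radius, 1e-12)

def voxel_downsample(points, voxel_size):
    """Keep one representative point (the first encountered) per occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

def sample_block(points, block_size, rng):
    """Draw at most `block_size` points without replacement (simplified block sampling)."""
    n = min(block_size, len(points))
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx]

# Example: a synthetic 2 m cube, 5 cm voxels (within the stated indoor range), 4k block.
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 2.0, size=(5000, 3))
block = sample_block(normalize_block(voxel_downsample(raw, 0.05)), 4096, rng)
```

Feature augmentation (color, normals, relative height) would be concatenated to the coordinates after these steps and is omitted here for brevity.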

4.2. Encoder Network

We consider a supervised mapping from a 3D point cloud to its semantic labels. Let $X = \{x_{i}\}_{i=1}^{N}$, $x_{i} \in \mathbb{R}^{d}$, where each point $x_{i}$ carries geometric and photometric features. The encoder network $f_{\phi}$ extracts a latent representation $Z$:

$Z = f_{\phi}(X),$ (1)

where $Z$ captures semantically relevant and spatially structured information. For segmentation tasks, $Z$ retains per-point embeddings, while for object recognition, features can be aggregated globally.
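To make the distinction between per-point and global features concrete, the sketch below uses a PointNet-style shared MLP as a stand-in encoder. It is an illustrative assumption, not the hierarchical PointNet++ backbone used in the paper; the layer sizes, the choice of $d = 6$ input features, and the helper names are hypothetical.

```python
import numpy as np

def init_weights(dims, rng):
    """Random weights for a small MLP with layer widths `dims`."""
    return [(rng.normal(0.0, 0.1, size=(i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def shared_mlp(x, weights):
    """Apply the same ReLU MLP to every point (each row of x) independently."""
    h = x
    for w, b in weights:
        h = np.maximum(h @ w + b, 0.0)
    return h

rng = np.random.default_rng(0)
weights = init_weights([6, 32, 64], rng)  # d = 6 is assumed (e.g., xyz + rgb)
x = rng.normal(size=(1024, 6))            # one block of N = 1024 points

z_point = shared_mlp(x, weights)  # (N, 64): per-point embeddings for segmentation
z_global = z_point.max(axis=0)    # (64,): symmetric pooling for object recognition
```

Because the MLP acts on each point separately and max pooling is symmetric, reordering the input permutes the rows of `z_point` but leaves `z_global` unchanged.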

4.3. Information Bottleneck Layer

To enhance generalization and robustness, a stochastic bottleneck layer is introduced after the encoder. The overall integration of this module into the PointNet++ backbone is illustrated in Figure 2, where the IB layer follows the encoder and regularizes the latent feature distribution.

Following the Information Bottleneck (IB) principle, the latent variable $Z$ is optimized to preserve only the information relevant to the prediction target $Y$, while compressing the redundant content in the input $X$. This trade-off is formulated as

$\min_{p(z \mid x)} \; I(X; Z) - \beta\, I(Z; Y), \quad \beta > 0,$ (2)

where $\beta$ controls the balance between compression and predictive sufficiency. A variational approximation replaces $I(X; Z)$ with a Kullback–Leibler (KL) term against a fixed prior $r(z)$ and $I(Z; Y)$ with a supervised log-likelihood. After rescaling the objective so that $\beta$ weights the compression term, this yields

$\mathcal{L}_{IB} = \mathbb{E}_{p(x, y)}\left[-\log p_{\theta}(y \mid z)\right] + \beta\, \mathbb{E}_{p(x)}\left[\mathrm{KL}\left(q_{\phi}(z \mid x) \,\|\, r(z)\right)\right].$ (3)

Specifically, the encoded features are modeled as Gaussian variables parameterized by mean and variance:

$z = \mu_{\phi}(x) + \sigma_{\phi}(x) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$ (4)

where ⊙ denotes element-wise multiplication. This reparameterization enables differentiable sampling and encourages compact yet sufficient encodings. The resulting objective thus combines supervised prediction with KL-based information regularization, promoting stochastic compression and robustness. A complete derivation of the variational formulation is provided in Appendix A. The resulting latent representation Z is then passed to the decoder for downstream prediction.
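The sampling step of Equation (4) can be illustrated with a small NumPy sketch. The linear heads `w_mu` and `w_logvar`, and the choice to predict the log-variance rather than the standard deviation directly, are assumptions made for numerical convenience and are not taken from the paper.

```python
import numpy as np

def ib_layer(h, w_mu, w_logvar, rng=None):
    """Map encoder features h (N, d) to a latent code z (N, k).

    With an rng, z is sampled via the reparameterization of Equation (4).
    With rng=None, the layer is deterministic and returns the mean.
    Returns (z, mu, logvar).
    """
    mu = h @ w_mu
    logvar = h @ w_logvar  # log of sigma^2, keeps sigma positive
    if rng is None:
        return mu, mu, logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    return z, mu, logvar
```

Since the noise `eps` is drawn independently of the parameters, gradients can flow through `mu` and `logvar` in an automatic-differentiation framework; NumPy is used here only to show the forward computation.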

4.4. Decoder and Classifier

The decoder $g_{\theta}$ maps the compressed latent representations to per-point or per-object predictions:

$\hat{Y} = g_{\theta}(Z).$ (5)

In our implementation, the decoder functions as a lightweight classifier composed of multilayer perceptrons (MLPs) and feature interpolation layers, enabling efficient upsampling and semantic reconstruction. At test time, the model operates deterministically by using only the mean latent code $\mu_{\phi}(x)$.

During training, both the segmentation objective and the information regularization term are optimized jointly:

$\mathcal{L} = \mathcal{L}_{seg}(\hat{Y}, Y) + \lambda\, \mathrm{KL}\left(q_{\phi}(z \mid x) \,\|\, \mathcal{N}(0, I)\right),$ (6)

where $\lambda$ balances predictive sufficiency and compression, playing the role of $\beta$ in Equation (3) with the prior fixed to $r(z) = \mathcal{N}(0, I)$.
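A minimal sketch of the joint objective in Equation (6) is given below. It assumes that $\mathcal{L}_{seg}$ is a plain softmax cross-entropy (the focal weighting mentioned in Section 4.1 is omitted) and that the encoder outputs a mean and log-variance per point; the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over points; logits is (N, C), labels is (N,) of ints."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def total_loss(logits, labels, mu, logvar, lam):
    """Segmentation loss plus lam times the mean KL to N(0, I), as in Equation (6)."""
    seg = softmax_cross_entropy(logits, labels)
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=1).mean()
    return seg + lam * kl, seg, kl
```

Returning the two terms separately makes it easy to monitor whether the KL penalty is dominating the segmentation loss for a given $\lambda$.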

5. Experimental Setting

5.1. Datasets

Two representative datasets, as summarized in Table 1, are used to capture both indoor and outdoor architectural conditions:

TUM RGB-D (Indoor) (https://vision.in.tum.de/data/datasets/rgbd-dataset, accessed on 2 September 2025): Comprising RGB-D sequences of office and residential environments, this dataset highlights structural elements such as walls, floors, ceilings, windows, and furniture. It poses challenges related to occlusion, lighting, and sensor noise, closely reflecting indoor scan-to-BIM scenarios.
Semantic3D (Outdoor) (http://www.semantic3d.net, accessed on 3 September 2025): A large-scale LiDAR benchmark of urban and architectural environments, including streets, squares, and façades. Each scan contains millions of points labeled into categories such as building, vegetation, ground, and vehicles. It tests scalability, class imbalance handling, and geometric generalization.

Both datasets are divided into training, validation, and test splits following standard conventions. Our framework is agnostic to specific partition schemes.

5.1.1. Training Protocol

All models are trained under identical configurations: AdamW optimizer (lr $= 10^{-3}$, cosine annealing, weight decay $= 10^{-4}$), batch size 16, and 200 epochs with early stopping (patience 20). Data augmentations include Gaussian jitter, random rotation, and point dropout (10–30%). Hyperparameters are selected via validation mIoU, with $\lambda \in \{10^{-4}, 5 \times 10^{-4}, 10^{-3}, 5 \times 10^{-3}\}$ and latent dimension $|Z| \in \{32, 64, 128\}$. Each experiment is repeated three times with independent seeds, and results are reported as mean ± std.
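The protocol can be summarized in a few lines of framework-independent Python. This is only a sketch: the seed values, the minimum learning rate of zero, and the exact early-stopping rule are assumptions, since the text specifies only the grid, the schedule type, and the patience.

```python
import itertools
import math

LAMBDAS = [1e-4, 5e-4, 1e-3, 5e-3]
LATENT_DIMS = [32, 64, 128]
SEEDS = [0, 1, 2]  # assumed values; the text states only that three seeds are used

def cosine_lr(epoch, total_epochs=200, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate (min_lr = 0 is an assumption)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / total_epochs))

def should_stop(val_miou_history, patience=20):
    """Stop once the best validation mIoU is at least `patience` epochs old."""
    if not val_miou_history:
        return False
    best_epoch = max(range(len(val_miou_history)), key=val_miou_history.__getitem__)
    return len(val_miou_history) - 1 - best_epoch >= patience

# Every (lambda, latent dimension, seed) combination that would be trained.
runs = list(itertools.product(LAMBDAS, LATENT_DIMS, SEEDS))
```

Under these assumptions the grid contains 4 × 3 = 12 configurations, each trained with three seeds.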

5.1.2. Evaluation Metrics

Model performance, summarized in Table 2, is assessed across six complementary dimensions that jointly capture predictive accuracy, reliability, robustness, and computational efficiency:

Accuracy: The core segmentation quality is evaluated using the Intersection over Union (IoU):

$\mathrm{IoU}(c) = \frac{TP_{c}}{TP_{c} + FP_{c} + FN_{c}}, \quad \mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}(c).$

We report mean IoU (mIoU), overall accuracy (OA), mean class accuracy (mAcc), macro F1, and Cohen’s $\kappa$.
Calibration: Reliability of predicted confidence is quantified by the Expected Calibration Error (ECE, 15 bins):

$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_{m}|}{n} \left| \mathrm{acc}(B_{m}) - \mathrm{conf}(B_{m}) \right|,$

complemented by the Negative Log-Likelihood (NLL) and the Brier score.
Robustness: Sensitivity to input perturbations is measured under point jitter, occlusion, and density variation. The Corruption Error (CE) denotes the relative degradation in mIoU with respect to clean input data.
Computational Efficiency: We assess floating-point operations (FLOPs), parameter count, inference latency, throughput, GPU memory footprint, and an energy proxy, capturing the scalability of the approach.
Data Efficiency: Models are trained on subsets containing $\{10, 25, 50, 100\}\%$ of the labeled data to evaluate generalization under limited supervision.
Geometric Consistency: For architectural categories (e.g., walls, floors, façades), we compute plane and cylinder fitting residuals and inlier ratios, forming a Geometric Consistency Score (GCS) that quantifies spatial regularity and alignment with architectural priors.
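The two formulas above (mIoU and ECE) can be computed as in the following NumPy sketch. The function names are hypothetical, and two details are assumptions rather than choices stated in the text: classes that appear in neither the labels nor the predictions are skipped when averaging, and the 15 confidence bins are equal-width and right-closed.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def mean_iou(cm):
    """Per-class IoU = TP / (TP + FP + FN) and its mean over observed classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou)), iou

def expected_calibration_error(probs, y_true, n_bins=15):
    """ECE: bin-weighted gap between accuracy and mean confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)
```

A model that predicts every point with confidence 0.8 but is right only 60% of the time would, under this definition, have an ECE of 0.2.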

5.1.3. Baseline Methods

We benchmark against two complementary categories of methods that represent distinct philosophies in point-cloud processing:

Learning-based architectures:
– PointNet [22]: The pioneering neural network that directly consumes unordered point sets without voxelization or rasterization. It employs shared multilayer perceptrons (MLPs) and a symmetric max-pooling operation to ensure permutation invariance, providing a unified and efficient framework for 3D object classification, part segmentation, and scene parsing. However, as noted in subsequent studies, its global pooling mechanism neglects local geometric context.
– PointNet++ [23] introduces a hierarchical architecture that recursively applies PointNet to local neighborhoods defined in metric space. This design enables the extraction of fine-grained local features and their aggregation across multiple scales, improving generalization to complex and non-uniformly sampled scenes.
– Point Transformer [34] extends this line of research by introducing self-attention mechanisms into point-cloud learning. Inspired by advances in natural language processing and 2D vision transformers, it models local feature relationships through attention-weighted aggregation, achieving state-of-the-art performance on large-scale benchmarks such as S3DIS and SemanticKITTI. Its ability to adaptively capture long-range dependencies makes it particularly effective for semantic segmentation and object classification in complex 3D environments.
– Point Transformer V3 (PTv3) [35] focuses on scaling efficiency while maintaining accuracy. Instead of introducing new attention mechanisms, PTv3 emphasizes architectural simplicity and scalability through efficient serialized neighbor mapping and large receptive fields. This design achieves significant improvements in speed and memory efficiency, while setting new performance records across more than twenty indoor and outdoor point-cloud benchmarks. Together, these models highlight the evolution from MLP-based encoders to attention-driven architectures, forming a strong foundation for subsequent information-theoretic regularization such as the proposed IB-PC framework.
Compression-based baselines: To disentangle the contribution of representation compression from architectural learning, we include three non-learning and autoencoding baselines: (i) Random Subsampling (RS), which uniformly reduces point density for efficiency analysis; (ii) PCA Projection, which projects points onto their top-k principal axes to evaluate linear compression performance; and (iii) Autoencoder (AE), a reconstruction-based latent embedding method serving as a learned compression reference. These methods span naive geometric reduction to unsupervised feature learning, providing a meaningful contrast to the IB-regularized framework.
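The two non-learned baselines are simple enough to sketch directly. The following NumPy snippet is an illustration with hypothetical function names; the paper does not state the subsampling ratios or the value of $k$, so these are left as parameters.

```python
import numpy as np

def random_subsample(points, ratio, rng):
    """Random Subsampling (RS): keep a uniformly random fraction of the points."""
    n_keep = max(1, int(round(ratio * len(points))))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def pca_project(features, k):
    """PCA Projection: project (N, d) features onto their top-k principal axes.

    Returns the (N, k) projection and the fraction of variance it retains.
    """
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return centered @ vt[:k].T, explained
```

Neither operation uses the labels, which is the key contrast with the IB layer: compression here is driven by geometry or variance alone, not by relevance to the prediction target.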

5.2. Experimental Results

This section presents the empirical evaluation of the proposed Information Bottleneck (IB) framework for architectural point-cloud understanding. We begin with quantitative results on the outdoor Semantic3D and indoor TUM RGB-D benchmarks, followed by per-class analysis, ablations on bottleneck configuration, calibration and reliability studies, robustness tests, efficiency profiling, data-efficiency experiments, geometric consistency assessment, and qualitative visualization.

5.2.1. Three-Dimensional Semantic Results in the Indoor Environment

Table 3 and Table 4 summarize indoor segmentation results. Although absolute gains are smaller than in the outdoor case, improvements in calibration metrics are more pronounced. Indoor scenes contain fewer classes and exhibit cleaner geometry, leaving less room for large mIoU jumps; however, the probabilistic bottleneck regularizes the latent space and mitigates overconfident predictions. The Brier score drops from 0.132 to 0.112, a meaningful change for indoor automation where overconfidence can lead to unreliable decisions. In cluttered regions, IB concentrates model capacity on discriminative local cues, improving recognition of narrow openings and smaller architectural features, thereby reducing manual post-processing in BIM pipelines.

When combined with stronger geometric backbones such as Point Transformer and Point Transformer v3, IB continues to yield consistent benefits. Despite the already high baselines (mIoU up to 76.5%), IB further improves both segmentation and calibration, reaching 79.7% mIoU and reducing ECE from 5.6% to 3.8%. This confirms that the bottleneck acts as a complementary inductive bias, refining uncertainty even in transformer-based architectures with dense attention and global context modeling.

Per-class IoUs on TUM RGB-D are shown in Table 5. IB provides notable improvements for boundary-sensitive and structurally relevant categories such as doors, windows, and columns. Improvements of five to six IoU points in these classes result in cleaner geometric reconstruction. Easy planar categories like floors and ceilings remain stable, confirming that performance gains do not come at the expense of simpler regions.

We quantify alignment with architectural priors via plane and cylinder fitting residuals and inlier ratios for walls, floors, columns, and facades. Table 6 shows consistent improvements, suggesting that IB preserves dominant geometric regularities important for structural modeling.
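As an illustration of the plane-based part of this measurement, the sketch below fits a least-squares plane to the points predicted as one planar element and reports the RMS residual and inlier ratio. The 2 cm inlier tolerance and the function names are assumptions; cylinder fitting and the aggregation into a single score are not specified in the text and are not attempted here.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane via SVD: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]  # direction of least variance

def plane_consistency(points, inlier_tol=0.02):
    """RMS point-to-plane residual and fraction of points within `inlier_tol`."""
    centroid, normal = fit_plane(points)
    dist = np.abs((points - centroid) @ normal)
    rms = float(np.sqrt(np.mean(dist ** 2)))
    inlier_ratio = float(np.mean(dist <= inlier_tol))
    return rms, inlier_ratio
```

Applied to the points a model labels as a wall or floor, a lower residual and a higher inlier ratio indicate that the predicted segment is closer to a clean planar surface.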

To complement quantitative results, we first include a synthetic indoor layout for visual inspection of boundary behavior. Figure 3 compares ground truth, baseline, and IB-enhanced predictions. The IB variant produces cleaner object boundaries around door and column interfaces while maintaining smooth planar regions.

A second example on the S3DIS benchmark [28] highlights similar effects under realistic conditions. Figure 4 shows that IB reduces spurious fragments along wall boundaries and preserves large-scale structure, confirming its benefit in complex indoor scenes.

5.2.2. Three-Dimensional Semantic Results in the Outdoor Environment

Table 7 reports the outdoor segmentation performance. The IB-regularized model consistently surpasses both classical baselines and modern learning-based methods. Relative to PointNet++, it improves mIoU by roughly four points and MacroF1 by over four, while also reducing ECE and NLL. These results indicate that the bottleneck not only compresses redundant representation but also enhances semantic sufficiency by filtering nuisance variation in large-scale LiDAR data.

The reduction in FLOPs is substantial, around 30% compared with the uncompressed baseline, and comes without a penalty in latency. This suggests that IB achieves a more balanced trade-off between accuracy, calibration, and efficiency, which is valuable in large-scale urban modeling tasks where segmentation errors in minor classes (e.g., vehicles or artifacts) can propagate through BIM reconstruction.

When compared with compression-only baselines such as Random Subsampling (RS), PCA Projection, or Autoencoder (AE) embeddings, IB maintains clear superiority across nearly all dimensions. Subsampling provides computational savings but severely reduces accuracy; PCA fails to preserve non-linear geometric relations; and autoencoder bottlenecks, trained for reconstruction, do not guarantee semantic alignment. In contrast, IB enforces a principled balance between compression and predictive fidelity, ensuring that the latent representation Z remains closely aligned with task-relevant semantics.

Table 8 further extends this analysis to transformer-based architectures, where Point Transformer and Point Transformer v3 already achieve strong baselines. Even under such high-capacity settings, IB continues to yield tangible benefits, raising mIoU from 79.1% to 82.6% and lowering ECE from 5.1% to 3.3%. This demonstrates that the information bottleneck remains complementary to global-attention mechanisms, acting as a semantic regularizer that stabilizes uncertainty and enhances calibration in dense outdoor scenes.

We analyze the influence of the IB weight $\lambda$ and latent dimensionality $|Z|$ on Semantic3D using a single bottleneck layer within PointNet++. Table 9 shows a clear trade-off surface: moderate regularization ($\lambda \in [5 \times 10^{-4}, 10^{-3}]$) with mid-sized latents ($|Z| \in \{64, 128\}$) yields the most balanced outcome. Over-compression ($\lambda = 5 \times 10^{-3}$, $|Z| = 32$) degrades performance on minor classes, reflected by lower MacroF1.

Three integration points are compared: early (after first grouping), mid (after intermediate abstraction), and late (after final aggregation). Table 10 shows that late placement provides the highest accuracy and best calibration, likely because compression on semantically enriched features retains more meaningful structure. Early insertion yields slightly lower FLOPs but at the cost of semantic detail. The observed ∼30% FLOP reduction primarily results from reduced activation variance and sparsity induced by the stochastic IB compression, rather than architectural pruning. The network topology remains unchanged across configurations, confirming that efficiency gains stem from lower computational density rather than model size.

We further assess prediction reliability through selective prediction, where outputs below a confidence threshold $τ$ are deferred. Table 11 reports the area under the risk–coverage curve (AURC; lower is better) and coverage at 1% risk (Cov@1%R; higher is better). IB consistently improves both, suggesting more calibrated confidence ranking, which is an advantage for human-in-the-loop verification in BIM environments.
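To make the two selective-prediction metrics concrete, the following is a minimal sketch of how a risk–coverage curve, AURC, and coverage at a fixed risk level can be computed from per-point confidences and correctness flags. The function names and the toy inputs are illustrative assumptions, not the authors' evaluation code. Lowering the threshold $τ$ corresponds to walking down the confidence-sorted list.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) pairs, accepting points in order of descending confidence."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        # coverage = fraction of points accepted, risk = error rate among accepted points
        curve.append((k / len(order), errors / k))
    return curve


def aurc(curve):
    """Area under the risk-coverage curve, approximated by the mean risk over all coverages."""
    return sum(risk for _, risk in curve) / len(curve)


def coverage_at_risk(curve, max_risk=0.01):
    """Largest coverage whose risk does not exceed max_risk (0.0 if none qualifies)."""
    feasible = [cov for cov, risk in curve if risk <= max_risk]
    return max(feasible) if feasible else 0.0


# Toy example (hypothetical values): four points, the third one is misclassified.
curve = risk_coverage_curve([0.9, 0.8, 0.7, 0.6], [True, True, False, True])
print(aurc(curve), coverage_at_risk(curve, max_risk=0.01))
```

In this toy case only the two most confident points can be accepted before the risk exceeds 1%, so the coverage at 1% risk is 0.5.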

Robustness is evaluated under additive jitter, occlusion, and density variation. The Corruption Error (CE) is defined as the drop in mIoU relative to clean input. As shown in Table 12, IB consistently lowers CE, particularly under 30% occlusion, where the model learns to rely on stable structural cues instead of high-frequency artifacts.
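As a small illustration of this protocol, the sketch below applies Gaussian jitter to a point cloud and computes the Corruption Error as the difference between clean and corrupted mIoU. The helper names, the per-coordinate noise model, and the mIoU numbers in the example are hypothetical; the paper's exact corruption pipeline is not specified in this section.

```python
import random


def corruption_error(miou_clean, miou_corrupted):
    """Drop in mIoU (percentage points) relative to the clean input."""
    return miou_clean - miou_corrupted


def jitter(points, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma (metres) to every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


rng = random.Random(0)
cloud = [(float(i), 0.0, 0.0) for i in range(100)]  # toy cloud along the x-axis
noisy = jitter(cloud, sigma=0.01, rng=rng)          # sigma = 0.01 m, the severity listed in Table 12

# Hypothetical mIoU values for illustration only, not results from the paper.
ce = corruption_error(miou_clean=70.0, miou_corrupted=65.5)
print(ce)
```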

Efficiency is profiled in terms of throughput (k pts/s), latency, peak memory, and an energy proxy (J/sample) under identical batch sizes. Table 13 shows that IB achieves higher throughput and lower memory consumption despite similar parameter counts, implying that constrained latent variance leads to more compact activations, beneficial for deployment on resource-limited devices.

We vary the fraction of labeled samples on Semantic3D and evaluate performance. As shown in Table 14, IB consistently narrows the gap at low-label regimes, supporting the intuition that compressed features generalize better under limited supervision.
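The comparison can be read directly off Table 14. The short sketch below copies the table values, computes the per-fraction gain of IB over PointNet++, and looks up the smallest labeled fraction at which IB reaches the full-label PointNet++ score. The helper name is illustrative, and the values are assumed to be mIoU in percent.

```python
# Scores copied from Table 14, keyed by labeled training fraction (assumed to be mIoU in %).
baseline = {0.10: 58.0, 0.25: 64.7, 0.50: 68.9, 1.00: 71.4}
ib = {0.10: 62.4, 0.25: 68.3, 0.50: 72.0, 1.00: 75.2}

# Absolute gain of IB over the baseline at each fraction, in percentage points.
gains = {frac: round(ib[frac] - baseline[frac], 1) for frac in baseline}


def smallest_fraction_reaching(scores, target):
    """Smallest labeled fraction whose score is at least `target`, or None."""
    reaching = [frac for frac, score in sorted(scores.items()) if score >= target]
    return reaching[0] if reaching else None


print(gains)
print(smallest_fraction_reaching(ib, baseline[1.00]))
```

With these numbers the gain is largest at the 10% fraction (4.4 points), and the IB variant trained on 50% of the labels already exceeds the baseline trained on all labels.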

We examine inference robustness to voxelization density by varying voxel sizes. As shown in Table 15, IB degrades more gracefully under coarser voxelization, indicating improved tolerance to sampling irregularities common in real scans.
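For readers unfamiliar with the operation being varied, a minimal sketch of centroid-based voxel-grid downsampling follows. It assumes cubic voxels aligned to the coordinate axes and represents each occupied voxel by the centroid of its points. The function name is illustrative, and the paper's own preprocessing may differ in detail.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts)) for pts in buckets.values()]


cloud = [(0.01, 0.01, 0.01), (0.03, 0.03, 0.03), (0.12, 0.00, 0.00)]
fine = voxel_downsample(cloud, voxel_size=0.05)    # 5 cm voxels keep two points
coarse = voxel_downsample(cloud, voxel_size=0.20)  # 20 cm voxels merge everything
print(len(fine), len(coarse))
```

Coarser voxels merge more points into a single representative, which is why thin structures are the first to lose detail as the voxel size grows.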

5.3. Discussion and Analyses

5.3.1. Balancing Accuracy, Efficiency, and Reliability

The experiments demonstrate that the Information Bottleneck (IB) offers a practical balance among accuracy, efficiency, and reliability in architectural point-cloud analysis. On Semantic3D, the IB-augmented PointNet++ achieves higher mIoU and MacroF1 while reducing FLOPs by about 30% and without added latency (Table 7). This behavior can be interpreted through the notion of sufficiency: by constraining the latent representation to retain only task-relevant information, redundant components that inflate computation without improving prediction are naturally suppressed. Such efficiency gains are particularly valuable in large-scale urban scanning, where throughput directly impacts project turnaround time.

Across both indoor and outdoor datasets, lower ECE, NLL, and Brier scores indicate that IB not only enhances accuracy but also improves the reliability of model confidence—an equally important dimension for automated quality control in BIM workflows. The overall outcome is a more compact and trustworthy model that maintains predictive power while reducing uncertainty in downstream use.
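The three calibration measures mentioned here can be sketched in a few lines. The code below uses common textbook definitions: equal-width confidence bins for ECE, the mean negative log-probability of the true class for NLL, and the multi-class squared error summed over classes for the Brier score. These conventions, the bin count, and the function names are assumptions; the paper's exact definitions may differ, for example in how the Brier score is normalized.

```python
import math


def nll(probs, labels):
    """Mean negative log-likelihood of the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def brier(probs, labels):
    """Mean squared error between predicted probabilities and one-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def ece(probs, labels, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for p, y in zip(probs, labels):
        conf = max(p)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes into the last bin
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += int(p.index(conf) == y)
    n = len(labels)
    # Weighted average of |accuracy - mean confidence| over non-empty bins
    return sum(abs(hit_sums[b] - conf_sums[b]) / n for b in range(n_bins) if counts[b])


# Toy predictions over two classes (hypothetical values).
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 1, 1, 0]
print(ece(probs, labels), nll(probs, labels), brier(probs, labels))
```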

5.3.2. Robustness and Geometric Consistency

IB also improves robustness to common perturbations such as jitter, occlusion, and density variation (Table 12), suggesting a stronger reliance on stable geometric structure rather than fragile high-frequency details. This interpretation aligns with the observed improvements in geometric consistency (Table 6), where lower residuals and higher inlier ratios are achieved for walls, floors, columns, and façades. In practice, these improvements manifest as cleaner boundaries and fewer manual corrections in scan-to-BIM pipelines. The ablation results provide additional evidence: bottlenecks inserted at later network stages (Table 10) act on semantically richer representations, preserving class structure while filtering out noise variations.
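As an illustration of what a residual and an inlier ratio measure for a planar element, the sketch below computes point-to-plane distances for points labeled as one element, then reports the mean residual and the fraction of points within a tolerance. The plane is assumed to be given, whereas in practice it would be fitted, for example by least squares or RANSAC. The tolerance, units, and function names are assumptions, since the paper's exact geometric consistency score is not defined in this section.

```python
import math


def plane_residuals(points, normal, offset):
    """Unsigned distances from points to the plane normal . p + offset = 0."""
    norm = math.sqrt(sum(c * c for c in normal))
    return [abs(sum(n * c for n, c in zip(normal, p)) + offset) / norm for p in points]


def plane_consistency(points, normal, offset, tol):
    """Return (mean residual, fraction of points with residual <= tol)."""
    res = plane_residuals(points, normal, offset)
    mean_residual = sum(res) / len(res)
    inlier_ratio = sum(1 for r in res if r <= tol) / len(res)
    return mean_residual, inlier_ratio


# Toy floor segment around the plane z = 0 (coordinates in metres, hypothetical values).
floor_points = [(0.0, 0.0, 0.01), (1.0, 0.0, -0.02), (0.0, 1.0, 0.0), (1.0, 1.0, 0.10)]
mean_res, inliers = plane_consistency(floor_points, normal=(0.0, 0.0, 1.0), offset=0.0, tol=0.05)
print(mean_res, inliers)
```

A segmentation that leaks clutter into a wall or floor label raises the residual and lowers the inlier ratio, which is the behavior these two quantities are meant to capture.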

5.3.3. Architectural and Industrial Implications

From an AEC perspective, three implications arise. The first is deployability: improved computational efficiency and a reduced memory footprint make real-time inference more feasible on mobile or embedded systems such as site scanners and inspection robots (Table 13). The second is confidence-based supervision: improved calibration enhances selective prediction (Table 11), allowing human review thresholds to be defined systematically in automated quality assurance workflows. The third is scalability: robustness to density shifts and occlusion (Tables 12 and 15) enables consistent performance across assets of varying resolution and coverage, supporting reliable digital twin construction. Notably, all of these benefits are achieved without changing the original backbone architecture, minimizing engineering overhead for integration.

5.3.4. Limitations and Future Directions

Two main limitations remain. First, the approach shows sensitivity to the degree of compression: excessive regularization or very small latent dimensionality $|Z|$ can reduce accuracy on thin-structure and minority classes (Table 9). Future work may address this by adopting adaptive schedules, layer-specific $λ$ adjustment, or meta-learned control strategies to preserve balanced representational capacity. Second, the framework depends on variational approximations. Because the exact mutual information is intractable, standard KL-based bounds may either over-regularize or under-regularize depending on the geometry of the data. Developing tighter mutual-information estimators or contrastive objectives could improve training stability and reliability.

Looking forward, extending the Information Bottleneck to multi-modal contexts, such as incorporating imagery, CAD priors [37], or other structural constraints, appears promising for aligning learned representations with architectural semantics. Integrating domain-specific regularities such as the Manhattan-world assumption [38] into the bottleneck may further enhance geometric consistency and interpretability. Finally, translating improved calibration into practical decision procedures, including risk-aware validation, collaborative quality control, and propagation of uncertainty within subsequent modeling stages, remains an important step toward dependable scan-to-BIM automation.

6. Conclusions

This study introduced an Information Bottleneck formulation tailored for architectural point-cloud analysis, together with a comprehensive and transparent evaluation protocol. The proposed IB-PC approach achieves compact yet semantically faithful representations, improves calibration and robustness, and better preserves geometric regularities across both indoor and outdoor environments. Collectively, these results demonstrate the potential of information-theoretic regularization to enhance both efficiency and reliability in large-scale architectural modeling.

From an application perspective, the method can be readily integrated into existing workflows. The accompanying training and evaluation setup enables practitioners to reproduce results and adapt the configuration to project-specific requirements. Beyond immediate use, this formulation provides a principled foundation for embedding information-theoretic reasoning within digital twin and scan-to-BIM systems, where interpretability, consistency, and computational economy are equally essential.

Author Contributions

Conceptualization, Y.Z., Y.B. and Z.L.; methodology, Y.Z. and T.X.; software, Y.Z. and B.X.; validation, Y.Z., B.X. and T.X.; formal analysis, Y.Z.; investigation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, T.X., Y.B. and Z.L.; visualization, Y.Z.; supervision, Y.B. and Z.L.; project administration, Y.B. and Z.L.; funding acquisition, Y.B. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National R&D Program of China (Grant No. 2023YFB2504704); the National Natural Science Foundation of China (Grant Nos. 52208424, 52208416, 52078091, and 52108399); the Natural Science Foundation of Chongqing Municipality (Grant No. CSTB2022NSCQ-MSX0503); the China Postdoctoral Science Foundation (Grant No. 2022M710545); and the Chongqing Municipal Education Commission Foundation (Grant Nos. KJQN202200745 and KJQN202300728). The authors gratefully acknowledge their financial support.

Data Availability Statement

The datasets used in this study are publicly available. The TUM RGB-D dataset is accessible at https://vision.in.tum.de/data/datasets/rgbd-dataset (accessed on 2 September 2025). The Semantic3D dataset is available at http://www.semantic3d.net (accessed on 3 September 2025). No additional proprietary data were used. All data processing scripts and experimental configurations are available from the authors upon reasonable request.

Conflicts of Interest

Author Zhongbin Luo was employed by the company China Merchants Chongqing Communications Research & Design Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

IB	Information Bottleneck
MI	Mutual Information
BIM	Building Information Modeling
AEC	Architecture, Engineering, and Construction

Appendix A. Variational Information Bottleneck

Appendix A.1. IB Objective

Let $X$ be the input, $Y$ the target, and $Z$ the representation. The Information Bottleneck (IB) formulates a compression–relevance trade-off:

$$\min_{p(z \mid x)} \; I(X; Z) - β \, I(Z; Y), \quad β > 0, \tag{A1}$$

seeking a minimal sufficient $Z$ that preserves information about $Y$ while discarding nuisance in $X$.

Appendix A.2. Upper Bound on I(X;Z)

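With an encoder $q_{ϕ}(z \mid x)$ and a simple prior $r(z)$ (e.g., $\mathcal{N}(0, I)$),

\begin{align}
I(X; Z) &= \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q_{ϕ}(z \mid x) \,\|\, q_{ϕ}(z)\big)\big], \qquad q_{ϕ}(z) = \int q_{ϕ}(z \mid x)\, p(x)\, dx, \tag{A2} \\
&= \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q_{ϕ}(z \mid x) \,\|\, r(z)\big)\big] - \mathrm{KL}\big(q_{ϕ}(z) \,\|\, r(z)\big) \tag{A3} \\
&\leq \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q_{ϕ}(z \mid x) \,\|\, r(z)\big)\big], \tag{A4}
\end{align}

where the last step holds because $\mathrm{KL}(q_{ϕ}(z) \,\|\, r(z)) \geq 0$. Thus, $I(X; Z) \leq \mathbb{E}_{p(x)}[\mathrm{KL}(q_{ϕ}(z \mid x) \,\|\, r(z))]$, which is tractable for Gaussian $q_{ϕ}$ and $r$.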
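For a diagonal-Gaussian encoder and a standard normal prior, the KL term in this bound has the well-known closed form $\tfrac{1}{2}\sum_{j}(μ_j^2 + σ_j^2 - 1 - \log σ_j^2)$ per input. The sketch below evaluates it in plain Python. The function name and the log-variance parameterization are illustrative choices, not taken from the paper.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# The KL vanishes when the encoder output equals the prior ...
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# ... and grows as the mean moves away from zero or the variance departs from one.
print(kl_to_standard_normal([1.0, 0.0], [0.0, math.log(4.0)]))
```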

Appendix A.3. Lower Bound on I(Z;Y)

Using $I(Z; Y) = H(Y) - H(Y \mid Z)$ and a variational decoder $p_{θ}(y \mid z)$,

\begin{align}
I(Z; Y) &= H(Y) + \mathbb{E}_{p(x, y)\, q_{ϕ}(z \mid x)}\big[\log p(y \mid z)\big] \tag{A5} \\
&\geq H(Y) + \mathbb{E}_{p(x, y)\, q_{ϕ}(z \mid x)}\big[\log p_{θ}(y \mid z)\big], \tag{A6}
\end{align}

so maximizing this lower bound on $I(Z; Y)$ amounts (up to the constant $H(Y)$) to maximizing the expected log-likelihood, i.e., minimizing cross-entropy for classification/segmentation.

Appendix A.4. VIB Training Objective

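Combining the bounds yields the standard VIB loss:

$$\mathcal{L}_{\mathrm{VIB}} = \mathbb{E}_{p(x, y)\, q_{ϕ}(z \mid x)}\big[-\log p_{θ}(y \mid z)\big] + β \, \mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q_{ϕ}(z \mid x) \,\|\, r(z)\big)\big]. \tag{A7}$$

Note that in (A7) $β$ weights the compression term, whereas in (A1) it weights the relevance term; dividing (A1) by its positive weight shows that the coefficient in (A7) corresponds to the reciprocal of the one in (A1). For point-cloud semantic segmentation, the expectation is taken as an average over points in a block/mini-batch; the first term is the point-wise cross-entropy.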
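A minimal pure-Python sketch of (A7) for one mini-batch of points is given below. It assumes a diagonal-Gaussian encoder with a standard normal prior (as in Appendix A.5), uses the closed-form KL for that case, and averages both terms over points. The function names, the tuple layout of `batch`, and the toy numbers are illustrative assumptions rather than the authors' implementation.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (numerically stable)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vib_loss(batch, beta):
    """batch: list of (logits, label, mu, log_var) tuples, one per point.

    Returns (total loss, mean cross-entropy, mean KL).
    """
    ce = sum(cross_entropy(lg, y) for lg, y, _, _ in batch) / len(batch)
    kl = sum(kl_to_standard_normal(mu, lv) for _, _, mu, lv in batch) / len(batch)
    return ce + beta * kl, ce, kl


# Two toy points with two classes and a two-dimensional latent (hypothetical values).
batch = [
    ([2.0, 0.0], 0, [0.0, 0.0], [0.0, 0.0]),
    ([0.0, 0.0], 1, [1.0, 0.0], [0.0, 0.0]),
]
loss, ce, kl = vib_loss(batch, beta=1e-3)
print(loss, ce, kl)
```

In a real training loop the logits would come from a decoder applied to a sampled latent, and the loss would be differentiated by an autodiff framework; the arithmetic of the objective is the same.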

Appendix A.5. Gaussian Encoder and Reparameterization

With $q_{ϕ}(z \mid x) = \mathcal{N}(μ_{ϕ}(x), \mathrm{diag}(σ_{ϕ}^{2}(x)))$ and $r(z) = \mathcal{N}(0, I)$,

$$z = μ_{ϕ}(x) + σ_{ϕ}(x) \odot ϵ, \quad ϵ \sim \mathcal{N}(0, I), \tag{A8}$$

enabling low-variance Monte Carlo gradients via the reparameterization trick.
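The sampling step in (A8) can be sketched as follows. The encoder outputs are assumed to be a mean vector and a log-variance vector, which is a common parameterization but is not stated in the paper, and the function name is illustrative. Because the noise $ϵ$ is drawn independently of the encoder outputs, $z$ is a deterministic, differentiable function of $μ_{ϕ}(x)$ and $σ_{ϕ}(x)$ once $ϵ$ is fixed.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)

# One-dimensional latent with mean 1.0 and standard deviation 0.5.
samples = [sample_latent([1.0], [math.log(0.25)], rng)[0] for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)  # empirical moments should be close to 1.0 and 0.5
```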

Appendix A.6. Choosing the Bottleneck Strength

Larger $β$ in (A7) (often denoted $λ$ in implementations) enforces stronger compression ($I(X; Z) \downarrow$), which typically improves calibration and robustness but may reduce performance on small/rare classes (e.g., doors/windows). Practical sweeps include $β \in \{10^{-4}, 5 \times 10^{-4}, 10^{-3}, 5 \times 10^{-3}\}$.

References

1. Li, Y.; Chen, H.; Yu, P.; Yang, L. A Review of Artificial Intelligence in Enhancing Architectural Design Efficiency. Appl. Sci. 2025, 15, 1476.
2. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661.
3. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
4. Li, C.; Zhang, T.; Du, X.; Zhang, Y.; Xie, H. Generative AI Models for Different Steps in Architectural Design: A Literature Review. arXiv 2024, arXiv:2404.01335.
5. Asadi, E.; da Silva, M.G.; Antunes, C.H.; Dias, L. Multi-objective Optimization for Building Retrofit Strategies: A Model and an Application. Energy Build. 2012, 44, 81–87.
6. Shan, R.; Jia, X.; Su, X.; Xu, Q.; Ning, H.; Zhang, J. AI-Driven Multi-Objective Optimization and Decision-Making for Urban Building Energy Retrofit: Advances, Challenges, and Systematic Review. Appl. Sci. 2025, 15, 8944.
7. Sharma, A.; Kosasih, E.; Zhang, J.; Brintrup, A.; Calinescu, A. Digital Twins: State of the Art Theory and Practice, Challenges, and Open Research Questions. J. Ind. Inf. Integr. 2022, 30, 100383.
8. Matarneh, S.; Danso-Amoako, M.; Al-Bizri, S.; Gaterell, M.; Matarneh, R. Building Information Modeling for Facilities Management: A Literature Review and Future Research Directions. J. Build. Eng. 2019, 24, 100755.
9. Khattra, S.; Jain, R. Building Information Modeling: A Comprehensive Overview of Concepts and Applications. Adv. Res. 2024, 25, 140–149.
10. Zhang, L.; Li, Y.; Pan, Y.; Ding, L. Advanced Informatic Technologies for Intelligent Construction: A Review. Eng. Appl. Artif. Intell. 2024, 137 Pt A, 109104.
11. Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. arXiv 2020, arXiv:1912.12033.
12. Xu, T.; Tian, B.; Zhu, Y. Tigris: Architecture and Algorithms for 3D Perception in Point Clouds. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’52), Columbus, OH, USA, 12–16 October 2019; pp. 629–642.
13. Skrzypczak, I.; Oleniacz, G.; Leśniak, A.; Zima, K.; Mrówczyńska, M.; Kazak, J.K. Scan-to-BIM Method in Construction: Assessment of the 3D Buildings Model Accuracy in Terms of Inventory Measurements. Build. Res. Inf. 2022, 50, 859–880.
14. He, S.; Qu, X.; Wan, J.; Li, G.; Xie, C.; Wang, J. PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition. arXiv 2024, arXiv:2405.06929.
15. Sarker, S.; Sarker, P.; Stone, G.; Gorman, R.; Tavakkoli, A.; Bebis, G.; Sattarvand, J. A Comprehensive Overview of Deep Learning Techniques for 3D Point Cloud Classification and Semantic Segmentation. Mach. Vis. Appl. 2024, 35, 67.
16. Tishby, N.; Pereira, F.C.; Bialek, W. The Information Bottleneck Method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 22–24 September 1999; pp. 368–377.
17. Tishby, N.; Zaslavsky, N. Deep Learning and the Information Bottleneck Principle. arXiv 2015, arXiv:1503.02406.
18. Alemi, A.; Fischer, I.; Dillon, J.; Murphy, K. Deep Variational Information Bottleneck. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
19. Si, S.; Wang, J.; Sun, H.; Wu, J.; Zhang, C.; Qu, X.; Cheng, N.; Chen, L.; Xiao, J. Variational Information Bottleneck for Effective Low-Resource Audio Classification. arXiv 2021, arXiv:2107.04803.
20. Sun, H.; Pears, N.; Gu, Y. Information Bottlenecked Variational Autoencoder for Disentangled 3D Facial Expression Modelling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2334–2343.
21. Cheng, H.; Han, X.; Shi, P.; Zhu, J.; Li, Z. Multi-Trusted Cross-Modal Information Bottleneck for 3D Self-Supervised Representation Learning. Knowl.-Based Syst. 2024, 283, 111217.
22. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. arXiv 2017, arXiv:1612.00593.
23. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 30, 5105–5114.
24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
25. Kong, L.; Xu, X.; Cen, J.; Zhang, W.; Pan, L.; Chen, K.; Liu, Z. Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding. arXiv 2025, arXiv:2403.17010.
26. Hagemann, A.; Knorr, M.; Stiller, C. Deep Geometry-Aware Camera Self-Calibration from Video. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 4–6 October 2023; pp. 3415–3425.
27. Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Algarve, Portugal, 7–12 October 2012.
28. Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543.
29. Yu, S.; Yu, X.; Løkse, S.; Jenssen, R.; Principe, J.C. Cauchy-Schwarz Divergence Information Bottleneck for Regression. arXiv 2024, arXiv:2404.17951.
30. Goldfeld, Z.; Polyanskiy, Y. The Information Bottleneck Problem and Its Applications in Machine Learning. arXiv 2020, arXiv:2004.14941.
31. Zhu, K.; Feng, X.; Du, X.; Gu, Y.; Yu, W.; Wang, H.; Chen, Q.; Chu, Z.; Chen, J.; Qin, B. An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Bangkok, Thailand, 11–16 August 2024; pp. 1044–1069. Available online: https://aclanthology.org/2024.acl-long.59/ (accessed on 3 September 2020).
32. Li, F.; Zhang, M.; Wang, Z.; Yang, M. InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory. arXiv 2025, arXiv:2505.19820.
33. Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark. arXiv 2017, arXiv:1704.03847.
34. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.; Koltun, V. Point Transformer. Adv. Neural Inf. Process. Syst. (NeurIPS) 2021, 34, 16259–16270.
35. Wu, X.; Jiang, L.; Wang, P.-S.; Liu, Z.; Liu, X.; Qiao, Y.; Ouyang, W.; He, T.; Zhao, H. Point Transformer V3: Simpler, Faster, Stronger. arXiv 2024, arXiv:2312.10035.
36. Bank, D.; Koenigstein, N.; Giryes, R. Autoencoders. arXiv 2021, arXiv:2003.05991.
37. Birdal, T.; Ilic, S. CAD Priors for Accurate and Flexible Instance Reconstruction. arXiv 2017, arXiv:1705.03111.
38. Coughlan, J.M.; Yuille, A.L. The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference. In Proceedings of the 14th International Conference on Neural Information Processing Systems (NIPS’00); MIT Press: Cambridge, MA, USA, 2000; pp. 809–815.

Figure 1. Overall processing pipeline of the proposed IB-PC method, integrating the Information Bottleneck (IB) principle into point-cloud learning.

Figure 2. Integration of the proposed IB-PC method into the PointNet++ architecture. The IB layer regularizes latent features, enforcing compactness and robustness.

Figure 3. Qualitative comparison on a synthetic indoor scene. IB improves boundary localization and reduces noise in cluttered regions.

Figure 4. S3DIS example showing consistent gains in structural coherence. IB reduces fragmentation while maintaining smooth segmentation on planar surfaces.

Table 1. Statistics of the indoor (TUM RGB-D) and outdoor (Semantic3D) datasets.

Dataset	Scenes	Classes	Avg. Points/Scene
TUM RGB-D	∼50	10	1– $5 \times 10^{5}$
Semantic3D	15	8	$10^{7}$ – $10^{8}$

Table 2. Summary of the evaluation metrics employed across six complementary dimensions of performance.

Dimension	Metrics	Objective
Accuracy	mIoU, OA, mAcc, F1, $κ$	Predictive correctness
Calibration	ECE, NLL, Brier	Confidence reliability
Robustness	CE under noise/occlusion	Stability under perturbations
Computational Efficiency	FLOPs, latency, memory	Scalability and cost
Data Efficiency	mIoU vs. data fraction	Performance under low data
Geometric Consistency	GCS (plane/cylinder residuals)	Alignment with geometry priors

Table 3. TUM RGB-D (indoor): accuracy and calibration results. IB reduces overconfidence while modestly improving accuracy, which benefits human-in-the-loop applications relying on uncertainty-aware segmentation. Results are averaged over three independent runs with different random seeds; deviations are consistently small (below 0.5%) and omitted elsewhere for clarity. The best performance is in bold. ↓ indicates the lower the better.

Metric	PointNet	PointNet++	IB (Ours)	IB (PointNet++)
mIoU (%)	65.4 ± 0.3	69.1 ± 0.2	71.5 ± 0.3	73.1 ± 0.2
OA (%)	78.2 ± 0.4	81.5 ± 0.3	83.3 ± 0.3	84.6 ± 0.2
mAcc (%)	71.0 ± 0.3	74.6 ± 0.3	76.2 ± 0.2	77.9 ± 0.2
MacroF1 (%)	68.1 ± 0.4	71.6 ± 0.3	73.8 ± 0.3	75.6 ± 0.3
ECE (%)↓	8.2 ± 0.2	7.1 ± 0.2	5.5 ± 0.1	4.9 ± 0.1
Brier↓	0.142 ± 0.002	0.132 ± 0.002	0.119 ± 0.001	0.112 ± 0.001

Table 4. TUM RGB-D (indoor): comparison of Point Transformer backbones with and without Information Bottleneck (IB). IB consistently improves calibration and modestly enhances accuracy. Results are averaged over three independent runs with different random seeds. The best performance is in bold. ↓ indicates the lower the better.

Metric	Point Transformer	Point Transformer v3	IB (Ours)	IB (PTv3)
mIoU (%)	74.8 ± 0.3	76.5 ± 0.2	78.3 ± 0.3	79.7 ± 0.2
OA (%)	86.2 ± 0.3	87.5 ± 0.2	88.6 ± 0.2	89.8 ± 0.2
mAcc (%)	80.5 ± 0.3	82.1 ± 0.3	83.5 ± 0.2	84.7 ± 0.2
MacroF1 (%)	78.6 ± 0.3	80.3 ± 0.3	81.7 ± 0.3	83.1 ± 0.3
ECE (%)↓	6.2 ± 0.2	5.6 ± 0.2	4.3 ± 0.1	3.8 ± 0.1
Brier↓	0.118 ± 0.002	0.111 ± 0.002	0.103 ± 0.001	0.098 ± 0.001

Table 5. TUM RGB-D: per-class IoU for representative architectural categories. IB improves boundary-sensitive classes without sacrificing performance on large planar surfaces. The best performance is in bold.

Model	Wall	Floor	Ceiling	Column	Window	Door
PointNet++	78.3	85.6	82.1	61.4	57.9	54.2
IB (PointNet++)	81.7	87.5	84.3	65.9	62.8	59.6

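Table 6. Geometric consistency on TUM RGB-D. Each cell reports residual/inlier ratio for the given element type; a lower residual and a higher inlier ratio are better. IB emphasizes dominant geometric structures relevant to BIM extraction and architectural analysis. The best performance is in bold.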

Model	Walls	Floors	Columns	Facades
PointNet++	12.5/84.9	8.8/90.1	17.2/75.1	15.5/79.6
IB (PointNet++)	10.9/87.8	7.7/92.3	15.2/79.6	13.5/83.1

Table 7. Semantic3D (outdoor): comparison of accuracy (mIoU, OA, MacroF1), calibration (ECE, NLL), and efficiency (FLOP reduction, parameters, latency). IB-enhanced variants achieve consistent gains in accuracy and reliability while maintaining practical runtime for large-scale urban reconstruction. Results are averaged over three independent runs with different random seeds; deviations are consistently below 0.5%, confirming stable performance across trials. The best performance is in bold. ↓ indicates the lower the better.

Metric	PointNet	PointNet++	RS/PCA/AE	IB (Ours)	IB (PointNet++)
mIoU (%)	68.2 ± 0.3	71.4 ± 0.3	69.5 ± 0.4	73.8 ± 0.3	75.2 ± 0.2
OA (%)	80.5 ± 0.3	83.2 ± 0.3	81.1 ± 0.3	85.0 ± 0.3	86.0 ± 0.2
MacroF1 (%)	71.3 ± 0.3	74.1 ± 0.3	72.5 ± 0.3	76.9 ± 0.3	78.6 ± 0.3
ECE (%)↓	7.8 ± 0.2	6.9 ± 0.2	7.3 ± 0.2	5.0 ± 0.1	4.4 ± 0.1
NLL↓	0.92 ± 0.01	0.81 ± 0.01	0.88 ± 0.01	0.70 ± 0.01	0.64 ± 0.01
FLOP Red.	–	–	28–35%	30%	28%
Params (M)	3.6	12.2	3.6–5.1	3.9	12.6
Latency (ms)	18	29	15–20	17	27

Table 8. Semantic3D (outdoor): accuracy (mIoU, OA, MacroF1), calibration (ECE, NLL), and efficiency comparison for transformer-based models. IB improves both reliability and segmentation quality even on high-capacity architectures, while maintaining practical runtime for large-scale point cloud processing. Results are averaged over three independent runs with deviations below 0.5%, confirming stable generalization across random seeds. The best performance is in bold. ↓ indicates the lower the better.

Metric	Point Transformer	Point Transformer v3	IB (Ours)	IB (PTv3)
mIoU (%)	77.6 ± 0.3	79.1 ± 0.2	81.2 ± 0.3	82.6 ± 0.2
OA (%)	88.9 ± 0.3	90.3 ± 0.2	91.4 ± 0.2	92.5 ± 0.2
MacroF1 (%)	80.2 ± 0.3	81.9 ± 0.3	83.5 ± 0.3	84.9 ± 0.3
ECE (%)↓	5.8 ± 0.2	5.1 ± 0.2	3.9 ± 0.1	3.3 ± 0.1
NLL↓	0.78 ± 0.01	0.72 ± 0.01	0.64 ± 0.01	0.59 ± 0.01
FLOP Red.	–	–	27%	25%
Params (M)	12.8	13.5	13.1	13.8
Latency (ms)	32	35	30	33

Table 9. Ablation on IB hyperparameters (Semantic3D). Moderate regularization and medium latent dimension achieve the best balance between accuracy, calibration, and efficiency. The best performance is in bold. ↓ indicates the lower the better.

Setting	mIoU (%)	MacroF1 (%)	ECE (%)↓	Latency (ms)
$λ = 1 \times 10^{-4}, |Z| = 128$	74.2	77.4	5.1	28
$λ = 5 \times 10^{-4}, |Z| = 128$	75.2	78.6	4.4	27
$λ = 1 \times 10^{-3}, |Z| = 64$	74.8	78.1	4.6	26
$λ = 5 \times 10^{-3}, |Z| = 32$	72.1	75.0	4.9	25

Table 10. Effect of bottleneck placement (Semantic3D). Late-stage compression on semantically rich features yields the strongest accuracy and calibration, while early insertion offers only marginal computational savings. The best performance is in bold. ↓ indicates the lower the better.

Placement	mIoU (%)	MacroF1 (%)	ECE (%)↓	FLOPs Red.
Early	73.1	76.0	5.3	33%
Mid	74.0	77.0	4.8	30%
Late	75.2	78.6	4.4	28%

Table 11. Selective prediction analysis on Semantic3D. IB improves both AURC and Cov@1%R, indicating better confidence calibration for risk-aware decision systems. The best performance is in bold. ↓ indicates the lower the better. ↑ indicates the higher the better.

Model	AURC ↓	Cov@1%R ↑
PointNet++	0.118	62.1
IB (PointNet++)	0.097	68.9

Table 12. Robustness evaluation on Semantic3D. IB reduces Corruption Error under jitter, occlusion, and density shifts, showing improved stability for real-world data. The best performance is in bold.

Corruption (Severity)	CE PointNet++	CE IB (PointNet++)
Jitter ( $σ = 0.01$ m)	5.1	3.2
Occlusion (30%)	8.5	6.1
Density Shift (+50% voxel)	7.2	4.6

Table 13. Computational profile on Semantic3D. IB improves throughput and memory efficiency without architectural modifications, supporting seamless integration in existing pipelines. The best performance is in bold.

Model	Throughput (k pts/s)	Latency (ms)	Peak Mem (MB)	Energy (J/Sample)
PointNet++	170	29	980	0.82
IB (PointNet++)	185	27	900	0.71

Table 14. Data efficiency on Semantic3D: mIoU (%) as a function of the fraction of labeled training data. IB demonstrates stronger generalization with limited supervision, reducing annotation cost sensitivity. The best performance is in bold.

Train Fraction	PointNet++	IB (PointNet++)
10%	58.0	62.4
25%	64.7	68.3
50%	68.9	72.0
100%	71.4	75.2

Table 15. Effect of voxel size on Semantic3D inference. IB maintains higher mIoU as sampling becomes sparser, reflecting better resilience to density variation. The best performance is in bold.

Voxel (cm)	5	10	15	20
PointNet++	71.4	69.2	66.0	62.3
IB (PointNet++)	75.2	73.1	70.0	66.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
