Article

SIDe-HBIM: Single-Image Depth Inference as a Tool for Semi-Automatic Decorative Modeling

by
Fabio Bianconi
,
Marco Filippucci
,
Claudia Cerbai
*,
Filippo Cornacchini
and
Andrea Migliosi
Department of Civil and Environmental Engineering, University of Perugia, 06125 Perugia, Italy
*
Author to whom correspondence should be addressed.
Heritage 2026, 9(2), 70; https://doi.org/10.3390/heritage9020070
Submission received: 13 January 2026 / Revised: 6 February 2026 / Accepted: 9 February 2026 / Published: 11 February 2026

Abstract

This paper introduces SIDe-HBIM (Single-Image Depth inference for HBIM), a semi-automated image-to-BIM pipeline aimed at improving the integration of architectural decorative elements into HBIM environments. The research addresses the difficulty of representing geometrically complex yet information-oriented heritage components when traditional survey techniques are impractical or disproportionate. Starting from a single photographic input, the methodology combines AI-based depth estimation, quantitative computational evaluation and parametric modeling to generate lightweight, morphologically coherent 3D elements suitable for non-photorealistic HBIM applications. Multiple image-to-depth models are processed in parallel and ranked through a weighted synthetic index based on geometric and structural indicators, after which the selected depthmap is converted into a continuous NURBS surface and integrated into a BIM environment. Application to three heterogeneous case studies from the Basilica of Santa Maria degli Angeli (Assisi) demonstrates that SIDe-HBIM is particularly effective for bas-reliefs and moderate-relief decorative apparatuses, offering a reproducible and efficient alternative for HBIM-oriented documentation.

1. Introduction

The progressive integration of digital techniques into the built heritage domain is profoundly transforming the ways in which historic buildings are analyzed, documented, and managed in relation to the complexity of the information that characterizes their functions [1,2]. Within this scenario, information models developed according to HBIM logics [3,4,5], enhanced by artificial intelligence tools [6,7,8] and generative algorithms [9,10], emerge as new opportunities for the elaboration of complex contents, with the aim of activating the concatenated process linking data, information, knowledge, and decision-making [11]. The advantages offered by the computational capacity of these disruptive technologies [12,13] can be attributed to the pursuit of automation [14] and to their ability to generate solutions [15,16].
In this context, a topic of primary interest for the construction sector is the possibility of deriving three-dimensional geometries from simple photographic images, overcoming the limitations imposed by manual modeling [17] and by traditional and digital survey techniques [18,19] in the relationship between representational effort and achieved results. Decorative apparatuses, bas-reliefs and plastic elements are characterized by high morphological complexity, are often located in conditions of limited accessibility and require time and specialized skills that make their three-dimensional restitution particularly demanding. As a result, such components are frequently simplified or excluded from digital models, leading to a significant loss of semantic, iconographic and symbolic information [20,21]. The fundamental issue addressed here relates to the core objective of surveying [22], generally attributable to the representation of form rather than to the interpretation of geometry [23].
Within this framework, the methodological proposal presented in this contribution introduces a semi-automatic pipeline designed to transform two-dimensional visual inputs, whose acquisition is rapid and feasible even in conditions of limited accessibility and minimal survey instrumentation, into coherent three-dimensional components that can be semantically structured [24]. The procedure combines depth inference, through AI models specialized in image-to-depth tasks, with the automated evaluation of the resulting solutions using numerical indicators that measure their morphological consistency, ultimately leading to parametric conversion within an interoperable environment for subsequent integration into HBIM systems. This sequence is conceived to provide an objective and repeatable workflow capable of addressing sculptural elements, bas-reliefs and plastic apparatuses, whose manual modeling is particularly onerous. The originality of the proposed approach lies in the integration of AI-based depth inference with a quantitative, metric-driven evaluation framework that guides the selection and transformation of depth maps into parametric HBIM-ready geometries, shifting the focus from generative experimentation to a controlled and decision-oriented workflow. Reflecting broader ethical principles in AI applications for heritage documentation [25], the proposed methodology guarantees that all AI-driven steps are transparent, controlled, and reproducible, ensuring the workflow remains objective and verifiable.
Integrating ornamental components into information models goes beyond greater formal accuracy, addressing a recurring structural limitation in the digitalization processes of historic built heritage. The absence or simplification of such elements results in a representation disproportionately focused on constructive aspects, reducing the legibility of the figurative and symbolic values that characterize monumental heritage. In this context, several studies have investigated the enhancement of three-dimensional representation through different reconstruction algorithms [26] and the adoption of multi-LOD mesh representations [27], particularly for managing the geometric complexity of decorative apparatuses within HBIM environments. The methodological proposal presented in this contribution is situated within this broader research framework, positioning itself as an attempt to rebalance form, information and meaning, addressing a structural limitation that frequently emerges in heritage digitalization workflows.
From a scientific perspective, the growing interest in the application of artificial intelligence within the field of Cultural Heritage has highlighted the potential of depth estimation [28] derived from two-dimensional photographs [29,30], with applications aimed at enhancing museum content [31,32]. Another interesting AI line of research concerns the semantic enrichment of three-dimensional models, which can be achieved through semantic segmentation approaches to classify architectural and decorative elements [33] or through the recognition of ornamental patterns using convolutional neural networks to identify decorative details [34]. Nevertheless, while decorative apparatuses have begun to receive specific attention, current approaches still present significant margins for improvement in the automated modeling of such elements, particularly in balancing efficiency, model lightness for visualization, and adherence to the original decorative complexity.
At a theoretical level, the study advances a broader reflection on the role of the image as a generative matrix [35]. Photography is not treated as a mere descriptive support, but rather as an information-bearing substrate from which form can be derived through inferential mechanisms. This perspective resonates with the principles of algorithmic design [36], procedural modeling [37] and visual semiotics applied to architecture [38,39], according to which the representational act already constitutes an operation of form construction.
Accordingly, the contribution demonstrates how, starting from a single photographic input, it is possible to generate three-dimensional ornamental geometries that can be readily integrated into an HBIM ecosystem through a structured, objective, and replicable workflow. It therefore becomes essential to assess the validity of the procedure in relation to different categories of decorative apparatuses. To this end, the methodology is illustrated through multiple application cases drawn from the decorative system of the Basilica of Santa Maria degli Angeli in Assisi, a site characterized by heterogeneous ornamental elements that allows the effectiveness of the workflow to be tested under varying data acquisition conditions. While deliberately foregoing the metrological accuracy of traditional survey methods, the process ensures a level of formal and iconographic fidelity that is adequate for documentary, interpretative and management-oriented purposes, thus offering a meaningful advancement in the digital representation of architectural plastic elements.

2. Materials and Methods

The adopted approach is articulated as an integrated operational sequence that synergistically connects three distinct technological environments: generative artificial intelligence models for depthmap production, a subsequent automated algorithmic evaluation developed in Python for the objective selection of results and a parametric modeling system implemented in Dynamo, aimed at integrating the outputs within Autodesk Revit. The interaction among these stages gives rise to a multi-layered computational workflow, hereafter referred to as SIDe-HBIM (Single-Image Depth inference for HBIM), in which each environment contributes to the progressive construction of an information-rich object, spanning from visual perception to structured encoding within an HBIM model.

2.1. Generative AI and Depthmap Generation

The initial phase of the workflow employs ComfyUI (v. 0.12.10), an open-source node-based platform for orchestrating Stable Diffusion pipelines, particularly suited to the construction of custom workflows through configurable functional nodes [40]. Unlike closed environments or so-called “black-box” solutions, ComfyUI enables the parallel orchestration of multiple depth-estimation models, while maintaining full transparency of the computational steps and ensuring process replicability [41], a prerequisite in HBIM contexts oriented toward the traceability of sources and digital transformations (Figure 1). Indeed, ComfyUI systematically generates an output log for each depth map, recording all relevant information including model name and parameter values. This mechanism ensures that the depth-estimation process is fully deterministic: using the same configuration consistently produces identical results, thereby guaranteeing controllability, verifiability, and replicability. Such features make the workflow fully compatible with ethical aspects and HBIM-oriented practices, where traceable and repeatable digital transformations are essential.
To constrain inference to the component of interest, an ImageCompositeMask node is introduced to apply a user-defined mask, forcing depth estimation to be computed only within the selected region corresponding to the target decorative element. By excluding the surrounding architectural context, irrelevant for subsequent modeling, this operation reduces superfluous depth indicators and limits unnecessary computational load, thereby improving both efficiency and the interpretability of the resulting depthmaps. Finally, a Save Image node ensures the simultaneous export of all generated depthmaps for subsequent evaluation. The set of image-to-depth engines reported in Table 1 was selected according to two complementary criteria: methodological representativeness and functional diversity. This is achieved by using widely adopted and actively maintained families of monocular depth networks, which are routinely employed in image-based reconstruction pipelines, ensuring that the comparison is grounded in state-of-practice solutions rather than isolated or short-lived prototypes. 
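The effect of this region-restriction step can be sketched in plain NumPy. This is an illustrative stand-in for the ImageCompositeMask node, not the node itself; the array shapes and the zero fill value are assumptions:

```python
import numpy as np

def mask_region(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Keep pixel values only inside the binary mask, mimicking the effect
    of restricting depth inference to the target decorative element."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match image height and width")
    keep = mask.astype(bool)
    out = np.full_like(image, fill)
    out[keep] = image[keep]
    return out

# toy example: 4x4 grayscale "image", mask selecting the central 2x2 block
img = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
masked = mask_region(img, mask)
```

In the actual pipeline the mask is applied inside ComfyUI before inference; the sketch only conveys the principle of excluding the surrounding architectural context from the computation.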
Functional diversity is addressed by intentionally including depth inference models whose differing assumptions affect the reliability of depth cues on heritage ornament: relative-depth predictors such as Depth Anything and Depth Anything v2 Relative, which typically preserve global depth ordering and contour continuity; models explicitly designed to better accommodate geometric discontinuities, exemplified by LeReS, which may be advantageous for plastic reliefs; absolute-depth approaches such as Metric3D, which introduce an explicit scale interpretation but are often more sensitive to noise in fine detail; and general-purpose baselines trained on heterogeneous datasets, including MiDaS and the Zoe variants, which contribute complementary priors, such as indoor/outdoor specialization and hybrid formulations, that can influence robustness under variable lighting, constrained viewpoints, and partial occlusions. The simultaneous execution of these models, starting from the same photographic input, produces heterogeneous yet formally comparable depthmaps, each expressing a distinct morphological interpretation of the subject. This diversity of outputs constitutes the basis for the subsequent stage, enabling both an initial visual inspection by the operator and the preparation of a dataset suitable for objective algorithmic evaluation, ultimately aimed at selecting the depthmap most appropriate for integration within the HBIM environment.

2.2. Automated Evaluation of Depthmaps

To overcome the subjective component inherent in visual assessments [42,43] and to introduce a stable and reproducible selection criterion [44], an automated algorithmic analysis stage was implemented through a Python (v. 3.12) script. In continuity with previously conducted and practically implemented experimental trials [45], this module, relying on established libraries for numerical computation and computer vision, ranks the generated depthmaps according to a hierarchy of morphological quality grounded in quantitative metrics. The procedure involves the computation of four independent computational indicators—contrast, edge sharpness, structural similarity, and depth distribution complexity (Table 2)—each normalized using a min–max approach to ensure scale consistency and comparability. This normalization prevents any single parameter from disproportionately influencing the overall evaluation, enabling the aggregation of heterogeneous contributions within a balanced and transparent scoring framework. Concretely, the script is implemented through a software stack in which numerical processing and metric computation are carried out using NumPy (v. 1.26.0), while image input and output operations, image resizing, and gradient-based operators such as Sobel and Laplacian filtering are handled through the OpenCV library (v. 4.12.0.88). Structural similarity is computed via the scikit-image framework, and reporting outputs, including CSV export and file management, are managed through standard Python libraries for filesystem handling. Summary plots used for inspection, comparison, and documentation purposes are generated using Matplotlib (v. 3.10.6).
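As an illustrative sketch of this evaluation stage, simplified NumPy-only stand-ins for three of the four indicators are given below, together with the min–max normalization. Structural similarity, computed via scikit-image in the actual script, is omitted, and the exact formulations used by the authors are not reproduced here:

```python
import numpy as np

def contrast(d: np.ndarray) -> float:
    # global contrast as the standard deviation of depth values
    return float(d.std())

def edge_sharpness(d: np.ndarray) -> float:
    # variance of a discrete Laplacian (simplified stand-in for the
    # OpenCV Laplacian operator used in the paper's script)
    lap = (-4 * d[1:-1, 1:-1] + d[:-2, 1:-1] + d[2:, 1:-1]
           + d[1:-1, :-2] + d[1:-1, 2:])
    return float(lap.var())

def depth_complexity(d: np.ndarray, bins: int = 32) -> float:
    # Shannon entropy of the depth histogram as a proxy for
    # depth-distribution complexity (depth assumed normalized to [0, 1])
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def minmax(values) -> np.ndarray:
    # min-max normalization across candidate depthmaps for scale consistency
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

# toy candidates: a horizontal ramp vs. a featureless constant map
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
flat = np.full((8, 8), 0.5)
ranking = minmax([contrast(ramp), contrast(flat)])
```

The constant map scores zero on contrast and histogram entropy, so after normalization the ramp dominates the comparison, which is the intended behavior of the scoring framework.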
The resulting values are subsequently combined through a weighted aggregation function, calibrated according to the sculptural nature of the analyzed object. This process yields a Weighted Synthetic Index, a single scalar measure that condenses the geometric and informational effectiveness of each depthmap, enabling the transparent, replicable and well-justified identification of the solution that is computationally most suitable for the subsequent parametric modeling stage.
The index is formalized by the following general expression:
WCI = Σ_{i=1}^{n} w_i · x_i
where:
WCI = Weighted Composite Index;
n = total number of computational indicators;
x_i = normalized value of the i-th indicator;
w_i = weight assigned to the i-th indicator.
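The aggregation can be sketched as follows. The weight values and the constraint that they sum to one are assumptions, since the calibrated weights are not reported in this section, and the model names are hypothetical:

```python
import numpy as np

def weighted_composite_index(x, w) -> float:
    """WCI = sum_i w_i * x_i over normalized indicator values x_i.
    Weights summing to 1 is an assumption, not stated in the paper."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights are expected to sum to 1")
    return float((w * x).sum())

# hypothetical weights emphasizing edge sharpness for a sculptural subject;
# each list holds four normalized indicator values per candidate depthmap
weights = [0.2, 0.4, 0.2, 0.2]
candidates = {"model_a": [0.8, 0.9, 0.5, 0.7],
              "model_b": [0.6, 0.4, 0.9, 0.8]}
scores = {name: weighted_composite_index(x, weights)
          for name, x in candidates.items()}
best = max(scores, key=scores.get)
```

The candidate with the highest WCI is forwarded to the parametric modeling stage.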

2.3. Parametric Modeling and HBIM Integration

The final stage of the workflow, dedicated to transforming the selected depthmap into an interoperable three-dimensional object, was developed within the Dynamo (v. 2.16) environment, a parametric modeling platform integrated into Autodesk Revit (v. 2023.1). Within the script, the bidimensional depth data are converted into continuous geometry that is compatible with and intelligible within the Revit model (Figure 2). Specifically, the depthmap, previously stabilized in its tonal range through preprocessing in ComfyUI, is treated as a sampled scalar field, in which each luminance value corresponds to a relative elevation. Based on this correspondence, a regular parametric grid is defined in the XY domain, whose nodes acquire depth values derived from luminance along the Z-axis. The resulting three-dimensional lattice provides a coherent approximation of the decorative apparatus morphology. A continuous NURBS surface is then interpolated over this lattice, producing geometry that can be exported as an object or embedded within a loadable family (Figure 3). The use of NURBS surfaces aligns with established Scan-to-BIM practices and facilitates the creation of adaptive families, enabling the controlled parametrization of complex ornamental geometries within the BIM environment. This enables integration into Revit as an information-rich element, to which chronological, material or typological parameters can be assigned. In this way, the workflow ensures a transparent and replicable methodological chain, from the raw depthmap to an information-enabled object integrated within the HBIM model.

2.3.1. Image Acquisition and Reading

The first stage of parametric modeling consists of transforming the selected depthmap into a structured set of numerical data interpretable by Dynamo. For this purpose, the image is imported using the file management nodes File From Path and Image.ReadFromFile, which convert the local file path into a native object recognized by the system. The subsequent decoding process, performed through image-reading functions, allows access to the internal structure of the file while preserving its pixel resolution, pixel matrix and channel organization. Prior to geometric transformation, a preliminary visual inspection is carried out using the Watch Image node to verify image integrity, gradient continuity and the absence of gaps or artifacts that could compromise model coherence. This check enables the identification of potential anomalies and, where necessary, their correction through filtering operations in subsequent stages, thereby strengthening the overall robustness of the pipeline.
Subsequently, through the Image.Pixel node, the image is treated as a sampled scalar matrix and converted into a grid of numerical values. In this way, each pixel becomes a discrete sample of the depth field and, as such, represents the minimum unit of information used for three-dimensional reconstruction. The conversion from color to luminance, performed via the Color.Brightness function, translates the RGB color triplet into a single, physically interpretable scalar value, which is assumed as a measure of relative depth and thus as the source of the future elevation along the Z-axis.
This matrix-based approach enables Dynamo to function as a rigorous translator between the image domain and the geometric domain, ensuring full process traceability and establishing the foundation for subsequent parametric sampling and surface interpolation operations.
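The luminance conversion can be sketched as follows. Rec. 601 luma weights are used here as a stand-in, since the exact formula behind Dynamo's Color.Brightness is not specified in the text:

```python
import numpy as np

def rgb_to_depth(rgb: np.ndarray) -> np.ndarray:
    """Collapse an RGB depthmap render to a single scalar depth channel.

    Rec. 601 luma weights are an assumption; Dynamo's Color.Brightness
    may apply a different formula."""
    weights = np.array([0.299, 0.587, 0.114])
    lum = rgb @ weights            # per-pixel luminance, 0..255
    return lum / 255.0             # normalized relative depth in [0, 1]

# a grayscale-encoded depthmap has R == G == B,
# so luminance reduces to the channel value itself
pixel = np.array([[[128.0, 128.0, 128.0]]])
depth = rgb_to_depth(pixel)
```

Each resulting scalar is then interpreted as the relative elevation feeding the Z-axis of the parametric grid.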

2.3.2. UV Grid Generation

To transition from the discrete pixel matrix to a continuous parametric domain, the depthmap is projected onto a regular grid of (u,v) parameters, topologically consistent with the source image and governed by an explicit sampling parameter. In practice, the depth image is first interpreted as a pixel-indexed sampling lattice (i,j) whose indices are mapped onto the normalized (u,v) domain, so that each UV entry corresponds univocally to a sample location in the source image. The sampling density, defined by the number N of subdivisions per side and adjustable via a Number Slider node, determines the resolution of the grid that supports subsequent geometric interpolation. This configuration enables controlled modulation of the trade-off between morphological fidelity and computational efficiency. Operationally, the parametric domain is normalized through a Code Block to the dimensionless interval [0, 1] along both directions, according to a uniform partitioning scheme of the form:
u_i = i/(N − 1),   v_j = j/(N − 1),   with i, j ∈ {0, …, N − 1}
In this way, the parametrization is decoupled from the pixel dimensions of the original image. The result is a scale-invariant representation in which the value of N can be varied without altering the relative metric, while simultaneously controlling the reconstructable spatial frequency as a function of the desired Level of Detail (LOD). Under this normalization, each UV sample corresponds deterministically to a location in the raster domain, allowing depth values to be sampled and transferred onto the parametric grid in a fully reproducible manner.
The resulting pair of lists (ui,vi) is then organized into a regular matrix through the UV.ByCoordinates node, which ensures a biunivocal correspondence between the normalized parametric domain and the discrete domain of samples derived from the image. This matrix preserves the spatial proportions of the original image, enabling the construction of a coherent parametric structure that can be efficiently queried by downstream geometric nodes. In other words, the UV matrix acts as an explicit topological scaffold: it preserves rectangular ordering and neighborhood relations required to generate an ordered grid of 3D points and to support stable NURBS interpolation downstream.
Since digital images conventionally adopt a row–column indexing system with the vertical axis oriented downwards, the pipeline includes a reconfiguration step to align the data structure with Dynamo’s Cartesian conventions. This is achieved through two operations: List.Reverse, which inverts the indexing along the vertical direction and List.Transpose which reorganizes the ordered list of lists into a matrix compatible with the structural expectations of the interpolation nodes. Together, these operations ensure full isomorphism between the topology of the image and that of the parametric grid, preventing inversions, mirrored surfaces or unintended discontinuities. Thanks to this alignment step, potential inconsistencies between image indexing and geometric ordering are prevented, which would otherwise propagate through the UV-to-XYZ mapping and result in rotated or mirrored reconstructions, even in the presence of correct depth sampling.
The outcome is a uniform lattice that explicitly defines the sampling step: denser when emphasizing micro-reliefs and high-frequency edges and coarser when a balance between fidelity and performance suggests a reduced density. This configuration provides fine-grained control over the reconstructable spatial frequency while simultaneously mitigating aliasing effects, stabilizing NURBS interpolation behavior and yielding well-conditioned knot vectors, an essential prerequisite for efficient integration within an HBIM model.
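A NumPy sketch of the normalized UV lattice and of the reindexing step (mirroring the described List.Reverse and List.Transpose operations) is given below; Dynamo's node semantics are approximated, not reproduced:

```python
import numpy as np

def uv_grid(n: int):
    """Uniform normalized lattice: u_i = i/(N-1), v_j = j/(N-1),
    decoupled from the pixel dimensions of the source image."""
    if n < 2:
        raise ValueError("at least two subdivisions per side are required")
    t = np.arange(n) / (n - 1)
    u, v = np.meshgrid(t, t, indexing="ij")   # u varies with i, v with j
    return u, v

def image_to_grid(depth: np.ndarray) -> np.ndarray:
    """Reorder a row-column image matrix (vertical axis pointing down)
    into Cartesian ordering: reverse the rows, then transpose."""
    return depth[::-1, :].T
```

Without the reordering, the reconstruction would come out mirrored or rotated even when the depth sampling itself is correct, which is exactly the inconsistency the alignment step prevents.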

2.3.3. Construction of the Three-Dimensional Geometry

During the geometric construction phase, the depth values derived from the selected depth map are introduced as the Z component and combined with the planimetric coordinates generated from the previously defined grid, producing an ordered array of three-dimensional points. Operationally, the Point.ByCoordinates node receives as input the lists of parameters u and v, which are converted into X and Y coordinates according to a unit scale or, when required, appropriately rescaled. In parallel, it receives the list of Z values, obtained from the depth map after min–max normalization and multiplication by a vertical scaling factor α, controlled via a slider. For each index (i,j) a point (xij, yij, zij) is generated, where zij = α dij, and dij ∈ [0, 1] represents the normalized depth value. The resulting set of points defines a three-dimensional lattice that is morphologically consistent with the decorative apparatus under investigation.
A continuous surface is then interpolated over this rectangular grid of points using the NurbsSurface.ByPoints node, which is preferred over the Mesh.ByPoints alternative due to its higher degree of compatibility with BIM-oriented workflows. A NURBS surface defined on a rectangular point grid ensures at least C1 continuity along the principal directions with standard degrees, supported by clamped knot vectors that enforce adherence at the boundaries of the lattice while maintaining a proportionate and well-conditioned number of degrees of freedom derived from uniform sampling. These properties translate into three main advantages within the Revit environment: reduced computational load, as the rational representation based on control points and weights is more compact than high-density polygonal meshes; improved operational stability in views, intersections and tagging operations typical of HBIM workflows; enhanced visual quality, owing to the regularity of the curvature field, which avoids the stepped effects and shading instabilities characteristic of non-adaptive triangulations.
From a control perspective, the combination of the number of samples N and the vertical scaling factor α provides a dual and complementary parametric lever. The former regulates the capturable spatial frequency, as an increased number of points improves geometric fidelity at the expense of higher computational cost, while the latter calibrates the vertical dynamic range, adapting the legibility of the relief to different representational contexts, from detailed drawings and as-built excerpts to overall views. The requirement of NurbsSurface.ByPoints for an ordered rectangular array further confirms the validity of the preliminary data reorganization steps, namely list reversal and transposition, as it ensures a deterministic and fully inspectable processing chain, a critical condition when the resulting object is intended to function as a schedulable family within an HBIM model. In this perspective, the NURBS surface represents the most coherent mathematical framework for producing a continuous, lightweight, and controllable geometry, in which morphological fidelity is proportional to its informational use, and sampling and scaling choices are explicitly defined and reproducible within the parametric workflow.
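The point-generation step can be sketched as follows, assuming unit XY spacing; z_ij = α·d_ij with min–max-normalized depth, as described above:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Build the ordered 3D lattice (x_ij, y_ij, z_ij) with z_ij = alpha * d_ij,
    where d_ij is the min-max normalized depth value, analogous to the
    Point.ByCoordinates step. Unit XY spacing is an assumption."""
    d = depth.astype(float)
    span = d.max() - d.min()
    d = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    n_rows, n_cols = d.shape
    x, y = np.meshgrid(np.arange(n_cols), np.arange(n_rows))  # x[i,j]=j, y[i,j]=i
    return np.dstack([x, y, alpha * d])   # shape (rows, cols, 3)

# toy 2x2 depthmap with a vertical scaling factor alpha = 10
pts = depth_to_points(np.array([[0.0, 2.0], [4.0, 8.0]]), alpha=10.0)
```

Raising alpha exaggerates the relief for legibility in overall views, while the grid density N governs the capturable spatial frequency, matching the dual parametric lever described above.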
At the conclusion of the interpolation phase, the generated surface can be exported in interoperable formats such as .obj, .ply or .stl, enabling its reuse in heterogeneous software environments or subsequent geometric processing stages. Within an HBIM context, the resulting geometry can be imported into Revit as DirectShape or encapsulated within a loadable family, prepared for the association of chronological, material, typological or conservation-related parameters. In this way, it becomes a fully integrable informational component of the model, adaptable to different application scenarios and varying levels of detail.

2.4. Validation

The validation of the proposed methodology was designed as a metric comparison between different 3D reconstruction approaches, all reduced to point-cloud representations in order to ensure a homogeneous and directly comparable analytical framework. The point cloud acquired through laser scanning was adopted as the primary metric reference, as it derives from a survey technique capable of delivering high geometric accuracy and can therefore be regarded as a proxy for the configuration closest to the physical reality of the object. On this reference basis, the point cloud obtained through photogrammetry and the one derived from the parametric modeling workflow developed in Dynamo were independently compared (Figure 4).
In the case of photogrammetry, although the final output is typically a triangulated mesh, the dense point cloud generated during the intermediate processing stage was deliberately selected for the comparison. This methodological choice allows for a cloud-to-cloud analysis, ensuring typological consistency among the datasets under examination. Similarly, for the Dynamo-based workflow, despite the final result being a continuous NURBS surface, validation was performed using the set of three-dimensional points derived from the preliminary parametric grid, corresponding to the step immediately preceding surface generation. In this way, the metric comparison is carried out between geometrically homogeneous entities, minimizing distortions related to different representational paradigms.
Before the metric analysis, a preliminary characterization of the point clouds was conducted to assess the number of points or vertices, assumed as a direct indicator of data density and computational weight. All comparisons were performed in CloudCompare (v. 2.13.2), an open-source software widely adopted for the inspection, alignment and metric analysis of point clouds and meshes [46,47]. Dataset alignment was achieved by bringing all point clouds into a common reference system, a prerequisite for computing geometrically meaningful distances. To this end, the registration procedure based on the Align (point pairs picking) command was employed. This approach enables an initial alignment through the manual selection of homologous point pairs between two datasets, designating one cloud as the “reference” (laser scanning) and the other as the “data” (photogrammetry or Dynamo). Distinctive and recognizable features, such as edges, molding extremities or characteristic ornamental details, were selected to establish correspondences, allowing the software to estimate a rigid three-dimensional transformation that minimizes the overall error across the selected pairs and produces a coherent initial superposition.
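The rigid transformation estimated from homologous point pairs can be sketched with the Kabsch algorithm; CloudCompare's internal solver is not reproduced here, but its point-pair registration addresses the same least-squares problem:

```python
import numpy as np

def rigid_align(data_pts: np.ndarray, ref_pts: np.ndarray):
    """Estimate the rigid transform (R, t) mapping homologous 'data' points
    onto 'reference' points in a least-squares sense (Kabsch algorithm)."""
    cd, cr = data_pts.mean(axis=0), ref_pts.mean(axis=0)
    # cross-covariance of the centered point sets
    H = (data_pts - cd).T @ (ref_pts - cr)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force a proper rotation (det(R) = +1)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cr - R @ cd
    return R, t

# synthetic check: displace a reference cloud by a known rotation and
# translation, then recover the transform from the point pairs
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
R0 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
data = ref @ R0.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_align(data, ref)
recovered = data @ R.T + t
```

In practice the homologous pairs are the manually picked ornamental features; the rigid model deliberately excludes scaling, so the metric content of each cloud is preserved.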
Once alignment was achieved, the metric analysis was carried out using the Compute Cloud-to-Cloud Distance (C2C) tool, which calculates for each point in the “data” cloud its distance to the nearest point in the “reference” cloud according to a proximity-based criterion. The output consists of a scalar field associated with the points, expressing local deviations between the reconstructions: lower values indicate close adherence to the reference geometry, while higher values highlight geometric discrepancies, typically concentrated in areas characterized by pronounced depth variations, discontinuities, occlusions, or reduced quality of the initial data. In addition to the chromatic visualization of deviations, the analysis provides descriptive statistics, such as minimum and maximum distances and dispersion indicators, enabling an objective comparison between reconstruction techniques and supporting the identification of conditions under which one methodology proves more reliable than another.
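A brute-force sketch of the C2C computation follows; CloudCompare accelerates the same nearest-neighbour search with an octree structure, and the toy clouds below are purely illustrative:

```python
import numpy as np

def cloud_to_cloud(data: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """For each point of the 'data' cloud, the Euclidean distance to its
    nearest point in the 'reference' cloud (brute-force version of the
    proximity criterion behind CloudCompare's C2C tool)."""
    diff = data[:, None, :] - reference[None, :, :]   # (n_data, n_ref, 3)
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

# toy clouds: reference from laser scanning, data from the compared pipeline
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
data = np.array([[0.0, 0.0, 0.1], [1.0, 0.2, 0.0]])
d = cloud_to_cloud(data, ref)
stats = {"min": float(d.min()), "max": float(d.max()), "mean": float(d.mean())}
```

The resulting scalar field is what drives both the chromatic deviation maps and the descriptive statistics used for the objective comparison between reconstruction techniques.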

3. Case Study

The Basilica of Santa Maria degli Angeli in Assisi, constructed from 1569 based on a design by Galeazzo Alessi [48], provides a particularly significant context for the comparative observation of decorative apparatuses due to the coexistence of heterogeneous visual languages. Alongside the sobriety typical of the Franciscan tradition, which characterizes the main nave and transept, the lateral chapels exhibit a richer plastic and chromatic vocabulary, concentrating sculptures, bas- and high-reliefs and pictorial interventions from different historical periods [49]. This heterogeneity makes the sanctuary an ideal testing ground for evaluating both the effectiveness and the replicability of the proposed workflow. Within this setting, three representative subjects were selected, chosen for the diversity of acquisition conditions, volumetric characteristics, and their relationship with the architectural space.
The first case concerns a framed canvas located adjacent to the altar dossal of Sant’Antonio da Padova in the left transept (Figure 5). As a fully accessible element, the possibility of a frontal viewpoint allows for the acquisition of an image that is nearly orthogonal to the plane of the artwork, a condition that is optimal for image-to-depth inference models. Moreover, the limited volumetric articulation, restricted primarily to the frame and a few plastic details, makes this object a suitable test case for assessing the behavior of the workflow on predominantly two-dimensional subjects.
The second subject is a high-relief positioned on the façade of the Basilica, at a considerable height above ground level (Figure 6). In this case, distance and the impossibility of close access make traditional survey methods feasible only through drone-based acquisition. For the present experimentation, however, the photograph was captured from ground level at a significant distance. This introduces a relevant source of variability into the pipeline, as the image is affected by reduced detail, perspective compression and a diminished definition of plastic planes.
The third element is a sculptural figure placed within a niche inside the Chapel of San Giovanni Battista along the right nave (Figure 7). Due to its elevated position relative to the floor level and the limited dimensions of the chapel, the acquisition viewpoint is inherently constrained and unfavorable, preventing perpendicular framing. The restricted space does not allow sufficient distance from the subject, resulting in unavoidable perspective distortions and partial visibility conditioned by the depth of the niche itself. Compared to the other subjects, this case also exhibits greater volumetric complexity, with pronounced depth variations and articulated plastic forms that challenge the model’s capacity to infer complex three-dimensional geometries.
Beyond the different acquisition conditions, the three selected elements reflect progressively increasing levels of formal articulation. The framed canvas, characterized by an almost two-dimensional configuration, allows verification of the pipeline’s behavior on bas-relief subjects; the high-relief, maintaining a direct relationship with the wall surface, introduces an intermediate degree of three-dimensionality; the niche-contained statue, fully plastic and volumetric, represents the most challenging scenario for automatic depth generation.
The heterogeneous characteristics of the selected subjects delineate a complex experimental framework in which the methodology can be tested under markedly different conditions. Variations in shooting distance, the nature of decorative details and the depth of plastic elements enable observation of how the workflow responds to non-uniform configurations that often diverge from ideal assumptions. Rather than constituting a limitation, this diversity provides an opportunity to assess the flexibility of the pipeline and its capacity to adapt to concrete and heterogeneous application scenarios.

3.1. Framed Canvas of Sant’Antonio da Padova Altar

In the first application case, the methodology was tested on a framed canvas characterized by a combination of planar surfaces and localized moldings, an element well suited to assessing the pipeline’s ability to detect micro-reliefs and subtle depth variations.
The algorithmic evaluation reveals a clear differentiation among the tested models, with parameters related to structural similarity and depth distribution proving particularly effective in discriminating the most coherent solutions. Within the evaluated set, Depth Anything v2 Relative emerges as the best-performing model (WCI = 0.824), owing to a stable balance between contrast (0.208), high SSIM (0.831) and a depth-variation mapping that is sufficiently articulated while remaining free of noise (0.488). By contrast, alternative depthmaps exhibit specific shortcomings: LeReS tends to attenuate micro-reliefs in frames and corbels; Metric3D excessively emphasizes peripheral gradients; and the various Zoe models introduce altimetric oscillations that compromise interpolation regularity (Figure 8).
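The ranking logic behind the synthetic index can be sketched as a weighted sum of normalized indicators. The weights below are placeholders, so the sketch does not reproduce the WCI figures reported in the text; the indicator scores for Depth Anything v2 Relative are those quoted above, while the second candidate is hypothetical:

```python
# Placeholder weights: the actual values are defined in the methodology
# and are not reproduced here.
WEIGHTS = {"contrast": 0.25, "ssim": 0.40, "depth_distribution": 0.35}

def wci(scores):
    """Weighted composite index over normalized indicators in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

candidates = {
    # indicator scores quoted for the framed-canvas case (Section 3.1)
    "Depth Anything v2 Relative": {"contrast": 0.208, "ssim": 0.831,
                                   "depth_distribution": 0.488},
    # hypothetical scores for a weaker depthmap
    "model_B": {"contrast": 0.15, "ssim": 0.40,
                "depth_distribution": 0.30},
}
best = max(candidates, key=lambda name: wci(candidates[name]))
```

Whatever the weighting, the selection step itself is a deterministic arg-max over the candidate depthmaps, which is what makes the pipeline repeatable.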
The depthmap produced by Depth Anything v2 Relative yields a continuous tonal progression that preserves the relationship between the central field and the ornamental apparatus, enabling the generation of an artifact-free NURBS surface and a formally faithful reconstruction of the frame within the HBIM model (Figure 9).
CloudCompare results (Figure 10) highlight marked differences in both metric accuracy and data “weight”. Specifically, the point cloud generated via Dynamo comprises 45,564 points, compared to 859,122 points for the photogrammetric cloud, indicating a substantial reduction in geometric complexity and a potential advantage in terms of model lightness and manageability within HBIM environments. However, cloud-to-cloud distance analysis shows that this simplification entails a loss of metric adherence to the reference. The laser scanner–Dynamo comparison yields a mean distance of 0.0260 with a standard deviation of 0.0307, whereas the laser scanner–photogrammetry comparison achieves better results, with a mean distance of 0.0105 and a standard deviation of 0.0097. These findings confirm that, for this case study, photogrammetry provides a reconstruction closer to the reference in terms of both average deviation and overall stability, while the Dynamo-based model is less accurate and locally more variable.
The spatial interpretation of the chromatic distance maps further supports this assessment: the most pronounced discrepancies are concentrated along the frame profiles and in areas with moldings or more pronounced thickness variations, where three-dimensional articulation is more significant relative to the canvas plane. Conversely, areas closer to planar geometry exhibit smaller deviations, indicating that the Dynamo pipeline is more reliable when morphology is continuous and characterized by low offsets, while performance degrades in correspondence with plastic details and geometric discontinuities. Photogrammetry proves metrically superior but requires a much denser dataset; the SIDe-HBIM pipeline, by contrast, produces a markedly lighter output at the cost of increased mean deviation and variability relative to the laser scanner. From an HBIM perspective, these results suggest that the proposed method is suitable when the objective is an informative, manageable, non-realistic representation of predominantly bidimensional elements, whereas photogrammetry remains preferable when maximum metric fidelity is required, particularly in the presence of highly projecting ornamental details.

3.2. Façade High Relief

The second subject concerns a high-relief decorative element consisting of a plastic festoon set within a rectangular field. Its morphology, characterized by limited depth and a high frequency of repeated elements, such as acorns, ribbons and pendants, poses a challenge for models that are less sensitive to bas-relief variations.
The evaluation of the depthmaps generated by the different AI models shows that several engines tend either to excessively flatten shadowed areas or, conversely, to introduce unjustified altimetric gradients along the festoon profile. Within this scenario, MiDaS Depth Map emerges as the most suitable model (WCI = 0.480), primarily due to an excellent depth distribution score (0.862), indicative of an accurate reading of height variations, albeit with lower SSIM values compared to other models. Depth Anything and Zoe, by contrast, exhibit greater instability, with a tendency toward value saturation in the central areas of the panel (Figure 11).
The selected depthmap enables a regular reconstruction of the drapery curvature and of the modular rhythm of the suspended elements, preserving a clear distinction between the background plane and the sculptural relief. This facilitates NURBS interpolation without undesired altimetric oscillations. The resulting model respects the high-relief nature of the artifact, ensuring sufficient legibility for HBIM integration and for documentary and comparative purposes (Figure 12).
Validation results (Figure 13) reveal a behavior distinct from that observed in the framed canvas case. Although the element features moderate relief and a strong relationship with a supporting plane, photogrammetry fails to achieve the expected level of metric reliability, whereas the Dynamo-based output exhibits a lower mean deviation and a comparable error distribution. Quantitatively, the laser scanner–Dynamo comparison yields a mean distance of 0.0294 with a standard deviation of 0.0231, while the laser scanner–photogrammetry comparison shows a worse mean distance (0.0392) with a similar standard deviation (0.0224). A further relevant aspect concerns the nature and consistency of the generated data: the Dynamo point cloud comprises 36,846 points, representing a relatively dense and regular sampling in relation to the subject’s morphological simplicity; the photogrammetric cloud, by contrast, contains only 8147 points, suggesting a less complete and less robust reconstruction, likely affected by acquisition conditions.
The interpretation of the chromatic distance maps corroborates the numerical analysis: in the Dynamo solution, deviations appear more controlled and evenly distributed, whereas photogrammetry displays areas with larger discrepancies and more pronounced local discontinuities. This case study supports the notion that the SIDe-HBIM pipeline can be competitive and, under certain conditions, even more effective than photogrammetry, particularly when the object is a bas-relief or high-relief element with moderate depth and when photographic acquisition does not allow for an image set adequate for conventional photogrammetric reconstruction.

3.3. Sculptural Figure in a Niche in the Chapel of San Giovanni Battista

The third case study is characterized by a strong spatial articulation between the sculptural figure, the molded pedestal and the concave apsidal background. The presence of projecting volumes, such as the body and drapery, together with a deeply recessed backdrop constitutes a demanding test case for assessing the ability of AI-based models to preserve hierarchical depth relationships among multiple planes.
Also in this case, the quantitative comparison indicates a clear preference for Depth Anything v2 Relative (WCI = 0.608), which combines balanced contrast (0.302), a satisfactory SSIM value (0.250) and, most importantly, a wide and well-structured depth distribution (0.808). This configuration allows the model to capture both the separation between the statue and the niche and the altimetric progression of the vault. By contrast, Metric3D tends to rigidify the gradient between the figure and the background, LeReS flattens the upper concavity, and several Zoe variants introduce noise in peripheral areas (Figure 14).
The selected depthmap provides a continuous tonal progression along the drapery and a clear volumetric separation between the body and the background, enabling the generation of a morphologically coherent NURBS surface, free from local distortions and suitable for typological and information-oriented representation (Figure 15).
When compared against the laser scanner point cloud taken as the metric reference, photogrammetry achieves a lower mean deviation (0.0161) than the Dynamo-based reconstruction (0.0208) and also exhibits a more limited dispersion of errors, with a standard deviation of 0.0130 versus 0.0201 (Figure 16).
This indicates that the photogrammetric model is not only closer to the reference on average but also locally more stable, with less spatial variability in the error distribution. Point density further reflects the different capacities to describe a complex object: the photogrammetric point cloud contains 172,081 points, whereas the Dynamo-derived cloud includes 57,264 points. In this case, the higher density does not merely represent an increased data “weight” but corresponds to a greater ability to capture volumetric articulations, morphological discontinuities and depth variations typical of a sculptural figure in a niche. While the Dynamo cloud yields a lighter and more manageable model, it inevitably simplifies the most complex areas, particularly recessed regions, undercuts and portions that are poorly readable from a single viewpoint.
The spatial distribution of deviations, as shown by the chromatic distance maps, is consistent with this interpretation: in the Dynamo solution, higher deviations are concentrated in zones of pronounced volumetric articulation and where real depth increases rapidly, whereas photogrammetry maintains greater uniformity and closer adherence in central regions and along the main volumes. This case study confirms that the SIDe-HBIM pipeline is less effective when the object cannot be reduced to a relief directly associated with a planar support but instead requires a full reconstruction of three-dimensional form. Under such conditions, multi-view photogrammetry more effectively captures the geometry and reduces both mean deviation and error variability with respect to the laser scanner reference.

4. Discussion

The proposed method introduces a repeatable and controllable workflow that, starting from a single photograph, enables the generation of a reliable three-dimensional reconstruction that is, above all, sufficient for HBIM purposes oriented toward non-photorealistic but information-driven modeling. The approach performs best when the object can be interpreted as a bas-relief or high-relief element with moderate offsets and without significant undercuts, particularly in situations where limited accessibility reduces the effectiveness of traditional survey techniques. In the case of the façade high-relief, the most favorable conditions for the methodology are observed, together with a context in which photogrammetry tends to degrade significantly for operational reasons. The elevated position and the difficulty of close-range acquisition reduce the quality of photogrammetric data, whereas the proposed workflow is able to exploit precisely those morphologies characterized by limited relief and a strong relationship with a background plane. The absence of deep cavities and fully occluded portions allows for a coherent reconstruction of the main geometries and of micro-reliefs relevant to the interpretation of the decorative apparatus, even though elements with greater projection are not reproduced with the same fidelity achievable through accurate photogrammetric surveys. In the case of the framed canvas, validation yields an intermediate condition, as the object is predominantly bidimensional, with three-dimensional articulation concentrated in the frame and localized moldings, demonstrating the effectiveness of the pipeline for documentary and information-oriented HBIM purposes. 
Finally, the case of the sculptural figure in a niche highlights the principal limitation of the pipeline, which emerges when the analyzed object does not maintain a direct relationship with a reference plane but instead constitutes a fully plastic sculpture, characterized by projecting volumes, recesses and a strong articulation of depth planes. Under these conditions, reconstruction from a single image fails to capture large offsets and, above all, cannot correctly reconstruct portions that are not visible or are occluded by the geometry itself, such as inner parts of the niche or areas in deep shadow.
As particularly evidenced by the three case studies, the proposed pipeline proves effective in the automated representation of decorative elements with moderate relief, coherent with a supporting plane and often located in difficult-to-access positions. This effectiveness derives from a balanced compromise between morphological rendering, production speed and integration within an HBIM environment, while also enabling rapid and direct data acquisition even in the absence of specialized surveying equipment. Conversely, when the object exhibits high three-dimensionality, pronounced depth variation and portions not visible from a single viewpoint, the methodology loses effectiveness in a structural manner, and photogrammetry remains the most reliable approach for a complete and faithful reconstruction. In this sense, the experimentation not only confirms the robustness of the workflow for bas-reliefs and shallow decorative apparatuses, but also clearly defines the application domain within which the results can be considered solid and repeatable.
The SIDe-HBIM pipeline exhibits a structural limitation related to the calibration of the vertical depth scale along the Z-axis during the parametric modeling phase. In the described workflow, the vertical amplification of depth values is regulated through a continuous slider within the Dynamo environment, allowing the operator to modulate the intensity of the relief according to representational needs and the application context. Although this parameter does not affect the internal structural coherence of the geometry derived from the selected depthmap, it implies that the final vertical calibration is not entirely data-driven.
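The role of the slider can be sketched as a single multiplicative factor applied when the depthmap is lifted into a point grid. Names and values below are illustrative, not the actual Dynamo definition:

```python
import numpy as np

def depthmap_to_points(depth, pixel_size, z_scale):
    """Convert a normalized depthmap (values in [0, 1]) into a regular
    grid of 3D points. 'z_scale' plays the role of the Dynamo slider
    that amplifies the relief along Z: operator-chosen, not data-driven."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w) * pixel_size,
                         np.arange(h) * pixel_size)
    zs = depth * z_scale          # the only vertically calibrated term
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)

# A 4x4 synthetic depthmap with a raised central patch.
depth = np.zeros((4, 4))
depth[1:3, 1:3] = 1.0
pts = depthmap_to_points(depth, pixel_size=0.01, z_scale=0.05)
```

Because every Z value is multiplied by the same factor, changing the slider rescales the relief uniformly without altering the relative ordering of the depth planes, which is why the internal coherence of the geometry is preserved.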
A further relevant consideration concerns the integration of the generated objects into the HBIM model, which varies according to their relationship with the supporting wall. Additive decorative elements can be effectively managed through host-based loadable families, whereas subtractive elements, such as niches and recesses, require the introduction of void geometries, increasing the probability of interferences and potentially reducing stratigraphic coherence within the HBIM integration process.

5. Conclusions

The proposed workflow is conceived as an open, replicable and deterministic infrastructure, designed to accommodate further methodological developments while serving as a flexible and reliable tool for the informational integration of decorative architectural elements within HBIM models. Starting from a single photographic input, the method establishes a repeatable and controlled workflow capable of generating three-dimensional reconstructions that are reliable and, above all, sufficient for HBIM applications oriented toward non-photorealistic yet information-driven modeling. This feature constitutes a significant operational advantage over photogrammetry, which typically requires structured image datasets, stricter acquisition conditions and longer processing times, especially when decorative elements are located in awkward or difficult-to-access positions. Within this framework, the proposed approach offers an effective alternative when the primary objective is the rapid integration of geometrically coherent components into the model, maintaining a balanced trade-off between formal quality, acquisition effort and process replicability.
With respect to the level of detail, the methodology is intentionally positioned within a range between LOD 300 and LOD 350, making it suitable for documentation, analytical and information-management purposes rather than for high-precision metric restitution. The workflow proves especially effective for decorative apparatuses characterized by moderate relief, where morphological legibility and volumetric coherence are prioritized over absolute dimensional accuracy. In this sense, the pipeline occupies an intermediate position between manual geometric simplification and high-density survey techniques, providing a level of detail that is adequate for HBIM integration without introducing excessive geometric or computational complexity.
The proposed pipeline also opens up several avenues for future development. One potential advancement concerns the automation of depth calibration along the Z-axis through iterative optimization strategies driven by quantitative metrics, for instance derived from CloudCompare analyses. Such an enhancement would further reduce operator-dependent decisions, strengthening the methodological robustness of the workflow and facilitating its application in large-scale or systematic HBIM contexts. Another promising direction involves the integration of the generated geometries into HBIM environments through the definition of parametric families. Converting depthmap-based geometries into information-rich components enables a shift beyond purely geometric representations, allowing the attribution of properties, metadata and customized parameters that significantly enhance their cognitive and managerial value. In this perspective, parametric families become the synthesis point between reconstructed form and structured information, ensuring that decorative elements remain interrogable, updatable and semantically consistent within the overall model. Further research may also address the development of hybrid families capable of integrating both additive and subtractive geometries, or modeling strategies that explicitly relate decorative elements to their architectural hosting systems.
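The data-driven Z calibration envisaged above could, for instance, be prototyped as a one-dimensional minimization of the mean cloud-to-cloud distance against the metric reference. The sketch below is a proof of concept on synthetic data, with illustrative names and bounds, not the implemented workflow:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.spatial import cKDTree

def calibrate_z(base_xy, depth_norm, reference, bounds=(0.0, 0.5)):
    """Find the Z amplification that minimizes the mean C2C distance
    to a metric reference cloud, replacing the manual slider.
    'base_xy' is the (N, 2) planar grid and 'depth_norm' the
    normalized depth of each point."""
    tree = cKDTree(reference)

    def mean_c2c(z_scale):
        cloud = np.column_stack([base_xy, depth_norm * z_scale])
        return tree.query(cloud, k=1)[0].mean()

    res = minimize_scalar(mean_c2c, bounds=bounds, method="bounded")
    return res.x

# Synthetic check: the reference was built with z_scale = 0.12.
rng = np.random.default_rng(2)
xy = rng.random((300, 2))
depth = rng.random(300)
reference = np.column_stack([xy, depth * 0.12])
z_hat = calibrate_z(xy, depth, reference)
```

On real data the objective would be evaluated against the laser-scanner cloud after alignment, so the recovered factor would inherit the reference's metric scale rather than the operator's judgment.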
Finally, the validity of the method is reflected in the geometric efficiency of its outputs. The conversion of depthmaps into continuous surfaces results in models with discretized complexity that is generally lower than that of dense point clouds and highly triangulated meshes produced by alternative approaches. This leads to a reduced number of vertices and, consequently, to lighter models that are more stable during navigation and less demanding in terms of processing, visualization and management within an HBIM ecosystem.

Author Contributions

Conceptualization, C.C., F.C. and A.M.; methodology, C.C., F.C. and A.M.; software, C.C., F.C. and A.M.; validation, C.C., F.C. and A.M.; formal analysis, C.C., F.C. and A.M.; investigation, C.C., F.C. and A.M.; resources, C.C., F.C. and A.M.; data curation, C.C., F.C. and A.M.; writing—original draft preparation, F.B., M.F., C.C., F.C. and A.M.; writing—review and editing, C.C., F.C. and A.M.; visualization, C.C., F.C. and A.M.; supervision, F.B. and M.F.; project administration, F.B. and M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon reasonable request from the corresponding author.

Acknowledgments

During the preparation of this work, the authors used ComfyUI and Stable Diffusion to produce depthmaps, strictly within the methodological framework described in the paper. In addition, large language models were employed at a later stage solely for language revision and grammatical refinement of the manuscript. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Parrinello, S.; Picchio, F. Digital Strategies to Enhance Cultural Heritage Routes: From Integrated Survey to Digital Twins of Different European Architectural Scenarios. Drones 2023, 7, 576. [Google Scholar] [CrossRef]
  2. Bianchini, C. Survey 2.0: New technologies, new equipment, new surveyors? In Italian Survey & International Experience, Proceedings of the 36th International Conference of Teachers of the Representation UID, Parma, Italy, 18–20 September 2014; Giandebiaggi, P., Vernizzi, C., Eds.; Gangemi Editore: Roma, Italy, 2014. [Google Scholar]
  3. Yang, X.; Grussenmeyer, P.; Koehl, M.; Macher, H.; Murtiyoso, A.; Landes, T. Review of built heritage modelling: Integration of HBIM and other information techniques. J. Cult. Herit. 2020, 46, 350–360. [Google Scholar] [CrossRef]
  4. Del Giudice, M.; Osello, A. Bim for Cultural Heritage. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2013, XL-5/W2, 225–229. [Google Scholar] [CrossRef]
  5. Bianconi, F.; Filippucci, M.; Parisi, A.; Battaglini, S. HBIM per la gestione della pubblica amministrazione, Il caso studio di Palazzo Vitelli alla Cannoniera, HBIM for public administration management—The case study of Palazzo Vitelli alla Cannoniera. DN 2021, 9, 22–23. [Google Scholar]
  6. Gupta, P.; Ding, B.; Guan, C.; Ding, D. Generative AI: A systematic review using topic modelling techniques. Data Inf. Manag. 2024, 8, 100066. [Google Scholar] [CrossRef]
  7. Zhuang, X.; Zhu, P.; Yang, A.; Caldas, L. Machine learning for generative architectural design: Advancements, opportunities, and challenges. Autom. Constr. 2025, 174, 106129. [Google Scholar] [CrossRef]
  8. Necula, S.C.; Păvăloaia, V.D. AI-Driven Recommendations: A Systematic Review of the State of the Art in E-Commerce. Appl. Sci. 2023, 13, 5531. [Google Scholar] [CrossRef]
  9. Yang, S.; Ma, H.; Li, N.; Xu, S.; Guo, F. Energy-Saving Design Strategies for Industrial Heritage in Northeast China Under the Concept of Ultra-Low Energy Consumption. Energies 2025, 18, 1289. [Google Scholar] [CrossRef]
  10. Zhou, E.; Lee, D. Generative artificial intelligence, human creativity, and art. PNAS Nexus 2024, 3, pgae052. [Google Scholar] [CrossRef]
  11. Frické, M. The knowledge pyramid: A critique of the DIKW hierarchy. J. Inf. Sci. 2009, 35, 131–142. [Google Scholar] [CrossRef]
  12. Aparicio, G. Data-Insight-Driven Project Delivery: Approach to Accelerated Project Delivery Using Data Analytics, Data Mining and Data Visualization. In ACADIA 2017: Disciplines & Disruption, Proceedings of the 37th Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA), Cambridge, MA, USA, 2–4 November 2017; MIT Press: Cambridge, MA, USA, 2017. [Google Scholar]
  13. Memon, S.A.; Shehata, W.; Rowlinson, S.; Sunindijo, R.Y. Generative Artificial Intelligence in Architecture, Engineering, Construction, and Operations: A Systematic Review. Buildings 2025, 15, 2270. [Google Scholar] [CrossRef]
  14. Paoletti, I. Informed Architecture: Computational Strategies in Architectural Design; Springer: Cham, Switzerland, 2018. [Google Scholar]
  15. Baroš, T.; Kabošová, L.; Baros, M.; Katunský, D. Experimental Form-finding: A review. IOP Conf. Ser. Mater. Sci. Eng. 2022, 1252, 012002. [Google Scholar] [CrossRef]
  16. Adriaenssens, S.; Block, P.; Veenendaal, D.; Williams, C. Shell Structures for Architecture: Form Finding and Optimization; Routledge: New York, NY, USA, 2014. [Google Scholar]
  17. Docci, M. Metodi e Tecniche Integrate di Rilevamento per la Realizzazione di Modelli Virtuali Dell’architettura Della Città; Gangemi: Rome, Italy, 2007. [Google Scholar]
  18. Chachava, N.; Lekveishvili, M.; Mikadze, G.; Lekveishvili, N.; Sulashvili, G.; Sulashvili, V. The Role of 3D Laser Scanning in Historical Building Stock Analysis and Its Conceptual Development by the Method of Twinning Adaptation. In Reliability and Statistics in Transportation and Communication, Proceedings of the 23rd International Multidisciplinary Conference on Reliability and Statistics in Transportation and Communication: Digital Twins—From Development to Application, RelStat-2023, Riga, Latvia, 19–21 October 2023; Kabashkin, I., Yatskiv, I., Prentkovskis, O., Eds.; Springer: Cham, Switzerland, 2024; pp. 333–341. [Google Scholar] [CrossRef]
  19. Bertocci, S.; Cioli, F.; Cottini, A. Unlocking cultural heritage: Leveraging georeferenced tools and open data for enhanced cultural tourism experiences. In Proceedings of the 20th International Conference Culture and Computer Science: Code and Materiality, KUI ’23, Lisbon, Portugal, 28–29 September 2023; ACM: New York, NY, USA, 2023; pp. 1–9. [Google Scholar] [CrossRef]
  20. Thomson, C.; Boehm, J.; Remondino, F.; Gonzalez-Aguilera, D.; Lorenzo, H.; Kerle, N.; Thenkabail, P.S. Automatic Geometry Generation from Point Clouds for BIM. Remote Sens. 2015, 7, 11753–11775. [Google Scholar] [CrossRef]
  21. Nieto Julián, J.E.; Lara, L.; Moyano, J. Implementation of a TeamWork-HBIM for the Management and Sustainability of Architectural Heritage. Sustainability 2021, 13, 2161. [Google Scholar] [CrossRef]
  22. de Rubertis, R. Il modello conoscitivo e le “sue” rappresentazioni. In Il Rilievo Dall’architettura Concreta Al Suo Modello Immateriale; Soletti, Ed.; Centro di Stampa dell’Università degli Studi di Perugia: Perugia, Italy, 1995. [Google Scholar]
  23. Migliari, R. Disegno Come Modello—Riflessioni Sul Disegno Nell’era Informatica. In Per Una Teoria del Rilievo Architettonico; Edizioni Kappa: Rome, Italy, 2004. [Google Scholar]
  24. Croce, V.; Caroti, G.; Piemonte, A.; De Luca, L.; Véron, P. H-BIM and Artificial Intelligence: Classification of Architectural Heritage for Semi-Automatic Scan-to-BIM Reconstruction. Sensors 2023, 23, 2497. [Google Scholar] [CrossRef]
  25. Tiribelli, S.; Pansoni, S.; Frontoni, E.; Giovanola, B. Ethics of Artificial Intelligence for Cultural Heritage: Opportunities and Challenges. IEEE Trans. Technol. Soc. 2024, 5, 293–305. [Google Scholar] [CrossRef]
  26. Bianconi, F.; Filippucci, M.; Stranieri, E. Nerf VS fotomodellazione: Confronto tra tecniche di misurazione senza contatto per il rilievo architettonico. In 3D MODELING & BIM Nuove Evoluzioni; Empler, T., Caldarone, A., Fusinetti, A., Eds.; DEI s.r.l. Tipografia del Genio Civile: Rome, Italy, 2024; pp. 54–67. [Google Scholar]
  27. Musicco, A.; Buldo, M.; Rossi, N.; Tavolare, R.; Verdoscia, C. Enhancing 3D modeling efficiency via semi-automatic point cloud segmentation and multi-LOD mesh reconstruction. SCIRES-IT 2024, 14, 233–250. [Google Scholar] [CrossRef]
  28. Visutsak, P.; Liu, X.; Choothong, C.; Pensiri, F. SIFT-Based Depth Estimation for Accurate 3D Reconstruction in Cultural Heritage Preservation. Appl. Syst. Innov. 2025, 8, 43. [Google Scholar] [CrossRef]
Figure 1. Node structure used for the parallel generation of depthmaps from a single input image.
Figure 2. Node organization used for generating three-dimensional geometry from the input depthmap.
Figure 3. Operational sequence for converting the depthmap into parametric three-dimensional geometry.
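As a minimal sketch of this conversion step (illustrative only; the paper's actual workflow runs as a visual-programming node graph, and all names and parameters below are assumptions), a normalized depthmap can be mapped onto a regular grid of 3D points that then serves as the control lattice for a NURBS surface:

```python
import numpy as np

def depthmap_to_control_points(depth, width_m, height_m, relief_m):
    """Map a normalized depthmap onto a regular grid of 3D points sized to
    the element's real-world extents. The resulting lattice can serve as
    the control-point grid from which a NURBS surface is built.
    Parameter names are illustrative, not the paper's API."""
    rows, cols = depth.shape
    # Renormalize to [0, 1] to guard against flat or offset inputs
    d = (depth - depth.min()) / max(np.ptp(depth), 1e-9)
    xs = np.linspace(0.0, width_m, cols)
    ys = np.linspace(0.0, height_m, rows)
    x, y = np.meshgrid(xs, ys)
    z = d * relief_m  # depth value drives the relief height
    return np.stack([x, y, z], axis=-1)  # shape: (rows, cols, 3)

# Example: a 4x4 synthetic depthmap for a 2 m x 1 m panel with 5 cm relief
pts = depthmap_to_control_points(np.random.rand(4, 4), 2.0, 1.0, 0.05)
```

In practice the extents and maximum relief would be taken from survey measurements, so that the inferred relative depths are scaled into metric space before BIM integration.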
Figure 4. Validation procedure based on metric comparison between point clouds generated through different reconstruction processes.
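The metric comparison underlying this validation can be sketched as a cloud-to-cloud nearest-neighbour deviation, a simplified brute-force stand-in for the distance computation of dedicated tools such as CloudCompare (an assumed workflow, not the authors' exact procedure):

```python
import numpy as np

def cloud_to_cloud_deviation(test_cloud, reference_cloud):
    """For each point of the test cloud, the distance to its nearest
    neighbour in the reference cloud (brute force; adequate for small
    clouds). Returns the per-point deviations plus their mean and
    standard deviation, i.e. the scalar field behind a deviation map."""
    diffs = test_cloud[:, None, :] - reference_cloud[None, :, :]
    dists = np.linalg.norm(diffs, axis=2).min(axis=1)
    return dists, float(dists.mean()), float(dists.std())

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
test = ref + np.array([0.0, 0.0, 0.01])  # uniform 1 cm offset along z
dev, mean_dev, std_dev = cloud_to_cloud_deviation(test, ref)
```

A uniform 1 cm offset yields a constant deviation field; on real data the same statistics summarize how closely the AI-derived surface tracks the photogrammetric or laser-scanned reference.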
Figure 5. Framed canvas in the left transept, with three-dimensional components limited to marginal ornamental profiles. Data acquired under optimal framing conditions.
Figure 6. High-relief located in the upper portion of the façade, characterized by intermediate three-dimensionality and a direct relationship with the masonry surface. Data acquired from ground level under unfavorable distance conditions.
Figure 7. Sculptural figure in a niche within the Chapel of Saint John the Baptist, exhibiting high volumetric complexity and articulated depth. Data acquired under constrained, non-orthogonal framing conditions.
Figure 8. Comparative overview of depthmaps generated by different AI models applied to the framed canvas, with corresponding numerical indicators used in the algorithmic evaluation.
Figure 9. Gradual progression from input image to three-dimensional geometry for the framed canvas.
Figure 10. Validation results for the framed canvas, including scalar deviation maps and associated quantitative parameters.
Figure 11. Comparative overview of depthmaps generated by different AI models applied to the façade high relief, with corresponding numerical indicators used in the algorithmic evaluation.
Figure 12. Gradual progression from the input image to the three-dimensional geometry for the façade high relief.
Figure 13. Validation outcomes for the façade high relief, including scalar deviation maps and associated quantitative parameters.
Figure 14. Comparative overview of depthmaps generated by different AI models applied to the sculptural figure, with corresponding numerical indicators used in the algorithmic evaluation.
Figure 15. Gradual progression from the input image to the three-dimensional geometry for the sculptural figure.
Figure 16. Validation outcomes for the sculptural figure, including scalar deviation maps and associated quantitative parameters.
Table 1. AI image-to-depth models used in the experimentation and their main functional characteristics.
Image-to-Depth Model | Description
Depth Anything | Versatile and robust in preserving contours; well suited to architectural geometries.
Depth Anything v2 Relative | Transformer-based model for relative depth estimation; suitable for complex shapes with partial occlusion.
LeReS DepthMap | Sensitive to geometric discontinuities; useful in the presence of pronounced sculptural relief.
Metric 3D DepthMap | Generates absolute depths; useful in indoor contexts but more susceptible to noise in fine details.
MiDaS | Effective under complex lighting and strongly varied surface textures.
Zoe Depth Anything (indoor) | Trained on a contextualized dataset; mainly suited to subjects in enclosed spaces.
Zoe Depth Anything (outdoor) | Trained on a contextualized dataset; mainly suited to subjects in open spaces.
Zoe DepthMap | Hybrid version balancing general performance with local sensitivity to form.
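The parallel evaluation of the models above can be organized as in the following sketch, where trivial stub functions stand in for the actual networks (each of which would need its own inference back-end; nothing here reflects the real models' APIs):

```python
import concurrent.futures as cf
import numpy as np

def run_models_in_parallel(image, models):
    """Run several image-to-depth estimators on the same input image
    concurrently, mirroring the parallel node structure of the pipeline.
    `models` maps a model name to a callable image -> depthmap."""
    with cf.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, image) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}

# Stub estimators standing in for the networks of Table 1 (illustrative only)
stub_models = {
    "Depth Anything": lambda img: img.mean(axis=2),
    "MiDaS": lambda img: 1.0 - img.mean(axis=2),
}
image = np.random.rand(8, 8, 3)
depthmaps = run_models_in_parallel(image, stub_models)
```

Collecting all candidate depthmaps into one dictionary keeps the downstream ranking step (the weighted synthetic index of Table 2) independent of how each individual estimator is invoked.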
Table 2. Computational indicators underlying the quantitative evaluation of the weighted synthetic index.
Computational Indicator | Description
Contrast | Standard deviation of depth values, indicative of the object's topographic variation.
Edge preservation | Variance of the Laplacian filter, quantifying the map's ability to represent details and morphological discontinuities.
SSIM (Structural Similarity Index Measure) | Structural coherence between the original image and the normalized depthmap, ensuring visual consistency between input and inference.
Depth distribution | Entropy of the histogram of depth values, reflecting informational richness and structural complexity.
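A hedged sketch of how these four indicators could be computed (the SSIM below is a simplified single-window variant rather than the standard windowed one, and the weights of the synthetic index are illustrative placeholders, since the actual values are not reported in this table):

```python
import numpy as np

def laplacian_variance(d):
    """Variance of a 5-point Laplacian (edge-preservation indicator)."""
    lap = (-4 * d[1:-1, 1:-1] + d[:-2, 1:-1] + d[2:, 1:-1]
           + d[1:-1, :-2] + d[1:-1, 2:])
    return float(lap.var())

def histogram_entropy(d, bins=64):
    """Shannon entropy of the depth-value histogram (depth distribution)."""
    counts, _ = np.histogram(d, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def global_ssim(a, b):
    """Simplified single-window SSIM between two normalized images."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)

def synthetic_index(image_gray, depth, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four indicators of Table 2.
    The weights are illustrative, not the paper's calibrated values."""
    scores = (float(depth.std()), laplacian_variance(depth),
              global_ssim(image_gray, depth), histogram_entropy(depth))
    return float(np.dot(weights, scores)), scores
```

Applying `synthetic_index` to every candidate depthmap and keeping the highest-scoring one reproduces, under these assumptions, the ranking step that selects the map passed on to NURBS conversion.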