1. Introduction
Building Information Modelling (BIM) has consolidated its role as the standard for producing and coordinating project information in the architecture, engineering and construction (AEC) sector. However, BIM was not originally conceived for the systematic analysis of project information: the data stored in the models are heterogeneous, scattered across disciplines and semantically too weak to structure consistent analytical workflows. Property typing, Property Set conventions and the use of classifications vary widely between authoring tools and project teams, and nothing at schema level enforces the values that an analytical pipeline expects [1]. The result is a paradox widely documented in the recent literature: a growing number of AEC organisations accumulate BIM models in their Common Data Environments, yet only a minority exploit that information via advanced analytics, big data integration or Business Intelligence tools [2].
This underutilisation manifests itself in daily sector practice. Updating budgets from the model still requires significant manual effort for exporting, cleansing and reconciliation, which prevents real-time tracking of cost deviations and forces project managers to operate on outdated cost figures; this gap between continuously updated BIM models and operationally usable analytical figures is empirically documented in case studies that report manual preparation cycles measured in hours per delivery [3]. Historical analysis of changes across projects is practically non-existent, with the consequent loss of institutional learning [4]. Risk and cost-overrun forecasting, in turn, lacks the analytical support that is routine in other sectors [5]. The causes of this situation are not purely technological. They can be grouped into four technical barriers: structural heterogeneity without automated reconciliation mechanisms, semantic limitations inherent to the practical use of the IFC standard, absence of traceability and lineage from BIM models to analytical indicators, and dependence on manual integration that prevents scalability. A cross-cutting aggravating factor compounds them: the widespread lack of data-governance frameworks at the organisational level [6,7].
The contrast with digitally mature sectors is notable. In manufacturing, retail, banking and logistics, Business Intelligence has consolidated as a core capability, with automated ETL architectures, advanced analytics and data-governance frameworks that sustain substantial improvements in productivity and operational efficiency [8,9]. The AEC sector, by contrast, maintains a disproportionate dependence on personal experience and intuition compared with industries where decision-making is systematically supported by BI platforms [10]. Information produced during design and construction phases is seldom reused continuously for asset management or for optimising decisions across the life cycle [11], and the native analytical capabilities being incorporated by commercial BIM platforms rely on pre-configured metrics and premium licensing models with high total cost of ownership, which perpetuates dependence on closed proprietary ecosystems [12]. The industrial relevance of the problem is further evidenced by the direction the main software vendors are taking: in 2025 Autodesk released a public roadmap for its AEC Data Model, with a public-beta API for granular access to model data within Autodesk Construction Cloud [13], and in March 2026 it consolidated its cloud offering under the unified Autodesk Forma brand, which integrates the former Autodesk Construction Cloud and its associated modules (Docs, BIM Collaborate Pro, Build, Takeoff), confirming a strategic shift towards an end-to-end AEC data ecosystem [14]. Two implications follow from this commercial trajectory. The first is that the dominant vendor itself has validated the extraction and analytical exploitation of BIM data outside the authoring environment as a market need, which closes the question of whether the problem is industrially relevant. The second is that the vendor’s answer takes the form of a proprietary cloud platform with subscription-based access, granular APIs that are versioned and deprecated under vendor control, and a data-residency model anchored to its own ecosystem. This is precisely the gap that this work addresses, since the same problem can be solved with an open, reproducible and platform-independent architecture grounded in IFC, IDS and BCF. This structural gap between the BI paradigm consolidated in other sectors and its effective application in AEC, together with the limitations of the proprietary approaches that have begun to emerge, motivates the present research.
The research is guided by the following question: can an ETL architecture based on openBIM standards reduce the manual effort of BIM data preparation and, at the same time, improve the traceability and quality of the analytical datasets of AEC projects? From this question follows the working hypothesis: a BIM-BI integration architecture can be defined and validated that substantially reduces the manual intervention required in project data preparation while preserving the traceability and quality of the information throughout the full extract, transform and load flow. The architecture relies on the openBIM standards Industry Foundation Classes (IFC), Information Delivery Specification (IDS) and BIM Collaboration Format (BCF), and on consolidated data-engineering practices, with the goal of generating coherent, verifiable datasets fully compatible with Business Intelligence tools.
The overall aim of the work is to design and validate a BIM-BI integration architecture that combines an ETL pipeline with openBIM standards and that significantly reduces the time and manual effort of data preparation, while increasing the traceability and quality of analytical datasets in AEC projects. This aim is specified in six specific objectives: (i) to analyse the current state of BIM-BI integration through a systematic review, identifying gaps in interoperability, data quality and traceability; (ii) to design an analytical data model that structures BIM information in schemas compatible with BI tools; (iii) to develop an automated ETL pipeline that extracts, validates and transforms data from IFC files by applying openBIM standards to guarantee quality and traceability; (iv) to implement the architecture in representative sector use cases; (v) to evaluate the proposal quantitatively and qualitatively against traditional workflows; and (vi) to formulate a reproducible methodological framework and recommendations for its transfer to professional practice.
The contributions of the work are fourfold. First, a scientific contribution: the BIM2BI architecture in four functional layers (sources, transformation and orchestration, analytical storage, and exploitation) that formalises the separation of responsibilities, data contracts between layers and an extensibility-without-redesign principle that, to the best of the authors’ knowledge, had not been articulated jointly in the previous literature. Second, a methodological contribution: the reinterpretation of the IDS standard as a pre-ETL quality gate within automated analytical workflows, and the development of a three-level staged validation strategy applicable to high-volume scenarios. Third, a practical contribution: an open-source reference implementation published under the MIT licence that materialises the architecture in a portable, reproducible system. And fourth, a contribution to the state of the art: empirical evidence on the real behaviour of openBIM standards in analytical workflows, including findings not previously documented on the reference implementation of the IDS 1.0 schema.
The remainder of the article is organised as follows. Section 2 places the work in context by characterising the digital evolution of the AEC sector, the openBIM standards as technical enablers, the current data-governance frameworks and prior work on BIM-BI integration, with explicit identification of the gaps that the proposed architecture addresses. Section 3 describes the research methodology based on Design Science Research, the phases of the work and the evaluation criteria. Section 4 presents the BIM2BI architecture layer by layer and introduces the reference implementation. Section 5 documents the empirical validation on ten use cases grouped in three complexity profiles and summarises the most relevant findings. Section 6 discusses the results against the state of the art, and Section 7 synthesises the conclusions and future lines of work.
4. The BIM2BI Architecture
4.1. Overview and Design Principles
The BIM2BI architecture is conceived as an intermediate data-integration layer between BIM models and Business Intelligence platforms. Its purpose is to transform the artefacts contained in openBIM deliveries (IFC models [23], IDS files [24], BCF incidents [25] and FIEBDC-3 budgets) into a stable, traceable and reusable analytical model that serves as a generic basis for different types of analysis (cost, space, informational quality, operation) without the need to redesign the integration pipeline for each use case.
The design of the architecture is articulated around five structural principles: (i) explicit separation of responsibilities between orchestration, processing logic, configuration and data; (ii) isolation by project and delivery, so that each execution is managed independently and enables the comparative analysis of a project over time; (iii) process reproducibility, understood as the ability to reconstruct a specific execution from its inputs and configuration; (iv) data traceability and lineage, so that every analytical observation can be tracked back to its originating IFC entity; and (v) controlled extensibility, which allows the incorporation of new sources, requirements or dimensions without redesigning the fundamental structure of the system.
Effective operation of the architecture requires the formalisation of three specific roles without consolidated referents in the AEC literature: the BIM Data Steward, responsible for guaranteeing the quality and governance of the data; the BIM Data Engineer, responsible for the design and maintenance of the ETL pipeline; and the BIM/BI Analyst, who transforms data into actionable information for decision-making. The technical materialisation of the architecture is articulated in four functional layers (Figure 1), whose operational responsibilities are described in the following subsections.
4.2. Layer 1: Data Sources
Layer 1 is the entry point of the system and is conceived as a passive repository of raw information: it receives, versions and contextualises the data without applying any transformation or validation process. This passivity is a deliberate design decision that preserves data traceability from its origin and prevents early contamination of the pipeline with unaudited transformations. The layer also acts as the formal informational-contract point between the project’s production environment and the analytical system.
Two typologies of sources are distinguished. The sources associated with the delivery (inbox/) represent the informational state of the project at a given moment—IFC models, BC3 budget files, BCF incidents and applicable IDS files—and are incorporated into the system in a versioned manner under the hierarchy inbox/{project_id}/{delivery_id}/. The reference and requirement sources (reference/), external to any specific delivery, define the normative frameworks, reusable IDS profiles, classification dictionaries and validation rules that govern the interpretation and transformation of the data.
Each delivery is accompanied by a delivery.meta.json file that describes its identity (project, delivery, author, date, milestone, applied IDS profile) and a _READY.flag marker that triggers the pipeline execution. This organisation materialises the principle of isolation by project and delivery and enables the management of the data life cycle—retention, correction or selective deletion—to be applied at individual-delivery granularity, covering an operational gap of existing BIM governance frameworks.
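The delivery convention described above can be illustrated with a minimal sketch. It assumes only what the text states (the inbox/{project_id}/{delivery_id}/ hierarchy, the delivery.meta.json file and the _READY.flag marker); the function name `find_ready_deliveries` and the shape of the returned dictionaries are hypothetical and are not taken from the reference implementation.

```python
import json
from pathlib import Path


def find_ready_deliveries(inbox: Path) -> list[dict]:
    """Return the metadata of every delivery folder that carries a _READY.flag marker."""
    ready = []
    # Expected layout: inbox/{project_id}/{delivery_id}/
    for flag in sorted(inbox.glob("*/*/_READY.flag")):
        delivery_dir = flag.parent
        meta_path = delivery_dir / "delivery.meta.json"
        if not meta_path.is_file():
            # A flag without metadata is skipped rather than guessed at.
            continue
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        ready.append(
            {
                "project_id": delivery_dir.parent.name,
                "delivery_id": delivery_dir.name,
                "meta": meta,
            }
        )
    return ready
```

Because the project and delivery identifiers are read from the folder names, each delivery stays isolated and can later be retained, corrected or deleted individually.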
4.3. Layer 2: Transformation and Orchestration
Layer 2 is the operational core of the architecture. It is articulated as an ETL pipeline orchestrated under the principles of modularity and lineage control, whose objectives are: to isolate transformation processes from the original sources, to preserve the IfcGlobalId identifiers as the axis of traceability, to systematically validate compliance with the informational requirements through IDS, to normalise the extracted information into structures compatible with analytical models and to record the intermediate pipeline state as support for auditing.
The reference implementation materialises this layer through twelve decoupled Python scripts that independently execute the different phases of the process, among them: IFC model extraction with the ifcopenshell library (extract_ifc.py), extraction and parsing of FIEBDC-3 budgets (extract_bc3.py, parse_bc3.py), normalisation of BCF incidents in the four active variants of the standard (parse_bcf.py), IDS validation with ifctester (validate_ids.py), transformation to the star schema (transform_star.py), load into the analytical repository (load_dw.py), DDL schema management and versioned migrations (init_dwh.py), generation of delivery audit records (generate_report.py), structured logging of executions (pipeline_logger.py) and complete orchestration of the flow (run_pipeline.py). This separation favours reuse, debugging and the progressive evolution of the system without compromising the global architecture.
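As an illustration of the extraction phase, the following sketch flattens the elements of an opened model into rows keyed by their GlobalId. It is not the code of extract_ifc.py: the function name `element_rows` and the output column names are assumptions, and the `model` argument is only expected to behave like the object returned by `ifcopenshell.open(path)`.

```python
from typing import Any, Iterator


def element_rows(model: Any) -> Iterator[dict]:
    """Yield one flat row per IfcElement, keyed by its GlobalId."""
    # by_type() returns the entity instances of the requested IFC class
    # (subclasses included); each instance exposes GlobalId, Name and is_a().
    for element in model.by_type("IfcElement"):
        yield {
            "ifc_global_id": element.GlobalId,
            "ifc_type": element.is_a(),
            # Name is optional in IFC, so a missing value becomes an empty string.
            "name": element.Name or "",
        }
```

With ifcopenshell installed, a call such as `rows = list(element_rows(ifcopenshell.open("model.ifc")))` would produce the staging rows; keeping the GlobalId in every row is what allows the later layers to preserve lineage.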
The orchestration of the phases is performed through a Streamlit interface that invokes the scripts sequentially once the user confirms the execution, ensuring that the state of each delivery is visible at all times (Figure 2). The operational flow relies on a folder structure that separates raw data (inbox/), intermediate processing artefacts (staging/), output documentation (reports/) and automatic backups of the analytical repository (backups/).
4.3.1. Pre-ETL IDS Quality Gate
The pipeline incorporates a quality gate based on the IDS standard [24] that is executed before any analytical transformation. Its function is to guarantee that only IFC models meeting the minimum informational requirements of the use case advance towards the analytical repository, preventing the propagation of incomplete or inconsistent data to subsequent layers. This reinterpretation of the IDS standard, originally conceived as a validation mechanism for interchange between BIM authoring and coordination tools, extends its scope of application to data-quality governance within automated analytical pipelines.
The implementation provides two execution levels. When the ifctester library is available a full validation is run against the IDS file based on the official XSD schema of buildingSMART; otherwise the pipeline applies a set of basic checks on the intermediate extracted data, preserving system operability in environments with limited dependencies. Detected deviations are recorded in BCF-JSON format and subsequently loaded into the fact_incident table, turning data quality into an analytical metric comparable over time.
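The two execution levels can be sketched as follows. The example only shows how the availability of ifctester might be detected and what a fallback check on staged rows could look like; the full IDS validation branch is not reproduced. The function names, the default required fields and the incident structure are assumptions made for illustration and do not reproduce the BCF-JSON layout used by the reference implementation.

```python
import importlib.util


def ifctester_available() -> bool:
    """True when the ifctester package can be imported in this environment."""
    return importlib.util.find_spec("ifctester") is not None


def choose_validation_level() -> str:
    """Select full IDS validation when possible, basic checks otherwise."""
    return "full_ids" if ifctester_available() else "basic_checks"


def basic_checks(rows, required=("ifc_global_id", "discipline")):
    """Fallback level: flag staged rows whose required fields are empty."""
    incidents = []
    for row in rows:
        missing = [name for name in required if not row.get(name)]
        if missing:
            incidents.append(
                {
                    # None when the identifier itself is missing.
                    "ifc_global_id": row.get("ifc_global_id") or None,
                    "title": "Missing required attributes: " + ", ".join(missing),
                    "source": "basic_checks",
                }
            )
    return incidents
```

Recording the fallback findings in the same shape as the full validation results keeps both levels loadable into the same incident table.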
The quality-gate result classifies each delivery as CONFORMING, NON-CONFORMING or PENDING. NON-CONFORMING deliveries only register the dim_delivery dimension with the attempt and its incidents, without contaminating the analytical fact tables; in this way the longitudinal traceability of the project is preserved without degrading the aggregated indicators. This differentiated treatment makes it possible to document the rejection of a delivery as an auditable fact, not as a valid analytical record.
4.3.2. Staged IDS Validation Strategy (L1/L2/L3)
The application of the monolithic quality gate on high-volume deliveries revealed an operational limitation: in a linear-infrastructure case with twenty-five IFC models and an approximate volume of two gigabytes, a single profile with thirty-eight rules generates an unmanageable number of incidents that prevents effective prioritisation and correction by the BIM team. In response, a staged IDS validation strategy was developed that distributes the rules across three independent profiles according to their impact on the BIM2BI pipeline.
The stratification criterion is grounded in three functional questions derived from the layered architecture. Level 1 (critical gate) groups the rules whose failure prevents the correct load into the analytical repository (GlobalId, asset identification, discipline); its failure blocks model ingestion. Level 2 (warnings) groups the rules that degrade dashboard indicators or compromise the integrity of the analytical dimensions (complete identification, minimum location, classification, main budget item); its failure generates high-priority BCF incidents but allows conditional ingestion. Level 3 (informative) groups the remaining rules, associated with model informational-maturity requirements whose failure does not affect the immediate indicators but documents the state of the model for phase closure.
The execution flow is sequential: the L1 profile is evaluated first on all delivery models; models that do not pass it are rejected and notified to the BIM team through BCF; those that pass it are processed by Level 2, which generates high-priority incidents without blocking ingestion; Level 3 is then executed asynchronously to produce the informational-maturity report. This stratification establishes an explicit correspondence between validation levels and architectural layers: L1 protects the integrity of Layer 3, L2 protects the analytical quality of Layer 4, and L3 documents maturity for long-term data governance (Table 4).
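The decision logic of the three levels can be summarised in a short sketch. It assumes that the failed rules of each profile have already been computed for a model; the class `StagedOutcome`, the function `evaluate_model` and the priority labels are hypothetical, and the asynchronous execution of Level 3 is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class StagedOutcome:
    ingest: bool  # False when the L1 gate blocks the model
    bcf_incidents: list = field(default_factory=list)
    maturity_notes: list = field(default_factory=list)


def evaluate_model(model_id, l1_failures, l2_failures, l3_failures) -> StagedOutcome:
    """Apply the L1/L2/L3 policy to the failed rules of one model."""
    if l1_failures:
        # L1 (critical gate): reject the model and report why; L2 and L3 are not evaluated.
        incidents = [
            {"model": model_id, "level": "L1", "rule": rule, "priority": "blocking"}
            for rule in l1_failures
        ]
        return StagedOutcome(ingest=False, bcf_incidents=incidents)
    # L2 (warnings): high-priority incidents, ingestion still allowed.
    incidents = [
        {"model": model_id, "level": "L2", "rule": rule, "priority": "high"}
        for rule in l2_failures
    ]
    # L3 (informative): only documented for the maturity report.
    return StagedOutcome(ingest=True, bcf_incidents=incidents, maturity_notes=list(l3_failures))
```

Separating the outcome into an ingestion flag, incidents and maturity notes mirrors the three questions behind the stratification: can the model be loaded, will the indicators degrade, and how mature is the information.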
4.4. Layer 3: Analytical Storage
Layer 3 is the analytical repository of the system and acts as the pipeline’s convergence point: it is the final destination of the processed data and the starting point of the analytical exploitation. Unlike the preceding layers, oriented to processing and validation, this layer is conceived as a stable persistence layer. The data that reach it have passed the IDS quality gate and have been normalised in accordance with the informational requirements defined for each project. Its design directly conditions the analytical capabilities available in Layer 4, determining which metrics can be computed, with what granularity and with what capacity for comparison between projects and deliveries.
The analytical data model is grounded in a star schema [40], widely adopted in Business Intelligence architectures for its direct compatibility with consolidated BI tools, its explicit separation between metrics and context, and the controlled extensibility that allows new fact or dimension tables to be incorporated without altering the existing ones.
4.4.1. Star Schema Dimensional Model
The analytical schema of the reference implementation v1.0.0 includes four main fact tables. fact_element records the instances of construction elements of the model with their associated metrics (IFC type, net area, volume, discipline), identified by their IfcGlobalId. fact_space records the instances of spaces and rooms (surface, volume, level, intended use) and enables functional-programme and spatial-efficiency analyses. fact_cost records the budget items from BC3 files linked to the model elements when the budget references the IfcGlobalId in its COMENTARIO field, closing the cycle between the BIM model and its associated cost. fact_incident records the coordination incidents from BCF files, likewise linked to the model element, and makes it possible to analyse the evolution of the project’s informational quality across deliveries.
These tables are contextualised through six dimensions: dim_model describes the model and the delivery (IFC version, discipline, authoring tool, applied IDS profile); dim_element, dim_space, dim_cost_item and dim_incident_type provide the descriptive attributes of elements, spaces, budget items and incidents respectively; and dim_delivery records the attributes of each delivery (date, author, quality-gate result, processing metrics), enabling longitudinal analysis and monitoring of pipeline performance (Figure 3).
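A reduced version of the dimensional model can be sketched with the sqlite3 module of the Python standard library. Only the table names come from the text; the columns, the key names and the helper functions are assumptions chosen for illustration, and the schema covers just one fact table and two dimensions.

```python
import sqlite3

# Illustrative subset of the star schema; column names are assumptions.
DDL = """
CREATE TABLE dim_delivery (
    delivery_key INTEGER PRIMARY KEY,
    project_id   TEXT NOT NULL,
    delivery_id  TEXT NOT NULL,
    gate_result  TEXT,
    UNIQUE (project_id, delivery_id)
);
CREATE TABLE dim_element (
    element_key INTEGER PRIMARY KEY,
    ifc_type    TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_element (
    ifc_global_id TEXT NOT NULL,  -- technical key kept for lineage
    delivery_key  INTEGER NOT NULL REFERENCES dim_delivery(delivery_key),
    element_key   INTEGER NOT NULL REFERENCES dim_element(element_key),
    net_area      REAL,
    volume        REAL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the reduced star schema in the given database."""
    conn.executescript(DDL)


def volume_by_type(conn: sqlite3.Connection) -> list:
    """Aggregate indicator: total volume per IFC type."""
    return conn.execute(
        "SELECT d.ifc_type, SUM(f.volume) FROM fact_element f "
        "JOIN dim_element d ON d.element_key = f.element_key "
        "GROUP BY d.ifc_type ORDER BY d.ifc_type"
    ).fetchall()


def trace_indicator(conn: sqlite3.Connection, ifc_type: str) -> list:
    """Lineage: the GlobalIds of the entities behind one aggregated value."""
    rows = conn.execute(
        "SELECT f.ifc_global_id FROM fact_element f "
        "JOIN dim_element d ON d.element_key = f.element_key "
        "WHERE d.ifc_type = ? ORDER BY f.ifc_global_id",
        (ifc_type,),
    )
    return [row[0] for row in rows]
```

The second query shows why the GlobalId is stored next to the surrogate keys: any aggregated figure can be expanded back into the concrete IFC entities that produced it.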
4.4.2. Data Lineage Through IfcGlobalId
End-to-end lineage is articulated through the preservation of the IfcGlobalId as a technical key in every fact table, alongside surrogate keys that link to the dimensions. This dual level of identification combines the efficiency characteristic of dimensional models with the capability of reconstructing the link between any analytical indicator and the concrete IFC entity of the source model from which it originates, thereby guaranteeing result auditability regardless of the number of deliveries or models accumulated in the repository (Figure 4).
The mechanism is complemented by two additional resources. The pipeline_logger.py module generates a unique execution identifier and a cumulative historical CSV record that enables the reconstruction of the complete sequence of steps applied to the data. The init_dwh.py module performs an automatic backup of the analytical repository before each load, named with {timestamp}_{project_id}_{delivery_id}, which supports the selective cascade deletion of a specific delivery (fact_* → dim_model → dim_delivery) without affecting the history of the remaining deliveries. The combination of these three mechanisms materialises a traceable lineage at entity, execution and delivery level, covering a gap repeatedly documented in the BIM-BI literature.
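The backup naming and the cascade deletion order can be sketched as follows. The sketch is not the code of init_dwh.py: the timestamp format, the `.db` suffix, the function names and the `model_key` and `delivery_key` column names are assumptions; only the naming pattern and the deletion order (fact_* → dim_model → dim_delivery) come from the text.

```python
import sqlite3
from datetime import datetime


def backup_name(project_id: str, delivery_id: str, now: datetime) -> str:
    """Build a backup file name following {timestamp}_{project_id}_{delivery_id}."""
    return f"{now:%Y%m%d_%H%M%S}_{project_id}_{delivery_id}.db"


def delete_delivery(conn: sqlite3.Connection, delivery_key: int) -> None:
    """Remove one delivery in cascade order: fact_* -> dim_model -> dim_delivery."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]
    fact_tables = [name for name in tables if name.startswith("fact_")]
    with conn:  # one transaction: either every step is applied or none
        for table in fact_tables:
            # Table names come from sqlite_master, not from user input.
            conn.execute(
                f"DELETE FROM {table} WHERE model_key IN "
                "(SELECT model_key FROM dim_model WHERE delivery_key = ?)",
                (delivery_key,),
            )
        conn.execute("DELETE FROM dim_model WHERE delivery_key = ?", (delivery_key,))
        conn.execute("DELETE FROM dim_delivery WHERE delivery_key = ?", (delivery_key,))
```

Deleting the facts first avoids leaving rows that point at a model or delivery that no longer exists, and the shared dimensions are left untouched so the history of the other deliveries is preserved.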
4.4.3. Dual-Write Pattern for Multi-Project Deployments
When the same BIM2BI installation must serve several concurrent projects, the load_dw.py module supports a dual-write pattern: the analytical results are materialised simultaneously into a central repository (the canonical bim2bi.db that aggregates all projects of the organisation) and into a per-project repository configured in the project metadata. This pattern operationalises the classical distinction between corporate data warehouse and project data mart [40] within the lightweight deployment of the system, and avoids the access conflicts that arise when different project managers require independent BI connections on the same physical file. The empirical evaluation of the dual-write pattern in the multi-project scalability scenario is reported in Section 5.3.
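A minimal sketch of the dual-write idea is given below. It is not the code of load_dw.py: the function name, the reduced fact_element table and its columns are assumptions, and the two repositories are written one after the other in separate transactions, so the sketch does not provide atomicity across both files.

```python
import sqlite3


def dual_write(rows, central_path: str, project_path: str) -> None:
    """Write the same fact rows to the central and the per-project repository."""
    for path in (central_path, project_path):
        conn = sqlite3.connect(path)
        try:
            with conn:  # one transaction per repository
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS fact_element "
                    "(ifc_global_id TEXT NOT NULL, volume REAL)"
                )
                conn.executemany(
                    "INSERT INTO fact_element (ifc_global_id, volume) VALUES (?, ?)",
                    rows,
                )
        finally:
            conn.close()
```

Since each project manager connects the BI tool to a separate per-project file, readers of one project do not compete for the file that serves another, while the central repository keeps the organisation-wide view.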
4.5. Layer 4: Analytical Exploitation
Layer 4 transforms the data persisted in Layer 3 into actionable knowledge for project stakeholders (technical management, BIM Manager, client, asset manager) through interactive indicators, metrics and visualisations. The layer is deliberately agnostic with respect to the visualisation tool employed: the star schema of the repository ensures its compatibility with any BI platform that supports standard relational connections, preserving the technological neutrality that articulates the whole architecture.
The reference implementation uses Microsoft Power BI for its consolidated adoption in the AEC sector, connected to the SQLite repository through a Python connector that removes the dependence on ODBC drivers on the user’s machine. The indicators are defined as DAX measures on the semantic model and are organised in four categories: (i) model-composition indicators (number of elements by IFC type, surface by discipline, volume by level); (ii) informational-quality indicators (ratio of compliant elements by IFC type, incidents per delivery, evolution of compliance across versions); (iii) cost indicators (construction budget by zone, cost by typology, variance across deliveries); and (iv) traceability indicators (surface variation between versions, elements added or removed, evolution of the information level).
These indicators are articulated in three complementary dashboards: a model-composition view aimed at the BIM Manager, an informational-quality view aimed at the BIM Quality Manager and a cross-delivery traceability view aimed at the project director and the client. The combination of the star schema of Layer 3 with the DAX measures of Layer 4 materialises the central goal of the architecture: converting IFC models into comparable, traceable and reproducible analytical knowledge, independently of the BIM authoring tool or the visualisation platform employed.
4.6. Open-Source Reference Implementation
Version 1.0.0 of the reference implementation materialises the architecture in a functional, documented and reproducible system, publicly available on GitHub under the MIT licence. The technology stack was selected under the criterion of maximum portability: the system must run on any Windows machine without server installation, without network dependencies and at zero licence cost. The technological decisions of each layer consolidate into a single open-source stack: local file system (Layer 1); Python 3.11.9 with ifcopenshell, ifctester, pandas, SQLAlchemy and a Streamlit 1.55.0 interface (Layer 2); embedded SQLite 3.45 (Layer 3); and Power BI Desktop 2.149.1395.0 with a Python connector (Layer 4).
Deployment of the system in a clean environment requires three steps—cloning the repository, installing Python dependencies through pip and running the INICIAR.bat launcher—with a total time below five minutes. This portability reduces the adoption barrier for AEC organisations without a specialised technical profile and enables the full reproduction of the use cases of the validation section by researchers and professionals external to the development team. The published repository bundles the twelve ETL scripts, the Streamlit interface, the DDL schema of the analytical repository, the Power BI dashboard templates and the IDS profiles corresponding to the ten use cases documented in the validation.
5. Validation
5.1. Validation Strategy and Use Case Selection
Validation combines two complementary dimensions characteristic of the DSR framework. Technical validation evaluates compliance with functional requirements (correct extraction of IFC entities, operability of the IDS quality gate, transformation to the star schema without loss of traceability, and load with referential integrity) through inspection of the artefacts produced by the pipeline (staging CSVs, SQLite repository, delivery audit records) and the structured execution logs. Empirical validation evaluates the non-functional requirements (performance, scalability, maintainability and vendor independence) from the execution of the system on real data and documents the emerging findings that motivated architectural refinements not foreseen in the initial design.
The validation set was built through active collection of real BIM deliveries from projects executed in Spain and Chile between 2024 and 2025, applying three combined criteria: availability of complete real data (excluding synthetic models), diversity of delivery configuration (covering extreme ranges in volume, number of IFC files, presence or absence of BC3, IDS and BCF, and IFC schema version) and confidentiality of the originating project’s data. The resulting set comprises ten use cases (CDU_001 to CDU_010) with an approximate total volume of 5 GB distributed over 69 IFC files, grouped in three profiles of increasing complexity.
Profile A (single-discipline deliveries) brings together four cases with a single architectural or structural IFC file accompanied by a budget and an IDS profile, and represents the baseline case of the pipeline in everyday professional working conditions. Profile B (standard multidisciplinary deliveries) brings together four cases with between four and six IFC files processed in a single execution, introducing the consolidation of shared dimensions across models and, in two cases, BCF coordination incidents. Profile C (massive infrastructure deliveries) brings together two cases with more than twenty IFC files and volumes above 500 MB; one of them, CDU_004, is the only delivery of the set with three staged IDS profiles, applied to 25 IFC4X3 models totalling approximately 2 GB. The diversity of IFC versions present in the set is in itself an empirically verified robustness requirement: IFC2X3 appears in CDU_007, IFC4 in the majority of cases and IFC4X3_ADD2 (IFC 4.3 ADD2 in the official ISO 16739-1 denomination) in CDU_004, and the pipeline processed the three versions without changes to the extraction code (Table 5).
5.2. Evaluation of Functional Requirements
The extraction and load into the star schema executed correctly in all ten cases, with verifiable preservation of the IfcGlobalId from the input IFC file to the Power BI dashboard indicators. The systematic inspection of foreign keys in the SQLite repository confirmed the referential integrity between fact and dimension tables, and the end-to-end traceability was maintained independently of the delivery profile, the number of models processed and the IFC schema version. This result validates the principle of traceability as an architectural guarantee of the system and not as a property dependent on the configuration of each delivery.
The operability of the IDS quality gate was evaluated across the ten cases, with differentiated behaviours depending on the profile. In the Profile A cases, conformance rates were consistently high, with localised incidents traceable to specific model elements (for instance, 17 IfcMember and IfcPlate elements lacking the required discipline attribute in CDU_001). In the Profile C cases, conformance rates were lower because the informational requirements of linear-infrastructure projects are more complex: a greater number of specialised disciplines, a wider variety of IFC entity types and greater heterogeneity in naming conventions between subcontractors. This pattern is coherent with the literature on BIM data quality in infrastructure [45]. System reproducibility was verified through a clean deployment of the repository in a secondary environment, where installation took less than five minutes, followed by the complete execution of the ten cases, validating the portability criterion described in Section 4.6.
5.3. Evaluation of Non-Functional Requirements
The performance evaluation showed a stronger correlation between processing time and the number of IFC files than between processing time and total volume in megabytes (Figure 5). The processing time of CDU_004 (25 IFC files, 1.97 GB) exceeds that of CDU_006 (22 IFC files, 500 MB) by a proportion lower than the difference in volume would suggest, which indicates that opening and parsing IFC files with ifcopenshell dominates the processing time over the transformation and load operations in SQLite. The Profile A cases were processed consistently below 90 s, whereas the most demanding case (CDU_004) set the upper bound of system performance around 15 min in the reference configuration with SQLite as analytical repository.
Multi-project scalability was evaluated by exercising the dual-write pattern described in Section 4.4.3, which avoids the access conflicts that arise when different project managers require independent Power BI connections on the same file. The validation cases were executed in turn against both repositories (central and per-project) without observed integrity divergences between them. Operational maintainability was consolidated through the introduction of an explicit project registry in settings.json with formal metadata and automatic backward compatibility with pre-existing folders. This change moves the system from implicit management by naming convention to explicit management with control metadata, the approach that ISO 19650 applies to BIM information containers and that DAMA-DMBOK extends to analytical repositories.
Vendor independence was verified by construction: no pipeline component depends on proprietary software beyond the optional IFC export from Autodesk Revit in those cases where the native file was available. Replacing Power BI with any other BI tool compatible with standard relational connections reduces to reconfiguring the connection, without changes to the analytical repository or to the ETL pipeline. This result operationalises the principle of technological neutrality that underpins the whole architecture.
5.4. Empirical Findings on openBIM Standards
Empirical validation generated three findings not previously documented in the BIM-BI architectures literature, constituting a concrete contribution to the knowledge of the real behaviour of openBIM standards within analytical workflows. The first is the incompatibility between the IFC4X3 identifier and the official XSD schema of IDS 1.0: during the implementation of the quality gate on the IFC 4.3 ADD2 models of CDU_004 (technical identifier IFC4X3_ADD2) it was detected that the value IFC4X3 is not valid in the ifcVersion attribute of the IDS 1.0 XSD schema, which only accepts IFC2X3, IFC4 or IFC4X3_ADD2. The discrepancy originates from a change introduced in the final version of the standard—the IDS 1.0 release notes document the update of the accepted ifcVersion value from IFC4X3 to the official FILE_SCHEMA identifier IFC4X3_ADD2 [
46]—and the same problem has been reported by other implementers in the official ifcopenshell repository [
47]. The finding illustrates a real limitation of the maturity of IDS 1.0 that only emerges in practical implementations, not in the theoretical reading of the specification, and reinforces the value of the use cases documented in this work as empirical evidence of the real behaviour of the openBIM ecosystem.
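The check behind this first finding can be sketched in a few lines. The example below scans an IDS document for specification elements whose ifcVersion attribute contains a token outside the three values named above. It is a simplified illustration, not a substitute for validation against the official XSD: the helper name, the sample fragment (which omits namespaces and all facets) and the treatment of ifcVersion as a whitespace-separated list are assumptions.

```python
import xml.etree.ElementTree as ET

# Values accepted by the IDS 1.0 schema according to the finding above.
ACCEPTED_IFC_VERSIONS = frozenset({"IFC2X3", "IFC4", "IFC4X3_ADD2"})


def find_invalid_ifc_versions(ids_xml: str) -> dict[str, list[str]]:
    """Map each specification name to the ifcVersion tokens that are not accepted."""
    root = ET.fromstring(ids_xml)
    problems: dict[str, list[str]] = {}
    for elem in root.iter():
        # Compare local names only, so the sketch works with or without a namespace.
        if elem.tag.rsplit("}", 1)[-1] != "specification":
            continue
        tokens = elem.get("ifcVersion", "").split()
        rejected = [tok for tok in tokens if tok not in ACCEPTED_IFC_VERSIONS]
        if rejected:
            problems[elem.get("name", "<unnamed>")] = rejected
    return problems


# Hypothetical, heavily simplified IDS fragment used only for the example.
sample_ids = """<ids>
  <specifications>
    <specification name="Discipline attribute" ifcVersion="IFC4X3"/>
    <specification name="Cost classification" ifcVersion="IFC4 IFC4X3_ADD2"/>
  </specifications>
</ids>"""

problems = find_invalid_ifc_versions(sample_ids)
print(problems)  # {'Discipline attribute': ['IFC4X3']}
```

A pre-flight check of this kind reports the offending specification by name before the IDS file reaches the quality gate, which is easier to act on than a generic schema-validation error.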
The second finding affects the loading strategy of the conformed dimensions of the analytical repository [
40]. The initial replace strategy on dimension tables removed records from dim_cost_item and dim_element that had dependent rows in the fact tables from prior deliveries, generating 6109 orphan foreign keys during incremental-load testing. This phenomenon, a deferred referential-integrity violation, does not produce an immediate error in SQLite, which does not enforce foreign-key constraints by default; instead, it manifests silently as fact records without an associated dimension record and degrades the analytical indicators without any visible error signal. The implemented solution establishes that dimensions shared between deliveries are always loaded with an additive strategy (append + INSERT OR IGNORE), preserving the existing surrogate keys and adding only the new records. The finding confirms that in a BIM data warehouse where deliveries accumulate across the project’s life cycle, the loading strategy for dimensions must be additive rather than substitutive, unlike transactional data warehouses where periodic replacement is usual.
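The contrast between the two loading strategies can be reproduced with a small in-memory example. The sketch below is illustrative only: the two-table schema, the placeholder identifiers and the helper names are assumptions, and the substitutive strategy is simulated with a DELETE followed by a reload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # SQLite's default, made explicit here
conn.executescript("""
CREATE TABLE dim_element (
    element_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    ifc_global_id TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_element (
    element_key INTEGER REFERENCES dim_element(element_key),
    delivery_id TEXT NOT NULL
);
""")


def load_dimension_additive(conn, global_ids):
    """Append only unseen elements; existing surrogate keys stay untouched."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO dim_element (ifc_global_id) VALUES (?)",
            [(gid,) for gid in global_ids],
        )


def load_facts(conn, delivery_id, global_ids):
    """Insert one fact row per element, resolving the surrogate key by lookup."""
    with conn:
        conn.executemany(
            "INSERT INTO fact_element (element_key, delivery_id) "
            "SELECT element_key, ? FROM dim_element WHERE ifc_global_id = ?",
            [(delivery_id, gid) for gid in global_ids],
        )


def count_orphans(conn):
    """Count fact rows whose surrogate key has no matching dimension row."""
    return conn.execute(
        "SELECT COUNT(*) FROM fact_element f "
        "LEFT JOIN dim_element d ON d.element_key = f.element_key "
        "WHERE d.element_key IS NULL"
    ).fetchone()[0]


# Delivery D1 contains elements A and B; delivery D2 drops A and adds C.
load_dimension_additive(conn, ["guid-A", "guid-B"])
load_facts(conn, "D1", ["guid-A", "guid-B"])
keys_before = dict(conn.execute("SELECT ifc_global_id, element_key FROM dim_element"))

load_dimension_additive(conn, ["guid-B", "guid-C"])
load_facts(conn, "D2", ["guid-B", "guid-C"])
keys_after = dict(conn.execute("SELECT ifc_global_id, element_key FROM dim_element"))
orphans_additive = count_orphans(conn)

# Contrast: a substitutive reload of the dimension with only the D2 elements.
with conn:
    conn.execute("DELETE FROM dim_element")
    conn.executemany(
        "INSERT INTO dim_element (ifc_global_id) VALUES (?)",
        [("guid-B",), ("guid-C",)],
    )
orphans_replace = count_orphans(conn)

print(orphans_additive, orphans_replace)  # 0 4
```

Because foreign-key enforcement is off, the DELETE succeeds without complaint and the damage only becomes visible through an explicit orphan count such as count_orphans. Running that count after each load is one inexpensive way to turn the silent failure into a visible one.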
The third is the concurrency limitation inherent to SQLite as an analytical repository in an architecture where the storage and exploitation layers share the same physical file. When Power BI maintains an active connection on bim2bi.db, the write operations of the pipeline fail with the error “database is locked”, a consequence of SQLite’s single-writer model that cannot be resolved by configuration. The reference implementation handles this limitation through an explicit retry protocol (BEGIN IMMEDIATE with visual feedback in the interface, up to sixty seconds before an actionable error) and prevents the problem architecturally by means of the dual-write pattern described in the previous subsection. The detailed characterisation of the limit and its solution, transferable to engines with ACID concurrency (MariaDB, PostgreSQL), constitutes a practical contribution for future lightweight BIM-BI implementations. The implementation envisages the optional activation of an alternative engine through the environment variable DW_ENGINE, without modification of the ETL code.
6. Discussion
The BIM2BI architecture presented in this article addresses jointly the four transversal gaps identified in
Section 2 that no prior work covers together. Systematic dimensional modelling is materialised in the star schema of Layer 3, with four fact tables (fact_element, fact_space, fact_cost, fact_incident) and six contextual dimensions that transform object-oriented IFC data into structures compatible with consolidated BI tools, aligning with the practice established since Kimball [
40] but explicitly adapting it to the AEC domain. Granular lineage is solved by preserving the IfcGlobalId as a technical key in every fact table, complemented by per-execution structured logging and per-delivery backups, which enables any analytical indicator to be tracked back to the concrete IFC entity of the source model. Automated pre-ingestion validation is materialised through the IDS quality gate and the three-level staged validation strategy, which extend the application scope of the IDS standard beyond interchange between authoring tools to the governance of data quality in analytical pipelines. And dependence on proprietary platforms is eliminated by construction by operating the pipeline directly on the open IFC standard and integrating with any BI tool compatible with standard relational connections.
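The lineage mechanism can be illustrated with a reduced star schema in which the fact table carries the IfcGlobalId alongside the surrogate key. The sketch below aggregates an indicator and then traces it back to the contributing IFC entities. All column names, identifiers and amounts are invented for the example and do not reproduce the actual BIM2BI schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_element (
    element_key   INTEGER PRIMARY KEY,
    ifc_global_id TEXT NOT NULL UNIQUE,
    ifc_class     TEXT NOT NULL
);
CREATE TABLE fact_cost (
    element_key   INTEGER NOT NULL REFERENCES dim_element(element_key),
    ifc_global_id TEXT NOT NULL,   -- lineage key preserved in the fact table
    delivery_id   TEXT NOT NULL,
    amount        REAL NOT NULL
);
-- Invented sample rows, for illustration only.
INSERT INTO dim_element VALUES
    (1, 'guid-wall-01', 'IfcWall'),
    (2, 'guid-wall-02', 'IfcWall'),
    (3, 'guid-slab-01', 'IfcSlab');
INSERT INTO fact_cost VALUES
    (1, 'guid-wall-01', 'D1', 1200.0),
    (2, 'guid-wall-02', 'D1', 800.0),
    (3, 'guid-slab-01', 'D1', 450.0);
""")


def cost_by_class(conn, delivery_id):
    """Analytical indicator: total cost per IFC class for one delivery."""
    return dict(conn.execute(
        "SELECT d.ifc_class, SUM(f.amount) "
        "FROM fact_cost f JOIN dim_element d ON d.element_key = f.element_key "
        "WHERE f.delivery_id = ? GROUP BY d.ifc_class",
        (delivery_id,),
    ))


def trace_indicator(conn, delivery_id, ifc_class):
    """Lineage: list the source elements behind one value of the indicator."""
    return conn.execute(
        "SELECT f.ifc_global_id, f.amount "
        "FROM fact_cost f JOIN dim_element d ON d.element_key = f.element_key "
        "WHERE f.delivery_id = ? AND d.ifc_class = ? "
        "ORDER BY f.ifc_global_id",
        (delivery_id, ifc_class),
    ).fetchall()


totals = cost_by_class(conn, "D1")
wall_sources = trace_indicator(conn, "D1", "IfcWall")
print(totals["IfcWall"])  # 2000.0
print(wall_sources)       # [('guid-wall-01', 1200.0), ('guid-wall-02', 800.0)]
```

Keeping the IfcGlobalId in the fact table means that a drill-down of this kind returns identifiers that can be looked up directly in the source IFC model, without passing through the surrogate keys of the warehouse.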
The positioning of BIM2BI with respect to prior approaches can be synthesised by grouping them into four lines of work. With respect to proposals for direct querying over IFC [
32], BIM2BI incorporates the dimensional transformation missing from those approaches and a granular lineage mechanism they do not consider. With respect to semantic-web approaches [
33,
34,
35], BIM2BI avoids the learning and performance overheads of the RDF/OWL/SPARQL stack, maintaining native compatibility with BI tools consolidated in the sector. With respect to project-specific BIM-BI dashboards [
12,
36,
37,
38], BIM2BI contributes an automated ETL pipeline and a systematic dimensional schema that avoid dependence on manual exports from authoring software and allow cross-project analytical consolidation. Finally, with respect to current commercial initiatives from the major vendors (notably the AEC Data Model API, consolidated in March 2026 under the unified Autodesk Forma brand [
14]), BIM2BI maintains platform independence as a foundational principle, offering an alternative that can be reproduced in any organisation regardless of its technology stack.
To make the differentiation concrete, against the direct-IFC-querying approach typified by Barzegar et al. [
32] (IFC loaded into a PostGIS spatial database through FME), BIM2BI adds the dimensional transformation, the IDS pre-ETL gate and the full-stack BI exploitation that direct-querying approaches leave to the consumer; against the semantic-web approach typified by ifcOWL [
33], BIM2BI avoids the RDF/SPARQL adoption cost while retaining the same source-element traceability granularity through the IfcGlobalId; against the BI-dashboard approach typified by Rodrigues et al. [
12] (Revit-to-Power BI integration through manual export), BIM2BI replaces the manual export with an automated pipeline grounded in the open IFC standard and adds dimensional modelling and lineage that those works did not document; and against the data-quality approach typified by Zhang et al. [
39], focused on coordination-time IFC validation, BIM2BI extends the IDS standard from intra-BIM interchange validation to a pre-ETL gate within an automated analytical workflow.
The acknowledged limitations do not invalidate the proposed architecture but define the perimeter within which the results are valid and guide future work. Validation was carried out in controlled and representative scenarios (ten real use cases from Spain and Chile between 2024 and 2025), but this does not equate to an exhaustive evaluation in production environments at industrial scale, with multiple projects and heterogeneous teams operating simultaneously. Additionally, the adoption of Design Science Research as the methodological framework implies that the conclusions are inherently dependent on the designed artefact: generalisation to BIM-BI architectures with different design decisions (an alternative database engine, a non-sequential orchestrator, an analytical schema other than the star schema) would require further research. The reference implementation operates without authentication, authorisation or data-privacy mechanisms, which limits its direct deployment in multi-user environments. The quality of the IFC models used also directly conditions the IDS conformance rates and the completeness of the generated dimensional schemas, an external variability that the architecture cannot control but can characterise and record as an analytical metric. Furthermore, in line with the inherent characteristics of Design Science Research, the validation reported in
Section 5 was conducted by the same team that designed and implemented the BIM2BI architecture; while this is methodologically standard for DSR artefacts, an independent third-party evaluation by an external team applying BIM2BI to its own deliveries would substantially strengthen the external validity of the findings and is identified as a priority line for future work.
On the practical adoption plane, the three roles introduced in
Section 4.1 (BIM Data Steward, BIM Data Engineer and BIM/BI Analyst) extend rather than replace the canonical ISO 19650 information-management figures [
18]. The Information Manager retains responsibility for the project-level information delivery cycle and the operation of the Common Data Environment; the BIM Data Steward adds a focus on the analytical readiness of the data, including IDS profile authoring, conformance monitoring and dimension governance; the BIM Data Engineer operates the ETL pipeline and maintains the analytical schema; and the BIM/BI Analyst translates analytical insights into decisions for project stakeholders. Successful adoption of BIM2BI therefore requires an organisational maturity that includes defined ownership of the analytical asset (the bim2bi.db repository), skill sets that combine BIM authoring familiarity with data-engineering competence (Python, SQL, dimensional modelling, IDS), and a governance protocol that reconciles the per-delivery validity scope of ISO 19650 with the longitudinal accumulation logic of an analytical repository. These requirements position BIM2BI as a complement to consolidated ISO 19650 workflows rather than a substitute, and emphasise that its deployment is as much an organisational change as a technical one.
The implications of the work are distributed across three planes. On the academic plane, BIM2BI provides the unifying BIM-BI framework that the prior literature did not articulate and demonstrates the viability of applying Design Science Research to the design of data architectures in a technical domain with a strong normative component (openBIM, ISO 19650). It also generates quantitative and qualitative evidence on the real behaviour of the standards in analytical workflows, which can serve as a reference point for future research on scalability or extension to other organisational contexts. On the industrial plane, the architecture offers client organisations a reproducible framework to convert the BIM deliveries from their supply chain into longitudinally consolidable analytical assets; it offers consulting and construction firms a way to reduce dependence on individual technicians’ judgement in data preparation and to satisfy ISO 19650 requirements as a natural by-product of the pipeline; and it offers platform vendors confirmation, reinforced by recent commercial moves, that the demand for granular and decoupled access to BIM data is a market need and not an exclusively academic subject.
On the standardisation plane, the empirical findings of
Section 5 contribute three concrete directions to the buildingSMART agenda. The use of IDS as a pre-ETL quality gate demonstrates that the standard is technically viable in automated analytical workflows, but it also reveals expressive limitations for requirements specific to BI environments (consistency of numerical values, admissible ranges, referential integrity between Property Sets) and discrepancies between the documented specification and its reference implementation (the IFC4X3 vs. IFC4X3_ADD2 case) that could guide future extensions of the schema. The propagation of the IfcGlobalId as a lineage key along the pipeline shows that IFC already provides the required traceability mechanisms, but that exploiting them effectively requires implementation conventions that the current standards do not explicitly prescribe. And the integration of BCF-JSON as a structured data source for fact_incident shows the potential of the standard beyond model coordination, suggesting that its analytical profile could be enriched with fields oriented to data-quality metrics. Taken together, these findings mean that BIM2BI contributes to the openBIM agenda not as normative development but as a documented analytical use case, in line with the iterative development model that buildingSMART applies to its standards and with the rigour cycle of Design Science Research.
7. Conclusions
This article has presented BIM2BI, an open and reproducible answer to the long-standing question of how to systematically and reliably move BIM-produced information into the analytical layer of organisational decision-making. The central take-home message of the work is that a four-layer architecture grounded in openBIM standards (IFC, IDS, BCF) can transform heterogeneous, semantically weak BIM data into auditable, traceable analytical datasets through a process that combines pre-ETL validation, dimensional modelling and end-to-end IfcGlobalId lineage, all without dependence on proprietary platforms. The empirical validation across ten real-world use cases, with 69 IFC files and approximately 4.5 GB of input volume, demonstrates that this transformation is feasible in practice and that the resulting analytical datasets reflect the structural decisions of the architecture (29,418 element rows, 19,994 cost rows and 19,368 incident rows loaded for the conforming deliveries; entries in the fact tables suppressed for non-conforming deliveries, as the architecture specifies).
The significance of the contribution is threefold. For the academic community, BIM2BI provides the unifying framework that the BIM-BI literature had not articulated jointly, with explicit closure of the four transversal gaps identified in
Section 2.4. For AEC practice, the open-source reference implementation (MIT licence, portable deployment in less than five minutes) and the formalisation of three new operational roles (BIM Data Steward, BIM Data Engineer, BIM/BI Analyst) lower the adoption barrier for organisations seeking to convert their BIM deliveries into reusable analytical assets aligned with ISO 19650. And for the standardisation agenda, the empirical findings on the real behaviour of openBIM standards in analytical workflows—in particular the IDS 1.0 schema discrepancy with IFC 4.3 ADD2 and the operational viability of the staged three-level IDS validation—constitute concrete inputs to the buildingSMART development process.
These conclusions should be interpreted within the boundaries of the validation scope. The empirical evidence is grounded in ten use cases from projects in Spain and Chile (2024–2026); generalisation to industrial-scale, multi-team and multi-vendor production environments would require additional independent evaluations. The reference implementation, as deployed for this work, operates without authentication and without an ACID-concurrency repository, conditioning its direct deployment in multi-user environments. And the design-science methodology, by construction, validates the artefact against its own design intent rather than against external alternative architectures. These are not structural defects but explicit boundaries of the present study.
Looking forward, four research lines derive directly from these limits and from the results obtained. First, the extension of the analytical repository towards additional dimensions of BIM data (fact_schedule, fact_energy, fact_sustainability) for areas with high regulatory and market demand, such as the EU Energy Performance of Buildings Directive and the Corporate Sustainability Reporting Directive. Second, the closure of the definition-validation cycle through LOINXML as a formal layer of informational requirements and integration with the buildingSMART Data Dictionary (bSDD) as a source of normalised definitions. Third, the evolution of the deployment towards multi-user environments, combining authentication and access control with the replacement of SQLite by an ACID-concurrency engine and orchestration through Apache Airflow, both already anticipated by the DW_ENGINE variable and the compatibility of the pipeline with both execution modes. And fourth, the incorporation of Layer 1 connectors oriented to proprietary cloud APIs (notably the AEC Data Model API operated under the Autodesk Forma brand) as a complementary source to the IFC file, keeping the same data contract towards the transformation and load layers and preserving vendor independence as a foundational principle. An independent third-party validation of the BIM2BI architecture, as identified in
Section 6, is a transversal requirement for all four lines.