Schema-Agnostic Data Type Inference and Validation for Exchanging JSON-Encoded Construction Engineering Information
Abstract
1. Introduction
1.1. Research Background and Motivation
1.2. Research Objectives and Scope
1.3. Contributions of This Study
2. Theoretical Background and Related Work
2.1. Construction Information Modeling and Data Exchange
2.2. Integration of Unstructured Project Information with Structured Object Representations
2.3. Schema Inference and Ontology Matching for Dynamic Data Objects
2.3.1. Schema Inference
2.3.2. Ontology Matching
2.3.3. Data Mapping
2.4. Verifying Correctness of Dynamic Data Objects
2.4.1. Structural Correctness
2.4.2. Semantic Correctness
3. Proposed Methodology: Schema-Agnostic Information Exchange Based on Dynamic Data Objects
3.1. Overview of Methodology
3.1.1. Data Dictionary
3.1.2. Object Class Inference Mechanism
3.2. Core Components
3.2.1. Data Dictionary/Ontology
- Semantic Definition: It disambiguates the meaning of keys or attribute names used in the dynamic data; for instance, it clarifies whether a “length” attribute refers to a physical dimension (e.g., of a beam) or a time duration (e.g., of a task).
- Concept Identification: It formally identifies and describes the core entities pertinent to the construction project domain, such as physical elements, spaces, activities, and resources.
- Relationship Indication: It explicitly defines the key relationships between concepts, including inheritance (is-a), composition/decomposition (part-of), and association (refers-to). This helps the receiving systems to understand the intended structure and context (e.g., “Column” is-a “StructuralMember”; “ReinforcementBar” part-of “Beam”).
- Data Type and Unit Guidance: It specifies the expected data types (e.g., string, number, Boolean, object reference, and list) and the associated units of measure (e.g., m, mm, kg, and MPa) for the attributes. Although not strictly enforced at the schema level, this guidance is crucial for ensuring consistency during data generation and subsequent validation.
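As an illustration of these four roles, a dictionary entry for a hypothetical beam class might look as follows. All identifiers, field names, and values here are illustrative assumptions, not part of any specific dictionary standard:

```python
# A minimal, hypothetical data-dictionary entry illustrating the four roles
# described above: semantic definition, concept identification, relationship
# indication, and data type/unit guidance. Field names are illustrative only.
beam_entry = {
    "classId": "DICT:Beam",                      # concept identification
    "description": "Horizontal structural member carrying transverse loads",
    "relationships": {
        "is-a": "DICT:StructuralMember",         # inheritance
        "part-of-candidates": ["DICT:Frame"],    # composition
    },
    "members": {
        "length": {
            # semantic definition: a physical dimension, not a time duration
            "meaning": "physical dimension of the beam",
            "dataType": "number",
            "unit": "mm",                        # unit guidance
            "mandatory": True,
        },
        "material": {
            "dataType": "string",
            "enum": ["concrete", "steel", "timber"],
            "mandatory": False,
        },
    },
}

# The entry disambiguates "length" (a dimension in mm) from, say, a task's
# "length" (a duration), which a schedule-domain entry would define separately.
print(beam_entry["members"]["length"]["unit"])  # mm
```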
3.2.2. Dynamic Data Object
3.3. Mitigating Uncertainty in Class Inference Using a Data Dictionary
- J is the input JSON object.
- C is a candidate class definition from the data dictionary.
- Keys(J) is the set of keys present in the JSON object J.
- Members(C) is the set of all members (attributes) defined for class C in the dictionary.
- MatchedMembers(J, C) is the subset of Members(C) for which a corresponding key exists in Keys(J) (considering synonym mapping): MatchedMembers(J, C) = {m ∈ Members(C) | ∃ k ∈ Keys(J) such that k maps to m}.
- ConformingMembers(J, C) is the subset of MatchedMembers(J, C) whose corresponding values in J conform to the constraints (type, format, range, unit, etc.) specified for the member m in the dictionary definition C.
- NoiseKeys(J, C) is the subset of Keys(J) that does not map to any member in Members(C): NoiseKeys(J, C) = {k ∈ Keys(J) | ¬∃ m ∈ Members(C) such that k maps to m}.
- |S| denotes the cardinality (number of elements) of a set S.
- Coverage (ScoreCov): the percentage of members defined in the dictionary (especially mandatory members) that are present in the JSON object.
- Accuracy/Conformance (ScoreAcc): how well the values of the members present in the JSON object conform to the specifications (type, format, unit, range, enum, etc.) in the dictionary.
- Redundancy/Noise (ScoreNoise): the proportion of keys present in the JSON object that are not defined in the dictionary class definition. A higher noise score generally indicates a poorer match.
- (Optional) Semantic Fit: a score reflecting the semantic similarity between key names, even if they do not exactly match. Inferred meanings that align well can contribute positively to the score.
- If the primary objective is to ensure that all the expected data fields are present (completeness), then wCov is assigned a relatively high value.
- If the correctness of the values for the present fields (precision) is more critical, wAcc is assigned a higher weight.
- wNoise determines the penalty applied for extraneous keys present in the JSON object but not defined in the dictionary class.
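The set definitions, scoring components, and weights above can be sketched as a small function. The data-dictionary structure, synonym table, and default weights here are illustrative assumptions, and a simple type check stands in for the full constraint conformance test:

```python
def match_score(json_obj, class_def, synonyms=None,
                w_cov=0.4, w_acc=0.4, w_noise=0.2):
    """Weighted match score between a JSON object and one candidate class.

    class_def maps member names to constraint dicts, e.g.
    {"length": {"type": (int, float), "mandatory": True}} (assumed layout).
    synonyms maps alternative key names to canonical member names.
    The default weights are illustrative; their tuning is application-specific.
    """
    synonyms = synonyms or {}
    keys = set(json_obj)
    # Map each JSON key to a canonical member name where possible.
    canonical = {k: synonyms.get(k, k) for k in keys}
    members = set(class_def)

    matched = {m for m in members if m in canonical.values()}      # MatchedMembers
    noise = {k for k, m in canonical.items() if m not in members}  # NoiseKeys

    def conforms(member):
        # Locate the JSON key mapping to this member and check its type
        # constraint (a stand-in for full type/format/range/unit checks).
        key = next(k for k, m in canonical.items() if m == member)
        expected = class_def[member].get("type", object)
        return isinstance(json_obj[key], expected)

    conforming = {m for m in matched if conforms(m)}               # ConformingMembers

    score_cov = len(matched) / len(members) if members else 1.0
    score_acc = len(conforming) / len(matched) if matched else 0.0
    score_noise = len(noise) / len(keys) if keys else 0.0

    # Coverage and accuracy raise the score; noise is penalized.
    return w_cov * score_cov + w_acc * score_acc - w_noise * score_noise


beam_def = {"length": {"type": (int, float), "mandatory": True},
            "material": {"type": str, "mandatory": False}}
obj = {"length": 4500, "material": "concrete", "color": "grey"}
print(round(match_score(obj, beam_def), 3))  # 0.733
```

Here both members match and conform (Coverage 1.0, Accuracy 1.0), while the extraneous "color" key contributes a noise ratio of 1/3, pulling the score below 0.8.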
3.4. Validation Framework for Dynamic Data
3.4.1. Preparation Phase: Data Dictionary Construction and Management
3.4.2. Data Generation and Sharing Phase
- Dictionary Context Information: Details specifying the classification system(s) and version(s) of the Data Dictionary that were used as the basis for generating the data or are intended to be used for its interpretation (e.g., {“dictionaryRefs”: [“KCCS_v2.1”, “UniClass2015_v1.8”]}). This ensures that the recipient uses the correct semantic reference framework.
- Inference Results of the Sender (Optional): The outcome of any preliminary type inference performed by the sender, potentially including the most likely class type and an associated confidence score (e.g., {“senderInferredType”: {“type”: “KCCS:11-22-33”, “confidence”: 0.92}}). If provided, this information can serve as a suggestion or starting point for the independent inference process of the receiver but does not replace it.
- Match Score of the Sender (Optional): The match score calculated by the sender, quantifying the conformance of the data to the dictionary definition from the perspective of the originator (e.g., {“senderMatchScore”: 0.88}). When available, this score offers the receiver an initial indication of the data quality as assessed by the sender, although the receiver must perform its own comprehensive validation to make a definitive quality judgment.
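The metadata fields described above might accompany the payload as follows. The envelope structure and object content are illustrative assumptions; only the metadata keys are taken from the examples in the text:

```python
import json

# Illustrative data package combining dynamic objects with the accompanying
# metadata described above. The envelope layout is an assumption, not a
# normative format.
package = {
    "metadata": {
        "dictionaryRefs": ["KCCS_v2.1", "UniClass2015_v1.8"],
        "senderInferredType": {"type": "KCCS:11-22-33", "confidence": 0.92},
        "senderMatchScore": 0.88,
    },
    "objects": [
        {"name": "C-101", "length": 3200, "material": "concrete"},
    ],
}

encoded = json.dumps(package)   # what the sender transmits
received = json.loads(encoded)  # what the receiver re-parses

# The receiver resolves the dictionary context first, then treats the
# sender's inference and score only as hints for its own validation.
assert received["metadata"]["dictionaryRefs"][0] == "KCCS_v2.1"
```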
3.4.3. Data Validation Phase—Reception and Evaluation
- Data Reception and Context Verification: The receiving system first ingests the data package, which includes dynamic JSON objects and the accompanying metadata. A critical initial action involves parsing the metadata to identify the data dictionary version(s) referenced by the sender. The system must ensure that it has the correct version of the dictionary, as this provides the essential semantic framework for all subsequent interpretation and validation tasks.
- Execution of Type Inference: For each received JSON object, the core type inference algorithm, detailed in Section 3.3, is executed. Using the loaded data dictionary, the algorithm analyzes the object’s structure and keys (and potentially its values) to identify the most probable class definition(s) it represents. Although sender-provided indications may be available in the metadata, the receiving system performs this inference independently to arrive at an objective classification. This process typically yields one or more candidate class types, each associated with a calculated confidence or similarity score that reflects the strength of the match based on the inference logic.
- Initial Screening via Inference Results: The outcome of the type inference is subjected to a preliminary assessment. If the highest calculated confidence score for an object fails to meet a predefined minimum threshold (e.g., 0.7), or if the scores for the top two candidate classes are too close to distinguish reliably, the object may be flagged as “Type Ambiguous” or “Unclassifiable.” Such objects may be excluded from further automated processing or routed for manual inspection, thereby preventing erroneous interpretations downstream. Objects that pass this initial screening proceed with a provisionally assigned inferred type.
- Quantification of Match Degree: Following successful type inference, the system performs a detailed quantification of the match degree between the actual JSON object and the definition of its inferred class in the data dictionary. This involves applying the scoring mechanism described in Section 3.3, which calculates metrics such as Coverage, Accuracy/Conformance, and Noise/Redundancy (as defined conceptually in Section 3.4.1). The system compares the object’s keys, the conformance of its values (type, format, range, and unit), and its structural characteristics against the dictionary requirements for the inferred class. This calculation yields individual scores for each metric and culminates in a single, comprehensive final match score (e.g., a value between 0 and 1, or 0–100%) that objectively represents the overall conformance of the object to its inferred semantic definition.
- Final Validation Verdict: The calculated final match score is then compared against a predefined acceptance threshold (e.g., 0.8 or 80%) to determine the final validation outcome.
  - Validation Passed: If the score meets or exceeds the threshold, the object is considered successfully validated. It is typically marked as “Verified” or “Passed,” confirming its inferred type and deeming it sufficiently reliable for integration into downstream applications, analyses, or data stores.
  - Validation Failed/Review Required: If the score falls below the threshold, the object is flagged as “Validation Failed” or “Requires Review.” Critically, the individual metric scores (Coverage, Accuracy, Noise) calculated during match quantification provide valuable diagnostic information. By examining these sub-scores, users or automated systems can identify the specific reasons for the low overall score, such as missing mandatory attributes (low Coverage), incorrect data value formats (low Accuracy), or the presence of many unexpected fields (high Noise). This allows for targeted troubleshooting or data cleansing.
- Utilization of Results: All results from the validation process, including the inferred class type, confidence score, final match score, individual metric scores, and final validation status (Passed/Failed/Review), are recorded for each data object. This information serves multiple purposes: generating data quality reports, enabling filtering or querying of data based on validation status or quality scores, prioritizing data remediation tasks, and potentially providing insights to improve upstream data generation processes.
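The screening and verdict steps above can be summarized in a short sketch. The function names and the tie-breaking margin are assumptions; the 0.7 and 0.8 thresholds follow the example values in the text:

```python
# Sketch of the screening and verdict logic described above. The tie margin
# is a hypothetical parameter; the confidence (0.7) and acceptance (0.8)
# thresholds follow the example values in the text and are configurable.
def screen_inference(candidates, min_conf=0.7, tie_margin=0.05):
    """candidates: list of (class_id, confidence) pairs, best first."""
    if not candidates or candidates[0][1] < min_conf:
        return "Unclassifiable"            # best score below minimum threshold
    if len(candidates) > 1 and candidates[0][1] - candidates[1][1] < tie_margin:
        return "Type Ambiguous"            # top two candidates too close
    return candidates[0][0]                # provisionally assigned inferred type


def final_verdict(final_score, threshold=0.8):
    """Compare the final match score against the acceptance threshold."""
    return "Passed" if final_score >= threshold else "Requires Review"


print(screen_inference([("IfcWall", 0.91), ("IfcColumn", 0.55)]))  # IfcWall
print(final_verdict(0.83))  # Passed
```

In a failing case, the Coverage/Accuracy/Noise sub-scores computed during quantification would accompany the "Requires Review" flag as diagnostic detail.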
3.5. Expected Benefits
4. Experimentation and Evaluation
4.1. Experiment Design
- Loading and parsing the Data Dictionary.
- Ingesting and iterating through the objects in both JSON datasets.
- Executing the type inference algorithm (detailed in Section 3.3) based on member matching against the dictionary.
- Calculating the degree of match metrics (Coverage, Accuracy) and the final match score for each object against its inferred class definition (as per Section 3.3 and Section 3.4.3).
4.1.1. Datasets Used
4.1.2. Data Dictionary Construction
4.1.3. Experimental Procedure
- Convert the IFC file to JSON (ifcJSON-style type annotations included for result validation only).
- Run a software script to transform the EXPRESS schema (matching the IFC version of the dataset) into a dictionary.
- For each object of the dataset:
- Compare the attribute names of the dynamic object against the data dictionary to extract the candidate entity types.
- Eliminate any entity whose required properties are not present in the object from the candidate set.
- Sort the remaining candidates in descending order by the number of attributes they share with the object (i.e., entities with a higher ScoreAcc are ranked first). The inverse references are also counted as attributes.
- Apply a higher weight (WAcc) to attributes defined in the higher-level supertypes. Because the datasets used in these experiments contain no noise keys, no noise penalty is applied to the weighted scores.
- Report both the entity with the highest raw ScoreAcc and that with the highest WAcc-adjusted score as the inferred types and compare them against the true IFC entity type of the objects.
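The candidate extraction, elimination, and ranking steps above can be sketched as follows, assuming a simplified dictionary layout (a hypothetical mapping from entity names to their attribute and required-attribute sets):

```python
# Compact sketch of the per-object inference procedure described above.
# The dictionary structure {entity: {"attrs": set, "required": set}} is an
# assumption for illustration; the real dictionary is derived from EXPRESS.
def infer_entity(obj_keys, dictionary):
    keys = set(obj_keys)
    # 1. Candidate entities: those sharing at least one attribute name.
    candidates = [e for e, d in dictionary.items() if keys & d["attrs"]]
    # 2. Eliminate entities whose required attributes are not all present.
    candidates = [e for e in candidates if dictionary[e]["required"] <= keys]
    # 3. Rank by the number of shared attributes (raw ScoreAcc), descending.
    return sorted(candidates,
                  key=lambda e: len(keys & dictionary[e]["attrs"]),
                  reverse=True)


toy_dict = {
    "IfcWall": {"attrs": {"GlobalId", "Name", "Tag"}, "required": {"GlobalId"}},
    "IfcTask": {"attrs": {"GlobalId", "TaskTime"}, "required": {"TaskTime"}},
}
# IfcTask is eliminated because its required "TaskTime" is missing.
print(infer_entity({"GlobalId", "Name"}, toy_dict))  # ['IfcWall']
```

The supertype weighting step would then re-score the surviving candidates before the final ranked report is compared against the ground-truth IFC entity types.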
4.1.4. Evaluation Metrics
4.2. Experimental Results
- The top-scoring candidate entity type exactly matches the actual entity type of the object (exact match).
- The top-scoring candidate corresponds to a supertype of the actual entity (e.g., the inferred candidate is IfcWall, where the true type is IfcWallStandardCase).
- There are multiple top-scoring (tied) candidates, one of which matches the actual entity type of the object (i.e., match among ties).
- Among multiple top-scoring ties, one candidate corresponds to a supertype of the actual entity (match with the supertype among the ties).
- None of the candidates match the actual entity type of the object (no match).
4.3. Analysis and Discussion of the Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| CAD | Computer-Aided Design |
| CAE | Computer-Aided Engineering |
| IFC | Industry Foundation Classes |
| IoT | Internet of Things |
| ISO | International Organization for Standardization |
| JSON | JavaScript Object Notation |
| LLM | Large Language Model |
| MEP | Mechanical, Electrical, and Plumbing |
| ML | Machine Learning |
| OWL | Web Ontology Language |
| STEP | Standard for the Exchange of Product Model Data |
| URN | Uniform Resource Name |
| W3C | World Wide Web Consortium |
| XML | Extensible Markup Language |
| YAML | YAML Ain’t Markup Language |
References
- Eastman, C.; Teicholz, P.; Sacks, R.; Liston, K. BIM Handbook: A Guide to Building Information Modeling for Owners, Managers, Designers, Engineers, and Contractors; John Wiley & Sons: Hoboken, NJ, USA, 2008; pp. 66–91. [Google Scholar]
- Solihin, W.; Eastman, C.; Lee, Y.C. A framework for fully integrated building information models in a federated environment. Adv. Eng. Inform. 2016, 30, 168–189. [Google Scholar] [CrossRef]
- Sacks, R.; Wang, Z.; Ouyang, B.; Utkucu, D.; Chen, S. Toward artificially intelligent cloud-based building information modelling for collaborative multidisciplinary design. Adv. Eng. Inform. 2022, 53, 101711. [Google Scholar] [CrossRef]
- Introducing JSON. Available online: https://www.json.org/json-en.html (accessed on 21 August 2025).
- East, E.W. Construction Operations Building Information Exchange (COBIE) Requirements Definition and Pilot Implementation Standard (EDRC/CERL TR-07-30); Engineer Research and Development Center, US Army Corps of Engineers: Champaign, IL, USA, 2007. [Google Scholar]
- Sobkhiz, S.; El-Diraby, T. Dynamic integration of unstructured data with BIM using a no-model approach based on machine learning and concept networks. Autom. Constr. 2023, 150, 104859. [Google Scholar] [CrossRef]
- Cánovas Izquierdo, J.L.; Cabot, J. Discovering implicit schemas in JSON data. In Proceedings of the International Conference on Web Engineering, Aalborg, Denmark, 8–12 July 2013. [Google Scholar] [CrossRef]
- Klettke, M.; Störl, U.; Shenavai, M.; Scherzinger, S. NoSQL schema evolution and big data migration at scale. In Proceedings of the 2016 IEEE International Conference on Big Data, Washington, DC, USA, 5–8 December 2016. [Google Scholar] [CrossRef]
- Baazizi, M.-A.; Colazzo, D.; Ghelli, G.; Sartiani, C. Parametric schema inference for massive JSON datasets. VLDB J. 2019, 28, 497–521. [Google Scholar] [CrossRef]
- Miller, R.J.; Haas, L.M.; Hernández, M.A. Schema mapping as query discovery. In Proceedings of the International Conference on Very Large Data Bases, Cairo, Egypt, 10–14 September 2000. [Google Scholar]
- Madhavan, J.; Bernstein, P.A.; Rahm, E. Generic schema matching with Cupid. VLDB 2001, 1, 49–58. [Google Scholar]
- Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007. [Google Scholar]
- Beach, T.H.; Rezgui, Y.; Li, H.; Kasim, T. A rule-based semantic approach for automated regulatory compliance in the construction sector. Expert. Syst. Appl. 2015, 42, 5219–5231. [Google Scholar] [CrossRef]
- Klettke, M.; Störl, U.; Scherzinger, S. Schema extraction and structural outlier detection for JSON-based NoSQL data stores. In Proceedings of the Database System for Business, Technology and Web, Hamburg, Germany, 6 March 2015. [Google Scholar]
- Afsari, K.; Eastman, C.M.; Castro-Lacouture, D. JavaScript Object Notation (JSON) data serialization for IFC schema in web-based BIM data exchange. Autom. Constr. 2017, 77, 24–51. [Google Scholar] [CrossRef]
- Solihin, W.; Eastman, C.; Lee, Y.C. Toward robust and quantifiable automated IFC quality validation. Adv. Eng. Inform. 2015, 29, 739–756. [Google Scholar] [CrossRef]
- Lee, Y.-C.; Eastman, C.M.; Solihin, W. Rules and validation processes for interoperable BIM data exchange. J. Comput. Des. Eng. 2021, 8, 97–114. [Google Scholar] [CrossRef]
- Venugopal, M.; Eastman, C.M.; Teizer, J. An ontology-based analysis of the industry foundation class schema for building information model exchanges. Adv. Eng. Inform. 2015, 29, 940–957. [Google Scholar] [CrossRef]
- Terkaj, W.; Šojić, A. Ontology-based representation of IFC EXPRESS rules: An enhancement of the ifcOWL ontology. Autom. Constr. 2015, 57, 188–201. [Google Scholar] [CrossRef]
- Pauwels, P.; Van Deursen, D.; Verstraeten, R.; De Roo, J.; De Meyer, R.; Van de Walle, R.; Van Campenhout, J. A semantic rule checking environment for building performance checking. Autom. Constr. 2011, 20, 506–518. [Google Scholar] [CrossRef]
- Pauwels, P.; Krijnen, T.; Terkaj, W.; Beetz, J. Enhancing the ifcOWL ontology with an alternative representation for geometric data. Autom. Constr. 2017, 80, 77–94. [Google Scholar] [CrossRef]
- Han, K.K.; Cline, D.; Golparvar-Fard, M. Formalized knowledge of construction sequencing for visual monitoring of work-in-progress via incomplete point clouds and low-LoD 4D BIMs. Adv. Eng. Inform. 2015, 29, 889–901. [Google Scholar] [CrossRef]
- Sacks, R.; Ma, L.; Yosef, R.; Borrmann, A.; Daum, S.; Kattel, U. Semantic enrichment for building information modeling: Procedure for compiling inference rules and operators for complex geometry. J. Comput. Civ. Eng. 2017, 31, 1–12. [Google Scholar] [CrossRef]
- Zhou, P.; El-Gohary, N. Automated matching of design information in BIM to regulatory information in energy codes. In Proceedings of the Construction Research Congress 2018, New Orleans, LA, USA, 2–4 April 2018. [Google Scholar]
- Chung, T.; Bok, J.H.; Ji, H.W. A multi-story expandable systematic hierarchical construction information classification system for implementing information processing in highway construction. Appl. Sci. 2023, 13, 10191. [Google Scholar] [CrossRef]
- Sample & Test Files. Available online: https://github.com/buildingSMART/Sample-Test-Files (accessed on 16 June 2025).
- Sample & Test Files. Available online: https://github.com/buildingsmart-community/Community-Sample-Test-Files (accessed on 16 June 2025).
- ISO 10303-21:2002; Industrial Automation Systems and Integration—Product Data Representation and Exchange—Part 21: Implementation Methods: Clear Text Encoding of the Exchange Structure. ISO: Geneva, Switzerland, 2002.
- You, S.J.; Kim, S.W. Schema-agnostic dynamic object data exchange methodology applicable to digitized construction engineering information. J. Korea Acad.-Ind. Coop. Soc. 2023, 10, 848–855. (In Korean) [Google Scholar]
| | Advantages | Disadvantages |
|---|---|---|
| Schema clarity and semantic interoperability | Well-defined schemas clarify structure and semantics, enabling cross-system data exchange, especially geometric shape information. | - |
| Data integrity and consistency | Static models promote data integrity and consistency; strict rules simplify maintaining structural correctness. | Strictness reduces flexibility when novel or atypical data must be represented. |
| Standards maturity and tooling | Mature standards (STEP, IFC) and an abundance of software tools. | Standards are slow and complex to update. |
| Extensibility/adding new types | Predictable schemas ease integration and parsing. | Difficult to add types/attributes not in the predefined schema, hindering rapid response to new needs. |
| Data conversion fidelity | Neutral formats enable cross-software exchange and reduce ad hoc translations. | Conversion can lose or distort vendor-unique data when importing/exporting. |
| Support for unstructured/semi-structured data | - | Neutral models poorly accommodate unstructured or semi-structured data (simulation results, sensor streams, documents). |
| Method | Input Data | Core Technique | Limitation | Relation to This Study |
|---|---|---|---|---|
| Integration of Unstructured Information | | | | |
| Dynamic Integration [6] | Project documents, user communication network | Combines semantic networks from NLP with communication network analysis to derive project-specific models. | Ensuring the derived model is understandable and shareable by all project participants is a challenge. | Contrasts their ML method by using a rule-based approach with a shared dictionary to ensure common understanding. |
| Schema Inference | | | | |
| Iterative Schema Merging [7] | Individual JSON documents | Extracts and progressively merges schemas from separate JSON documents to discover an implicit overall schema. | May result in an overly permissive or complex ‘union schema’ if data sources are highly heterogeneous. | Matches an instance to a dictionary class to infer its type, instead of inferring a schema from multiple documents. |
| Reverse Engineering [8,14] | JSON data | Uses a structure identification graph to reverse-engineer an explicit schema and detect structural outliers. | The quality of the inferred schema is highly dependent on the initial dataset. | This study’s ‘Noise’ score is similar to their ‘outlier’ concept but measures against a predefined dictionary, not an inferred schema. |
| Parametric Inference [9] | Massive, schema-less JSON datasets | Analyzes field occurrence frequencies to identify mandatory/optional fields. | The absence of a schema can negatively affect query correctness and optimization. | Unlike their method of inferring from data, this study explicitly predefines mandatory attributes in the dictionary. |
| Ontology Matching | | | | |
| Instance-based Matching [10] | Schema structure and data instances | Infers schema mappings by analyzing data value characteristics, such as types and patterns. | Primarily designed to transform source data into a fixed target schema. | Incorporates their value-based checks into this study’s ‘Accuracy’ score calculation. |
| Schema-graph Matching [11] | Two schemas | Models schemas as graphs and generates mappings by calculating linguistic and structural similarities. | Intended for translation between two existing schemas. | Addresses instance-to-class matching, a different problem from their schema-to-schema translation. |
| Hybrid/Extensible Ontology [12] | Wikipedia, WordNet | Integrates diverse techniques to automatically extract facts and build the YAGO knowledge base. | Prone to factual inaccuracies and biases inherited from its source (Wikipedia). | Focuses on using a predefined dictionary for inference, rather than a large knowledge base. |
| Data Mapping | | | | |
| Rule-Based Semantic Mapping [13] | BIM data (IFC) | Transforms BIM data to RDF and applies semantic web rules (SPARQL, SWRL) for automated compliance checking. | Requires significant expertise to author and maintain complex semantic rules (SWRL). | Also rule-based, but this study first infers the type of schema-less data before applying quantitative validation. |
| Correctness Verification | | | | |
| Structural Correctness [15] | JSON documents representing BIM data | Performs formal validity checks and then validates against a defined ifcJSON4 schema. | Relies on a rigid, predefined schema (ifcJSON4), offering little flexibility for data variations or project-specific extensions. | Operates in a schema-less context, requiring a type inference step before validation can occur. |
| Semantic Correctness/Rule-based [16,17] | IFC models, MVDs | Validates data conformance against predefined rules from an EXPRESS schema or Model View Definitions (MVDs). | Tightly coupled to the IFC standard and its MVDs; creating and managing comprehensive rule sets for each MVD is complex. | This study adds a necessary preceding step of type inference for unknown objects. |
| Semantic Correctness/Ontology-Based [18,19,21] | ifcOWL (IFC as an ontology) | Uses formal ontologies and logical reasoning engines to verify semantic correctness and compliance. | High computational overhead and complexity of semantic reasoners. | This study uses a lightweight data dictionary and a quantitative scoring system instead of formal ontologies. |
| | Advantages | Disadvantages |
|---|---|---|
| Explicit Identification via Embedded Metadata | Provides a direct and clear identification of the object’s type; allows receiving systems to quickly find the class definition by checking a single known field. | All parties must agree on the specific metadata keys to be used (e.g., @type); can reintroduce a degree of structural rigidity, reducing the flexibility of schema-less data. |
| Contextual Identification via Wrapping Structures | Structures data by logically grouping objects of the same type; keeps individual objects free from explicit metadata fields, focusing purely on their properties. | Imposes a stricter overall data structure, which limits flexibility; becomes difficult to use when objects of diverse types must be intermingled in the same list. |
| External Identification via Manifest Files | Keeps the primary data files clean and lean, containing only property data; clearly separates instance data from their classification metadata. | Adds the complexity of managing and linking multiple files (data and manifest); requires systems to correlate information from separate files to determine object types. |
| Implicit Identification via Member Matching | Allows for processing data even when they lack any explicit type information; aligns well with the dynamic and unpredictable nature of data from distributed sources. | High potential for misclassification, especially when different classes share similar member names; accuracy is degraded by inconsistent naming, missing members, or unexpected “noise” keys; accurate matching logic often requires incorporating domain-specific knowledge. |
| Column Name | Description | Applies To 1 | Notes |
|---|---|---|---|
| Entry ID 2 | A globally unique identifier for this specific dictionary entry (row). Could be a URI, UUID, or another unique scheme. | Both | Primary key for the dictionary table. |
| Entry Type 3 | Specifies whether this row defines a “Class” or a “Member.” | Both | Determines the interpretation of other columns in this row. |
| Parent Class ID 4 | If Entry Type is “Member,” this contains the Entry ID of the Parent Class this member belongs to. | Member | Null/empty if Entry Type is “Class.” Links Member to its Class. |
| Key Name 5 | If Entry Type is “Member,” this is the standardized key/name used in data instances (e.g., JSON key). | Member | Null/empty if Entry Type is “Class.” |
| Human-Readable Name | A user-friendly name for the Class or Member (potentially multilingual). | Both | E.g., “Reinforced Concrete Beam,” “Element Length.” |
| Description | Detailed explanation of the Class concept or the purpose and meaning of the Member. | Both | Provides semantic context. |
| Semantic Synonyms | If Entry Type is “Member,” lists alternative names/abbreviations mapping to the Key Name. | Member | Null/empty if Entry Type is “Class.” Aids in robust key matching. |
| Data Type 5 | If Entry Type is “Member,” specifies the expected data type (e.g., string, number, Boolean, object, array). | Member | Null/empty if Entry Type is “Class.” |
| Mandatory Status 5 | If Entry Type is “Member,” indicates if this member is mandatory (“True”) or optional (“False”) for its Parent Class. | Member | Null/empty if Entry Type is “Class.” |
| Format Constraint | If Entry Type is “Member,” defines specific format rules (e.g., regex, ISO date). | Member | Null/empty if Entry Type is “Class.” |
| Unit of Measure | If Entry Type is “Member” and Data Type is numeric, specifies the expected unit (e.g., m, kg, MPa). | Member | Null/empty otherwise. |
| Enumerated Values | If Entry Type is “Member,” lists permissible values if it is an enumerated type. | Member | Null/empty otherwise. |
| Nested Structure | If Entry Type is “Member” and Data Type is object/array, defines the expected structure/cardinality of the nested element(s). | Member | Null/empty otherwise. |
| Discriminative Wt. | If Entry Type is “Member,” indicates its importance weight for class inference. | Member | Null/empty if Entry Type is “Class.” |
| Coverage Wt. | If Entry Type is “Member,” indicates its weight for quality score calculation (esp. if mandatory). | Member | Null/empty if Entry Type is “Class.” |
| Version Info | Information about the version of this dictionary entry (e.g., creation date, modification date, version number). | Both | Important for dictionary management. |
| Number | Original IFC File Name | IFC Schema Version | Number of Objects | Description |
|---|---|---|---|---|
| 1 | Duplex_A_20110907.ifc | 2x3 TC1 | 33,510 | A duplex house with MEP assemblies |
| 2 | construction-scheduling-task.ifc | 4 | 246 | Building walls and a slab; each element is assigned scheduled construction tasks |
| 3 | Infra-Road.ifc | 4.3 Addendum 2 | 887 | Road parts; each part consists of multiple layers |
| 4 | Infra-Bridge.ifc | 4.3 Addendum 2 | 883 | Two generic road bridges |
| Metric | Description |
|---|---|
| Exact Match Rate | The percentage of objects where the inferred class precisely matches the ground-truth class |
| Superclass Match Rate | The percentage of objects where the inferred class is a direct superclass of the ground-truth class in the dictionary hierarchy |
| Overall Success Rate | The sum of Exact Match Rate and Superclass Match Rate |
| JSON Dataset | Inference Method | Exact Match | Among Ties | Supertype Match | Supertype Among Ties | No Match |
|---|---|---|---|---|---|---|
| Data 1 (Duplex, 33,510 objects) | Unweighted | 26,358 (76.5%) | 122 (0.4%) | 0 (0%) | 5393 (16.1%) | 2357 (7.0%) |
| | Weighted | 25,625 (76.5%) | 532 (1.6%) | 0 (0%) | 4853 (14.5%) | 2500 (7.5%) |
| Data 2 (Construction Schedule, 246 objects) | Unweighted | 151 (61.4%) | 10 (4.1%) | 0 (0%) | 5 (2.0%) | 80 (32.5%) |
| | Weighted | 151 (61.4%) | 24 (9.8%) | 0 (0%) | 4 (1.6%) | 67 (27.2%) |
| Data 3 (Infra-Road, 887 objects) | Unweighted | 672 (75.8%) | 117 (13.2%) | 0 (0%) | 17 (1.9%) | 81 (9.1%) |
| | Weighted | 662 (74.6%) | 181 (20.4%) | 0 (0%) | 0 (0%) | 44 (5.0%) |
| Data 4 (Infra-Bridge, 883 objects) | Unweighted | 665 (75.3%) | 102 (11.6%) | 0 (0%) | 40 (4.5%) | 76 (8.6%) |
| | Weighted | 682 (77.2%) | 149 (16.9%) | 0 (0%) | 0 (0%) | 52 (5.9%) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
You, S.; Ji, H.W.; Kwak, H.; Chung, T.; Bae, M. Schema-Agnostic Data Type Inference and Validation for Exchanging JSON-Encoded Construction Engineering Information. Buildings 2025, 15, 3159. https://doi.org/10.3390/buildings15173159