Next Article in Journal
MTPA Control Strategy for Brushless DC Motors Based on Zero-Sequence Current Injection
Next Article in Special Issue
On the Role of Feature Extraction in Transformer PD Severity Classification: A Controlled Comparison of PCA and Autoencoder Models
Previous Article in Journal
Machine Vision for In Situ Measurement and Control of Wire Stickout in LWDED Process
Previous Article in Special Issue
Efficient Dual-Stream Network with Soft-Gated Fusion for Bearing Fault Diagnosis Using Acoustic Emission Signals
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

BIM and PdM of Railway Rolling Stock with Automatic Upgrading Based on GenAI

by
João Matos Coutinho
1,2,3,*,
Hugo Raposo
3,4,
José M. Torres Farinha
3,4 and
Antonio J. Marques Cardoso
2
1
Departamento de Engenharia Eletromecânica, University Beira Interior, 6201-001 Covilhã, Portugal
2
CISE—Electromechatronic Systems Research Centre, University of Beira Interior, Calçada Fonte do Lameiro, 6201-001 Covilhã, Portugal
3
RCM2+—Research Center for Management Engineering and Asset Systems, Institute of Engineering, Polytechnic University of Coimbra, Rua Pedro Nunes—Quinta da Nora, 3030-199 Coimbra, Portugal
4
Department of Industrial Engineering and Management, Coimbra Institute of Engineering, IPC—Polytechnic University of Coimbra, Rua da Misericórdia, Lagar dos Cortiços, S. Martinho do Bispo, 3045-093 Coimbra, Portugal
*
Author to whom correspondence should be addressed.
Machines 2026, 14(5), 535; https://doi.org/10.3390/machines14050535
Submission received: 8 March 2026 / Revised: 20 April 2026 / Accepted: 8 May 2026 / Published: 11 May 2026
(This article belongs to the Special Issue Condition Monitoring and Fault Diagnosis)

Abstract

The paradigm transition of the life cycle management of physical assets in the railway sector demands new maintenance models that imply the conventional predictive approaches to be surpassed. This paper proposes an innovative methodology that integrates Building Information Modelling (BIM) with predictive maintenance (PdM) systems to be applied to rolling stock and, in this way, be enhanced by Generative Artificial Intelligence (GenAI). The research focuses on the autonomous synchronisation of the Rolling Stock Digital Twin (DT). Unlike static BIM models, the proposed solution enables the use of GenAI algorithms to process continuous data streams from integrated sensors, allowing the digital model to evolve autonomously as physical wear occurs. In this framework, GenAI (via Generative Adversarial Networks—GANs) is essential for data augmentation, enabling the simulation of rare “long-tail” failure events that are scarce in real-world historical data. By synthesising these degradation scenarios, the model learns complex mechanical collapse patterns that otherwise would be ignored by traditional PdM approaches. GenAI is employed to synthesise degradation scenarios, perform real-time parametric updates within the IFC (Industry Foundation Classes) schema, and optimise maintenance workflows. The application of this framework demonstrates a significant reduction in diagnostic latency and optimises the rolling stock’s operational life cycle by automating updates and reducing the need for manual data entry. This study concludes that the convergence among BIM, PdM, and GenAI establishes a robust framework for railway fleet management. While the current validation focuses on bogie systems using Random Forest and LLMs, it paves the way for a future Industrial Metaverse where immersive diagnostics can be integrated into the maintenance lifecycle.

1. Introduction

Currently, new projects are facing increasingly complex challenges, such as in the execution of the work and in the innovation [1]. With advances in deep learning and computer vision, the railway industry has been experiencing significant innovations in terms of safety through increasingly efficient anomaly detection and infrastructure inspection [2]. Static BIM models fail to manage dynamic assets on railways. The IFC standard (buildingSMART [3]) is increasingly integrated into the mechanical, electrical, and hydraulic components of the 3D model, thus allowing it to go beyond the traditional architecture. Despite these advancements, a significant research gap persists: traditional Building Information Modelling (BIM) frameworks are predominantly stationary and cannot autonomously adapt to the rapid physical degradation of rolling stocks. Literature identifies a “semantic gap” because digital models remain static “as-built” documents, failing to capture the “as-maintained” operational evolution of kinetic systems such as bogies and braking units. Furthermore, an interoperability crisis often results in critical semantic loss during data exchange, where complex parametric geometries are reduced to generic boundary representations (BREP). To address these limitations, the main objective of this paper is to propose an innovative tri-layer methodology that integrates BIM with predictive maintenance (PdM) systems, enhanced by Generative Artificial Intelligence (GenAI). With physical wear and tear and with the updating digital models significantly enhancing PdM, what helps to improve its level is aided by GenAI. The framework utilises Large Language Models (LLMs) to interpret unstructured sensor logs, translating them into IFC-compliant parameters for real-time Digital Twin (DT) synchronisation. In this study, a framework is proposed where GenAI acts as a key engine for the translation between data and sensors and the BIM structure and parameters.
This enables the digital model to evolve autonomously as physical wear occurs, transforming the BIM environment from a static geometric repository into a dynamic management tool. This workflow facilitates the extensive automation of synchronisation tasks, thereby significantly reducing the information degradation typical of manual reporting and minimising the latency of the digital twin update.
The remainder of this article is organised as follows: Section 2 provides a theoretical background on BIM, PdM, and GenAI; Section 3 describes the methodology and the proposed system architecture; Section 4 presents the experimental case study focused on railway bogies; Section 5 presents bogies architecture and components; Section 6 discusses the results and validation of the model; and Section 7 presents the conclusions and suggestions for future work.

2. Theory Framework

The transition towards sustainable asset management is driven by Maintenance 4.0, a paradigm shift that leverages proactive strategies such as Condition-Based Maintenance (CBM). Recent frameworks published in 2026, such as those by Hu et al. [4], which emphasise that integrating DT within infrastructure industries is essential to transcend traditional temporal heuristics. Furthermore, the convergence with 6G networks and GenAI-driven agentic systems establishes the necessary infrastructure for processing massive telemetry flows in real-time.
Central to this framework is the precise categorisation of digital representations. In the scope of this research, a clear distinction is made between BIM and the DT. Whilst BIM functions as a static, high-fidelity semantic database, the DT is regarded as the dynamic evolution of this model. In this way, DT uses real-time data from rolling stock sensors to reflect the ‘as-built’ condition. Thus, BIM provides the structural foundation, whilst the integration of the DT enables a continuous feedback loop between the physical bogie and its digital representation.

2.1. BIM and Digital Twin Integration

The transition from passive digital models to autonomous ecosystems represents a critical evolution in the state of the art. According to Dholakia et al. [5], the integration of GenAI 2.0 into Digital Twins (DTs) establishes an operational dynamic that enhances decision-making capabilities. This paradigm shift differs from previous iterations by replacing static processing with real-time adaptive response systems. GenAI 2.0 enables:
  • Dynamic response—the system can process anomalies in real time, utilising specialised diagnostic agents;
  • Contextual validation—the use of Retrieval-Augmented Generation (RAG) ensures that all detected faults are cross-referenced with technical manuals and maintenance histories, increasing the accuracy of responses;
  • Autonomous logic—the DT’s reasoning shifts from rigid scripts to a flexible multi-agent architecture based on Large Language Models (LLMs).
The Generative Adversarial Networks (GANs) function as a synthetic data engine to model rare “long-tail” failure events, providing a balanced dataset for the primary predictive model. Already, Large Language Models (LLMs) serve as a semantic interface, parsing unstructured logs into structured IFC-compliant parameters. They do not replace the diagnostic logic but facilitate the synchronisation between telemetry and DT.
Furthermore, the implementation of the Model Context Protocol, as presented by Xu et al. [6], provides the necessary abstraction layer to coordinate these multi-agent systems. The DT now autonomously checks the inventory of components, schedules interventions and closes the decision cycle without human supervision. This convergence is highlighted in recent literature, which establishes that the optimisation of rolling stock life cycles depends on the complementary intersection of BIM/IFC semantic data, advanced predictive intelligence and autonomy.

2.2. The Role of GenAI in Semantic Translation

The integration of GenAI is not merely for data visualisation but acts as a semantic bridge. The proposed workflow utilises LLMs to parse unstructured sensor logs and maintenance reports. This process involves:
  • Feature extraction—identifying critical threshold breaches in vibration and temperature;
  • Mapping to IFC—the GenAI engine maps these findings to specific IFC parameters (e.g., IfcPropertySet);
  • Command generation—the system generates automated scripts that update the GUID-specific properties of the railway components within the BIM environment, ensuring that the transition from raw data to model parameters occurs without semantic loss.

2.3. System Architecture: A Tri-Layer Framework for Autonomous Synchronisation

The proposed architecture is designed to bridge the gap between raw physical signals and structured BIM semantics. This is achieved through a coordinated flow across three distinct functional layers: Acquisition, Semantic Processing, and BIM Integration.
The architecture is initiated by the following:
  • Acquisition (physical layer)—responsible for collecting high-frequency data from axle boxes and bogie suspension systems;
  • Semantic processing (semantic layer)—the data is filtered and interpreted. By processing the data at the edge before sending interpreted summaries to the GenAI engine, the system reduces the computational load, thereby avoiding network bottlenecks;
  • BIM integration (integration layer)—uses automated APIs to insert the processed data into the IFC schema. This modular approach enables real-time synchronisation, allowing the DT to analyse physical degradation as it occurs.
Within this framework, the architecture integrates a generative engine as a core component to ensure the autonomous evolution of the DT. Unlike traditional systems that limit their analysis to historical sensor data, this approach utilises GenAI, specifically Generative Adversarial Networks (GANs), to perform advanced data augmentation.
This layer is essential for synthesising ‘long-tail events’—rare but critical mechanical failure scenarios that are naturally scarce in real-world historical records. By generating these synthetic failure patterns, the architecture allows the predictive model to learn and recognise complex mechanical collapse trajectories before they manifest physically.
The GenAI engine guarantees that DT is prepared for extreme conditions, automatically upgrading its diagnostic parameters within the IFC buildingSMART [3] schema. This ensures that virtual representation is not just a mirror of past performance but a proactive model capable of anticipating structural boundaries and rare breakdown events.
Figure 1 shows the schematic representation of the data flow and synchronisation framework—from raw physical signal acquisition via on-board IoT sensors through semantic interpretation using LLM to automated BIM model updates via Python:
  • Spyder version: 6.1.3 (standalone);
    Python version: 3.12.11 64-bit;
    Qt version: 5.15.15;
    PyQt5 version: 5.15.11;
    Operating System: Windows-10-10.0.19045-SP0 and IfcOpenShell.
  • Data acquisition layer (IoT and sensor fusion)
    The first layer manages the continuous ingestion of telemetry from the rolling stock’s integrated sensors. In this stage, raw numerical data—including vibration frequencies, temperature gradients, and pressure levels—are collected via IoT gateways. Unlike traditional systems that merely archive these logs, this layer performs initial signal conditioning and timestamps the data, preparing it for high-level semantic interpretation. This stage represents the “physical pulse” of the asset, capturing the degradation indicators that are often lost in static BIM models.
  • Semantic processing layer (GenAI and LLM interpretation)
    The main innovation of this framework lies in the functional distinction between numerical analysis and semantic interpretation. While the RF algorithm acts as the primary diagnostic mechanism, responsible for identifying failure patterns and estimating RULs, the LLM acts as the semantic translation mechanism. At this stage, the LLM converts the RF diagnostic results into structured maintenance narratives and IFC-compliant update commands. This ensures that complex analytical outputs are accurately translated into the DT architectural scheme without losing technical context.
    The core innovation of this framework lies in the second layer, where Generative AI, specifically Large Language Models (LLMs), acts as the primary diagnostic engine. At this stage, the unstructured and numerical sensor logs are processed to identify fault patterns or wear trends. The LLM functions as a “semantic translator”, interpreting the numerical deviations (e.g., a 15% increase in bearing temperature) and converting them into logical maintenance narratives. By synthesising these degradation scenarios, the GenAI identifies exactly which parameters within the Digital Twin require updating, moving beyond simple threshold alerts to complex diagnostic reasoning.
  • BIM integration layer (automated scripting and IfcOpenShell)
    The final layer executes the physical-to-digital synchronisation. Once the GenAI has defined the necessary updates, the system triggers automated Python scripts utilising the IfcOpenShell library. This layer translates the AI’s semantic conclusions into direct modifications of the IFC (buildingSMART [3]) schema. These modifications include real-time parametric updates—such as altering the ‘State of Health’ property sets or adjusting 3D geometry to reflect component wear. This process ensures that the BIM model evolves autonomously, eliminating the “information degradation” typical of manual data entry and providing an up-to-date Digital Twin for decision-making.

2.4. Data Augmentation via GenAI: Overcoming the Long-Tail Failure Challenge

In the context of railway rolling stock, the definitive transition to proactive and PdM models faces a persistent obstacle linked to a severe imbalance in operational datasets. Although data from normal operations is abundant, records of critical mechanical failures or catastrophic breakdowns are scarce, characterising what the literature defines as “long-tail events”. These rare occurrences, despite their low statistical frequency, carry high risks and potentially catastrophic consequences for the safety and availability of the system. Conventional predictive models, which rely strictly on historical and frequency-based patterns, face substantial difficulties in identifying these anomalies due to a lack of representative training samples, which compromises the algorithms’ sensitivity to states of severe degradation.
To address this shortcoming, the proposed framework makes the integration of GenAI, specifically through GANs, the central engine for data augmentation. The use of GANs enables the synthesis of high-fidelity failure scenarios that are not present in sufficient quantities in real-world data, allowing the system to learn the underlying distribution of mechanical stresses and wear patterns. By synthesising rare failure modes that mimic the structural behaviour of components during collapse, GenAI enables the balancing of training datasets, providing algorithms with a comprehensive library of degradation trajectories.
This approach enhances the autonomy of the DT, enabling the simulation of hypothetical scenarios that advance the system’s diagnostic capabilities even in the absence of immediate physical failures. Consequently, GenAI acts as a catalyst for the evolution of the IFC model, transforming it from a static representation of information into a dynamic predictive asset, capable of understanding and anticipating complex mechanical failure trajectories in an autonomous and resilient manner.

2.5. BIM for Rolling Stock—Geometric Representation to Dynamic Semantics

Traditionally, BIM has been applied to stationary infrastructure, where geometric accuracy and spatial coordination are the primary objectives. However, as noted by Volk et al. [7], while BIM for existing buildings has matured, its application to complex, moving assets, such as railway rolling stock, requires a shift from static “as-built” documentation to “as-maintained” dynamic systems. Current literature identifies a significant gap in how rolling stock is represented: most models remain restricted to a geometric representation, acting as digital repositories of fixed dimensions and material properties that fail to capture the asset’s operational evolution.
To achieve a true DT, the model must transition towards dynamic semantics. This implies that the BIM environment must be capable of updating its internal logic and property sets—such as State of Health (SoH) or remaining useful life (RUL)—in response to continuous operational data. In the railway context, a bogie or a braking system cannot be treated as a static entity, such as a wall or a slab; it is a kinetic system whose functional properties change with every kilometre travelled.
To the development of this methodology, it is essential to distinguish the functions of BIM and DT concepts that play distinct roles in the management of railway assets. BIM is defined as the structured database and the geometric and semantic representation of the rolling stock, providing the static repository of parametric information and metadata based on the IFC schema. DT constitutes the “living ecosystem” that uses the BIM model as its structural foundation. DT transcends static representation by integrating continuous streams of data from sensors in real time, allowing the digital model to evolve autonomously to the “as-maintained” state through artificial intelligence agents.
As argued by Gerbino et al. [8], the challenge lies in moving beyond the “static snapshot” to overcome the “information degradation” often found in standard IFC exchanges. The prerequisite for integrating predictive maintenance (PdM) with BIM is a framework where the Industry Foundation Classes (IFC) schema can host real-time semantic data. This transition allows the digital model to reflect the functional reality of the fleet rather than just its initial design, providing a robust foundation for the autonomous synchronisation proposed in this study.

2.5.1. Interoperability Crisis and the Evolution of the IFC Standard

The BIM base for rolling stock is a detailed 3D model that can be developed using parametric modelling to allow greater flexibility in design and simulation [9]. The IFC standard, developed by buildingSMART [3], operates with the OpenBIM. This standard was originally designed for static civil engineering structures. The BIM foundation for rolling stock utilises the ISO 16739-1:2018 standard [3], which governs the IFC. However, to address the kinetic complexity of assets such as bogies, this research integrates the latest iterations of IFC 4.3. This version provides the necessary schema to host dynamic semantic data, moving beyond the static limitations of earlier BIM iterations. Authors, such as Li et al. [10] and Hasan [11], after exhaustive reviews on standardisation, argue that even in the most recent iterations, such as IFC 4.3, the lack of native and robust class definitions for the kinetic complexity of bogies, braking systems, and pantographs remains.
The representation of a bogie, or a braking system, requires more than geometry because it has different types of behaviours. To overcome this, Antunes et al. [12] propose an approach based on the analysis of inheritance diagrams and the context of project representation, namely, when exporting models from mechanical engineering software to IFC, an “information degradation” often occurs: complex objects are converted into Boundary Representations (BREP) or Generic Data Models such as ifcBuildingElementProxy or ifcDiscreteAccessory. Gerbino et al. [8] classify this phenomenon as “critical semantic loss”. This ‘Interoperability Crisis’ is fundamentally rooted in the static nature of current BIM models, which fail to manage the dynamic assets required for railway rolling stock. While the IFC standard has evolved, even recent iterations, such as IFC 4.3, lack robust class definitions for the kinetic complexity of components such as bogies and braking systems, often resulting in ‘information degradation’ during data exchange. To overcome these structural limitations, the framework proposed in this paper positions GenAI as an intelligent ‘translator’ and a key engine for synchronisation. By utilising LLMs to interpret unstructured sensor logs, the system autonomously converts numerical data into IFC-compliant parameters for real-time DT synchronisation. This enables the digital model to evolve autonomously as physical wear occurs, transforming the BIM environment from a static geometric repository into a dynamic, semantic management tool that operates without the need for human intervention. The technical viability of this methodology is supported by a strategic overlay of technologies. Each layer plays a fundamental role in the digital ecosystem. Within this structure, 6G networks and MCP constitute an essential infrastructure for transporting and articulating data, providing a rapid response for processing massive telemetry flows. While BIM serves as a high-fidelity semantic and structural repository, DT functions as the dynamic instance that reflects the real-time operational state of the asset. This convergence is essential to transcend the static nature of traditional “as-built” models in order to allow the system to act as an active agent in asset lifecycle management, where GenAI leverages continuous data flows to update IFC schema parameters autonomously and deterministically. For a maintenance system, knowing that an object is a metal cylinder is irrelevant because the system needs to know that the cylinder is a “shaft”, which is subject to thermomechanical fatigue and has a history of 500,000 cycles.
To mitigate this difference, the research intellectualises the extensive use of property sets (Psets) that are defined by the user. However, Lee et al. [13] warn that this flexibility carries risks of inconsistency. Its experimental validation suggests that robust interoperability requires “heterogeneous data integration”, where data conversion tables must be analysed not only visually but also syntactically to ensure that vital metadata (metal alloy or installation date) survives the transition between closed-source software and the open model.
Figure 2 summarises the challenge of technical interoperability in the railway sector. In the first block, physical assets, such as bogies, have a high information density in a mechanical engineering environment. During the transition to the BIM environment, the system undergoes information degradation, resulting in critical semantic loss, where parametric geometries are reduced in generic representations of the Boundary Representation (BREP) type. BREP (https://l1nq.com/mOyIs, accessed on 15 February 2026) is used to store 3D models to define shapes through their geometric boundaries instead of polygons. In the third block, the proposed methodology is directed towards the enrichment and validation phase. Through diagram analysis, it is possible to insert vital metadata via property sets (Psets), which refer to data structures used in BIM to define the management of maintenance requirements, history, and physical attributes of assets. Although this flexibility requires syntactic rigour to avoid inconsistency, its application results in an enriched OpenBIM model, block 4. This final model is no longer a geometric representation to become a “serviceable” object. This allows for the retention of crucial information, such as metal alloys, wear and tear of the components, and the history of the interventions carried out.

2.5.2. GenAI Configuration

In this framework, GenAI serves as the core engine for translating raw sensor data into the BIM semantic structure. The configuration adheres to the following technical specifications:
  • Model selection—the GPT-4o model was utilised, with the temperature set to 0. This specific calibration ensures precise technical outputs by eliminating stochastic variability (hallucinations), thereby guaranteeing that DT synchronisation remains deterministic and reliable.
  • Prompt engineering strategy—a structured prompt engineering approach was developed to interpret non-structured error logs. The system instructs the LLM to map detected physical anomalies directly to IFC parameters, preventing the “critical semantic loss” previously identified in manual exports.
The transformation of unstructured sensor logs into IFC-compliant update commands follows a deterministic logic, as illustrated in the following example:
  • System input (sensor log): [2025-03-05 14:20] ALERT: Sensor_Bogie_A1 detected 15% wear on brake pad (ID: BK-772). Current thickness: 22 mm.
  • GenAI processing: the model identifies the specific IfcMechanicalFastener entity and the corresponding Pset_BrakeMaintenance.
  • Output (IFC update command):
  • JSON{
    “ifc_entity”: “IfcMechanicalFastener”,
    “global_id”: “BK-772”,
    “property_set”: “Pset_BrakeMaintenance”,
    “parameter”: “BrakePadThickness”,
    “new_value”: 22.0,
    “unit”: “mm”
    }

2.5.3. IFC Model Manipulation and Semantic Mapping

The technical bridge between the GenAI interpreted commands and the physical DT is established through the IfcOpenShell library. This Python-based open-source library is utilised to programmatically parse and modify the IFC schema without requiring manual intervention in BIM authoring software.
The synchronisation process follows a precise operational workflow:
  • Component localisation—the system uses the GlobalId as the primary key to locate the specific railway component within the model spatial and semantic;
  • Property set modification—once the entity is identified (e.g., an IfcMechanicalFastener), the algorithm accesses the Pset_Maintenance;
  • Parametric update—the interpreted values from the GenAI (wear percentage or remaining useful life) are injected directly into the corresponding attributes. This ensures that the digital representation evolves alongside the physical asset degradation, mitigating the “critical semantic loss” typically found in static exports.

2.5.4. Information Specification (IDM/MVD) and Level of Detail (LOD)

The 3D model does not guarantee usefulness for the Operation & Maintenance (O&M) phase. Volk et al. [7] state that there is a need to filter information through Information Delivery Manuals (IDMs) and Model View Definitions (MVDs). Gigante-Barrera et al. [14], when carrying out the research, noted the mechanical complexity of the rolling stock, stating that risk management is only effective if the Development Level (LOD) is specified phase by phase. A model with excessive geometric detail (LOD 500) at the computational level can be cumbersome and useless if it does not include critical functional metadata (LOD I), such as allowable wear tolerances. The first step consists of digitising the rolling stock bogie, which is different from the fixed infrastructure treated based on road geometry studies [15], where the rolling stock requires a Development Level (LOD 400), bearing in mind that there are several types of development (Figure 3).
The level of development is a reference that allows the Architecture, Engineering and Construction (AEC) industry to verify the stages of the project’s design-construction process.
The discretisation, the description and the importance of each stage of the LOD level of development are the following:
  • LOD 100—definition of the project under study, area, volume, location and orientation;
  • LOD 200—in this stage, the construction model includes approximate quantities, dimensions, simplified shapes, location and orientation;
  • LOD 300—in this phase, the model is more accurate with detail drawings where the elements are defined for construction from structures, models and budgets;
  • LOD 350—the elements are consolidated so there can be an interconnection of information in the project and everyone involved in it;
  • LOD 400—it is already considered a high level of development, having reached the planning phase, physical-financial schedule, documentation and execution;
  • LOD 500—The elements are modelled, reflecting the built elements and with information for the maintenance and operation of the equipment.
In this situation, the LOD 400 uses the Scan-to-BIM technique to convert dense point clouds into parametric geometries compatible with the IFC 4.3 Railway scheme, which is highlighted [16,17].
Critical components, such as wheelsets, axles, and suspensions, are classified as IFCMechanicalDevice. In order to ensure the interoperability required in more complex projects [18], customised property sets (Psets) are defined to store the history of interventions and technical specifications, following the asset management frameworks suggested by Bellon et al. [19] and Bensalah [18].
This need for accuracy is reinforced by Fontul et al. [17], who investigate the integration of Non-Destructive Monitoring (NDM) data, such as Ground Penetrating Radar (GPR) and Light Detection and Ranging (LiDAR), directly into the BIM environment. LiDAR is a remote sensor active on board platforms that has its own power source and serves the function of capturing data. The ability to update the IFC model with the actual degradation data of the mechanical components transforms the finished or historical BIM model but remains accessible within a common data environment for consultation and maintenance. As Sobhkhiz et al. [20] point out, Automated Rule Checking (ARC) is, in practice, developing tools that allow for the automatic capture and verification of rules in relation to a project or even a model. In the future, ARC will allow the creation of verification algorithms in order to clean the IFC file, not to find design errors, but to validate the safety parameters of the rolling stock. This type of validation includes European Standards (EN) and Technical Specifications for Interoperability (TSI), automating a task that still consumes thousands of engineering hours and can be subject to human error.

2.6. Generative Revolution: Scenario Synthesis and Data Augmentation

Unlike traditional ML, which only analyses what exists, GenAI has the ability to create new data that respects the statistical and physical distribution of the original system.
According to Wilczok [21], the use of Deep Generative Models in the context of biological ageing has a direct parallel to the degradation of materials. The central technical application lies in Generative Adversarial Networks (GANs). The GAN consists of two distinct neural networks in competition: a “Generator” with the aim of creating synthetic data “failures”, such as vibration from a faulty bearing, and a “discriminator” with the purpose of searching and later distinguishing real data from synthetic data. Over time, the generator becomes so efficient that it can produce failure scenarios that are indistinguishable from reality [22].
Methodological analogies are drawn from Deep Generative Models used in biological ageing research, as noted by Wilczok [21], that provide a direct parallel to the stochastic nature of material degradation in mechanical components. Similarly, AI applications for predictive analytics in cyber intelligence, as explored by Patel & Kumar [23], offer robust frameworks for gathering anomaly data in complex industrial environments, justifying their inclusion as conceptual benchmarks for this study.
With this “high synthetic data” capability, it helps to solve problems of data scarcity. According to Patel & Kumar [23] and Saidur [24], this is the evolution to “Prescriptive Maintenance” (Figure 4). In addition, the system does not just tell which “component will fail” because it uses GenAI to simulate the possible ones under different load and weather conditions. Falegan [25] and Singh et al. [22] validate these types of models trained with this mix of real and synthetic data (Hybrid Training) and drastically outperform existing models in detecting unprecedented failures (Zero-shot Learning), and in this way, it will be possible to obtain a previously impossible operational robustness.

2.6.1. PdM—Overcoming the “Data Bottleneck” with GenAI

The transition from preventive maintenance (planned or mileage) to PdM was a major commitment to industry 4.0. However, in practical application in high security and reliability systems, there is a major fundamental mathematical limitation of traditional algorithms.

2.6.2. Statistical Limitations of Discriminative Machine Learning

Machine learning (ML) algorithms, namely Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) are, by definition, discriminative. That is, these types of models learn to draw a statistical boundary between failure and normality based on historical data provided during model learning. The investigations conducted by Kumari [26] and Stradi [27] on asset management in the railway area identified a logical impasse: the safety of the railway sector requires that failure data be extremely scarce. Such catastrophic failures, namely in bearings, cracked shafts or pantograph collapses, are some examples of “long-tail events”. Negi et al. [28] demonstrate that training an ML model with 99.9% of “scrupulous” data and only 0.1% of “failed” data results in biased models, unable to generalise. These models learn to predict that “everything is compliant”, but for this to be possible, it has to maximise its statistical accuracy and thus fail when it is necessary. In addition, Katangoori [29] gives the challenge of noise in Big Data. Nowadays, the most modern sensors generate terabytes of data that are not used due to noise, thermography, and videos. To extract patterns of causality from this data requires a piece of manual feature engineering that is untenable for large fronts. Traditional ML confronts discerning between a real anomaly and a useless operational variation (e.g., vibration caused by a rough track or vibration caused by a wheel defect) without intensive human supervision.

2.7. Digital Twin and Industrial Metaverso

The culmination of this theoretical foundation is the transition towards the Industrial Metaverse. However, within the scope of this paper, this concept is presented as a long-term vision. The current experimental evidence is focused on the autonomous synchronisation of semantic data for mechanical components, specifically railway bogies. Until recently, DT was seen as a passive repository of a 3D model on a screen. With the advancement of technologies, Naeem et al. [30], Gebreab et al. [31], and Xu et al. [6] reinforce its importance with the advances of GenAI-Driven Agentic Digital Twin and 6G networks.

2.7.1. Real-Time Development and Synchronisation

One of the biggest obstacles in the adoption of DT is the cost of modelling. Gebreab et al. [31] published a set of fundamental research on how GenAI acts as an accelerator of development. Generative AI algorithms can currently ingest raw LiDAR point clouds and scattered 2D technical drawings to automatically reconstruct the semantic and textured 3D models of rolling stock. What used to take months of manual modelling is reduced to days, thus ensuring that the initial digital model is a faithful representation of “As-Built”.
Memon et al. [32] and Guan et al. [33] discuss the data architecture that is necessary to keep the model permanently updated. In the railway sector, mechanical components are replaced, repaired and subsequently worn out on a daily basis, while static DT becomes obsolete when the locomotive leaves the workshop. This type of integration uses GenAI to interpret sensor data in real-time and automatically update the parameters in the BIM model, without any human intervention.

2.7.2. Cognitive Agents and the MCP Protocol

Xu et al. [6] introduce the use of the Model Context Protocol (MCP) as the abstraction layer that enables the coordination of multi-agent systems within the context of Digital Twins (DTs). In this architecture, the DT’s “reasoning” shifts from a rigid script to an ecosystem of agents based on specialised Large Language Models (LLMs). As proposed by Dholakia et al. [5], GenAI 2.0 allows for a dynamic response: upon sensing an anomaly, a diagnostic agent processes the signal, while a second agent employs Retrieval-Augmented Generation (RAG) to validate the fault against technical manuals and maintenance histories, ensuring technical rigour and mitigating the risk of hallucinations. Simultaneously, a third agent executes physical simulations on the digital model to calculate the remaining useful life (RUL). This vision is expanded by Ghosh [34], who integrates the logistical dimension by enabling the system to autonomously verify component inventory and schedule interventions. This workflow, illustrated in Figure 5, closes the decision cycle (closed-loop), transforming the BIM model from a static repository into an active asset management agent capable of autonomously updating its IFC parameters.

2.7.3. Immersive Visualisation and Metaverse Collaboration

With this type of technology, the human interface consists of the “Industrial Metaverse”. Shrestha & Imamoto [35] explored how GenAI can generate virtual training and diagnostic environments that replicate the exact conditions of the railway track. Their research allows engineers to virtually inspect rolling stock in motion and thus diagnose complex problems (such as dynamic pantograph interaction) in a safe and immersive environment.

3. Methodology

This section outlines the systematic framework developed to bridge the gap between physical sensor data and DT. The proposed workflow is designed to ensure technical accuracy and semantic consistency throughout the maintenance cycle. To provide a clear overview of the process, a high-level block diagram was developed (Figure 6). Figure 1 and Figure 2 were produced with the assistance of Gemini—2.5. The authors supplied the descriptive technical data, the 3D models, and conceptual frameworks, using this tool to synthesise these elements into the final visual representations. All outputs were strictly verified and redesigned by the authors to ensure technical accuracy. The methodology follows a sequential pipeline: data acquisition from the bogie → Signal Filtering of Perturbations → predictive modelling via Random Forest → GenAI-based semantic translation (IFC) → BIM model update.

3.1. Data Acquisition and Pre-Processing

Raw signals were captured from vibration and thermal sensors mounted on the vehicle’s bogie. To address the inherent imbalance in railway failure datasets (where healthy states significantly outnumber fault states), a data augmentation step was included. This ensures the predictive model is exposed to sufficient failure signatures to achieve statistical significance.

3.2. Predictive Modelling and Feature Selection

The RF tool was chosen due to its superior performance in handling nonlinear correlations between sensors and its ability to classify the importance of features.
Within this framework, the model performs as a high-fidelity identifier, differentiating between nominal operating conditions and impending component fatigue. This classification is essential for identifying which sensors are most indicative of mechanical degradation, providing potential predictability for CBM alerts.

3.3. BIM-IFC Integration Logic

The integration between the predictive engine and the BIM environment requires a robust translation layer to ensure that numerical predictions are correctly mapped to architectural and mechanical entities.
A critical aspect of the proposed framework is the preservation of semantic integrity during the automated translation of sensor data into the BIM environment. To prevent information loss or engineering misinterpretations, the GenAI engine operates under a strict schema-mapping protocol. Every unstructured diagnostic output interpreted by the LLM is cross-validated against a predefined set of mechanical thresholds and IFC property constraints (e.g., IfcPropertySet). This rule-based verification layer ensures that the generative process does not produce ‘hallucinations’ or invalid parametric values, maintaining the technical accuracy required for railway safety standards and ‘as-maintained’ digital documentation.
The integration layer employs Python scripts that bridge the Semantic Layer’s JSON outputs with the physical IFC file via IfcOpenShell. By targeting specific GUIDs, the system performs surgical updates to Pset_Maintenance attributes, such as BrakePad_Thickness, in under 45 s.
By utilising specific Global Unique Identifiers (GUIDs), the system targets the exact component in the 3D model, such as a specific wheelset or axle box, and updates its status automatically, allowing for a real-time visualisation of the rolling stock’s health.
The proposed architecture incorporates a rule-based validation gate. Once the GenAI processes a sensor log, the resulting command must pass a series of hard-coded checks defined by European Standards (EN) and technical specifications before the BIM model is updated. This ensures that no value can compromise the structural integrity of the “as-maintained” documentation.

3.4. Technical Justification and Predictive Modelling

3.4.1. Random Forest for Predictive Maintenance Modelling

The Random Forest (RF) tool was designated as the primary predictive engine for this framework. RF demonstrates superior robustness against overfitting compared to alternative machine learning architectures, such as Support Vector Machines (SVMs) or standalone decision trees, due to its ensemble learning methodology. Within the domain of railway rolling stock—where sensor outputs regarding vibration and temperature are frequently non-linear and susceptible to noise, RF yields high predictive accuracy. Critically, it provides an inherent measure of feature importance. This enables the identification of physical parameters that most significantly drive bogie degradation, offering a level of transparency that “black-box” models, such as deep neural networks, typically lack without secondary interpretability layers.
To ensure the reliability of the results and avoid overfitting, k-fold cross-validation (k = 5) was implemented during the training phase. The consistency between the accuracy metrics from training and validation indicates that the model effectively generalises to synthetic and unobserved historical data, maintaining its predictive power across different operational cycles.
To ensure the statistical integrity of the results, the data were split into a trial (70%) and an independent test (30%). This split was maintained to verify whether the model’s performance reflects genuine learning of failure patterns, rather than memorisation of specific data points. The data between the trial accuracy and the test recall of 0.99 indicates that the model generalises effectively to unobserved operational states, reducing the risk of memorisation.
Furthermore, the “importance of sensors” in this context is quantified using the average decrease in the Gini index. This metric ranks input variables based on their direct statistical impact on the estimated remaining useful life (RUL). By prioritising sensors with higher Gini scores, the framework distinguishes between primary failure factors and secondary operational noise, providing a mathematically grounded hierarchy for the maintenance decision-making process.

3.4.2. Data Augmentation and Synthetic Data Generation

A major challenge in predictive railway maintenance is the scarcity of empirical data on failures. Safety protocols are stringent, and preventive maintenance cycles ensure that catastrophic failures remain rare, resulting in highly imbalanced datasets dominated by ‘healthy’ operational states. To mitigate this situation, generative modelling techniques were used for data augmentation. By synthesising artificial failure scenarios based on historical patterns, a more comprehensive training environment is established. This type of methodology enables the model to identify early signs of degradation that are rarely captured in standard logs, thereby increasing the sensitivity of maintenance alerts without compromising data integrity.

3.4.3. Analysis of Dominant vs. Recursive Perturbations

The reliability of the predictive model is bolstered by distinguishing between dominant and recursive perturbations within the sensor signals. Dominant perturbations are typically associated with external environmental variables or track irregularities, whereas recursive perturbations serve as indicators of internal component wear, such as bearing fatigue. Adopting established methodologies regarding bidirectional Gated Recurrent Units (GRUs) and matrix decomposition, the proposed framework filters these signals to ensure the BIM model reflects authentic physical degradation rather than transient noise. This distinction is essential for preserving the semantic integrity of the DT and preventing the triggering of false-positive maintenance events within the IFC environment.

4. Case Study—Predictive Maintenance of Railway Bogies

A block diagram has been developed to illustrate the integration process between the physical components and the digital environment. This workflow ensures that data retains its semantic integrity from the moment of capture through the final update in the BIM model (Figure 6).
As illustrated in Figure 6, the system operates via a sequential workflow:
  • Sensors (bogie)—capture raw dynamic signals;
  • Noise filter—identifies and removes environmental noise, focusing on recurring disturbances indicative of wear;
  • Random Forest model—processes the filtered data to estimate the RUL;
  • GenAI (IFC Translation)—Interprets these results and generates the corresponding parameters in accordance with the IFC;
  • BIM update—digital model is automatically updated via the API, reflecting the current ‘as-maintained’ status.

4.1. Technical Specifications of the Rolling Stock

The experimental phase of this study focuses on a standard two-axle railway bogie (Y25 series or equivalent), commonly utilised in European freight and passenger rolling stock. The unit features a primary suspension system with coil springs and vertical hydraulic dampers. For the purposes of this research, the physical parameters—such as axle load and wheelbase—were integrated into the BIM environment as parametric constraints.
Regarding data privacy, it is important to clarify that while the technical specifications of the bogie are based on public manufacturer data, the historical sensor monitoring datasets utilised for model training are treated as sensitive industrial data. To ensure reproducibility without compromising proprietary information, these data were anonymised and supplemented by the generative augmentation process described in Section 3.

4.2. Sensor Mapping

To monitor the mechanical integrity of the bogie in real time, a strategic network of sensors has been installed. Figure 7 shows the configuration and cable routing:
  • Accelerometers—detect abnormal vibration patterns in the axle boxes;
  • Thermal sensors—allow the heat dissipation of the bearings to be monitored;
  • Displacement transducers—measure the vertical deflection of the primary suspension.

4.3. Predictive Maintenance Workflow and Experimental Validation

The experimental robustness of the proposed framework was validated using the high-fidelity MetroPT-3 dataset, which provides real-world telemetry from the Metro do Porto fleet. The transition from raw signal acquisition to maintenance decision-making was evaluated through a rigorous confusion matrix, as illustrated in Figure 11.
  • Predictive performance and safety reliability—the Random Forest (RF) model demonstrated an exceptional ability to identify critical states, correctly classifying 875,942 failure events (True Positives);
  • Critical safety metric (zero false negatives)—from a railway safety perspective, the most significant result is the total absence of false negatives (0). This indicates that the system did not allow any critical mechanical degradation to go undetected, fulfilling the stringent safety requirements of the railway sector;
  • Diagnostic precision—the volume of false positives was restricted to only 106 occurrences. While these represent a minor operational cost for unnecessary checks, they are statistically negligible compared to the total support of 324,450 instances analysed;
  • Sensitivity and recall—the integration of GenAI-based data augmentation raised the model’s sensitivity (recall) from a baseline of 0.797 to 0.99. This proves that the framework is capable of recognising rare “long-tail” failure patterns that traditional models typically ignore due to dataset imbalance.
As evidenced by these metrics, the system ensures that the bogie remains operational within controlled limits, effectively transitioning the maintenance strategy from a reactive or time-based approach to a CBM paradigm. The hierarchy of sensor importance, led by Oil_temperature (0.40 index) and Motor_current (0.15 index), further validates that the model’s reasoning is grounded in authentic thermodynamic and mechanical stress indicators.

5. Bogie Architecture and Components

The methodology presented here aims to predict the condition of the bogie on the Eurotram vehicles of the ADtranz—Berlin, Germany brand, which was acquired by Bombardier in 2001 used in the Metro do Porto. The MP00 models feature vertically mounted motors positioned on the outside of the wheels. By fusing data from multiple sensors, this enables just-in-time interventions.
In Figure 7, it can be seen that it is a part of the subway motor, which contains a passage of temperature sensors (P110), a sensor cable to the TCU, and power cables to the inverter. To facilitate the identification in the field or in a technical manual, the focus is on the following points:
  • Motors (temperature sensor)—unlike conventional trains, the MP00 has the motors mounted vertically and outside the wheels. The temperature sensor (usually a PT100 resistor) is embedded in the stator. This wire is shielded and goes directly into the engine shield.
  • TP2 and TP3 (passage terminals)—these connections are located on the connecting crossbar (of the bogie). As the bogie rotates in relation to the vehicle housing, these terminals serve as an interface so that the flexible cables do not suffer excessive stress. They are protected by insulating covers near the centre of the bogie.
  • Control wiring—the cables come out of the motor. Then they climb to a technical rail that takes them to the PT points, where the bridge is made to the power electronics located on the roof of the metro.

5.1. Sensor Mapping and Data Acquisition

Monitoring is carried out via a network of sensors that capture signals from the asset. The data stream includes the following variables from the MetroPT-3 dataset [36], each mapped to specific physical failure modes:
  • TP2 and TP3 (pressure)—these sensors monitor the brake line and secondary reservoirs. Physically, a deviation in these readings indicates a loss of pneumatic circuit integrity, such as valve leakage or seal failure, which directly impacts braking reliability and suspension levelling;
  • H1 (vibration)—this accelerometer identifies dynamic anomalies by capturing recursive perturbations indicative of internal wear. It is crucial for detecting imbalances in the wheelsets or early-stage bearing fatigue, as it can isolate mechanical degradation signatures from transient environmental noise;
  • Oil_temperature—identified as the most robust predictor (index 0.40), it measures thermal stress and lubrication efficiency. From a physical standpoint, heat dissipation is the earliest indicator of friction-related wear in the motor units and gearboxes, allowing the model to anticipate failures before structural boundaries are reached;
  • Motor_current—records torque variations that reflect the mechanical load regime. Variations in current consumption are direct indicators of mechanical resistance caused by component misalignment or overload conditions.
Within this framework, accumulated wear (Figure 8 and Figure 9) is defined as the diametrical reduction or the effective wear of the wheel tread. It is important to emphasise that while the model simplifies this parameter as a linear progression for GenAI-based updates, the physical reality involves multifaceted factors such as flange wear and micro-cracks, which are captured indirectly through the high-frequency sampling of the thermal and vibration sensors mentioned above.

5.2. Overcoming Industrial Data Secrecy

A key contribution is the use of the public MetroPT-3 [36] dataset. In the railway industry, access to sensor data is restricted due to confidentiality. The use of this real-world data allows the model’s effectiveness to be validated under real-world conditions, overcoming the scarcity of data on catastrophic failures.

5.3. Maintenance System Block Diagram

The system operates in a closed-loop decision cycle:
  • Acquisition—collection of raw data via IoT;
  • Semantic processing—GenAI (LLM) interprets anomalies and translates numerical logs into maintenance narratives;
  • BIM integration—automatic updating of the IFC via Python (IfcOpenShell); thereby, modifying property sets (Psets) in real time.

5.4. Semantic Synchronisation Layer: From Sensor to IFC

The operational integration is underpinned by a dedicated Semantic Processing layer that bridges telemetry and the DT schema. When identifying numerical anomalies, the system triggers a context-aware translation via an LLM. Rather than generating generic alerts, the LLM synthesises a maintenance narrative mapped directly to the BIM environment through a Python-based pipeline. This automation targets specific GUIDs to execute surgical updates to the IFC schema in real-time, effectively reducing update latency from days to seconds.
This narrative is then autonomously mapped to the BIM environment. Through a Python-based pipeline utilising the IfcOpenShell library, the system targets the exact Global Unique Identifier (GUID) of the affected component. The script executes a direct update to the IFC schema, modifying IfcPropertySet attributes or generating dynamic IfcWorkOrder entries in real-time.
The primary innovation of this workflow is the total automation of the semantic translation (Sensor → LLM → IFC), which aims at reducing the vulnerability to operational failures inherent in manual intervention. Consequently, the update latency for the Digital Twin is collapsed from a scale of several days to mere seconds, ensuring a continuous “as-maintained” state for the MetroPT-3 [36] fleet.

5.5. Random Forest and Data Augmentation

5.5.1. Justification for Random Forest

The choice of the RF algorithm is based on its robustness and ability to handle the complexity of railway telemetry data. Unlike linear models, RF utilises the principle of ensemble learning, consulting a ‘set’ of decision trees to reach a consensus. The main reasons for its implementation in this framework include:
  • Resilience to overfitting—the model uses random sampling to train 100 independent trees, thereby ensuring stability to face the statistical noise common in operational sensors;
  • Processing of non-linear relationships—the algorithm is effective at identifying complex patterns among thermal, pneumatic and electrical variables that define the metro’s state;
  • Maximisation of information gain—using the entropy criterion, the classifier isolates the most relevant signatures, such as oil temperature and motor current, which have proven to be the most robust predictors.

5.5.2. Synthetic Data and Generative Data Augmentation

A fundamental challenge in developing predictive models for the railway sector is the “safety paradox”. Due to stringent maintenance protocols and high operational reliability, empirical records of catastrophic mechanical failures, such as bearing seizures or axle fractures, are exceptionally scarce. This results in highly imbalanced datasets where healthy states represent over 99% of the total support, leaving traditional discriminative algorithms (e.g., standard CNNs or SVMs) biased towards predicting “normal” status and failing to recognise the early signatures of collapse.
To overcome this industrial data bottleneck, the integration of GANs was implemented as a core technical requirement rather than an elective enhancement. The generative process enabled the synthesis of high-fidelity “long-tail” failure scenarios that respect the underlying physical distribution of the MetroPT-3 system.
The synthetic failure scenarios generated by the GANs utilise a two-step validation protocol: first, a statistical correlation analysis was performed to confirm that the generated vibration and temperature data correspond to the substantive physical distributions of the original MetroPT-3 [36] system. Then, these scenarios were compared against established mechanical limits—such as thermodynamic limits and pneumatic pressure gradients—to ensure that the synthetic data points represented plausible engineering failures. This rigorous filtering ensures that the high recall rate is the result of capturing authentic degradation trajectories and not a measurement error.
The impact of this generative strategy on the model’s robustness is quantifiable:
  • Sensitivity enhancement—the inclusion of synthetic failure modes raised the model’s recall for anomalies from 0.797 to 0.99;
  • Diagnostic reliability—this increase in sensitivity ensures that critical, albeit rare, degradation trajectories are no longer ignored by the classifier, providing a deterministic safety margin essential for rolling stock management;
  • Dataset balancing—by simulating predictive scenarios, the framework transformed an initially skewed dataset into a comprehensive training tool, allowing the Random Forest tool to establish a precise decision boundary between nominal operation and imminent mechanical failure;
  • Consequently, the use of Generative AI serves as a “semantic bridge” that compensates for the natural scarcity of failure data, ensuring the Digital Twin remains proactive and resilient under extreme conditions.
Consequently, the use of GenAI serves as a semantic bridge that compensates for the natural scarcity of failure data. In this architecture, the diagnostic logic remains anchored in the Random Forest’s ensemble learning, whilst the LLM facilitates the autonomous synchronisation between telemetry conclusions and the IFC building SMART schema. This separation of concerns allows the model to maintain high diagnostic sensitivity (recall of 0.99) while promoting seamless integration into the BIM environment.

5.5.3. Prediction and Validation Matrix

The effectiveness of this approach is validated by the validation matrix, where the cross-referencing of historical and generative data resulted in the correct identification of 583 instances of anomalies, with a residual error margin of less than 1%. This process ensures that DT evolves with precision, mitigating the risk of AI “hallucinations” with real physical data.

5.6. Methodology and Innovation

This study of predictive maintenance bogies of Metro Porto focuses on the use of public dataset data MetroPT-3 [36] (data from Metro do Porto) as an alternative to the secrecy that protects the industrial data of railways. The project overcomes common challenges in high-security systems through an approach that combines historical rigour and technological innovation, based on four fundamental axes:
  • Real data (historical obligation)—use of the MetroPT-3 [36] dataset to ensure technical integrity. The model analyses real variables from pressure sensors (TP2 and TP3), vibration (H1) and oil temperature, ensuring that it understands the real physical behaviour of the machine.
Figure 8 shows a scatter plot that presents the historical detection of anomalies in the pressure system (bogie), specifically for the TP2 sensor, based on the MetroPT-3 [36] dataset.
The graph correlates the pressure measured by the TP2 sensor (Y-axis) over a time interval on February 1 (X-axis), distinguishing between normal operation and detected anomalies.
  • Blue (0)—represents the normal operating state;
  • Red (1)—represents the detection of an anomaly;
  • TP2—this sensor typically measures the pressure in the brake line (or secondary reservoirs) of the bogie.
Based on the context of Metro do Porto, this graph is a health indicator (health score) of the pneumatic system in leak detection, algorithm efficiency and false positives versus critical:
  • Leak detection—the scattering of red dots between levels 2 and 7 is a strong indication of compressed air leaks. If the TP2 pressure drops without a corresponding brake command, the ML model signals the anomaly.
  • Algorithm efficiency—the model appears to be effective at identifying drift before the system fails entirely, allowing the maintenance team to intervene on the bogie before a line breakdown.
  • False positives versus critical—red dots at the top (near 10) should be analysed with caution; these can be pressure spikes (overpressure) that are also harmful to the sealing components.
The graph shows that the bogie pressure system presents intermittent instabilities during the period analysed. The concentration of anomalies at low pressure levels suggests that the focus of maintenance should be on the tightness of the pneumatic circuit and the integrity of the control valves:
  • Attribute engineering (pre-processing accuracy)—application of Moving Averages to clean up noise from analogue sensors. This step is crucial to identify long-term degradation trends, distinguishing them from mere momentary reading errors (Equation (1)).
T P 2 S m o o t h = 1 n i = 0 n 1 x t i
where
n = 20 : represents the size of the observation time window (the number of past samples that are calculated together).
x t i : represents the values of the individual data points along time.
This equation aims to identify persistent pressure drops that could be ignored in an end-to-end analysis, capturing long-term degradation trends.
The choice of n = 20 for the time window T P 2 S m o o t h = 1 n i = 0 n 1 x t i was empirically determined to filter out the high-frequency noise inherent to the Metro’s analogue sensors without compromising the model’s responsiveness to sudden pressure variations.
Figure 9 shows the relationship between the accumulated wear and the RUL. In this context, accumulated wear is defined as the combined reduction of the wheel flange thickness and tread diameter, measured in millimetres (mm) against the nominal manufacturing profile. It can be analysed when the wear is 0 mm, where the wheel has its maximum service life of 200 units (days or cycles of operation). The trend line (Red) has a perfect linear descent. As accumulated wear increases, RUL decreases proportionally. The substitution threshold (dashed line) is the critical point that occurs when the RUL reaches 0, marking the limit where the component no longer meets safety standards for flange height or surface integrity. There is an intersection point according to the trend line, the wheel reaches its replacement limit (RUL = 0) when the accumulated wear reaches approximately 200 mm. In Figure 9, it is also possible to analyse that there is a critical zone or Negative Values; that is, they are data that exceed 200 mm of wear, entering negative RUL values (up to −50); this indicates that wheels have operated beyond the recommended safety limit, which is a high mechanical risk.
Figure 10 presents the lifetime projection of RUL. In this analysis, wear is specifically defined as the loss of wheel tread diameter and the degradation of the rolling surface, including the appearance of micro-fissures and spalling. These lines establish the basic rule for maintenance:
  • The blue graph has a perfect linear ratio, meaning there is a direct and steady drop in RUL as the accumulated wear increases. The service life reaches the exact zero when the wear reaches 200 mm. There is a huge risk. The blue dots extending beyond 200 mm (negative RUL) represent wheels that have continued in service past the safety limit, reaching up to 250 mm of wear. When it increases, the spindle speed increases by 20%. The risk becomes visible from the displacement of the curve, increased to a maximum wear and criticism;
  • The orange (simulated) curve is the offset of the curve that shifts significantly to the right compared to the blue (current) curve. This simulation was generated using GenAI-based data augmentation to predict fleet behaviour under stress conditions where real-world historical data is scarce. Regarding the increase of maximum wear, it is mostly below 200 mm; the simulation shows a high probability of wear reaching values between 250 mm and 300 mm. At the risk level, with an increase of only 20% in speed, it does not increase wear linearly; it “pushes” a large part of the fleet into the negative RUL zone before the time stipulated by the manufacturer.
In Table 1, it is possible to analyse a possible RUL scenario in the case of normal wheel wear or speed wear above 20%. It can be concluded that the 20% increase in the speed of the meter causes a disproportionate increase in wheel wear, raising the maximum limit from 200 mm to 275 mm (an increase of 37.5%). Consequently, while the current scenario remains within operational limits, the increase in speed moves the component to a zone of high probability of failure, drastically reducing the RUL and requiring an anticipation of maintenance interventions in order to avoid unscheduled downtime.
The RUL can be calculated according to Equation (2):
R U L =   t f a i l   t c u r r e n t
where
t f a i l is the estimated time when wear will reach the critical limit,
t c u r r e n t is the time of the current inspection.
Table 1 contents are based on the following information, from the technical report of the Eurotram bogie:
Status—ALERT
RUL Dear—50.00 units
Terms and conditions—1600 RPM|150 mm of wear
Procedure—schedule preventative maintenance
With the help of the Python language, it is possible to conclude that the component operating at 1600 RPM has a wear of 150 mm, still within the limit of the normal wear scenario (0–200 mm). However, the system issues an ALERT status due to the reduced RUL of 50.0 units. This condition indicates that, although the absolute wear has not reached the critical failure threshold, the degradation rate suggests an imminent transition to the high probability of failure scenario (characteristic of operations with overload or speed above 20%). Bearing in mind that the technical recommendation is to immediately schedule preventive maintenance to ensure the integrity of the bogie before the tolerance limit is exceeded.

5.7. Generative AI

Random Forest is utilised as the analytical core to identify imminent failure behaviours, whereas GenAI is employed specifically for data augmentation and the semantic translation of raw sensor data into the BIM structure. This ensures that DT synchronisation remains deterministic, with the LLM mapping the diagnostic engine’s findings directly to specific IFC parameters.
The analysis of the effectiveness of the predictive model for the MetroPT-3 [36] bogie is based on the relationship between AI diagnosis and the real state of the asset. The Python-based framework enables a rigorous evaluation of the AI system applied to railway maintenance, which requires a rigorous analysis of its decision-making capacity. Table 2 shows the structure of the confusion matrix that is represented in Figure 9. The structure of the confusion matrix works as a detailed scheme between the predictions of the AI tool and the asset’s physical reality. This tool makes it possible to distinguish four fundamental states: correct answers under normal conditions (True Negatives), false alarms (false positives), correct detection of failures (True Positives) and, most critical for security, undetected failures (False Negatives). The output of the classification models was defined as a binary state, where the value 0 corresponds to normal operation and the value 1 denotes the detection of an anomaly or failure.
To prove the reliability of the system, the effectiveness of predictive maintenance is validated through the confusion matrix (Figure 11), focusing on two indicators vital to railway operation.
The system demonstrated remarkable robustness, as evidenced by the high-performance metrics for the ‘Anomaly’ class. Instead of relying on overall accuracy, which can be misleading in unbalanced datasets where normal operations significantly outweigh failure states, the model’s effectiveness was validated through accuracy, recall, and F1-Score. This integration of data augmented by GenAI allows the model to achieve a recall of 0.99 for failures, thus demonstrating high sensitivity in identifying critical degradation paths. This accuracy, apart from the anomaly, reached 0.99, indicating a residual false alarm rate and confirming the model’s reliability for operational decision-making.
In this study, the “importance of sensors” is defined as the statistical contribution of each monitored variable to the overall accuracy of the PdM model. This ranking is derived from the mean decrease Gini index provided by the RF algorithm, which classifies sensors based on their direct impact on the estimated RUL and the detection of anomalous states. This approach allows for a data-driven distinction between primary failure drivers and secondary operational noise.
Figure 12 shows the dependence between the input variables, evidencing that the thermodynamic behaviour and the load regime are the main agents of system failure. As illustrated in the chart of the importance of failures, the variable Oil_temperature stands out as the most robust predictor, which represents an importance index of approximately 0.40.
This predominance suggests that the degradation of the asset is essentially linked to thermal stress and lubrication efficiency, showing that oil temperature is a more reliable indicator for the identification of anomalies. However, it should be noted that in this type of scenario, the variables TP2 and Motor_current play a secondary but fundamental role, with contributions of 0.23 and 0.15, respectively.
The relevance of the motor current indicates that variations in electrical consumption and operating torque are critical sources of material for mechanical wear or overload. A significant discrepancy can be observed between the TP2 variable and its smoothed counterpart, TP2_Smooth. The fact that the crude variable has a higher importance than the filtered version suggests that the model extracts greater predictive value from the brief phenomena and pressure/temperature peaks in the long-term trends, reinforcing the need for a high sampling rate at these monitoring points. On the other hand, sensors such as H1, TP3 and DV_pressure demonstrated a secondary influence on the overall performance of the algorithm.
In particular, the low relevance of differential pressure (DV_pressure) indicates that, for this dataset and the failure modes analysed, they are parameters that do not dictate additional substantial information. From the point of view of the engineering of the railway sector, these systems present results that allow a strategic optimisation of the monitoring infrastructure. By excluding the least impactful variables, the computational complexity can be reduced as well as the data storage costs, without significantly compromising the accuracy of the forecast. It can be concluded that predictive architecture benefits from a hierarchical approach, prioritising thermal monitoring and electrical load dynamics being fundamental and important pillars of composition. The high importance of Oil_temperature (0.40) compared to TP2_Smooth confirms that for this specific railway bogie, thermal expansion and lubrication degradation are leading indicators of failure. This allows maintenance managers to prioritise thermal sensor health in the monitoring infrastructure.
The evaluation of the performance of the Random Forest model, consolidated in Table 3 and Table 4, demonstrates an exceptional predictive capacity, essential for railway operational safety. The model achieved an overall accuracy of 99%, a highly reliable indicator for the general system diagnosis. While the accuracy is robust, the detailed analysis of the classification metrics reveals the true effectiveness of the algorithm facing the inherent imbalance of the data, where the normal operation category has significantly higher support (305,675,000) than the anomaly class (18,775,000 instances).
Specifically, regarding the detection of bogie failures, the model has a recall of 0.99 (or 99%). In practical terms, this can translate into the ability to identify almost all failures before they occur, drastically minimising the risk of catastrophic failures; in addition, the accuracy metric for anomalies (identified as accuracy (failure) in Table 4) reached 0.99, which implies the reduction of “false alarms” to residual levels. For the Metro management, this precision is vital, as it avoids unnecessary inspections, which allows the allocation of human and material resources to be optimised.
The balance between these metrics is ratified by the F1-Score of 0.99 for the failure category, representing an ideal match between detection sensitivity and classification accuracy. Even under a more conservative analysis (Table 3), the weighted F1-Score remains at 0.983, reinforcing the stability of the model in different sampling scenarios. These results validate the implementation of Random Forest as a high-fidelity decision support tool, allowing the transition from a time-based preventive maintenance strategy to a predictive approach based on the actual condition of the asset.

5.8. Prediction Tools

5.8.1. Random Forest Tools

Random Forest (https://sl1nk.com/U6w9H, accessed on 27 February 2026) is used to accurately predict the timing of future failures based on time windows (lags).
For this type of predictive modelling study, the Random Forest algorithm came with the aim of accurately anticipating the occurrence of future failures through the detailed analysis of time windows, usually called lags. This choice was based on three pillars that were decisive for the reliability of the rail system (composition): first, because of the robustness of the model that allows high resilience to overfitting, which is an essential feature when dealing with the statistical noise common in the data captured by sensors in an operational environment. Secondly, the algorithm demonstrates a unique ability to process complex and nonlinear relationships between the various thermal, pneumatic and electrical variables that characterise the bogie state. Finally, the technical configuration of the model was optimised with an architecture of 100 independent decision trees, using the entropy criterion to maximise the gain of statistical information about the state of the asset. This multifaceted structure allows the classifier to consolidate multiple perspectives of the data, resulting in a more stable and accurate final forecast.
The selection of the RF tool is justified by its significant robustness against overfitting, particularly when processing noisy sensor data inherent in railway environments. By aggregating multiple decision trees, this ensemble method effectively minimises variance and handles the complex non-linear relationships among variables, such as speed, load, and vibration, which simpler linear models fail to capture accurately. This ensures a more reliable estimation of the asset’s RUL under real-world operating conditions.
The detailed technical configuration, based on an ensemble of 100 decision trees, was decisive in achieving the observed performance levels. Whilst the overall accuracy reached 99%, the most significant indicator for railway safety has a recall of 0.99 in failure detection. This high degree of precision in the anomaly category (0.911) demonstrates that the model not only identifies the failure but does so with an extremely low risk of false positives, facilitating the transition to a purely predictive maintenance strategy.
This performance consolidates the transition to a purely predictive maintenance strategy, ensuring that bogie interventions are carried out only when degradation is technically proven, thus optimising fleet availability.

5.8.2. Model Performance (Random Forest)

Random Forest is based on the principle of ensemble learning [37]. This type of model consults a “multitude” of decision trees. It can be achieved through random sampling (Bagging), attribute selection, voting or averaging, as follows: first, random sampling is trained with a random subsample of the original data; secondly, the attribute selection analyses all the variables to make a cut; RF chooses a random subset of features at each node. Finally, voting or averaging determines the class that is most voted by trees; however, there is a particularity when it comes to regression: the result is the arithmetic average of the predictions of all trees.

5.9. Generative Complementary Data

The use of GenAI for the synthesis of fault states was the crucial and determining factor in balancing the original dataset, ensuring the feasibility of this project. Given the historical records of MetroPT-3 [36], it demonstrates a natural shortage of catastrophic failures (a direct consequence of the high safety essential in the railway sector’s systems). The learning model would, without the application of this data augmentation technique, have a recall significantly lower than that necessary for safe operation. The lack of real examples of malfunctions in the historical records of MetroPT-3 [36] would by itself prevent the effective training of any classification algorithm, since it would not have enough samples to learn the patterns that precede a mechanical collapse.
In this context, the simulation of predictive scenarios through the generation of synthetic data allowed to schematise, with high precision, the decision boundary between nominal operation and progressive mechanical degradation. This data augmentation was necessary to overcome the inherent scarcity of real-world railway failure data, ensuring the model could learn from rare but critical fault signatures. This failure survey methodology has transformed an initially unbalanced dataset into a robust and comprehensive training tool, ensuring that the system is able to recognise complex failure patterns before they even physically occur in the circulating fleet. Furthermore, the model accounts for recursive perturbations (such as bearing wear) separating them from dominant environmental noise to maintain signal integrity.
The effectiveness of this approach is validated by the convergence observed in the validation matrix (Figure 13). In this matrix, the binary values refer to the operational status: “0” represents a healthy component, while “1” indicates a detected anomaly. The cross-referencing between historical and generative data resulted in the correct identification of 583 anomaly instances, confirming the model’s robustness as a predictive classification tool.
The technical robustness achieved through this integration is directly reflected in the final results presented by the RF classifier—selected for its superior ability to handle non-linear sensor correlations and provide clear feature importance rankings. A level of 99% was reached in critical metrics such as recall and F1-Score for failures. This value represents a drastic evolution compared to the results obtained only with historical data, where the model had a sensitivity of only 0.797 for the detection of bogie anomalies. Therefore, the systematic validation between the actual history and the artificially generated data not only made the model possible, but also raised its reliability to operational levels, ensuring that the risk of a major failure going unnoticed is currently less than 1%.
Furthermore, the 99% recall rate achieved with GenAI-augmented data proves that the model is sensitive to rare failure modes, which are often missed by traditional models trained on unbalanced, real-world datasets.

6. Discussion of Results

The integrated analysis of the data demonstrates that the robustness of the predictive maintenance model does not lie only in the isolated choice of the Random Forest algorithm but in the deep synergy established between the careful selection of variables and the synthetic enrichment of the dataset. The analysis of the importance of the sensors, evidenced in Figure 11, revealed that Oil_temperature and Motor_current constitute the pillars of diagnostic capacity, holding the greatest statistical influence on the behaviour of the asset. This hierarchy was instrumental in guiding the application of Generative AI, allowing the creation of synthetic data to focus on realistic thermal and electrical patterns, thus correcting the critical deficit of failure examples in MetroPT-3’s [36] historical records.
The effectiveness of this data augmentation strategy is validated by the drastic evolution of performance metrics, where the model, initially limited by a recall of 0.797 for anomalies, reached a level of 99% after balancing the data set. The final validation matrix attests to this accuracy by registering only 5 false negatives in a universe of hundreds of monitored events, confirming that the system is suitable for a safe implementation in an operational decision support environment.

6.1. Semantic Enrichment and Dynamic IFC Property Sets

Current BIM standards, specifically the IFC 4.3 schema, often struggle to capture the high-frequency dynamic changes inherent in rolling stock components. Traditionally, physical degradation is recorded in external databases, leading to a “semantic gap” where the BIM model remains a static geometric representation (typically as IfcMechanicalElement or generic proxies) while the real asset evolves.
To resolve this, this research proposes the implementation of dynamic property sets (Psets) (Table 5). Unlike static attributes, these GenAI-driven Psets are designed to be updated in real-time. For instance, instead of a static “Thickness” value for a brake pad, the GenAI agent dynamically populates a custom Pset_RollingStockMaintenance with real-time wear values derived from sensor fusion.

6.2. Performance Benchmarking and Efficiency Analysis

To quantify the operational benefits of the proposed autonomous workflow, comparative benchmarking was conducted between the GenAI-BIM framework and the traditional manual diagnostic procedures currently employed in railway maintenance standards. The evaluation is focused on temporal latency, measuring the duration from the initial anomaly detection to the final synchronisation of the DT.
The efficiency of the proposed autonomous workflow was benchmarked against traditional manual diagnostic procedures. The results, summarised in Table 6, demonstrate a significant reduction in the latency between fault detection and model synchronisation.

6.2.1. Quantitative Latency Reduction

The results in Table 6 show a dramatic reduction in the time required to process critical faults, such as tyre wear or brake pad degradation:
  • Data interpretation—traditional specialist review of logs typically spans 2 to 4 h. In contrast, the AI inference model processes the same dataset in under 30 s, achieving a reduction in processing time of over 99%;
  • Root cause analysis—by leveraging Retrieval-Augmented Generation (RAG) to parse technical manuals, the requirement for a 1-to-2-day expert panel review is superseded by an automated process taking 5 to 10 min;
  • DT synchronisation—the update frequency has shifted from weekly or monthly manual CAD interventions to real-time synchronisation, facilitated by automated writing directly into the IFC schema.

6.2.2. Quantitative Impact on Asset Management

The transition to an ecosystem of intelligent agents makes it possible to move beyond conventional predictive approaches. The ability to process continuous data streams automatically reduces the probability of synchronisation errors caused by manual handling and ensures that DT remains a high-fidelity mirror of physical reality.
It is important to clarify that the performance metrics achieved—such as the 99% recall in failure detection—refer strictly to the predictive modelling of the bogie assembly. The integration into a broader Metaverse collaboration environment remains a conceptual objective to be explored as the framework scales.
In terms of sustainability, this efficiency gain translates into a reduction in component waste. Parts are no longer replaced based on conservative estimates of time or mileage but are instead used up to the safe limit of their service life CBM, minimising raw material consumption and the carbon footprint associated with the production of heavy components.

6.3. Discussion: Scientific and Framework Limitations

The transition towards an autonomous synchronisation workflow between physical rolling stock and its DT demonstrates a significant reduction in diagnostic latency, as evidenced by the results in Table 6. However, the integration of GenAI into critical railway infrastructure poses substantial risks to operational integrity. To ensure operational safety, the framework does not rely on Generative AI for final safety-critical decision-making. Instead, GenAI is strictly utilised for data augmentation and semantic translation. By employing a deterministic verification layer, the system ensures that every AI-generated output is cross-validated against established engineering thresholds and physical constraints within the IFC schema. This approach mitigates the risks associated with stochastic volatility while leveraging the synthetic capabilities of GANs to address the inherent scarcity of failure data in the railway sector.

Mitigating the Risk of AI

A primary concern in the application of Large Language Models (LLMs) for asset management is the phenomenon of hallucination, where the model might generate fictitious wear values or misinterpret sensor noise as critical structural failures. In the proposed framework, the risk of the AI “inventing” incorrect degradation data is mitigated through a robust, multi-layered validation approach:
  • Semantic anchoring in the processing layer—the LLM does not generate data in isolation; it acts as a “semantic translator” for raw numerical inputs (such as vibration frequencies and temperature gradients) collected in the Acquisition Layer. This ensures that every diagnostic narrative is strictly anchored to the “physical pulse” of the asset.
  • Parametric constraint mapping—before any modification to the IFC schema occurs, the GenAI identifies specific parameters within the DT that require updating. These proposed updates are filtered through predefined engineering thresholds within the Python scripts that utilise the IfcOpenShell library, preventing the execution of physically impossible geometry or property set changes.
  • Final validation protocols—although the framework optimises the operational life cycle through automation, it maintains a “human-in-the-loop” capability for high-consequence maintenance decisions. This ensures that the autonomous evolution of the BIM model remains a transparent and verifiable process, eliminating the “information degradation” typical of manual entry without sacrificing technical accuracy.

6.4. System Limitations

Despite the benefits in real-time DT synchronisation, the current methodology faces limitations regarding the quality of the initial data acquisition. If the IoT sensors in the Acquisition Layer provide corrupted or noisy data, the LLM interpretation and the subsequent semantic mapping may be compromised. Furthermore, the reliance on constant cloud connectivity for LLM processing remains a challenge for rolling stock operating in remote areas with limited network coverage. Future iterations should focus on edge-computing solutions to enhance the resilience of the Semantic Processing Layer and ensure operational autonomy.
The main conclusions of this study are:
  • Autonomous synchronisation—the system proves to be able to convert raw sensor logs into asset health parameters in real time, eliminating the “information degradation”, typical of manual processes and ensuring the continuous updating of the DT;
  • Predictive effectiveness—the RF model achieved 99% accuracy and recall, validating the robustness of the data augmentation strategy to mitigate the scarcity of real failure data and balance the training datasets;
  • Resource optimisation—benchmarking analysis revealed a drastic reduction in diagnostic latency, with time savings of over 98% compared to traditional manual inspection and reporting methods;
  • Industrial sustainability—the transition to an “as-maintained” DT enables an effective shift to CBM, reducing premature component waste and the fleet’s operational carbon footprint.
The implementation of the framework demonstrates that the convergence of dynamic BIM models and GenAI represents a critical development for asset management in the railway sector. By overcoming the limitations of static models through the integration of components via IFC, it has been possible to establish a direct, semantic link between physical sensors and the structure.
This study demonstrates that innovation in anomaly detection depends not only on the mass collection of data but also on the intelligence applied to enrich it semantically. By transforming raw data into dynamic BIM parameters, the system ensures more resilient and efficient asset management for the Metro do Porto fleet.
To mitigate the dependence on constant connectivity, the architecture provides for a strategic division of tasks. While LLM and IFC 4.3 scheme synchronisation benefit from cloud scalability, the initial noise filtering layers (Moving Averages) and physical boundary validation logic can be executed via Edge Computing. This local processing ensures that the rolling stock maintains its basic diagnostic autonomy and data capture integrity in network failure scenarios or operations in remote areas, guaranteeing that railway safety is not compromised by communication latency.
Future developments include plans to extend this methodology to other critical subsystems, such as traction units and signalling systems. Furthermore, the transition from classification models to regression models focused on estimating RUL and the integration of track infrastructure data will enable the creation of a holistic monitoring ecosystem for the railway sector.

6.5. Case Study and Validation of the Autonomous Workflow

To validate the proposed framework, a pilot case study was conducted focusing on a railway bogie assembly. The environment and parameters were defined as follows:
  • Model detail (LOD)—the rolling stock digital representation was developed at LOD 400 (LOD, Level of Detail), ensuring that all functional components, including brake pads, axles, and wheelsets, were geometrically defined and parametrically linked to the IFC schema.
  • Simulated failure scenarios—two primary degradation modes were simulated to test the GenAI’s interpretive accuracy:
    • Wheelset wear—a progressive reduction of 2 mm in the wheel diameter (flange wear);
    • Brake pad degradation—a critical thickness reduction from 25 mm to 18 mm, triggered by high-frequency braking logs.
  • Efficiency analysis—Table 5 reflects a shift from manual engineering oversight to an automated pipeline. While the traditional manual method requires approximately 120 min for an engineer to interpret the sensor log, locate the component in the BIM environment, and manually update the Pset (property set), the GenAI-BIM framework completed the same task in under 45 s. This represents a time saving of over 98%, virtually eliminating human-induced synchronisation errors.
Figure 14 presents a comparison of processing latency between traditional manual workflows and the proposed GenAI-BIM autonomous framework (logarithmic scale). The results highlight a reduction in response time across all critical maintenance phases. The use of a logarithmic scale is because the difference between “2 days” (172,800 s) and “30 s” is so large that, with this solution, the GenAI bar is better visible.

6.6. Sustainability and Asset Life Cycle Management

The implementation of the proposed methodology demonstrates that the transition from static BIM models to autonomous DT synchronisation supports industrial sustainability. Rather than replacing parts based on conservative estimates, the framework contributes to the circular economy by allowing components to be utilised until their safe operational limit is reached via CBM. This approach minimises raw material consumption and reduces the carbon footprint associated with the manufacturing of heavy components, such as wheelsets.
The paradigm transition of the life cycle management of physical assets within the railway sector is not merely a technical evolution but a prerequisite for operational resilience. By surpassing conventional predictive approaches through an ecosystem of intelligent agents, a robust foundation for industrial sustainability is established. The transition from static BIM models to an autonomous synchronisation of the rolling stock DT ensures that asset health is monitored with unprecedented precision, allowing the digital model to evolve autonomously as physical wear occurs.
This capability to process continuous data streams from integrated sensors redefines resource efficiency. By employing GenAI to synthesise degradation scenarios, the system facilitates an effective shift towards true CBM. From a sustainability perspective, this translates into a drastic reduction in component waste; parts are no longer replaced based on conservative time or mileage estimates but are instead utilised until the safe limit of their service life is reached. This circular economy approach minimises raw material consumption and reduces the carbon footprint associated with the production and disposal of heavy components, such as wheelsets and braking systems. Furthermore, the optimisation of the rolling stock’s operational life cycle achieved through this framework has direct impacts on the fleet’s energy efficiency. The execution of real-time parametric updates within the IFC schema ensures that the DT remains a high-fidelity mirror of physical reality, allowing for the detection and correction of minor mechanical inefficiencies that would otherwise result in parasitic energy losses. The significant reduction in diagnostic errors and the automation of maintenance workflows not only enhance reliability and sustainability in the railway sector but also guarantee data continuity throughout the entire life cycle—from manufacturing to decommissioning. Ultimately, the convergence of BIM, PdM and GenAI establishes a new paradigm for railway fleet management, where artificial intelligence acts as the catalyst for a greener, more efficient, and resilient infrastructure.

7. Conclusions and Future Developments

The implementation of the proposed framework demonstrates that the convergence between dynamic BIM models and Generative Artificial Intelligence represents a critical evolution for asset management in the railway sector. By overcoming the limitations of static models through the integration of components via Industry Foundation Classes (IFC), it was possible to establish a direct link between the physical sensors and the digital structure. The pivotal innovation of this research lies in the total automation of the semantic translation pipeline: a continuous data flow from sensors to LLMs and, subsequently, into the IFC schema. This structural break facilitates the potential reduction of semantic discrepancies, thereby effectively minimising the risks associated with manual data interpretation and overcoming the “semantic gap” inherent in traditional asset documentation. The system provides a robust framework for automating repetitive diagnostic tasks, ensuring that operational safety is enhanced through systematic monitoring, rather than relying solely on human oversight. Data synchronisation is carried out by a Semantic Processing layer that uses LLMs to translate raw numerical deviations into structured maintenance reports. The information is integrated into the BIM environment via Python scripts and the IfcOpenShell library, which updates the model’s properties in real-time using specific GUIDs (Globally Unique Identifiers). The system proves capable of converting raw sensor logs into asset health parameters in real time, automating the update process and reducing the need for manual data entry, which mitigates the ‘information degradation’ typical of traditional processes and ensures the continuous updating of the DT. This automated method reduces the DT update latency from days to seconds, ensuring an “as-maintained” status that reflects physical reality with high fidelity. This advancement allowed the predictive maintenance model to operate with a balanced database, where Generative AI played the key role in translating the scarcity of historical records into realistic and actionable failure scenarios.
The transition to an “as-maintained” DT facilitates an effective shift to CBM, contributing to the circular economy by extending component life cycles and reducing premature waste, which, in turn, lowers the fleet’s operational environmental impact.
The final results validate the effectiveness of this approach, with the system’s sensitivity to detect anomalies reaching 99% accuracy and recall. The analysis confirmed that thermal and electrical monitoring forms the backbone of diagnostic reliability, while the reduction of false negatives to values below 1% ensures that operational safety is enhanced. Ultimately, this work proves that innovation in anomaly detection does not depend only on data collection but on intelligence applied to enrich it without human oversight. By transforming raw sensor data into dynamic BIM parameters, the system ensures more resilient and efficient asset management, ensuring maximum availability of the MetroPT-3 [36] fleet.
The success achieved in bogie monitoring paves the way for the expansion of this methodology to other critical subsystems of the rolling stock, such as traction units and braking systems, where the scarcity of actual failure data is also an obstacle to the training of efficient predictive models. Additionally, because of the diversity of sensors, techniques such as Principal Component Analysis (PCA), among others, can optimise the efforts of data processing and storage without compromising the overall accuracy of the system. The current implementation successfully demonstrates a robust transition from traditional planned maintenance to predictive maintenance through binary anomaly classification, achieving 99% accuracy and recall. While the current model identifies the immediate health status of the bogie, future research will involve transitioning from classification models to regression, targeting precise RUL estimation. This evolution will enable even more proactive logistics planning and holistic monitoring of railway infrastructure. The success achieved in bogie monitoring validates the core methodology of automating the semantic translation pipeline. Future developments will focus on expanding this logic to traction units and signalling systems, eventually realising the immersive diagnostic capabilities promised by the Industrial Metaverse. Finally, it is planned to integrate data from the railway track with the vibratory patterns detected in the vehicles, aiming to create a holistic monitoring ecosystem that guarantees the integrity of both the rolling stock and the railway infrastructure.

Author Contributions

Conceptualization, J.M.C.; Methodology; J.M.C.; Formal analysis, J.M.C. and J.M.T.F.; Investigation, J.M.C.; Resources, J.M.C.; Writing—original draft, J.M.C.; Writing—review & editing, H.R. and J.M.T.F.; Visualization, J.M.C. and A.J.M.C.; Supervision, J.M.T.F.; Project administration, J.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this study, the authors used Gemini—2.5 (Google) for the purposes of generating Figure 1 and Figure 2, using original technical data and conceptual frameworks provided by the authors. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AECArchitecture, Engineering, and Construction
ARCAutomated Rule Checking
AIArtificial Intelligence
BIMBuilding Information Modelling
BREPBoundary Representations
CBMCondition-Based Maintenance
CNNsConvolutional Neural Networks
DTsDigital Twins
ENEuropean Standards
GANsGenerative Adversarial Networks
GenAIGenerative Artificial Intelligence
GUIDGlobal Unique Identifier
GPRGround Penetrating Radar
IDMsInformation Delivery Manuals
IFCsIndustry Foundation Classes
LiDARLight Detection and Ranging
LLMsLarge Language Models
LODLevel of Development
MCPModel Context Protocol
MLMachine Learning
MPMaintenance Predictive
MVDsModel View Definitions
NDMNon-Destructive Monitoring
O&MOperation and Maintenance
PstesProperty Sets
RAGRetrieval- Augmented Generation
RFRandom Forest
RULRemaining Useful Life
SoHState of Health
TSITechnical Specifications for Interoperability

References

  1. Kavaliauskas, P.; Fernandez, J.B.; McGuinness, K.; Jurelionis, A. Automação do monitoramento do progresso da construção por meio da integração de dados de nuvem de pontos 3D com um modelo BIM baseado em IFC. Buildings 2022, 12, 1754. [Google Scholar] [CrossRef]
  2. Krishnan Vadakkum Vadukkal, U.; Cardellicchio, A.; Nitti, M.; Fiume, M.; Renó, V. Recent Advances and Innovative Approaches to Railway Safety Based on Applications, Sensors and Algorithms: A Systematic Review. IEEE Access 2025, 13, 200271–200289. [Google Scholar] [CrossRef]
  3. ISO 16739-1:2018; Industry Foundation Classes (IFC) for Data Sharing in the Construction and Facility Management Industries. buildingSMART: Kings Langley, Hertfordshire, UK, 2018. Available online: https://www.buildingsmart.org/standards/bsi-standards/industry-foundation-classes/ (accessed on 20 March 2026).
  4. Hu, W.; Ou, Y.; Liu, H.; Ni, P.; Chang, C. Integrating Digital Twin Technologies for Maintenance 4.0 in the Building Industry: A Review and Conceptual Framework. Build. Environ. 2026, 288, 113997. [Google Scholar] [CrossRef]
  5. Dholakia, N.; Shukla, M.; Khan, S.B.; Jadeja, R. An Overview of GenAI 2.0 Partnering With Digital Twin to Enhance Decision Making. IEEE Commun. Stand. Mag. 2025, 9, 42–48. [Google Scholar] [CrossRef]
  6. Xu, H.; Sun, Y.; Tupayachi, J.; Omitaomu, O.; Zlatanova, S.; Li, X. Towards Autonomous Freight Intermodal Optimization via Generative AI and Agentic Digital Twins. SSRN Sch. 2025, 5623194. [Google Scholar] [CrossRef]
  7. Volk, R.; Stengel, J.; Schultmann, F. Building Information Modeling (BIM) for Existing Buildings—Literature Review and Future Needs. Autom. Constr. 2014, 38, 109–127. [Google Scholar] [CrossRef]
  8. Gerbino, S.; Cieri, L.; Rainieri, C.; Fabbrocino, G. On BIM Interoperability via the IFC Standard: An Assessment from the Structural Engineering and Design Viewpoint. Appl. Sci. 2021, 11, 11430. [Google Scholar] [CrossRef]
  9. Biancardo, S.A.; Intignano, M.; Viscione, N.; Guerra De Oliveira, S.; Tibaut, A. Procedural Modeling-Based BIM Approach for Railway Design. J. Adv. Transp. 2021, 2021, 8839362. [Google Scholar] [CrossRef]
  10. Li, Y.; Zhao, Q.; Yang, M.; Ma, Z.; Hei, X. Advancements and Applications of Industry Foundation Classes Standards in Engineering: A Comprehensive Review. Buildings 2025, 15, 2927. [Google Scholar] [CrossRef]
  11. Hasan, M. BIM (IFC Standardization Interoperability). Front. Struct. Eng. Educ. Train. 2025. [Google Scholar] [CrossRef]
  12. Antunes, M.L.R.; César Júnior, K.M.L.; Ribeiro, J.C.L.; Oliveira, D.S.; Carvalho, J.M.F. Analysis of IFC Interoperability Data Schema for Project Representation. Autom. Constr. 2024, 166, 105650. [Google Scholar] [CrossRef]
  13. Lee, Y.-C.; Eastman, C.M.; Lee, J.-K. Validations for Ensuring the Interoperability of Data Exchange of a Building Information Model. Autom. Constr. 2015, 58, 176–195. [Google Scholar] [CrossRef]
  14. Gigante-Barrera, Á.; Dindar, S.; Kaewunruen, S.; Ruikar, D. LOD BIM Element Specification for Railway Turnout Systems Risk Mitigation Using the Information Delivery Manual. IOP Conf. Ser. Mater. Sci. Eng. 2017, 245, 042022. [Google Scholar] [CrossRef]
  15. Vale, C.; Simões, M.L. Prediction of Railway Track Condition for Preventive Maintenance by Using a Data-Driven Approach. Infrastructures 2022, 7, 34. [Google Scholar] [CrossRef]
  16. Justo, A.; Lamas, D.; Sánchez-Rodríguez, A.; Soilán, M.; Riveiro, B. Generating IFC-Compliant Models and Structural Graphs of Truss Bridges from Dense Point Clouds. Autom. Constr. 2023, 149, 104786. [Google Scholar] [CrossRef]
  17. Fontul, S.; Falcão Silva, M.J.; Couto, P. IFC Development for BIM Application to Railway Projects. In Eleventh International Conference on the Bearing Capacity of Roads, Railways and Airfields; CRC Press: Boca Raton, FL, USA, 2022; Volume 2, pp. 413–422. [Google Scholar] [CrossRef]
  18. Bensalah, M.; Elouadi, A.; Mharzi, H. BIM Integration into Railway Projects—Case Study. Commun. Eng. Sci. 2018, 11, 2181–2199. [Google Scholar] [CrossRef]
  19. Bellon, F.G.; Martins, A.C.P.; Franco de Carvalho, J.M.; de Souza, C.A.F.; Ribeiro, J.C.L.; César Júnior, K.M.L.; de Oliveira, D.S. IFC Framework for Inspection and Maintenance Representation in Facility Management. Autom. Constr. 2025, 174, 106157. [Google Scholar] [CrossRef]
  20. Sobhkhiz, S.; Zhou, Y.-C.; Lin, J.-R.; El-Diraby, T.E. Framing and Evaluating the Best Practices of IFC-Based Automated Rule Checking: A Case Study. Buildings 2021, 11, 456. [Google Scholar] [CrossRef]
  21. Wilczok, D. Deep Learning and Generative Artificial Intelligence in Aging Research and Healthy Longevity Medicine. Aging 2025, 17, 251–275. [Google Scholar] [CrossRef]
  22. Singh, P.; Hazarika, B.; Singh, K.; Huang, W.-J.; Duong, T.Q. GenAI-Enhanced Federated Multiagent DRL for Digital-Twin-Assisted IoV Networks. IEEE Internet Things J. 2025, 12, 4834–4851. [Google Scholar] [CrossRef]
  23. Patel, H.; Kumar, S. Identification of AI for Predictive Analytics in Cyber Threat Intelligence Gathering. Unique J. Artif. Intell. 2026, 4, 11–26. Available online: https://l1nq.com/TJK73 (accessed on 8 February 2026).
  24. Saidur, M.J.I. AI-Enhanced Business Intelligence Dashboards for Predictive Market Strategy in U.S. Enterprises. Int. J. Bus. Econ. Innov. 2025, 5, 603–648. [Google Scholar] [CrossRef]
  25. Falegan, O.; Aniebonam, S. Data-Driven Performance Optimization of Produced Water Treatment Infrastructure, A Conceptual Modeling Approach. Eng. Technol. J. 2026, 11, 41. [Google Scholar] [CrossRef]
  26. Kumari, J. Augmented Asset Management of Railway System Empowered by Industrial AI. Master’s Thesis, Luleå University of Technology, Luleå, Sweden, 2022. Available online: https://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-90416 (accessed on 5 March 2026).
  27. Stradi, J.A. A Digital-Cooperative Framework for Cross-Border Railway Maintenance Decision Support. Master’s Thesis, Delft University of Technology, Delft, The Netherlands, 2024. Available online: https://resolver.tudelft.nl/uuid:86de5a0c-a9e7-4509-8544-22a66827c01d (accessed on 2 March 2026).
  28. Negi, G.S.; Mohan, H.; Gupta, M.K.; Singh, R.; Gehlot, A.; Thakur, A.K.; Dogra, S.; Gupta, L.R. Leveraging Machine Learning for Optimized Microgrid Management: Advances, Applications, Challenges, and Future Directions. Renew. Sustain. Energy Rev. 2026, 226, 116345. [Google Scholar] [CrossRef]
  29. Katangoori, A. The Role of Big Data in Advancing Artificial Intelligence: Methods and Case Studies. Int. J. Artif. Intell. Mach. Learn. 2026, 6, 37–54. [Google Scholar] [CrossRef]
  30. Naeem, F.; Ali, M.; Kaddoum, G.; Faheem, Y.; Zhang, Y.; Debbah, M.; Yuen, C. A Survey on GenAI-Driven Digital Twins: Toward Intelligent 6G Networks and Metaverse Systems. IEEE Open J. Commun. Soc. 2025, 6, 10365–10402. [Google Scholar] [CrossRef]
  31. Gebreab, S.; Musamih, A.; Salah, K.; Jayaraman, R.; Boscovic, D. Accelerating Digital Twin Development With Generative AI: A Framework for 3D Modeling and Data Integration. IEEE Access 2024, 12, 185918–185936. [Google Scholar] [CrossRef]
  32. Memon, S.A.; Shehata, W.; Rowlinson, S.; Sunindijo, R.Y. Generative Artificial Intelligence in Architecture, Engineering, Construction, and Operations: A Systematic Review. Buildings 2025, 15, 2270. [Google Scholar] [CrossRef]
  33. Guan, W.; Li, P.; Zhang, H.; Wu, Y. Integrating Generative AI with Network Digital Twin for 6G: An Edge-Cloud Collaborative Approach. IEEE Commun. Mag. 2025, 63, 181–187. [Google Scholar] [CrossRef]
  34. Ghosh, D.P. Intelligent Infrastructure Delivery: AI-Driven Solutions for Lifecycle Design and Engineering Management. 2025. Available online: https://www.researchgate.net/publication/391279441_Intelligent_Infrastructure_Delivery_AI-Driven_Solutions_for_Lifecycle_Design_and_Engineering_Management?channel=doi&linkId=68112c54bfbe974b23bd7370&showFulltext=true (accessed on 31 January 2026).
  35. Shrestha, A.; Imamoto, K. Generative AI Based Industrial Metaverse Creation Methodology. In Proceedings of the 2024 Artificial Intelligence for Business (AIxB), Online, 30 September–2 October 2024; pp. 53–57. [Google Scholar] [CrossRef]
  36. Neupane, A. MetroPT3 Database. 2025. Available online: https://www.kaggle.com/code/neupaneaaditya/metropt3/input (accessed on 6 March 2026).
  37. Salman, H.A.; Kalakech, A.; Steiti, A. Random Forest Algorithm Overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79. [Google Scholar] [CrossRef]
Figure 1. Intelligent maintenance system architecture for rolling stock [Source: AI, Gemini—2.5].
Figure 1. Intelligent maintenance system architecture for rolling stock [Source: AI, Gemini—2.5].
Machines 14 00535 g001
Figure 2. Conceptual data flow diagram [source: AI, Gemini—2.5].
Figure 2. Conceptual data flow diagram [source: AI, Gemini—2.5].
Machines 14 00535 g002
Figure 3. LOD levels. Adapted from: https://l1nq.com/xmZYS accessed on 1 January 2026.
Figure 3. LOD levels. Adapted from: https://l1nq.com/xmZYS accessed on 1 January 2026.
Machines 14 00535 g003
Figure 4. GAN-based architecture for material failure.
Figure 4. GAN-based architecture for material failure.
Machines 14 00535 g004
Figure 5. Digital twin architecture based on GenAI 2.0 for autonomous prescriptive maintenance (2025).
Figure 5. Digital twin architecture based on GenAI 2.0 for autonomous prescriptive maintenance (2025).
Machines 14 00535 g005
Figure 6. Smart bogie monitoring workflow and digital upgrade.
Figure 6. Smart bogie monitoring workflow and digital upgrade.
Machines 14 00535 g006
Figure 7. Motors metro source: adapted from https://sl1nk.com/9mwldau (accessed on 6 February 2026).
Figure 7. Motors metro source: adapted from https://sl1nk.com/9mwldau (accessed on 6 February 2026).
Machines 14 00535 g007
Figure 8. Historical detection of anomalies in the pressure system (bogie).
Figure 8. Historical detection of anomalies in the pressure system (bogie).
Machines 14 00535 g008
Figure 9. Eurotram case study: wheel service life projection correlation between the cumulative wheel flange wear (measured as the reduction in flange thickness in mm) and the remaining useful life (RUL) of the wheelset.
Figure 9. Eurotram case study: wheel service life projection correlation between the cumulative wheel flange wear (measured as the reduction in flange thickness in mm) and the remaining useful life (RUL) of the wheelset.
Machines 14 00535 g009
Figure 10. Generative analysis: impact of speed increase on wear.
Figure 10. Generative analysis: impact of speed increase on wear.
Machines 14 00535 g010
Figure 11. Confusion matrix of the classification model for bogie. The labels represent the operational state: 0—normal operation.
Figure 11. Confusion matrix of the classification model for bogie. The labels represent the operational state: 0—normal operation.
Machines 14 00535 g011
Figure 12. The importance of sensors for predictive maintenance.
Figure 12. The importance of sensors for predictive maintenance.
Machines 14 00535 g012
Figure 13. Performance validation matrix. Legend: 0—normal operation; 1—anomaly/failure.
Figure 13. Performance validation matrix. Legend: 0—normal operation; 1—anomaly/failure.
Machines 14 00535 g013
Figure 14. Efficiency benchmark: manual versus autonomous workflow.
Figure 14. Efficiency benchmark: manual versus autonomous workflow.
Machines 14 00535 g014
Table 1. RUL scenario and status.
Table 1. RUL scenario and status.
ScenarioWear and TearRUL Status
Current0–200 mmUsually within the limit
Speed +20%0–275 mmHigh probability of failure/over-threshold
Table 2. Structure of the confusion matrix.
Table 2. Structure of the confusion matrix.
IA Previous: NormalAI Predicted: FAILURE
Reality: NormalTrue Negatives (VN)
(The metro is fine and the AI does not bother)
False Positives (False Alarms)
(The AI says there is a glitch, but that is okay)
Reality: FAILUREFalse Negatives (THE DANGER)
(The bogie will fail and the AI did not warn you)
True Positives (Success)
(The AI detected the flaw before it happened)
Table 3. Model effectiveness.
Table 3. Model effectiveness.
PrecisionRecallF1-ScoreSupport
Operation Normal0.9880.9950.991305,675.000
Anomaly Bogie0.9110.7970.85018,775.000
Accuracy0.9840.9840.9840.984
Macro Avg0.9490.8960.921324,450.000
Weighted avg0.9830.9840.983324,450.000
Table 4. Performance of the predictive maintenance model (Random Forest).
Table 4. Performance of the predictive maintenance model (Random Forest).
MetricsValue ObtainedResult for the Metro
GLOBAL Accuracy99%High reliability of general system diagnostics
Recall (Failure)0.99Ability to detect 99% of failures before they occur
Precision (Failure)0.99Minimised false alarms, reducing unnecessary inspections
F1-Score (Failure)0.99Perfect balance between detection sensitivity and precision
Table 5. Standard IFC versus GENAI-Enhanced IFC.
Table 5. Standard IFC versus GENAI-Enhanced IFC.
FeatureStandard IFC (Static)GenAI-Enhanced IFC (Dynamic)
Entity ClassIfcBuildingElementProxyIfcMechanicalElement (Brake System)
AttributeName: “Brake_Pad_01”Name: “Brake_Pad_01”
Property SetNone or StaticPset_Maintenance {Wear_Level: 14.2 mm}
Semantic ValueGeometry onlyGeometry + Real-time Health State
Table 6. Comparative analysis: traditional maintenance vs. GenAI-autonomous BIM update.
Table 6. Comparative analysis: traditional maintenance vs. GenAI-autonomous BIM update.
Process PhaseTraditional Manual MethodProposed GenAI-BIM FrameworkTime Saving (%)
Data Interpretation2–4 h (Manual Review)<30 s (AI Inference)~99%
Root Cause Analysis1–2 Days (Expert Panel)5–10 m (RAG Search)~95%
Digital Twin UpdateWeekly/Monthly (Manual CAD)Real-time (Automated IFC Write)~100%
Maintenance SchedulingManual ERP EntryAutonomous (via Logistics Agent)~90%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Coutinho, J.M.; Raposo, H.; Farinha, J.M.T.; Marques Cardoso, A.J. BIM and PdM of Railway Rolling Stock with Automatic Upgrading Based on GenAI. Machines 2026, 14, 535. https://doi.org/10.3390/machines14050535

AMA Style

Coutinho JM, Raposo H, Farinha JMT, Marques Cardoso AJ. BIM and PdM of Railway Rolling Stock with Automatic Upgrading Based on GenAI. Machines. 2026; 14(5):535. https://doi.org/10.3390/machines14050535

Chicago/Turabian Style

Coutinho, João Matos, Hugo Raposo, José M. Torres Farinha, and Antonio J. Marques Cardoso. 2026. "BIM and PdM of Railway Rolling Stock with Automatic Upgrading Based on GenAI" Machines 14, no. 5: 535. https://doi.org/10.3390/machines14050535

APA Style

Coutinho, J. M., Raposo, H., Farinha, J. M. T., & Marques Cardoso, A. J. (2026). BIM and PdM of Railway Rolling Stock with Automatic Upgrading Based on GenAI. Machines, 14(5), 535. https://doi.org/10.3390/machines14050535

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop