1. Introduction
The construction sector is undergoing a critical transition toward sustainability, resilience, and digitalization. Increasing urbanization, evolving environmental and social regulations, and the need to reduce the ecological footprint of production activities are reshaping project and organizational management paradigms [1]. Construction activities generate considerable environmental and social impacts that influence not only firms’ competitiveness but also safety conditions, operational efficiency, and the overall quality of building processes [2].
Table 1 summarizes the main ESG (Environmental, Social, and Governance) impact areas in the construction sector, providing indicative quantitative reference values and a single-source mapping to support comparability and traceability across processes.
Climate change, the growing scarcity of resources, and rising operational costs are increasingly influencing the stability of construction activities, the competitiveness of firms, and the sector’s overall ability to guarantee safety, efficiency, and quality standards [10]. On the policy side, initiatives such as the United Nations 2030 Agenda, the European Green Deal, and the Corporate Sustainability Reporting Directive have strengthened the role of environmental and social sustainability within institutional and industrial agendas. These programs encourage construction companies to use more advanced tools to measure, report, and manage sustainability performance [11,12]. Within this framework, ESG standards represent a key reference that guides organizations toward transparent and resilient management practices consistent with long-term value creation [13].
Even though awareness of ESG principles has grown, their practical and systematic adoption still presents difficulties. Common barriers include fragmented data availability, the lack of shared evaluation criteria, dependence on static analytical systems, and the limited use of predictive technologies [14]. For this reason, the ability to transform diverse and dispersed information into coherent, actionable knowledge becomes crucial for effective, sustainability-oriented decision-making. Digital technologies can help overcome these barriers by turning ESG management from a reactive, fragmented process into a more proactive and automated one [15,16,17]. In this regard, combining Artificial Intelligence (AI) and Process Mining (PM) can enhance traceability and transparency across processes, reveal inefficiencies or inconsistencies, and support timely, data-informed decisions [17].
Nevertheless, the current body of research remains largely fragmented and rarely proposes fully integrated frameworks that bring together predictive analytics, process mining, and automation specifically designed for construction management. Several recent studies have explored AI-driven ESG scoring [18,19], the use of PM for compliance monitoring [20], and the application of digital twins for sustainability assessment [21]. However, these efforts generally remain disconnected, limited by data type or analytical focus. So far, no study has developed a continuous workflow that effectively connects data collection, risk weighting, and process optimization within a single, unified framework.
This research addresses this gap by proposing an integrated digital framework that unites AI, PM, and Robotic Process Automation (RPA) for ESG risk assessment and management in construction. The framework’s novelty lies in the combination of predictive modeling, process automation, and conformance analysis within a single BPMN-based structure capable of operating dynamically with real or simulated data.
The objective is twofold: first, to enhance the ability of construction companies to anticipate and manage environmental and social risks; second, to promote the sector’s evolution toward a sustainability-first management paradigm grounded in data-driven and resilient decision-making aligned with regulatory, market, and societal expectations.
3. Methodology
3.1. Process Structure
The operational model developed in this research is based on the digitalization of construction processes and the combined use of technologies such as AI, RPA, and Machine Learning (ML). The framework has been formalized through a Business Process Model and Notation (BPMN) representation that details the main phases of ESG risk assessment applied to construction management [35,36].
Figure 2 presents the high-level BPMN model.
Each lane of the BPMN corresponds to a functional actor involved in the ESG–AI workflow: (1) RPA Bot, (2) BI Analyst, (3) ML Engineer, (4) PM Analyst, and (5) Manager. Events (start, intermediate, and end), gateways (exclusive and parallel), and artifacts (data stores, messages, and documents) are defined to ensure transparency and traceability throughout the information flow.
The modeling objective is twofold:
- to identify, within each operational area, activities that provide added value and those that are repetitive or standardizable [37];
- to optimize the decision-making process through automation and predictive tools [38,39].
The process mapping revealed several critical issues associated with manual and poorly traceable tasks that slow analysis and reduce the overall effectiveness of risk assessment. The BPMN model illustrates how data, actors, and automated routines interact across the five core phases of ESG risk management: Risk Identification, Data Integration, Risk Weighting, Clustering, and Action Planning. Each lane defines the responsible actor, the data inputs, and the outputs, providing a complete view of operational dependencies across the workflow.
To support reproducibility, Table A2 lists the main BPMN tasks together with their input and output specifications, derived from BIM, IoT, and ESG reporting systems. This mapping also forms the basis for the simulated event log used in the process-mining validation.
As an illustrative example, in the Project Alpha simulation, a new project dataset is uploaded, triggering the start event of the process. The RPA Bot automatically extracts ESG indicators from BIM, IoT, and web-based sources, which are then validated and integrated by the BI Analyst. The ML Engineer trains the predictive models and calculates ESG exposure scores. The PM Analyst performs clustering and process conformance checks, and finally the Manager reviews and approves the ESG Action Report (end event). This sequential flow demonstrates how a single project instance moves across all lanes of the BPMN model and how its corresponding event log is generated for process-mining validation.
The BPMN model tailored to the analysis process is presented in Table A2 and Figure 3.
Each lane of the BPMN model represents a functional actor. Events, gateways, and artifacts are explicitly defined to ensure traceability from data acquisition to managerial reporting. The main activities (Risk Identification, Data Integration, Risk Weighting, Clustering, and Action Planning) are replicated in a simulated event log that is later used to evaluate process mining capabilities.
3.2. ESG Risk Assessment Process
The ESG risk assessment process is structured into integrated, sequential phases, each producing outputs essential for the subsequent stage.
3.2.1. Phase 1—Risk Identification
Using standardized questionnaires, adaptable to the specific project context, ESG risk categories are identified. RPA-based automation enables the generation of uniform checklists, reducing errors and human variability [35]. This phase provides the consistency and comparability needed for subsequent data collection and analysis activities.
3.2.2. Phase 2—Data Collection and Integration
To overcome the information fragmentation typical of the sector [36], Web Scraping techniques and Business Intelligence (BI) tools are introduced to acquire and integrate data from:
- internal systems (BIM, project reports, digital archives),
- external sources (regulatory databases, environmental indicators, institutional portals).
These techniques, summarized in Table 4, enrich decision-making with dynamic and up-to-date information [37,38]. The outputs of this phase directly feed into risk weighting, ensuring a robust and comprehensive data foundation.
3.2.3. Phase 3—Risk Weighting
Risk category weighting is supported by integrated ML models selected for their complementarity:
- Random Forest, to identify the most relevant variables [46];
- Gradient Boosting Machine (GBM), to build accurate predictive models [47];
- Logistic Regression, for transparency and interpretability of decisions [48].
Using the three algorithms together pairs predictive power with interpretability, an essential requirement in the construction industry, where multiple stakeholders are involved.
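To illustrate the interpretability argument for Logistic Regression, the sketch below fits a model by plain batch gradient descent and exposes its coefficients, whose signs and magnitudes can be read directly. It is a minimal illustration using only the Python standard library: the two feature names, the toy data, the learning rate, and the number of epochs are assumptions made for this example and are not the study's configuration.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of high exposure for one project."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical normalized features: [co2_intensity, supplier_non_compliance]
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
```

A positive coefficient in `w` means that higher values of the corresponding feature raise the predicted probability of high exposure, which is the kind of direct reading that tree ensembles do not offer without additional explanation tools.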
3.3. Simulated Case Study: Project Alpha
To demonstrate the feasibility of the proposed framework in a controlled and reproducible setting, and given the current lack of publicly available ESG datasets for the construction sector, a simulated case study was developed as a methodological proof of concept. A synthetic dataset of 100 construction projects (the “Project Alpha” benchmark) was generated to emulate realistic operating conditions and to test the internal consistency of the analytical workflow. Each project represents a distinct case in which ESG-related variables are observed and processed through the workflow described above. The objective of the simulation is not to provide empirical validation on real data but to verify the analytical soundness, interoperability, and transparency of the proposed framework under realistic parameter assumptions.
Six ESG indicators were simulated using truncated normal distributions to reflect realistic industry ranges: CO2 intensity (kg CO2/m²), waste recycling rate (%), safety incidents (per 10,000 work hours), local employment (%), supplier ESG compliance (%), and reporting delay (days). These indicators were selected to represent the three ESG pillars: environmental (CO2 intensity, waste recycling rate), social (safety incidents, local employment), and governance (supplier compliance, reporting delay). A composite ESG exposure score ranging from 0 to 1 was computed as a weighted linear combination of these variables:
Small Gaussian noise (σ = 0.03) was added to simulate unobserved variability. The top 30% of projects were labeled as high ESG exposure (exposure = 1), while the remaining 70% were assigned exposure = 0. This labeling approach allows supervised learning algorithms to classify projects according to ESG risk intensity.
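The generation step can be sketched in Python as follows. This is a minimal illustration using only the standard library. The distribution parameters, truncation bounds, weights, and risk direction of each indicator are hypothetical placeholders, since the values used in the study are not stated in this section. The sketch assumes the generic form score = Σ w_k · x_k + ε, where x_k is an indicator rescaled to [0, 1] (inverted when higher values mean lower risk) and ε is Gaussian noise with σ = 0.03. Clipping the score to [0, 1] is also an assumption.

```python
import random

random.seed(42)  # fixed seed so the synthetic benchmark is reproducible

def truncated_normal(mean, sd, low, high):
    """Rejection sampling: redraw until the value falls inside [low, high]."""
    while True:
        value = random.gauss(mean, sd)
        if low <= value <= high:
            return value

# Hypothetical (mean, sd, low, high, weight, higher_is_riskier) per indicator.
INDICATORS = {
    "co2_intensity":       (450.0, 120.0, 150.0, 900.0, 0.25, True),
    "waste_recycling":     (60.0, 15.0, 10.0, 95.0, 0.15, False),
    "safety_incidents":    (3.0, 1.5, 0.0, 10.0, 0.20, True),
    "local_employment":    (55.0, 15.0, 10.0, 95.0, 0.10, False),
    "supplier_compliance": (70.0, 15.0, 20.0, 100.0, 0.20, False),
    "reporting_delay":     (20.0, 10.0, 0.0, 60.0, 0.10, True),
}

def simulate_projects(n=100, noise_sd=0.03, high_share=0.30):
    projects = []
    for pid in range(n):
        row = {"project_id": pid}
        score = 0.0
        for name, (mean, sd, low, high, weight, riskier) in INDICATORS.items():
            raw = truncated_normal(mean, sd, low, high)
            row[name] = raw
            scaled = (raw - low) / (high - low)       # map to [0, 1]
            score += weight * (scaled if riskier else 1.0 - scaled)
        score += random.gauss(0.0, noise_sd)          # unobserved variability
        row["score"] = min(1.0, max(0.0, score))      # assumed clipping to [0, 1]
        projects.append(row)
    # Label the top 30% of scores as high exposure (exposure = 1).
    ranked = sorted(projects, key=lambda r: r["score"], reverse=True)
    n_high = round(high_share * n)
    high_ids = {r["project_id"] for r in ranked[:n_high]}
    for row in projects:
        row["exposure"] = 1 if row["project_id"] in high_ids else 0
    return projects

projects = simulate_projects()
```

Because the label is derived from a rank cut on the score, the 30/70 class ratio holds by construction, which is worth keeping in mind when interpreting classifier performance on such data.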
All variables were normalized to the [0, 1] range, and missing values, randomly introduced to emulate incomplete reporting (<3%), were imputed using median substitution. The dataset was divided into a 70% training subset and a 30% testing subset using stratified sampling at the project level to avoid information leakage. Within the training set, a five-fold cross-validation procedure was carried out to tune model hyperparameters and verify the overall stability of the results.
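The preprocessing steps described above (median imputation, min-max normalization, and a stratified 70/30 split) could look like the following sketch. It uses only the Python standard library; the column names, the toy data, and the helper function names are hypothetical and chosen for illustration.

```python
import random
import statistics

def impute_median(rows, column):
    """Replace None entries in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = median

def min_max_normalize(rows, column):
    """Rescale `column` to [0, 1]; a constant column maps to 0.0."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = high - low
    for r in rows:
        r[column] = (r[column] - low) / span if span else 0.0

def stratified_split(rows, label, test_share=0.30, seed=0):
    """Split each label class separately so both subsets keep the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(test_share * len(members))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Toy data: 100 projects, 30 with high exposure, one missing value.
rows = [{"project_id": i, "reporting_delay": float(i % 40), "exposure": int(i < 30)}
        for i in range(100)]
rows[5]["reporting_delay"] = None
impute_median(rows, "reporting_delay")
min_max_normalize(rows, "reporting_delay")
train, test = stratified_split(rows, "exposure")
```

The sketch follows the order given in the text, with imputation and scaling applied before the split. A common alternative that further limits leakage is to compute the median and the min-max bounds on the training subset only and then apply them to the test subset.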
Three supervised algorithms were tested: Random Forest, Gradient Boosting Machine (GBM), and Logistic Regression. These methods were chosen because they combine predictive accuracy with interpretability, offering complementary perspectives on model performance. The evaluation relied on standard metrics, including Accuracy, AUROC, Precision, Recall, and Brier Score, while 95% confidence intervals were calculated using bootstrap resampling with 1000 iterations. On average, AUROC values ranged from 0.81 for Logistic Regression to 0.88 for GBM, indicating a consistent and coherent analytical configuration.
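For reference, the evaluation metrics named above can be computed as in the following sketch, which uses only the Python standard library. The function names, the 0.5 decision threshold, the percentile form of the bootstrap interval, and the toy labels and probabilities are assumptions made for illustration; they are not results from the study.

```python
import random

def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, and recall at a fixed probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples containing one class only are skipped."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue
        stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.35, 0.05]
metrics = classification_metrics(y_true, y_prob)
ci_low, ci_high = bootstrap_ci(auroc, y_true, y_prob)
```

AUROC and the Brier Score are computed from the predicted probabilities, whereas Accuracy, Precision, and Recall depend on the chosen threshold, so the two groups of metrics answer different questions about the same model.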
Feature importance analysis using permutation methods indicated that CO2 intensity and supplier ESG compliance were the strongest predictors of overall exposure, followed by safety incidents and reporting delay. For tree-based models, SHAP value analysis further supported these findings, showing that environmental and governance variables played the dominant role in determining ESG risk exposure.
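Permutation importance can be illustrated with a short, model-agnostic sketch: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The code below uses only the Python standard library. The stand-in classifier `toy_model`, the random data, the choice of accuracy as the score, and the number of repeats are hypothetical; the study's fitted models and scoring metric may differ.

```python
import random

def accuracy(model, X, y):
    """Share of rows for which the model reproduces the label."""
    return sum(1 for xi, yi in zip(X, y) if model(xi) == yi) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a fitted classifier: it only looks at feature 0.
def toy_model(x):
    return 1 if x[0] > 0.5 else 0

rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [toy_model(x) for x in X]
importances = permutation_importance(toy_model, X, y)
```

In this toy setup the second feature receives zero importance because the classifier never uses it, which is the behavior the method is designed to reveal.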
A parallel event log was also generated to represent process activities across the five BPMN phases: Risk Identification, Data Integration, ML Weighting, Clustering, and Action Planning. Each of the 100 cases includes five events with timestamps and resource assignments (RPA Bot, BI Analyst, ML Engineer, PM Analyst, Manager). This synthetic log enables process-mining analyses such as conformance checking and bottleneck detection without disclosing proprietary project data. It is used for validation in Section 4. A sample of the simulated dataset structure and the event log schema is provided in Appendix A and Supplementary Materials, ensuring transparency, reproducibility, and full traceability of the simulation setup.
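A synthetic event log of this shape can be produced with a few lines of Python. The sketch below uses only the standard library and assumes a one-to-one pairing of activities and resources in the order listed above. The field names, the reference start date, and the 1 to 48 hour waiting times are hypothetical and do not reproduce the schema in Appendix A.

```python
from datetime import datetime, timedelta
import random

# Assumed one-to-one pairing of the five BPMN phases with the five resources.
PHASES = [
    ("Risk Identification", "RPA Bot"),
    ("Data Integration", "BI Analyst"),
    ("ML Weighting", "ML Engineer"),
    ("Clustering", "PM Analyst"),
    ("Action Planning", "Manager"),
]

def generate_event_log(n_cases=100, seed=7):
    """Return a flat list of events, five per case, chronological within each case."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, 8, 0)  # hypothetical reference date
    log = []
    for case in range(1, n_cases + 1):
        clock = start + timedelta(days=case - 1)
        for activity, resource in PHASES:
            # Hypothetical waiting time of 1 to 48 hours before each activity.
            clock += timedelta(hours=rng.randint(1, 48))
            log.append({
                "case_id": f"P{case:03d}",
                "activity": activity,
                "timestamp": clock.isoformat(),
                "resource": resource,
            })
    return log

event_log = generate_event_log()
```

Each record carries the three attributes that process-mining tools generally require (case identifier, activity name, timestamp) plus the resource, so a log in this form can be exported to CSV and loaded into a dedicated tool.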
3.4. Strategic Planning and Action Plan Definition
After the assessment phase, risks are grouped into priority clusters. The process then proceeds to the definition of mitigation strategies and the development of a managerial action plan that enables continuous monitoring of deviations and iterative adjustment of measures based on observed results. Traditionally, this phase has relied on discretionary evaluation; however, it is now supported by AI-based tools capable of generating intermediate outputs that assist decision-makers, reducing subjectivity and improving transparency in the decision-making process [48].
In the simulated case study, each project cluster (low, medium, high exposure) activates a differentiated set of recommendations, which include emission reduction priorities, supplier audit scheduling, and social impact monitoring actions. These outputs are automatically mapped within the BPMN workflow (Figure 4), illustrating how RPA can translate ESG strategies into structured operational routines.
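The cluster-to-recommendation mapping can be expressed as a simple lookup that an automated routine could execute. In the sketch below, the action catalogue is hypothetical: it reuses the three recommendation types named above, but the assignment of actions to clusters and the function name are assumptions, not the study's rules.

```python
# Hypothetical action catalogue; the assignment of actions to clusters is
# an assumption made for illustration.
ACTIONS = {
    "low": ["social impact monitoring"],
    "medium": ["social impact monitoring", "supplier audit scheduling"],
    "high": ["social impact monitoring", "supplier audit scheduling",
             "emission reduction priorities"],
}

def build_action_plan(project_clusters):
    """Map each project id to the task list of its exposure cluster."""
    plan = {}
    for project_id, cluster in project_clusters.items():
        if cluster not in ACTIONS:
            raise ValueError(f"unknown cluster: {cluster!r}")
        plan[project_id] = list(ACTIONS[cluster])
    return plan

plan = build_action_plan({"P001": "high", "P002": "low"})
```

Keeping the catalogue as data rather than as branching logic makes the rules easy to review and change, which supports the traceability the framework aims for.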
4. Results and Discussion
4.1. Data Quality and Accuracy of Assessment
The simulated application of the model demonstrates how the integration of automation techniques and advanced analytics can significantly improve the quality of the information supporting ESG risk assessment. In the “Project Alpha” benchmark, this effect was observable from the initial methodological phases. Risk identification, carried out using standardized checklists automated through RPA, ensured consistency and uniformity in defining ESG categories [49]. Likewise, data collection and integration, supported by Web Scraping and Business Intelligence tools, generated a broad, dynamic, and up-to-date information base for subsequent predictive analyses [42].
The adoption of digital tools for data gathering and aggregation allowed the construction of a more complete, coherent, and continuously updated information system. The simulated evidence suggests that digital automation can mitigate distortions arising from fragmented and heterogeneous data sources, thereby reinforcing the knowledge base on which analytical processes rely [50,51].
To evaluate the internal consistency of this digital workflow, the ESG–AI model was applied to the simulated “Project Alpha” dataset described in Section 3.3. The controlled and reproducible simulation environment enabled a quantitative assessment of the model’s predictive performance across 100 synthetic construction projects. In this experimental setup, three machine learning algorithms were trained to classify projects according to ESG exposure. The performance metrics confirmed the analytical reliability of the proposed framework: Accuracy ranged between 0.81 and 0.88, AUROC values between 0.81 and 0.88, and Brier Scores between 0.11 and 0.15.
The application of ML models (Phase 3) improved the accuracy of risk classification and weighting. The integration of different algorithms balanced predictive performance and interpretability, ensuring that the results remained both reliable and transparent [45]. By expressing prediction accuracy through explicit evaluation metrics rather than descriptive assessment alone, the study provides measurable evidence of the model’s validity. This synergy strengthens the model’s ability to generate robust assessments, reducing the probability of systematic errors and increasing confidence in the outputs.
Finally, the clustering procedure (Phase 4) grouped projects into three ESG exposure categories (low, medium, high) based on the computed scores. This quantitative classification, derived directly from the synthetic dataset, represents a key step in translating analytical outputs into operational guidance [50].
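One way to obtain three score-based groups is one-dimensional k-means, sketched below with only the Python standard library. The text does not name the clustering algorithm used in the study, so the choice of k-means, the quantile-based initialization, and the toy scores are assumptions made for illustration.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on scalar scores.

    Centroids start in ascending order, so label 0 is the lowest group.
    """
    ordered = sorted(values)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

NAMES = ["low", "medium", "high"]
scores = [0.10, 0.12, 0.15, 0.45, 0.50, 0.52, 0.85, 0.90, 0.95]  # toy exposure scores
labels, centroids = kmeans_1d(scores)
clusters = [NAMES[lab] for lab in labels]
```

A simpler alternative is to cut the scores at fixed thresholds or quantiles, which makes the group boundaries explicit at the cost of ignoring the shape of the score distribution.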
Although the results are based on simulated data, they are intended to verify the internal analytical coherence and reproducibility of the ESG–AI model rather than provide empirical generalization. Applications on real project data will be required to validate external robustness and operational scalability. The overall structure of the ESG–AI model and the analytical techniques employed in each phase are summarized in Table 5, which outlines how automation and predictive analysis jointly improve data quality and the accuracy of ESG risk assessment.
4.2. Validation of the Simulated Case (Project Alpha)
Since no open ESG dataset is currently available for the construction sector, the validation of the proposed framework was carried out through a controlled simulation (“Project Alpha”), designed as a methodological proof of concept rather than an empirical test. The synthetic dataset, composed of 100 projects, was processed through the analytical pipeline to verify its internal coherence and reproducibility. The objective of this validation step was to assess whether the combined use of AI, Process Mining, and RPA techniques can consistently produce interpretable and quantitatively reliable outputs.
The predictive performance of the machine learning algorithms and the clustering results confirm the analytical soundness of the framework. The quantitative metrics summarized in Table 6 show that all three models achieved satisfactory classification accuracy and calibration, indicating that the simulated workflow performs robustly even when tested under simplified, non-real data conditions.
Table 5. Overview of the ESG–AI framework phases and corresponding tools for construction management. Author’s elaboration.
| Phase | Main Activities | Tools/Techniques Used | Contribution to Results |
|---|---|---|---|
| Phase 1—Risk Identification | Defining ESG categories (environmental, social, governance) | Standardized checklists, RPA automation | Ensures consistency in classification and reduces errors and subjectivity |
| Phase 2—Data Collection & Integration | Acquiring and unifying internal and external data sources | Web Scraping, Business Intelligence, regulatory and environmental databases, BIM systems | Overcomes data fragmentation and creates an updated knowledge base |
| Phase 3—Risk Weighting | Evaluating and assigning weights to ESG variables | Machine Learning (Random Forest, GBM, Logistic Regression) | Achieves higher predictive accuracy and balances analytical power with interpretability |
| Phase 4—Clustering & Risk Definition | Grouping projects into ESG risk classes (low, medium, high) | Clustering algorithms using ML model outputs | Translates analyses into a hierarchy of actionable priorities for decision-makers |
As reported in Table 6, the Gradient Boosting Machine achieved the highest AUROC (0.88), indicating a strong capacity to distinguish high-exposure projects while maintaining interpretability through feature-importance analysis. CO2 intensity and supplier ESG compliance emerged as the most influential predictors, confirming their role as key ESG performance drivers in the construction context. These quantitative findings are aligned with the feature-importance outcomes presented in Appendix A (Table A1).
To verify the operational coherence of the framework, the simulated data were also used to generate a process-mining event log corresponding to the five BPMN phases: Risk Identification, Data Integration, ML Weighting, Clustering, and Action Planning. Projects were grouped into three exposure categories (low, medium, high), with 30% classified as high-risk. This distribution reflects the predetermined labeling ratio and confirms consistency between the clustering results and the supervised classification phase.
A visualization of the clustering outcomes and a comparative overview of the traditional manual process and the AI-supported workflow are presented in the following section, highlighting the added value of automation and predictive analytics in reducing subjectivity and increasing process transparency. The structure of the simulated dataset and the event log schema used for process-mining validation are provided in Appendix A and Supplementary Materials, ensuring transparency, traceability, and full reproducibility of the experimental design.
Although based on synthetic data, this validation confirms that the proposed ESG–AI pipeline operates coherently across all phases, from data integration to predictive modeling and clustering, thus providing a solid methodological foundation for future testing on real construction projects.
4.3. Decision Support and Reduction of Discretionality
Beyond data quality, another relevant result concerns the model’s capacity to translate analytical outputs into tangible decision-support tools. The ability to generate progressive and continuously updatable evaluations based on ML scores (Phase 3), and subsequently organize them into risk clusters (Phase 4), reduces individual discretion while providing decision-makers with objective and traceable information. This promotes a more structured and less bias-prone decision process, enhancing the transparency and reliability of strategic choices [18,29].
In the “Project Alpha” simulation, these decision-support capabilities were quantitatively demonstrated through the clustering of 100 projects into low, medium, and high exposure groups. The resulting distributions, aligned with the supervised labels, confirmed that the ESG–AI model can autonomously classify projects using measurable indicators such as CO2 intensity, supplier compliance, and reporting delay. This indicates that managerial prioritization can be guided by transparent, data-driven logic rather than subjective judgment.
The integration of AI and PM makes the process not only more efficient but also more responsive, due to the ability to promptly detect deviations or anomalies in operational workflows, as illustrated in the BPMN representation [52,53]. In the simulated event log (Supplementary Materials), Process Mining algorithms successfully reproduced the expected workflow, allowing the automatic detection of bottlenecks and conformance deviations.
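The two checks mentioned above can be illustrated with a deliberately simplified sketch. It compares each case's activity sequence with the expected five-phase order and averages the waiting time between consecutive activities. This is a plain sequence comparison written with the Python standard library, not the replay- or alignment-based conformance checking offered by dedicated process-mining tools. The event field names, the helper functions, and the toy log are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

EXPECTED = ["Risk Identification", "Data Integration", "ML Weighting",
            "Clustering", "Action Planning"]

def traces_by_case(log):
    """Group events per case and sort them by timestamp."""
    cases = defaultdict(list)
    for event in log:
        cases[event["case_id"]].append(event)
    for events in cases.values():
        events.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    return cases

def non_conforming_cases(log):
    """Cases whose activity sequence differs from the expected BPMN order."""
    return sorted(case for case, events in traces_by_case(log).items()
                  if [e["activity"] for e in events] != EXPECTED)

def mean_waiting_hours(log):
    """Average time between consecutive activities, keyed by the handover."""
    totals, counts = defaultdict(float), defaultdict(int)
    for events in traces_by_case(log).values():
        for prev, nxt in zip(events, events[1:]):
            gap = (datetime.fromisoformat(nxt["timestamp"])
                   - datetime.fromisoformat(prev["timestamp"]))
            key = (prev["activity"], nxt["activity"])
            totals[key] += gap.total_seconds() / 3600
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

def ev(case, activity, ts):
    return {"case_id": case, "activity": activity, "timestamp": ts}

log = [
    ev("P001", "Risk Identification", "2024-01-01T08:00:00"),
    ev("P001", "Data Integration", "2024-01-01T10:00:00"),
    ev("P001", "ML Weighting", "2024-01-02T10:00:00"),
    ev("P001", "Clustering", "2024-01-02T12:00:00"),
    ev("P001", "Action Planning", "2024-01-02T13:00:00"),
    # P002 skips ML Weighting, which the sequence check should flag.
    ev("P002", "Risk Identification", "2024-01-03T08:00:00"),
    ev("P002", "Data Integration", "2024-01-03T09:00:00"),
    ev("P002", "Clustering", "2024-01-03T11:00:00"),
    ev("P002", "Action Planning", "2024-01-03T12:00:00"),
]
deviations = non_conforming_cases(log)
waits = mean_waiting_hours(log)
bottleneck = max(waits, key=waits.get)
```

In the toy log, the handover with the longest average wait is reported as the bottleneck, and the case that skips a phase is listed as a deviation.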
The capacity to generate intermediate outputs and continuously update risk levels transforms the model from a static evaluation tool into an adaptive governance mechanism capable of supporting decisions throughout the project life cycle. In this sense, the framework provides both predictive insights (based on ML) and diagnostic insights (based on PM), strengthening managerial control. This characteristic is particularly relevant for ESG risk management, which requires constant monitoring and rapid adjustment in response to evolving environmental, social, and regulatory conditions.
Overall, the simulation results support the conceptual claim that combining AI, PM, and RPA reduces discretion in ESG evaluation by converting qualitative assessments into traceable, quantitative, and continuously updated decision inputs. In this way, the model functions as a structural support to decision-making, aligning risk management practices with the strategic sustainability objectives of the construction sector.
4.4. Strategic Impacts on the Construction Sector
At the strategic level, the simulated results highlight the transformative potential of the proposed model, which emerges as a key tool to support the sector’s transition toward more sustainable, resilient, and data-driven management practices. An integrated approach based on AI and PM not only improves operational efficiency but also enhances firms’ ability to anticipate risks and adapt to changing conditions. In this perspective, ESG management shifts from a compliance-oriented activity to a strategic lever for medium- to long-term value creation [25,54,55].
The validation of the “Project Alpha” simulation provides initial methodological evidence of this potential. By demonstrating consistent classification accuracy (AUROC up to 0.88) and traceable clustering of ESG risk levels, the model shows that sustainability-related decisions can be systematically informed by predictive indicators rather than subjective evaluations. These findings should be interpreted as simulation-based results, illustrating analytical feasibility rather than empirical performance. Nevertheless, they support the transition from reactive ESG compliance to proactive, data-driven governance.
A central insight from the analysis concerns the model’s ability to bridge operational and strategic dimensions. Automation and optimization activities deliver immediate benefits in terms of reduced inefficiencies and improved accuracy of assessments. Process transparency ensured by PM, combined with the predictive capacity of ML models, supports a more robust decision-making cycle that is less exposed to subjective distortions. This combination strengthens corporate governance, enhances compliance capacity, and increases the credibility of firms among stakeholders, investors, and regulatory authorities (Table 7).
The analysis also highlights how the model supports the consolidation of stakeholder trust by improving transparency in both risk management and results communication. This represents a significant competitive advantage for a sector that is increasingly shaped by global climate goals and new ESG disclosure requirements.
In addition, the results of the simulated validation show that substantial improvements can be achieved even in the absence of complete real-world datasets. This finding suggests that digital transformation initiatives may start from pilot simulations and later evolve toward large-scale ESG data integration. Such evidence confirms the framework’s value as a proof of concept and provides a replicable methodological basis for subsequent empirical investigations.
Despite the potential benefits, some challenges remain for large-scale implementation. Reliable and standardized data are essential for analytical validity, and construction firms require adequate digital competences. Additionally, the heterogeneity of company size within the sector means that not all organizations possess the same resources to adopt advanced digital systems. For this reason, full integration of AI and PM requires a gradual digitalization approach that considers firms’ digital maturity and investment capacity [15,17].
A comparative visualization of the simulated workflow and the traditional process is presented in Figure 5, highlighting the methodological differences between manual and AI-supported ESG assessment.
5. Conclusions and Future Perspectives
This study addressed the challenge of ESG risk management in the construction sector, aiming to overcome long-standing limitations identified in both academic and professional contexts, namely, fragmented data, the lack of integrated predictive methodologies, and the persistence of static and qualitative reporting systems.
The proposed framework demonstrates how the integration of Artificial Intelligence (AI), Process Mining (PM), Robotic Process Automation (RPA), and Machine Learning (ML) within a unified BPMN-based model can improve the quality, consistency, and timeliness of ESG information. Through automation, predictive analytics, and data integration, the model enables more accurate risk assessments and reduces discretionality in managerial decision-making.
In relation to existing literature, the simulated findings align with recent studies that underscore the role of digital intelligence in sustainability governance. For instance, Seow [51] demonstrates the rising application of machine learning for ESG-score prediction across sectors, while the Jordan building-sector study [52] highlights the specific applicability of AI-driven risk management in construction. The present framework advances these approaches by embedding predictive analytics into an end-to-end BPMN workflow, thus ensuring traceability from data acquisition to decision support. Consistent with the systematic review by Cruz et al. [53], which identifies a gap in methodological integration in the construction sector, our work provides a process-aware methodology that links ESG scoring and process mining. Similarly, the green finance study [54] shows how financial mechanisms influence ESG outcomes, complementing our operational emphasis by broadening the view toward strategic value creation. Overall, the simulated results corroborate previous evidence that AI and process mining enhance transparency, responsiveness, and strategic alignment in sustainability management, while contributing a unified operational framework that previous research has only partially addressed.
Unlike purely conceptual approaches, this research provided a quantitative proof-of-concept through the “Project Alpha” simulation, which tested the model’s internal coherence using synthetic but realistic data. The resulting performance metrics, showing AUROC values up to 0.88 and stable clustering outcomes, confirm that the integrated ESG-AI pipeline can produce interpretable and reproducible outputs even in a controlled environment. These results, detailed in Appendix A and Supplementary Materials, provide a transparent and replicable methodological baseline.
From a scientific perspective, this study introduces an original, process-oriented framework that integrates predictive modeling with process optimization, addressing a gap in the literature where these two approaches are often examined separately. From an operational standpoint, it provides construction firms and project managers with a structured and replicable tool to identify, classify, and mitigate ESG risks, while supporting compliance with international standards and responding to stakeholder expectations.
The validation conducted through simulated data (Project Alpha) offers an initial quantitative demonstration of how the framework functions across the main phases of identification, integration, weighting, and clustering. The findings confirm internal consistency and analytical feasibility, but the validation remains limited to simulated data and does not yet reflect external variability or organizational heterogeneity. Further empirical testing on real construction projects is therefore essential to assess the scalability, generalizability, and sectoral adaptation of the proposed digital workflow across different organizational settings.
Some limitations remain, particularly concerning the availability of standardized datasets, the digital maturity of construction companies, and the diffusion of advanced technological competencies. These aspects highlight the need for gradual digital transformation strategies and targeted capacity building within the sector.
Looking ahead, future developments may include:
- the integration of digital twins and advanced simulation tools to dynamically monitor ESG performance;
- the linkage of life cycle assessment (LCA) and ESG indicators for more holistic evaluations;
- and the adoption of harmonized international metrics to ensure comparability and transparency in ESG reporting.
In conclusion, while the present study validates the framework’s analytical soundness through simulated evidence, its next evolution lies in large-scale empirical implementation. This step will enable the framework to transition from a demonstrative model to an operational tool for ESG performance governance in the construction sector. Within this perspective, the synergy between AI and PM emerges as a decisive enabler of the sector’s transition from compliance-driven sustainability to data-driven strategic value creation, supporting the evolution of the construction industry toward a more transparent, resilient, and sustainable ecosystem.